Japanese - AI text splitting changes words

I thought I was going crazy. I kept finding weird words and sometimes repeated letters. But I just figured out that the AI text splitting feature changes the っちゃ contraction to っては in words that contain it. This is not noticeable for things like 切っちゃった, because it creates 切って ちゃった, but a word like ちっちゃな becomes ちってはな, which is NOT a word. Additionally, it does weird things with って and its variations: sometimes it is unable to parse them, and sometimes it creates things like ってってってってって.

I hope this can be fixed. It is fine if the AI cannot parse a sentence, but changing the actual text content is probably not a good idea.

Thanks, we will look into this.

Hi @egreene !
Could you please clarify whether you are referring to the splitting which happens automatically after the import, or to the “Resplit text with AI” feature triggered from the Lesson Editor?
It would also help if you could provide an example of the lesson where the issue is present.

So sorry I missed your message until now. If you add the following line, you will see what I mean:

おや 、 おや 、 また ちっちゃな キキ が 足 を ひっかけた ね

The ちっちゃな will be converted by the AI to ちってはな, changing the っちゃ to っては. Unfortunately, this happens in absolutely every instance where the っちゃ contraction is followed by another character such as な or った, as in my previous example 切っちゃった, which becomes 切って いて ちゃった. It does not seem to happen when it is in its simple form ちゃう.

You can also add 切っちゃった and see how it changes.

I haven’t seen the って repetition effect in a while, and I fixed it by hand when it did come up. If it happens again I will share it. I noticed that one a long time ago, so it may be gone now.

Thanks so much.

Edited to add: it happens with the “Resplit text with AI” feature, which otherwise improves the text substantially compared to the regular parsing, so I use it all the time. :wink:

I’ve found another, somewhat related word-changing behavior. This time it is with って, which gets changed to ってい, I assume because a た follows and so the AI assumes it is っていた. The concerning thing is that the AI keeps changing words, and I feel it is not limited to these examples. But I can’t check every upload side by side.

You can replicate this new one with the following sentence.

何度もひっかかってたまるものか

which will become:

何度もひっかかっていたまるものか
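
For anyone who wants to check uploads without comparing them side by side, here is a rough sketch of a check. It assumes that splitting should only ever add or remove spaces, so once whitespace is stripped, the text before and after the resplit should be identical. The function names are my own and not part of any LingQ API, and you would have to copy the before and after text out by hand.

```python
import difflib


def strip_spaces(text: str) -> str:
    """Drop all whitespace, including full-width spaces, keeping only content characters."""
    return "".join(ch for ch in text if not ch.isspace())


def content_changes(original: str, resplit: str) -> list[tuple[str, str, str]]:
    """Return (operation, original_fragment, resplit_fragment) for every place
    where the resplit text differs from the original beyond spacing.

    An empty list means the splitter only touched whitespace.
    """
    a = strip_spaces(original)
    b = strip_spaces(resplit)
    # autojunk=False so frequently repeated kana are not treated as noise.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    before = "何度もひっかかってたまるものか"
    after = "何度も ひっかかっていた まるものか"  # hypothetical spacing; only the extra い matters
    for change in content_changes(before, after):
        print(change)
```

Running the same check on each line of a lesson would flag only the lines where characters were actually changed, which are the ones worth reporting.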

Thanks, we will have this fixed.
