“Regenerate lesson” re-segments Chinese

LingQ's Chinese segmentation (word grouping) is quite good, and I mostly don’t mind hand-correcting the occasional problem. However, sometimes the same error occurs many times in one lesson and I want to fix every instance at once. There is no easy way to do that. The normal edit page is tedious because each correction must be made manually. The “regenerate lesson” page almost works: I can copy the entire text into a separate program, use find-and-replace, and paste it back. However, once I save, LingQ seems to rerun the segmenter, undoing all that work. It also undoes the segmentation edits I had previously made one by one in the reader.
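For reference, the find-and-replace step I do outside LingQ looks roughly like the sketch below. It assumes the lesson text is space-separated, and `merge_split_word` is just a name I made up for illustration; it has nothing to do with LingQ's own code.

```python
import re

def merge_split_word(segmented_text, word):
    """Rejoin every occurrence of `word` that was split by spaces.

    Builds a pattern that matches the word's characters with optional
    whitespace between them, then replaces each match with the intact word.
    """
    pattern = r"\s*".join(re.escape(ch) for ch in word)
    return re.sub(pattern, word, segmented_text)

text = "哈利 波特 喜欢 北京 。 哈 利波特 来 了"
print(merge_split_word(text, "哈利波特"))
# 哈利波特 喜欢 北京 。 哈利波特 来 了
```

This only repairs the characters inside the word itself. It does not add a missing space between the word and its neighbors.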

It seems to me that the “regenerate lesson” page should not resegment the lesson if the text is clearly already segmented (as determined by the presence of spaces).
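A check like this could be fairly simple. Here is a minimal sketch of one possible heuristic, assuming "already segmented" means that spaces appear between Chinese characters often enough. The function name, the character range, and the `min_ratio` threshold are my own guesses, not anything LingQ actually uses.

```python
import re

# Basic CJK Unified Ideographs block; an assumption that covers common text only.
CJK = r"\u4e00-\u9fff"
# One or more ASCII or full-width spaces sitting directly between two Chinese characters.
_SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{CJK}])[ \u3000]+(?=[{CJK}])")
_CJK_CHAR = re.compile(rf"[{CJK}]")

def looks_presegmented(text, min_ratio=0.2):
    """Guess whether `text` already has spaces marking word boundaries.

    Returns True when the number of space gaps between Chinese characters,
    divided by the number of Chinese characters, is at least `min_ratio`.
    """
    cjk_count = len(_CJK_CHAR.findall(text))
    if cjk_count < 2:
        return False
    gaps = len(_SPACE_BETWEEN_CJK.findall(text))
    return gaps / cjk_count >= min_ratio

print(looks_presegmented("我 喜欢 学习 中文"))  # True
print(looks_presegmented("我喜欢学习中文"))      # False
```

A ratio is used instead of "any space at all" so that one stray space in an otherwise unsegmented lesson does not switch the segmenter off.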

Alternatively, an option to import pre-segmented texts into LingQ (so I can run another segmenter, edit the result, and import it) would also work. Currently the import function appears to ignore existing segmentation completely.


I was thinking about this and realized another solution would be to include a field for custom words for each course or lesson. This way unusual terms and proper nouns can be fed to the segmenter so it treats them properly. The Stanford segmenter allows for this and the documentation says it improves results dramatically.
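To show why a custom word list helps, here is a toy sketch using forward maximum matching, which is a much simpler technique than what the Stanford segmenter or LingQ uses. The vocabulary, the `segment` function, and the `max_len` parameter are all made up for the example.

```python
def segment(text, vocabulary, max_len=6):
    """Greedy forward maximum matching.

    At each position, take the longest substring (up to `max_len` characters)
    found in `vocabulary`; fall back to a single character if nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words

base_vocabulary = {"喜欢", "北京"}   # hypothetical built-in dictionary
custom_words = {"哈利波特"}          # hypothetical per-lesson custom words

text = "哈利波特喜欢北京"
print(segment(text, base_vocabulary))
# ['哈', '利', '波', '特', '喜欢', '北京']
print(segment(text, base_vocabulary | custom_words))
# ['哈利波特', '喜欢', '北京']
```

Without the custom entry the proper noun falls apart into single characters. With it, the name is kept whole everywhere in the lesson without any hand editing.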

Thanks for your feedback and suggestion here. I’ll forward this to our developers and we will see what we can do about it.

Thank you!