So true. Korean can be such a headache at times. But I forgive it because it is a beautiful language.
@ Tamarind
A 675K for the C1 in Korean would be a lofty goal. There are more effective ways of measuring one’s core skills in the language. As you have pointed out, referring to the CEFR framework would be a more appropriate way to determine one’s level, on top of counting known words.
One way to find out is how comfortable one feels dealing with content. I will be at B1 in reading and listening after completing academic courses like those the Cyber University of Korea offers. Reading my first book and watching TV series would take me to B2. Reading 10 or 15 more books on general topics such as personal finance, gardening, and computing, and watching more videos, would take me closer to C1.
Another way is to list the words according to the difficulty level. I would list the words you quoted with the corresponding level according to my preference. We will have a general idea of our level by the vocabulary we acquire.
- 아버지 A1
- 부친 A2
- 신부 B1
- 창시자 B1
- 성직자 C1
@Hagowingchun
It is not mandatory to learn Hanja, although it is handy for learning and parsing new words in a more understandable way. To make things easier, I treat the different and unrelated definitions under the same word as separate entries in the dictionary. Written words in Korean are a phonetic representation of the actual pronunciation, and they could have been different due to many factors. I prefer to learn a new word’s meaning by seeing it used alongside other words with the same or a similar shade of meaning, without resorting to Hanja for help, even though I know it quite well. A search with a Chinese character in the dictionary may return more related terms than a search in English.
상 prize; reward
good; top
bereavement; death in the family
etc
-상 a suffix meaning a merchant or a store
상금 prize; reward, usually in the form of monetary compensation
상벌 reward and punishment; prizes and penalties
상품 product; goods
상가 shopping complex; shopping mall
부친상 father’s death
국상 national funeral
@llearner
I always enjoy your 漢字 thoughts.
My goal for C1 is to be able to take an online college level history course with Korean people, have Zoom discussions with native Koreans and not cry with frustration.
My goal for B2 is to be able to pick up any light novel, or a high school level non-fiction book and be able to read without using a dictionary on every single page.
Being able to read just one book and watch one TV series is equivalent to B1 in my mind. Especially for a language like Korean where vocabulary can get very particular according to domain. I’ve read a couple of books but I still feel overwhelmed when looking through a new book about a completely different subject matter.
But I imagine it’s very different for a native Chinese speaker. The higher level Sino-Korean words are paradoxically easier for people from Asia learning Korean.
@Tamarind
Glad to know that you enjoy reading them.
Good knowledge of Hanja has helped me establish a solid foothold in Korean. It will nudge me further as I progress from B1 to C1 in acquiring new vocabulary and reading more books. However, I would only attribute some of my progress to my knowledge of the Chinese language. I can relate Romance languages to English more than Korean to Chinese because of the different writing system. I am just an average Joe in language learning with some weak sides, such as poor grammar and pronunciation.
I am consolidating my learning by reviewing lessons for the time being. Watching a TV series on Netflix or Jadoo is on my to-do list. As for Hanja, perusing dictionary entries for review could be a good way to understand word stems with similar meanings. For example, 경찰 (police) is a profession associated with protecting, keeping order, and being alert. 경고 (warning; caution) and 경각심 (vigilance) pop right out of the dictionary. Now think about the word “train”: if we were to guess the actual word in an Asian language, its characteristics, function, and components or construction materials would be excellent hints to how the word is written in that language. Interestingly, it turns out to be 기차 (steam vehicle or steam locomotive) in Korean and 火车 (fire vehicle, a car running on combustible fuel) in Chinese.
The meaning of some terms is more conspicuous than others, and we have to develop a good sense of the language by noticing familiar patterns. I would say we rely more on common sense than solely on knowledge of Chinese characters: we spot 화재 (fire + disaster) for a calamity in which houses are burnt and 화산 (fire + mountain) for a volcano, and similar English expressions such as “add fuel to the fire” and “to blaze up” give us a more intuitive understanding of 화나다 for getting angry.
Once we connect all the dots, we can see the bigger picture, and everything will fall into place. Happy learning.
@ Florian
The space in Korean text serves as an effective delimiter, which eliminates the need for a special word splitter for users at Lingq. It would be more than an annoying nuisance for me if I were to learn Chinese as an L2, and I am not in the least surprised by your frustration when you mentioned in another post that the original writing was “fast and furious.”
I have some questions and suggestions and would like to know whether they would help.
- The text with English words can be preprocessed by removing the English terms before feeding it to the word-splitter software, then reinserting the English words into the parsed text before displaying it to users. English words are exceptions that the software alone should handle. I suspect the problem with your particular text could be the capitalization of English words and the self-defined dictionary used by Lingq.
  Issues · fxsjy/jieba · GitHub
- An imported dictionary would work the same way as the one Lingq uses or has customized for the software. Have you tried importing a dictionary with English and Chinese entries, with spaces employed as the delimiter? Alternatively, we can edit the lesson by inserting spaces around English words and check whether the original file then splits well. If the result remains the same, commingled terms could be “locked” per the Lingq database or process.
- We need to determine how effective the two functions under “Modify dictionary” in Jieba are in improving parsing accuracy. Lingq can compile a “blacklist” of illegitimately parsed terms in addition to the “whitelist” of the dictionary. Lingq can either pre-remove them, like the English words, or use the built-in functions, provided they are reliable. Most Chinese terms contain only up to three characters, except chengyu and other idioms.
- Unless terms with more than three Chinese characters have been confirmed against a dictionary, it would be better to release the “locked” status of the word so that users can make a selectable entry on their own. So the basic workflow goes as follows:
  - Preprocess the text to remove English, or expect the parser to handle it properly.
  - Use both the user database and the “blacklist” to check the validity of parsing on a confirmed basis. To ensure the legitimacy of both, only words adopted by multiple users are included in the user database, and only terms reported or ignored multiple times are included in the “blacklist.” These two data sets can be excellent feeder data for the aforementioned built-in functions in Jieba.
  - Reprocess the text to include English if it needs to be added back. Release the “locked” status to allow free selection and dictionary entries. We will get a more integrated database that reciprocally benefits the parsing process, with higher accuracy, user input, and learning with the text.
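To make the workflow above a bit more concrete, here is a minimal Python sketch of how I picture it. It is only an illustration under my own assumptions: I do not know how Lingq's pipeline is actually built, and the names `split_mixed`, `segment_mixed`, `build_lists`, and `min_users` are made up for this example. The default segmenter is just a per-character split standing in for the real word splitter; something like `jieba.lcut` could be passed in its place.

```python
import re
from collections import Counter

# Assumption: a run of ASCII letters/digits, optionally joined by a single
# space, apostrophe, or hyphen, is treated as one "English" term.
ENGLISH_RUN = re.compile(r"[A-Za-z0-9]+(?:[ '\-][A-Za-z0-9]+)*")


def split_mixed(text):
    """Split text into (is_english, chunk) pieces, preserving order."""
    pieces = []
    pos = 0
    for match in ENGLISH_RUN.finditer(text):
        if match.start() > pos:
            pieces.append((False, text[pos:match.start()]))
        pieces.append((True, match.group()))
        pos = match.end()
    if pos < len(text):
        pieces.append((False, text[pos:]))
    return pieces


def segment_mixed(text, segmenter=list):
    """Segment only the non-English chunks; English terms pass through whole.

    `segmenter` is any callable that turns a string into a list of tokens.
    The default `list` splits per character and is only a placeholder.
    """
    tokens = []
    for is_english, chunk in split_mixed(text):
        if is_english:
            tokens.append(chunk)
        else:
            # Drop whitespace-only tokens left over around English terms.
            tokens.extend(t for t in segmenter(chunk) if t.strip())
    return tokens


def build_lists(adopted, reported, min_users=2):
    """Build (whitelist, blacklist) from (user_id, term) pairs.

    A term qualifies only when at least `min_users` distinct users
    adopted it (whitelist) or reported/ignored it (blacklist).
    """
    def popular(events):
        # set() removes repeats so each user counts once per term.
        votes = Counter(term for _user, term in set(events))
        return {term for term, n in votes.items() if n >= min_users}

    return popular(adopted), popular(reported)
```

The resulting whitelist and blacklist are the kind of feeder data that could then go to Jieba's dictionary functions (I believe `jieba.add_word` and `jieba.del_word` are the relevant ones), but whether that actually improves accuracy is exactly what would need testing.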
Is the total of unlearnt lingqed terms plus known words a good representation of the total number of parsed word segments in a language for a particular learner?
Imo, don’t track known words. The most important stat is words read. From my experience, here are my milestones and the approximate words read it took me to reach them:
100,000 read: Survival Korean
1 MIL: Meaningful conversations
3 MIL: Basic fluency
5 MIL+: Fluency (?)
The last figure is just a conjecture based on my own personal extrapolations, so take it with a grain of salt.
Fluency in this context (this is my own benchmark for the standard) means being able to understand the language in many contexts and being able to speak/write comprehensively on various topics within and across those myriad contexts.