Simplified Lesson Feature on the iOS App

I just noticed this new simplified lesson feature. Is there any info available on how it works?

It appears to fail for longer lessons; it has been generating this one for 30 minutes.


Yes, this seems to be LingQ’s first stab at incorporating ChatGPT. I tried it as soon as it came out (14 days ago), but to this day I’m unable to open the resulting lesson. (I’ve already reported it.) After that, my enthusiasm mysteriously vanished. Hopefully it’s more helpful to others.
To be honest, I’m not sure how valuable simplified lessons are in general. I’ve always been fine using dictionaries and translation to make difficult content comprehensible.


I have a feeling they are using the 4K context model and it simply fails when the input plus output exceeds 4K tokens. I will test some lessons where the input plus output would be ~4K tokens or less to find out.

I’m always happy for more options but was there an announcement for this somewhere?


I don’t really use ChatGPT as much as I should (esp. for language learning), but I think you can get an estimate of the number of tokens before actually using it (API or web), here: OpenAI Platform
The text I tried to simplify comes out at 9,024 tokens and 8,967 characters. That was probably just a bit too much, so I might have to try something more digestible. On LingQ, though, it’s labeled as just 3,000 words, so it is still far from the 6K-word maximum.

I have seen neither announcement nor documentation anywhere.


I’ll sniff around the iOS API calls later and see what I can find out about this feature. 6K-word lessons wouldn’t even work with the 16K context because of the number of tokens required. They would need to split their requests and patch the pieces together. It seems like a real money burner if someone is sitting there constantly simplifying their 6K-word lessons.
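
For what it’s worth, the split-and-patch idea could look roughly like the sketch below. This is only an illustration under stated assumptions, not what LingQ actually does: `rough_token_count` is a crude stand-in for a real tokenizer (about one token per four characters), the sentence splitting is naive, and both function names are made up for this example.

```python
import re
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude placeholder tokenizer: assumes about 4 characters per token.

    This ratio is an assumption for illustration only. Swap in a real
    tokenizer for actual measurements, especially for non-English text.
    """
    return (len(text) + 3) // 4


def split_into_chunks(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    A single sentence that is longer than the budget is kept whole and
    becomes its own oversized chunk.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # The next sentence would overflow the budget: close this chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    lesson = "One two three. Four five six. Seven eight nine."
    for i, chunk in enumerate(split_into_chunks(lesson, max_tokens=8), start=1):
        print(i, chunk)
```

Since a simplified lesson is about as long as the original, each chunk would need a budget of roughly half the context window minus the prompt, so something like `(4096 - 200) // 2` tokens for the 4K model (the 200-token prompt allowance is a guess). Each chunk would then go out as its own request and the responses would be joined back together in order.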

Very interesting, I just checked out the pricing here: Pricing
It’s actually very reasonable for GPT-3.5 Turbo. I couldn’t find a program that would convert text to tokens and calculate the price, so I hacked something together that considers only the input tokens. Output tokens shouldn’t be significant for simplified lessons.
What I find fascinating is to compare the tokenizer’s efficiency. It’s clearly built for “normal” languages (probably mainly for English). Converting text to tokens is quite inefficient in very different languages like Chinese or uncommon ones like Icelandic. This in turn means that you may have to pay more money depending on your language.
I know it’s a little off topic but I quickly crunched the numbers for the 7 Harry Potter books in 4 languages (Mandarin Chinese, Portuguese, Norwegian, Icelandic) for comparison:

                      ZH          POR         NO          IS
Number of Characters  2150096     6574814     6282025     6563459
Number of Tokens      2694967     1973143     2111677     2885686
GPT-4 8K              $80.8490    $59.1943    $63.3503    $86.5706
GPT-4 32K             $161.6980   $118.3886   $126.7006   $173.1412
GPT-3.5 Turbo 4K      $4.0425     $2.9597     $3.1675     $4.3285
GPT-3.5 Turbo 16K     $8.0849     $5.9194     $6.3350     $8.6571

Obviously the amount of text far exceeds the context length and you would have to feed it in chunks if you really wanted to get HP into ChatGPT.
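
To make the language comparison easier to read, here is a small sketch that turns the character and token counts from the table into ratios. The numbers are copied from the table; the helper names `tokens_per_char` and `relative_cost` are just made up for this example.

```python
# Character and token counts for the seven books, copied from the table above.
stats = {
    "ZH": {"chars": 2150096, "tokens": 2694967},
    "POR": {"chars": 6574814, "tokens": 1973143},
    "NO": {"chars": 6282025, "tokens": 2111677},
    "IS": {"chars": 6563459, "tokens": 2885686},
}


def tokens_per_char(entry: dict) -> float:
    """Average number of tokens the tokenizer spends per character."""
    return entry["tokens"] / entry["chars"]


def relative_cost(all_stats: dict, baseline: str) -> dict:
    """Token count of each language relative to the baseline language.

    Price scales linearly with tokens, so this is also the price ratio.
    """
    base_tokens = all_stats[baseline]["tokens"]
    return {lang: e["tokens"] / base_tokens for lang, e in all_stats.items()}


if __name__ == "__main__":
    cost_vs_por = relative_cost(stats, baseline="POR")
    for lang, entry in stats.items():
        print(
            f"{lang}: {tokens_per_char(entry):.3f} tokens/char, "
            f"{cost_vs_por[lang]:.2f}x the Portuguese token count"
        )
```

Tokens per character is not a fair comparison on its own, because a Chinese character carries far more meaning than a Latin letter. The total token count for the same books is the more telling figure, since that is what you pay for.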

Code (probably wrong, don’t use for actual calculations!):

import tiktoken

# Price in USD per 1K input tokens (output tokens are not counted here).
token_prices = {
    "GPT-4: 8K context": 0.03,
    "GPT-4: 32K context": 0.06,
    "GPT-3.5 Turbo: 4K context": 0.0015,
    "GPT-3.5 Turbo: 16K context": 0.003,
}

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

input_file_path = "/Users/bamboo/harry.txt"
encoding_name = "cl100k_base"  # encoding used by GPT-3.5 Turbo and GPT-4

# Read the file once and reuse the text for both counts.
with open(input_file_path, 'r', encoding='utf-8') as file:
    text = file.read()

num_tokens = num_tokens_from_string(text, encoding_name)

print(f"Number of Characters: {len(text)}")
print(f"Number of Tokens: {num_tokens}\n")

# Cost of sending the whole text as input to each model.
for model, price_per_1k_tokens in token_prices.items():
    price = (num_tokens / 1000) * price_per_1k_tokens
    print(f"{model}: ${price:.4f}")

Thanks for the work. Charging more based on the language seems like a dog move. I hope it’s just a technical limitation and that they improve it in the future.

Definitely seems like the move is chunking the 4K context. That price point is very good and you wouldn’t expect every user to simplify the entire Harry Potter series. I’ve got no computer for a couple days so will investigate the mobile API calls when I get back.

Not much info available, except that you need to POST (an empty package) to this address.

The feature is pretty useless at the moment because most of my lessons just come back in plain English when simplified.


I understand that it must be in the beta phase and that they will mainly need to play around with the prompts used to simplify the content. For example, in my tests with Claude (which has a much larger context window), when I simplify things I ask it to keep the same sentences, sentence structure, and grammar but simplify the vocabulary for me. I think it can be very useful for very quick extensive reading of things that are still way above your level. I’m very pleased that LingQ is working on this.


Woah, I never even thought about making a lesson more comprehensible. It would be amazing if the simplifying actually used your LingQ data and simplified according to your Known / Unknown Words, turning a lesson with 50% unknown words into one with 5-10%, for example.
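
Measuring that would be the easy part. Here is a rough sketch of how the unknown-word percentage of a text could be computed before and after simplifying. It assumes the percentage is taken over distinct words and that your known words are available as a plain set; I don’t know how LingQ actually counts or stores them, and `unknown_word_percentage` is a made-up name.

```python
import re
from typing import Iterable


def unknown_word_percentage(text: str, known_words: Iterable[str]) -> float:
    """Percentage of distinct words in text that are not in known_words.

    Words are compared case-insensitively. Counting distinct words rather
    than every occurrence is an assumption made for this example.
    """
    # [^\W\d_]+ matches runs of letters in any script (Latin, Cyrillic, ...).
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    if not words:
        return 0.0
    known = {w.lower() for w in known_words}
    unknown = words - known
    return 100.0 * len(unknown) / len(words)


if __name__ == "__main__":
    lesson = "Кошка сидит на окне. Кошка спит."
    known = {"кошка", "на"}
    print(f"{unknown_word_percentage(lesson, known):.0f}% unknown words")
```

A simplifier could run this check on its own output and retry with a stricter prompt until the percentage drops below whatever target you set.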


I was able to use this feature today for a Russian lesson. It took a text with 47% unknown words and shaved it down to 30%. There was a noticeable difference in comprehensibility, very cool.
