The model performs really well; the results sound like a professional voice actor’s work. With a proper prompt, I could make a really good-quality audiobook.
Listen to the demos below. The demo videos are English only, but it also works well in non-English languages (at least in my native one).
Book reading test. It’s really good. I will use this to make an audiobook of the next book I read.
Gemini 2.5 Flash
Gemini 2.5 Pro
Script
Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, “and what is the use of a book,” thought Alice “without pictures or conversations?”
So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.
There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, “Oh dear! Oh dear! I shall be late!” (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.
In another moment down went Alice after it, never once considering how in the world she was to get out again.
The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well.
It works perfectly with English, but not as well with non-English languages. Sound quality is a minor issue; it also sometimes omits words and doesn’t read the input exactly. Its maximum input/output length is pretty short, too. For now, I don’t think it is good for making an audiobook.
They don’t fix problems quickly, even when I report a problem along with its cause and the code to fix it. This is why I would rather fix the problems myself with a userscript.
Thanks so much for sharing the Google AI Studio link — this really completes the loop!
As I mentioned in another post, when I reach the 3-audio limit in NotebookLM, I ask it to generate a text-based dialogue instead (like the ones it normally produces in audio). That workaround alone was already super useful.
But now, thanks to your tip, I can take that text and use Google AI Studio to generate audio with two voices, just like the original. And all of this is 100% free.
Also, just to stress something amazing about NotebookLM: it works with any YouTube video, even if it has no subtitles or transcript at all. I have no idea what tech they’re using, but it just understands the content and always gives you the output in your target language (Chinese, for me). That’s pretty mind-blowing.
Thanks again for your contribution — it really helped push this whole process to the next level!
While all of this is very helpful, I find myself wishing Google AI Studio allowed for the conversion of much larger text passages into audio. It seems the current limitation is around 10 minutes per audio file, which amounts to about five pages in a Word document. Consequently, if I have extensive texts, perhaps 30 pages in a Word document, I would, based on my understanding, need to process them in multiple smaller segments within Google AI Studio…
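For anyone who wants to prepare those smaller segments automatically, here is a minimal Python sketch that splits a long text at sentence boundaries so each piece stays under a character budget. The function name `split_into_chunks` and the `max_chars` default are my own placeholders, not anything documented by Google AI Studio; I don't know the real limit, so you would need to tune the number until each segment's audio comes out complete.

```python
import re


def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    max_chars is a hypothetical budget, not a documented limit.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the budget is cut at the budget.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would exceed the budget: start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One. Two! Three? Four."
    for i, chunk in enumerate(split_into_chunks(sample, max_chars=10), start=1):
        print(i, chunk)
```

Each chunk can then be pasted into Google AI Studio one at a time. Breaking at sentence ends rather than at a fixed character count should keep the voice from being cut off mid-sentence between segments.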
Finally found an audio solution for AI-generated dialogues without a character limit. I hope it remains free. The audio quality sounds so real for English. Thanks a lot. You are sharing really good stuff that makes language study a breeze.
Could you explain how you found that there’s no character limit? When I uploaded text from a roughly 8-page Word document, the single-voice audio only extended to about page 5. The latter portion of the text was completely omitted from the audio, which makes me suspect there is indeed a limit.
I only tested it on dialogues, which are not that long. For longer audio I simply use real audiobooks, which I can borrow on apps like Hoopla with a library card from any American public library. It is totally free.
I feel the same. Stability over long passages is low (especially for non-English), and it sometimes omits parts of the given content. So I think it’s not suitable for making a full-length audiobook for now.