Transcribing

Have any strategies been talked about on the forum as far as transcribing? How do people here do it when they need to? Do they just sit down and start typing, do they use a computer program and then edit heavily? I’d be interested in hearing what people do…

Hi Chris,
If I have permission to transcribe a podcast for LingQ, I put the audio on my MP3 player. Then I type the text in Word. The MP3 player is in front of my keyboard. I stop after hearing one sentence, or part of a sentence if the speaker speaks faster than I can type. Word helps me find typing and spelling errors.
At first I tried to listen and type on my computer, but that was not ideal for me. I know there are programs that could help, or pedals that you can use to stop and restart the audio. But I don’t want to invest money.

Does anyone know if there is any freeware program that transcribes, so that we could just go in and edit the result to match the audio? That would be easier, I think, than pausing, typing, rewinding, etc.

…and does everyone that does the particular podcasts for each language transcribe just by hand?

Our rule at LingQ is as follows. A member who transcribes the content that goes into the library becomes the provider and “owns” the right to collect points for the item. If we at LingQ transcribe or, in fact, pay to have the content transcribed, we own the rights.

We usually just look after transcribing for podcasts, because podcasts can help us promote our site. So we transcribe our SpanishLingQ, ItalianLingQ, ChineseLingQ, etc. podcasts. In fact we pay people, in money or points, to produce these podcasts and then transcribe them, because these podcasts drive traffic to our site.

We will also provide transcripts to established podcasters in return for permission to use their content and, more importantly, for them to mention LingQ in their podcast and link to our site.

Sue Ellen Reager, CEO of International Services, claims that transcribing is a rapidly growing business. (She also claims to speak 17 languages, if I am not mistaken.)

http://www.streamingmedia.com/article.asp?id=11026&page=2&c=20

Here is an extract from the article. The full article gives an idea of the tools used by professional transcribers.

                        "Where Does the Money Go?

An hour of outsourced captioning varies widely depending on the vendor, with $600 being fairly normal, but this is not the only price around. The bean counter in you may respond, “But transcription sounds so easy. Where does the money go?” First, transcription is a long and precise process. There can be 10,000 words in 1 hour of content. A typist who types at 100 words per minute would need 10 hours to finish such a transcription. A fast typist at 200 words per minute would need 5 hours. Then, the transcription must be proofread before being broken down into captions and timecode assignments.

A professional transcriptionist transcribes much faster, and more thoroughly, than a typist. Professionals in transcription often use 10-key court-reporting equipment that speeds up the transcription turnaround significantly. In addition, they are usually excellent proofreaders."
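
A quick check of the typing arithmetic in the extract, as a minimal Python sketch. It only divides the word count by the typing speed (the function name is made up for illustration) and does not model listening, pausing, rewinding or proofreading, so it gives a lower bound on the effort, not the hour figures quoted above.

```python
def raw_typing_minutes(word_count: int, words_per_minute: int) -> float:
    """Minutes of uninterrupted typing needed to enter word_count words."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


# Figures taken from the extract: 10,000 words in one hour of content.
for wpm in (100, 200):
    minutes = raw_typing_minutes(10_000, wpm)
    print(f"{wpm} wpm: {minutes:.0f} minutes of pure typing")
```

The gap between these numbers and real turnaround times is the stopping, replaying and checking that every transcriber in this thread describes.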

Ilya, let us just say that we do not pay those rates. It does not take our transcribers that long, and we live with the odd typo and inaccuracy. We are not lawyers, nor are we dealing with contracts or international treaties.

Yes Steve,

I was also struck by the claim of $600 being fairly normal (for transcribing an hour of audio). Everyone feels and knows it is an order of magnitude less. Moreover, the conclusion one draws after reading the article, and from the situation on the Web in general, is that the costs will have to go down. Competing technologies are already responding to the demand for “batch” transcribing.

And this is going to change things. I fully agree with what you have written on your blog:

“… spend the money on transcribing all the wonderful content that is available free of charge on the Internet”.

I am very impressed with Google’s translation services, in particular its “batch” translating services.

Automatic transcription is a technological challenge yet to be overcome… It’s very hard to get even barely useful results nowadays. Even the opposite (automatic reading), which is much easier, delivers results that are only reasonable, not wonderful…
But it is a more realistic alternative: find texts and use one of these tools for automatic reading. It is not like a human speaking, but the good tools do a reasonably decent job.

I forgot to say that there are programs that can slow down the audio so you can transcribe more easily…
A famous one is Express Scribe.

Yes Ana, you are right: fully automatic transcription or, more generally, automatic speech recognition (ASR) is still a challenge. What I meant to say is that competing technologies and companies are improving the efficiency of human transcribers. Your example of Express Scribe shows the same thing.

Many things have been done in very recent years. For example… Oh no, we shall be blamed for being technical!

I have a really cool TV tuner card for my computer which will automatically dump the closed captioning of a TV show to a text file. So all I have to do is record the show to disk, simply convert the video to an mp3, then upload it all! I was doing this before outside of LingQ, but I will start doing it inside now. I do it for HBO Latino, but lots of other Spanish shows, like those on Univision, use captions now. I used to record a movie on HBO and HBO Latino (they play the same movie simultaneously), then print out both sets of transcripts and read and listen. The problem I am having now is that I’m really excited about LingQ, yet I want to finish FSI since I’m already up to unit 35. 20 more to go! Closed captioning is better than subtitling because the transcript matches the words they speak exactly. Steve, if I do this for some non-copyrighted material, can I still earn points? (This will be for you English learners, since most public/community stuff is in English.)

You’ll earn activity points, but you can’t put up copyrighted material for the public without permission…
Hopefully one day there won’t be any such thing as ‘copyright’, at least not in the sense that we know it today, but for now it exists and reputable sites like LingQ have to follow the law… ;(

Ilya,

Take a ten-minute-long audio of, say, a conference or a heated debate and try to transcribe it. The audio may not always be of Hollywood-studio quality. Note that transcribing must follow certain rules: indicating speakers, time codes, handling gaps, interjections, non-lexemes, noises, corrections, repetitions, hesitations, quotations, false starts, etc. Then go over it a few times, making sure it’s as good as you can get it. Then think how much you’d charge for it.

We shouldn’t judge the complexity (or simplicity) of something based on the results. In fact, it is often the case that the simpler something looks, the harder it was to get it to look that way.

Rjtrudel,

If the content is free of copyright you can earn points if you share it. Check the nature of the Creative Commons license. If it is Attribution-ShareAlike you can use it without asking. Otherwise you would have to ask for permission. Even though the content at LingQ is available free of charge, since we do charge fees for certain services we are considered a commercial site.

If you can share it you will earn points that you can use in your learning. You will load the text and audio and should probably include a link to the video.

Creative Commons Licenses

The following are the six current license choices available from our choose a license application, along with previous versions that have been phased out. They are shown by name along with the license characteristics that accompany them. This page explains what each icon represents.
Name Characteristics
Version 3.0 Licenses:
Attribution
Attribution-NoDerivs
Attribution-NonCommercial-NoDerivs
Attribution-NonCommercial
Attribution-NonCommercial-ShareAlike
Attribution-ShareAlike

Hi,

I can’t manage to download Express Scribe! Do you know how I can do that?

Clickety-click:

Ah, this Express Scribe software can be useful for language learning as well! Useful for slowing down the speed of the MP3 when someone is speaking too fast.

Richard, what is this software that you use to get the closed captions from movies? This might be of interest to some of our members. Thanks.

Sorry for the late reply. I was using the ATI TV Wonder 2.0 USB, which did what I said above, but I found an easier way for people who have DVD recorders. There is software called ccextractor which will extract the closed captioning from the DVDs you record with your DVD recorder. It’s a lot less buggy than using the TV Wonder card, which requires knowing a little about software to set up right. So I just record whatever movie I want on HBO Latino (they don’t use DRM in my area), rip the closed captioning with ccextractor (free), and upload it to LingQ. Of course, I am still finishing up FSI Spanish (unit 45! can’t wait to finish), so I haven’t done this yet on LingQ on a full scale. However, I did see the Audiria Spanish stuff and will use that first, especially the conversation ones, since they are already up there for me.
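
For anyone who wants to turn the extracted captions into a plain transcript before uploading, here is a rough Python sketch. It assumes the captions were saved as a SubRip (.srt) file, which is one of the formats ccextractor can write; the function name and the sample captions are made up for illustration. It only strips the counter and timing lines and keeps the spoken text, one caption per line.

```python
import re

# An SRT timing line looks like "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt: str) -> str:
    """Return only the caption text from SRT content, one caption per line."""
    captions = []
    # Caption blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Drop the leading counter line, if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        # Drop the timing line, if present.
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            captions.append(" ".join(lines))
    return "\n".join(captions)


# Made-up sample in SRT layout (not taken from a real recording).
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hola, Maria.

2
00:00:04,000 --> 00:00:06,000
Muy bien,
gracias.
"""

print(srt_to_text(SAMPLE))
```

The result still needs a read-through against the audio, since captions can contain sound cues or speaker labels that this sketch leaves in place.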