
YouTube Rolls Out Auto Captioning – Does it Work?

YouTube unveiled its automatic, machine-generated captioning technology on Thursday, November 19.

While this is a step forward in video search, is it really making video content accessible for people who are deaf and hard-of-hearing? YouTube is using Google’s Automatic Speech Recognition (ASR) technology to perform the automatic transcription, and in some instances it then uses automatic translation to turn that English text into other languages. The next logical question: How accurate are the foreign language translations?

Automatic Speech Recognition

The premise behind the official Google blog post announcing the automatic captions is that this was done to make video accessible for people with hearing loss. YouTube lists some of the entities currently using this ASR technology, so I decided to check it out. I clicked on the first link listed, UC Berkeley, and then on the first video I encountered, a short video of highlights from Professor Williamson’s Nobel Prize day.

Interestingly enough, when automatic speech recognition is used, the transcripts are rarely made available in their entirety. Could this be because showing an entire transcript makes it painfully obvious how error-laden the transcription is? Google’s Gaudi project has been in beta testing for quite some time now, and the same thing occurs there. The automatic transcription is only available in snippets, without much context.

Here are the results of my short experiment. I hit PLAY on the video. The automatic captioning must be invoked by the viewer, and the captions do not “stay with” the video or become a permanent part of it; they simply appear on the screen and disappear. After a few seconds, the captions began to appear. To capture the actual text, I took a screenshot. In fairness, I admit that the audio quality was poor as the video began, showing people mingling and chatting before the speech. The “auto-cap,” as they call it, picked up a few false captions and nonsense words. Confusing to someone who can’t hear, but nothing major.

Once the audio improved and the speaker took the podium, this was the result: He said, “This is a little daunting.” The automatic captions read, “so the vote don’t you,” with no punctuation. (See figure 1, below.)

Figure 1: "This is a little daunting."

These results were typical and continued throughout the remainder of the video.  I was not surprised by this type of result, as I’ve followed automatic speech recognition technology for years.
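For readers who want to put a number on a result like this, one common yardstick is word error rate (WER): the fewest word substitutions, insertions and deletions needed to turn the machine's output into what was actually said, divided by the number of words actually spoken. The sketch below is only an illustration of that idea. The function names are my own, the normalization (lowercasing and dropping punctuation) is an assumption, and it says nothing about how YouTube or Google measure their own accuracy.

```python
import re


def words(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # a spoken word is missing from the caption
            insertion = curr[j - 1] + 1  # the caption has an extra word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


spoken = "This is a little daunting."
captioned = "so the vote don't you"
print(word_error_rate(spoken, captioned))  # prints 1.0
```

In the example from the video, none of the five spoken words survives in the caption, so all five count as errors and the rate comes out at 1.0, or 100 percent.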

“Something is Better Than Nothing”

In the video announcing the new service on its official blog, YouTube admitted that the system has its flaws, that it is aware of the issues, that the system will get better as time goes on, and that for someone who is Deaf, “something is better than nothing.”

As a long-time advocate for persons with hearing loss (and a realtime captioner and CART provider), I initially bristled at that statement.  Upon further reflection, and in reading what some Deaf and hard-of-hearing people had to say about it, I suppose I agree with that statement, to a degree.

Here is the distinction. In the case of the millions of videos being posted on the internet by individuals and small businesses who can’t afford to subtitle or caption their content, perhaps something is better than nothing. (Although I question that at times. In the example above, perhaps those captioning errors are more confusing for the viewer than no captioning.) But universities and colleges, big businesses, media outlets and governmental entities could and should do better. In the case of broadcast television networks and others that fall under Federal Communications Commission (FCC) regulation and must provide closed captioning on their broadcast programming, posting the same video content on the internet without the captions is irresponsible and offensive to those who depend on captioning. Yet, sadly, that is a very common practice.

Foreign Language Translation

Now let’s discuss the automatic foreign language translation feature.  As anyone who is fluent in more than one language can tell you, automatic translation software doesn’t work well enough to properly convey meaning.  Talk to any foreign language translator, and they will explain how this works and the problems it poses.  Add to that the fact that the base text file being used for the YouTube translation is what the auto-cap produced.  Even if the foreign language translation were flawless, it would still have the speaker in our example saying (in the language of your choice), “so the vote don’t you” instead of “This is a little daunting.”

Synchronization of Accurate Transcripts

The second part of the service that YouTube unveiled is automatic synchronization of text and video for people who upload video content and a transcript.  The system aligns the transcript with the video and posts it as captions.  The video owner has the option of submitting their own transcript or having one prepared by a human transcriptionist.  Ideally, the transcripts provided by the captioning or transcription companies will be accurate and in a format that is more like the captioning that appears on television broadcasts.  Accurate transcripts … now that is a good idea!

Of course, one must pay for the transcription, and when we speak of individuals uploading massive amounts of video content, that was the problem to begin with.
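To make the synchronization idea concrete, here is a rough sketch of what its end product looks like: each line of an accurate transcript paired with a start and end time, written out as caption cues. I have used the SubRip (.srt) layout only because it is a widely used caption file format; I am not claiming it is what YouTube produces internally. The function names are my own, and the timings in the example are made up for illustration, since finding the real timings is the part the alignment system does.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, the SubRip timestamp style."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start_seconds, end_seconds, text) tuples into SubRip text."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical timings; only the caption text comes from the video.
cues = [(12.0, 14.5, "This is a little daunting.")]
print(to_srt(cues))
```

The point of the sketch is that the text in each cue is whatever the transcript says. Give the system an accurate transcript and the captions are accurate; the timing is the only thing left for the machine to work out.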

Searchable Video & Audio

Once accurate transcripts are prepared – whether the product of a captioning company, a professional transcriptionist, a script, or the content owners themselves – the possibilities are endless. Accurate transcripts are the foundation for accurate video search, accurate language translations, and accurate captioning. Now you’re talkin’!

Tanya Ward English
About: twenglish:
Tanya Ward English is a founder and the technology officer of Transendia and its parent, Realtime Transcription, Inc. She is a realtime captioner with a history of advocacy for persons with hearing loss.


  1. Eric says:

    This is a very interesting and informative article. Keep ’em coming!

  2. Linda says:

    Well said! This is what we need to know…

  3. ehaya says:

    This is really informative. I’m sure the Deaf and Hard of Hearing (D/HOH) community is appreciative of your efforts in championing them.

  4. Tanya, my name is Denise Phipps. I am a certified captioner. I’ve been working with my partner company out of Israel. They’ve developed a solution for realtime LIVE captioning that is searchable. There is also a translation option into different languages in realtime. The translations aren’t perfect in the live setting, but if someone wanted the text in a different language, it can be done after the fact. Here is a link to an article. This puts Google’s app to shame because it uses live captioners!

  5. twenglish says:

    Hi, Denise. I’ve seen your name around for years. I am aware of SubPly, and I’m happy to see the new awareness for captioning and search that this YouTube announcement has generated.

    Good to hear from you.

  6. Ale says:

    Excellent article!

  7. This is a fantastic article. It’s absolutely spot on. People seem not to understand that without comprehension of meaning and context, computers will never be able to transcribe spoken language as effectively as human beings. One benefit of a program as widespread as Google’s is that people will be able to see, for the first time, how inaccurate machine speech recognition really is; with luck, that might impel them finally to understand the value of accurate human transcription.

  8. I’ve shared your post on Digg. Well written.

  9. If this process is successful, it will be great for us. People will then be able to watch YouTube videos in the languages they are comfortable with, and that is not possible at all without a subtitle translator.

  10. Dirk Dillin says:

    Great post. Can I have your permission to translate it into Dutch for my site’s viewers? Thanks.

  11. Thank you for this marvelous post. I am glad I found this site on Yahoo.
