YouTube Rolls Out Auto Captioning – Does it Work?
Posted by Tanya English at 2:49 pm
YouTube unveiled its automatic, machine-generated captioning technology on Thursday, November 19.
While this is a step forward in video search, is it really making video content accessible for people who are deaf and hard-of-hearing? YouTube uses Google’s Automatic Speech Recognition (ASR) technology to perform the automatic transcription and, in some instances, uses automatic translation to render that English text in other languages. That raises the next logical question: How accurate are the foreign language translations?
Automatic Speech Recognition
The premise of the official Google blog post announcing the automatic captions is that the feature makes video accessible to people with hearing loss. YouTube lists some of the entities currently using this ASR technology, so I decided to check it out. I clicked the first link listed, UC Berkeley, and then the first video I encountered, a short video of Professor Williamson’s Nobel Prize day highlights.
Interestingly enough, when automatic speech recognition is used, the transcripts are rarely made available in their entirety. Could this be because showing an entire transcript makes it painfully obvious how error-laden the transcription is? Google’s Gaudi project has been in beta testing for quite some time now, and the same thing occurs there. The automatic transcription is available only in snippets, without much context.
Here are the results of my short experiment. I hit PLAY on the video. The automatic captioning must be invoked by the viewer, and the captions do not “stay with” the video or become a permanent part of it; they simply appear on the screen and disappear. After a few seconds, the captions began to appear. To capture the actual text, I took a screenshot. In fairness, I admit that the audio quality was poor as the video began, showing people mingling and chatting before the speech. The “auto-cap,” as they call it, produced a few false captions and nonsense words. That is confusing to someone who can’t hear, but nothing major.
Once the audio improved and the speaker took the podium, this was the result: He said, “This is a little daunting.” The automatic captions read, “so the vote don’t you,” with no punctuation (see Figure 1, below).
These results were typical and continued throughout the remainder of the video. I was not surprised by this type of result, as I’ve followed automatic speech recognition technology for years.
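One common way to put a number on a miss like the “daunting” caption above is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the caption into what was actually said, divided by the number of words actually spoken. What follows is only a minimal sketch in Python. The function names and the simple normalization (lowercasing, trimming punctuation) are my own illustrative assumptions, not anything YouTube or Google has published about how they measure accuracy.

```python
import string


def normalize(text):
    """Lowercase the text and trim punctuation from the edges of each word."""
    # Assumption: straight and curly quotes are both treated as punctuation.
    strip_chars = string.punctuation + "“”‘’"
    words = (w.strip(strip_chars) for w in text.lower().split())
    return [w for w in words if w]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the captions
                            curr[j - 1] + 1,      # extra word in the captions
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


spoken = "This is a little daunting."
captioned = "so the vote don’t you"
print(f"WER: {word_error_rate(spoken, captioned):.2f}")  # prints "WER: 1.00"
```

A WER of 1.00 here means that not one of the five spoken words made it into the caption. A single sentence is far too small a sample to characterize the whole system, but it does show how a viewer’s impression of “nonsense” can be turned into a figure that can be compared across videos.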
“Something is Better Than Nothing”
In the video announcing the new service on its official blog, YouTube acknowledged that the system has flaws, that the company is aware of the issues, that the technology will improve over time, and that for someone who is Deaf, “something is better than nothing.”
As a long-time advocate for persons with hearing loss (and a realtime captioner and CART provider), I initially bristled at that statement. Upon further reflection, and in reading what some Deaf and hard-of-hearing people had to say about it, I suppose I agree with that statement, to a degree.
Here is the distinction. In the case of the millions of videos being posted on the internet by individuals and small businesses who can’t afford to subtitle or caption their content, perhaps something is better than nothing. (I question even that at times. In the example above, those captioning errors may be more confusing for the viewer than no captioning at all.) But universities and colleges, big businesses, media outlets and governmental entities could and should do better. Broadcast television networks and others that fall under Federal Communications Commission (FCC) regulation must provide closed captioning on their broadcast programming. For them to post the same video content on the internet without the captions is irresponsible and offensive to those who depend on captioning. Yet, sadly, that is a very common practice.
Foreign Language Translation
Now let’s discuss the automatic foreign language translation feature. As anyone who is fluent in more than one language can tell you, automatic translation software doesn’t work well enough to properly convey meaning. Talk to any foreign language translator, and they will explain how it works and the problems it poses. On top of that, the source text for the YouTube translation is whatever the auto-cap produced. Even if the foreign language translation were flawless, it would still have the speaker in our example saying (in the language of your choice), “so the vote don’t you” instead of “This is a little daunting.”
Synchronization of Accurate Transcripts
The second part of the service that YouTube unveiled is automatic synchronization of text and video for people who upload video content and a transcript. The system aligns the transcript with the video and posts it as captions. The video owner has the option of submitting their own transcript or having one prepared by a human transcriptionist. Ideally, the transcripts provided by the captioning or transcription companies will be accurate and in a format that is more like the captioning that appears on television broadcasts. Accurate transcripts … now that is a good idea!
Of course, someone must pay for the transcription, and for individuals uploading massive amounts of video content, that cost was the problem to begin with.
Searchable Video & Audio
Once accurate transcripts are prepared – whether the product of a captioning company, a professional transcriptionist, a script, or the content owners themselves – the possibilities are endless. Accurate transcripts are the foundation for accurate video search, for accurate language translations, and for accurate captioning. Now you’re talkin’!
Tanya Ward English
Tanya Ward English is a founder and the technology officer of Transendia and its parent, Realtime Transcription, Inc. She is a realtime captioner with a history of advocacy for persons with hearing loss.