From Video To Closed Captions And Transcription

Radiant Media Player Blog

Last updated on July 20, 2021

Making your video content accessible

Making your video content accessible is an important step to take when building a video application or platform. Radiant Media Player supports a large array of accessibility features (keyboard navigation, closed captions, interactive transcript & more) and we frequently test our player on screen readers and other assistive technologies.

Getting there, though, may not be as straightforward as one might think. In this blog post we review some speech-to-text solutions that can work with our player (or any HTML5 video player). The goal is to extract the audio content from a video (automatically or manually), read that audio and turn it into text. Not just any text, though: we want to turn our audio into a WebVTT file with accurate timestamps. As a reminder, WebVTT is the most common format for displaying timed text tracks with HTML5 video.
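
To make the target format concrete, here is a minimal Python sketch that turns a list of timed text segments into the content of a WebVTT file. The `Cue` class and the function names are our own illustration and are not part of any tool mentioned in this post; we assume each segment already has a start time, an end time (both in seconds) and its text.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption segment (hypothetical structure used for illustration)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[Cue]) -> str:
    """Serialize cues into the text of a WebVTT file."""
    # A WebVTT file starts with the "WEBVTT" signature line,
    # followed by cue blocks separated by blank lines.
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"
```

Calling `build_webvtt([Cue(0.0, 2.5, "Hello and welcome.")])` returns a small file with the `WEBVTT` signature followed by a single cue timed `00:00:00.000 --> 00:00:02.500`. The result can be saved with a `.vtt` extension.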

Creating subtitles, closed captions or transcriptions for your content comes with varying requirements, depending on your budget and project. Here are some questions you may have to ask yourself:

  • Do I need speech-to-text for on-demand or live video content (or both)?
  • Do I need multi-language translation?
  • Am I doing the speech-to-text transcription internally or do I want to use an external professional solution?
  • Do I need automated speech-to-text accuracy (generally between 80% and 90%, less expensive) or human-verified speech-to-text accuracy (~99%, more expensive)?
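
If you want to put a number on the accuracy question for your own content, one common approach is to compare an automated transcript against a human-verified one, word by word. The sketch below is only an illustration: we do not know how the accuracy figures quoted above were measured, and the function names are ours. It computes accuracy as 1 minus the word error rate, where errors are counted with a word-level edit distance.

```python
def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level edit distance: substitutions + insertions + deletions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference_text: str, hypothesis_text: str) -> float:
    """Return 1 - word error rate, using the reference as ground truth."""
    # Assumption: lowercasing and whitespace splitting are good enough here;
    # a real evaluation would also normalize punctuation and numbers.
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / len(reference)
```

With a ten-word reference and an automated transcript that gets one word wrong, `word_accuracy` returns roughly 0.9. Note that the value can drop below zero when the automated transcript contains many extra words.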

We have tested the following solutions with our player. This list is not comprehensive and other solutions on the market may suit your needs.

I want to do the speech-to-text transcription internally

One of the most popular solutions we came across for enabling closed captions on your video, while doing the speech-to-text transcription internally, is the Amara online platform.

Amara's technology enables you to caption and subtitle any video for free. For larger subtitling projects the platform makes it easy to manage teams of translators. You can also purchase high-quality captions or translations from Amara's professional linguists.

I want to use an external professional captioning solution


YouTube has an advanced speech-to-text system and will let you download the result as WebVTT files. You can then host those WebVTT files and pass them directly to our player. It also supports live automatic captions, multi-language translation and editing for fine tuning your subtitles. Some requirements need to be met for all features to be made available, but it is all free.

Just upload a video to YouTube and follow the guide here. More information on YouTube automatic captioning can be found here.
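
Whichever tool produces your WebVTT files, hosting them usually comes down to two details: serving them with the `text/vtt` MIME type and, when the captions live on a different origin than the page, sending CORS headers so the browser is allowed to load them. The following is a minimal sketch for local testing only, built on Python's standard `http.server` module. The `CaptionHandler` and `make_server` names are ours, and a production setup would normally configure this on the web server or CDN instead.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CaptionHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .vtt files as text/vtt with CORS enabled."""

    # Map the .vtt extension to the WebVTT MIME type.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".vtt": "text/vtt"}

    def end_headers(self) -> None:
        # Allow pages on any origin to fetch the caption files
        # (an assumption for testing; restrict this in production).
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Create a local server exposing the files in `directory`."""
    handler = partial(CaptionHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Assuming your files sit in a folder named `captions`, calling `make_server("captions").serve_forever()` would expose them on port 8000 of your machine, ready to be referenced as a text track by an HTML5 video player.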


Trint is an online platform that will let you turn audio and video into searchable, editable and shareable text content in up to 31 languages. This includes producing WebVTT files that you can host and pass to our player.

The Trint platform offers automated speech-to-text captioning/transcription only, but comes with many useful features: caption editing, multi-language translation and live captioning.


Videolinq is a live video streaming platform that offers real-time closed caption insertion by traditional stenograph operators, providing an easy way to add captions to video sent to social media platforms and other CDNs.

Videolinq offers human-made captioning/transcription only, and for live video content only.


Happyscribe is an all-in-one transcription and subtitling platform. It features state-of-the-art A.I. working side by side with language professionals. Happyscribe can produce WebVTT files that you can host and pass to our player.

The Happyscribe platform offers both automated and human-made speech-to-text captioning/transcription. It has many features you may be looking for: caption editing, multi-language translation and human-made transcription, but it does not support live captioning at the time of writing.

The magic behind speech-to-text transcription

The development of A.I.-assisted technologies has allowed for the democratisation of speech-to-text solutions. The following projects are worth noting if you want to learn more on the subject. Note that a speech-to-text solution is not enough in itself to provide captions to an HTML5 video player. The text needs to be timed and presented in a format that can be understood by an HTML5 video player (WebVTT), hence the list of solutions presented above.

  • DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu’s Deep Speech research paper.
  • IBM Watson Speech to Text: AI-powered speech recognition and transcription. This commercial solution comes with a free tier of 500 minutes per month.
  • This GitHub project offers an end-to-end implementation of auto-generated subtitles sourced from an HLS live stream (using the Google Cloud Speech-to-Text API). This project is a proof-of-concept (PoC) though, so it may not be feature complete as of now. It is nevertheless a good resource to check out on the subject of automated captions from HLS to WebVTT.
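
To illustrate why raw speech-to-text output is not enough, here is a minimal Python sketch of the "timing" step: grouping recognized words into caption-sized cues. We assume the engine returns each word with a start and end time in seconds; the `Word` and `Cue` structures and the default limits (42 characters, 4 seconds) are our own illustrative choices and do not come from any of the projects listed above.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """One recognized word with its timing (hypothetical engine output)."""
    text: str
    start: float  # seconds
    end: float    # seconds


class Cue(NamedTuple):
    """One caption cue, ready to be written as a WebVTT cue."""
    start: float
    end: float
    text: str


def _to_cue(words: list[Word]) -> Cue:
    return Cue(words[0].start, words[-1].end, " ".join(w.text for w in words))


def group_words(words: list[Word], max_chars: int = 42,
                max_seconds: float = 4.0) -> list[Cue]:
    """Group consecutive words into cues bounded by length and duration."""
    cues: list[Cue] = []
    current: list[Word] = []
    for word in words:
        if current:
            text_len = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            duration = word.end - current[0].start
            # Start a new cue when adding this word would exceed a limit.
            if text_len > max_chars or duration > max_seconds:
                cues.append(_to_cue(current))
                current = []
        current.append(word)
    if current:
        cues.append(_to_cue(current))
    return cues
```

Each resulting cue carries the start time of its first word and the end time of its last word, which is the information a WebVTT cue needs. A real pipeline would also want to break on punctuation and pauses rather than on length alone.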

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License.

©2015-2024 Radiant Media Player. All Rights Reserved.