Last updated on July 20, 2021
Making your video content accessible is an important step to take when building a video application or platform. Radiant Media Player supports a large array of accessibility features (keyboard navigation, closed captions, interactive transcript & more) and we frequently test our player on screen readers and other assistive technologies.
Getting there, though, may not be as straightforward as one might think. In this blog post we review some speech-to-text solutions that can work with our player (or any HTML5 video player). The goal is to extract the audio content from a video (automatically or manually), transcribe that audio and turn it into text. Not just any text, though: we want to turn our audio into a WebVTT file with accurate timestamps. As a reminder, WebVTT is the most common format for displaying timed text tracks with HTML5 video.
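To make the target format concrete, here is a minimal sketch of how timed text can be written out as WebVTT. It assumes your transcription step already gives you segments as `(start_seconds, end_seconds, text)` tuples; that shape, and the function names `format_timestamp` and `segments_to_webvtt`, are hypothetical and not tied to any of the services reviewed in this post.

```python
from typing import Iterable, Tuple

# Hypothetical input shape: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_webvtt(segments: Iterable[Segment]) -> str:
    """Build the text of a WebVTT file from timed segments."""
    # A WebVTT file starts with the "WEBVTT" signature followed by a blank line.
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        if end <= start:
            raise ValueError("cue end time must be after its start time")
        # Each cue is a timing line, the cue text, then a blank line.
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(segments_to_webvtt([(0.0, 2.5, "Hello and welcome."),
                              (2.5, 5.0, "Let's talk about captions.")]))
```

The resulting string can be saved with a `.vtt` extension and hosted next to your video. Real-world captions usually need more care than this (line length, reading speed, speaker changes), which is a large part of what the platforms below handle for you.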
Creating subtitles, closed captions or transcription for your content comes with varying requirements, depending on your budget and project. Following are some questions you may have to ask yourself:
We have tested the following solutions with our player. This list is not comprehensive and other solutions on the market could be available to suit your needs.
One of the most popular solutions we came across, if you want to enable closed captions for your video while doing the speech-to-text transcription in-house, is the Amara online platform.
Amara's technology enables you to caption and subtitle any video for free. For larger subtitling projects, the platform makes it easy to manage teams of translators. You can also purchase high-quality captions or translations from Amara's professional linguists.
YouTube has an advanced speech-to-text system and will let you download the result as WebVTT files. You can then host those WebVTT files and pass them directly to our player. YouTube also supports live automatic captions, multi-language translation and editing for fine-tuning your subtitles. Some requirements need to be met for all features to be made available, but it is all free.
Trint is an online platform that will let you turn audio and video into searchable, editable and shareable text content in up to 31 languages. This includes producing WebVTT files that you can host and pass to our player.
The Trint platform offers automated speech-to-text captioning/transcription only, but comes with many useful features: editing captions, multi-language translation and live captioning.
Videolinq is a live video streaming platform that offers real-time closed captioning insertion from traditional stenograph operators, offering an easy way to add captions to video sent to social media platforms and other CDNs.
Videolinq offers human-made captioning/transcription only, and only for live video content.
Happyscribe is an all-in-one transcription and subtitling platform. It features state-of-the-art A.I. working side by side with language professionals. Happyscribe can produce WebVTT files that you can host and pass to our player.
The Happyscribe platform offers both automated and human-made speech-to-text captioning/transcription. It has many features you may be looking for, including caption editing and multi-language translation, but it does not support live captioning at the time of writing.
The development of A.I.-assisted technologies has allowed for the democratisation of speech-to-text solutions. The following projects are worth noting if you want to learn more on the subject. Note that a speech-to-text solution is not enough in itself to provide captions to an HTML5 video player. The text needs to be timed and presented in a format that an HTML5 video player can understand (WebVTT), hence the list of solutions presented above.
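To illustrate the "timed" part of that requirement, here is a rough sketch of how word-level timings could be grouped into caption-sized cues before being written out as WebVTT. It assumes your speech-to-text engine returns words as `(start_seconds, end_seconds, word)` tuples, which not every engine does; the function name `group_words_into_cues` and the default limits (42 characters per cue, a new cue after a one-second pause) are illustrative assumptions, not values taken from any specification or product.

```python
from typing import List, Tuple

# Hypothetical shapes, for illustration only.
Word = Tuple[float, float, str]  # (start_seconds, end_seconds, word)
Cue = Tuple[float, float, str]   # (start_seconds, end_seconds, caption text)


def group_words_into_cues(words: List[Word],
                          max_chars: int = 42,
                          max_gap: float = 1.0) -> List[Cue]:
    """Group timed words into cues, splitting on length or on long pauses."""
    cues: List[Cue] = []
    current: List[Word] = []

    def flush() -> None:
        # Close the cue being built: first word's start, last word's end.
        text = " ".join(w[2] for w in current)
        cues.append((current[0][0], current[-1][1], text))
        current.clear()

    for word in words:
        if current:
            # Length of the cue text if this word were appended to it.
            text_len = len(" ".join(w[2] for w in current)) + 1 + len(word[2])
            # Silence between the previous word's end and this word's start.
            gap = word[0] - current[-1][1]
            if text_len > max_chars or gap > max_gap:
                flush()
        current.append(word)

    if current:
        flush()
    return cues


if __name__ == "__main__":
    demo = [(0.0, 0.4, "Hello"), (0.5, 0.9, "world"), (3.0, 3.4, "Next")]
    for cue in group_words_into_cues(demo):
        print(cue)
```

Each resulting cue still has to be serialised with WebVTT timestamps and, ideally, reviewed by a human, since a purely mechanical split like this one ignores sentence boundaries and speaker changes.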