Operation speech-to-text
The speech-to-text operation allows you to convert spoken audio to text. This process is known as speech recognition or automatic speech recognition (ASR).
It can be used to transcribe audio recordings of meetings, lectures, or interviews, or to add closed captions to audio or video content.
Available Options
Option Name | Type | Possible Values | Description |
---|---|---|---|
language | string | ar, ca, cn, cs, de, el-gr, en-in, en-us, es, fa, fr, hi, it, ja, kz, nl, pl, pt-fb, ru, sv, tl-ph, tr, uk, vs | The language of the audio input. This is important because different languages have different phonemes, intonation patterns, and grammatical structures, and using the correct language model can improve the accuracy of the transcription. |
target_format | string | txt, srt | The desired output format for the transcription. |
allow_multiple_outputs | boolean | true, false | If the operation produces more than one output file, all of them are compressed into a single file by default. Set this option to true if you want a separate download link for each file. |
SRT (SubRip subtitle format) is a widely supported, plain-text subtitle format commonly used for subtitling videos. A file consists of sequentially numbered entries, each holding a time range and the subtitle text to display during that range.
Choose srt as the target format if you want to create closed captions for a video or need the transcription together with its timing information; choose txt for the plain transcript.
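To make the SRT layout concrete, here is a minimal Python sketch that renders timed text segments as SRT entries. It only illustrates the file format; it is not how the service produces its output. The function names (`format_timestamp`, `format_cue`, `to_srt`) and the sample segments are assumptions made for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT entry: sequence number, time range, subtitle text."""
    time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return f"{index}\n{time_range}\n{text}\n"


def to_srt(segments) -> str:
    """Join (start, end, text) tuples into SRT text, numbering entries from 1."""
    cues = (
        format_cue(i, start, end, text)
        for i, (start, end, text) in enumerate(segments, start=1)
    )
    # Entries are separated by a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical segments, for illustration only.
    print(to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]))
```

Each entry starts with its sequence number, followed by a `start --> end` line that uses a comma before the milliseconds, and then the text itself.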
Example
{ "conversion": [{ "category": "operation", "target": "speech-to-text", "options": { "language": "en-us", "target_format": "srt" } }] }
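If you build the request body in code, a small helper can check the options against the table above before serializing them. The following is a minimal Python sketch under that assumption: the helper name `build_speech_to_text_job` is hypothetical, the allowed values are copied from the options table, and sending the request (endpoint, authentication) is deliberately left out because it is not covered in this section.

```python
import json

# Allowed values copied from the options table above.
LANGUAGES = {
    "ar", "ca", "cn", "cs", "de", "el-gr", "en-in", "en-us", "es", "fa",
    "fr", "hi", "it", "ja", "kz", "nl", "pl", "pt-fb", "ru", "sv",
    "tl-ph", "tr", "uk", "vs",
}
TARGET_FORMATS = {"txt", "srt"}


def build_speech_to_text_job(language, target_format, allow_multiple_outputs=None):
    """Return the request body for a speech-to-text operation (hypothetical helper)."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if target_format not in TARGET_FORMATS:
        raise ValueError(f"unsupported target_format: {target_format!r}")

    options = {"language": language, "target_format": target_format}
    # Only include the flag when the caller sets it explicitly.
    if allow_multiple_outputs is not None:
        options["allow_multiple_outputs"] = bool(allow_multiple_outputs)

    return {
        "conversion": [
            {
                "category": "operation",
                "target": "speech-to-text",
                "options": options,
            }
        ]
    }


if __name__ == "__main__":
    body = build_speech_to_text_job("en-us", "srt")
    # json.dumps always emits valid JSON, so no stray trailing commas.
    print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` instead of writing the body by hand avoids syntax slips such as trailing commas, which strict JSON parsers reject.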