Operation speech-to-text

The speech-to-text operation allows you to convert spoken audio to text. This process is known as speech recognition or automatic speech recognition (ASR).
It can be used to transcribe audio recordings of meetings, lectures, or interviews, or to add closed captions to audio or video content.

Available Options

| Option Name | Type | Possible Values | Description |
| --- | --- | --- | --- |
| language | string | ar, ca, cn, cs, de, el-gr, en-in, en-us, es, fa, fr, hi, it, ja, kz, nl, pl, pt-fb, ru, sv, tl-ph, tr, uk, vs | The language of the audio input. Languages differ in phonemes, intonation patterns, and grammatical structures, so selecting the matching language model improves the accuracy of the transcription. |
| target_format | string | txt, srt | The desired output format for the transcription. |
| allow_multiple_outputs | boolean | true, false | If the conversion produces more than one output file, all of them are compressed into a single file by default. Set this option to true if you want a download link for each file. |
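
The options above can be assembled and checked before a request is sent. The following is a minimal Python sketch, not part of the API itself: the helper name `build_conversion` is hypothetical, the allowed values are copied from the table, and the `False` default for `allow_multiple_outputs` reflects the default behavior described there. How the resulting entry is submitted is outside the scope of this sketch.

```python
import json

# Allowed values copied from the options table.
LANGUAGES = {
    "ar", "ca", "cn", "cs", "de", "el-gr", "en-in", "en-us", "es", "fa",
    "fr", "hi", "it", "ja", "kz", "nl", "pl", "pt-fb", "ru", "sv",
    "tl-ph", "tr", "uk", "vs",
}
TARGET_FORMATS = {"txt", "srt"}


def build_conversion(language, target_format, allow_multiple_outputs=False):
    """Return one entry for the "conversion" array (hypothetical helper)."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    if target_format not in TARGET_FORMATS:
        raise ValueError(f"unsupported target_format: {target_format!r}")
    return {
        "category": "operation",
        "target": "speech-to-text",
        "options": {
            "language": language,
            "target_format": target_format,
            "allow_multiple_outputs": bool(allow_multiple_outputs),
        },
    }


if __name__ == "__main__":
    # Print the fragment as JSON for inspection.
    payload = {"conversion": [build_conversion("en-us", "srt")]}
    print(json.dumps(payload, indent=4))
```

Validating locally catches a mistyped language code or format before the job is created, rather than relying on an error response.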

SRT (SubRip subtitle format) is a widely supported, text-based subtitle format commonly used for subtitling videos. It consists of sequentially numbered entries, each containing a start and end timestamp followed by the subtitle text.
Choose srt if you want to create closed captions for a video or display the transcription in sync with the audio; choose txt for a plain transcript without timing information.
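
To make the SRT layout concrete, here is a small Python sketch that renders timed text segments as SRT. It illustrates the general format only: the function names and the sample segments are made up for this example, and the exact output produced by the speech-to-text operation may differ in how it splits and times the text.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (SRT uses a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each entry: sequence number, time range, subtitle text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Illustrative segments, not real transcription output.
    print(to_srt([(0.0, 2.5, "Hello."), (2.5, 5.0, "World.")]))
```

Each entry therefore occupies at least three lines (number, time range, text), followed by a blank line before the next entry.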


    "conversion": [{
        "category": "operation",
        "target": "speech-to-text",
        "options": {
            "language": "en-us",
            "target_format": "srt"
        }
    }]

