Skills/CLI Tools/Transcribe

Transcribe

Generate word-level captions from audio or video files using whisper_timestamped.

Usage

npx editframe transcribe <input> [options]

Arguments

  • input — Input audio or video file path

Options

OptionDefaultDescription
-o, --output <file>captions.jsonOutput JSON file
-l, --language <lang>enLanguage code (en, es, fr, de, it, pt, nl, etc.)

Examples

# Basic transcription
npx editframe transcribe video.mp4
# Custom output and language
npx editframe transcribe interview.mp4 -o spanish.json -l es

Requirements

Requires whisper_timestamped:

pip3 install whisper-timestamped

Check installation:

npx editframe check

Output Format

Generates JSON compatible with ef-captions:

{
"segments": [
{ "start": 0, "end": 2500, "text": "Hello world" }
],
"word_segments": [
{ "text": "Hello", "start": 0, "end": 800 },
{ "text": "world", "start": 900, "end": 2500 }
]
}

Times are in milliseconds.

Use in Compositions

<ef-captions captions-src="/assets/captions.json">
<ef-captions-active-word class="text-yellow-300 font-bold"></ef-captions-active-word>
</ef-captions>

See the elements-composition skill's captions reference for full styling options.