Captions Concepts

Captions Data Format: Dual-Level Timing Architecture

Editframe captions use a dual-level timing structure with both segments (phrase-level) and word_segments (word-level) timing data. This architecture enables flexible caption displays while maintaining precise synchronization and efficient rendering.

Why Not a Single Level?

Phrase-only systems display complete sentences at once. They cannot highlight individual words as they're spoken, offer limited animation possibilities, and give poor support for karaoke-style or word-by-word displays.

Word-only systems require complex logic to group words into readable phrases, making it difficult to display complete thoughts or sentences and breaking natural reading flow.

Fixed-format systems are locked into a single display style and cannot adapt to accessibility, karaoke, or educational use cases without format conversion.

The Dual-Level Solution

The segments array carries complete phrases, sentences, or thought units, split at natural reading boundaries. The word_segments array carries the timing of each individual word, with precise start and end boundaries.

{
  "segments": [
    { "start": 0, "end": 4, "text": "The quick brown fox jumps." }
  ],
  "word_segments": [
    { "start": 0, "end": 0.3, "text": "The" },
    { "start": 0.3, "end": 0.7, "text": "quick" },
    { "start": 0.7, "end": 1.1, "text": "brown" },
    { "start": 1.1, "end": 1.4, "text": "fox" },
    { "start": 1.4, "end": 2.0, "text": "jumps." }
  ]
}

Times are in seconds relative to the parent timegroup.
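
To make the two levels concrete, here is a minimal sketch of looking up the active segment and the active word at a given time. The type and function names (CaptionEntry, CaptionsData, findActiveIndex) are illustrative and not part of the Editframe API, and treating each interval as half-open (start included, end excluded) is an assumption.

```typescript
// Shapes mirror the JSON above; the names are hypothetical, not Editframe API.
interface CaptionEntry {
  start: number; // seconds, relative to the parent timegroup
  end: number;   // seconds; treated as exclusive in this sketch (assumption)
  text: string;
}

interface CaptionsData {
  segments: CaptionEntry[];
  word_segments: CaptionEntry[];
}

// Return the index of the entry whose [start, end) interval contains `time`,
// or -1 when no entry is active (for example, after the last word has ended).
function findActiveIndex(entries: CaptionEntry[], time: number): number {
  return entries.findIndex((e) => time >= e.start && time < e.end);
}

const captions: CaptionsData = {
  segments: [{ start: 0, end: 4, text: "The quick brown fox jumps." }],
  word_segments: [
    { start: 0, end: 0.3, text: "The" },
    { start: 0.3, end: 0.7, text: "quick" },
    { start: 0.7, end: 1.1, text: "brown" },
    { start: 1.1, end: 1.4, text: "fox" },
    { start: 1.4, end: 2.0, text: "jumps." },
  ],
};

// At 0.8 s the phrase-level lookup yields the whole sentence,
// while the word-level lookup yields only "brown".
const activeSegmentIndex = findActiveIndex(captions.segments, 0.8);
const activeWordIndex = findActiveIndex(captions.word_segments, 0.8);
console.log(captions.segments[activeSegmentIndex]?.text);
console.log(captions.word_segments[activeWordIndex]?.text);
```

Note that at 3.0 s the segment is still active while no word is, which is one reason the two levels are kept separately.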

What Each Level Enables

Both arrays are provided in a single data structure; child elements select the level they need:

Child element                  | Data used     | Use case
-------------------------------|---------------|----------------------------
ef-captions-segment            | segments      | Traditional subtitle blocks
ef-captions-active-word        | word_segments | Word-by-word highlighting
ef-captions-before-active-word | word_segments | Words already spoken
ef-captions-after-active-word  | word_segments | Words not yet spoken

Segment display — show complete phrases:

<ef-captions captions-script="my-captions">
  <ef-captions-segment></ef-captions-segment>
</ef-captions>

Word display — highlight individual words:

<ef-captions captions-script="my-captions">
  <ef-captions-active-word></ef-captions-active-word>
</ef-captions>

Context display — show before/active/after for reading flow:

<ef-captions captions-script="my-captions">
  <ef-captions-before-active-word></ef-captions-before-active-word>
  <ef-captions-active-word></ef-captions-active-word>
  <ef-captions-after-active-word></ef-captions-after-active-word>
</ef-captions>
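
The before/active/after split can be pictured as a partition of the word list around the current time. The sketch below is only an illustration of that idea: the names (TimedWord, WordContext, splitAroundActiveWord) are hypothetical, and the behavior when no word is active is an assumption, not something the source specifies.

```typescript
interface TimedWord {
  start: number; // seconds
  end: number;   // seconds; exclusive in this sketch (assumption)
  text: string;
}

interface WordContext {
  before: string; // words already spoken
  active: string; // word being spoken, or "" when none is active
  after: string;  // words not yet spoken
}

function joinText(words: TimedWord[]): string {
  return words.map((w) => w.text).join(" ");
}

// Hypothetical helper: partition the words around the one active at `time`.
function splitAroundActiveWord(words: TimedWord[], time: number): WordContext {
  const activeIndex = words.findIndex((w) => time >= w.start && time < w.end);
  if (activeIndex === -1) {
    // Assumption: with no active word, finished words count as "before"
    // and words that have not started count as "after".
    return {
      before: joinText(words.filter((w) => w.end <= time)),
      active: "",
      after: joinText(words.filter((w) => w.start > time)),
    };
  }
  return {
    before: joinText(words.slice(0, activeIndex)),
    active: words[activeIndex].text,
    after: joinText(words.slice(activeIndex + 1)),
  };
}

const sampleWords: TimedWord[] = [
  { start: 0, end: 0.3, text: "The" },
  { start: 0.3, end: 0.7, text: "quick" },
  { start: 0.7, end: 1.1, text: "brown" },
  { start: 1.1, end: 1.4, text: "fox" },
  { start: 1.4, end: 2.0, text: "jumps." },
];

// At 0.8 s: before = "The quick", active = "brown", after = "fox jumps."
console.log(splitAroundActiveWord(sampleWords, 0.8));
```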

Segments provide natural grouping boundaries, which keeps DOM updates infrequent. Word segments already carry per-word timing, so word-level display needs no extra processing during playback. Elements are updated only when playback crosses a timing boundary.
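
One way to picture "update only when a boundary is crossed" is a small tracker that remembers the last active index and reports a change only when it differs. This is a sketch of the general technique, not Editframe's implementation; BoundaryTracker and its method names are hypothetical.

```typescript
interface TimedEntry {
  start: number; // seconds
  end: number;   // seconds; exclusive in this sketch (assumption)
  text: string;
}

// Hypothetical helper: reports a new active index only when it changes.
class BoundaryTracker {
  private readonly entries: TimedEntry[];
  private lastIndex = -2; // sentinel so the very first call reports a change

  constructor(entries: TimedEntry[]) {
    this.entries = entries;
  }

  // Returns the new active index (-1 for "none") when it changed since the
  // previous call, or null when nothing needs to be re-rendered.
  update(time: number): number | null {
    const index = this.entries.findIndex((e) => time >= e.start && time < e.end);
    if (index === this.lastIndex) return null;
    this.lastIndex = index;
    return index;
  }
}

const trackedWords: TimedEntry[] = [
  { start: 0, end: 0.3, text: "The" },
  { start: 0.3, end: 0.7, text: "quick" },
  { start: 0.7, end: 1.1, text: "brown" },
];

const tracker = new BoundaryTracker(trackedWords);
// Five simulated frames, but only two of them cross a boundary
// (t = 0 and t = 0.35), so only two updates are logged.
for (const frameTime of [0, 0.1, 0.2, 0.35, 0.5]) {
  const changed = tracker.update(frameTime);
  if (changed !== null) {
    console.log(`t=${frameTime}: active word index is now ${changed}`);
  }
}
```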


Temporal Synchronization: Unified Timeline Architecture

Editframe captions plug into the temporal composition system and synchronize with media elements and timegroups through a unified timeline. This keeps caption timing precise when captions follow media, while still allowing standalone displays.

Two Operating Modes

Synchronized with a media element — set the target attribute to the ID of an ef-video or ef-audio element:

<ef-video id="my-video" src="video.mp4"></ef-video>
<ef-captions target="my-video">
  <ef-captions-segment></ef-captions-segment>
</ef-captions>

When target is set, captions automatically sync with the media element's playback state — no manual timing calculations required. When the connected media has a transcription, captions load fragment data on demand for the current playback position, keeping initial load time low.

Standalone within a timegroup — omit target and captions run on the timegroup's own clock:

<ef-timegroup mode="contain">
  <ef-captions captions-script="my-captions">
    <ef-captions-active-word></ef-captions-active-word>
  </ef-captions>
</ef-timegroup>

Why a Unified Timeline?

Native caption systems typically maintain a separate timeline from media, requiring manual synchronization that is error-prone when media is trimmed or edited, difficult to compose with other temporal elements, and complex to maintain across playback state changes.

Because ef-captions shares the same timeline system as all other elements in the composition, time coordinates are consistent across the entire composition. Trimming and offset attributes work the same way they do on any other element:

<ef-captions target="my-video" trimstart="10s" trimend="5s" offset="2s">
  <ef-captions-segment></ef-captions-segment>
</ef-captions>

Fragment loading means only caption data for the current playback position is loaded, mirroring how JIT transcoding works for video.
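
The general idea of on-demand fragment loading can be sketched as a cache keyed by fragment index. Everything specific here is an assumption for illustration: the source does not state the fragment length, the loading API, or whether loading is asynchronous, so FragmentCache, the fixed 10-second fragment length, and the synchronous stub loader are all hypothetical.

```typescript
// Hypothetical shapes; the real fragment format is not described in the source.
interface CaptionFragment {
  index: number;
  words: { start: number; end: number; text: string }[];
}

type FragmentLoader = (index: number) => CaptionFragment;

class FragmentCache {
  private readonly loaded = new Map<number, CaptionFragment>();
  private readonly fragmentSeconds: number;
  private readonly load: FragmentLoader;

  constructor(fragmentSeconds: number, load: FragmentLoader) {
    this.fragmentSeconds = fragmentSeconds;
    this.load = load;
  }

  // Return the fragment covering `time`, loading it only on first access.
  fragmentFor(time: number): CaptionFragment {
    const index = Math.floor(time / this.fragmentSeconds);
    let fragment = this.loaded.get(index);
    if (fragment === undefined) {
      fragment = this.load(index);
      this.loaded.set(index, fragment);
    }
    return fragment;
  }
}

// Stub loader that records which fragments were requested.
const requestedFragments: number[] = [];
const demoCache = new FragmentCache(10, (index) => {
  requestedFragments.push(index);
  return { index, words: [] };
});

demoCache.fragmentFor(3);  // loads fragment 0
demoCache.fragmentFor(7);  // reuses fragment 0
demoCache.fragmentFor(12); // loads fragment 1
console.log(requestedFragments); // fragments 0 and 1 were each loaded once
```

A real loader would most likely be asynchronous; the synchronous stub only keeps the sketch short.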


CSS Animation Variables: Deterministic Variation

Editframe captions expose CSS variables on <ef-captions-active-word> for building varied, non-repetitive animations that remain deterministic. Animations can look visually interesting without relying on randomness, so they render the same way every time.

The Core Variable: --ef-word-seed

--ef-word-seed gives each word a value between 0 and 1 based on its position in the transcript. It is calculated from the word's index using prime-number multiplication for good distribution, then normalized to the 0–1 range. The same word at the same position always produces the same seed value — consistent across every replay, easy to debug, no flickering.

Because the calculation uses prime numbers, consecutive words receive well-spread values that look random and do not cluster into obvious patterns, even though nothing about them is actually random.
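
The description above (multiply the word index by a prime, then normalize to 0–1) can be sketched as follows. The source does not give the actual constants or formula, so the multiplier 617 and modulus 997 below are hypothetical values chosen only to show the technique; the real --ef-word-seed values will differ.

```typescript
// Hypothetical constants: both are prime, and 617 / 997 is close to 0.619,
// which spreads consecutive indices far apart. Not Editframe's real values.
const SEED_MULTIPLIER = 617;
const SEED_MODULUS = 997;

// Deterministic pseudo-random value in [0, 1) derived only from the word index.
function wordSeed(index: number): number {
  return ((index * SEED_MULTIPLIER) % SEED_MODULUS) / SEED_MODULUS;
}

// Indices 0..3 map to 0, 617/997, 237/997 and 854/997:
// neighbouring words land far apart instead of forming a visible ramp.
const firstSeeds = [0, 1, 2, 3].map(wordSeed);
console.log(firstSeeds);

// The same index always yields the same seed, on every replay.
console.log(wordSeed(42) === wordSeed(42)); // true
```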

How to Use It

--ef-word-seed is a standard CSS custom property: read it with var() and combine it with calc() wherever a property needs a numeric value:

Color variation:

ef-captions-active-word {
  color: hsl(calc(360 * var(--ef-word-seed)), 70%, 60%);
}

Scale and rotation:

ef-captions-active-word {
  transform: scale(calc(1 + 0.3 * var(--ef-word-seed)))
             rotate(calc(var(--ef-word-seed) * 20deg - 10deg));
}
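
To see which ranges these calc() expressions produce, the same arithmetic can be mirrored outside CSS. The function below is purely illustrative (activeWordStyle is a hypothetical name); it only reproduces the formulas from the color and transform rules above.

```typescript
interface ActiveWordStyle {
  hueDeg: number;    // hsl() hue in degrees
  scale: number;     // scale() factor
  rotateDeg: number; // rotate() angle in degrees
}

// Mirrors the CSS arithmetic for a seed in the 0–1 range.
function activeWordStyle(seed: number): ActiveWordStyle {
  return {
    hueDeg: 360 * seed,        // calc(360 * var(--ef-word-seed))
    scale: 1 + 0.3 * seed,     // calc(1 + 0.3 * var(--ef-word-seed))
    rotateDeg: seed * 20 - 10, // calc(var(--ef-word-seed) * 20deg - 10deg)
  };
}

// Seed 0 gives hue 0, scale 1 and a -10 degree tilt; seed 1 gives hue 360,
// scale 1.3 and a +10 degree tilt; seed 0.5 sits in the middle with no tilt.
for (const seed of [0, 0.5, 1]) {
  console.log(seed, activeWordStyle(seed));
}
```

So the hue sweeps the full color wheel, the scale stays between 1 and 1.3, and the rotation stays within 10 degrees either side of upright.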

Combined effects:

ef-captions-active-word {
  color: hsl(calc(360 * var(--ef-word-seed)), 70%, 60%);
  transform: scale(calc(1 + 0.2 * var(--ef-word-seed)));
  opacity: calc(0.7 + 0.3 * var(--ef-word-seed));
}

Or inline on the element:

<ef-captions-active-word
  style="color: hsl(calc(360 * var(--ef-word-seed)), 80%, 70%);
         transform: scale(calc(1 + 0.3 * var(--ef-word-seed)));">
</ef-captions-active-word>

Additional Variables on ef-captions-active-word

Variable       | Type         | Description
---------------|--------------|--------------------------------------------
--ef-word-seed | number (0–1) | Deterministic pseudo-random value per word
--ef-progress  | number (0–1) | Playback progress through the current word
--ef-duration  | number (ms)  | Duration of the current word in milliseconds

--ef-progress and --ef-duration together power karaoke-style fill animations:

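.karaoke-word {
  /* Filled portion (amber) ends and unfilled portion (white) begins
     at the current progress point. */
  background: linear-gradient(
    90deg,
    #fbbf24 0%,
    #f59e0b calc(var(--ef-progress) * 100%),
    #ffffff calc(var(--ef-progress) * 100%),
    #ffffff 100%
  );
  /* Clip the gradient to the glyphs and make the text itself transparent. */
  -webkit-background-clip: text;
  background-clip: text;
  -webkit-text-fill-color: transparent;
  /* --ef-duration is a unitless number of milliseconds, so multiply by 1ms.
     The karaoke-fill @keyframes rule is not shown here and must be defined separately. */
  animation: karaoke-fill calc(var(--ef-duration) * 1ms) linear;
}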

Note that --ef-progress is a number (0–1), not a percentage — use calc(var(--ef-progress) * 100%) in gradient stops.
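
As a worked illustration of what the two values mean, progress and duration can be derived from a word's start and end times. This is a sketch based on the descriptions in the table above, not Editframe's code; wordProgress and wordDurationMs are hypothetical names, and clamping outside the word's interval is an assumption.

```typescript
interface WordTiming {
  start: number; // seconds
  end: number;   // seconds
}

// Fraction of the word already played, clamped to the 0–1 range.
function wordProgress(word: WordTiming, time: number): number {
  const duration = word.end - word.start;
  if (duration <= 0) return 1; // degenerate word: treat as fully played
  return Math.min(1, Math.max(0, (time - word.start) / duration));
}

// Word duration in milliseconds (caption times are in seconds).
function wordDurationMs(word: WordTiming): number {
  return (word.end - word.start) * 1000;
}

// "quick" spans 0.3 s to 0.7 s, so it lasts about 400 ms,
// and at 0.5 s playback is about halfway through it.
const quick: WordTiming = { start: 0.3, end: 0.7 };
console.log(wordDurationMs(quick), wordProgress(quick, 0.5));
```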

Why Not Math.random()?

Using JavaScript random values creates different styles on each render, causing inconsistent appearance across replays and making issues difficult to reproduce. The seed approach gives every word its own stable identity derived from its position, so the visual variety is reliable and reproducible. The seed is calculated once per word, not per frame, so there is no JavaScript overhead during animation — CSS handles all calculations.