simple_speech - Local-First Media Intelligence Pipeline

Overview

simple_speech is a local-first speech-to-text and media-structuring engine that automatically turns raw audio and video into navigable, captioned, chaptered media.

Capability	Market Status	simple_speech
Transcription	Commodity	Yes
Captions (SRT/VTT)	Expected	Yes
Auto-Chapters	Rare	Yes
Embedded Metadata	Extremely Rare	Yes
Offline Determinism	High-Trust Differentiator	Yes

We do not just transcribe media. We make it self-describing - automatically.

Installation

1. Environment Setup

Set SIMPLE_EIFFEL to your installation directory:

set SIMPLE_EIFFEL=C:\path\to\your\eiffel\libraries

2. FFmpeg Installation

Download from gyan.dev
Extract to C:\ffmpeg
Add C:\ffmpeg\bin to PATH
Verify: ffmpeg -version
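
The verification step can also be scripted, for example in a setup check that runs before any media is processed. The following is a minimal sketch using only the Python standard library; the helper names `find_tool` and `tool_version` are illustrative and not part of simple_speech.

```python
import shutil
import subprocess


def find_tool(name="ffmpeg"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def tool_version(name="ffmpeg"):
    """Return the first line of `<name> -version`, or None if the tool is missing."""
    path = find_tool(name)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tool_version()
    print(version or "ffmpeg not found on PATH")
```

If the script reports that ffmpeg is missing, the most likely cause is that the bin folder was not added to PATH or the terminal was not restarted afterwards.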

3. Whisper.cpp Setup

Option A: Download from whisper.cpp Releases

Option B: Build from source:

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release

4. Whisper Model

Model	Size	Best For
ggml-tiny.en.bin	75 MB	Testing
ggml-base.en.bin	142 MB	General use (recommended)
ggml-small.en.bin	466 MB	Production
ggml-large-v3.bin	3.1 GB	Maximum accuracy

Download: Hugging Face

5. AI Setup (Optional)

Provider	Env Variable	Get Key
Claude	ANTHROPIC_API_KEY	anthropic.com
Gemini	GEMINI_API_KEY	google.com
Grok	XAI_API_KEY	x.ai
Ollama	(none needed)	ollama.com
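
To see which of these providers are configured on a machine, the environment can be checked directly. This is a small sketch, not part of simple_speech: the variable names come from the table above, while `PROVIDER_ENV_VARS` and `configured_providers` are hypothetical helper names. Ollama is left out because it needs no key.

```python
import os

# Environment variables taken from the provider table.
# Ollama is omitted because it needs no key.
PROVIDER_ENV_VARS = {
    "Claude": "ANTHROPIC_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "Grok": "XAI_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

The check only confirms that a variable is present; it does not validate the key with the provider.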

6. Add to ECF

<library name="simple_speech" location="$SIMPLE_EIFFEL/simple_speech/simple_speech.ecf"/>

Quick Start

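local
    quick: SPEECH_QUICK
do
        -- Load the Whisper model downloaded in step 4.
    create quick.make_with_model ("models/ggml-base.en.bin")
        -- Run the whole pipeline on one file; the query reports success.
    if quick.process_video ("input.mp4", "output.mp4") then
        print ("Success: " + quick.segment_count.out + " segments%N")
    else
        print ("Processing failed%N")
    end
end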

See Cookbook for more recipes.

Command-Line Interface

v1.1.0 - Standalone CLI for use without Eiffel code.

Installation

Windows Installer: Download SimpleSpeech_Setup_1.1.0.exe from Releases

Commands

Command	Description
transcribe <file>	Transcribe audio/video to text
export <file>	Transcribe and export to file
chapters <file>	Detect chapter markers
batch <files...>	Process multiple files
embed <video>	Embed captions into video
info <file>	Show media file info

Examples

# Transcribe and export to SRT
speech_cli export video.mp4 --output captions.srt --format srt

# Detect chapters and export to JSON
speech_cli chapters video.mp4 --output chapters.json --format json

# Embed captions into video
speech_cli embed video.mp4 --output video_with_captions.mp4
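
When per-file control is needed, the export command can be driven from a script. The sketch below is in Python and uses only the `export`, `--output`, and `--format srt` options shown above. It assumes `speech_cli` is on PATH and that it exits with a non-zero code on failure; neither is confirmed by this document. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_export_command(video, fmt="srt"):
    """Build the `speech_cli export` argument list for one media file.

    The caption file is written next to the input, with the format
    name as its extension.
    """
    video = Path(video)
    output = video.with_suffix("." + fmt)
    return [
        "speech_cli", "export", str(video),
        "--output", str(output),
        "--format", fmt,
    ]


def export_all(videos, fmt="srt"):
    """Run the export command for each file; return the files that failed.

    Assumes a non-zero exit code signals failure.
    """
    failed = []
    for video in videos:
        result = subprocess.run(build_export_command(video, fmt))
        if result.returncode != 0:
            failed.append(str(video))
    return failed
```

For example, `build_export_command("video.mp4")` produces the same arguments as the first example above, except that the output file is named `video.srt`. The built-in `batch` command may cover the same need with less code.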

Model Auto-Detection

The CLI automatically searches for models in:

models/ folder next to the executable
models/ in the current directory
Path specified via --model
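
The lookup can be pictured as a search over candidate paths. The Python sketch below illustrates the idea and is not the CLI's actual implementation. The document lists the locations but not their precedence, so the order used here (explicit `--model` path first, then the folder next to the executable, then the current directory) is an assumption, as is falling back to the other locations when the explicit path does not exist. `find_model` is a hypothetical name.

```python
from pathlib import Path


def find_model(name, exe_dir, cwd, explicit=None):
    """Return the first existing model path, or None if nothing is found.

    Assumed precedence: explicit --model path, then models/ next to the
    executable, then models/ in the current directory.
    """
    candidates = []
    if explicit is not None:
        candidates.append(Path(explicit))
    candidates.append(Path(exe_dir) / "models" / name)
    candidates.append(Path(cwd) / "models" / name)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning None rather than raising lets the caller decide how to report a missing model, for instance by pointing the user to the download step.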

API Reference

SPEECH_QUICK (Facade)

Method	Description
make_with_model (path)	Create with Whisper model
process_video (in, out)	One-liner workflow
transcribe (file)	Transcribe file (fluent)
detect_chapters	Detect chapters (fluent)
embed_to (output)	Embed metadata (fluent)