Overview
simple_speech is a local-first speech-to-text and media-structuring engine that automatically turns raw audio and video into navigable, captioned, chaptered media.
| Capability | Market Status | simple_speech |
|---|---|---|
| Transcription | Commodity | Yes |
| Captions (SRT/VTT) | Expected | Yes |
| Auto-Chapters | Rare | Yes |
| Embedded Metadata | Extremely Rare | Yes |
| Offline Determinism | High-Trust Differentiator | Yes |
We do not just transcribe media. We make it self-describing - automatically.
Installation
1. Environment Setup
Set SIMPLE_EIFFEL to your installation directory:
set SIMPLE_EIFFEL=C:\path\to\your\eiffel\libraries
2. FFmpeg Installation
- Download from gyan.dev
- Extract to C:\ffmpeg
- Add C:\ffmpeg\bin to PATH
- Verify: ffmpeg -version
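The verification step can also be scripted as a pre-flight check. The following is a minimal sketch for a POSIX shell (for example Git Bash or WSL on Windows); the helper name `require_tool` is hypothetical and not part of simple_speech.

```shell
#!/bin/sh
# require_tool NAME: succeed if NAME resolves on PATH, otherwise report and fail.
# Hypothetical helper for illustration; not shipped with simple_speech.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (check that its folder is on PATH)" >&2
        return 1
    fi
}
```

Calling `require_tool ffmpeg && ffmpeg -version` prints the FFmpeg version only when the binary is reachable.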
3. Whisper.cpp Setup
Option A: Download from whisper.cpp Releases
Option B: Build from source:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release
4. Whisper Model
| Model | Size | Best For |
|---|---|---|
| ggml-tiny.en.bin | 75 MB | Testing |
| ggml-base.en.bin | 142 MB | General use (recommended) |
| ggml-small.en.bin | 466 MB | Production |
| ggml-large-v3.bin | 3.1 GB | Maximum accuracy |
Download: Hugging Face
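The download can be scripted as well. This sketch assumes the Hugging Face link refers to the `ggerganov/whisper.cpp` model repository and that `curl` is installed; the function names `model_url` and `fetch_model` are hypothetical helpers, not part of simple_speech or whisper.cpp.

```shell
#!/bin/sh
# Base URL of the ggml model files on Hugging Face.
# Assumption: the "Hugging Face" link points at the ggerganov/whisper.cpp repo.
MODEL_BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# model_url NAME: print the download URL for a model file such as ggml-base.en.bin.
model_url() {
    echo "$MODEL_BASE_URL/$1"
}

# fetch_model NAME [DIR]: download NAME into DIR (default: models) unless it exists.
fetch_model() {
    dir="${2:-models}"
    mkdir -p "$dir" || return 1
    if [ -f "$dir/$1" ]; then
        echo "already present: $dir/$1"
        return 0
    fi
    # -L follows redirects; --fail turns HTTP errors into a non-zero exit.
    # Download to a .part file first so an interrupted transfer is never
    # mistaken for a complete model on the next run.
    curl -L --fail -o "$dir/$1.part" "$(model_url "$1")" &&
        mv "$dir/$1.part" "$dir/$1"
}
```

For example, `fetch_model ggml-base.en.bin` would place the recommended model in `models/`, the path used in the Quick Start.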
5. AI Setup (Optional)
| Provider | Env Variable | Get Key |
|---|---|---|
| Claude | ANTHROPIC_API_KEY | anthropic.com |
| Gemini | GEMINI_API_KEY | google.com |
| Grok | XAI_API_KEY | x.ai |
| Ollama | (none needed) | ollama.com |
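To see which providers are configured in the current shell, a small check like the one below may help. It is only a sketch: `configured_providers` is a hypothetical helper, and it says nothing about how simple_speech itself chooses a provider.

```shell
#!/bin/sh
# configured_providers: print one line per provider whose API key variable is non-empty.
# Variable names come from the table above; Ollama is omitted because it needs no key.
configured_providers() {
    [ -n "${ANTHROPIC_API_KEY:-}" ] && echo "claude"
    [ -n "${GEMINI_API_KEY:-}" ] && echo "gemini"
    [ -n "${XAI_API_KEY:-}" ] && echo "grok"
    return 0
}
```

In a POSIX shell the keys are set with `export ANTHROPIC_API_KEY=...`; in the Windows command prompt the equivalent is `set ANTHROPIC_API_KEY=...`, as with SIMPLE_EIFFEL in step 1.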
6. Add to ECF
<library name="simple_speech" location="$SIMPLE_EIFFEL/simple_speech/simple_speech.ecf"/>
Quick Start
local
    quick: SPEECH_QUICK
do
    create quick.make_with_model ("models/ggml-base.en.bin")
    if quick.process_video ("input.mp4", "output.mp4") then
        print ("Success: " + quick.segment_count.out + " segments%N")
    end
end
See Cookbook for more recipes.
Command-Line Interface
v1.1.0 - Standalone CLI for use without Eiffel code.
Installation
Windows Installer: Download SimpleSpeech_Setup_1.1.0.exe from Releases.
Commands
| Command | Description |
|---|---|
| transcribe <file> | Transcribe audio/video to text |
| export <file> | Transcribe and export to file |
| chapters <file> | Detect chapter markers |
| batch <files...> | Process multiple files |
| embed <video> | Embed captions into video |
| info <file> | Show media file info |
Examples
# Transcribe and export to SRT
speech_cli export video.mp4 --output captions.srt --format srt
# Detect chapters and export to JSON
speech_cli chapters video.mp4 --output chapters.json --format json
# Embed captions into video
speech_cli embed video.mp4 --output video_with_captions.mp4
Model Auto-Detection
The CLI automatically searches for models in:
- models/ folder next to the executable
- models/ in the current directory
- Path specified via --model
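The search over these locations could look roughly like the sketch below. The precedence shown (an explicit `--model` path first, then the executable's folder, then the current directory) is an assumption, not documented behavior, and `find_model` is a hypothetical function written for illustration.

```shell
#!/bin/sh
# find_model NAME EXE_DIR [EXPLICIT_PATH]
# Prints the first existing candidate and returns 0, or returns 1 if none exists.
# Assumed precedence (not confirmed by the CLI docs): --model path, then
# EXE_DIR/models, then ./models.
find_model() {
    name="$1"
    exe_dir="$2"
    explicit="${3:-}"
    for candidate in "$explicit" "$exe_dir/models/$name" "./models/$name"; do
        # Skip the empty slot when no explicit path was given.
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

A real implementation might instead treat a missing `--model` file as an error rather than falling back to the other locations; the sketch only illustrates the lookup.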
API Reference
SPEECH_QUICK (Facade)
| Method | Description |
|---|---|
| make_with_model (path) | Create with Whisper model |
| process_video (in, out) | One-liner workflow |
| transcribe (file) | Transcribe file (fluent) |
| detect_chapters | Detect chapters (fluent) |
| embed_to (output) | Embed metadata (fluent) |