simple_speech

Local-First Media Intelligence Pipeline

v1.1.0 MIT Phase 0-7

Overview

simple_speech is a local-first speech-to-text and media-structuring engine that automatically turns raw audio and video into navigable, captioned, chaptered media.

| Capability | Market Status | simple_speech |
|---|---|---|
| Transcription | Commodity | Yes |
| Captions (SRT/VTT) | Expected | Yes |
| Auto-Chapters | Rare | Yes |
| Embedded Metadata | Extremely Rare | Yes |
| Offline Determinism | High-Trust Differentiator | Yes |

We do not just transcribe media. We make it self-describing - automatically.

Installation

1. Environment Setup

Set SIMPLE_EIFFEL to your installation directory:

set SIMPLE_EIFFEL=C:\path\to\your\eiffel\libraries
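Note that `set` only affects the current console session. To persist the variable across sessions on Windows, `setx` can be used (the path below is the same placeholder as above):

```shell
setx SIMPLE_EIFFEL "C:\path\to\your\eiffel\libraries"
```

Open a new console afterwards; `setx` does not update the session it runs in.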

2. FFmpeg Installation

  1. Download an FFmpeg build from gyan.dev
  2. Extract to C:\ffmpeg
  3. Add C:\ffmpeg\bin to PATH
  4. Verify: ffmpeg -version

3. Whisper.cpp Setup

Option A: Download from whisper.cpp Releases

Option B: Build from source:

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release

4. Whisper Model

| Model | Size | Best For |
|---|---|---|
| ggml-tiny.en.bin | 75 MB | Testing |
| ggml-base.en.bin | 142 MB | General use (recommended) |
| ggml-small.en.bin | 466 MB | Production |
| ggml-large-v3.bin | 3.1 GB | Maximum accuracy |

Download: Hugging Face (ggerganov/whisper.cpp repository)
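As a sketch, the recommended base model can be fetched with curl, assuming the ggml models are hosted in the ggerganov/whisper.cpp repository on Hugging Face:

```shell
# Download the recommended base English model (~142 MB) into models/
mkdir -p models
curl -L -o models/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

On Windows, `curl` ships with Windows 10 and later; replace `mkdir -p models` with `mkdir models` in cmd.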

5. AI Setup (Optional)

| Provider | Env Variable | Get Key |
|---|---|---|
| Claude | ANTHROPIC_API_KEY | anthropic.com |
| Gemini | GEMINI_API_KEY | google.com |
| Grok | XAI_API_KEY | x.ai |
| Ollama | (none needed) | ollama.com |
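Keys are read from the environment. For example, to enable Claude for the current session (the key value below is a placeholder):

```shell
set ANTHROPIC_API_KEY=sk-ant-your-key-here
```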

6. Add to ECF

<library name="simple_speech" location="$SIMPLE_EIFFEL/simple_speech/simple_speech.ecf"/>

Quick Start

local
    quick: SPEECH_QUICK
do
    create quick.make_with_model ("models/ggml-base.en.bin")
    if quick.process_video ("input.mp4", "output.mp4") then
        print ("Success: " + quick.segment_count.out + " segments%N")
    end
end

See Cookbook for more recipes.

Command-Line Interface

New in v1.1.0: a standalone CLI, no Eiffel code required.

Installation

Windows Installer: Download SimpleSpeech_Setup_1.1.0.exe from Releases

Commands

| Command | Description |
|---|---|
| `transcribe <file>` | Transcribe audio/video to text |
| `export <file>` | Transcribe and export to file |
| `chapters <file>` | Detect chapter markers |
| `batch <files...>` | Process multiple files |
| `embed <video>` | Embed captions into video |
| `info <file>` | Show media file info |

Examples

# Transcribe and export to SRT
speech_cli export video.mp4 --output captions.srt --format srt

# Detect chapters and export to JSON
speech_cli chapters video.mp4 --output chapters.json --format json

# Embed captions into video
speech_cli embed video.mp4 --output video_with_captions.mp4

Model Auto-Detection

The CLI automatically searches for models in:

  1. models/ folder next to the executable
  2. models/ in the current directory
  3. Path specified via --model
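If a model lives outside the searched folders, it can be passed explicitly via the documented `--model` flag; a sketch (file names and paths are placeholders):

```shell
speech_cli transcribe video.mp4 --model D:\models\ggml-small.en.bin
```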

API Reference

SPEECH_QUICK (Facade)

| Method | Description |
|---|---|
| `make_with_model (path)` | Create with Whisper model |
| `process_video (in, out)` | One-liner workflow |
| `transcribe (file)` | Transcribe file (fluent) |
| `detect_chapters` | Detect chapters (fluent) |
| `embed_to (output)` | Embed metadata (fluent) |
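The fluent methods above suggest a chained workflow. A hedged sketch, assuming each fluent call returns the facade so the steps compose left to right (method names are taken from the table; the chaining semantics are an assumption):

```eiffel
local
    quick: SPEECH_QUICK
do
    create quick.make_with_model ("models/ggml-base.en.bin")
        -- Transcribe, detect chapters, then embed the results
        -- into the output file in one fluent chain.
    quick.transcribe ("input.mp4").detect_chapters.embed_to ("output.mp4")
end
```

This is the step-by-step counterpart of the `process_video` one-liner shown in Quick Start.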