whisper-cry

A Crystal wrapper for whisper.cpp

Crystal bindings for whisper.cpp, providing local speech-to-text transcription using OpenAI's Whisper models. Version tracks whisper.cpp releases (currently v1.8.3).

Installation

  1. Add the dependency to your shard.yml:

    dependencies:
      whisper-cry:
        github: robacarp/whisper-cry
    
  2. Run shards install

  3. Build the native libraries:

    cd lib/whisper-cry && make
    

    This clones whisper.cpp v1.8.3, builds it as a static library, and copies the .a files into vendor/lib/. Requires cmake and a C++ compiler. See the whisper.cpp build documentation for platform-specific details and options.

  4. Download a Whisper model (e.g. the base English model):

    curl -L -o ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
    

    See the whisper.cpp models directory for all available models.

  5. Optimize the model for your hardware (optional but recommended):

    The whisper.cpp project provides documentation and scripts for optimizing models for different hardware, including quantization.
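As one example, whisper.cpp ships a quantization tool that shrinks a model at a small accuracy cost. The binary location below is illustrative and depends on how you built whisper.cpp:

```shell
# Quantize the base English model to Q5_0
# (binary path varies by build; e.g. ./quantize for make builds)
./build/bin/quantize ggml-base.en.bin ggml-base.en-q5_0.bin q5_0
```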

Usage

require "whisper-cry"

whisper = Whisper.new("/path/to/ggml-base.en.bin")
segments = whisper.transcribe_file("audio.wav")

segments.each do |segment|
  puts "#{segment.start_timestamp} --> #{segment.end_timestamp}"
  puts segment.text
end

whisper.close

Audio files must be 16-bit PCM WAV, mono, 16kHz. Convert with ffmpeg:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -f wav output.wav

API

Whisper.new(model_path, use_gpu = false)

Loads a GGML-format model file and initializes the inference context. Set use_gpu: true to enable Metal acceleration on macOS. Raises Whisper::Error if the model file is missing or fails to load.

#transcribe_file(path, language = "en", n_threads = 4, translate = false)

Transcribes a WAV file and returns an Array(Whisper::Segment). The file must be 16-bit signed PCM, mono, 16kHz.

#transcribe(samples, language = "en", n_threads = 4, translate = false)

Transcribes pre-loaded Float32 audio samples (normalized to [-1.0, 1.0], mono, 16kHz). Useful when you already have audio data in memory.

Options:

  • language: two-letter ISO 639-1 code (e.g. "en", "es"), or nil for auto-detection
  • n_threads: CPU threads for inference
  • translate: when true, translates to English regardless of source language
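As a minimal sketch of the in-memory path (model path and sample contents are placeholders; here the buffer is one second of silence):

```crystal
require "whisper-cry"

whisper = Whisper.new("/path/to/ggml-base.en.bin")

# One second of 16kHz mono audio as normalized Float32 samples
samples = Array(Float32).new(16_000, 0.0_f32)

segments = whisper.transcribe(samples, language: "en", n_threads: 4)
segments.each { |s| puts s.text }

whisper.close
```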

#close

Frees the underlying whisper context. Safe to call multiple times. Also called automatically by #finalize.
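Because #close is idempotent, a begin/ensure block is a convenient way to guarantee the native context is freed even if transcription raises (a sketch; paths are placeholders):

```crystal
whisper = Whisper.new("/path/to/ggml-base.en.bin")
begin
  segments = whisper.transcribe_file("audio.wav")
  segments.each { |s| puts s.text }
ensure
  whisper.close # frees the native context; repeat calls are no-ops
end
```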

#version, #model_type, #multilingual?, #system_info

Query the whisper.cpp version string, loaded model type (e.g. "base"), multilingual support, and available CPU features.
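For instance (actual values depend on your build, hardware, and model; comments below are illustrative):

```crystal
whisper = Whisper.new("/path/to/ggml-base.en.bin")
puts whisper.version        # whisper.cpp version string
puts whisper.model_type     # e.g. "base"
puts whisper.multilingual?  # false for *.en models
puts whisper.system_info    # available CPU features
whisper.close
```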

Whisper::Segment

Each segment represents a span of recognized speech:

Method                               Returns
#text                                Transcribed text
#start_ms / #end_ms                  Timing in milliseconds
#start_seconds / #end_seconds        Timing in seconds
#duration_ms                         Segment duration in milliseconds
#start_timestamp / #end_timestamp    Formatted as "HH:MM:SS.mmm"
#no_speech_probability               Float32 (0.0-1.0); higher means likely not speech
#speaker_turn_next                   true if the next segment is a different speaker
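The per-segment metadata can be used to drop spans that are probably not speech, for example (the 0.6 threshold is illustrative, not a library default):

```crystal
segments = whisper.transcribe_file("audio.wav")
segments.reject { |s| s.no_speech_probability > 0.6 }.each do |s|
  puts "#{s.start_timestamp} --> #{s.end_timestamp}  #{s.text}"
end
```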

Development

Run tests:

crystal spec

Tests cover Segment formatting/conversion, WAV file parsing and validation, and Whisper initialization error handling. No model file is needed to run the test suite.

License

MIT
