speak

A local AI assistant that runs on your computer. No cloud. No subscription. Your data stays with you.

What is speak?

speak is a local AI assistant that runs entirely on your machine. It is designed for ordinary computers (4-6 GB RAM), not expensive servers. After the one-time model download, it needs no internet connection. There is no monthly fee, and your conversations never leave your computer.

speak is inspired by antirez's ds4 but built for everyday laptops, not high-end Macs.

It remembers who you are across sessions, can read your files, and can search the web, all while using less than 2 GB of RAM.

Features

Feature	What it means for you	Status
100% Local	Runs on your laptop. No data sent to anyone.	Yes
Persistent Memory	Tell speak something once. It remembers forever.	Yes
File Reading	"Read my config.json" - speak shows you the content.	No
Web Search	"Search for latest news" - speak finds current information.	No
Low RAM Usage	Uses disk caching. Long conversations don't fill your memory.	Yes
Hardware Detection	Auto-configures itself for your computer.	Yes
Offline First	Works without internet. Web search is optional.	Yes
Streaming Output	Tokens appear as they are generated.	Yes
Agent Loop	Multi-step tool use (search, read, then answer).	Yes
Resumable Downloads	Interrupted model downloads continue where they stopped.	Yes
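
Resumable downloads are commonly built on HTTP Range requests: the client checks how many bytes it already has and asks the server for the rest. The sketch below illustrates that idea in Python; it is not speak's Crystal implementation, and the `resume_headers` helper and the `.part` file name are hypothetical.

```python
import os
import tempfile


def resume_headers(partial_path: str) -> dict:
    """Build request headers that ask the server to skip bytes already on disk."""
    # Bytes already downloaded; 0 when no partial file exists yet.
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    # "bytes=N-" requests everything from offset N to the end of the file.
    return {"Range": f"bytes={have}-"} if have > 0 else {}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        part = os.path.join(tmp, "model.gguf.part")  # hypothetical file name
        print(resume_headers(part))                  # no partial file yet
        with open(part, "wb") as f:
            f.write(b"\0" * 1024)                    # pretend 1 KiB was downloaded
        print(resume_headers(part))
```

A server that honors the header answers with 206 Partial Content, and the client appends to the partial file. A plain 200 response means the server ignored the range, so the download has to start over.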

Quick Start

One-liner

curl -fsSL https://raw.githubusercontent.com/zendrx/speak/refs/heads/master/install.sh | sh

Step by step

# Clone the repository
git clone https://github.com/zendrx/speak.git
cd speak

# Install dependencies
shards install

# Build the binary
crystal build src/speak.cr --release -o speak

# Run speak
./speak

On first run, speak will:

1. Detect your RAM and CPU
2. Create a config file for your hardware
3. Download the AI model (2.5 GB)
4. Start the chat
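
The detection step can be pictured as follows. This is a minimal sketch in Python, assuming a Linux system where memory figures come from `/proc/meminfo` and CPU flags from `/proc/cpuinfo`; speak's actual detection code is written in Crystal and may work differently. The function names and the sample text are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Return total and available RAM in MB from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    return {
        "total_ram_mb": fields.get("MemTotal", 0) // 1024,
        "available_ram_mb": fields.get("MemAvailable", 0) // 1024,
    }


def has_avx2(cpuinfo_text: str) -> bool:
    """True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == "flags" and "avx2" in rest.split():
            return True
    return False


# Sample input, made up for this example.
SAMPLE_MEMINFO = (
    "MemTotal:        8388608 kB\n"
    "MemFree:         1048576 kB\n"
    "MemAvailable:    6348800 kB\n"
)

if __name__ == "__main__":
    print(parse_meminfo(SAMPLE_MEMINFO))
    print(has_avx2("flags\t\t: fpu sse2 avx avx2\n"))
```

On a real Linux machine the same functions could be fed the contents of `/proc/meminfo` and `/proc/cpuinfo` instead of the sample strings.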

Requirements

Component	Minimum	Recommended
RAM	4 GB	8 GB
Storage	3 GB free	10 GB free
OS	Linux	Ubuntu 22.04+
CPU	Any 64-bit	AVX2 support

macOS works but is experimental. Windows support is coming soon.

Usage

Chat Commands

Inside the chat, type these commands:

Command	Action
exit or quit	Save and quit
clear	Clear the screen

Example conversation

$ ./speak

======================================================================
                         speak - Local AI Assistant
======================================================================

> Hello, who are you?

speak: I am speak, a local AI assistant running on your computer.
       I can read files, search the web, and remember things about you.

> My name is Sarah and I love Python

speak: I've remembered that your name is Sarah and you love Python.

> Read my config.json

speak: {
  "detected": {
    "total_ram_mb": 8192,
    "available_ram_mb": 6200
  },
  "active": {
    "context_size": 2048,
    "model_quant": "Q4_K_M"
  }
}

> Search for Python 3.13 features

speak: Search results for: Python 3.13 features

1. Python 3.13 Release Notes
   URL: https://docs.python.org/3.13/whatsnew/3.13.html
   Improved error messages, incremental garbage collection...

> What do you know about me?

speak: Your name is Sarah and you love Python.

> exit

Goodbye.

Configuration

All settings are stored in ./speak/config.json. You can edit this file to customize speak.

Configuration Structure

{
  "detected": {
    "total_ram_mb": 8192,
    "available_ram_mb": 6200,
    "os_reserved_ram_mb": 512
  },
  "active": {
    "cpu_cores": 4,
    "has_avx2": true,
    "free_disk_space_mb": 51200,
    "context_size": 2048,
    "kv_cache_type": "standard",
    "model_quant": "Q4_K_M",
    "model_file": "nanbeige-3b-q4_k_m.gguf",
    "temperature": 0.7,
    "max_tokens": 512,
    "use_mmap": true
  },
  "user_overrides": {
    "os_reserved_ram_mb": null,
    "context_size": null,
    "kv_cache_type": null,
    "model_quant": null,
    "temperature": null,
    "max_tokens": null,
    "use_mmap": null
  }
}
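
The layout suggests that a non-null value under `user_overrides` takes precedence over the auto-detected value under `active`, while `null` means "keep the detected value". That reading is an assumption, not confirmed behavior. The Python sketch below shows the merge under that assumption; `effective_settings` is a hypothetical helper, not part of speak.

```python
import json


def effective_settings(config: dict) -> dict:
    """Overlay non-null user overrides on top of the auto-configured values."""
    merged = dict(config.get("active", {}))
    for key, value in config.get("user_overrides", {}).items():
        if value is not None:  # JSON null loads as None: keep the auto value
            merged[key] = value
    return merged


if __name__ == "__main__":
    raw = """
    {
      "active": {"context_size": 2048, "temperature": 0.7, "max_tokens": 512},
      "user_overrides": {"context_size": null, "temperature": 1.2}
    }
    """
    print(effective_settings(json.loads(raw)))
    # {'context_size': 2048, 'temperature': 1.2, 'max_tokens': 512}
```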

Common Settings

Setting	What it does	Default
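context_size	Context window: how many tokens of conversation the model can see at once	2048
temperature	Sampling randomness (0.0 = most predictable, 1.5 = most varied)	0.7
max_tokens	Maximum response length, in tokens	512
model_quant	Quantization level; trades quality for size and speed (Q2_K, Q4_K_M, Q6_K)	Q4_K_M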

Make AI more creative

Edit ./speak/config.json:

"user_overrides": {
  "temperature": 1.2
}
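
To see why a higher temperature gives more varied output: in conventional LLM sampling, the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The Python sketch below demonstrates this with made-up logits; it illustrates the general technique and is not taken from speak's code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling by 1/temperature."""
    if temperature <= 0:
        # A temperature of 0 is usually handled separately as greedy argmax.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
    for t in (0.7, 1.2):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

At 1.2 the top-scoring token gets a smaller share of the probability than at 0.7, so less likely tokens are picked more often.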

Reduce RAM usage

"user_overrides": {
  "context_size": 1024,
  "model_quant": "Q2_K"
}
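
Lowering `context_size` helps because the key/value cache of a standard transformer grows linearly with the number of tokens kept in context. The Python sketch below works through that arithmetic. The layer count, head count, and head size are hypothetical values chosen for illustration, not the real dimensions of the model speak downloads, and a 16-bit cache is assumed.

```python
def kv_cache_mb(context_size, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size in MB for a standard transformer."""
    # Factor 2: each layer stores one tensor for keys and one for values.
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_size * bytes_per_value
    )
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical architecture, for illustration only.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    for ctx in (2048, 1024):
        print(ctx, kv_cache_mb(ctx, **shape), "MB")
    # 2048 256.0 MB
    # 1024 128.0 MB
```

Halving the context halves the cache. Switching to a smaller quantization shrinks the model weights themselves, which is a separate saving.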

Custom System Prompt

Edit src/speak/system_prompt.txt and recompile. The prompt is embedded at build time.

File Structure

speak/
+-- src/
    +-- speak.cr              Entry point
    +-- speak/
        +-- system.cr         Hardware detection
        +-- config.cr         JSON configuration
        +-- install.cr        Model downloader
        +-- disk.cr           Disk-backed KV cache
        +-- tool.cr           Tool system
        +-- memory.cr         Agent memory
        +-- launch.cr         Chat interface
        +-- system_prompt.txt Embedded prompt
+-- lib/                      Shards
+-- shard.yml
+-- README.md

RAM Tiers

Available RAM	Model Quant	Context Size	mmap
< 3 GB	Q2_K	512	Enabled
3-6 GB	Q4_K_M	1024	Enabled
6-12 GB	Q4_K_M	2048	Enabled
> 12 GB	Q6_K	4096	Disabled
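
The tier table can be read as a simple lookup on available RAM. The Python sketch below encodes it, reading the first row as "under 3 GB" and the last as "over 12 GB". How exact boundary values such as 6 GB are assigned is an assumption, and `pick_tier` is a hypothetical name rather than a function in speak.

```python
def pick_tier(available_ram_gb: float) -> dict:
    """Map available RAM to the settings listed in the RAM Tiers table."""
    if available_ram_gb < 3:
        return {"model_quant": "Q2_K", "context_size": 512, "use_mmap": True}
    if available_ram_gb < 6:
        return {"model_quant": "Q4_K_M", "context_size": 1024, "use_mmap": True}
    if available_ram_gb <= 12:
        return {"model_quant": "Q4_K_M", "context_size": 2048, "use_mmap": True}
    return {"model_quant": "Q6_K", "context_size": 4096, "use_mmap": False}


if __name__ == "__main__":
    # 6200 MB available, as in the sample config.json shown earlier.
    print(pick_tier(6200 / 1024))
    # {'model_quant': 'Q4_K_M', 'context_size': 2048, 'use_mmap': True}
```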

Troubleshooting

401 Unauthorized during download

The model repository requires authentication. Run:

./hfd.sh Edge-Quant/Nanbeige4.1-3B-Q4_K_M-GGUF --include "*.gguf" --local-dir ./speak/models

Then run ./speak again.

Model loads slowly on HDD

Use the smaller Q2_K model. Edit config.json:

"user_overrides": {
  "model_quant": "Q2_K"
}

Then delete the old model file in ./speak/models/ and restart speak.

Readline not working

Install the system library:

# Ubuntu/Debian
sudo apt install libreadline-dev

# macOS
brew install readline

Undefined method 'tokenize'

Make sure you call tokenize on the vocabulary object (@vocab.tokenize), not on the context (context.tokenize).

Contributing

Contributions are welcome. Please see CONTRIBUTING.md for guidelines.

git clone https://github.com/zendrx/speak.git
cd speak
# Make your changes
crystal build src/speak.cr --release -o speak_app
./speak_app

License

MIT License. See LICENSE file for details.

Credits

Project	Role
Crystal	Language
llama.cpp	Inference engine
ds4	Disk cache inspiration

speak - Your local AI assistant

Built with Crystal
