# Slim
Lightweight Crystal client for local LLM inference via Ollama.

Designed for running small language models ("specialized little brains") for tasks like:

- Text classification
- Annotation extraction
- Speech act recognition
- Summarization
## Installation

Add to your `shard.yml`:

```yaml
dependencies:
  slim:
    path: ../slim # or github: user/slim when published
```
## Prerequisites

Install and run Ollama:

```sh
# Install (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a small model
ollama pull llama3.2:3b

# Ollama runs automatically, or start manually:
ollama serve
```
## Usage

### Basic Generation

```crystal
require "slim"

client = Slim::Client.new

# Simple completion
response = client.generate("llama3.2:3b", "What is 2+2?")
puts response.content # => "4"
```
### Streaming

```crystal
client.generate("llama3.2:3b", "Write a haiku about code.") do |chunk|
  print chunk # Print as it generates
end
puts
```
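If you also want the complete response after streaming, the chunks can be accumulated while they print. A minimal sketch using the block form above and Crystal's standard `String::Builder`:

```crystal
require "slim"

client = Slim::Client.new

# Accumulate streamed chunks while still printing them live
buffer = String::Builder.new
client.generate("llama3.2:3b", "Write a haiku about code.") do |chunk|
  print chunk
  buffer << chunk
end
puts

full_text = buffer.to_s # the whole generation as one String
```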
### Chat Format

```crystal
messages = [
  Slim::Message.new("system", "You classify text as 'question' or 'statement'."),
  Slim::Message.new("user", "How are you?"),
]

response = client.chat("llama3.2:3b", messages)
puts response.content # => "question"
```
### Model Management

```crystal
# List available models
client.list_models.each do |model|
  puts "#{model.name} (#{model.details.try(&.parameter_size)})"
end

# Check if model exists
if client.has_model?("llama3.2:3b")
  puts "Ready!"
end

# Pull a model
client.pull("llama3.2:1b")
```
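A common pattern is to combine the two calls above so a script provisions its own model on first run. A sketch using only `has_model?` and `pull`:

```crystal
require "slim"

client = Slim::Client.new
model = "llama3.2:3b"

# Pull the model only if it is not already available locally
client.pull(model) unless client.has_model?(model)

response = client.generate(model, "Ready check: say OK.")
puts response.content
```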
### Configuration

```crystal
client = Slim::Client.new(
  host: "http://localhost:11434", # Ollama endpoint
  timeout: 5.minutes              # Request timeout
)

# With generation options
response = client.generate(
  "llama3.2:3b",
  "Classify: Hello there",
  system: "Respond with only: greeting, question, or statement",
  options: Slim::Options.new(temperature: 0.1)
)
```
### Sampling Options

Standard LLM parameters (work across most inference engines):

```crystal
opts = Slim::Options.new(
  temperature: 0.7,    # Randomness: 0=deterministic, 1=creative
  top_p: 0.9,          # Nucleus sampling threshold
  top_k: 40,           # Consider top k tokens
  repeat_penalty: 1.1, # Penalize repetition
  seed: 42,            # Reproducible output
)
```

Ollama-specific parameters:

```crystal
opts = Slim::Options.new(
  num_ctx: 4096,       # Context window size
  num_predict: 256,    # Max tokens to generate (-1 = unlimited)
  stop: ["\n", "END"], # Stop sequences
)
```

Presets for common use cases:

```crystal
# Classification (deterministic)
opts = Slim::Options.classification # temperature: 0.1, top_k: 1

# Creative generation
opts = Slim::Options.creative # temperature: 0.8, top_p: 0.9
```
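The presets plug straight into the `options:` parameter shown under Configuration. A sketch of a deterministic classification call using the `classification` preset (the prompt text here is illustrative):

```crystal
require "slim"

client = Slim::Client.new

# Low-temperature, top_k: 1 sampling for stable labels
response = client.generate(
  "llama3.2:3b",
  "Classify: See you tomorrow",
  system: "Respond with only: greeting, question, or statement",
  options: Slim::Options.classification
)
puts response.content
```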
### Async with Fibers

```crystal
# Run inference in background
channel = Channel(Slim::Response).new

spawn do
  response = client.generate("llama3.2:3b", prompt)
  channel.send(response)
end

# Do other work...

# Get result when ready
result = channel.receive
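```

The same channel pattern extends to fanning out several prompts, one fiber each. A sketch (note that a single Ollama instance typically serves requests sequentially, so this overlaps waiting rather than true parallel inference):

```crystal
require "slim"

client = Slim::Client.new
prompts = ["Classify: Hello", "Classify: Why?", "Classify: It rains."]

# One fiber per prompt; results arrive on a shared channel
channel = Channel(Slim::Response).new
prompts.each do |prompt|
  spawn do
    channel.send(client.generate("llama3.2:3b", prompt))
  end
end

# Collect all responses (completion order, not prompt order)
results = Array(Slim::Response).new(prompts.size) { channel.receive }
results.each { |r| puts r.content }
```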
## Recommended Models

For classification/annotation tasks, smaller models work well:

| Model | Size | Good For |
|---|---|---|
| llama3.2:1b | ~1GB | Simple classification |
| llama3.2:3b | ~2GB | General tasks |
| phi3:mini | ~2GB | Reasoning |
| qwen2.5:1.5b | ~1GB | Fast inference |
## Development

```sh
# Run tests (unit only)
crystal spec

# Run integration tests (requires Ollama)
crystal spec --tag integration

# Format code
crystal tool format
```
## License

MIT