Slim

Lightweight Crystal client for local LLM inference via Ollama.

Designed for running small language models ("specialized little brains") for tasks like:

  • Text classification
  • Annotation extraction
  • Speech act recognition
  • Summarization
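As a sketch of the annotation-extraction use case, the chat API described under Usage below can be pointed at a JSON-only prompt (this example assumes the Slim::Client, Slim::Message, and chat calls shown in this README; small models sometimes wrap JSON in prose, so parsing may need a fallback):

require "slim"
require "json"

client = Slim::Client.new

# Ask the model to emit JSON only, so the reply is machine-readable.
messages = [
  Slim::Message.new("system", %(Extract named entities from the text. Reply with JSON only, e.g. {"entities": ["..."]}.)),
  Slim::Message.new("user", "Alice met Bob in Paris."),
]

response = client.chat("llama3.2:3b", messages)
entities = JSON.parse(response.content)["entities"].as_a
puts entities.join(", ")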

Installation

Add to your shard.yml:

dependencies:
  slim:
    path: ../slim  # or github: user/slim when published

Prerequisites

Install and run Ollama:

# Install (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a small model
ollama pull llama3.2:3b

# The installer usually starts Ollama as a background service; start manually if needed:
ollama serve

Usage

Basic Generation

require "slim"

client = Slim::Client.new

# Simple completion
response = client.generate("llama3.2:3b", "What is 2+2?")
puts response.content  # => "4"

Streaming

client.generate("llama3.2:3b", "Write a haiku about code.") do |chunk|
  print chunk  # Print as it generates
end
puts

Chat Format

messages = [
  Slim::Message.new("system", "You classify text as 'question' or 'statement'."),
  Slim::Message.new("user", "How are you?"),
]

response = client.chat("llama3.2:3b", messages)
puts response.content  # => "question"

Model Management

# List available models
client.list_models.each do |model|
  puts "#{model.name} (#{model.details.try(&.parameter_size)})"
end

# Check if model exists
if client.has_model?("llama3.2:3b")
  puts "Ready!"
end

# Pull a model
client.pull("llama3.2:1b")
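The two calls above combine naturally into a small helper that pulls a model only when it is missing. A minimal sketch, assuming the has_model? and pull methods shown here (ensure_model is a hypothetical helper name, not part of the library):

require "slim"

# Pull the model only if it is not already available locally.
def ensure_model(client : Slim::Client, name : String)
  client.pull(name) unless client.has_model?(name)
end

client = Slim::Client.new
ensure_model(client, "llama3.2:1b")
puts client.generate("llama3.2:1b", "Say hi.").content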

Configuration

client = Slim::Client.new(
  host: "http://localhost:11434",  # Ollama endpoint
  timeout: 5.minutes               # Request timeout
)

# With generation options
response = client.generate(
  "llama3.2:3b",
  "Classify: Hello there",
  system: "Respond with only: greeting, question, or statement",
  options: Slim::Options.new(temperature: 0.1)
)

Sampling Options

Standard LLM parameters (work across most inference engines):

opts = Slim::Options.new(
  temperature: 0.7,      # Randomness: 0=deterministic, 1=creative
  top_p: 0.9,            # Nucleus sampling threshold
  top_k: 40,             # Consider top k tokens
  repeat_penalty: 1.1,   # Penalize repetition
  seed: 42,              # Reproducible output
)

Ollama-specific parameters:

opts = Slim::Options.new(
  num_ctx: 4096,         # Context window size
  num_predict: 256,      # Max tokens to generate (-1 = unlimited)
  stop: ["\n", "END"],   # Stop sequences
)

Presets for common use cases:

# Classification (deterministic)
opts = Slim::Options.classification  # temperature: 0.1, top_k: 1

# Creative generation
opts = Slim::Options.creative  # temperature: 0.8, top_p: 0.9

Async with Fibers

# Run inference in background
channel = Channel(Slim::Response).new

spawn do
  response = client.generate("llama3.2:3b", prompt)
  channel.send(response)
end

# Do other work...

# Get result when ready
result = channel.receive
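The same pattern scales to a batch of inputs: spawn one fiber per prompt and collect results over a single channel. A sketch assuming the generate signature and Slim::Options.classification preset shown earlier (note that a single Ollama instance may serialize requests, so fibers mainly overlap network I/O):

require "slim"

client = Slim::Client.new
inputs = ["How are you?", "The sky is blue.", "Where is the station?"]

# Fan out one fiber per input; tag each result with its index
# so replies can be reordered as they arrive.
channel = Channel({Int32, String}).new

inputs.each_with_index do |text, i|
  spawn do
    response = client.generate(
      "llama3.2:3b",
      "Classify as 'question' or 'statement': #{text}",
      options: Slim::Options.classification
    )
    channel.send({i, response.content})
  end
end

results = Array(String).new(inputs.size, "")
inputs.size.times do
  i, label = channel.receive
  results[i] = label
end
p results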

Recommended Models

For classification/annotation tasks, smaller models work well:

Model          Size   Good For
llama3.2:1b    ~1GB   Simple classification
llama3.2:3b    ~2GB   General tasks
phi3:mini      ~2GB   Reasoning
qwen2.5:1.5b   ~1GB   Fast inference

Development

# Run tests (unit only)
crystal spec

# Run integration tests (requires Ollama)
crystal spec --tag integration

# Format code
crystal tool format

License

MIT
