Slim

Lightweight Crystal client for local LLM inference via Ollama.

Designed for running small language models ("specialized little brains") for tasks like:

  • Text classification
  • Annotation extraction
  • Speech act recognition
  • Summarization
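As a sketch of the annotation-extraction use case, the chat API described under Usage below can be pointed at a JSON-only prompt (this example assumes the Slim::Client, Slim::Message, and chat calls shown in this README; small models sometimes wrap JSON in prose, so parsing may need a fallback):

require "slim"
require "json"

client = Slim::Client.new

# Ask the model to emit JSON only, so the reply is machine-readable.
messages = [
  Slim::Message.new("system", %(Extract named entities from the text. Reply with JSON only, e.g. {"entities": ["..."]}.)),
  Slim::Message.new("user", "Alice met Bob in Paris."),
]

response = client.chat("llama3.2:3b", messages)
entities = JSON.parse(response.content)["entities"].as_a
puts entities.join(", ")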

Installation

Add to your shard.yml:

dependencies:
  slim:
    path: ../slim  # or github: user/slim when published

Prerequisites

Install and run Ollama:

# Install (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a small model
ollama pull llama3.2:3b

# The installer usually starts Ollama as a background service; start manually if needed:
ollama serve

Usage

Basic Generation

require "slim"

client = Slim::Client.new

# Simple completion
response = client.generate("llama3.2:3b", "What is 2+2?")
puts response.content  # => "4"

Streaming

client.generate("llama3.2:3b", "Write a haiku about code.") do |chunk|
  print chunk  # Print as it generates
end
puts

Chat Format

messages = [
  Slim::Message.new("system", "You classify text as 'question' or 'statement'."),
  Slim::Message.new("user", "How are you?"),
]

response = client.chat("llama3.2:3b", messages)
puts response.content  # => "question"

Model Management

# List available models
client.list_models.each do |model|
  puts "#{model.name} (#{model.details.try(&.parameter_size)})"
end

# Check if model exists
if client.has_model?("llama3.2:3b")
  puts "Ready!"
end

# Pull a model
client.pull("llama3.2:1b")
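The two calls above combine naturally into a small helper that pulls a model only when it is missing. A minimal sketch, assuming the has_model? and pull methods shown here (ensure_model is a hypothetical helper name, not part of the library):

require "slim"

# Pull the model only if it is not already available locally.
def ensure_model(client : Slim::Client, name : String)
  client.pull(name) unless client.has_model?(name)
end

client = Slim::Client.new
ensure_model(client, "llama3.2:1b")
puts client.generate("llama3.2:1b", "Say hi.").content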

Configuration

client = Slim::Client.new(
  host: "http://localhost:11434",  # Ollama endpoint
  timeout: 5.minutes               # Request timeout
)

# With generation options
response = client.generate(
  "llama3.2:3b",
  "Classify: Hello there",
  system: "Respond with only: greeting, question, or statement",
  options: Slim::Options.new(temperature: 0.1)
)

Sampling Options

Standard LLM parameters (work across most inference engines):

opts = Slim::Options.new(
  temperature: 0.7,      # Randomness: 0=deterministic, 1=creative
  top_p: 0.9,            # Nucleus sampling threshold
  top_k: 40,             # Consider top k tokens
  repeat_penalty: 1.1,   # Penalize repetition
  seed: 42,              # Reproducible output
)

Ollama-specific parameters:

opts = Slim::Options.new(
  num_ctx: 4096,         # Context window size
  num_predict: 256,      # Max tokens to generate (-1 = unlimited)
  stop: ["\n", "END"],   # Stop sequences
)

Presets for common use cases:

# Classification (deterministic)
opts = Slim::Options.classification  # temperature: 0.1, top_k: 1

# Creative generation
opts = Slim::Options.creative  # temperature: 0.8, top_p: 0.9

Async with Fibers

# Run inference in background
channel = Channel(Slim::Response).new

spawn do
  response = client.generate("llama3.2:3b", prompt)
  channel.send(response)
end

# Do other work...

# Get result when ready
result = channel.receive
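The same pattern scales to a batch of inputs: spawn one fiber per prompt and collect results over a single channel. A sketch assuming the generate signature and Slim::Options.classification preset shown earlier (note that a single Ollama instance may serialize requests, so fibers mainly overlap network I/O):

require "slim"

client = Slim::Client.new
inputs = ["How are you?", "The sky is blue.", "Where is the station?"]

# Fan out one fiber per input; tag each result with its index
# so replies can be reordered as they arrive.
channel = Channel({Int32, String}).new

inputs.each_with_index do |text, i|
  spawn do
    response = client.generate(
      "llama3.2:3b",
      "Classify as 'question' or 'statement': #{text}",
      options: Slim::Options.classification
    )
    channel.send({i, response.content})
  end
end

results = Array(String).new(inputs.size, "")
inputs.size.times do
  i, label = channel.receive
  results[i] = label
end
p results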

Recommended Models

For classification/annotation tasks, smaller models work well:

Model          Size   Good For
llama3.2:1b    ~1GB   Simple classification
llama3.2:3b    ~2GB   General tasks
phi3:mini      ~2GB   Reasoning
qwen2.5:1.5b   ~1GB   Fast inference

Development

# Run tests (unit only)
crystal spec

# Run integration tests (requires Ollama)
crystal spec --tag integration

# Format code
crystal tool format

License

MIT
