mcp-grimoire-server

A Crystal MCP server that indexes project documentation via Ollama embeddings and exposes hybrid search (semantic + keyword) to MCP clients.

Why

Loading the full documentation into context at the start of every Claude Code session is expensive. This server lets Claude fetch only the relevant passages on demand, which significantly reduces token costs.

Features

Section-aware chunking — splits Markdown at ##/### boundaries, not arbitrary token counts
Hybrid search — semantic (Ollama embeddings) + in-memory keyword matching fused with RRF
Local & private — embeddings via Ollama (native or Docker), no data sent externally
Two transports — stdio (Claude Code) and HTTP/SSE (Cursor, other MCP clients)
Static binary — single executable, no runtime dependencies
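
To make the section-aware chunking idea concrete, here is a minimal Python sketch of splitting Markdown at ##/### headings. It is an illustration only, not the server's Crystal implementation: the function name `split_sections`, the chunk shape (`heading`, `text`), and the handling of fenced code are all assumptions.

```python
import re

# Matches level-2 and level-3 ATX headings ("## Title", "### Title") only.
HEADING = re.compile(r"^(#{2,3})\s+(.*)$")
FENCE = "`" * 3  # code-fence marker, built this way to keep the sketch easy to embed


def _has_content(chunk: dict) -> bool:
    """A chunk is worth keeping if it has a heading or any non-blank line."""
    return chunk["heading"] is not None or any(l.strip() for l in chunk["lines"])


def split_sections(markdown: str) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at each ##/### heading.

    Hypothetical helper: text before the first heading becomes a chunk
    whose heading is None.
    """
    chunks = []
    current = {"heading": None, "lines": []}
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '## ' lines inside code fences
        match = None if in_fence else HEADING.match(line)
        if match:
            if _has_content(current):
                chunks.append(current)
            current = {"heading": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if _has_content(current):
        chunks.append(current)
    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]
```

Each chunk then carries one coherent section, so a retrieved passage keeps its heading and is not cut mid-paragraph the way a fixed token window can be.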

Quick start

1. Start Ollama

docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull nomic-embed-text

2. Install mcp-grimoire-server

# macOS
brew install jbox-web/tap/mcp-grimoire-server

# Linux — download the binary for your architecture from the releases page:
# https://github.com/jbox-web/mcp-grimoire-server/releases

3. Create a config in your project

# Download the example config
curl -fsSL https://raw.githubusercontent.com/jbox-web/mcp-grimoire-server/master/.mcp-grimoire.example.yml \
  -o .mcp-grimoire.yml

# Then edit .mcp-grimoire.yml to set your doc paths

4. Index your docs and test

mcp-grimoire-server index doc/ --config .mcp-grimoire.yml
mcp-grimoire-server search "how to persist a model" --config .mcp-grimoire.yml

5. Add to Claude Code (~/.claude/settings.json)

{
  "mcpServers": {
    "doc": {
      "command": "mcp-grimoire-server",
      "args": ["serve", "--stdio", "--config", "/path/to/project/.mcp-grimoire.yml"]
    }
  }
}

CLI

mcp-grimoire-server serve --stdio [--config .mcp-grimoire.yml]   # Claude Code
mcp-grimoire-server serve --sse [--port 8765]                # Cursor / other clients
mcp-grimoire-server index <path>                             # Index a file or directory
mcp-grimoire-server search "<query>" [--mode hybrid]         # Test search from terminal
mcp-grimoire-server status                                   # Index stats
mcp-grimoire-server delete <path>                            # Remove from index
mcp-grimoire-server info                                     # Version info

MCP tools

Tool	Description
`query_documents`	Hybrid search — returns top-K relevant chunks
`ingest_path`	Index a file or directory
`list_files`	List indexed files with metadata
`delete_file`	Remove a file from the index
`status`	Server status and stats
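
For clients that talk to the server directly, a tool invocation is a JSON-RPC 2.0 `tools/call` request, as defined by the MCP specification. The Python sketch below builds such a message. The helper name `build_tool_call` is hypothetical, and the argument key `query` is an assumption: the real input schema for each tool should be read from the server's `tools/list` response.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The "query" argument name is hypothetical; check the tool's input schema.
request = build_tool_call(1, "query_documents", {"query": "how to persist a model"})
print(request)
```

A real session also needs the MCP `initialize` handshake before any tool call; MCP clients such as Claude Code handle that automatically.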

Development

Requires: Crystal, mise, Ollama (native or Docker).

mise dev:ollama  # start Ollama (macOS native, Metal GPU) + pull model
mise dev:deps    # install dependencies
mise dev:spec    # run tests
mise dev:check   # build + lint + test

See CLAUDE.md for the full development guide.

Alternatives

Project	Language	Vector store	Embeddings	Chunking
qpd-v/mcp-ragdocs	TypeScript	Qdrant	Ollama / OpenAI	Fixed tokens
sanderkooger/mcp-server-ragdocs	TypeScript	Qdrant	Ollama / OpenAI	Fixed tokens
Zackriya-Solutions/MCP-Markdown-RAG	Python	Milvus	Local	Fixed tokens
Daniel-Barta/mcp-rag-server	Python	In-memory	OpenAI	Fixed tokens

How mcp-grimoire-server differs:

Zero runtime dependencies — static binary, no Node, no Python, no external vector database
SQLite only — no Qdrant, no Milvus to run alongside
Section-aware chunking — splits at ##/### boundaries instead of arbitrary token counts, preserving Markdown structure
Hybrid search — semantic + keyword fused with RRF, with a recency bias option
Ollama only — intentionally local-first; no OpenAI key required or supported
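
RRF stands for Reciprocal Rank Fusion: each result list contributes 1 / (k + rank) to a chunk's score, so chunks ranked well by both semantic and keyword search rise to the top without any need to normalize raw scores. The Python sketch below shows the general technique only. The function name `rrf_fuse`, the chunk IDs, and the constant k = 60 (a common default in the literature) are assumptions, and the recency bias option is not modelled.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) to a chunk's score, with rank starting at 1.
    Ties are broken by chunk ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, best match first.
semantic = ["c3", "c1", "c7"]
keyword = ["c1", "c9", "c3"]
fused = rrf_fuse([semantic, keyword])
print([chunk_id for chunk_id, _ in fused])
```

Here `c1` and `c3` appear in both lists, so they outrank `c9` and `c7`, which each appear in only one.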

Contributing

Contributions welcome. See CLAUDE.md for the full development guide.

Scaling optimizations are deferred until the index grows past ~50–100k chunks (below that, semantic search is sub-millisecond and the cost/complexity isn't worth it). Known areas for improvement:

ANN index — replace the linear cosine scan (O(N·D) per query) with an approximate nearest-neighbour index (sqlite-vec or HNSW) for O(log N) lookups.
Lazy chunk content in cache — the in-memory cache currently keeps each chunk's full text; scoring only needs the embedding. Cache embeddings + metadata and fetch content from SQLite only for the returned top-K results.
Partial top-K selection — Semantic#search fully sorts all candidates (O(N log N)); a bounded top-K heap would be O(N log k).
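
The partial top-K idea can be sketched in a few lines of Python. This is an illustration of the technique, not the project's `Semantic#search` code: the names `cosine` and `top_k`, and the dict of embeddings keyed by chunk ID, are assumptions. The scan is still linear in the number of chunks, but the heap never holds more than k entries, so selection costs O(N log k) instead of a full O(N log N) sort.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], embeddings: dict[str, list[float]], k: int):
    """Return the k best (score, chunk_id) pairs, highest score first."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the weakest of the current best k
    for chunk_id, vector in embeddings.items():
        score = cosine(query, vector)
        if len(heap) < k:
            heapq.heappush(heap, (score, chunk_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, chunk_id))  # evict the weakest
    return sorted(heap, reverse=True)


# Hypothetical 2-dimensional embeddings keyed by chunk ID.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], index, k=2))
```

Because the function returns only scores and chunk IDs, it also fits the lazy-content idea above: the full text would be fetched from storage only for the IDs that survive selection.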

License

MIT
