mcp-grimoire-server
A Crystal MCP server that indexes project documentation via Ollama embeddings and exposes hybrid search (semantic + keyword) to MCP clients.
Why
Loading a project's full documentation into context at the start of every Claude Code session is expensive. This server lets Claude fetch only the relevant passages on demand, which cuts the tokens spent on documentation.
Features
- Section-aware chunking — splits Markdown at `##`/`###` boundaries, not arbitrary token counts
- Hybrid search — semantic (Ollama embeddings) + in-memory keyword matching fused with RRF
- Local & private — embeddings via Ollama (native or Docker), no data sent externally
- Two transports — stdio (Claude Code) and HTTP/SSE (Cursor, other MCP clients)
- Static binary — single executable, no runtime dependencies
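To illustrate the section-aware chunking listed above, here is a minimal Python sketch. It is not the server's Crystal implementation: the function name `chunk_markdown`, the `(heading, body)` tuple shape, and the handling of fenced code are all assumptions made for the example.

```python
import re

# Matches level-2 and level-3 ATX headings ("## Title" / "### Title") only.
HEADING = re.compile(r"^(#{2,3})\s+(.*)$")


def chunk_markdown(text):
    """Split Markdown into (heading, body) chunks at ## / ### boundaries."""
    chunks = []
    heading, lines = None, []
    in_fence = False

    def flush():
        body = "\n".join(lines).strip()
        # Keep the chunk if it has a heading or any non-empty body text.
        if heading is not None or body:
            chunks.append((heading, body))

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Text before the first `##` heading (including a level-1 title) becomes a chunk with heading `None`, so no content is dropped.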
Quick start
1. Start Ollama
docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull nomic-embed-text
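To confirm that Ollama is reachable and the model is pulled, one option is to request an embedding directly. The Python sketch below uses Ollama's `/api/embeddings` endpoint with the default port from the command above; whether mcp-grimoire-server itself calls this exact endpoint is an assumption, and the helper names are illustrative.

```python
import json
import urllib.request


def build_embedding_request(text, model="nomic-embed-text",
                            base_url="http://localhost:11434"):
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text):
    """Send the request and return the embedding vector (needs a running Ollama)."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as resp:
        return json.load(resp)["embedding"]
```

Calling `len(embed("hello"))` against a running instance should return the model's embedding dimension; a connection error usually means the container is not up or the port is not published.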
2. Install mcp-grimoire-server
# macOS
brew install jbox-web/tap/mcp-grimoire-server
# Linux — download the binary for your architecture from the releases page:
# https://github.com/jbox-web/mcp-grimoire-server/releases
3. Create a config in your project
# Download the example config
curl -fsSL https://raw.githubusercontent.com/jbox-web/mcp-grimoire-server/master/.mcp-grimoire.example.yml \
-o .mcp-grimoire.yml
# Then edit .mcp-grimoire.yml to set your doc paths
4. Index your docs and test
mcp-grimoire-server index doc/ --config .mcp-grimoire.yml
mcp-grimoire-server search "how to persist a model" --config .mcp-grimoire.yml
5. Add to Claude Code (~/.claude/settings.json)
{
  "mcpServers": {
    "doc": {
      "command": "mcp-grimoire-server",
      "args": ["serve", "--stdio", "--config", "/path/to/project/.mcp-grimoire.yml"]
    }
  }
}
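If the settings file already contains other keys or servers, the entry has to be merged rather than pasted over the file. A small Python sketch of that merge follows; the helper name `add_mcp_server` is hypothetical, and it only assumes the `mcpServers` layout shown above.

```python
import json


def add_mcp_server(settings, name, config_path, command="mcp-grimoire-server"):
    """Return a copy of `settings` with an stdio MCP server entry added."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers[name] = {
        "command": command,
        "args": ["serve", "--stdio", "--config", config_path],
    }
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_mcp_server(existing, "doc", "/path/to/project/.mcp-grimoire.yml")
print(json.dumps(updated, indent=2))
```

The input dictionary is left unmodified, so the result can be inspected before writing it back to disk.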
CLI
mcp-grimoire-server serve --stdio [--config .mcp-grimoire.yml] # Claude Code
mcp-grimoire-server serve --sse [--port 8765] # Cursor / other clients
mcp-grimoire-server index <path> # Index a file or directory
mcp-grimoire-server search "<query>" [--mode hybrid] # Test search from terminal
mcp-grimoire-server status # Index stats
mcp-grimoire-server delete <path> # Remove from index
mcp-grimoire-server info # Version info
MCP tools
| Tool | Description |
|---|---|
| `query_documents` | Hybrid search — returns top-K relevant chunks |
| `ingest_path` | Index a file or directory |
| `list_files` | List indexed files with metadata |
| `delete_file` | Remove a file from the index |
| `status` | Server status and stats |
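MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The Python sketch below builds such a message for `query_documents`. The envelope follows the MCP specification, but the argument names `query` and `top_k` are assumptions: this README does not document the tool's input schema, so check the schema the server reports via `tools/list`.

```python
import json


def tool_call(request_id, tool, arguments):
    """Serialize an MCP tools/call request as a single-line JSON-RPC message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical arguments; the real parameter names may differ.
message = tool_call(1, "query_documents",
                    {"query": "how to persist a model", "top_k": 5})
print(message)
```

Over the stdio transport each message is written as one line of JSON, which is why the sketch serializes without indentation.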
Development
Requires: Crystal, mise, Ollama (native or Docker).
mise dev:ollama # start Ollama (macOS native, Metal GPU) + pull model
mise dev:deps # install dependencies
mise dev:spec # run tests
mise dev:check # build + lint + test
See CLAUDE.md for full development guide.
Alternatives
| Project | Language | Vector store | Embeddings | Chunking |
|---|---|---|---|---|
| qpd-v/mcp-ragdocs | TypeScript | Qdrant | Ollama / OpenAI | Fixed tokens |
| sanderkooger/mcp-server-ragdocs | TypeScript | Qdrant | Ollama / OpenAI | Fixed tokens |
| Zackriya-Solutions/MCP-Markdown-RAG | Python | Milvus | Local | Fixed tokens |
| Daniel-Barta/mcp-rag-server | Python | In-memory | OpenAI | Fixed tokens |
Why mcp-grimoire-server differs:
- Zero runtime dependencies — static binary, no Node, no Python, no external vector database
- SQLite only — no Qdrant, no Milvus to run alongside
- Section-aware chunking — splits at `##`/`###` boundaries instead of arbitrary token counts, preserving Markdown structure
- Hybrid search — semantic + keyword fused with RRF, with a recency bias option
- Ollama only — intentionally local-first; no OpenAI key required or supported
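For readers unfamiliar with RRF (reciprocal rank fusion), the idea is that each result list contributes `1 / (k + rank)` to a document's score, and the fused ranking sorts by the summed score. The Python sketch below shows plain RRF only. The constant `k = 60` is the conventional default from the literature, not a value confirmed for this server, and the recency bias option is not modelled.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["a", "b", "c"]   # ranked by embedding similarity
keyword = ["b", "d", "a"]    # ranked by keyword match
print(rrf_fuse([semantic, keyword]))  # ['b', 'a', 'd', 'c']
```

Because only ranks are used, the two searches do not need comparable score scales, which is the usual reason to pick RRF for hybrid search.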
Contributing
Contributions welcome. See CLAUDE.md for the full development guide.
Scaling optimizations are deferred until the index grows past ~50–100k chunks (below that, semantic search is sub-millisecond and the cost/complexity isn't worth it). Known areas for improvement:
- ANN index — replace the linear cosine scan (O(N·D) per query) with an approximate nearest-neighbour index (`sqlite-vec` or HNSW) for O(log N) lookups.
- Lazy chunk content in cache — the in-memory cache currently keeps each chunk's full text; scoring only needs the embedding. Cache embeddings + metadata and fetch content from SQLite only for the returned top-K results.
- Partial top-K selection — `Semantic#search` fully sorts all candidates (O(N log N)); a bounded top-K heap would be O(N log k).
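As a rough illustration of the last two items, the Python sketch below scores every cached embedding with cosine similarity, keeps only the best `k` via a bounded heap, and loads chunk text only for those winners. It is not the project's Crystal code: `top_k`, `search`, and the `fetch_content` callback (standing in for a SQLite lookup) are hypothetical names.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, embeddings, k):
    """Return the k best (score, chunk_id) pairs, best first, in O(N log k)."""
    scored = ((cosine(query, vec), chunk_id)
              for chunk_id, vec in embeddings.items())
    return heapq.nlargest(k, scored)  # keeps a heap of at most k items


def search(query, embeddings, k, fetch_content):
    """Fetch chunk text lazily, only for the k winning chunks."""
    return [(chunk_id, score, fetch_content(chunk_id))
            for score, chunk_id in top_k(query, embeddings, k)]
```

The scan itself is still linear in the number of chunks; only the selection step and the memory held per cached chunk change.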
License
MIT
- June 16, 2026