mcp-grimoire-server

A Crystal MCP server that indexes project documentation via Ollama embeddings and exposes hybrid search (semantic + keyword) to MCP clients.

Why

Loading a project's full documentation into context at the start of every Claude Code session is expensive. This server lets Claude fetch only the relevant passages on demand, so tokens are spent only on the documentation a query actually needs.

Features

  • Section-aware chunking — splits Markdown at ##/### boundaries, not arbitrary token counts
  • Hybrid search — semantic (Ollama embeddings) + in-memory keyword matching fused with RRF
  • Local & private — embeddings via Ollama (native or Docker), no data sent externally
  • Two transports — stdio (Claude Code) and HTTP/SSE (Cursor, other MCP clients)
  • Static binary — single executable, no runtime dependencies
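
To illustrate section-aware chunking, here is a minimal Python sketch (not the server's Crystal code). It assumes a chunk runs from one `##` or `###` heading to the next and that text before the first heading forms its own chunk. The helper name `split_sections` is hypothetical, and the sketch ignores edge cases such as `##` lines inside fenced code blocks.

```python
import re

# Matches a level-2 or level-3 Markdown heading at the start of a line.
HEADING = re.compile(r"^#{2,3}\s+\S")


def split_sections(markdown: str) -> list[str]:
    """Split Markdown into chunks that each start at a ##/### heading.

    Text before the first heading (e.g. the title and intro) becomes
    its own chunk. Hypothetical helper, for illustration only.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A new heading closes the chunk collected so far.
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that are only whitespace.
    return [c for c in chunks if c]


doc = """# Title
Intro text.

## Install
Run the installer.

### Options
Use --fast.

## Usage
Call it.
"""

for chunk in split_sections(doc):
    print(repr(chunk.splitlines()[0]))
```

Each chunk keeps its heading, so a retrieved passage still carries the context of the section it came from.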

Quick start

1. Start Ollama

docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull nomic-embed-text

2. Install mcp-grimoire-server

# macOS
brew install jbox-web/tap/mcp-grimoire-server

# Linux — download the binary for your architecture from the releases page:
# https://github.com/jbox-web/mcp-grimoire-server/releases

3. Create a config in your project

# Download the example config
curl -fsSL https://raw.githubusercontent.com/jbox-web/mcp-grimoire-server/master/.mcp-grimoire.example.yml \
  -o .mcp-grimoire.yml

# Then edit .mcp-grimoire.yml to set your doc paths

4. Index your docs and test

mcp-grimoire-server index doc/ --config .mcp-grimoire.yml
mcp-grimoire-server search "how to persist a model" --config .mcp-grimoire.yml

5. Add to Claude Code (~/.claude/settings.json)

{
  "mcpServers": {
    "doc": {
      "command": "mcp-grimoire-server",
      "args": ["serve", "--stdio", "--config", "/path/to/project/.mcp-grimoire.yml"]
    }
  }
}

CLI

mcp-grimoire-server serve --stdio [--config .mcp-grimoire.yml]   # Claude Code
mcp-grimoire-server serve --sse [--port 8765]                    # Cursor / other clients
mcp-grimoire-server index <path>                                 # Index a file or directory
mcp-grimoire-server search "<query>" [--mode hybrid]             # Test search from terminal
mcp-grimoire-server status                                       # Index stats
mcp-grimoire-server delete <path>                                # Remove from index
mcp-grimoire-server info                                         # Version info

MCP tools

Tool             Description
query_documents  Hybrid search — returns top-K relevant chunks
ingest_path      Index a file or directory
list_files       List indexed files with metadata
delete_file      Remove a file from the index
status           Server status and stats
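
MCP clients invoke tools like these with JSON-RPC 2.0 `tools/call` requests. The Python sketch below builds one for `query_documents`. The argument names `query` and `top_k` are assumptions, not taken from the server's schema; a client would normally read the real input schema from the `tools/list` response.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" and "top_k" are hypothetical argument names, not taken from the
# server's schema; check the schema returned by `tools/list` first.
request = tools_call_request(
    1, "query_documents", {"query": "how to persist a model", "top_k": 5}
)
print(request)
```

A real session also begins with the `initialize` handshake before any tool call, which MCP clients such as Claude Code handle on their own.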

Development

Requires: Crystal, mise, Ollama (native or Docker).

mise dev:ollama  # start Ollama (macOS native, Metal GPU) + pull model
mise dev:deps    # install dependencies
mise dev:spec    # run tests
mise dev:check   # build + lint + test

See CLAUDE.md for full development guide.

Alternatives

Project                              Language    Vector store  Embeddings       Chunking
qpd-v/mcp-ragdocs                    TypeScript  Qdrant        Ollama / OpenAI  Fixed tokens
sanderkooger/mcp-server-ragdocs      TypeScript  Qdrant        Ollama / OpenAI  Fixed tokens
Zackriya-Solutions/MCP-Markdown-RAG  Python      Milvus        Local            Fixed tokens
Daniel-Barta/mcp-rag-server          Python      In-memory     OpenAI           Fixed tokens

Why mcp-grimoire-server differs:

  • Zero runtime dependencies — static binary, no Node, no Python, no external vector database
  • SQLite only — no Qdrant, no Milvus to run alongside
  • Section-aware chunking — splits at ##/### boundaries instead of arbitrary token counts, preserving Markdown structure
  • Hybrid search — semantic + keyword fused with RRF, with a recency bias option
  • Ollama only — intentionally local-first; no OpenAI key required or supported
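
Reciprocal Rank Fusion (RRF) merges ranked lists by giving each result a score of 1 / (k + rank) per list and summing those scores. The Python sketch below shows the basic form. The constant k = 60 is the value commonly used for RRF; whether the server uses that value or a weighted variant is an assumption, and the recency bias option is not modelled. The chunk IDs are made up.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first. A chunk's fused score is the sum of
    1 / (k + rank) over every list it appears in (rank starts at 1).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by each retriever, best match first.
semantic = ["models.md#persistence", "models.md#validation", "intro.md#setup"]
keyword = ["models.md#persistence", "faq.md#saving", "models.md#validation"]

for chunk_id, score in rrf_fuse([semantic, keyword]):
    print(f"{score:.4f}  {chunk_id}")
```

Because RRF only looks at ranks, the cosine scores and the keyword scores never need to be normalised onto a common scale, and a chunk found by both retrievers outranks one found by only one of them.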

Contributing

Contributions welcome. See CLAUDE.md for the full development guide.

Scaling optimizations are deferred until the index grows past ~50–100k chunks (below that, semantic search is sub-millisecond and the cost/complexity isn't worth it). Known areas for improvement:

  • ANN index — replace the linear cosine scan (O(N·D) per query) with an approximate nearest-neighbour index (sqlite-vec or HNSW) for O(log N) lookups.
  • Lazy chunk content in cache — the in-memory cache currently keeps each chunk's full text; scoring only needs the embedding. Cache embeddings + metadata and fetch content from SQLite only for the returned top-K results.
  • Partial top-K selection — Semantic#search fully sorts all candidates (O(N log N)); a bounded top-K heap would be O(N log k).
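
The difference between a full sort and a bounded top-K heap can be sketched in Python as follows. This is an illustration of the idea, not the Crystal code in Semantic#search; the function names, chunk IDs, and toy embeddings are all hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_full_sort(query, index, k):
    """Score every chunk, sort all N scores, keep the first k: O(N log N)."""
    scored = [(cosine(query, emb), chunk_id) for chunk_id, emb in index.items()]
    return sorted(scored, reverse=True)[:k]


def top_k_heap(query, index, k):
    """Score every chunk but keep only a k-sized heap: O(N log k)."""
    scored = ((cosine(query, emb), chunk_id) for chunk_id, emb in index.items())
    return heapq.nlargest(k, scored)


# Toy 2-dimensional embeddings keyed by hypothetical chunk IDs.
index = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
query = [1.0, 0.1]

# Both strategies return the same results; only the selection cost differs.
assert top_k_full_sort(query, index, 2) == top_k_heap(query, index, 2)
print(top_k_heap(query, index, 2))
```

Both versions still perform the linear O(N·D) cosine scan; the heap only reduces the cost of picking the winners, which is why it is listed separately from the ANN index.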

License

MIT
