mcp-server v1.0.0

Local, private MCP server: indexes your project docs with Ollama embeddings and serves hybrid (semantic + keyword) search to Claude Code, Cursor & other MCP clients. Single static binary, no runtime deps.

mnemodoc-server

A Crystal MCP server that indexes project documentation via Ollama embeddings and exposes hybrid search (semantic + keyword) to MCP clients.

Why

Loading the full documentation into context at the start of every Claude Code session is expensive. This server lets Claude fetch only the relevant passages on demand, so tokens are spent only on the documentation a task actually needs.

Features

  • Section-aware chunking — splits Markdown at ##/### boundaries, not arbitrary token counts
  • Hybrid search — semantic (Ollama embeddings) + in-memory keyword matching fused with RRF
  • Local & private — embeddings via Ollama (native or Docker), no data sent externally
  • Two transports — stdio (Claude Code) and HTTP (Cursor, other MCP clients)
  • Static binary — single executable, no runtime dependencies
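
To make the fusion step concrete, here is a minimal Python sketch of Reciprocal Rank Fusion (RRF). It is an illustration, not the server's code: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

semantic = ["c3", "c1", "c7"]  # hypothetical semantic ranking
keyword = ["c1", "c9", "c3"]   # hypothetical keyword ranking
fused = rrf_fuse([semantic, keyword])
print(fused)
```

Because RRF uses only ranks, the semantic and keyword scores never need to be normalised against each other; a chunk that ranks well in both lists rises to the top.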

Quick start

1. Start Ollama

docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull nomic-embed-text

2. Install mnemodoc-server

# macOS
brew install mnemodoc/tap/mnemodoc-server

# Linux — download the binary for your architecture from the releases page:
# https://github.com/mnemodoc/mcp-server/releases

3. Create a config in your project

# Download the example config
curl -fsSL https://raw.githubusercontent.com/mnemodoc/mcp-server/master/.mnemodoc.example.yml \
  -o .mnemodoc.yml

# Then edit .mnemodoc.yml to set your doc paths

4. Index your docs and test (optional — serve auto-indexes on startup)

mnemodoc-server index doc/ --config .mnemodoc.yml
mnemodoc-server search "how to persist a model" --config .mnemodoc.yml

5. Add to your MCP client

Claude Code (~/.claude/settings.json) — stdio transport, no network exposure:

{
  "mcpServers": {
    "doc": {
      "command": "mnemodoc-server",
      "args": ["serve", "--config", "/path/to/project/.mnemodoc.yml"]
    }
  }
}

Cursor (.cursor/mcp.json) — HTTP transport, start the server first:

mnemodoc-server serve --sse --config /path/to/project/.mnemodoc.yml

{
  "mcpServers": {
    "doc": {
      "url": "http://localhost:8765/mcp"
    }
  }
}

CLI

mnemodoc-server serve [--config .mnemodoc.yml]                        # Claude Code (stdio, default)
mnemodoc-server serve --sse [--port 8765] [--host 127.0.0.1]             # Cursor / other clients
mnemodoc-server index <path>                                               # Index a file or directory
mnemodoc-server search "<query>" [--mode hybrid|semantic|keyword] [--top 5] # Test search from terminal
mnemodoc-server status                                                     # Index stats
mnemodoc-server delete <path>                                              # Remove from index
mnemodoc-server info                                                       # Version info

MCP tools

Tool             Required args   Optional args                                Returns
query_documents  query (string)  top_k (int), mode (hybrid|semantic|keyword)  chunks with file, heading, parent_heading, content, score; total_candidates, query_time_ms, mode
ingest_path      path (string)   -                                            indexed, skipped, pruned counts
list_files       -               -                                            list of indexed files with metadata
delete_file      path (string)   -                                            confirmation
status           -               -                                            version, chunk_count, file_count, model, search_mode, db_path

query_documents optional args override the config values for that request only.
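
As a sketch of what a per-request override might look like on the wire, the Python snippet below builds a JSON-RPC `tools/call` payload for `query_documents`. The helper name `build_query_request` is hypothetical; the argument names come from the table above, and the exact envelope your MCP client sends may differ.

```python
import json

VALID_MODES = ("hybrid", "semantic", "keyword")

def build_query_request(query, request_id=1, top_k=None, mode=None):
    """Build a JSON-RPC tools/call payload for query_documents.

    top_k and mode are only included when given, so the server falls
    back to the values from its config file otherwise.
    """
    arguments = {"query": query}
    if top_k is not None:
        arguments["top_k"] = top_k
    if mode is not None:
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode}")
        arguments["mode"] = mode
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query_documents", "arguments": arguments},
    }

request = build_query_request("how to persist a model", top_k=3, mode="keyword")
print(json.dumps(request, indent=2))
```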

Behaviour notes

Auto-indexing on startup: serve automatically re-indexes all paths from the config in the background. The server is immediately responsive; indexing happens concurrently. Files whose mtime hasn't changed since the last run are skipped, so restarts are cheap.
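
The mtime check can be pictured with the following Python sketch. It is an assumption about the general technique, not the server's implementation: `files_to_reindex` and the `seen` dictionary (standing in for whatever the index stores per file) are hypothetical names.

```python
import os
import tempfile

def files_to_reindex(paths, last_mtimes):
    """Return the paths whose current mtime differs from the recorded one."""
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        if last_mtimes.get(path) != mtime:
            changed.append(path)
    return changed

with tempfile.TemporaryDirectory() as tmp:
    doc = os.path.join(tmp, "guide.md")
    with open(doc, "w") as f:
        f.write("## Intro\n")

    seen = {}                                # nothing indexed yet
    first = files_to_reindex([doc], seen)    # the new file needs indexing
    seen[doc] = os.stat(doc).st_mtime_ns     # record mtime after indexing
    second = files_to_reindex([doc], seen)   # unchanged file is skipped
```

Comparing a single integer per file is what makes a restart cheap: unchanged files are never re-read or re-embedded.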

Config paths resolve relative to the config file: doc/claude/ in .mnemodoc.yml is resolved relative to the directory that contains the config file, not the process working directory. Move the config file and the paths move with it.
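
A small Python sketch of that resolution rule follows. The function name `resolve_doc_path` and the `/srv/project` location are hypothetical, and leaving absolute paths untouched is an assumption rather than documented behaviour.

```python
from pathlib import Path

def resolve_doc_path(config_file, doc_path):
    """Resolve doc_path against the directory containing config_file."""
    config_dir = Path(config_file).absolute().parent
    candidate = Path(doc_path)
    # Assumption: absolute paths in the config are used as-is.
    if candidate.is_absolute():
        return candidate
    return config_dir / candidate

resolved = resolve_doc_path("/srv/project/.mnemodoc.yml", "doc/claude/")
print(resolved)
```

The working directory never enters the calculation, so the same config behaves identically whether the server is started from a shell, an MCP client, or a service manager.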

Model mismatch — if you change ollama.model in the config, re-index before querying. Vectors from different models have incompatible dimensions and will silently score near-zero. query_documents emits a warning field in the response when it detects a mismatch.

Streaming ingest — MCP clients that support progress reporting can send Accept: text/event-stream with a tools/call ingest_path request. The server streams notifications/progress events per file indexed, followed by the final result frame. Include _meta.progressToken in the request arguments to receive progress notifications:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ingest_path",
    "arguments": {
      "path": "/your/docs",
      "_meta": { "progressToken": "my-token" }
    }
  }
}
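
On the client side, a streamed response can be split into progress notifications and the final result. The Python sketch below parses a text/event-stream body using only the standard `data:` field rules; the sample frames, including the shape of the final result, are hypothetical and only illustrate the flow described above.

```python
import json

def parse_sse_events(body):
    """Yield the decoded JSON payload of each event in an SSE body."""
    for block in body.split("\n\n"):
        data_lines = []
        for line in block.splitlines():
            if line.startswith("data:"):
                value = line[5:]
                # The SSE format strips one optional leading space.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            yield json.loads("\n".join(data_lines))

# Hypothetical stream: one progress notification, then the final result.
sample = (
    'data: {"jsonrpc":"2.0","method":"notifications/progress",'
    '"params":{"progressToken":"my-token","progress":1,"total":2}}\n\n'
    'data: {"jsonrpc":"2.0","id":1,'
    '"result":{"indexed":2,"skipped":0,"pruned":0}}\n\n'
)

events = list(parse_sse_events(sample))
progress = [e for e in events if e.get("method") == "notifications/progress"]
final = [e for e in events if "result" in e]
```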

Deployment

systemd

To run as a systemd service (SSE mode), create /etc/systemd/system/mnemodoc-server.service:

[Unit]
Description=mnemodoc-server
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/mnemodoc-server serve --sse --config /path/to/.mnemodoc.yml
Restart=on-failure
WatchdogSec=30

[Install]
WantedBy=multi-user.target

Then reload systemd and enable the service:

systemctl daemon-reload
systemctl enable --now mnemodoc-server

The server sends READY=1 once the startup index pass completes and STOPPING=1 on SIGTERM. Log rotation via SIGUSR1 is supported for use with logrotate.

The HTTP transport also exposes GET /health — a lightweight liveness probe that returns 200 OK. Use it in ExecStartPost healthchecks or load balancer probes.
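
For scripted checks, a probe along these lines may be enough. This is a sketch using only the Python standard library; the helper name `is_healthy` is hypothetical, and the base URL you pass (for example the default `http://localhost:8765`) depends on your `--host` and `--port` settings.

```python
import urllib.error
import urllib.request

def is_healthy(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False
```

A caller would typically run `is_healthy("http://localhost:8765")` in a retry loop with a short sleep until the server is up.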

Development

Requires: Crystal, mise, Ollama (native or Docker).

mise dev:ollama  # start Ollama (macOS native, Metal GPU) + pull model
mise dev:deps    # install dependencies
mise dev:spec    # run tests
mise dev:check   # build + lint + test

See CLAUDE.md for full development guide.

Alternatives

Project                              Language    Vector store  Embeddings       Chunking
qpd-v/mcp-ragdocs                    TypeScript  Qdrant        Ollama / OpenAI  Fixed tokens
sanderkooger/mcp-server-ragdocs      TypeScript  Qdrant        Ollama / OpenAI  Fixed tokens
Zackriya-Solutions/MCP-Markdown-RAG  Python      Milvus        Local            Fixed tokens
Daniel-Barta/mcp-rag-server          Python      In-memory     OpenAI           Fixed tokens

Why mnemodoc-server differs:

  • Zero runtime dependencies — static binary, no Node, no Python, no external vector database
  • SQLite only — no Qdrant, no Milvus to run alongside
  • Section-aware chunking — splits at ##/### boundaries instead of arbitrary token counts, preserving Markdown structure
  • Hybrid search — semantic + keyword fused with RRF, with a recency bias option
  • Ollama only — intentionally local-first; no OpenAI key required or supported
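
To show what section-aware chunking means in practice, here is a minimal Python sketch that splits Markdown at ## and ### headings and records each ### section's parent. It illustrates the idea only: `chunk_markdown` is a hypothetical name, the field names are borrowed from the `query_documents` result, and the real chunker may treat edge cases differently.

```python
import re

HEADING = re.compile(r"^(#{2,3})\s+(.+)$")
FENCE = "`" * 3  # code-fence marker; headings inside fences are ignored

def chunk_markdown(text):
    """Split Markdown into chunks at ## and ### headings."""
    chunks = []
    parent = None
    heading, parent_heading, lines = None, None, []
    in_fence = False

    def flush():
        content = "\n".join(lines).strip()
        if heading is not None or content:
            chunks.append({"heading": heading,
                           "parent_heading": parent_heading,
                           "content": content})

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match is None:
            lines.append(line)
            continue
        flush()  # close the previous section before starting a new one
        level, title = len(match.group(1)), match.group(2).strip()
        if level == 2:
            parent = title
        heading = title
        parent_heading = parent if level == 3 else None
        lines = []
    flush()
    return chunks

doc = """# Guide
Intro text.

## Models
Overview.

### Persisting
Call save.

## Queries
Use where.
"""
chunks = chunk_markdown(doc)
```

Each chunk is a complete section, so a retrieved passage carries its own heading and parent heading instead of starting mid-paragraph as fixed-size token windows can.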

Contributing

Contributions welcome. See CLAUDE.md for the full development guide.

Scaling optimizations are deferred until the index grows past ~50–100k chunks (below that, semantic search is sub-millisecond and the cost/complexity isn't worth it). Known areas for improvement:

  • ANN index — replace the linear cosine scan (O(N·D) per query) with an approximate nearest-neighbour index (sqlite-vec or HNSW) for O(log N) lookups.
  • Lazy chunk content in cache — the in-memory cache currently keeps each chunk's full text; scoring only needs the embedding. Cache embeddings + metadata and fetch content from SQLite only for the returned top-K results.
  • Partial top-K selection: Semantic#search fully sorts all candidates (O(N log N)); a bounded top-K heap would be O(N log k).
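
The bounded-heap idea from the last item can be sketched in Python as follows. This is an illustration of the technique, not a port of `Semantic#search`: the function names, the toy two-dimensional vectors, and the chunk IDs are all assumptions.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_linear(query, embeddings, k):
    """Linear scan that keeps only the k best scores in a min-heap.

    The scan is still O(N * D), but selection costs O(N log k)
    instead of the O(N log N) of sorting every candidate.
    """
    heap = []  # (score, chunk_id); heap[0] is the weakest kept candidate
    for chunk_id, vector in embeddings.items():
        score = cosine(query, vector)
        if len(heap) < k:
            heapq.heappush(heap, (score, chunk_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, chunk_id))
    return sorted(heap, reverse=True)  # best first; only k items to sort

embeddings = {"a": [1, 0], "b": [0, 1], "c": [1, 1], "d": [-1, 0]}  # toy data
top = top_k_linear([1, 0], embeddings, k=2)
```

In Python, `heapq.nlargest` provides the same behaviour in one call; the explicit loop is shown only to make the bounded heap visible.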

License

MIT
