nucleoc
Nucleoc - Fuzzy Matcher for Crystal
Nucleoc is a Crystal port of the nucleo fuzzy matcher from Rust. It provides high-performance fuzzy matching algorithms for text search and filtering.
Status
✅ Production Ready - This is a complete port of the Rust nucleo library with full fuzzy matching functionality implemented and tested.
Features
- Exact string matching
- Case-sensitive and case-insensitive matching
- Configurable scoring parameters
- Fuzzy matching (greedy and optimal algorithms)
- Substring matching
- Prefix/Postfix matching
- Pattern parsing
- Unicode normalization
- High-performance optimizations
Installation
- Add the dependency to your shard.yml:
    dependencies:
      nucleoc:
        github: dsisnero/nucleoc
- Run shards install
Tutorial: Complete Guide to Using Nucleoc
1. Basic Usage
Simple Fuzzy Matching
require "nucleoc"
# Create a matcher with default configuration
matcher = Nucleoc::Matcher.new
# Fuzzy match with score
if score = matcher.fuzzy_match("hello world", "hlo")
puts "Match found! Score: #{score}"
end
# Fuzzy match with indices (character positions)
indices = [] of UInt32
if score = matcher.fuzzy_indices("hello world", "hlo", indices)
puts "Score: #{score}, Indices: #{indices}" # indices holds the positions of the matched 'h', 'l', 'o'
end
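As a rough mental model, a fuzzy match succeeds when the needle's characters appear in the haystack in the same order, not necessarily adjacent. The sketch below is not nucleoc's algorithm (it ignores scoring entirely); `subsequence_positions` is a hypothetical helper that does a leftmost, case-insensitive scan, so the positions it reports may differ from the ones the matcher chooses.

```crystal
# Hypothetical helper, not part of nucleoc: returns the leftmost positions at
# which each needle character is found in order, or nil if there is no match.
def subsequence_positions(haystack : String, needle : String) : Array(Int32)?
  positions = [] of Int32
  wanted = needle.downcase.chars
  return positions if wanted.empty?

  idx = 0
  haystack.downcase.each_char_with_index do |char, i|
    if char == wanted[idx]
      positions << i
      idx += 1
      return positions if idx == wanted.size
    end
  end
  nil # ran out of haystack before consuming the whole needle
end

p subsequence_positions("hello world", "hlo") # => [0, 2, 4]
p subsequence_positions("world", "lo")        # => nil (no 'o' after the 'l')
```

The second call shows why character order matters: "world" contains both letters, but not in the order the needle asks for.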
2. Matching Algorithms
Exact Matching
matcher = Nucleoc::Matcher.new
# Returns score if strings match exactly
score = matcher.exact_match("hello", "hello") # => 140
score = matcher.exact_match("Hello", "hello") # => 140 (case-insensitive by default)
# With indices
indices = [] of UInt32
score = matcher.exact_indices("crystal", "crystal", indices)
# score = 140, indices = [0, 1, 2, 3, 4, 5, 6]
Substring Matching
matcher = Nucleoc::Matcher.new
# Find needle as contiguous substring
score = matcher.substring_match("hello world", "world") # => 96
score = matcher.substring_match("hello world", "lo wo") # => 96
# With indices
indices = [] of UInt32
score = matcher.substring_indices("hello world", "world", indices)
# score = 96, indices = [6, 7, 8, 9, 10]
Prefix/Postfix Matching
matcher = Nucleoc::Matcher.new
# Prefix: needle must match start of haystack
score = matcher.prefix_match("hello world", "hello") # => 140
score = matcher.prefix_match(" hello world", "hello") # => 140 (ignores leading whitespace)
# Postfix: needle must match end of haystack
score = matcher.postfix_match("hello world", "world") # => 96
score = matcher.postfix_match("hello world ", "world") # => 96 (ignores trailing whitespace)
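The whitespace handling described in the comments above can be pictured with plain string operations. This is only an approximation of the matching condition, assuming the default case-insensitive configuration; `prefix_like?` and `postfix_like?` are hypothetical names, and no scores are computed.

```crystal
# Hypothetical helpers, not part of nucleoc: they mimic only the yes/no outcome.
def prefix_like?(haystack : String, needle : String) : Bool
  # Leading whitespace in the haystack is skipped before comparing.
  haystack.lstrip.downcase.starts_with?(needle.downcase)
end

def postfix_like?(haystack : String, needle : String) : Bool
  # Trailing whitespace in the haystack is skipped before comparing.
  haystack.rstrip.downcase.ends_with?(needle.downcase)
end

puts prefix_like?("  hello world", "hello")  # => true
puts postfix_like?("hello world  ", "WORLD") # => true
puts prefix_like?("hello world", "world")    # => false
```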
Greedy Fuzzy Matching
matcher = Nucleoc::Matcher.new
# Greedy algorithm (faster but may not find optimal score)
score = matcher.fuzzy_match_greedy("hello world", "hlo") # => score, or nil if there is no match
3. Configuration
Custom Configuration
# Default configuration (case-insensitive, normalized)
config = Nucleoc::Config::DEFAULT
# Custom configuration
config = Nucleoc::Config.new(
ignore_case: false, # Case-sensitive matching
normalize: true, # Unicode normalization
prefer_prefix: false, # Don't give bonus to matches near start
delimiter_chars: "/,:;|", # Characters that act as word boundaries
bonus_boundary_white: Nucleoc::BONUS_BOUNDARY + 2_u16,
bonus_boundary_delimiter: Nucleoc::BONUS_BOUNDARY + 1_u16,
initial_char_class: Nucleoc::CharClass::Whitespace
)
matcher = Nucleoc::Matcher.new(config)
File Path Configuration
# Optimized for matching file paths
config = Nucleoc::Config::DEFAULT.match_paths
matcher = Nucleoc::Matcher.new(config)
# On Unix: delimiter_chars = "/"
# On Windows: delimiter_chars = "/\\"
4. Pattern Parsing
Basic Patterns
# Parse a pattern with multiple atoms (space-separated)
pattern = Nucleoc::Pattern.parse("hello world")
# Matches "hello" AND "world" (both must match)
# Parse with specific case handling
pattern = Nucleoc::Pattern.parse("Hello World",
case_matching: Nucleoc::CaseMatching::Smart,
normalization: Nucleoc::Normalization::Smart
)
Pattern Syntax
# ^ = prefix match
pattern = Nucleoc::Pattern.parse("^hello") # Must start with "hello"
# ' = substring match
pattern = Nucleoc::Pattern.parse("'world") # Must contain "world" as substring
# $ = postfix match (or exact if combined with ^)
pattern = Nucleoc::Pattern.parse("world$") # Must end with "world"
pattern = Nucleoc::Pattern.parse("^hello$") # Must be exactly "hello"
# ! = negative match
pattern = Nucleoc::Pattern.parse("!error") # Must NOT contain "error"
# Escaping special characters
pattern = Nucleoc::Pattern.parse("\\!hello") # Literal "!hello"
pattern = Nucleoc::Pattern.parse("\\^start") # Literal "^start"
pattern = Nucleoc::Pattern.parse("\\'quote") # Literal "'quote"
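To make the operator rules concrete, here is a small classifier for a single atom. It is a sketch under stated assumptions, not the parser nucleoc ships: `AtomSketch` and `sketch_atom` are hypothetical names, only a leading backslash and an escaped trailing `$` are handled, and a negated atom with no other operator is assumed to be a substring test.

```crystal
# Hypothetical types for illustration; nucleoc's own parser is not shown here.
record AtomSketch, kind : Symbol, negated : Bool, text : String

def sketch_atom(atom : String) : AtomSketch
  # A leading backslash turns the following operator into a literal character.
  if atom.starts_with?('\\')
    return AtomSketch.new(:fuzzy, false, atom[1..])
  end

  text = atom
  negated = text.starts_with?('!')
  text = text[1..] if negated

  # Assumption: a bare negated atom is checked as a substring.
  kind = negated ? :substring : :fuzzy

  if text.starts_with?('^')
    kind = :prefix
    text = text[1..]
  elsif text.starts_with?('\'')
    kind = :substring
    text = text[1..]
  end

  # A trailing, unescaped $ anchors the end; together with ^ it means exact.
  if text.ends_with?('$') && !text.ends_with?("\\$")
    kind = kind == :prefix ? :exact : :postfix
    text = text[0...-1]
  end

  AtomSketch.new(kind, negated, text)
end

["hlo", "'world", "^hello", "world$", "^hello$", "!error", "\\!hello"].each do |atom|
  parsed = sketch_atom(atom)
  puts "#{atom} -> kind=#{parsed.kind} negated=#{parsed.negated} text=#{parsed.text}"
end
```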
Using Patterns with Matcher
matcher = Nucleoc::Matcher.new
pattern = Nucleoc::Pattern.parse("hello world")
# Match pattern against haystack
score = pattern.match(matcher, "hello beautiful world")
# score = combined score of "hello" and "world" matches
# With indices
indices = [] of Array(UInt32)
score = pattern.match(matcher, "hello beautiful world", indices)
# indices = [[0, 1, 2, 3, 4], [16, 17, 18, 19, 20]]
5. Advanced Features
Parallel Matching
# Match multiple haystacks against single needle in parallel
haystacks = ["hello", "world", "foo", "bar"]
needle = "lo"
# Returns array of scores in same order as input
scores = Nucleoc.parallel_fuzzy_match(haystacks, needle)
# scores = [score_for_hello, nil, nil, nil] ("world" has no 'o' after its 'l', so it does not match)
# With indices
results = Nucleoc.parallel_fuzzy_indices(haystacks, needle)
# results = [{score, indices}, nil, nil, nil]
# Force a strategy (:sequential, :fiber, :spawn, :fiber_pool, :cml_pool, :pool, :auto)
scores = Nucleoc.parallel_fuzzy_match(haystacks, needle, strategy: :spawn)
# :pool is an alias for :cml_pool; :auto picks based on batch size.
Notes:
- CRYSTAL_WORKERS=1 tends to favor sequential/fiber paths; pools rarely help.
- For CRYSTAL_WORKERS=2, spawn/fiber often wins at mid-sized batches.
- MultiPattern score_parallel is usually slower than score at typical sizes.
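The "same order as input" guarantee can be illustrated without the library. The sketch below does not reproduce how nucleoc implements its strategies; it simply spawns one fiber per haystack and tags each result with its input index. `contains_in_order?` and `fan_out` are hypothetical helpers, and the match test is a plain in-order character check rather than real scoring.

```crystal
# Hypothetical helper: true when needle's characters occur in haystack in order.
def contains_in_order?(haystack : String, needle : String) : Bool
  remaining = needle.chars
  haystack.each_char do |char|
    break if remaining.empty?
    remaining.shift if char == remaining.first
  end
  remaining.empty?
end

# Hypothetical helper: one fiber per haystack; results are stored by input
# index, so completion order does not affect output order.
def fan_out(haystacks : Array(String), needle : String) : Array(Bool)
  channel = Channel({Int32, Bool}).new
  haystacks.each_with_index do |haystack, i|
    spawn do
      channel.send({i, contains_in_order?(haystack, needle)})
    end
  end

  results = Array(Bool).new(haystacks.size, false)
  haystacks.size.times do
    i, matched = channel.receive
    results[i] = matched
  end
  results
end

p fan_out(["hello", "world", "foo", "bar"], "lo") # => [true, false, false, false]
```

One fiber per item is wasteful for large batches, which is presumably why a pooled strategy exists; the index-tagging idea stays the same either way.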
Custom Worker Pool
# Create worker pools with custom size
cml_pool = Nucleoc::CMLWorkerPool.new(4)
fiber_pool = Nucleoc::FiberWorkerPool.new(4)
# Batch matching
scores, indices = cml_pool.match_many(haystacks, needle, compute_indices: true)
scores, indices = fiber_pool.match_many(haystacks, needle, compute_indices: true)
Direct API Functions
# Static convenience methods
score = Nucleoc.fuzzy_match("hello world", "hlo")
score = Nucleoc.substring_match("hello world", "world")
score = Nucleoc.prefix_match("hello world", "hello")
score = Nucleoc.postfix_match("hello world", "world")
# With indices
result = Nucleoc.fuzzy_match_indices("hello world", "hlo")
# result = {score, indices}
6. Nucleo API (Advanced)
Managing Collections
# Create a Nucleo instance for managing collections
nucleo = Nucleoc.new_matcher
# Optionally cap results for faster snapshot builds
nucleo = Nucleoc.new_matcher(max_results: 100)
# Add items
nucleo.add("hello")
nucleo.add_all(["world", "foo", "bar"])
# Update pattern
nucleo.pattern = "lo" # Sets pattern to "lo"
# Schedule snapshot recompute (async in this Crystal port)
status = nucleo.tick(0)
puts "changed=#{status.changed?} running=#{status.running?}"
# Get matches
snapshot = nucleo.match
snapshot.items.each do |result|
puts "#{result.item}: #{result.score}"
end
# Clear items
nucleo.clear
Notes:
- tick schedules background matching and reports whether a run is still in progress.
- Use parallel_fuzzy_match or CMLWorkerPool for parallel batch matching.
Incremental Updates with Injector
nucleo = Nucleoc.new_matcher
# Get injector for batch operations
injector = nucleo.injector
# Add items through injector
injector.inject(0, "hello")
injector.extend(["world", "foo", "bar"])
# Injector automatically unregisters when done
7. Multi-Column Matching (MultiPattern)
matcher = Nucleoc::Matcher.new
pattern = Nucleoc::MultiPattern.new(2)
pattern.reparse(0, "foo")
pattern.reparse(1, "bar")
haystacks = ["foo.txt", "bar.log"]
score = pattern.score(haystacks, matcher)
puts score
8. UI Tick Loop Usage
The high-level Nucleo API is designed to be called from your UI loop. Each UI tick updates the pattern, calls tick, and reads the latest snapshot. In this Crystal port, tick schedules background matching and returns quickly.
config = Nucleoc::Config.new
nucleo = Nucleoc::Nucleo(Int32).new(config, -> { nil }, 1, 1)
injector = nucleo.injector
injector.extend(["alpha", "beta", "gamma", "delta"])
loop do
# Replace this with your input handling
query = "ga"
nucleo.pattern = query
status = nucleo.tick(0)
if status.changed?
snapshot = nucleo.match
snapshot.items.each do |match|
puts "#{match.item}: #{match.score}"
end
end
break
end
Debouncing Redraws
tick is non-blocking in this port, so avoid scheduling new runs more frequently than your UI needs. A common approach is to debounce redraws to ~16ms (60 FPS).
last_tick = Time.monotonic
loop do
now = Time.monotonic
if now - last_tick >= 16.milliseconds
last_tick = now
nucleo.pattern = "query"
status = nucleo.tick(0)
puts "changed=#{status.changed?}"
end
break
end
Streaming Updates
Inject items over time and call tick regularly to keep results up to date.
injector = nucleo.injector
items = ["alpha", "beta", "gamma", "delta", "epsilon"]
items.each do |item|
injector.inject(0, item)
nucleo.tick(0)
end
9. Debugging and Logging
# Enable debug logging
Log.setup(:debug)
matcher = Nucleoc::Matcher.new
score = matcher.fuzzy_match("hello world", "hlo")
# Logs include matrix layout, scoring steps, and reconstruction
# Or set environment variable
# LOG_LEVEL=DEBUG crystal run your_script.cr
10. Performance Tips
- Reuse Matchers: Create matcher once and reuse for multiple matches
- Use Appropriate Algorithm: Choose exact/substring when possible instead of fuzzy
- Batch Operations: Use parallel_fuzzy_match for multiple haystacks
- Pre-compile Patterns: Parse patterns once and reuse
- Configure Delimiters: Set appropriate delimiter_chars for your use case
11. Common Use Cases
File Search
config = Nucleoc::Config::DEFAULT.match_paths
matcher = Nucleoc::Matcher.new(config)
# Match file paths
files = ["src/nucleoc/api.cr", "spec/nucleoc_spec.cr", "README.md"]
pattern = Nucleoc::Pattern.parse("nucleoc cr")
files.each do |file|
if score = pattern.match(matcher, file)
puts "#{file}: #{score}"
end
end
Autocomplete
config = Nucleoc::Config.new(prefer_prefix: true)
matcher = Nucleoc::Matcher.new(config)
options = ["create", "read", "update", "delete", "config"]
query = "cr"
options.each do |option|
if score = matcher.fuzzy_match(option, query)
puts "#{option}: #{score}"
end
end
# "create" gets bonus for starting with "cr"
Filtering Lists
matcher = Nucleoc::Matcher.new
items = ["apple", "banana", "cherry", "date", "elderberry"]
pattern = Nucleoc::Pattern.parse("!e a") # No 'e', contains 'a'
filtered = items.select do |item|
pattern.match(matcher, item)
end
# filtered = ["banana"] ("date" contains an 'e', so !e rejects it)
Configuration Reference
Config Struct Fields
| Field | Type | Default | Description |
|---|---|---|---|
| delimiter_chars | String | `"/,:;\|"` | Characters that act as word boundaries |
| bonus_boundary_white | UInt16 | BONUS_BOUNDARY + 2 | Bonus for boundary after whitespace |
| bonus_boundary_delimiter | UInt16 | BONUS_BOUNDARY + 1 | Bonus for boundary after delimiter |
| initial_char_class | CharClass | CharClass::Whitespace | Class for start of string |
| normalize? | Bool | true | Enable Unicode normalization |
| ignore_case? | Bool | true | Case-insensitive matching |
| prefer_prefix? | Bool | false | Give bonus to matches near start |
Character Classes
Nucleoc uses character classification for scoring bonuses:
enum CharClass
Whitespace # Space, tab, newline
Delimiter # Characters in delimiter_chars
NonWord # Symbols like @, #, $
Number # 0-9
Lower # a-z
Upper # A-Z
end
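A classifier along these lines could look like the sketch below. It is an illustration only: `SketchCharClass` and `classify` are hypothetical names, the order of the checks is an assumption, and Crystal's Unicode-aware `Char` predicates are used where the comments above list ASCII ranges.

```crystal
# Hypothetical enum mirroring the classes listed above; not nucleoc's own type.
enum SketchCharClass
  Whitespace
  Delimiter
  NonWord
  Number
  Lower
  Upper
end

# Hypothetical classifier; the delimiter set defaults to the documented "/,:;|".
def classify(char : Char, delimiters : String = "/,:;|") : SketchCharClass
  if char.whitespace?
    SketchCharClass::Whitespace
  elsif delimiters.includes?(char)
    SketchCharClass::Delimiter
  elsif char.number?
    SketchCharClass::Number
  elsif char.lowercase?
    SketchCharClass::Lower
  elsif char.uppercase?
    SketchCharClass::Upper
  else
    SketchCharClass::NonWord
  end
end

"a/B 9@".each_char do |char|
  puts "#{char.inspect} => #{classify(char)}"
end
```

Passing a different `delimiters` string, such as "/" for file paths, changes which characters count as word boundaries.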
Scoring Constants
SCORE_MATCH = 16 # Base score for each match
PENALTY_GAP_START = 3 # Penalty for starting a gap
PENALTY_GAP_EXTENSION = 1 # Penalty for extending a gap
BONUS_BOUNDARY = 8 # Bonus for word boundary
BONUS_CONSECUTIVE = 4 # Bonus for consecutive matches
BONUS_FIRST_CHAR_MULTIPLIER = 2 # Multiplier for first character bonus
Bonus Calculation
Bonuses are awarded for:
- Word boundaries (after whitespace/delimiter)
- CamelCase transitions (lower → upper)
- Number boundaries (non-number → number)
- Consecutive matches
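The constants can be combined by hand to see where figures such as 140 and 96 in the tutorial come from. The sketch below assumes the combination rule used by the Rust nucleo scorer (the first matched character's bonus is multiplied, and each following consecutive character carries the larger of that bonus and BONUS_CONSECUTIVE); this README does not spell the rule out, so treat it as an approximation. `contiguous_score` and `gap_penalty` are hypothetical helpers.

```crystal
# Constants copied from the list above.
SCORE_MATCH                 = 16
PENALTY_GAP_START           =  3
PENALTY_GAP_EXTENSION       =  1
BONUS_BOUNDARY              =  8
BONUS_CONSECUTIVE           =  4
BONUS_FIRST_CHAR_MULTIPLIER =  2
BONUS_BOUNDARY_WHITE        = BONUS_BOUNDARY + 2

# Hypothetical helper: score of `length` characters matched back to back,
# where the first one earns `boundary_bonus` (assumed rule, see lead-in).
def contiguous_score(length : Int32, boundary_bonus : Int32) : Int32
  return 0 if length <= 0
  first = SCORE_MATCH + boundary_bonus * BONUS_FIRST_CHAR_MULTIPLIER
  carried = Math.max(boundary_bonus, BONUS_CONSECUTIVE)
  first + (length - 1) * (SCORE_MATCH + carried)
end

# Hypothetical helper: cost of skipping `length` haystack characters.
def gap_penalty(length : Int32) : Int32
  return 0 if length <= 0
  PENALTY_GAP_START + (length - 1) * PENALTY_GAP_EXTENSION
end

puts contiguous_score(5, BONUS_BOUNDARY_WHITE) # => 140 (five characters starting at a whitespace boundary)
puts contiguous_score(5, 0)                    # => 96  (five characters with no boundary bonus)
puts gap_penalty(3)                            # => 5   (3 to open the gap, then 1 + 1)
```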
Debugging
Nucleoc uses Crystal's Log system. Set LOG_LEVEL=DEBUG to see detailed matcher traces, including matrix layout, scoring, and reconstruction steps:
LOG_LEVEL=DEBUG crystal spec
Development Status
This project is a complete port of the Rust nucleo library. The implementation includes:
- Complete matching algorithms - Fuzzy (greedy and optimal), exact, substring, prefix/postfix
- Pattern parsing - Full pattern syntax with operators and escaping
- Unicode support - Full Unicode normalization and character classification
- Performance optimizations - Compressed matrix representation, prefiltering, efficient scoring
- Configuration - Flexible scoring parameters and matching options
Feature Parity with Rust Nucleo
- ✅ Core matching algorithms - All algorithms from Rust implementation
- ✅ Scoring system - Exact scoring constants and bonus calculations
- ✅ Unicode handling - Full Unicode normalization and case folding
- ✅ Pattern parsing - Complete pattern syntax with operators
- ✅ Core data structures - BoxcarVector, worker pool, parallel sort
- ✅ Test coverage - 125/125 tests passing with exact behavior matching
Feature Status
| Feature | Status | Notes |
|---|---|---|
| Core Matching Algorithms | ✅ Complete | Fuzzy (greedy/optimal), exact, substring, prefix/postfix |
| Pattern Parsing | ✅ Complete | Full syntax with operators and escaping |
| Unicode Support | ✅ Complete | Normalization and case folding |
| Configuration System | ✅ Complete | Custom scoring, delimiters, case handling |
| Boxcar Data Structure | ✅ Complete | Lock-free vector with snapshots (src/nucleoc/boxcar.cr) |
| Worker Pool | ✅ Complete | Thread pool for concurrent matching (src/nucleoc/worker_pool.cr) |
| CML Worker Pool | ✅ Complete | Concurrent ML-based agent system (src/nucleoc/worker_pool_cml.cr) |
| Parallel Sorting | 🔄 In Progress | Parallel quicksort with cancellation (src/nucleoc/par_sort.cr) - bug fixes needed (issue nucleoc-8f6) |
| MultiPattern | 🔄 Planned | Incremental pattern updates (issue nucleoc-efu) |
| Advanced CML Patterns | 🔄 Planned | choose, wrap, with_nack, guard (issue nucleoc-fep) |
| Parallel Matcher | 🔄 Planned | Intra-task parallelism like Rayon's par_iter (issue nucleoc-aa2) |
| Concurrency Tests | 🔄 Planned | Comprehensive race condition testing (issue nucleoc-2gq) |
| Error Handling & Recovery | 🔄 Planned | Supervisor patterns, circuit breaker (issue nucleoc-bsy) |
Legend: ✅ = Implemented, 🔄 = In Progress/Planned, ❌ = Not Started
Development
Prerequisites
- Crystal 1.18.2 or later
- Git
Setup
git clone https://github.com/dsisnero/nucleoc.git
cd nucleoc
shards install
Examples
crystal run examples/basic.cr
crystal run examples/nucleo_worker.cr
crystal run examples/multi_pattern.cr
crystal run examples/worker_pool.cr
Running Tests
crystal spec
Code Quality
# Format code
crystal tool format src/ spec/
# Run linter
ameba
Benchmarks
Benchmark harnesses live under bench/. Run with --release for meaningful results:
CRYSTAL_CACHE_DIR=.crystal-cache crystal run bench/src/main.cr --release -- all
To target specific benchmarks or tune the dataset:
BENCH_DATASET=20000 BENCH_CORES=1,2,4 crystal run bench/src/main.cr --release -- worker_pool
See PERFORMANCE.md for how to capture results and compare against Rust nucleo benchmarks.
Contributing
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Contributors
- Dominic Sisneros - creator and maintainer
See the full list of contributors who participated in this project.
Acknowledgments
- Based on the nucleo Rust library by Pascal Kuthe and the Helix editor team
- Inspired by fzf and skim fuzzy matching algorithms
- Uses CML (Concurrent ML) for its concurrency patterns
API Quick Reference
Core Classes
| Class | Purpose | Key Methods |
|---|---|---|
| Matcher | Main matching engine | fuzzy_match, exact_match, substring_match, prefix_match, postfix_match |
| Pattern | Parsed query pattern | parse, match |
| Config | Matching configuration | new, match_paths, bonus_for |
| WorkerPool | Parallel matching | match_many |
| Nucleo | Collection manager | add, clear, match, injector |
Static Convenience Methods
Nucleoc.fuzzy_match(haystack, needle) → UInt16?
Nucleoc.substring_match(haystack, needle) → UInt16?
Nucleoc.prefix_match(haystack, needle) → UInt16?
Nucleoc.postfix_match(haystack, needle) → UInt16?
Nucleoc.parallel_fuzzy_match(haystacks, needle) → Array(UInt16?)
Nucleoc.parallel_fuzzy_indices(haystacks, needle) → Array(Tuple(UInt16, Array(UInt32))?)
Pattern Syntax Cheat Sheet
| Syntax | Meaning | Example |
|---|---|---|
| text | Fuzzy match | "hlo" matches "hello" |
| 'text | Substring match | "'world" matches "hello world" |
| ^text | Prefix match | "^hello" matches "hello world" |
| text$ | Postfix match | "world$" matches "hello world" |
| ^text$ | Exact match | "^hello$" matches only "hello" |
| !text | Negative match | "!error" excludes "error.log" |
| a b | AND (space) | "hello world" requires both "hello" and "world" to match |
| \ | Escape | "\!hello" matches literal "!hello" |
License
This project is licensed under the MIT License - see the LICENSE file for details.