cuckoofilter
Cuckoo Filter for Crystal
A Crystal shard implementing compact Cuckoo filters for approximate set-membership queries.
A Cuckoo filter answers the question: "Have I probably seen this value before?" It is similar to a Bloom filter, but it also supports deletion. The tradeoff is that lookups can return false positives: a value that was never inserted may occasionally be reported as present. False negatives are not expected for successfully inserted values unless the value was later deleted.
Features
- Fixed-size
Filterwith 8-bit fingerprints. - Fixed-size
Filter16with 16-bit fingerprints and a much lower false positive rate. ScalableFilterandScalableFilter16, which grow by adding more internal filters.- Insert, unique insert, lookup, delete, reset, count, encode, and decode operations.
- No runtime dependencies.
- Benchmarks for normal and very large workloads.
How It Works
Each inserted value is hashed into:
- one fingerprint
- two possible bucket indices
Each bucket stores four fingerprints. A lookup checks both candidate buckets. Insertions first try the two candidate buckets directly; if both are full, the filter performs bounded Cuckoo relocation.
The implementation uses:
BUCKET_SIZE = 4MAX_CUCKOO_COUNT = 500- FNV-1a 64-bit hashing by default
- non-zero fingerprints, so
0can represent an empty slot
Choosing a Filter
Use Filter when memory is more important and an approximate false positive rate around 3% is acceptable.
Use Filter16 when false positives are expensive. It uses two bytes per fingerprint instead of one, so encoded filter data and fingerprint storage are roughly twice as large, but false positives drop dramatically.
Approximate false positive rates:
| Filter | Fingerprint | Approximate false positive rate | Cost |
|---|---|---|---|
Filter |
8-bit | ~3.1% | smallest |
Filter16 |
16-bit | ~0.012% theoretical | about 2x fingerprint storage |
In the 30,000,000-key benchmark on this machine:
| Filter | Miss lookups | False positives | Observed rate |
|---|---|---|---|
Filter |
30,000,000 | 984,407 | ~3.28% |
Filter16 |
30,000,000 | 1,653 | ~0.0055% |
Installation
Add the shard to your shard.yml:
dependencies:
cuckoofilter:
github: creadone/cuckoofilter
Then install dependencies:
shards install
Basic Usage
require "cuckoofilter"
filter = CuckooFilter::Filter.new(1_000)
filter.insert("alpha") # => true
filter.lookup("alpha") # => true
filter.lookup("missing") # => usually false, but false positives are possible
filter.count # => 1_u64
filter.delete("alpha") # => true
filter.lookup("alpha") # => false
filter.count # => 0_u64
Strings and byte slices are supported:
filter.insert("hello")
filter.insert(Bytes[1, 2, 3, 4])
Unique Inserts
insert allows duplicate fingerprints and can increase the count when the same value is inserted more than once.
Use insert_unique when you want to skip values that already appear to be present:
filter = CuckooFilter::Filter.new(1_000)
filter.insert_unique("alpha") # => true
filter.insert_unique("alpha") # => false
filter.count # => 1_u64
Because this is an approximate data structure, insert_unique can also reject a value that was not inserted before if lookup returns a false positive.
Lower False Positives with Filter16
Filter16 has the same API as Filter, but stores 16-bit fingerprints:
filter = CuckooFilter::Filter16.new(1_000)
filter.insert("alpha")
filter.lookup("alpha") # => true
filter.lookup("missing") # => much less likely to be a false positive
Use it when the additional memory is worth the lower false positive rate.
Scalable Filters
A fixed-size filter can become too full. When insertion pressure is high, use a scalable filter:
filter = CuckooFilter::ScalableFilter.new
100_000.times do |index|
filter.insert("item-#{index}")
end
filter.lookup("item-42") # => true
filter.count # => 100_000_u64
For lower false positives, use the 16-bit scalable variant:
filter = CuckooFilter::ScalableFilter16.new
filter.insert("value")
filter.lookup("value") # => true
The default scalable configuration is:
- initial capacity:
10_000 - load factor:
0.9 - growth: roughly doubles the next filter capacity
You can customize it:
filter = CuckooFilter::ScalableFilter.new(
load_factor: 0.8_f32,
capacity: 50_000,
scale_factor: ->(current_bucket_count : UInt64) { current_bucket_count * 8 }
)
Encoding and Decoding
Filters can be serialized to bytes:
filter = CuckooFilter::Filter.new(1_000)
filter.insert("alpha")
encoded = filter.encode
decoded = CuckooFilter::Filter.decode(encoded)
decoded.lookup("alpha") # => true
Filter16 uses a different encoding because each fingerprint is two bytes:
encoded = filter16.encode
decoded = CuckooFilter::Filter16.decode(encoded)
Scalable filters can also be encoded:
encoded = scalable.encode
decoded = CuckooFilter::ScalableFilter.decode(encoded)
Use the matching decode method for the filter type:
CuckooFilter::Filter.decodeCuckooFilter::Filter16.decodeCuckooFilter::ScalableFilter.decodeCuckooFilter::ScalableFilter16.decode
Top-Level Helpers
The shard also exposes convenience constructors:
CuckooFilter.new_filter(1_000)
CuckooFilter.decode(bytes)
CuckooFilter.new_filter16(1_000)
CuckooFilter.decode16(bytes)
CuckooFilter.new_scalable_filter
CuckooFilter.decode_scalable_filter(bytes)
CuckooFilter.new_scalable_filter16
CuckooFilter.decode_scalable_filter16(bytes)
There is also a short alias:
filter = Cuckoo::Filter.new(1_000)
Benchmarks
Run the regular benchmark:
CRYSTAL_CACHE_DIR=.crystal-cache crystal run bench/cuckoofilter_bench.cr --release
Adjust the benchmark size:
BENCH_KEYS=100000 BENCH_LOOKUPS=500000 BENCH_CODEC=5000 crystal run bench/cuckoofilter_bench.cr --release
For very large key counts, use the streaming benchmark. It generates compact binary keys on the fly and avoids prebuilding large arrays of strings:
BENCH_KEYS=30000000 crystal run bench/large_filter_bench.cr --release
Compare the 16-bit variant:
BENCH_KEYS=30000000 BENCH_FINGERPRINT_BITS=16 crystal run bench/large_filter_bench.cr --release
Example results from this machine:
keys=30000000 capacity=30000000 fingerprint_bits=8
insert 30000000 6333727.70 4.737s 30000000
lookup hit 30000000 11687808.70 2.567s 30000000
lookup miss 30000000 13551078.33 2.214s 984407
keys=30000000 capacity=30000000 fingerprint_bits=16
insert 30000000 6350104.88 4.724s 30000000
lookup hit 30000000 11350244.47 2.643s 30000000
lookup miss 30000000 13707325.79 2.189s 1653
Benchmark numbers depend on hardware, Crystal version, compiler flags, and current system load.
Development
Run the test suite:
crystal spec
Check shard dependencies:
shards check
Format code:
crystal tool format
Notes and Limitations
- This is an approximate data structure: false positives are possible.
deleteremoves a matching fingerprint, not the original value.insert_uniquedepends onlookup, so it can be affected by false positives.- Encoded bytes are specific to the selected filter type.
License
MIT
cuckoofilter
- 0
- 0
- 0
- 0
- 0
- about 1 hour ago
- May 12, 2026
MIT License
Tue, 12 May 2026 09:47:18 GMT