logos
Logos for Crystal
A fast lexer generator for Crystal, ported from Rust's Logos library.
Overview
Logos allows you to create fast lexers by defining tokens in a Logos.define block. The library generates an optimized lexer at compile time with zero runtime overhead.
This is a Crystal port of the Rust Logos library, aiming to provide similar functionality and performance characteristics while following Crystal idioms.
Features
- Zero-copy parsing - Work with slices of your input
- Built-in error recovery - Skip invalid tokens and continue parsing
- Token disambiguation - Automatic priority resolution for overlapping patterns
- UTF-8 safe - Proper handling of Unicode boundaries
- No runtime dependencies - Pure Crystal implementation
- Fast compile times - Minimal macro overhead
Installation
Add this to your shard.yml:
dependencies:
  logos:
    github: dsisnero/logos
Then run:
shards install
Quick Start
require "logos"

Logos.define Token do
  error_type Nil

  token "fn", :KeywordFn
  token "let", :KeywordLet
  regex "[a-zA-Z_][a-zA-Z0-9_]*", :Identifier
  regex "[0-9]+", :Number
  token "+", :Plus
  token "-", :Minus
  skip_regex "\\s+", :Whitespace
end

lexer = Token.lexer("fn hello = 42")
loop do
  result = lexer.next
  break if result.is_a?(Iterator::Stop)
  result = result.as(Logos::Result(Token, Nil))
  puts "#{result.unwrap}: #{lexer.slice}"
end
Context-Dependent Lexing
Use Lexer#morph to switch token modes while preserving cursor position.
require "logos"

Logos.define OuterToken do
  token "{", :Open
  regex "[^\\{]+", :Text
end

Logos.define InnerToken do
  token "}", :Close
  regex "[^\\}]+", :Body
end

outer = OuterToken.lexer("prefix{inside}suffix")
token = outer.next.as(Logos::Result(OuterToken, Nil))
token.unwrap # => OuterToken::Text

token = outer.next.as(Logos::Result(OuterToken, Nil))
if token.unwrap == OuterToken::Open
  # Switch to the inner token set; the cursor stays just past "{".
  inner = outer.morph(InnerToken)
  inner_token = inner.next.as(Logos::Result(InnerToken, Nil))
  puts inner_token.unwrap # => InnerToken::Body
end
Use #spanned when you need (token, span) tuples:
lexer = OuterToken.lexer("ab{cd}")
lexer.spanned.each do |result, span|
  puts "#{result.unwrap} @ #{span}"
end
Token Disambiguation and Priority
When multiple patterns can match at the same position:
- Logos prefers the longest match.
- If multiple matches have the same length, the higher priority wins.
- If same-length, same-priority patterns overlap, Logos raises a compile-time diagnostic.
Example:
Logos.define Token do
  token "===", :StrictEq
  token "==", :Eq
  token "=", :Assign
  regex "[a-zA-Z_][a-zA-Z0-9_]*", :Ident
end
Explicit priorities:
Logos.define Token do
  regex "[a-z]+", :Word, priority: 10
  token "if", :If, priority: 50
end
If you hit ambiguity diagnostics, either:
- Raise priority for the intended winner, or
- Refine regex patterns so they no longer overlap at equal priority.
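The selection rules above can be modelled in a few lines of plain Crystal. This is a minimal stand-alone sketch, not the shard's implementation: `Rule` and `pick` are hypothetical names, the patterns are ordinary stdlib regexes anchored with `\A`, and the choice is made at run time rather than at compile time. It also does not reproduce the compile-time ambiguity diagnostic; on an exact tie the earlier rule simply stays in place.

```crystal
# Hypothetical model of "longest match first, then higher priority".
# Not part of the Logos API; it only illustrates the selection order.
record Rule, name : Symbol, pattern : Regex, priority : Int32

# Returns {rule name, matched text} for the winning rule at the start of
# `input`, or nil when no rule matches.
def pick(rules : Array(Rule), input : String) : Tuple(Symbol, String)?
  best_name = nil
  best_text = ""
  best_priority = Int32::MIN

  rules.each do |rule|
    match = rule.pattern.match(input)
    next unless match
    text = match[0]
    next if text.empty?

    longer = text.size > best_text.size
    tie_won = text.size == best_text.size && rule.priority > best_priority
    if longer || tie_won
      best_name = rule.name
      best_text = text
      best_priority = rule.priority
    end
  end

  if name = best_name
    {name, best_text}
  end
end

rules = [
  Rule.new(:Word, /\A[a-z]+/, 10),
  Rule.new(:If, /\Aif/, 50),
]

p pick(rules, "if x")   # => {:If, "if"}
p pick(rules, "iffy x") # => {:Word, "iffy"}
```

In the first call both rules match two characters, so the priority of 50 decides. In the second call the four-character match wins regardless of priority.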
Annotation-based API
For a Rust-style attribute-driven setup, use type-level annotations and logos_derive:
require "logos"
@[Logos::Options(skip: "\\s+", error: Nil)]
@[Logos::Subpattern("xdigit", "[0-9a-fA-F]")]
@[Logos::Token(:KeywordLet, "let")]
@[Logos::Regex(:Hex, "0x(?&xdigit)+")]
@[Logos::Regex(:Number, "[0-9]+")]
enum Token
  KeywordLet
  Hex
  Number
end
logos_derive(Token)
lexer = Token.lexer("let 0x10 42")
Notes:
- Crystal cannot introspect per-enum-variant annotations the same way Rust proc-macros do, so mappings are declared at the enum type level.
- Logos::Token and Logos::Regex support both (:Variant, "pattern") and (pattern, variant: :Variant) forms.
Subpatterns
Logos.define Token do
  error_type Nil

  subpattern :xdigit, "[0-9a-fA-F]"
  regex "0[xX](?&xdigit)+", :Hex
end
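One way to think about a subpattern is as a named fragment that is spliced into every pattern referencing it. The sketch below illustrates that idea with the standard library only. `expand_subpatterns` is a hypothetical helper, and textual inlining is an assumption made for illustration; the shard may resolve `(?&name)` references differently.

```crystal
# Hypothetical helper, not part of the Logos API: replaces each (?&name)
# reference with the named fragment wrapped in a non-capturing group.
def expand_subpatterns(pattern : String, subpatterns : Hash(String, String)) : String
  pattern.gsub(/\(\?&(\w+)\)/) do |_, match|
    name = match[1]
    body = subpatterns.fetch(name) do
      raise ArgumentError.new("unknown subpattern: #{name}")
    end
    "(?:#{body})"
  end
end

subpatterns = {"xdigit" => "[0-9a-fA-F]"}
expanded = expand_subpatterns("0[xX](?&xdigit)+", subpatterns)
puts expanded # => 0[xX](?:[0-9a-fA-F])+

# The expanded pattern behaves like a hand-written one.
if m = Regex.new("\\A#{expanded}").match("0xFF rest")
  puts m[0] # => 0xFF
end
```

Wrapping the fragment in a non-capturing group keeps a trailing quantifier such as `+` applied to the whole fragment rather than to its last element.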
Examples
Crystal ports of the Rust Logos examples are available in examples/:
- examples/brainfuck.cr
- examples/calculator.cr
- examples/custom_error.cr
- examples/extras.cr
- examples/json.cr
- examples/json_borrowed.cr
- examples/string_interpolation.cr
- examples/token_values.cr
Rust Handbook Parity Index
Reference mapping from Rust handbook topics to Crystal docs/spec coverage:
- Getting started: this README (Quick Start) and spec/logos/simple_spec.cr
- Attributes / derive: this README (Annotation-based API) and spec/logos/derive_spec.cr
- Callbacks: spec/logos/callbacks_spec.cr
- Extras: examples/extras.cr and spec/logos/custom_error_spec.cr
- Common regex patterns: spec/logos/advanced_spec.cr, spec/logos/properties_spec.cr
- Context-dependent lexing: this README (Context-Dependent Lexing) and spec/logos/lexer_modes_spec.cr
- Token disambiguation: this README (Token Disambiguation and Priority) and spec/logos/old_logos_bugs_spec.cr
- Unicode support: spec/logos/unicode_dot_spec.cr, spec/logos/ignore_case_spec.cr
- Source and spans: spec/logos/source_spec.cr, spec/logos/lexer_spec.cr
Enum Payload Parity Plan (logos-04d)
Rust uses enum variants with associated payloads. Crystal enums do not, so the port currently uses Lexer#callback_value_as(T) as a side-channel.
Planned parity path:
- logos-nxw (done): Design union-backed typed payload API.
- logos-15n: Implement typed payload extraction helpers on Lexer/Result.
- logos-pk9: Port Rust payload-oriented examples to concise Crystal equivalents.
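To make the idea of a union-backed payload concrete, here is a minimal sketch in plain Crystal. Every name in it (`Kind`, `Payload`, `Lexeme`, `payload_as`) is hypothetical and chosen for illustration; it is not the design produced by logos-nxw and not the shard's API. It only shows how a plain enum plus a union-typed value can stand in for a Rust variant carrying data.

```crystal
# Hypothetical names throughout; this is not the Logos API.
enum Kind
  Number
  Ident
end

# The union lists every payload type a token may carry (assumed set).
alias Payload = Int64 | String | Nil

record Lexeme, kind : Kind, payload : Payload do
  # Narrows the union to T; raises TypeCastError if the payload has another type.
  def payload_as(type : T.class) : T forall T
    payload.as(T)
  end
end

tokens = [
  Lexeme.new(Kind::Number, 42_i64),
  Lexeme.new(Kind::Ident, "answer"),
]

tokens.each do |tok|
  case tok.kind
  when .number? then puts tok.payload_as(Int64) + 1
  when .ident?  then puts tok.payload_as(String).upcase
  end
end
```

The caller picks the expected type from the token kind, so a mismatch surfaces as an exception at the extraction site instead of as a silently wrong value.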
Rust UI Error Parity (tests/ui/err)
| Rust case | Crystal status | Crystal coverage |
|---|---|---|
| greedy-without-config.rs | Supported | spec/logos/greedy_spec.cr, spec/logos/diagnostics_spec.cr |
| priority-conflict.rs | Supported | spec/logos/diagnostics_spec.cr |
| subpattern-match-empty.rs | Supported | spec/logos/diagnostics_spec.cr |
| regex-non-utf8.rs | Supported | spec/logos/diagnostics_spec.cr (use bytes: true to mark byte-oriented patterns) |
| token-non-utf8.rs | Supported | spec/logos/diagnostics_spec.cr |
| multiple-export-dirs.rs | Not applicable | Rust-only export_dir derive option is not implemented in Crystal annotations |
Status
✅ Port Complete: The Crystal port is feature-complete and the full spec suite passes.
Current Progress
- Source abstraction (String and Slice(UInt8))
- Basic lexer structure and state machine
- Pattern AST and parsing
- Result types and error handling
- Regex pattern compilation
- NFA/DFA construction
- Code generation
- Full test suite
Dependencies
The project includes two companion shards ported from Rust:
- regex-syntax - Regular expression parser
- regex-automata - Automata construction library
Hybrid Automaton Status
- Regex::Automata::Hybrid::LazyDFA now performs lazy determinization and caches transitions on demand.
- It supports anchored/unanchored start states and reuses the same look-around semantics as the DFA builder.
- This hybrid path is suitable for large regex sets where eager DFA construction has higher startup cost.
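Lazy determinization can be illustrated with a toy model: each DFA state is a set of NFA states, and a transition is computed the first time it is needed and then served from a cache. The sketch below is an assumption-laden miniature and not the `Regex::Automata::Hybrid::LazyDFA` implementation. `ToyLazyDFA` is a hypothetical class, the NFA has no epsilon moves, and anchoring and look-around are left out.

```crystal
# Toy lazy DFA over a hand-written NFA. Hypothetical class, for illustration only.
class ToyLazyDFA
  # Number of transitions that had to be computed rather than read from the cache.
  getter cache_misses = 0

  def initialize(@nfa : Hash(Int32, Hash(Char, Array(Int32))),
                 @start : Int32,
                 @accepting : Set(Int32))
    @cache = {} of Tuple(Array(Int32), Char) => Array(Int32)
  end

  # Computes a DFA transition on first use and caches it for later calls.
  private def step(state : Array(Int32), ch : Char) : Array(Int32)
    key = {state, ch}
    if cached = @cache[key]?
      return cached
    end
    @cache_misses += 1
    targets = state.flat_map { |s| @nfa[s]?.try(&.[ch]?) || [] of Int32 }.uniq.sort
    @cache[key] = targets
  end

  # True when the whole input is accepted.
  def matches?(input : String) : Bool
    state = [@start]
    input.each_char do |ch|
      state = step(state, ch)
      return false if state.empty?
    end
    state.any? { |s| @accepting.includes?(s) }
  end
end

# NFA for (a|b)*abb
nfa = {
  0 => {'a' => [0, 1], 'b' => [0]},
  1 => {'b' => [2]},
  2 => {'b' => [3]},
}
dfa = ToyLazyDFA.new(nfa, 0, Set{3})

puts dfa.matches?("ababb") # => true
puts dfa.matches?("abab")  # => false
puts dfa.cache_misses      # => 4
```

The second call computes no new transitions because every step it takes was already cached by the first. Only the DFA states that inputs actually reach are ever built, which is why this approach trades a small per-step lookup for a lower startup cost.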
Development
# Install dependencies
make install
# Run tests
make test
# Check code formatting (reports differences without rewriting files)
crystal tool format --check
# Lint code
make lint
Contributing
- Fork it (https://github.com/dsisnero/logos/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
License
MIT - see LICENSE file
Acknowledgments
- maciejhirsz/logos - The original Rust implementation
- BurntSushi/regex-automata - Rust regex engine used as reference
- rust-lang/regex - Rust regex library used as reference