hecate-lex

hecate-lex

Powerful lexical analysis library for Crystal with declarative DSL and rich diagnostics.

Table of Contents

Install

Add this to your application's shard.yml:

dependencies:
  hecate-lex:
    github: hecatecr/hecate-lex
    version: ~> 0.1.0

Run shards install to install dependencies.

Usage

Basic Usage

require "hecate-lex"

# Define a simple calculator lexer
calculator_lexer = Hecate::Lex.define do
  # Skip whitespace
  token :WS, /\s+/, skip: true
  
  # Numbers
  token :INTEGER, /\d+/
  token :FLOAT, /\d+\.\d+/
  
  # Operators
  token :PLUS, /\+/
  token :MINUS, /-/
  token :MULTIPLY, /\*/
  token :DIVIDE, /\//
  
  # Parentheses
  token :LPAREN, /\(/
  token :RPAREN, /\)/
  
  # Identifiers
  token :IDENTIFIER, /[a-zA-Z_]\w*/
end

# Create a source map and add input
source_map = Hecate::Core::SourceMap.new
source_id = source_map.add_file("input.calc", "3.14 + x * (2 + 1)")

# Scan the input
tokens, diagnostics = calculator_lexer.scan(source_map.get(source_id).not_nil!)

# Process tokens
tokens.each do |token|
  puts "#{token.kind}: '#{token.lexeme(source_map)}'"
end

Type-Safe Lexers

For better type safety, define lexers with an enum:

enum CalcToken
  # Skip tokens
  WS
  
  # Literals
  INTEGER
  FLOAT
  IDENTIFIER
  
  # Operators
  PLUS
  MINUS
  MULTIPLY
  DIVIDE
  
  # Delimiters
  LPAREN
  RPAREN
end

# Define lexer with enum
lexer = Hecate::Lex.define(CalcToken) do |ctx|
  ctx.token :WS, /\s+/, skip: true
  ctx.token :FLOAT, /\d+\.\d+/
  ctx.token :INTEGER, /\d+/
  # ... rest of tokens
end

Error Handling

Built-in error handlers for common cases:

lexer = Hecate::Lex.define do |ctx|
  # Basic token definition
  ctx.token :STRING, /"([^"\\]|\\.)*"/
  ctx.token :COMMENT, /\/\*.*?\*\//
    
  # Custom error handler
  ctx.error :STRING do |input, pos|
    Hecate.error("unterminated string literal")
      .primary(span, "string starts here")
      .help("add a closing quote")
  end
end

API

DSL Methods

Create lexers using the declarative DSL:

# Dynamic lexer (runtime token types)
lexer = Hecate::Lex.define do
  token :KIND, /pattern/           # Basic token
  token :WS, /\s+/, skip: true     # Skip token
  token :ID, /\w+/, priority: 10   # With priority
  
  # Error handlers
  error :handler_name do |input, pos|
    # Return a diagnostic
  end
end

# Type-safe lexer with enum
lexer = Hecate::Lex.define(TokenEnum) do |ctx|
  ctx.token :KIND, /pattern/
end

Token Priorities

The lexer uses a longest-match-wins strategy with priority resolution:

lexer = Hecate::Lex.define do
  # Higher priority wins for same-length matches
  token :ELSE_IF, /else\s+if/, priority: 20
  token :ELSE, /else/, priority: 10
  token :IF, /if/, priority: 10
  token :IDENTIFIER, /[a-zA-Z_]\w*/, priority: 1
end

Default priorities:

  • Keywords: 10
  • Operators: 5-8
  • Identifiers: 1
  • Others: 5

Nesting Tracker

Track paired tokens for proper nesting validation:

# Use built-in bracket tracker
tracker = Hecate::Lex::NestingTracker.bracket_tracker(
  brace_open: :LBRACE,
  brace_close: :RBRACE,
  bracket_open: :LBRACKET,
  bracket_close: :RBRACKET,
  paren_open: :LPAREN,
  paren_close: :RPAREN
)

# Or create custom tracker
tracker = Hecate::Lex::NestingTracker.new(
  open_tokens: [:LPAREN, :LBRACE, :BEGIN],
  close_tokens: [:RPAREN, :RBRACE, :END],
  pairs: {
    :RPAREN => :LPAREN,
    :RBRACE => :LBRACE,
    :END => :BEGIN
  }
)

# Process tokens
tokens.each do |token|
  indent_level = tracker.process(token)
  puts "#{"  " * indent_level}#{token.kind}: #{token.lexeme(source_map)}"
end

# Validate nesting
unless tracker.balanced?
  unclosed = tracker.unclosed_tokens
  error_msg = tracker.validation_error
  # Handle unmatched tokens
end

Dynamic Lexers

Create lexers at runtime without predefined types:

# Dynamic lexer returns DynamicToken objects
lexer = Hecate::Lex.define do
  token :NUMBER, /\d+/
  token :WORD, /\w+/
end

tokens, _ = lexer.scan(source)
tokens.each do |token|
  puts token.kind_name  # "NUMBER", "WORD", etc.
end

For complete API documentation, see the Crystal docs.

Contributing

This repository is a read-only mirror. All development happens in the Hecate monorepo.

License

MIT © Chris Watson. See LICENSE for details.

Repository

hecate-lex

Owner
Statistic
  • 0
  • 0
  • 0
  • 1
  • 1
  • 7 months ago
  • July 25, 2025
License

MIT License

Links
Synced at

Tue, 27 Jan 2026 10:43:17 GMT

Languages