obelisk

Crystal syntax highlighting library inspired by Chroma


Obelisk is a Crystal syntax highlighting library inspired by Chroma. It provides type-safe, extensible highlighting with support for multiple languages, themes, and output formats.

Features

  • Multi-language support: Built-in lexers for Crystal, JSON, YAML, and more
  • Multiple output formats: HTML, ANSI terminal colors, plain text, and JSON
  • Theme ecosystem: Compatible with TextMate, Sublime Text, VS Code, and Chroma themes
  • Advanced lexing: State machines, context-sensitive parsing, and delegating lexers
  • Theme serialization: Import/export themes in JSON, TextMate (.tmTheme), and Chroma XML formats
  • Extensible architecture: Easy to add new languages, formatters, and themes
  • Type-safe design: Leverages Crystal's type system for reliability
  • Performance focused: Hand-written lexers with compile-time optimizations

Installation

Add this to your application's shard.yml:

dependencies:
  obelisk:
    github: watzon/obelisk

Then run:

shards install

Quick Start

require "obelisk"

# Simple syntax highlighting
code = %q(
  def hello(name : String) : String
    "Hello, #{name}!"
  end
)

# Highlight Crystal code as HTML
html = Obelisk.highlight(code, "crystal", "html", "github")
puts html

# Highlight with ANSI colors for terminal
ansi = Obelisk.highlight(code, "crystal", "terminal", "monokai")
puts ansi

Usage

Basic Highlighting

# Quick highlighting
result = Obelisk.highlight(source_code, language, formatter, style)

# Available options
languages = Obelisk.lexer_names      # ["crystal", "json", "yaml", "text"]
formatters = Obelisk.formatter_names # ["html", "html-classes", "terminal", "text", "json"]
styles = Obelisk.style_names         # ["github", "monokai", "bw"]
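
The name lists above can be used to validate user input before highlighting. The following is a minimal sketch, assuming `Obelisk.lexer_names` returns an array of strings as the comment above suggests and that the "text" lexer is always registered. `highlight_or_plain` is a hypothetical helper, not part of Obelisk, and whether aliases appear in `lexer_names` is not stated in this README.

```crystal
require "obelisk"

# Hypothetical helper (not part of Obelisk): fall back to the plain-text
# lexer when the requested language name is not registered.
def highlight_or_plain(source : String, language : String) : String
  # Assumption: lexer_names lists every name accepted by Obelisk.highlight.
  lang = Obelisk.lexer_names.includes?(language) ? language : "text"
  Obelisk.highlight(source, lang, "html", "github")
end

puts highlight_or_plain(%q({"ok": true}), "json")
puts highlight_or_plain("just some notes", "no-such-language")
```

Checking the name first keeps the fallback behavior explicit in the calling code rather than relying on how `Obelisk.highlight` treats an unknown name.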

Manual Tokenization

# Get a lexer and tokenize manually.
# Obelisk.lexer may return nil for an unknown name, so guard before using it.
if lexer = Obelisk.lexer("crystal")
  tokens = lexer.tokenize(source_code)

  # Process tokens
  tokens.each do |token|
    puts "#{token.type}: #{token.value.inspect}"
  end
end

Custom Formatting

# Create formatters with options
html_formatter = Obelisk::HTMLFormatter.new(
  with_classes: true,
  class_prefix: "syntax-",
  with_line_numbers: true
)

# Get a style
style = Obelisk::Styles.github

# Format manually
output = html_formatter.format(tokens, style)

CSS Generation

# Generate CSS for HTML with classes
formatter = Obelisk::HTMLFormatter.new(with_classes: true)
css = formatter.css(Obelisk::Styles.github)
puts css
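
The generated CSS and the class-based HTML are meant to be used together. The sketch below combines the calls shown in this README into one standalone page. It assumes that `format` and `css` both return strings, as the assignments above suggest. `render_page` is a hypothetical helper, not part of Obelisk, and the page skeleton is illustrative only.

```crystal
require "obelisk"

# Hypothetical helper, not part of Obelisk: returns a complete HTML page,
# or nil when no lexer is registered under `language`.
def render_page(source : String, language : String) : String?
  lexer = Obelisk.lexer(language)
  return nil unless lexer

  formatter = Obelisk::HTMLFormatter.new(with_classes: true)
  style = Obelisk::Styles.github

  # Assumption: both calls return a String.
  css = formatter.css(style)
  body = formatter.format(lexer.tokenize(source), style)

  <<-HTML
    <!DOCTYPE html>
    <html>
    <head><style>#{css}</style></head>
    <body>#{body}</body>
    </html>
    HTML
end

if page = render_page(%q(puts "Hello"), "crystal")
  File.write("highlighted.html", page)
end
```

Using the same style object for both `css` and `format` keeps the class names in the markup consistent with the rules in the stylesheet.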

Theme Import/Export

# Load themes from different formats
tmtheme_style = Obelisk.load_theme("path/to/theme.tmTheme")     # TextMate theme
chroma_style = Obelisk.load_theme("path/to/theme.xml")          # Chroma XML theme
json_style = Obelisk.load_theme("path/to/theme.json")           # Obelisk JSON theme

# Export themes to different formats
json_output = Obelisk.export_theme_json(style, pretty: true)
tmtheme_output = Obelisk.export_theme_tmtheme(style)
chroma_output = Obelisk.export_theme_chroma(style)

# Save themes to files (format auto-detected from extension)
Obelisk.save_theme(style, "exported.json")      # JSON format
Obelisk.save_theme(style, "exported.tmtheme")   # TextMate format
Obelisk.save_theme(style, "exported.xml")       # Chroma XML format

Supported Languages

Core Languages

  • Crystal (.cr) - Full syntax support including string interpolation, annotations, and Crystal-specific types
  • JSON (.json) - Complete JSON syntax with proper escape handling
  • YAML (.yaml, .yml) - YAML syntax including documents, anchors, and tags
  • Plain Text - Fallback for unsupported file types

Additional Languages

More languages are being actively developed. See the Roadmap for upcoming language support.

Available Themes

  • GitHub - Light theme matching GitHub's syntax highlighting
  • Monokai - Dark theme with vibrant colors
  • Black & White - Simple monochrome theme

Supported Theme Formats

  • JSON - Obelisk's native theme format
  • TextMate (.tmTheme) - Compatible with Sublime Text, VS Code, and other editors
  • Chroma XML - Compatible with Go's Chroma syntax highlighter

Output Formats

  • HTML - Standard HTML with inline styles or CSS classes
  • Terminal - ANSI escape codes for colored terminal output
  • Plain Text - Strips all formatting
  • JSON - Structured token data for analysis

Architecture

Obelisk is built from five modular components:

  • Tokens: Hierarchical token types with CSS class mapping
  • Lexers: Regex-based lexers with state machines for complex languages
  • Formatters: Pluggable output formatters for different targets
  • Styles: Theme system with inheritance and customizable colors
  • Registry: Central management of lexers, formatters, and styles

Token Types

Token types are organized hierarchically for style inheritance:

# Root categories
TokenType::Keyword
TokenType::Name
TokenType::Literal
TokenType::Comment

# Specific types inherit from parents
TokenType::KeywordDeclaration  # inherits from Keyword
TokenType::NameFunction        # inherits from Name
TokenType::LiteralString       # inherits from Literal
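
To illustrate what hierarchical inheritance means for styling, here is a self-contained sketch that does not use Obelisk at all. The `Tok` enum, its `parent` method, and `resolve` are hypothetical names chosen for this example. Obelisk's actual `TokenType` and style lookup may be implemented differently.

```crystal
# Illustrative enum; not Obelisk's actual TokenType definition.
enum Tok
  Keyword
  KeywordDeclaration
  Name
  NameFunction
  Literal
  LiteralString

  # Parent category, or nil for a root type.
  def parent : Tok?
    case self
    when .keyword_declaration? then Tok::Keyword
    when .name_function?       then Tok::Name
    when .literal_string?      then Tok::Literal
    else                            nil
    end
  end
end

# Walk up the hierarchy until some ancestor has a style entry.
def resolve(styles : Hash(Tok, String), type : Tok) : String?
  current : Tok? = type
  while t = current
    if color = styles[t]?
      return color
    end
    current = t.parent
  end
  nil
end

styles = {Tok::Keyword => "#d73a49", Tok::LiteralString => "#032f62"}

# KeywordDeclaration has no entry of its own, so Keyword's color is used.
puts resolve(styles, Tok::KeywordDeclaration)
# NameFunction and its parent Name are both unstyled, so the result is nil.
puts resolve(styles, Tok::NameFunction).inspect
```

With this kind of lookup a theme only needs entries for the root categories, and more specific entries override them where present.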

Creating Custom Lexers

class MyLanguageLexer < Obelisk::RegexLexer
  def config : Obelisk::LexerConfig
    Obelisk::LexerConfig.new(
      name: "mylang",
      aliases: ["my", "mylang"],
      filenames: ["*.my"],
      mime_types: ["text/x-mylang"]
    )
  end

  def rules : Hash(String, Array(Obelisk::LexerRule))
    {
      "root" => [
        Obelisk::LexerRule.new(/\bdef\b/, Obelisk::TokenType::Keyword),
        Obelisk::LexerRule.new(/[a-zA-Z_]\w*/, Obelisk::TokenType::Name),
        # ... more rules
      ]
    }
  end
end

# Register the lexer
Obelisk::Registry.lexers.register(MyLanguageLexer.new)
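
The rules hash above is keyed by state name, which is what allows a lexer to switch rule sets while scanning. The following self-contained sketch shows the general push/pop technique with two states. It does not use Obelisk. The `Rule` record, the `:push_string` and `:pop` actions, and `tokenize` are hypothetical and are not Obelisk's `LexerRule` API.

```crystal
# Hypothetical rule type for illustration only.
record Rule, pattern : Regex, type : Symbol, action : Symbol? = nil

RULES = {
  "root" => [
    Rule.new(/"/, :string, :push_string), # opening quote enters "string"
    Rule.new(/\w+/, :name),
    Rule.new(/\s+/, :text),
  ],
  "string" => [
    Rule.new(/"/, :string, :pop), # closing quote returns to "root"
    Rule.new(/[^"]+/, :string),
  ],
}

def tokenize(src : String) : Array({Symbol, String})
  tokens = [] of {Symbol, String}
  stack = ["root"]
  pos = 0

  while pos < src.size
    matched = false
    RULES[stack.last].each do |rule|
      # Accept a match only if it starts exactly at the current position.
      if (m = rule.pattern.match(src, pos)) && m.begin == pos
        tokens << {rule.type, m[0]}
        pos += m[0].size
        case rule.action
        when :push_string then stack.push("string")
        when :pop         then stack.pop
        end
        matched = true
        break
      end
    end

    # No rule applied: emit one character as an error token and move on.
    unless matched
      tokens << {:error, src[pos].to_s}
      pos += 1
    end
  end

  tokens
end

tokenize(%q(say "hi there")).each do |type, value|
  puts "#{type}: #{value.inspect}"
end
```

Inside the "string" state the space in `hi there` is part of a string token, whereas in "root" it would be plain text. That difference is the context sensitivity that the state stack provides.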

Creating Custom Styles

custom_style = Obelisk::Style.new("custom", Obelisk::Color::WHITE)
custom_style.set(Obelisk::TokenType::Keyword,
  Obelisk::StyleEntry.new(
    color: Obelisk::Color.from_hex("#ff0000"),
    bold: Obelisk::Trilean::Yes
  )
)

Obelisk::Registry.styles.register(custom_style)

Performance

Obelisk is designed for speed:

  • Lazy compilation of regex rules
  • Efficient token streaming with iterators
  • Minimal memory allocation during lexing
  • Fast color calculations and CSS generation
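
As an illustration of the first point, lazy compilation can be sketched as a rule that stores its pattern source and builds the `Regex` only on first use. This example is self-contained and does not use Obelisk. `LazyRule` is a hypothetical class, and Obelisk's own implementation may differ.

```crystal
# Hypothetical rule type; Obelisk's LexerRule may work differently.
class LazyRule
  getter source : String
  @regex : Regex?

  def initialize(@source : String)
  end

  # Compile on first access, then reuse the cached Regex.
  def regex : Regex
    @regex ||= Regex.new(@source)
  end

  def compiled? : Bool
    !@regex.nil?
  end
end

rule = LazyRule.new("\\bdef\\b")
puts rule.compiled?                  # nothing compiled yet
puts rule.regex.matches?("def foo")  # first access compiles the pattern
puts rule.compiled?                  # later calls reuse the cached Regex
```

Rules for states that a given input never reaches are then never compiled, which matters most for lexers with many states.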

Examples

Check out the examples/ directory for comprehensive usage examples:

  • 00_quickstart.cr - Simple, quick examples to get started
  • 01_basic_usage.cr - Basic syntax highlighting usage
  • 02_html_output.cr - HTML output options and configurations
  • 03_terminal_colors.cr - Terminal/ANSI color output
  • 04_multi_language.cr - Multiple language support
  • 05_custom_formatter.cr - Creating custom formatters
  • 06_css_generation.cr - CSS stylesheet generation
  • 07_file_highlighting.cr - Highlighting source files
  • 08_theme_comparison.cr - Comparing different themes
  • 09_custom_style.cr - Creating custom syntax highlighting styles
  • 10_theme_serialization.cr - Theme import/export examples
  • 14_chroma_xml_themes.cr - Working with Chroma XML stylesheets

Run any example with:

crystal run examples/00_quickstart.cr

Roadmap

Core Architecture

  • Token System
    • Hierarchical token types with parent/child relationships
    • CSS class mapping for web output
    • Token categories (keywords, literals, names, etc.)
  • Lexer System
    • Abstract base lexer interface
    • Regex-based lexer implementation
    • State machine support
    • Iterator-based token streaming
    • Basic state mutations (push/pop)
    • Advanced state mutations (include, combined states)
    • Delegating lexers for embedded languages
    • Lexer composition and chaining
  • Registry Pattern
    • Centralized management of lexers, formatters, and styles
    • Name and alias support
    • Dynamic registration
    • Priority-based lexer selection
    • Content analysis for auto-detection
    • MIME type and filename pattern matching

Language Support

  • Built-in Languages
    • Crystal
    • JSON
    • YAML
    • Plain text
    • Ruby
    • Python
    • JavaScript/TypeScript
    • Go
    • Rust
    • C/C++
    • HTML/CSS
    • Markdown
    • SQL
    • Shell/Bash
  • Language Features
    • Keywords and operators
    • String literals with escape sequences
    • Comments (single and multi-line)
    • Numbers (integers, floats, hex, binary, octal)
    • String interpolation (Crystal)
    • Embedded language support (via DelegatingLexer)
    • Context-sensitive parsing

Output Formatters

  • HTML Formatter
    • Inline styles
    • CSS classes
    • Custom class prefixes
    • Line numbers
    • Line highlighting (specific ranges)
    • Linkable line numbers
    • Table-based line number layout
  • Terminal Formatter
    • ANSI color codes
    • 24-bit true color support
    • Bold, italic, underline styles
  • Plain Text Formatter
    • Strip all formatting
  • JSON Formatter
    • Token data as JSON
  • Custom Formatters
    • Extensible formatter interface
    • Examples (Markdown, BBCode, etc.)

Styling System

  • Style Engine
    • RGB color support
    • Style attributes (bold, italic, underline)
    • Background colors
    • Style builder API
  • Built-in Themes
    • GitHub (light)
    • Monokai (dark)
    • Black & White
  • Theme Features
    • CSS generation
    • Per-token styling
    • Full style inheritance
    • Theme serialization/deserialization
    • TextMate (.tmTheme) import/export
    • Chroma XML import/export
    • Multi-format theme support

Performance & Optimization

  • Memory Efficiency
    • Iterator-based token streaming
    • Lazy regex compilation
    • Safe token iterator adapter (Crystal bug workaround)
  • Performance Features
    • Token coalescing
    • Token splitting utilities
    • Token size limits
    • Streaming optimizations
    • Parallel lexing support

Advanced Features

  • Token Processing
    • Token filtering
    • Token transformation pipelines
    • Token coalescing
    • Line splitting utilities
  • Language Detection
    • Content-based analysis
    • Confidence scoring
    • Multi-lexer fallback
  • Import/Export
    • TextMate (.tmTheme) theme support
    • Chroma XML stylesheet support
    • JSON theme serialization
    • Pygments compatibility
    • TextMate grammar support
    • Chroma XML lexer definitions (see #1)

Developer Experience

  • API Design
    • Simple one-liner usage
    • Progressive complexity
    • Type-safe interfaces
  • Documentation
    • Comprehensive examples
    • API documentation
    • Architecture overview
  • Testing
    • Unit tests for all components
    • Integration tests
    • Performance benchmarks
    • Fuzzing tests

Known Issues & Future Work

  • Crystal Compiler Bug Workarounds
    • Iterator memory corruption workaround (#2)
    • Remove workarounds when Crystal bug #14317 is fixed
  • Performance Improvements
    • Re-enable token coalescing
    • Optimize large file handling
  • Extended Language Support
    • XML lexer definitions support (Tartrazine compatibility)
    • Hybrid approach: hand-written + XML lexers
    • Code generation from XML definitions

Development

Running Tests

# Run tests without coverage
crystal spec

# Run tests with coverage (requires kcov)
./scripts/test.sh with-coverage

# Auto-detect kcov and run with coverage if available
./scripts/test.sh auto

Install kcov for coverage reports:

# Ubuntu/Debian
sudo apt-get install kcov

# macOS
brew install kcov

Development Commands

See CLAUDE.md for the complete list of development commands, including:

  • crystal spec - Run all tests
  • crystal run examples/00_quickstart.cr - Run examples
  • crystal build src/obelisk.cr --no-codegen - Type check
  • crystal tool format - Format code

Contributing

  1. Fork it (https://github.com/watzon/obelisk/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Inspired by Chroma for Go
  • Token type hierarchy based on Pygments
  • Built with ❤️ for the Crystal community