obelisk
Obelisk is a Crystal syntax highlighting library inspired by Chroma. It provides a performant, type-safe, and extensible syntax highlighting solution with support for multiple languages, themes, and output formats.
Features
- Multi-language support: Built-in lexers for Crystal, JSON, YAML, and more
- Multiple output formats: HTML, ANSI terminal colors, plain text, and JSON
- Theme ecosystem: Compatible with TextMate, Sublime Text, VS Code, and Chroma themes
- Advanced lexing: State machines, context-sensitive parsing, and delegating lexers
- Theme serialization: Import/export themes in JSON, TextMate (.tmTheme), and Chroma XML formats
- Extensible architecture: Easy to add new languages, formatters, and themes
- Type-safe design: Leverages Crystal's type system for reliability
- Performance focused: Hand-written lexers with compile-time optimizations
Installation
Add this to your application's shard.yml:
dependencies:
  obelisk:
    github: watzon/obelisk
Then run:
shards install
Quick Start
require "obelisk"
# Simple syntax highlighting
code = %q(
def hello(name : String) : String
  "Hello, #{name}!"
end
)
# Highlight Crystal code as HTML
html = Obelisk.highlight(code, "crystal", "html", "github")
puts html
# Highlight with ANSI colors for terminal
ansi = Obelisk.highlight(code, "crystal", "terminal", "monokai")
puts ansi
Usage
Basic Highlighting
# Quick highlighting
result = Obelisk.highlight(source_code, language, formatter, style)
# Available options
languages = Obelisk.lexer_names # ["crystal", "json", "yaml", "text"]
formatters = Obelisk.formatter_names # ["html", "html-classes", "terminal", "text", "json"]
styles = Obelisk.style_names # ["github", "monokai", "bw"]
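As a minimal sketch of how these lists might be used, the snippet below falls back to the plain-text lexer when a requested language is not registered. It assumes `Obelisk.lexer_names` returns a collection of strings, as the comment above suggests; the sample input and the `requested` variable are illustrative only.

```crystal
require "obelisk"

source = %q({"name": "obelisk"})
requested = "toml" # hypothetical language that is not in the list above

# Fall back to the plain-text lexer if the requested one is not registered
language = Obelisk.lexer_names.includes?(requested) ? requested : "text"

result = Obelisk.highlight(source, language, "terminal", "monokai")
puts result
```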
Manual Tokenization
# Get a lexer and tokenize manually
if lexer = Obelisk.lexer("crystal")
  tokens = lexer.tokenize(source_code)
  # Process tokens
  tokens.each do |token|
    puts "#{token.type}: #{token.value.inspect}"
  end
end
Custom Formatting
# Create formatters with options
html_formatter = Obelisk::HTMLFormatter.new(
  with_classes: true,
  class_prefix: "syntax-",
  with_line_numbers: true
)
# Get a style
style = Obelisk::Styles.github
# Format manually
output = html_formatter.format(tokens, style)
CSS Generation
# Generate CSS for HTML with classes
formatter = Obelisk::HTMLFormatter.new(with_classes: true)
css = formatter.css(Obelisk::Styles.github)
puts css
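Class-based HTML carries no colors of its own, so the generated CSS has to be shipped with it. The sketch below combines the two into a single standalone page, using only the calls shown above. It assumes `format` returns a string; the output file name `highlighted.html` is a placeholder.

```crystal
require "obelisk"

source = %q(puts "Hello, world!")
formatter = Obelisk::HTMLFormatter.new(with_classes: true)
style = Obelisk::Styles.github

if lexer = Obelisk.lexer("crystal")
  # Markup with CSS classes, plus the stylesheet those classes refer to
  body = formatter.format(lexer.tokenize(source), style)
  css = formatter.css(style)

  # Embed the stylesheet in the page head so the file works on its own
  page = "<!DOCTYPE html>\n<html>\n<head><style>\n#{css}\n</style></head>\n" \
         "<body>\n#{body}\n</body>\n</html>\n"
  File.write("highlighted.html", page)
end
```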
Theme Import/Export
# Load themes from different formats
tmtheme_style = Obelisk.load_theme("path/to/theme.tmTheme") # TextMate theme
chroma_style = Obelisk.load_theme("path/to/theme.xml") # Chroma XML theme
json_style = Obelisk.load_theme("path/to/theme.json") # Obelisk JSON theme
# Export themes to different formats
json_output = Obelisk.export_theme_json(style, pretty: true)
tmtheme_output = Obelisk.export_theme_tmtheme(style)
chroma_output = Obelisk.export_theme_chroma(style)
# Save themes to files (format auto-detected from extension)
Obelisk.save_theme(style, "exported.json") # JSON format
Obelisk.save_theme(style, "exported.tmtheme") # TextMate format
Obelisk.save_theme(style, "exported.xml") # Chroma XML format
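Taken together, the load and save calls above are enough for a small theme converter. The following is a sketch under assumptions: `convert_theme` is a hypothetical helper, and the nil guard is there because this README does not say whether `load_theme` raises or returns nil on failure.

```crystal
require "obelisk"

# Hypothetical helper: convert a theme file from one supported format to another.
def convert_theme(input : String, output : String) : Nil
  style = Obelisk.load_theme(input)
  # Guard in case load_theme returns nil on failure (an assumption; it may raise instead)
  raise "Could not load theme from #{input}" unless style

  # The output format is chosen from the file extension, as described above
  Obelisk.save_theme(style, output)
end

convert_theme("path/to/theme.tmTheme", "converted.xml")
```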
Supported Languages
Core Languages
- Crystal (.cr) - Full syntax support including string interpolation, annotations, and Crystal-specific types
- JSON (.json) - Complete JSON syntax with proper escape handling
- YAML (.yaml, .yml) - YAML syntax including documents, anchors, and tags
- Plain Text - Fallback for unsupported file types
Additional Languages
More languages are being actively developed. See the Roadmap for upcoming language support.
Available Themes
- GitHub - Light theme matching GitHub's syntax highlighting
- Monokai - Dark theme with vibrant colors
- Black & White - Simple monochrome theme
Supported Theme Formats
- JSON - Obelisk's native theme format
- TextMate (.tmTheme) - Compatible with Sublime Text, VS Code, and other editors
- Chroma XML - Compatible with Go's Chroma syntax highlighter
Output Formats
- HTML - Standard HTML with inline styles or CSS classes
- Terminal - ANSI escape codes for colored terminal output
- Plain Text - Strips all formatting
- JSON - Structured token data for analysis
Architecture
Obelisk follows a clean, modular architecture:
- Tokens: Hierarchical token types with CSS class mapping
- Lexers: Regex-based lexers with state machines for complex languages
- Formatters: Pluggable output formatters for different targets
- Styles: Theme system with inheritance and customizable colors
- Registry: Central management of lexers, formatters, and styles
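To illustrate what "regex-based lexers with state machines" means in general, here is a self-contained miniature in plain Crystal. It is not Obelisk's implementation: `Rule`, `Token`, `RULES`, and `tokenize` are hypothetical names, and the sketch only shows the idea of a state stack where rules can push or pop states.

```crystal
require "string_scanner"

# Hypothetical miniature lexer; names and structure are illustrative only.
record Token, kind : Symbol, value : String
record Rule, pattern : Regex, kind : Symbol, push_state : String? = nil, pop_state : Bool = false

RULES = {
  "root" => [
    Rule.new(/"/, :string, push_state: "string"), # opening quote enters the string state
    Rule.new(/\d+/, :number),
    Rule.new(/[a-zA-Z_]\w*/, :name),
    Rule.new(/\s+/, :text),
  ],
  "string" => [
    Rule.new(/"/, :string, pop_state: true), # closing quote returns to the previous state
    Rule.new(/\\./, :escape),
    Rule.new(/[^"\\]+/, :string),
  ],
}

def tokenize(source : String) : Array(Token)
  tokens = [] of Token
  states = ["root"]
  scanner = StringScanner.new(source)

  until scanner.eos?
    # Only the rules of the state on top of the stack are considered
    rule = RULES[states.last].find { |r| scanner.check(r.pattern) }
    if rule && (text = scanner.scan(rule.pattern))
      tokens << Token.new(rule.kind, text)
      if target = rule.push_state
        states << target
      end
      states.pop if rule.pop_state && states.size > 1
    else
      # No rule matched: emit one character as an error token and move on
      tokens << Token.new(:error, scanner.scan(/./m).to_s)
    end
  end
  tokens
end

tokenize(%q(say "hi\n" 42)).each do |token|
  puts "#{token.kind}: #{token.value.inspect}"
end
```

Because the `\\.` rule exists only in the `string` state, the same backslash sequence is an escape inside quotes and an error token outside them, which is the kind of context sensitivity a state stack provides.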
Token Types
Token types are organized hierarchically for style inheritance:
# Root categories
TokenType::Keyword
TokenType::Name
TokenType::Literal
TokenType::Comment
# Specific types inherit from parents
TokenType::KeywordDeclaration # inherits from Keyword
TokenType::NameFunction # inherits from Name
TokenType::LiteralString # inherits from Literal
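The idea behind hierarchical style inheritance can be sketched independently of the library: when a token type has no style of its own, look at its parent. The `Kind` enum and `resolve` method below are hypothetical stand-ins, not Obelisk's actual types, and styles are plain strings for brevity.

```crystal
# Hypothetical stand-ins, not Obelisk's actual types.
enum Kind
  Keyword
  KeywordDeclaration
  Name
  NameFunction

  # Parent category, or nil for a root category
  def parent : Kind?
    case self
    when .keyword_declaration? then Kind::Keyword
    when .name_function?       then Kind::Name
    else                            nil
    end
  end
end

# Walk up the hierarchy until some ancestor has an explicit style
def resolve(styles : Hash(Kind, String), kind : Kind) : String?
  current : Kind? = kind
  while current
    if style = styles[current]?
      return style
    end
    current = current.parent
  end
  nil
end

styles = {Kind::Keyword => "bold red", Kind::NameFunction => "blue"}
puts resolve(styles, Kind::KeywordDeclaration)   # inherited from Keyword
puts resolve(styles, Kind::NameFunction)         # explicit entry
puts resolve(styles, Kind::Name).inspect         # no entry anywhere in the chain
```

With this arrangement a theme only needs entries for the root categories it cares about; more specific types pick them up unless they are overridden.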
Creating Custom Lexers
class MyLanguageLexer < Obelisk::RegexLexer
  def config : Obelisk::LexerConfig
    Obelisk::LexerConfig.new(
      name: "mylang",
      aliases: ["my", "mylang"],
      filenames: ["*.my"],
      mime_types: ["text/x-mylang"]
    )
  end
  def rules : Hash(String, Array(Obelisk::LexerRule))
    {
      "root" => [
        Obelisk::LexerRule.new(/\bdef\b/, Obelisk::TokenType::Keyword),
        Obelisk::LexerRule.new(/[a-zA-Z_]\w*/, Obelisk::TokenType::Name),
        # ... more rules
      ]
    }
  end
end
# Register the lexer
Obelisk::Registry.lexers.register(MyLanguageLexer.new)
Creating Custom Styles
custom_style = Obelisk::Style.new("custom", Obelisk::Color::WHITE)
custom_style.set(Obelisk::TokenType::Keyword,
  Obelisk::StyleEntry.new(
    color: Obelisk::Color.from_hex("#ff0000"),
    bold: Obelisk::Trilean::Yes
  )
)
Obelisk::Registry.styles.register(custom_style)
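Once registered, a custom style can presumably be selected by name like the built-in ones. The sketch below repeats the registration so that it is self-contained, then passes the style name to `Obelisk.highlight`; that lookup by the name given to `Style.new` is an assumption, not something this README states explicitly.

```crystal
require "obelisk"

# Build and register a small style, mirroring the calls shown above
style = Obelisk::Style.new("custom", Obelisk::Color::WHITE)
style.set(Obelisk::TokenType::Keyword,
  Obelisk::StyleEntry.new(
    color: Obelisk::Color.from_hex("#ff0000"),
    bold: Obelisk::Trilean::Yes
  )
)
Obelisk::Registry.styles.register(style)

# Assumption: a registered style can be selected by the name given to Style.new
html = Obelisk.highlight(%q(def hello; end), "crystal", "html", "custom")
puts html
```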
Performance
Obelisk is designed for speed:
- Lazy compilation of regex rules
- Efficient token streaming with iterators
- Minimal memory allocation during lexing
- Fast color calculations and CSS generation
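As an illustration of the first point, lazy compilation generally means keeping the pattern source around and building the `Regex` only when a rule is first used. The `LazyRule` class below is a hypothetical sketch in plain Crystal and does not reflect how Obelisk implements this internally.

```crystal
# Hypothetical rule type: the pattern source is stored up front,
# but the Regex object is only built the first time the rule is used.
class LazyRule
  getter source : String
  @regex : Regex?

  def initialize(@source : String)
  end

  # True once the pattern has been compiled
  def compiled? : Bool
    !@regex.nil?
  end

  # Compile on first access, then reuse the cached Regex
  def regex : Regex
    @regex ||= Regex.new(@source)
  end
end

rule = LazyRule.new("\\bdef\\b")
puts rule.compiled?                 # nothing compiled yet
puts rule.regex.matches?("def x")   # first use triggers compilation
puts rule.compiled?                 # compiled and cached from here on
```

Rules for states that a given input never reaches are then never compiled at all, which keeps startup cost proportional to what is actually used.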
Examples
Check out the examples/ directory for comprehensive usage examples:
- 00_quickstart.cr - Simple, quick examples to get started
- 01_basic_usage.cr - Basic syntax highlighting usage
- 02_html_output.cr - HTML output options and configurations
- 03_terminal_colors.cr - Terminal/ANSI color output
- 04_multi_language.cr - Multiple language support
- 05_custom_formatter.cr - Creating custom formatters
- 06_css_generation.cr - CSS stylesheet generation
- 07_file_highlighting.cr - Highlighting source files
- 08_theme_comparison.cr - Comparing different themes
- 09_custom_style.cr - Creating custom syntax highlighting styles
- 10_theme_serialization.cr - Theme import/export examples
- 14_chroma_xml_themes.cr - Working with Chroma XML stylesheets
Run any example with:
crystal run examples/00_quickstart.cr
Roadmap
Core Architecture
- Token System
  - Hierarchical token types with parent/child relationships
  - CSS class mapping for web output
  - Token categories (keywords, literals, names, etc.)
- Lexer System
  - Abstract base lexer interface
  - Regex-based lexer implementation
  - State machine support
  - Iterator-based token streaming
  - Basic state mutations (push/pop)
  - Advanced state mutations (include, combined states)
  - Delegating lexers for embedded languages
  - Lexer composition and chaining
- Registry Pattern
  - Centralized management of lexers, formatters, and styles
  - Name and alias support
  - Dynamic registration
  - Priority-based lexer selection
  - Content analysis for auto-detection
  - MIME type and filename pattern matching
Language Support
- Built-in Languages
  - Crystal
  - JSON
  - YAML
  - Plain text
  - Ruby
  - Python
  - JavaScript/TypeScript
  - Go
  - Rust
  - C/C++
  - HTML/CSS
  - Markdown
  - SQL
  - Shell/Bash
- Language Features
  - Keywords and operators
  - String literals with escape sequences
  - Comments (single and multi-line)
  - Numbers (integers, floats, hex, binary, octal)
  - String interpolation (Crystal)
  - Embedded language support (via DelegatingLexer)
  - Context-sensitive parsing
Output Formatters
- HTML Formatter
  - Inline styles
  - CSS classes
  - Custom class prefixes
  - Line numbers
  - Line highlighting (specific ranges)
  - Linkable line numbers
  - Table-based line number layout
- Terminal Formatter
  - ANSI color codes
  - 24-bit true color support
  - Bold, italic, underline styles
- Plain Text Formatter
  - Strip all formatting
- JSON Formatter
  - Token data as JSON
- Custom Formatters
  - Extensible formatter interface
  - Examples (Markdown, BBCode, etc.)
Styling System
- Style Engine
  - RGB color support
  - Style attributes (bold, italic, underline)
  - Background colors
  - Style builder API
- Built-in Themes
  - GitHub (light)
  - Monokai (dark)
  - Black & White
- Theme Features
  - CSS generation
  - Per-token styling
  - Full style inheritance
  - Theme serialization/deserialization
  - TextMate (.tmTheme) import/export
  - Chroma XML import/export
  - Multi-format theme support
Performance & Optimization
- Memory Efficiency
  - Iterator-based token streaming
  - Lazy regex compilation
  - Safe token iterator adapter (Crystal bug workaround)
- Performance Features
  - Token coalescing
  - Token splitting utilities
  - Token size limits
  - Streaming optimizations
  - Parallel lexing support
Advanced Features
- Token Processing
  - Token filtering
  - Token transformation pipelines
  - Token coalescing
  - Line splitting utilities
- Language Detection
  - Content-based analysis
  - Confidence scoring
  - Multi-lexer fallback
- Import/Export
  - TextMate (.tmTheme) theme support
  - Chroma XML stylesheet support
  - JSON theme serialization
  - Pygments compatibility
  - TextMate grammar support
  - Chroma XML lexer definitions (see #1)
Developer Experience
- API Design
  - Simple one-liner usage
  - Progressive complexity
  - Type-safe interfaces
- Documentation
  - Comprehensive examples
  - API documentation
  - Architecture overview
- Testing
  - Unit tests for all components
  - Integration tests
  - Performance benchmarks
  - Fuzzing tests
Known Issues & Future Work
- Crystal Compiler Bug Workarounds
  - Iterator memory corruption workaround (#2)
  - Remove workarounds when Crystal bug #14317 is fixed
- Performance Improvements
  - Re-enable token coalescing
  - Optimize large file handling
- Extended Language Support
  - XML lexer definitions support (Tartrazine compatibility)
  - Hybrid approach: hand-written + XML lexers
  - Code generation from XML definitions
Development
Running Tests
# Run tests without coverage
crystal spec
# Run tests with coverage (requires kcov)
./scripts/test.sh with-coverage
# Auto-detect kcov and run with coverage if available
./scripts/test.sh auto
Install kcov for coverage reports:
# Ubuntu/Debian
sudo apt-get install kcov
# macOS
brew install kcov
Development Commands
See CLAUDE.md for the complete list of development commands, including:
- crystal spec - Run all tests
- crystal run examples/00_quickstart.cr - Run examples
- crystal build src/obelisk.cr --no-codegen - Type check
- crystal tool format - Format code
Contributing
- Fork it (https://github.com/watzon/obelisk/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments