lingo

parser generator


A parser generator for Crystal, inspired by Parslet.

Lingo provides text processing by:

  • parsing the string into a tree of nodes
  • providing a visitor that lets you turn that tree into application objects

Installation

Add this to your application's shard.yml:

dependencies:
  lingo:
    github: rmosolgo/lingo

Usage

Let's write a parser for highway names. The result will be a method for turning strings into useful objects:

def parse_road(input_str)
  ast = RoadParser.new.parse(input_str)
  visitor = RoadVisitor.new
  visitor.visit(ast)
  visitor.road
end

road = parse_road("I-5N")
# <Road @interstate=true, @number=5, @direction="N">

(See more examples in /examples.)

In the USA, we write highway names like this:

50    # Route 50
I-64  # Interstate 64
I-95N # Interstate 95, Northbound
29B   # Business Route 29

Parser

The general structure is {interstate?}{number}{direction?}{business?}. Let's express that with Lingo rules:

class RoadParser < Lingo::Parser
  # Match a string:
  rule(:interstate) { str("I-") }
  rule(:business) { str("B") }

  # Match a regex:
  rule(:digit) { match(/\d/) }
  # Express repetition with `.repeat`
  rule(:number) { digit.repeat }

  rule(:north) { str("N") }
  rule(:south) { str("S") }
  rule(:east) { str("E") }
  rule(:west) { str("W") }
  # Compose rules by name
  # Express alternation with |
  rule(:direction) { north | south | east | west }

  # Express sequence with >>
  # Express optionality with `.maybe`
  # Name matched strings with `.named`
  rule(:road_name) {
    interstate.named(:interstate).maybe >>
      number.named(:number) >>
      direction.named(:direction).maybe >>
      business.named(:business).maybe
  }
  # You MUST name a starting rule:
  root(:road_name)
end
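Because this particular grammar is flat, it can also be written as a single Crystal Regex, which is a handy cross-check of which strings RoadParser should accept. This is a comparison sketch only and not part of Lingo. ROAD_PATTERN and road_parts are hypothetical names, and the sketch assumes a successful parse must consume the whole string:

```crystal
# Comparison sketch only (not Lingo): the road grammar as one anchored Regex.
# Groups: {interstate?}{number}{direction?}{business?}
ROAD_PATTERN = /\A(I-)?(\d+)([NSEW])?(B)?\z/

# Hypothetical helper: returns the captured parts, or nil if the input
# is not a road name.
def road_parts(input : String)
  if m = ROAD_PATTERN.match(input)
    {
      interstate: !m[1]?.nil?, # group 1 only participates for "I-"
      number:     m[2].to_i,   # group 2 always matches when the regex does
      direction:  m[3]?,       # nil when no direction letter is present
      business:   !m[4]?.nil?,
    }
  end
end
```

The regex stays readable only because nothing here nests. For recursive formats such as the JSON parser benchmarked later in this README, named, composable rules are where a parser generator earns its keep.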

Applying the Parser

An instance of a Lingo::Parser subclass has a .parse method which returns a tree of Lingo::Nodes.

RoadParser.new.parse("250B") # => <Lingo::Node ... >

Parsing starts from the rule named by root.

Making Rules

These methods help you create rules:

  • str("string") matches string exactly
  • match(/[abc]/) matches the regex exactly
  • a | b matches a or b
  • a >> b matches a followed by b
  • a.maybe matches a or nothing
  • a.repeat matches a one or more times
  • a.repeat(0) matches a zero or more times
  • a.absent matches not-a
  • a.named(:a) names the result :a for handling by a visitor
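To make the semantics of these combinators concrete, here is a toy re-implementation in plain Crystal. It is hypothetical and is not Lingo's code: each rule only reports the position just after its match (no tree, no .named, no .absent), `|` is assumed to be ordered choice as in Parslet-style parsers, and OneOf stands in for match(/[...]/) to keep the sketch free of regex-anchoring details. The class names Rule, Str, OneOf, Alt, Seq, Maybe, Repeat and the helper full_match? are all made up for illustration:

```crystal
# Toy combinators for illustration only; NOT Lingo's implementation.
# A rule returns the position just after its match, or nil if it fails.
abstract class Rule
  abstract def match(input : String, pos : Int32) : Int32?

  # a | b: try a first, fall back to b (ordered choice, assumed here)
  def |(other : Rule) : Rule
    Alt.new(self, other)
  end

  # a >> b: a followed by b
  def >>(other : Rule) : Rule
    Seq.new(self, other)
  end

  # a.maybe: a or nothing
  def maybe : Rule
    Maybe.new(self)
  end

  # a.repeat: a at least `min` times (default 1); a.repeat(0) allows zero
  def repeat(min : Int32 = 1) : Rule
    Repeat.new(self, min)
  end
end

# Analogue of str("..."): matches the exact string at pos
class Str < Rule
  def initialize(@s : String)
  end

  def match(input : String, pos : Int32) : Int32?
    input[pos, @s.size]? == @s ? pos + @s.size : nil
  end
end

# Stand-in for match(/[...]/): one character from a fixed set
class OneOf < Rule
  def initialize(@chars : String)
  end

  def match(input : String, pos : Int32) : Int32?
    c = input[pos]?
    c && @chars.includes?(c) ? pos + 1 : nil
  end
end

class Alt < Rule
  def initialize(@a : Rule, @b : Rule)
  end

  def match(input : String, pos : Int32) : Int32?
    @a.match(input, pos) || @b.match(input, pos)
  end
end

class Seq < Rule
  def initialize(@a : Rule, @b : Rule)
  end

  def match(input : String, pos : Int32) : Int32?
    if mid = @a.match(input, pos)
      @b.match(input, mid)
    end
  end
end

class Maybe < Rule
  def initialize(@a : Rule)
  end

  def match(input : String, pos : Int32) : Int32?
    @a.match(input, pos) || pos
  end
end

class Repeat < Rule
  def initialize(@a : Rule, @min : Int32)
  end

  # Greedy: consume as many matches as possible, never backtracking
  def match(input : String, pos : Int32) : Int32?
    count = 0
    while nxt = @a.match(input, pos)
      break if nxt == pos # inner rule consumed nothing; avoid looping forever
      pos = nxt
      count += 1
    end
    count >= @min ? pos : nil
  end
end

# A parse succeeds only if the rule consumes the whole input
def full_match?(rule : Rule, input : String) : Bool
  rule.match(input, 0) == input.size
end

# The road grammar rebuilt from the toy combinators
digit = OneOf.new("0123456789")
direction = Str.new("N") | Str.new("S") | Str.new("E") | Str.new("W")
road_name = Str.new("I-").maybe >> digit.repeat >> direction.maybe >> Str.new("B").maybe
```

With these definitions, full_match?(road_name, "I-95N") is true, while "I-" fails because digit.repeat needs at least one digit.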

Visitor

After parsing, you get a tree of Lingo::Nodes. To turn that into an application object, write a visitor.

The visitor may define enter and exit hooks for nodes named with .named in the parser. It can set up state in #initialize, then reach that state through the visitor variable inside hooks.

class RoadVisitor < Lingo::Visitor
  # Set up an accumulator
  getter :road
  def initialize
    @road = Road.new
  end

  # When you find a named node, you can do something with it.
  # You can access the current visitor as `visitor`
  enter(:interstate) {
    # since we found this node, this is an interstate route
    visitor.road.interstate = true
  }

  # You can access the named Lingo::Node as `node`.
  # Get the matched string with `.full_value`
  enter(:number) {
    visitor.road.number = node.full_value.to_i
  }

  enter(:direction) {
    visitor.road.direction = node.full_value
  }

  enter(:business) {
    visitor.road.business = true
  }
end
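The visitor above fills in a Road object, but the README never shows that class. A minimal sketch, assuming Road is a plain class with one settable property per field the visitor assigns (the field names and types are inferred from the hooks, not taken from the project):

```crystal
# Hypothetical Road class: the README uses Road but never defines it.
# One property per field that RoadVisitor assigns.
class Road
  property interstate : Bool = false
  property business : Bool = false
  # nil until the :number / :direction hooks run
  property number : Int32? = nil
  property direction : String? = nil
end
```

number and direction are nilable because a hook only fires for parts that actually appear in the input; a plain "50" never triggers enter(:direction).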

Visitor Hooks

During the depth-first visitation of the resulting tree of Lingo::Nodes, you can handle visits to nodes named with .named:

  • enter(:match) is called when entering a node named :match
  • exit(:match) is called when exiting a node named :match

Within the hooks, you can access two magic variables:

  • visitor is the Visitor itself
  • node is the matched Lingo::Node which exposes:
    • #full_value: the full matched string
    • #line, #column: position information for this match
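The order in which enter and exit fire is easiest to see on a toy tree. The sketch below uses a hypothetical Node class (not Lingo::Node) and records the events a depth-first walk produces: enter before a node's children, exit after them, with unnamed nodes walked through silently. The tree shape is made up; Lingo's real trees may nest differently:

```crystal
# Hypothetical stand-in for Lingo::Node: a name (nil if the rule was not
# named with .named) plus child nodes. Not the real class.
class Node
  getter name : Symbol?
  getter children : Array(Node)

  def initialize(@name : Symbol?, @children : Array(Node) = [] of Node)
  end
end

# Depth-first walk: record "enter" before the children and "exit" after.
# Unnamed nodes are still descended into but record nothing.
def walk(node : Node, log : Array(String)) : Nil
  if name = node.name
    log << "enter #{name}"
  end
  node.children.each { |child| walk(child, log) }
  if name = node.name
    log << "exit #{name}"
  end
end
```

In a real visitor, each log entry would instead be the body of the matching enter(...) or exit(...) hook, run with node bound to the current node.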

About this Project

Goals

  • Low barrier to entry: easy-to-learn API, short zero-to-working time
  • Easy-to-read code, therefore easy-to-modify
  • Useful errors (not yet accomplished)

Non-goals

  • Blazing-fast performance
  • Theoretical correctness

TODO

  • Add some kind of debug output

How slow is it?

Let's compare the built-in JSON parser to a Lingo JSON parser:

./lingo/benchmark $ crystal run --release slow_json.cr
Stdlib JSON 126.45k (± 1.55%)        fastest
Lingo::JSON 660.18  (± 1.28%) 191.54× slower

Ouch, that's a lot slower.

But the gap is no worse than for Parslet, the Ruby library that inspired this project, which trails Ruby's built-in JSON parser by an even wider margin:

$ ruby parslet_json_benchmark.rb
Calculating -------------------------------------
       Parslet JSON      4.000  i/100ms
       Built-in JSON     3.657k i/100ms
-------------------------------------------------
       Parslet JSON      45.788  (± 4.4%) i/s -    232.000
       Built-in JSON     38.285k (± 5.3%) i/s -    193.821k

Comparison:
       Built-in JSON:    38285.2 i/s
       Parslet JSON :       45.8 i/s - 836.13x slower

Both Parslet and Lingo are slower than handwritten parsers, but they're easier to write!

Development

  • Run the tests with crystal spec
  • Install Ruby & guard, then start a watcher with guard

License

MIT License
