crystal-parsec
crystal-parsec
This is a parsing library for the Crystal language inspired by the Haskell library Parsec. It allows you to build parsers by combining smaller parsers.
It is based on the functional programming library crz which adds ADTs and monadic do notation to the language.
Quickstart
require "parsec"
include Parsec
Basics
A Parser(T)
is a parser that parses a value of type T.
p : Parser(T)
You can call .parse method on a parser with an input string to parse. It returns either a value of type T on successful parsing or a Parsec::ParseError
with an error message (return type T | ParseError
.
p.parse "some input string"
ParseError
has a .message field that gives a human readable error message.
Basic parsers
The simplest parser is a char parser that parses and returns a single character. It's type is Parser(Char)
p = char 'c'
p.parse("c") # => 'c'
p.parse("d").message # => "Expected character 'c', found 'd'"
string
parser parses a fixed string.
string("asdf").parse("asdf") # => "asdf"
string("asdf").parse("x") # => ParseError("Expected character 'a', found 'x'")
one_of
takes a string and returns a parser that matches one of the characters in the given string.
one_of("abcd").parse("a") # => 'a'
one_of("abcd").parse("c") # => 'c'
one_of("abcd").parse("2") # => ParseError
none_of
takes a string and returns a parser that matches all characters except the ones in the given string.
none_of(" \n\t").parse("a") # => 'a'
Parser.of
takes a value and returns a parser that consumes no input, and returns the given value unconditionally. What's the use of this parser? It's useful for sequencing as you'll see later.
Parser.of(2).parse("asdf") # => 2
Parser.of("asdf").parse("") # => "asdf"
fail
takes an error message and returns a parser that unconditionally fails using the given message. This is useful for error handling.
Combining parsers
Or combinator
The |
operator on parsers returns a parser that matches both the parser on it's LHS and RHS. It first tries the LHS, if it fails, then tries RHS.
digit = one_of "1234567890"
alphabet = one_of "qwertyuiopasdfghjklzxcvbnm"
alphanum = digit | alphabet
alphanum.parse("a") # => 'a'
alphanum.parse("1") # => '1'
Both sides of the |
operator should be of the same type. You cannot, for example combine a Parser(Char)
and a Parser(String)
.
many
It takes a parser and returns a parser that parses 0 or more instances of that parser. It's type is Parser(A) -> Parser(Array(A))
many(digit).parse("") # => []
many(digit).parse("1") # => ['1']
many(digit).parse("1234") # => ['1', '2', '3', '4']
many_1
It is like many, but it needs atleast one instance. It won't return an empty array.
sep_by
It takes two parsers. It parses an array of first parser, seperated by the second parser.
sep_by(digit, char(','))
.parse("1,2,3,4") # => ['1', '2', '3', '4']
Transforming the result of parsers
The .map method on parsers is used to transform the result of a given parser using a given block. It takes a block and returns a parser that parses using self
and passes the result through the block.
# take a char array and concatenate it's elements
# into a string
def concatenate(arr)
result = ""
arr.each do |char|
result += char.to_s
end
result
end
puts many_1(digit)
.map {|arr| concatenate(arr) } # concatenate
.map {|x| x.to_i } # convert to int
.parse("233") # => 233
Sequencing parsers
You can sequence multiple parsers using the bind method. The type signature of bind is (A -> Parser(B)) : Parser(B)
where Parser(A)
is the type of the parser it is called on. It takes the result of the parser it is called on, passes the result to the supplied block and returns the result of the block. It is like map but instead of taking a block that returns a value, it takes a block that returns another parser. This is used when you want to parse using multiple parsers in sequence possibly using the result of each step for the next.
alphabets = many_1(alphabet).map {|arr| concatenate(arr) }
digits = many_1(digit).map {|arr| concatenate(arr) }
.map {|x| x.to_i }
parser = alphabets.bind do |name|
digits.bind do |number|
Parser.of({name, number})
end
end
parser.parse("asdf23") # => {"asdf", 23}
When you want to sequence a lot of parsers, nested binds can become tedious and unreadable. To solve this problem, you can flatten out bind sequences using the crz macro mdo
. Using the macro, the previous parser would look like this
mdo({
name <= alphabets,
number <= digits,
Parser.of({name, number})
})
This is much more readable in sequences of multiple parsers. You can use regular assignments in mdo blocks too.
mdo({
name <= alphabets,
num_arr <= many_1(digit),
concatenated = concatenate(num_arr),
number = concatenated.to_i,
Parser.of({name, number})
}).parse("asdf123") # => {"asdf", 123}
Use <=
when the RHS is a parser and =
when it is a normal value.
Always make sure that the last line in an mdo block is wrapped in a parser using Parser.of.
Mutual recursion
If you have multiple parsers that depend on each other, you can wrap them in a block so that they can be referred to before definition.
def value_p
number_p | string_p | array_p
end
def array_p
mdo({
_ <= char('['),
arr <= sep_by(value_p, char(',')),
_ <= char(']')
Parser.of(arr)
})
end
For recursive and mutually recursive parsers, you may need to cast parsers to their type because type inference may not be able to infer their types.
For example, this is directly from the JSON example in the examples directory.
def json_p : Parser(JSON)
array_p | bool_p | null_p | jstring_p | number_p | object_p
end
def array_p
mdo({
_ <= char('['),
arr <= sep_by(json_p.as Parser(JSON), comma),
_ <= char(']'),
Parser.of(JArray.new(arr))
}).map {|a| a.as JSON }
end
Notice that even though return type of json_p is annotated, it still needs to be cast to Parser(JSON) when being used in array_p using .as method.
Other combinators
The >>
operator is used to sequence two parsers and discard the result of the left parser, returning the result of the right parser.
p = string("asdf") >> string("abcd")
p.parse("asdfabcd") # => "abcd"
The <<
operator sequences parsers but returns the result of the first parser, ignoring the result of second parser.
p = string("asdf") << string("abcd")
p.parse("asdfabcd") # => "asdf"
Custom parsers
You can create custom parsers by calling Parser.new
with block argument of type ParseState -> ParseResult(A)
where A is the result type of your parser. ParseState has two members, .input
and .offset
. input
is the input string and offset
is the current index into the input string. ParseResult is an abstract base class with two constructors, ParseState::Error(A).new "Error message"
, indicating parse error, and ParseState::Success.new(result, new_state)
indicating success. To create a new ParseState
from an existing state, you can call the .advance
method with an integer argument indicating the number of characters you want to advance by. If argument is not given, it advances by one character. advance
For example, here's the implementation of the char
function.
def char(c : Char) : Parser(Char)
Parser.new do |state|
if state.offset >= state.input.size
ParseResult::Error(Char).new "Expected '#{c}' found end of string"
elsif state.input[state.offset] == c
ParseResult::Success.new c, state.advance
else
ParseResult::Error(Char).new "Expected '#{c}' found '#{state.input[state.offset]}'"
end
end
end
Check out the json example for a partial JSON parser example.
crystal-parsec
- 0
- 0
- 0
- 0
- 0
- about 6 years ago
- September 13, 2018
MIT License
Thu, 21 Nov 2024 19:56:40 GMT