| Title: | Parser Combinator for R |
|---|---|
| Description: | Parser generator for R using combinatory parsers. It is inspired by combinatory parsers developed in Haskell. |
| Authors: | Chapman Siu |
| Maintainer: | Chapman Siu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-06-10 09:01:46 UTC |
| Source: | https://github.com/noraincheck/ramble |
%alt% is the infix notation for the alt function.
p1 %alt% p2
| p1 | the first parser |
|---|---|
| p2 | the second parser |
Returns the result of the first parser if it succeeds, otherwise the result of the second parser.
(item() %alt% succeed("2")) ("abcdef")
%then% is the infix operator for the then combinator.
p1 %then% p2
| p1 | the first parser |
|---|---|
| p2 | the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %then% succeed("123")) ("abc")
%thentree% is the infix operator for the thentree combinator, and it is the preferred way to use the thentree operator.
p1 %thentree% p2
| p1 | the first parser |
|---|---|
| p2 | the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %thentree% succeed("123")) ("abc")
%using% is the infix operator for using.
p %using% f
| p | is the parser to be applied |
|---|---|
| f | is the function to be applied to each result of p |
(item() %using% as.numeric) ("1abc")
Alpha checks for a single alphabetic character.
Alpha(...)
| ... | additional arguments for the primitives to be parsed |
|---|---|
See also: Digit, Lower, Upper, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Alpha()("abc")
AlphaNum checks for a single alphanumeric character.
AlphaNum(...)
| ... | additional arguments for the primitives to be parsed |
|---|---|
See also: Digit, Lower, Upper, Alpha, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
AlphaNum()("123")
AlphaNum()("abc123")
The alt combinator is similar to alternation in BNF. The parser (alt(p1, p2)) recognises anything that p1 or p2 would. The approach taken in this parser follows (Fairbairn86), in which either is interpreted in a sequential (or exclusive) manner, returning the result of the first parser to succeed, and failure if neither does.
%alt% is the infix notation for the alt function, and it is the preferred way to use the alt operator.
alt(p1, p2)
| p1 | the first parser |
|---|---|
| p2 | the second parser |
Returns the result of the first parser if it succeeds, otherwise the result of the second parser.
(item() %alt% succeed("2")) ("abcdef")
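The sequential reading of alternation described above can be sketched in base R. This is a minimal illustration, not the package's source: the `_sketch` names are hypothetical, and the representation of a result as `list(result, leftover)` is an assumption. Only the use of the empty list to signal failure comes from the documentation of item.

```r
# Assumed representation (not taken from the package source): a parser is a
# function from a string to list(result, leftover), or list() on failure.
item_sketch <- function() {
  function(string) {
    if (nchar(string) == 0) return(list())
    list(result = substr(string, 1, 1), leftover = substring(string, 2))
  }
}

# Always succeeds with a fixed value and consumes no input.
succeed_sketch <- function(value) {
  function(string) list(result = value, leftover = string)
}

# Sequential alternation: try p1 first, fall back to p2 only if p1 fails.
alt_sketch <- function(p1, p2) {
  function(string) {
    r <- p1(string)
    if (length(r) > 0) r else p2(string)
  }
}

p <- alt_sketch(item_sketch(), succeed_sketch("2"))
str(p("abcdef"))  # p1 succeeds: result "a", leftover "bcdef"
str(p(""))        # p1 fails on empty input: result "2", leftover ""
```

Because the second parser is consulted only after the first has failed, the order of the arguments matters: swapping them here would make the alternative always return "2".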
Digit checks for a single digit.
Digit(...)
| ... | additional arguments for the primitives to be parsed |
|---|---|
See also: Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Digit()("123")
ident is a parser which matches zero or more alphanumeric characters.
ident()
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, nat, space, token, identifier, natural, symbol
ident() ("variable1 = 123")
item is a parser that consumes the first character of the string and returns the rest. If it cannot consume a single character from the string, it will emit the empty list, indicating the parser has failed.
item(...)
| ... | additional arguments for the parser |
|---|---|
item() ("abc")
item() ("")
literal is a parser for single symbols. It will attempt to match the single symbol with the first character in the string.
literal(char)
| char | is the character to be matched |
|---|---|
literal("a") ("abc")
Lower checks for a single lower case character.
Lower(...)
| ... | additional arguments for the primitives to be parsed |
|---|---|
See also: Digit, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Lower() ("abc")
many matches 0 or more of pattern p. In BNF notation, repetition occurs often enough to merit its own abbreviation. When zero or more repetitions of a phrase p are admissible, we simply write p*. The many combinator corresponds directly to this operator, and is defined in much the same way.
This implementation of many differs from (Hutton92) due to the nature of R's data structures. R has no notion of a list of tuples, and all values in an R vector must share the same data type, so results are held in a list rather than a vector.
many(p)
| p | is the parser to match 0 or more times. |
|---|---|
Digit <- function(...) {satisfy(function(x) {return(grepl("[0-9]", x))})}
many(Digit()) ("123abc")
many(Digit()) ("abc")
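As a rough illustration of zero-or-more repetition with results collected in a list, the following base R sketch may help. It does not reproduce the package's implementation: `digit_sketch` and `many_sketch` are hypothetical names, and the `list(result, leftover)` shape is an assumption.

```r
# Hypothetical single-digit parser using the assumed list(result, leftover)
# shape; list() signals failure.
digit_sketch <- function(string) {
  ch <- substr(string, 1, 1)
  if (!grepl("^[0-9]$", ch)) return(list())
  list(result = ch, leftover = substring(string, 2))
}

# Apply p repeatedly, collecting each result in a list until p fails.
# Zero matches still counts as success, with the input left untouched.
# Note: this loops forever if p can succeed without consuming input.
many_sketch <- function(p) {
  function(string) {
    results <- list()
    repeat {
      r <- p(string)
      if (length(r) == 0) break
      results[[length(results) + 1]] <- r$result
      string <- r$leftover
    }
    list(result = results, leftover = string)
  }
}

digits <- many_sketch(digit_sketch)
str(digits("123abc"))  # result list("1", "2", "3"), leftover "abc"
str(digits("abc"))     # result list(), leftover "abc"
```

Holding the matches in a list rather than an atomic vector means results of different types can sit side by side without being coerced to a common type.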
maybe matches 0 or 1 of pattern p. In EBNF notation, this corresponds to a question mark ('?').
maybe(p)
| p | is the parser to be matched 0 or 1 times. |
|---|---|
maybe(Digit())("123abc")
maybe(Digit())("abc123")
nat is a parser which matches one or more numeric characters.
nat()
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, space, token, identifier, natural, symbol
nat() ("123 + 456")
natural creates a token parser for natural numbers.
natural(...)
| ... | additional arguments for the parser |
|---|---|
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, symbol
Ramble allows you to write parsers in a functional manner, inspired by Haskell's Parsec library.
satisfy is a function which allows us to make parsers that recognise single symbols.
satisfy(p)
| p | is the predicate to determine if the arbitrary symbol is a member. |
|---|---|
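To make the role of the predicate concrete, here is a minimal base R sketch of a predicate-driven single-character parser. The names `satisfy_sketch`, `is_vowel` and `vowel` are hypothetical, and the `list(result, leftover)` result shape is an assumption rather than something this page specifies.

```r
# Build a parser that accepts the first character only when pred() is TRUE.
# Assumed result shape: list(result, leftover); list() signals failure.
satisfy_sketch <- function(pred) {
  function(string) {
    if (nchar(string) == 0) return(list())
    ch <- substr(string, 1, 1)
    if (!isTRUE(pred(ch))) return(list())
    list(result = ch, leftover = substring(string, 2))
  }
}

# Hypothetical predicate: is the character a lower case vowel?
is_vowel <- function(x) grepl("^[aeiou]$", x)
vowel <- satisfy_sketch(is_vowel)

str(vowel("apple"))  # result "a", leftover "pple"
str(vowel("pear"))   # list(): the first character is not a vowel
```

Character-class parsers such as a digit or letter check can be written the same way by swapping the predicate.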
some matches 1 or more of pattern p. In BNF notation, repetition occurs often enough to merit its own abbreviation. When one or more repetitions of a phrase p are admissible, we simply write p+. The some combinator corresponds directly to this operator, and is defined in much the same way.
some(p)
| p | is the parser to match 1 or more times. |
|---|---|
Digit <- function(...) {satisfy(function(x) {return(grepl("[0-9]", x))})}
some(Digit()) ("123abc")
space matches zero or more space characters.
space()
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, token, identifier, natural, symbol
space() (" abc")
SpaceCheck checks for a single space character.
SpaceCheck(...)
| ... | additional arguments for the primitives to be parsed |
|---|---|
See also: Digit, Lower, Upper, Alpha, AlphaNum, String, ident, nat, space, token, identifier, natural, symbol
SpaceCheck()(" 123")
String is a combinator which allows us to build parsers which recognise strings of symbols, rather than just single symbols.
String(string)
| string | is the string to be matched |
|---|---|
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, ident, nat, space, token, identifier, natural, symbol
String("123")("123 abc")
succeed is based on the empty string symbol in BNF notation. The succeed parser always succeeds, without actually consuming any of the input string. Since the outcome of succeed does not depend on its input, its result value must be predetermined, so it is included as an extra parameter.
succeed(string)
| string | the result value of succeed parser |
|---|---|
succeed("1") ("abc")
symbol creates a token for a symbol.
symbol(xs)
| xs | takes in a string to create a token |
|---|---|
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural
symbol("[") (" [123]")
The then combinator corresponds to sequencing in BNF. The parser (then(p1, p2)) recognises anything that p1 and p2 would if placed in succession.
%then% is the infix operator for the then combinator, and it is the preferred way to use the then operator.
then(p1, p2)
| p1 | the first parser |
|---|---|
| p2 | the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %then% succeed("123")) ("abc")
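Sequencing can be sketched in base R as running the second parser on whatever the first one leaves behind. This is an illustration under assumptions, not the package's code: `char_sketch` and `then_sketch` are hypothetical names, results are assumed to be `list(result, leftover)`, and pairing the two results in a list is a choice made here for clarity, since the exact shape the package returns is not described on this page.

```r
# Hypothetical parser for one fixed character, using the assumed
# list(result, leftover) shape; list() signals failure.
char_sketch <- function(ch) {
  function(string) {
    if (!startsWith(string, ch)) return(list())
    list(result = ch, leftover = substring(string, nchar(ch) + 1))
  }
}

# Sequencing: run p1, then run p2 on p1's leftover. Fail if either fails.
then_sketch <- function(p1, p2) {
  function(string) {
    r1 <- p1(string)
    if (length(r1) == 0) return(list())
    r2 <- p2(r1$leftover)
    if (length(r2) == 0) return(list())
    list(result = list(r1$result, r2$result), leftover = r2$leftover)
  }
}

ab <- then_sketch(char_sketch("a"), char_sketch("b"))
str(ab("abc"))  # result list("a", "b"), leftover "c"
str(ab("acb"))  # list(): the second parser fails on "cb"
```

A failure in the second parser discards the first parser's result as well, so the sequence succeeds or fails as a whole.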
thentree keeps the full tree representation of the results of parsing. Otherwise, it is identical to then.
thentree(p1, p2)
| p1 | the first parser |
|---|---|
| p2 | the second parser |
Recognises anything that p1 and p2 would if placed in succession.
(item() %thentree% succeed("123")) ("abc")
token is a new primitive that ignores any space before and after applying a parser to a token.
token(p)
| p | is the parser to have spaces stripped. |
|---|---|
See also: Digit, Lower, Upper, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, identifier, natural, symbol
token(ident()) (" variable1 ")
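One way to picture the whitespace handling is the base R sketch below. It is not the package's implementation: `word_sketch`, `strip_leading` and `token_sketch` are hypothetical names, the `list(result, leftover)` shape is an assumption, and the word parser here requires at least one alphanumeric character, unlike ident as documented above.

```r
# Hypothetical word parser: one or more alphanumeric characters at the start.
# Assumed result shape: list(result, leftover); list() signals failure.
word_sketch <- function(string) {
  m <- regmatches(string, regexpr("^[A-Za-z0-9]+", string))
  if (length(m) == 0) return(list())
  list(result = m, leftover = substring(string, nchar(m) + 1))
}

# Drop any whitespace at the start of a string.
strip_leading <- function(string) sub("^[[:space:]]*", "", string)

# Skip whitespace, run p, then skip whitespace again in what remains.
token_sketch <- function(p) {
  function(string) {
    r <- p(strip_leading(string))
    if (length(r) == 0) return(list())
    r$leftover <- strip_leading(r$leftover)
    r
  }
}

tok <- token_sketch(word_sketch)
str(tok("  variable1  "))  # result "variable1", leftover ""
str(tok("  x1 = 2"))       # result "x1", leftover "= 2"
```

Stripping the trailing whitespace as well means the next token parser in a sequence starts directly on meaningful input.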
Unlist behaves like unlist, but does not recurse all the way down, so that element types are preserved. This function is not well optimised.
Unlist(obj)
| obj | is a list to be flattened |
|---|---|
Upper checks for a single upper case character.
Upper(...)
| ... | additional arguments for the primitives to be parsed |
|---|---|
See also: Digit, Lower, Alpha, AlphaNum, SpaceCheck, String, ident, nat, space, token, identifier, natural, symbol
Upper()("Abc")
The using combinator allows us to manipulate results from a parser, for example building a parse tree. The parser (p %using% f) has the same behaviour as the parser p, except that the function f is applied to each of its result values.
%using% is the infix operator for using, and it is the preferred way to use the using operator.
using(p, f)
| p | is the parser to be applied |
|---|---|
| f | is the function to be applied to each result of p |
The parser (p %using% f) has the same behaviour as the parser p, except that the function f is applied to each of its result values.
(item() %using% as.numeric) ("1abc")
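The idea of post-processing a parser's result can be sketched in base R as follows. The names `item_sketch` and `using_sketch` are hypothetical, and the `list(result, leftover)` shape is an assumption. The sketch shows the general technique rather than the package's own code.

```r
# Assumed representation: list(result, leftover) on success, list() on failure.
item_sketch <- function() {
  function(string) {
    if (nchar(string) == 0) return(list())
    list(result = substr(string, 1, 1), leftover = substring(string, 2))
  }
}

# Same behaviour as p, except that f is applied to the result on success.
# A failure passes through unchanged and f is never called.
using_sketch <- function(p, f) {
  function(string) {
    r <- p(string)
    if (length(r) == 0) return(list())
    list(result = f(r$result), leftover = r$leftover)
  }
}

num <- using_sketch(item_sketch(), as.numeric)
str(num("1abc"))  # result 1 (numeric), leftover "abc"
str(num(""))      # list(): item fails, so as.numeric is not applied
```

The leftover input is passed through untouched, so a transformed parser can be combined with other parsers exactly like the original. With `as.numeric`, a non-numeric first character would yield `NA` with a warning, so in practice the conversion is best paired with a parser that only accepts digits.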