In last week's article, we introduced the Applicative parsing library. We learned about the RE type and the basic combinators like sym and string. We saw how we could combine those together with applicative functions like many and <*> to parse strings into data structures. This week, we'll put these pieces together in an actual parser for our Gherkin syntax. To follow along with the code examples, check out Parser.hs on the GitHub repository.
Starting next week, we'll explore some other parsing libraries, starting with Attoparsec. For a little more information about those and many other libraries, download our Production Checklist! It summarizes many libraries on topics from databases to Web APIs.
If you've never written Haskell at all, get started! Download our free Beginners Checklist!
Value Parser
In keeping with our approach from the last article, we're going to start with smaller elements of our syntax. Then we can use these to build larger ones with ease. To that end, let's build a parser for our Value type, the most basic data structure in our syntax. Let's recall what that looks like:
data Value =
  ValueNull |
  ValueBool Bool |
  ValueString String |
  ValueNumber Scientific
Since we have different constructors, we'll make a parser for each one. Then we can combine them with alternative syntax:
valueParser :: RE Char Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser
Now our parsers for the null values and boolean values are easy. For each of them, we'll give a few different options about what strings we can use to represent those elements. Then, as with the larger parser, we'll combine them with <|>.
nullParser :: RE Char Value
nullParser =
  (string "null" <|> string "NULL" <|> string "Null") *> pure ValueNull
boolParser :: RE Char Value
boolParser =
  trueParser *> pure (ValueBool True) <|>
  falseParser *> pure (ValueBool False)
  where
    trueParser = string "True" <|> string "true" <|> string "TRUE"
    falseParser = string "False" <|> string "false" <|> string "FALSE"
Notice in both these cases we discard the actual string with *> and then return our constructor. We have to wrap the desired result with pure.
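To see these two parsers in action, here is a minimal sketch that runs them with the library's match function. It assumes the regex-applicative package is installed, uses a cut-down Value type with only two constructors, and introduces a hypothetical helper named parseSimpleValue that is not part of the article's code.

```haskell
import Control.Applicative ((<|>))
import Text.Regex.Applicative (RE, match, string)

-- Simplified stand-in for the article's Value type (only two constructors).
data Value = ValueNull | ValueBool Bool
  deriving (Show, Eq)

nullParser :: RE Char Value
nullParser =
  (string "null" <|> string "NULL" <|> string "Null") *> pure ValueNull

boolParser :: RE Char Value
boolParser =
  trueParser *> pure (ValueBool True) <|>
  falseParser *> pure (ValueBool False)
  where
    trueParser = string "True" <|> string "true" <|> string "TRUE"
    falseParser = string "False" <|> string "false" <|> string "FALSE"

-- Hypothetical helper: match succeeds only if the whole input is consumed.
parseSimpleValue :: String -> Maybe Value
parseSimpleValue = match (nullParser <|> boolParser)
```

With these definitions, parseSimpleValue returns a Just for any of the six accepted spellings and Nothing for anything else, because match requires the parser to consume the entire input.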
Number and String Values
Numbers and strings are a little more complicated since we can't rely on hard-coded formats. In the case of numbers, we'll account for integers, decimals, and negative numbers. We'll ignore scientific notation for now. An integer is simple to parse, since we'll have many characters that are all numbers. We use some instead of many to enforce that there is at least one:
numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
A decimal parser will read some numbers, then a decimal point, and then more numbers. We'll insist there is at least one number after the decimal point.
numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
    decimalParser =
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)
Finally, for negative numbers, we'll read a negative symbol and then one of the other parsers:
numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
    decimalParser =
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)
    negativeParser = sym '-' <*> (decimalParser <|> integerParser)
However, we can't combine these parsers as is! Right now, they all return different results! The integer parser returns a single string. The decimal parser returns two strings and the decimal character, and so on. In general, we'll want to combine each parser's results into a single string and then pass them to the read function. This requires mapping a couple of functions over our last two parsers:
numberParser :: RE Char Value
numberParser = ...
  where
    integerParser = some (psym isNumber)
    decimalParser = combineDecimal <$>
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)
    negativeParser = (:) <$>
      sym '-' <*> (decimalParser <|> integerParser)
    combineDecimal :: String -> Char -> String -> String
    combineDecimal base point decimal = base ++ (point : decimal)
Now all our number parsers return strings, so we can safely combine them. We'll map the ValueNumber constructor over the value we read from the string.
numberParser :: RE Char Value
numberParser = (ValueNumber . read) <$>
  (negativeParser <|> decimalParser <|> integerParser)
  where ...
Note that order matters! If we put the integer parser first, we'll be in trouble! If we encounter a decimal, the integer parser will greedily succeed and parse everything before the decimal point. We'll either lose all the information after the decimal, or worse, have a parse failure.
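The pieces of the number parser can be assembled into a small runnable sketch. It assumes the regex-applicative package is installed. To keep the dependencies down, it reads the matched text into a Double rather than a Scientific, and the names numberString and parseNumber are hypothetical helpers introduced only for this illustration.

```haskell
import Control.Applicative (many, some, (<|>))
import Data.Char (isNumber)
import Text.Regex.Applicative (RE, match, psym, sym)

-- Every alternative yields the matched text as a single String.
numberString :: RE Char String
numberString = negativeParser <|> decimalParser <|> integerParser
  where
    integerParser = some (psym isNumber)
    decimalParser = combineDecimal <$>
      many (psym isNumber) <*> sym '.' <*> some (psym isNumber)
    negativeParser = (:) <$>
      sym '-' <*> (decimalParser <|> integerParser)

-- Glue the two digit groups back together around the decimal point.
combineDecimal :: String -> Char -> String -> String
combineDecimal base point decimal = base ++ (point : decimal)

-- Assumption: Double stands in for Scientific in this sketch.
parseNumber :: String -> Maybe Double
parseNumber input = read <$> match numberString input
```

Inputs such as "3.14" and "-42" come back as Just values, while "12a" gives Nothing. One caveat of this sketch: because the decimal parser uses many before the point, an input like ".5" matches the regex, but read rejects that text for Double, so a sturdier version would pad or reject such input.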
The last thing we need to do is read a string. We need to read everything in the example cell until we hit a vertical bar, but then ignore any whitespace. Luckily, we have the right combinator for this, and we've even written a trim function already!
stringParser :: RE Char Value
stringParser = (ValueString . trim) <$> readUntilBar
And now our valueParser will work as expected!
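The trim helper comes from the earlier article and its definition is not repeated here. As a reminder of what such a function does, here is one possible definition using only base; this is an assumption for illustration and may differ from the version in the repository.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- One possible trim: strip whitespace from both ends, keep interior spaces.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace
```

Applied to the contents of a table cell, a function like this removes the padding around the text while leaving spaces between words untouched.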
Building an Example Table
Now that we can parse individual values, let's figure out how to parse the full example table. We can use our individual value parser to parse a whole line of values! The first step is to read the vertical bar at the start of the line.
exampleLineParser :: RE Char [Value]
exampleLineParser = sym '|' *> ...
Next, we'll build a parser for each cell. It will read the whitespace, then the value, and then read up through the next bar.
exampleLineParser :: RE Char [Value]
exampleLineParser = sym '|' *> ...
  where
    cellParser =
      many isNonNewlineSpace *> valueParser <* readThroughBar
isNonNewlineSpace :: RE Char Char
isNonNewlineSpace = psym (\c -> isSpace c && c /= '\n')
Now we read many of these and finish by reading the newline:
exampleLineParser :: RE Char [Value]
exampleLineParser =
  sym '|' *> many cellParser <* readThroughEndOfLine
  where
    cellParser =
      many isNonNewlineSpace *> valueParser <* readThroughBar
Now, we need a similar parser that reads the title column of our examples. This will have the same structure as the value cells, only it will read normal alphabetic strings instead of values.
exampleColumnTitleLineParser :: RE Char [String]
exampleColumnTitleLineParser =
  sym '|' *> many cellParser <* readThroughEndOfLine
  where
    cellParser =
      many isNonNewlineSpace *> many (psym isAlpha) <* readThroughBar
Now we can start building the full example parser. We'll want to read the string, the column titles, and then the value lines.
exampleTableParser :: RE Char ExampleTable
exampleTableParser =
  (string "Examples:" *> readThroughEndOfLine) *>
  exampleColumnTitleLineParser <*>
  many exampleLineParser
We're not quite done yet. We'll need to apply a function over these results that will produce the final ExampleTable. And the trick is that we want to match up the example keys with their values. We can accomplish this with a simple function. It will zip the keys over each value list using map:
exampleTableParser :: RE Char ExampleTable
exampleTableParser = buildExampleTable <$>
  (string "Examples:" *> readThroughEndOfLine *> exampleColumnTitleLineParser) <*>
  many exampleLineParser
  where
    buildExampleTable :: [String] -> [[Value]] -> ExampleTable
    buildExampleTable keys valueLists = ExampleTable keys (map (zip keys) valueLists)
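Since buildExampleTable is a pure function, it can be tried out without any parser. The sketch below uses only base. The ExampleTable shape is an assumption inferred from how the constructor is applied above, the Value type is cut down to three constructors, and sampleTable is a made-up input.

```haskell
-- Simplified stand-in for the article's Value type.
data Value = ValueNull | ValueBool Bool | ValueString String
  deriving (Show, Eq)

-- Assumed shape: the column keys, then one key/value row per example line.
data ExampleTable = ExampleTable [String] [[(String, Value)]]
  deriving (Show, Eq)

-- Pair every row of values with the column keys, position by position.
buildExampleTable :: [String] -> [[Value]] -> ExampleTable
buildExampleTable keys valueLists =
  ExampleTable keys (map (zip keys) valueLists)

-- Hypothetical input: two columns and two example rows.
sampleTable :: ExampleTable
sampleTable = buildExampleTable ["username", "active"]
  [ [ValueString "john", ValueBool True]
  , [ValueString "jane", ValueNull]
  ]
```

Each row in sampleTable ends up as a list of pairs such as ("username", ValueString "john"). Note that zip silently drops extras if a row has more or fewer cells than there are keys, so a stricter version might want to check the lengths.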
Statements
Now that we can parse the examples for a given scenario, we need to parse the Gherkin statements. To start with, let's make a generic parser that takes the keyword as an argument. Then our full parser will try each of the different statement keywords:
parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = ...
parseStatement :: RE Char Statement
parseStatement =
  parseStatementLine "Given" <|>
  parseStatementLine "When" <|>
  parseStatementLine "Then" <|>
  parseStatementLine "And"
Now we'll get the signal word out of the way and parse the statement line itself.
parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *> ...
Parsing the statement is tricky. We want to parse the keys inside brackets and separate them as keys. But we also want them as part of the statement's string. To that end, we'll make two helper parsers. First, nonBrackets will parse everything in a string up to (but not including) a bracket or a newline.
nonBrackets :: RE Char String
nonBrackets = many (psym (\c -> c /= '\n' && c /= '<'))
We'll also want a parser that parses the brackets and returns the keyword inside:
insideBrackets :: RE Char String
insideBrackets = sym '<' *> many (psym (/= '>')) <* sym '>'
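Here is a small sketch of how these two helpers behave on their own and when paired up. It assumes the regex-applicative package is installed; the name segment is a hypothetical helper used only for this illustration.

```haskell
import Control.Applicative (many)
import Text.Regex.Applicative (RE, match, psym, sym)

-- Everything up to (not including) the next '<' or newline; may be empty.
nonBrackets :: RE Char String
nonBrackets = many (psym (\c -> c /= '\n' && c /= '<'))

-- The text between '<' and '>', with the brackets themselves dropped.
insideBrackets :: RE Char String
insideBrackets = sym '<' *> many (psym (/= '>')) <* sym '>'

-- Hypothetical helper: one stretch of plain text followed by one key.
segment :: RE Char (String, String)
segment = (,) <$> nonBrackets <*> insideBrackets
```

Running match segment on "enter the <username>" yields the pair of the leading text and the key, while match insideBrackets on a string without brackets yields Nothing.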
Now to read a statement, we start with non-brackets, and alternate with keys in brackets. Let's observe that we start and end with non-brackets, since they can be empty. Thus we can represent a line as a list of non-bracket/bracket pairs, followed by a last non-bracket part. To make a pair, we combine the parser results in a tuple using the (,) tuple constructor:
parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *>
  many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets
From here, we need a recursive function that will build up our final statement string and the list of keys. We do this with buildStatement.
parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *>
  (buildStatement <$>
    many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets)
  where
    buildStatement :: [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)
The last thing we need is a final helper that will take the result of buildStatement and turn it into a Statement. We'll call this finalizeStatement and map it over the result of buildStatement, and then we're done!
parseStatementLine :: String -> RE Char Statement
parseStatementLine signal = string signal *> sym ' ' *>
  (finalizeStatement <$>
    (buildStatement <$>
      many ((,) <$> nonBrackets <*> insideBrackets) <*> nonBrackets))
  where
    buildStatement :: [(String, String)] -> String -> (String, [String])
    buildStatement [] last = (last, [])
    buildStatement ((str, key) : rest) rem =
      let (str', keys) = buildStatement rest rem
      in (str <> "<" <> key <> ">" <> str', key : keys)
    finalizeStatement :: (String, [String]) -> Statement
    finalizeStatement (regex, variables) = Statement regex variables
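Because buildStatement and finalizeStatement are pure, the recursion can be traced on a hand-written input without running any parser. The sketch below uses only base. The Statement shape is an assumption inferred from the constructor call above, the second argument is renamed to lastPart to avoid shadowing Prelude names, and sampleStatement is a made-up input.

```haskell
-- Assumed shape: the full statement text, then the keys found in brackets.
data Statement = Statement String [String]
  deriving (Show, Eq)

-- Rebuild the statement text and collect the keys in order of appearance.
buildStatement :: [(String, String)] -> String -> (String, [String])
buildStatement [] lastPart = (lastPart, [])
buildStatement ((str, key) : rest) lastPart =
  let (str', keys) = buildStatement rest lastPart
  in (str <> "<" <> key <> ">" <> str', key : keys)

finalizeStatement :: (String, [String]) -> Statement
finalizeStatement (regex, variables) = Statement regex variables

-- Hypothetical parser output for "I enter <username> and <password> to log in".
sampleStatement :: Statement
sampleStatement = finalizeStatement
  (buildStatement [("I enter ", "username"), (" and ", "password")] " to log in")
```

The brackets that insideBrackets stripped away are put back into the text, so the statement string reads the same as the input line, while the keys are also available as a separate list.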
Scenarios
Now that we have all our pieces in place, it's quite easy to write the parser for a scenario! First we get the title by reading the keyword and then the rest of the line:
scenarioParser :: RE Char Scenario
scenarioParser = string "Scenario: " *> readThroughEndOfLine ...
After that, we read many statements, and then the example table. Since the example table might not exist, we'll provide an alternative that is a pure, empty table. We can wrap everything together by mapping the Scenario constructor over it.
scenarioParser :: RE Char Scenario
scenarioParser = Scenario <$>
  (string "Scenario: " *> readThroughEndOfLine) <*>
  many (parseStatement <* sym '\n') <*>
  (exampleTableParser <|> pure (ExampleTable [] []))
We can also make a "Background" parser that is very similar. All that changes is that we read the string "Background" instead of a title. Since we'll hard-code the title as "Background", we can include it with the constructor and map it over the parser.
backgroundParser :: RE Char Scenario
backgroundParser = Scenario "Background" <$>
  (string "Background:" *> readThroughEndOfLine *>
    many (parseStatement <* sym '\n')) <*>
  (exampleTableParser <|> pure (ExampleTable [] []))
Finally the Feature
We're almost done! All we have left is to write the featureParser itself! As with scenarios, we'll start with the keyword and a title line:
featureParser :: RE Char Feature
featureParser = Feature <$>
  (string "Feature: " *> readThroughEndOfLine) <*>
  ...
Now we'll use the optional combinator to parse the Background if it exists, but return Nothing if it doesn't. Then we'll wrap up with parsing many scenarios!
featureParser :: RE Char Feature
featureParser = Feature <$>
  (string "Feature: " *> readThroughEndOfLine) <*>
  (optional backgroundParser) <*>
  (many scenarioParser)
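The scenario and feature parsers handle a missing section in two different ways: a pure fallback for the example table and the optional combinator for the background. The sketch below contrasts the two on a deliberately tiny, hypothetical format (an optional "tag:" prefix before a word). It assumes the regex-applicative package is installed, and none of these names come from the article's code.

```haskell
import Control.Applicative (optional, some, (<|>))
import Data.Char (isAlpha)
import Text.Regex.Applicative (RE, match, psym, sym)

-- One or more alphabetic characters.
word :: RE Char String
word = some (psym isAlpha)

-- Hypothetical optional section: a word followed by a colon.
tagPrefix :: RE Char String
tagPrefix = word <* sym ':'

-- optional wraps the result in Maybe, giving Nothing when the prefix is absent.
withOptional :: RE Char (Maybe String, String)
withOptional = (,) <$> optional tagPrefix <*> word

-- A pure fallback keeps the result type unchanged and supplies a default.
withDefault :: RE Char (String, String)
withDefault = (,) <$> (tagPrefix <|> pure "none") <*> word
```

The choice between the two mostly depends on the target data type: optional suits a field that is a Maybe, as with the feature's background, while a pure fallback suits a field that always holds a value, as with the scenario's example table.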
Note that here we're ignoring the "description" of a feature we proposed as part of our original syntax. Since there are no keywords for that, it turns out to be painful to deal with it using applicative parsing. When we look at monadic approaches starting next week, we'll see it isn't as hard there.
Conclusion
This wraps up our exploration of applicative parsing. We can see how well suited Haskell is for parsing. The functional nature of the language means it's easy to start with small building blocks like our first parsers. Then we can gradually combine them to make something larger. It can be a little tricky to wrap our heads around all the different operators and combinators. But once you understand the ways in which these let us combine our parsers, they make a lot of sense and are easy to use.
To further your knowledge of useful Haskell libraries, download our free Production Checklist! It will tell you about libraries for many tasks, from databases to machine learning!
If you've never written a line of Haskell before, never fear! Download our Beginners Checklist to learn more!
Applicative Parsing II: Putting the Pieces Together was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.