Last week, we took a step into the monadic world of parsing by learning about the Attoparsec library. It provided us with a clearer syntax to work with compared to applicative parsing. This week, we'll explore one final library: Megaparsec.
This library has a lot in common with Attoparsec. In fact, the two have a lot of compatibility by design. Ultimately, we'll find that we don't need to change our syntax a whole lot. But Megaparsec does have a few extra features that can make our lives simpler.
To follow the code examples here, head to the megaparsec branch on GitHub! To learn about more awesome libraries you can use in production, make sure to download our Production Checklist! But never fear if you're new to Haskell! Just take a look at our Beginners Checklist and you'll know where to get started!
A Different Parser Type
To start out, the basic parsing type for Megaparsec is a little more complicated. It has two type parameters, e and s, and also comes with a built-in monad transformer ParsecT.
data ParsecT e s m a
type Parsec e s = ParsecT e s Identity
The e type allows us to provide some custom error data to our parser. The s type refers to the input type of our parser, typically some variant of String. This parameter also exists under the hood in Attoparsec. But we sidestepped that issue by using the Text module. For now, we'll set up our own type alias that will sweep these parameters under the rug:
type MParser = Parsec Void Text
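To make the alias concrete, here is a minimal sketch of a module that defines it and uses it for a trivial parser. The name wordParser is hypothetical and only serves as an illustration; the imports assume the megaparsec and text packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, some)
import Text.Megaparsec.Char (letterChar)

-- Void: we have no custom error data. Text: our input type.
type MParser = Parsec Void Text

-- Hypothetical example parser: one or more letters.
wordParser :: MParser String
wordParser = some letterChar
```

In GHCi, parseMaybe wordParser "hello" runs the parser over the whole input and returns a Maybe, which is handy for quick experiments.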
Trying our Hardest
Let's start filling in our parsers. There's one structural difference between Attoparsec and Megaparsec. When a parser fails in Attoparsec, its default behavior is to backtrack. This means it acts as though it consumed no input. This is not the case in Megaparsec! A naive attempt to repeat our nullParser code could fail in some ways:
nullParser :: MParser Value
nullParser = nullWordParser >> return ValueNull
  where
    nullWordParser = string "Null" <|> string "NULL" <|> string "null"
Suppose we get the input "NULL" for this parser. Our program will attempt to select the first parser, which will parse the N token. Then it will fail on U. It will move on to the second parser, but it will have already consumed the N! Thus the second and third parser will both fail as well!
We get around this issue by using the try combinator. Using try gives us the Attoparsec behavior of backtracking if our parser fails. The following will work without issue:
nullParser :: MParser Value
nullParser = nullWordParser >> return ValueNull
  where
    nullWordParser = try (string "Null") <|> try (string "NULL") <|> try (string "null")
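The effect of try is easiest to see with alternatives built from several smaller steps. The sketch below is not from our feature parser; noTry and withTry are hypothetical names, and both branches share the prefix 'a'. It uses sequences of char because, depending on the Megaparsec version, string itself may already backtrack automatically, in which case the wrappers above are harmless but not strictly required.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, try, (<|>))
import Text.Megaparsec.Char (char)

type MParser = Parsec Void Text

-- Without try: once the first branch has consumed 'a' and then failed,
-- the second branch is never attempted.
noTry :: MParser Char
noTry = (char 'a' *> char 'b') <|> (char 'a' *> char 'c')

-- With try: a failure in the first branch rewinds the input,
-- so the second branch starts again from 'a'.
withTry :: MParser Char
withTry = try (char 'a' *> char 'b') <|> (char 'a' *> char 'c')
```

On the input "ac", noTry fails while withTry succeeds. Both accept "ab", since the first branch matches outright.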
Even better, Megaparsec also has a convenience function string' for case-insensitive parsing. So our null and boolean parsers become even simpler:
nullParser :: MParser Value
nullParser = M.string' "null" >> return ValueNull

boolParser :: MParser Value
boolParser =
  (trueParser >> return (ValueBool True)) <|>
  (falseParser >> return (ValueBool False))
  where
    trueParser = M.string' "true"
    falseParser = M.string' "false"
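As a quick check of how string' behaves, here is a small standalone sketch. The name nullWord is hypothetical. One detail worth noting is that string' returns the text as it actually appeared in the input, not the pattern we passed in.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (string')

type MParser = Parsec Void Text

-- Matches "null" in any casing and returns the matched input.
nullWord :: MParser Text
nullWord = string' "null"
```

So parseMaybe nullWord "NULL" and parseMaybe nullWord "Null" both succeed, which covers all three spellings we handled by hand earlier.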
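Unlike with Attoparsec, we won't lean on a ready-made parser for scientific numbers here (Megaparsec keeps its numeric helpers in the separate Text.Megaparsec.Char.Lexer module). Instead, we'll go back to our logic from applicative parsing, only this time with monadic syntax.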
numberParser :: MParser Value
numberParser = (ValueNumber . read) <$>
  (negativeParser <|> decimalParser <|> integerParser)
  where
    integerParser :: MParser String
    integerParser = M.try (some M.digitChar)

    decimalParser :: MParser String
    decimalParser = M.try $ do
      front <- many M.digitChar
      M.char '.'
      back <- some M.digitChar
      return $ front ++ ('.' : back)

    negativeParser :: MParser String
    negativeParser = M.try $ do
      M.char '-'
      num <- decimalParser <|> integerParser
      return $ '-' : num
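For comparison, here is a hedged sketch of the same idea using the helpers in Text.Megaparsec.Char.Lexer instead of read. This is an alternative, not the approach used in our feature parser, and the name number is hypothetical. It assumes a Megaparsec version (6 or later) where float, decimal and signed have the polymorphic types used here.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe, try, (<|>))
import qualified Text.Megaparsec.Char.Lexer as L

type MParser = Parsec Void Text

-- An optionally signed number. We try the floating point form first and
-- fall back to a plain integer, converted to Double.
number :: MParser Double
number = L.signed (pure ()) (try L.float <|> integerAsDouble)
  where
    integerAsDouble :: MParser Double
    integerAsDouble = fromIntegral <$> (L.decimal :: MParser Integer)
```

The behavior is not identical to the hand-written version: this sketch also accepts a leading '+', and it requires at least one digit before the decimal point.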
Notice that each of these parsers uses try to allow proper backtracking. For parsing strings, we'll use the satisfy combinator to read everything up until a bar or newline:
stringParser :: MParser Value
stringParser = (ValueString . trim) <$>
  many (M.satisfy (not . barOrNewline))
And then filling in our value parser is as easy as it was before:
valueParser :: MParser Value
valueParser =
  nullParser <|>
  boolParser <|>
  numberParser <|>
  stringParser
Filling in the Details
Aside from some trivial alterations, nothing changes about how we parse example tables. The Statement parser requires adding in another try call when we're grabbing our pairs:
parseStatementLine :: Text -> MParser Statement
parseStatementLine signal = do
  M.string signal
  M.char ' '
  pairs <- many $ M.try ((,) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let (fullString, keys) = buildStatement pairs finalString
  return $ Statement fullString keys
  where
    buildStatement = ...
Otherwise, we'll fail on any case where we don't use any keywords in the statement! But it's otherwise the same. Of course, we also need to change how we call our parser in the first place. We'll use the runParser function instead of Attoparsec's parseOnly. This takes an extra argument, the name of the source file, which comes before the input and is used to provide better messages.
parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  ...
  case runParser featureParser inputFile finalString of
    Left s -> error (show s)
    Right feature -> return feature
But nothing else changes in the structure of our parsers. It's very easy to take Attoparsec code and Megaparsec code and re-use it with the other library!
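Since the argument order of runParser is easy to mix up, here is a small pure sketch of the call. The names nullLine and parseNamed are hypothetical, and the use of errorBundlePretty assumes Megaparsec 7 or later (older versions expose a different error type and rendering function).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, errorBundlePretty, runParser)
import Text.Megaparsec.Char (string')

type MParser = Parsec Void Text

-- Hypothetical parser: the whole input must be the word "null".
nullLine :: MParser ()
nullLine = () <$ string' "null" <* eof

-- runParser takes the parser, then the source name, then the input.
-- The source name is only used for error reporting.
parseNamed :: String -> Text -> Either String ()
parseNamed sourceName input =
  case runParser nullLine sourceName input of
    Left bundle -> Left (errorBundlePretty bundle)
    Right ()    -> Right ()
```

For example, parseNamed "example.feature" "NULL" succeeds, while a non-matching input yields a Left holding a human-readable error message.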
Adding Some State
One bonus we do get from Megaparsec is that its monad transformer makes it easier for us to use other monadic functionality. Our parser for statement lines has always been a little bit clunky. Let's clean it up a little bit by allowing ourselves to store a list of strings as a state object. Here's how we'll change our parser type:
type MParser = ParsecT Void Text (State [String])
Now whenever we parse a key using our brackets parser, we can append that key to our existing list using modify. We'll also return the brackets along with the string instead of merely the keyword:
insideBrackets :: MParser String
insideBrackets = do
  M.char '<'
  key <- many M.letterChar
  M.char '>'
  modify (++ [key]) -- Store the key in the state!
  return $ ('<' : key) ++ ['>']
Now instead of forming tuples, we can concatenate the strings we parse!
parseStatementLine :: Text -> MParser Statement
parseStatementLine signal = do
  M.string signal
  M.char ' '
  pairs <- many $ M.try ((++) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let fullString = concat pairs ++ finalString
  ...
And now how do we get our final list of keys? Simple! We get our state value, reset it, and return everything. No need for our messy buildStatement function!
parseStatementLine :: Text -> MParser Statement
parseStatementLine signal = do
  M.string signal
  M.char ' '
  pairs <- many $ M.try ((++) <$> nonBrackets <*> insideBrackets)
  finalString <- nonBrackets
  let fullString = concat pairs ++ finalString
  keys <- get
  put []
  return $ Statement fullString keys
When we run this parser at the start, we now have to use runParserT instead of runParser. This returns us an action in the State monad, meaning we have to use evalState to get our final result:
parseFeatureFromFile :: FilePath -> IO Feature
parseFeatureFromFile inputFile = do
  ...
  case evalState (stateAction finalString) [] of
    Left s -> error (show s)
    Right feature -> return feature
  where
    stateAction s = runParserT featureParser inputFile s
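The state-carrying approach can be tried in isolation with the sketch below. It is a simplified stand-in for our statement parser: SParser, keyParser and collectKeys are hypothetical names, and State comes from the mtl package. Megaparsec is imported with an explicit list because Text.Megaparsec exports its own State type, which would otherwise clash with the one from Control.Monad.State.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.State (State, modify, runState)
import Data.Text (Text)
import Data.Void (Void)
import Text.Megaparsec (ParsecT, eof, many, runParserT, some)
import Text.Megaparsec.Char (char, letterChar)

-- Parser stacked on top of a State monad holding the keys seen so far.
type SParser = ParsecT Void Text (State [String])

-- Parses "<key>" and records the key in the state.
keyParser :: SParser String
keyParser = do
  _ <- char '<'
  key <- some letterChar
  _ <- char '>'
  modify (++ [key])
  return key

-- Runs the parser over the whole input. Returns whether parsing
-- succeeded, together with the keys accumulated in the state.
collectKeys :: Text -> (Bool, [String])
collectKeys input =
  let parser = many keyParser <* eof
      (result, keys) = runState (runParserT parser "<input>" input) []
  in (either (const False) (const True) result, keys)
```

One caveat with this stacking order: because State sits underneath ParsecT, state changes are not undone when the parser backtracks. That is why the sketch, like insideBrackets above, only calls modify after the closing bracket has been parsed.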
Bonuses of Megaparsec
As a last bonus, let's look at error messages in Megaparsec. When we have errors in Attoparsec, the parseOnly function gives us an error string. But it's not that helpful. All it tells us is what individual parser on the inside of our system failed:
>> parseOnly nullParser "true"
Left "string"
>> parseOnly numberParser "hello"
Left "Failed reading: takeWhile1"
These messages don't tell us where within the input it failed, or what we expected instead. Let's compare this to Megaparsec and runParser:
>> runParser nullParser "true" ""
Left (TrivialError (SourcePos {sourceName = "true", sourceLine = Pos 1, sourceColumn = Pos 1} :| []) (Just EndOfInput) (fromList [Tokens ('n' :| "ull")]))
>> runParser numberParser "hello" ""
Left (TrivialError (SourcePos {sourceName = "hello", sourceLine = Pos 1, sourceColumn = Pos 1} :| []) (Just EndOfInput) (fromList [Tokens ('-' :| ""),Tokens ('.' :| ""),Label ('d' :| "igit")]))
This gives us a lot more information! We can see the source name we passed in, the exact position where parsing failed, what the parser found there, and the set of tokens it expected instead. Note that runParser takes the source name before the input, so in the calls above "true" and "hello" serve as source names and the input itself is the empty string, which is why both errors report EndOfInput. In a larger system, this level of detail makes a big difference. We can track down where we've gone wrong either in developing our syntax, or conforming our input to meet the syntax. If we customize the e parameter type, we can even add our own details into the error message to help even more!
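As a rough illustration of customizing the e parameter, the sketch below defines a hypothetical error type ByteError and raises it with customFailure. None of these names come from our feature parser. The rendering via errorBundlePretty again assumes Megaparsec 7 or later.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Megaparsec
  ( Parsec, ShowErrorComponent (..), customFailure, eof
  , errorBundlePretty, runParser, some )
import Text.Megaparsec.Char (digitChar)

-- Hypothetical custom error: a number that does not fit in a byte.
newtype ByteError = OutOfRange Integer
  deriving (Eq, Ord, Show)

-- Tells Megaparsec how to display our error in rendered messages.
instance ShowErrorComponent ByteError where
  showErrorComponent (OutOfRange n) =
    "value " ++ show n ++ " does not fit in a byte (0-255)"

-- The e parameter is now our own type instead of Void.
type ByteParser = Parsec ByteError Text

byteParser :: ByteParser Integer
byteParser = do
  n <- read <$> some digitChar
  if n > 255
    then customFailure (OutOfRange n)
    else return n

-- Parses the whole input, rendering any error as a String.
parseByte :: Text -> Either String Integer
parseByte input =
  case runParser (byteParser <* eof) "<input>" input of
    Left bundle -> Left (errorBundlePretty bundle)
    Right n     -> Right n
```

With this in place, parseByte "42" succeeds, while parseByte "300" returns a Left whose message includes the text produced by showErrorComponent.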
Conclusion
This wraps up our exploration of parsing libraries in Haskell! In the past few weeks, we've learned about Applicative parsing, Attoparsec, and Megaparsec. The first provides useful and intuitive combinators for when our language is regular. It allows us to avoid using a monad for parsing and the baggage that might bring. With Attoparsec, we saw an introduction to monadic style parsing. This provided us with a syntax that was easier to understand and where we could see what was happening. Finally, this week, we explored Megaparsec. This library has a lot in common syntactically with Attoparsec. But it provides a few more bells and whistles that can make many tasks easier.
Ready to explore some more areas of Haskell development? Want to get some ideas for new libraries to learn? Download our Production Checklist! It'll give you a quick summary of some tools in areas ranging from data structures to web APIs!
Never programmed in Haskell before? Want to get started? Check out our Beginners Checklist! It has all the tools you need to start your Haskell journey!
Megaparsec: Same Syntax, More Features! was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.