“I hope that one day, the business needs a string calculator. Then I can say ‘This is the moment I trained for my whole life!’” – Michael Stum (@mstum), tweet
An effective string calculator is obviously indispensable to any software project. I have attempted this before, but one can never be too prepared, so I thought I’d revisit it, this time using Parsec, a parser combinator library for Haskell.
The basic idea
One way to think of a parser is as a function that takes a String, and either succeeds by producing a value and the remainder of the input to parse, or fails.
We can use functions to combinate, er, combine, parsers in this form to make new parsers.
We can implement these parser combinator functions ourselves, but there are some ready-built implementations in libraries like Parsec and Attoparsec (and they’ll undoubtedly perform better than my naive attempt).
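To make the "parser as a function" idea concrete, here is a minimal hand-rolled sketch. The Parser type and the names satisfy, andThen and orElse are hypothetical, chosen for illustration; this is not how Parsec represents parsers internally.

```haskell
import Data.Char (isDigit)

-- Hypothetical parser type: take the input, and either fail (Nothing)
-- or succeed with a value plus the remaining input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Parse a single character that satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \input -> case input of
  (c:rest) | ok c -> Just (c, rest)
  _               -> Nothing

-- Combinator: run p, then run q on whatever p left over.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q = Parser $ \input ->
  case runParser p input of
    Nothing        -> Nothing
    Just (a, rest) -> case runParser q rest of
      Nothing         -> Nothing
      Just (b, rest') -> Just ((a, b), rest')

-- Combinator: try p; if it fails, try q on the same input.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \input ->
  case runParser p input of
    Nothing -> runParser q input
    result  -> result

-- A digit followed by a comma, built from the pieces above.
digitThenComma :: Parser (Char, Char)
digitThenComma = satisfy isDigit `andThen` satisfy (== ',')
```

For example, runParser digitThenComma "1,2" gives Just (('1', ','), "2"), while running it on "x,2" gives Nothing.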
Some boilerplate
Let’s create a stringCalc.hs file with some pre-entered imports and so on, so it doesn’t get in the way for the rest of the post. It’s here for completeness, but feel free to skip it.
We’ll start this file by importing the libraries we’ll need for this post:
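Assuming Parsec 3's module layout, a minimal set of imports for the parsers described in this post might look like this:

```haskell
-- Core combinators: parse, many1, digit, char, sepBy, eof, (<|>), ...
import Text.Parsec
-- Parser a is a synonym for a Parsec parser over String input with no user state.
import Text.Parsec.String (Parser)
```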
Next, we’ll include a parseAll function which will try to parse the full block of text and return a result. If the parse fails, or if there is unparsed text left over, this will return Nothing.[1]
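One way such a helper could be written is sketched below. It assumes that "unparsed text left over" is detected by requiring eof after the given parser; the exact definition is a guess at what the description implies.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run parser p over the whole input. The `<* eof` makes the parse fail
-- if anything is left unconsumed; Parsec's error details are discarded.
parseAll :: Parser a -> String -> Maybe a
parseAll p input =
  case parse (p <* eof) "" input of
    Left _  -> Nothing
    Right x -> Just x
```

With this definition, parseAll (many1 digit) "123" gives Just "123", and parseAll (many1 digit) "123abc" gives Nothing because "abc" is left over.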
Comma-separated values
The first part of the string calculator exercise is to sum a comma-separated sequence of natural numbers. So we’ll want a Parser [Int] that can get these numbers from a string, and then we can use the built-in sum function to add those numbers.
We’ll start by constructing a parser for a natural number. It will parse many1 (i.e. at least one) digits, then use Haskell’s read function to convert those digits to an integer.
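A sketch of that parser, assuming the name natural (the name used in the next step of this post):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more digits, converted to an Int with read.
-- read is safe here because many1 digit only ever yields a non-empty
-- string of digits (very long inputs would still overflow Int).
natural :: Parser Int
natural = read <$> many1 digit
```

Here parse natural "" "42" gives Right 42, and parsing "abc" gives a Left with a parse error.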
Now we can get comma-separated numbers by constructing a new parser – naturals, separated by ‘,’ characters:
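Using Parsec's sepBy combinator, that could be sketched as follows (natural is repeated so the snippet stands alone):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

natural :: Parser Int
natural = read <$> many1 digit

-- Zero or more naturals, separated by commas.
csvNumbers :: Parser [Int]
csvNumbers = natural `sepBy` char ','
```

Because sepBy accepts zero occurrences, the empty string parses to an empty list, which will conveniently sum to 0.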
We can add these together by telling Haskell we want to sum whatever integers have been parsed by the csvNumbers parser (using fmap), and then get the result using our parseAll helper function:
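Putting those pieces together might look like the sketch below. The wrapper name sumCsv is hypothetical, and parseAll is the assumed eof-based helper described earlier.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed helper: parse the whole input or return Nothing.
parseAll :: Parser a -> String -> Maybe a
parseAll p input = either (const Nothing) Just (parse (p <* eof) "" input)

natural :: Parser Int
natural = read <$> many1 digit

csvNumbers :: Parser [Int]
csvNumbers = natural `sepBy` char ','

-- fmap turns a Parser [Int] into a Parser Int by summing the parsed list.
sumCsv :: String -> Maybe Int
sumCsv = parseAll (fmap sum csvNumbers)

-- sumCsv "1,2,3"  gives Just 6
-- sumCsv ""       gives Just 0
-- sumCsv "1,2,x"  gives Nothing
```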
Newlines as delimiters
The next part of the exercise is to support both commas and newline characters as separators. We’ll rename the csvNumbers parser to stringCalcP (string calculator parser), and adjust it to accept either delimiter. The <|> operator can be read as “or”.
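A sketch of the adjusted parser, assuming the newline is matched with char '\n':

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed helper: parse the whole input or return Nothing.
parseAll :: Parser a -> String -> Maybe a
parseAll p input = either (const Nothing) Just (parse (p <* eof) "" input)

natural :: Parser Int
natural = read <$> many1 digit

-- Naturals separated by a comma *or* a newline.
stringCalcP :: Parser [Int]
stringCalcP = natural `sepBy` (char ',' <|> char '\n')

-- parseAll (sum <$> stringCalcP) "1\n2,3"  gives Just 6
```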
Custom delimiters
Our next job is to support optional custom delimiters, not just ‘,’ and ‘\n’. Strings with custom delimiters will be in the format “//[delimiter]\n[numbers…]”. So let’s construct a parser that produces a parser as a value (it’s parsers all the way down, until we hit the turtles). We’ll also extract our current delimiter logic into a default delimiter parser.
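The "parser that produces a parser" could be sketched as below. The names defaultDelimiter and customDelimiter are hypothetical, and the sketch assumes the custom delimiter is a single character.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The delimiters supported so far: comma or newline.
defaultDelimiter :: Parser Char
defaultDelimiter = char ',' <|> char '\n'

-- Parses a "//<delimiter>\n" header and returns a parser for that delimiter.
-- Assumption: the delimiter is exactly one character.
customDelimiter :: Parser (Parser Char)
customDelimiter = do
  _ <- try (string "//")   -- try: do not consume input if the header is absent
  d <- anyChar
  _ <- char '\n'
  return (char d)
```

Running customDelimiter on "//;\n1;2" consumes the header and yields a parser equivalent to char ';', leaving "1;2" to be parsed.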
We can now update stringCalcP to use a custom delimiter, and fall back to the default delimiters as required.
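One way to express that fallback is with Parsec's option combinator, as in this sketch (helper names and the single-character delimiter are assumptions, as before):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed helper: parse the whole input or return Nothing.
parseAll :: Parser a -> String -> Maybe a
parseAll p input = either (const Nothing) Just (parse (p <* eof) "" input)

natural :: Parser Int
natural = read <$> many1 digit

defaultDelimiter :: Parser Char
defaultDelimiter = char ',' <|> char '\n'

customDelimiter :: Parser (Parser Char)
customDelimiter = do
  _ <- try (string "//")
  d <- anyChar
  _ <- char '\n'
  return (char d)

-- Use the custom delimiter if a header is present, otherwise the defaults.
stringCalcP :: Parser [Int]
stringCalcP = do
  delim <- option defaultDelimiter customDelimiter
  natural `sepBy` delim

-- parseAll (sum <$> stringCalcP) "//;\n1;2;3"  gives Just 6
-- parseAll (sum <$> stringCalcP) "1,2\n3"      gives Just 6
```

Note that in this sketch a custom delimiter replaces the defaults rather than adding to them, so "//;\n1,2" fails to parse.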
And if we chuck in a main function, we can compile this into an EXE that reads from stdin before doing its calculatory magic:
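A complete file along these lines might look like the following sketch. It assumes the whole of stdin is read with getContents and the Maybe result is simply printed; the helper definitions are the assumed ones from earlier in the post.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

parseAll :: Parser a -> String -> Maybe a
parseAll p input = either (const Nothing) Just (parse (p <* eof) "" input)

natural :: Parser Int
natural = read <$> many1 digit

defaultDelimiter :: Parser Char
defaultDelimiter = char ',' <|> char '\n'

customDelimiter :: Parser (Parser Char)
customDelimiter = do
  _ <- try (string "//")
  d <- anyChar
  _ <- char '\n'
  return (char d)

stringCalcP :: Parser [Int]
stringCalcP = do
  delim <- option defaultDelimiter customDelimiter
  natural `sepBy` delim

-- Read all of stdin, parse it, and print the sum (or Nothing on failure).
main :: IO ()
main = do
  input <- getContents
  print (parseAll (sum <$> stringCalcP) input)
```

After compiling with ghc stringCalc.hs, something like printf '//;\n1;2;3' | ./stringCalc should print Just 6. A trailing newline (as echo would add) is treated as a dangling delimiter in this sketch, so the parse would fail and print Nothing.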
The business is safe… but for how long?!
I think that should be enough to satisfy the most common business requirements for calculating strings. Now, to await the call…
[1] This throws away some useful information Parsec gives us when a string is not parsed, but makes for cleaner output when showing examples in a post.