String calculator with Parsec

"I hope that one day, the business needs a string calculator. Then I can say "This is the moment I trained for my whole life!"" – Michael Stum (@mstum), tweet

An effective string calculator is obviously indispensable to any software project. I have attempted this before, but one can never be too prepared, and so I thought I’d revisit it. But this time I thought I’d try it using Parsec, a parser combinator library for Haskell.

Note: I’m not going to strictly follow the kata, nor do TDD. I just wanted to play around with parsers! :)

The basic idea

One way to think of a parser is as a function that takes a String, and either succeeds by producing a value and the remainder of the input to parse, or fails.

-- Example: something that can parse a digit character (a Char from '1' to '9')
digit :: Parser Char

ghci> digit `parse` "123abc"
Just ("23abc",'1')   -- success! Parsed a single '1' from input,
                     -- still have "23abc" left to parse
ghci> digit `parse` "abc"
Nothing              -- could not parse a digit from input

We can use functions to combinate, er, combine, parsers in this form to make new parsers.

digits :: Parser [Char]
digits = atLeast1 digit

-- A natural number is an int read from one or more digits (0 or positive number)
natural :: Parser Int
natural = read `fmap` digits

-- A number is a natural or a negative number
number :: Parser Int
number = natural <|> negative

We can implement these parser combinator functions ourselves, but there are some ready-built implementations in libraries like Parsec and Attoparsec (and they’ll undoubtedly perform better than my naive attempt).

Some boilerplate

Let’s create a stringCalc.hs file with some pre-entered imports and so on, so it doesn’t get in the way for the rest of the post. It’s here for completeness, but feel free to skip it.

We’ll start this file by importing the libraries we’ll need for this post:

import Control.Applicative ((<*))
import Data.Text (Text, pack)
import Text.Parsec
import Text.Parsec.Text (Parser)

Next, we’ll include a parseAll function which will try to parse the full block of text and return a result. If the parse fails, or if there is unparsed text left over, this will return Nothing.1

-- Parse entire string. If anything is left over, then return Nothing.
parseAll :: Parser a -> Text -> Maybe a
parseAll p s =
    let parseToEnd = p <* eof    -- matches p then eof, but only returns value from p
        parsed = parse parseToEnd [] s
    in either (const Nothing) Just parsed

Comma-separated values

The first part of the string calculator exercise is to sum a comma-separated sequence of natural numbers. So we’ll want a Parser [Int] that can get these numbers from a string, and then we can use the built-in sum function to add those numbers.

We’ll start by constructing a parser for a natural number. It will parse many1 (i.e. at least one) digits, then use Haskell’s read function to convert those digits to an integer.

natural :: Parser Int
natural = read `fmap` many1 digit

Now we can get comma-separated numbers by constructing a new parser – naturals, separated by ‘,’ characters:

csvNumbers :: Parser [Int]
csvNumbers = natural `sepBy` char ','

-- Examples:
ghci> parseAll csvNumbers "1,2,3"
Just [1,2,3]
ghci> parseAll csvNumbers "abc"
Nothing

We can add these together by telling Haskell we want to sum whatever integers have been parsed by the csvNumbers parser (using fmap), and then get the result using our parseAll helper function:

add :: Text -> Maybe Int
add s = parseAll (fmap sum csvNumbers) s

-- Examples:
ghci> add "1,2,3"
Just 6
ghci> add ""
Just 0
ghci> add "asdf"
Nothing

Newlines as delimiters

The next part of the exercise is to support both commas and newline characters as separators. We’ll rename the csvNumbers parser to stringCalcP (string calculator parser), and adjust it to accept either delimiter. The <|> operator can be read as "or".

stringCalcP :: Parser [Int]
stringCalcP =
    let delim = newline <|> char ','
    in natural `sepBy` delim

-- Examples:
ghci> add "1,2\n3,4"
Just 10
ghci> add "1a2b"
Nothing
ghci> add ""
Just 0

Custom delimiters

Our next job is to support optional custom delimiters, not just ‘,’ and ‘\n’. Strings with custom delimiters will be in the format "//[delimiter]\n[numbers…]". So let’s construct a parser that produces a parser as a value (it’s parsers all the way down, until we hit the turtles). We’ll also extract our current delimiter logic into a default delimiter parser.

-- Parse out a parser that will match the custom delimiter
customDelim :: Parser (Parser Char)
customDelim = do
    count 2 (char '/')      -- match 2 forward-slash characters
    delim <- anyChar        -- ... then any character as a delimiter
    newline                 -- ... then a newline
    return (char delim)     -- return a parser for that character

-- Parse default delimiters: newline or ','
defaultDelim :: Parser Char
defaultDelim = newline <|> char ','

We can now update stringCalcP to use a custom delimiter, and fall back to the default delimiters as required.

stringCalcP :: Parser [Int]
stringCalcP = do
    delim <- customDelim <|> return defaultDelim
    natural `sepBy` delim

-- Example: custom delimiter
ghci> add "//;\n1;2;3"
Just 6
-- Examples: previous delimiters still supported
ghci> add "1,2,3"
Just 6
ghci> add "1\n2\n3"
Just 6
ghci> add "1,2\n3"
Just 6

And if we chuck in a main function, we can compile this into an EXE that reads from stdin before doing its calculatory magic:

main = do
    input <- getLine
    print $ add (pack input)

The business is safe… but for how long?!

I think that should be enough to satisfy the most common business requirements for calculating strings. Now, to await the call…


  1. This throws away some useful information Parsec gives us when a string is not parsed, but makes for cleaner output when showing examples in a post.

Comments