See a typo? Have a suggestion? Edit this page on Github
NOTE This content originally appeared on School of Haskell.
I've run into this issue myself, and seen others hit it too. Let's start off with some very simple code:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
sayHi :: Maybe String -> IO ()
sayHi mname =
case mname of
Nothing -> return ()
Just name -> putStrLn $ "Hello, " ++ name
main :: IO ()
main = sayHi $ Just "Alice"
There's nothing amazing about this code, it's pretty straight-forward pattern matching Haskell. And at some point, many Haskellers end up deciding that they don't like the explicit pattern matching, and instead want to use a combinator. So the code above might get turned into one of the following:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Data.Foldable (forM_)
hiHelper :: String -> IO ()
hiHelper name = putStrLn $ "Hello, " ++ name
sayHi1 :: Maybe String -> IO ()
sayHi1 = maybe (return ()) hiHelper
sayHi2 :: Maybe String -> IO ()
sayHi2 = mapM_ hiHelper
main :: IO ()
main = do
sayHi1 $ Just "Alice"
sayHi2 $ Just "Bob"
-- or often times this:
forM_ (Just "Charlie") hiHelper
The theory is that all three approaches (maybe
, mapM_
, and
forM_
) will end up being identical. We can fairly conclusively state
that forM_
will be the exact same thing as mapM_
, since
it's just mapM_
flipped. So
the question is: will the maybe
and mapM_
approaches do the same
thing? In this case, the answer is yes, but let's spice it up a bit
more. First, the maybe
version:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s
import Control.Monad (when)
uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)
printChars :: Int -> [Char] -> IO ()
printChars idx str = maybe (return ()) (\(c, str') -> do
when (idx `mod` 100000 == 0)
$ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
printChars (idx + 1) str') (uncons str)
main :: IO ()
main = printChars 1 $ replicate 5000000 'x'
You can compile and run this by saving to a Main.hs
file and running
stack Main.hs && ./Main
. On my system, it prints out the following
memory statistics, which from the maximum residency you can see runs
in constant space:
2,200,270,200 bytes allocated in the heap
788,296 bytes copied during GC
44,384 bytes maximum residency (2 sample(s))
24,528 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
While constant space is good, the usage of maybe
makes this a bit
ugly. This is a common time to use forM_
to syntactically clean
things up. So let's give that a shot:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s
import Control.Monad (when, forM_)
uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)
printChars :: Int -> [Char] -> IO ()
printChars idx str = forM_ (uncons str) $ \(c, str') -> do
when (idx `mod` 100000 == 0)
$ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
printChars (idx + 1) str'
main :: IO ()
main = printChars 1 $ replicate 5000000 'x'
The code is arguablycleaner and easier to follow. However, when I run it, I get the following memory stats:
3,443,468,248 bytes allocated in the heap
632,375,152 bytes copied during GC
132,575,648 bytes maximum residency (11 sample(s))
2,348,288 bytes maximum slop
331 MB total memory in use (0 MB lost due to fragmentation)
Notice how max residency has balooned up from 42kb to 132mb! And if you increase the size of the generated list, that number grows. In other words: we have linear memory usage instead of constant, clearer something we want to avoid.
The issue is that the implementation of mapM_
in Data.Foldable
is
not tail recursive, at least for the case of Maybe
. As a result,
each recursive call ends up accumulating a bunch of "do nothing"
actions to perform after completing the recursive call, which all
remain resident in memory until the entire list is traversed.
Fortunately, solving this issue is pretty easy: write a tail-recursive
version of forM_
for Maybe
:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s
import Control.Monad (when)
uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)
forM_Maybe :: Monad m => Maybe a -> (a -> m ()) -> m ()
forM_Maybe Nothing _ = return ()
forM_Maybe (Just x) f = f x
printChars :: Int -> [Char] -> IO ()
printChars idx str = forM_Maybe (uncons str) $ \(c, str') -> do
when (idx `mod` 100000 == 0)
$ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
printChars (idx + 1) str'
main :: IO ()
main = printChars 1 $ replicate 5000000 'x'
This implementation once again runs in constant memory.
There's one slight difference in the type of forM_Maybe
and forM_
specialized to Maybe
. The former takes a second argument of type a -> m ()
, while the latter takes a second argument of type a -> m b
. This difference is unfortunately necessary; if we try to get back
the original type signature, we have to add an extra action to wipe
out the return value, which again reintroduces the memory leak:
forM_Maybe :: Monad m => Maybe a -> (a -> m b) -> m ()
forM_Maybe Nothing _ = return ()
forM_Maybe (Just x) f = f x >> return ()
Try swapping in this implementation into the above program, and once again you'll get your memory leak.
mono-traversable
Back in 2014, I raised this same issue
about the mono-traversable library,
and ultimately decided to change the type signature of the omapM_
function to the non-overflowing demonstrated above. You can see that
this in fact works:
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec --package mono-traversable -- ghc -with-rtsopts -s
import Control.Monad (when)
import Data.MonoTraversable (oforM_)
uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)
printChars :: Int -> [Char] -> IO ()
printChars idx str = oforM_ (uncons str) $ \(c, str') -> do
when (idx `mod` 100000 == 0)
$ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
printChars (idx + 1) str'
main :: IO ()
main = printChars 1 $ replicate 5000000 'x'
As we'd hope, this runs in constant memory.