
Profile and monitor in real time
The heap profile report file <program>.hp is generated as the program executes, so it is perfectly fine to take a snapshot of the file at any time and visualize it with hp2ps, even while the program is still busy executing. Because basic heap profiling (-hT) works without having compiled with profiling support, running heap profiles can be produced with very small overhead.
By increasing the sample interval (-i) to something relatively large, such as a couple of seconds, it is feasible to extract heap profiles from long-running programs even in production environments.
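As a sketch of what this could look like (the program name prog and the two-second interval are illustrative), one might run the program with a coarse sampling interval and render a snapshot of the profile while the program is still running:

```shell
# -hT needs no profiling libraries; the binary only needs -rtsopts.
./prog +RTS -hT -i2 -RTS

# Meanwhile, in another terminal: copy the growing profile and render
# it with hp2ps, even though prog is still executing.
cp prog.hp snapshot.hp
hp2ps -c snapshot.hp    # writes snapshot.ps (-c selects colour output)
```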
A quick-and-dirty trick that isn't for the faint-hearted is the -S Runtime System option. This option prints garbage collector statistics in real time, every time a collection takes place: bytes allocated, bytes copied, bytes live, time elapsed, and how long the garbage collector took. To make some sense of the output, it can help to limit the garbage collector to a single generation (the default is 2), that is, +RTS -S -G1.
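As an illustration (the program name and all figures here are made up), the invocation and output look roughly like this, with one line printed per collection:

```shell
./prog +RTS -S -G1 -RTS
#    Alloc    Copied     Live     GC     GC      TOT      TOT  Page Flts
#    bytes     bytes     bytes   user   elap     user     elap
#  1038832     54232    54232  0.001  0.001    0.013    0.019    0    0
```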
Monitoring over HTTP with ekg
We are now familiar with profiling and benchmarking applications that execute locally. We also know how to extract garbage collector information from programs running locally, but what if the program were running on a server? Things are no longer so convenient.
It would be nice to be able to monitor the performance of programs running on servers in real time, and perhaps to store performance history in some time-series database for later investigation.
A package called ekg provides a ready solution for the first wish, namely real-time monitoring, along with a fairly mature set of features for collecting statistics from Haskell programs. It exposes a REST API out of the box, which can be used to pull data into time-series databases.
The first step with a new library is, as always, to install it. Installing ekg also pulls in the ekg-core library, which contains the metric types; the ekg library itself provides the monitoring application:
cabal install ekg
Now we get to an example. A silly example, but still an example: a program that repeatedly asks for a number and prints the factorial of that number. For some reason or other, we want to monitor the performance of our command-line application via HTTP. Implementing this program is very straightforward:
-- file: ekg-fact.hs
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad
import System.Remote.Monitoring

main = do
  forkServer "localhost" 8000
  forever $ do
    input <- getLine
    print $ product [1..read input :: Integer]
Adding real-time monitoring over HTTP to our command-line program was a matter of adding one import and one extra line in main. When compiling, we should enable the -T Runtime System option so that ekg can collect GC statistics. It is also a good idea to enable at least -O. On multi-core systems, we will likely also want the threaded runtime, so that the program doesn't need to share a core with the monitoring subsystem. All in all, we have:
ghc -O -threaded -rtsopts -with-rtsopts='-N -T' ekg-fact.hs
./ekg-fact
Now we can open a browser at http://localhost:8000
and get a real-time view of the performance of our program. A portion of the page is shown here:

It's possible to add our own metrics for ekg
to monitor. As an example, let's add to our program a metric that counts the number of factorials we have calculated:
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Monad
import System.Remote.Monitoring
import System.Metrics
import qualified System.Metrics.Counter as Counter

main = do
  server <- forkServer "localhost" 8000
  factorials <- createCounter "factorials.count"
                              (serverMetricStore server)
  forever $ do
    input <- getLine
    print $ product [1..read input :: Integer]
    Counter.inc factorials
First we needed to import the metric modules from the ekg-core package. Then we created a new Counter metric with createCounter. After calculating a factorial, we increment that counter by one.
The JSON API that comes with ekg
is quite simple. We can retrieve all metrics by just requesting the root with content-type JSON:
curl -H "Accept: application/json" http://localhost:8000
{
  "ekg": {
    "server_timestamp_ms": { "type": "c", "val": 1460318128878 }
  },
  "rts": {
    "gc": {
      "gc_cpu_ms": { "type": "c", "val": 624 },
[…]
If we are only interested in one metric, say our new factorials.count metric, we can request just that one, like so:
curl -H "Accept: application/json" http://localhost:8000/factorials/count
{"type":"c","val":12}
It's not hard to imagine integrating ekg monitoring with time-series databases. Using the REST API is straightforward, but the ekg library is flexible enough that a push model wouldn't be too hard either.
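As a hypothetical sketch of the pull model, a shell loop could periodically sample one metric from the endpoint used above and append timestamped values to a log for a collector to ingest (the ten-second interval and the log file name are made up):

```shell
# Poll the factorials.count metric every 10 seconds and log it with a
# Unix timestamp, one sample per line.
while true; do
  value=$(curl -s -H "Accept: application/json" \
            http://localhost:8000/factorials/count)
  echo "$(date +%s) $value" >> factorials.log
  sleep 10
done
```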