
Handling tabular data
If you need O(1) general indexing, a table-like data structure is virtually your only option. The Haskell report specifies the array package, which provides tables indexed by anything with an instance of the Ix typeclass.
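Any type with an Ix instance can serve as the index, for instance Char. A minimal sketch (charCodes is an illustrative name, not from the text):
-- Char has an Ix instance, so it can index an array directly.
import Data.Array

charCodes :: Array Char Int
charCodes = listArray ('a', 'z') (map fromEnum ['a'..'z'])
Here charCodes ! 'c' evaluates to 99.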
Immutable arrays come in two flavors (we'll discuss mutable arrays later):
Data.Array.Array: Immutable arrays of boxed values
Data.Array.Unboxed.UArray: Immutable arrays of unboxed values
A common use case for immutable arrays is memoization. For example, a table of Fibonacci numbers can be constructed as follows:
-- file: fib-array-mem.hs
import Data.Array

fib :: Int -> Array Int Integer
fib n = arr
  where arr = listArray (1,n) $
          1 : 1 : [ arr!(i-2) + arr!(i-1) | i <- [3..n] ]
We can also index by a tuple, which gives the array extra dimensions. The symmetric Pascal matrix will serve as an example:
pascal :: Int -> Array (Int, Int) Integer
pascal n = arr
  where arr = array ((1,1),(n,n)) $
          [ ((i,1), 1) | i <- [1..n] ] ++
          [ ((1,j), 1) | j <- [1..n] ] ++
          [ ((i,j), arr!(i-1,j) + arr!(i,j-1)) | i <- [2..n], j <- [2..n] ]
These self-referential definitions look very nice, and with arbitrary-sized integers the performance is pretty much as good as it could be.
But what if we only needed a table of the first 90 Fibonacci numbers? The 90th Fibonacci number fits into an Int64, so we could switch to that type and use UArray instead of Array. But then we could no longer use the nice self-referential definition, because constructing the unboxed array would block on indexing its own unfinished self. In this case, you should either build the list so that it doesn't reference the array (as sketched below), or convert the boxed array into an unboxed array via an intermediate list:
import Data.Array.Unboxed

toUArray :: (Ix i, IArray UArray e) => Array i e -> UArray i e
toUArray a = listArray (bounds a) (elems a)
This conversion comes at the cost of an intermediate list, though.
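The first alternative, building the list without referencing the array, could look like the following minimal sketch (fibsU is an illustrative name; we assume the 90-element Int64 table from above):
import Data.Array.Unboxed
import Data.Int (Int64)

-- The list is generated independently of the array, so an unboxed
-- array can be constructed without blocking on itself. listArray
-- consumes only as many elements as the bounds require, so the
-- infinite list is fine.
fibsU :: UArray Int Int64
fibsU = listArray (1, 90) (go 1 1)
  where go a b = a : go b (a + b)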
The array package is quite low-level; the speed is there, but at the cost of doing a lot of index fiddling yourself. For multi-dimensional arrays, much of that index fiddling is unfortunately unavoidable. But for single-dimensional arrays, there is a better choice.
Using the vector package
The Data.Vector modules in the vector package provide sweet and speedy high-level Int-indexed arrays, implemented on top of raw array primitives. They too come in boxed and unboxed flavors.
The sweetness of vector is in the API, which is loaded with higher-order functions, convenient helpers, monad-lifted operations, and of course all the common operations for list-like structures.
The speed comes once again from fusion: in terms of raw speed, operations on a vector have an upper bound set by the arrays that vectors use internally. However, a sufficiently large composition of vector operations will almost always outperform a similar naive array-based program.
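To get a feel for what fusion buys, consider a pipeline like the following sketch (sumSquares is an illustrative name). Compiled with -O2, the enumeration, mapping, and summing fuse together:
import qualified Data.Vector.Unboxed as U

-- With -O2 this compiles to a single loop over the range;
-- the intermediate vectors are fused away.
sumSquares :: Int -> Int
sumSquares n = U.sum (U.map (^ 2) (U.enumFromTo 1 n))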
Say we have a sensor from which we have stored numeric observations at different times, and now we want to analyze that data. For performance, we choose an unboxed vector for storing the data in memory. We also import randomIO for testing purposes:
-- file: vector-testing.hs
import qualified Data.Vector.Unboxed as U
import System.Random (randomIO)
A neat thing about unboxed vectors is that they support tuple-valued elements; internally, a vector of pairs is represented as two vectors. The following defines our types:
type Obs = U.Vector (TimeStamp, Double)
type TimeStamp = Int
We can extract the value vector from our observation vector using U.unzip in constant time and with no copying:
-- | O(1)
values :: Obs -> U.Vector Double
values obs = snd (U.unzip obs)
Note that U.map snd would be bad, because mapping constructs a new vector in general.
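For contrast, the mapping version would be the following sketch (valuesBad is an illustrative name):
-- | O(n): constructs a fresh vector instead of reusing the two
-- underlying ones.
valuesBad :: Obs -> U.Vector Double
valuesBad = U.map snd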
Now let's try something more interesting: a windowing function, which gives us the slice between two timestamps. The following implementation of a window function is linear in time and constant in space:
-- | O(n+m), no copying.
window :: TimeStamp -> TimeStamp -> Obs -> Obs
window from until v =
    let (_, start)   = U.span ((< from) . fst) v
        (between, _) = U.span ((<= until) . fst) start
     in between
We could improve on this with a binary search, for example: because the time step between two observations (TimeStamps) can be arbitrary, logarithmic time complexity is the best we could achieve. But for demonstration, plain U.span does fine, and it too does no copying, because the slices reuse the original vector.
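A binary-search variant could look like the following sketch, assuming the observations are sorted by timestamp (lowerBound and windowLog are illustrative names, not part of the vector API):
-- | Index of the first element satisfying the predicate, assuming the
-- predicate is False on a prefix and True on the rest of the vector.
lowerBound :: U.Unbox a => (a -> Bool) -> U.Vector a -> Int
lowerBound p v = go 0 (U.length v)
  where
    go lo hi
      | lo >= hi      = lo
      | p (v U.! mid) = go lo mid
      | otherwise     = go (mid + 1) hi
      where mid = (lo + hi) `div` 2

-- | O(log n) windowing via binary search. Like U.span, U.slice reuses
-- the original vector, so there is still no copying.
windowLog :: TimeStamp -> TimeStamp -> Obs -> Obs
windowLog from until v =
    let start = lowerBound ((>= from) . fst) v
        end   = lowerBound ((> until) . fst) v
     in U.slice start (end - start) v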
Implementing the average of the values is straightforward:
-- | O(n)
average :: Obs -> Double
average obs = U.sum (values obs) / fromIntegral (U.length (values obs))
Let's test out the performance by generating a data set of a million random observations and then calculating averages at different windows:
main = do
    obs <- U.generateM (1024 ^ 2) $ \i -> randomIO >>= \v -> return (i, v)
    print $ average $ window 1 (1024 ^ 2) obs
    print $ average $ window 2 (1023 ^ 2) obs
    print $ average $ window 3 (1022 ^ 2) obs
    print $ average $ window 4 (1021 ^ 2) obs
Compile and run with Runtime System statistics:
$ ghc -rtsopts -O vector-testing.hs && time ./vector-testing +RTS -s
[...]
   2,090,993,872 bytes allocated in the heap
     341,188,752 bytes copied during GC
      59,032,744 bytes maximum residency (7 sample(s))
       2,863,512 bytes maximum slop
             138 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4003 colls,     0 par    0.372s   0.372s     0.0001s    0.0013s
  Gen  1         7 colls,     0 par    0.177s   0.177s     0.0253s    0.0447s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    1.426s  (  1.428s elapsed)
  GC      time    0.548s  (  0.548s elapsed)
  EXIT    time    0.006s  (  0.006s elapsed)
  Total   time    1.983s  (  1.983s elapsed)

  %GC     time      27.7%  (27.7% elapsed)

  Alloc rate    1,465,833,131 bytes per MUT second

  Productivity  72.3% of total user, 72.3% of total elapsed
Wow, this looks pretty bad: lots of GC, only 70% productivity, and a 60-megabyte memory footprint! The data itself is only 16 megabytes on a 64-bit machine, which implies that a lot of unnecessary things are going on.
Turns out, optimization level -O is insufficient for many important optimizations to kick in. Switching to -O2 gives significant improvements:
$ ghc -rtsopts -O2 vector-testing.hs && ./vector-testing +RTS -s
[..]
   1,862,402,704 bytes allocated in the heap
         818,920 bytes copied during GC
      16,779,344 bytes maximum residency (2 sample(s))
       2,070,704 bytes maximum slop
              19 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3576 colls,     0 par    0.014s   0.014s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.001s   0.001s     0.0006s    0.0010s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    1.227s  (  1.229s elapsed)
  GC      time    0.016s  (  0.015s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time    1.247s  (  1.245s elapsed)

  %GC     time       1.3%  (1.2% elapsed)

  Alloc rate    1,517,508,130 bytes per MUT second

  Productivity  98.7% of total user, 98.9% of total elapsed
With -O2, we got down to 1.3% GC time and a 19-megabyte memory footprint. For even better results, we could combine this with optimizations for numerical code.
Note
It's certainly a good idea to always use -O2 for production builds and performance testing, especially for vector code. Compile times will be significantly longer, however, so in development using -O or even no optimization at all is advisable.