Haskell High Performance Programming

Handling numerical data

Like all general-purpose programming languages, Haskell has a few different number types. Unlike in most other languages, the number types in Haskell are organized into a hierarchy via type classes. This gives us two things:

  • Checks at compile time that we aren't doing anything insane with numbers
  • The ability to write polymorphic functions in the number type with enhanced type safety

An example of an insane thing would be dividing an integer by another integer with (/) and expecting an integer as a result; (/) is defined only for Fractional types, so integer division has to be requested explicitly with div or quot. Because every integral type is an instance of the Integral class, we can also easily write a factorial function that doesn't care what the underlying type is (as long as it represents an integer):

factorial :: Integral a => a -> a
factorial n = product [1..n]
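
As a minimal sketch of both points, the following restates factorial and adds two helpers, halve and average, which are hypothetical names introduced here for illustration and not part of the original text. They show where the class hierarchy forces an explicit choice.

```haskell
-- The same polymorphic definition as in the text.
factorial :: Integral a => a -> a
factorial n = product [1..n]

-- Hypothetical helper: integer division must be requested explicitly with
-- `div` (or `quot`). Writing `n / 2` here is rejected by the type checker,
-- because (/) belongs to Fractional, not Integral.
halve :: Integral a => a -> a
halve n = n `div` 2

-- Hypothetical helper: mixing integral and fractional values needs an
-- explicit conversion via fromIntegral.
average :: [Int] -> Double
average xs = fromIntegral (sum xs) / fromIntegral (length xs)
```

The same factorial can be used at Int, Integer, or any other Integral instance; the caller picks the type with an annotation such as factorial (10 :: Integer).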

The following table lists basic numeric types in Haskell:

Apart from Integer and its derivatives, the performance of basic operations is very much the same. Integer is special because of its ability to represent arbitrary-sized numbers via the GNU Multiple Precision Arithmetic Library (GMP). For its purpose, Integer isn't slow, but its overhead relative to the low-level types is significant.
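
The trade-off can be sketched as follows. The names productTo, intWraps, exact21, and wrapped21 are hypothetical and chosen only for this illustration; the sketch assumes GHC, where fixed-width Int arithmetic wraps around on overflow.

```haskell
-- Product of 1..n at any integral type (the same idea as factorial).
productTo :: Integral a => a -> a
productTo n = product [1..n]

-- Fixed-width Int wraps around silently in GHC when it overflows.
intWraps :: Bool
intWraps = (maxBound :: Int) + 1 == minBound

-- 21! does not fit in a 64-bit Int, but Integer represents it exactly.
exact21 :: Integer
exact21 = productTo 21

-- The same computation at Int yields a wrapped, incorrect value.
wrapped21 :: Int
wrapped21 = productTo 21
```

A reasonable rule of thumb under these assumptions: use Int (or another fixed-width type) in hot loops where the range is known to be safe, and Integer where correctness for large values matters more than raw speed.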

Because of the strict number hierarchy, some things are a bit inconvenient. However, there are idiomatic conventions in many situations. For example:

  • Instead of fromIntegral . length, use Data.List.genericLength
  • Instead of 1 / fromIntegral (length xs), write 1 % length xs (with (%) from Data.Ratio) when an exact Ratio Int result is acceptable
  • Use float2Double and double2Float from GHC.Float to convert between Floats and Doubles
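
The conventions above might look like this in practice. The function names mean, reciprocalLength, widen, and narrow are hypothetical; only genericLength, (%), float2Double, and double2Float come from the libraries named in the list.

```haskell
import Data.List (genericLength)
import Data.Ratio (Ratio, (%))
import GHC.Float (double2Float, float2Double)

-- genericLength returns any Num type, so no fromIntegral is needed.
mean :: [Double] -> Double
mean xs = sum xs / genericLength xs

-- (%) builds an exact ratio; the result is a Ratio Int, not a Double.
-- For an empty list the denominator is zero and (%) throws an exception.
reciprocalLength :: [a] -> Ratio Int
reciprocalLength xs = 1 % length xs

-- Direct conversions between the two floating-point types.
widen :: Float -> Double
widen = float2Double

narrow :: Double -> Float
narrow = double2Float
```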

Loops that use intermediate numbers usually benefit from strictness annotations. GHC often unboxes strict number arguments, which leads to efficient code. Strict arguments in non-recursive functions, however, are usually not a good idea, resulting in longer execution times due to suboptimal sharing.
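
A minimal sketch of such a loop, using the BangPatterns extension, is shown here. The name meanStrict and the choice to return 0 for an empty list are assumptions made for this example.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Mean of a list computed in one pass. The bangs force both accumulators
-- at every step, so no chain of unevaluated thunks builds up.
meanStrict :: [Double] -> Double
meanStrict = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    -- Assumption for this sketch: the mean of an empty list is 0.
    go !total !count []     = if count == 0 then 0 else total / fromIntegral count
    go !total !count (x:xs) = go (total + x) (count + 1) xs
```

The strictness sits on the recursive worker go, where the accumulators are updated on every iteration; the non-recursive wrapper meanStrict carries no annotations of its own.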

GHC flags that often give better performance for number-heavy code include -O2, -fexcess-precision, and -fllvm. The last flag compiles via LLVM, which requires LLVM to be installed; at the time of writing, the GHC 7 series supports only LLVM version 3.5.
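
Besides passing these flags on the command line, they can be set per module with an OPTIONS_GHC pragma. The sketch below does so for a hypothetical numeric kernel named dot; -fllvm is deliberately left out of the pragma here, since it depends on an LLVM installation being available on the build machine.

```haskell
{-# OPTIONS_GHC -O2 -fexcess-precision #-}

import Data.List (foldl')

-- Hypothetical numeric kernel: dot product with a strict left fold.
dot :: [Double] -> [Double] -> Double
dot xs ys = foldl' (+) 0 (zipWith (*) xs ys)
```

Whether any of these flags actually helps depends on the code, so it is worth measuring with and without them.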