# An unified array interface

Recently I’m trying to merge some array code from my research project stdio to primitive, one of the core haskell libraries. I hope the reviewing process can be finished soon so that people can start using it. This post is a summary on current GHC array support(up to 8.2.1, things may changed when you read it), and what my patch is about.

## boxed, unboxed, lifted, unlifted

For many haskellers, using arrays may be the first time one wants to know what’s the difference between boxed, unboxed, lifted, unlifted types. Let’s spend some time explaining these buzzwords.

In other languages, you often have to distinguish reference and value. For example, in C pointers are references to other objects. It’s a memory location in hardware sense: you can use machine code to follow a reference to the memory it pointing to. While the other non-pointer types value are not memory locations, their 1-0 arrangement stands for a certain value of that type.

In haskell almost every value you see is a pointer in C sense, i.e. a memory location point to a heap object, for example a data type like:

Are represented as:

During runtime the value foo is just a pointer, and all the operations, e.g. pattern match, is going through dereferencing. Values like this are called boxed because it’s a pointer to a box, i.e. heap objects with info-table. The info-table contains many useful infomation about the box, such as how many words the boxed occupied, which constructor the box stand for, etc.

The 3# and 'a'# above are haskell’s non-pointer value, we call values like this unboxed values. Unboxed values don’t have info-tables, so we really can’t have them directly on heap: otherwise the GC would get confused when it scans them, without infomation from info-table, it can’t decide how many bytes to copy. These values are only belong to boxes, registers or stacks: we generate machine code to manipulate them directly.

Another difference, unlifted and lifted, exists because in haskell we have non-strict evaluation mechanism, for example a value 1 + 2 may have a representation like:

As you can see, 1 + 2 and 3 are both references, they can be used interchangeably: a function expecting an Int argument can accept both pointers. This is done by entering the heap objects. i.e. execute the entry code following the info-table. The entry code for constructors are simply returns, since they’re already evaluated. For thunks the code will do evaluation and the reserved word above is reserved exactly for evaluation result, by writing a forward pointer and change the thunk box into an indirection box.

But the evaluation may fail(diverged recursion, stackoverflow, etc.), then the pointer will point to an undefined value, this kind of things are called bottom in haskell, written as _|_. The intuition for this name is that all the other evaluated values have certain meaning, but bottom doesn’t, it sits lower in the spectrum of determinism, concreteness, usefulness … whatever suits your mind. Hence comes the concept of lifted type, i.e. types which contain bottom values, or more formly, inhabited by _|_.

As you expected, most of the boxed type can be inhabited by _|_, the thunk may explode and terminate your program, just think about error "!" or undefined in base. What about unboxed types then? Can Int# stand for an undefined value? No, it’s impossible! All the 1-0 arrangements represent a Int#, there’s no way we get a bottom from it.

## boxed arrays

Now let’s consider GHC arrays, they’re special heap objects provided by RTS. We have boxed arrays MutableArray# and Array#, they are called boxed arrays because they store references to boxes:

It looks quite complicated, mainly because we want to optimize the GC for arrays:

• MutableArray# can have different info-table pointers during its life time, but we never enter them. The difference between these info-tables is the type field. e.g. a MutableArray# on old generation heap may have MUT_ARR_PTRS_CLEAN type which is saying this array have not been mutated since last GC, so if this is a minor GC we can safely skip it.

• MutableArray# have a card table which is just an byte map, recording which part of payload is mutated after last GC, a none-zero byte in card table indicate corresponding payload area(in GHC it’s 128 elements) contain mutated pointers, so that GC will trace them.

MutableArray#s are always kept in a generation’s mutable list once it’s promoted to that generation, so these optimizations are important if you keep a large mutable array on heap for a long time. For arrays smaller than 128, it’s unnecessary to use a card-table, so we also have MutableSmallArray# for that purpose.

In GHC we usually turn MutableArray# into a Array# with freeze operations after creation is completely. We changed the info-table(so its type) to stg_SMALL_MUT_ARR_PTRS_FROZEN0, then(after a GC) to stg_SMALL_MUT_ARR_PTRS_FROZEN, so that minor GC will not scan it anymore. But the card-table’s space is still there in case we thaw the array in place again. Generally speaking, under creation-freeze pattern, MutableSmallArray# and SmallArray# are more recommended since you won’t keep the mutable one on heap for too long.

## boxed, unlifted type

ghc-prim also provide MutableArrayArray# and ArrayArray# array type, you can use them to store boxed unlifted values, a boxed unlifted value? Yeah you hear it right, there’re certain kind of values, which are pointers pointing to boxes, but them self can never be bottom. It turned out MutableArray# and Array# are exactly this type of thing. It’s easy to tell they’re boxed value because they point to boxes we draw above, but why they’re unlifted?

MutableArray#/Array# are unlifted because they can’t be obtained directly by thunk evaluation, there’re simply no ways to create a thunk which evaluates to MutableArray# or Array#, you can only create them using primitive operations provided by RTS, and RTS never produce them lazily(actually that may be possible, i.e. allocating array when we enter them, but doing that will complicate things).

What you can do is to wrap the pointer itself inside another box, i.e. data Array = Array Array#, and wrap the primitive operations so that an Array works like all the other haskell lazy boxed types, this is actually just creating another layer indirection. Now let’s say you want an array of arrays, Array Array is definitely not optimal: every element of the array points to a Array box, and inside the Array box we have a Array#
points to the real array. The indirection is absolute wasteful if we don’t need lazy on the element arrays.

So here come ArrayArray#, It’s actually just Array# s, but we use it to save the Array# pointers instead of Array pointers, thus save an unnecessary indirection. And we can be sure the element are all evaluated, because Array# are unlifted type, which can’t be a thunk.

In primitive package @dolio push this technique to its limit: we use ArrayArray# to store all the boxed unlifted types, such as MutVar# or ArrayArray# itself. And in my patch i extend this support to MVar# and TVar#.

## byte arrays

The heap object layout of MutableByteArray#, ByteArray# are simpler, since they don’t contain pointers, we don’t have to trace them during GC:

In fact we only have one single info-table for both MutableByteArray# and ByteArray#, thus unlike boxed arrays, freezing and thawing between them in place are just no-ops. Byte array can be used to encode different size non-pointer data, such as Int and Word8, ghc-prim provide seperated functions to work with different data types: indexIntArray#, indexWord8Array#, etc. In primitive, the ByteArray type is also accompanied with all these operations.

My patch add a tagged version of byte array:

Here a can be Int, Word8, etc. They are just instances of Prim class in primitive since a long time ago. The tagged version byte array makes polymorphric unboxed array possible, later it’s unified again with another level of abstraction.

Using byte array with different size of Prim instances requires user to watch out alignment, GHC RTS allocate byte arrays aligned to machine word, this is sufficient for all the built-in Prim instances, but in case of a special alignment requirement, you have to use newAlignedPinnedByteArray# primitive operations which do aligned allocation.

You may wonder what’s the pinned fuzz about. There’re two kind of byte arrays, the one which can be moved by GC, and the one which can not. This difference arise when we want to interface foreign code: once we pass a byte array a foreign function call, we don’t want it get moved by GC while the foreign function is still doing work on it, so we want to allocate a pinned byte array to do the job.

Allocating pinned byte array are not happening on capability’s nursery, it’s happening on a global pinned object heap. It’s more like a traditional malloc. It can be slower to collect, and the allocation may suffer from contentions since we need to aquire a global lock during allocating.

Even though it’s slower, we still want to allocate large byte array on that global pinned object heap, we gain performance elsewhere: the GC will not move it. In GHC there’s a limit(LARGE_OBJECT_THRESHOLD, a little bit smaller than 4K block size), once you are allocating byte array large than that, it will be allocated on pinned heap.

After #8281 is resolved, we now have a guarantee that during unsafe FFI, the GC won’t happen. Thus it’s safe to pass byte array to an unsafe FFI call(This actually also hold for older GHC, but not for older GHCi). Otherwise you have to ask if a byte array is pinned and create a pinned copy if necessary, new ghc-prim provides isMutableByteArrayPinned#/isByteArrayPinned#, but on older GHC you can easily get this infomation by asking the byte array’s block descriptor flag:

## unified array interface

Together with PrimArray a, my patch also add a Arr class:

This is a type class trying to unify RTS’s array interface, e.g. the Data.Primitive.XXXArray modules, it’s a multi-parameter class constraining both immutable and mutable array types. For example we have following instances:

Arr class uses functional dependencies to force a one-to-one immutable/mutable constraint, which is useful since many operations under Arr only mention either the immutable array type, or the mutable one.

This class is useful in many ways, for example, array slices can be defined as:

We can provide generic operations without considering which underlined array type is, e.g. a foldr/build awared packing function from stdio:

With Arr class, the array programming experience is improved to the “average” level, because in other languages you don’t have to use different functions with boxed and unbox array. Now you don’t have to do it in haskell either.