Yesterday my friend Tao He and I wrote a short release message at a local cafe: we have released a new IO library for GHC, based on our previous work on combining libuv and GHC. Here I'd like to spend some time explaining what's inside the new library besides the libuv binding part: what the motivation is, and what our solutions are.
ByteString has been the packed-bytes type of choice for years: it's used as the buffer type in IO, serialization, parsing, etc. Because it uses pinned memory, everything that works on Ptr ()s works on it. But this is really not ideal: pinned memory is only suitable for large blocks, and if lots of small ByteStrings are created at runtime, memory fragmentation may happen. To address this problem, in stdio we do a couple of things:
- To cover the system FFI use case, we add a separate CBytes type, which always wraps an immutable null-terminated string. It's used as the file path type, domain name type, etc. We use a rewrite rule so that a literal CBytes is constructed directly from a literal Addr# in O(1).
- We build a ByteArray#-based vector type on top of the PrimArray type from the primitive package:

```haskell
-- from primitive >= 0.6.4.0
data PrimArray a = PrimArray ByteArray#
```
With this definition, the GHC RTS can freely choose to allocate in the nursery or in pinned blocks depending on the vector's size, so creating small vectors becomes much cheaper. When you want to pass this new vector to a safe FFI call, you have to check whether it's pinned; if not, a pinned copy is made (in the unsafe FFI case, you can pass the ByteArray# directly).
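You can observe this allocation behaviour directly with GHC's prim-ops (a small sketch; isMutableByteArrayPinned# has been available since GHC 8.2):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- Sketch: a small ByteArray# allocated normally lands in the nursery
-- (movable, unpinned), while newPinnedByteArray# always pins.
import GHC.Exts
import GHC.IO (IO (..))

-- run an allocation prim-op and report whether the result is pinned
pinnedness :: (Int# -> State# RealWorld -> (# State# RealWorld, MutableByteArray# RealWorld #))
           -> IO Bool
pinnedness alloc = IO $ \s0 ->
  case alloc 64# s0 of
    (# s1, mba# #) -> (# s1, isTrue# (isMutableByteArrayPinned# mba#) #)

main :: IO ()
main = do
  normal <- pinnedness newByteArray#
  pinned <- pinnedness newPinnedByteArray#
  print (normal, pinned)  -- (False,True)
```

(Very large arrays are allocated outside the nursery and behave as pinned, which is exactly why the RTS can make this choice by size.)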
- Now we make Bytes an alias for PrimVector Word8.
There's one key function where ByteString beats vector: break. ByteString's break uses memchr when possible, which is SIMD enabled. This is very important because a high-performance Parser needs a fast takeWhile (== x), which in turn relies on finding the first occurrence of a specific byte quickly. We leverage GHC rewrite rules to solve this problem: use memchr if the types match.
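A minimal sketch of the rewrite-rule trick (the names breakElem and breakByte are illustrative, not stdio's actual API; the real specialised path would call memchr on the underlying buffer):

```haskell
-- A generic break is rewritten to a byte-specialised version whenever
-- the element type turns out to be Word8 (rules fire with -O).
module Main where

import Data.Word (Word8)

-- generic, element-by-element search
breakElem :: Eq a => a -> [a] -> ([a], [a])
breakElem x = span (/= x)
{-# NOINLINE [1] breakElem #-}

-- the specialised path; imagine a memchr call on a packed buffer here
breakByte :: Word8 -> [Word8] -> ([Word8], [Word8])
breakByte w = span (/= w)

{-# RULES "breakElem/Word8" breakElem = breakByte #-}

main :: IO ()
main = print (breakElem (0 :: Word8) [1, 2, 0, 3])  -- ([1,2],[0,3])
```

Without -O the rule simply doesn't fire and the generic path runs; the semantics are identical either way, which is what makes this optimization safe.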
As we suggested in the last blog post, we built a unified vector interface for both boxed and unboxed vectors. I didn't provide separate typeclasses for slicing, indexing, etc.; instead, I made it very clear that a vector is just a slice of an array:

```haskell
class (Arr (MArray v) (IArray v) a) => Vec v a where
    -- methods converting between a vector and an
    -- (array, offset, length) triple elided
```

The boxed Vector and unboxed PrimVector definitions are then as simple as they can be:

```haskell
data Vector a = Vector
    {-# UNPACK #-} !(SmallArray a)  -- payload
    {-# UNPACK #-} !Int             -- offset
    {-# UNPACK #-} !Int             -- length

data PrimVector a = PrimVector
    {-# UNPACK #-} !(PrimArray a)
    {-# UNPACK #-} !Int
    {-# UNPACK #-} !Int
```
In stdio, vector combinators all work against the Vec v a constraint, so type-changing operations like zipWith' become possible:

```haskell
-- We don't require the result vector type to be the same as the inputs'
zipWith' :: (Vec v a, Vec u b, Vec w c)
         => (a -> b -> c) -> v a -> u b -> w c
```
I even implemented a KMP search algorithm against this unified vector interface:

```haskell
indices :: (Vec v a, Eq a) => v a -> v a -> Bool -> [Int]
```

Now enjoy searching an Int vector or a boxed vector in O(n + k) time!
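To give a feel for such a generic indices, here is a minimal KMP sketch over plain lists (illustrative only; stdio's real implementation works on its Vec types and packed arrays, and kmpIndices is a made-up name):

```haskell
-- Knuth-Morris-Pratt search generic over any Eq element.
import Data.Array

kmpIndices :: Eq a => [a] -> [a] -> [Int]
kmpIndices pat hay
  | m == 0    = []
  | otherwise = go 0 0 hay
  where
    m  = length pat
    pa = listArray (0, m - 1) pat
    -- failure table: ft ! i = length of the longest proper border of
    -- pat[0..i]; defined lazily in terms of earlier entries
    ft :: Array Int Int
    ft = listArray (0, m - 1)
           (0 : [ borderLen (ft ! (i - 1)) i | i <- [1 .. m - 1] ])
    borderLen k i
      | pa ! i == pa ! k = k + 1
      | k == 0           = 0
      | otherwise        = borderLen (ft ! (k - 1)) i
    -- advance the match length k after seeing haystack element x
    step k x
      | x == pa ! k = k + 1
      | k == 0      = 0
      | otherwise   = step (ft ! (k - 1)) x
    go _ _ [] = []
    go i k (x:xs) =
      let k' = step k x
      in if k' == m
           then (i - m + 1) : go (i + 1) (ft ! (m - 1)) xs
           else go (i + 1) k' xs

main :: IO ()
main = do
  print (kmpIndices "aba" "ababa")              -- [0,2]
  print (kmpIndices [1, 2 :: Int] [0,1,2,1,2])  -- [1,3]
```

Note how nothing here cares what the element type is; that's exactly what the unified Vec interface buys.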
Bytes requires a whole new machinery for construction and deconstruction: traditional Parsers all work on Ptr ()s, which is no longer suitable for our new Bytes. When we first tried to design a solution for the new Bytes type back in 2017, we hit a problem: we couldn't do unaligned access on ByteArray#s. That is, we couldn't read or write 4 bytes as a Word32 at an arbitrary byte index of a ByteArray# the way we can with a Ptr (); the primitives GHC provided only allowed reads and writes at aligned indexes. So the work on stdio stalled, until GHC 8.6.1 added new prim-ops:

```haskell
indexWord8ArrayAsWord16# :: ByteArray# -> Int# -> Word#
```
This is very important: without unaligned access on ByteArray#, we'd be forced to break a primitive value down into individual bytes when serializing, or reassemble it from individual bytes when parsing, which is much slower than the old Storable way. In stdio we provide an UnalignedAccess type class to leverage these new prim-ops, which should probably be picked up by the primitive package in a future version.
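For illustration, this is the kind of byte-by-byte assembly a parser is forced into without unaligned reads (one load via the new prim-ops replaces four reads plus the shifts and ors below):

```haskell
-- The slow path: assemble a little-endian Word32 from four separate
-- byte reads; unaligned prim-ops let a parser do this in one load.
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32, Word8)

word32LE :: Word8 -> Word8 -> Word8 -> Word8 -> Word32
word32LE b0 b1 b2 b3 =
      fromIntegral b0
  .|. (fromIntegral b1 `shiftL` 8)
  .|. (fromIntegral b2 `shiftL` 16)
  .|. (fromIntegral b3 `shiftL` 24)

main :: IO ()
main = print (word32LE 1 2 3 4)  -- 0x04030201 = 67305985
```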
Now that the missing piece is in place, we can have our Parser. In stdio we designed something different: a Builder monad that supports building not only lazy Bytes chunks but also a single short strict Bytes. That is, we support different allocation strategies:
data AllocateStrategy s
This is an internal type implementing the different allocation strategies; the Builder monad uses this information to handle writes that cross buffer boundaries. The final API exposed to the user is very natural:

```haskell
buildBytes :: Builder a -> V.Bytes
```
As for the Parser type, we chose a very simple CPS formulation:

```haskell
data Result a
    = Success !V.Bytes a
    | Failure !V.Bytes String
    | Partial (V.Bytes -> Result a)
```
After I implemented this Parser, I found it's exactly the same as the one in the scanner package. That's no coincidence: in my own benchmarks this formulation is as fast as the non-resumable one from the store package! And unlike what Yuras claimed in the scanner package, it's actually quite easy to provide an Alternative instance for this formulation; if a user doesn't use it, there is no backtracking cost at all.
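To make that claim concrete, here is a toy CPS parser over String with exactly such an Alternative instance (illustrative; stdio's real Parser works on Bytes chunks and additionally supports resumption via its Partial constructor, omitted here):

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Applicative (Alternative (..))
import Data.Char (isDigit)

-- a simplified, non-resumable Result
data Result a
  = Success String a      -- remaining input and the parsed value
  | Failure String String -- remaining input and an error message
  deriving (Eq, Show)

-- CPS: a parser takes the input and a success continuation
newtype Parser a = Parser
  { runParser :: forall r. String -> (String -> a -> Result r) -> Result r }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s k -> p s (\s' a -> k s' (f a))

instance Applicative Parser where
  pure x = Parser $ \s k -> k s x
  Parser pf <*> Parser pa =
    Parser $ \s k -> pf s (\s' f -> pa s' (\s'' a -> k s'' (f a)))

instance Alternative Parser where
  empty = Parser $ \s _ -> Failure s "empty"
  Parser p <|> Parser q = Parser $ \s k ->
    -- only on p's failure do we re-run q on the saved input
    case p s Success of
      Success s' a -> k s' a
      Failure _ _  -> q s k

satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser $ \s k -> case s of
  c:cs | f c -> k cs c
  _          -> Failure s "satisfy"

digits :: Parser String
digits = some (satisfy isDigit)

parse :: Parser a -> String -> Result a
parse p s = runParser p s Success

main :: IO ()
main = do
  print (parse digits "123ab")                 -- Success "ab" "123"
  print (parse (digits <|> pure "none") "abc") -- Success "abc" "none"
```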
We provide high-quality numeric Parsers as well; the IEEE floating-point formatting is based on my grisu3 patch, which is much faster.
Some extra words on the Builder here: why did we choose a monadic interface instead of a monoid one? The reasons in my mind are:

- A monadic interface is more powerful: you can do things like accumulating a verification value while building bytes, or adding an extra ReaderT layer to inject build-time constants.
- A monadic interface is more symmetrical with the Parser side; e.g. we can have paired modules like
import qualified Std.Data.Builder as B
- A monadic interface works extremely well with the combinators in Control.Monad, while control structures over monoids are somewhat lacking.
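A toy illustration of the first and last points, using a hand-rolled state monad (stdio's real Builder writes into an actual buffer governed by an AllocateStrategy; word8 and withChecksum here are made-up names):

```haskell
import Data.Bits (xor)
import Data.Word (Word8)

-- a toy Builder: state is (reversed output, running xor checksum)
newtype Builder a = Builder { runB :: ([Word8], Word8) -> (a, ([Word8], Word8)) }

instance Functor Builder where
  fmap f m = Builder $ \s -> let (a, s') = runB m s in (f a, s')
instance Applicative Builder where
  pure a = Builder $ \s -> (a, s)
  mf <*> ma = Builder $ \s ->
    let (f, s1) = runB mf s
        (a, s2) = runB ma s1
    in (f a, s2)
instance Monad Builder where
  m >>= f = Builder $ \s -> let (a, s') = runB m s in runB (f a) s'

-- emit one byte while also folding it into the checksum
word8 :: Word8 -> Builder ()
word8 w = Builder $ \(out, ck) -> ((), (w : out, ck `xor` w))

-- run a build and append the accumulated checksum; this kind of
-- "look at what you've built so far" step needs the monad
withChecksum :: Builder () -> [Word8]
withChecksum b =
  let (_, (out, ck)) = runB b ([], 0)
  in reverse (ck : out)

main :: IO ()
main = print (withChecksum (mapM_ word8 [1, 2, 4]))  -- [1,2,4,7]
```

Note the mapM_ in main: any Control.Monad combinator works on a Builder for free, which is the symmetry argument in practice.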
Buffered IO is a fundamental module in any IO toolkit, since it can greatly boost IO performance. In base we have the Handle type, which is powerful but very complex. In stdio we designed a rather simple solution based on an idea from the io-streams package: a buffered input device should support a push-back operation, so that running a resumable Parser which consumes only part of the buffer is not a problem. The buffered IO API looks like this:
readBuffer :: (HasCallStack, Input i) => BufferedInput i -> IO V.Bytes
Nothing fancy; every function should work as you'd expect. Note that we provide no thread-safety guarantees here: to use a BufferedOutput from multiple threads, protect concurrent access with an MVar.
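The push-back idea fits in a few lines; here is a toy model using String chunks and IORefs (illustrative; stdio's real BufferedInput wraps an Input device and Bytes buffers, and these field names are made up):

```haskell
import Data.IORef

data BufferedInput = BufferedInput
  { pushBack :: IORef String   -- data pushed back by a parser
  , source   :: IORef [String] -- pretend device: a list of chunks
  }

-- yield pushed-back data first, then the next chunk from the device;
-- an empty result means EOF
readBuffer :: BufferedInput -> IO String
readBuffer b = do
  pb <- readIORef (pushBack b)
  if not (null pb)
    then writeIORef (pushBack b) "" >> return pb
    else do
      cs <- readIORef (source b)
      case cs of
        []     -> return ""
        c:rest -> writeIORef (source b) rest >> return c

-- a resumable parser returns its unconsumed input here
unReadBuffer :: String -> BufferedInput -> IO ()
unReadBuffer leftover b = modifyIORef' (pushBack b) (leftover ++)

main :: IO ()
main = do
  pb  <- newIORef ""
  src <- newIORef ["hello ", "world"]
  let b = BufferedInput pb src
  c1 <- readBuffer b
  unReadBuffer (drop 2 c1) b   -- a parser consumed only "he"
  c2 <- readBuffer b           -- gets back "llo "
  c3 <- readBuffer b
  putStrLn (c2 ++ c3)          -- llo world
```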
By now stdio has taken shape. We have put a lot of time into handcrafting it over the last two years: lots of experiments, lots of tests. We hope this little project can start to make some of your jobs easier, but of course it's still at an early stage. There are a lot of features we eventually want to add; here is some of the roadmap:
- Implement a binary serialization module similar to the binary and cereal packages.
- Implement a JSON module.
- Support UDP device.
- Add DNS functionality, i.e
- Add HTTP protocol support.
- Add TLS (Transport Layer Security) support.
Both Tao He and I know that maintaining a project of this size is beyond our capability alone, so we'd like to ask for help from the Haskell community. How can we grow this project further? In which directions does our community need it to go? We'd love to hear your opinions, and of course your experience reports with stdio!