Oops Null Pointer

Java programming related

Monthly Archives: April 2012

Memory and wire size of message protocols

There are many studies of the speed of message protocols such as protobuf, JSON and BSON, but few measure the memory needed to get in-memory data out to the client. The simplest approach (and the worst in terms of memory usage) is to buffer the whole data structure before sending it. This typically requires at least as much memory again as the original data.
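As a rough illustration of the difference, here is a minimal, stdlib-only Java sketch. The class and method names are hypothetical, and the hand-rolled JSON assumes every cell is a plain decimal string that needs no escaping. `writeBuffered` builds the whole message in memory before sending, while `writeStreaming` pushes each value through a fixed 4KB buffer straight to the `OutputStream`.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BufferVsStream {

    // Buffered: at peak the StringBuilder, its String copy and the byte[] copy are all alive at once.
    static void writeBuffered(String[][] a2d, OutputStream out) throws IOException {
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < a2d.length; r++) {
            if (r > 0) sb.append(',');
            sb.append('[');
            for (int c = 0; c < a2d[r].length; c++) {
                if (c > 0) sb.append(',');
                sb.append('"').append(a2d[r][c]).append('"'); // assumes no characters needing JSON escaping
            }
            sb.append(']');
        }
        sb.append(']');
        out.write(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    // Streaming: only a fixed 4KB character buffer sits between the data and the stream.
    static void writeStreaming(String[][] a2d, OutputStream out) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), 4096);
        w.write('[');
        for (int r = 0; r < a2d.length; r++) {
            if (r > 0) w.write(',');
            w.write('[');
            for (int c = 0; c < a2d[r].length; c++) {
                if (c > 0) w.write(',');
                w.write('"');
                w.write(a2d[r][c]); // assumes no characters needing JSON escaping
                w.write('"');
            }
            w.write(']');
        }
        w.write(']');
        w.flush(); // flush, but leave the underlying stream open for the caller
    }

    public static void main(String[] args) throws IOException {
        String[][] a2d = {{"1.0000000001", "2.5"}, {"3.1415926536"}};
        ByteArrayOutputStream buffered = new ByteArrayOutputStream();
        ByteArrayOutputStream streamed = new ByteArrayOutputStream();
        writeBuffered(a2d, buffered);
        writeStreaming(a2d, streamed);
        System.out.println(Arrays.equals(buffered.toByteArray(), streamed.toByteArray())); // true
        System.out.println(streamed.toString("UTF-8")); // [["1.0000000001","2.5"],["3.1415926536"]]
    }
}
```

Both methods produce identical bytes; only the peak memory differs.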

My test data set was a large (82MiB) 2D array of decimal values represented as strings (about 10 decimal places each).

The Java-generated CORBA serialisation code I started with buffers everything at once in its write method: sending 82MiB of data needs another 82MiB of buffer.

JSON mapped using Jackson had similar but slightly better memory usage.

Using an ancient version of The Mind Electric’s GLUE SOAP toolkit (don’t ask!), the SOAP-wrapped JSON message was also buffered entirely in memory, and creating the envelope was horribly inefficient (using hundreds of MiB).

*A note on wire size: the SOAP message compressed very well with GZip, as the whole message is available.

BSON (using BSON4Jackson) by default requires the first element to be the message size, so the whole message is buffered in memory. Relaxing pure BSON with the BsonGenerator.Feature.ENABLE_STREAMING setting allows streaming code to be used, and sending then needs roughly an extra third of the original data size in memory.
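Why a leading size field forces buffering can be seen from the framing alone. The sketch below uses hypothetical names and is not bson4jackson code; it only writes a BSON-style document frame: a little-endian int32 total size, the encoded elements, and a 0x00 terminator. The size is the first thing on the wire, so the elements must already be fully encoded in memory before it can be written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LengthPrefixedFrame {

    // BSON-style document: int32 total size (little-endian) + encoded elements + 0x00 terminator.
    // The size comes first, so the caller must hand over the complete, already-encoded elements.
    static void writeDocument(byte[] encodedElements, OutputStream out) throws IOException {
        int totalSize = 4 + encodedElements.length + 1; // the size field counts itself and the terminator
        ByteBuffer header = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(totalSize);
        out.write(header.array());
        out.write(encodedElements);
        out.write(0x00);
    }

    public static void main(String[] args) throws IOException {
        // The whole element list is buffered here just to learn its length.
        ByteArrayOutputStream elements = new ByteArrayOutputStream();
        elements.write(new byte[] {1, 2, 3}); // placeholder bytes, not real BSON elements
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeDocument(elements.toByteArray(), wire);
        System.out.println(wire.size()); // 8
    }
}
```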

I couldn’t get Google’s protocol buffers to be friendly to very large data. Strings appear to be unoptimised (it just uses String.getBytes()), so even sending the array a “row” at a time did not perform well. Sending a field at a time, with string size fields and row length prefixes, was even worse.
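For reference, this is roughly how a protobuf string field looks on the wire. It is hand-encoded here from the published wire format rather than produced by the protobuf library, and the class and method names are hypothetical: a tag `(fieldNumber << 3) | 2` (wire type 2, length-delimited), the UTF-8 byte length as a varint, then the bytes. Because the length precedes the data, each string has to be converted to a `byte[]` first, which is the extra copy mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ProtoStringField {

    // Base-128 varint for non-negative values: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Length-delimited field (wire type 2): tag, varint byte length, then the UTF-8 bytes.
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // full copy of the string, needed up front for its length
        writeVarint(out, (fieldNumber << 3) | 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeStringField(out, 1, "3.1415926536");
        System.out.println(out.size()); // 14: 1 tag byte + 1 length byte + 12 UTF-8 bytes
    }
}
```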

The lowest memory usage came from sending the data via Jackson’s streaming API. Combined with omitting the Content-Length header, so that HTTP/1.1 chunked transfer encoding is used, this had almost no overhead: sending 82MiB took about 4KB of memory! The streaming API contains some clever code and is exceptionally efficient at writing to an output stream (in my case in a servlet, or via Restlet’s OutputRepresentation class).
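Chunking is what lets the body be sent without knowing its length up front. Servlet containers typically apply this framing themselves when no Content-Length is set, so the following is only an illustrative sketch (the class name is hypothetical) of the HTTP/1.1 chunked format: each chunk is its size in hex, CRLF, the data, CRLF, and a zero-size chunk ends the body. Memory use is bounded by the fixed buffer however large the body is.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Minimal HTTP/1.1 chunked transfer-coding writer with a fixed-size buffer.
public class ChunkedOutputStream extends OutputStream {
    private final OutputStream out;
    private final byte[] buf;
    private int count;

    public ChunkedOutputStream(OutputStream out, int bufferSize) {
        this.out = out;
        this.buf = new byte[bufferSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buf.length) flushChunk();
        buf[count++] = (byte) b;
    }

    // Each chunk is: size in hex, CRLF, the data, CRLF.
    private void flushChunk() throws IOException {
        if (count == 0) return; // an empty chunk would signal the end of the body
        out.write((Integer.toHexString(count) + "\r\n").getBytes(StandardCharsets.US_ASCII));
        out.write(buf, 0, count);
        out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
        count = 0;
    }

    @Override
    public void flush() throws IOException {
        flushChunk();
        out.flush();
    }

    // The last chunk has size 0, followed by an empty trailer line.
    @Override
    public void close() throws IOException {
        flushChunk();
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ChunkedOutputStream chunked = new ChunkedOutputStream(wire, 4)) {
            chunked.write("hello".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.print(wire.toString("US-ASCII")); // "4\r\nhell\r\n1\r\no\r\n0\r\n\r\n"
    }
}
```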

GZip can also be applied on top of this: it halves the wire size, but takes twice as long.
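Compression can be layered on without giving up streaming, because `java.util.zip.GZIPOutputStream` is itself an `OutputStream`; with Jackson one could presumably pass it to `createJsonGenerator` and set a `Content-Encoding: gzip` response header. The stdlib-only sketch below (hypothetical names, and assuming values need no JSON escaping) compresses a nested JSON array as it is written and reads it back to check the round trip.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipStreaming {

    // Compresses on the fly: memory stays bounded by the writer and deflater buffers, not the message size.
    static void writeCompressed(String[][] a2d, OutputStream out) throws IOException {
        GZIPOutputStream gz = new GZIPOutputStream(out);
        Writer w = new BufferedWriter(new OutputStreamWriter(gz, StandardCharsets.UTF_8), 4096);
        w.write('[');
        for (int r = 0; r < a2d.length; r++) {
            if (r > 0) w.write(',');
            w.write('[');
            for (int c = 0; c < a2d[r].length; c++) {
                if (c > 0) w.write(',');
                w.write('"' + a2d[r][c] + '"'); // assumes values need no JSON escaping
            }
            w.write(']');
        }
        w.write(']');
        w.flush();   // push buffered chars into the deflater
        gz.finish(); // write the GZip trailer but leave 'out' open
    }

    static String readCompressed(byte[] gzipped) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) plain.write(buf, 0, n);
            return plain.toString("UTF-8");
        }
    }

    public static void main(String[] args) throws IOException {
        String[][] a2d = new String[1000][10];
        for (int r = 0; r < a2d.length; r++)
            for (int c = 0; c < a2d[r].length; c++)
                a2d[r][c] = String.format(Locale.ROOT, "%.10f", (r * 10 + c) / 7.0);
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeCompressed(a2d, wire);
        String json = readCompressed(wire.toByteArray());
        System.out.println("plain bytes: " + json.length() + ", gzipped bytes: " + wire.size());
    }
}
```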

In summary: plain JSON streamed with Jackson is the clear winner for my data set, with its tiny memory usage when sending data to a stream.

In practice I found it much simpler than this article made it seem (though I’m sending a very simple message here). Here is my code to stream a JSON representation of a 2D string array:

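JsonGenerator g = new JsonFactory().createJsonGenerator(outputStream);
g.writeStartObject();
g.writeStringField("type", "JsonJacksonStreaming");
g.writeArrayFieldStart("data"); // placeholder field name: the original listing is cut off here
for (String[] row : a2d) {         // one JSON array per row
    g.writeStartArray();
    for (String cell : row) g.writeString(cell);
    g.writeEndArray();
}
g.writeEndArray();
g.writeEndObject();
g.close(); // flushes; also closes outputStream unless AUTO_CLOSE_TARGET is disabled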

Note: I didn’t get time to try MessagePack, but I’d like to. Has anyone tried MessagePack with large amounts of string-ish data? Please comment.