Oops Null Pointer

Java programming related

Avoiding the database – Ehcache’s poor fit as a disk only cache

Recently we needed a quick fix (with a better solution to come) to speed up our application queries, which involve reading and assembling copious amounts of data from the database.

Large parts of this data change infrequently (once a day) and are shared by multiple queries. Retrieving it from the database is too slow, and there is too much of it to store in the same JVM as the app, so we needed an alternative.

The idea was to pull data out of the database via scheduled jobs (say once a day) and put it in something that was faster than the DB queries.

Ehcache was already used for another purpose, so we tried creating a disk only cache with the following settings:

diskcache.cacheEnabled = false
diskcache.multiDateRange.cacheEnabled = FALSE
diskcache.maxElementsInMemory = 1
diskcache.overflowToDisk.enabled = TRUE

We discovered Ehcache’s disk store has a few caveats:

  • It’s fragile – if it gets corrupted then the cache file and its index are deleted. This is by design as Ehcache prefers safety over persistence
  • It’s not persisted until there is a flush or the cache manager is shut down (you need to register the shutdown hook)
  • Raw object serialisation is used, so unthoughtful serialisation policies cause the cache file size to balloon
  • The disk spooling can run faster than the garbage collector, so you can get OutOfMemory errors unless you flush periodically
  • It’s not very fast and performance degrades if you are updating entries that already exist

That last point on performance is probably the biggest death knell for using Ehcache in this way. You can work around the fragility with lots of flushing, but the performance degradation over time is difficult to avoid.
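As a rough sketch of what "lots of flushing" could look like, the helper below runs a flush action after every N writes. The class name `PeriodicFlusher` and the threshold are made up for illustration; with Ehcache 2.x the flush action would presumably be the cache's own `flush()` method, but here it is just a `Runnable` so the sketch runs without any cache library.

```java
public class PeriodicFlusher {

    private final Runnable flushAction; // assumption: e.g. cache::flush on a real cache
    private final int flushEvery;       // hypothetical threshold, tune for your data
    private int writesSinceFlush;

    public PeriodicFlusher(Runnable flushAction, int flushEvery) {
        if (flushEvery < 1) {
            throw new IllegalArgumentException("flushEvery must be at least 1");
        }
        this.flushAction = flushAction;
        this.flushEvery = flushEvery;
    }

    /** Call once after each put; triggers a flush on every flushEvery-th write. */
    public synchronized void recordWrite() {
        writesSinceFlush++;
        if (writesSinceFlush >= flushEvery) {
            flushAction.run();
            writesSinceFlush = 0;
        }
    }
}
```

Counting writes rather than using a timer keeps the amount of unflushed data bounded, which is the point of the workaround; a scheduled flush would be another reasonable choice.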

For example, if the cache update code does a get and then a put, the first query I run on an empty cache takes 2 minutes. The second time the same query is run it takes 20 minutes!

If this is replaced by a remove and a put then the second run takes 8 minutes. So if you can get away with it, don’t bother updating, just remove and re-add. Note that without the remove the query’s performance is nearly as bad as in the first example.
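The two update patterns compared above can be sketched roughly as follows. The `Store` interface and `MapStore` class are hypothetical stand-ins so the example runs on its own; against Ehcache 2.x the equivalent calls would be along the lines of `cache.get(key)`, `cache.remove(key)` and `cache.put(new Element(key, value))`. The in-memory stand-in only shows the shape of the code and will not reproduce the timings.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheRefreshSketch {

    /** Hypothetical minimal cache API, standing in for the real cache. */
    interface Store<K, V> {
        V get(K key);
        void put(K key, V value);
        boolean remove(K key);
    }

    /** In-memory stand-in so the sketch runs without a cache library. */
    static class MapStore<K, V> implements Store<K, V> {
        private final Map<K, V> entries = new HashMap<>();

        @Override public V get(K key) { return entries.get(key); }
        @Override public void put(K key, V value) { entries.put(key, value); }
        @Override public boolean remove(K key) { return entries.remove(key) != null; }
    }

    /** The slow pattern: read the existing entry, then overwrite it. */
    static <K, V> V getThenPut(Store<K, V> store, K key, V fresh) {
        V previous = store.get(key); // on a disk only cache this read hits the disk store
        store.put(key, fresh);
        return previous;
    }

    /** The faster pattern: drop the old entry and add the new one as if it were new. */
    static <K, V> void removeThenPut(Store<K, V> store, K key, V fresh) {
        store.remove(key);
        store.put(key, fresh);
    }
}
```

Both methods leave the same value in the cache, so swapping one for the other should be safe as long as nothing depends on the previous value.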

The real answer is to use something else as an out-of-JVM store that is faster than the DB, like Infinispan or Terracotta. What has worked for you?

In summary:

  • Use custom serialisation to minimise cached object sizes
  • Use many smaller caches so the whole cache can be emptied and recreated
  • Use a remove before a put rather than a get and then a put
  • Flush regularly to minimise the disk spool memory usage and ensure the cache is written to file
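To illustrate the first tip, here is a minimal sketch comparing default Java serialisation with a hand-written `Externalizable` version of the same value. The `DefaultPrice` and `CompactPrice` classes and their fields are invented for the example and are not from our application. The saving comes from the class descriptor no longer listing every field name and type.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerialisedSizeDemo {

    /** Hypothetical cached value using default serialisation. */
    static class DefaultPrice implements Serializable {
        private static final long serialVersionUID = 1L;
        final long instrumentId;
        final double price;

        DefaultPrice(long instrumentId, double price) {
            this.instrumentId = instrumentId;
            this.price = price;
        }
    }

    /** Same data, but only the two primitives are written by hand. */
    public static class CompactPrice implements Externalizable {
        private static final long serialVersionUID = 1L;
        private long instrumentId;
        private double price;

        public CompactPrice() { } // public no-arg constructor required by Externalizable

        public CompactPrice(long instrumentId, double price) {
            this.instrumentId = instrumentId;
            this.price = price;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(instrumentId);
            out.writeDouble(price);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            instrumentId = in.readLong();
            price = in.readDouble();
        }
    }

    /** Returns the number of bytes Java serialisation produces for the value. */
    static int serialisedSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("default: " + serialisedSize(new DefaultPrice(42L, 1.5)) + " bytes");
        System.out.println("compact: " + serialisedSize(new CompactPrice(42L, 1.5)) + " bytes");
    }
}
```

For a class this small the difference is modest. The bigger wins usually come from not writing derived or redundant fields at all, so it is worth measuring with your own objects.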

Update: see here for an issue on slow caches when the disk store is bigger than the memory store. Log in and vote if this issue affects you!


2 responses to “Avoiding the database – Ehcache’s poor fit as a disk only cache”

  1. steve September 29, 2010 at 3:59 pm

    Have you taken a look at the BigMemory add-on to Ehcache? It seems like it might help. It has a new disk store that is better on the performance issues you raised, but the best part is really the large amount of memory it can leverage. Here is a blog I wrote about it.


    • oopsnullpointer September 30, 2010 at 6:23 am

      Yes, BigMemory looks like the right idea. We had already registered for the beta but haven’t tried it. Perhaps a cleverly arranged Terracotta structure could serve a similar purpose without the cost, but obviously with less convenience? I’ve read that Terracotta’s method of writing to disk is very efficient, so I expect BigMemory is similarly so?
