Oops Null Pointer

Java programming related

Monthly Archives: September 2010

I see the future – it’s disappointingly like today (Eclipse plugin setup still painful)

I hate setting up plugins in Eclipse. Every time a new version comes out I get disappointed, like a kid hoping for a Transformer under the Christmas tree and getting a pair of socks.

Helios, as shiny as its new socks are, still makes me click away to install plugin after plugin.

In desperation I turned to Genuitec’s Pulse. It showed promise, but it had enough bugs to send me back to plain Eclipse. It often got confused about a plugin (hey Google, stop innovating and updating things!), and then you were stuck weeping “I just want to install the GEF”!

The other thing that puts me off Pulse is that you have to keep plugins in a separate “common” directory. I like the fact that plain Eclipse can just be picked up and moved without any installation hassle.

A while ago I tried a few other services, such as Yoxos, which I should probably have another look at (Yoxos 5 is in beta).

Does anyone else who just wants to get stuff done have a good solution?

Avoiding the database – Ehcache’s poor fit as a disk-only cache

Recently we needed a quick fix (with a better solution to come) to speed up our application’s queries, which involve reading and assembling copious amounts of data from the database.

Large parts of this data change infrequently (once a day) and are shared by multiple queries. Retrieving the data from the database is too slow, and there is too much of it to hold in the same JVM as the application, so we needed an alternative.

The idea was to pull the data out of the database in scheduled jobs (say, once a day) and put it into something faster than the DB queries.

Ehcache was already in use for another purpose, so we tried creating a disk-only cache with the following settings:

diskcache.cacheEnabled = false
diskcache.multiDateRange.cacheEnabled = FALSE
diskcache.maxElementsInMemory = 1
diskcache.overflowToDisk.enabled = TRUE

We discovered Ehcache’s disk store has a few caveats:

  • It’s fragile – if it gets corrupted then the cache file and its index are deleted. This is by design as Ehcache prefers safety over persistence
  • It’s not persisted until there is a flush or the cache manager is shut down (you need to register the shutdown hook)
  • Raw object serialisation is used, so unthoughtful serialisation policies cause the cache file size to balloon
  • The disk spooling can run faster than the garbage collector, so you can get OutOfMemory errors unless you flush periodically
  • It’s not very fast and performance degrades if you are updating entries that already exist

That last point on performance is probably the biggest death knell for using Ehcache in this way. You can work around the fragility with lots of flushing, but the performance degradation over time is difficult to avoid.
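The “lots of flushing” workaround can be sketched with plain JDK classes: a scheduler that periodically calls `flush()` on anything implementing `java.io.Flushable`, plus a shutdown hook that does one last flush. `PeriodicFlusher` and the period you choose are hypothetical; this is a sketch of the idea, not code from our application or from Ehcache.

```java
import java.io.Flushable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper: flushes a store on a fixed schedule and once more on JVM exit,
 * so spooled entries reach the disk file without waiting for a clean cache manager shutdown.
 */
final class PeriodicFlusher implements AutoCloseable {
    private final Flushable target;
    private final ScheduledExecutorService scheduler;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    PeriodicFlusher(Flushable target, long period, TimeUnit unit) {
        this.target = target;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-flusher");
            t.setDaemon(true); // flushing alone should not keep the JVM alive
            return t;
        });
        scheduler.scheduleAtFixedRate(this::flushQuietly, period, period, unit);
        // Final flush if the JVM exits without close() being called; close() is idempotent.
        Runtime.getRuntime().addShutdownHook(new Thread(this::close, "cache-flusher-shutdown"));
    }

    private void flushQuietly() {
        try {
            target.flush();
        } catch (IOException | RuntimeException e) {
            // An exception escaping a scheduled task would cancel all later runs,
            // so report it and keep the schedule going.
            System.err.println("Cache flush failed: " + e);
        }
    }

    /** Stops the schedule and performs one last flush; later calls do nothing. */
    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            scheduler.shutdown(); // cancels the periodic task
            try {
                scheduler.awaitTermination(5, TimeUnit.SECONDS); // let a running flush finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flushQuietly();
        }
    }
}
```

With an Ehcache 2.x `Cache` you could pass something like `() -> cache.flush()` as the `Flushable`. How short the period needs to be depends on how fast you write entries, since the aim is to stop the disk spool outrunning the garbage collector.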

For example, if the cache-update code does a get followed by a put, the first run of a query against an empty cache takes 2 minutes, but the second run of the same query takes 20 minutes!

Replacing the get with a remove (so a remove followed by a put) brings the second run down to 8 minutes. So if you can get away with it, don’t bother updating; just remove and re-add. Note that a put without the remove performs nearly as badly as the get-then-put in the first example.
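The remove-and-re-add advice fits in a small helper. `KeyValueStore`, `CacheRefresher` and `LoggingStore` are hypothetical names: the interface stands in for the cache (with Ehcache 2.x the matching calls would be `cache.remove(key)` and `cache.put(new Element(key, value))`), and the logging store only makes the sketch runnable by recording which operations a refresh issues. It does not model Ehcache’s disk store or reproduce the timings above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal view of a cache, standing in for an Ehcache Cache. */
interface KeyValueStore<K, V> {
    V get(K key);
    void put(K key, V value);
    boolean remove(K key);
}

final class CacheRefresher {
    private CacheRefresher() {}

    /**
     * Replaces each entry with its freshly loaded value by removing the old entry
     * and adding the new one, instead of reading the old value and overwriting it.
     */
    static <K, V> void refresh(KeyValueStore<K, V> store, Map<K, V> freshData) {
        for (Map.Entry<K, V> e : freshData.entrySet()) {
            store.remove(e.getKey());            // drop the existing entry first
            store.put(e.getKey(), e.getValue()); // then add the new value as a fresh entry
        }
    }
}

/** HashMap-backed store that logs each operation; for demonstration only. */
final class LoggingStore<K, V> implements KeyValueStore<K, V> {
    final Map<K, V> data = new HashMap<>();
    final List<String> log = new ArrayList<>();

    public V get(K key) { log.add("get " + key); return data.get(key); }
    public void put(K key, V value) { log.add("put " + key); data.put(key, value); }
    public boolean remove(K key) { log.add("remove " + key); return data.remove(key) != null; }
}
```

The refresh deliberately never calls `get`, since reading the old value first was the slowest pattern we measured.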

The real answer is to use something else as an out-of-JVM store that is faster than the DB, such as Infinispan or Terracotta. What has worked for you?

In summary:

  • Use custom serialisation to minimise cached object sizes
  • Use many smaller caches so the whole cache can be emptied and recreated
  • Use a remove before a put rather than a get and then a put
  • Flush regularly to minimise the disk spool memory usage and ensure the cache is written to file
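The first point can be checked with a few lines of plain JDK code: serialise the same row once with default `Serializable` and once through `Externalizable`, then compare byte counts. `DefaultRow`, `CompactRow` and their fields are illustrative assumptions, not types from our application, and storing the amount as a `long` number of cents assumes two decimal places are enough for your data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.Date;

final class SerialisationSizes {
    /** Hypothetical cached row using default Java serialisation (writes class and field metadata). */
    static final class DefaultRow implements Serializable {
        private static final long serialVersionUID = 1L;
        String code;
        Date asOf;
        BigDecimal amount;

        DefaultRow(String code, Date asOf, BigDecimal amount) {
            this.code = code;
            this.asOf = asOf;
            this.amount = amount;
        }
    }

    /** The same data written field by field in a compact hand-written format. */
    public static final class CompactRow implements Externalizable {
        private static final long serialVersionUID = 1L;
        String code;
        long asOfMillis;
        long amountInCents;

        public CompactRow() {} // Externalizable needs a public no-arg constructor to deserialise

        CompactRow(String code, long asOfMillis, long amountInCents) {
            this.code = code;
            this.asOfMillis = asOfMillis;
            this.amountInCents = amountInCents;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(code);
            out.writeLong(asOfMillis);
            out.writeLong(amountInCents);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            code = in.readUTF(); // read back in exactly the order written
            asOfMillis = in.readLong();
            amountInCents = in.readLong();
        }
    }

    static byte[] serialise(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialise(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Comparing `serialise(row).length` for both shapes of your own cached types is a quick way to see how much the default format adds before it is multiplied by every entry in the cache file.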

Update: see here for an issue about slow caches when the disk store is bigger than the memory store. Log in and vote if this issue affects you!