May 22, 2011
I recently updated Ehcache from 2.3.0 to 2.4.2. Everything seemed to work fine, but one cache was throwing the error “Could not create cache directory” and the location was in the bin directory of my app server. Suspiciously, this is the default path when an empty path is given as the disk store location. This cache is a little unusual in its configuration: it loads an XML config file for the cache manager settings and then programmatically adds a cache to this cache manager. The XML file has the disk store path and the cache constructor sets the disk store parameter to “”.
In 2.3.0 this would allow the cache to inherit the disk store path from the XML cache manager config, but in 2.4.2 you must set the disk store parameter to null to have the same effect. I couldn’t find this noted anywhere, so I thought I’d post it here in case you hit the same pothole on the road to cache happiness.
This upgrade was part of heading towards using Terracotta to cluster the application (via the caches). While everything is hanging together so far, the cluster’s performance needs some tuning.
September 28, 2010
Recently we needed a quick fix (with a better solution to come) to speed up our application queries, which involve reading and assembling copious amounts of data from the database.
Large parts of this data change infrequently (once a day) and are shared by multiple queries. Retrieving them from the database is too slow, and there is too much data to store in the same JVM as the app, so we needed an alternative.
The idea was to pull data out of the database via scheduled jobs (say once a day) and put the data in something that was faster than the DB queries.
Ehcache was already used for another purpose, so we tried creating a disk only cache with the following settings:
diskcache.cacheEnabled = false
diskcache.multiDateRange.cacheEnabled = FALSE
diskcache.maxElementsInMemory = 1
diskcache.overflowToDisk.enabled = TRUE
We discovered Ehcache’s disk store has a few caveats:
- It’s fragile – if it gets corrupted then the cache file and its index are deleted. This is by design as Ehcache prefers safety over persistence
- It’s not persisted until there is a flush or the cache manager is shut down (you need to register the shutdown hook)
- Raw object serialisation is used, so unthoughtful serialisation policies cause the cache file size to balloon
- The disk spooling can run faster than the garbage collector, so you can get OutOfMemory errors unless you flush periodically
- It’s not very fast and performance degrades if you are updating entries that already exist
That last point on performance is probably the biggest death knell for using Ehcache in this way. You can work around the fragility with lots of flushing, but the performance degradation over time is difficult to avoid.
For example, if the update cache code does a get and then a put, then the first query I run on an empty cache takes 2 minutes. The second time the same query is run it takes 20 minutes!
If this is replaced by a remove and a put, then the second run takes 8 minutes. So if you can get away with it, don’t bother updating, just remove and re-add. Note that without the remove the query’s performance is nearly as bad as in the first example.
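Here is a rough sketch of the remove-then-put pattern combined with periodic flushing. To keep it runnable without any cache library, `DiskBackedStore`, `FlushingCacheWriter` and `CountingStore` are hypothetical names invented for this illustration; they are not part of the Ehcache API, and in a real application the interface would be a thin wrapper around whatever cache you use. The flush interval is an assumption you would need to tune.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a disk-backed cache; NOT the Ehcache API. */
interface DiskBackedStore<K, V> {
    void put(K key, V value);
    void remove(K key);
    void flush();
}

/** Replaces entries with remove-then-put and flushes every flushInterval writes. */
class FlushingCacheWriter<K, V> {
    private final DiskBackedStore<K, V> store;
    private final int flushInterval;
    private int writesSinceFlush = 0;

    FlushingCacheWriter(DiskBackedStore<K, V> store, int flushInterval) {
        if (flushInterval <= 0) {
            throw new IllegalArgumentException("flushInterval must be positive");
        }
        this.store = store;
        this.flushInterval = flushInterval;
    }

    /** Drops any existing entry without reading it back, then adds the new value. */
    void replace(K key, V value) {
        store.remove(key);
        store.put(key, value);
        writesSinceFlush++;
        if (writesSinceFlush >= flushInterval) {
            store.flush();
            writesSinceFlush = 0;
        }
    }

    /** Flushes whatever is still pending at the end of a load job. */
    void finish() {
        if (writesSinceFlush > 0) {
            store.flush();
            writesSinceFlush = 0;
        }
    }
}

/** In-memory fake that counts flushes so the sketch can be exercised on its own. */
class CountingStore<K, V> implements DiskBackedStore<K, V> {
    final Map<K, V> entries = new HashMap<>();
    int flushCount = 0;

    @Override public void put(K key, V value) { entries.put(key, value); }
    @Override public void remove(K key) { entries.remove(key); }
    @Override public void flush() { flushCount++; }
}
```

A scheduled load job would call `replace` for each row it pulls from the database and `finish` once at the end. The sketch only shows the shape of the calls; the timings quoted above come from the real disk store, not from this fake.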
The real answer is to use something else as an out-of-JVM store that is faster than the DB, like Infinispan or Terracotta. What has worked for you?
- Use custom serialisation to minimise cached object sizes
- Use many smaller caches so the whole cache can be emptied and recreated
- Use a remove before a put rather than a get and then a put
- Flush regularly to minimise the disk spool memory usage and ensure the cache is written to file
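As a minimal sketch of the custom serialisation tip, the example below compares default Java serialisation with a hand-written `Externalizable` form, using only the JDK. The `DefaultQuote` and `CompactQuote` classes and their fields are hypothetical and exist only for this illustration; how much space you save depends entirely on your own cached objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Date;

class SerialisedSizeDemo {

    /** Hypothetical value using default serialisation of boxed and Date fields. */
    static class DefaultQuote implements Serializable {
        private static final long serialVersionUID = 1L;
        final Long id;
        final Double price;
        final Date asOf;

        DefaultQuote(long id, double price, long asOfMillis) {
            this.id = id;
            this.price = price;
            this.asOf = new Date(asOfMillis);
        }
    }

    /** Same data, but written by hand as three primitives. */
    static class CompactQuote implements Externalizable {
        private static final long serialVersionUID = 1L;
        long id;
        double price;
        long asOfMillis;

        /** Externalizable requires a public no-arg constructor. */
        public CompactQuote() { }

        CompactQuote(long id, double price, long asOfMillis) {
            this.id = id;
            this.price = price;
            this.asOfMillis = asOfMillis;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(id);
            out.writeDouble(price);
            out.writeLong(asOfMillis);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            id = in.readLong();
            price = in.readDouble();
            asOfMillis = in.readLong();
        }
    }

    /** Serialises a value to a byte array, as a cache's disk store would need to. */
    static byte[] serialise(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads a value back from its serialised form. */
    static Object deserialise(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("default bytes: " + serialise(new DefaultQuote(42L, 9.5, now)).length);
        System.out.println("compact bytes: " + serialise(new CompactQuote(42L, 9.5, now)).length);
    }
}
```

The compact form still carries a class descriptor, so the saving per object is modest for tiny values and grows with the number of nested or boxed fields you would otherwise let default serialisation write out.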
Update: see here for an issue on slow caches when the disk store is bigger than the memory store. Log in and vote if this issue affects you!