A rant about proper memcache usage
March 13, 2009
I have been noticing an interesting pattern for a while now, and figured it was about time to help set the story straight on memcache. If you don’t know what memcache is, I recommend reading up on it first.
Memcache is a dead simple cache, and it works very well when you remember that it is a simple secondary layer on top of your data. Developers tend to forget this, which leads to some very interesting consequences.
What memcache is.
I cannot stress enough that memcache is a very simple cache and works best when you treat it as just that and nothing more. It’s ultra fast because it does that one simple thing very well. Adding even minor things like delete in x number of seconds adds overhead that defeats the purpose of it.
Don’t get me wrong, without some more complex features memcache can be pretty difficult at times. Things like pagination and key organization are some of the biggest problems.
What memcache is NOT!
- It’s NOT a database – Don’t treat it as your primary database. A real database such as MySQL is designed to protect and organize your data; memcache is not.
- It’s NOT a queue system – Think about what you are trying to do. You are putting important flags into a system that is not guaranteed to keep your data. Can you deal with your queue failing to perform work a few times?
- It’s NOT intended for session handling – For the same reason as above: would you want users to randomly get booted out? Don’t get me wrong, you want something fast, I understand. But there are alternatives such as MCache. Keep in mind that I have not actually used it; it has an ultra annoying build system. Before you think of using memcache for sessions, realize what the impact on your users is. Wouldn’t you be annoyed if you were checking your e-mail on Gmail and got booted randomly?
- It’s NOT a proxy – If you are serving files or cached dynamic HTML, you should strongly consider using Varnish or mod_proxy. If you are experiencing issues with I/O load on your server and you want to serve files very fast, you could also consider using tmpfs. Remember to populate your data at boot time, since tmpfs is memory-backed and starts out empty after a reboot.
- If you need replication, you might be using memcache wrong. If you are looking for replication for memcache, step back and think: why do I really want this? There is a pretty good chance you don’t have a fall back plan. The same holds true for wanting to create a backup of your memcache to restore in the event of a node failure or reboot. There is never a good reason to do this.
- It’s NOT a locking daemon – Don’t get me wrong, there aren’t many alternatives, but this is just not a dependable option. Never forget, memcache is never guaranteed. With that in mind, you could use it as a locking daemon, but only if you can deal with duplicate runs.
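To make the locking caveat concrete, here is a minimal sketch of a best-effort lock built on add(), which only stores a key that does not exist yet. The function names tryLock() and unlock() and the lock_ key prefix are made up for illustration, and $cache is assumed to be a pecl Memcache-style object. If the node restarts or evicts the key, two workers can both get the lock, so this only makes sense where duplicate runs are harmless.

```php
<?php
// Best-effort lock: add() succeeds only if the key is not already stored.
// $cache is assumed to expose Memcache-style add() and delete() methods.
function tryLock($cache, string $name, int $ttlSeconds = 30): bool
{
    // The TTL lets the lock expire on its own if the holder crashes.
    return (bool) $cache->add('lock_' . $name, 1, 0, $ttlSeconds);
}

// Release the lock early once the work is done.
function unlock($cache, string $name): void
{
    $cache->delete('lock_' . $name);
}

// Usage (hypothetical job name):
// if (tryLock($memcache, 'nightly_report')) {
//     /* do the work, tolerating an occasional duplicate run */
//     unlock($memcache, 'nightly_report');
// }
```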
- Use connection pools. Don’t get crafty with your connections unless you FULLY understand what you are doing. Running multiple instances and separate connection handlers is not the proper way to handle things. Don’t use 127.0.0.1 when you are running multiple instances. Use the external IP and specify your entire pool on EACH web server, so that every web server maps a given key to the same node.
$memcache = new Memcache(); // then call addServer() once for every node in the pool
- Always have a fall back. If the data isn’t found in memcache, go to your datastore. That doesn’t mean that you need to exit execution of your site if the datastore isn’t reachable. One of my sites can run with 85% of its features when MySQL is unreachable. Basically, anything that requires a write will not function and is disabled. I use warm-up scripts that pre-populate memcache during each build. It’s a simple script that uses my existing code to populate, severely lowering the maintenance requirements. I add some randomness to the TTLs so the keys don’t all expire at the same time, which would cause sudden spikes on your database.
- Use low cache times until stable. Until you are properly cache busting in your application, keep your cache times very low. This will allow you to pinpoint problems, get a little bit of performance and not piss off your visitors too much.
- Create a wrapper or extend your memcache library, especially in PHP. This will allow you to collect statistics during development and troubleshoot key management. Not to mention, you can create an “internal cache” that prevents duplicate over-the-wire requests.
- Cache negative search results. Let’s say you have user profile pages and you ban a user. There might be a high amount of traffic looking for that record that doesn’t exist in your database (well, if you actually delete the data). Prevent those lookups from hitting your database and you will be happier for it.
- Don’t md5() your keys! This really drives me nuts. Why on earth would you want to do this?!? There are rare times when your key name could contain things that violate the memcache protocol. However, to my knowledge pretty much every library will strip out the offending characters. If they don’t strip these out, it could lead to command injection. Hashing your keys makes memcached’s verbose mode (-vv) pretty worthless. Not to mention, a lot of calls to md5() just chew up CPU cycles. Sure, with modern processors it’s very quick, but this attitude is a huge problem these days: no one wants to create tight and solid code, they just want it done yesterday.
- Go easy on calling memcache for stats. If you are running Cacti on your memcache farm, go easy on how often you poll. Don’t hit it every minute; once your memcache setup is working well, you won’t watch those graphs all the time. Your graphs will reach a point where they don’t really move too much… unless for some reason you are doing frequent memcached restarts.
- Use increment / decrement for things like the number of views on a page. After you run an update query, why bother running another select query right after that? Depending on your storage engine, this could cause lock contention. Instead, run the update and use Memcache::increment(). This will help performance greatly since you are just publishing your data, not trying to process it.
- Don’t use a shotgun to take down a spider. Let’s say you have a box on your home page that shows the last 10 posts. Don’t set the cache key to expire after 10 minutes; set it for 2+ hours and create a cron job that performs the select and updates memcache. This will also reduce lock contention, since you don’t want all those visitors fighting to update the same key. Remember that during development you will want to keep that expiration low, and increase it to hours when you consider it stable.
- Use memcache during development. This is another one that baffles me. It drives me crazy when developers won’t develop with memcache enabled. You need to really test this. Test your CRUD (Create Read Update Delete) a few times over and for multiple users. If you are not using your production strategies during development, you are essentially writing code that is untested (and in some ways could even be considered a fork of your app for lack of functionality).
- Running memcache verbose (-vv) during development helps a lot. You can quickly see when variables aren’t being set properly. Imagine seeing a key like user_ (with the ID missing) scroll by; you can spot problems quickly.
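Several of the tips above (always have a fall back, randomize TTLs, cache negative results, use increment for counters) fit into a few small helpers. What follows is a rough sketch under assumptions, not a drop-in implementation: cachedFetch(), jitteredTtl(), bumpViews() and the CACHE_MISSING sentinel are invented names, $cache is assumed to be a pecl Memcache-style object, and $loader and $updateDb stand in for your own database code.

```php
<?php
// Sentinel stored for "we looked, nothing there" (negative caching).
// Assumption: no real cached value is ever equal to this string.
const CACHE_MISSING = '__missing__';

// Spread expirations by +/- 10% so keys written together
// do not all expire (and hit the database) together.
function jitteredTtl(int $baseSeconds): int
{
    $spread = intdiv($baseSeconds, 10);
    return $baseSeconds + random_int(-$spread, $spread);
}

// Read-through fetch: try the cache, fall back to the datastore on a miss.
// $loader is a callable that returns the record, or null if it does not exist.
function cachedFetch($cache, string $key, callable $loader, int $ttl = 300)
{
    $hit = $cache->get($key);       // Memcache::get() returns false on a miss
    if ($hit === CACHE_MISSING) {
        return null;                // known-absent record: skip the database
    }
    if ($hit !== false) {
        return $hit;
    }
    $value = $loader();             // the fall back: go to the datastore
    $cache->set($key, $value ?? CACHE_MISSING, 0, jitteredTtl($ttl));
    return $value;
}

// Page-view counter: run the UPDATE, then bump the cached number
// instead of issuing a SELECT to read it back.
function bumpViews($cache, string $key, callable $updateDb)
{
    $updateDb();
    // increment() returns the new value, or false when the key is not
    // cached yet; in that case the next read-through repopulates it.
    return $cache->increment($key, 1);
}
```

Keeping the key readable (for example user_42 rather than a hash) means the same keys show up as-is in memcached’s -vv output.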
I have a few follow-up articles planned that will touch on more advanced usage of memcache. One of these updates will include the memcache wrapper that I am using for my projects.
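As a rough illustration of the wrapper tip from the list above (statistics plus an “internal cache”), here is a minimal sketch. It is not the wrapper mentioned in this post: the class name CountingCache, its stats keys and the server addresses in the usage comment are all hypothetical, and the wrapped client is assumed to be a pecl Memcache-style object.

```php
<?php
// Hypothetical wrapper, for illustration only.
class CountingCache
{
    private $client;        // Memcache-style object with get() and set()
    private $local = [];    // per-request copy of values already fetched
    public $stats = ['hits' => 0, 'misses' => 0, 'local_hits' => 0];

    public function __construct($client)
    {
        $this->client = $client;
    }

    public function get(string $key)
    {
        if (array_key_exists($key, $this->local)) {
            $this->stats['local_hits']++;   // served without a network round trip
            return $this->local[$key];
        }
        $value = $this->client->get($key);
        if ($value === false) {
            $this->stats['misses']++;       // caller should fall back to the datastore
            return false;
        }
        $this->stats['hits']++;
        $this->local[$key] = $value;
        return $value;
    }

    public function set(string $key, $value, int $ttl = 0): bool
    {
        $this->local[$key] = $value;        // keep the local copy in sync
        return (bool) $this->client->set($key, $value, 0, $ttl);
    }
}

// Usage (hypothetical addresses; list the whole pool on every web server):
// $client = new Memcache();
// $client->addServer('10.0.0.11', 11211);
// $client->addServer('10.0.0.12', 11211);
// $cache = new CountingCache($client);
// $user = $cache->get('user_42');
// error_log(json_encode($cache->stats));   // inspect during development
```

The local array only lives for one request, so it does not need its own expiry; it simply stops the same key from crossing the wire twice while a page is being built.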