Caching is a key component of any system design. It lets programs be lazy by reusing data that has already been accessed. Looking up data takes a lot of work. Think of it this way: someone asks you what the capital of Indonesia is, and you don’t have Google handy. You have to figure out the best reference to look in (probably an encyclopedia), find the proper page, search that page for the entry that corresponds to “Capital” (in this case, Jakarta), and relay that information back to the person who asked you. However, five minutes later someone else asks you “What is the capital of Indonesia?” and you simply say “Jakarta.” You’ve cached that data, and are now returning it.

Database look-ups are very similar. Each time a program needs a piece of data, it’ll form a query, log into a database, execute the query, wait for data to be returned, and return it to the requester. If you don’t have some sort of database cache, it’ll have to do this each time the data is requested. In essence, your program doesn’t have a short-term memory.

Enter memcached.

Memcached is a very generic cache that, per the website, is used to store “small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.” It is known as a key-value store. Very simply, every piece of data that’s the result of an operation (like a database query) is stored under a key of the application’s choosing, and the same operation always produces the same key. So, as in our previous example, the question “What is the Capital of Indonesia?” would result in the key “Capital_of_Indonesia”. A program can then ask memcached first, and if a result is available, it’ll get a faster response than going to the database. If it’s not there, the program goes to the database, then stores the key-value result in memcached for the next operation that asks that question.
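The check-the-cache-first flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryCache` is a hypothetical stand-in for a real memcached client (it only mimics `get` and `set`), and `slow_database_lookup` and `make_key` are made-up helpers, not part of memcached or any library.

```python
from typing import Callable, Dict, Optional


class InMemoryCache:
    """Hypothetical stand-in for a memcached client (get/set only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        # Returns None on a cache miss.
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def slow_database_lookup(question: str) -> str:
    """Hypothetical stand-in for a real (slow) database query."""
    answers = {"What is the Capital of Indonesia?": "Jakarta"}
    return answers[question]


def make_key(question: str) -> str:
    # Memcached keys cannot contain whitespace, so build a compact key.
    # The same question always produces the same key.
    trimmed = question.rstrip("?").replace("What is the ", "")
    return trimmed.replace(" ", "_")


def cached_lookup(
    cache: InMemoryCache, question: str, lookup: Callable[[str], str]
) -> str:
    key = make_key(question)
    value = cache.get(key)          # 1. ask the cache first
    if value is None:               # 2. miss: do the expensive lookup
        value = lookup(question)
        cache.set(key, value)       # 3. store it for the next asker
    return value


if __name__ == "__main__":
    cache = InMemoryCache()
    question = "What is the Capital of Indonesia?"
    print(make_key(question))
    print(cached_lookup(cache, question, slow_database_lookup))  # hits the database
    print(cached_lookup(cache, question, slow_database_lookup))  # served from cache
```

With a real client library the shape stays the same: only the `cache` object changes, and the database is only touched on a miss.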

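Memcached keeps its data in RAM and automatically purges it over time: items can be given an expiration time, and when the cache fills up, the least recently used items are evicted to make room. It doesn’t really need a lot of memory dedicated to it, especially if the variety of queries is pretty low.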
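To get a feel for how a RAM cache can purge data on its own, here is a toy sketch that combines a size limit (dropping the least recently used item) with an optional time-to-live. It is not how memcached is implemented internally, which is considerably more involved; the `TinyLRUCache` class and its parameters are invented for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TinyLRUCache:
    """Toy cache: bounded size, least-recently-used eviction, optional TTL."""

    def __init__(
        self, max_items: int, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._max_items = max_items
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: "OrderedDict[str, Tuple[Any, Optional[float]]]" = OrderedDict()

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)
        self._items.move_to_end(key)          # newest entries live at the end
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)   # evict the least recently used

    def get(self, key: str) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]              # expired: purge and report a miss
            return None
        self._items.move_to_end(key)          # a read counts as "recent use"
        return value
```

If the variety of queries is low, a small `max_items` is enough, because the same few keys keep getting refreshed and never reach the front of the eviction queue.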

There are two major caveats to using memcached. The first is that the application has to be aware of it to use it. Many applications support it natively, or can be extended to do so with modules or plugins (e.g. WordPress, Drupal, etc.).

The second major caveat is security. By default, memcached accepts all key-value operations from anyone, without any sort of authentication. So if it’s reachable from the internet at large, anyone can connect via, say, telnet and read or tamper with your data. You’d want to run it local only, on a private network, or over a VPN.

I use memcached in my web cluster. I have memcached nodes available over a private network between my frontends and my backend server. All PHP applications that I have set up to use memcached are set to use it locally. That’s because, with persistent load balancing, 75%+ of repeat requests come from the same source browsing around a website, and that source is routed to the same frontend. I have PHP session IDs cached on the backend server, and I also have memcached set up there for applications that are aware of memcached clusters.
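For what “cluster aware” means in practice: the memcached servers don’t talk to each other, so it’s the client that decides which node holds which key. A minimal sketch of that idea, assuming a simple hash-modulo scheme and made-up node addresses (real client libraries typically use more robust schemes, such as consistent hashing, so that adding a node doesn’t reshuffle every key):

```python
import hashlib
from typing import List

# Hypothetical node addresses on a private network.
NODES: List[str] = ["10.0.0.11:11211", "10.0.0.12:11211", "10.0.0.13:11211"]


def pick_node(key: str, nodes: List[str]) -> str:
    """Map a key to one node; the same key always maps to the same node."""
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    for key in ("Capital_of_Indonesia", "session_abc123"):
        print(key, "->", pick_node(key, NODES))
```

Every frontend that uses the same node list and the same hash will agree on where a key lives, which is what lets several machines share one logical cache.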

I’m experimenting with running memcached on my mail cluster. Between running Dovecot with MySQL and my anti-spam setup, some very slow queries come up. Postfix 2.9.x supports memcached as a lookup source (with a database backup, of course), so hopefully I can ease some of the load on my mail backend soon.

So, cache away!