The Value and Cost of Persistent Data
I’ve been cleaning out my house recently. There’s a lot of crud that’s just been lying around, collected over the years. My wife describes me as a level 2 hoarder; she says that I would be a shoo-in for that A&E show. Going through the many, many boxes that I’ve collected in the basement, I pick up each cord and think, “I might need that.” I won’t need it, though, so with a small mental push, I put it in the trash bag. Persistent data is a lot like that. A lot of companies have, either through policy or inertia, tons of useless information sitting on disks, tapes, or CDs that may be useful one day but probably never will be.
I look at many cloud providers and I see the opposite. Their services were designed for expedience instead of permanence. They make it hard and, at times, very expensive to actually keep data around. Usually you have to attach a “disk” (or “volume”) to any machine that has data you want to keep, and you have to pay for that privilege. You’d also better have backups, because you have no idea about the underlying storage or data retention policies.
Between the attached volumes and the backups, keeping any data that you absolutely need could mean paying two or three times what you’d expect.
To my hoarder eyes the cloud is one big data furnace. It’s a dangerous place for your information to stay.
Enterprise data storage is expensive. I’ve often joked that virtualization is a scheme to sell storage arrays. It’s a tricky game of performance, space, and redundancy. Disks fail, flash is expensive, and you never have enough RAM or CPU. There are dozens of types of arrays for hundreds of applications, retention policies, regulations; it’s a mess! When you have a service with hundreds of thousands of customers, it may make sense to discourage persistent data. You want people to consume your resources, pay their bill, and move on. Expedience instead of permanence. I’ve often been asked: Why is online storage so expensive when hard drives are so cheap? Well, this is why.
We built the ipHouse vmForge product with the idea that a virtual data center (VDC) replaces co-located infrastructure. The storage is persistent from the get-go. Is it any wonder that Mike has been loath to call it a ‘cloud service’?
This has serious implications for any storage array that we put in place. We have to make sure that it not only performs well but also goes the distance. It’s still a very good idea to do backups, though they probably will not be nearly as large, as most customers just need to back up a few key files or the database dumps that happen regularly. (You are backing up your database, right?)
Well, that’s my opinion anyways. Now I’m going to go back home and work on my basement.