It has been 17 months since I deployed the Nexenta cluster for our VMware hosting platform at ipHouse.

Unfortunately this post will not be positive.

Storage system related problems on our Nexenta HA cluster:

  • Thanksgiving weekend, 2010
  • February into March, 2011
  • October 22nd, 2011

Thanksgiving weekend of 2010 was not a good weekend. I later found that a customer virtual server was swapping inside the VM itself in a way that was thoroughly crushing the backend storage system. The problem was only visible on a few of the VMs, though, and it took me 2 days to find the badly behaving virtual server. In the end this was not a Nexenta problem.

And now we move on to the issues I relate back to Nexenta, and its failure to notify me of problems.

First, my screw-ups…

February/March, 2011 was not a fun 30 days of problems.

A volume went offline and I did not know why. Bringing it back online took forever, and the outage affected ~50% of our customers on the VMware cluster.

Finally, when the volume was mounted, I started to figure out what was going on.

The problem was simple: deduplication. The solution was just as simple, but the execution was very time-consuming.

It starts with the fact that for every 1 TiB of storage space you need ~8 GiB of RAM to hold the dedupe tables. My head units have 24 GiB of RAM in them and more than 10 TiB of usable storage each. I should have known this, but I did not, and googling around showed that this was not a very well-known piece of information at the time. (It is now, of course.)
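If you want to see how large the dedup table on an existing pool actually is, and therefore how much RAM it wants to keep in memory, zdb can report it. This is plain ZFS tooling rather than anything Nexenta-specific, and 'volume01' is just my example pool name:

(report DDT entry counts and their on-disk / in-core sizes for a pool using dedup)
# zdb -DD volume01
(simulate dedup on a pool not yet using it, to estimate the table size up front; this walks the whole pool, so it is not quick)
# zdb -S volume01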

The solution was to turn off deduplication on the other head (and volume), Storage vMotion all customer virtual machines onto that storage system, then turn off deduplication on the source volume and reverse the process. This takes a very long time.
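For the record, turning deduplication off is a single property change; the catch, and the reason for all the Storage vMotion shuffling, is that blocks already deduplicated stay in the dedup table until they are rewritten. Pool name is again just my example:

(stop deduplicating new writes; existing blocks stay deduped until rewritten)
# zfs set dedup=off volume01
(confirm the setting)
# zfs get dedup volume01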

Now for the Nexenta items…

The problem with this is that there was a secondary issue that wasn’t being reported: a failure of a ZIL device (mirrored). Removal of the ZIL took care of many performance problems, and I ended up removing the ZIL from both head units and their volumes. They were never added back in.
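If you need to do the same, removing a mirrored log device is supported on current zpool versions; the 'mirror-2' name below is a hypothetical example taken from the 'logs' section of zpool status output, so substitute whatever yours is called:

(find the name of the log mirror in the 'logs' section)
# zpool status volume01
(remove the mirrored ZIL device from the pool)
# zpool remove volume01 mirror-2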

My issue with Nexenta in dealing with these problems? Nothing was reported in the GUI or the command line system about drive failure(s).

I was convinced that the problem was our SAS HBAs, so I replaced them with a different model (both HBAs are on the Solaris HCL) and rebuilt each head unit (again going through the long process of Storage vMotion) and the volumes. One of the things I like about the newer HBAs is that devices are now named by their WWN instead of just the cxtxdx or sd nomenclature, though now I have to maintain a spreadsheet of which drive is in which slot.

Everything ran fine until October 22nd, 2011, when I was alerted about high I/O latency to one of the volumes on our Nexenta storage cluster (yep, full HA, bought and paid for).

So, taking everything I learned (and using lsiutil to look at things), I found, again, drives failing. Two of them this time, each in a different RAID group. As soon as I issued the commands to offline the devices, I/O latency returned to normal, but now I was missing drives from 2 different RAID groups and needed to replace them.
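Taking a suspect device out of service is a one-liner; the device name here is illustrative (it is one of the drives I later detached):

(offline the failing drive so the mirror stops waiting on it)
# zpool offline volume01 c4t5000C500104EEE9Fd0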

My beef with Nexenta is that I should not have to manually go and look for drives failing out.

Nexenta should be reporting this to me.

Nexenta should be able to do this same work automatically without human interaction and then alert if failures are imminent or happening.

I bought Nexenta not because I don’t know how to support my own systems but because it offered a simple way for my employees to help out in the storage realm (they aren’t necessarily as hip to Solaris as I am) and gave us high availability. My NetApp systems are more than capable of telling me when a drive is failing. Old systems using 3WARE controllers can tell me when drives are failing. Old systems using Areca controllers – the same!

Why can’t Nexenta? I spent 5 figures on a storage management software suite that really is only doing HA for me. Issues with the underlying volumes and devices are now on my head to manage, monitor, and maintain. That’s an expensive HA system in my opinion.

Here is an example of what I did (not much of an example since it is exactly what I did, but you can use it to do your own remediation):

(remove failure)
# zpool detach volume01 c4t5000C500104EEE9Fd0
(remove spare)
# zpool remove volume01 c4t5000C500104F191Bd0
(attach spare to mirror)
# zpool attach volume01 c4t5000C500104F6313d0 c4t5000C500104F191Bd0

(remove failure)
# zpool detach volume01 c4t5000C50020CB0F97d0
(remove spare)
# zpool remove volume01 c4t5000C500104F99C3d0
(attach spare to mirror)
# zpool attach volume01 c4t5000C500104F67D3d0 c4t5000C500104F99C3d0

And in under 5 hours I was fully redundant again.
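If you want to watch the rebuild yourself, plain zpool status shows the resilver progress and an estimated completion time:

(check resilver progress on the affected mirrors)
# zpool status volume01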

Want to do your own checking?

% iostat -en `zpool status volume01 | grep c4t | awk '{print $1}'`
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 c4t5000C500104EF0D3d0
    0   0   0   0 c4t5000C500104F6DD3d0
    0   0   0   0 c4t5000C50010330173d0
    0   0   0   0 c4t5000C500104EF433d0
    0   0   0   0 c4t5000C500104F99C3d0
    0   0   0   0 c4t5000C500104F6533d0
    0   0   0   0 c4t5000C500104F67D3d0
    0   0   0   0 c4t5000C500104F6313d0
    0   0   0   0 c4t5000C50020CAF9E3d0
    0   0   0   0 c4t5000C500104EF1D3d0
    0   0   0   0 c4t5000C50020CB1557d0
    0   0   0   0 c4t5000C50020CAFC57d0
    0   0   0   0 c4t5000C500104F6607d0
    0   0   0   0 c4t5000C500104F191Bd0
    0   0   0   0 c4t5000C500104F6C9Bd0
    0   0   0   0 c4t5000C500104F6DABd0
    0   0   0   0 c4t5000C50020C6102Bd0
    0   0   0   0 c4t5000C500104F190Bd0
    0   0   0   0 c4t5000C5000439FB2Bd0
    0   0   0   0 c4t5000C500104EE6EFd0
    0   0   0   0 c4t5000C50020CAFAFFd0
    0   0   0   0 c4t5000C500104F6D1Fd0

Replace 'volume01' with your volume name and 'c4t' with the starting characters of the devices in your volume.

You should see zeros in every column; if you do not, you should look further for hardware problems.
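Since Nexenta is not going to alert me, the check above is easy enough to wrap in something cron can run. A minimal sketch, assuming mailx is available on the head unit; the pool name, alert address, and 'c4t' device prefix are placeholders to adjust for your setup:

#!/bin/sh
# Hypothetical watchdog: mail the iostat error report when any device in the
# pool shows a non-zero s/w, h/w, trn, or tot error counter.
POOL=volume01
ADDR=alerts@example.com
DEVS=`zpool status $POOL | grep c4t | awk '{print $1}'`
ERRS=`iostat -en $DEVS | awk 'NR > 2 && ($1 + $2 + $3 + $4) > 0'`
if [ -n "$ERRS" ]; then
    echo "$ERRS" | mailx -s "drive errors on $POOL" $ADDR
fi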

Good luck.

Sad to say but ipHouse won’t be purchasing further Nexenta licensing for our production network.