It’s all about the IOPS

I’d never heard of disk IOPS (Input/Output Operations Per Second) until I started researching what kind of SAN we need to buy for our virtualisation project.

But I went to a seminar by an HP storage expert who pointed out that since the days when I started working in IT (late eighties), disk capacity has increased by an eye-watering 10,000 times, while disk performance has increased by a measly fifty. So squeezing performance out of your disks is A. Big. Deal. And it becomes necessary for an IT Manager like me to roll up his sleeves and learn about IOPS.

The HP P2000 SAN that I’m looking at comes in two versions, a 12 bay model supporting LFF disks (3.5 inch disks) and a 24 bay model supporting SFF disks (2.5 inch disks). They’re both about the same price. So the 24 bay looks more attractive, right? The problem is SFF disks only spin at 10k RPM whereas the LFF disks spin at a much faster 15k RPM.

My reseller has recommended against buying SFF disks because he says the performance won’t be adequate for our needs. He’s recommended twelve 300GB LFF disks in a RAID 6 array. This gives 3TB of usable data. Not bad, but it fills all the bays so there is no room for expansion if we ever need it (well, there is, but it requires purchasing another drive shelf, which isn’t cheap).

But I suspect that talk of spin speeds over-simplifies the problem somewhat. The number of disks you have and the RAID level you use both affect performance, and performance here is measured in IOPS. It’s average IOPS that define performance, not RPM.

With the 24 bay P2000, twenty 300GB SFF disks in a RAID 10 array will give the same amount of usable data. It also leaves four bays free, so we can expand our storage later on by up to 20% – a nice buffer.

Which is better? It all depends on the IOPS. I have no idea. But given that RAID 10 offers much better write performance than RAID 6, it wouldn’t surprise me if those “slow” SFF disks outperform the “fast” LFF disks. I could work it out with a fancy Excel spreadsheet, but my understanding of the inputs is very limited. I’m hoping someone will do it for me.
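
In the meantime, here’s my own back-of-the-envelope sketch in Python. The per-disk figures (roughly 180 IOPS for a 15k disk and 140 for a 10k disk), the 70/30 read/write mix and the standard RAID write penalties (2 for RAID 10, 6 for RAID 6) are all assumptions, so treat the output as indicative rather than gospel:

```python
# Rough IOPS and capacity comparison for the two P2000 options.
# Per-disk IOPS figures and the read/write mix are assumptions, not
# measured or vendor-quoted numbers.

def usable_iops(disks, iops_per_disk, write_penalty, read_ratio=0.7):
    """Host-visible IOPS: raw IOPS scale with spindle count, but every
    logical write costs 'write_penalty' physical I/Os."""
    raw = disks * iops_per_disk
    return raw / (read_ratio + write_penalty * (1 - read_ratio))

def usable_tb(disks, disk_gb, raid):
    """Usable capacity: RAID 6 loses two disks to parity, RAID 10 loses half."""
    data_disks = disks - 2 if raid == "raid6" else disks // 2
    return data_disks * disk_gb / 1000

lff = usable_iops(12, 180, 6)   # twelve 15k LFF disks, RAID 6
sff = usable_iops(20, 140, 2)   # twenty 10k SFF disks, RAID 10

print(f"LFF RAID 6 : ~{lff:.0f} IOPS, {usable_tb(12, 300, 'raid6'):.1f} TB usable")
print(f"SFF RAID 10: ~{sff:.0f} IOPS, {usable_tb(20, 300, 'raid10'):.1f} TB usable")
```

On those (admittedly rough) numbers the twenty-disk RAID 10 set comes out well ahead, which is exactly the suspicion above: spindle count and write penalty matter at least as much as spin speed.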

Obviously having to buy twenty disks instead of twelve will up the budget somewhat (by just under two grand). But then the argument becomes “choose LFF disks because it’s cheaper” and not “choose LFF disks because of the performance”. When a disk fails and I’m waiting for an array to rebuild, I’m stressed out (it’s the pessimist in me). RAID 6 is better than RAID 5 in that it can survive two disk failures, but I’d still rather pay an extra couple of grand to get the resilience of RAID 10 (it’s possible we could have ten disks fail and still not lose our data). Because RAID 6 only survives two disk failures, it doesn’t obey the law that states “bad news comes in threes”.
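
The resilience point is easy enough to sanity-check too. Here’s a quick Monte Carlo sketch, assuming twenty disks arranged as ten mirrored pairs and that failures strike completely at random (real-world failures are rarely independent, so take it with a pinch of salt):

```python
# Monte Carlo check: RAID 6 tolerates any two failures; a 20-disk RAID 10
# survives as long as no mirror pair loses both members. The failure
# counts and the "random, simultaneous failures" model are illustrative
# assumptions only.
import random

PAIRS = [(i, i + 1) for i in range(0, 20, 2)]   # disks 0-19 as 10 mirror pairs

def raid10_survives(failed):
    """Data survives unless both members of some mirror pair have failed."""
    return not any(a in failed and b in failed for a, b in PAIRS)

def raid10_survival_rate(failures, trials=100_000):
    ok = sum(raid10_survives(set(random.sample(range(20), failures)))
             for _ in range(trials))
    return ok / trials

for failures in (2, 3, 4):
    raid6 = "survives" if failures <= 2 else "is lost"
    print(f"{failures} simultaneous failures: RAID 6 {raid6}, "
          f"RAID 10 survives ~{raid10_survival_rate(failures):.0%} of the time")
```

Surviving ten failures is the best case (one disk from every pair); with three or four simultaneous failures RAID 10 still has decent odds, whereas RAID 6 is gone the moment the third disk dies.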

The other reason for choosing SFF disks over LFF is that they’re just so much sexier. I know size isn’t supposed to be important, but it is. 24 little disks in a tiny 2U bay, who couldn’t get turned on by that?

3 Hosts, 2 CPUs

I really want to re-use our existing servers when we virtualise our server environment. We have three HP Proliants that are all under two years old, so barely worn in. The issue is that two of the servers have dual Xeon E5506 processors and the other has faster dual Xeon X5560s. My reseller tells me that if you vMotion a virtual machine from a host with one processor to another host with a different processor, it might fail. The solution is to use EVC (Enhanced vMotion Compatibility). His concern, as I understand it, is that we’d effectively downgrade the X5560 to a slower, inferior CPU. However, reading the VMware KB, I’m not convinced that we’d notice any difference:

From the Knowledge Base (the key sentence is the penultimate one):

“If I add newer hardware into an EVC-enabled cluster with a lower EVC mode, do I lose performance?
All CPU features provided by your host hardware are available to the hypervisor. Optimizations for CPU virtualization such as AMD-V and Intel VT-x or facilities for MMU virtualization such as AMD RVI or Intel EPT support are still used by the hypervisor. Only those CPU instructions that are unique to the new CPU are hidden from virtual machines when the host joins the EVC-enabled cluster. Typically this includes new SIMD instructions, such as the latest SSE additions. It is possible, but unlikely, that an application running in a virtual machine would benefit from these features, and that the application performance would be lower as the result of using an EVC mode that does not include the features. Check with the application vendor to determine which CPU features are used by the application.”

HP P2000 SAN, more thoughts

The comments in this post are interesting.

“The only part of the P2000 that is not redundant is the backplane. But it is a fairly simple device and after asking around I haven’t heard of very many backplane failures.”

This goes back to my earlier post: redundancy by having two physical hosts, but no redundancy with one single SAN. The SAN is a single point of failure. It requires me to believe that the backplane won’t fail, or to spend a lot more money on a P4000, which has complete redundancy. However, I need to consider what would happen if the SAN did die, and the following comments address that:

“If you make Veeam a physical machine then if the SAN dies you can leverage the new vPower features and run your VMs directly from the backup server until the SAN is back online, and then Storage vMotion them back to the SAN.”

Unfortunately, I don’t think Essentials+ includes storage vMotion. The answer seems to be:

“If you do not have Storage vMotion you can just power off the VM then migrate the storage. … so you would be better off to just power off the VM after everyone goes home and then migrate it to the production datastore… this will also copy any changes that happen from when you do the instant recovery until you shut it down.”

It all sounds very simple and neat.

I have a Proliant DL380 G5 that I was considering using as a direct-attached backup server. However, it doesn’t sound like this is supported, even though it will probably work:

“Did you have any issue using the SC08e HBA on a G5 server? From the QuickSpec of the product it seems that only DL generation G6 and G7 are supported by HP.
It seems due to the fact that the SC08e is PCI-Express 2.0, while the SC08Ge is PCI-Express 1.1, but I suppose that the SC08e should be backward compatible.
I’m going to install it in a DL385 G5!”

It might be easier to buy a NAS anyway: £1,900 will buy me an Iomega StorCenter px4-300r with 4 x 3TB drives. As John writes, “Veeam Deduplicates and Compresses inline… so as it pulls data off the SAN or the local datastore it compresses and deduplicates and then writes to the NAS”, so I’m guessing network bandwidth may not be an issue. It would then be easier to move that NAS to the cloud when a decent internet connection becomes cheap.
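
To reassure myself on the bandwidth point, here’s a rough transfer-time estimate. The 2:1 reduction ratio, 5% daily change rate and ~110 MB/s of usable gigabit throughput are my own guesses, not Veeam figures:

```python
# Back-of-envelope backup window estimate for a gigabit link to the NAS.
# All of the constants below are assumptions for illustration.

GBIT_MB_PER_S = 110       # realistic throughput on a 1 GbE link
DATA_TB = 3.0             # usable data on the SAN
REDUCTION = 2.0           # assumed inline dedupe + compression ratio
DAILY_CHANGE = 0.05       # assumed daily change rate for incrementals

def hours_to_copy(tb):
    """Hours to push 'tb' of source data through dedupe/compression and the link."""
    mb = tb * 1024 * 1024
    return mb / (REDUCTION * GBIT_MB_PER_S) / 3600

print(f"Full backup : ~{hours_to_copy(DATA_TB):.1f} hours")
print(f"Incremental : ~{hours_to_copy(DATA_TB * DAILY_CHANGE) * 60:.0f} minutes")
```

Roughly four hours for a full and a few minutes for an incremental, which would fit comfortably overnight if those guesses are anywhere near right.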

It really is an excellent site.


SAN versus Local Storage and High Availability

I have two problems with buying a SAN. An entry-level SAN has a single point of failure – even if you have redundant power and dual controllers, the backplane can fail. If all your applications are running off that SAN, you’re screwed. But having two SANs for full redundancy is twice the price: I’m looking at £20k for a P4300 versus £10k for the entry-level HP SAN (P2000). That’s a lot for a small business like ours. My supplier tells me I’d be crazy to go for a single SAN.

But are there any benefits of virtualisation without shared storage? Well, a lot of people seem to go for two physical hosts with local storage only and replicate them using Veeam replication. This doesn’t provide automatic failover, but it seems a lot better than waiting for a dead server to be repaired and then possibly having to rebuild it from scratch, which could take a couple of days. If one host dies, you spend a few minutes getting everything back up and running on the other host. And it’s cheap. So I’m attracted to this. But my supplier, once again, tells me I’d be crazy.

I followed a debate on a forum that went:

“Yes, shared storage is good, but going with cheap shared storage is risky. You are putting all eggs in one basket, and if such storage dies, HA will not work. In fact, nothing will work. So unless you can afford good fault-tolerant shared storage (read “expensive”), 2 ESX hosts with local storage and replication between them is significantly more fault tolerant solution. Secondly, HA is much worse than replication, because replication provides transactionally-consistent images, while HA does cold restart of crash-consistent image, so such recovery rarely does any good for transactional applications and databases. With Microsoft Exchange for example, out of 2 times I had this situation in production, MDB got corrupted in both cases, so I still had to manually restore from earlier backup.”

“Veeam Backup & Replication product provides functionality of replicating virtual machines between ESX hosts with local storages.”

“HA only works with shared storage and there is no replication involved. HA will power up guests on the second host when the first host dies. It is the equivalent of pulling power to a server and restarting it. Any decent SAN will have dual controllers and dual PSUs. So the only single point of failure is a complete failure of the backplane. That’s a risk that most will take. Let’s face it, if that small risk is too great, you are not currently investing enough in DR.”

“HA is a pretty simplistic one: it requires shared storage, and does not guarantee successful recovery due to performing a simple crash-consistent restart. Replication is much more advanced than that, does not require shared storage, and provides guaranteed application recovery. As for automatic failover with replication, actually there is such capability. With the Veeam Essential suite, you also get the Veeam Monitor product, which has built-in alerts for VM heartbeat, and the ability to automatically trigger a response action based on an alert. This response action can be a simple script that automatically starts up the corresponding replica VM on the standby host.”

“Scenario 1. Production ESX host fails, HA does its job restarting VM with Exchange server on another ESX host in the cluster. VM restarts fine, but Exchange is failing due to MDB corruption caused by improper shutdown. I had personally suffered twice from exact same situation. No fun.

Scenario 2. Shared storage goes down. You have to perform full VM restore to local ESX storage now.

Both of these scenarios will result in:

1. A few hours of down time while you are restoring Exchange VM from backup.

2. Up to 24 hours data loss because you have to roll back to your nightly Exchange backup.

Now, compare this with replication between local storage (very popular scenario with our customers). Whether your production ESX host or shared storage fail,

1. Down-time will be less than a few minutes (time it takes for replica VM to start up on standby ESX host).

2. Maximum loss of data will be less than your chosen replication period.

I am not saying HA does not have its uses, most applications will be fine after a crash-consistent image restart. However, for some applications this often causes data corruption. On the other hand, applications like Exchange or databases are also the most often used apps (and always mission-critical too).”

The guy above pushing replication over high availability is a Director at Veeam, so not exactly unbiased. Then again, in the sales meetings I’ve had, no-one has mentioned that a crash could corrupt your Exchange database (and presumably our ERP SQL Server database). It may not be likely, but I’d be really annoyed if I spent over fifty grand on a High Availability solution only for it to fail because of database corruption. And replication is probably cheaper.
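
For what it’s worth, the “heartbeat alert fires a script” approach from the forum quotes is easy enough to picture. A minimal sketch, in which the hostname and the start-replica command are placeholders of my own invention rather than anything Veeam Monitor actually ships with:

```python
# Sketch of an automatic-failover response action: watch the production
# host's heartbeat and, after several missed beats, start the replica VM
# on the standby host. PRODUCTION_HOST and START_REPLICA_CMD are
# hypothetical placeholders, not real Veeam or vSphere commands.
import subprocess
import time

PRODUCTION_HOST = "esx-prod.example.local"                 # placeholder hostname
START_REPLICA_CMD = ["start-replica.cmd", "exchange-vm"]   # placeholder command
MISSED_BEATS_BEFORE_FAILOVER = 3
CHECK_INTERVAL_SECONDS = 30

def host_alive(host):
    """Single ping; '-n 1' is the Windows ping syntax."""
    return subprocess.call(["ping", "-n", "1", host],
                           stdout=subprocess.DEVNULL) == 0

missed = 0
while True:
    missed = 0 if host_alive(PRODUCTION_HOST) else missed + 1
    if missed >= MISSED_BEATS_BEFORE_FAILOVER:
        subprocess.call(START_REPLICA_CMD)   # bring up the replica on the standby host
        break
    time.sleep(CHECK_INTERVAL_SECONDS)
```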

As usual, it comes down to who to believe. Everyone is selling a product to me.

Steve Albini on cookery show hosts

Frathouse cocksuckers with gimmick hairdos and catch phrases, hooting and hi-fiving, “bringing it,” celebrating gluttonous sports bar chow. Dipshits abbreviating their ingredients and making childish, cutesy-poo “comfort food” full of “yummy veggies,” shit like that. Detestable. You can spot the people who have their shit together because they don’t have to tell you how delicious their food is.

When Rebooting is not enough

Had a Windows 7 laptop today where the USB mouse was giving a USB malfunction error (device not recognized, etc.). Tried all the usual tactics: rebooting, logging in as Administrator, deleting the USB devices in Device Manager and then re-installing them, replacing the mouse… Everything failed.

After a tip from Google, I powered down the laptop, removed the battery, waited a couple of minutes, then booted up. All worked perfectly.

Apparently, the USB controller on the motherboard can get into a corrupted state that isn’t fixed by rebooting; the motherboard needs to be completely powered down for it to fix itself. Amazing.
