Hardwhere?
Posted by Brad Lewis on February 27th 2008

It’s a fact: all software ultimately relies on a piece of hardware, and hardware can fail. The secret is to build in redundancy so that a hardware failure has as little impact as possible.
RAID arrays, load balancers, redundant power supplies, cloud computing: the list goes on, and we support them all. Many of these options are not mandatory, but I wish they were! That’s where the customer comes in. It is critical to understand the value of the application and data sitting on the hardware and to set a redundancy and recovery plan that fits.
Keep your DATA safe:

  • RAID – For starters, *everyone* should have RAID 1, 5, or 10. This keeps your server online in the event of a drive failure.

The best approach? RAID 10 all the way. You get the speed of RAID 0 (striping data across drives, so with two drives you can read data almost twice as fast) and the protection of RAID 1 (mirroring data onto two separate drives), all rolled into one. RAID 10 needs at least four drives, and I think every server should have it by default.

  • Separate Backups – EVault Backup, iSCSI storage, FTP/NAS storage, your own NAS server, or just a different server. Lose data just once (and be able to recover it painlessly) and these will pay for themselves. Remember, hardware is not the only way you can lose data: hackers, software failures, and human error will always be a risk.

StorageLayer. Use it or lose it.
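Back to the RAID 10 recommendation: "striping plus mirroring" is easier to picture with a concrete layout. The following minimal Python sketch maps logical blocks onto a simplified RAID 10 array. It assumes a chunk size of one block and that drives 2p and 2p+1 form mirror pair p; the function names are hypothetical, and real RAID controllers use larger chunks and their own on-disk layouts.

```python
def raid10_location(block: int, num_drives: int) -> list[tuple[int, int]]:
    """Map a logical block to its (drive, offset) copies in a simplified RAID 10.

    Assumptions (illustrative only): one block per stripe chunk, and
    drives 2p and 2p+1 form mirror pair p.
    """
    if num_drives < 4 or num_drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    if block < 0:
        raise ValueError("block must be non-negative")
    pairs = num_drives // 2
    pair = block % pairs      # striping: consecutive blocks rotate across mirror pairs
    offset = block // pairs   # position of the block on each drive in that pair
    # mirroring: both drives in the pair hold an identical copy at the same offset
    return [(2 * pair, offset), (2 * pair + 1, offset)]


def survives(failed: set[int], num_drives: int) -> bool:
    """True if every mirror pair still has at least one working drive."""
    return all(not {2 * p, 2 * p + 1} <= failed for p in range(num_drives // 2))


if __name__ == "__main__":
    for b in range(4):
        print(b, raid10_location(b, 4))
    print(survives({0, 2}, 4))  # one drive lost from each pair
    print(survives({0, 1}, 4))  # both copies in pair 0 lost
```

Consecutive blocks land on different mirror pairs, so large reads can be spread across drives, and every block lives on two drives, so the array keeps running as long as no pair loses both of its members.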

Going further:

  • Redundant servers in different locations – spread your servers across different datacenters and use a load balancer. Nothing is safer than a duplicate server thousands of miles away. That’s why we have invested in a second datacenter: to keep your data and your business safe.

Check ‘em out in our Services > Network Services section.
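Why a duplicate server far away is so much safer can be put into numbers. Below is a minimal Python sketch of the usual back-of-the-envelope calculation. It assumes that replica failures are independent (which is what separate datacenters are meant to approximate) and that the load balancer itself never fails; the 99% per-server availability in the usage example is a purely hypothetical figure.

```python
def combined_availability(availabilities: list[float]) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes replica failures are independent of each other.
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # the service is down only if every replica is down
    return 1.0 - p_all_down


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one server
    pair = combined_availability([single, single])
    print(f"one server:  {single:.4f}, ~{downtime_hours_per_year(single):.1f} h/yr down")
    print(f"two servers: {pair:.4f}, ~{downtime_hours_per_year(pair):.2f} h/yr down")
```

Under these assumptions, a second server turns roughly 87.6 hours of expected downtime a year into under an hour. In practice the load balancer, shared software bugs, and human error are not independent of the servers, so treat the result as an upper bound on the benefit.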

The future:

  • Solid state drives – aww yeah baby. They are coming.

Solid state drives are just what the name says: drives with no moving parts. No more spinning platters or read/write heads. I mean, come on, hard drives still work on the same basic idea as old record players: a spinning disc and a head that reads it. CDs work that way too, and you see where those went. (Can you say iPod? I prefer my iPod touch. I never had an iPod until now, so I skipped right to the new fancy-pants model. Can you tell I just got it?)
Check out these comparison tests of solid state drives vs. conventional ones:

  • Faster, faster, faster! – Processors, memory, drives, network: everything is getting much faster, in part through redundancy (dual- and quad-core processors, dual- and quad-processor motherboards). See? Redundancy is the way of the future!

We have four Intel Xeon quad-core Tigerton processors on one motherboard. That’s 16 cores in one server! Shazam!

  • Robot DC patrol sharks – yep. Got the plans on my desk right now. But I can’t take all the credit: Josh R. suggested this one. I just make things happen.

I work to keep all of our hardware running in tip-top condition. But when it comes to hardware, I look at the bigger picture: how to completely eliminate the impact of any hardware issue. That’s why I suggest all the redundancies listed above. I can reduce the probability of hardware issues through testing, monitoring firmware updates, proper handling procedures, choosing quality components, and so on, but redundancy is the ultimate path to invisible hardware.
Hardwhere?, if you will.

 
Spares at the ready
Posted by Sam Fleitman on February 11th 2008

In Steve’s last post, he talked about the logic of outsourcing. His rationale included the cost of redundant internet connections, the server itself, a UPS, a small AC unit, etc. He covers a lot of good reasons to get the server out of the broom closet and into a real datacenter. However, I would like to add one more often-overlooked component to that argument: the Spares Kit.

Let’s say that you do purchase your own server and you set it up in the broom closet (or a real datacenter for that matter) and you get the necessary power, cooling and internet connectivity for it. What about spare parts?

If you lose a hard drive on that server, do you have a spare available for replacement? Maybe so: it’s a common part with mechanical components that are liable to fail, so you might have that covered. Perhaps you not only have a spare drive but have also configured the server with some level of RAID, so you’re probably well covered there.

What if that RAID card fails? It happens - and it happens with all different brands of cards.

What about RAM? Do you keep a spare DIMM handy, or if one stick starts showing errors, do you plan to remove it and run with less RAM until you can get more on site? The application might run slower because it’s memory-starved or because your memory is no longer interleaved, but that might be a risk you are willing to take.

How about a power supply? Maybe you keep a spare, or maybe you have dual power supplies. If so, are they plugged into separate power strips on separate circuits backed up by separate UPSs?

What if the NIC on the motherboard gets flaky or goes out completely? Do you keep a spare motherboard handy?

If you rely on out-of-band management of your server via an IPMI, Lights Out, or DRAC card, what happens if that card goes bad while you’re on vacation?

Even if you have all the necessary spare parts for your server, or you have multiple servers in a load-balanced configuration inside the broom closet, what happens if you lose your switch or your load balancer or your router or your… What happens if that little AC unit you purchased shuts down on Friday night and the broom closet heats up all weekend until the server overheats? Do you have temperature sensors in the closet configured to send you an alert? And if they do, are you ready to drive back to the office to empty the spot cooler’s water pail?
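The alerting question above boils down to a simple threshold check. Here is a minimal Python sketch, assuming readings arrive from some sensor you already poll; the `Reading` type, the 30 C threshold, and the `notify` callback (which in practice might send an e-mail or a page) are hypothetical stand-ins, not any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Reading:
    """One temperature sample from a (hypothetical) closet sensor."""
    sensor: str
    celsius: float


def check_temperatures(readings: Iterable[Reading], max_celsius: float,
                       notify: Callable[[str], None]) -> list[str]:
    """Call notify() for every reading above max_celsius and return the messages sent."""
    alerts = []
    for r in readings:
        if r.celsius > max_celsius:
            msg = f"{r.sensor}: {r.celsius:.1f} C exceeds {max_celsius:.1f} C"
            notify(msg)  # e.g. e-mail or page the on-call person
            alerts.append(msg)
    return alerts


if __name__ == "__main__":
    sample = [Reading("closet-ambient", 24.0), Reading("closet-ceiling", 35.5)]
    check_temperatures(sample, max_celsius=30.0, notify=print)
```

Of course, an alert only tells you there is a problem; someone still has to be available to act on it, which is exactly the staffing question the rest of this post raises.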

You might think some of these scenarios are a bit far-fetched, but I can assure you they’re not. At SoftLayer, we have spares of everything. We keep hundreds of servers in inventory at all times, maintain a fully stocked inventory room of critical components, staff it all 24/7, and back it all up with a 4-hour SLA.

Some people do have all of their bases covered. Others are willing to take a chance. But even if you convince your employer that it’s OK to take those chances, how do you think the boss will respond when something actually happens and critical services go offline?

Copyright © SoftLayer Technologies, All Rights Reserved.