Author Archive

The Greening of IT: Beyond the Easy
Posted by Sam Fleitman on May 23rd 2008

The growth in energy demanded by, and used in, IT environments is a well-documented phenomenon. Datacenters are using more energy as CPUs get faster, hard drives become larger, and end-user demand for access to data and applications continues to increase. Prices for the underlying hardware and services continue to fall, which only fuels more demand.

Datacenter operators have done their best to maximize the use of every available asset within a facility in order to operate highly efficient environments. Much of the emphasis to date has been on proper datacenter layout: hot-aisle/cold-aisle configurations, blanking panels to cover gaps in server racks, and sealed holes under raised floors to better contain cold air have all become commonplace in the datacenter.

However, in most large organizations, there are many areas that need more attention. Departments within a large company often have competing goals that negate green IT efforts. For example:

  • The system administrators and developers want the biggest, fastest machines they can get with the most expandability. This enables them to add memory or hard drives as utilization increases – which makes their jobs much easier to perform and helps them better maintain customer SLAs.
  • The purchasing (and finance) department’s primary goal is to save money. Its focus is on working with vendors to reduce overall hardware cost.

The disconnect between those two departments will often leave the datacenter manager out in the heat (definitely not “out in the cold”). That person’s job essentially becomes “just find a place to put it” until the datacenter is full enough that the answer becomes “no more.” It then becomes a “fix it now” problem as the company struggles with plans to build more datacenter space. So the problem comes down to a short-term view (meeting quarterly earnings results) versus a long-term direction (achieving a sustainable and efficient operating environment, even if that carries a short-term cost).

Instead, these disparate groups need to work together throughout the entire planning process. The purchasing department, the system administrators, the developers, and the datacenter managers should build a common plan and set realistic expectations in order to optimize the IT infrastructure required and best meet business, operations, and efficiency objectives.

Let’s continue the example from above. If a server is ordered just because it’s more expandable (more expansion slots, RAM slots and hard drive bays), the power supply has to be bigger to support the potential needs of those future components. A server power supply is most efficient (wastes the least power doing the conversion) when it is running at 80-90% load. If a power supply is oversized to support potential future needs, it operates at a much lower efficiency than it should, generating more heat, wasting more power and requiring more cooling, which in turn takes more power to run the air conditioners.
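To make the conversion loss concrete, here is a small Python sketch. The post gives no efficiency curve, so the 200 W load and the efficiencies used below (85% for a right-sized supply, 75% for an oversized one) are hypothetical placeholders, and the extra cooling needed to remove the wasted heat is not modeled.

```python
def psu_loss_watts(dc_load_w: float, efficiency: float) -> float:
    """Watts turned into heat inside the power supply while it delivers
    dc_load_w to the server's components at the given efficiency (0-1]."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    wall_draw_w = dc_load_w / efficiency  # power pulled from the circuit
    return wall_draw_w - dc_load_w


def fleet_extra_loss_kw(dc_load_w: float, eff_right_sized: float,
                        eff_oversized: float, servers: int) -> float:
    """Extra kW of conversion loss when every server in a fleet uses an
    oversized PSU running at a lower efficiency than a right-sized one."""
    extra_per_server_w = (psu_loss_watts(dc_load_w, eff_oversized)
                          - psu_loss_watts(dc_load_w, eff_right_sized))
    return extra_per_server_w * servers / 1000.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(round(psu_loss_watts(200, 0.85), 1), "W lost (right-sized)")
    print(round(psu_loss_watts(200, 0.75), 1), "W lost (oversized)")
    print(round(fleet_extra_loss_kw(200, 0.85, 0.75, 1000), 1),
          "kW extra across 1,000 servers")
```

Because the loss is the load divided by efficiency, minus the load, every point of efficiency given up on one server is repeated on every other server with the same oversized supply.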

That might seem like a small price to pay for expandability, but when that scenario is multiplied over an entire datacenter, the scope of the problem becomes very significant. This could lead to lost efficiency of well over 20% as a business plans and buys ahead of demand for the computing capacity it may need in the future.

So, what is the other option? Is purchasing right? Should IT simply buy a small server, at a lower total cost, and scale as the business scales? The problem with this is that it tends to lead to exponential growth in all aspects of IT – more racks to house smaller servers, additional disks, more space and power over time, increased obsolescence of components, and more lost efficiency.

The problem is considerably more complex than either option suggests. The simple fact is that the “fixes” for IT go well beyond implementing a hot-aisle/cold-aisle layout and sealing up holes under the raised floor of the datacenter. Now that those things have become “best practices,” it’s time to start highlighting all of the other things that can be done to improve energy efficiency.

At SoftLayer, we promote a focus on energy efficiency across the entire company. Datacenter best practices are implemented in all of the datacenter facilities we occupy: we use hot-aisle/cold-aisle configurations, we use blanking panels, we run 208 V power to the server, we pay very close attention to energy-efficient components such as power supplies, hard drives and of course CPUs, and we recycle whatever we can.

Most importantly, we deliver a highly flexible solution that allows customers to scale their businesses as they grow. They do not need to overbuy or underbuy, since we will simply “re-use” the server for the next customer that needs it. Individually, the energy savings from each of these measures might be fairly small. But when multiplied across thousands and thousands of servers and multiple datacenters, these many small things quickly become one large thing: a huge savings in energy consumption over a traditional IT environment.

Ultimately, SoftLayer believes that we can never be satisfied with our efforts. As soon as one set of ideas becomes commonplace or best practice, we need to be looking for the next round of improvements, and bringing those new ideas and practices forward so all can benefit.

 
Spares at the ready
Posted by Sam Fleitman on February 11th 2008

In Steve’s last post, he talked about the logic of outsourcing. His rationale included the cost of redundant internet connections, the server, a UPS, a small AC unit, etc. He covers a lot of good reasons to get the server out of the broom closet and into a real datacenter. However, I would like to add one more often-overlooked component to that argument: the Spares Kit.

Let’s say that you do purchase your own server and you set it up in the broom closet (or a real datacenter for that matter) and you get the necessary power, cooling and internet connectivity for it. What about spare parts?

If you lose a hard drive on that server, do you have a spare one available for replacement? Maybe so. A hard drive is a common part with mechanical features that is liable to fail, so you might have that covered. Perhaps you not only have a spare drive, but the server is also configured with some level of RAID, so you’re probably well covered there.

What if that RAID card fails? It happens - and it happens with all different brands of cards.

What about RAM? Do you keep a spare DIMM handy, or if one stick starts failing, do you just plan to remove it and run with less RAM until you can get more on site? The application might run slower because it’s memory-starved or because your memory is no longer interleaved, but that might be a risk you are willing to take.

How about a power supply? Do you keep an extra one of those handy? Maybe you do. Or maybe you have dual power supplies. If so, are they plugged into separate power strips on separate circuits, backed up by separate UPSs?

What if the NIC on the motherboard gets flaky or goes out completely? Do you keep a spare motherboard handy?

If you rely on out-of-band management of your server via an IPMI, Lights Out or DRAC card, what happens if that card goes bad while you’re on vacation?

Even if you have all the necessary spare parts for your server, or you have multiple servers in a load-balanced configuration inside the broom closet, what happens if you lose your switch or your load balancer or your router or your… What happens if that little AC you purchased shuts down on Friday night and the broom closet heats up all weekend until the server overheats? Do you have temperature sensors in the closet configured to send you an alert? And if you do, now you get to drive back to the office to empty the spot cooler’s water pail.

You might think that some of these scenarios are a bit far-fetched, but I can certainly assure you that they’re not. At SoftLayer, we have spares of everything. We maintain hundreds of servers in inventory at all times, we maintain a fully stocked inventory room of critical components, and we staff it all 24/7 and back it all up with a 4-hour SLA.

Some people do have all of their bases covered. Some people are willing to take a chance. But even if you convince your employer that it’s OK to take those chances, how do you think the boss will respond when something actually happens and critical services are offline?

 
Backups
Posted by Sam Fleitman on October 31st 2007

“Ah - I don’t need backups.”
“Too busy to do backups - I’ll get to that later.”
“Backups? It costs too much.”
“I don’t need backups - MTBF of a Raptor is 1.2 Million hours.”
“Oops - I forgot about doing backups.”

Backups are one of the most commonly forgotten tasks of a system administrator. In some cases, they are never implemented. In other cases, they are implemented but not maintained. In other cases, they are implemented with a great backup and recovery plan - but the system usage or requirements change and the backups are not altered to compensate.

A hard drive really is a fairly reliable piece of IT equipment. The WD 150GB Raptor is rated at 1.2 million hours MTBF. With that kind of mean time between failures, you would think that you would never have to worry about a hard drive failing. But how willing are you to take that chance? What if you improve your odds by setting up two drives in a RAID 1 configuration? Now can you afford to take that chance? How willing are you to gamble with your data?
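One rough way to put numbers on that gamble is the textbook constant-failure-rate (exponential) model, in which the chance a drive fails within t hours is 1 - e^(-t/MTBF). This Python sketch is an illustration under that assumption, not a reliability analysis: it treats the two drives in the mirror as failing independently (drives from the same batch, in the same chassis, often do not), and the 24-hour rebuild window is a hypothetical value.

```python
import math

HOURS_PER_YEAR = 8760


def p_fail_within(hours: float, mtbf_hours: float) -> float:
    """Chance one drive fails within `hours`, assuming a constant failure
    rate of 1/MTBF (exponential lifetime model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def p_mirror_loss_per_year(mtbf_hours: float, rebuild_hours: float) -> float:
    """Rough chance a two-drive RAID 1 mirror loses data in a year: one drive
    fails, then the survivor fails before the rebuild completes.
    Assumes independent failures, which real hardware often violates."""
    p_one_drive_year = p_fail_within(HOURS_PER_YEAR, mtbf_hours)
    p_either_fails = 1.0 - (1.0 - p_one_drive_year) ** 2
    return p_either_fails * p_fail_within(rebuild_hours, mtbf_hours)


if __name__ == "__main__":
    mtbf = 1_200_000  # the Raptor rating quoted above
    print(f"single drive, 1 year: {p_fail_within(HOURS_PER_YEAR, mtbf):.4%}")
    print(f"RAID 1 mirror, 1 year (24 h rebuild): "
          f"{p_mirror_loss_per_year(mtbf, 24):.6%}")
```

Even a small mirror figure only covers hardware. A mirror copies an accidental delete or a compromised file to both drives almost immediately, so RAID alone does nothing for the human and security failures that follow.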

What if one of your system administrators accidentally deletes the wrong file? Maybe it’s your Apache config file. Maybe it’s a piece of code you have been working on all day. Or, maybe your server gets compromised and you now have unknown trojans and back doors on your server. Now what do you do?

Working in a datacenter with thousands of servers, there are thousands and thousands of hard drives. When you see that many hard drives in production, you are naturally going to see some of them fail. I have seen small drives fail, large drives fail, and I have even seen RAID 1 mirrors completely fail beyond recovery. Is it bad hardware? Nope. Is it Murphy’s Law? Nope. It’s the laws of physics. Moving parts create heat and friction. Heat and friction cause failures. No piece of IT equipment is immune to failure.

That 1.2 million hours MTBF looks pretty impressive. For a round number, let’s say there are 15,000 drives in the SoftLayer datacenter. 1,200,000 hours / 15,000 drives = 80 hours. That means that, on average, one hard drive in the SoftLayer datacenter could fail every 80 hours. Now how impressive is that number?
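The same arithmetic can be written as a couple of Python helpers. The drive count and MTBF are the round numbers from the paragraph above; converting to failures per year assumes 8,760 hours per year and, like any MTBF-based estimate, gives an expected average rather than a schedule.

```python
HOURS_PER_YEAR = 8760


def hours_between_fleet_failures(mtbf_hours: float, drives: int) -> float:
    """Average hours between drive failures somewhere in the fleet."""
    return mtbf_hours / drives


def expected_failures_per_year(mtbf_hours: float, drives: int) -> float:
    """Expected number of drive failures across the fleet in one year."""
    return drives * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    print(hours_between_fleet_failures(1_200_000, 15_000))  # 80.0
    print(expected_failures_per_year(1_200_000, 15_000))    # 109.5
```

That works out to roughly two drive failures a week across 15,000 drives, even though each individual drive looks nearly indestructible on paper.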

Ultimately, regardless of the levels of redundancy you implement, there is always a chance of a failure - hardware or human - that results in data loss. The question is - how important is that data to you? In the event of a catastrophic failure, are you willing to just perform an OS reload and start from scratch? Or, if a file is deleted and unrecoverable, are you willing to start over on your project? And lastly, how much downtime can you afford to endure?

Regardless of how much redundancy you build into your infrastructure with the likes of load balancers, RAID arrays, active/passive servers, hot spares, etc., you should always have a good plan for doing backups, as well as for checking and maintaining those backups.
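Checking a backup can start as simply as confirming that every file made it across intact. The Python sketch below assumes the backup is a plain file-for-file copy (for example, an rsync-style mirror), not a compressed or deduplicated archive, and compares SHA-256 digests using only the standard library. The function names are hypothetical, and a passing check is no substitute for periodically doing a real test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: str, backup_dir: str) -> list[str]:
    """List problems: files missing from the backup or whose contents differ."""
    src, dst = Path(source_dir), Path(backup_dir)
    problems = []
    for original in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = original.relative_to(src)
        copy = dst / rel
        if not copy.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(original) != sha256_of(copy):
            problems.append(f"changed: {rel}")
    return problems


if __name__ == "__main__":
    import sys
    issues = verify_backup(sys.argv[1], sys.argv[2])
    print("\n".join(issues) if issues else "backup matches source")
```

A "changed" entry can mean corruption or simply a stale backup; either way, it is something you want to find before you need the restore.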

Have you checked your backups lately?

 
HostingCon 2007 / More Green
Posted by Sam Fleitman on August 6th 2007

The SoftLayer contingent recently returned from attending HostingCon 2007 in Chicago, and I have to say, it was a great experience. We had a lot of opportunities to meet up with many of our customers, meet with a lot of vendors and potential vendors, and visit with some of our competitors.

While there, I had the privilege of participating in a panel discussion on “Green Hosting: Hope or Hype.” Isabel Wang did a great job of moderating the discussion with Doug Johnson, Dallas Kashuba, and me. The overall premise of the panel was to talk about green initiatives, how they affect the hosting industry, what steps hosting companies can take, and whether it is something we should be pursuing.

It was interesting to hear the different approaches that companies take to be green. Should companies focus their efforts on becoming carbon neutral by purchasing carbon credits, as DreamHost does; on promising to plant a tree for each server purchased, as Dell does; on virtualization strategies, as SWSoft does; or on eliminating the initial impact on the environment, as we have done at SoftLayer? You can probably tell from one of my previous blog posts where SoftLayer is focusing its efforts to help make a difference.

Besides the efforts of the individual companies on the panel, there were some good questions from the audience that helped spur the conversation. Does the hosting industry need its own organization for self regulation or are entities such as The Green Grid sufficient? Do any of the hosting industry customers really care if a company is “green”? Should a hosting company care if it’s “green”? And, what exactly does “being green” mean?

While there are differing opinions to all of those questions, there really isn’t a “wrong” answer. Ultimately all of the steps companies take - no matter how small - will help to some extent. And no matter what the motivation - whether a company is “being green” in an effort to gain publicity, to save money or to simply “make a difference” - it’s all worth it in the end.

 
Remote Access Success Story
Posted by Sam Fleitman on July 27th 2007

In previous posts, there have been mentions of the datacenter of the future, KVM over IP and a reference to an elevator. Then, just the other day, someone in the office pointed out this article: “How remote management saved me an emergency flight overseas.”

The article discusses the successful deployment of servers from a remote location. The author talks about being able to remotely configure and deploy some new servers from the confines of a ski lodge. Of course, they had to have someone at their offices to receive the server shipment, unbox the servers, rack them up, get them all cabled, make sure space, power and cooling would all be sufficient and then put in a CD. Things that weren’t mentioned probably included throwing away all of the packaging material, doing QA on the hardware to verify it was all correct and changing any BIOS settings.

Beyond all of that, there are many things inherent to the process that they didn’t mention, including having to find the right server vendor, negotiating pricing for the servers, making sure all of the pieces and parts were going to be shipped, tracking the shipment dates, contacting the vendor multiple times to find out why the shipment wasn’t going to be on time, having available datacenter space and infrastructure, putting those dang cage nuts in the server racks, having available switch ports, making sure the network was configured correctly, providing network security, making sure all of the software licenses were up to date, etc., etc., etc.

Or, as so many of you already know - they could have gotten their servers from a dedicated hosting provider such as SoftLayer (hint, hint) and had the servers purchased, configured, QA’d and online within just a couple of hours and with no more effort than just filling out a signup form. It’s hard to imagine there are still so many people out there doing things the hard way.

 
Truck Day Operations
Posted by Sam Fleitman on July 11th 2007

How do you unload 1,000 servers and have them ready to go live in a datacenter in five hours? With lots and lots of planning. Every month we take in a shipment of servers to accommodate the next 30 days of sales. Preparation for each delivery starts several months in advance with forecasting models. You have to look far enough ahead in your models to continually adjust forecasts for sales, facilities and available resources. Some vendors need more lead time than others so you have to constantly update your forecasts, all the way up to final order placement.
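The post does not describe SoftLayer’s forecasting models, so the following is only a toy Python sketch of the two ideas in this paragraph: estimating next month’s server count from recent sales, and working backward from Truck Day to the last date each vendor’s order can be placed. The three-month average, the vendor names, the lead times and the one-week buffer are all hypothetical.

```python
import math
from datetime import date, timedelta


def forecast_next_month(monthly_sales: list[int], window: int = 3) -> int:
    """Naive forecast: average of the last `window` months, rounded up."""
    recent = monthly_sales[-window:]
    return math.ceil(sum(recent) / len(recent))


def order_deadlines(truck_day: date, lead_times_days: dict[str, int],
                    buffer_days: int = 7) -> dict[str, date]:
    """Latest order date per vendor, leaving `buffer_days` of slack before
    Truck Day for late shipments."""
    return {vendor: truck_day - timedelta(days=lead + buffer_days)
            for vendor, lead in lead_times_days.items()}


if __name__ == "__main__":
    print(forecast_next_month([850, 920, 1010]))
    # Hypothetical vendors and lead times, for illustration only.
    deadlines = order_deadlines(date(2007, 8, 8),
                                {"vendor_a": 21, "vendor_b": 45})
    for vendor, when in sorted(deadlines.items(), key=lambda kv: kv[1]):
        print(vendor, when.isoformat())
```

Re-running both steps as new sales data arrives mirrors the “constantly update your forecasts” point: the vendor with the longest lead time hits its deadline first, so its order is locked in on the oldest forecast.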

Also, you don’t just walk into a datacenter with a server and set it down. There’s a lot of work that goes into physical prep for the datacenter as well. You have to plan the datacenter layout, order and assemble racks, add rails, power strips, switches, power cord bundles, network cable bundles, etc. Every rack we deploy has almost 400 cage nuts and just under 200 cables in it. We don’t just string a bunch of cables up and call it a day. Every cable bundle is meticulously routed, combed and hung to make them look professional. With that much cabling, you have to make it right or you’ll never be able to work around it.

With one week to go before the trucks arrive, all of the datacenter prep starts wrapping up. And with just a few days left, we have our last manager meeting to review server placement, personnel, timing and other delivery details.

Next is Truck Day - this is when the fun begins.

On Truck Day, we leave plenty of people behind to handle sales, support and accounting, but everyone else is expected at the loading dock. After all the pallets are pulled off the truck and accounted for, the team gets busy unboxing. As servers are unboxed, all of the spare parts in the boxes (spare screws, riser cards, SATA cables, and various other pieces) are sorted into bins on the dock. The servers themselves are then placed in custom transport carts and moved to the datacenter.

From there, the teams inside the datacenter sort the servers according to type and perform a strict QA process that includes verifying the hardware configurations and verifying that the components are all seated properly.

Once sorted, the servers get scanned into the system and racked up. As the cables are plugged in, another QA process verifies that all of the ports are correct. At that point, it’s just a matter of turning each server on and watching it check in, get its BIOS flashed with the latest and greatest release, and have the system update any component firmware that needs it. As the systems check themselves into inventory, they go through two more QA processes: an inventory check and a burn-in process.
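The hardware-configuration QA and inventory check described above boil down to comparing what was ordered with what the server reports about itself. A minimal Python sketch, with hypothetical field names (the post does not say what SoftLayer’s inventory system records):

```python
from typing import Any


def config_mismatches(expected: dict[str, Any],
                      detected: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (expected, detected)} for every field that differs,
    including fields present on only one side (shown as None)."""
    fields = sorted(expected.keys() | detected.keys())
    return {f: (expected.get(f), detected.get(f))
            for f in fields if expected.get(f) != detected.get(f)}


if __name__ == "__main__":
    # Hypothetical order and self-reported inventory for one server.
    ordered = {"cpu_cores": 4, "ram_gb": 4, "disks": 2, "nic_ports": 2}
    reported = {"cpu_cores": 4, "ram_gb": 2, "disks": 2, "nic_ports": 2}
    print(config_mismatches(ordered, reported))  # {'ram_gb': (4, 2)}
```

In this sketch, an empty result means the server matches its order; anything else would be flagged before the server moves on to burn-in.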

By the time the truck is empty, the last box is stashed and the final server is racked up, everyone is ready to get back to their day jobs. Months’ worth of planning, all wiped out in a matter of hours.

Mary is working on a great post about what Truck Day looks like from a Salesperson’s perspective. It explains why we have everyone get involved in the process.

 
Being Green
Posted by Sam Fleitman on June 12th 2007

For so many years growing up, I heard the “Sam I Am” / “Green Eggs and Ham” comments when being introduced to other kids. At this point, you would think I would hate the color green. On the contrary - being green is good.

One of the biggest costs in a datacenter is power, and if you’re involved in datacenter operations, you get to experience first-hand the challenges of juggling power, cooling and floor space availability. If you use less power, your electrical costs go down, your cooling costs go down, and there is a ripple effect across the entire facility. In an effort to reach that goal, we do everything we can to pare down the power requirements of our servers. We start by running 240 V circuits to the rack. Doing so eliminates the need to step down to 110 V, which is much more efficient, and it helps eliminate harmonic feedback in the circuit. Add to that less heat, which means less wear and tear on the servers, and that is a good first step.

Once you get power to the server, it helps to spec your servers properly. A properly sized power supply can save more than 25 watts per server. Multiply that by just 1,000 servers, and that’s a cool 25 kW of power savings. Multiply it by the number of servers in our facilities? Well, it’s certainly worth the exercise of making sure we are ordering the proper equipment.
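The arithmetic scales linearly, and converting it to energy over a year shows why it is worth the exercise. This Python sketch uses the post’s 25 W and 1,000-server figures; the 8,760-hour year assumes the servers run around the clock, and any knock-on cooling savings are left out.

```python
HOURS_PER_YEAR = 8760


def fleet_savings_kw(watts_saved_per_server: float, servers: int) -> float:
    """Continuous power saved across the fleet, in kW."""
    return watts_saved_per_server * servers / 1000.0


def annual_savings_kwh(watts_saved_per_server: float, servers: int) -> float:
    """Energy saved in a year of 24/7 operation, in kWh."""
    return fleet_savings_kw(watts_saved_per_server, servers) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(fleet_savings_kw(25, 1_000))    # 25.0
    print(annual_savings_kwh(25, 1_000))  # 219000.0
```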

Aside from server equipment and datacenter power, SoftLayer has recently joined the Green Grid. We are looking to use that association to join the likes of AMD, Intel, Dell, HP, IBM, Microsoft and many more to help reduce overall power consumption by datacenters. There are many lessons yet to be learned by IT companies to help reach that goal.

Being green is not confined to datacenter facilities. On SoftLayer Truck Day, we receive hundreds of cardboard boxes. Rather than just throwing those all away, we work with a local vendor to make sure the cardboard and the packaging materials inside get recycled. Each server comes with various parts that are not needed (it’s cheaper for the vendor to ship every server with all the miscellaneous parts than to strip specific parts from specific orders). It would be easiest to just deposit all of those unneeded parts into a dumpster, but being green means doing more than whatever is easiest. We sort spare power cords and recycle them for the copper. We sort screws and sell them to a local vendor (and use the money to buy Monster). Any spare part for which we have not found a specific destination gets donated to a group that sells the parts and makes donations to charities.

Being green not only makes good financial sense, but it also makes good ecological sense. And – it keeps us stocked with Monster.

 
Who is SamF?
Posted by Sam Fleitman on May 23rd 2007

Since this is my first blog post, I thought I would take the time to introduce myself and explain my role here at SoftLayer. That way, if you wind up reading any future posts, your first question won’t be “who is this guy and why do I care?”

Like many of you, I’ve been in this business for quite some time. My first job in the industry was back in 1992 when I was working with the CIS department at Texas A&M helping to manage the university Gopher system. I remember going around campus to the various departments helping to convince people that putting information online in Gopher was the end-all/be-all for sharing information. Of course, that evangelizing didn’t last long. Shortly after going to GopherCon ‘94 in Minnesota, our attention started to shift to the Mosaic browser and HTTP protocol. From there, things just steamrolled.

After A&M, I went to work for Oracle Corp., where we started work on an online learning website. The goal was to take all Oracle-related CBT courses and find ways to put them online under one site. This was before such courses were designed for the web, and it meant working with the various vendors and all the different CBT formats to find ways to get them online.

Next was an ISP / shared hosting company named Catalog.com (now known as Webhero.com). We provided all the typical Internet services, including dial-up access, DSL, shared hosting, domain name registration and online storefronts, as well as hosting for some extremely large enterprise organizations. We did a lot with that company, and it continues today with a pretty solid product and service offering.

From there, it was into the enterprise datacenter hosting and dedicated server hosting markets. Now it’s all about SoftLayer and the services we can provide customers with our latest and greatest infrastructure.

As COO at SoftLayer, I am basically in charge of day-to-day operations, including support, facilities management, internal systems infrastructure and anything else that gets dreamed up on a daily basis. What’s the most fun part of my job? Every bit of it! I love the daily challenges in the support group. Facilities planning and forecasting allow me to really dig into the numbers. And, since I originally started out as a developer and system administrator, I love being involved with internal systems. Now, I’ve got to be honest: we’ve got some really good people here at SoftLayer who do all of the dirty work (the actual fun stuff). Because these guys are so good at what they do, I don’t have to lose sleep over any one particular thing, yet I still get to stay involved in every piece of it. Maybe in future posts I’ll explain how we determine the number of chassis fans that go inside each server (over 35,000 chassis fans in production so far), how many different types of SAS and SATA cables we need and with which connectors (there are so many types that it eventually became cheaper and more efficient to just have them custom made), where to put all of these servers, etc.

I guess the point of all that was to introduce myself and to let you know that, having been in the industry for so long and having dealt with everything from Gopher to dial-up access to enterprise hosting to the dedicated server market for quite a while now, I feel I have a pretty decent understanding of what our customers are looking for and what their pain points are. While overall operations are critical for everyone, enterprise customers running CRM apps, file servers and domain controllers view things from a different standpoint than someone running a personal mail server or even a large shared hosting or VPS business. As I read through tickets on a daily basis, I try to put myself back in the customers’ shoes to make sure that the services we provide cover the needs of all the different types of customers we have. Having been a customer or provider at pretty much every level, I certainly understand the challenges many of you face on a regular basis. It’s our job to help you overcome as many of those as possible.

We have a lot of really cool things going on at SoftLayer and I hope to share some of those in future posts. In my next post, I’ll tell you all about Truck Day at SoftLayer.

 










 
 
Copyright © SoftLayer Technologies, All Rights Reserved.