Author Archive
By Sam Fleitman on Monday, June 22nd, 2009
In catching up on some of my blog reading, I ran across this blog by Jill Eckhaus of AFCOM (a professional organization for data center managers). Yes, I realize that article is four months old, but like I said – I’m catching up.
One of the things that really concerns me with articles and blogs such as this one is the repeated worry about “data security” and “loss of control” of your infrastructure. Both of those points are easy to state because they prey on the natural fear of any system administrator or data center manager.
System administrators have long ago come to realize that, in the proper environment, there is no real downside to not being able to physically place their hands upon their servers. In the proper environment the system administrator can power on or off the server, can get instant KVM access to the server, can boot the server into a rescue kernel to try to salvage a corrupt file system, can control network port speeds and connectivity, can reload the operating system, can instantly add and manage services such as load balancers and firewalls, can manage software licenses and naturally, can control full access to the server with root or administrator level privileges. In other words, there is no “loss of control” and “data security” is still up to the system administrator.
The data center managers are understandably concerned about outsourcing because it can potentially impact their jobs. But let’s face it – in today’s economy, the capital outlay required to acquire new datacenter space or additional datacenter equipment is extremely difficult to justify. In those cases sometimes the only two options are to do nothing or to outsource to an available facility. Of course, another option is to jeopardize your existing facility by trying to cram even more services into an already overloaded data center. If a data center manager is trying to build a fiefdom of facilities and personnel, outsourcing is certainly going to be a concern. One interesting aspect of outsourcing is that datacenter management jobs are still there; they are just at consolidated and oftentimes more efficient facilities.
In reality, “data security” and “loss of control” should be of no more or less concern if you are using your own data center versus if you are doing the proper research and selecting a viable outsourcing opportunity with a provider that can prove it has the processes, procedures and tools in place to handle the job for you.
(In the spirit of full disclosure: I am both a local and national AFCOM member and find the organization and the information they make available to be quite useful.)
Posted in News | 1 Comment »
By Sam Fleitman on Thursday, May 21st, 2009
I just got back from participating in a panel discussion at the most recent Anti-Spyware Coalition Public Workshop. The title of the panel session was “Who Owns the Problem”. You can see who all of the participants were, but it was a good session with representation from the FBI, Symantec, PayPal, the Center for Democracy and Technology, StopBadware.org and KnujOn.
A lot of the session was focused on end user security regarding spyware, rogue anti-virus, malware and other general badware. But part of the discussion concerned the security efforts of the hosting industry in general and SoftLayer specifically. Some of the things we deal with in the hosting industry are second nature to those of us that have been here for a while. But when you start talking about it in front of a different crowd, you begin to appreciate the different perspectives that are out there.
For instance, one common perception (held by some, but obviously not by all) is that once we are made aware of a server that has malware on it, all we have to do is pull the plug on the server and the problem is resolved. However, sometimes the consequences of doing so are high enough to be worthy of a second look. For instance, consider the scenario where SoftLayer rents a server to a customer. That customer slices the server into virtuals using Parallels’ Virtuozzo product and rents a virtual to another customer. That customer puts cPanel on it to sell shared hosting accounts. Now SoftLayer is 2 layers removed from the actual end user. If that end user’s website gets compromised and begins to distribute malware, how do we at SoftLayer deal with the problem? Ideally, we tell our customer and they tell their customer and they tell the end user about the problem. The end user reacts quickly and cleans up the site. That’s not anywhere close to “best case scenario”, but I would call that a reasonable real-world response.
The problem is, if any of the individuals in that chain of communication fails to react quickly, then the response time for that issue is drastically impacted and more people are potentially victimized by the malware. At what point do we pull the plug on the server? At what point do we decide that all of the other customers on the server have to suffer because of the one bad apple or because of a slow response time from one customer in the chain of communication? Websense did a study that showed that in the second half of 2007, over half of all sites distributing malware were themselves compromised sites, so the scenario described above is actually a very common problem. It also highlights that there is one more victim in the incident: the web site owner.
We try to deal with every abuse report that we receive as prudently and expeditiously as possible. In some cases, we pull the plug immediately. In others, we try very hard to work with the customer to resolve the issue. But in all cases, we are constantly working to act as quickly as possible on each individual case.
This is just one of the many scenarios that we have to deal with and it highlights why having a good relationship with your provider is such an important factor when choosing someone to help supply or service your IT needs.
Posted in News | 3 Comments »
By Sam Fleitman on Wednesday, May 20th, 2009
No – this isn’t one of those blogs or editorials ranting and railing about how no one out there is able to provide good customer service anymore. This isn’t about how no one in the service industry – from restaurants to retail and everything in between – seems to care about the customer anymore. People have been writing those stories for the past 50 years (about half as long as they have been writing about the coming demise of baseball). This is just a short little missive lamenting how the same people that complain about lack of service are often people that work in the service industry themselves.
I often find myself in a retail store wondering why I can’t get help locating an object. Or in a restaurant wondering where the wait staff is. Or trying to work my way through an automated phone help system. Part of me sympathizes with the wait staff knowing that they are probably just too busy to get to my table. Maybe the restaurant is understaffed or maybe they have an unexpected rush of customers. And part of me even realizes the operational value of the automated phone system. The ability to reduce head count and lower costs with an automated system seems like a great idea (and sometimes it is).
But when I find myself in those aggravating situations and my anger is just about to get the better of me, I generally come back to the fact that I and everyone else that works at SoftLayer are in the customer service industry. Oh, I might complain to a manager or I might tip less or I might shop at that location less. But more important than that, I try to use that experience as a reminder of how important customer service is. I’m not talking about just the ability to provide the product the customer is looking for – I mean the ability to answer questions in a timely manner, to answer the phone as quickly as possible, to handle outages as quickly and professionally as possible, to provide customers with frequent updates and most importantly, to treat every customer interaction with the level of urgency that the customer thinks it deserves.
And THAT’s the important part – not just solving the problem, but making sure that the customer’s expectations are met.
Posted in News | Comments Off
By Sam Fleitman on Monday, December 1st, 2008
If you read through some of the previous blogs on this site such as our CEO’s “SoftLayer Thinks ‘Outside the Box’” or the blog written by one of our super developers, Mr McAloon, entitled “Simplicity”, or Mr Rushe’s “An Interview with an elevator” (OK – that has nothing to do with what I’m referring to, but it’s one of the funniest blogs on this site), one thing you’ll notice is that at SoftLayer, we try to automate and simplify things for the customer. Our customer portal has a LOT of customer features. There are automated OS reloads, the ability to boot into a rescue kernel, the capability to add IP addresses on demand, add and configure a firewall or a local or global load balancer, the ability to edit your DNS settings (forward or reverse) and – my favorite – the ability to reboot your server via IPMI or the power strip. You can also manage your CDN services, monitor your NAS or iSCSI storage, configure backups, use the free KVM services, check your bandwidth and of course, handle all of the usual things like opening support tickets or checking your invoice. Or, if you want to integrate any or all of those features into your own management system, there is a full API available for your use.
With all of that functionality in the portal, one of the challenges we continuously run into is educating new customers on all of the features. Not just educating them on how to use the features – but that the features actually exist in the customer portal. A lot of our customers are either new to On-Demand IT Infrastructure Services (aka the hosting environment) or come from other competitors that only offer a fraction of the features that we are able to provide. For instance, you would be amazed at how many customers open “reboot” tickets. While we respond to tickets quickly, it is actually faster for the customer to click on the “reboot” button in the portal than to click on the “create new ticket” link in the portal and then type out a reboot request.
As ways to address that issue, we created a private customer forum so that customers can share ideas, comments and suggestions with each other. We have also not only created the KnowledgeLayer FAQ database, but we have integrated that directly into the support ticket feature of the portal (when you open a ticket, the FAQ system will automatically recommend related fixes before you even submit the ticket). We also have tutorials directly linked inside the portal and even have all of our API documentation available for review.
So one of the challenges we have at SoftLayer isn’t just creating and deploying the new features and services that keep us out in front of the pack, but educating our customers of their existence and their ease of use. BTW, that’s a great problem to have!
Posted in News | Comments Off
By Sam Fleitman on Friday, May 23rd, 2008
The growth in energy demanded by, and used in, IT environments is a well documented phenomenon. Datacenters are using more energy as CPUs get faster, hard drives become larger, and end user demand for access to data and applications continues to increase. Prices for the underlying hardware and services continue to fall, which just fuels more demand.
Datacenter operators have done their best to maximize the use of every available asset within a facility in order to operate highly efficient environments. Much of the emphasis to date has been on proper datacenter alignment: hot-aisle/cold-aisle configurations, blanking panels to cover gaps in server racks, and sealing holes under raised floors to better contain cold air have become commonplace in the data center.
However, in most large organizations, there are many areas that need more attention. Departments within a large company often have competing goals that negate green IT efforts. One example of this would be:
- The system administrators and developers want the biggest, fastest machines they can get with the most expandability. This enables them to add memory or hard drives as utilization increases – which makes their jobs much easier to perform and helps them better maintain customer SLAs.
- The purchasing (and finance) department’s primary goal is to save money. The focus is to work with the vendors to reduce the overall hardware cost.
The disconnect between those two departments will often leave the datacenter manager out in the heat (definitely not “out in the cold”). That person’s job essentially becomes “just find a place to put it” until the datacenter is full enough that the answer becomes “no more”. It then becomes a “fix it now” problem as the company struggles with plans to build more datacenter space. So, the problem is a short term view (meeting quarterly earnings results) versus long term direction (to achieve a sustainable and efficient operations environment that may have a short term cost implication).
What should happen is that the disparate groups need to work together throughout the entire planning process. The purchasing department, the system administrators, developers, and the datacenter managers should build a common plan and set realistic expectations in order to optimize the IT infrastructure required and to best meet business, operations, and efficiency objectives.
Let’s continue the example from above… if a server is ordered just because it’s more expandable (more expansion slots, RAM slots and hard drive bays), that means that the power supply has to be bigger to support the potential need of those future components. A server power supply is most efficient (wastes the least amount of power doing the conversion) when it is running at 80-90% load. If a power supply is oversized to support potential future needs, then it is operating at a much lower efficiency than it should – thus generating more heat, wasting more power and requiring more cooling, which in turn requires more power to run the ACs.
That might seem like a small price to pay for expandability, but when that scenario is multiplied over an entire datacenter, the scope of the problem becomes very significant. This could lead to lost efficiency of well over 20% as a business plans and buys ahead of demand for the computing capacity it may need in the future.
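As a rough illustration of the reasoning above, here is a minimal Python sketch of how conversion efficiency translates into wasted power. The 200 W load and the 85% and 70% efficiency figures are hypothetical placeholders chosen only to show the arithmetic; they are not measurements from any real power supply or from SoftLayer's datacenters.

```python
def wall_power(load_watts: float, efficiency: float) -> float:
    """Power drawn from the wall to deliver load_watts to the server components."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return load_watts / efficiency


def wasted_power(load_watts: float, efficiency: float) -> float:
    """Power lost in conversion, which ends up as heat the cooling system must remove."""
    return wall_power(load_watts, efficiency) - load_watts


if __name__ == "__main__":
    # All three values below are assumptions for illustration only.
    LOAD_WATTS = 200.0
    RIGHT_SIZED_EFFICIENCY = 0.85   # hypothetical: supply running near its efficient range
    OVERSIZED_EFFICIENCY = 0.70     # hypothetical: oversized supply running lightly loaded

    for label, eff in (("right-sized", RIGHT_SIZED_EFFICIENCY),
                       ("oversized", OVERSIZED_EFFICIENCY)):
        print(f"{label}: {wall_power(LOAD_WATTS, eff):.1f} W at the wall, "
              f"{wasted_power(LOAD_WATTS, eff):.1f} W lost as heat")
```

Multiplying the per-server difference by the number of servers in a facility gives a feel for how a small inefficiency grows into a datacenter-wide cost, before even counting the extra cooling.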
So, what is the other option? Is purchasing right? Should IT simply buy a small server, at a lower total cost, and scale as the business scales? The problem with this is that it tends to lead to exponential growth in all aspects of IT – more racks to house smaller servers, additional disks, more space and power over time, increased obsolescence of components, and more lost efficiency.
The problem is considerably more complex than either option. The simple fact is that the “fixes” for IT go well beyond implementing a hot-aisle cold-aisle layout and sealing up holes under the raised floor of the datacenter. Now that those things have become “best practices,” it’s time to start highlighting all of the other things that can be done to help improve energy efficiency.
At SoftLayer, we promote an energy efficient focus across the entire company. Datacenter best practices are implemented in all of the datacenter facilities we occupy; we use hot-aisle cold-aisle configurations, we use blanking panels, we use 208v power to the server, we pay very close attention to energy efficient components such as power supplies, hard drives and of course CPUs, and we recycle whatever we can.
Most importantly, we deliver a highly flexible solution that allows customers to scale their businesses as they grow – they do not need to over buy or under buy, since we will simply “re-use” the server for the next customer that needs it. Individually, the energy savings from each of these might be fairly small. But, when multiplied across thousands and thousands of servers and multiple datacenters – these many small things become one large thing quickly – a huge savings in energy consumption over a traditional IT environment.
Ultimately, SoftLayer believes that we can never be satisfied with our efforts. As soon as one set of ideas becomes commonplace or best practice, we need to be looking for the next round of improvements and bringing those new ideas and practices forward so all can benefit.
Posted in News | Comments Off
By Sam Fleitman on Monday, February 11th, 2008
In Steve’s last post he talked about the logic of outsourcing. The rationale included the cost of redundant internet connections, the cost of the server, UPS, small AC, etc. He covers a lot of good reasons to get the server out of the broom closet and into a real datacenter. However, I would like to add one more often overlooked component to that argument: the Spares Kit.
Let’s say that you do purchase your own server and you set it up in the broom closet (or a real datacenter for that matter) and you get the necessary power, cooling and internet connectivity for it. What about spare parts?
If you lose a hard drive on that server, do you have a spare one available for replacement? Maybe so – that’s a common part with mechanical features that is liable to fail – so you might have that covered. Not only do you have a spare drive, the server is configured with some level of RAID so you’re probably well covered there.
What if that RAID card fails? It happens – and it happens with all different brands of cards.
What about RAM? Do you keep a spare RAM DIMM handy or if you see failures on one stick, do you just plan to remove it and run with less RAM until you can get more on site? The application might run slower because it’s memory starved or because now your memory is not interleaved – but that might be a risk you are willing to take.
How about a power supply? Do you keep an extra one of those handy? Maybe you keep a spare. Or, you have dual power supplies. Are those power supplies plugged into separate power strips on separate circuits backed up by separate UPSs?
What if the NIC on the motherboard gets flaky or goes out completely? Do you keep a spare motherboard handy?
If you rely on out of band management of your server via an IPMI, Lights Out or DRAC card – what happens if that card goes bad while you’re on vacation?
Even if you have all necessary spare parts for your server or you have multiple servers in a load balanced configuration inside the broom closet; what happens if you lose your switch or your load balancer or your router or your… What happens if that little AC you purchased shuts down on Friday night and the broom closet heats up all weekend until the server overheats? Do you have temperature sensors in the closet that are configured to send you an alert – so that now you have to drive back to the office to empty the water pail of the spot cooler?
You might think that some of these scenarios are a bit far-fetched but I can certainly assure you that they’re not. At SoftLayer, we have spares of everything. We maintain hundreds of servers in inventory at all times, we maintain a completely stocked inventory room full of critical components, and we staff it all 24/7 and back it all up with a 4-hour SLA.
Some people do have all of their bases covered. Some people are willing to take a chance, and even if you convince your employer that it’s ok to take those chances, how do you think the boss will respond when something actually happens and critical services are offline?
Tags: backups, costs, datacenter, failures, hardware, memory, redundancy, spares Posted in News | Comments Off
By Sam Fleitman on Wednesday, October 31st, 2007
“Ah – I don’t need backups.”
“Too busy to do backups – I’ll get to that later.”
“Backups? It costs too much.”
“I don’t need backups – MTBF of a Raptor is 1.2 Million hours.”
“Oops – I forgot about doing backups.”
Backups are one of the most commonly forgotten tasks of a system administrator. In some cases, they are never implemented. In other cases, they are implemented but not maintained. In other cases, they are implemented with a great backup and recovery plan – but the system usage or requirements change and the backups are not altered to compensate.
A hard drive really is a fairly reliable piece of IT equipment. The WD 150GB Raptor has a rating of 1.2 Million hours MTBF. With that kind of mean time between failures, you would think that you would never have to worry about a hard drive failing. How willing are you to take that chance? What if you double your odds by setting up two drives in a RAID 1 configuration? Now can you afford to take that chance? How willing are you to gamble with your data?
What if one of your system administrators accidentally deletes the wrong file? Maybe it’s your apache config file. Maybe it’s a piece of code you have been working on all day. Or, maybe your server gets compromised and you now have unknown trojans and back doors on your server. Now what do you do?
Working in a datacenter with thousands of servers, there are thousands and thousands of hard drives. When you see that many hard drives in production, you are naturally going to see some of them fail. I have seen small drives fail, large drives fail, and I have even seen RAID 1 mirrors completely fail beyond recovery. Is it bad hardware? Nope. Is it Murphy’s Law? Nope. It’s the laws of physics. Moving parts create heat and friction. Heat and friction cause failures. No piece of IT equipment is immune to failure.
That 1.2 million hours MTBF looks pretty impressive. For a round number, let’s say there are 15,000 drives in the SL datacenter. 1,200,000 hours / 15,000 drives = 80 hours. That means that every 80 hours, one hard drive in the SL datacenter could potentially fail. Now how impressive is that number?
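The back-of-the-envelope arithmetic above can be written as a short Python sketch. It assumes a constant failure rate and independent drives, which is a simplification; the function names are made up for this example, and the 15,000-drive figure is the round number used above, not an actual inventory count.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def fleet_hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Average number of hours between drive failures somewhere in the fleet."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return mtbf_hours / drive_count


def expected_failures_per_year(mtbf_hours: float, drive_count: int) -> float:
    """Expected number of drive failures across the fleet in one year."""
    return HOURS_PER_YEAR / fleet_hours_between_failures(mtbf_hours, drive_count)


if __name__ == "__main__":
    mtbf = 1_200_000   # rated MTBF quoted above, in hours
    drives = 15_000    # round number used above
    print(f"One failure roughly every {fleet_hours_between_failures(mtbf, drives):.0f} hours")
    print(f"About {expected_failures_per_year(mtbf, drives):.1f} failures per year")
```

Under these assumptions, one failure every 80 hours works out to roughly 110 failed drives a year, which is why a single impressive MTBF figure says little about what a large fleet will actually experience.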
Ultimately, regardless of the levels of redundancy you implement, there is always a chance of a failure – hardware or human – that results in data loss. The question is – how important is that data to you? In the event of a catastrophic failure, are you willing to just perform an OS reload and start from scratch? Or, if a file is deleted and unrecoverable, are you willing to start over on your project? And lastly, how much downtime can you afford to endure?
Regardless of how much redundancy you can build into your infrastructure with the likes of load balancers, RAID arrays, active/passive servers, hot spares, etc, you should always have a good plan for doing backups as well as checking and maintaining those backups.
Have you checked your backups lately?
Tags: backups, evault, hardware, IT, mtbf, nas, physics, SoftLayer Posted in Technology, backups | Comments Off
By Sam Fleitman on Monday, August 6th, 2007
The SoftLayer contingent recently returned from attending HostingCon 2007 in Chicago and I have to say, it was a great experience. We had a lot of opportunities to meet up with many of our customers, meet with a lot of vendors and potential vendors as well as visit with some of our competitors.
While there, I had the privilege of participating in a panel discussion on “Green Hosting: Hope or Hype”. Isabel Wang did a great job of moderating the discussion with Doug Johnson, Dallas Kashuba, and myself. The overall premise of the panel discussion was to talk about green initiatives, how they affect the hosting industry, what steps hosting companies can take and whether it is something we should be pursuing.
It was interesting to hear the different approaches that companies take to be green. Should companies focus their efforts on becoming carbon neutral by purchasing carbon credits as DreamHost does, by promising to plant a tree for each server purchased as Dell does, by working on virtualization strategies as SWSoft does, or by working to eliminate the initial impact on the environment as we have done at SoftLayer? You can probably tell from one of my previous blog posts where SoftLayer is focusing our efforts to help make a difference.
Besides the efforts of the individual companies on the panel, there were some good questions from the audience that helped spur the conversation. Does the hosting industry need its own organization for self regulation or are entities such as The Green Grid sufficient? Do any of the hosting industry customers really care if a company is “green”? Should a hosting company care if it’s “green”? And, what exactly does “being green” mean?
While there are differing opinions to all of those questions, there really isn’t a “wrong” answer. Ultimately all of the steps companies take – no matter how small – will help to some extent. And no matter what the motivation – whether a company is “being green” in an effort to gain publicity, to save money or to simply “make a difference” – it’s all worth it in the end.
Posted in Business, Company Funfacts, Going Green | Comments Off
By Sam Fleitman on Friday, July 27th, 2007
In previous posts, there have been mentions of the datacenter of the future, KVM over IP and a reference to an elevator. Then, just the other day, someone in the office pointed out this article: “How remote management saved me an emergency flight overseas”
The article discusses the successful deployment of servers from a remote location. The author talks about being able to remotely configure and deploy some new servers from the confines of a ski lodge. Of course, they had to have someone at their offices to receive the server shipment, unbox the servers, rack them up, get them all cabled, make sure space, power and cooling would all be sufficient and then put in a CD. Things that weren’t mentioned probably included throwing away all of the packaging material, doing QA on the hardware to verify it was all correct and changing any BIOS settings.
Beyond all of that, there are many things that are just inherent to the process that they didn’t refer to, including having to find the right server vendor, negotiating pricing for the servers, making sure all of the pieces and parts were going to be shipped, tracking the shipment dates, contacting the vendor multiple times to try to find out why the shipment wasn’t going to be on time, having available datacenter space and infrastructure, putting those dang cage nuts in the server racks, having available switch ports, making sure the network was configured correctly, providing network security, making sure all of the software licenses were up to date, etc, etc, etc.
Or, as so many of you already know – they could have gotten their servers from a dedicated hosting provider such as SoftLayer (hint, hint) and had the servers purchased, configured, QA’d and online within just a couple of hours and with no more effort than just filling out a signup form. It’s hard to imagine there are still so many people out there doing things the hard way.
Posted in Business, Technology | Comments Off
By Sam Fleitman on Wednesday, July 11th, 2007
How do you unload 1,000 servers and have them ready to go live in a datacenter in five hours? With lots and lots of planning. Every month we take in a shipment of servers to accommodate the next 30 days of sales. Preparation for each delivery starts several months in advance with forecasting models. You have to look far enough ahead in your models to continually adjust forecasts for sales, facilities and available resources. Some vendors need more lead time than others so you have to constantly update your forecasts, all the way up to final order placement.
Also, you don’t just walk into a datacenter with a server and set it down. There’s a lot of work that goes into physical prep for the datacenter as well. You have to plan the datacenter layout, order and assemble racks, add rails, power strips, switches, power cord bundles, network cable bundles, etc. Every rack we deploy has almost 400 cage nuts and just under 200 cables in it. We don’t just string a bunch of cables up and call it a day. Every cable bundle is meticulously routed, combed and hung to make them look professional. With that much cabling, you have to make it right or you’ll never be able to work around it.
With one week to go before the trucks arrive, all of the datacenter prep starts wrapping up. And with just a few days left, we have our last manager meeting to review server placement, personnel, timing and other delivery details.
Next is Truck Day – this is when the fun begins.
On Truck Day, we leave plenty of people behind to handle sales, support and accounting, but everyone else is expected at the loading dock. After all the pallets are pulled off the truck and accounted for, the team gets busy unboxing. As servers are unboxed, all of the spare parts in the boxes – spare screws, riser cards, SATA cables, and various other pieces – are sorted into bins on the dock. The servers themselves are then placed in custom transport carts and moved to the datacenter.
From there, the teams inside the datacenter sort the servers according to type and perform a strict QA process that includes verifying the hardware configurations and verifying that the components are all seated properly.
Once sorted, the servers get scanned into the system and racked up. As all of the cables are plugged in, another QA process is completed to verify that all of the ports are correct. At that point, it’s just a matter of turning each server on and watching it check in, get its BIOS flashed with the latest and greatest release and have the system update any component firmware that is needed. As the systems check themselves into inventory, they go through two more QA processes that include an inventory check and a burn-in process.
By the time the truck is empty, the last box is stashed and the final server is racked up, everyone is ready to get back to their day jobs. Months’ worth of planning – all wiped out in a matter of hours.
Mary is working on a great post about what Truck Day looks like from a Salesperson’s perspective. It explains why we have everyone get involved in the process.
Posted in Company Funfacts | 5 Comments »