You can watch the entire segment in this video.
During the fifth edition of the Open Compute Summit, Jon Koomey (Research Fellow, Stanford University) moderated a keynote panel titled “Bringing Integrated Design, Mass Production, and Learning by Doing to the Datacenter Industry,” where he invited on stage three illustrious speakers: Sherman Ikemoto (Director, Future Facilities), Kushagra Vaid (General Manager, Cloud Server Engineering, Microsoft) and Jim Stogdill (General Manager, O’Reilly Media).
Koomey began by acknowledging the irony that, despite information technology being a driver for efficiency improvements throughout the economy, IT by and large hasn’t changed the way it is itself provisioned. Referring specifically to enterprise IT, he mentioned the lack of transparency about the total costs of IT at the project level or the business unit level, the low utilization, and the fact that 10-20 percent of servers are comatose.
“What lessons are there from the open source software community for transforming IT to become more efficient inside the enterprise?” asked Koomey.
“It’s almost a misdirected question,” replied Stogdill. “We focus a lot on the openness, which is an enabler for a lot of things, but ultimately the question is about ‘How do we keep the hurdle low to adopt new things?’ In the open source software space it was really about a low hurdle for adoption, about low cost, try-before-you-buy, and low exit barriers with no proprietary lock-in. In the open hardware space, at least in the horizontal compute space, probably the most interesting thing right now is being able to enter your credit card and get an EC2 instance. It’s probably the thing that’s most parallel to what we are talking about here,” said Stogdill.
He continued: “From an Open Compute perspective, the question is ‘How do we lower the hurdles across the board, not just with open source and open culture, but with other models that make it easier to adopt these things?’”
“It sounds like it’s not just a hardware problem,” noted Koomey. “In Open Compute it’s also a software problem.”
Stogdill disagreed: “We have to be careful not to overstate the parallels with open source: with open software I can download and try it with zero friction; hardware, at the end of the day, still needs to show up in a crate. We should be asking ourselves what model could make that process as simple as possible.”
Scaling data centers
“Microsoft has been driving forward with really large integration,” said Koomey. He invited Vaid to “talk a little about the challenges and benefits.”
Vaid obliged: “In the early days, when Microsoft was scaling its data centers, we realized that, unless there was a process where we could have consistency across the different life stages of the design, supply chain and operations, it would really be difficult to ensure that we meet our time-to-market goals, our efficiency goals and also keep costs under control. At a high level, we break it down into those three areas.”
“On the design front, one of the key principles is modularity: a facility typically has a 15-year life, technologies change, and it should be really easy to introduce new technologies over that life-cycle. This applies to mechanical design, power and electrical design, control software, EPMS, DCIM, etc.,” continued Vaid.
“On the supply chain side, from the time the servers arrive at the dock to the time they go live, there needs to be a very streamlined process to take care of that.

“On the operations side, you can’t fix what you can’t monitor, so it’s very important to have all kinds of monitoring (power, performance), the utilization metrics, and to feed that into a machine learning system, which can detect patterns for you and find out when you’re operating below efficiency levels,” Vaid stated.
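Vaid’s “you can’t fix what you can’t monitor” point can be sketched in a few lines. This is a hypothetical illustration, not Microsoft’s actual system: flag intervals where a server draws significant power while sitting under-utilized — the “comatose” servers Koomey mentioned earlier. All names and thresholds are invented for the example.

```python
def find_inefficient_intervals(samples, min_util=0.4):
    """Each sample is (timestamp, cpu_utilization, power_watts).

    Returns the timestamps where the server draws power but runs
    below the utilization threshold -- candidates for investigation.
    The 40% default threshold is an arbitrary illustrative choice.
    """
    flagged = []
    for ts, util, watts in samples:
        if watts > 0 and util < min_util:
            flagged.append(ts)
    return flagged

samples = [
    ("09:00", 0.75, 310),  # busy server
    ("09:05", 0.10, 290),  # nearly idle, still drawing ~290 W
    ("09:10", 0.05, 285),  # comatose candidate
]
print(find_inefficient_intervals(samples))  # -> ['09:05', '09:10']
```

A real pipeline would feed these samples into a pattern-detection system, as Vaid describes; the threshold rule above is only the simplest possible stand-in for that.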
Metrics and data center performance
After hearing so much about measurements, the panel moderator wanted to tackle the issue of metrics and link it back to business performance.
“There are different metrics at different stages of the life-cycle. During the design, power efficiency and agility metrics, during the deployment phase – how long it takes for a server to go live, during the operations – how efficiently are you running,” said Vaid.
- A metric for any kind of application
“Any measurements of cost per transaction? Energy efficiency per transaction, profit per transaction?” asked Koomey.
“We typically do that on a per-application basis. For running web search, there will be a metric that takes into account the cost of running a web search based on the facility metrics. For running Windows Azure, there will be a metric about the cost of hosting a VM. There’s a metric for any kind of application,” stated Vaid.
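The per-application metrics Vaid mentions amount to amortizing facility costs over an application’s unit of work. A minimal sketch, with made-up numbers — the function name and figures are illustrative assumptions, not Microsoft data:

```python
def cost_per_unit(monthly_facility_cost, units_served):
    """Amortized facility cost per unit of work,
    e.g. dollars per web search, or per VM-hour hosted."""
    return monthly_facility_cost / units_served

# Hypothetical search cluster: $500,000/month serving 2 billion queries
search_cost = cost_per_unit(500_000, 2_000_000_000)
print(f"${search_cost:.6f} per query")  # -> $0.000250 per query

# Hypothetical hosting cluster: $500,000/month serving 1.2 million VM-hours
vm_cost = cost_per_unit(500_000, 1_200_000)
print(f"${vm_cost:.2f} per VM-hour")
```

The point of such a metric is that different applications get different denominators (queries, VMs), while the numerator ties back to the same facility costs.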
“Sherman, you know a lot about existing facilities, and how apparently logical decisions can lead, deployment by deployment, to stranded capacity inside existing enterprise facilities. Can you talk about how software can mitigate those kinds of issues?” asked the moderator.
“My expertise is more on the physical side of the data center, and getting full utilization of the physical capacity of the room, whether you’re talking about the space, the power that’s being provided, or the cooling provisioned to the data center. There are emerging standards now that will help the various silos of the data center management teams manage their own portion of the data center. There are software-based standards and measurement-based standards, like PUE (Power Usage Effectiveness), but the challenge is integrating all those various strands or silos of management into a single, overarching metric. There is software available to do that now. It’s one level higher than where the industry is today, and it involves a new software technology, which is modeling: you need to know how the various sub-components of the data center are interacting with each other at a system level,” stated Ikemoto.
He went on to present the use case of a company in the UK: “On the facility side, for the 20,000 square feet they were saving about a million and a half a year in energy bills; on the IT side, they were able to simultaneously improve the PUE and the IT capacity of the data center in a synchronized way. They achieved a much lower PUE, freeing up the equivalent of about 77 out of 300 cabinets of computing resources.

“The only way to achieve that was to see how the two subsystems interacted with each other,” said Ikemoto.
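The arithmetic behind the synchronized PUE/capacity improvement Ikemoto describes is worth making explicit. PUE is total facility power divided by IT equipment power, so within a fixed facility power budget, lowering PUE frees power for IT load. A minimal sketch with invented numbers (the UK company’s actual figures were not disclosed):

```python
def it_power_available(facility_power_kw, pue):
    """PUE = total facility power / IT power, so for a fixed
    facility budget, the power available to IT is budget / PUE."""
    return facility_power_kw / pue

# Hypothetical 3 MW facility budget
before = it_power_available(3000, 2.0)  # 1500 kW for IT at PUE 2.0
after = it_power_available(3000, 1.5)   # 2000 kW for IT at PUE 1.5
print(f"{after - before:.0f} kW freed for compute")  # -> 500 kW freed for compute
```

This is the mechanism by which a lower PUE can free up cabinets of computing resources without adding facility power, as in the use case above.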
“What kind of developments would make tracking rapid change of IT easier and more effective?” asked Koomey.
Stogdill found the question of defining the workload very interesting. “The way we talk about data centers, it often seems like data centers were the place where we disposed of excess electricity,” he said. “Especially in the enterprise space, where workloads are so diverse, it’s difficult to come up with a common definition of what a workload even is.”
This brought up an important point, Koomey noted: workloads relate to business value. “What does it do for the company to generate this much computing?”
“There’s a hierarchy from PUE, to utilization, to what’s happening in that utilized code, up to the application layer. Because we can measure it, we focus on PUE long after each incremental dollar is probably not giving us much return, when we could much better spend it higher up the hierarchy,” said Stogdill.
- Modularity in the facility
“From a physical standpoint, it reduces the number of degrees of freedom; the less freedom you have, the more able you are to meet your original goals. It’s easier to define the goals for a smaller module than it is for a completely open data center. From a physical standpoint it’s very helpful, but it doesn’t get rid of the problem, because you still have to build out a module and, unless you stick to the original plan, you are going to run into capacity issues and shortfalls or cost overruns,” said Ikemoto.
“The optimization depends on software,” Stogdill jumped in. “One of the provocative things that Open Compute has to decide is whether it’s an open-hardware-only thing or whether it embraces open software as well. Open hardware depends on open software to make any sense, at the layer of provisioning and dispatch and everything else that makes the stuff work.”
“The whole stack matters,” agreed Koomey. “From the perspective of cost per unit of compute, you need to worry about the software and the way users are interacting with it.”
“What are we going to do in the future, when computing is cheaper, that we don’t do now? The stuff that we do now and that seems so big will then seem small. The things we do with IT today will seem minor in a digital business future,” said Stogdill.
- Business outcome versus operations
“Open Compute is going towards making that connection; in the enterprise and government, in the traditional mixed-use data center, that connection is broken at the very beginning,” said Ikemoto. “The original plan is set, the money is committed by the company, but then the plan is forgotten, everything goes to operations, cost-per-compute skyrockets and nobody even knows about it.”
“Senior management attention on these issues is almost, if not entirely, absent,” said Koomey. “The level of waste here is so large that we will see massive shifts in the way enterprise IT is provisioned,” he predicted.