When data centers lose their cool
http://www.gcn.com/print/25_12/40759-1.html
GCN 05/15/06 issue
As processors get more powerful and servers get hotter, system
modernization takes on new urgency
By Joab Jackson, GCN Staff
In this report
SPECIAL REPORT: Systems modernization
· How to chill out
· HUD's team effort
· Agencies chip away at BSMs
Data center managers like to say they create weather. They're not
exaggerating.
To prepare for a new supercomputer he'll oversee, Ramesh Kolluru,
director of Louisiana's Center for Business and Information
Technologies, set up an ad hoc data center within his research
facility. Two rows of server cabinets, packed with SGI Altix 350
systems, run 15 feet down the center of the lab. When you start walking
down the aisle between those racks, the temperature is a pleasant 78
degrees; by the time you reach the end, about seven paces later, the
temperature has plunged to a wintry 40.
Lately, the weather inside data centers has grown unmanageably hot,
forcing IT managers to reconsider plans they'd taken for granted. And
data center modernization, the unglamorous second cousin to today's
business systems modernization, has become essential. Often, an agency
can't successfully complete the latter without seriously considering
the former.
"You cannot build data centers like you used to," said Richard
Sawyer, director of data center technology for American Power
Conversion Corp. of West Kingston, R.I. That's partly because modern
servers can deliver a lot more processing power in a lot less space
than they could a decade ago. While the benefits are immediate, the
costs are often more subtle, and over the long run can adversely
affect an agency's ability to accomplish its mission.
At first you might simply notice servers on the top racks failing more
often than those at the bottom. You may see groups of servers running
hotter than the rest ("hot spots," as they're known). You may
find you're running your backup cooling unit around the clock,
instead of just for emergencies. Bring in a cadre of blade servers, and
suddenly the heat problem grows, as do the electricity bills. At
worst, you may end up with entire racks of servers, or "dark
clusters," powered down until electricity and cooling concerns are
addressed.
According to Gary Spilde, site planning manager for Mountain View,
Calif.-based SGI, "The government has an awful lot of 20- and
25-year-old data centers. Typically, there is a lot of retrofitting to
be done."
Power surge
Part of the problem stems from the law of unintended consequences. In
October 1995, following the lead of the commercial enterprise sector,
the Office of Management and Budget issued a bulletin calling for
agencies to consolidate operations into data centers as much as
possible.
"There was a push to ... put more gear into the same bit of real
estate," said Douglas Alger of the Cisco Systems Inc. Data Center
Infrastructure team. As organizations rented or built data centers,
they looked to cut real estate costs as much as possible, so it made
sense to buy small.
Server manufacturers heard the call for compactness. Servers that used
to take up 3.5 inches of vertical space in a rack (called 2U
servers) were replaced with more powerful units that took up half the
space. Today's blade servers are even more space-efficient.
But packing more, and more powerful, processors into a cabinet also
requires more juice. Much more juice. By some industry estimates, each
new generation of servers requires 30 to 50 percent more power.
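To put that growth rate in perspective, here is a minimal back-of-the-envelope sketch in Python. The 5 kW starting rack and the three refresh cycles are illustrative assumptions, not figures from the article.

    # A back-of-the-envelope sketch of compound growth in rack power draw when
    # each server generation pulls 30 to 50 percent more power than the last.
    # The 5 kW starting rack and three refresh cycles are illustrative, not GCN figures.

    def projected_power(start_kw, generations, growth):
        """Power draw after a number of server generations at a fixed growth rate."""
        return start_kw * (1 + growth) ** generations

    for growth in (0.30, 0.50):
        kw = projected_power(start_kw=5.0, generations=3, growth=growth)
        print(f"At {growth:.0%} per generation, a 5 kW rack reaches {kw:.1f} kW after 3 refreshes")

At the low end of that range, the hypothetical 5 kW rack more than doubles its draw after three refreshes; at the high end, it more than triples.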
"I don't know where the cut-off is when a department chair says
'This is too much power,' " confessed Thomas Zacharia, associate
lab director at the Energy Department's Oak Ridge National
Laboratory.
Perhaps a bigger problem than soaring electric bills is the heat that
servers emit. "When you increase power, you're doing more work, and
you're also creating more heat," said Brad Nacke, who heads up
government relationships for cooling equipment vendor Liebert Corp. of
Columbus, Ohio. Some of that heat comes from the memory modules and
some from the disk drives but, in most cases, at least half the heat
comes from the processor.
Providing proper cooling escalates electricity bills even more. Poorly
cooled servers run slowly or even shut down when they get too hot. Heat
also accelerates equipment breakdown, said Wu Feng, a former team
leader at Los Alamos National Laboratory and now part of the computer
science department at Virginia Polytechnic Institute.
"Every 10-degree increase doubles the failure rate of that system,"
Feng said. Failure rates, of course, mean replacement costs and
manpower costs for repairing or replacing the components.
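Feng's rule of thumb can be written as a simple formula: the failure rate at temperature T is the baseline rate multiplied by 2 raised to (T - T_base) / 10. A minimal sketch, with a 70-degree baseline assumed purely for illustration:

    # Feng's rule of thumb as a formula: the failure rate doubles for every
    # 10-degree rise, i.e. rate(T) = rate(T_base) * 2 ** ((T - T_base) / 10).
    # The 70-degree baseline is an assumption chosen purely for illustration.

    def relative_failure_rate(temp, base_temp=70.0):
        """Failure-rate multiplier relative to the rate at base_temp."""
        return 2 ** ((temp - base_temp) / 10)

    for temp in (70, 80, 90, 100):
        print(f"{temp} degrees -> {relative_failure_rate(temp):.0f}x the baseline failure rate")

By that rule, equipment running 30 degrees above the baseline fails eight times as often.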
Data center architects didn't spend much time on cooling issues,
Nacke said. For instance, rack-mountable servers are designed to pull
cold air in from the front and emit hot air out the back. Often server
cabinets are arranged front-to-back, meaning the hot air from one
server gets sucked into the one behind it. But of course, spreading out
servers is rarely a practical solution given real-estate prices and
per-foot costs of intrarack cabling.
Technology to the rescue
There are a number of ways agencies can approach data center
modernization projects (see sidebar for cooling strategies). Perhaps
the most significant is to pursue server technologies that require less
energy. Processor vendors, in particular, have been working to address
customers' call for more energy-efficient systems.
Faced with the task of building the world's largest computer
system, one to be built from hundreds of thousands of
processors, engineers from Lawrence Livermore National Laboratory and
IBM Corp. decided the fastest CPUs weren't necessarily optimal.
"No one would have been able to afford the power bill," said Herb
Schultz, a program director in IBM's deep computing group. The
BlueGene/L supercomputer runs more than 130,000 processors and will
eventually execute up to 360 trillion floating-point operations per second.
With so many processors churning away, engineers looked at broader
measures of performance, such as performance-per-watt and
performance-per-square-meter, both of which factored into technology
and cooling decisions.
As a result, they equipped each node with two IBM PowerPC 440
processing cores. The chips don't execute as many operations per
second as the finest from Advanced Micro Devices Inc. and Intel Corp.,
but because the IBM chips run slower, they run cooler. More nodes can
fit into a given area.
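A rough sketch of the trade-off the designers weighed: under a fixed facility power budget, slower-but-cooler chips can win on aggregate throughput. Every figure below (the 100 kW budget, the per-chip speeds and wattages) is hypothetical, not drawn from BlueGene/L specifications.

    # Hypothetical comparison: under a fixed facility power budget, slower-but-cooler
    # chips can deliver more aggregate throughput than faster-but-hotter ones.
    # Every figure below (budget, per-chip speed and wattage) is made up for illustration.

    POWER_BUDGET_WATTS = 100_000  # assumed 100 kW facility budget

    chips = {
        "fast, hot chip": {"gflops": 12.0, "watts": 95.0},
        "slow, cool chip": {"gflops": 5.6, "watts": 25.0},
    }

    for name, spec in chips.items():
        count = int(POWER_BUDGET_WATTS // spec["watts"])   # chips that fit the budget
        total_tflops = count * spec["gflops"] / 1000.0     # aggregate throughput
        per_watt = spec["gflops"] / spec["watts"]
        print(f"{name}: {count} chips, {total_tflops:.1f} TFLOPS, {per_watt:.2f} GFLOPS per watt")

In this made-up scenario, the slower chip delivers nearly twice the total throughput within the same power envelope, because far more of them fit under the budget.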
"BlueGene/L is a model for where high-density computing is going,"
APC's Sawyer said. In other words, all agencies have to start
thinking about data centers in a way that balances performance against
other costs.
Earlier this year, Sun Microsystems Inc. introduced a new performance
measure, called the Space, Wattage and Performance (SWaP) metric.
SWaP, very simply, is performance divided by the product of space
consumed and power, said Fadi Azhari, director of marketing for the scalable systems
group at Sun. An organization can get a better handle on the true cost
of a server through this approach, he said.
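Taking Azhari's description at face value, the metric is straightforward to compute. The two sample servers below are hypothetical, and the benchmark score is an arbitrary number used only for illustration; Sun's own figures are not reproduced here.

    # Sun's SWaP metric as Azhari describes it: performance divided by the product
    # of space and power. The two sample servers are hypothetical, and the
    # performance score is an arbitrary benchmark number used only for illustration.

    def swap(performance, space_ru, power_watts):
        """SWaP = performance / (space * power)."""
        return performance / (space_ru * power_watts)

    compact = swap(performance=1000, space_ru=1, power_watts=300)  # 1U box drawing 300 W
    bulky = swap(performance=1000, space_ru=4, power_watts=600)    # 4U box drawing 600 W
    print(f"Compact server SWaP: {compact:.2f}")  # 3.33 -- the better score
    print(f"Bulky server SWaP: {bulky:.2f}")      # 0.42

Two servers with identical benchmark scores can thus land far apart on SWaP once rack space and wattage are factored in.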
Regardless of metrics, companies are starting to acknowledge the new
realities of the data center. Both AMD and Intel have introduced
dual-core processors, which can execute more than one thread at a time
even though they have lower clock rates than their single-core
predecessors. Both companies are planning processors with four or more
cores.
In June 2006, Intel is scheduled to roll out its next-generation server
processors, code-named Woodcrest. According to company officials, this
line of processors should ultimately boost performance by 80 percent
while reducing power consumption by 35 percent compared to a 2.8-GHz
Intel Xeon chip. The savings will come from the chip's multiple cores
and smaller transistors, which require less voltage to switch.
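Worked through, those two figures compound: 1.8 times the performance at 0.65 times the power is roughly a 2.8-fold improvement in performance per watt, assuming the company's numbers hold up. A quick sketch of the arithmetic:

    # The Woodcrest claim, worked through: 80 percent more performance at 35 percent
    # less power implies roughly a 2.8x gain in performance per watt. The baseline
    # of 1.0 is a normalized stand-in for the 2.8-GHz Xeon, not a benchmark result.

    baseline_perf, baseline_power = 1.0, 1.0
    new_perf = baseline_perf * 1.80    # +80 percent performance
    new_power = baseline_power * 0.65  # -35 percent power consumption

    gain = (new_perf / new_power) / (baseline_perf / baseline_power)
    print(f"Performance per watt improves by a factor of {gain:.2f}")  # about 2.77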
Azhari said midsize data centers (those with thousands of servers) can
save $4 million to $5 million per year in electricity and floor-space
costs by going with more efficient processors.
The fact of the matter is that incremental, thoughtful adjustments to a
data center plan are all it takes to overhaul operations. In that
respect, data center modernization should be easier than the business
systems modernization programs that sometimes bog down agencies.
According to APC's Sawyer, "Marginal changes in efficiency can have a
big impact on operational budget."