Abstract Preview

Here is the abstract you requested from the thermal_2015 technical program page. This is the original abstract submitted by the author. Any changes to the technical content of the final manuscript published by IMAPS, or of the presentation given during the event, are made by the author, not IMAPS.

Chillerless liquid cooled 350 kW GPU cluster
Keywords: Liquid Cooling, GPU cluster, Cooling tower
This liquid-cooled GPU cluster uses a cooling tower to remove all of the heat from 1056 GPU-based nodes. Each 8-GPU server is equipped with dual Xeon processors and an InfiniBand interface. Each rack consumes 32 kW of total power and requires 23 kW of liquid cooling. The heat is rejected via a facility water system that is cooled by a closed-loop evaporative cooling tower. No chillers are required. This saves approximately $100,000 per year on electricity, assuming a data center PUE of 1.8 and electricity at $0.07 per kWh. The liquid cooling system replaces 70 tons of HVAC, saving approximately $250,000 on chillers and installation. To maximize GPU lifetime, the system specification required GPU core temperatures to stay within 26 °C above the cooling tower water temperature, which ranges from 3 °C to 33 °C. This low approach temperature meant that the maximum GPU core temperatures were 35 °C below their rated maximum. That margin allows 30 seconds to switch to backup under full load before GPU throttling begins. The liquid cooling system runs under negative pressure to eliminate the possibility of leaks, and the two cooling distribution units (CDUs) are configured for N+1 redundancy, with each rack incorporating valves to switch to a warm backup CDU. The entire liquid cooling system requires power equal to 2% of the heat removed, or approximately 5 kW. Dual redundant cooling towers are used to maintain uptime. The CDUs report data on heat removed, facility and server inlet/outlet temperatures, and flow rates. GPU core temperature data was also recorded. All of this data is captured and logged every second of operation.
Steve Harrington, CTO
Carlsbad, CA
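
The figures quoted in the abstract can be cross-checked with some back-of-envelope arithmetic. The Python sketch below is illustrative only and is not the author's calculation: the function names are hypothetical, it assumes every rack has the same 23 kW / 32 kW liquid-cooled fraction, and the cost estimate assumes that the entire PUE overhead (PUE minus 1) would otherwise be spent on the liquid-cooled heat load. The abstract does not state how its $100,000 figure was derived, so the cost result should be read only as an order-of-magnitude comparison.

```python
# Back-of-envelope check of the abstract's figures (illustrative sketch only).

KW_PER_TON = 3.517        # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW
HOURS_PER_YEAR = 8760     # 24 h * 365 days


def liquid_cooled_heat_kw(cluster_kw, rack_kw, rack_liquid_kw):
    """Heat carried by the liquid loop.

    Assumption: the whole cluster has the same liquid-cooled fraction
    as a single rack (23 kW out of 32 kW in the abstract).
    """
    return cluster_kw * rack_liquid_kw / rack_kw


def tons_to_kw(tons):
    """Convert tons of refrigeration to kilowatts of heat removal."""
    return tons * KW_PER_TON


def gpu_core_range_c(water_min_c, water_max_c, approach_c):
    """Upper bound on GPU core temperature at the coldest and hottest water."""
    return water_min_c + approach_c, water_max_c + approach_c


def annual_overhead_cost_usd(heat_kw, pue, usd_per_kwh):
    """Yearly cost of the PUE overhead attributed to a given heat load.

    Assumption (not from the abstract): all of the overhead (pue - 1)
    scales with this heat load and is avoided by removing the chillers.
    """
    return heat_kw * (pue - 1.0) * HOURS_PER_YEAR * usd_per_kwh


heat_kw = liquid_cooled_heat_kw(350, 32, 23)         # about 252 kW
hvac_kw = tons_to_kw(70)                             # about 246 kW
pump_kw = 0.02 * heat_kw                             # about 5 kW
core_min_c, core_max_c = gpu_core_range_c(3, 33, 26)
cost_usd = annual_overhead_cost_usd(heat_kw, 1.8, 0.07)

print(f"liquid-cooled heat: {heat_kw:.1f} kW")
print(f"70 tons of HVAC:    {hvac_kw:.1f} kW")
print(f"cooling system:     {pump_kw:.1f} kW")
print(f"GPU core limit:     {core_min_c} to {core_max_c} C")
print(f"overhead cost:      ${cost_usd:,.0f} per year")
```

Under these assumptions the liquid-cooled heat load (about 252 kW) is close to the 70 tons of HVAC it replaces (about 246 kW), and 2% of it is close to the quoted 5 kW. The cost estimate comes out somewhat above the quoted $100,000 per year, which is consistent with the abstract's "approximately" given that the simplified overhead assumption here is likely an overestimate.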
