AWS Unveils Liquid Cooling and Data Center Innovations to Power the Future of AI
It’s that time of year again—AWS re:Invent, Amazon’s annual cloud computing mega-event, is taking over Las Vegas. True to form, AWS has delivered a slew of announcements poised to shake up the tech world. Even before the event officially opened, AWS revealed significant advancements in its data center strategy, with a particular focus on liquid cooling and energy efficiency. These updates could redefine how data centers operate, especially in the age of artificial intelligence (AI).
Liquid Cooling: The Future of AI Servers
One of the most exciting announcements is AWS’s move to adopt liquid cooling for its AI servers and other high-performance machines. This includes servers powered by its in-house Trainium chips and NVIDIA’s cutting-edge accelerators. Specifically, AWS highlighted that its upcoming Trainium2 chips, currently in preview, and “rack-scale AI supercomputing solutions like NVIDIA GB200 NVL72” will utilize liquid cooling technology.
Why is this a big deal? Liquid cooling is far more efficient than traditional air cooling, especially for the intense workloads required by AI models. AWS emphasized that its new cooling systems are designed to integrate both air and liquid cooling. This hybrid approach ensures that other servers in the data centers, such as those handling networking and storage, can continue to operate without requiring liquid cooling. According to AWS, this “flexible, multimodal cooling design allows AWS to provide maximum performance and efficiency at the lowest cost, whether running traditional workloads or AI models.”
Streamlined Electrical and Mechanical Designs
In addition to liquid cooling, AWS is simplifying the electrical and mechanical designs of its servers and server racks. The company stated, “AWS’s latest data center design improvements include simplified electrical distribution and mechanical systems, which enable infrastructure availability of 99.9999%. The simplified systems also reduce the potential number of racks that can be impacted by electrical issues by 89%.”
How is AWS achieving this? By reducing the number of times electricity is converted as it travels from the electrical grid to the servers. While AWS didn’t go into extensive detail, this likely involves using direct current (DC) power to run servers and HVAC systems, thereby avoiding the energy losses associated with alternating current (AC) to DC conversions.
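AWS hasn’t published its conversion-chain numbers, but the underlying math is simple: overall delivery efficiency is the product of each conversion stage’s efficiency, so removing a stage compounds into real savings. Here is a toy sketch with hypothetical per-stage efficiencies (illustrative only, not AWS figures):

```python
from math import prod

def end_to_end_efficiency(stage_efficiencies):
    """Overall efficiency is the product of each conversion stage's efficiency."""
    return prod(stage_efficiencies)

# Hypothetical per-stage efficiencies (illustrative, not AWS figures):
# conventional chain: grid AC -> UPS (AC/DC/AC) -> rack PSU (AC/DC) -> VRM (DC/DC)
conventional = [0.96, 0.94, 0.95, 0.90]
# simplified chain: one AC/DC rectification feeding a DC bus -> VRM (DC/DC)
simplified = [0.97, 0.95, 0.90]

print(f"conventional: {end_to_end_efficiency(conventional):.3f}")  # ~0.772
print(f"simplified:   {end_to_end_efficiency(simplified):.3f}")    # ~0.829
```

Even with generous per-stage numbers, dropping one conversion in this sketch recovers several percentage points of delivered power—at data center scale, that is a substantial amount of energy.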
Prasad Kalyanaraman, Vice President of Infrastructure Services at AWS, explained the significance of these changes: “AWS continues to relentlessly innovate its infrastructure to build the most performant, resilient, secure, and sustainable cloud for customers worldwide. These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads. But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint.”
Scaling Up for the AI Revolution
With these innovations, AWS is preparing for a massive increase in rack power density. The company claims that its new cooling and power delivery systems will enable a sixfold increase in rack power density over the next two years, with an additional threefold increase planned for the future. This is a critical development as AI workloads continue to grow in complexity and demand.
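AWS didn’t state a baseline figure, but if the threefold increase compounds on top of the sixfold one, the arithmetic is striking. A quick sketch using a hypothetical baseline rack power (the 20 kW figure is an assumption for illustration, not an AWS number):

```python
baseline_kw = 20.0                # hypothetical baseline rack power, illustrative only
near_term = baseline_kw * 6       # claimed sixfold increase over the next two years
future = near_term * 3            # additional threefold increase, if compounded

print(near_term)  # 120.0
print(future)     # 360.0
```

Under that reading, a rack would go from tens of kilowatts to hundreds—exactly the density range that makes liquid cooling a necessity rather than an option.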
To further optimize its data centers, AWS is leveraging AI to predict the most efficient placement of racks. This minimizes unused or underutilized power, ensuring that every watt is put to good use. Additionally, AWS is rolling out its own control system across its electrical and mechanical devices. This system includes built-in telemetry services for real-time diagnostics and troubleshooting, making it easier to identify and resolve issues quickly.
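AWS hasn’t described its placement model, but the optimization goal it names—minimizing stranded (unusable leftover) power—resembles a bin-packing problem. As a toy sketch of that idea, assuming hypothetical racks and power zones (none of this reflects AWS’s actual system), a greedy best-fit heuristic might look like:

```python
def place_racks(rack_draws_kw, zone_capacities_kw):
    """Greedy best-fit: assign each rack to the zone whose remaining capacity
    leaves the least slack, reducing stranded (unusable leftover) power."""
    remaining = list(zone_capacities_kw)
    placement = []
    for draw in sorted(rack_draws_kw, reverse=True):  # place largest racks first
        best = None
        for i, cap in enumerate(remaining):
            # candidate zone must fit the rack; prefer the tightest fit
            if cap >= draw and (best is None or cap < remaining[best]):
                best = i
        if best is None:
            placement.append((draw, None))  # no zone can host this rack
        else:
            remaining[best] -= draw
            placement.append((draw, best))
    stranded_kw = sum(remaining)
    return placement, stranded_kw

# Hypothetical example: four racks, two power zones
placement, stranded = place_racks([30, 25, 20, 15], [50, 45])
print(placement)  # [(30, 1), (25, 0), (20, 0), (15, 1)]
print(stranded)   # 5
```

A real system would predict future draw rather than use static numbers, but the principle is the same: smarter placement decisions translate directly into fewer stranded watts.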
Collaboration with NVIDIA
The partnership between AWS and NVIDIA is also worth noting. Ian Buck, Vice President of Hyperscale and HPC at NVIDIA, commented on the collaboration: “Data centers must evolve to meet AI’s transformative demands. By enabling advanced liquid cooling solutions, AI infrastructure can be efficiently cooled while minimizing energy use. Our work with AWS on their liquid cooling rack design will allow customers to run demanding AI workloads with exceptional performance and efficiency.”
Key Takeaways
Here’s a quick summary of AWS’s latest data center innovations:
- Adoption of liquid cooling for AI servers, including Trainium2 chips and NVIDIA accelerators.
- Integration of air and liquid cooling for maximum flexibility and efficiency.
- Simplified electrical and mechanical designs to improve reliability and reduce energy losses.
- AI-driven optimization of rack placement to minimize power waste.
- Modular designs that allow retrofitting of existing infrastructure for liquid cooling and energy efficiency.
- Collaboration with NVIDIA to enhance AI workload performance and efficiency.
Looking Ahead
As AI continues to revolutionize industries, the demand for high-performance, energy-efficient data centers will only grow. AWS’s latest innovations position the company as a leader in meeting these demands. By embracing liquid cooling, simplifying infrastructure, and leveraging AI for optimization, AWS is not just keeping up with the times—it’s setting the pace for the future of cloud computing.
With promises of increased power density, reduced carbon footprints, and enhanced performance, AWS is making it clear that it’s ready to support the next wave of AI advancements. As the re:Invent conference unfolds, it will be exciting to see what other groundbreaking announcements AWS has in store.
Originally Written by: Frederic Lardinois