Amazon EC2 Trn2 Instances and Trn2 UltraServers: A Game-Changer for AI and Machine Learning

Amazon Web Services (AWS) has unveiled its latest innovation in cloud computing: the Amazon Elastic Compute Cloud (Amazon EC2) Trn2 instances and Trn2 UltraServers. These compute options are designed for machine learning (ML) training and inference at scale, combining higher speed and larger memory capacity with better cost efficiency. Both are powered by the second generation of AWS Trainium chips, AWS Trainium2.

Unmatched Performance and Cost Efficiency

The Trn2 instances are a significant leap forward compared to their predecessors, the Trn1 instances. They are four times faster, provide four times more memory bandwidth, and offer three times the memory capacity. Additionally, they deliver 30-40% better price performance than the current generation of GPU-based EC2 P5e and P5en instances. This makes them an attractive option for businesses and researchers looking to optimize their AI and ML workloads.

Each Trn2 instance is equipped with 16 Trainium2 chips, 192 vCPUs, 2 TiB of memory, and 3.2 Tbps of Elastic Fabric Adapter (EFA) v3 network bandwidth. EFAv3 delivers up to 35% lower network latency than the previous generation of EFA, speeding data movement for distributed workloads.

Introducing Trn2 UltraServers

The Trn2 UltraServers are a completely new compute offering from AWS. These servers feature 64 Trainium2 chips connected via a high-bandwidth, low-latency NeuronLink interconnect. This configuration is designed to deliver peak performance for both inference and training on frontier foundation models. With their immense computational power, UltraServers are poised to handle the most demanding AI and ML tasks.

To put this into perspective, tens of thousands of Trainium chips are already in use, powering Amazon and AWS services. For instance, over 80,000 AWS Inferentia and Trainium1 chips supported the Rufus shopping assistant during the most recent Prime Day. The new Trainium2 chips are already being used to power latency-optimized versions of advanced models like Llama 3.1 405B and Claude 3.5 Haiku on Amazon Bedrock.

Scaling Up and Scaling Out

The growth in the size and complexity of AI models has necessitated innovative compute architectures. AWS has addressed this challenge by combining the principles of scaling up (using bigger computers) and scaling out (using more computers). The Trainium2 chip, Trn2 instance, and UltraServers exemplify this hybrid approach, offering unparalleled scalability and performance.

Key Components of the Trn2 Ecosystem:

  • NeuronCores: At the heart of the Trainium2 chip are NeuronCores, each combining scalar, vector, and tensor engines with a GPSIMD core for high-efficiency computation.
  • Trainium2 Chips: Each chip houses eight NeuronCores and 96 GiB of High Bandwidth Memory (HBM) with up to 2.9 TB/second of HBM bandwidth, delivering up to 1.3 petaflops of dense FP8 compute and up to 5.2 petaflops of sparse FP8 compute.
  • Trn2 Instances: With 16 Trainium2 chips, these instances offer up to 20.8 petaflops of dense FP8 compute and up to 83.2 petaflops of sparse FP8 compute.
  • UltraServers: These servers combine four Trn2 instances, resulting in 64 Trainium2 chips, 6 TiB of HBM, and up to 332 petaflops of sparse FP8 compute (the scaling arithmetic is sketched below).
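
Those headline numbers follow from simple linear scaling of the per-chip figures. Here is a minimal sketch of the arithmetic in Python (the constants come from the specifications above; the quoted 332 petaflops is 332.8 rounded down):

    # Per-chip Trainium2 figures, taken from the specifications above.
    CHIP_DENSE_FP8_PFLOPS = 1.3    # dense FP8 petaflops per chip
    CHIP_SPARSE_FP8_PFLOPS = 5.2   # sparse FP8 petaflops per chip
    CHIP_HBM_GIB = 96              # HBM per chip, in GiB

    CHIPS_PER_INSTANCE = 16        # Trainium2 chips in one Trn2 instance
    INSTANCES_PER_ULTRASERVER = 4  # Trn2 instances in one UltraServer

    chips_per_ultraserver = CHIPS_PER_INSTANCE * INSTANCES_PER_ULTRASERVER   # 64

    instance_dense = CHIPS_PER_INSTANCE * CHIP_DENSE_FP8_PFLOPS     # 20.8 petaflops
    instance_sparse = CHIPS_PER_INSTANCE * CHIP_SPARSE_FP8_PFLOPS   # 83.2 petaflops

    ultraserver_sparse = chips_per_ultraserver * CHIP_SPARSE_FP8_PFLOPS  # 332.8 petaflops
    ultraserver_hbm_tib = chips_per_ultraserver * CHIP_HBM_GIB / 1024    # 6.0 TiB

    print(f"Trn2 instance: {instance_dense:.1f} PF dense / {instance_sparse:.1f} PF sparse FP8")
    print(f"UltraServer:   {ultraserver_sparse:.1f} PF sparse FP8, {ultraserver_hbm_tib:.0f} TiB HBM")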

UltraServers are particularly noteworthy for their ability to support training and inference at the trillion-parameter level and beyond. They are currently available in preview, and interested users can contact AWS to join the preview program.

UltraClusters: Scaling to New Heights

Trn2 instances and UltraServers are being deployed in EC2 UltraClusters, enabling distributed training across tens of thousands of Trainium chips. These clusters operate on a petabit-scale, non-blocking network and have access to Amazon FSx for Lustre high-performance storage. This setup is ideal for large-scale AI and ML projects, offering unmatched speed and efficiency.

Getting Started with Trn2 Instances

Trn2 instances are now available for production use in the US East (Ohio) AWS Region. Users can reserve up to 64 instances for up to six months using Amazon EC2 Capacity Blocks for ML. Reservations can be made up to eight weeks in advance, with instant start times and the option to extend reservations if needed. For more details, check out the official announcement.
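
As a rough illustration of that workflow, the sketch below uses boto3 (the AWS SDK for Python) to search for and purchase a Capacity Block. The instance type, instance count, dates, and block duration are illustrative assumptions rather than values from the announcement, and the durations and counts actually available are constrained by the service:

    from datetime import datetime, timedelta, timezone

    import boto3

    # Trn2 instances launched in the US East (Ohio) Region.
    ec2 = boto3.client("ec2", region_name="us-east-2")

    start = datetime.now(timezone.utc) + timedelta(days=7)

    # Search for an offering: a hypothetical 4-instance block for 14 days.
    offerings = ec2.describe_capacity_block_offerings(
        InstanceType="trn2.48xlarge",      # assumed Trn2 instance type
        InstanceCount=4,
        StartDateRange=start,
        EndDateRange=start + timedelta(days=7),
        CapacityDurationHours=14 * 24,
    )["CapacityBlockOfferings"]

    if offerings:
        # Purchase the first matching offering; this creates a capacity
        # reservation you can launch instances into at the start time.
        purchase = ec2.purchase_capacity_block(
            CapacityBlockOfferingId=offerings[0]["CapacityBlockOfferingId"],
            InstancePlatform="Linux/UNIX",
        )
        print(purchase["CapacityReservation"]["CapacityReservationId"])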

Software Support

On the software side, AWS provides Deep Learning AMIs preconfigured with popular frameworks like PyTorch and JAX. Developers who have used the AWS Neuron SDK can easily recompile their applications for use on Trn2 instances. The SDK integrates seamlessly with essential libraries like Hugging Face, PyTorch Lightning, and NeMo, and includes optimizations for distributed training and inference.
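
As a minimal sketch of that recompile path, the example below uses the Neuron SDK's torch-neuronx tracing API to ahead-of-time compile a toy PyTorch model; it assumes torch-neuronx and the Neuron compiler are installed, as on a Trn2 Deep Learning AMI, and the model itself is a placeholder:

    import torch
    import torch_neuronx  # Neuron SDK integration for PyTorch

    # A toy network standing in for a real model.
    model = torch.nn.Sequential(
        torch.nn.Linear(128, 256),
        torch.nn.ReLU(),
        torch.nn.Linear(256, 10),
    ).eval()

    example = torch.rand(1, 128)

    # Compile the model for Trainium with the Neuron compiler.
    neuron_model = torch_neuronx.trace(model, example)

    # The compiled artifact behaves like a TorchScript module:
    # it can be saved, reloaded, and called on NeuronCores.
    torch.jit.save(neuron_model, "model_neuron.pt")
    print(neuron_model(example).shape)  # torch.Size([1, 10])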

With support for OpenXLA, including StableHLO and GSPMD, the Neuron SDK enables developers to leverage Trainium2’s compiler optimizations for maximum performance. This makes it easier than ever to build and deploy AI and ML applications on AWS’s state-of-the-art infrastructure.
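
To make the OpenXLA path concrete, the sketch below lowers a small JAX function and prints the StableHLO it produces; this portable representation is what XLA backends such as the Neuron compiler consume. The snippet runs on any machine with JAX installed and is not Neuron-specific:

    import jax
    import jax.numpy as jnp

    def layer(x, w, b):
        # A small computation for the compiler to lower.
        return jax.nn.relu(x @ w + b)

    x = jnp.ones((8, 128))
    w = jnp.ones((128, 64))
    b = jnp.zeros(64)

    # jax.jit(...).lower(...) yields the portable StableHLO module
    # that any OpenXLA backend can compile for its hardware.
    lowered = jax.jit(layer).lower(x, w, b)
    print(lowered.compiler_ir(dialect="stablehlo"))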

As AI and ML continue to evolve, AWS’s Trn2 instances and UltraServers are setting a new standard for performance, scalability, and cost efficiency. Whether you’re training trillion-parameter models or optimizing real-time inference, these innovations are designed to meet the demands of the most ambitious projects.
