AMD executives laid out their vision for AI, along with new data center CPU and GPU products, at an event in San Francisco on Tuesday. They showed off almost everything in the company’s portfolio capable of running AI and wheeled out customer after hyperscale customer to drive home the point that they are succeeding in supercomputers and at hyperscale. But what everyone really wants to know is: Can AMD’s new GPU rival competitor Nvidia’s flagship H100 GPU? And while it looks to beat the H100 in raw performance, will that be enough?

“AI is the defining technology shaping the next generation of computing,” AMD CEO Lisa Su said at the event. “Frankly, it’s AMD’s largest and most strategic growth opportunity.”
With the H100 reportedly in short supply at the moment, owing to heightened demand for training and inference of large language models (LLMs) like ChatGPT, there has never been a better moment to announce a credible H100 competitor.
AMD’s line of data center GPUs includes the previously announced MI300A, which will be in the 2-exaFLOPS El Capitan supercomputer currently being commissioned at Lawrence Livermore National Laboratory. The MI300A is a 13-chiplet design with a mix of CPU and GPU chiplets, a popular combination for HPC.
During AMD’s event, Su introduced details of the new MI300X, a GPU-only version of the MI300A that swaps out the MI300A’s three CPU chiplets for two GPU chiplets. More HBM memory and memory bandwidth have also been added, specifically to optimize for LLMs and other AI workloads. The GPU chiplets use AMD’s third-generation CDNA 3 architecture, which has a new compute engine with AI-friendly data formats compared with the MI250 and earlier products.

Overall, the new flagship GPU has a dozen 5- and 6-nm chiplets, for 153 billion transistors in total. It features 192 GB of HBM3 memory with 5.2 TB/s of memory bandwidth. For comparison, Nvidia’s H100 comes in a version with 80 GB of HBM2e and a total of 3.3 TB/s. That puts the MI300X at 2.4× the HBM capacity and 1.6× the HBM bandwidth.
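Those multipliers follow directly from the quoted specs. A quick back-of-the-envelope check in Python, using the figures above (which come from the vendors’ announcements, not independent measurements):

```python
# Sanity-check of the capacity and bandwidth ratios quoted above,
# using the spec figures as stated (not independently measured).
mi300x_hbm_gb, mi300x_bw_tbs = 192, 5.2  # MI300X: HBM3 capacity, bandwidth
h100_hbm_gb, h100_bw_tbs = 80, 3.3       # H100 (80-GB HBM2e variant)

print(f"HBM capacity:  {mi300x_hbm_gb / h100_hbm_gb:.1f}x")  # -> 2.4x
print(f"HBM bandwidth: {mi300x_bw_tbs / h100_bw_tbs:.1f}x")  # -> 1.6x
```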
“With all of that additional capacity, we have an advantage for larger models because you can run larger models directly in memory,” Su said. “For the largest models, that reduces the number of GPUs you need, speeding up performance, especially for inference, and reducing [total cost of ownership, TCO].”
In other words, forget “the more you buy, the more you save” (per Nvidia CEO Jensen Huang’s 2018 speech); AMD is saying you can get away with fewer GPUs, if you want to. The overall effect is that cloud service providers can run more inference jobs per GPU, lowering the cost of LLMs and making them more accessible to the ecosystem. It also reduces the development time needed for deployment, Su said.
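To make the “fewer GPUs” argument concrete, here is a rough sketch of the memory arithmetic involved. The model sizes are illustrative, and the FP16 weights-only assumption understates real requirements (activations, KV cache and framework overhead all need headroom on top):

```python
import math

def min_gpus_for_weights(params_billions: float, gpu_mem_gb: float,
                         bytes_per_param: int = 2) -> int:
    """Floor on GPU count just to hold the weights (FP16 = 2 bytes/param).
    Real deployments need extra memory for activations and KV cache."""
    weights_gb = params_billions * bytes_per_param  # 1B params * 2 B = 2 GB
    return math.ceil(weights_gb / gpu_mem_gb)

for params in (40, 70, 175):  # illustrative model sizes, billions of parameters
    print(f"{params}B params -> MI300X (192 GB): {min_gpus_for_weights(params, 192)}, "
          f"H100 (80 GB): {min_gpus_for_weights(params, 80)}")
```

On these assumptions, a 70B-parameter model fits on a single MI300X but needs two 80-GB H100s, which is exactly the kind of consolidation Su is describing.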

AMD also showed off the AMD Instinct Platform, an 8x MI300X system (with 2x AMD Genoa host CPUs) analogous to Nvidia’s HGX H100, in an OCP-compatible format. The system is intended to drop into existing infrastructure quickly and easily. It will be sampling to key customers in Q3.
Software journey
The jewel in Nvidia’s crown is its mature AI and HPC software stack, CUDA. It is often cited as one of the key reasons AI chip startups have struggled to take market share from the leader.
“No question, the software is so important for enabling our hardware to be deployed broadly,” Su said, admitting that software has been “a journey.”
AMD calls its AI software stack ROCm (pronounced “Rock-’em”); Su said a “tremendous” amount of progress has been made in the last year.
In contrast to Nvidia’s CUDA, “a significant portion” of ROCm is open, said AMD President Victor Peng. The open portion includes drivers, language, runtime, tools like AMD’s debugger and profiler, and libraries. ROCm also supports open frameworks, models and tools, with optimized kernels for HPC and AI. AMD has been working with PyTorch to ensure day-zero support for PyTorch 2.0, and to test that the PyTorch-ROCm stack works as promised.
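One practical consequence of that work: ROCm builds of PyTorch expose the AMD GPU through PyTorch’s familiar CUDA device API (via HIP), so most CUDA-targeted PyTorch code runs unchanged. A minimal smoke test, assuming a ROCm build of PyTorch is installed on a supported AMD GPU:

```python
import torch

print(torch.__version__)          # ROCm wheels carry a "+rocm" suffix
print(torch.version.hip)          # HIP version string; None on CUDA builds
print(torch.cuda.is_available())  # True on a working ROCm install

# "cuda" maps to the AMD GPU under ROCm, so existing code ports as-is.
x = torch.randn(1024, 1024, device="cuda")
y = x @ x  # matmul dispatched through ROCm's GPU libraries
print(y.device)
```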
AMD also announced a new collaboration with Hugging Face, which will optimize thousands of its models for AMD Instinct accelerators, as well as for other parts of the AMD edge portfolio.
Hugging Face CEO Clem Delangue took the stage to talk about the democratization of AI and LLMs.
“It’s really important that hardware doesn’t become the bottleneck or gatekeeper for AI as it develops,” he said. “What we are trying to do is to extend the range of options available to AI builders for training and inference. We’re excited about the ability of AMD specifically to power LLMs in data centers, thanks to the memory capacity and bandwidth advantage [of the MI300X].”
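Because ROCm’s PyTorch keeps the CUDA-style device API, running a Hugging Face model on an Instinct GPU looks the same as running it on Nvidia hardware. A minimal sketch (the model name is illustrative; any causal LM from the Hub follows the same pattern):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative stand-in for a larger LLM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16  # half precision to conserve HBM
).to("cuda")  # "cuda" is the AMD GPU under a ROCm build of PyTorch

inputs = tokenizer("AI is the defining technology", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```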
Raw performance vs. timing
While the MI300X looks to beat the H100 on raw performance, Nvidia has a few tricks up its sleeve, including the transformer engine, a software feature that enables mixed-precision training of LLMs for better throughput. On the software side, despite AMD’s progress on ROCm and efforts like OpenAI’s Triton, Nvidia’s CUDA still dominates. Software maturity has been a huge challenge for startups in this field.
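Nvidia’s transformer engine is tied to its own hardware and libraries, but the underlying mixed-precision idea can be sketched with PyTorch’s stock automatic mixed precision (AMP), which runs on GPUs from either vendor. This is a generic illustration of the technique, not the transformer engine’s actual API:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a transformer block
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales grads to avoid FP16 underflow

x = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(x), target)  # matmuls run in FP16
scaler.scale(loss).backward()  # backprop through the scaled loss
scaler.step(optimizer)         # unscale gradients, then apply the update
scaler.update()
```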
And then there’s timing. Nvidia H100s are available now (if you can get hold of one), but the MI300X isn’t set to ramp until Q4. Nvidia is expected to unveil a new generation of its GPU architecture in 2024, which could put the MI300X on the back foot once again. That cadence means AMD will perpetually be following, not leading.
Future AMD GPUs will likely continue to leverage the company’s chiplet expertise; though Nvidia has multi-chip products like the Grace Hopper superchip, it has not yet moved to chiplets in the same way.
Will this earlier move to chiplets work out to AMD’s advantage? It seems inevitable that Nvidia will have to move to chiplets (following Intel and AMD) eventually, but how soon that will happen is still unclear.
Is the MI300X compelling enough to take at least some share of the data center AI market from Nvidia? It certainly looks that way, given AMD’s existing customer base in HPC and data center CPUs, a huge advantage over startups.
One thing is for sure: The size of this opportunity is more than big enough for two players.