
Blackwell Ultra Sets the New Benchmark

Sep 15, 2025

Nvidia leads MLPerf Inference with stronger reasoning, energy efficiency, and small-model performance.
Nvidia topped MLPerf’s new reasoning benchmark with its Blackwell Ultra GPU, packaged in a GB300 rack-scale design (source: Nvidia).

MLPerf’s latest inference benchmark round shows that machine learning workloads are growing in scale and complexity, and that Nvidia’s Blackwell Ultra GPU is pulling ahead. IEEE Spectrum reports that MLPerf added new tests this round, including a reasoning benchmark built on DeepSeek R1 (671 billion parameters), which is larger than previous large-language-model tests such as Llama3.1-405B. The round also introduced a “small” LLM benchmark (Llama3.1-8B) and a speech-to-text task (Whisper-large-v3), alongside existing benchmarks such as Llama2-70B. The trend is clear: reasoning and inference at many scales, not just gargantuan models.

Nvidia’s Blackwell Ultra, in a GB300 rack-scale configuration, came out on top in the biggest benchmarks (DeepSeek R1 and Llama3.1-405B), delivering high performance per accelerator. Improvements include more memory capacity, faster attention-layer acceleration, upgraded compute, and higher memory bandwidth. Two technical wins stood out: adoption of a new 4-bit floating-point format (NVFP4) that cuts compute cost with minimal accuracy loss, and “disaggregated serving,” which splits inference into two stages (prefill and generation/decoding) and assigns different GPU groups to each so that every group is tuned to its stage’s demands. These changes yield nearly 50% performance gains on certain workloads.
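To make the 4-bit idea concrete, here is a minimal sketch of block-scaled FP4 quantization in NumPy. The grid of representable E2M1 values (±{0, 0.5, 1, 1.5, 2, 3, 4, 6}) matches public descriptions of 4-bit floats, but the block size and the full-precision scale are illustrative assumptions, not Nvidia's exact NVFP4 specification (which encodes scales compactly in hardware).

```python
import numpy as np

# Positive values representable in a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1_GRID[:0:-1], E2M1_GRID])  # signed grid

def quantize_block_fp4(x, block_size=16):
    """Round each value to the nearest grid point, one shared scale per block.

    Block size and float64 scale are illustrative choices for this sketch.
    """
    out = np.empty_like(x)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # Per-block scale maps the block's largest magnitude onto the grid max (6.0),
        # which is what keeps the relative error small despite only 16 code points.
        scale = max(np.max(np.abs(block)) / 6.0, 1e-12)
        idx = np.abs(block[:, None] / scale - GRID[None, :]).argmin(axis=1)
        out[start:start + block_size] = GRID[idx] * scale
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=64)
xq = quantize_block_fp4(x)
print(np.abs(x - xq).max())  # small, bounded by (block max magnitude) / 6
```

Because each block carries its own scale, outliers in one block don't destroy precision elsewhere; that locality is the key reason low-bit formats can hold accuracy on large models.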

AMD also made strong strides. Its MI355X GPU improved significantly over past models, especially in the “open” benchmark categories (where some model modifications are allowed), showing large gains in high-bandwidth memory use, 4-bit computation, and overall throughput. Intel showed that CPUs can still compete on certain tasks, demonstrating that GPUs aren’t the only path, and also submitted its first GPU entry (Intel Arc Pro). Top performance, however, remains with Nvidia.

MLPerf’s newest round underscores that inference demands are evolving: bigger models, more varied tasks, and tighter latency requirements. Nvidia’s Blackwell Ultra is currently leading, but competitors are ramping up. Innovations such as low-bit precision and task-stage specialization are turning into differentiators. Regular inference evaluation is becoming both more complex and more essential.