
Researchers at Imec presented new findings at the 2025 IEEE International Electron Devices Meeting on stacking high-bandwidth memory (HBM) directly on top of graphics processing units (GPUs), a design that could dramatically increase memory bandwidth and compute density for future AI accelerators, IEEE Spectrum reports. In most modern AI systems, HBM sits beside the GPU on a 2.5D interposer; this layout shortens data travel but still leaves room for improvement. By placing HBM directly above the GPU in a true 3D stack, bits could flow far more quickly between memory and processor, potentially boosting effective bandwidth by as much as four times over current designs.
The main obstacle for such tight integration is heat. Imec’s thermal simulations showed that, without mitigation, stacking HBM atop the GPU can roughly double operating temperature compared with conventional 2.5D packaging, pushing levels to more than 140°C—too hot for reliable operation. To address that, the team applied a system-technology co-optimization strategy. By merging memory stacks into wider units to eliminate heat-trapping regions, thinning the topmost dies, and adding thermal silicon and cooling on both the top and underside of the package, researchers brought peak temperature back down to near 70°C, roughly in line with current GPU thermal limits.
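To get intuition for those numbers, a back-of-the-envelope lumped thermal model helps: steady-state junction temperature is roughly T_j = T_ambient + P × R_th, and cooling both faces of the package puts two thermal paths in parallel. The Python sketch below illustrates the idea; every power and resistance figure is an assumption chosen only to echo the reported temperatures, not a value from Imec's simulations.

```python
# Back-of-the-envelope lumped thermal model. Every number below is an
# illustrative assumption, not a value from Imec's simulations.
# Steady state: T_junction = T_ambient + power * R_thermal.

T_AMBIENT_C = 25.0    # assumed coolant/ambient temperature, in C
GPU_POWER_W = 500.0   # assumed package power, in W

def junction_temp_c(power_w: float, r_th_k_per_w: float) -> float:
    """Steady-state junction temperature for a given die-to-coolant resistance."""
    return T_AMBIENT_C + power_w * r_th_k_per_w

# Stacking HBM on the GPU adds layers to the heat path, raising resistance.
r_25d    = 0.10   # K/W: conventional 2.5D package, heat sink on top (assumed)
r_3d_top = 0.24   # K/W: HBM atop the GPU, cooled from the top only (assumed)

# Dual-sided cooling puts two paths in parallel: 1/R = 1/R_top + 1/R_bottom.
r_top, r_bottom = 0.16, 0.20   # K/W, assumed (top path improved by thinned dies)
r_3d_dual = 1.0 / (1.0 / r_top + 1.0 / r_bottom)

for label, r_th in [("2.5D baseline", r_25d),
                    ("3D stack, top-only cooling", r_3d_top),
                    ("3D stack, dual-sided cooling", r_3d_dual)]:
    t_j = junction_temp_c(GPU_POWER_W, r_th)
    print(f"{label:30s} R_th = {r_th:.3f} K/W -> T_j = {t_j:.0f} C")
```

The parallel-path line is the key step: cooling both faces drops the total resistance below either path alone, which is what pulls the stacked design back toward 2.5D-like temperatures.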
Another effective lever involved slowing the GPU clock. Because many AI workloads are limited by memory bandwidth rather than raw logic speed, slowing the processor while massively increasing memory throughput can cut heat output and still yield a net performance gain.
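That reasoning is essentially the roofline model, in which sustained throughput is bounded by the smaller of peak compute and arithmetic intensity times memory bandwidth. The sketch below uses entirely hypothetical numbers to show how a down-clocked GPU with four times the memory bandwidth can still come out well ahead on a bandwidth-bound kernel.

```python
# Roofline-style estimate with hypothetical numbers, for illustration only.
# Sustained throughput = min(peak compute, arithmetic intensity * bandwidth).

def attainable_tflops(peak_tflops: float, bandwidth_tb_per_s: float,
                      intensity_flop_per_byte: float) -> float:
    """Roofline bound on a kernel's sustained throughput."""
    # FLOP/byte * TB/s = TFLOP/s, so the units work out directly.
    return min(peak_tflops, intensity_flop_per_byte * bandwidth_tb_per_s)

INTENSITY = 50.0  # FLOP/byte: assumed, low enough that the baseline is memory-bound

# Baseline 2.5D part: assumed 1000 TFLOPS peak and 4 TB/s of HBM bandwidth.
baseline = attainable_tflops(1000.0, 4.0, INTENSITY)   # -> 200 TFLOPS, memory-bound

# 3D stack: clock cut 30% (700 TFLOPS peak) but bandwidth quadrupled to 16 TB/s.
stacked = attainable_tflops(700.0, 16.0, INTENSITY)    # -> 700 TFLOPS, compute-bound

print(f"baseline sustained: {baseline:.0f} TFLOPS")
print(f"3D stack sustained: {stacked:.0f} TFLOPS ({stacked / baseline:.1f}x)")
```

With these assumed figures the baseline is stuck at 200 TFLOPS waiting on memory, while the stacked design sustains 700 TFLOPS despite a 30 percent lower peak, a 3.5x gain.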
Though the initial results are promising, Imec researchers emphasize that more work is needed to assess whether HBM-on-GPU will become a practical architecture. Other options, such as placing the GPU on top of the memory stack or adopting alternative cooling schemes, remain under investigation.