
LAS VEGAS, NV (CES 2026), Jan 13, 2026 – NVIDIA announced that its BlueField-4 data processing unit powers the NVIDIA Inference Context Memory Storage platform, which is designed to handle the context data used during AI inference. The platform targets inference workloads that need fast access to model context data, offloading data movement and storage processing from servers to support AI infrastructure.
Large AI models rely on multistep reasoning and generate extensive context data. This data is stored in a key-value (KV) cache that plays a central role in model accuracy, continuity, and user experience.
Storing KV cache data solely in GPU memory limits real-time inference in multi-agent AI systems, because the cache grows with context length and can quickly exhaust GPU capacity. As a result, AI-native applications need scalable infrastructure that can store and share context data efficiently.
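To make the scaling problem concrete, here is a minimal sketch (not NVIDIA's implementation; dimensions and data are illustrative) of why a KV cache grows with context: each decode step appends one key and one value vector per attention head, so memory consumption scales linearly with sequence length.

```python
import numpy as np

D = 64          # hypothetical head dimension
rng = np.random.default_rng(0)

kv_cache = {"keys": [], "values": []}  # persists across decode steps

def attend(query, kv_cache):
    """One decode step: append the new token's K/V, then attend over the
    full cached history instead of recomputing it from scratch."""
    kv_cache["keys"].append(rng.standard_normal(D))
    kv_cache["values"].append(rng.standard_normal(D))
    K = np.stack(kv_cache["keys"])      # (seq_len, D)
    V = np.stack(kv_cache["values"])    # (seq_len, D)
    scores = K @ query / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over the whole history
    return weights @ V                  # context vector for this step

for step in range(8):                   # 8 decode steps
    out = attend(rng.standard_normal(D), kv_cache)

# The cache now holds 8 K/V pairs per head; a long multi-turn session
# holds thousands, which is what exhausts GPU memory.
print(len(kv_cache["keys"]))            # 8 cached entries
```

Because the cache carries the model's working memory for a conversation, evicting it forces expensive recomputation of the entire prefix, which is the motivation for moving it to a dedicated storage tier rather than discarding it.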
The NVIDIA Inference Context Memory Storage platform extends GPU memory capacity and enables high-speed context sharing across nodes. NVIDIA says the approach increases tokens per second by up to 5x and improves power efficiency by up to 5x compared with traditional storage systems.
“AI is revolutionizing the entire computing stack – and now, storage,” said Jensen Huang, founder and CEO of NVIDIA. “AI is no longer about one-shot chatbots but intelligent collaborators that understand the physical world, reason over long horizons, stay grounded in facts, use tools to do real work, and retain both short- and long-term memory. With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”
The platform expands KV cache capacity and enables context sharing across rack-scale AI systems. By maintaining persistent context for multi-turn AI agents, it improves responsiveness, raises AI factory throughput, and supports long-context, multi-agent inference.
Key capabilities of the NVIDIA BlueField-4-powered platform include:
- NVIDIA Rubin cluster-level KV cache capacity, supporting long-context, multi-turn agentic inference.
- Up to 5x greater power efficiency than traditional storage.
- Smart KV cache sharing across AI nodes, using the NVIDIA DOCA framework, NIXL library and Dynamo software to increase tokens per second, reduce time to first token, and improve multi-turn responsiveness.
- Hardware-accelerated KV cache placement on NVIDIA BlueField-4, reducing metadata overhead, limiting data movement, and ensuring secure, isolated access from GPU nodes.
- NVIDIA Spectrum-X Ethernet, enabling RDMA-based access to AI-native KV cache across AI systems.
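The core idea behind extending GPU memory with a storage tier can be sketched as a two-tier cache. The toy model below is purely illustrative (it does not use NVIDIA's DOCA, NIXL, or Dynamo APIs): when the small, fast tier is full, the least-recently-used entry is demoted to a larger external tier instead of being discarded, so a returning agent's context can be reloaded rather than recomputed.

```python
from collections import OrderedDict

class TieredKVCache:
    """Hypothetical two-tier KV cache: a small fast tier standing in for
    GPU memory, backed by a large external tier standing in for storage."""

    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()   # fast tier, LRU-ordered
        self.external = {}         # slower but much larger tier
        self.gpu_capacity = gpu_capacity

    def put(self, seq_id, kv_block):
        if len(self.gpu) >= self.gpu_capacity:
            evicted_id, evicted_block = self.gpu.popitem(last=False)
            self.external[evicted_id] = evicted_block   # demote, don't drop
        self.gpu[seq_id] = kv_block

    def get(self, seq_id):
        if seq_id in self.gpu:
            self.gpu.move_to_end(seq_id)    # LRU bookkeeping
            return self.gpu[seq_id]
        # Hit in the external tier: reload the saved context instead of
        # recomputing the whole prefix on the GPU.
        block = self.external.pop(seq_id)
        self.put(seq_id, block)
        return block

cache = TieredKVCache(gpu_capacity=2)
cache.put("agent-A", "kv-A")
cache.put("agent-B", "kv-B")
cache.put("agent-C", "kv-C")   # fast tier full: agent-A demoted to external
print(cache.get("agent-A"))    # restored from the external tier: kv-A
```

In the real platform this movement is hardware-accelerated on BlueField-4 and the external tier is reachable over RDMA via Spectrum-X Ethernet, but the cache-demotion/promotion pattern is the same.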
AIC, Cloudian, DDN, Dell Technologies, HPE, Hitachi Vantara, IBM, Nutanix, Pure Storage, Supermicro, VAST Data and WEKA are developing AI storage platforms based on BlueField-4, expected to be available in the second half of 2026.
Source: NVIDIA
About NVIDIA
NVIDIA, founded in 1993 and headquartered in Santa Clara, CA, designs graphics processing units, systems on chips, networking hardware, and software platforms such as CUDA. Its products serve industries including gaming, data centers, autonomous vehicles, professional visualization, robotics, health care, and energy. The company introduced the GPU in 1999 and later expanded into accelerated computing and AI infrastructure. In gaming, its GPUs support high-performance rendering, while in AI and high-performance computing, its systems provide the infrastructure for training and deploying large-scale models. NVIDIA also develops tools for robotics and autonomous driving.