
NVIDIA has released the Nemotron 3 family of open AI models, datasets, and libraries to support development of agent-based AI systems.
The Nemotron 3 models – available in Nano, Super and Ultra sizes – introduce a hybrid latent mixture-of-experts (MoE) architecture that helps developers build and deploy multi-agent systems.
Organizations are adopting multi-agent AI systems, which increases development complexity and inference costs. Nemotron 3 is designed to support transparent development of agentic AI workflows.
“Open innovation is the foundation of AI progress,” said Jensen Huang, founder and CEO of NVIDIA. “With Nemotron, we’re transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale.”
NVIDIA Nemotron supports NVIDIA’s sovereign AI efforts, with organizations from Europe to South Korea adopting open, transparent and efficient models that allow them to build AI systems aligned to their own data, regulations and values.
“NVIDIA and ServiceNow have been shaping the future of AI for years, and the best is yet to come,” said Bill McDermott, chairman and CEO of ServiceNow. “Today, we’re taking a major step forward in empowering leaders across all industries to fast-track their agentic AI strategy. ServiceNow’s intelligent workflow automation combined with NVIDIA Nemotron 3 will continue to define the standard with unmatched efficiency, speed and accuracy.”
Developers are combining proprietary reasoning models with efficient open models as multi-agent AI systems grow. Routing tasks between frontier models and Nemotron within a single workflow manages costs while optimizing tokenomics.
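A workflow router of this kind can be sketched in a few lines. The sketch below is purely illustrative: the heuristic, thresholds and model names (e.g. "nemotron-3-nano", "frontier-model") are hypothetical, not part of any announced API.

```python
# Illustrative cost-aware routing between a proprietary frontier model and an
# open Nemotron model. All names and thresholds are hypothetical assumptions.

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer prompts and reasoning keywords score higher."""
    keywords = ("prove", "plan", "multi-step", "analyze")
    score = min(len(prompt) / 2000, 1.0)
    score += 0.5 * sum(kw in prompt.lower() for kw in keywords)
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.6) -> str:
    """Send hard tasks to a frontier model, routine ones to an open model."""
    if estimate_complexity(prompt) >= threshold:
        return "frontier-model"   # proprietary, higher cost per token
    return "nemotron-3-nano"      # open, lower inference cost

print(route("Summarize this paragraph."))
print(route("Plan a multi-step migration and analyze risks."))
```

In practice the routing signal could come from a classifier or the agent framework itself; the point is that only tasks that benefit from a frontier model pay its token price.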
“Perplexity is built on the idea that human curiosity will be amplified by accurate AI built into exceptional tools, like AI assistants,” said Aravind Srinivas, CEO of Perplexity. “With our agent router, we can direct workloads to the best fine-tuned open models, like Nemotron 3 Ultra, or leverage leading proprietary models when tasks benefit from their unique capabilities – ensuring our AI assistants operate with exceptional speed, efficiency and scale.”
Startups are using the open Nemotron 3 models to speed AI agent development from prototype to deployment. Companies in the portfolios of General Catalyst, Mayfield, and Sierra Ventures are testing Nemotron 3 to support human-AI collaboration.
“NVIDIA’s open model stack and the NVIDIA Inception program give early-stage companies the models, tools and a cost-effective infrastructure to experiment, differentiate and scale fast,” said Navin Chaddha, managing partner at Mayfield. “Nemotron 3 gives founders a running start on building agentic AI applications and AI teammates, and helps them tap into NVIDIA’s massive installed base.”
Nemotron 3 Reinvents Multi-Agent AI with Efficiency and Accuracy
The Nemotron 3 family of MoE models includes three sizes:
- Nemotron 3 Nano, a 30-billion-parameter model that activates up to 3 billion parameters at a time for targeted, efficient tasks.
- Nemotron 3 Super, a reasoning model with approximately 100 billion parameters and up to 10 billion active per token, for multi-agent applications.
- Nemotron 3 Ultra, a reasoning engine with about 500 billion parameters and up to 50 billion active per token, for complex AI applications.
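The figures quoted above imply that each model activates only about a tenth of its parameters per token, which is where the MoE efficiency comes from. A quick check of the stated sizes:

```python
# Active-parameter fractions implied by the sizes quoted above (all approximate,
# taken directly from the announced figures).
models = {
    "Nemotron 3 Nano":  (30e9, 3e9),    # (total params, active per token)
    "Nemotron 3 Super": (100e9, 10e9),
    "Nemotron 3 Ultra": (500e9, 50e9),
}

fracs = {name: active / total for name, (total, active) in models.items()}
for name, frac in fracs.items():
    print(f"{name}: {frac:.0%} of parameters active per token")
```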
Nemotron 3 Nano targets low-cost inference for common workloads, including debugging, content summarization, AI assistant workflows, and information retrieval. The model uses a hybrid MoE architecture to scale efficiently.
This design achieves up to 4x higher token throughput than Nemotron 2 Nano and reduces reasoning-token generation by up to 60%, lowering inference costs. With a 1-million-token context window, Nemotron 3 Nano retains context across long, multi-step tasks.
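A back-of-envelope reading of those two figures: 60% fewer reasoning tokens means 0.4x the tokens per task, and 4x throughput means each token is served in a quarter of the time. Under the idealized assumption that the two gains compound fully, serving time per task drops to roughly a tenth:

```python
# Idealized estimate only; assumes the throughput gain and the token reduction
# compound fully, which real workloads may not achieve.
throughput_gain = 4.0    # up to 4x higher token throughput vs. Nemotron 2 Nano
token_reduction = 0.60   # up to 60% fewer reasoning tokens

tokens_relative = 1.0 - token_reduction            # 0.4x tokens per task
time_relative = tokens_relative / throughput_gain  # 0.1x serving time per task

print(f"Relative serving time per task: {time_relative:.2f}x")
```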
Nemotron 3 Super targets applications that require multiple AI agents working together with low latency. Nemotron 3 Ultra supports AI workflows that involve complex reasoning and long-term planning. Both models use NVIDIA’s NVFP4 4-bit training format on the Blackwell architecture, reducing memory use and training time. This allows larger models to train on existing hardware while maintaining accuracy comparable to higher-precision formats.
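The source does not describe NVFP4's internals, but the general idea behind block-scaled low-precision formats can be illustrated with a generic sketch: values are stored as 4-bit integers sharing one higher-precision scale per block, so memory drops sharply while quantization error stays bounded. This is a simplified stand-in, not NVIDIA's actual format (NVFP4 is a floating-point format with its own block structure).

```python
# Generic block-scaled 4-bit quantization sketch (symmetric int4 + per-block
# scale). Illustrates the principle only; NOT the NVFP4 specification.

def quantize_block(values, levels=7):
    """Map a block of floats to 4-bit integers in [-7, 7] plus one shared scale."""
    scale = max(abs(v) for v in values) / levels or 1.0
    return [round(v / scale) for v in values], scale

def dequantize_block(q, scale):
    return [x * scale for x in q]

block = [0.12, -0.5, 0.31, 0.07]
q, s = quantize_block(block)
restored = dequantize_block(q, s)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(q, max_err)
```

Each value costs 4 bits plus an amortized share of one scale, versus 16 or 32 bits at higher precision, which is why such formats cut memory use and training time.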
Developers can use the Nemotron 3 model family to match open models to specific workloads. The approach supports scaling from dozens to hundreds of agents while handling long-horizon reasoning in complex workflows.
New Open Tools and Data for AI Agent Customization
NVIDIA also released a collection of training datasets and reinforcement learning libraries available to anyone building specialized AI agents.
Three trillion tokens of new Nemotron pretraining, post-training and reinforcement learning datasets supply the reasoning, coding and multi-step workflow examples needed to create domain-specialized agents. The Nemotron Agentic Safety Dataset provides telemetry to help teams evaluate and strengthen the safety of agent systems.
To accelerate development, NVIDIA released the NeMo Gym and NeMo RL open-source libraries, which provide the training environments and post-training foundation for Nemotron models, along with NeMo Evaluator to validate model safety and performance. All tools and datasets are available on GitHub and Hugging Face.
Nemotron 3 is supported by LM Studio, llama.cpp, SGLang and vLLM. In addition, Prime Intellect and Unsloth are integrating NeMo Gym’s training environments into their workflows, giving teams access to reinforcement learning training.
Get Started with NVIDIA Open Models
Nemotron 3 Nano is available on Hugging Face, Baseten, DeepInfra, Fireworks, FriendliAI, OpenRouter and Together AI.
Nemotron is offered on enterprise AI and data infrastructure platforms, including Couchbase, DataRobot, H2O.ai, JFrog, Lambda and UiPath. For customers on public clouds, Nemotron 3 Nano will soon be available on AWS via Amazon Bedrock (serverless), with support also coming to Google Cloud, CoreWeave, Crusoe, Microsoft Foundry, Nebius, Nscale and Yotta.
Nemotron 3 Nano is also available as an NVIDIA NIM microservice for secure, scalable deployment anywhere on NVIDIA-accelerated infrastructure, giving organizations privacy and control.
Nemotron 3 Super and Ultra are expected to be available in the first half of 2026.
Source: NVIDIA
About NVIDIA
NVIDIA, founded in 1993 and headquartered in Santa Clara, CA, designs graphics processing units, systems on chips, networking hardware, and software platforms such as CUDA. Its products serve industries including gaming, data centers, autonomous vehicles, professional visualization, robotics, health care, and energy. The company introduced the GPU in 1999 and later expanded into accelerated computing and AI infrastructure. In gaming, its GPUs support high-performance rendering, while in AI and high-performance computing, its systems provide the infrastructure for training and deploying large-scale models. NVIDIA also develops tools for robotics and autonomous driving.