NVIDIA NeMo Framework
NVIDIA NeMo™ Framework is a scalable, cloud-native development platform for building custom generative AI models, built for researchers and PyTorch developers. The framework supports custom models for large language models (LLMs), multimodal models (MMs), computer vision (CV), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS), and is designed to help you efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints. For more details, see the blog post "Run Hugging Face Models Instantly with Day-0 Support from NVIDIA NeMo Framework" (2025-05-19). Future releases will add support for more model families, such as video generation models.
For technical documentation, please see the NeMo Framework User Guide. NVIDIA NeMo 2.0 introduces several significant improvements over its predecessor, NeMo 1.0, enhancing flexibility, performance, and scalability. Most notably, NeMo 2.0 transitions from YAML files to Python-based configuration, providing more flexibility and control; this shift makes it easier to extend and customize configurations programmatically. This document provides an overview of the NeMo Framework architecture and installation procedures. For detailed information about core model classes and interfaces, see Core Model Classes and Interfaces.
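The advantage of Python-based configuration over static YAML can be illustrated with a plain-Python sketch. The class and field names below are illustrative stand-ins, not NeMo 2.0's actual configuration API; the point is only that programmatic configs can be derived and composed with ordinary code:

```python
from dataclasses import dataclass, replace

# Illustrative only -- not NeMo's real config classes. Python configs can be
# derived programmatically, which is awkward with static YAML overrides.
@dataclass
class TrainerConfig:
    devices: int = 8
    precision: str = "bf16-mixed"
    max_steps: int = 100_000

@dataclass
class PretrainConfig:
    trainer: TrainerConfig
    seq_length: int = 4096
    global_batch_size: int = 512

def small_debug_run(cfg: PretrainConfig) -> PretrainConfig:
    """Derive a single-GPU smoke-test variant from a full pretraining config."""
    return replace(cfg, trainer=replace(cfg.trainer, devices=1, max_steps=10))

base = PretrainConfig(trainer=TrainerConfig())
debug = small_debug_run(base)
print(debug.trainer.devices, debug.trainer.max_steps)  # 1 10
```

Because configs are ordinary objects, they can be validated, diffed, and unit-tested like any other code.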
For experiment management and training orchestration, see Experiment Management and Training. NeMo (Neural Modules) is an end-to-end, cloud-native framework designed to build, customize, and deploy generative AI models across speech, language, and vision domains. Built on PyTorch Lightning, NeMo provides a modular architecture where neural network components can be easily composed, trained, and deployed at scale. The framework supports state-of-the-art training techniques, including mixed precision, model parallelism, and distributed optimization for efficient multi-GPU and multi-node training. Sources: docs/source/index.rst (lines 1-19), docs/source/starthere/intro.rst (lines 11-17). NeMo follows a hierarchical architecture with core base classes, domain-specific collections, and supporting infrastructure components.
This post is co-written with Ranjit Rajan, Abdullahi Olaoye, and Abhishek Sawarkar from NVIDIA.
AI’s next frontier isn’t merely smarter chat-based assistants; it’s autonomous agents that reason, plan, and execute across entire systems. But to accomplish this, enterprise developers need to move from prototypes to production-ready AI agents that scale securely. This challenge grows as enterprise problems become more complex, requiring architectures where multiple specialized agents collaborate to accomplish sophisticated tasks. Building AI agents in development differs fundamentally from deploying them at scale. Developers face a chasm between prototype and production, struggling with performance optimization, resource scaling, security implementation, and operational monitoring. Typical approaches leave teams juggling multiple disconnected tools and frameworks, making it difficult to maintain consistency from development through deployment with optimal performance.
That’s where the powerful combination of Strands Agents, Amazon Bedrock AgentCore, and NVIDIA NeMo Agent Toolkit shines. You can use these tools together to design sophisticated multi-agent systems, orchestrate them, and scale them securely in production with built-in observability, agent evaluation, profiling, and performance optimization. This post demonstrates how to use this integrated solution to build, evaluate, optimize, and deploy AI agents on Amazon Web Services (AWS) from initial development through production deployment. The open source Strands Agents framework simplifies AI agent development through its model-driven approach, in which developers create agents from three core components. The framework includes built-in integrations with AWS services such as Amazon Bedrock and Amazon Simple Storage Service (Amazon S3), local testing support, continuous integration and continuous delivery (CI/CD) workflows, multiple deployment options, and OpenTelemetry...
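The model-driven approach can be sketched as a toy loop in plain Python: a model decides at each turn whether to call a tool or produce a final answer. Everything below (the `toy_model` stand-in, the tool registry, the loop structure) is an illustrative assumption, not the Strands Agents API:

```python
# Toy model-driven agent loop. The "model" here is a hard-coded stand-in
# for an LLM; in a real framework it would be a hosted model endpoint.
def calculator(expression: str) -> str:
    """A tool the agent may invoke."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def toy_model(prompt: str, observations: list[str]) -> dict:
    """Stand-in for an LLM: first requests a tool call, then answers."""
    if not observations:
        return {"action": "tool", "name": "calculator", "input": "6 * 7"}
    return {"action": "answer", "text": f"The result is {observations[-1]}."}

def run_agent(prompt: str, max_turns: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_turns):
        step = toy_model(prompt, observations)
        if step["action"] == "answer":
            return step["text"]
        # Execute the requested tool and feed the result back to the model.
        observations.append(TOOLS[step["name"]](step["input"]))
    return "gave up"

print(run_agent("What is 6 * 7?"))  # The result is 42.
```

The loop is the part a framework standardizes: developers supply the model, the tools, and the prompt, and the runtime handles orchestration, retries, and observability.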
NeMo Framework is NVIDIA's GPU accelerated, fully open-source, end-to-end training framework for large language models (LLMs), multi-modal models, diffusion and speech models. It enables seamless scaling of pretraining, post-training, and reinforcement learning workloads from single GPU to thousand-node clusters for both 🤗Hugging Face/PyTorch and Megatron models. This GitHub organization includes a suite of libraries and recipe collections to help users train models from end to end. NeMo Framework is also a part of the NVIDIA NeMo software suite for managing the AI agent lifecycle. Visit the individual repos to find out more 🔍, raise 🐛, contribute ✍️ and participate in discussion forums 🗣️! Note: The NeMo Framework is currently in the process of restructuring.
The original NeMo 2.0 repository will now focus specifically on speech-related components, while other parts of the framework are being modularized into separate libraries such as NeMo Automodel, NeMo Gym, NeMo RL, and more. This transition aims to make NeMo more modular and developer-friendly, and the NeMo GitHub organization and its repository collections were created in support of these goals. For agentic workloads, however, model choice forces difficult tradeoffs: smaller models are fast and cheap but often lack the reasoning depth, robustness, and long-context capacity needed for advanced multi-agent systems, while larger models deliver strong accuracy but are too slow and expensive when many agents run in parallel.
As agentic systems grow, inference costs spiral, context windows become a bottleneck, and reliability starts to degrade, making efficiency of utmost importance. Striking the right balance is what led NVIDIA to produce the NVIDIA Nemotron 3 Nano 30B A3B, part of our Nemotron 3 family of models (Nano, Super, and Ultra). Nano uses a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture with a 1M-token context window, enabling developers to build high-throughput, reliable agents that are more accurate, more scalable, and capable of specialized sub-tasks in long-running, multi-step workflows.

Figure 1: Nemotron 3 Nano matches or exceeds the accuracy of Qwen3-30B and GPT-OSS-20B while delivering dramatically higher throughput. In an 8K-input / 16K-output configuration on a single H200 GPU, Nano achieves 3.3x higher throughput than Qwen3-30B and 2.2x higher than GPT-OSS-20B.
Nemotron 3 Nano (30B/A3B) is our latest small-but-powerful reasoning model, building on the success of Nemotron Nano 2's hybrid Mamba-2 + Transformer architecture, reasoning ON/OFF modes, and explicit thinking budgets, while introducing a major architectural...

NVIDIA NeMo Framework is a scalable and cloud-native generative AI framework built for researchers and developers working on Large Language Models, Multimodal Models, and Speech AI (e.g., Automatic Speech Recognition and Text-to-Speech). It enables users to efficiently create, customize, and deploy new generative AI models by leveraging existing code and pre-trained model checkpoints. Setup instructions: install NeMo Framework. NeMo Framework provides end-to-end support for developing Large Language Models (LLMs) and Multimodal Models (MMs).
It provides the flexibility to be used on-premises, in a data center, or with your preferred cloud provider, and it supports execution in SLURM- or Kubernetes-enabled environments. NeMo Curator [1] is a Python library that includes a suite of modules for data mining and synthetic data generation. They are scalable and optimized for GPUs, making them ideal for curating natural language data to train or fine-tune LLMs. With NeMo Curator, you can efficiently extract high-quality text from extensive raw web data sources. NeMo Framework provides tools for efficient training and customization of LLMs and multimodal models.
It includes default configurations for compute-cluster setup, data downloading, and model hyperparameters, which can be adjusted to train on new datasets and models. In addition to pre-training, NeMo supports both Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA, P-Tuning, and more.

A comprehensive look at NVIDIA's Nemotron 3 Nano: the hybrid Mamba-Transformer MoE model with a 1M-token context window, 4x faster inference, open weights under the NVIDIA Open Model License, and what it means for agentic... NVIDIA released Nemotron 3 Nano on December 15, 2025. It is a hybrid Mamba-Transformer model with a 1-million-token context window and inference speeds up to 4x faster than its predecessor. The architecture is built around efficiency: 31.6 billion total parameters with only 3.6 billion active per token through Mixture-of-Experts (MoE) routing.
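The efficiency claim behind the MoE design follows from simple arithmetic on the figures quoted above: per-token compute scales with the active parameters, not the total.

```python
# Sanity-check the MoE efficiency figures quoted above.
total_params = 31.6e9   # total parameters in the model
active_params = 3.6e9   # parameters activated per token by MoE routing

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of parameters active per token")  # 11.4%
```

So each token pays roughly the compute cost of a ~3.6B dense model while the router can still draw on the full 31.6B parameters of capacity.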
The model ships under the NVIDIA Open Model License, which allows commercial use. According to NVIDIA's official announcement, Nemotron 3 Nano is the first in a family of models designed for agentic AI workloads. The Nemotron 3 Nano API is available through Baseten, DeepInfra, Fireworks, FriendliAI, OpenRouter, and Together AI.