GitHub

Repositories

Scalable toolkit for efficient model reinforcement
Python
Scalable data preprocessing and curation toolkit for LLMs
Python
data, data-curation, data-prep, data-preparation, data-processing, data-processing-pipelines, data-quality, datacuration, datarecipes, deduplication, fast-data-processing, fine-tuning, large-language-models, large-scale-data-processing, llm, llm-data-quality, llmapps, python, semantic-deduplication
🕵️ NeMo Anonymizer: Detect and protect PII through context-aware replacement and rewriting
Python
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
Python
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
Python
agentic-ai, data-augmentation, data-generation, llm, mcp, multimodal, nemo, nvidia, sdg, synthetic-data, tool-use
PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support
Python
deepseek-v3-2, finetuning, gemma3, gemma3n, glm, gpt-oss, huggingface, kimi-k2, llama, llama3, llm, minimax-m2, mistral, openai, pytorch, qwen, qwen3, qwen3-next, stepfun, vlm