# Installation and Prerequisites

## Clone the Repository

Clone **NeMo RL** with submodules:

```sh
git clone git@github.com:NVIDIA-NeMo/RL.git nemo-rl --recursive
cd nemo-rl

# If you have already cloned without the --recursive option, you can initialize the submodules recursively
git submodule update --init --recursive

# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure
# submodules are automatically updated after switching branches or pulling updates by configuring git with:
#   git config submodule.recurse true
# **NOTE**: this setting will not download **new** or remove **old** submodules with the branch's changes.
# You will have to run the full `git submodule update --init --recursive` command in these situations.
```

## Install System Dependencies

### cuDNN (For Megatron Backend)

If you are using the Megatron backend on bare metal (outside of a container), you may need to install the cuDNN headers. Here is how to check for and install them:

```sh
# Check if you have libcudnn installed
dpkg -l | grep cudnn.*cuda

# Find the version you need here: https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_network
# As an example, these are the "Linux Ubuntu 20.04 x86_64" instructions
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cudnn               # Installs the cuDNN meta package, which points to the latest version
# sudo apt install cudnn9-cuda-12    # Installs cuDNN 9.x.x compiled for CUDA 12.x
# sudo apt install cudnn9-cuda-12-8  # Installs cuDNN 9.x.x compiled for CUDA 12.8
```

### libibverbs (For vLLM Dependencies)

If you encounter problems when installing vLLM's dependency `deepspeed` on bare metal (outside of a container), you may need to install `libibverbs-dev`:

```sh
sudo apt-get update
sudo apt-get install libibverbs-dev
```

## Install UV Package Manager

For faster setup and environment isolation, we use [uv](https://docs.astral.sh/uv/). Follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/) to install uv. Quick install:

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

## Create Virtual Environment

Initialize the NeMo RL project virtual environment:

```sh
uv venv
```

> [!NOTE]
> Please do not use `-p/--python`; instead, allow `uv venv` to read the Python version from `.python-version`.
> This ensures that the version of Python used is always the one we prescribe.

## Configure cuDNN for Transformer Engine (Bare Metal Only)

> [!IMPORTANT]
> **Skip this section if you are using the NeMo RL container**; these environment variables are already set in the Dockerfile.
>
> When running on bare metal (outside a container), your system may have a different cuDNN version than the pip-installed `nvidia-cudnn-cu12` package. Transformer Engine (TE) prioritizes system libraries by default, which can cause version-mismatch crashes or force a fallback to slower attention backends (UnfusedDotProductAttention instead of FusedAttention).
>
> Set these environment variables before running any commands:
>
> ```sh
> # Point TE at the pip-installed cuDNN (adjust the path if UV_PROJECT_ENVIRONMENT is set)
> export CUDNN_HOME=.venv/lib/python3.13/site-packages/nvidia/cudnn
> export LD_LIBRARY_PATH=".venv/lib/python3.13/site-packages/nvidia/cudnn/lib:${LD_LIBRARY_PATH:-}"
>
> # Verify that TE picks up the correct cuDNN version (TE is in the mcore extra).
> # The version should match the nvidia-cudnn-cu12 version pinned in pyproject.toml (currently 9.19.0).
> uv run --extra mcore python -c "import transformer_engine.pytorch as te; print(te.get_cudnn_version())"
> ```

## Using UV to Run Commands

Use `uv run` to launch all commands. It installs dependencies implicitly and ensures your environment is up to date with our lock file.

```sh
# Example: Run GRPO with the DTensor backend
uv run python examples/run_grpo.py

# Example: Run GRPO with the Megatron backend
uv run python examples/run_grpo.py --config examples/configs/grpo_math_1B_megatron.yaml
```

> [!NOTE]
> - It is not recommended to activate the `venv`; use `uv run` instead to execute scripts within the managed environment.
>   This ensures consistent environment usage across different shells and sessions.
> - Ensure your system has the appropriate CUDA drivers installed and that your PyTorch version is compatible with both your CUDA setup and hardware.
> - If you update your environment in `pyproject.toml`, force a rebuild of the virtual environments by setting `NRL_FORCE_REBUILD_VENVS=true` the next time you launch a run.
> - **Reminder**: Don't forget to set `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You will also need to run `huggingface-cli login` for Llama models. See the example below.
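For reference, a typical shell setup before launching a run might look like the following sketch. The paths and the API key are placeholders; adjust them for your system, and only set `NRL_FORCE_REBUILD_VENVS=true` after you have changed `pyproject.toml`.

```sh
# Hugging Face and Weights & Biases setup (placeholder paths and keys; adjust for your system)
export HF_HOME=/path/to/hf_home
export HF_DATASETS_CACHE=/path/to/hf_datasets_cache  # optional
export WANDB_API_KEY=<your-wandb-api-key>

# Required for gated models such as Llama
huggingface-cli login

# One-off rebuild of the virtual environments after editing pyproject.toml
NRL_FORCE_REBUILD_VENVS=true uv run python examples/run_grpo.py
```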