
NCCL

Optimized primitives for inter-GPU communication.

Introduction

NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, and reduce-scatter, as well as any send/receive-based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.

For more information on NCCL usage, please refer to the NCCL documentation.

Build

Note: the official and tested builds of NCCL can be downloaded from: https://developer.nvidia.com/nccl. You can skip the following build steps if you choose to use the official builds.

To build the library:

$ cd nccl
$ make -j src.build

If CUDA is not installed in the default /usr/local/cuda path, you can define the CUDA path with:

$ make src.build CUDA_HOME=<path to cuda install>
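When the CUDA location differs between machines, a small wrapper can pick the path before calling make. The following is a minimal sketch, not part of NCCL: `find_cuda_home` is a hypothetical helper, and it assumes a usable toolkit has an executable `bin/nvcc` under its root.

```shell
# Hypothetical helper (not part of NCCL): print the first candidate directory
# that looks like a CUDA install, or return non-zero if none does.
# Assumption: a usable toolkit has an executable bin/nvcc under its root.
find_cuda_home() {
  for dir in "$@"; do
    # Skip empty candidates (for example, an unset CUDA_HOME).
    if [ -n "$dir" ] && [ -x "$dir/bin/nvcc" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example use (not run here; /opt/cuda is only an illustrative candidate):
#   CUDA_HOME=$(find_cuda_home "$CUDA_HOME" /usr/local/cuda /opt/cuda) \
#     && make -j src.build CUDA_HOME="$CUDA_HOME"
```

Listing the candidates explicitly keeps the search order visible, so an already exported CUDA_HOME takes priority over the default path.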

NCCL will be compiled and installed in build/ unless BUILDDIR is set.

By default, NCCL is compiled for all supported architectures. To speed up compilation and reduce the binary size, consider redefining NVCC_GENCODE (defined in makefiles/common.mk) to include only the architecture of the target platform:

$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"
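The NVCC_GENCODE value follows a regular pattern, so it can be generated from a list of compute capabilities. The sketch below is an illustration only: `gencode_flags` is a hypothetical helper, and it assumes capabilities are written as `major.minor` (for example `9.0`), matching the `compute_90` / `sm_90` form used above.

```shell
# Hypothetical helper (not part of NCCL): turn compute capabilities such as
# "8.0" or "9.0" into the -gencode flags passed through NVCC_GENCODE.
gencode_flags() {
  flags=""
  for cap in "$@"; do
    arch=$(printf '%s' "$cap" | tr -d '.')   # "9.0" -> "90"
    flags="$flags -gencode=arch=compute_${arch},code=sm_${arch}"
  done
  printf '%s\n' "${flags# }"                 # drop the leading space
}

# Example use (not run here):
#   make -j src.build NVCC_GENCODE="$(gencode_flags 8.0 9.0)"
```

On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` should report the capability of the installed GPUs in this `major.minor` form; treat that as an assumption to verify on your system.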

Install

To install NCCL on the system, create a package then install it as root.

Debian/Ubuntu:

$ # Install tools to create debian packages
$ sudo apt install build-essential devscripts debhelper fakeroot
$ # Build NCCL deb package
$ make pkg.debian.build
$ ls build/pkg/deb/

RedHat/CentOS:

$ # Install tools to create rpm packages
$ sudo yum install rpm-build rpmdevtools
$ # Build NCCL rpm package
$ make pkg.redhat.build
$ ls build/pkg/rpm/

OS-agnostic tarball:

$ make pkg.txz.build
$ ls build/pkg/txz/

Python wheel:

$ # Install uv to create the Python wheel (uv manages Python deps in a venv)
$ # See: https://docs.astral.sh/uv/getting-started/installation/
$ curl -LsSf https://astral.sh/uv/install.sh | sh
$ # Build NCCL Python wheel (this also builds the .txz archive as an intermediate)
$ make pkg.python_wheel.build
$ ls build/pkg/python_wheel/

Tests

Tests for NCCL are maintained separately at https://github.com/nvidia/nccl-tests.

$ git clone https://github.com/NVIDIA/nccl-tests.git
$ cd nccl-tests
$ make
$ ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
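In the all_reduce_perf command above, -b and -e set the minimum and maximum message sizes, -f is the multiplication factor between sizes, and -g is the number of GPUs per thread. To preview which sizes such a sweep would visit, the loop can be sketched in shell. `sweep_sizes` is a hypothetical helper, not part of nccl-tests, and it assumes that `256M` means 256 * 1024 * 1024 bytes and that the maximum size is included.

```shell
# Hypothetical helper (not part of nccl-tests): list the message sizes, in
# bytes, visited when growing from a minimum to a maximum by a fixed factor.
sweep_sizes() {
  size=$1; max=$2; factor=$3
  # A factor of 1 or less would never reach the maximum, so reject it.
  [ "$factor" -gt 1 ] || return 1
  while [ "$size" -le "$max" ]; do
    printf '%s\n' "$size"
    size=$((size * factor))
  done
}

# Sizes for -b 8 -e 256M -f 2, assuming 256M = 256 * 1024 * 1024 bytes
# (not run here):
#   sweep_sizes 8 $((256 * 1024 * 1024)) 2
```

Under those assumptions the sweep doubles from 8 bytes up to 268435456 bytes, which gives 26 sizes.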

Copyright

All source code and accompanying documentation is copyright (c) 2015-2020, NVIDIA CORPORATION. All rights reserved.
