The NVIDIA Collective Communications Library (NCCL) implements multi-GPU
and multi-node collective communication primitives that are
performance-optimized for NVIDIA GPUs. NCCL provides routines such as
all-gather, all-reduce, broadcast, reduce, and reduce-scatter, which are
optimized to achieve high bandwidth over PCIe and NVLink high-speed
interconnects.
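
To make the semantics of these routines concrete, the following is a minimal
pure-Python sketch that models each collective on plain lists, one list per
simulated rank. It is not NCCL's API and involves no GPUs: the function names,
the `Buffers` alias, and the choice of sum as the reduction operator are
assumptions made only for illustration.

```python
from typing import List, Optional

# Hypothetical model: one Python list stands in for each rank's device buffer.
Buffers = List[List[float]]


def _elementwise_sum(bufs: Buffers) -> List[float]:
    """Sum the buffers of all ranks element by element."""
    return [sum(vals) for vals in zip(*bufs)]


def all_reduce(bufs: Buffers) -> Buffers:
    """Every rank ends up with the element-wise sum of all inputs."""
    total = _elementwise_sum(bufs)
    return [list(total) for _ in bufs]


def broadcast(bufs: Buffers, root: int) -> Buffers:
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(bufs[root]) for _ in bufs]


def reduce(bufs: Buffers, root: int) -> List[Optional[List[float]]]:
    """Only the root receives the sum; other ranks get None in this model."""
    total = _elementwise_sum(bufs)
    return [total if rank == root else None for rank in range(len(bufs))]


def all_gather(bufs: Buffers) -> Buffers:
    """Every rank ends up with all inputs concatenated in rank order."""
    gathered = [x for buf in bufs for x in buf]
    return [list(gathered) for _ in bufs]


def reduce_scatter(bufs: Buffers) -> Buffers:
    """The summed buffer is split into equal chunks, one chunk per rank."""
    nranks = len(bufs)
    total = _elementwise_sum(bufs)
    chunk = len(total) // nranks  # assumes the length divides evenly
    return [total[r * chunk:(r + 1) * chunk] for r in range(nranks)]


if __name__ == "__main__":
    inputs = [[1.0, 2.0], [3.0, 4.0]]  # two simulated ranks
    print("all_reduce    ", all_reduce(inputs))
    print("broadcast     ", broadcast(inputs, root=1))
    print("reduce        ", reduce(inputs, root=0))
    print("all_gather    ", all_gather(inputs))
    print("reduce_scatter", reduce_scatter(inputs))
```

In this model an all-reduce gives the same result as a reduce-scatter followed
by an all-gather, which is one common way to think about how the routines
relate to each other.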