Engineering at Meta | Machine Learning

RCCLX: Innovating GPU communications on AMD platforms

2026-02-24
8 min read

Endigest AI Core Summary

Meta open-sources RCCLX, an enhanced GPU communication library for AMD platforms that significantly improves AI training and inference performance.

  • RCCLX is an enhanced version of RCCL integrated with the Torchcomms API, enabling a single cross-platform API for GPU communications across AMD and NVIDIA backends
  • Direct Data Access (DDA) reduces AllReduce latency from O(N) to O(1) in the number of participating GPUs (N) using flat and tree algorithms, achieving a 10-50% improvement over the RCCL baseline on AMD MI300X for decode workloads
  • DDA delivers approximately 10% reduction in time-to-incremental-token (TTIT) during the LLM decoding phase
  • Low Precision (LP) collectives use FP8 quantization for up to 4:1 compression, reducing communication overhead for large messages (>=16MB) via parallel P2P mesh communication over AMD Infinity Fabric
  • LP collectives yield a ~9-10% latency decrease and a ~7% throughput increase in end-to-end inference with only a ~0.3% accuracy delta on GSM8K, and are enabled via the RCCL_LOW_PRECISION_ENABLE=1 environment variable
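
The latency and compression claims in the bullets above can be illustrated with a small toy model. This is a sketch under stated assumptions, not RCCLX code: the function names (`ring_allreduce_steps`, `flat_dda_allreduce`, `low_precision_payload_bytes`, `enable_low_precision`) are hypothetical, plain Python lists stand in for GPU buffers, and the ring algorithm is used only as a familiar O(N) reference point. The only identifier taken from the summary is the `RCCL_LOW_PRECISION_ENABLE` variable; that it needs to be set before the communication library initializes is an assumption.

```python
import os


def ring_allreduce_steps(num_ranks):
    """Dependent steps in a classic ring AllReduce:
    (N - 1) reduce-scatter steps plus (N - 1) all-gather steps, i.e. O(N)."""
    return 2 * (num_ranks - 1)


def flat_dda_allreduce(rank_buffers):
    """Toy model of a flat direct-access AllReduce (sum).

    Every rank reads all peers' buffers directly and reduces locally,
    so the count of dependent communication steps stays at one no
    matter how many ranks participate. Lists stand in for GPU memory.
    """
    length = len(rank_buffers[0])
    results = []
    for _rank in range(len(rank_buffers)):
        # Each rank performs its own local reduction over every peer's data.
        reduced = [sum(buf[i] for buf in rank_buffers) for i in range(length)]
        results.append(reduced)
    steps = 1  # a single round of direct peer reads
    return results, steps


def low_precision_payload_bytes(num_elements, source_bytes_per_element=4):
    """Bytes on the wire before and after casting to 1-byte FP8 elements.

    Scale factors and other metadata are ignored (assumption), which is
    why 4:1 is an upper bound reached only for 4-byte (FP32) inputs.
    """
    full = num_elements * source_bytes_per_element
    fp8 = num_elements * 1
    return full, fp8


def enable_low_precision(env=None):
    """Set the flag named in the summary in the given environment mapping.

    Assumption: the variable must be set before the communication
    library is initialized for it to take effect.
    """
    target = os.environ if env is None else env
    target["RCCL_LOW_PRECISION_ENABLE"] = "1"
    return target
```

For example, with 8 ranks the ring reference needs `ring_allreduce_steps(8)` dependent steps while the flat model always reports one, at the cost of every rank reading every peer's buffer. For a 16 MB FP32 message (4 Mi elements), `low_precision_payload_bytes(4 * 1024 * 1024)` shows the payload shrinking to a quarter of its size; a 2-byte source type would give 2:1 instead.
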
Tags:
#AI Research
#Data Center Engineering
#ML Applications
#Networking & Traffic