Multi-Node Multi-GPU
Properties:
- GPUs are distributed across multiple machines
- Within a node: NVLink
- Between nodes: network (Ethernet, InfiniBand)
Pros:
- Strong scalability
- Cost is relatively flexible
Cons:
- Lower bandwidth between nodes than within a node
- Higher latency for cross-node communication
- Network issues may occur
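To make the bandwidth and latency cons concrete, here is a rough sketch that estimates the time of a ring all-reduce with the common latency-plus-bandwidth cost model. The function name `ring_allreduce_seconds`, the link figures, and the gradient size are all illustrative assumptions, not measurements, and the model treats every link in the ring as the same type (in a multi-node ring, the slowest link tends to dominate).

```python
def ring_allreduce_seconds(message_bytes, num_gpus, bandwidth_bytes_per_s, latency_s):
    """Estimate ring all-reduce time with a simple latency + bandwidth model."""
    if num_gpus < 2:
        return 0.0  # nothing to communicate with a single GPU
    # A ring all-reduce runs (N - 1) reduce-scatter steps and (N - 1) all-gather steps.
    steps = 2 * (num_gpus - 1)
    # Each step sends one chunk of size message / N over one link.
    chunk_bytes = message_bytes / num_gpus
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


# Assumed, illustrative link characteristics (hypothetical values).
NVLINK = {"bandwidth_bytes_per_s": 300e9, "latency_s": 2e-6}
NETWORK = {"bandwidth_bytes_per_s": 25e9, "latency_s": 10e-6}

grad_bytes = 1e9  # hypothetical: 1 GB of gradients to synchronize
num_gpus = 8

intra = ring_allreduce_seconds(grad_bytes, num_gpus, **NVLINK)
inter = ring_allreduce_seconds(grad_bytes, num_gpus, **NETWORK)
print(f"intra-node links: {intra * 1e3:.2f} ms")
print(f"inter-node links: {inter * 1e3:.2f} ms")
```

Under these assumed numbers the bandwidth term dominates for large messages, while the per-step latency term matters more as the message shrinks or the number of GPUs grows.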
Examples:
- Cloud-based training cluster
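In a cluster like this, each training process is usually identified by which node it runs on and which GPU it uses on that node. The sketch below shows one common convention (global rank = node rank × GPUs per node + local rank) and uses it to tell whether two ranks would talk over an intra-node link or over the network. The names `global_rank`, `placement`, and `link_between` are hypothetical helpers for illustration, and the sketch assumes every node has the same number of GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    node_rank: int   # which machine the process runs on
    local_rank: int  # which GPU on that machine


def global_rank(node_rank, local_rank, gpus_per_node):
    """Map (node, local GPU) to a single cluster-wide rank."""
    return node_rank * gpus_per_node + local_rank


def placement(rank, gpus_per_node):
    """Inverse mapping: recover (node, local GPU) from a global rank."""
    node_rank, local_rank = divmod(rank, gpus_per_node)
    return Placement(node_rank, local_rank)


def link_between(rank_a, rank_b, gpus_per_node):
    """Classify the path between two ranks as intra-node or inter-node."""
    node_a = placement(rank_a, gpus_per_node).node_rank
    node_b = placement(rank_b, gpus_per_node).node_rank
    if node_a == node_b:
        return "intra-node (e.g. NVLink)"
    return "inter-node (network)"


# Hypothetical cluster: 2 nodes with 4 GPUs each.
GPUS_PER_NODE = 4
for a, b in [(0, 3), (3, 4)]:
    print(f"rank {a} <-> rank {b}: {link_between(a, b, GPUS_PER_NODE)}")
```

Ranks 0 and 3 sit on the same node, while ranks 3 and 4 straddle the node boundary, so only the second pair would pay the inter-node bandwidth and latency cost.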