
Nvidia outlines plans for using light for communication between AI GPUs by 2026 — silicon photonics and co-packaged optics may become mandatory for next-gen AI data centers

The extreme demands of moving data between ever-growing clusters of AI GPUs are fueling a move toward using light for communication across the networking layers. Earlier this year, Nvidia outlined that its next-generation rack-scale AI platforms will use silicon photonics interconnects with co-packaged optics (CPO) for higher transfer rates at lower power. At the Hot Chips conference this year, Nvidia shared additional details about its next-generation Quantum-X and Spectrum-X photonics interconnect solutions, which will arrive in 2026.

Nvidia’s roadmap will likely closely follow TSMC’s COUPE roadmap, which unfolds in three stages. The first generation is an optical engine for OSFP connectors, offering 1.6 Tb/s of data transfer while reducing power consumption. The second generation moves into CoWoS packaging with co-packaged optics, enabling 6.4 Tb/s at the motherboard level. The third generation aims for 12.8 Tb/s within processor packages and targets further cuts in power and latency.
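The generation-to-generation jumps are easy to sanity-check from those figures. Here is a minimal Python sketch that tabulates the three stages and the bandwidth multiplier between them; the names and data rates come from the roadmap as described above, and nothing else is implied:

```python
# Tabulate the three COUPE generations described above and compute the
# bandwidth multiplier from one stage to the next (rates in Tb/s).
stages = [
    ("Gen 1: optical engine for OSFP connectors",    1.6),
    ("Gen 2: co-packaged optics on CoWoS",           6.4),
    ("Gen 3: optics within the processor package",  12.8),
]

prev = None
for name, tbps in stages:
    jump = f" ({tbps / prev:.0f}x the previous stage)" if prev else ""
    print(f"{name}: {tbps} Tb/s{jump}")
    prev = tbps
```

Running it shows the second generation quadruples the first, while the third doubles the second.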

Why CPO?

In large-scale AI clusters, thousands of GPUs must behave as one system, which changes how these processors are interconnected: instead of each rack having its own Tier-1 (Top-of-Rack) switch linked by short copper cables, the switches move to the end of the row to create a consistent, low-latency fabric across multiple racks. That relocation greatly extends the distance between servers and their first switch, which makes copper impractical at speeds like 800 Gb/s, so optical connections are required for nearly every server-to-switch and switch-to-switch link.
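To see why the relocation matters, consider the cable runs involved. The sketch below is illustrative only: the article gives no distances, so the roughly 2 m passive-copper reach at 800 Gb/s and the rack-to-switch distances are assumptions chosen purely to show the arithmetic:

```python
# Illustrative only: compare assumed cable runs against an assumed
# passive-copper reach at 800 Gb/s. None of these numbers come from
# Nvidia; they are stand-ins to show why end-of-row topologies push
# nearly every link onto optics.
COPPER_REACH_M = 2.0  # assumed usable copper reach at 800 Gb/s

links = {
    "server to Top-of-Rack switch (same rack)": 1.5,   # assumed
    "server to end-of-row switch (near rack)":  8.0,   # assumed
    "server to end-of-row switch (far rack)":  20.0,   # assumed
}

for name, dist_m in links.items():
    medium = "copper OK" if dist_m <= COPPER_REACH_M else "optics required"
    print(f"{name}: {dist_m:>5.1f} m -> {medium}")
```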

(Image credit: Nvidia)

Using pluggable optical modules in this environment introduces clear limits: data signals in such designs leave the ASIC, travel across the board and connectors, and only then are converted to light. That approach produces severe electrical loss of up to roughly 22 decibels on 200 Gb/s channels, which demands compensation through complex signal processing and drives per-port power consumption to 30W (which in turn calls for additional cooling and creates a point of potential failure). According to Nvidia, that overhead becomes almost unbearable as the scale of AI deployments grows.
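Those two figures compound at scale. The quick sketch below converts the quoted 22 dB loss into a linear attenuation factor and projects the 30W per-port budget across a cluster; the 100,000-port count is a hypothetical, assumed only for illustration:

```python
# Convert the quoted 22 dB channel loss into a linear power ratio and
# project the 30 W per-port figure across a hypothetical port count.
loss_db = 22.0        # electrical loss quoted for 200 Gb/s channels
port_power_w = 30.0   # per-port power quoted for pluggable modules
num_ports = 100_000   # assumed cluster size, for illustration only

attenuation = 10 ** (loss_db / 10)  # dB -> linear power ratio
print(f"22 dB of loss means the signal arrives at 1/{attenuation:.0f} "
      f"of its launch power")
print(f"{num_ports:,} ports x {port_power_w:.0f} W = "
      f"{num_ports * port_power_w / 1e6:.1f} MW just for the optics")
```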

(Image credit: Nvidia)

CPO sidesteps the penalties of traditional pluggable optical modules by embedding the optical conversion engine alongside the switch ASIC, so instead of traveling over long electrical traces, the signal is coupled into fiber almost immediately. As a result, electrical loss is cut to 4 decibels, and per-port power consumption is reduced to 9W. This layout also removes numerous components that could fail and greatly simplifies the implementation of optical interconnects.
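Putting the pluggable and CPO numbers side by side makes the savings concrete. All four figures in the sketch below are taken from the article; the percentages are simple arithmetic:

```python
# Compare the per-port figures quoted for pluggable modules vs. CPO.
pluggable = {"loss_db": 22.0, "power_w": 30.0}
cpo       = {"loss_db":  4.0, "power_w":  9.0}

loss_improvement_db = pluggable["loss_db"] - cpo["loss_db"]
linear_ratio = 10 ** (loss_improvement_db / 10)  # dB -> linear ratio
power_saving = 1 - cpo["power_w"] / pluggable["power_w"]

print(f"Electrical loss: {pluggable['loss_db']:.0f} dB -> "
      f"{cpo['loss_db']:.0f} dB ({loss_improvement_db:.0f} dB better, "
      f"i.e. {linear_ratio:.0f}x less attenuation)")
print(f"Per-port power:  {pluggable['power_w']:.0f} W -> "
      f"{cpo['power_w']:.0f} W ({power_saving:.0%} lower)")
```

The 18 dB improvement works out to roughly 63x less signal attenuation, and the per-port power drop is 70%.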
