Intel is keeping things competitive: the company’s new $11,600 flagship 64-core Emerald Rapids 5th-Gen Xeon Platinum 8592+ arrives as part of a complete refresh of the company’s Xeon product stack as it grapples with AMD’s EPYC Genoa lineup, which continues to chew away at Intel’s market share. Our benchmarks in this article show that Emerald Rapids delivers surprisingly impressive performance uplifts, drastically improving Intel’s competitive footing against AMD’s competing chips. Critically, Intel’s new chips have also arrived on schedule, a much-needed confirmation that the company’s turnaround remains on track.
For Emerald Rapids, Intel has added four more cores to the flagship than its prior-gen counterpart, providing up to 128 cores and 256 threads per dual-socket server, tripled the L3 cache, and moved to faster DDR5-5600 memory across the breadth of its product stack. In concert with other targeted enhancements, including a significant redesign of the die architecture, the company claims these changes provide gen-on-gen gains of 42% in AI inference, 21% in general compute workloads, and 36% in performance-per-watt.
As with the previous-gen Sapphire Rapids processors, Emerald Rapids leverages the ‘Intel 7’ process, albeit a more refined version of the node, and the slightly-enhanced Raptor Cove microarchitecture. However, the new Emerald Rapids server chips come with plenty of innovations and design modifications that far exceed what we’ve come to expect from a refresh generation — Intel moved from the complex quad-chiplet design of the top-tier Sapphire Rapids chips to a simpler two-die design that wields a total of 61 billion transistors, with the new die offering a more consistent latency profile. Despite the redesign, Emerald Rapids still maintains backward compatibility with the existing Sapphire Rapids ‘Eagle Stream’ platform, reducing validation time and allowing for fast market uptake of the new processors.
Emerald Rapids still trails in terms of overall core counts — AMD’s Genoa tops out at 96 cores with the EPYC 9654, a 32-core advantage. As such, Emerald Rapids won’t be able to match Genoa in many of the densest general compute workloads; the latter’s 50% core count advantage is tough to beat in most parallel workloads. However, Intel’s chips still satisfy the requirements of the majority of the market — the highest-tier chips always comprise a much smaller portion of sales than the mid-range — and Intel leans on its suite of in-built accelerators and its performance in AI workloads to tackle AMD’s competing 64-core chips with what it claims is a superior blend of performance and power efficiency.
There’s no doubt that Emerald Rapids significantly improves Intel’s competitive posture in the data center, but AMD’s Genoa launched late last year, and the company’s Zen 5-powered Turin counterpunch is due in 2024. Those chips will face Intel’s Granite Rapids processors, which are scheduled for the first half of 2024. A new battlefield has also formed — AMD has its density-optimized Bergamo with up to 128 cores in the market, and Intel will answer with its Sierra Forest lineup with up to 288 cores early next year.
It’s clear that the goalposts will shift soon for the general-purpose processors we have under the microscope today; here’s how Intel’s Emerald Rapids stacks up against AMD’s current roster.
Intel Emerald Rapids 5th-Gen Xeon Specifications and Pricing
Intel’s fifth-gen Xeon lineup consists of 32 new models divided into six primary swimlanes (second slide), including processors designed for the cloud, networking, storage, long-life use, single-socket models, and processors designed specifically for liquid-cooled systems. The stack is also carved into Platinum, Gold, Silver, and Bronze sub-tiers. Notably, Intel hasn’t listed any chips as scalable to eight sockets, a prior mainstay for Xeon. Now the series tops out at support for two sockets. Intel also offers varying levels of memory support, with eight-channel speeds spanning from DDR5-4400 to DDR5-5600. In contrast, all of AMD’s Genoa stack supports 12 channels of DDR5-4800 memory.
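To put those memory configurations in perspective, a back-of-the-envelope calculation is straightforward: each DDR5 channel moves 8 bytes per transfer, so theoretical peak bandwidth is channels × transfer rate × 8. The Python sketch below works through the math using the channel counts and speeds above; real-world sustained throughput lands well below these ceilings.

```python
# Theoretical peak DRAM bandwidth per socket: channels x MT/s x 8 bytes.
# Sustained real-world bandwidth is considerably lower than these ceilings.

def peak_bandwidth_gbps(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s (decimal) for one socket."""
    return channels * mt_per_s * bytes_per_transfer / 1000  # MB/s -> GB/s

emr = peak_bandwidth_gbps(channels=8, mt_per_s=5600)     # Emerald Rapids: ~358 GB/s
genoa = peak_bandwidth_gbps(channels=12, mt_per_s=4800)  # EPYC Genoa:     ~461 GB/s

print(f"Emerald Rapids (8 x DDR5-5600):  {emr:.1f} GB/s")
print(f"EPYC Genoa    (12 x DDR5-4800):  {genoa:.1f} GB/s")
```

Even with the faster DIMMs, Intel’s eight channels leave Genoa’s 12-channel interface with a roughly 29% advantage in theoretical peak bandwidth per socket.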
Intel seemingly has a SKU for every type of workload, but Emerald Rapids’ 32-chip stack actually represents a trimming of Intel’s Xeon portfolio — the previous-gen roster had 52 total options. In contrast, AMD’s EPYC Genoa 9004 Series family spans 18 models in three categories — Core Performance, Core Density, and Balanced and Optimized — creating a vastly simpler product stack.
Emerald Rapids continues Intel’s push into acceleration technologies that can be purchased outright or through a pay-as-you-go model. These purpose-built accelerator regions of the chip are designed to radically boost performance in several types of work that typically require discrete accelerators for maximum performance: compression and encryption (QAT), data movement (DSA), load balancing (DLB), and in-memory analytics (IAA). Each chip can have a variable number of accelerator ‘devices’ enabled, but the ‘+’ models have at least one accelerator of each type enabled by default.
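On a Linux system, you can take a rough census of which accelerator devices a given chip exposes by scanning the PCI device list. The sketch below is illustrative only: the match strings are assumptions, since the exact device descriptions depend on the platform and the installed pci.ids database.

```python
import subprocess

# Illustrative: search lspci output for Intel's on-die accelerators.
# The patterns below are assumptions; actual lspci descriptions vary
# with the platform and pci.ids version.
ACCEL_PATTERNS = {
    "QAT": "quickassist",            # compression / encryption offload
    "DSA": "data streaming",         # data-movement engine
    "DLB": "dynamic load balancer",  # queue-management offload
    "IAA": "analytics",              # in-memory analytics / decompression
}

lspci_out = subprocess.run(["lspci"], capture_output=True, text=True).stdout.lower()
for name, pattern in ACCEL_PATTERNS.items():
    count = sum(pattern in line for line in lspci_out.splitlines())
    print(f"{name}: {count} device(s) visible")
```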
Emerald Rapids’ TDPs range from 125W to 350W for the standard models, but the liquid-cooling-optimized chips peak at 385W. In contrast, AMD’s standard chips top out at 360W but also have a 400W configurable TDP rating.
Model | Price | Cores / Threads | Base / Boost (GHz) | TDP (W) | L3 Cache (MB) | cTDP (W) |
---|---|---|---|---|---|---|
EPYC Genoa 9654 | $11,805 | 96 / 192 | 2.4 / 3.7 | 360 | 384 | 320-400 |
Intel Xeon 8592+ (EMR) | $11,600 | 64 / 128 | 1.9 / 3.9 | 350 | 320 | – |
Intel Xeon 8490H (SPR) | $17,000 | 60 / 120 | 1.9 / 3.5 | 350 | 112.5 | – |
Intel Xeon 8480+ (SPR) | $10,710 | 56 / 112 | 2.0 / 3.8 | 350 | 105 | – |
EPYC Genoa 9554 | $9,087 | 64 / 128 | 3.1 / 3.75 | 360 | 256 | 320-400 |
Intel Xeon 8562Y+ (EMR) | $5,945 | 32 / 64 | 2.8 / 4.1 | 300 | 60 | – |
Intel Xeon 8462Y+ (SPR) | $3,583 | 32 / 64 | 2.8 / 4.1 | 300 | 60 | – |
EPYC Genoa 9354 | $3,420 | 32 / 64 | 3.25 / 3.8 | 280 | 256 | 240-300 |
Intel Xeon 4516Y+ (EMR) | $1,295 | 24 / 48 | 2.2 / 3.7 | 185 | 45 | – |
Intel Xeon 6442Y (SPR) | $2,878 | 24 / 48 | 2.6 / 3.3 | 225 | 60 | – |
EPYC Genoa 9254 | $2,299 | 24 / 48 | 2.9 / 4.15 | 200 | 128 | 200-240 |
EPYC Genoa 9374F | $4,850 | 32 / 64 | 3.85 / 4.3 | 320 | 256 | 320-400 |
EPYC Genoa 9274F | $3,060 | 24 / 48 | 4.05 / 4.3 | 320 | 256 | 320-400 |
The presence, or lack thereof, of Intel’s in-built accelerators makes direct pricing comparisons to AMD’s Genoa difficult — especially when accounting for the possibility of a customer purchasing additional acceleration functions.
The Intel Xeon Platinum 8592+ has 64 cores and 128 threads, four more cores than Sapphire Rapids’ peak of 60 in the pricey, specialized 8490H, and eight more than Intel’s last-gen general-purpose flagship, the 8480+. As denoted by its ‘+’ suffix, the 8592+ has one of each of the in-built accelerators activated. This is upgradeable to four units of each type of accelerator — for an additional fee (this is typically offered through OEMs, so pricing varies).
The 8592+’s cores run at a 1.9 GHz base but can boost up to 3.0 GHz across all cores or 3.9 GHz on a single core. The chip is armed with 320MB of L3 cache — more than triple that of its prior-gen counterpart. Intel’s decision to boost L3 capacity will benefit a host of workloads, but there’s a caveat: as we’ll cover below, Emerald Rapids processors can use one of three different die configurations, and only the highest-end die (40 cores and up) has the tripled cache capacity. The 32-core and lesser models use a die that generally has the same amount of cache as the prior gen.
Intel’s processors now support up to DDR5-5600 in 1DPC mode (DDR5-4800 for 2DPC), an improvement over the prior gen’s DDR5-4800. Intel has also tuned the UPI links to 20 GT/s, a slight increase over the previous 16 GT/s.
All of the Emerald Rapids chips support the following:
- LGA4677 Socket / Eagle Stream platform
- Hyper-Threading
- Eight channels of DDR5 memory: Top-tier models run at up to DDR5-5600 (1DPC) and DDR5-4800 (2DPC), but speeds vary by model
- 80 Lanes of PCIe 5.0 (EPYC Genoa has 128 lanes of PCIe 5.0)
- Up to 6TB of memory per socket (same as Genoa)
- CXL Type 3 memory support (Genoa also has support for Type 3)
- AMX, AVX-512, VNNI, and bfloat16 (Genoa does not support AMX); a detection sketch follows this list
- UPI speed increased from 16 GT/s to 20 GT/s
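For the ISA features in the list above, a quick way to confirm what a given Linux host actually exposes is to check the CPU feature flags the kernel reports. A minimal sketch, assuming the standard /proc/cpuinfo flag names used by the upstream kernel:

```python
# Check /proc/cpuinfo for the AVX-512 and AMX feature flags the Linux
# kernel reports (flag names per the upstream x86 feature list).
FLAGS_OF_INTEREST = ["avx512f", "avx512_vnni", "avx512_bf16",
                     "amx_tile", "amx_bf16", "amx_int8"]

cpu_flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            cpu_flags = set(line.split(":", 1)[1].split())
            break  # every core reports the same flag set

for flag in FLAGS_OF_INTEREST:
    print(f"{flag:12s} {'yes' if flag in cpu_flags else 'no'}")
```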
Intel employed two types of die with its last-generation Sapphire Rapids: an XCC die used in standard and mirrored configurations for the four-tile chips that stretched up to 60 cores, and a single MCC die for chips with 32 or fewer cores.
Intel has moved to three die designs for Emerald Rapids: an XCC die used only for the two-tile designs (up to 64 cores), a monolithic MCC die for models with more than 20 and up to 32 cores, and a new monolithic EE LCC die for models with 20 cores or fewer.
Each XCC die has 30.5 billion transistors, for a total of 61 billion in the dual-XCC models. Despite stepping back to a dual-tile design, Emerald Rapids still occupies roughly the same total die area as the Sapphire Rapids processors. Each XCC die physically contains 33 cores, but one is disabled to mitigate the impact of manufacturing defects. Intel says the die area of the XCC and MCC tiles is similar, though it hasn’t shared exact measurements yet.
Intel’s older quad-XCC-tile Sapphire Rapids design employed ten EMIB interconnects to stitch the tiles together, but this added latency, variability, and thus performance penalties in many types of workloads, not to mention design complexity. Emerald Rapids’ new dual-XCC-tile design employs only three EMIB connections between the dies, easing latency and variability concerns and improving performance while reducing design complexity. The reduced EMIB connections help in multiple ways: the associated reduction in die area devoted to the Network on Chip (NoC) and the reduced data traffic saved 5 to 7% of overall chip power, which Intel then diverted to the power-hungry areas of the chip to deliver more performance.
As before, Intel provides an option to logically partition the processor to keep workloads on the same die (sub-NUMA clustering, or SNC), thus avoiding the latency penalties of accessing the other die. However, instead of offering up to four clusters as it did with the quad-tile Sapphire Rapids, Intel now supports up to two clusters, one for each die, to optimize Emerald Rapids for latency-sensitive applications. The chip operates as one large cluster by default, which Intel says is optimal for the majority of workloads. In the above slides, you can see the latency impact of the different SNC configurations.
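The practical effect of SNC is visible in the NUMA topology the OS reports: with SNC enabled, each socket appears as two NUMA nodes, one per die. A minimal sketch that enumerates what the Linux kernel exposes under sysfs:

```python
import pathlib

# Enumerate NUMA nodes as reported by the Linux kernel. With SNC enabled,
# an Emerald Rapids socket shows up as two nodes (one per die); with SNC
# disabled, as a single node per socket.
for node in sorted(pathlib.Path("/sys/devices/system/node").glob("node[0-9]*")):
    cpulist = (node / "cpulist").read_text().strip()
    mem_kb = int((node / "meminfo").read_text().splitlines()[0].split()[-2])
    print(f"{node.name}: CPUs [{cpulist}], {mem_kb / 1048576:.1f} GiB")
```

From there, a latency-sensitive workload can be pinned to a single cluster with standard tooling, e.g. `numactl --cpunodebind=0 --membind=0 ./app`.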
Intel’s MCC and EE LCC dies are custom designs — Intel created separate designs for these dies so it could discard unused functions, like the EMIB connections, to save die area and reduce complexity. The second slide in the above album outlines Emerald Rapids’ new arrangement of the die’s functional elements, like the memory, PCIe, and UPI controllers.
Intel earmarked power reductions as a key focus area with Emerald Rapids. Most data center processors operate at 30 to 40% utilization in normal use (cloud is typically 70%), so Intel improved multiple facets of the design, including optimizing the cores and SoC interconnect for lower utilization levels. In tandem with the extra efficiencies wrung from the newer revision of the Intel 7 node, the company claims it has reduced power consumption by up to 110W when the system is at lower load levels. Intel claims the chips still deliver the same level of performance even when running in the Optimized Power Mode (OPM).
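Claims like these are straightforward to sanity-check on a live system, since package-level power draw is exposed through Intel’s RAPL energy counters. A rough sketch, assuming the intel-rapl powercap domain for socket 0 is readable (this typically requires root, and the counter wraps periodically):

```python
import time

# Sample package power via the RAPL energy counter in sysfs. The counter
# reports cumulative microjoules and wraps at max_energy_range_uj, so a
# short sampling window keeps the math simple.
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj() -> int:
    with open(RAPL) as f:
        return int(f.read())

e0, t0 = read_energy_uj(), time.time()
time.sleep(1.0)
e1, t1 = read_energy_uj(), time.time()
print(f"Package power: {(e1 - e0) / (t1 - t0) / 1e6:.1f} W")
```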
Intel also expanded upon its AVX and AMX licensing classes (explained here) to expose up to two bins of higher performance under heavy vectorized workloads.
Intel has also expanded CXL memory support beyond its fledgling foray with Sapphire Rapids — Emerald Rapids now supports Type 3 memory expansion devices, enabling new memory tiering and interleaving options. Adding externally connected memory increases both capacity and bandwidth, allowing the chip to perform as if it had more memory channels at its disposal, albeit with the additional latency inherent to the CXL interconnect.
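Under Linux, a CXL Type 3 expander typically surfaces as a memory-only NUMA node (a node with RAM but no CPUs). A small sketch that flags such nodes as likely CXL-attached tiers:

```python
import pathlib

# Flag memory-only NUMA nodes (RAM but no CPUs), which is how CXL Type 3
# memory expanders typically appear to the Linux kernel.
for node in sorted(pathlib.Path("/sys/devices/system/node").glob("node[0-9]*")):
    if not (node / "cpulist").read_text().strip():
        print(f"{node.name}: memory-only (possible CXL Type 3 expander)")
```

Interleaving pages across local DRAM and a CXL node is then a matter of policy, e.g. `numactl --interleave=0,2 ./app` (the node numbers here are hypothetical).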