The Intel Xe-LP GPU Architecture Deep Dive: Building Up The Next Generation
by Ryan Smith on August 13, 2020 9:00 AM EST
Xe-LP Feature Set: DirectX FL 12_1 with Variable Rate Shading
Kicking off the proper part of our architectural deep dive, let’s start with a quick summary of Xe-LP’s graphics feature set. I call this a quick summary as there is unfortunately not a whole lot new to talk about here.
From an API-level perspective, Xe-LP’s feature set is going to be virtually identical to that of Intel’s Gen11 graphics. Not unlike AMD with their RDNA1 architecture, Intel has decided to concentrate their efforts on updating the low-level aspects of their GPU architecture, making numerous changes downstairs. As a result, relatively little has changed upstairs with regard to graphics features.
The net result is that Xe-LP is a DirectX feature level 12_1 accelerator, with a couple of added features. In particular, tier 1 variable rate shading, which Intel first introduced in their Gen11 hardware, is back again in Xe-LP. Though not as capable as the newer tier 2 implementation, it allows for basic VRS support, with games able to set a shading rate on a per-draw call basis. Notably, Intel remains the only vendor to support tier 1; AMD and NVIDIA have gone (or are going) straight to tier 2.
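As a concrete illustration (not from Intel's materials), here is a minimal Direct3D 12 sketch of what per-draw VRS looks like from the game's side: query the supported tier, then set a base shading rate before recording the draw. The `device` and `cmdList` pointers are assumed to already exist, and `ID3D12GraphicsCommandList5` requires a reasonably recent Windows SDK. On tier 1 hardware like Gen11/Xe-LP only the single base rate applies; tier 2 parts can additionally blend in per-primitive and screen-space rates via the combiners.

```cpp
// Sketch: query the VRS tier, then apply a coarser shading rate for one draw.
// Assumes an existing ID3D12Device* and ID3D12GraphicsCommandList5* (hypothetical names).
#include <d3d12.h>

void SetCoarseShadingIfSupported(ID3D12Device* device, ID3D12GraphicsCommandList5* cmdList)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS6 options6 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS6,
                                           &options6, sizeof(options6))))
        return;

    if (options6.VariableShadingRateTier == D3D12_VARIABLE_SHADING_RATE_TIER_NOT_SUPPORTED)
        return; // No VRS at all on this device

    // Tier 1 (Gen11/Xe-LP): one base rate for the whole draw call.
    // Passthrough combiners keep this call valid on tier 1; tier 2 hardware could
    // instead combine the base rate with per-primitive and screen-space rates.
    D3D12_SHADING_RATE_COMBINER combiners[2] = {
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH,
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH
    };
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, combiners);

    // ... record the draw call(s) that should be shaded at 2x2 granularity ...

    // Restore full-rate shading for subsequent draws.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr);
}
```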
DirectX 12 Feature Levels

| Feature | 12_2 (DX12 Ultimate) | 12_1 |
|---|---|---|
| GPU Architectures | Intel: Xe-HPG?; NVIDIA: Turing; AMD: RDNA2 | Intel: Gen9, Gen11, Xe-LP; NVIDIA: Maxwell 2, Pascal; AMD: Vega, RDNA (1) |
| Ray Tracing (DXR 1.1) | Yes | No |
| Variable Rate Shading (Tier 2) | Yes | No (Gen11/Xe-LP: Tier 1) |
| Mesh Shaders | Yes | No |
| Sampler Feedback | Yes | No |
| Conservative Rasterization | Yes | Yes |
| Raster Order Views | Yes | Yes |
| Tiled Resources (Tier 2) | Yes | Yes |
| Bindless Resources (Tier 2) | Yes | Yes |
| Typed UAV Load | Yes | Yes |
The good news for Intel, at least, is that they were already somewhat ahead of the game with Gen11, shipping 12_1 support for even their slowest integrated GPUs before AMD had phased it into all of their products. So at this point, Intel is still at parity with other integrated graphics solutions, if not slightly ahead.
The downside is that it also means that Intel is the only hardware vendor launching a new GPU/architecture in 2020 without support for the next generation of features, which Microsoft & co are codifying as DirectX 12 Ultimate. The consumer-facing trade name for feature level 12_2, DirectX 12 Ultimate incorporates support for variable rate shading tier 2, along with ray tracing, mesh shaders, and sampler feedback. And to be fair to Intel, expecting ray tracing in an integrated part in 2020 was always a bit too much of an ask. But some additional progress would have been nice to see. Plus it puts DG1 in a bit of an odd spot, since it’s a discrete GPU without 12_2 functionality.
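For developers wondering which side of the 12_1/12_2 divide a given GPU falls on, the D3D12 runtime can report it directly. Below is a hedged sketch of the standard feature level query; `D3D_FEATURE_LEVEL_12_2` is only defined in more recent Windows SDK headers, and the `device` pointer is assumed to already exist.

```cpp
// Sketch: ask the D3D12 runtime for the highest feature level a device supports.
// Requires a Windows SDK recent enough to define D3D_FEATURE_LEVEL_12_2.
#include <d3d12.h>

D3D_FEATURE_LEVEL QueryMaxFeatureLevel(ID3D12Device* device)
{
    static const D3D_FEATURE_LEVEL candidates[] = {
        D3D_FEATURE_LEVEL_12_2,   // DirectX 12 Ultimate (Turing, RDNA2, Xe-HPG?)
        D3D_FEATURE_LEVEL_12_1,   // Gen9/Gen11/Xe-LP, Maxwell 2/Pascal, Vega/RDNA1
        D3D_FEATURE_LEVEL_12_0,
        D3D_FEATURE_LEVEL_11_0,
    };

    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels        = (UINT)(sizeof(candidates) / sizeof(candidates[0]));
    levels.pFeatureLevelsRequested = candidates;

    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                              &levels, sizeof(levels))))
        return levels.MaxSupportedFeatureLevel;

    return D3D_FEATURE_LEVEL_11_0; // conservative fallback if the query fails
}
```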
Comments
mode_13h - Thursday, August 13, 2020
As always, thanks for the deep coverage.

Not finished reading, but I already have one complaint:
> Gen11’s smallest wavefront width is 8 threads wide (SIMD8), so it can take multiple clock cycles to execute a single wavefront, with Intel interleaving multiple threads as a form of latency hiding.
Wow. Mixing 2 different definitions of "thread" in the same sentence? Please don't.
Last I checked Nvidia is the only one talking about SIMD lanes as if they're threads. In Intel's Gen 9 whitepaper, it uses "threads" in a manner equivalent to CPU threads, and they talk about SIMD lanes as SIMD lanes.
And speaking of Gen 9, they claim it has 7-way SMT. Did they ever specify this, for Gen 11? I don't recall seeing it in their Gen 11 whitepaper, which went into significantly less detail on the EUs than previous whitepapers.
mode_13h - Thursday, August 13, 2020
I guess your article could be self-consistent by replacing the second use of "thread" in that quoted sentence with "wavefront"?

Although, "wavefront" is an AMD term (Nvidia calls them "Warps"). However, Intel's slides suggest they still call them "threads".
Ryan Smith - Thursday, August 13, 2020
"I guess your article could be self-consistent by replacing the second use of "thread" in that quoted sentence with "wavefront"?"You are correct sir! That was supposed to be "wavefront".
And Intel tends to use "wave" in its literature, though I prefer to collapse it down to just wavefront to keep things reasonably consistent. We don't need 2 nearly-identical terms for the same thing.
mode_13h - Thursday, August 13, 2020
Cool. Thanks for the reply!

BTW, I don't mind the term "wavefront" - I said that more to point it out to those who might not know.
mode_13h - Thursday, August 13, 2020
IMO, the reason Nvidia has long called their Warp elements "threads" is so they can claim that each SIMD lane is a "core", to make their GPUs *sound* more impressive.

Since Volta finally fixed their per-lane IP register (which is basically just a fancy form of branch predication), there's almost a touch of truth in that characterization, and I'd finally agree that their ISA is more than just a straight-forward combination of SIMD + SMT.
xenol - Thursday, August 13, 2020
AMD feels more confusing. Their base unit is a "stream processor", which seems to suggest something larger than it really is. But a group of stream processors is called a Compute Unit, which seems to suggest something smaller than it really is.

Though looking at some of the programming literature for GPUs, I can see where the "thread" terminology comes from. So this looks more like a problem of someone coming up with their own language instead of the industry coming together to standardize on it. However, given that NVIDIA, AMD, and Intel have their own way of doing things, it may not be possible to do that, and for the sake of clarity, having their own terminology is more or less correct.
mode_13h - Thursday, August 13, 2020
Since Nvidia's Fermi and AMD's GCN, their architectures basically amount to SIMD + SMT. I'm not sure exactly when Intel added SMT.

Anyway, I wouldn't characterize their architectures as fundamentally different. Intel is traditionally the most distinct, among the three.
jim bone - Friday, August 14, 2020
recent editions of Hennessy and Patterson have a nice table mapping the CPU terminology to nvidia’s GPU terminology:

https://books.google.ca/books?id=cM8mDwAAQBAJ&...
jim bone - Friday, August 14, 2020
and yes for reasons nvidia calls a vertical slice of simd instructions a thread

kpx86 - Thursday, August 13, 2020
I believe the SW libraries like DirectX and OpenGL use threads this way.

From MSFT website: The maximum number of threads is limited to D3D11_CS_4_X_THREAD_GROUP_MAX_THREADS_PER_GROUP (768) per group.