Nvidia Details RTX 30-Series Core Enhancements

2020-09-05

Nvidia revealed the RTX 30-series on September 1, celebrating the 21st anniversary of its first GPU, the GeForce 256. The features and specifications certainly look impressive, as you can read in our separate breakdowns. However, we ended up with quite a few questions, and Nvidia provided plenty of additional information that we're summarizing here. We'll be adding much of this to our main Ampere architecture hub, so this article covers just the new details.

(Image credit: Nvidia)

First, let's talk about the Ampere streaming multiprocessor (SM). The biggest change for gaming is likely the doubling of FP32 performance. Each SM now has two FP32 clusters, providing up to 128 FMA (fused multiply-add) operations per cycle. Half of the cores handle both FP32 and INT, while the other half are FP32 only. That might sound like a potential problem, but generally speaking (particularly for gaming workloads) FP32 is the most important, INT less so. It's a balanced approach to boost overall performance without bloating the core too much.
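To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It uses an idealized model that is our assumption, not Nvidia's: each SM has one FP32-only path and one FP32-or-INT path of 64 lanes each, scheduling is perfect, and an FMA counts as two floating-point operations. The function names are hypothetical, and the SM count and clock speed in the usage lines are illustrative inputs rather than figures from this article.

```python
def peak_fp32_tflops(sm_count, clock_ghz, fma_per_sm_per_cycle=128):
    """Theoretical peak FP32 rate; each FMA counts as two FLOPs."""
    flops_per_cycle = sm_count * fma_per_sm_per_cycle * 2
    return flops_per_cycle * clock_ghz / 1000.0  # GFLOPS -> TFLOPS


def sm_ops_per_cycle(int_fraction, lanes_per_path=64):
    """Idealized ops per cycle for one SM, given the share of INT instructions.

    Assumption: INT work can only use the shared FP32/INT path, while FP32
    work can use both paths. Real instruction scheduling is more complicated.
    """
    if not 0.0 <= int_fraction <= 1.0:
        raise ValueError("int_fraction must be between 0 and 1")
    total_lanes = 2 * lanes_per_path
    if int_fraction == 0.0:
        return float(total_lanes)
    # INT throughput is capped by one path; total throughput by both paths.
    return min(float(total_lanes), lanes_per_path / int_fraction)


if __name__ == "__main__":
    # Hypothetical card: 68 SMs running at 1.71 GHz.
    print(round(peak_fp32_tflops(68, 1.71), 2), "TFLOPS peak")
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"INT share {share:.2f}: {sm_ops_per_cycle(share):.1f} ops/cycle")
```

In this simplified model the SM holds its full 128 operations per cycle until more than half of the instruction mix is integer work, which is one way to read the "balanced approach" described above.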

To help feed the beast (TM!), the data path was doubled, along with L1 bandwidth. L1 capacity is also 33% larger, with twice the partition size.

Another change is that Ampere can run work through the CUDA cores, RT cores, and Tensor cores simultaneously. This allows a game to run DLSS to upscale one frame while doing the CUDA and RT calculations for the next frame, cutting down on rendering time and improving overall performance.
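The benefit of that overlap can be sketched as a simple two-stage pipeline. The Python below is a toy timing model under our own assumptions: the two stages never contend for shared resources, stage times are constant, and the millisecond values in the usage lines are made up for illustration. The function names are hypothetical.

```python
def sequential_time_ms(frames, render_ms, upscale_ms):
    """Each frame finishes shading and ray tracing before upscaling starts."""
    return frames * (render_ms + upscale_ms)


def overlapped_time_ms(frames, render_ms, upscale_ms):
    """Upscaling of frame N runs while frame N+1 is being rendered."""
    if frames <= 0:
        return 0
    # Fill the pipeline once, then each extra frame costs only the slower stage.
    return render_ms + upscale_ms + (frames - 1) * max(render_ms, upscale_ms)


if __name__ == "__main__":
    # Made-up stage times: 10 ms of CUDA + RT work, 2 ms of Tensor-core upscaling.
    print(sequential_time_ms(100, 10, 2))  # 1200
    print(overlapped_time_ms(100, 10, 2))  # 1002
```

With these invented numbers, the upscaling cost is almost entirely hidden behind the next frame's rendering. Real gains depend on how much the three core types compete for bandwidth, power, and scheduling.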

(Image credit: Nvidia)

For the RT cores, Ampere also added functionality to interpolate triangle position. This is particularly important for things like motion blur, where not every triangle used to render a scene is at the same position or time. I'm still not a huge fan of motion blur in games, even if it might be more realistic looking, but whatever. This change potentially speeds up ray traversal by 8X, so it's an important addition.
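As a rough illustration of what interpolating triangle position means, the Python sketch below blends a triangle between two poses at the time a ray is cast. Nvidia only says the RT cores interpolate triangle position, so the linear blend, the two-keyframe setup, the normalized time range of 0 to 1, and the function names are all our assumptions. In hardware this happens during traversal rather than in shader code.

```python
def lerp(a, b, t):
    """Linearly interpolate between two points of equal dimension."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def triangle_at_time(tri_start, tri_end, t):
    """Return the triangle's three vertices at normalized time t in [0, 1].

    tri_start and tri_end hold the vertices at shutter open and shutter close.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return tuple(lerp(v0, v1, t) for v0, v1 in zip(tri_start, tri_end))


if __name__ == "__main__":
    # A triangle that slides two units along the x axis during the exposure.
    start = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
    end = ((2, 0, 0), (3, 0, 0), (2, 1, 0))
    # A ray cast halfway through the exposure sees the midpoint pose.
    print(triangle_at_time(start, end, 0.5))
```

Each ray carries its own timestamp, so different rays in the same frame test against slightly different triangle positions, and averaging them produces the blur.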

That's it for the truly new information. Much of the remainder consists of previously known details, but we've provided the full slide deck below for those who want to see more. There are additional details on the performance of Wolfenstein: Youngblood, as well as RTX IO (which we've covered elsewhere).

(Slide deck gallery, 50 images. Image credit: Nvidia)