ASUS ENGTX480 (nVidia Fermi)
Written by Michael Schuette   
Jul 19, 2010 at 09:21 AM

Growing up in the last millennium and reading a lot of science fiction, not to mention going on the daily quests for UFOs, has the advantage that quite a few of the new names popping up in the tech world are, in fact, very old acquaintances. The Little Green Men from Mars, the Arecibo message and finally the Fermi paradox were things that any geek had to be familiar with. I mean, you didn’t even have to be a geek; high school alone already qualified.

So I have been looking at Fermi for the last 60 years and finally the little green men, this time on sabbatical in Santa Clara, came out with it. Not a sex toy this time, though undeniably sexy, it is still somewhat different from what I anticipated half a century ago – no, I am lying, I am not that old yet.

To get back to the topic at hand, we are looking at nVidia’s Fermi graphics processor / general purpose graphics processing unit and, truth be told, we have been hearing about it almost as long as about the Fermi paradox. But it is finally here.

The first substantiated rumors and semi-facts about nVidia’s Fermi architecture, a.k.a. the GF100 GPU, surfaced during the summer of 2009 and, at least according to the PR machinery behind it, it was going to be nothing like anything that had been there before. And then, silence struck again. There were a few press briefings to kindle the fires while AMD released their 5000 series and unleashed performance like nothing that had been there before. And then there was, again, nothing from nVidia.

Arguably, the difficulties of manufacturing ICs increase exponentially with complexity of the design and with die size. Add a new, unproven fab process and there is a recipe for some major handicaps. It would be lopsided to claim that nVidia was the only company affected by TSMC's difficulties in delivering sufficient yields on its 40 nm process, but on the other hand, the GF100 GP-GPU is, at a die size of 500 mm² and 3 billion transistors, just a tad larger and more complex than the RV870 Cypress chip, which sports a measly 2.15 billion transistors on an area of 334 mm².
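To put the two dies side by side, the figures quoted above can be turned into a few ratios. The sketch below uses only the transistor counts and die areas from the text. The Poisson yield model is a common textbook approximation, and the defect density `ASSUMED_DEFECTS_PER_CM2` is a made-up value chosen purely for illustration, not TSMC data.

```python
import math

# Die figures as quoted in the text (approximate values).
CHIPS = {
    "GF100": {"transistors": 3.0e9, "area_mm2": 500.0},
    "RV870": {"transistors": 2.15e9, "area_mm2": 334.0},
}

# ASSUMPTION: hypothetical defect density, for illustration only.
ASSUMED_DEFECTS_PER_CM2 = 0.5


def density_mtr_per_mm2(chip):
    """Transistor density in millions of transistors per mm^2."""
    return chip["transistors"] / chip["area_mm2"] / 1e6


def poisson_yield(area_mm2, defects_per_cm2):
    """Textbook Poisson yield model: Y = exp(-A * D0), with A in cm^2."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


area_ratio = CHIPS["GF100"]["area_mm2"] / CHIPS["RV870"]["area_mm2"]
transistor_ratio = CHIPS["GF100"]["transistors"] / CHIPS["RV870"]["transistors"]

for name, chip in CHIPS.items():
    modeled = poisson_yield(chip["area_mm2"], ASSUMED_DEFECTS_PER_CM2)
    print(f"{name}: {density_mtr_per_mm2(chip):.2f} Mtransistors/mm^2, "
          f"modeled yield {modeled:.1%} (assumed defect density)")

print(f"GF100 vs RV870: {area_ratio:.2f}x the area, "
      f"{transistor_ratio:.2f}x the transistors")
```

The densities come out close (6.0 versus roughly 6.4 million transistors per mm²), so the gap between the two chips is mostly one of sheer area: about 1.5 times the silicon for about 1.4 times the transistors. Under any fixed defect density, the exponential in the model penalizes the larger die disproportionately, which is the point the paragraph makes; the absolute yield percentages printed here mean nothing beyond that.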

Whatever the contributing factors were, Fermi has been late to the show and, after it finally debuted in limited quantities in the middle of spring, there still is no full version of the GF100 that takes advantage of all processing units. Instead, there are two scaled-down versions, namely the GTX480 and the GTX470. Before going into details on what is missing where, let’s take a quick look at the architecture.

In short, the GF100 chip is organized into four quadrants or graphics processing clusters (GPCs), each of which features four Fermi Streaming Multiprocessors (SMs) for a total of 16 SMs. The four quadrants are not obvious from the functional diagrams but can be appreciated when looking at a die shot.

Functionally, the quadrants are primarily defined on the basis of one discrete raster engine per GPC, performing edge setup, rasterization and z-culling. Otherwise, we have 16 totally interchangeable SMs, each of which features 32 CUDA cores, supplemented by 16 Load/Store units and four special function units (SFUs).
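The hierarchy described above multiplies out as follows. This is a minimal sketch rather than anything taken from nVidia's documentation: the constant and function names are invented for illustration, and the per-SM core count is taken as 32, which is what a 512-core total spread over 16 SMs implies.

```python
# Layout of a full GF100 as described in the text.
GPCS_PER_CHIP = 4           # graphics processing clusters ("quadrants")
SMS_PER_GPC = 4             # streaming multiprocessors per GPC
RASTER_ENGINES_PER_GPC = 1  # edge setup, rasterization, z-culling

# Per-SM resources. ASSUMPTION: 32 cores per SM (512 cores / 16 SMs).
CORES_PER_SM = 32
LOAD_STORE_UNITS_PER_SM = 16
SFUS_PER_SM = 4

MAX_SMS = GPCS_PER_CHIP * SMS_PER_GPC


def gf100_totals(active_sms=MAX_SMS):
    """Chip-wide unit counts for a part with `active_sms` SMs enabled."""
    if not 0 <= active_sms <= MAX_SMS:
        raise ValueError(f"active_sms must be between 0 and {MAX_SMS}")
    return {
        "sms": active_sms,
        "cuda_cores": active_sms * CORES_PER_SM,
        "load_store_units": active_sms * LOAD_STORE_UNITS_PER_SM,
        "sfus": active_sms * SFUS_PER_SM,
        # Raster engines belong to the GPC, not to individual SMs.
        "raster_engines": GPCS_PER_CHIP * RASTER_ENGINES_PER_GPC,
    }


if __name__ == "__main__":
    for label, count in gf100_totals().items():
        print(f"{label}: {count}")
```

Because the SMs are interchangeable, a scaled-down part can be modeled simply by passing a smaller `active_sms`; with one SM disabled, for example, the core count drops by 32.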

For reference, here is a quick recap of some of the stats and numbers of the Fermi GPU in comparison to the older generations of nVidia GPUs, that is, G80 and GT200.

GPU | G80 | GT200 | GF100
Transistors | 681 million | 1.4 billion | 3.0 billion
CUDA Cores | 128 | 240 | 512
Double Precision Floating Point Capability | None | 30 FMA ops / clock | 256 FMA ops / clock
Single Precision Floating Point Capability | 128 MAD ops / clock | 240 MAD ops / clock | 512 FMA ops / clock
Special Function Units (SFUs) / SM | 2 | 2 | 4
Warp schedulers (per SM) | 1 | 1 | 2
Shared Memory (per SM) | 16 KB | 16 KB | Configurable 48 KB or 16 KB
L1 Cache (per SM) | None | None | Configurable 16 KB or 48 KB
L2 Cache | None | None | 768 KB
ECC Memory Support | No | No | Yes
Concurrent Kernels | No | No | Up to 16
Load/Store Address Width | 32-bit | 32-bit | 64-bit
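A couple of the per-clock numbers in the table are easier to judge as ratios. The sketch below is illustrative only: the dictionary mirrors the table rows, the helper names are invented here, and `clock_mhz` is a free parameter because no clock speeds are quoted in this section. Each MAD or FMA is counted as two floating point operations, which is the usual convention for peak figures.

```python
# Per-clock figures taken from the comparison table.
GPUS = {
    "G80":   {"transistors": 681e6, "cores": 128, "dp_ops": 0,   "sp_ops": 128},
    "GT200": {"transistors": 1.4e9, "cores": 240, "dp_ops": 30,  "sp_ops": 240},
    "GF100": {"transistors": 3.0e9, "cores": 512, "dp_ops": 256, "sp_ops": 512},
}


def dp_to_sp_ratio(gpu):
    """Double precision throughput as a fraction of single precision."""
    return gpu["dp_ops"] / gpu["sp_ops"]


def peak_gflops(ops_per_clock, clock_mhz):
    """Theoretical peak GFLOPS, counting each MAD/FMA as 2 flops.

    ASSUMPTION: clock_mhz is supplied by the caller; the table gives
    per-clock rates only.
    """
    return ops_per_clock * 2 * clock_mhz / 1000.0


if __name__ == "__main__":
    for name, gpu in GPUS.items():
        print(f"{name}: DP/SP ratio {dp_to_sp_ratio(gpu):.3f}, "
              f"{gpu['transistors'] / gpu['cores'] / 1e6:.2f} Mtransistors per core")
```

On these numbers, double precision goes from one eighth of the single precision rate on GT200 to one half on the full GF100, which is a far bigger jump than the doubling of the core count alone would suggest.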

It is a bit difficult to compare the GF100 to the older generations just on the basis of numbers since there are more fundamental changes that heavily impact functionality and capabilities of the GPU. From a hierarchical cache organization to a HyperThreading equivalent and ECC extended to the local frame buffer, the changes in architecture are probably the biggest since the move from the GeForce2 to the GeForce4 MX.


Last Updated ( Jul 26, 2010 at 12:20 AM )