ASUS ENGTX480 (nVidia Fermi)

Written by Michael Schuette
Jul 19, 2010 at 09:21 AM
Page 1 of 23
So I have been looking at Fermi for the last 60 years and finally the little green men, this time on sabbatical in Santa Clara, came out with it. Not a sex toy this time, though undeniably sexy, it is still somewhat different from what I anticipated half a century ago. No, I am lying; I am not that old yet. To get back to the topic at hand, we are looking at nVidia’s Fermi graphics processor / general-purpose graphics processing unit and, truth be told, we have been hearing about it almost as long as about the Fermi paradox. But it is finally here.
The first substantiated rumors and semi-facts about nVidia’s Fermi architecture, a.k.a. the GF100 GPU, surfaced during the summer of 2009 and, at least according to the PR machinery behind it, it was going to be like nothing that had been there before. And then silence struck again. There were a few press briefings to kindle the fires while AMD released their 5000 series and unleashed performance like nothing that had been there before. And then there was, again, nothing from nVidia.

Arguably, the difficulties of manufacturing ICs increase exponentially with the complexity of the design and with die size. Add a new, unproven fab process and you have a recipe for some major handicaps. It would be lopsided to claim that nVidia was the only company affected by TSMC’s difficulties in delivering sufficient yields on its 40 nm process. On the other hand, at 500 mm² die size and 3 billion transistors, the GF100 GP-GPU is just a tad larger and more complex than the RV870 Cypress chip, sporting a measly 2.15 billion transistors on an area of 334 mm². Whatever the contributing factors were, Fermi has been late to the show and, after it finally debuted in limited quantities in the middle of spring, there is still no full version of the GF100 taking advantage of all processing units. Instead, there are two scaled-down versions, namely the GTX480 and the GTX470. Before going into details on what is missing where, let’s take a quick look at the architecture.
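To put the die-size argument into numbers, here is a minimal sketch that compares the two chips using the figures quoted above and a simple Poisson yield model, in which the fraction of defect-free dies falls off exponentially with area. The defect density is a purely hypothetical value picked for illustration, not a TSMC figure, and the function names are made up for this example.

```python
import math

# Die figures as quoted in the article.
GF100 = {"area_mm2": 500.0, "transistors": 3.0e9}
RV870 = {"area_mm2": 334.0, "transistors": 2.15e9}

# ASSUMPTION: hypothetical defect density in defects per mm^2.
# Real 40 nm defect densities are not given in the article.
ASSUMED_DEFECTS_PER_MM2 = 0.003


def density_mtr_per_mm2(chip):
    """Transistor density in millions of transistors per mm^2."""
    return chip["transistors"] / chip["area_mm2"] / 1e6


def poisson_yield(area_mm2, defects_per_mm2=ASSUMED_DEFECTS_PER_MM2):
    """Fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-area_mm2 * defects_per_mm2)


if __name__ == "__main__":
    for name, chip in (("GF100", GF100), ("RV870", RV870)):
        print(
            f"{name}: {density_mtr_per_mm2(chip):.2f} Mtransistors/mm^2, "
            f"modeled yield {poisson_yield(chip['area_mm2']):.1%}"
        )
    ratio = GF100["area_mm2"] / RV870["area_mm2"]
    print(f"GF100 die area is {ratio:.2f}x that of RV870")
```

Whatever defect density is plugged in, the larger die always comes out with the lower modeled yield, which is the point of the argument. The absolute percentages mean nothing without real process data.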
In short, the GF100 chip is organized into four quadrants or graphics processing clusters (GPCs), each of which features four Fermi Streaming Multiprocessors (SMs) for a total of 16 SMs. The four quadrants are not obvious from the functional diagrams but can be appreciated when looking at a die shot.
Functionally, the quadrants are primarily defined by one discrete raster engine per GPC, which performs edge setup, rasterization and z-culling. Otherwise, we have 16 fully interchangeable SMs, each of which features 32 CUDA cores (512 in total), supplemented by 16 load/store units and four special function units (SFUs). For reference, here is a quick recap of some of the stats and numbers of the Fermi GPU in comparison to the older generations of nVidia GPUs, that is, G80 and GT200.
It is a bit difficult to compare the GF100 to the older generations just on the basis of numbers, since there are more fundamental changes that heavily impact the functionality and capabilities of the GPU. From a hierarchical cache organization to a HyperThreading equivalent and ECC extended to the local frame buffer, the changes in architecture are probably the biggest since the move from the GeForce2 to the GeForce4 MX.
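As a quick sanity check on the unit counts described above, the following sketch tallies the execution resources of a GF100 from its GPC/SM hierarchy. It assumes 32 CUDA cores per SM, the figure given in nVidia's Fermi documentation for a 512-core full chip. The class and field names are illustrative only and do not correspond to any nVidia API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FermiConfig:
    """Per-chip layout of a full GF100 (illustrative names)."""
    gpcs: int = 4                      # graphics processing clusters
    sms_per_gpc: int = 4               # streaming multiprocessors per GPC
    cuda_cores_per_sm: int = 32        # ASSUMPTION: per nVidia's Fermi docs
    load_store_units_per_sm: int = 16
    sfus_per_sm: int = 4

    @property
    def total_sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    def totals(self, enabled_sms=None) -> dict:
        """Unit totals for a chip with `enabled_sms` active SMs.

        Defaults to the full chip; pass a smaller number to model a
        scaled-down part with some SMs disabled.
        """
        sms = self.total_sms if enabled_sms is None else enabled_sms
        if not 0 <= sms <= self.total_sms:
            raise ValueError(f"enabled_sms must be in 0..{self.total_sms}")
        return {
            "sms": sms,
            "cuda_cores": sms * self.cuda_cores_per_sm,
            "load_store_units": sms * self.load_store_units_per_sm,
            "sfus": sms * self.sfus_per_sm,
        }


if __name__ == "__main__":
    print(FermiConfig().totals())
```

Since the SMs are interchangeable, disabling one simply removes its share of every unit type, so the same function covers scaled-down parts by passing a smaller `enabled_sms`.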
Last Updated ( Jul 26, 2010 at 12:20 AM )