ASUS ENGTX460 Direct Cu
Written by Michael Schuette   
Dec 12, 2010 at 06:34 AM

The story of Fermi, and by extension of nVidia, has been one of the more interesting sagas of the last year, almost paralleling the Ring of the Nibelung in its epic proportions and featuring Jen-Hsun Huang as Siegfried the dragon slayer. Contrary to medieval traditions, though, Fermi was originally presented as a highly scalable, modular design following a top-to-bottom approach: the high-end models are introduced first to set expectations and create demand for scaled-back versions at a lower price. History further predicted that the follow-up to the GF100 design would be the result of taking an X-Acto knife and simply dividing the die into equal halves. At a very high level, the GF104, which is the subject of this article, does seem to follow this rather simplistic model. However, it only takes a second look to see a number of fundamental changes to the GF100 design that dramatically improve the price/performance ratio, at least in theory. The theory in this case is that the GF100 is a somewhat unbalanced design, over-engineered in some areas and short on resources in others, as we already mentioned in our review of the ASUS ENGTX 480.


Under the Hood: GF100 vs. GF104

To make a long story short, the GeForce GF100 contains four identical graphics processing clusters (GPC), each containing a raster engine and four streaming multiprocessors (SMs).

The GF104, on the other hand, contains only two GPCs with four SMs each. So far, the simple model of divide and conquer seems to hold.

If we look a bit more closely, though, the internal organization of the SMs is quite different between the two designs. Whereas the GF100 uses a 4x8 matrix of CUDA cores per SM, for a total of 32, the GF104 increases the number of CUDA cores by 50% by going to a 6x8 matrix, for a total of 48 CUDA cores per SM.

At the same time, the number of special function units (SFUs) has doubled from 4 to 8, and the same holds for the texture units. This is admittedly a simplistic view based purely on a quantitative assessment of resources, but just from the numbers of the different building blocks one can infer that at least the geometry processing and texturing performance of a GF104 streaming multiprocessor is about twice that of its GF100 counterpart. In the grand picture, therefore, with half as many SMs on the GF104, the two chips should perform almost on par when it comes to complex geometry setup and texture fill. If we look carefully at the front end of the SMs, we also see that there are now four dispatch units as opposed to just two in the GF100 design. This is probably the most important change in the entire design and its underlying philosophy, and we'll dive into it some more on the next page, since it opens the door for the GPU into the exciting world of superscalar execution.
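For readers who like to see the arithmetic spelled out, the short Python sketch below tallies the per-chip totals from the per-SM figures quoted above. It is purely a bookkeeping aid: the `Chip` class and the `summarize` helper are names made up for this illustration, and the numbers assume the full die as described in this article, with no units disabled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical container for the block counts quoted in the text."""
    name: str
    gpcs: int             # graphics processing clusters per chip
    sms_per_gpc: int      # streaming multiprocessors per GPC
    cores_per_sm: int     # CUDA cores per SM
    sfus_per_sm: int      # special function units per SM
    tex_per_sm: int       # texture units per SM
    dispatch_per_sm: int  # dispatch units per SM

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc


def summarize(chip: Chip) -> dict:
    """Multiply the per-SM counts by the number of SMs on the chip."""
    return {
        "sms": chip.sms,
        "cuda_cores": chip.sms * chip.cores_per_sm,
        "sfus": chip.sms * chip.sfus_per_sm,
        "texture_units": chip.sms * chip.tex_per_sm,
        "dispatch_units": chip.sms * chip.dispatch_per_sm,
    }


# Full-die figures as given in the article (assumption: nothing disabled).
GF100 = Chip("GF100", gpcs=4, sms_per_gpc=4, cores_per_sm=32,
             sfus_per_sm=4, tex_per_sm=4, dispatch_per_sm=2)
GF104 = Chip("GF104", gpcs=2, sms_per_gpc=4, cores_per_sm=48,
             sfus_per_sm=8, tex_per_sm=8, dispatch_per_sm=4)

if __name__ == "__main__":
    for chip in (GF100, GF104):
        print(chip.name, summarize(chip))
```

Under these assumptions the GF104 ends up with half the SMs and three quarters of the CUDA cores of the GF100, but the same total number of SFUs, texture units and dispatch units, which is the basis for the "almost on par" estimate above. Raw unit counts say nothing about clocks or memory bandwidth, so this is only a first-order comparison.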


Last Updated ( Dec 20, 2010 at 09:04 AM )