LOSTCIRCUITS
Intel's SkullTrail Extreme Platform: Playground of the Titans
(Author: Michael Schuette, February 10, 2008)
Summary
A while back we looked at Intel's V8 system, a kind of proof of what is technically feasible using off-the-shelf components. Even though the performance in truly multithreaded applications was impressive, the gaming performance was somewhat lackluster, primarily because of the use of FBDIMMs and only a single PCIe graphics slot. On the other hand, the platform was never meant to become the backbone of a gaming rig in the first place. Then came the QuadFX platform with a promising architecture, but it failed to impress compared to even AMD's in-house competition. Meanwhile, there is a new generation of CPUs out there based on Intel's P1266 45 nm process and with a 12 MB L2 cache per package. There is also the recurrent rumor about Intel supporting SLI, or rather nVidia allowing Intel to support it in view of the AMD-ATI marriage.
Well, something's happening here, what it is ain't exactly clear ...
This time we are not having the carrot juice. Rather, we are embarking on a rather grisly adventure on the SkullTrail, looking at the architecture and the core assignment of different applications and then trying to come up with some halfway-educated guesses as to why some software behaves the way it does once it is confronted with eight individual cores and caches and two largely independent memory subsystems. That part actually turned out to be easier than we thought.
The Ultimate Dream Machine
Imagine for a moment that there were a computer that could do everything you want in real time. Not just word processing and scrolling through data sheets, but a machine that can do real-time rendering and ray tracing in games for photorealistic graphics, and process audio-visual content at a rate much faster than what we have now while filtering and sorting all kinds of data in the background. Arguably, such systems are going to be in a different price category than the average office setup, but let’s still think about it for a moment.
Any configuration of this kind needs to be able to execute massively parallel processing. On the level of CPUs, symmetric multiprocessing (the use of several identical processors in parallel, as opposed to one main processor in tandem with a co-processor) went mainstream some two years ago. In the memory subsystem, multi-channel data access has also become a commodity in the past years, starting with nVidia’s nForce controller and followed by Intel’s Granite Bay and AMD’s dual-channel memory controllers on the Athlon64 microprocessor.
In the world of graphics, parallel processing was heralded a decade ago by 3dfx with their Voodoo architecture and something called Scan Line Interleaving, or SLI. Voodoo has faded and hardly anybody remembers 3dfx anymore, but SLI is still around, even if it is only the acronym that prevails. The meaning of SLI has changed over the years to “Scalable Link Interface”, designating an nVidia-proprietary technology that only runs on nVidia platforms. Truth be told, as long as there is an existing PCIe interconnect infrastructure available, the compatibility or functionality of SLI with a given motherboard is primarily a matter of the driver recognizing a specific code in the BIOS of the motherboard. In other words, what it comes down to is licensing legalities, with pseudo-technical grounds used to explain the lockout.

Note the hybrid chipset consisting of Intel and nVidia logic parts. The final line from "Casablanca" comes to mind ...
It is somewhat disappointing not to see widespread acceptance of SLI. After all, with the exception of the latest dual-GPU AMD graphics cards, nVidia has ruled the realm of high-end graphics for quite some time with their 8000 series and, even before that, with the 7900 series, even in single-card configurations. High-end SLI setups were, and still are, uncontested leaders, with the small but important restriction that they only run on nVidia chipset-based boards. Unfortunately, the latest iterations of nVidia chipsets have some flaws; for example, transaction-concurrent PCI busmaster snooping has been disabled in chipset generations past the nForce5, and this causes significant performance issues in certain applications. And then there is also the issue of Intel chipsets simply being “trusted” as the best in the business.
In a nutshell, at the top of the wish list would be a configuration based on an Intel chipset, supporting two CPUs with four cores each that are manufactured using Intel’s P1266 45 nm process with hafnium-based transistors, and natively supporting SLI without hacked drivers. One possibility is of course a combination of Intel and nVidia parts. In other words, "as a sign of good will" Intel can use a lane duplicator or port splitter like nVidia's nForce100 chip to split the two x16 links originating from the "Seaburg" 5400 MCH into four x16 connectors. This way, at least according to the letter, SLI will run off nVidia hardware and nobody can claim a precedent later. On the memory side of things would be DDR3, scalable up to 16 GB; then we’d like to see an abundance of SATA ports and the obligatory Gigabit Ethernet in its wireless manifestation.
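To put rough numbers on the port-splitting idea, here is a minimal Python sketch of the fan-out arithmetic. It is not a model of the nForce100 itself: the class name `PcieFanout`, its fields, and the 250 MB/s per lane per direction figure (the value commonly quoted for first-generation PCIe) are all assumptions made for illustration, and the per-lane value should be swapped for whatever link generation is actually in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcieFanout:
    """Hypothetical model: a few upstream links fanned out to more slots."""

    upstream_links: int            # links leaving the MCH (two in the text)
    slots: int                     # physical connectors after the splitter (four)
    lanes_per_link: int = 16       # every link and slot is x16 wide
    mb_s_per_lane: float = 250.0   # ASSUMPTION: first-gen PCIe, per direction

    def upstream_bandwidth(self) -> float:
        """Total bandwidth available towards the MCH, in MB/s per direction."""
        return self.upstream_links * self.lanes_per_link * self.mb_s_per_lane

    def slot_peak(self) -> float:
        """Bandwidth one slot can reach when it has its upstream link alone."""
        return self.lanes_per_link * self.mb_s_per_lane

    def oversubscription(self) -> float:
        """Ratio of downstream lanes to upstream lanes."""
        return self.slots / self.upstream_links

    def slot_share_when_all_busy(self) -> float:
        """Even split of upstream bandwidth if every slot transfers at once."""
        return min(self.slot_peak(), self.upstream_bandwidth() / self.slots)


fanout = PcieFanout(upstream_links=2, slots=4)
print(fanout.oversubscription())          # downstream lanes per upstream lane
print(fanout.slot_peak())                 # MB/s, one card active per splitter
print(fanout.slot_share_when_all_busy())  # MB/s, all four cards active
```

The point of the sketch is only that a splitter adds connectors, not bandwidth: traffic between cards and the MCH still funnels through the two original links, so each card gets its full share only while its neighbor on the same splitter is idle.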
Well, we have to stay reasonable….
Courtesy of the connector size and built-in port replicators, SATA configurations are at this point primarily limited by the case size and the number of drive bays therein. The situation is different for DDR3: the limitation is the number of ranks that each address and command unit can drive, which, once chip densities are factored in, translates directly into the achievable system memory density. For a server-class amount of system memory, therefore, DDR3 has to be thrown out of the race unless we consider FBDIMMs based on DDR3 technology. For all practical purposes this is not a viable solution, especially given the power density of the buffer chip, which would turn into a supernova if run at current DDR3 frequencies. In other words, we’ll have to content ourselves with DDR2, which is no big deal if the bandwidth is doubled by implementing two separate host bus interfaces with independent memory controllers on each back end. Which brings us back to the FBDIMM concept with its separate branches and channels.
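The link between drivable ranks, chip density and total memory can be sketched in a few lines of Python. The function names and every number in the example calls are illustrative assumptions rather than specifications of any particular board or memory controller; the only fixed relationship used is that a rank is built from enough chips to fill a 64-bit data bus (ECC chips ignored).

```python
def rank_capacity_gib(chip_density_gbit: float, chip_width_bits: int,
                      bus_width_bits: int = 64) -> float:
    """Capacity of one rank: enough chips to fill the data bus, ECC ignored."""
    chips_per_rank = bus_width_bits // chip_width_bits
    return chips_per_rank * chip_density_gbit / 8  # Gbit -> GiB


def system_capacity_gib(channels: int, dimms_per_channel: int,
                        ranks_per_dimm: int, chip_density_gbit: float,
                        chip_width_bits: int) -> float:
    """Total memory = number of drivable ranks times capacity per rank."""
    ranks = channels * dimms_per_channel * ranks_per_dimm
    return ranks * rank_capacity_gib(chip_density_gbit, chip_width_bits)


# ASSUMED example: an unbuffered setup limited to 8 ranks in total
# (2 channels x 2 DIMMs x 2 ranks) built from 1 Gbit x8 chips.
unbuffered = system_capacity_gib(2, 2, 2, chip_density_gbit=1, chip_width_bits=8)

# ASSUMED example: a buffered setup that can address twice as many ranks
# (4 channels x 2 DIMMs x 2 ranks) with the same chips.
buffered = system_capacity_gib(4, 2, 2, chip_density_gbit=1, chip_width_bits=8)

print(unbuffered, buffered)
```

With chip density held constant, the only way to raise the total is to drive more ranks, which is exactly where a buffered design with more channels has the advantage in this toy calculation.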