LOSTCIRCUITS
AMD's Quad FX Platform: What's in a 4x4?
(Author: MS, January 21, 2007)
Memory Performance
We have already touched on the issues with the BIOS and its interaction with different operating systems. Memory performance is one of the system parameters most affected by this interaction. In Windows XP-32 Professional (SP2 installed), the most we could squeeze out of the memory system was 6,500 MB/s. Switching to a different drive with Vista-64 as the OS more than doubled the memory bandwidth, as long as node interleaving was set to "disabled" in the BIOS, which, counterintuitively, is the setting that enables NUMA mode.
Memory bandwidth (**: Vista-64, NUMA enabled)
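The BIOS setting is easier to follow with a simplified model of where memory accesses land in each mode. The Python sketch below assumes two nodes, a 4 KiB interleave granularity and 2 GiB of memory per node. These are illustrative values and are not taken from Quad FX documentation. It also assumes a NUMA-aware OS that places a thread's buffer in its own node's address range. With node interleaving enabled, half of a CPU's streaming accesses go to the far node. With interleaving disabled, they all stay local.

```python
# Hypothetical sketch: which memory node serves a physical address in the
# two BIOS modes. PAGE (interleave granularity) and NODE_SIZE are assumed,
# illustrative values, not Quad FX specifics.

PAGE = 4096            # assumed interleave granularity in bytes
NODE_SIZE = 2 << 30    # assumed memory per node: 2 GiB


def node_interleaved(addr: int, nodes: int = 2, granularity: int = PAGE) -> int:
    """Node interleaving enabled: consecutive blocks alternate between nodes."""
    return (addr // granularity) % nodes


def node_numa(addr: int, node_size: int = NODE_SIZE) -> int:
    """Node interleaving disabled (NUMA): each node owns one contiguous range."""
    return addr // node_size


def local_fraction(addrs, home_node: int, mapper) -> float:
    """Share of accesses from a CPU on home_node served by its near node."""
    hits = sum(1 for a in addrs if mapper(a) == home_node)
    return hits / len(addrs)


if __name__ == "__main__":
    # A buffer located in node 0's range, streamed in 64-byte cache-line steps
    # by a CPU on node 0.
    buf = range(0, 64 * PAGE, 64)
    print(local_fraction(buf, 0, node_interleaved))  # 0.5
    print(local_fraction(buf, 0, node_numa))         # 1.0
```

This only models the address mapping. It does not attempt to reproduce the measured bandwidth figures.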
Memory Latency
Since Cachemem appears to be the only benchmark that gives reliable and reproducible results across the board, we are still using it and will continue to do so. Here we compare the FX-62 (solid) against the FX-74 (transparent). Both CPUs are built on a 90 nm process and share the same cache architecture, which shows in their identical cache access latencies. As soon as system memory is accessed, however, the Quad FX platform incurs much higher access latencies. This may partly reflect the CMD rate or other memory timing parameters not being applied correctly by the BIOS, but a major contributor is that the two CPUs share memory through near and far nodes, which is essentially the gist of any NUMA architecture. Cachemem was run only under WinXP-32, because this version does not work under Vista-64, and setting node interleaving to enabled or disabled made no difference to the results.
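The near/far effect can be expressed as a weighted average of the two access latencies. The sketch below is a hypothetical model and does not describe how Cachemem measures latency. The 60 ns and 100 ns figures are placeholders, not values read from our charts.

```python
# Hypothetical model: average memory latency seen by one CPU when a fraction
# p_near of its accesses is served by the near (local) node and the rest by
# the far (remote) node. Example latencies are placeholders, not measurements.


def effective_latency(p_near: float, near_ns: float, far_ns: float) -> float:
    """Weighted mean of near-node and far-node access latency, in ns."""
    if not 0.0 <= p_near <= 1.0:
        raise ValueError("p_near must be between 0 and 1")
    return p_near * near_ns + (1.0 - p_near) * far_ns


if __name__ == "__main__":
    near, far = 60.0, 100.0  # assumed latencies in ns
    for p in (1.0, 0.5, 0.0):
        print(f"p_near={p:.1f}: {effective_latency(p, near, far):.1f} ns")
```

The more accesses that have to cross to the far node, the closer the average moves toward the far-node latency.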
Unlike with a single-processor system, it almost goes without saying that all four memory slots should be populated in order to give each CPU access to a full channel of memory bandwidth. We chose the term "full channel" deliberately, since it is a more accurate description than "dual channel": the two 64-bit channels of each memory controller are interleaved into a single 128-bit channel. SiSoft Sandra correctly identifies this as a single-channel interface, but this may lead to confusion because the "128-bit width" descriptor is often overlooked. If two DIMMs are pulled and the remaining two DIMMs are in complementary slots, the bandwidth drops to approximately 6,000 MB/s, since the accesses are averaged over one CPU hitting the "near" node and the other hitting the "far" node. In this case we are still looking at a 128-bit memory bus, though. If the modules are distributed so that each CPU is served by a single DIMM, the bus width drops to 64 bits (as correctly reported by Sandra) and the bandwidth drops to approximately 5,000 MB/s.
Memory access latency [ns], lower is better. FX-62 (solid blocks) vs. FX-74 (transparent blocks).
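The DIMM population arithmetic can be checked with a few lines of Python. The sketch assumes each CPU has two 64-bit channels that combine into a 128-bit interface when both of its slots are filled. It uses DDR2-800 (800 MT/s) as an assumed memory speed, since the speed is not stated here. The function names are hypothetical. Theoretical peak bandwidth is simply the bus width in bytes times the transfer rate.

```python
# Hypothetical sketch: per-CPU memory bus width and theoretical peak bandwidth
# for a given DIMM population on a two-socket board. DDR2-800 (800 MT/s) is an
# assumed speed; MB here means 10**6 bytes.

CHANNEL_BITS = 64  # one DIMM drives one 64-bit channel


def bus_width_bits(dimms_on_node: int) -> int:
    """0 DIMMs -> no local memory, 1 DIMM -> 64-bit, 2 DIMMs -> 128-bit."""
    if dimms_on_node not in (0, 1, 2):
        raise ValueError("each node has two DIMM slots")
    return dimms_on_node * CHANNEL_BITS


def peak_mb_s(bits: int, mt_per_s: int = 800) -> int:
    """Theoretical peak: bytes per transfer times million transfers per second."""
    return bits // 8 * mt_per_s


if __name__ == "__main__":
    layouts = {
        "all four slots": (2, 2),
        "two DIMMs on one node": (2, 0),
        "one DIMM per node": (1, 1),
    }
    for name, (n0, n1) in layouts.items():
        widths = [bus_width_bits(n0), bus_width_bits(n1)]
        peaks = [peak_mb_s(w) for w in widths]
        print(f"{name}: widths={widths} bits, peak per node={peaks} MB/s")
```

These are theoretical per-node ceilings, not predictions. The measured figures of roughly 6,000 and 5,000 MB/s also reflect far-node traffic, memory timings and benchmark overhead.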