Registers matter because they give the execution units immediate access to data, and they underscore how strongly latencies affect overall processing power. The highest latencies encountered, however, are those associated with main memory. Latency here does not mean only the RAS-to-CAS delay and the CAS latency, but the entire interval from the moment the CPU issues a memory request until the critical word is output.

In turn, this means that if the controller runs at a higher frequency, the initial access times will be shorter in proportion to the savings in address and command decode latencies at the chipset level. For example, the access times measured with Cachemem 2.65 MMX for the P4 on the i875 Canterwood chipset are on the order of 85 ns at a bus interface frequency of 800 MHz with DDR400 running at 2:2:2 DRAM latencies. At a DRAM cycle time of 5 ns, the actual DIMM latencies are therefore only 20 ns, whereas a total of 65 ns can be attributed to the chipset (PAT enabled). At this point, we have no concrete information on how many cycles are caused by the CPU itself; for the sake of argument and simplicity, we'll assign three chipset cycles to the CPU decode, the transfer to the chipset and the output to the memory bus. This leaves (65 ns / 5 ns) - 3 = 10 cycles for the chipset.
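The breakdown above can be retraced with a few lines of Python. This is only a sketch of the arithmetic, not a measurement tool: the variable and function names are our own, and it assumes that the 20 ns DIMM figure corresponds to tRCD plus CAS latency (2 + 2 cycles at 5 ns each), which is how we read the 2:2:2 timings here.

```python
# Figures quoted in the text (P4, i875 Canterwood, DDR400 at 2:2:2, PAT enabled).
TOTAL_ACCESS_NS = 85.0   # initial access time reported by Cachemem
DRAM_CYCLE_NS = 5.0      # DDR400 runs a 200 MHz memory clock, i.e. 5 ns per cycle
TRCD_CYCLES = 2          # RAS-to-CAS delay
CAS_CYCLES = 2           # CAS latency
CPU_SIDE_CYCLES = 3      # assumed in the text for simplicity, not measured


def latency_breakdown(total_ns, cycle_ns, trcd, cas, cpu_cycles):
    """Split a measured access time into DIMM, non-DIMM and chipset parts."""
    # Assumption: the DIMM share is tRCD + CAS latency only.
    dimm_ns = (trcd + cas) * cycle_ns
    # Everything that is not spent on the DIMM itself.
    non_dimm_ns = total_ns - dimm_ns
    # Convert to chipset cycles and subtract the cycles assigned to the CPU.
    chipset_cycles = non_dimm_ns / cycle_ns - cpu_cycles
    return dimm_ns, non_dimm_ns, chipset_cycles


dimm_ns, non_dimm_ns, chipset_cycles = latency_breakdown(
    TOTAL_ACCESS_NS, DRAM_CYCLE_NS, TRCD_CYCLES, CAS_CYCLES, CPU_SIDE_CYCLES
)
print(f"DIMM latency:   {dimm_ns:.0f} ns")
print(f"Non-DIMM share: {non_dimm_ns:.0f} ns")
print(f"Chipset cycles: {chipset_cycles:.0f}")
```

Changing `CPU_SIDE_CYCLES` shows how sensitive the chipset estimate is to the three-cycle assumption.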
If these 10 cycles were running at CPU speed instead of chipset speed, that is, approximately 10 times faster, the net savings would be 45 ns on the initial access time. In the case of the Athlon 64 FX-51, it is still necessary to add the extra register latency at the module level. Furthermore, we noticed that at DDR400 it was not possible to run anything at a tRCD below 3T, so we need to add another 5 ns back into the equation. The net savings would therefore be on the order of 30-35 ns. Admittedly, there are a few other factors, such as the number of pipeline stages needed by the different controllers (or PAT vs. non-PAT), and the numbers given above only serve to explain the principle, but we'll have benchmarks later to show that they are rather close to reality.
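The savings estimate can be retraced the same way. Again, this is a rough sketch with names of our own choosing. The text gives no figure for the register latency of the FX-51's registered modules, so one 5 ns DRAM cycle is assumed here; with that assumption the result lands at the upper end of the 30-35 ns range.

```python
# Figures quoted in the text.
DRAM_CYCLE_NS = 5.0        # DDR400 cycle time
CHIPSET_CYCLES = 10        # cycles attributed to the chipset
SPEEDUP = 10.0             # "approximately 10 times faster" at CPU speed
TRCD_PENALTY_NS = 5.0      # tRCD of 3T instead of 2T costs one extra cycle
# Assumption (not stated in the text): the register adds one DRAM cycle.
REGISTER_PENALTY_NS = 5.0


def net_savings_ns(cycles, cycle_ns, speedup, penalties_ns):
    """Estimate the access time saved by running the controller at CPU speed."""
    at_chipset_speed = cycles * cycle_ns          # time spent today
    at_cpu_speed = at_chipset_speed / speedup     # same cycles, faster clock
    gross = at_chipset_speed - at_cpu_speed       # savings before penalties
    net = gross - sum(penalties_ns)               # add module-level costs back
    return gross, net


gross_ns, net_ns = net_savings_ns(
    CHIPSET_CYCLES,
    DRAM_CYCLE_NS,
    SPEEDUP,
    [REGISTER_PENALTY_NS, TRCD_PENALTY_NS],
)
print(f"Gross savings: {gross_ns:.0f} ns")
print(f"Net savings:   {net_ns:.0f} ns")
```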