LOSTCIRCUITS
(Review by MS, July 1, 2004)
Memory Access Latencies
In the following, memory access latencies do not refer to the commonly known DRAM latencies such as CAS or tRCD, although those naturally also play a role in overall performance. Rather, we are looking at total access latencies, expressed either in CPU cycles or, to compensate for different CPU clock frequencies, in nanoseconds [ns]. As mentioned, the Socket 940 CPUs only use registered DIMMs and therefore add one penalty cycle (meaning a DRAM cycle) to the initial access time. At DDR400, this amounts to roughly 5 ns for each initial access. In benchmark situations with maximized memory bus utilization, this initial access latency is hardly visible since the controller issues commands during a concurrent transfer. On the other hand, even with Winbond BH-5 components, it is hardly possible to run the Athlon64 FX at a tRCD of 2, at least not where stability is mandatory. This creates an extra (memory) cycle of latency for every page miss, which pushes the access latencies up by 5 ns.
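The cycle-to-nanosecond arithmetic above can be sketched as follows. This is a minimal illustration, assuming that DDR400 runs a 200 MHz memory clock (two transfers per clock); the function names are hypothetical and do not belong to any benchmark tool used in this review.

```python
def cycle_time_ns(ddr_rating: int) -> float:
    """Duration of one memory clock cycle in ns for a DDR rating such as 400.

    Assumption: DDR transfers twice per clock, so DDR400 uses a 200 MHz clock.
    """
    clock_mhz = ddr_rating / 2
    return 1000.0 / clock_mhz


def added_latency_ns(extra_cycles: float, ddr_rating: int = 400) -> float:
    """Latency added by a given number of extra memory cycles."""
    return extra_cycles * cycle_time_ns(ddr_rating)


# One penalty cycle for registered DIMMs at DDR400
print(added_latency_ns(1))
# Raising tRCD from 2 to 3 costs one more cycle on every page miss
print(added_latency_ns(3 - 2))
```

Both cases come out at one 5 ns memory cycle, which is where the figures quoted above originate.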
The same goes for the reference memory supplied with the Socket 939 Athlon64 3800+. However, in contrast to the Socket 940 processors, the Athlon64 3800+ turned out to run very well at tRCD-2 and with all major DRAM access latencies set to 2 cycles, for a 2:2:2 timing configuration. It is one of the idiosyncrasies of CPU-to-memory interactions that the 2:2:2 latency settings were not the fastest. Instead, we found that increasing the CAS latency from 2 to 2.5 cycles yielded a performance boost that was visible not only in synthetic benchmarks but also in applications. The benchmarks at 2:2:2 latency settings were run on discontinued Mushkin Level2 and Corsair XMS 3200-C2 modules; the CAS 2.5:2:2 runs used OCZ PC3700 EB modules.
Corsair vs Corsair
Access latencies, lower is better: All other parameters being equal, increasing tRCD from 2 to 3 causes a measurable performance degradation. Keep in mind that the reference memory can only run at tRCD-3, not at tRCD-2.
Access latencies, lower is better: The OCZ PC3700 EB DIMMs are officially rated at 3:3:2 but had no problems running at 2.5:2:2, even with four modules in the system. The picture looks very similar to that shown above, only the difference between the two sets of modules appears greater.
2:2:2 vs. 2.5:2:2
Access latencies, lower is better: It is somewhat counterintuitive, and against the "current grain of wisdom", that increasing the CAS latency enhances performance by cutting down on access latencies, but we have seen this happen over and over again. The only rational explanation we can offer is that memory is also subject to error and retry, and that loosening the CAS latency simply makes the system run more smoothly. The results shown are not based on a single run each but are fully reproducible within the margins of noise.
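To show why this result runs against expectation, here is a minimal sketch of the nominal timing arithmetic. It assumes the settings are written as CAS:tRCD:tRP, that a page hit costs CAS cycles, that a page miss costs tRP + tRCD + CAS cycles, and that one memory cycle lasts 5 ns at DDR400. This simplified model is an assumption made for illustration only; it ignores the memory controller and everything else in the access path, and it does not explain the measured outcome.

```python
CYCLE_NS = 5.0  # assumed DDR400 memory cycle (200 MHz clock)


def nominal_latency_ns(cas: float, trcd: float, trp: float) -> dict:
    """Idealized DRAM latency for a page hit and a page miss, in ns."""
    hit = cas * CYCLE_NS                   # open row: only CAS is paid
    miss = (trp + trcd + cas) * CYCLE_NS   # precharge, activate, then CAS
    return {"page_hit": hit, "page_miss": miss}


# Hypothetical comparison of the two configurations discussed in the text
for name, timings in {"2:2:2": (2, 2, 2), "2.5:2:2": (2.5, 2, 2)}.items():
    print(name, nominal_latency_ns(*timings))
```

On paper, 2.5:2:2 is 2.5 ns slower than 2:2:2 for both hits and misses under these assumptions, which is exactly why the measured improvement is surprising.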
Athlon64 FX-53 vs Athlon64 3800+
Access latencies, lower is better: We compared the best against the best on the FX53 and the 3800+ platforms. The FX53 loses out on main memory access latencies, a large portion of which has to be attributed to the tRCD setting of 3 on the registered DIMMs. However, it has a clear advantage courtesy of its larger L2 cache, which keeps access latencies at the 1024 kB block size at cache level. The 3800+, on the other hand, needs to access main memory for this sort of transaction.
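The cache-size argument can be sketched roughly as follows, assuming 1024 kB of L2 for the FX-53 and 512 kB for the 3800+ (figures taken from AMD's published specifications, not stated in this section). The helper name is hypothetical, and the model ignores the L1 cache and associativity effects.

```python
# Assumed L2 cache sizes in kB; these are not given in the text above.
L2_KB = {"Athlon64 FX-53": 1024, "Athlon64 3800+": 512}


def served_from(block_kb: int, l2_kb: int) -> str:
    """Crude guess at where a working set of block_kb is served from."""
    return "L2 cache" if block_kb <= l2_kb else "main memory"


for cpu, l2 in L2_KB.items():
    print(cpu, "->", served_from(1024, l2))
```

Under these assumptions the 1024 kB block still fits the FX-53's L2 cache, while the 3800+ has to go out to main memory and pays the full DRAM latency.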
Hypothetical and theoretical memory access latency benchmarks aside, the question is how much of the performance increase will be visible in real-world benchmarks. In this case, we are looking only at the impact of the different DRAM latencies, leaving aside the FX53 since its larger cache will play a rather prominent role in some benchmarks anyway. Briefly, running 2.5:2:2 instead of 2:3:2:8 (as recommended by AMD) yielded a performance increase of about 2%. For example, Unreal Tournament 2003 Botmatch (1280 x 960) scores increased from 99.2 fps to 102.6 fps and Aquamark3 default scores went from 60.13 fps to 61.02 fps (in both cases an ASUS RADEON AX800Pro modded to XT functionality was used).
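For reference, the relative gain of each quoted result can be worked out with a few lines of code. This is a minimal sketch using only the four frame rates given above; the function name is hypothetical.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Frame rates quoted in the text: (2:3:2:8, 2.5:2:2)
results = {
    "UT2003 Botmatch (1280 x 960)": (99.2, 102.6),
    "Aquamark3 default": (60.13, 61.02),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_gain(before, after):.1f}%")
```

The two examples work out to roughly 3.4% and 1.5% respectively, which brackets the overall figure of about 2%.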