LOSTCIRCUITS
Latency vs. Bandwidth, a performance analysis
Life Beyond 150 MHz
(Review by MS, August 15, 2000)
The Test System
We configured the following test system to measure the performance impact of different bus speeds and latencies in synthetic benchmarks and real-world applications:
Why are latencies important? In short, latency is the time from when an access is issued until the critical data are available. Latencies are a universal phenomenon and a major performance factor for the CPU cache, the HDD (access time), the PCI bus and every other component of a computer system, including the main memory. This article focuses only on the last of these, the system memory. For a detailed introduction to how memory works, the PC-guide has the most comprehensive information. A more compressed treatment of the timing issues relating to performance can be found in our own articles on ESDRAM, HSDRAM and Corsair PC 133 DIMMs. For those interested in more architectural questions, a good write-up has been posted on Aces Hardware.
Briefly, current chipsets allow three timing parameters to be varied: the CAS delay, the RAS-to-CAS delay (tRCD) and the precharge latency (tRP).
From these numbers, it is possible to calculate that, at 3:3:3 settings, a total of 10 bus cycles is necessary to release a burst of four words. At 2:2:2 settings, the same is accomplished within 8 cycles (precharge latency is less important, since the precharge can already start while the third word is being output). Increasing either the CAS delay or tRCD to 3 cycles results in 9 bus cycles per four words. However, this simple equation holds only for totally random accesses. In real-life applications, the CAS delay is much more important than the RAS-to-CAS delay, as we'll see below.
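The cycle counts quoted above can be reproduced with a short sketch. It assumes that a fully random burst read costs tRCD plus the CAS delay plus one bus cycle per word, with the precharge hidden as described. The function name and the default burst length of four are illustrative choices, not taken from any chipset documentation.

```python
def random_burst_cycles(cas_delay, trcd, burst_length=4):
    """Bus cycles for one fully random burst read (hypothetical helper).

    Assumption: cost = tRCD + CAS delay + one cycle per word in the burst.
    The precharge latency is treated as masked and is not counted.
    """
    return trcd + cas_delay + burst_length


# Compare the settings discussed in the text.
for cas, trcd in [(3, 3), (2, 2), (3, 2), (2, 3)]:
    cycles = random_burst_cycles(cas, trcd)
    print(f"CAS {cas}, tRCD {trcd}: {cycles} cycles per four words")
```

Under this model, raising either parameter by one cycle costs the same, which is exactly why the model is only valid for totally random accesses.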
A simplified, three-paragraph explanation of memory latencies
Data are written to and read from memory in a relatively ordered manner; that is, most of the time coherent data are stored within the same row (high locality), so that only the column address changes and only the Column Address Strobe needs to be activated. From a performance standpoint, therefore, the most important of the three timing parameters is the CAS delay. Since this is important, I'll rephrase it one more time: most of the time, data that belong together are stored in contiguous blocks within the same row. In turn, this means that for consecutive single-word or burst reads, only the new column address needs to be specified, which is then accessed by the column address strobe (CAS). If the data are not found within the same row, they have to be retrieved from elsewhere, and that involves both the delay from the bank activate command to the read command (RAS-to-CAS delay) and the CAS delay.
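The difference between reads that stay in the open row and reads that jump between rows can be illustrated with a minimal sketch. It is a deliberately simplified model: a read in the open row pays only the CAS delay, a read in another row pays tRCD plus the CAS delay, and precharge and data transfer time are ignored. The function name and the sample row sequences are hypothetical.

```python
def latency_cycles(rows, cas_delay, trcd):
    """Sum the access latency over a sequence of row addresses.

    Simplifying assumptions: one bank, precharge fully hidden, and the
    data transfer itself not counted.
    """
    total = 0
    open_row = None
    for row in rows:
        if row != open_row:
            total += trcd  # bank activate needed before the read command
            open_row = row
        total += cas_delay  # every read pays the CAS delay
    return total


local = [7] * 8              # eight reads from the same row
scattered = list(range(8))   # eight reads, each from a different row

print("high locality:", latency_cycles(local, cas_delay=2, trcd=3))
print("random access:", latency_cycles(scattered, cas_delay=2, trcd=3))
```

With high locality the tRCD term is paid once and the total is dominated by the CAS delay, which is the point the paragraph above makes.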
What about the precharge? Data are electrical charges that come out of the memory cell (one capacitor); after a copy of these data has been generated in the sense amplifiers, the original has to be moved back to the cell of origin and stored (precharge). The tRP latency is of minor importance, since the precharge can start as soon as the third word of a burst is being output, and thus the latency is masked.
The specs of the various chipsets, e.g., the page hit limitation (PH limit) in the AMD 751 North Bridge, allow us to predict how many times data can be retrieved from within the same row. Even though the individual application plays an important role, we can estimate that a mandatory tRCD occurs only every 32 to 64 reads (depending on the BIOS settings). Under normal circumstances, the counter treats all transactions alike; that is, it does not differentiate between single-word and burst reads. Therefore, the CAS delay occurs as much as 64 times as often as the RAS-to-CAS delay. (This holds only for optimal conditions; in real life, the ratio would most likely be on the order of 10:1 or 20:1.) More details would fill a book, but who would read it anyway?
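To put rough numbers on these ratios, the following sketch averages the two delays over many reads. It assumes the CAS delay is paid on every read and tRCD once per group of reads; the function name and the chosen group sizes (64, 32, 20, 10) are illustrative and simply mirror the figures quoted above.

```python
def average_latency(cas_delay, trcd, reads_per_activation):
    """Average latency cycles per read (hypothetical model).

    Assumption: the CAS delay is paid on every read, while tRCD is paid
    once per `reads_per_activation` reads.
    """
    return cas_delay + trcd / reads_per_activation


for n in (64, 32, 20, 10):
    base = average_latency(2, 2, n)
    cas_cost = average_latency(3, 2, n) - base    # one extra CAS cycle
    trcd_cost = average_latency(2, 3, n) - base   # one extra tRCD cycle
    print(f"tRCD every {n} reads: +1 CAS costs {cas_cost:.3f}, "
          f"+1 tRCD costs {trcd_cost:.3f} cycles per read")
```

In this model an extra CAS cycle always costs one full cycle per read, whereas an extra tRCD cycle is diluted by the number of reads served from the same row.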