LOSTCIRCUITS
Inside the EDDR Chip: Combining DRAM storage and SRAM speed
(Review by MS, November 27, 2000)
Summary
Currently, the memory bus constitutes the most severe performance bottleneck in personal computers. Conventional DRAM architecture can be revamped to operate at higher clock frequencies and to include Double Data Rate transfer protocols, both of which increase the peak bandwidth of the memory bus. The most crucial performance handicap, latency, is however not addressed by simply making faster chips. Implementing a small amount of SRAM that functions as a row cache, while leaving the actual storage to the DRAM array carried over from traditional designs, offers the best of both worlds. This article looks into some design issues and illustrates, using stop-action pictures, how increased real-world bandwidth can be achieved. The performance analysis, along with some predictions of how crucial average bandwidth is for system performance, is posted on HardOCP.
Processor speed is experiencing almost exponential growth. The raw power of the CPU itself closely follows the increments in clock speed, at least in synthetic benchmarks that run more or less independently of the rest of the system components, particularly the memory subsystem. In real-life situations, however, we see a different picture emerging: a performance ceiling caused by insufficient data availability. The main bottleneck is found in the memory bus. To overcome this problem, the main approach has been to increase the bandwidth. Unfortunately, increasing bandwidth provides only a temporary solution, and one that benefits only specific applications that have a high locality of data and rely on consecutive page hits. In reality, page hits constitute only somewhere around 30% of all read requests. The majority of transfers still originate with page misses, which cause several penalty cycles, or latencies, to occur before the correct data can be transferred to the CPU.
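To put rough numbers on this argument, here is a minimal back-of-the-envelope sketch. Only the hit rate of about 30% comes from the text above. The bus clock (133 MHz), bus width (8 bytes), burst length (4 transfers) and the penalty cycle counts for hits and misses are illustrative assumptions, and the model is deliberately simplistic: one transfer per clock, no overlap between latency and data transfer.

```python
def effective_bandwidth_mb_s(clock_mhz, bus_bytes, burst_len,
                             hit_rate, hit_cycles, miss_cycles):
    """Average sustained bandwidth in MB/s when every burst is
    preceded by a number of latency (penalty) cycles.

    All parameters are hypothetical inputs for illustration only.
    """
    # Weighted average of the penalty cycles for page hits and page misses.
    avg_latency = hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles
    # Each burst costs its latency plus one cycle per transfer.
    cycles_per_burst = avg_latency + burst_len
    bytes_per_burst = bus_bytes * burst_len
    # MHz * bytes / cycles gives millions of bytes per second.
    return clock_mhz * bytes_per_burst / cycles_per_burst


if __name__ == "__main__":
    # Peak bandwidth: no penalty cycles at all (assumed 133 MHz, 8-byte bus).
    peak = effective_bandwidth_mb_s(133, 8, 4, 1.0, 0, 0)
    # Assumed: 30% page hits at 2 cycles, 70% page misses at 6 cycles.
    real = effective_bandwidth_mb_s(133, 8, 4, 0.3, 2, 6)
    print(f"peak: {peak:.0f} MB/s, with latencies: {real:.0f} MB/s")
```

Under these assumptions, less than half of the peak bandwidth is actually delivered, and raising the clock scales both figures by the same factor without changing that ratio.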
In all DRAM operations, there are three different kinds of latency. Briefly, after a bank activate command has been issued, a row within the DRAM array is selected by the Row Address Strobe (RAS) and activated. This process requires a certain amount of time, and a read command or column select command via the Column Address Strobe (CAS) cannot be issued before the entire row is ready to release the data to the adjacent sense amplifiers. Therefore, the time until the CAS can be activated is called the RAS-to-CAS Delay time (tRCD).
The next step involves the selection of a specific column address. As already mentioned, this is done by the Column Address Strobe (CAS), which is essentially a small switch for selecting the correct column. Selecting a specific column, once again, takes a certain amount of time, which includes setting the column select line high, latching the data into the sense amplifiers and moving the data out of the array to the global data lines. The signal strength needs to be kept as low as possible to avoid electrical crosstalk between neighboring wires. Weaker signals, in turn, travel more slowly and have limited reach. Therefore, in most cases, a secondary sense amplifier is embedded into the pathway to avoid deterioration of the signal integrity. Together, these processes take a certain amount of time, which is called the CAS latency.
If a page miss is encountered while a page is still open, the DRAM array has to be restored to its native state, which involves moving the data back to the cells of origin and precharging the entire array before any new command is accepted. The time required is called the Precharge time, or tRP.
In summary, we are looking at three independent latency categories: tRCD, CAS latency and tRP. Each of these can consume either 2 or 3 bus cycles, at least in SDRAM. In DDR DIMMs, the situation is slightly different in that latencies can also span half bus cycles, since data transfer occurs on both the rising and the falling edge of the clock.
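The way these three latencies add up for different access types can be sketched as follows. This is a simplified model rather than a description of any particular chip: the names `Timings` and `access_latency`, the access-type labels and the 100 MHz clock in the usage example are hypothetical, and burst transfer time and command overlap are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """Latency settings in bus cycles (floats allow DDR half cycles, e.g. 2.5)."""
    trcd: float  # RAS-to-CAS delay
    cl: float    # CAS latency
    trp: float   # precharge time


def access_latency(t: Timings, kind: str) -> float:
    """Cycles from command to first data for one read access."""
    if kind == "page_hit":
        # Row is already open: only the column select is needed.
        return t.cl
    if kind == "bank_idle":
        # No row open: activate the row, then select the column.
        return t.trcd + t.cl
    if kind == "page_miss":
        # Wrong row open: precharge, activate the new row, select the column.
        return t.trp + t.trcd + t.cl
    raise ValueError(f"unknown access kind: {kind}")


def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert bus cycles to nanoseconds for a given bus clock."""
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    fast = Timings(trcd=2, cl=2, trp=2)
    slow = Timings(trcd=3, cl=3, trp=3)
    for name, t in (("2-2-2", fast), ("3-3-3", slow)):
        for kind in ("page_hit", "bank_idle", "page_miss"):
            cycles = access_latency(t, kind)
            # 100 MHz is an assumed bus clock for the conversion.
            print(name, kind, cycles, "cycles,", cycles_to_ns(cycles, 100), "ns")
```

With all three settings at 2 cycles, a page miss on an open page costs three times as many cycles as a page hit, which is why the share of page hits matters so much for real-world throughput.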