LOSTCIRCUITS
Registered ECC DDR400 for the Athlon64 FX and Opteron
(Review by MS, October 3, 2003)
A Need For Registers
What is the purpose of a registered module, and what sets it apart from an unregistered module? It is actually quite simple: as the name says, a registered DIMM has a register chip, which is a pass-through buffer for address and command signals. As a result, the chipset does not see the entire phalanx of memory chips on the module and, in addition, the memory clock signal does not have to drive all those chips. Rather, there is a single chip per physical bank (henceforth called "rank") that takes the address and command signals, translates them and reroutes them to the respective memory chips. At the same time, a PLL (phase-locked loop) chip on the registered DIMM uses the original clock input to generate a second clock signal running synchronously with the original clock.
For the clock, it does not matter when the signal is generated. Whether it is output with a delay of one or even several cycles does not matter either, as long as it is phase-locked with the original clock, that is, the edges line up so that they lie on top of each other. The reason is very simple: the clock is just a square wave with a rising and a falling edge and no further information content. For the address and command signals, on the other hand, timing is critical since the commands need to coincide with the clock boundaries. That is, they need to be valid at the rising edges of the clock signal in order to meet setup and hold times and to ensure proper signal integrity.
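The point about phase locking can be illustrated with a minimal sketch. It assumes an ideal square-wave clock with a 5 ns period (the 200 MHz clock of DDR400); the function names and the 0.05 ns tolerance are hypothetical and chosen only for illustration.

```python
PERIOD_NS = 5.0  # 200 MHz memory clock for DDR400


def phase_offset_ns(delay_ns: float, period_ns: float = PERIOD_NS) -> float:
    """Distance from a delayed clock's rising edge to the nearest original rising edge."""
    offset = delay_ns % period_ns
    return min(offset, period_ns - offset)


def is_phase_locked(delay_ns: float, period_ns: float = PERIOD_NS,
                    tolerance_ns: float = 0.05) -> bool:
    """True if the delayed clock's edges fall on top of the original edges.

    tolerance_ns is an assumed, illustrative value, not a specification figure.
    """
    return phase_offset_ns(delay_ns, period_ns) <= tolerance_ns


# A regenerated clock that lags by whole cycles is indistinguishable from the original.
print(is_phase_locked(5.0))   # one full cycle late
print(is_phase_locked(15.0))  # three full cycles late
# A clock that lags by a fraction of a cycle is not phase-locked.
print(is_phase_locked(1.2))
```

Because the clock carries no information beyond its edges, only the offset modulo one period matters, which is why a delay of one or several full cycles is harmless.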
Buffered "Address Translation"
There are two ways to reroute addresses and commands. The first is a buffering scheme, in which temporary buffers / translation schemes do the address conversion on the fly, meaning within the same clock cycle. Since buffered DIMMs were never really fashionable, some mainboard manufacturers, foremost of all ABIT, placed the buffers in the memory path from the chipset to the DIMM slots instead. It should be clear that even the so-called on-the-fly conversion requires a minimum amount of time, which means that there will be signal skew of some sort. It is possible to compensate for this skew by adding a "negative delay" to the signals, meaning that the addresses are generated a little earlier relative to the clock so that the delay and the "negative delay" cancel each other out. However, there is a limit to how much can be done with this technique and, therefore, it is suitable only for lower operating frequencies.
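As a rough illustration of why the buffered scheme runs out of headroom, the toy model below treats the buffer delay as fixed, caps the "negative delay" at an assumed maximum, and requires the leftover skew to fit within an assumed fraction of the clock period. All numbers and names are hypothetical; they are not taken from any chipset or buffer datasheet.

```python
def residual_skew_ns(buffer_delay_ns: float, max_advance_ns: float) -> float:
    """Skew left over after launching the signals as early as the design allows."""
    advance_ns = min(buffer_delay_ns, max_advance_ns)
    return buffer_delay_ns - advance_ns


def buffered_scheme_works(clock_mhz: float, buffer_delay_ns: float,
                          max_advance_ns: float,
                          margin_fraction: float = 0.1) -> bool:
    """True if the leftover skew fits in the assumed timing margin.

    margin_fraction is an illustrative assumption: the share of one clock
    period that may be lost to skew before setup/hold times are violated.
    """
    period_ns = 1000.0 / clock_mhz
    return residual_skew_ns(buffer_delay_ns, max_advance_ns) <= margin_fraction * period_ns


# Hypothetical buffer: 2.0 ns delay, of which at most 1.2 ns can be pre-compensated.
for mhz in (100, 133, 200):
    print(mhz, buffered_scheme_works(mhz, buffer_delay_ns=2.0, max_advance_ns=1.2))
```

The leftover skew stays constant in nanoseconds while the clock period shrinks, so under these assumptions the same buffer that is acceptable at 100 MHz no longer fits at 200 MHz.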

Registers Instead Of Buffers
The second technique is much cleaner; however, it has the disadvantage of inserting an additional wait state into the signaling scheme. In short, instead of trying to squeeze the signals into the same clock cycle, the register holds its output until the next rising clock edge. This way, there is no skew at all, and the single clock cycle of delay on the address and command bus only comes into play in a random access situation, where no pipelined outputs are under way simultaneously. Pipelined transfers, by contrast, are able to mask some of the initial access latency.
In terms of application-specific performance, this means that some applications will see no difference at all, as long as streaming data transfers dominate. Other applications will show a more pronounced performance hit: anything that relies on more random data accesses or on a high proportion of writes among the transfers will sacrifice more performance. In other words, there is no rule of thumb, and the performance hit needs to be evaluated on a case-by-case basis.
On the other hand, the more precisely controlled signals may make it possible to compensate for the clock skew that naturally occurs as a function of load in higher-density system memory configurations. Therefore, it may be possible to run at tighter memory latencies by sacrificing this one access latency cycle. All in all, yes, there is a drawback to registered DIMMs, but there are hidden benefits as well. What it comes down to is weighing one issue against the others and taking the lesser evil or the best of both worlds, depending on the point of view.
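The difference between random and pipelined accesses can be sketched with simple cycle counting. This is a deliberately simplified model, not a description of any real memory controller: it assumes a fixed access latency per command, a fixed number of clock cycles per burst, and perfect pipelining in the streaming case. The function name and all cycle counts are hypothetical.

```python
def total_cycles(n_bursts: int, access_latency: int, burst_cycles: int,
                 registered: bool, pipelined: bool) -> int:
    """Clock cycles needed to transfer n_bursts bursts in a toy model.

    registered: adds the one-cycle register delay to each command's latency.
    pipelined:  later commands overlap with earlier data, so the initial
                latency is paid only once (idealized streaming).
    """
    first_access = access_latency + (1 if registered else 0)
    if pipelined:
        return first_access + n_bursts * burst_cycles
    # Random access: every burst pays the full latency again.
    return n_bursts * (first_access + burst_cycles)


# Illustrative numbers: 5 cycles of access latency, 4 cycles per burst.
for pipelined in (True, False):
    plain = total_cycles(100, 5, 4, registered=False, pipelined=pipelined)
    reg = total_cycles(100, 5, 4, registered=True, pipelined=pipelined)
    label = "streaming" if pipelined else "random"
    print(f"{label}: {plain} vs {reg} cycles, {100 * (reg - plain) / plain:.2f}% slower")
```

With these assumed numbers the register costs one cycle out of several hundred when streaming, but one cycle per access when every access is random.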
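The trade-off can be put into numbers with a small worked example. The timings below are purely hypothetical and are not measurements from this review; the sketch only shows how an added register cycle might be offset if a registered configuration tolerates tighter RAS-to-CAS and CAS latencies. The 5 ns period corresponds to the 200 MHz clock of DDR400.

```python
def first_word_latency_ns(trcd_cycles: int, cas_cycles: int, registered: bool,
                          period_ns: float = 5.0) -> float:
    """Time from row activation to the first data word in a simplified model.

    Adds one clock cycle for the register on a registered DIMM.
    """
    cycles = trcd_cycles + cas_cycles + (1 if registered else 0)
    return cycles * period_ns


# Hypothetical: a heavily loaded unbuffered setup needs relaxed timings ...
unbuffered = first_word_latency_ns(trcd_cycles=4, cas_cycles=3, registered=False)
# ... while a registered setup is assumed to run one cycle tighter on each.
registered = first_word_latency_ns(trcd_cycles=3, cas_cycles=2, registered=True)
print(unbuffered, registered)  # 35.0 30.0
```

If the registered modules cannot run any tighter than the unbuffered ones, the extra cycle remains a net loss in this model, which is why the outcome depends on the specific configuration.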