LOSTCIRCUITS


DDR II
A Technology Overview
Article by MS, January 6, 2003
updated last: Nov. 28, 2006


Posted CAS and Additive Latency (AL)

For DRAM designers and those who work on memory controllers, it goes without saying that the command bus can carry only one command at a time. The same holds for the time-multiplexed address bus, but at this point we are only concerned with the command bus. Essentially, four command lines matter: RAS, CAS, Chip Select (CS) and Write Enable (WE). Leaving CS out of the picture for now (since it only selects the physical bank out of all DIMMs within the system), each combination of high and low signals on the RAS, CAS and WE lines encodes one command: bank activate (ACT), read, write, precharge or refresh, to name those important for the following. Typical command signals on the three lines would be, e.g., 101, 110 or 010 in RAS, CAS, WE order. Keep in mind that we are talking about physical lines from separate pins on the controller to separate pins on the DRAM.

It is important to understand that only one command can be issued at any time because each of the three command lines can only be either high or low. Any two commands issued on the same clock will cause bus contention, a so-called command collision (since at least one line would need to be high and low at the same time). For example, in bank interleave mode, a bank activate command to a second or third internal bank on the DRAM chip can be issued after the specified Row-To-Row Delay (tRRD). At the same time, because of a pre-defined RAS-To-CAS delay (tRCD), a read command is already scheduled. The two commands coincide on the same clock and collide with each other on the command bus. Consequently, the next bank activate command will have to be pushed out by one cycle.
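To make the line matrix concrete, here is a minimal sketch of a command decoder. It assumes the standard JEDEC SDRAM encoding with active-low RAS, CAS and WE lines and CS already asserted; the names `decode` and `conflicting_lines` are ours, for illustration only. Encodings that are not listed are simply reported as "OTHER".

```python
# Bit order is (RAS, CAS, WE). The lines are active low, so 0 means "asserted".
# Assumption: CS is low, i.e. the chip is selected.
LINES = ("RAS", "CAS", "WE")

COMMANDS = {
    (0, 1, 1): "ACT",        # bank activate
    (1, 0, 1): "READ",
    (1, 0, 0): "WRITE",
    (0, 1, 0): "PRECHARGE",
    (0, 0, 1): "REFRESH",
    (0, 0, 0): "MRS",        # mode register set
    (1, 1, 1): "NOP",
}
ENCODINGS = {name: bits for bits, name in COMMANDS.items()}


def decode(ras, cas, we):
    """Translate the three line levels into a command name."""
    return COMMANDS.get((ras, cas, we), "OTHER")


def conflicting_lines(cmd_a, cmd_b):
    """List the lines that would have to be high and low at the same time."""
    a, b = ENCODINGS[cmd_a], ENCODINGS[cmd_b]
    return [line for line, x, y in zip(LINES, a, b) if x != y]


print(decode(1, 0, 1))                   # READ
print(conflicting_lines("ACT", "READ"))  # ['RAS', 'CAS']
```

An activate and a read on the same clock disagree on two of the three lines, which is exactly why the controller has to move one of them.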

The bank activate is the first step in every memory access and, therefore, all subsequent steps like a read command to the same bank will be pushed out by the same 1 clock. Whatever ground has been lost on the bank activate cannot be gained back by a faster CAS latency (it is impossible to change CAS latency on demand) and, therefore, there will be a gap or bubble in the data stream which will manifest itself in a memory bandwidth performance hit. We don't like performance hits, do we?


Conventional command issuing (top) compared to Posted CAS mode (bottom)

Out of the four internal banks accessible in bank interleave mode, three are shown in green, blue and purple (commands and resulting data). The clock traces refer to the I/O buffers, not to the core. Act: Bank activate command; Read: Read Command; P-Rd: Posted Read / Posted CAS; D: data output (1 bit/pin, 1 quadword / bus width).

Conventional Operation: Bank activate commands to internal banks are given with a Row-to-Row delay (tRRD) of 2T. In this particular case, tRCD, i.e., the delay until a read command can be given, equals 4T. That means that the first read command (four cycles after the first bank activate) falls onto the same clock as the third bank activate; in other words, the two commands collide on the bus. Consequently, one of the commands needs to be shifted by one clock. Since tRCD is defined in the BIOS setup, whereas bank activate commands are issued whenever a memory access is started, it is the bank activate that will be postponed. Bank activate, however, is the one command at the beginning of every access and, therefore, all subsequent commands to the same bank will be delayed by one cycle as well. This causes a bubble in the data stream. CAS latency equals four cycles here (20 or 15 ns for DDR400 or DDR533, respectively).
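The conventional case can be replayed with a toy scheduler. This is only a sketch under simplifying assumptions: one command per cycle, reads fixed at ACT + tRCD, and an activate that slips by one clock whenever its own slot or its read slot is taken. The function name `conventional_schedule` is hypothetical.

```python
def conventional_schedule(n_banks, t_rrd, t_rcd):
    """Place ACT and READ commands on a bus that carries one command per cycle.

    The READ always follows its ACT by tRCD, so on a clash the ACT is delayed.
    Returns a dict mapping cycle -> (command, bank).
    """
    bus = {}
    next_act = 0
    for bank in range(n_banks):
        act = next_act
        # Slip the activate until both its slot and its read slot are free.
        while act in bus or act + t_rcd in bus:
            act += 1
        bus[act] = ("ACT", bank)
        bus[act + t_rcd] = ("READ", bank)
        next_act = act + t_rrd
    return bus


bus = conventional_schedule(n_banks=3, t_rrd=2, t_rcd=4)
reads = sorted(cycle for cycle, (cmd, _) in bus.items() if cmd == "READ")
print(reads)  # [4, 6, 9]
```

The third activate wants cycle 4, finds the first read already there, and moves to cycle 5. Its read lands on cycle 9 instead of 8, and that extra clock is the bubble.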

Posted CAS Operation: Bank activate and read commands (CAS) to the same bank are issued by the controller in back-to-back mode on consecutive cycles. In this case, all activate commands are done on even cycles whereas the read commands are always on odd cycles. Internally, the read commands (CAS commands) are held and then issued after a predefined additive latency (AL) as a postponed read (P-Rd) or Posted CAS. Since the Posted CAS does not require any external command, the bus is free to communicate a new activate command on the same clock. In summary, instead of a normal tRCD, we have a single cycle delay for the read command to which we need to add the internal delay (hence the name additive latency; AL) for the equivalent of a RAS-To-CAS delay and with no need for an additional read command. This will avoid bus collision.
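The Posted CAS case can be sketched the same way. Assumptions here: tRRD = 2 as in the figure, and AL = 3, so that one cycle plus AL adds up to the tRCD of 4 used in the conventional example. The name `posted_cas_schedule` is ours and not part of any controller API.

```python
def posted_cas_schedule(n_banks, t_rrd, additive_latency):
    """Issue each ACT and its posted read (P-RD) on consecutive cycles.

    Returns the external bus (cycle -> (command, bank)) and, per bank, the
    cycle at which the DRAM executes the held read internally.
    """
    bus = {}
    internal_read = {}
    for bank in range(n_banks):
        act = bank * t_rrd
        for cycle, cmd in ((act, "ACT"), (act + 1, "P-RD")):
            if cycle in bus:
                raise ValueError(f"bus collision at cycle {cycle}")
            bus[cycle] = (cmd, bank)
        # The read is held on the chip for AL cycles; no bus slot is needed.
        internal_read[bank] = act + 1 + additive_latency
    return bus, internal_read


bus, internal_read = posted_cas_schedule(n_banks=3, t_rrd=2, additive_latency=3)
print(sorted(bus))    # [0, 1, 2, 3, 4, 5]
print(internal_read)  # {0: 4, 1: 6, 2: 8}
```

Activates sit on even cycles, posted reads on odd cycles, and the internal reads come out evenly spaced two clocks apart, so no slot is ever contested.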


The solution to this problem is to issue the commands in the form of bundles, that is, a read command is issued on the cycle immediately after the bank activate command that it belongs to. A command buffer on the DRAM chip holds the command and schedules it internally without any further input from the command bus. This means that the command bus is free to activate another bank. This mode of operation, using an early-issued but internally postponed read (or CAS) command, is called Posted CAS, where the delay or additive latency (AL) is specified by the mode register set (MRS) during initialization of the DRAM chip.

The consequence is that bank activate and read commands that belong together can be issued on consecutive clock cycles and immediately thereafter free up the bus for the next frame information structure (oops, that was serial ATA). The net effect of Posted CAS and AL is that there will be no command bus collisions and, thus, no bubbles in the data stream.

Update

Posted CAS and Additive Latency are optional features that are supported by the DRAM devices but do not necessarily have to be used by any given controller. IBM's memory controllers are apparently using Posted CAS and AL; likewise, the features appear to be used in graphics cards. However, to the best of our knowledge, none of Intel, nVidia, ATI or AMD is using Posted CAS / AL on mainstream chipsets for the PC-Workstation platform. In this case, the AL feature is simply set to "0" and a conventional Read command is given after tRCD has been satisfied.

Variable Write Latency

Conventional SDRAM including DDR I uses random accesses as the name implies. This means that the controller is free to write to any location within the physical memory space, which, in most cases, means that it will write to whichever page is open and to the column address closest to the (CAS) strobe. The result is a write latency of 1T, as opposed to read or CAS-Latency values of 2, 2.5 or 3. In DDR II, this changes in that the write latency will be the Read Latency (RL) minus 1T.

That means that at CAS-4 and AL-3, for a combined read latency of RL = 7, the write latency will be 6T. This sounds somewhat worse than it is, especially compared to the 1T in DDR I, but one needs to consider that, just like a read command, a write command will be issued early and will be using Posted CAS. That is, the write command abides by the same rules as the read command, only that the "Write Enable" signal is a logical "true" in this case. Effectively, therefore, the CAS latency is the important timing parameter to determine write latency, meaning that in the above example, the write latency measured from the internally issued write will be 3T. This is only 3 times as long as the equivalent latency in DDR I. It will be very interesting to look at integrated graphics using UMA and DDR II but it appears as if interesting is spelled u g l y.
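The arithmetic above fits in a few lines. This is a sketch using only the relations given in the text (RL = AL + CL, write latency = RL - 1T); the function name `ddr2_latencies` is hypothetical.

```python
def ddr2_latencies(cas_latency, additive_latency):
    """Return (read latency, write latency, effective write latency) in cycles."""
    read_latency = additive_latency + cas_latency
    write_latency = read_latency - 1
    # Counted from the internally issued (posted) write, AL drops out again.
    effective_write = write_latency - additive_latency
    return read_latency, write_latency, effective_write


print(ddr2_latencies(cas_latency=4, additive_latency=3))  # (7, 6, 3)
```

With AL set to 0, as on the mainstream chipsets mentioned in the update, the second and third values coincide and the write latency is simply CL - 1.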


General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 1998 - 2007 LostCircuits