LOSTCIRCUITS

DDR II
A Technology Overview
Article by MS, January 6, 2003
updated last: Nov. 28, 2006


The Grand Picture

It is cheap, it offers a lot of bandwidth, and nothing will ever be like it was before. Low power is a great accomplishment, higher density chips will be a side product, and the wealth of new features like OCD calibration, differential clock forwarding and ODT spells innovation. Different form factors, mechanical and electrical interfaces, and differences in the command sets eliminate any backward compatibility between DDR II and DDR I, at least for the end user. But that is what is called progress, and the same happened with the migration from SDRAM to DDR, so we don't complain. Is DDR II going to be the solution for high-speed memory? We think yes, but amid all the marketing hype around DDR II there are enough points that the DRAM manufacturers are playing down.

Latencies

Overall data throughput depends on bandwidth and latencies. Peak bandwidth is important for applications that employ mostly streaming memory transfers, whereas applications with more random accesses get more mileage out of low latencies. There are different types of latency. Looking at the timing diagram two pages ago, it is obvious that the read or CAS latency has little or no impact on overall bandwidth as long as the controller stays in page and can employ bank interleaving.

However, the graph also shows something else, and that is the initial access time (tRAC), which in this case is 8 cycles until the first data pop out. The eight cycles are the sum of tRCD and CAS latency, given as 20 ns (4 cycles) and 4 cycles, respectively, in Samsung's DDR II product description (weren't they the ones who were heralding DDR II anyway?). These initial access latencies amount to exactly the same as what we got rid of a few years ago when PC-100 memory became obsolete. With DDR533 parts, we are looking at CAS-4 or CAS-5, with the higher speed bin potentially capable of running DDR400 in 3:3:3 mode. No matter which way one looks at it, the entire spec has the word "latencies" written all over it in fat, bold, neon, blinking letters. This is just the beginning of the story, though.
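The tRAC arithmetic above can be reproduced in a few lines. This is only a sketch: the DDR II figures (tRCD of 4 cycles, CAS latency of 4 cycles, 5 ns cycle time, i.e. a 200 MHz clock) come from the text, while the PC-100 timings of 2 cycles each are an assumption chosen for illustration; slower PC-100 parts at 3 cycles each would come out at 60 ns. The function name `trac_ns` is hypothetical.

```python
def trac_ns(trcd_cycles: int, cas_cycles: int, clock_mhz: float) -> float:
    """Initial access time: (tRCD + CAS latency) in cycles times the clock period in ns."""
    cycle_ns = 1000.0 / clock_mhz
    return (trcd_cycles + cas_cycles) * cycle_ns


# DDR II 400: 200 MHz clock, tRCD = 4 cycles (20 ns), CL = 4 cycles (from the text)
ddr2_400 = trac_ns(trcd_cycles=4, cas_cycles=4, clock_mhz=200.0)

# PC-100: 100 MHz clock; tRCD = 2 and CL = 2 are ASSUMED timings, not from the text
pc100 = trac_ns(trcd_cycles=2, cas_cycles=2, clock_mhz=100.0)

print(f"DDR II 400 tRAC: {ddr2_400:.0f} ns")
print(f"PC-100 tRAC (assumed 2-2 timings): {pc100:.0f} ns")
```

Under these assumptions both parts need the same 40 ns before the first data appear, which is the point the comparison with PC-100 is making.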

Not everybody here is familiar with timing diagrams, so to sum up a few latencies and their effects on overall performance, we have simply calculated the times required for certain transactions in DDR I 400 (2:2:2; not an official spec, but we have been able to run at this speed for months already) and DDR II 400 (4:4:4). tRAC is the initial access latency, that is, the time after a bank activate until the first word is output. Shorter is better, and there is little doubt which technology will take the lead.

Initial access latency is one issue, but there are other relevant scenarios. Common situations will be Read (BL=4) / Write / Read (BL=4) sequences where the second read can either fall into the same page or else be a page miss and go to an alternate page. The number of transactions in both cases will be the same, that is, eight reads and one write. We assume the same read-write turnaround latency of 2T for both DDR I and DDR II (even though this may favor DDR II) and the same data phase latency (the delay after a write until a read command can be issued) of 3T. As we said, lower or shorter is better. The effective bandwidth is then the number of transactions (9 in each case) divided by the elapsed time in ns. Keep in mind that we do not count CPU and address decode latencies on the chipset level in this case, nor are we considering single random reads (see below).
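To make the Read / Write / Read scenario concrete, here is a deliberately simplified, strictly serial cycle count. It is not the calculation used for the article and is not expected to reproduce its figures. The 2T turnaround, the 3T write-to-read delay, the 2:2:2 and 4:4:4 timings and the nine transactions come from the text. Everything else is an assumption: the timings are read as CAS:tRCD:tRP, both parts run a 200 MHz (5 ns) clock, the single write occupies one clock, a BL=4 read occupies two clocks, nothing overlaps, and a page miss costs a full tRP plus tRCD. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    name: str
    cas: int                # CAS latency, in clocks
    trcd: int               # RAS-to-CAS delay, in clocks
    trp: int                # precharge time, in clocks
    clock_ns: float = 5.0   # assumed 200 MHz clock for both "400" parts


READ_BURST_CLOCKS = 2   # assumption: BL=4 at two words per clock
WRITE_CLOCKS = 1        # assumption: the single write occupies one clock
RW_TURNAROUND = 2       # 2T read-write turnaround (from the text)
WRITE_TO_READ = 3       # 3T data phase latency (from the text)
TRANSACTIONS = 9        # eight reads plus one write (from the text)


def sequence_clocks(t: Timings, page_miss: bool) -> int:
    """Clocks for activate, Read(BL=4), Write, Read(BL=4) with no overlap at all."""
    first_read = t.trcd + t.cas + READ_BURST_CLOCKS
    write = RW_TURNAROUND + WRITE_CLOCKS + WRITE_TO_READ
    reopen = (t.trp + t.trcd) if page_miss else 0   # precharge and re-activate
    second_read = reopen + t.cas + READ_BURST_CLOCKS
    return first_read + write + second_read


def transactions_per_ns(t: Timings, page_miss: bool) -> float:
    """Effective bandwidth as defined in the text: transactions divided by ns."""
    return TRANSACTIONS / (sequence_clocks(t, page_miss) * t.clock_ns)


ddr1 = Timings("DDR I 400 (2:2:2)", cas=2, trcd=2, trp=2)
ddr2 = Timings("DDR II 400 (4:4:4)", cas=4, trcd=4, trp=4)

for part in (ddr1, ddr2):
    for miss in (False, True):
        clocks = sequence_clocks(part, miss)
        label = "page miss" if miss else "same page"
        print(f"{part.name}, {label}: {clocks} clocks, "
              f"{clocks * part.clock_ns:.0f} ns, "
              f"{transactions_per_ns(part, miss):.4f} transactions/ns")
```

Even in this crude model the higher CAS, tRCD and precharge counts of the 4:4:4 part add up on every access, and the gap widens on a page miss because the row has to be closed and reopened.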

Single data rate SDRAM is capable of doing a burst of 1 (a totally random, single bit read) as well as bursts of 2, 4, 8 or full page. DDR I can do bursts of 2, 4 or 8. Because of the possibility of bank interleaving, we don't care too much about the full page burst, but what happened to the burst of 1? The answer is quite simple: it is possible in theory, but in reality we have a prefetch of 2. Thus, yes, we can do a burst of 1 and throw away the second bit or quadword, but we will not gain anything from it except for a hole in the transfer.

With DDR II, its prefetch of 4 and its double-speed I/O buffers, we are looking at a much bigger bubble: whenever a single random read is needed, the bus utilization cannot, by definition, be more than 25%. From the hole-in-the-head perspective, we are dealing with 75% waste in this scenario. Did we mention the word "ugly" before?
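The 25% figure follows directly from the prefetch depth: a single random read uses one word out of each prefetched group and discards the rest. A minimal sketch of that arithmetic, with a hypothetical helper name and the prefetch depths of 1 (SDR), 2 (DDR I) and 4 (DDR II):

```python
def single_read_utilization(prefetch: int) -> float:
    """Upper bound on bus utilization when only one word per prefetch is wanted."""
    if prefetch < 1:
        raise ValueError("prefetch depth must be at least 1")
    return 1.0 / prefetch


# Prefetch depth per memory generation
prefetch_depth = {"SDR SDRAM": 1, "DDR I": 2, "DDR II": 4}

for name, depth in prefetch_depth.items():
    used = single_read_utilization(depth)
    print(f"{name}: at most {used:.0%} used, at least {1 - used:.0%} wasted")
```

This is an upper bound on utilization for isolated single-word reads only; it says nothing about streaming accesses, where the full burst is consumed.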

Bottom line is that there are applications where DDR II appears to be somewhat unsuitable. A very obvious scenario is the server market where more than 95% of all memory accesses are random. Take a DDR II-equipped server and compare it with an EDO machine and guess which one will take the lead in daily operations. Honestly, we don't know, which is why we are putting up this challenge.

Another scenario is the unified memory architecture. We have been looking at performance degradation and enhancements depending on resolution and latencies, and the very clear message is that the importance of latencies increases in proportion to resolution. What that means is that at low resolution (e.g. 800 x 600 x 16), it is still possible to get away with high latencies; however, already at 1024 x 768 x 32, shaving off a single latency cycle can cause as much as a 50% performance increase in 3D applications.



General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 1998 - 2007 LostCircuits