LOSTCIRCUITS

 Intel LGA775 SocketT
New and (Un) improved?
(Review by MS, July 28, 2004)

Cachemem 2.65

Cachemem is interesting in that it measures the CPU cycles used for each memory transaction and, moreover, outputs an XY matrix in which block size is plotted against stride length. Given a page size of 4 kB, it is relatively straightforward to correlate these two parameters with page boundaries and the associated page misses, but a detailed analysis would exceed the scope of this article. Suffice it to say that, by definition, prefetching has very little chance to play its trump cards in Cachemem. For easier comparison, we have compiled the results to show two systems in one graph; lower is better in all cases.


i925 vs i915

i915 (transparent) vs i925 (solid). Lower is better.

In this graph we show the latencies in clock cycles rather than in ns. The reason is an easier correlation of the delta with the actual MCH clock cycles, each of which, at a multiplier of 17, equals 17 CPU cycles by definition. As the results show, any read transaction that goes beyond either the chipset or the CPU buffers incurs approximately 17 extra CPU cycles of delay on the i915 chipset, which suggests one extra pipeline stage internal to the chipset, either on the address/command bus or on the data bus. In this case we used a Pentium 4 Extreme Edition.
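The relation between CPU cycles, MCH cycles and nanoseconds can be sketched in a few lines. The multiplier of 17 is taken from the text; the 200 MHz base bus clock is an assumption (an 800 MHz quad-pumped bus), and the helper names are purely illustrative.

```python
# Minimal sketch: convert a latency delta from CPU cycles to MCH cycles and ns.
# MULTIPLIER comes from the article; BUS_MHZ is an assumption (800 MHz
# quad-pumped front side bus, i.e. a 200 MHz base clock).

MULTIPLIER = 17
BUS_MHZ = 200.0
CPU_MHZ = MULTIPLIER * BUS_MHZ  # CPU core clock under the assumption above


def cycles_to_mch_cycles(cpu_cycles, multiplier=MULTIPLIER):
    """Express a number of CPU cycles in chipset (MCH) clock cycles."""
    return cpu_cycles / multiplier


def cycles_to_ns(cpu_cycles, cpu_mhz=CPU_MHZ):
    """Express a number of CPU cycles in nanoseconds."""
    return cpu_cycles * 1000.0 / cpu_mhz


delta_cpu_cycles = 17  # approximate extra delay measured on the i915
print("MCH cycles:", cycles_to_mch_cycles(delta_cpu_cycles))
print("nanoseconds:", cycles_to_ns(delta_cpu_cycles))
```

Under these assumptions the 17-cycle delta corresponds to a single MCH clock, which is why plotting in cycles makes the extra stage easy to spot.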

For the rest of this analysis we leave the i915 chipset aside and use the i925 chipset at 4:4:4:12 (CAS:tRCD:tRP:tRAS) as reference.

i925 vs. i875

i925 (transparent) vs i875 (solid). Lower is better.

The graph shows a delta that increases with the stride length, which reflects the impact of the higher memory chip access latencies on each page miss (Prescott 3.6GHz in both cases).

P4 2.4E (Prescott) At:

Reduced Latencies: 4:4:4 vs. 4:3:3

4:4:4 (transparent) vs 4:3:3 (solid). Lower is better.

The trend is similar to the one shown above: increasing the stride length adds latency to each access. Note that the performance delta increases with higher stride length, as each doubling of the stride will also double the number of page misses compared to the previous block. This is a simple consequence of the limited (4 KB) DRAM page size used in current DRAM architectures.
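The doubling argument can be illustrated with a toy model. The sketch below assumes a single open page of 4 KB (the figure quoted in the text) and ignores multiple banks, channel interleaving and caches, so it is a simplification and not a description of how Cachemem itself works. The function name and the access count of 1024 are hypothetical.

```python
# Toy model: count how many accesses of a strided walk open a new DRAM page.
# Assumes one open page at a time and the 4 KB page size quoted in the text.

PAGE_BYTES = 4096


def page_misses(stride_bytes, accesses=1024, page_bytes=PAGE_BYTES):
    """Count accesses that land in a different page than the previous access."""
    misses = 0
    last_page = None
    for i in range(accesses):
        page = (i * stride_bytes) // page_bytes
        if page != last_page:
            misses += 1
            last_page = page
    return misses


for stride in (64, 128, 256, 512, 1024, 2048, 4096, 8192):
    print(f"stride {stride:5d} B -> {page_misses(stride):4d} page misses")
```

For a fixed number of accesses, the miss count doubles with each doubling of the stride until the stride reaches the page size, after which every access is a page miss and the count saturates.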

The Intel reference board maxed out at the 4:3:3 settings and, in typical Intel fashion, did not allow too many manipulations anyway. Just for the sake of argument, we decided to play devil's advocate and see how much we could actually squeeze out of DDR2. In this case we used an ABIT AA8-3rd Eye Alderwood board with OCZ EB DDR2 memory running at 533 MHz at 3:2:2:8, synchronous to the CPU (1066 MHz PSB).

Reduced Latencies: 4:4:4 vs. 3:2:2

The results exceeded our wildest expectations. Reduced chipset latencies combined with reduced memory access latencies push the 925 platform beyond what even the i875 chipset with DDR is capable of (see below).

DDR2 running at 533 MHz with a synchronous host bus at 3:2:2 latencies bests the 875 chipset running at 400 MHz in synchronous mode at 2:3:2. It is somewhat unlikely that we are going to see this caliber of DDR2 going mainstream; after all, some manufacturers are already calling their 4:4:4 products "incredibly low latency".
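A quick back-of-the-envelope conversion of the timing settings into nanoseconds shows why this result is plausible. The sketch assumes that the quoted 533 MHz and 400 MHz figures are data rates (so the memory clock is half of each), and it approximates a page miss as tRP + tRCD + CAS. Chipset latency is not modeled, and the function names are illustrative only.

```python
# Rough sketch: DRAM-side page-miss latency in ns for the timing sets in the text.
# Assumptions: the quoted MHz values are data rates (clock = rate / 2), and a
# page miss costs precharge + activate + CAS, i.e. tRP + tRCD + CL clocks.

def clock_ns(data_rate_mhz):
    """Length of one memory clock in ns for a double-data-rate device."""
    return 1000.0 / (data_rate_mhz / 2.0)


def page_miss_ns(data_rate_mhz, cas, trcd, trp):
    """Approximate DRAM latency of an access that misses the open page."""
    return (trp + trcd + cas) * clock_ns(data_rate_mhz)


configs = [
    ("DDR2-533 at 4:4:4", 533.33, 4, 4, 4),
    ("DDR2-533 at 4:3:3", 533.33, 4, 3, 3),
    ("DDR2-533 at 3:2:2", 533.33, 3, 2, 2),
    ("DDR-400  at 2:3:2", 400.0, 2, 3, 2),
]

for name, rate, cas, trcd, trp in configs:
    print(f"{name}: {page_miss_ns(rate, cas, trcd, trp):.2f} ns")
```

Under these assumptions the 3:2:2 DDR2 setting comes out at roughly 26 ns per page miss against 35 ns for DDR at 2:3:2, while the 4:4:4 setting lands near 45 ns.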

It is interesting to see that Cachemem is able to pinpoint the latency deltas between either the different chipset versions or else between the different latency settings on the DRAM device level by showing a constant offset or an exponential performance hit.



General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 2002 - 2008 LostCircuits