LOSTCIRCUITS
Intel's V8-Demo System "Octopussy"
(Review by MS, May 31, 2007)
The Advanced Memory Buffer
The AMB itself is a resuscitation of the memory translator hub, only this time located on the memory module itself and on steroids, with dual unidirectional buses replacing the Rambus interface. On the module end of the AMB, the serial command, address and write frames are de-serialized and broken down into byte-wide chunks of data that are written to the memory chips. During reads, two transfers from the 72-bit-wide DDR2 (or DDR3) DRAM array are serialized into a 168-bit northbound frame. Since the transfer rate of the serial link is 6 x that of the memory module, a 667 MHz (DDR2-667) FBDIMM system signals to the memory controller at 4 GHz, that is, at roughly 4 Gbit/s per lane.
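The frame and clock figures above can be cross-checked with a few lines of arithmetic. The following is only a sketch: the 72-bit width, the two transfers per frame, the 168-bit frame and the 6 x multiplier come from the text, while the count of 14 northbound lanes is an assumption taken from the general FB-DIMM design and is not stated in this article.

```python
# Back-of-the-envelope FBDIMM link arithmetic (illustrative only).

DRAM_DATA_RATE_MTS = 667   # DDR2-667: mega-transfers per second
LINK_MULTIPLIER = 6        # from the text: link runs at 6x the module rate
DRAM_WIDTH_BITS = 72       # 64 data bits + 8 ECC bits
TRANSFERS_PER_FRAME = 2    # two DRAM transfers per northbound frame
FRAME_BITS = 168           # northbound frame length from the text
NORTHBOUND_LANES = 14      # ASSUMPTION: lane count, not given in the article


def link_rate_mts(dram_rate_mts: int, multiplier: int = LINK_MULTIPLIER) -> int:
    """Per-lane signaling rate of the serial link in mega-transfers/s."""
    return dram_rate_mts * multiplier


def frame_breakdown() -> dict:
    """Split a northbound frame into DRAM payload and remaining overhead."""
    payload = DRAM_WIDTH_BITS * TRANSFERS_PER_FRAME
    return {
        "payload_bits": payload,
        "overhead_bits": FRAME_BITS - payload,
        "transfers_per_lane": FRAME_BITS // NORTHBOUND_LANES,
    }


if __name__ == "__main__":
    print(link_rate_mts(DRAM_DATA_RATE_MTS))  # 4002, i.e. roughly 4 GT/s
    print(frame_breakdown())
```

Under the 14-lane assumption each frame takes 12 link transfers, which is exactly 6 x the 2 DRAM transfers it carries, so the lane count and the 6 x multiplier are at least consistent with each other. The 24 bits left over after the 144 payload bits are frame overhead.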
Performance Predictions
Based on the architectural properties outlined above, it is possible to make a few predictions about the general performance of the FBDIMM system. Under standard load in non-streaming applications, the relatively poor host bus utilization of Intel's CPUs will result in gaps between transmissions, so each data transfer to and from memory will face the initial access latency at the AMB level. Especially in a standard Windows environment, where roughly 90% of all memory accesses are reads, the concurrent transfers on the separate northbound and southbound buses will not result in any marked performance improvement over a standard DDR(x) architecture, since modern controllers are already very good at scheduling and deferring writes into gaps between read transactions.
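The read/write argument can be put into rough numbers with a toy model. This is a minimal sketch, not a simulation: it ignores the AMB latencies entirely (so it is an upper bound for the FBDIMM side), and it assumes that the southbound bus delivers half the write bandwidth that the northbound bus delivers for reads. That 2:1 ratio and the function names are assumptions made for illustration only.

```python
def shared_bus_time(bandwidth: float = 1.0) -> float:
    """Time to move one unit of mixed traffic over one bidirectional bus.

    Reads and writes take turns, so only the total amount matters.
    """
    return 1.0 / bandwidth


def split_bus_time(read_frac: float,
                   read_bw: float = 1.0,
                   write_bw: float = 0.5) -> float:
    """Time for the same unit of traffic on separate read and write buses.

    Both directions run concurrently, so the slower of the two finishes last.
    write_bw = 0.5 is an ASSUMPTION (half the read bandwidth).
    """
    write_frac = 1.0 - read_frac
    return max(read_frac / read_bw, write_frac / write_bw)


def best_case_speedup(read_frac: float) -> float:
    """Upper bound on the gain from concurrent reads and writes."""
    return shared_bus_time() / split_bus_time(read_frac)


if __name__ == "__main__":
    for frac in (0.5, 2 / 3, 0.9, 1.0):
        print(f"{frac:.2f} reads -> {best_case_speedup(frac):.2f}x")
```

With 90% reads the model tops out at about 1.11 x, and that is before any AMB latency or the write scheduling of a conventional controller is taken into account, which fits the prediction of no marked improvement. The gain peaks only when reads and writes arrive in the same 2:1 ratio as the assumed bus bandwidths.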

Figure reproduced from "Ganesh et al. (2007): Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling"
In streaming applications, the initial access latencies are masked by ongoing transfers, and this is where the FBDIMM system could potentially best the standard DDR(x) architecture because of its four individual channels compared to only two channels on the DDR(x) architecture. In multi-processor scenarios with a shared host bus, the concurrent R/W capability becomes a moot point: the host bus is bidirectional and, therefore, either the northbound or the southbound bus will sit idle for lack of requests or data to process. In multi-CPU systems with multiple independent host buses, where each CPU processes its own task, the FBDIMM system will scale better than a DDR(x) system since the R/W contention is effectively eliminated.
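For the streaming case, the channel-count advantage translates directly into peak read bandwidth. The sketch below assumes that both platforms run their DRAM at the same data rate (DDR2-667 here) and counts only the 64 data bits of each 72-bit transfer; both are simplifying assumptions, and real sustained figures will be lower.

```python
BYTES_PER_TRANSFER = 8  # 64-bit data path; the 8 ECC bits are not counted


def peak_read_bandwidth_gbs(data_rate_mts: float, channels: int) -> float:
    """Theoretical peak read bandwidth in GB/s (10^9 bytes per second)."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    # ASSUMPTION: same DDR2-667 data rate on both platforms.
    dual_channel_ddr2 = peak_read_bandwidth_gbs(667, channels=2)
    quad_channel_fbd = peak_read_bandwidth_gbs(667, channels=4)
    print(f"dual-channel DDR2-667:   {dual_channel_ddr2:.1f} GB/s")
    print(f"quad-channel FBDIMM-667: {quad_channel_fbd:.1f} GB/s")
```

The factor of two holds only on paper and only as long as the host bus can actually absorb that much data.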
The biggest advantage of the FBDIMM architecture, though, is its scalability towards high system memory densities. Each recent generation of memory has seen a reduction in serviceable slots: DDR2 still supports 4 DIMMs, whereas DDR3 officially supports only 2 slots (even though most boards coming out are still sporting 4 slots). In terms of total system memory, this is offset by the fact that chip densities have exploded. Where 64 MB or 128 MB DIMMs were standard in SDRAM systems, we are now looking at 2 GB or even 4 GB modules for DDR3. However, the demand for memory has also increased with the introduction of 64-bit operating systems, which go beyond the maximum of 2 GB of user memory supported by 32-bit versions of Windows. That said, for 32-bit consumer operating systems the whole argument about support for higher densities is largely academic.
In contrast to the scaled-down slot availability of standard DDR(x) systems, FBDIMMs can support up to eight modules per channel in a daisy-chain configuration. On four channels, this results in support for 32 modules total, which is 8 x the number of modules supported on a DDR2 / DDR3 platform. The added latencies can only be offset in scenarios of high channel utilization, which essentially requires multiple processors on multiple independent buses, with the minor caveat of additional latencies depending on bus topology (fixed vs. variable latency mode). In that specific system configuration, the increased power demand (ca. 3-4 W per AMB) may also become less important in the grand picture, even though in a fully populated system it may result in an extra 100 W of power consumption on the memory system alone (without the memory itself). On the other hand, there is always some trade-off between power and performance, and time will tell where the cutoffs are for one or the other solution.
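The module count and the AMB power overhead follow from simple multiplication. The sketch below uses only the figures quoted above (4 channels, 8 modules per channel, 3-4 W per AMB, 4 DIMMs on a DDR2 platform); the constant and function names are made up for illustration.

```python
CHANNELS = 4
MODULES_PER_CHANNEL = 8
AMB_POWER_W = (3.0, 4.0)   # approximate range per AMB, from the text
DDR2_MODULES = 4           # DIMMs on a standard DDR2 platform, from the text


def total_modules(channels: int = CHANNELS,
                  per_channel: int = MODULES_PER_CHANNEL) -> int:
    """Maximum number of FBDIMMs in a fully populated system."""
    return channels * per_channel


def amb_power_range_w(modules: int) -> tuple:
    """Lower and upper estimate of AMB power alone, excluding the DRAM."""
    low, high = AMB_POWER_W
    return (modules * low, modules * high)


if __name__ == "__main__":
    modules = total_modules()
    print(modules)                      # 32
    print(modules // DDR2_MODULES)      # 8
    print(amb_power_range_w(modules))   # (96.0, 128.0)
```

A fully populated system therefore lands at roughly 96 to 128 W for the buffers alone, which brackets the "extra 100 W" estimate.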