LOSTCIRCUITS

SHORTCUTS:
SMP and Data Management
The Platform Question
ASUS K8N-DL At One Glance
Layout and Bundle
More Layout and Peripherals
BIOS Overview
Opteron By Numbers / Test Configurations
Synthetic Benchmarks - SiSoftware
3dsmax - Lightwave
Cinebench
Worldbench5 (II)
Gaming Performance
Power Consumption
Final Words


 ASUS K8N-DL
SMP with Power
(Review by MS, Aug 7, 2005)
Summary

Symmetric multiprocessing (SMP) systems are becoming increasingly mainstream. Aside from dual-core processors, there is still the entire genre of true multiprocessor platforms geared primarily towards the server and workstation market. In the Intel camp, this field is dominated by Xeon processors; however, there are some design and functional limitations stemming from a single-core design and its adaptation to SMP functionality.

From the beginning, one of the major strengths of the Athlon64 CPU family has been that the entire CPU was designed with clustering in mind. Interconnects between multiple processors using lightning-fast data transport, together with a combined memory space subdivided into individual nodes, form the architectural basis of a design that was not adapted to SMP after the fact but optimized for it from the very beginning.

It still appears that one of the main problems AMD has faced in the past, namely great processors hampered by a lack of chipset support, continues to rear its head: existing platforms are somewhat stale, bordering on obsolete technology. This is the void that nVidia's nForce4 Professional core logic aims to fill. Currently, fewer than a handful of boards are available, but we have secured one of them, the ASUS K8N-DL, and put it through its paces. Several BIOS revisions later, we are still not quite where we want to be ...


The PC market currently consists of gaming systems, office systems and, finally, the twilight zone of multiprocessor systems. We call it the twilight zone because everybody loves it, many get hooked, and yet a number of things border on the preternatural. What it comes down to in the end is that everybody who has ever used a dual-processor system will swear that there is no going back to a single CPU. On the other side of the coin, try as you might, the applications that show a potential benefit from symmetric multiprocessing can be counted on the fingers of maybe three hands.

A pair of Opteron 252s running at 2.6 GHz and featuring the dual HyperTransport interface needed for symmetric multiprocessing

Multiprocessor systems have employed different strategies to interface with the system logic. In the classic Intel architecture, the two processors are tied to the same bus and, consequently, that bus is shared. By extension, if the chipset containing the memory controller resides on a shared host bus, the memory bandwidth is also shared between the two CPUs. In addition, there is the issue of cache coherency, which requires a dedicated bus between the two CPUs called the snoop bus. Suffice it to say that each CPU needs to know what the other has in its cache and whether those data have been modified from the version found in main system memory. The reason is simple: each CPU can retrieve data from memory but not from its counterpart's cache, unless those data have been written back to main memory. The protocol used in this case is called M.E.S.I., short for Modified, Exclusive, Shared, Invalid.
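To make the shared-bus scenario a bit more concrete, here is a minimal Python sketch of the M.E.S.I. states for a single cache line shared by CPUs on one bus. It is a toy model under simplifying assumptions: one line, whole-line writes, and a snoop hit on a Modified copy that forces a write-back to main memory before another CPU may read the data, as described above. The names `MesiBus`, `read` and `write` are hypothetical and not taken from any real simulator.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MesiBus:
    """Toy model: several CPUs on one shared bus, tracking a single cache line."""

    def __init__(self, n_cpus: int, memory_value: int = 0) -> None:
        self.memory = memory_value              # the line's copy in main memory
        self.state = [State.INVALID] * n_cpus   # per-CPU cache state
        self.value = [None] * n_cpus            # per-CPU cached copy

    def _snoop_writeback(self, requester: int) -> None:
        # A CPU holding the line Modified must write it back to main memory,
        # because the requester can only fetch the line from there.
        for cpu, st in enumerate(self.state):
            if cpu != requester and st is State.MODIFIED:
                self.memory = self.value[cpu]
                self.state[cpu] = State.SHARED

    def read(self, cpu: int) -> int:
        if self.state[cpu] is State.INVALID:
            self._snoop_writeback(cpu)
            others = [c for c, st in enumerate(self.state)
                      if c != cpu and st is not State.INVALID]
            for c in others:
                self.state[c] = State.SHARED    # an Exclusive copy becomes Shared
            self.state[cpu] = State.SHARED if others else State.EXCLUSIVE
            self.value[cpu] = self.memory
        return self.value[cpu]

    def write(self, cpu: int, new_value: int) -> None:
        if self.state[cpu] is not State.MODIFIED:
            self._snoop_writeback(cpu)
            # Invalidate all other copies so only one CPU can modify the line.
            for c in range(len(self.state)):
                if c != cpu:
                    self.state[c] = State.INVALID
                    self.value[c] = None
        self.state[cpu] = State.MODIFIED
        self.value[cpu] = new_value


bus = MesiBus(n_cpus=2)
bus.read(0)        # CPU 0 loads the line Exclusive
bus.write(0, 42)   # CPU 0: Exclusive -> Modified, main memory is now stale
bus.read(1)        # CPU 0 writes back first, then both copies are Shared
print(bus.memory, [s.value for s in bus.state])  # 42 ['S', 'S']
```

Note that the second CPU's read has to go through main memory: the Modified copy is written back before the data can be shared.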

AMD has always taken a somewhat different approach, starting with a slight change to the cache coherency protocol that allows one CPU to Own data (Owned: this cache is the owner of the block and must service all requests by other processors for that block), which extends the protocol name to M.O.E.S.I. Keep in mind, though, that M.O.E.S.I. is simply the name of a protocol that was invented a number of years ago. In practice, the Owned state means that the owning processor (or core, in the case of a dual-core CPU) holds modified data that other processors or cores may also keep as Shared copies; the owner supplies those data directly to requesting CPUs / cores and remains responsible for eventually writing them back to main memory. At the hardware level, the Athlon64 multi-way architecture has taken a quantum leap over the original Intel concept. Briefly, each CPU has two or three Lightning Data Transport (LDT, now called HyperTransport) links and an integrated dual-channel memory controller for accessing system memory.
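For comparison, here is a similar toy sketch of M.O.E.S.I., following the definition quoted above that the Owned cache must service all requests by other processors for that block. Again, a single cache line with whole-line writes is assumed, and `MoesiLine` and its methods are hypothetical names. The thing to watch is the `memory_writes` counter: a second CPU can read freshly modified data without a write-back to main memory, which only happens once the Owned line is evicted.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MoesiLine:
    """Toy model of one cache line shared by several CPUs / cores under M.O.E.S.I."""

    def __init__(self, n_cpus: int, memory_value: int = 0) -> None:
        self.memory = memory_value
        self.state = [State.INVALID] * n_cpus
        self.value = [None] * n_cpus
        self.memory_writes = 0   # number of write-backs to main memory

    def _dirty_holder(self):
        # At most one cache holds the line Modified or Owned at any time.
        for cpu, st in enumerate(self.state):
            if st in (State.MODIFIED, State.OWNED):
                return cpu
        return None

    def read(self, cpu: int) -> int:
        if self.state[cpu] is State.INVALID:
            owner = self._dirty_holder()
            if owner is not None:
                # The owner services the request cache-to-cache; memory stays stale.
                self.state[owner] = State.OWNED
                self.value[cpu] = self.value[owner]
                self.state[cpu] = State.SHARED
            else:
                others = [c for c, st in enumerate(self.state)
                          if c != cpu and st is not State.INVALID]
                for c in others:
                    self.state[c] = State.SHARED
                self.value[cpu] = self.memory
                self.state[cpu] = State.SHARED if others else State.EXCLUSIVE
        return self.value[cpu]

    def write(self, cpu: int, new_value: int) -> None:
        # Invalidate every other copy; the writer then holds the only, dirty copy,
        # so no write-back is needed at this point.
        for c in range(len(self.state)):
            if c != cpu:
                self.state[c] = State.INVALID
                self.value[c] = None
        self.state[cpu] = State.MODIFIED
        self.value[cpu] = new_value

    def evict(self, cpu: int) -> None:
        # Only a Modified or Owned line has to be written back when it is dropped.
        if self.state[cpu] in (State.MODIFIED, State.OWNED):
            self.memory = self.value[cpu]
            self.memory_writes += 1
        self.state[cpu] = State.INVALID
        self.value[cpu] = None


line = MoesiLine(n_cpus=2)
line.write(0, 42)   # CPU 0 holds the line Modified
line.read(1)        # CPU 0 becomes Owned and supplies the data to CPU 1
print(line.memory, line.memory_writes)  # 0 0
line.evict(0)       # only now is the dirty line written back
print(line.memory, line.memory_writes)  # 42 1
```

Running the same write-then-read sequence through the M.E.S.I. model would cost an immediate trip to main memory; with the Owned state, that write-back is deferred until the line actually leaves the owner's cache.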

The HT links are used to talk to the main system logic and, more importantly, to allow each CPU to communicate directly with at least one other CPU. By extension, each CPU can piggyback on the memory access of the other CPU, as long as the operating system supports what is called non-uniform memory architecture, or NUMA. Any NUMA architecture is, by definition, organized into nodes, where each node is essentially a memory controller with its attached memory. It follows that each CPU sees a near node (its own local memory) and far nodes (the memory attached to the other CPUs in the system), which are reached via the HT interface. Needless to say, the data within the different CPU caches need to be kept coherent; that is, under no circumstances may two processors independently modify the same data, since one would eventually overwrite the other's update with obsolete data. Combining cache coherency with NUMA gives us another acronym: ccNUMA, for cache-coherent non-uniform memory architecture.
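The near/far distinction can be sketched in a few lines of Python. This assumes, purely for illustration, that each node's memory occupies one contiguous range of the physical address space (a real BIOS may map or interleave memory differently); `Node`, `home_node` and `classify` are hypothetical helpers, not part of any OS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    base: int   # first physical address served by this node's memory controller
    size: int   # bytes of memory attached to this node


def home_node(nodes: list[Node], address: int) -> Node:
    """Return the node whose memory controller serves a physical address."""
    for node in nodes:
        if node.base <= address < node.base + node.size:
            return node
    raise ValueError(f"address {address:#x} is not backed by any node")


def classify(nodes: list[Node], cpu_node: int, address: int) -> str:
    """'near' if the CPU's own memory controller serves the address, else 'far'."""
    return "near" if home_node(nodes, address).node_id == cpu_node else "far"


GiB = 1 << 30
# Hypothetical two-socket layout: 1 GiB of memory behind each Opteron.
nodes = [Node(0, 0, GiB), Node(1, GiB, GiB)]
print(classify(nodes, cpu_node=0, address=0x1000))        # near
print(classify(nodes, cpu_node=0, address=GiB + 0x1000))  # far: one HT hop
```

A NUMA-aware operating system makes essentially this decision when it tries to place a process's memory on the node of the CPU it runs on.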

On paper, all of this looks straightforward; in reality, there are snoop buses, address comparator circuits and a whole slew of additional logic that must be implemented to ensure functionality, not to mention support at both the BIOS and the OS level. The result, however, is that each processor can access data throughout the entire physical memory space and, moreover, load or store data from both near and far memory modules, thereby almost doubling the available bandwidth. Keep in mind that the "node hopping" required to access far memory adds latency, which lowers the overall effective bandwidth.
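The trade-off between the extra bandwidth and the cost of node hopping can be illustrated with some back-of-the-envelope arithmetic. The sketch below assumes that every CPU streams data independently, that a fixed fraction of its accesses goes to far memory at a lower rate, and that there is no contention on the HT links or memory controllers. The 6.4 GB/s figure is the theoretical peak of dual-channel DDR400; the 3.2 GB/s far-memory rate is a purely hypothetical placeholder, not a measured value.

```python
def per_cpu_bandwidth(local_gbs: float, remote_gbs: float, remote_fraction: float) -> float:
    """Effective GB/s for one CPU when a fraction of its traffic goes to a far node.

    The time to move 1 GB is the weighted sum of near and far transfer times,
    so the result is a weighted harmonic mean of the two rates.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    seconds_per_gb = (1.0 - remote_fraction) / local_gbs + remote_fraction / remote_gbs
    return 1.0 / seconds_per_gb


def aggregate_bandwidth(n_cpus: int, local_gbs: float, remote_gbs: float,
                        remote_fraction: float) -> float:
    """Total GB/s if every CPU streams independently (no link or controller contention)."""
    return n_cpus * per_cpu_bandwidth(local_gbs, remote_gbs, remote_fraction)


LOCAL = 6.4    # GB/s, theoretical peak of dual-channel DDR400
REMOTE = 3.2   # GB/s, hypothetical placeholder for far-memory accesses
for fraction in (0.0, 0.25, 0.5):
    total = aggregate_bandwidth(2, LOCAL, REMOTE, fraction)
    print(f"{fraction:.0%} far accesses: {total:.2f} GB/s")
```

With all accesses local, two nodes double the aggregate figure to 12.8 GB/s; once half of the accesses have to hop to the far node, the assumed rates pull the total down to roughly 8.5 GB/s, which is why the real-world gain is "almost" rather than fully double.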




General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents, even in part, is not allowed.