LOSTCIRCUITS
Intel's Core 2 Quad Extreme Edition QX6700 Codename "Kentsfield"
(Review by MS, November 1, 2006)
Back to the Core
Intel's "Core" architecture, which traces its roots back to the Pentium III design, changed all of this. Processing efficiency in the form of instructions per clock cycle overtook the MHz race. In addition, the core became modular, with sophisticated clock-gating mechanisms designed to activate specific areas only on demand, and, last but not least, there were a number of memory access optimizations. The latter in particular have provided intriguing evidence that raw memory bandwidth is relatively unimportant for performance. A better way of phrasing this would be that smart memory access can look bad on paper when it comes to raw "Read" or "Copy" bandwidth; in real-life applications, however, it can easily make up for the lack of it. Interestingly, Intel borrowed the term "smart memory access" from this article to describe its new data management technology based on pattern analysis and intelligent prefetching of data; well, maybe they coined it first.
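Intel does not spell out its prefetch logic here, so the following is only a generic illustration of what "pattern analysis and intelligent prefetch" can mean: a hypothetical stride detector that, once it sees the same address delta twice in a row, fetches the next few lines ahead of the program. The class and function names, the 64-byte line size, the prefetch depth and the never-evicting cache are all assumptions made for this sketch, not a description of Intel's actual hardware.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


class StridePrefetcher:
    """Hypothetical stride detector: once two consecutive accesses show the
    same address delta, it predicts the next `depth` addresses along it."""

    def __init__(self, depth=2):
        self.depth = depth
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Record a demand access; return the addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                # Pattern confirmed: run ahead of the program along the stride.
                prefetches = [addr + stride * i for i in range(1, self.depth + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


def simulate(addresses, depth=2):
    """Count demand misses in an idealized cache that never evicts.
    depth=0 disables prefetching."""
    prefetcher = StridePrefetcher(depth)
    cached_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in cached_lines:
            misses += 1
            cached_lines.add(line)
        for target in prefetcher.access(addr):
            cached_lines.add(target // LINE_SIZE)
    return misses


if __name__ == "__main__":
    sequential = [i * LINE_SIZE for i in range(16)]  # streaming through 16 lines
    print("no prefetch:", simulate(sequential, depth=0))
    print("stride prefetch:", simulate(sequential, depth=2))
```

For a sequential walk over 16 lines this reports 16 demand misses without prefetching and 3 with it. Note that the prefetcher does not move any less data; it moves the same data earlier, which hides latency without improving a raw "Read" or "Copy" bandwidth figure.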
The Shared AGTL Bus
One issue that has plagued Intel's approach to SMP in the past relates directly to this memory access question. Since the days of the Pentium, Intel has relied on essentially the same host bus interface. The fact that the bus has become quad-pumped to deliver four times the bandwidth does not eliminate the bottleneck created by four cores trying to access the system logic simultaneously. Moreover, there is the issue of cache coherency, meaning that all copies of the same data must be identical, regardless of whether they are located in main memory or in a processor cache.
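A quick back-of-the-envelope calculation shows why quad pumping alone does not solve the sharing problem. The sketch below assumes a 64-bit data path and the QX6700's 1066 MT/s host bus (a base clock of about 266.7 MHz), and splits the peak bandwidth evenly among the cores. That even split is an idealization: it ignores arbitration, snoop traffic and address phases, so real per-core shares are lower.

```python
BUS_WIDTH_BYTES = 8       # 64-bit data path
TRANSFERS_PER_CLOCK = 4   # "quad pumped": four data transfers per base clock


def peak_bandwidth_gbs(base_clock_mhz):
    """Theoretical peak host bus bandwidth in GB/s (10**9 bytes per second)."""
    transfers_per_second = base_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK
    return transfers_per_second * BUS_WIDTH_BYTES / 1e9


def per_core_share_gbs(base_clock_mhz, cores):
    """Idealized even split of the one shared bus; ignores arbitration,
    snoop traffic and address phases, so real shares are lower."""
    return peak_bandwidth_gbs(base_clock_mhz) / cores


if __name__ == "__main__":
    base = 800 / 3  # ~266.7 MHz base clock, i.e. 1066 MT/s
    print(f"peak: {peak_bandwidth_gbs(base):.2f} GB/s")
    for cores in (1, 2, 4):
        print(f"{cores} core(s): {per_core_share_gbs(base, cores):.2f} GB/s each")
```

The result is roughly 8.5 GB/s of peak bandwidth in total, or only about 2.1 GB/s per core once all four cores compete for the same bus.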

Core2 Duo (left) and Core2 Quad (right). Note the two separate sets of capacitors on each of the power buses of the two dies
Double Jeopardy!
On paper, we can play a word-association game: we have a quad-pumped bus that can be shared between how many cores? Double Jeopardy! What is four? In reality, one thing has nothing to do with the other. There is still a single shared bus, and the raw bus frequency helps speed up burst transfers, but that's about it. In other words, there is still a single connection to the system logic, and all cores are tied to it.
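To make the "burst transfers, but that's about it" point concrete, here is a minimal sketch that counts only the data-phase clocks of cache-line fills. It assumes a 64-byte cache line and a 64-bit data path, and it deliberately ignores arbitration, address and snoop phases. Quad pumping shortens each individual burst, but bursts requested by different cores still queue up one after another on the single bus.

```python
LINE_BYTES = 64        # one cache line (assumed 64 bytes)
BUS_WIDTH_BYTES = 8    # 64-bit data path


def clocks_per_line(transfers_per_clock):
    """Base-clock cycles needed for the data phase of one cache-line burst."""
    beats = LINE_BYTES // BUS_WIDTH_BYTES  # 8 data transfers per line
    return beats / transfers_per_clock


def clocks_for_requests(requests, transfers_per_clock):
    """Data-phase cycles when `requests` line fills queue on the single bus.
    Bursts from different cores are serialized, never overlapped."""
    return requests * clocks_per_line(transfers_per_clock)


if __name__ == "__main__":
    for pumping in (1, 4):
        print(f"{pumping}x pumped: one line = {clocks_per_line(pumping):g} clocks, "
              f"four cores = {clocks_for_requests(4, pumping):g} clocks")
```

Going from single to quad pumping cuts one line fill from 8 base clocks to 2, yet four simultaneous requests still take four times as long as one (8 clocks instead of 2), because there is only one connection to the system logic.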
The Shared Cache
One thing that Intel did that really hit the spot was the implementation of a shared Level 2 cache. In contrast to independent L2 caches, including those found on AMD's dual-core processors, Intel's Core architecture features a shared L2 cache, meaning that on a monolithic dual-core die both CPU cores have access to the same cache. Functionally, this means that if Core0 requests data that were previously modified by Core1, Core0 can load them from the cache location where Core1 stored them, without having to write them out to main system memory and then read them back in. Since all main memory accesses have to go through the host bus (sometimes erroneously referred to as the front side bus), the shared cache, by eliminating the main memory write and the subsequent read, greatly reduces the host bus traffic needed even for housekeeping processes. In other words, what was a bottleneck with independent L2 caches is now irrelevant.

The two dies, and consequently all four cores, share the same bus. When a core requests data, it embeds its ID in the outgoing request; this ID is replicated in the returning data, which are broadcast to all cores on the bus. All cores whose ID does not match ignore the data. Similarly, Windows offers the possibility of assigning a specific process or thread to a specific core. One example used in this article is Prime95, of which we run four separate instances, each bound to its own core.
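The ID filtering described above can be modeled in a few lines. This is a simplified sketch of the mechanism as described in the text, not of the real bus protocol, which has additional request, snoop and response phases; the class names are hypothetical. The `affinity_mask` helper builds the kind of one-bit-per-processor mask that the Win32 `SetProcessAffinityMask` and `SetThreadAffinityMask` functions accept, which is one way to pin four Prime95 instances to four different cores.

```python
from dataclasses import dataclass, field


@dataclass
class BusResponse:
    requester_id: int  # ID copied from the original request
    address: int
    data: bytes


@dataclass
class Core:
    core_id: int
    received: list = field(default_factory=list)

    def observe(self, response):
        # Every core sees every broadcast; only the requester keeps the data.
        if response.requester_id == self.core_id:
            self.received.append(response)


class SharedBus:
    """One bus, all cores attached: every response reaches every core."""

    def __init__(self, cores):
        self.cores = cores

    def broadcast(self, response):
        for core in self.cores:
            core.observe(response)


def affinity_mask(core_id):
    """Bitmask with only bit `core_id` set, the format used by Windows
    affinity masks (bit n = logical processor n)."""
    return 1 << core_id


if __name__ == "__main__":
    cores = [Core(i) for i in range(4)]
    bus = SharedBus(cores)
    bus.broadcast(BusResponse(requester_id=2, address=0x1000, data=b"\x00" * 64))
    print([len(c.received) for c in cores])
    # One mask per Prime95-style instance, each pinned to its own core.
    print([hex(affinity_mask(i)) for i in range(4)])
```

Running it shows that only core 2 keeps the broadcast data (`[0, 0, 1, 0]`) and produces the masks `0x1`, `0x2`, `0x4` and `0x8` for the four cores.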
Strictly speaking, the shared cache makes the host bus bottleneck almost irrelevant, not entirely so. There is still the issue of providing data to multiple cores, which requires a certain infrastructure. However, the combination of a shared L2 cache with Intel's Smart Memory Access technology allows Intel to push things a bit further, towards the next buzzword: "Intelligent Die Pairing". The result is four cores sharing the same host bus and the same memory controller, and the question we are asking is whether these four cores can actually gain traction despite all the preemptive measures such as Smart Memory Access and L2 cache sharing. This matters all the more because cache sharing applies only to the two cores on the same die, whereas the two dies still have to interact via the shared host bus.
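The die-pairing limitation can be sketched as a simple topology check: a line modified on one core can be handed to a core on the same die through the shared L2, but a handoff to a core on the other die has to cross the host bus. The mapping of cores 0 and 1 to die 0 and cores 2 and 3 to die 1 is an assumption for illustration; the operating system's core numbering may differ.

```python
# Assumed mapping of core numbers to dies; the OS numbering may differ.
DIE_OF_CORE = {0: 0, 1: 0, 2: 1, 3: 1}


def handoff_path(producer, consumer):
    """Where `consumer` finds a line last modified by `producer`."""
    if producer == consumer:
        return "own cache"
    if DIE_OF_CORE[producer] == DIE_OF_CORE[consumer]:
        return "shared L2"      # same die: no host bus traffic
    return "host bus"           # different dies: must cross the shared bus


def count_bus_handoffs(handoffs):
    """Number of producer->consumer handoffs that generate host bus traffic."""
    return sum(1 for p, c in handoffs if handoff_path(p, c) == "host bus")


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # data passed around all four cores
    for p, c in ring:
        print(f"core {p} -> core {c}: {handoff_path(p, c)}")
    print("bus handoffs:", count_bus_handoffs(ring))
```

When data circulate through all four cores, half of the handoffs (1 to 2 and 3 to 0) land on the host bus, while keeping communicating threads on the same die, for example via affinity settings, keeps their exchanges inside the shared L2.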