LOSTCIRCUITS

SHORTCUTS:
A Process Shift
Brisbane by Numbers
Test Configurations
Memory Subsystem
Power Consumption
3D Rendering and Power per Renderpass
POV-Ray- Cinebench
AV-Encoding
3Dmark'06
F.E.A.R. DOOM3
FarCry
L2 Cache Latencies
The Plot Thickens
Final Thoughts

 AMD's Brisbane Core - the Transition to 65 nm
And the cache latency
(Author: MS, January 5, 2007)

Summary

The latest line of CPUs - codename "Brisbane" - is AMD's first foray into 65 nm process technology. AMD claims that the process shift does not affect performance and that only power consumption is greatly reduced. However, there has been some controversy regarding the effects of the process shift on cache performance; the Level 2 cache in particular appears to be severely affected with respect to access latencies.

Sometimes numbers don't add up, and sometimes, even if they do, they don't paint a consistent picture. We looked at the "Brisbane" from the top and from the bottom, turned it left and right, and finally came to suspect that some benchmarks are flawed. If we are right, then there is a small increase in cache access latency on the order of 10%, or 2 cycles, between the Windsor "F" revision and the Brisbane "G" revision. That's all we can let out here, but there is more in the rest of the article.

ICs are constantly getting smaller thanks to new and improved process technologies. Some 8 years ago, we witnessed the transition from 0.35 µm to 0.25 µm. Later we saw the metric change from µm to nm, going to 110 nm, then 90 nm, and scaling down even further. At this point, Intel is ahead in the game with an established 65 nm wafer line that has already generated the "Presler" or 9xx versions of the Pentium D and is cranking out legions of Core 2 Duos every day. At the same time, the preparation of the next process shrink to 45 nm is already running at full throttle.

AMD is a bit behind in the shrink process; SOI appears to be a bit more stubborn when it comes to shrinking than conventional silicon. Moreover, the area where Intel is pioneering the shrink technology is the SRAM cache, which abides by somewhat different rules than processor logic and is often considered the more problematic portion of the die when it comes to shrinking. Among other things, the addressing portion of any memory does not seem to scale well with smaller process technologies, with the somewhat counterintuitive side effect that latencies increase as the area shrinks. For the record, though, AMD, or rather the AMD-IBM alliance, has also announced the schedule for the 45 nm process shift using immersion lithography and ultra-low-K interconnect dielectrics.

Die Shrinks, Logic vs. Memory

Memory, as mentioned above, does not scale as well as logic with smaller process technology; a case in point in modern processors is the on-die Level 2 cache. Particularly problematic appears to be the already mentioned increase in latency. On the other hand, the question is whether the access time of the secondary cache is in fact higher and, if it is, whether that really translates into much of a general performance hit. In this context, one has to consider that modern SRAM caches are capable of bursting data across the interface bus to the processor. In that case, just like in any DRAM streaming transfer, only the initial access latency matters, whereas the consecutive transfers are pipelined and their delays are hidden behind the previous transfers. As in the case of system memory, the result may be only a minor performance hit even if the access latency is increased.
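To put rough numbers on the burst argument, here is a minimal sketch of how a fixed latency penalty is diluted over a pipelined transfer. The 20-cycle baseline, the 8-transfer burst and the one-cycle-per-transfer rate are hypothetical values, picked only so that the 2 extra cycles correspond to the roughly 10% increase mentioned in the summary; none of them are measured figures.

```python
def burst_cycles(initial_latency, transfers, cycles_per_transfer=1):
    """Cycles to complete a burst: the first transfer pays the full
    access latency, the remaining ones are pipelined behind it."""
    return initial_latency + (transfers - 1) * cycles_per_transfer


def slowdown_percent(old_latency, new_latency, transfers):
    """Relative increase in total burst time when only the initial
    access latency changes."""
    old = burst_cycles(old_latency, transfers)
    new = burst_cycles(new_latency, transfers)
    return (new - old) / old * 100.0


# Hypothetical latencies: 20 cycles before the shrink, 22 cycles after.
OLD_LATENCY, NEW_LATENCY = 20, 22

for transfers in (1, 8):
    hit = slowdown_percent(OLD_LATENCY, NEW_LATENCY, transfers)
    print(f"{transfers} transfer(s): {hit:.1f}% slower")
```

Under these assumptions a single isolated access pays the full 10%, while an 8-transfer burst is only about 7.4% slower, and the gap widens further with longer bursts. A real workload mixes both patterns, which is one reason why a latency increase does not have to show up proportionally in application benchmarks.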

ADO, just as before, designates the lower TDP of 65 W which is the maximum power rating for the 65 nm "Brisbane" - based processors

In the case of AMD's cache, another factor playing into the cache access latency is the exclusive nature of the cache hierarchy. Depending on the benchmark or workload, some of the latency measurements may get skewed since an L1 victim may have to be written back to the L2 - which can show up as additional latency. Needless to say, different access patterns can therefore show different latencies; among other things, it depends on how intelligently the cache controller can handle the different requests. The latter primarily applies to multithreaded applications, though.

In the big picture, a cache is still an ultra-fast memory designed for very fast access to recurrent data, bypassing the need to go out to main memory in order to retrieve them. By extension, that still means that data need to be accessed quickly, and that requires not only the bandwidth generated by a wide bus and high operating frequency but also a low initial access latency - otherwise, the entire principle would defeat its purpose.

Less Power

Aside from cache issues, a smaller design process is also generally considered to reduce power consumption. In reality, as we have mused occasionally, the situation is somewhat different since any down-scaling also increases the passive current flow across the insulator, a phenomenon known as leakage current. On the other hand, a smaller design process also goes hand in hand with reduced voltage requirements of the parts. Voltage and frequency, in turn, are the main contributors to the theoretical power consumption of any IC; depending on who one asks, the voltage factors in at the square or even the cubic power.

Consequently, any decrease in operating voltage will pay off substantially more than the same relative reduction in clock speed, e.g. a 10% lower core voltage will cause the power consumption to drop by anywhere between 19% and 27%. We always found that the cubic voltage equation was more accurate than the square voltage formula, though the truth is somewhere in between.
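The 19% and 27% figures follow directly from the two scaling rules. As a quick check, the sketch below assumes power is proportional to the voltage raised to an exponent of 2 or 3 with everything else held constant; the function names are ours and purely illustrative.

```python
def relative_power(voltage_scale, exponent):
    """Power relative to baseline when voltage is scaled by
    voltage_scale and power is assumed proportional to V**exponent."""
    return voltage_scale ** exponent


def power_saving_percent(voltage_drop_percent, exponent):
    """Percentage reduction in power for a given percentage
    reduction in core voltage."""
    scale = 1.0 - voltage_drop_percent / 100.0
    return (1.0 - relative_power(scale, exponent)) * 100.0


for exponent in (2, 3):
    saving = power_saving_percent(10, exponent)
    print(f"V^{exponent}: {saving:.1f}% lower power")
# prints:
# V^2: 19.0% lower power
# V^3: 27.1% lower power
```

By comparison, a 10% lower clock speed at constant voltage would reduce the frequency-dependent part of the power by only about 10%, which is why voltage is the more rewarding knob.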

More Die Per Wafer

Power consumption aside, the other driving factor for the migration to smaller process technology is the number of dies per wafer. If a smaller process technology is used, the die area shrinks approximately with the square of the ratio between the feature sizes of the two processes, so more dies come off one wafer, making it cheaper to manufacture the ICs. All of the above considerations are part of what went into the creation of AMD's latest core, codename "Brisbane".
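For a feel of the magnitude, here is a rough sketch of the ideal area scaling from 90 nm to 65 nm and its effect on die count. The 200 mm² starting die and the 300 mm wafer are hypothetical inputs, not AMD figures, and the dies-per-wafer formula is a commonly used geometric approximation that ignores yield, scribe lines and the fact that real shrinks rarely achieve ideal scaling.

```python
import math


def area_scale(old_nm, new_nm):
    """Ideal die area ratio: the square of the feature size ratio."""
    return (new_nm / old_nm) ** 2


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Geometric estimate: wafer area divided by die area, minus a
    correction for partial dies lost along the wafer edge."""
    d = wafer_diameter_mm
    gross = math.pi * d ** 2 / (4.0 * die_area_mm2)
    edge_loss = math.pi * d / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


# Hypothetical inputs for illustration only.
WAFER_MM = 300
OLD_DIE_MM2 = 200.0

scale = area_scale(90, 65)
new_die_mm2 = OLD_DIE_MM2 * scale

print(f"ideal area scale 90 nm -> 65 nm: {scale:.2f}")
print(f"dies per wafer before: {dies_per_wafer(WAFER_MM, OLD_DIE_MM2)}")
print(f"dies per wafer after:  {dies_per_wafer(WAFER_MM, new_die_mm2)}")
```

Under ideal scaling the die shrinks to roughly half its former area, and because the edge losses grow more slowly than the gross count, the number of candidate dies per wafer about doubles.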


General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 2002 - 2008 LostCircuits