LOSTCIRCUITS
Intel P4 Extreme Edition: Cache Size Matters
(Review by MS, October 11, 2003)
Cache is always good. There appears to be no single factor that boosts performance as much as adding cache to the CPU. Moreover, caches are relatively easy to design; after all, a cache is basically the same transistor sextet that forms a single SRAM cell, repeated over and over. In the case of the P4EE, over and over means 110 million times. Needless to say, this sounds easier than it is in reality; otherwise there would not be legions of half-cached processors floating around, courtesy of a few dysfunctional SRAM blocks that had to be disabled. In addition, hardly any structure on the die consumes as much area as the cache, and the same goes for the transistor count. It is almost one for Trivial Pursuit:
The correct answer is:
If a cache is relatively cheap to design, it is relatively expensive to produce. Die size is die size, and it makes no difference whether the area is a huge ground plane, a large on-die cache, or an execution or scheduling unit: the production costs are the same. If size matters, wafer costs are one of the few areas where this is actually true.
If cache is costly, what does it really buy, and which applications profit from a large cache? In short, anything that uses recurrent data, that is, data that have been used and will be used again. Those data could be anything; the cache does not discriminate between spreadsheets and geometry for gaming. As long as the necessary repetition of patterns is there, the cached architecture will be able to take advantage of it. This is, however, exactly where some media encoding applications fall short of finding much advantage. They do contain some repetitive data and patterns, but these are not as pronounced as in actually programmed sequences of events.
OpenGL with hardware-enabled T&L is probably a classic example. Instead of the randomness of any "data captured from the random environment of reality", programmed sequences of events show patterns of data that are replicated, because nobody will ever sit down and code every single pixel in machine language. Along with office applications, this is where the extra L3 cache will be very beneficial.

There are other applications as well: file servers and web servers will profit from more cache. The idea is that on any given day there will be one or two hot topics and, thus, 95% of all data accesses will go to 2% or less of all data. If the cache is large enough to hold those 2%, it will provide a huge performance increase. It gets critical, however, if the cache is too small to hold all of the hot data, so that existing entries have to be evicted to make space for new information. Modified data that are about to be overwritten have to be written back to main memory first, and that is where caches can become costly with respect to performance as well. On the other hand, since the memory bus is hardly ever 100% busy, these write-backs can be carried out in idle periods and will hardly result in any performance hit at the system level.
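The effect of the hot set fitting (or not fitting) into the cache can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the 95%/2% split comes from the text, while the LRU replacement policy, the item counts, and the function name `lru_hit_rate` are hypothetical choices made for illustration and say nothing about how the P4EE's L3 actually replaces lines.

```python
import random
from collections import OrderedDict


def lru_hit_rate(cache_lines, total_items=10_000, hot_fraction=0.02,
                 hot_access_share=0.95, accesses=100_000, seed=0):
    """Return the fraction of accesses that hit in a simulated LRU cache.

    Assumption: a share `hot_access_share` of all accesses goes to the
    first `hot_fraction` of the items; the rest is spread over the others.
    """
    rng = random.Random(seed)
    hot_items = max(1, int(total_items * hot_fraction))
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for _ in range(accesses):
        if rng.random() < hot_access_share:
            key = rng.randrange(hot_items)               # hot data
        else:
            key = rng.randrange(hot_items, total_items)  # cold data
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    # With the defaults, the hot set is 200 items.
    for lines in (50, 100, 200, 400):
        print(f"{lines:4d} lines -> hit rate {lru_hit_rate(lines):.3f}")
```

With these defaults, a cache that is smaller than the 200-item hot set keeps evicting data that will be needed again shortly, whereas a cache comfortably larger than the hot set serves nearly all of the hot accesses.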
That said, none of this states explicitly which applications the P4EE favors, but it should be reasonably implicit, with the caveat that one application or another may behave differently from the norm.
Overall, the Extreme Edition and its L3 cache is an intriguing concept and certainly a much more efficient way of managing the anisotropic distribution of data accesses than any of the equivalent proposals to integrate the L3 into the chipset; that would bring us back to another variation of the Socket 7 scheme. However, even the on-die L3 is not an all-encompassing solution, especially when it comes to compensating for high initial access latencies. To that extent, the P4EE and its future versions may be able to ameliorate some of the shortcomings of DDR-II, but they may not be able to solve the fundamental problem of inflated initial access latencies.
Last but not least, there is the price point. The latest we heard was on the order of US$930, high enough to scare away every Tom, Dick and Harry and to warrant some exclusivity. We have seen it with the original RDRAM systems: there were enough people willing to pay the same price or even more, as absurd as it may appear in retrospect.
It is a great processor, and it made for some fascinating experiences in personal computing during the testing here, but it has its price and rather limited availability.
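Why a large L3 can soften but not remove a memory latency problem can be sketched with the standard average-access-time relation: every L3 miss still pays the full main-memory latency. The function name and all numbers below are hypothetical and are not measurements of the P4EE or of any DDR-II system.

```python
def average_access_time(l3_hit_rate, l3_latency_ns, memory_latency_ns):
    """Average latency of an access that reaches the L3.

    Every such access pays the L3 lookup; misses additionally pay the
    main-memory latency. All inputs are illustrative assumptions.
    """
    miss_rate = 1.0 - l3_hit_rate
    return l3_latency_ns + miss_rate * memory_latency_ns


if __name__ == "__main__":
    # Hypothetical figures: same L3, two different main-memory latencies.
    for memory_ns in (100.0, 150.0):
        for hit_rate in (0.0, 0.5, 0.9):
            avg = average_access_time(hit_rate, 10.0, memory_ns)
            print(f"memory {memory_ns:5.1f} ns, L3 hit rate {hit_rate:.1f}"
                  f" -> {avg:6.1f} ns on average")
```

A higher hit rate shrinks the miss term, but the term scales with the memory latency itself, so as long as some accesses miss, a slower initial access still shows up in the average.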