LOSTCIRCUITS


 Intel Pentium4 600 Series
(Review by MS February 21, 2005)

Overall, the 600 series is what the Prescott should have been from the very beginning. The most critical improvement is, beyond a shadow of a doubt, the Enhanced Power Management with dynamic frequency and voltage scaling, which matches power consumption to demand rather than keeping a perpetual space heater running. Some of the numbers presented in the Intel collaterals don't exactly make sense. For example, the identical power figures for the 3.4, 3.2 and 3.0 GHz versions appear to reflect the low power state, that is, the lowest multiplier / frequency and voltage, which are in fact identical for all three models. However, that number should then also be the same for the P4 660, yet we are looking at a jump from 85 to 119W. Whatever... we'll follow up on that. Likewise important is the migration to 64-bit processing; we have some preliminary benchmarks of 64-bit ported applications that show the P4 600 series in a very favorable light, and we'll have the full story shortly. The additional cache appears to be more eye candy than a real blockbuster, though, see below.


Pipelines, Caches and Predictions

One thing that is curious about the 600 series is that precious little is gained from the doubling of the cache and, furthermore, that the gains diminish with higher bus speed. It is necessary to keep in mind that the L2 cache of the Prescott is a write-back cache, which could also be described as a redundancy cache of the main memory. With the huge amount of bandwidth available in the DDR2 configuration, it is possible to use pattern analysis algorithms for dynamic adaptive prefetching of data and instructions from main memory into the cache. However, the overall bandwidth depends on the processor side bus: at 800 MHz, the maximum data transfer rate cannot exceed 6.4GB/s at any time. This does sound like a huge amount of bandwidth but, in reality, READ transfers are interrupted by writes and the associated recovery and bus turnaround times. Moreover, there are idle periods during which either no data requests are issued by the CPU or the memory management unit is busy sorting out where to get the data from.
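The 6.4GB/s ceiling follows directly from the bus parameters. A minimal sketch of the arithmetic, assuming the 64-bit (8-byte) data path of the Pentium 4 bus and reading "800 MHz" as 800 million transfers per second; the function name is ours and purely illustrative:

```python
def peak_bus_bandwidth_gb_s(transfers_per_second: float,
                            bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # "800 MHz" bus, read as 800 million transfers per second (assumption)
    print(f"{peak_bus_bandwidth_gb_s(800e6):.1f} GB/s")
```

This is a best-case number only; the write interruptions, turnaround times and idle periods described above all come off the top.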

When the next request comes in, an entire chunk of data is speculatively prefetched, and that is where the bus speed makes a difference. At the lower speed, the gaps are effectively used to fill the cache. The same happens at the higher bus frequency but, since the overall bandwidth is substantially higher, the processor’s dependency on data already in the cache is decreased. Still, a larger cache will increase the amount of data stored therein and, by extension, increase the chances for a cache hit.
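One way to picture why the same cache increase buys less when main memory is reached faster is the textbook average-access-time model. The sketch below is illustrative only: the miss rates and cycle counts are made-up placeholder values, not measurements of any Pentium 4, and the function names are ours.

```python
def average_access_time(hit_time: float, miss_rate: float,
                        miss_penalty: float) -> float:
    """Textbook model: every access pays hit_time, misses also pay the penalty."""
    return hit_time + miss_rate * miss_penalty


def gain_from_larger_cache(miss_rate_small: float, miss_rate_large: float,
                           hit_time: float, miss_penalty: float) -> float:
    """Cycles saved per access by the lower miss rate of the larger cache."""
    before = average_access_time(hit_time, miss_rate_small, miss_penalty)
    after = average_access_time(hit_time, miss_rate_large, miss_penalty)
    return before - after


if __name__ == "__main__":
    # Hypothetical numbers: miss rate drops from 10% to 8% with the bigger cache.
    for label, penalty in (("slower bus", 200), ("faster bus", 100)):
        saved = gain_from_larger_cache(0.10, 0.08, hit_time=20, miss_penalty=penalty)
        print(f"{label}: {saved:.1f} cycles saved per access")
```

In this model the saving is simply the drop in miss rate times the miss penalty, so anything that shrinks the penalty also shrinks the payoff of the extra cache.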

Keep in mind here that the overall performance will also heavily depend on the accuracy of the branch prediction. On average, a branch is encountered every 7 instructions. The accuracy of the branch predictor varies with the application, but an average of 70% appears to be a reasonable approximation. With 30% of the branches mispredicted, a misprediction will statistically occur about every 23 instructions (7 / 0.3).

This sheds some light on the overall performance profile of the Prescott vs. e.g. the Northwood core. With 31 pipeline stages, the Prescott gets through only about three quarters of a pipeline length before the entire thing has to be flushed. In comparison, the Northwood with its 20 pipeline stages can process a bit more than one entire pipeline length before a misprediction occurs. In addition, the penalties are higher: 31 vs. 20 processor cycles during which nothing happens. Keep in mind that the probability is reset after every miss. By extension, a similar scenario holds for the L2 cache: size does not matter. If the content is wrong, more wrong content won’t help.
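The back-of-the-envelope statistics can be reproduced with a few lines of code. This is a rough sketch under simplifying assumptions: each branch is mispredicted independently with probability (1 - accuracy), and one pipeline stage is equated with one instruction, as in the comparison above. The function names are ours.

```python
def instructions_between_mispredictions(instructions_per_branch: float,
                                        accuracy: float) -> float:
    """Expected instruction count between two mispredicted branches."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return instructions_per_branch / (1.0 - accuracy)


def pipeline_lengths_between_flushes(stages: int,
                                     instructions_per_branch: float,
                                     accuracy: float) -> float:
    """Same interval, expressed in multiples of the pipeline depth."""
    interval = instructions_between_mispredictions(instructions_per_branch, accuracy)
    return interval / stages


if __name__ == "__main__":
    # Inputs taken from the text: a branch every 7 instructions, 70% accuracy.
    for core, stages in (("Prescott", 31), ("Northwood", 20)):
        ratio = pipeline_lengths_between_flushes(stages, 7, 0.70)
        print(f"{core} ({stages} stages): {ratio:.2f} pipeline lengths per flush")
```

Plugging in a higher accuracy shows how quickly the picture improves for either core, which is why the application mix matters so much.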

It is necessary to understand that the accuracy of the predictions heavily depends on the application running. For example, in an application like SiSoft Sandra’s memory bandwidth benchmark, the algorithm basically matches the address stride with the block size to scroll down columns within any open memory page. Other applications that fall into a similar category are some of the streaming multimedia applications. Prediction of the data flow is straightforward, there are few instruction branches and, therefore, prefetching can be done with high efficacy. Needless to say, this is where the Prescott and the Prescott 600 shine. For most other applications, the extra L2 is not exactly an exercise in futility, but it definitely shows sharply diminished returns.
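To illustrate why streaming access is so easy to prefetch, here is a toy last-stride predictor. It is a sketch for illustration only and not a description of Intel's actual prefetch logic; the 64-byte step and the address range are arbitrary assumptions.

```python
import random


def stride_prefetch_hit_rate(addresses) -> float:
    """Fraction of accesses correctly guessed as 'last address + last stride'."""
    hits = 0
    predictions = 0
    last = None
    stride = None
    for addr in addresses:
        if last is not None and stride is not None:
            predictions += 1
            if last + stride == addr:
                hits += 1
        if last is not None:
            stride = addr - last  # remember the most recent stride
        last = addr
    return hits / predictions if predictions else 0.0


if __name__ == "__main__":
    streaming = [i * 64 for i in range(1000)]           # fixed 64-byte stride
    rng = random.Random(0)
    scattered = [rng.randrange(1 << 20) for _ in range(1000)]
    print("streaming:", stride_prefetch_hit_rate(streaming))
    print("scattered:", stride_prefetch_hit_rate(scattered))
```

A constant stride is predicted correctly on every access once the stride is known, while scattered addresses almost never match the guess, so whatever gets prefetched is mostly the wrong content.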

With all these musings over the increased cache size and the lack of effect thereof, it is very easy to forget that, until there is a final product that can be tested, it is not possible to do anything more than make a probably less than educated guess about the impact on performance. Most of us have very little insight into the complexity of any modern IC design, from the actual design through the layout and netlist / verification to the final manufacturing process, but suffice it to say that even with all libraries available, it is an effort of gargantuan proportions to redesign a processor. The raw number of 169 million transistors does not even touch on the complexity, the difficulties and the actual human effort involved in getting the job done. And then, if the results don't meet the expectations and the new features don't qualify as deus ex machina, well, tough, but life goes on. In the wake, there are still countless hours of overtime and barrels of coffee needed. In other words, regardless of the outcome, let's pay some homage to the designers and engineers who made things possible. We can still bitch at the marketing guys :)



General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 2002 - 2008 LostCircuits