LOSTCIRCUITS
Intel Pentium4 "Prescott" Strained to the Silicon
(Review by MS, Feb. 1, 2004)
Pipeline Length
The most overt difference between Northwood and Prescott appears to be the length of the pipeline that brings data to the processor core. Different numbers are floating around depending on the counting scheme applied, but on a 1:1 comparison basis, if the Northwood pipeline is counted as 20 stages, then the equivalent number for Prescott is 31. In other words, the number of clock cycles for data and instructions to reach the core has increased from 20 to 31. In general, if nothing goes wrong, there is very little difference between a long and a short pipeline. However, if the wrong data have been predicted and speculatively loaded into the pipeline, these data still have to travel the entire length of the pipeline before they can be evicted at the back-end. This naturally causes a bubble, or delay, whose size increases with the length of the pipeline.
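To put a rough number on how the bubble grows with pipeline length, here is a minimal back-of-the-envelope sketch. It assumes that a misprediction costs about one full pipeline refill, and the base CPI, branch fraction and misprediction rate are hypothetical illustration values, not measurements of either core. Only the stage counts of 20 and 31 come from the text above.

```python
def misprediction_penalty_cycles(pipeline_stages: int) -> int:
    """Simplified assumption: one misprediction costs one full pipeline refill."""
    return pipeline_stages


def average_cpi(base_cpi: float, branch_fraction: float,
                mispredict_rate: float, pipeline_stages: int) -> float:
    """Average cycles per instruction including misprediction bubbles."""
    stall = (branch_fraction * mispredict_rate
             * misprediction_penalty_cycles(pipeline_stages))
    return base_cpi + stall


# Hypothetical workload parameters, chosen only for illustration.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20   # one instruction in five is a branch
MISPREDICT_RATE = 0.05   # 5% of branches are mispredicted

northwood_cpi = average_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
prescott_cpi = average_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)

print(f"20 stages: {northwood_cpi:.2f} CPI")
print(f"31 stages: {prescott_cpi:.2f} CPI")
```

With a misprediction rate of zero both pipelines come out identical in this model, which mirrors the point that a long pipeline only hurts when something goes wrong.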
Force Forwarding
Earlier steppings of the P4 feature a Load-to-Store forwarding mechanism that uses address comparators to determine whether there is a partial match between the cached data and those in the store forwarding buffer (SFB). If the cached data are older, the newer data are pulled directly from the SFB for further processing, after which the L1 cache is updated accordingly. This mechanism does not always work as advertised: not all partial address matches refer to the same data, and address misalignments may occur as well. The latest addition to this compare-and-update mechanism is therefore the so-called Force Forwarding, which allows the Memory Ordering Buffer (MOB), through a forwarding-entry-selection multiplexing scheme, to override the SFB selection decision and thus avoid wrong decisions early in the pipeline.
The obvious question is, of course, why a partial address match is used at all if it can cause problems. The answer is rather simple: the smaller the chunk of the address needed to identify the correct data in the cache, the earlier it will be available. Access latencies can therefore be reduced on the basis of a partial "glimpse into the future", especially when it comes to determining whether the data are found within the L1 cache or whether there will be a "cache miss". If the partial address is made larger, the probability of a correct match naturally increases too, but this is bought with a somewhat higher latency. In the end, it is like "Wheel Of Fortune": one can always wait until all characters are displayed, but there is also a chance to cut ahead and come up with the right solution based on a partial match and speculation.
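The trade-off between a fast partial match and a later, authoritative check can be illustrated with a small software analogy. This is only a sketch and not Intel's circuit: the names `StoreEntry`, `speculative_forward`, `verified_forward` and `load`, as well as the 12-bit partial tag width, are hypothetical choices made for the example.

```python
from dataclasses import dataclass

PARTIAL_BITS = 12  # hypothetical width of the partial address compare


@dataclass
class StoreEntry:
    address: int
    value: int


def partial_tag(address: int, bits: int = PARTIAL_BITS) -> int:
    """Keep only the low address bits that are available early."""
    return address & ((1 << bits) - 1)


def speculative_forward(store_buffer, load_address):
    """Early pick: youngest buffered store whose partial tag matches the load."""
    for entry in reversed(store_buffer):
        if partial_tag(entry.address) == partial_tag(load_address):
            return entry
    return None


def verified_forward(store_buffer, load_address):
    """Late check on the full address; its result overrides the early pick."""
    for entry in reversed(store_buffer):
        if entry.address == load_address:
            return entry
    return None


def load(store_buffer, cache, load_address):
    """Return (value, overridden), where overridden marks a wrong early pick."""
    early = speculative_forward(store_buffer, load_address)
    final = verified_forward(store_buffer, load_address)
    overridden = early is not final
    value = final.value if final is not None else cache.get(load_address, 0)
    return value, overridden


# Two stores that share the same low 12 address bits (0x040).
buffer = [StoreEntry(0x1040, 11), StoreEntry(0x5040, 22)]
cache = {0x2000: 99}

print(load(buffer, cache, 0x1040))  # partial match picks the wrong store first
print(load(buffer, cache, 0x5040))  # partial match was already right
print(load(buffer, cache, 0x2000))  # no buffered store, value comes from cache
```

In the first load, the partial compare selects the younger store at 0x5040 because its low bits alias with 0x1040, and the full-address check then overrides that choice. Widening `PARTIAL_BITS` makes such aliasing rarer, at the cost (in hardware) of having to wait for more address bits.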
Speaking of the Level 1 cache, the Willamette and Northwood cores, as well as the Gallatin core used in the ExtremeEdition, feature a 4-way set-associative Level 1 data cache. In the case of the Prescott core, the L1 data cache size has been increased to 16 kB and the associativity has been raised to 8-way.
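What the change in size and associativity means for the cache layout can be worked out with a short calculation. The sketch below assumes a 64-byte cache line and an 8 kB data cache for the earlier cores; neither figure is stated in the text above, so treat both as assumptions.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets = capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)


def index_bits(sets: int) -> int:
    """Address bits needed to select a set (sets must be a power of two)."""
    return sets.bit_length() - 1


# Assumed: 8 kB, 4-way for the earlier cores; 64-byte lines for both.
older_sets = cache_sets(8 * 1024, ways=4)
# From the text: 16 kB, 8-way for Prescott.
prescott_sets = cache_sets(16 * 1024, ways=8)

print("earlier cores:", older_sets, "sets,", index_bits(older_sets), "index bits")
print("Prescott:     ", prescott_sets, "sets,", index_bits(prescott_sets), "index bits")
```

Under these assumptions, doubling both the capacity and the associativity leaves the number of sets unchanged, so the same address bits select a set while each set holds twice as many lines.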
Execution Units
The integer execution units are largely unchanged, except that a shifter/rotator block has been added to one of the ALUs. In addition, all P4 cores so far had to use the floating point unit to execute integer multiplications. This resulted in extra latencies, since the source operands needed to be moved from the ALUs to the FP side and the results had to be transferred back to the integer units. This issue has been solved with the addition of a dedicated integer multiplier.