LOSTCIRCUITS

 Intel Pentium4 840 Extreme Edition and 840D
.... the name of the rose ...
(Review by MS June 20, 2005)

The HyperThreading Paradox

Then there is the issue of HyperThreading, which takes on a completely new meaning in the context of SMP or dual core architectures by adding the flashy buzzword “Thread Level Parallelism”. Unfortunately, the predominant feature of HyperThreading still appears to be a solid amount of confusion about what it actually does. Briefly, and without going into detail about the silicon and architecture involved, HT is essentially a method to optimize the execution of multiple threads on the existing execution units. Behind this somewhat trivial statement is the idea that, in multithreaded applications, the opening and closing of threads is the main bottleneck for delivering data to the processing units. In other words, instead of implementing idle states or fairness algorithms that allow other threads to arbitrate for the execution units, most software is written somewhat poorly along the lines of “yes, we are multithreaded, but multiple threads are scary and, therefore, we would rather close one thread before we really open the second”. That is still multithreaded, but it amounts to somebody arbitrarily splitting a given workload into multiple chunks and calling them threads.


Consider a word processor and a very fast typist. Every time a key is pressed, a thread is opened, executed and closed. If another application is running in the background, the overhead of opening and closing threads will keep that other application from getting access to the execution units. On the other hand, adding “switching buffers” that keep multiple threads alive and pipeline them to the execution units, while hiding the opening and closing of threads in the background, gives any second application instant access to the execution units as soon as the instruction for the key command is done. This is pretty much what HT does. Note that this is a hypothetical example for illustration purposes only (and those on the Ace's forum).
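The typist example can be put into numbers with a small toy model. This is only a sketch of the hypothetical scenario above, not a description of how HT is implemented in silicon: the function names, the cycle counts and the `switch_cost` parameter are all assumptions made for illustration. One execution unit serves two threads in round-robin order; in the first model every change of thread pays a fixed overhead, in the second both threads stay resident and the overhead disappears.

```python
from collections import deque

def run_single_context(streams, switch_cost):
    """Toy model: one resident thread at a time.

    Each stream is a list of work items measured in cycles. The unit serves
    the streams round-robin and pays `switch_cost` cycles whenever it moves
    from one thread to a different one (hypothetical open/close overhead).
    """
    queues = [deque(s) for s in streams]
    cycles = 0
    current = None
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                continue
            if current is not None and current != i:
                cycles += switch_cost  # overhead of swapping threads
            current = i
            cycles += q.popleft()      # the useful work itself
    return cycles

def run_resident_contexts(streams):
    """Toy model: all threads stay resident, so switching costs nothing."""
    return sum(sum(s) for s in streams)

typist = [1, 1, 1, 1]       # four keystrokes, one cycle each (made-up numbers)
background = [3, 3, 3, 3]   # four chunks of background work (made-up numbers)

print(run_single_context([typist, background], switch_cost=2))  # 30
print(run_resident_contexts([typist, background]))              # 16
```

With these invented numbers the useful work is 16 cycles in both cases; the remaining 14 cycles in the first model are seven thread changes at two cycles each, which is the overhead the paragraph describes as being hidden.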

A perfect example of thread level parallelism: Caligari TrueSpace rendering Adam Trachtenberg's "Vase" scene across four logical CPUs spread over two separate cores. The four separate scan lines are clearly visible, and all logical / physical CPUs are running at 100% load.
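The kind of split shown in the screenshot can be sketched in a few lines. This is not TrueSpace's renderer: `split_rows`, `shade`, `render_band` and `render` are hypothetical names, the "shader" is a stand-in formula, and the assumption that each worker gets one contiguous band of scan lines is ours. Note also that standard CPython threads do not execute Python bytecode on several cores at once, so the sketch shows the decomposition rather than a speedup; a process pool would be the usual substitute for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(height, workers):
    """Split rows 0..height-1 into `workers` contiguous bands of near-equal size."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # first `extra` bands get one more row
        bands.append(range(start, start + size))
        start += size
    return bands

def shade(x, y):
    """Stand-in for a real per-pixel shader (hypothetical)."""
    return (x * 31 + y * 17) % 256

def render_band(rows, width):
    """Render one band of scan lines; each worker runs this independently."""
    return [[shade(x, y) for x in range(width)] for y in rows]

def render(width, height, workers=4):
    """Render the image with one band per worker and stitch the bands together."""
    bands = split_rows(height, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in band order, so the rows stay sorted
        parts = list(pool.map(render_band, bands, [width] * workers))
    return [row for part in parts for row in part]

image = render(width=8, height=10, workers=4)
print(len(image), len(image[0]))  # 10 8
```

Because no band depends on another, the four workers never have to wait for each other, which is why this kind of workload can keep every logical CPU busy.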

If we take this one step further and use two processors with HT, we end up with what is generally known as thread level parallelism, that is, the capability to execute the threads of a single application over several physically separate cores. It is easy to fathom how the hypothetical benefits multiply. However, there is also the risk of thread collision: threads that would logically execute on one CPU, which already has the data in its cache, are all of a sudden routed to the second core. That core then needs to check with the first CPU whether the data are in its cache and, if there are dirty bits, everything needs to be written back to main memory before it can be loaded into the correct cache. That is somewhat counterproductive but a common scenario, once again caused by sub-optimal software code. We’ll have some examples later in this article.
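One common way for software to avoid having its work bounced between cores is to restrict it to a single CPU, so that the cache it has warmed up stays relevant. The following is a minimal sketch of that idea, not something taken from this review: `pin_to_one_cpu` is a hypothetical helper, and it relies on Python's `os.sched_setaffinity`, which is only available on some Unix systems, so the function simply does nothing elsewhere.

```python
import os

def pin_to_one_cpu():
    """Pin the caller to a single CPU it is already allowed to run on.

    Returns the chosen CPU number, or None where the platform offers no
    affinity call (hypothetical helper, for illustration only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 refers to the caller itself
    cpu = min(allowed)                 # arbitrary choice: lowest allowed CPU
    os.sched_setaffinity(0, {cpu})     # from now on, run only on that CPU
    return cpu

cpu = pin_to_one_cpu()
print("pinned to CPU", cpu if cpu is not None else "(not supported here)")
```

Pinning trades flexibility for locality: the scheduler can no longer move the work to an idle core, so it is a tool for specific cases rather than a general fix.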

Every coin has two faces, and thread level parallelism is not necessarily where the P4 840 shines best. On the contrary, most of the marketing is geared towards the parallel execution of single-threaded applications, e.g. browsing the web while running a virus scan on the email and playing a first person shooter. Each of these threads can be executed on one of the logical or physical processors without too much interference from the other applications. As a result, it is possible to run some tasks in the background without experiencing any lag in the foreground applications, e.g. running a 3D rendering job that can go on for hours while there is still enough horsepower to write emails at a reasonable typematic rate. This particular “goodness” has been the secret of the dual CPU fanclub for years, something that is very difficult to document in the form of benchmarks but definitely noticeable to the user. Possibly even addictive.

Pentium 4 840D
(dual core)



General disclaimer: This page only reflects the author's personal opinion and assumes no responsibility whatsoever regarding any of the contents or any damages that may occur explicitly or implicitly from reading the contents of this site. All names and trademarks mentioned in this review are the exclusive property of the respective parent companies.
All contents of this site are protected by international copyright laws. Reproduction of the contents even in parts is not allowed except after written permission by the author and referral to this site.
Copyright 2002 - 2008 LostCircuits