LOSTCIRCUITS

HP PA-8800 RISC Processor: SMP On One Chip
(Review by MS, October 19, 2001)
At last week's Microprocessor Forum, HP's David J. C. Johnson unveiled the details of HP's latest RISC processor, which is intended to redefine performance in server-class processors. Following a relatively simple strategy, the PA-8800 combines two PA-8700 cores on a single chip to enable symmetric multiprocessing (SMP) on a single processor. Aside from bumping the core speed up to an initial 1 GHz, enhancements include a combined 35 MB of L1+L2 cache. The L1 cache consists of separate 750 KByte instruction and data caches for each core, for a total of 3 MB. A huge 32 MB L2 cache is placed off-chip on the same cartridge in the form of four 72 Mbit chips using EMS "1 Transistor SRAM" technology. Conservative performance estimates for the new PA-8800 are in the range of 900/1000 SPEC 2000 int/fp units and 800,000 transactions per minute in server applications.

Block diagram of the HP PA-8800 dual-core RISC processor, featuring a 128-bit bus interface with a 400 MHz data rate and 6.4 GB/sec bandwidth. Each core has separate 750 KByte instruction and data L1 caches. The 32 MB L2 cache is off-chip and shared by both cores. In detail, the L2 cache is made up of four 72 Mbit "1 Transistor SRAM" or ESRAM chips using clam shell mounting (detailed explanation below).
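The bus and L2 figures in the caption can be checked with a few lines of arithmetic. The sketch below is only a sanity check: the inputs are the numbers quoted above, the function names are made up for illustration, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
# Back-of-the-envelope check of the figures quoted in the block diagram caption.
# Function names are illustrative; decimal units (1 GB = 10**9 bytes) are assumed.

def bus_bandwidth_gb_per_s(width_bits: int, data_rate_mhz: float) -> float:
    """Peak bandwidth = bytes per transfer * transfers per second."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * data_rate_mhz * 1e6 / 1e9

def raw_capacity_mbytes(chips: int, mbit_per_chip: int) -> float:
    """Raw capacity of a set of memory chips in MBytes (8 bits per byte)."""
    return chips * mbit_per_chip / 8

bus = bus_bandwidth_gb_per_s(128, 400)   # 128-bit bus, 400 MHz data rate
l2_raw = raw_capacity_mbytes(4, 72)      # four 72 Mbit chips

print(f"Bus bandwidth: {bus:.1f} GB/sec")
print(f"Raw L2 capacity: {l2_raw:.0f} MB")
```

The bus figure matches the quoted 6.4 GB/sec. The four chips add up to more raw capacity than the 32 MB quoted for the L2 cache; the extra bits may be reserved for error correction, but that is an assumption and not something stated in the presentation.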
Symmetric multiprocessing (SMP) is in most cases associated with the physical presence of several CPUs within a given system; otherwise, multiprocessing would not be possible. Multiprocessor systems have some problems, though, the most critical being cache coherency: each CPU needs to make sure that the copy of the data or instructions in its cache is valid. The commonly used protocols for maintaining cache coherency are MESI or AMD's version, MOESI, as described recently in this MP article. The drawback remains that no CPU can use the valid data contained in the other CPU's cache. One possible workaround for this problem is the addition of a backside shared cache as proposed by Multi Node Microsystems; however, this concept has not been implemented in real designs yet.
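To make the coherency problem more concrete, here is a minimal sketch of the MESI states for a single cache line shared by two CPUs. It is a simplified textbook model, not a description of any real processor: the class and method names are invented for illustration, and a Modified line is assumed to be written back to memory whenever the other CPU touches it.

```python
from enum import Enum

class State(Enum):
    M = "Modified"    # only valid copy, differs from memory
    E = "Exclusive"   # only cached copy, matches memory
    S = "Shared"      # valid copy, other caches may hold it too
    I = "Invalid"     # copy must not be used

class MesiLine:
    """Tracks one cache line across several CPUs (hypothetical toy model)."""

    def __init__(self, n_cpus: int = 2):
        self.states = [State.I] * n_cpus
        self.writebacks = 0  # number of times a Modified copy went back to memory

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.I:
            return  # read hit, no state change
        others_hold = False
        for other, st in enumerate(self.states):
            if other == cpu or st is State.I:
                continue
            others_hold = True
            if st is State.M:
                self.writebacks += 1  # dirty copy is written back first
            self.states[other] = State.S
        self.states[cpu] = State.S if others_hold else State.E

    def write(self, cpu: int) -> None:
        if self.states[cpu] is State.M:
            return  # already the sole owner of a dirty copy
        for other, st in enumerate(self.states):
            if other == cpu:
                continue
            if st is State.M:
                self.writebacks += 1
            self.states[other] = State.I  # all other copies become invalid
        self.states[cpu] = State.M

line = MesiLine()
line.read(0)    # CPU0: Exclusive, CPU1: Invalid
line.read(1)    # both Shared
line.write(0)   # CPU0: Modified, CPU1: Invalid
line.read(1)    # CPU0 writes back, both Shared again
print([s.name for s in line.states], "writebacks:", line.writebacks)
```

In this model, the last read by CPU1 cannot simply take the data from CPU0's cache; the modified line has to travel through memory first, which is the drawback described above.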
With the new PA-8800 RISC processor, HP is going a different route: instead of using physically separate processors, the new concept places two entire PA-8700 CPU cores in the same package. Such a concept sacrifices some flexibility, since a single CPU cannot be purchased. On the other hand, since nobody uses a single CPU in a dual system anyway, it is actually a smart move that solves a variety of problems. In particular, it enables a shared L2 cache with high-bandwidth access by both cores, which eliminates the system/memory bus as a bottleneck for access to valid data. A similar approach was taken by IBM with its Power4 processor, which also places two cores on a single die.
There are some similarities between the IBM Power4 and the HP PA-8800 RISC processor. Both run at a clock frequency of 1 GHz or greater and are built on SOI (Silicon-on-Insulator) technology. Minor differences between the two processors relate to the manufacturing process (180 nm SOI with copper interconnects and 7 metal layers in the Power4; 130 nm SOI with copper interconnects and 8 metal layers in the HP PA-8800 RISC).
The major differences between the IBM Power4 and the HP PA-8800 RISC processor are in the cache architecture. The Power4 uses a 64 kB L1 cache per core and a shared 1.5 MB L2 cache with a processor-to-L2 bandwidth of over 100 GB/sec. In addition, the IBM Power4 features an off-chip L3 cache of 32 MB.
The HP PA-8800 L1 cache is probably the biggest L1 cache built so far, with separate 750 KByte data and instruction caches for each core. This results in no fewer than 4 blocks of ¾ MB each for an unprecedented total of 3 MB of L1 cache, almost twice as much as the combined L1+L2 on IBM's Power4. Accordingly, at 300 million transistors, the transistor count of the HP PA-8800 is almost twice as high as the 170 million transistors of the IBM Power4, and the die measures 23.6 x 15.5 mm, quoted as 361 mm2 (the product of the two dimensions is closer to 366 mm2). The L2 cache of the PA-8800 is off-chip and consists of four 72 Mbit "1 Transistor SRAM" chips developed by Enhanced Memory Systems.
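The size comparison can be worked through with the numbers quoted in this article. The snippet below is a rough sketch only: the variable names are illustrative, the Power4 values are the ones given above, and 1 MB = 1024 KB is an assumption.

```python
# Cache, transistor and die-size arithmetic using the figures quoted in the article.
# Variable names are illustrative; 1 MB = 1024 KB is an assumption.

CORES = 2

# PA-8800: separate 750 KByte instruction and data caches per core
pa8800_l1_kb = CORES * 2 * 750

# Power4 as quoted above: 64 kB L1 per core plus a shared 1.5 MB L2
power4_on_chip_kb = CORES * 64 + 1.5 * 1024

cache_ratio = pa8800_l1_kb / power4_on_chip_kb
transistor_ratio = 300 / 170      # millions of transistors, PA-8800 vs Power4
die_area_mm2 = 23.6 * 15.5        # product of the quoted die dimensions

print(f"PA-8800 L1 total:        {pa8800_l1_kb} KB")
print(f"Power4 L1+L2 total:      {power4_on_chip_kb:.0f} KB")
print(f"Cache ratio:             {cache_ratio:.2f}x")
print(f"Transistor ratio:        {transistor_ratio:.2f}x")
print(f"Die area from W x H:     {die_area_mm2:.1f} mm2")
```

Both ratios come out somewhat below a factor of two, so "almost twice" is the fairest reading of the comparison.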