Hewlett Packard PA-8800 RISC
Written by Michael Schuette   
Nov 25, 2001 at 07:21 PM

At last week's Microprocessor Forum, HP's David J. C. Johnson unveiled the details of HP's latest RISC processor, which is intended to redefine performance in server-class processors. Following a relatively simple strategy, the PA-8800 combines two PA-8700 cores on a single chip to enable symmetric multiprocessing (SMP) within a single processor. Aside from raising the core clock to an initial 1 GHz, enhancements include a combined 35 MB of L1 and L2 cache. The L1 cache consists of separate 750 KByte instruction and data caches for each core, for a total of 3 MB. A huge 32 MB L2 cache is placed off-chip on the same cartridge in the form of four 72 Mbit chips using the "1 Transistor SRAM" technology from Enhanced Memory Systems (EMS). Conservative performance estimates for the new PA-8800 are in the range of 900/1000 SPEC 2000 int/fp units and 800,000 transactions per minute in server applications.

Block diagram of the HP PA-8800 dual-core RISC processor, featuring a 128-bit bus interface with a 400 MHz data rate and 6.4 GB/sec of bandwidth. Each core has separate 750 KByte instruction and data L1 caches. The 32 MB L2 cache is off-chip and shared by both cores. The L2 cache is made up of four 72 Mbit "1 Transistor SRAM" (ESRAM) chips using clamshell mounting (detailed explanation below).
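The 6.4 GB/sec figure follows directly from the bus width and the data rate. As a quick check, here is a minimal sketch in Python; the function name is made up for illustration, and 1 GB is assumed to mean 10^9 bytes.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bus bandwidth in GB/s, assuming 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8      # 128 bits -> 16 bytes
    return bytes_per_transfer * transfers_per_s / 1e9


# Figures quoted for the PA-8800 bus interface: 128 bit, 400 MHz data rate.
pa8800_bus = peak_bandwidth_gb_per_s(128, 400e6)
print(f"{pa8800_bus:.1f} GB/s")
```

This is a peak number only; it says nothing about the sustained bandwidth the bus reaches in practice.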


Symmetric multiprocessing (SMP) describes the presence of several identical processing units within a computer system, in contrast to systems that pair one main processor with a co-processor (for example, the 80386 and its 80387 math co-processor). Multiprocessor systems have some problems, though, the most critical being cache coherency: each CPU needs to make sure that the copy of data or instructions in its cache is still valid. The protocols commonly used to maintain cache coherency are MESI and the MOESI variant used by AMD. A remaining drawback is that no CPU can make use of the valid data held in the other CPU's cache. One possible workaround is the addition of a shared backside cache, as proposed by Multi Node Microsystems; however, this concept has not yet been implemented in real designs.
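To make the coherency bookkeeping concrete, here is a toy sketch of the four MESI states (Modified, Exclusive, Shared, Invalid) for a single cache line tracked by two CPUs. It is a simplified illustration, not a model of any real processor: the class and method names are invented for this example, no data is stored, and bus traffic and write-backs are only mentioned in comments.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Line:
    """State of one cache line in each CPU's cache (hypothetical toy model)."""

    def __init__(self, n_caches: int = 2):
        self.states = [State.INVALID] * n_caches

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.INVALID:
            return  # read hit: state unchanged
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.INVALID]
        # Every other holder drops to Shared; a Modified copy would be
        # written back to memory first (not modelled here).
        for i in holders:
            self.states[i] = State.SHARED
        self.states[cpu] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, cpu: int) -> None:
        # All other copies become stale, so they are invalidated.
        for i in range(len(self.states)):
            if i != cpu:
                self.states[i] = State.INVALID
        self.states[cpu] = State.MODIFIED

    def letters(self) -> str:
        return "".join(s.value for s in self.states)


line = Line()
line.read(0)
print(line.letters())   # EI: CPU 0 holds the only copy
line.read(1)
print(line.letters())   # SS: both CPUs share a clean copy
line.write(1)
print(line.letters())   # IM: CPU 0's copy is invalidated
```

After the write by CPU 1, CPU 0 has to fetch the line again before it can use it, which is the kind of traffic a shared cache is meant to reduce.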

With the new PA-8800 RISC processor, HP is taking a different route: instead of using physically separate processors, the new concept places two entire PA-8700 CPU cores in the same package. This sacrifices some flexibility, since a single CPU cannot be purchased. On the other hand, since nobody uses a single CPU in a dual system anyway, it is actually a smart move that solves a variety of problems. In particular, it gives both cores high-bandwidth access to a shared L2 cache, which eliminates the system/memory bus as a bottleneck for access to valid data. IBM took a similar approach with its Power4 processor, which also places two cores on a single die.

There are some similarities between the IBM Power4 and the HP PA-8800 RISC processor. Both run at a clock frequency of 1 GHz or greater and are built on SOI (silicon-on-insulator) technology. Minor differences between the two processors relate to the manufacturing process: 180 nm SOI with copper interconnects and 7 metal layers for the Power4, versus 130 nm SOI with copper interconnects and 8 metal layers for the HP PA-8800 RISC.

The major differences between the IBM Power4 and the HP PA-8800 RISC processor lie in the cache architecture. The Power4 uses 64 kB of L1 cache per core and a shared 1.5 MB L2 cache with a processor-to-L2 bandwidth of over 100 GB/sec. In addition, the IBM Power4 features an off-chip L3 cache of 32 MB.

The HP PA-8800's L1 cache is probably the largest L1 built so far, with separate 750 KByte data and instruction caches for each core. This results in no fewer than four blocks of ¾ MB each, for an unprecedented total of 3 MB of L1 cache, about twice as much as the combined L1+L2 on IBM's Power4. Accordingly, at 300 million transistors, the transistor count of the HP PA-8800 is almost twice as high as the 170 million transistors of the IBM Power4. The die measures 23.6 x 15.5 mm, quoted as 361 mm² (the product of the two dimensions is closer to 366 mm²). The L2 cache of the PA-8800 is off-chip and consists of four 72 Mbit "1 Transistor SRAM" chips developed by Enhanced Memory Systems.
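The cache and die figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this article. Treating 1 MB as 1024 kB is an assumption, as is the reading that the gap between the 36 MB of raw L2 capacity and the 32 MB quoted is reserved for something like error correction or tags; the article does not say.

```python
# Figures as quoted in the article; variable names are illustrative only.
CORES = 2
L1_BLOCK_MB = 0.75                 # "3/4 MB" per instruction or data cache
l1_total_mb = L1_BLOCK_MB * 2 * CORES          # instruction + data per core

L2_CHIPS, L2_CHIP_MBIT, L2_QUOTED_MB = 4, 72, 32
l2_raw_mb = L2_CHIPS * L2_CHIP_MBIT / 8        # Mbit -> MB
l2_quoted_fraction = L2_QUOTED_MB / l2_raw_mb  # 8/9; purpose of the rest is assumed

combined_mb = l1_total_mb + L2_QUOTED_MB

# Power4 on-die cache, assuming 1 MB = 1024 kB.
power4_on_die_mb = CORES * 64 / 1024 + 1.5
l1_vs_power4 = l1_total_mb / power4_on_die_mb

transistor_ratio = 300 / 170
die_area_mm2 = 23.6 * 15.5

print(f"PA-8800 L1 total:        {l1_total_mb:.2f} MB")
print(f"PA-8800 L2 raw:          {l2_raw_mb:.0f} MB ({l2_quoted_fraction:.3f} quoted as usable)")
print(f"PA-8800 L1+L2:           {combined_mb:.0f} MB")
print(f"L1 vs Power4 L1+L2:      {l1_vs_power4:.2f}x")
print(f"Transistor count ratio:  {transistor_ratio:.2f}x")
print(f"Die area from dimensions: {die_area_mm2:.1f} mm^2")
```

Both ratios come out somewhat below 2, so "about twice" and "almost twice" are best read as round figures.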

Last Updated ( Dec 12, 2008 at 03:44 AM )