AMD's Bobcat and Bulldozer (updated)
Written by Michael Schuette   
Aug 16, 2010 at 07:00 PM


Bobcat is destined to be AMD's bread-and-butter CPU in the low-power, net appliance market segment, but AMD also needs to compete in the high-performance desktop/server market. The design for that segment carries the charismatic name Bulldozer and, like Bobcat, is built out of a limited number of design blocks. Bulldozer, however, is a completely new design based on dedicated and shared components on three hierarchical levels: core, module and chip.

In short, the synthesis of Bulldozer is most easily understood by going through a few slides:

The base step is to synthesize a standard core including all front-end, integer and floating point units and caches. Each integer unit features two integer execution units and two address pipes, whereas each floating point unit features two 128-bit FMAC (fused multiply-accumulate) units. These perform multiply-add operations without rounding between the two steps, applying a single rounding at the end of the "fused" multiply-add operation instead.
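To make the single-rounding point concrete, here is a minimal Python sketch. It does not model AMD's hardware; it only emulates a fused multiply-add by computing the exact result with rational arithmetic and rounding once at the end, then compares that with the usual multiply-then-add sequence that rounds twice. The function names and the input values are chosen purely for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a*b + c with a single rounding at the very end."""
    # Fraction(float) is exact, so the product and sum carry no rounding error.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary multiply followed by add: two rounding steps."""
    product = a * b      # rounded to the nearest double
    return product + c   # rounded again


# Illustrative inputs: the exact product is 1 - 2**-60, which is too close
# to 1.0 to survive the intermediate rounding of a separate multiply.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)
unfused = separate_multiply_add(a, b, c)
print("fused:  ", fused)
print("unfused:", unfused)
```

With these inputs the two-step version loses the small residual entirely, while the fused version keeps it. That is the accuracy benefit of rounding only once.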

The next step is to take two of the new cores and combine them, followed by the consolidation of resources, that is, the deduplication of redundant elements. As a result, only the "dedicated" integer cores remain in dual configuration, whereas the floating point units including schedulers, the fetch and decode units, and the L2 cache are "deduplicated". On the back end, or chip level, an L3 cache is added as common fast local memory shared between all "modules".
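The core, module and chip hierarchy described above can be pictured as a small data model. The following Python sketch is only a bookkeeping aid, not a description of AMD's implementation. The class names are invented for illustration, and the module count passed to build_chip is a hypothetical parameter, since the article does not say how many modules a chip contains.

```python
from dataclasses import dataclass, field


@dataclass
class IntegerCore:
    """Dedicated per-thread resources (counts as given in the article)."""
    execution_units: int = 2
    address_pipes: int = 2


@dataclass
class SharedFPU:
    """Floating point unit shared by both integer cores of a module."""
    fmac_units: int = 2
    fmac_width_bits: int = 128


@dataclass
class Module:
    """Two dedicated integer cores plus the deduplicated, shared parts."""
    integer_cores: list = field(
        default_factory=lambda: [IntegerCore(), IntegerCore()]
    )
    fpu: SharedFPU = field(default_factory=SharedFPU)
    shared: tuple = ("fetch", "decode", "L2 cache")

    @property
    def threads(self) -> int:
        # One thread of execution per dedicated integer core.
        return len(self.integer_cores)


@dataclass
class Chip:
    """Chip level: a set of modules sharing one L3 cache."""
    modules: list
    shared: tuple = ("L3 cache",)

    @property
    def threads(self) -> int:
        return sum(m.threads for m in self.modules)


def build_chip(module_count: int) -> Chip:
    """module_count is a hypothetical parameter, not a figure from AMD."""
    return Chip(modules=[Module() for _ in range(module_count)])


chip = build_chip(4)  # 4 is an arbitrary example value
print(chip.threads, "threads,", len(chip.modules), "shared FPUs")
```

The point of the model is that integer resources scale with the thread count, whereas the floating point unit, front end and L2 cache scale with the module count.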

In "gross anatomical" terms, this means that a Bulldozer module is a monolithic dual-core building block with support for two threads of execution. In terms of functionality, and compared to HyperThreading, this means that latency-sensitive integer operations are executed in parallel without leading to intermittent "hurry up and wait" scenarios. Floating point operations, on the other hand, which according to AMD are latency tolerant, share resources that are dynamically allocated between threads.

Needless to say, two physical cores will typically allow higher throughput than two virtual cores. At the same time, all shared resources can be fully allocated to a single thread if only one thread is active at the time. AMD-internal workload modeling predicts roughly 80% of the performance of CMP (chip multiprocessing, i.e. two complete cores) on a significantly smaller area and power footprint.
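The trade-off behind the 80% figure can be expressed as simple arithmetic. The sketch below takes the 80% estimate from the text and treats the module's area relative to a two-core CMP design as an unknown input, because the article gives no number for it. The function names and the area values in the usage lines are hypothetical.

```python
# Roughly 80% of CMP performance, per AMD's internal modeling as quoted above.
MODULE_RELATIVE_PERF = 0.80


def relative_perf_per_area(relative_perf: float, relative_area: float) -> float:
    """Performance per unit area, normalized so that a CMP design scores 1.0."""
    if relative_area <= 0:
        raise ValueError("relative_area must be positive")
    return relative_perf / relative_area


def module_beats_cmp(relative_area: float) -> bool:
    """True if the module delivers more performance per area than CMP."""
    return relative_perf_per_area(MODULE_RELATIVE_PERF, relative_area) > 1.0


# Hypothetical area ratios, for illustration only.
for area in (0.5, 0.7, 0.9):
    print(area, module_beats_cmp(area))
```

Under this simple model, the shared design comes out ahead in performance per area whenever the module occupies less than about 80% of the area of the equivalent two-core design. The same reasoning applies to power.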

Last Updated ( Sep 14, 2010 at 03:48 AM )