AMD's FX-8150 "Zambezi" - Bulldozer in Action
Written by Michael Schuette   
Oct 11, 2011 at 08:13 PM


Caches are probably the single most important addition to any modern microprocessor. Past generations of AMD processors have mostly had 2-way associative 64 kB L1 data and instruction caches. The L1I cache organization has not changed, except that there is now one L1I cache per Bulldozer module, shared by its two cores. The L1 data cache, however, has been slashed to 16 kB and is now 4-way associative. Keep in mind that the L1D caches are per core, that is, there are two of them per module and they are private. The L2 cache is 16-way associative and shared between the two cores of the Bulldozer module. As mentioned already, the L1 cache is write-through.
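To put the change in L1D organization into numbers, the short sketch below works out how many sets each configuration has. The 64-byte cache line is an assumption on our part and is not stated above; the function name cache_sets is likewise only illustrative.

```python
def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Return the number of sets in a set-associative cache.

    sets = capacity / (associativity * line size)
    line_bytes defaults to 64, which is an assumption, not a figure
    taken from the article.
    """
    if size_bytes <= 0 or ways <= 0 or line_bytes <= 0:
        raise ValueError("all parameters must be positive")
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("capacity is not a whole number of sets")
    return size_bytes // (ways * line_bytes)


# Older AMD L1D as described above: 64 kB, 2-way associative
older_l1d_sets = cache_sets(64 * 1024, ways=2)

# Bulldozer L1D as described above: 16 kB, 4-way associative
bulldozer_l1d_sets = cache_sets(16 * 1024, ways=4)

print(older_l1d_sets, bulldozer_l1d_sets)
```

Under that assumption the older L1D has 512 sets of two lines each, while the Bulldozer L1D has 64 sets of four lines each. The higher associativity helps with conflict misses, but it cannot make up for a capacity that is only a quarter of what it used to be.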

Cache Performance: Latency and Bandwidth

Aside from cache size, the most important factors for performance are access latency and bandwidth. As it turns out, the Bulldozer L1D clocks in at 4 cycles of latency compared to the 3 cycles of Phenom II, and the L2 cache weighs in at a whopping 25-29 cycles, up from 15 cycles in the case of Phenom II. This does not bode well for performance regardless of which way one looks at it, and the small L1D cache makes matters worse because more accesses miss the L1D and have to pay the L2 latency. The measured latency also varies with the test used. SANDRA uses a fully random access pattern, whereas other utilities use in-page accesses with a regular stride pattern, in which case the L2 access latency, for example, will be around 20-22 cycles.
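Latency tests of this kind are typically built as a pointer chase: every slot in a buffer holds the index of the next slot to visit, so each load depends on the previous one. The sketch below only shows how the visiting order differs between a sequential, an in-page random and a fully random pattern. It is not the code used by SANDRA or any other utility, and Python is far too slow to measure real cache latencies with it. The names build_chain and walk, the one-slot-per-64-byte-line layout and the 64 slots per 4 kB page are all assumptions made for illustration.

```python
import random


def build_chain(n_slots, pattern, page_slots=64, seed=0):
    """Build a pointer-chase chain: nxt[i] is the slot visited after slot i.

    Assumption: one slot stands for one 64-byte cache line, so a 4 kB
    page holds page_slots = 64 slots. The chain is always one single
    cycle that touches every slot exactly once.
    """
    rng = random.Random(seed)
    if pattern == "sequential":
        # Fixed stride of one slot, easy for a hardware prefetcher.
        order = list(range(n_slots))
    elif pattern == "in_page_random":
        # Random order inside each page, pages visited one after another.
        order = []
        for start in range(0, n_slots, page_slots):
            page = list(range(start, min(start + page_slots, n_slots)))
            rng.shuffle(page)
            order.extend(page)
    elif pattern == "fully_random":
        # Random order across the whole buffer, crossing pages freely.
        order = list(range(n_slots))
        rng.shuffle(order)
    else:
        raise ValueError(f"unknown pattern: {pattern}")

    nxt = [0] * n_slots
    for i, slot in enumerate(order):
        nxt[slot] = order[(i + 1) % n_slots]
    return nxt


def walk(nxt, steps, start=0):
    """Follow the chain for a number of dependent steps, return the last slot."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx


if __name__ == "__main__":
    for pattern in ("sequential", "in_page_random", "fully_random"):
        chain = build_chain(256, pattern)
        print(pattern, chain[:8])
```

A real test would implement the walk in C or assembly, size the buffer to fit the cache level of interest, and divide the elapsed cycles by the number of steps. The fully random chain also crosses page boundaries on almost every step, which is one plausible reason why it reports higher latencies than an in-page pattern.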

Zambezi vs. Phenom II cache access latencies, lower is better. The Phenom II is the clear winner here.

Cache latencies at each level depending on access pattern. We measured three different access types (sequential, in-page random and fully random), which represent the three most common scenarios.

Zambezi vs. Phenom II cache bandwidth, higher is better. There is no question who has faster access to the data.


Last Updated ( Nov 27, 2011 at 01:47 PM )