Memory Subsystem
As mentioned in our architectural overview of Clarkdale, the ancillary chip containing the memory controller and the graphics engine is essentially a chipset implant on the processor package rather than a fully integrated part of the CPU. Moreover, any unified memory architecture shares memory resources between the CPU and the graphics, which typically results in a minor loss in the memory bandwidth available to the main processor, and that, after all, is what most benchmarks measure. In the case of Clarkdale/Arrandale, Intel is using the same proven dynamic video memory technology introduced several years ago with the original "Core" platform: video memory is dynamically allocated from the system memory pool, with a minimum of 32 MB hard-allocated and up to 1.7 GB available on demand. As soon as the graphics memory demand decreases, the allocated memory is relinquished and returned to the system memory pool.
Memory Bandwidth
We used SiSoft Sandra 1626 to look at memory bandwidth with and without discrete graphics. Since Intel's DH55TC board with the current BIOS version does not allow any memory adjustments, we were stuck at 1066 MHz, which in a dual-channel configuration still amounts to 17.1 GB/sec of theoretical memory bandwidth.
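For readers who want to check the theoretical figure, here is a minimal sketch of the arithmetic. It assumes the "1066 MHz" setting corresponds to an effective data rate of about 1066.67 million transfers per second and that each channel is 64 bits (8 bytes) wide; the function and constant names are our own and purely illustrative.

```python
def peak_bandwidth_gb_s(transfers_per_sec, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/sec (1 GB = 1e9 bytes).

    Assumes every channel moves `bus_bytes` bytes per transfer,
    i.e. a 64-bit data bus per channel by default.
    """
    return transfers_per_sec * bus_bytes * channels / 1e9


# Assumption: "1066 MHz" is the effective data rate, ~1066.67 MT/s.
TRANSFERS_PER_SEC = 1066.67e6

dual = peak_bandwidth_gb_s(TRANSFERS_PER_SEC, channels=2)
single = peak_bandwidth_gb_s(TRANSFERS_PER_SEC, channels=1)

print(f"dual channel:   {dual:.1f} GB/sec")
print(f"single channel: {single:.1f} GB/sec")
```

Measured bandwidth will always come in below this ceiling, so the number is only useful as an upper bound for judging the benchmark results.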

Memory Bandwidth in MB/sec, higher is better
As it turns out, overall memory bandwidth is in the expected ballpark, with the integrated graphics hogging only a small fraction, which in the grand picture should not have any impact on performance.
Memory Latency
Compared to a true "on-die" memory controller, we would expect the main memory access latencies to be quite a bit higher, that is, more in the ballpark of what we saw in the Penryn designs. The time stamps use the base frequency as their baseline, whereas the actual benchmark would run at the Turbo Boost frequency; to avoid this mismatch, we disabled Turbo Boost for the latency measurements.

Access latencies across the entire memory subsystem, lower is better
This is somewhat surprising. The on-die caches are exactly where they are supposed to be, namely at 4, 10 and 50 cycles latency (1.2 ns, 3 ns and 15 ns), but the main memory access latency shows up as roughly 100 ns. We got similar data from a number of benchmarks, including Sandra and Cachemem, and the results are roughly in the same range as what we saw with the Core 2 architecture. Whether integrated or discrete graphics was used made no difference in system memory access latencies.
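The cycle and nanosecond figures quoted above can be cross-checked with a short sketch. The core clock is not stated on this page; 3.33 GHz is an assumption inferred from the quoted pairs (4 cycles at 1.2 ns), and the labels L1, L2 and L3 are likewise our presumed reading of the three cache levels.

```python
# Assumption: core clock inferred from the quoted figures (4 cycles ~ 1.2 ns).
CORE_CLOCK_GHZ = 3.33


def cycles_to_ns(cycles, clock_ghz=CORE_CLOCK_GHZ):
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz=CORE_CLOCK_GHZ):
    """Convert a latency in nanoseconds to core clock cycles."""
    return ns * clock_ghz


# Cache level names are presumed; the cycle counts are from the text.
for level, cycles in (("L1", 4), ("L2", 10), ("L3", 50)):
    print(f"{level}: {cycles} cycles = {cycles_to_ns(cycles):.1f} ns")

# Main memory at roughly 100 ns, expressed in core cycles.
print(f"main memory: ~{ns_to_cycles(100):.0f} cycles")
```

Under that assumed clock, the roughly 100 ns main memory latency works out to a few hundred core cycles, which puts the gap between the last cache level and system memory into perspective.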