AMD Phenom II X4 940: Compared to Phenom X4 9950 BE and Intel Core 2 Q9550
Written by Fedja Drndarski
Monday, 12 January 2009
Cache Architecture
The first Phenom CPU had only 2MB of slow L3 cache, which is inadequate by today’s standards. Besides, the 65nm K10 core proved to be a big bite for AMD, hard to swallow because of its size: 286mm². The problems that appeared at the beginning were low clock frequencies, low yields and inadequate performance.
The cache hierarchy is identical to that of the first (Agena) core, but the L3 cache has grown from 2MB to 6MB with lower latencies, although the L3 runs at a lower clock than on Agena. The integrated northbridge and L3 cache of the Deneb core operate at only 1.8GHz, slower than the 2.0GHz of the 65nm Phenom. At the same clock, L3 latency is roughly 7.5% lower, although AMD has not confirmed this. L3 associativity has been raised from 32-way to 48-way, which means a line from any given memory address can be placed in any of 48 slots (ways) within its set, so fewer useful lines are evicted when many addresses compete for the same set. Higher associativity usually lengthens L3 access time, but that is not the case here. The L3 on the Deneb core is larger, more efficient and better adapted to power management. It is shared among all cores, and the three-level cache hierarchy is designed to improve multithreaded and multitasking performance while providing as much cache per thread as possible.
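To make the associativity change concrete, here is a minimal Python sketch of an LRU set-associative cache. It assumes 64-byte cache lines and derives the set count from the capacities quoted above (2MB 32-way and 6MB 48-way); the plain LRU replacement policy and the access pattern are illustrative assumptions, not a model of AMD’s actual L3.

```python
from collections import OrderedDict

LINE_SIZE = 64  # assumed cache-line size in bytes


class SetAssociativeCache:
    """LRU set-associative cache that only counts hits and misses."""

    def __init__(self, size_bytes: int, ways: int, line_size: int = LINE_SIZE):
        self.ways = ways
        self.line_size = line_size
        self.num_sets = size_bytes // (line_size * ways)
        # One ordered dict per set: keys are tags, ordered LRU -> MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        line = address // self.line_size
        index = line % self.num_sets  # which set the line maps to
        tag = line // self.num_sets   # identifies the line within that set
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used way
        s[tag] = None
        return False


def conflict_hits(cache: SetAssociativeCache, n_lines: int, rounds: int) -> int:
    """Cycle through n_lines addresses that all map to set 0; return hits."""
    stride = cache.num_sets * cache.line_size  # same set index, new tag
    for _ in range(rounds):
        for i in range(n_lines):
            cache.access(i * stride)
    return cache.hits


MB = 1024 * 1024
agena_l3 = SetAssociativeCache(2 * MB, ways=32)
deneb_l3 = SetAssociativeCache(6 * MB, ways=48)
print(agena_l3.num_sets, deneb_l3.num_sets)  # 1024 2048
# 40 lines competing for one set: too many for 32 ways, but they fit in 48.
print(conflict_hits(agena_l3, 40, 10), conflict_hits(deneb_l3, 40, 10))  # 0 360
```

Under LRU, cycling 40 conflicting lines through a 32-way set evicts each line just before it is reused, so every access misses, while the 48-way set keeps all 40 after the first pass. Real workloads rarely produce such a clean pattern, so this only illustrates why higher associativity reduces conflict misses; it is not a performance estimate.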
AMD Phenom II now supports the “C6 deep sleep power state”. An unused core can be completely powered down, which resulted in 21% lower power consumption. Because the L3 is a “victim” cache, it can behave as either inclusive (holding copies of L1 and L2 data) or exclusive (not holding such copies). If a core is powered down, the contents of its L1 and L2 caches are copied to the L3, so data integrity is preserved and the other cores can still access that data through the L3. The sizes of the L1 and L2 caches are unchanged since the first 65nm K10 CPUs, and their latencies remain the standard 3 CPU cycles for L1 and 15 CPU cycles for L2.
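The victim-cache behaviour and the power-down flush can be illustrated with a toy model. In this Python sketch each core has a single private cache (standing in for L1 and L2 together), lines evicted from it fall into a shared L3, and powering a core down moves all of its private lines into the L3. The class names, capacities and policies are hypothetical simplifications, not AMD’s implementation; in particular, the sketch keeps the L3 strictly exclusive (an L3 hit moves the line back to the requesting core) and ignores coherence.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU store of line addresses."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()

    def __contains__(self, line: int) -> bool:
        return line in self.lines

    def remove(self, line: int) -> None:
        del self.lines[line]

    def insert(self, line: int):
        """Insert or refresh a line; return the evicted LRU line, or None."""
        self.lines[line] = None
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None


class ToyChip:
    """Private per-core caches backed by a shared victim L3 (hypothetical)."""

    def __init__(self, cores: int, private_lines: int, l3_lines: int):
        self.private = [LRUCache(private_lines) for _ in range(cores)]
        self.l3 = LRUCache(l3_lines)
        self.powered = [True] * cores

    def access(self, core: int, line: int) -> str:
        if not self.powered[core]:
            raise RuntimeError(f"core {core} is powered down")
        cache = self.private[core]
        if line in cache:
            cache.insert(line)  # refresh LRU position
            return "private hit"
        where = "memory"
        if line in self.l3:
            self.l3.remove(line)  # exclusive: the line moves back to the core
            where = "L3 hit"
        victim = cache.insert(line)
        if victim is not None:
            # Lines evicted from a core become L3 victims; lines the L3
            # itself evicts would go back to memory (dropped here).
            self.l3.insert(victim)
        return where

    def power_down(self, core: int) -> None:
        """Copy the core's private lines into the L3, then switch it off."""
        for line in list(self.private[core].lines):
            self.l3.insert(line)
        self.private[core] = LRUCache(self.private[core].capacity)
        self.powered[core] = False


chip = ToyChip(cores=4, private_lines=2, l3_lines=8)
chip.access(0, 100)
chip.access(0, 101)
chip.power_down(0)          # lines 100 and 101 now live only in the L3
print(chip.access(1, 100))  # L3 hit
```

The final call shows the point made above: after core 0 is switched off, core 1 still finds core 0’s data, served from the shared L3 instead of memory.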
Power Management: Cool’n’Quiet
Phenom was the first x86 quad-core CPU to support independently changing the frequencies of its cores and northbridge, and it also supports independent P-states. In practice, this mechanism was of limited use because the operating system can move threads from one core to another. As a result, performance suffered with Cool’n’Quiet enabled, because threads often ended up on cores running at a lowered frequency. Phenom II takes a different approach: if any core needs to run at full frequency, the CPU no longer lowers the frequency of the other cores independently, and all four cores run at full speed. With Cool’n’Quiet enabled, the performance drop is minimal, no more than 2%. This is a much smaller drop than on 65nm Phenom CPUs, so HD movies can now be watched with Cool’n’Quiet enabled. Unlike the older 65nm Phenom, Phenom II has 4 P-states. On our test Phenom II X4 940, which runs at 3000MHz, this means each core can be set to 800, 1800, 2300 or 3000MHz.
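The P-state behaviour described above can be modelled in a few lines. This Python sketch uses the four frequencies quoted for our Phenom II X4 940; expressing each core’s demand as a required frequency in MHz, and the exact rule for picking a state, are assumptions made for illustration, not AMD’s or any operating system’s actual governor.

```python
P_STATES_MHZ = [800, 1800, 2300, 3000]  # Phenom II X4 940, from the text
MAX_MHZ = P_STATES_MHZ[-1]


def lowest_sufficient_pstate(required_mhz: float) -> int:
    """Return the slowest P-state frequency that meets the requested speed."""
    for freq in P_STATES_MHZ:
        if freq >= required_mhz:
            return freq
    return MAX_MHZ  # demand above the top state is capped at full speed


def phenom_ii_frequencies(required_mhz_per_core: list[float]) -> list[int]:
    """Per-core clocks under the simplified Phenom II rule described above.

    Each core normally gets its own lowest sufficient P-state, but if any
    core needs full frequency, all cores are run at full frequency.
    """
    chosen = [lowest_sufficient_pstate(r) for r in required_mhz_per_core]
    if MAX_MHZ in chosen:
        return [MAX_MHZ] * len(chosen)
    return chosen


print(phenom_ii_frequencies([500, 1500, 2000, 700]))  # [800, 1800, 2300, 800]
print(phenom_ii_frequencies([2900, 100, 100, 100]))   # [3000, 3000, 3000, 3000]
```

In the second call, one busy core pulls the others up to full speed, so a thread that the operating system moves to another core does not land on a slowed-down one.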