
We recently tried to explain most of the novelties brought by Sandy Bridge CPUs in an earlier article, based on the information we had been able to obtain at the time. Naturally, a few details remained a mystery until the actual CPUs were physically available to us, so let's have a look at those.
Besides the fact that overclocking will be very limited, and that only some of the CPUs (recognisable by the K suffix in the product name) will be able to reach higher frequencies, we've found out that Intel is going to venture onto a playing field that was formerly exclusive to AMD. It's no secret that a clock barrier has been reached, one that seems impossible to overcome in both present and foreseeable future CPUs, and that manufacturers have tried to increase performance by raising the number of cores rather than their clock. This has led to current models coming with a maximum of six cores, yet a staggering number of applications can still use only a single core, which has in turn led to the paradoxical state of affairs in which dual-core models still sell better than models with three, four or more cores.
The presence of native quad-core models in Intel's latest series has never been in question, but the interesting thing is that all models will be available as dual-core ones as well. Whether these CPUs will be samples that ended up with defective cores, or fully functional chips with two cores physically cut off so that they can be sold as dual-core models, remains to be seen, but it's certain that they will appear on the market. According to current speculation, Intel will manufacture its CPUs with four cores by default and later differentiate them into quad-core or dual-core models as needed. Either way, here's an excellent chance for more Core Unlocker potential, only this time with Intel's models as well.
Perhaps one of the greatest advantages of the new generation, as far as CPU architecture is concerned, is the redesigned way of accessing the Last Level Cache (LLC), more commonly known as L3 cache memory. Since this memory is shared by all cores, the System Agent (all components previously located in the northbridge) and the GPU, there was an obvious problem of communication and memory usage. A ring bus has been created for this purpose, and all communication on it is bidirectional, so that data can always travel the shortest possible way between start and finish. Although diagrams of the new architecture suggest that each core gets its own share of the LLC, this has turned out to be false: an LLC that is independent of any single core is one of the key features of Sandy Bridge models, with the cores and the GPU accessing whichever part of the cache contains the necessary data. For example, suppose the third core needs information located in the first and fourth parts of the LLC. Its requests then travel in both directions. The request for the data in the first part of the LLC goes clockwise, while the request for the data in the fourth part goes counter-clockwise, in order to shorten each route.
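The direction choice in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop names, their order on the ring, and the convention that stepping toward lower indices counts as "clockwise" are all hypothetical, chosen so that the result matches the example above. None of it is Intel's documented topology or arbitration logic.

```python
# Hypothetical ring layout: the stop names and their order are assumptions
# made for illustration, not Intel's documented topology.
RING_STOPS = ["GPU", "LLC1", "LLC2", "LLC3", "LLC4", "SystemAgent"]


def shortest_ring_route(src, dst, stops=RING_STOPS):
    """Return (direction, hops) for the shorter way from src to dst.

    Assumption: stepping toward lower indices counts as "clockwise".
    """
    n = len(stops)
    i, j = stops.index(src), stops.index(dst)
    down = (i - j) % n  # hops when stepping toward lower indices (wrapping)
    up = (j - i) % n    # hops when stepping toward higher indices (wrapping)
    if down == 0:
        return ("none", 0)
    # Ties go clockwise here; a real arbiter might break them differently.
    if down <= up:
        return ("clockwise", down)
    return ("counter-clockwise", up)


# The third core (sitting at the LLC3 stop) fetching from the first and
# fourth parts of the LLC:
print(shortest_ring_route("LLC3", "LLC1"))  # ('clockwise', 2)
print(shortest_ring_route("LLC3", "LLC4"))  # ('counter-clockwise', 1)
```

The modulo arithmetic is what makes the ring "wrap around": both distances are always computed, and the request simply takes the smaller one.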
As there are multiple cores and multiple parts of the L3 cache, it may happen that one of the cores is waiting for lower priority information located in a section of the cache that also holds high priority information being processed by other cores. This sort of situation creates a data access queue. If the quantity of high priority information is too large, the System Agent contains mechanisms to declutter the queue and allow lower priority information to be processed as well, in order to achieve optimal usage of all cores. We've already mentioned that the ring bus consists of four independent parts: Data, Request, Acknowledge and Snoop, each with a pre-assigned role that depends on the nature of the ring. With this in mind, it's obvious that the Data ring is the largest, since it's used for transferring information between the cores, the cache and the memory. The Request, Acknowledge and Snoop rings don't have to carry large quantities of data, since they're mostly used for confirming, denying or checking specific operations.
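How the System Agent actually declutters the queue has not been disclosed, so the following is not Intel's mechanism. It is a minimal Python sketch of one generic technique with the same effect, priority aging, in which a request's effective priority grows the longer it waits. The names `Request`, `pick_next` and `aging_step` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    core: int        # core that issued the request
    priority: int    # larger value = more urgent
    waited: int = 0  # scheduling rounds spent waiting so far


def pick_next(queue, aging_step=1):
    """Remove and return the request with the highest effective priority.

    Effective priority = priority + waited * aging_step, so a low priority
    request cannot be starved forever by a stream of high priority ones.
    Ties go to the request that is earlier in the queue.
    """
    if not queue:
        raise IndexError("pick_next from an empty queue")
    best = max(
        range(len(queue)),
        key=lambda i: queue[i].priority + queue[i].waited * aging_step,
    )
    chosen = queue.pop(best)
    for request in queue:  # everything left behind has waited one more round
        request.waited += 1
    return chosen
```

With `aging_step=1`, a priority 0 request competing against a fresh priority 3 request every round is passed over three times and served on the fourth, once its accumulated waiting time closes the gap.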
An oft-asked question that went without a tangible answer in previous months concerned the behaviour of the GPU when the CPU clocks are reduced to save power in the idle state. As a reminder, the LLC (L3 cache) works at the same clock as the CPU, which means that increasing the CPU clock also increases the LLC clock, providing a very fast L3 cache. However, the GPU is also tied to the LLC. So if a graphically complex task that requires fast memory is being processed by the GPU part alone, what happens when the CPU clocks get “shot down” to their idle state values, reducing the clock of the memory the GPU uses as well? The solution is simple. Although memory latency is not a decisive factor in GPU performance, there are internal mechanisms that raise the CPU clock to the level required for the GPU to have appropriately fast memory for processing its data.













