Written by Nebojsa Todorovic
Wednesday, 23 May 2007
We'll try to clarify AMD's new “Unified Architecture”. Although the author of this article unfortunately didn't have the chance to be in Tunisia for the premiere of this graphics card, we hope you will appreciate our attempt to pass our knowledge on to you. At the very beginning there is a “command processor”, which receives data from the drivers and has access to memory. From there the data flows to the “setup engine”, which consists of several different units.
The role of the setup engine is to prepare the instructions, based on their type, for the “Ultra-Threaded Dispatch Processor”, which then forwards shaders to the stream processors for processing. The setup engine prepares pixel, vertex and geometry shaders, with only vertex shaders passing through the tessellation stage in the “Tessellator”. After entering the Ultra-Threaded Dispatch Processor, shaders wait in the processing queue. Which type of shader is processed first is determined exclusively by the “arbiter” units. There are four SIMDs (each a block of 80 stream processors), and every one of them has a pair of arbiter units.
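To make the idea of the queue and the arbiter a little more tangible, here is a minimal sketch in Python. It is only a toy model: the class name `ToyArbiter`, the numeric priorities and the "lowest number goes first" rule are our own assumptions for illustration, since AMD has not described the real arbitration policy in this detail.

```python
import heapq
from itertools import count


class ToyArbiter:
    """Toy model of an arbiter: picks the next queued shader thread.

    Assumption: each thread carries a numeric priority and the lowest
    number is served first. The real hardware policy is not public here.
    """

    def __init__(self):
        self._queue = []
        self._order = count()  # tie-breaker: keeps FIFO order within a priority

    def submit(self, shader_type, priority):
        # shader_type is just a label such as "pixel", "vertex" or "geometry"
        heapq.heappush(self._queue, (priority, next(self._order), shader_type))

    def next_thread(self):
        # Returns the label of the next thread, or None if the queue is empty
        if not self._queue:
            return None
        _, _, shader_type = heapq.heappop(self._queue)
        return shader_type


def drain(arbiter):
    """Pop threads until the queue is empty and return them in service order."""
    served = []
    while True:
        thread = arbiter.next_thread()
        if thread is None:
            return served
        served.append(thread)
```

For example, if a pixel and a geometry thread are submitted with priority 2 and a vertex thread with priority 1, `drain` returns the vertex thread first, followed by the other two in the order they arrived.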
Processed threads are kept in the Shader Instruction cache until a higher-priority thread has been processed. There is a second type of cache as well, the Shader Constant cache, used for storing frequently needed shader constants. All of this greatly helps to reduce response time within the GPU itself. From here, shaders move on to the SIMDs. Inside a SIMD, up to six independent operations are performed, of which five are mathematical and the sixth is “flow control”. All of these operations are performed simultaneously within a thread, except “Texture and Vertex Fetch” instructions, which are performed separately.
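The "five mathematical plus one flow control" rule can be written down as a simple check. The following Python sketch is purely illustrative: the function name `can_issue_together` and the labels `"math"`, `"flow"` and `"fetch"` are hypothetical, and the sketch ignores everything about real instruction encoding.

```python
MAX_MATH_OPS = 5  # per the text: up to five mathematical operations
MAX_FLOW_OPS = 1  # plus one flow-control operation

VALID_KINDS = ("math", "flow", "fetch")


def can_issue_together(op_kinds):
    """Return True if the listed operations fit into one simultaneous group.

    op_kinds is a list of labels: "math", "flow" or "fetch".
    Assumption: any texture/vertex fetch must be handled separately,
    so its presence rules out simultaneous issue.
    """
    for kind in op_kinds:
        if kind not in VALID_KINDS:
            raise ValueError("unknown operation kind: %r" % (kind,))
    if "fetch" in op_kinds:
        return False
    return (op_kinds.count("math") <= MAX_MATH_OPS
            and op_kinds.count("flow") <= MAX_FLOW_OPS)
```

Five math operations together with one flow-control operation pass the check, while a sixth math operation or any fetch in the group does not.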
A single block consists of 5 stream processors (SPs). AMD touts this mode of operation as “superscalar”, which is basically meant to create the impression that it is better than nVidia's approach. Together, the five SPs can execute up to 5 scalar “multiply-add” instructions, while one of them (the “main” one) can also execute transcendental instructions: SIN, COS, LOG, EXP and so on. “Flow” and conditional operations are controlled by the Branch Execution unit, which is in charge of the 5 SP block. The processed data is then sent to the general purpose registers. This data is divided into input data, temporary values and output data.
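The numbers above can be tied together with a small Python sketch of one 5 SP block. This is a rough model under stated assumptions, not a description of the real hardware: the function `run_block` and the tuple format are hypothetical, and we assume the "main" SP handles either a multiply-add or one transcendental in a given group, so at most one transcendental is allowed per group.

```python
import math

SIMD_COUNT = 4      # from the text: four SIMDs
SPS_PER_SIMD = 80   # from the text: 80 stream processors per SIMD
SPS_PER_BLOCK = 5   # from the text: 5 SPs per block

BLOCKS_PER_SIMD = SPS_PER_SIMD // SPS_PER_BLOCK  # 16 blocks in each SIMD
TOTAL_SPS = SIMD_COUNT * SPS_PER_SIMD            # 320 stream processors

# Transcendentals named in the text, mapped to Python's math functions
TRANSCENDENTALS = {
    "sin": math.sin,
    "cos": math.cos,
    "log": math.log,
    "exp": math.exp,
}


def mad(a, b, c):
    """Scalar multiply-add: a * b + c."""
    return a * b + c


def run_block(ops):
    """Evaluate up to five scalar operations, one per SP.

    Each op is either ("mad", a, b, c) or (name, x), where name is a key
    of TRANSCENDENTALS. Assumption: only the "main" SP can take a
    transcendental, so at most one is accepted per group.
    """
    if len(ops) > SPS_PER_BLOCK:
        raise ValueError("a block has only five SPs")
    if sum(1 for op in ops if op[0] != "mad") > 1:
        raise ValueError("only the main SP executes transcendentals")
    results = []
    for op in ops:
        if op[0] == "mad":
            _, a, b, c = op
            results.append(mad(a, b, c))
        else:
            name, x = op
            results.append(TRANSCENDENTALS[name](x))
    return results
```

Calling `run_block([("mad", 2.0, 3.0, 1.0), ("exp", 0.0)])` gives `[7.0, 1.0]`, while passing two transcendentals in the same group raises an error in this model.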