The waiting is over. It’s been three months since AMD first presented their 28 nm graphics chip. By now, they’ve filled out practically the entire line-up of the Southern Islands family, and bar the lowest price range, Radeon 7xxx cards are already available on the market in every category. All of this puts a lot of weight and importance on NVIDIA’s new GeForce generation.
We can’t say we weren’t sceptical. For several generations now, the company has taken a rather brute-force approach of cramming as many transistors as possible into a given GPU without any key architectural changes, so it was quite realistic to expect them to keep treading the same path. AMD has managed to change direction on several occasions, bringing us VLIW5 and VLIW4, as well as the current GCN architecture, while NVIDIA kept refining Fermi through multiple iterations. The initial speculation about a “doubled Fermi” therefore sounded credible enough. Yet NVIDIA was very secretive about the new GPU, giving rumours no ground to spring up on.
Contrary to what we expected, NVIDIA set out on an entirely different path, one of optimisation and balance, almost the exact opposite of what they’ve been doing in the past few years. Whether we’re talking about GeForce GTX 480 or GTX 580, NVIDIA was simply trying to beat the competition uncompromisingly, at any cost. This desire cost them so dearly that their cards became synonymous with high power consumption and even higher temperatures, not to mention the noise created by the coolers. After all, it’s no coincidence that the community quickly renamed Fermi to “Thermi”. Luckily, the cards remained competitive in the performance segment, so as long as those secondary aspects were disregarded, the company managed to sell enough products to keep up. GTX 580 was no power saver either, but it still brought significant improvements over GTX 480 (admittedly, there was plenty to improve). Things look different now, and one could say that NVIDIA has finally changed their philosophy, making Kepler a genuinely new GPU, not just a “doubled” one. The architecture has been significantly reworked, and the same goes for the entire concept of what a high-end graphics card should be.
Fast and efficient
With the experience of the past few years in mind, NVIDIA’s engineering team faced a very difficult task: creating a graphics card that would not only take the performance crown from the competition, but also beat it in energy efficiency. As the latter has been a huge problem for the company in recent years, it’s clear why we call this a very difficult task. On top of that, the card had to move to a 28 nm chip, which is always a headache, especially given NVIDIA’s track record with new lithography nodes. It was a bold move indeed, as many, many things could have gone wrong. The new chip is made on TSMC’s 28 nm process, as expected. A die of about 294 mm² houses 3.5 billion transistors, which is a major improvement over Fermi: although Kepler has only half a billion more transistors, it packs them all into a die that is 226 mm² smaller (that is correct, the GF100 GPU measured 520 mm²). AMD’s Tahiti has a considerably higher transistor count, around 4.3 billion, but also a much larger die of 365 mm². This comes as a real surprise, as it has always been NVIDIA that put out the bulky, oversized GPU.
Architecture 1, 2, 3…
The chip’s base still contains the GigaThread engine used in previous GPUs, but the structure is somewhat different now. The foundation comprises four GPCs, or Graphics Processing Clusters, each of which consists of two SMX (Streaming Multiprocessor-X) blocks. Each GPC is essentially independent, having almost all the characteristics of a complete GPU, with the two SMX units sharing a single raster engine. Of course, as each SMX packs 192 cores (i.e. stream processors), it’s clear that NVIDIA still relies on a high level of parallelism in their GPUs. Each SMX has its own PolyMorph engine, which performs operations such as vertex fetch, tessellation and stream output. There are also four warp schedulers per SMX, as well as three types of cache memory: besides the texture and uniform caches, there’s 64 KB of L1 cache, while the L2 cache is shared by all GPCs. Sixteen texture units also sit at the SMX level, which means their total number depends directly on the number of active SMXs. In other words, a fully enabled Kepler GPU has eight SMXs with 192 stream processors each, bringing the total to 1536! Yes, you’ve read that correctly: NVIDIA has finally managed to reorganise its architecture and triple the number of stream processors, despite a smaller die and only a modest increase in transistor count. Simple maths yields 128 texture units in total (16 per SMX). The ROP count, on the other hand, has dropped to 32, compared to 48 on the previous generation, which is about the only drawback “on paper”. As far as technology support goes, nothing has been left out: Kepler fully supports DirectX 11, Shader Model 5.0 and OpenGL 4.1. The best part of the story, however, is that this degree of overall optimisation has managed to reduce consumption below the levels previously seen on Fermi (and even on AMD’s direct competitor), but more on that later.
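The unit arithmetic above is easy to verify; a quick sketch, using only the figures quoted in the text:

```python
SMX_COUNT = 8          # 4 GPCs x 2 SMX units each, fully enabled chip
CORES_PER_SMX = 192    # stream processors per SMX
TMUS_PER_SMX = 16      # texture units per SMX

stream_processors = SMX_COUNT * CORES_PER_SMX   # 1536 in total
texture_units = SMX_COUNT * TMUS_PER_SMX        # 128 in total

print(stream_processors, texture_units)
```

A partially disabled chip would simply lower `SMX_COUNT`, taking 192 stream processors and 16 texture units with each deactivated SMX.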
NVIDIA’s strategy until now was to clock the stream processors at double the frequency of the rest of the chip. This was a good idea at first, but it proved increasingly difficult to maintain, with minimal advances between generations, and it was one of the direct causes of Fermi’s high consumption and TDP. The situation has now been reversed: NVIDIA has returned to the old concept where the entire GPU runs at a single clock, which brought the TDP down and, together with the architectural changes, allowed the number of stream processors to triple. Bear in mind that Kepler’s roughly 1 GHz clock can hardly be called low, though. To exploit the chip’s full potential, a new “boost” clock has been introduced, distinct from the “base” clock. The latter is the default frequency that guarantees full stability 24/7. For GeForce GTX 680 this is 1004 MHz, and when the TDP leaves sufficient headroom, we’ve seen the card run at up to 1110 MHz for extended periods, an increase of about 10%. Interestingly, temperature has a say in this dynamic overclocking as well, not just the TDP: the warmer the card, the lower the boost clock. In short, as long as you keep it below 70 degrees Celsius, you’re in the clear. Above that, the boost clock drops by 13.5 MHz for every five degrees, reverting to the base clock at 100 degrees. In real-world conditions, we haven’t seen the card go past the second throttling step, which is definitely commendable. Unfortunately for control freaks, the dynamic overclock can’t be turned off for the time being. The maximum TDP, on the other hand, can be adjusted in the -30% to +32% range, which is essentially the same concept as AMD’s PowerTune, except that the span is a bit wider. NVIDIA claims that the safety limits are baked into the hardware itself and can’t be pushed any further; time will tell whether this is true.
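The throttling behaviour can be modelled roughly as follows. The step function is our reading of the figures above (13.5 MHz per full five degrees over 70 °C, base clock at 100 °C, TDP headroom assumed), not NVIDIA’s exact algorithm:

```python
def boost_clock_mhz(temp_c, base=1004.0, max_boost=1110.0):
    """Rough model of GTX 680 GPU Boost vs. temperature."""
    if temp_c >= 100:
        return base                     # fully reverted to the base clock
    if temp_c <= 70:
        return max_boost                # full boost headroom below 70 degrees C
    steps = int(temp_c - 70) // 5       # one step per full 5 degrees C over 70
    return max(base, max_boost - 13.5 * steps)

# At 80 degrees C: two steps down, 1110 - 27 = 1083 MHz
```

In this model the “second throttling step” mentioned above corresponds to the card hovering around 1083 MHz, still comfortably above the base clock.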
As dynamic overclocking can’t be turned off, manual overclocking as we know and cherish it is a thing of the past on NVIDIA cards. The only way to push the card further is to increase its offset value. The maximum currently stands at +549 MHz, which defines the boost frequency the card will attempt to attain if the TDP and temperature allow. The moment any parameter exceeds its limit, the GPU starts reducing the clock back to a comfortable value. The power limit has to be set to +32% for optimum results, but overclocking has effectively been turned into a suggestion to the card rather than an order. Luckily, an offset can be applied to the voltage too, with the drivers currently limiting you to +0.1 V above the default. Truth be told, this doesn’t work as badly as it may look on paper, but enthusiasts just won’t be as enthusiastic about NVIDIA’s new graphics cards anymore, at least not in the way they’re used to. Overclocking has become a complicated affair delegated to the card itself, and the user can no longer attain the same level (and feeling) of control over the hardware. The card dynamically handles everything, the user merely sets the values, and whether those values are actually met depends on many things.
The new AA and adaptive V-Sync
These changes are far from the only ones. There’s also TXAA, a new antialiasing method that combines hardware multisampling with a custom resolve filter and an optional temporal component, all in order to achieve the quality of 8x MSAA with a much smaller performance impact. The drivers contain TXAA 1 and TXAA 2 settings: the first is supposed to deliver 8x quality at the “price” of 2x MSAA, while the second gives even higher quality with a 4x MSAA performance hit. On paper this sounds ideal, as low multisampling levels cost modern GPUs very little performance. There’s a catch, however: TXAA has to be supported and implemented by the development teams, which means it can only be used in games built for it. The future of TXAA therefore lies with the game creators. Still, as NVIDIA pumps a lot of money into the gaming industry and has strong ties with game developers, chances are this standard will come to life over time.
Another interesting feature is adaptive V-Sync. With standard V-Sync enabled, a frame rate that dips just below the refresh rate gets cut to the next integer divisor of it, e.g. from 60 straight down to 30 fps. Adaptive V-Sync simply disables synchronisation whenever the frame rate falls below the refresh rate and re-enables it once it recovers. Although it doesn’t remove tearing entirely, it keeps the visual degradation at a reasonable level, and the result is certainly far smoother than standard V-Sync.
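The difference between the two modes can be sketched like this; a simplified model assuming a 60 Hz display (real drivers decide per frame, not per fps value):

```python
def classic_vsync(raw_fps, refresh=60):
    # Classic V-Sync quantises output to an integer divisor of the refresh rate
    if raw_fps >= refresh:
        return refresh
    divisor = 2
    while refresh / divisor > raw_fps:
        divisor += 1
    return refresh / divisor

def adaptive_vsync(raw_fps, refresh=60):
    # Adaptive V-Sync: synchronise above the refresh rate, tear freely below it
    return refresh if raw_fps >= refresh else raw_fps

# A game running at 55 fps drops to 30 fps with classic V-Sync,
# but stays at 55 fps (with some tearing) with adaptive V-Sync
```

The painful case is precisely the one the article describes: a small dip below 60 fps costs you half the frame rate with classic V-Sync, but only the dip itself with adaptive V-Sync.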
ASUS GeForce GTX 680
After removing the card from the packaging, we found it interesting to see an NVIDIA graphics card that doesn’t look monolithic. The build quality doesn’t follow tradition either, and we have to admit that GTX 680 feels cheaper than GTX 580 or Radeon HD 7970. It seems NVIDIA has taken this gamble knowing that what lies underneath is enough to divert our attention towards the more important things.
And they’re right, in fact. The first information we got was that GTX 680 sits in the same price range as AMD’s flagship, Radeon HD 7970. The card is long, but still shorter than its direct competitor. Black shielding is nothing unusual in this category, and ASUS has merely affixed a sticker on top to make the card’s provenance known; that’s about as far as the differences from the reference version go.
NVIDIA has installed a smallish blower-type fan over the GPU, spinning at up to 4200 rpm, a speed inevitably accompanied by significant noise. Although the automatic mode will never push the fan to this limit, many users prefer manual control, and we must say that the card is somewhat noisier in this mode than previous-generation models, albeit not alarmingly so. Beneath the plastic shroud there’s a cooler made of copper and aluminium: the fins and most of the cooling body are aluminium, while only the vapour-chamber base is copper. If you’ve missed the concept so far, a vapour chamber works on the same principle as a heatpipe, but as a flat, sealed chamber sitting directly on the GPU: the GPU’s heat evaporates the liquid inside, the vapour condenses on the cooler surfaces and flows back, and this perpetual cycle speeds up heat transfer from the GPU to the rest of the cooler. The cooler itself is quite complex; although cast as a single piece, it covers the GPU, the power section and the memory chips all at once.
The PCB below is no less complicated, but it’s clear that NVIDIA has streamlined the design significantly. For one, the layout looks very strange: this is the first time we’ve encountered a reference model with the power section placed horizontally instead of vertically. This is all the more surprising once you take a look at the power section itself and establish that it’s neither particularly complex (four phases) nor of particularly high quality (the coils could have been better).
The PCB’s simpler looks are also partly due to the 256-bit bus between the GPU and the memory. This was by far the biggest surprise, as it was always NVIDIA that pushed wider buses and higher bandwidths; it’s almost as if AMD and NVIDIA have switched roles this time around. While AMD finally opted for a 384-bit bus on their high-end card, NVIDIA picked 256 bits for GTX 680. In turn, the memory controller has been redesigned to allow much higher memory clocks. Fast GDDR5 chips are used, with a total capacity of 2 GB, which is also relatively modest by today’s high-end standards. However, they run at 1500 MHz, giving an effective 6 GHz. This is by far the highest reference memory clock ever seen on a graphics card; not a single GeForce or Radeon has run at 6 GHz by default.
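The bandwidth trade-off is easy to compute. GDDR5 transfers four bits per pin per command clock, which is how 1500 MHz yields the quoted effective 6 GHz. A quick sketch (Tahiti’s 5.5 GT/s is the HD 7970’s stock memory speed, an assumption not stated in the text above):

```python
def bandwidth_gb_s(bus_width_bits, transfer_rate_gtps):
    # bytes per second = (bus width in bytes) x (transfers per second)
    return bus_width_bits / 8 * transfer_rate_gtps

gtx_680 = bandwidth_gb_s(256, 6.0)   # 192.0 GB/s on the narrow, fast bus
hd_7970 = bandwidth_gb_s(384, 5.5)   # 264.0 GB/s on the wide, slower bus
```

Despite the record memory clock, the narrower bus still leaves GTX 680 with less raw bandwidth than its competitor, which makes the redesigned memory controller all the more important.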
If you were expecting the new GeForce to require a slew of additional power connectors, you’ll probably be just as surprised as we were: only two 6-pin PCI-E power connectors need to be plugged in, which is perhaps the most interesting aspect of the new GeForce. Bluntly put, NVIDIA has finally managed to beat AMD in the energy efficiency segment, which comes as a total surprise. GTX 680 is rated at 195 W, a huge improvement over GTX 580, maybe even the most important one.
As far as connectivity is concerned, two SLI connectors suggest that 3-way SLI is an option, and the card connects to the motherboard over the PCI-Express 3.0 standard. Two DVI ports, one HDMI 1.4a and one DisplayPort 1.2 ensure quality video output. Another first for NVIDIA: they are now dangerously close to AMD in the number of monitors that can be driven from a single card; four, to be precise, or three for 3D gaming, which previously required an SLI system with at least two graphics cards. All in all, the more we got to know GeForce GTX 680, the better we liked it.
Specifications and results
CPU: Intel Core i5 760 @ 3.8 GHz
RAM: 2 x 4 GB AMD Memory 1600 MHz DDR3
MoBo: ASUS Maximus III Extreme
Storage: WD 500 GB
OS: Windows 7 Ultimate 64-bit
Driver: ForceWare 300.10 WHQL, Catalyst 12.3
As you can see from the charts, GeForce GTX 680 is often better than Radeon HD 7970, although it may look inferior to it on paper. They have the same number of texture and ROP units, but the number of stream processors is on Tahiti’s side, with a ratio of 3:4 (1536 on Kepler and 2048 on Tahiti). AMD’s competitor also has a wider bus, 384 bits, compared to Kepler’s 256.
All results are given as 1680x1050 4xAA 16xAF / 1920x1080 4xAA 4xAF.

| Test | AMD Radeon HD 7970 3 GB | ASUS GeForce GTX 680 2 GB |
|---|---|---|
| 3DMark 11 (performance) GPU Score | 3,569 / 2,787 | 4,349 / 3,417 |
| 3DMark 11 (extreme) GPU Score | 3,085 / 2,470 | 4,044 / 2,946 |
| 3DMark Vantage (extreme) GPU Score | 20,731 / 18,598 | 22,517 / 19,955 |
| 3DMark Vantage (high) GPU Score | 22,979 / 20,409 | 23,452 / 20,889 |
| Unigine Heaven 3.0 (DX11, high, normal tessellation) [fps] | 71.9 / 65.7 | 78.8 / 71.1 |
| Unigine Heaven 3.0 (DX11, high, extreme tessellation) [fps] | 51.4 / 47.8 | 61.9 / 56.6 |
| Crysis Warhead (DX10, enthusiast) [fps] | 62.8 / 55.1 | 62.3 / 55.3 |
| Crysis 2 (DX11, Ultra, HD textures) [fps] | 88.2 / 80.2 | 91.5 / 84.3 |
| Metro 2033 (DX11, very high, tessellation) [fps] | 49.7 / 44.6 | 49.9 / 44.7 |
| AvP (DX11, max) [fps] | 66.8 / 60.3 | 63.7 / 56.8 |
| F1 2010 (DX11, max) [fps] | 74 / 71 | 96 / 89 |
| Dirt 3 (DX11, ultra) [fps] | 90.2 / 82.4 | 122.1 / 112.4 |
| Lost Planet 2, Test B (DX11, high) [fps] | 61.8 / 59.2 | 70.1 / 67.4 |
| Power consumption (idle / load) [W] | 112 / 381 | 124 / 338 |
The segment where NVIDIA never used to emerge as the winner is finally theirs: drawing roughly 40 W less under load than Tahiti, Kepler’s power consumption sets new standards in both performance-per-watt and price/performance. In simple terms, GeForce GTX 680 draws less power, costs less, and gives more.
We’re still coming to terms with many of the novelties, primarily the dynamic overclocking, which tends to complicate things a bit. Still, the concept works great on its own, and we can’t say there’s any real need for you to intervene. Our only objection concerns the “keep users away” philosophy, which isn’t a direction we’d like the IT industry to take. We honestly hope for a solution that would do away with all these restrictions and give enthusiasts back control over their dearly paid hardware, but that remains to be seen. Another thing GTX 680 arguably lacks is something akin to AMD’s ZeroCore, but this remark fades away in a second when weighed against all of the card’s advantages over the competition. All in all, NVIDIA has managed to meet and surpass all expectations and make up for the delays, and we can only hope that GTX 680 will be available in sufficient quantities; if Kepler’s yields turn out well, we can expect a merciless price war in the coming months, as AMD will have little recourse but to bring their prices down aggressively. ASUS’ model is just one of many, but since it’s the first to bring us NVIDIA’s fantastic new product, the Editor’s Choice award shouldn’t and can’t be avoided.