Last week, during CES in Las Vegas, NVIDIA held a sort of Deep Dive conference where they shared some new info regarding GF100 (Graphics Fermi). As you have probably concluded by now, it is the codename for their upcoming series of graphics cards. The final name wasn’t revealed, but I think nobody would be surprised to see something like GTX 360 and GTX 380 on launch day, similar to the 200 series launch. Speaking of launch day, that wasn’t revealed either. In fact, there was no info other than “It is in mass production now” and “… it will be shipped this quarter”, so we are purely speculating when we say that a launch around CeBIT and late March availability wouldn’t be much of a surprise.
As you can expect from this kind of introduction, many interesting things weren’t mentioned, like clock speeds, power consumption or prices. All the GF100 cards we could see were used for demonstrations; they were quite long and were powered by one 8-pin and one 6-pin cable. Once again, NVIDIA representatives stressed that this is not the final product and that there will be some tuning, but it wasn’t clear whether that concerns power consumption, performance, the size of the boards or all three.
The good news was that there were about a dozen systems on display and that at least one had a triple SLI configuration in it, meaning that even at this early stage NVIDIA had enough samples to show. However, as the launch date comes closer, the question of whether the 40nm production process at their partner TSMC is mature enough will become more and more important. If the yield is not high enough, it will take its toll on the price and availability of the products, and some inside info we have talks about really low yields, about 20% of manufactured chips.
Now it is time to stop speculating and to talk a bit more about what NVIDIA showed us. After a short introduction from Drew Henry, the first demonstration was gameplay from Dark Void, a new title from Capcom. The basic idea was to show us the use of PhysX effects, new and old ones. As these effects can also be seen on older NVIDIA cards that support PhysX, it’s a bit puzzling why they chose to start with this demonstration. As they told us, the developer wanted to use a particle effect to underline the level of devastation poured on an enemy by the devastator gun. As you can see from the pictures, this effect is a nice one, but somewhat repetitive and a bit slow compared to the fast pace of the game. The second effect is called turbulence; it is hard to spot in still pictures, but it is tied to the exhaust from the jet engines.
The real fun started with Henry Morgan’s presentation, and we finally had the opportunity to learn more about GF100. As expected, GF100 will bring full DX11 support and, as he said, “… new geometry processing architecture that delivers eight times performance to support DX11…” As you will see a bit later, this doesn’t mean that you will get this kind of performance increase overall, only in some specific situations.
The best part of Morgan’s demonstration was tied to Tessellation, a new DX11 feature which will bring more realism to games. If you haven’t heard of it before, here’s an explanation: it is all about the number of polygons an object uses to represent itself in the game world. Obviously, if an object is far away, it doesn’t need as many polygons as it does when it is right in front of the viewer. Compared to the previous situation, where objects had the same number of polygons regardless of their position, the new approach should improve the quality of the game world by increasing the polygon count on the objects closer to the viewer. We already saw another use in the Heaven benchmark, where Tessellation, used for the dragon figure as well as for brick walls, roofs and so on, improved their looks.
The thing is, this kind of polygon count increase shouldn’t be expensive on the performance side, or at least not as expensive as simply adding more polygons to the model. The performance drop we experienced during our test of the 5770 was around 50% (from around 30fps to 15fps) in one particular scene. NVIDIA is promising a much lower fps drop, and even gave us a diagram comparing the performance of GF100 and the AMD 5870 in Heaven, which you can check. Keep in mind that there are many unknowns about this diagram, like which configuration was used, which drivers, which display resolution, etc. Most importantly, this is just a one-minute sample of the whole demo, so the final figures for the whole demo should be different. How much different is something only NVIDIA knows at the moment.
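To make the distance idea more concrete, here is a minimal sketch of how an application might pick a tessellation factor per object. It is only an illustration: the linear falloff, the `near` and `far` distances and the function names are our own assumptions, and the triangle count assumes an idealized uniform subdivision rather than the exact pattern the DX11 tessellator produces. The only hard number is the maximum tessellation factor of 64 that Direct3D 11 allows.

```python
MAX_TESS_FACTOR = 64  # upper limit on tessellation factors in Direct3D 11


def tess_factor(distance, near=5.0, far=100.0):
    """Map camera distance to a tessellation factor.

    The linear falloff and the near/far distances are hypothetical;
    a real engine would tune them (or use screen-space edge length).
    """
    if distance <= near:
        return MAX_TESS_FACTOR
    if distance >= far:
        return 1
    t = (far - distance) / (far - near)  # 1.0 at `near`, 0.0 at `far`
    return max(1, round(1 + t * (MAX_TESS_FACTOR - 1)))


def triangles_per_patch(factor):
    """Idealized count: splitting each edge of a triangle into `factor`
    segments gives factor**2 small triangles."""
    return factor * factor


for d in (2.0, 25.0, 50.0, 150.0):
    f = tess_factor(d)
    print(f"distance {d:6.1f} -> factor {f:2d}, "
          f"~{triangles_per_patch(f)} triangles per patch")
```

Objects close to the camera get the full factor and thousands of triangles per patch, while distant ones fall back to the original, untessellated triangle.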
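For readers who like to check the arithmetic, the drop we measured on the 5770 can be expressed in a few lines. This is just a sketch with helper names of our own choosing; it also shows the same result as frame time, which is often the more useful way to look at it.

```python
def fps_drop_percent(fps_before, fps_after):
    """Relative performance drop, in percent of the original frame rate."""
    return (fps_before - fps_after) / fps_before * 100.0


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


before, after = 30.0, 15.0  # approximate figures from our 5770 test
print(f"drop: {fps_drop_percent(before, after):.0f}%")
print(f"frame time: {frame_time_ms(before):.1f} ms -> {frame_time_ms(after):.1f} ms")
```

Halving the frame rate means each frame takes twice as long to render, which is why a 50% drop at around 30fps is so noticeable.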
As they said, their Tessellation implementation is different from AMD’s solution, but the resulting image will be the same. To play its new role successfully, the GPU architecture was changed drastically compared to GT200, as simply adding Tessellation to GT200 would create a geometry bottleneck. As you can see from the picture, new blocks like the PolyMorph Engine and the Raster Engine were added. The first one is composed of the Vertex Fetcher, Tessellator and other components, and the second one of Edge Setup, Rasterizer and Z-Cull. An interesting thing regarding the PolyMorph Engine is that geometry performance does not scale linearly with the number of CUDA cores. If an application increases the number of CUDA cores used for PolyMorph work, the gain won’t be proportional; performance per core drops as more cores are added.
The result is 8 times better geometry performance when you compare GF100 and GT200. They also compared 5870 and GF100 performance as geometric complexity increases, but this must be taken with a grain of salt. First, this is not the final product, meaning that the performance of the card when it finally hits the market could be different from what was shown here. Second, these demos were developed by NVIDIA, and they are not real-world applications. The demos in question, cool as they are, were “Hair” and “Water”. The first one showed a wig with a lot of hair and a lifelike effect on it. The second one showed a water simulation, and we finally saw a new level of realism in this kind of simulation. When this technique is used, the performance drop depends on the amount of Tessellation, but performance is roughly halved for a 500 to 1000 times increase in complexity. The NVIDIA guys told us that, due to the architectural differences between GF100 and the HD5870, the difference in pure Tessellation performance is about 2 to 6 times in favor of GF100. The only performance figures in this part of the presentation that came from an application not developed by NVIDIA were from the Microsoft SDK sample for rendering cubemap faces. As GF100 can render six cubemap faces in one pass, it is 4 and 4.7 times faster when rendering the ball and car objects from the SDK.
The second part of this presentation was about the GF100 architecture. As we already stressed, NVIDIA representatives told us that “GF100 is in mass production”. By not telling us exactly when it will be available, NVIDIA left itself enough room to finalize the specs and tweak whatever needs tweaking, or simply to keep the competition guessing. Most importantly, they didn’t give us any info about clock speeds for the core, shaders or memory. There is no info about the memory controller, and no word about power consumption. That’s why the performance they have shown could be different when the cards hit the market. On the other hand, there are many things that will not change, and they are impressive nevertheless. First, GF100 more than doubles the number of CUDA cores over the previous generation (512 compared to 240). The geometry pipeline is significantly revamped, with vastly improved performance in geometry shading, stream out and culling. The number of ROP (Render Output) units per ROP partition is doubled and fillrate is greatly improved, enabling multiple displays to be used (more on this a bit later). 8xMSAA performance is improved via enhanced ROP compression, and where compression is not possible, the additional ROP units provide a better balance of overall GPU throughput, even for portions of the scene that cannot be compressed.
You can check the architecture of the Streaming Multiprocessor as well as the cache architecture. As you can see, GF100 has an L1 cache while GT200 doesn’t. More precisely, there is 64KB of on-chip memory per SM, of which either 16KB or 48KB can be used as L1, depending on whether the workload (physics and ray tracing calculations, for example) benefits more from a bigger cache or from more space for data to be reused among threads. These architectural improvements result in better ROP and texture performance compared to GT200.
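The two possible splits are easy to write down. In the sketch below, the 64KB total and the 16KB/48KB L1 sizes come from the presentation; treating the remainder as shared memory for data reused among threads is our assumption based on NVIDIA’s public Fermi material, and the function name is purely illustrative.

```python
SM_ONCHIP_KB = 64            # configurable on-chip memory per SM
VALID_L1_SIZES_KB = (16, 48)


def split_onchip_memory(l1_kb):
    """Return the L1 / shared memory split for a chosen L1 size.

    Assumption: whatever is not used as L1 is available as shared
    memory for data reused among threads.
    """
    if l1_kb not in VALID_L1_SIZES_KB:
        raise ValueError(f"L1 size must be one of {VALID_L1_SIZES_KB} KB")
    return {"l1_kb": l1_kb, "shared_kb": SM_ONCHIP_KB - l1_kb}


for size in VALID_L1_SIZES_KB:
    print(split_onchip_memory(size))
```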
GF100 replaces the traditional geometry processing architecture at the front end of the graphics pipeline with an entirely new distributed geometry processing architecture that is implemented using multiple “PolyMorph Engines”. Each PolyMorph Engine includes a Tessellation unit, an attribute setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine. Newly generated primitives are converted to pixels by four Raster Engines that operate in parallel (compared to a single Raster Engine in prior generations of NVIDIA GPUs). On-chip L1 and L2 caches enable high bandwidth transfer of primitive attributes between the SM and the Tessellation unit as well as between different SMs. Tessellation and all its supporting stages are performed in parallel on GF100, enabling breathtaking geometry throughput.
To improve image quality, GF100 implements a new 32xCSAA antialiasing mode based on eight multisamples and 24 coverage samples. CSAA has also been extended to support alpha-to-coverage on all samples, enabling smoother rendering of foliage and transparent textures (as in Age of Conan, for example).
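The “32x” in the name is simply the sum of the two sample types. The small sketch below spells that out and gives a rough idea of why coverage samples are cheap: only the multisamples store color and depth. The 8 bytes per stored sample (4 for color, 4 for depth) is a hypothetical figure chosen for illustration, and the class is our own, not part of any NVIDIA API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CsaaMode:
    name: str
    multisamples: int   # samples that store color and depth
    coverage_only: int  # samples that only record coverage

    @property
    def total_coverage_samples(self):
        return self.multisamples + self.coverage_only

    def color_depth_bytes_per_pixel(self, bytes_per_sample=8):
        # bytes_per_sample is hypothetical: 4 bytes color + 4 bytes depth.
        # Only the multisamples need this storage.
        return self.multisamples * bytes_per_sample


CSAA_32X = CsaaMode("32xCSAA", multisamples=8, coverage_only=24)
print(CSAA_32X.total_coverage_samples)         # 8 + 24 samples in total
print(CSAA_32X.color_depth_bytes_per_pixel())  # storage for the 8 multisamples only
```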
Shadow mapping performance is greatly increased with hardware-accelerated DirectX 11 four-offset Gather4. AI path finding is also greatly improved compared to GT200 and is now more than 3x faster. This kind of workload is often used as an example of how fast a GPU can be in parallel tasks.
As NVIDIA takes great care to promote their products not only as graphics cards but also as a means to solve HPC-style problems, it doesn’t come as a surprise that in the Ray Tracing demo GF100 scores around 3.5 times higher than GT200. Once again, this is due to efficient use of the cache architecture and the sheer power of the new GPU. Summarizing the most important aspects of GF100 compared to the previous generation, the first to think of would be much higher geometry processing power, up to 8 times higher. Second, improved image quality with 32xCSAA and, at the same time, a 3 times faster shadow map implementation. Third, almost 4 times better performance in physics calculation, AI and ray tracing. And last, this should be the highest performance GPU on the market, with performance a bit better than the GTX 295 at 8xAA and high resolution.
As always, we prefer real-life game performance comparisons to synthetic ones, and finally our wish has been fulfilled. The Far Cry 2 benchmark was running at 2560x1600 on two systems with almost identical setups. The only difference was the graphics card: GF100 in the first system and a 5870 in the second. During the demo run there was an obvious performance difference in favor of GF100, and the figures showed that the new NVIDIA champ scored a whopping 70% better than AMD. There were no other games on these systems, so we couldn’t compare performance in any other benchmark, but still, this kind of performance increase is really great!
They left the best for the end: NVIDIA 3D Vision Surround was shown. Basically, it was a setup where a 3D Vision picture was shown on three displays. Probably due to a lack of outputs, an SLI configuration is a must for this setup to work. The game demoed was Need for Speed Shift, and we were hooked. It looked stunningly beautiful, and in the in-car view everything felt very natural. Besides the SLI requirement, you must use three 3D Vision capable LCDs or projectors, with a maximum resolution of 1920x1080. In case 3D Vision is not your cup of tea, Surround is supported at higher resolutions, up to 2560x1600. There is good news for owners of older NVIDIA cards, as Surround will be possible starting from the GTX260 and higher.
The final verdict is that GF100 is a very strong performer. As far as we can say, it will be much faster than the 5870 and for sure much faster than the 285. Unfortunately for NVIDIA, we are not talking about September of last year; we are probably talking about March, some 6 months later. If they had launched this card last fall, it would probably have been their biggest success, but now the future isn’t so bright. There are a few things to consider, as this card is a great performer but also very power hungry. All the cards used during this demo had at least 8+6 pin connectors, and although NVIDIA told us that these are not production samples, they also didn’t give us figures for power consumption. And if the GTX285, with 1.4 billion transistors at 55nm, consumed more than 200W, much higher consumption from about 3.6 billion transistors at 40nm shouldn’t surprise anybody. Another variable to add to the equation is die size. With so many transistors (more than 150% of the 5870’s count), the die will be significantly bigger than AMD’s, meaning fewer GPUs can be manufactured on the same wafer. That will make production more expensive per unit, and a bigger die also means lower yields of working GPUs. Maybe there is some truth to the rumors saying that yields are as low as 20%, but this we don’t know for sure. All we know is that we will have to wait for March for NVIDIA to deliver this high-performing product and to see how good pricing and availability will be. For the good of the market, we hope that NVIDIA used the A0 and A1 silicon revisions to tweak things as well as they could.
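To show why die size and yield matter so much for pricing, here is a rough back-of-the-envelope sketch. It uses a common textbook approximation for the number of die candidates on a round wafer. All the inputs are hypothetical: we assume a 300mm wafer, illustrative die areas of 500 and 300 mm², and an arbitrary wafer cost, since none of these figures were given by NVIDIA. Only the 20% yield comes from the rumors mentioned above.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Approximate die candidates per wafer: wafer area divided by die area,
    minus a correction for partial dies lost at the wafer edge."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


def cost_per_good_die(wafer_cost, candidates, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (candidates * yield_fraction)


WAFER_MM = 300       # assumption: 300mm wafers
WAFER_COST = 5000.0  # hypothetical cost per wafer, arbitrary units

for label, area in (("big die", 500.0), ("small die", 300.0)):  # hypothetical areas
    n = dies_per_wafer(WAFER_MM, area)
    for y in (0.2, 0.6):  # 20% is the rumored yield; 60% is for comparison only
        cost = cost_per_good_die(WAFER_COST, n, y)
        print(f"{label}: {n} candidates, yield {y:.0%} -> {cost:.0f} per good die")
```

Whatever the real numbers turn out to be, the relation holds: a bigger die gives fewer candidates per wafer, and at a 20% yield each working chip has to carry the cost of four failed ones.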

































