The following are results of some simple performance tests of various PC graphics cards under Linux, and of 3 SGI systems for comparison. See the end of the page for specifications of the different machines.
The first set of tests uses SGI's tenmillion.c, a program written by Phil Lacroute of SGI to demonstrate the performance of the Infinite Reality graphics hardware. It reports two statistics: the number of triangles drawn per second and the number of pixels filled per second. The exact numbers given by tenmillion are almost as meaningless as a CPU's MIPS rating, but they do give some idea of the relative performance of the different systems, and are also good for seeing the effect of different options (triangle size, texturing, lighting, etc.). The source code for tenmillion, slightly modified to compile under Linux, is here: tenmillion-linux.c.
Since tenmillion is concerned solely with getting the absolute maximum number of triangles per second, one thing it does not do is clear the window. For Table 3, I modified tenmillion to clear its window once per iteration, for the purpose of determining just how fast the glClear operation is. This can be significant in real-time graphics - if you want to achieve (for instance) a 30Hz frame rate, you only have 33.3 milliseconds to draw the entire frame; on slower systems just clearing the screen can take up a large fraction of this time.
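The frame-budget arithmetic above can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and the 7 ms clear time is a hypothetical value, not a measurement.

```python
def frame_budget_ms(frame_rate_hz: float) -> float:
    """Time available to draw one complete frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def clear_fraction(clear_ms: float, frame_rate_hz: float) -> float:
    """Fraction of the per-frame budget consumed by clearing the window."""
    return clear_ms / frame_budget_ms(frame_rate_hz)


# At 30 Hz each frame has about 33.3 ms available.
print(f"budget at 30 Hz: {frame_budget_ms(30.0):.1f} ms")

# Hypothetical clear time of 7 ms (an assumed value for illustration).
print(f"clear share of budget: {clear_fraction(7.0, 30.0):.0%}")
```

With these assumed numbers the clear alone uses roughly a fifth of the frame time, before any geometry is drawn.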
Tables 4 - 6 show the results of a simple pfCAVE test program using various "real" models. Apple.pfb is a basic sample model originally in Inventor format; teapot.pfb, iris_truck.pfb, and or.6.pfb are from Performer's example data and Friends-of-Performer directories; Duomo.pfb, Theseis.pfb, and grazie-wire.pfb are all taken from existing CAVE applications. Duomo.pfb is moderately complex, with 11 different texture maps totalling about 3.1 megabytes; Theseis.pfb is somewhat pathological in that it consists of 15,000 individual triangles, with no meshing at all; grazie-wire.pfb consists of ~30,000 lines and no triangles. The models were all positioned to fill as much of the window as possible. The source code for the pfspeed test program is here: pfspeed.cxx
I ran the tests for Tables 4 - 6 with 3 different window sizes - 16x16, 640x480, and 1280x1024 (except for the Voodoo2 and Celeron, which could not do 1280x1024). The 16x16 test shows the basic transformation speed of the systems, as there is very little pixel-filling to do. The other tests show the effect of increased pixel-fill load.
Tables 7 & 8 are also based on pfspeed, but use very artificial models intended to stress the texture-mapping and pixel-fill performance.
manyTex64.pfb (Table 7) is a model consisting of 64 squares in an 8x8 grid that fills the window; each square has a different texture map on it. The resolution of the individual textures was varied - 8x8, 128x128, and 256x256, giving a total texture size of 8 kilobytes, 2 megabytes, and 8 megabytes. (Because the MaxImpact has 4 MB of texture memory, the effect of exceeding this can be seen in the significant speed drop with 256x256 textures.)
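The total texture sizes quoted above can be reproduced with a short calculation. The sketch below assumes 2 bytes per texel, which is inferred from the quoted totals rather than stated anywhere in the text, and it ignores any mipmap levels; the function and constant names are illustrative only.

```python
# Assumption: 2 bytes per texel, inferred from the totals quoted above;
# the actual texel format used by the test is not stated.
BYTES_PER_TEXEL = 2
NUM_TEXTURES = 64                           # one texture per square in the 8x8 grid
MAXIMPACT_TEXTURE_MEMORY = 4 * 1024 * 1024  # 4 MB, as stated in the text


def total_texture_bytes(side: int, count: int = NUM_TEXTURES,
                        bytes_per_texel: int = BYTES_PER_TEXEL) -> int:
    """Total bytes for `count` square textures of `side` x `side` texels."""
    return count * side * side * bytes_per_texel


for side in (8, 128, 256):
    total = total_texture_bytes(side)
    fits = total <= MAXIMPACT_TEXTURE_MEMORY
    print(f"{side}x{side}: {total // 1024} KB total, fits in 4 MB: {fits}")
```

Under this assumption only the 256x256 case exceeds the MaxImpact's 4 MB of texture memory, which is consistent with the speed drop described above.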
fill64.pfb (Table 8) consists of 64 large squares, stacked one behind the other; on average, each pixel should be touched about 30 times per frame (the squares are arranged so that Performer should be drawing them from back to front, but I can't guarantee that for every version of Performer).
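Table 1. tenmillion: triangles per second
System | 12 pixel (vertex data only) | 100 pixel (vertex data only) | 1000 pixel (vertex data only) | 12 pixel (light, texture, zbuffer) | 100 pixel (light, texture, zbuffer) | 1000 pixel (light, texture, zbuffer) |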
Pentium SW | 85,000 | 17,000 | 2,000 | 10,000 | 1,500 | 170 | |
Pentium SW (double-buffered) | 140,000 | 23,000 | 4,600 | 12,000 | 1,700 | 180 | |
TNT | 285,000 | 160,000 | 69,000 | 270,000 | 109,000 | 36,000 | |
Voodoo3 | 889,000 | 496,000 | 95,000 | 506,000 | 249,000 | 40,000 | |
Tornado 3000 ** | 1,027,000 | 742,000 | 89,000 | 442,000 | 442,000 | 71,000 | |
GeForce2 | 12,809,000 | 2,264,000 | 505,000 | 7,096,000 | 1,132,000 | 200,000 | |
O2 | 941,000 | 586,000 | 66,000 | 204,000 | 144,000 | 18,000 | |
MaxImpact | 2,595,000 | 1,441,000 | 192,000 | 1,076,000 | 645,000 | 85,000 | |
Onyx | 11,009,000 | 2,783,000 | 395,000 | 5,970,000 | 2,383,000 | 340,000 |
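Table 2. tenmillion: pixels filled per second (millions)
System | 12 pixel (vertex data only) | 100 pixel (vertex data only) | 1000 pixel (vertex data only) | 12 pixel (light, texture, zbuffer) | 100 pixel (light, texture, zbuffer) | 1000 pixel (light, texture, zbuffer) |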
Pentium SW | 1 | 1.7 | 2 | 0.12 | 0.15 | 0.17 | |
Pentium SW (double-buffered) | 1.7 | 2.3 | 4.6 | 0.15 | 0.17 | 0.18 | |
TNT | 3.4 | 16 | 69 | 3.2 | 11 | 36 | |
Voodoo3 | 11 | 50 | 95 | 6.1 | 25 | 40 | |
Tornado 3000 | 12 | 74 | 89 | 5.3 | 44 | 71 | |
GeForce2 | 154 | 226 | 505 | 85 | 113 | 200 | |
O2 | 11 | 59 | 66 | 2.4 | 14 | 18 | |
MaxImpact | 31 | 144 | 192 | 13 | 65 | 85 | |
Onyx | 132 | 278 | 395 | 72 | 238 | 340 |
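The pixel-fill figures in the second tenmillion table appear to follow directly from the triangle rates in the first: triangles per second multiplied by the triangle size in pixels, expressed in millions. That relationship is inferred from the numbers, not taken from the tenmillion source, so treat the following as a rough cross-check rather than a description of how the program computes its output.

```python
def fill_rate_mpixels(triangles_per_sec: float, pixels_per_triangle: int) -> float:
    """Approximate fill rate in millions of pixels per second."""
    return triangles_per_sec * pixels_per_triangle / 1e6


# Triangle rates taken from the first tenmillion table (vertex data only).
samples = [
    ("Voodoo3, 12 pixel", 889_000, 12),
    ("GeForce2, 100 pixel", 2_264_000, 100),
    ("Onyx, 1000 pixel", 395_000, 1000),
]

for label, tris, size in samples:
    print(f"{label}: {fill_rate_mpixels(tris, size):.0f} Mpixels/sec")
```

The rounded results agree with the corresponding entries in the pixel-fill table.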
Table 3. tenmillion with and without clearing the window
System | 850x850 window, no clear (msec/frame) | 850x850 window, clear (msec/frame) | clear time (msec) | clear speed (Mpixels/sec) |
Pentium SW (double-buffered) | 62.98 | 69.70 | 6.72 | 107 |
TNT | 4.20 | 11.29 | 7.09 | 102 |
Voodoo3 | 3.02 | 5.12 | 2.10 | 344 |
Tornado 3000 (750x750 window) | 1.22 | 3.26 | 2.04 | 276 |
GeForce2 | 0.57 | 1.36 | 0.79 | 915 |
O2 | 4.34 | 6.59 | 2.24 | 323 |
MaxImpact | 1.50 | 2.68 | 1.18 | 612 |
Onyx | 0.73 | 1.15 | 0.42 | 1720 |
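The last two columns of the glClear table can be derived from the first two: the clear time is the difference between the per-frame times with and without the clear, and the clear speed is the window area divided by that time. The sketch below is a reconstruction of that arithmetic under those assumptions; the function name is illustrative, and small differences from the published values can come from rounding in the measured times.

```python
def clear_stats(no_clear_ms: float, with_clear_ms: float,
                width: int, height: int) -> tuple[float, float]:
    """Return (clear time in msec, clear speed in Mpixels/sec)."""
    clear_ms = with_clear_ms - no_clear_ms
    pixels = width * height
    mpixels_per_sec = pixels / (clear_ms / 1000.0) / 1e6
    return clear_ms, mpixels_per_sec


# Voodoo3, 850x850 window.
ms, speed = clear_stats(3.02, 5.12, 850, 850)
print(f"Voodoo3: {ms:.2f} msec, {speed:.0f} Mpixels/sec")

# Tornado 3000, which was measured with a 750x750 window.
ms, speed = clear_stats(1.22, 3.26, 750, 750)
print(f"Tornado 3000: {ms:.2f} msec, {speed:.0f} Mpixels/sec")
```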
Table 4. pfspeed frame rates (frames/second), 16x16 window
System | Apple.pfb | teapot.pfb | Duomo.pfb | iris_truck.pfb | Theseis.pfb | or.6.pfb | grazie-wire.pfb | Peak Triangles/sec |
Model size | 1704 tris | 2256 tris | 2298 tris, 3.1 MB tex | ~6000 tris, 1.2 MB tex | 15081 tris, 1 MB tex | 62819 tris | 29841 lines | |
Celeron SW | 155 | 106 | 57 | 51 | 13 | 7.8 | 15 | 490,000 |
Pentium SW | 269 | 177 | 89 | 80 | 19 | 12 | 22 | 754,000 |
TNT * | 98 | 64 | 40 | 35 | 6 | 3.2 | 6.7 | 210,000 |
Voodoo2 | 60 | 60 | 60 | 57 | 12 | 6.3 | 14 | 396,000 |
Voodoo3 | 60 | 60 | 60 | 60 | 14 | 8.6 | 17 | 540,000 |
Tornado 3000 | 262 | 138 | 58 | 56 | 8.9 | 8.1 | 22 | 509,000 |
GeForce2 * | 1000 | 717 | 276 | 344 | 45 | 35 | 38 | 2,199,000 |
GeForce2 (FSAA) | 1033 | 753 | 288 | 380 | 45 | 35 | 38 | 2,199,000 |
O2 | 60 | 60 | 30 | 30 | 7.4 | 6.6 | 15 | 415,000 |
MaxImpact | 60 | 60 | 60 | 60 | 15 | 12 | 38 | 754,000 |
Onyx | 60 | 60 | 60 | 60 | 30 | 20 | 49 | 1,256,000 |
Table 5. pfspeed frame rates (frames/second), 640x480 window
System | Apple.pfb | teapot.pfb | Duomo.pfb | iris_truck.pfb | Theseis.pfb | or.6.pfb | grazie-wire.pfb | Peak Triangles/sec |
Model size | 1704 tris | 2256 tris | 2298 tris, 3.1 MB tex | ~6000 tris, 1.2 MB tex | 15081 tris, 1 MB tex | 62819 tris | 29841 lines | |
Celeron SW | 9.2 | 13 | 0.7 | 0.9 | 1.7 | 1.6 | 5.6 | 101,000 |
Pentium SW | 14 | 20 | 1.1 | 1.3 | 2.5 | 2.4 | 8.4 | 151,000 |
TNT | 48 | 46 | 31 | 24 | 6 | 3 | 6.6 | 188,000 |
Voodoo2 | 60 | 60 | 58 | 49 | 8.5 | 4.8 | 12 | 302,000 |
Voodoo3 | 60 | 60 | 60 | 60 | 14 | 8.6 | 17 | 540,000 |
Tornado 3000 | 253 | 139 | 56 | 52 | 8.9 | 8.0 | 21 | 503,000 |
GeForce2 | 822 | 662 | 249 | 322 | 45 | 35 | 38 | 2,199,000 |
GeForce2 (FSAA) | 291 | 337 | 221 | 202 | 45 | 35 | 38 | 2,199,000 |
O2 | 60 | 60 | 30 | 30 | 6 | 6 | 15 | 377,000 |
MaxImpact | 60 | 60 | 60 | 60 | 15 | 12 | 30 | 754,000 |
Onyx | 60 | 60 | 60 | 60 | 30 | 20 | 31 | 1,256,000 |
Table 6. pfspeed frame rates (frames/second), 1280x1024 window
System | Apple.pfb | teapot.pfb | Duomo.pfb | iris_truck.pfb | Theseis.pfb | or.6.pfb | grazie-wire.pfb | Peak Triangles/sec |
Model size | 1704 tris | 2256 tris | 2298 tris, 3.1 MB tex | ~6000 tris, 1.2 MB tex | 15081 tris, 1 MB tex | 62819 tris | 29841 lines | |
Pentium SW | 3.6 | 5.7 | 0.3 | 0.4 | 0.8 | 1.2 | 3.9 | 75,000 |
TNT | 22 | 25 | 18 | 13 | 5.5 | 2.7 | 5.9 | 170,000 |
Voodoo3 | 60 | 60 | 60 | 60 | 13 | 8.6 | 15 | 540,000 |
Tornado 3000 ** | 159 | 132 | 36 | 50 | 8.8 | 7.9 | 21 | 496,000 |
GeForce2 | 235 | 267 | 185 | 172 | 45 | 35 | 38 | 2,199,000 |
GeForce2 (FSAA) | 73 | 87 | 59 | 57 | 44 | 33 | 37 | 2,073,000 |
O2 | 30 | 30 | 15 | 15 | 6 | 5.4 | 12 | 339,000 |
MaxImpact | 60 | 60 | 47 | 30 | 15 | 10 | 30 | 628,000 |
Onyx | 60 | 60 | 60 | 60 | 30 | 20 | 60 | 1,256,000 |
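The "Peak Triangles/sec" column in the pfspeed tables appears to be the largest product of frame rate and triangle count across the triangle-based models. This is inferred from the numbers rather than stated in the text, so the sketch below is a plausible reconstruction only. It uses 6000 for iris_truck.pfb, whose count is given only approximately, and leaves out grazie-wire.pfb because it contains lines rather than triangles.

```python
# Triangle counts from the table headers; iris_truck.pfb is approximate (~6000).
MODEL_TRIANGLES = {
    "Apple.pfb": 1704,
    "teapot.pfb": 2256,
    "Duomo.pfb": 2298,
    "iris_truck.pfb": 6000,
    "Theseis.pfb": 15081,
    "or.6.pfb": 62819,
}


def peak_triangles_per_sec(fps_by_model: dict[str, float]) -> float:
    """Largest fps * triangle-count product over the triangle-based models."""
    return max(fps * MODEL_TRIANGLES[name]
               for name, fps in fps_by_model.items()
               if name in MODEL_TRIANGLES)


# Onyx and TNT rows from the 16x16-window table.
onyx = dict(zip(MODEL_TRIANGLES, (60, 60, 60, 60, 30, 20)))
tnt = dict(zip(MODEL_TRIANGLES, (98, 64, 40, 35, 6, 3.2)))

print(f"Onyx peak: {peak_triangles_per_sec(onyx):,.0f} triangles/sec")
print(f"TNT peak:  {peak_triangles_per_sec(tnt):,.0f} triangles/sec")
```

Note that the peak does not always come from the largest model: for the TNT row it comes from iris_truck.pfb rather than or.6.pfb.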
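Table 7. manyTex64.pfb frame rates (frames/second)
System | 16x16 window, 8x8 textures | 16x16 window, 128x128 textures | 16x16 window, 256x256 textures | 640x480 window, 8x8 textures | 640x480 window, 128x128 textures | 640x480 window, 256x256 textures | 1280x1024 window, 8x8 textures | 1280x1024 window, 128x128 textures | 1280x1024 window, 256x256 textures |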
Pentium SW | 185 | 183 | 185 | 1.2 | 1.1 | 1.1 | 0.3 | 0.3 | 0.3 | ||
TNT | 295 | 297 | 1.7 | 112 | 104 | 1.7 | 37 | 34 | 1.7 | ||
Voodoo2 | 60 | 60 | 2.6 | 60 | 60 | 2.6 | |||||
Voodoo3 | 60 | 60 | 8.6 | 60 | 60 | 8.6 | 60 | 30 | 8.6 | ||
Tornado 3000 | 306 | 295 | 4 | 184 | 97 | 3.9 | 67 | 53 | 3.6 | ||
GeForce2 | 687 | 673 | 664 | 584 | 546 | 556 | 203 | 166 | 169 | ||
O2 | 60 | 60 | 60 | 44 | 30 | 30 | 15 | 12 | 12 | ||
MaxImpact | 60 | 39 | 6 | 60 | 30 | 6 | 60 | 30 | 5.5 | ||
Onyx | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 | 60 |
For fill64.pfb at 640x480 resolution, about 10 million pixels should be drawn per frame and about 300,000 pixels cleared. (There are 64 squares; their rendered size ranges from the full window down to 44% x 44% of the window.) At 1280x1024, about 43.5 million pixels should be drawn and 1.3 million cleared.
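Table 8. fill64.pfb frame rates (frames/second)
System | 16x16 window, 8x8 texture | 16x16 window, 128x128 texture | 16x16 window, 256x256 texture | 640x480 window, 8x8 texture | 640x480 window, 128x128 texture | 640x480 window, 256x256 texture | 1280x1024 window, 8x8 texture | 1280x1024 window, 128x128 texture | 1280x1024 window, 256x256 texture |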
Pentium SW | 45 | 43 | 43 | 0.05 | 0.05 | 0.05 | 0.01 | 0.01 | 0.01 | ||
TNT | 435 | 441 | 437 | 19 | 17 | 17 | 4.6 | 4.2 | 4.2 | ||
Voodoo2 | 60 | 60 | 60 | 8.6 | 8.6 | 8.3 | |||||
Voodoo3 | 60 | 60 | 60 | 15 | 15 | 15 | 4 | 3.8 | 3.8 | ||
Tornado 3000 | 597 | 583 | 597 | 11 | 9.5 | 8.4 | 3.3 | 3.2 | 3.0 | ||
GeForce2 | 1087 | 1094 | 1084 | 81 | 79 | 75 | 19 | 19 | 19 | ||
O2 | 60 | 60 | 60 | 3.5 | 3.4 | 3.3 | 0.9 | 0.9 | 0.8 | ||
MaxImpact | 60 | 60 | 60 | 12 | 12 | 12 | 3.2 | 3.1 | 3 | ||
Onyx | 60 | 60 | 60 | 30 | 30 | 30 | 10 | 10 | 10 |
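The fill64 frame rates can be turned into a rough effective fill rate by multiplying them by the estimated number of pixels drawn per frame (about 10 million at 640x480 and about 43.5 million at 1280x1024). Those pixel counts are themselves approximate, and several systems are capped by vertical-retrace sync, so the results of this sketch are ballpark figures only. The function names are illustrative.

```python
def overdraw(pixels_drawn: float, width: int, height: int) -> float:
    """Average number of times each window pixel is touched per frame."""
    return pixels_drawn / (width * height)


def effective_fill_mpixels(fps: float, pixels_drawn: float) -> float:
    """Rough fill rate in Mpixels/sec implied by a fill64 frame rate."""
    return fps * pixels_drawn / 1e6


# Estimated pixels drawn per frame, taken from the note on fill64.pfb.
DRAWN_640 = 10e6
DRAWN_1280 = 43.5e6

print(f"overdraw at 640x480:   {overdraw(DRAWN_640, 640, 480):.1f}")
print(f"overdraw at 1280x1024: {overdraw(DRAWN_1280, 1280, 1024):.1f}")

# Onyx frame rates from the fill64 table (8x8 texture columns).
print(f"Onyx at 640x480:   {effective_fill_mpixels(30, DRAWN_640):.0f} Mpixels/sec")
print(f"Onyx at 1280x1024: {effective_fill_mpixels(10, DRAWN_1280):.0f} Mpixels/sec")
```

The overdraw values come out a little above 30, in line with the estimate that each pixel is touched about 30 times per frame.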
* By default, the nVidia cards (and Mesa software rendering) do not synchronize their swapbuffers to the video vertical retrace, while all the other systems do. This is why in several tests the Onyx and other systems are limited to 60 frames/second, while the GeForce2 gets up to 1000+. It is possible to force the nVidia cards to sync to vertical retrace, and it is also possible to allow the SGIs to not do this; at some point I will re-run the tests using these options, in order to give a more complete comparison.
** Because the Tornado's display resolution was 2048x768, the tests using window sizes of 850x850 and 1280x1024 are not entirely valid, and ought to be re-done.
I also ran tenmillion on the GeForce2 with a triangle size of only 3 pixels. At this size, it achieved 20 million triangles/second. Presumably, with a much faster CPU (800 MHz or better), it might actually reach the 30 million triangles/second that nVidia claims for their chip.
Most of these tests were run in early May 2000, using the then-current drivers. The GeForce2 and Tornado 3000 tests were run in August 2000, using nVidia's 0.9-4 drivers (for the GeForce2) and Xi Graphics' "LGD" demo X server (for the Tornado).