Linux/CAVE Graphics Performance Tests

The following are results of some simple performance tests of various PC graphics cards under Linux, and of 3 SGI systems for comparison. See the end of the page for specifications of the different machines.

The first set of tests uses SGI's tenmillion.c, a program written by Phil Lacroute of SGI to demonstrate the performance of the Infinite Reality graphics hardware. It reports two statistics: the number of triangles drawn per second, and the number of pixels filled per second. The exact numbers given by tenmillion are almost as meaningless as a CPU's MIPS rating, but they do give some idea of the relative performance of the different systems, and are also good for seeing the effect of different options (triangle size, texturing, lighting, etc.). The source code for tenmillion, slightly modified to compile under Linux, is here: tenmillion-linux.c.

Since tenmillion is concerned solely with getting the absolute maximum number of triangles per second, one thing it does not do is clear the window. For Table 3, I modified tenmillion to clear its window once per iteration, for the purpose of determining just how fast the glClear operation is. This can be significant in real-time graphics - if you want to achieve (for instance) a 30Hz frame rate, you only have 33.3 milliseconds to draw the entire frame; on slower systems just clearing the screen can take up a large fraction of this time.
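
As a rough illustration of the frame-budget arithmetic above, the following Python sketch computes how much of a frame a clear consumes. The function names are invented for this example and are not part of tenmillion; the clear times are the TNT and Onyx figures from Table 3 (850x850 window).

```python
def frame_budget_ms(frame_rate_hz):
    """Time available to draw one frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def clear_fraction(clear_ms, frame_rate_hz):
    """Fraction of the frame budget that is spent clearing the window."""
    return clear_ms / frame_budget_ms(frame_rate_hz)


if __name__ == "__main__":
    # Clear times in msec, taken from Table 3.
    for system, clear_ms in (("TNT", 7.09), ("Onyx", 0.42)):
        share = clear_fraction(clear_ms, 30.0)
        print(f"{system}: {clear_ms} ms clear = {share:.1%} of a 30 Hz frame")
```

On these figures the TNT spends roughly a fifth of a 30Hz frame just clearing the window, while the Onyx spends a little over one percent.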

Tables 4 - 6 show the results of a simple pfCAVE test program using various "real" models. Apple.pfb is a basic sample model originally in Inventor format; teapot.pfb, iris_truck.pfb, and or.6.pfb are from Performer's example data and Friends-of-Performer directories; Duomo.pfb, Theseis.pfb, and grazie-wire.pfb are all taken from existing CAVE applications. Duomo.pfb is moderately complex, with 11 different texture maps totalling about 3.1 megabytes; Theseis.pfb is somewhat pathological in that it consists of 15,000 individual triangles, with no meshing at all; grazie-wire.pfb consists of ~30,000 lines and no triangles. The models were all positioned to fill as much of the window as possible. The source code for the pfspeed test program is here: pfspeed.cxx.

I ran the tests for Tables 4 - 6 with 3 different window sizes - 16x16, 640x480, and 1280x1024 (except for the Voodoo2 and Celeron, which could not do 1280x1024). The 16x16 test shows the basic transformation speed of the systems, as there is very little pixel-filling to do. The other tests show the effect of increased pixel-fill load.

Tables 7 & 8 are also based on pfspeed, but use very artificial models intended to stress the texture-mapping and pixel-fill performance.

manyTex64.pfb (Table 7) is a model consisting of 64 squares in an 8x8 grid that fills the window; each square has a different texture map on it. The resolution of the individual textures was varied (8x8, 128x128, and 256x256), giving total texture sizes of 8 kilobytes, 2 megabytes, and 8 megabytes respectively. (Because the MaxImpact has 4 MB of texture memory, the effect of exceeding this can be seen in the significant speed drop with 256x256 textures.)
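
The total texture sizes quoted above can be reproduced with a short Python sketch. Note the assumption: 2 bytes per texel and no mipmaps. The texel format is not stated here; 2 bytes is simply the value that makes 64 textures add up to the quoted 8 KB, 2 MB, and 8 MB. The function name is made up for the example.

```python
def total_texture_bytes(num_textures, width, height, bytes_per_texel=2):
    """Total size of num_textures textures of width x height texels.

    bytes_per_texel=2 is an assumption inferred from the quoted totals;
    mipmap levels are not counted.
    """
    return num_textures * width * height * bytes_per_texel


if __name__ == "__main__":
    for res in (8, 128, 256):
        size = total_texture_bytes(64, res, res)
        print(f"64 textures at {res}x{res}: {size / 1024:.0f} KB")
```

Under that assumption the 256x256 case needs twice the MaxImpact's 4 MB of texture memory, while the 128x128 case fits in half of it.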

fill64.pfb (Table 8) consists of 64 large squares, stacked one behind the other; on average, each pixel should be touched about 30 times per frame (the squares are arranged so that Performer should be drawing them from back to front, but I can't guarantee that for every version of Performer).


Table 1. Original tenmillion - triangles per second

                                                  vertex data only             light, texture, zbuffer
System                            12 pixel   100 pixel  1000 pixel    12 pixel   100 pixel  1000 pixel
Pentium SW                          85,000      17,000       2,000      10,000       1,500         170
Pentium SW (double-buffered)       140,000      23,000       4,600      12,000       1,700         180
TNT                                285,000     160,000      69,000     270,000     109,000      36,000
Voodoo3                            889,000     496,000      95,000     506,000     249,000      40,000
Tornado 3000 **                  1,027,000     742,000      89,000     442,000     442,000      71,000
GeForce2                        12,809,000   2,264,000     505,000   7,096,000   1,132,000     200,000
O2                                 941,000     586,000      66,000     204,000     144,000      18,000
MaxImpact                        2,595,000   1,441,000     192,000   1,076,000     645,000      85,000
Onyx                            11,009,000   2,783,000     395,000   5,970,000   2,383,000     340,000

Table 2. Original tenmillion - Mpixels per second

                                                  vertex data only             light, texture, zbuffer
System                            12 pixel   100 pixel  1000 pixel    12 pixel   100 pixel  1000 pixel
Pentium SW                               1         1.7           2        0.12        0.15        0.17
Pentium SW (double-buffered)           1.7         2.3         4.6        0.15        0.17        0.18
TNT                                    3.4          16          69         3.2          11          36
Voodoo3                                 11          50          95         6.1          25          40
Tornado 3000                            12          74          89         5.3          44          71
GeForce2                               154         226         505          85         113         200
O2                                      11          59          66         2.4          14          18
MaxImpact                               31         144         192          13          65          85
Onyx                                   132         278         395          72         238         340
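
Tables 1 and 2 appear to be two views of the same measurements: each Mpixels/sec entry matches the corresponding triangles/sec entry multiplied by the nominal triangle size. A minimal Python sketch of that conversion follows; the function name is invented for this example, and the sample rates are copied from Table 1.

```python
def mpixels_per_sec(triangles_per_sec, pixels_per_triangle):
    """Convert a triangle rate to a fill rate in millions of pixels per second."""
    return triangles_per_sec * pixels_per_triangle / 1e6


if __name__ == "__main__":
    # (description, triangles/sec from Table 1, triangle size in pixels)
    samples = [
        ("GeForce2, 12 pixel, vertex data only", 12_809_000, 12),
        ("Onyx, 1000 pixel, vertex data only", 395_000, 1000),
        ("TNT, 100 pixel, light/texture/zbuffer", 109_000, 100),
    ]
    for label, tris, size in samples:
        print(f"{label}: {mpixels_per_sec(tris, size):.1f} Mpixels/sec")
```

Rounded to the precision used in Table 2, these three cases give 154, 395, and 11 Mpixels/sec.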

Table 3. tenmillion with clearing

                                 850x850, no clear     850x850, clear         clear time        clear speed
System                                (msec/frame)       (msec/frame)             (msec)      (Mpixels/sec)
Pentium SW (double-buffered)                 62.98              69.70               6.72                107
TNT                                           4.20              11.29               7.09                102
Voodoo3                                       3.02               5.12               2.10                344
Tornado 3000 (750x750 window)                 1.22               3.26               2.04                276
GeForce2                                      0.57               1.36               0.79                915
O2                                            4.34               6.59               2.24                323
MaxImpact                                     1.50               2.68               1.18                612
Onyx                                          0.73               1.15               0.42               1720
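
The last two columns of Table 3 follow from the first two: the clear time is the difference between the two frame times, and the clear speed is the window area divided by that time. A small Python sketch of the arithmetic, with function names made up for the example:

```python
def clear_time_ms(ms_with_clear, ms_without_clear):
    """Time attributable to the clear, in milliseconds per frame."""
    return ms_with_clear - ms_without_clear


def clear_speed_mpixels(width, height, clear_ms):
    """Clear rate in millions of pixels per second for a width x height window."""
    pixels = width * height
    return pixels / (clear_ms / 1000.0) / 1e6


if __name__ == "__main__":
    # TNT row of Table 3: 4.20 msec without clear, 11.29 msec with clear.
    t = clear_time_ms(11.29, 4.20)
    print(f"TNT clear time: {t:.2f} msec")
    print(f"TNT clear speed: {clear_speed_mpixels(850, 850, t):.0f} Mpixels/sec")
    # The Tornado 3000 row used a 750x750 window.
    print(f"Tornado clear speed: {clear_speed_mpixels(750, 750, 2.04):.0f} Mpixels/sec")
```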

Table 4. Various Performer models, 16x16 window

(Model columns are frame rates in frames/second.)
                   Apple.pfb  teapot.pfb   Duomo.pfb  iris_truck.pfb  Theseis.pfb    or.6.pfb  grazie-wire.pfb           Peak
System             1704 tris   2256 tris   2298 tris      ~6000 tris   15081 tris  62819 tris      29841 lines  Triangles/sec
                                          3.1 MB tex      1.2 MB tex     1 MB tex
Celeron SW               155         106          57              51           13         7.8               15        490,000
Pentium SW               269         177          89              80           19          12               22        754,000
TNT *                     98          64          40              35            6         3.2              6.7        210,000
Voodoo2                   60          60          60              57           12         6.3               14        396,000
Voodoo3                   60          60          60              60           14         8.6               17        540,000
Tornado 3000             262         138          58              56          8.9         8.1               22        509,000
GeForce2 *              1000         717         276             344           45          35               38      2,199,000
GeForce2 (FSAA)         1033         753         288             380           45          35               38      2,199,000
O2                        60          60          30              30          7.4         6.6               15        415,000
MaxImpact                 60          60          60              60           15          12               38        754,000
Onyx                      60          60          60              60           30          20               49      1,256,000
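
The Peak Triangles/sec column is not defined in the text, but its values are consistent with the largest product of triangle count and frame rate across the triangle models (in every row checked, or.6.pfb or iris_truck.pfb). The Python sketch below works through the Onyx row of Table 4 on that assumption; the function name and data layout are invented for this example, and grazie-wire.pfb is left out because it contains only lines.

```python
def peak_triangles_per_sec(results):
    """Return (model, triangles/sec) for the model with the highest throughput.

    results maps a model name to (triangle_count, frames_per_second).
    """
    best = max(results, key=lambda name: results[name][0] * results[name][1])
    tris, fps = results[best]
    return best, tris * fps


# Onyx row of Table 4 (16x16 window), read as 60, 60, 60, 60, 30, 20.
# iris_truck.pfb is listed as ~6000 triangles, so 6000 is approximate.
onyx_16x16 = {
    "Apple.pfb": (1704, 60),
    "teapot.pfb": (2256, 60),
    "Duomo.pfb": (2298, 60),
    "iris_truck.pfb": (6000, 60),
    "Theseis.pfb": (15081, 30),
    "or.6.pfb": (62819, 20),
}

if __name__ == "__main__":
    model, rate = peak_triangles_per_sec(onyx_16x16)
    print(f"{model}: {rate:,} triangles/sec")
```

For this row the peak comes from or.6.pfb at 62819 x 20 = 1,256,380 triangles/sec, which rounds to the 1,256,000 shown in the table.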

Table 5. Various Performer models, 640x480 window

(Model columns are frame rates in frames/second.)
                   Apple.pfb  teapot.pfb   Duomo.pfb  iris_truck.pfb  Theseis.pfb    or.6.pfb  grazie-wire.pfb           Peak
System             1704 tris   2256 tris   2298 tris      ~6000 tris   15081 tris  62819 tris      29841 lines  Triangles/sec
                                          3.1 MB tex      1.2 MB tex     1 MB tex
Celeron SW               9.2          13         0.7             0.9          1.7         1.6              5.6        101,000
Pentium SW                14          20         1.1             1.3          2.5         2.4              8.4        151,000
TNT                       48          46          31              24            6           3              6.6        188,000
Voodoo2                   60          60          58              49          8.5         4.8               12        302,000
Voodoo3                   60          60          60              60           14         8.6               17        540,000
Tornado 3000             253         139          56              52          8.9         8.0               21        503,000
GeForce2                 822         662         249             322           45          35               38      2,199,000
GeForce2 (FSAA)          291         337         221             202           45          35               38      2,199,000
O2                        60          60          30              30            6           6               15        377,000
MaxImpact                 60          60          60              60           15          12               30        754,000
Onyx                      60          60          60              60           30          20               31      1,256,000

Table 6. Various Performer models, 1280x1024 window

(Model columns are frame rates in frames/second.)
                   Apple.pfb  teapot.pfb   Duomo.pfb  iris_truck.pfb  Theseis.pfb    or.6.pfb  grazie-wire.pfb           Peak
System             1704 tris   2256 tris   2298 tris      ~6000 tris   15081 tris  62819 tris      29841 lines  Triangles/sec
                                          3.1 MB tex      1.2 MB tex     1 MB tex
Pentium SW               3.6         5.7         0.3             0.4          0.8         1.2              3.9         75,000
TNT                       22          25          18              13          5.5         2.7              5.9        170,000
Voodoo3                   60          60          60              60           13         8.6               15        540,000
Tornado 3000 **          159         132          36              50          8.8         7.9               21        496,000
GeForce2                 235         267         185             172           45          35               38      2,199,000
GeForce2 (FSAA)           73          87          59              57           44          33               37      2,073,000
O2                        30          30          15              15            6         5.4               12        339,000
MaxImpact                 60          60          47              30           15          10               30        628,000
Onyx                      60          60          60              60           30          20               60      1,256,000

Table 7. Many texture maps (manyTex64.pfb)

(Frame rates in frames/second; "-" marks a window size that was not run.)
                      16x16 window              640x480 window            1280x1024 window
Texture size        8x8  128x128  256x256      8x8  128x128  256x256      8x8  128x128  256x256
Pentium SW          185      183      185      1.2      1.1      1.1      0.3      0.3      0.3
TNT                 295      297      1.7      112      104      1.7       37       34      1.7
Voodoo2              60       60      2.6       60       60      2.6        -        -        -
Voodoo3              60       60      8.6       60       60      8.6       60       30      8.6
Tornado 3000        306      295        4      184       97      3.9       67       53      3.6
GeForce2            687      673      664      584      546      556      203      166      169
O2                   60       60       60       44       30       30       15       12       12
MaxImpact            60       39        6       60       30        6       60       30      5.5
Onyx                 60       60       60       60       60       60       60       60       60

Table 8. Heavy pixel-fill (fill64.pfb)

At 640x480 resolution, the number of pixels drawn should be about 10 million; 300,000 pixels are cleared. (There are 64 squares; the rendered size of the squares ranges from full window to 44% x 44% of the window.) At 1280x1024, about 43.5 million pixels should be drawn, and 1.3 million cleared.

(Frame rates in frames/second; "-" marks a window size that was not run.)
                      16x16 window              640x480 window            1280x1024 window
Texture size        8x8  128x128  256x256      8x8  128x128  256x256      8x8  128x128  256x256
Pentium SW           45       43       43     0.05     0.05     0.05     0.01     0.01     0.01
TNT                 435      441      437       19       17       17      4.6      4.2      4.2
Voodoo2              60       60       60      8.6      8.6      8.3        -        -        -
Voodoo3              60       60       60       15       15       15        4      3.8      3.8
Tornado 3000        597      583      597       11      9.5      8.4      3.3      3.2      3.0
GeForce2           1087     1094     1084       81       79       75       19       19       19
O2                   60       60       60      3.5      3.4      3.3      0.9      0.9      0.8
MaxImpact            60       60       60       12       12       12      3.2      3.1        3
Onyx                 60       60       60       30       30       30       10       10       10
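
The frame rates in Table 8 can be turned into an approximate fill rate by multiplying them by the per-frame pixel counts quoted above (about 10 million drawn at 640x480, about 43.5 million at 1280x1024). The Python sketch below does that; the function name is invented for this example. The pixel counts are themselves estimates, and systems that sync to vertical retrace may be capable of more than their frame rate suggests, so the results are rough.

```python
def fill_rate_mpixels(pixels_drawn_per_frame, frames_per_sec, pixels_cleared_per_frame=0):
    """Approximate fill rate in millions of pixels per second.

    Cleared pixels are optional; pass them to include the clear in the total.
    """
    total = pixels_drawn_per_frame + pixels_cleared_per_frame
    return total * frames_per_sec / 1e6


if __name__ == "__main__":
    # GeForce2 with 8x8 textures, frame rates from Table 8.
    print(f"640x480:   {fill_rate_mpixels(10e6, 81):.0f} Mpixels/sec")
    print(f"1280x1024: {fill_rate_mpixels(43.5e6, 19):.0f} Mpixels/sec")
```

The two window sizes give similar estimates for the GeForce2 (roughly 810 and 830 Mpixels/sec), which suggests this test is limited by fill rate at both sizes.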

System Specs

Other Notes

* By default, the nVidia cards (and Mesa software rendering) do not synchronize their swapbuffers to the video vertical retrace, while all the other systems do. This is why in several tests the Onyx and other systems are limited to 60 frames/second, while the GeForce2 gets up to 1000+. It is possible to force the nVidia cards to sync to vertical retrace, and it is also possible to allow the SGIs to not do this; at some point I will re-run the tests using these options, in order to give a more complete comparison.

** Because the Tornado's display resolution was 2048x768, the tests using window sizes of 850x850 and 1280x1024 are not entirely valid, and ought to be re-done.

I also ran tenmillion on the GeForce2 with a triangle size of only 3 pixels. At this size, it achieved 20 million triangles/second. Presumably, with a much faster CPU (800 MHz or better), it might actually reach the 30 million triangles/second that nVidia claims for their chip.

Most of these tests were run in early May 2000, using the then-current drivers. The GeForce2 and Tornado 3000 tests were run in August 2000, using nVidia's 0.9-4 drivers (for the GeForce2) and Xi Graphics' "LGD" demo X server (for the Tornado).


Last modified 11 August 2000.
Dave Pape, pape@evl.uic.edu