A GPU-centric Visualization Pipeline for Large-Scale High-Resolution Displays

Khairi Reda
mreda2 -at- uic -dot- edu



Source code can be downloaded from:


To get a copy of the data, please e-mail me at: mreda2 -at- uic -dot- edu



Large High-Resolution Displays (LHD) are becoming increasingly common in research labs, universities, governmental institutions, and commercial corporations. Thanks to their large surface and high resolution, these displays allow scientists to visualize extremely large datasets at their native resolution. Alternatively, scientists can display a large number of moderately-sized datasets side-by-side simultaneously, in what is referred to as the "small-multiples" layout.

The small-multiples layout on LHDs facilitates comparison between similar datasets, eliminating the need to switch between multiple windows. Increasing the amount of data displayed simultaneously seems to improve performance on common information visualization tasks, even when the size of the data outpaces the visual acuity of the human eye [Yost07]. These findings suggest that the number of datasets visualized in the small-multiples layout can be pushed further than previously thought, even beyond the perceptual limits of the human visual system. This calls for additional research into visualization and interaction techniques that take advantage of the full resolution of LHDs, down to every pixel.

Recently, the hardware that drives LHDs has been shifting from multi-node computer clusters to single-node computers with multiple graphics cards. This shift reflects the increasing computational power of today's GPUs and the emphasis on multi-core parallelism. Despite the net increase in the graphical ability of these single-node, multi-GPU machines, the architecture introduces a major challenge: the bandwidth between the CPU and the GPUs does not scale with the number of GPUs, making it a bottleneck in the visualization pipeline. Most InfoVis interactions (such as brushing, changing the detail level, or calculating the difference between two or more datasets) have traditionally operated as early and as close to the data source as possible, in order to reduce the amount of data that has to travel through the visualization pipeline. This was a good strategy when the limitation was the number of graphical points that could be rendered. On today's multi-GPU machines, however, the need to constantly update the GPUs with the latest copy of the data leads to suboptimal performance. Some solutions have been proposed to better leverage the potential of GPUs in InfoVis [McDonnel09]. However, they mostly concern improving the speed or quality of rendering, rather than shifting some of the computational burden to the increasingly powerful GPU.

Therefore, in order to take advantage of the hardware architecture that drives today's LHDs, InfoVis operations have to be moved up in the visualization pipeline so that they operate as close as possible to the rendering stage. That is, these operations have to be implemented on the GPU. The role of the CPU is then restricted to loading the data, preprocessing it, packaging it in a format that is suitable for the GPU to operate on, and transferring it to GPU memory (ideally, using a one-time upload). The bulk of the work is done by the GPU. This paradigm, if proven successful, will allow the amount of data rendered on LHDs to scale up while still maintaining a high level of interactivity with the visualization.


Scenario: Ant trajectory analysis

We describe the Ant Experiment Browser, a prototype LHD visualization environment for the analysis of ant trajectories. Each trajectory represents the movement of a single ant (of the Eastern African Harvester Ant species) that has been taken away from its colony. The ant attempts to navigate the unfamiliar environment in order to find its way back to the colony. The dataset has approximately 500 trajectories, each ranging in duration between 10 seconds and 3 minutes. The Ant Experiment Browser displays the trajectories in a small-multiples layout, allowing an analyst to look at a large subset of the experiments simultaneously. This layout facilitates comparison between trajectories, which capture different aspects of ant behavior under a variety of experimental conditions.


Design and Implementation

This section describes the design and implementation of the Ant Experiment Browser. First, we describe the general idea and the expected benefits. The technical details of the implementation are then described.

Small-multiples on the GPU

The small-multiples layout lends itself very well to the single instruction, multiple data (SIMD) paradigm. For this reason, the GPU is an ideal platform for this layout: a GPU computational task can be spawned to visualize each of the small views in the layout. The task can perform the entire visualization pipeline on the GPU, including filtering, visual mapping, primitive generation, and rendering. There are two main advantages to this model:

  1. The computation required to generate each of the small views can be performed independently of, and in parallel with, the other views. Not only rendering can be done in parallel, but also filtering, visual mapping, and interactions. This way, a larger number of small views can be displayed interactively on the surface of an LHD.


  2. In many visualization systems (such as VTK), interactions incur a relatively large overhead. For example, when the user rotates a 3D volume, or removes some of its slices by filtering, the CPU performs the appropriate operations on the data and uploads a modified subset of the data to the GPU for rendering. This incurs latency that is sometimes on the order of several seconds, which can irritate the user and discourage exploration. However, when the computation required to perform those interactions (such as transformations, brushing, and filtering) happens on the GPU, the computational overhead is divided among the GPU cores, which execute in parallel. Moreover, there is no need for the CPU to transfer a new copy of the data to the GPU. Thus, latency is greatly reduced and the frame rate remains high, even when complex interactions are performed.

The combination of these two factors allows a greater number of views to be displayed in the Ant Experiment Browser. Moreover, it allows for more complex interaction schemes that focus on revealing the relationships between the views (such as linked brushing and filtering).


The Ant Experiment Browser is implemented in C++, OpenGL, and GLSL. The GLSL component comprises the entire visualization pipeline (filtering, visual mapping, and interactions). The CPU is responsible only for loading the data from disk, indexing it, and uploading it to the GPU. The entire dataset, which consists of about 500 ant trajectories, is put into a single Vertex Buffer Object and uploaded one time to the GPU during the initialization. To render the small-multiples layout, the application prepares a Display List with OpenGL commands necessary for drawing the entire layout, and uploads it to the GPU. This display list changes only when the layout is changed (that is, when the user requests different trajectories to be displayed). The Display List contains only glViewport and glDrawArrays commands. From this point on, the application calls a single display list to render the entire layout on the screen. The rest of the work is done by GLSL programs.
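The one-time upload described above can be illustrated with a minimal CPU-side sketch. It is not the application's actual code: the `TrajectoryPoint` layout, the `DrawRange` index, and the `packTrajectories` function are hypothetical names introduced here. The sketch packs every trajectory into one contiguous array (the contents of the single Vertex Buffer Object) and records, for each trajectory, the `first` and `count` values that a glDrawArrays call would take.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: arena position plus a timestamp.
struct TrajectoryPoint {
    float x, y;  // position in the experimental arena
    float t;     // time at which the sample was recorded
};

// Location of one trajectory inside the shared buffer. These are the
// "first" and "count" arguments a glDrawArrays call would need.
struct DrawRange {
    int first;
    int count;
};

struct PackedDataset {
    std::vector<TrajectoryPoint> vertices;  // contents of the single VBO
    std::vector<DrawRange> ranges;          // one entry per trajectory
};

// Concatenate all trajectories and remember where each one starts.
PackedDataset packTrajectories(
        const std::vector<std::vector<TrajectoryPoint>>& trajectories) {
    PackedDataset packed;
    for (const auto& trajectory : trajectories) {
        assert(!trajectory.empty());
        DrawRange range;
        range.first = static_cast<int>(packed.vertices.size());
        range.count = static_cast<int>(trajectory.size());
        packed.ranges.push_back(range);
        packed.vertices.insert(packed.vertices.end(),
                               trajectory.begin(), trajectory.end());
    }
    return packed;
}
```

With such an index, changing the layout only means choosing different `DrawRange` entries; the vertex data itself never has to be uploaded again.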


Configurable small-multiples layout

The application allows the screen to be configured into a number of small-multiples layouts with varying numbers of views. Currently, 4 layouts can be configured using a config file. The user can switch between them by pressing the appropriate number on the keypad: '1', '2', ... In a future implementation, the user will be able to specify an arbitrary configuration from the GUI. The following figure shows a close-up of the small-multiples in a (24 x 6) configuration.
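As a rough illustration of how a grid layout maps to glViewport arguments, the sketch below divides a window into equally sized cells. The `Viewport` struct and the `gridViewports` function are hypothetical; the actual application may compute its cells differently (for example, with margins between views).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The four arguments of a glViewport call.
struct Viewport {
    int x, y, width, height;
};

// Split a window into columns x rows equally sized cells, listed row by
// row starting from the top-left view of the layout.
std::vector<Viewport> gridViewports(int windowWidth, int windowHeight,
                                    int columns, int rows) {
    assert(columns > 0 && rows > 0);
    std::vector<Viewport> views;
    views.reserve(static_cast<std::size_t>(columns) * rows);
    const int cellWidth = windowWidth / columns;
    const int cellHeight = windowHeight / rows;
    for (int row = 0; row < rows; ++row) {
        for (int col = 0; col < columns; ++col) {
            Viewport v;
            v.x = col * cellWidth;
            // OpenGL window coordinates start at the lower-left corner,
            // so the top row of the layout gets the largest y value.
            v.y = (rows - 1 - row) * cellHeight;
            v.width = cellWidth;
            v.height = cellHeight;
            views.push_back(v);
        }
    }
    return views;
}
```

For a 24 x 6 configuration this yields 144 cells, one glViewport call per view.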

Stereo rendering of trajectories

The trajectories are rendered in stereo. The X and Y axes display the movement of the ant in the experimental arena. The Z axis reflects time; trajectory points that were recorded later in the experiment are rendered further away from the screen. The trajectories will appear as if they are "reaching out" of the screen towards a viewer who is wearing a pair of polarized stereo glasses.

To produce the stereo effect, the data-rendering display list is called twice, each time with a slightly modified Model-View matrix to reflect binocular disparity. A GLSL shader is responsible for interlacing the two images so they can be viewed correctly on a stereo display with a polarized line screen.
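The two steps above (a per-eye camera shift, followed by row interlacing) can be sketched on the CPU as follows. This is a reference of the logic only, not the GLSL shader used by the application. The names `eyeOffset`, `eyeForRow`, and `interlace` are hypothetical, and the choice of even rows for the left eye is an assumption; a given display may use the opposite assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Eye { Left, Right };

// Horizontal camera shift for one eye: half the eye separation to each
// side. It would be applied as a translation to the Model-View matrix
// before each of the two draw passes.
float eyeOffset(Eye eye, float eyeSeparation) {
    return eye == Eye::Left ? -0.5f * eyeSeparation : 0.5f * eyeSeparation;
}

// Which eye a pixel row belongs to on a line-interlaced display.
// Assumption: even rows show the left eye.
Eye eyeForRow(int row) {
    return row % 2 == 0 ? Eye::Left : Eye::Right;
}

// CPU reference of the interlacing pass. Both images are row-major and
// share the same width and height.
std::vector<unsigned> interlace(const std::vector<unsigned>& left,
                                const std::vector<unsigned>& right,
                                int width, int height) {
    const std::size_t pixelCount = static_cast<std::size_t>(width) * height;
    assert(left.size() == pixelCount && right.size() == pixelCount);
    std::vector<unsigned> out(pixelCount);
    for (int row = 0; row < height; ++row) {
        const std::vector<unsigned>& source =
            eyeForRow(row) == Eye::Left ? left : right;
        for (int col = 0; col < width; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * width + col;
            out[i] = source[i];
        }
    }
    return out;
}
```

In a fragment shader the same decision would typically be made per fragment from its window-space row, which avoids any CPU involvement.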

Cylinder rendering

By default, the trajectories are rendered as lines with solid colors. An alternative visual mapping uses cylinders, which are shaded with per-pixel Phong illumination. This enhances perception of the third dimension (time) as well as the direction of movement in the trajectory. The cylinder is constructed entirely in GLSL. A geometry shader takes the raw trajectory and emits vertices to dress each segment of the trajectory with a cylinder. The fragment shader then performs per-pixel illumination of the cylinder. The following figure shows the same trajectory drawn with a line and with a cylinder.

A trajectory rendered with lines (left) and cylinders (right)
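To give an idea of what "dressing" a segment with a cylinder involves, here is a CPU reference of the vertex generation, under the assumption that the cylinder wall is emitted as a single triangle strip. The `Vec3` helpers and the `cylinderStrip` function are hypothetical and are not taken from the application's geometry shader.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 scale(Vec3 a, float s) { return Vec3{a.x * s, a.y * s, a.z * s}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    const float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    assert(len > 0.0f);
    return scale(a, 1.0f / len);
}

// Vertices of a triangle strip that wraps the segment p0 -> p1 in a
// cylinder wall. Returns 2 * (sides + 1) vertices; the last pair repeats
// the first pair to close the strip.
std::vector<Vec3> cylinderStrip(Vec3 p0, Vec3 p1, float radius, int sides) {
    assert(sides >= 3);
    const float kTwoPi = 6.28318530718f;
    const Vec3 axis = normalize(sub(p1, p0));
    // Pick a helper vector that is not parallel to the axis.
    const Vec3 helper = std::fabs(axis.z) < 0.9f ? Vec3{0.0f, 0.0f, 1.0f}
                                                 : Vec3{1.0f, 0.0f, 0.0f};
    const Vec3 u = normalize(cross(axis, helper));  // perpendicular to axis
    const Vec3 v = cross(axis, u);                  // completes the basis
    std::vector<Vec3> strip;
    for (int i = 0; i <= sides; ++i) {
        const float angle = kTwoPi * static_cast<float>(i) / sides;
        const Vec3 offset = add(scale(u, radius * std::cos(angle)),
                                scale(v, radius * std::sin(angle)));
        strip.push_back(add(p0, offset));  // vertex on the ring around p0
        strip.push_back(add(p1, offset));  // matching vertex around p1
    }
    return strip;
}
```

The normalized `offset` direction doubles as the surface normal at each vertex, which is what a per-pixel lighting pass would interpolate.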


The user can highlight a portion of the background using a paint-brush tool. This causes all trajectories in the layout to be highlighted when they move over an area that has been brushed. This offers the user the ability to look at similarities in the data across the entire layout. For example, the user can brush a portion of one interesting trajectory, which would cause trajectories with a similar pattern to be highlighted. Multiple areas can be brushed with a different color.

The linked-brushing feature is implemented at the fragment shader level. The fragment shader looks up a "brushed texture", which records the areas that have been brushed by the user. If a specific pixel in the brushed texture is turned on, the trajectory fragment shader takes its color and applies it to the corresponding portion of the trajectory. The following figure shows the linked-brushing feature in action:

Linked-brushing with lines (left) and cylinders (right)
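The per-fragment lookup can be sketched on the CPU as follows. This is a reference of the logic only. It assumes that the brushed texture is addressed by normalized arena coordinates (u, v) in [0, 1], so that a single texture applies to every view in the layout; the `BrushTexture`, `BrushTexel`, and `shadeFragment` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// One texel of the brushed texture: whether the user painted here, and
// with which brush color.
struct BrushTexel {
    bool brushed;
    Color color;
};

struct BrushTexture {
    int width;
    int height;
    std::vector<BrushTexel> texels;  // row-major, width * height entries

    // Nearest-texel lookup with (u, v) clamped to [0, 1].
    const BrushTexel& lookup(float u, float v) const {
        assert(width > 0 && height > 0);
        const float cu = std::min(std::max(u, 0.0f), 1.0f);
        const float cv = std::min(std::max(v, 0.0f), 1.0f);
        const int x = std::min(width - 1, static_cast<int>(cu * width));
        const int y = std::min(height - 1, static_cast<int>(cv * height));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Color of a trajectory fragment: the brush color where the background
// has been brushed, the trajectory's own color elsewhere.
Color shadeFragment(const BrushTexture& brush, float u, float v, Color base) {
    const BrushTexel& texel = brush.lookup(u, v);
    return texel.brushed ? texel.color : base;
}
```

Because the lookup costs the same whether or not a texel is brushed, the amount of brushing should not change the per-fragment work, which is consistent with the performance observations reported later.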

Time filtering

A time filter can be applied. This restricts rendering to the parts of the trajectory that fall within a specified time window, which can be selected using a range slider. Time filtering is also implemented in the fragment shader. The vertex shader (and the geometry shader) pass along the raw timestamp of each vertex, which is automatically interpolated across the primitive. The fragment shader discards any fragment that does not fall within the specified time window. The following figure shows this feature:

A trajectory that has been time-filtered to show only movement in the final moments of the trajectory
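The filtering rule is simple enough to express as a small CPU reference. The sketch assumes linear interpolation of the timestamp between the two end points of a segment, which mirrors what the rasterizer does for interpolated shader outputs; `TimeWindow`, `interpolateTime`, and `keepFragment` are hypothetical names.

```cpp
#include <cassert>

// Time window selected with the range slider, in the same units as the
// trajectory timestamps.
struct TimeWindow {
    float start;
    float end;
};

// Timestamp of a fragment that lies a given fraction (0..1) of the way
// along a segment whose end points were recorded at t0 and t1.
float interpolateTime(float t0, float t1, float fraction) {
    assert(fraction >= 0.0f && fraction <= 1.0f);
    return t0 + (t1 - t0) * fraction;
}

// True if the fragment should be drawn; a fragment shader would discard
// the fragment when this is false.
bool keepFragment(float timestamp, TimeWindow window) {
    return timestamp >= window.start && timestamp <= window.end;
}
```

Changing the window only changes two shader parameters, so the vertex data on the GPU stays untouched while the user drags the slider.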

Time shading

To enhance perception of time, the line (or cylinder) can be shaded to reflect time (a darker color reflects points later in time). This feature is also implemented at the fragment shader level.

Time shading
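One possible mapping from time to shade is a linear darkening of the base color, sketched below as a CPU reference. The linear ramp and the `minBrightness` parameter are assumptions made for illustration; the text only states that later points are drawn darker.

```cpp
#include <algorithm>
#include <cassert>

struct Color { float r, g, b; };

// Darken the base color as time advances. Points at tStart keep the full
// base color; points at tEnd are scaled down to minBrightness.
Color shadeByTime(Color base, float t, float tStart, float tEnd,
                  float minBrightness = 0.3f) {
    assert(tEnd > tStart);
    // Normalized time in [0, 1].
    const float n =
        std::min(std::max((t - tStart) / (tEnd - tStart), 0.0f), 1.0f);
    const float brightness = 1.0f - n * (1.0f - minBrightness);
    return Color{base.r * brightness, base.g * brightness,
                 base.b * brightness};
}
```

Keeping a non-zero minimum brightness prevents the end of a trajectory from fading into a dark background.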


User Interface

The user interface consists of a "Floating Panel", which can be moved across the screen. The panel contains a number of sliders which control the following:


Performance results

The application has been tested in the Cybercommons, a 20 x 6 foot LHD with a total resolution of about 8K x 2.5K. Due to limitations of Xinerama (which allows for a large OpenGL window), the resolution utilized during the experiments was about 8K x 1.5K (roughly 2/3 of the display). Five different layouts were tested, as outlined in the following table. Each layout was tested in mono and in interlaced stereo mode. Moreover, two visual mappings were tested: lines and cylinders. Stereo mode requires twice the amount of data to be rendered. Therefore, we report the performance of the mono and stereo experiments separately.

Layout (columns X rows)    Number of views    Approx. number of data points in mono mode (stereo is 2x as much)
4 X 9                      36                 18 K
4 X 15                     60                 30 K
6 X 24                     144                72 K
9 X 30                     270                135 K
12 X 36                    432                216 K

The charts below show performance results under the above conditions. The X axis reflects the number of views in the layout, and the Y axis reflects the frame rate in frames per second (FPS).

Even when almost the entire dataset is rendered (432 out of approximately 500 trajectories), the frame rate remains at around 15 FPS. It should be noted that the current implementation relies on a single OpenGL context with support from Xinerama. This setup has proven to incur a significant overhead on the frame rate (in some cases, a 50x drop in performance). A future implementation of the application will use multiple OpenGL contexts and bypass Xinerama. Therefore, we expect the maximum achievable frame rate to be significantly higher than the reported numbers.

It is also worth pointing out that the difference between mono and stereo performance is small, despite the latter requiring 2x the amount of data to be rendered. This demonstrates the scalability of our approach.

The effect of interactions on performance was also tested. As predicted by the design, the effect of interactions on frame rate proved to be negligible, with no effect on the numbers reported above. For example, the linked-brushing did not have any effect on frame rate, regardless of the number of views in the layout, or the amount of "brushing".


Limitations and Future work

The major drawback of the current implementation lies in the fact that it uses a single OpenGL context to render the entire layout. While this works fine on laptop computers, it does not scale well to LHDs, which have much higher resolutions and therefore require a much larger OpenGL context. Currently, the application runs on LHDs with support from Xinerama, which proved to be inherently inefficient with large OpenGL contexts. A future implementation of the application will open multiple OpenGL contexts, eliminating the need to use Xinerama for window management and thus increasing performance.

One design challenge that I encountered early on in the project is designing a user interface that scales well from laptop computers to LHDs. While the current interface with the "Floating Panel" seems to work fine, in many cases the design choices I made were dictated by time constraints rather than by informed decisions about how users might want to use the application. Therefore, additional research is needed to determine the best UI design paradigm for this application, as well as for visualization applications targeting LHD environments in general.


[McDonnel09] B. McDonnel and N. Elmqvist. Towards Utilizing GPUs in Information Visualization: A Model and Implementation of Image-Space Operations. IEEE Transactions on Visualization and Computer Graphics 15(6):1105-1112, 2009.

[Yost07] B. Yost, Y. Haciahmetoglu, and C. North. Beyond visual acuity: the perceptual scalability of information visualizations for large displays. In Proc. of CHI '07.

Last update: April 16, 2012