Visualization Workshop


Papers at:


My suggestion is to add these papers to the currently assigned papers, write five-line summaries for each paper, and rank them according to their importance (i.e., their relevance to us). Present *a maximum* of 3-4 papers (10 minutes per paper) which are most relevant, and briefly mention the idea behind the other papers. We will *not* have time to go over all the papers in detail.


If all your papers are good/relevant, somehow manage them within 40-45 minutes. Here's my short summary of our goals, if it helps to rank the papers (this is for people who missed our meetings):


Our general goal

-  We need to develop our next-generation strategy for very large time-varying volumes, to be rendered to scalable tiled displays, as part of the OptIPuter project.

-  The goal is to be able to interact (at tolerable frame rates) with large seismic or microscopy data.

-  So we need to be aware of the cutting-edge techniques and strategies out there for visualizing large data (parallel algorithms, architectures). Look out for special techniques for time-varying volumes.

-  We have to explore multiresolution techniques (e.g., wavelet or fractal compression) because they are required for interacting with large data.

-  We have to know all the hardware techniques used to maximize GPU usage.



1.K.L. Ma, "High Quality Lighting and Efficient Pre-Integration for Volume Rendering", Eurographics Symposium on Visualization, 2004

2.Sadiq, Kaufman, "Fast and Reliable Space Leaping for Interactive Volume Rendering", IEEE Vis2002

3.Sort-First, Distributed Memory Parallel Visualization and Rendering

4.A Hardware Assisted Hybrid Rendering Technique for Interactive Volume Visualization

5.Time Critical Multiresolution Volume Rendering Using 3D Texture Mapping Hardware

6.Multiresolution Representation and Visualization of Volume Data



1.Klaus Mueller, Arie Kaufman, "Empty Space Skipping and Occlusion Clipping for Texture-based Volume Rendering", IEEE Vis2003

2.Woodring, Wang, "High Dimensional Direct Rendering of Time-Varying Volumetric Data", IEEE Vis2003

3.Parallel Rendering with K-Way Replication

4.A Framework for Interactive Hardware-Accelerated Remote 3D Visualization

5.Accelerating Large Data Analysis by Exploiting Regularities

6.Multi-layered Image Cache for Scientific Visualization



1.K.L. Ma, "Visualizing Industrial CT Volume Data for Nondestructive Applications", IEEE Vis2003

2.Kelly, K.L. Ma, "A Spreadsheet Interface for Visualization Exploration", IEEE Vis2000

3.A Hardware-Assisted Scalable Solution for Interactive Volume Rendering of Time-Varying Data

4.Multiresolution View-Dependent Splat Based Volume Rendering of Large Irregular Data

5.Distributed Interactive Ray Tracing for Large Volume Visualization

6.Visibility Based Pre-fetching for Interactive Out-of-core rendering



1.Krüger, "Acceleration Techniques for GPU-based Volume Rendering", IEEE Vis2003

2.Viola et al, "Hardware Based Non Linear Filtering and Segmentation Using High Level Shading Languages", IEEE Vis2003



1.K.L. Ma, "Visualizing Very Large Earthquake Simulations", Supercomputing 2003


3.Michael Bailey, Nadeau, "Visualizing Volume Data Using Physical Models", IEEE Vis2000

4.Efficient Out-Of-Core Iso-surface Extraction

5.TRex: Interactive Texture Based Volume Rendering for Extremely Large Datasets

6.Sort Last Parallel Rendering for Viewing Extremely Large Datasets on Tiled Displays

7.Interactive Rendering of Large Volume Datasets

8.An Application Architecture for Large Data Visualization: A Case Study



1.Balmelli et al, "Volume Warping for Adaptive Isosurface Extraction", IEEE Vis2002

2.Qu, Kaufman, "Image Based Rendering with Stable Frame Rates", IEEE Vis2000

3.Interactive volume rendering using multi-dimensional transfer functions and direct manipulation widgets

4.Multidimensional Transfer Functions for Interactive Volume Rendering

5.Survey of parallel volume rendering algorithms

6.Efficient Implementation of Real-Time View-Dependent Multiresolution Meshing

7.Application Controlled Demand Paging for Out-of-core Visualization



1.K.L. Ma, "Interactive Exploration of Large 3-D Unstructured Grid Data", ICASE Report, 1996

2.Guthe, Straßer, "Real-time Decompression and Visualization of Animated Volume Data", IEEE Vis01

3.Volume Clipping via Per-Fragment Operations in Texture-Based Volume Visualization

4.Real Time Volume Rendering of Time-varying data using Fragment-Shader compression approach

5.An Interleaved Parallel Volume Renderer with PC clusters

6.Compression Domain Volume Rendering



1.Boada et al, "Multiresolution volume visualization with a texture-based octree", Visual Computer 2001

2.Interactive translucent volume rendering and procedural modeling

3.IBR Assisted Volume Rendering

4.Multiresolution Techniques for Interactive Texture-Based Volume Visualization

5.TRex: Interactive Texture Based Volume Rendering for Extremely Large Datasets




1.Greg Humphreys et al, "Chromium: A Stream Processing Framework for Interactive Rendering on Clusters", ACM Siggraph 2002



1.Parallel rendering




“Acceleration Techniques for GPU-based Volume Rendering”

-       Integration of acceleration techniques into volume rendering to reduce per-fragment operations (expensive, fill-rate-limiting ops)

o      “Early ray termination” detection

§       terminate processing when sufficient opacity is reached

§       A lot of unused fragments: only 0.2% to 4% of fragments end up used in the final image

o      Empty-space skipping

§       Skip empty space along rays of sight

o      Done using a ray-casting on the GPU

§       3x improvement on ATI9700

-       Empty-space skipping

o      Apply before shader program executed

o      If depth value not modified, no shader executed

o      Skip the lighting computations, blending operations

-       Ray-casting

o      Intersection coordinate stored into a 2D texture

o      This texture used next pass, to restrict computations

o      Pass 1: entry point into the volume on bounding box (viewport)

o      Pass 2: ray direction computation on slices

o      Passes 3 to N: ray traversal and termination test

o      8 passes max, hardware limitation

-       Empty-space

o      Data structure to encode empty regions in the data

o      Blocks of 8^3, storing min and max values

o      Another 3D texture for this encoding (1/8 of size in each direction).

-       Results

o      Without optimizations, ray-casting VR is worse than slice-based VR

o      Works well for volumes with opaque and empty regions

o      Works well also for iso-surfaces, since stop criterion is simpler

o      From x1.3 to x3 performance increase on 256^3 volumes.
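The two accelerations these notes describe – block-based empty-space skipping over a min/max structure, and early ray termination – can be sketched on the CPU. This is a toy sketch, not the paper's GPU implementation: numpy, axis-aligned rays, and a value-equals-opacity transfer function are my simplifying assumptions.

```python
import numpy as np

def build_minmax(volume, b=8):
    """Coarse grid holding the (min, max) of each b^3 block -- the extra
    3D texture at 1/b the resolution in each direction."""
    nz, ny, nx = volume.shape
    v = volume[: nz // b * b, : ny // b * b, : nx // b * b]
    v = v.reshape(nz // b, b, ny // b, b, nx // b, b)
    return v.min(axis=(1, 3, 5)), v.max(axis=(1, 3, 5))

def march_ray(volume, bmax, y, x, b=8, empty_below=0.05, term_alpha=0.95):
    """March one axis-aligned ray front to back. Blocks whose max value is
    below the opacity threshold are leapt over (empty-space skipping); the
    loop stops once accumulated opacity reaches term_alpha (early ray
    termination). Returns (opacity, number of samples actually shaded)."""
    acc, samples, z = 0.0, 0, 0
    nz = volume.shape[0]
    while z < nz:
        if bmax[z // b, y // b, x // b] < empty_below:
            z = (z // b + 1) * b          # leap over the whole empty block
            continue
        a = float(volume[z, y, x])        # toy transfer function: value = opacity
        acc += (1.0 - acc) * a            # front-to-back compositing
        samples += 1
        if acc >= term_alpha:
            break                          # early ray termination
        z += 1
    return acc, samples
```

On a 32^3 volume with one opaque slab, the ray leaps over the empty front blocks and shades a single sample instead of marching every slice.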



“Hardware Based Non Linear Filtering and Segmentation Using High Level Shading Languages”

-   Non-linear filtering for volume analysis and better volume understanding

o      MRI/CT volumes are noisy

o      Pre-processing needed

o      Linear filters, i.e., convolutions: smoothing, edge detection, gradient estimation

§       Mean, Gaussian, Sobel, Laplacian,…

o      Non-linear: not a convolution, e.g., dilation, erosion, median,…

§       Here, edge-preserving smoothing

o      Result: binary mask for segmentation read-back to main memory or textures used for visualization

-   High-level shading languages are becoming available

o      Vertex and Fragment processing

o      Cg, DirectX HLSL, OpenGL Shading language

-   GPU-based segmentation pipeline

o      Use textures and p-buffers as memory storage

o      Vector operations on these buffers

o      Resources are scarce: number of textures, number of coordinates

§       Careful optimizations

§       Fixed-point values: 12 bits from the MRI scanner packed into a 16-bit value

-   Results:

o      On GeForce 5900 Ultra vs. software on Athlon XP 2200+

o      Complex non-linear filters: ~2x compared to software

o      Simple linear filters: ~10x or 15x compared to software
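The linear/non-linear distinction above is easy to see in code. A minimal CPU sketch (numpy, 2D for brevity; these are my illustrative stand-ins, the paper runs such filters on 3D volumes in fragment shaders):

```python
import numpy as np

def mean_filter_3x3(img):
    """Linear filter: a 3x3 box convolution (same family as Gaussian, Sobel)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0

def median_filter_3x3(img):
    """Non-linear filter: the 3x3 median cannot be written as a convolution,
    and unlike the mean it removes impulse noise without smearing edges."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(p[y:y + 3, x:x + 3])
    return out
```

On an image with a single "salt" impulse, the median removes it entirely while the mean merely spreads it – the edge-preserving behaviour the paper exploits for noisy MRI/CT data.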




“Multiresolution Representation and Visualization of Volume Data”

-      Highlights

o      Offline multiresolution volume data visualization using an SGI application

o      User can run an app that renders model at different resolutions and saves models for later interactive use

-      Pros

o      Provides significant speedups for interactive rendering

-      Cons

o      High requirements for memory and processing

o      Increases space complexity by 2.5 times at highest mesh accuracy

o      Offline and requires user to manually create different resolutions of models


“Sort-First, Distributed Memory Parallel Visualization and Rendering”

-      Highlights

o      Sort-first distributed, parallel viz system using Chromium and OpenRM

o      Distributed scene graph with synchronized render ops using Chromium


-      Pros

o      Scalable performance characteristics

o      Supports LOD

o      Sort-first uses less bandwidth than sort-last


-      Cons

o      Hurt by jitter between rendering and computation servers

o      Poor blocking results in duplication of data

o      Lots of changes in view results in increased bandwidth needs



“Time-Critical Multiresolution Volume Rendering using 3D Texture Mapping Hardware”

-      Highlights

o      Multiresolution visualization using importance factors

o      Importance factors assist in an automatic LOD selection

o      Supports texture mapping hardware


-      Pros

o      Can maintain a steady frame-rate

o      Subvolumes divided according to complexity and individually rendered at different LODs

o      Control algorithm has very minimal overhead


-      Cons

o      Current work done on "small" datasets (done using single PC?)

o      Little difference between "low" and "medium" importance



“Fast And Reliable Space Leaping For Interactive Volume Rendering”

-      Highlights

o      Fast, reliable space leaping method to accelerate ray casting for large volumes

o      Combines temporal and object space coherence


-      Pros

o      Notable speedup in rendering

o      Usable for all volume grid types


-      Cons

o      Generic algorithm but works well in empty scenes

o      Questionable image quality (figure 5) or bad PDF image

o      New object detection tends to fail when view changes too much between adjacent frames




“A Hardware-Assisted Hybrid Rendering Technique for Interactive Volume Visualization”

-      Highlights

o      Rendering involves hardware-based texture mapping and point rendering

-       Geometry for large, smooth areas

-       Points for fine detail or fast change

o      Improved interaction frame rates


-      Pros

o      Significant compression for storing data using hybrid method

o      Allows data to be stored entirely in graphics card

o      Displays images of reasonable quality


-      Cons

o      Error calculation to adjust opacity is not completely straightforward

o      Possibility of incorrect color despite correct opacity for volume and point combinations

o      Transfer function seems to be very view-dependent and requires manual adjustment


“High-Quality Lighting and Efficient Pre-Integration for Volume Rendering”

-      Highlights

o      Pre-integrated volume rendering technique that utilizes an improved lighting technique

o      Method takes O(n^2) instead of O(n^3)

o      Considers an "isoslab" (multiple isosurfaces) for sampling

o      Lighting behaves like Gouraud shading but lighting value is interpolated instead of normal


-      Pros

o      Uses two tables for lighting interpolation: specular and diffuse

o      Uses front and back sample planes to create properly combined lighting values with pre-integrated densities and colors


-      Cons

o      Bottleneck in texture lookup

o      Rapidly changing or poorly-defined normals create minor lighting artifacts

o      Somewhat slower rendering (with respect to other pre-integrated methods) since the algorithm is fill-rate dependent




“Parallel Rendering with K-Way Replication”


“Accelerating Large Data Analysis by Exploiting Regularities”



“Multi-layered Image Cache”



“Framework for Interactive Hardware-Accelerated Remote 3D-Viz”



“Empty Space Skipping and Occlusion Clipping for Texture-Based Volume Rendering”





“Efficient Implementation of Real-Time View-Dependent Multiresolution Meshing”







“Survey of parallel volume rendering algorithms”

1.     Algorithm Control Flow

a.    View Reconstruction

      i.     Backward - Ray Casting

      ii.    Forward - Splatting

      iii.   Multipass Forward

      iv.    Fourier

b.    Outer Loop Data Space

      i.     Object Space

      ii.    Image Space

2.     Targeted Hardware

a.    Graphics (G)

b.    Volume Rendering (VR)

c.    Parallel Shared Address Space (PS)

d.    Parallel Distributed Address Space (PD)

e.    Distributed (D)

3.     Application Data Characteristics

a.    Input Topologies

      i.     Rectilinear (R)

      ii.    Curvilinear (C)

      iii.   Unstructured (U)

b.    Data Types

      i.     Scalar, Vector, Tensor

c.    Data Units

d.    Voxel Format

4.     Visualization Method

5.     Publication Specifics


Provides a nice list of references to parallel volume rendering algorithms. A bit out of date.


"Image Based Rendering with Stable Frame Rates"

Qu, et al, State University of New York at Stony Brook



Key-frameless voxel-based terrain rendering system.


Key-frameless rendering algorithm:


"Volume Warping for Adaptive Isosurface Extraction"

Balmelli, et al. - IBM





"Multidimensional Transfer Functions for Interactive Volume Rendering"


"Interactive volume rendering using multi-dimensional transfer functions and direct manipulation widgets"

Both by Kniss et al., University of Utah.





1. Efficient Out-Of-Core Isosurface Extraction


The paper presents an approach to parallel isosurfacing for out-of-core data. Load balancing is done by classifying cells of the volume as active (contributing to the isosurface) or non-active (not contributing). Data is split according to a range of isovalues and distributed between processors in such a way that the isosurface calculation is load balanced. The volume is split into small blocklets, which are merged into variable-sized blocks based on the range of isovalues. The granularity of access is hybrid, i.e., the blocklet size is chosen so that access is neither too coarse- nor too fine-grained.


The load balancing is static, based on a work-estimation model – so there is no run-time overhead for redistributing blocklets between processors. Experimental results show the effects of block sizes, blocklet merging, load balancing, and scalability with the number of processors.
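A minimal sketch of what static, estimate-driven distribution can look like; the LPT (longest processing time) heuristic and the per-blocklet cost list here are my stand-ins for the paper's actual work-estimation model:

```python
import heapq

def distribute_blocklets(work, nprocs):
    """Greedy static load balancing: hand out blocklets largest-estimated-work
    first, always to the currently least-loaded processor (the classic LPT
    heuristic). Returns {proc: (load, [blocklet indices])}."""
    heap = [(0.0, p, []) for p in range(nprocs)]
    heapq.heapify(heap)
    for blocklet, w in sorted(enumerate(work), key=lambda kv: -kv[1]):
        load, p, assigned = heapq.heappop(heap)   # least-loaded processor
        assigned.append(blocklet)
        heapq.heappush(heap, (load + w, p, assigned))
    return {p: (load, assigned) for load, p, assigned in heap}
```

Because the assignment happens once, before rendering, there is no run-time redistribution – which is exactly the trade-off the paper makes.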


2. Application-Controlled Demand Paging for Out-of-Core Visualization


The paper describes an application-controlled paging scheme to dynamically load out-of-core data (data that does not fit into main memory) on demand. Visualization algorithms with sparse traversal of the data sets benefit from this scheme. Data is divided into variable-sized segments on disk (e.g., part of one time step can be stored as one cube file on disk) and loaded as fixed-size pages in memory. When a page is demanded into memory, adjacent pages are also pre-fetched to reduce access time.


Translation of 3D buffers into 1D space is useful for increasing the hit ratio – a small sub-cube of the volume can be stored as one block (or page).


Results show that the paged method is better than mapped methods, and cubed storage (with a translation) is better than flat storage. An additional experiment uses a remote paging scheme over a network (NFS) instead of a local scheme. The remote paging scheme performs on par with local paging from disk.
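The "cubed storage with a translation" idea amounts to an address translation from voxel coordinates to a block-major file offset. A sketch under the assumption that each dimension is divisible by the block size:

```python
def blocked_offset(x, y, z, dims, b):
    """1D file offset of voxel (x, y, z) under 'cubed' storage: the volume is
    tiled into b^3 sub-cubes laid out contiguously, so one sub-cube shares a
    page -- the locality win over flat x-major storage."""
    nx, ny, nz = dims                       # assume each divisible by b
    bx, by = nx // b, ny // b
    block = (z // b) * by * bx + (y // b) * bx + (x // b)
    inner = (z % b) * b * b + (y % b) * b + (x % b)
    return block * b * b * b + inner
```

All voxels of one sub-cube map to one contiguous run of offsets, so a single page fault brings in a spatially coherent neighbourhood rather than a thin x-axis sliver.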


3. Sort Last Parallel Rendering for Viewing Extremely Large Datasets on Tiled Displays


The paper presents a sort-last strategy for rendering geometry on tiled displays. The general idea is that N processors driving a T-tile display generate T images, one for each tile, which are composited and displayed at the processors controlling the tiles. Polygons to be rendered are distributed across the N processors, and projection information is scattered, telling the processors which tiles their images should go to (many tiles will not have any geometry rendered onto them).


Four different strategies are described for composition

1.     Serial (the node in charge of each tile collects the images generated for that tile and composites them one by one) – worst-case algorithm.

2.     Virtual trees – composition is done in several binary trees in parallel – processors whose tiles finish compositing drop that computation and join other trees. The scheduling is done so that processors with the fewest images to send act as receivers, and vice versa. A disadvantage is that during the final stage of composition most processors are idle.

3.     Tile, Split and Delegate – assign a processor to a section of a tile – more processors are assigned to tiles which require more image composition – a disadvantage is that communication cost is high: O(N^2)

4.     Reduce to a single tile – images rendered at any processor are sent directly to a single processor (for each tile), where a binary swap algorithm is used to composite them. Communication time – O(N*T + N log N) – more scalable


Optimizations are described (bucketing, active-pixel encoding, and floating viewport), and the results presented show the reduce strategy to perform better and to scale with a linear speedup.
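All of these composition strategies rely on the "over" operator being associative, which is what lets the reduce strategy rearrange the work into a binary swap. A sketch on premultiplied RGBA tuples (single pixels for brevity; these helpers are my illustration, not the paper's code):

```python
from functools import reduce

def over(front, back):
    """Porter-Duff 'over' on premultiplied RGBA tuples (r, g, b, a)."""
    k = 1.0 - front[3]
    return tuple(f + k * c for f, c in zip(front, back))

def composite_tile(partials_front_to_back):
    """Composite one tile's partial images in depth order. Because 'over' is
    associative, a sequential reduction and a parallel binary swap over any
    bracketing of the same ordered list compute the same pixel."""
    return reduce(over, partials_front_to_back)
```

Associativity (but not commutativity) is the key property: the depth order of the partial images must be preserved, but the grouping of the composites can be chosen freely to balance the network.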


4. Interactive Rendering of Large Volume Datasets


The paper presents an interactive rendering method that hierarchically stores data in wavelet-compressed form using octrees (children of any node are of higher resolution than their parent).


The compression involves two steps – a wavelet representation of the data, followed by Huffman, run-length, or arithmetic encoding to further reduce the space taken by the wavelet coefficients. The compression ratio of the Huffman encoding (used in the implementation) is about 3-4:1 for lossless compression.


A projective classification avoids rendering voxels outside the view frustum.


View-dependent priority is assigned to nodes depending on their voxel depths. The number of voxels that can be displayed is preset (depending on texture memory), and a priority queue is used to insert the octree node by node, closer voxel nodes having higher priority.

The node with the highest priority in the queue is fetched, its high-frequency wavelet coefficients are decompressed, and its children are inserted into the queue – this process halts when the number of voxels exceeds the preset limit.


The volume is decomposed hierarchically into blocks of k^3 voxels (usually k = 16), which are rendered as 3D textures using hardware. The block size must be a power of 2 because of OpenGL texture restrictions. The target image is 256*256 pixels. For all 256^2 possible pairs of entry and exit values, volume integrals are pre-computed. Tri-linear interpolation done by the texture hardware might need multiple blocks of the octree – therefore neighboring blocks in the octree might have to be coalesced. A greater block size (k = 32) reduces this overhead.


Caching of decompressed data is required for interactive frame rates. Unaddressed issues: interpolating between multiple resolutions.
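The budgeted priority-queue refinement described above can be sketched as follows; the dict-based node layout and the greedy swap rule are my assumptions for illustration, not the paper's exact data structures:

```python
import heapq
import itertools

def select_lod(root, budget):
    """Greedy view-dependent refinement under a preset voxel budget: keep a
    max-priority queue of selected octree nodes and repeatedly swap the most
    important node for its (higher-resolution) children, skipping a swap
    when it would push the voxel count past the budget. Nodes are dicts:
    {'priority': float, 'voxels': int, 'children': [...]}."""
    tie = itertools.count()                   # heap tie-breaker
    selected = {id(root): root}
    used = root["voxels"]
    heap = [(-root["priority"], next(tie), root)]
    while heap:
        _, _, node = heapq.heappop(heap)
        kids = node.get("children", [])
        if not kids:
            continue                          # leaf: already at full resolution
        extra = sum(k["voxels"] for k in kids) - node["voxels"]
        if used + extra > budget:
            continue                          # refining would bust the budget
        del selected[id(node)]                # replace node by its children
        used += extra
        for k in kids:
            selected[id(k)] = k
            heapq.heappush(heap, (-k["priority"], next(tie), k))
    return list(selected.values()), used
```

With a generous budget the root is swapped for its children; with a tight one the coarse node is kept, which is exactly the halting behaviour the summary describes.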


5. K.L. Ma, "Visualizing Very Large Earthquake Simulations", Supercomputing 2003


A parallel rendering algorithm is presented to visualize the 3D seismic wave propagation (time-varying data) of the Northridge earthquake (the highest-resolution volume visualization of an earthquake simulation to date).


- Large data – the parallel rendering algorithm should be highly scalable, amortizing communication and computation

- Time-varying data – we need compression, load balancing according to rendering load, and maximal overlap of uploading each time step with rendering and delivering the image (look at the survey of techniques to visualize time-varying data)

- Unstructured grid – a parallel cell-projection algorithm is used, which requires no connectivity between adjacent cells, unlike ray tracing


In summary, we need: interleaved load distribution, communication and computation overlap, avoiding per-time-step processing, buffering intermediate results to amortize communication overheads, and compression.


Parallel Rendering – uses an octree-based representation of the volume at multiple resolutions, and the appropriate data resolution is chosen to match the image resolution. Projection data is scattered to the nodes to eliminate nodes that are not viewed. The data block size is made coarse enough for fast traversal. The loading of blocks from disk is overlapped with rendering.


Parallel Image Compositing – SLIC, Scheduled Linear Image Compositing (check paper), is used – pixels are classified as background (ignored), non-overlapping (sent directly to the final processor), and overlapping by 1, 2, 3, etc. (figure 4) – the overlap calculation has to be redone when the viewpoint changes (the compositing schedule is also recalculated).


Test results show low compositing cost in most cases, but beyond n=32 the parallel algorithm is inefficient due to load imbalance.
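The SLIC pixel classification step reduces to counting, per pixel, how many renderers cover it. A toy sketch with boolean coverage masks (the paper's actual per-scanline classification and scheduling are more involved; this only shows the counting idea):

```python
import numpy as np

def classify_pixels(coverage_masks):
    """Count, per pixel, how many renderers produced a non-background sample:
    0  -> background (ignored),
    1  -> non-overlapping (sent straight to the final processor),
    >=2 -> overlapping, needs compositing and hence scheduling.
    Must be recomputed whenever the viewpoint changes."""
    return np.sum(np.stack(coverage_masks).astype(int), axis=0)
```

The counts drive the schedule: only pixels with a count of two or more generate compositing work, which is why mostly non-overlapping footprints keep the compositing cost low until load imbalance dominates.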