Here are some notes on mapping computational problems to image problems - i.e. the 'old' way we used to do these things a couple of years ago.

We will start by looking at this paper:

A Survey of General-Purpose Computation on Graphics Hardware by John Owens et al

Proceedings of Eurographics 2005, Dublin, Ireland, Aug 29 - Sep 02, 2005, pp 21-51.

available in pdf format from here: http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=844

and course notes from SIGGRAPH 2005: http://www.gpgpu.org/s2005/#outline

and here are some related course notes from IEEE Visualization 2005

http://www.gpgpu.org/vis2005/

and here is an introductory paper

http://numod.ins.uni-bonn.de/research/papers/public/StDoKo05sim.pdf

a really good site for this sort of thing is

http://www.gpgpu.org

let's start with:

- http://www.gpgpu.org/s2005/slides/luebke.Introduction.ppt

then

- http://www.gpgpu.org/s2005/slides/harris.Mapping.ppt

and then

- http://www.gpgpu.org/s2005/slides/purcell.SortingAndSearching.ppt

and

- http://www.gpgpu.org/s2005/slides/woolley.GPUProgramOptimization.ppt

The main issue here is that you need to map the computational tasks into a graphical form.

SIGGRAPH notes on how to map computational concepts to the GPU

http://www.gpgpu.org/s2005/slides/harris.Mapping.ppt

Do the work in the Fragment Processor (GPUs tend to have many more fragment processors than vertex processors)

Works well for highly parallel tasks.

Works well for large data sets (but they must fit into texture memory)

Multiple passes may be necessary
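The multi-pass idea is usually implemented as "ping-pong" rendering: a fragment program can't read and write the same texture, so you render into one texture while reading from another, then swap them for the next pass. Here's a little CPU sketch of that pattern in Python (the diffusion kernel is just a made-up example, not from the slides):

```python
# CPU sketch of multi-pass "ping-pong" rendering: each pass reads one
# "texture" and writes another, then the two are swapped.

def diffusion_pass(src):
    """One 'render pass': each output texel averages itself with its
    two neighbours (wrapping at the edges)."""
    n = len(src)
    return [(src[i - 1] + src[i] + src[(i + 1) % n]) / 3.0
            for i in range(n)]

def run_passes(data, passes):
    read_tex = list(data)
    for _ in range(passes):
        write_tex = diffusion_pass(read_tex)  # render into the write texture
        read_tex = write_tex                  # swap: output becomes next input
    return read_tex

print(run_passes([0.0, 0.0, 9.0, 0.0, 0.0], 2))
```

Each pass is one full-screen rendering operation; the loop count is decided on the CPU.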

So what's not there right now:

- no real integer data type
- no bitwise logical operations
- no 64-bit support

A stream is an ordered set of data of the same type, of any length

A kernel takes one or more streams as inputs and produces one or more streams as output
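As a CPU sketch of that model (names here are made up for illustration): a stream is just an ordered sequence, and a kernel is a function applied to every element. On the GPU the streams would be textures and the kernel a fragment program run once per output pixel.

```python
# Stream/kernel sketch: a kernel maps one or more input streams
# to an output stream, element by element.

def run_kernel(kernel, *streams):
    """Apply `kernel` element-wise across one or more input streams."""
    return [kernel(*elems) for elems in zip(*streams)]

# e.g. simple addition of elements from two "textures":
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
summed = run_kernel(lambda x, y: x + y, a, b)  # one output stream
```

Note the kernel has no access to other elements or to previous outputs - that independence is exactly what makes it parallelizable.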

Stream Operations

- Map (apply)
- given a stream of data elements and a function, apply the function to each data element (e.g. the convolution operation, or a simple addition of elements from multiple textures)
- Reduce
- given a stream of data, compute a smaller stream (e.g. sum or maximum). For example, given a 512x512 block of data, a 256x256 pixel frame buffer would be created and each of the 256x256 elements would compute the operation on the (x,y), (x+256,y), (x,y+256), (x+256,y+256) elements. Then in the next iteration a 128x128 pixel frame buffer could be used.
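That halving scheme can be sketched on the CPU like this (a 4x4 grid standing in for the 512x512 texture; the function names are mine, not from the slides):

```python
# CPU sketch of GPU reduction: each pass renders into a frame buffer
# half the size in each dimension, and each output element combines the
# four corresponding input elements.

def reduce_pass(grid, op):
    n = len(grid)  # assumes a square grid with even side length
    h = n // 2
    return [[op(op(grid[y][x], grid[y][x + h]),
                op(grid[y + h][x], grid[y + h][x + h]))
             for x in range(h)]
            for y in range(h)]

def reduce_full(grid, op):
    while len(grid) > 1:  # e.g. 512x512 -> 256x256 -> ... -> 1x1
        grid = reduce_pass(grid, op)
    return grid[0][0]

grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
total = reduce_full(grid, lambda a, b: a + b)  # sum of all 16 elements
```

A 512x512 input would take 9 such passes to reach a single value.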
- Scatter and Gather
- Scatter is indirect write: d[a] = v ... very hard in a fragment shader
- Gather is indirect read: v = d[a] ... pretty easy in a fragment shader
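The asymmetry is easy to see in a CPU sketch (hypothetical example data): gather is just an indexed read, which maps directly onto a texture fetch in a fragment shader, while scatter requires writing to an arbitrary location, which fragment shaders can't do.

```python
# Gather vs scatter on the CPU. Gather = texture lookup (easy on a GPU);
# scatter = arbitrary write (hard in a fragment shader).

data = [10.0, 20.0, 30.0, 40.0]
addr = [2, 0, 3, 1]

# Gather: v = d[a] - each output element reads from a computed address.
gathered = [data[a] for a in addr]

# Scatter: d[a] = v - each input element writes to a computed address.
scattered = [0.0] * len(data)
for i, a in enumerate(addr):
    scattered[a] = data[i]
```

One common trick is to convert a scatter into a gather by inverting the address map, when that inverse is known.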
- Stream Filtering (non-uniform reduction)
- given a stream of data, select a subset of the elements.
- Sort
- given a stream of data, reorder the stream into an ordered set of data. Hard to do without scatter.
- GPU-based sorting takes a fixed number of steps no matter what the input data is, i.e. an already sorted stream takes the same amount of time to sort as an unsorted one.
- Odd Even MergeSort is a simple one: http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/networks/oemen.htm
- A faster one is Bitonic Mergesort on a GPU (with some odd grammar) - http://www.cis.upenn.edu/~suvenkat/700/lectures/19/sorting-kider.pdf
- Another Bitonic Mergesort paper - http://www.cs.mu.oz.au/498/notes/node38.html
- And a more mathematical one - http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/bitonicen.htm
- And some nice animations of how it works - http://www.tools-of-computing.com/tc/CS/Sorts/bitonic_sort.htm
- And some CG code on page 71 of http://171.64.77.146/papers/tpurcell_thesis/tpurcell_thesis.pdf
- But basically:
- Given a bitonic sequence with 2^n data elements (that is, one with only a single local minimum or maximum, so the sequence is either V shaped or A shaped in its magnitudes)
- Perform a Binary Split to break the sequence in half
- Perform a Bitonic Merge on the sequence (swapping partner elements in the two halves where necessary so that smaller elements end up on the left and larger ones on the right)
- Recurse for each half of the sequence
- But what if my data doesn't start out as a bitonic sequence? Individual elements are bitonic sequences of length 1. From these elements you can build bitonic sequences of length 2, 4, 8 etc - typically ascending for the left half and descending for the right half - using Bitonic Merge.
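The steps above can be sketched in plain Python (recursive for clarity - the GPU version would unroll this into a fixed sequence of compare-and-swap passes, each one a render pass over the data texture):

```python
# CPU sketch of bitonic mergesort. The compare-swap pattern is fixed
# regardless of the input values, which is why the GPU version always
# takes the same number of steps.

def bitonic_sort(data, ascending=True):
    n = len(data)  # n must be a power of two
    if n <= 1:
        return data
    left = bitonic_sort(data[:n // 2], True)     # build an ascending half...
    right = bitonic_sort(data[n // 2:], False)   # ...and a descending half
    return bitonic_merge(left + right, ascending)  # now a bitonic sequence

def bitonic_merge(seq, ascending):
    n = len(seq)
    if n <= 1:
        return seq
    h = n // 2
    for i in range(h):  # binary split: compare-swap partner elements
        if (seq[i] > seq[i + h]) == ascending:
            seq[i], seq[i + h] = seq[i + h], seq[i]
    return (bitonic_merge(seq[:h], ascending) +
            bitonic_merge(seq[h:], ascending))

print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Note how bitonic_sort answers the "what if my data doesn't start out bitonic?" question: it builds ascending and descending halves from length-1 sequences upward.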
- Search
- binary search
- nearest neighbour search (kNN-grid)
- searching notes from SIGGRAPH: http://www.gpgpu.org/s2005/slides/purcell.SortingAndSearching.ppt
- instead of making each individual search faster, we use the GPU to do multiple searches simultaneously
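A CPU sketch of that "many searches at once" idea (hypothetical code): each search is an ordinary binary search over the sorted data; on the GPU, each fragment would run its own search against a sorted texture, so thousands run in parallel.

```python
# Many independent binary searches, one per "fragment".

def binary_search(sorted_data, key):
    """Return the index of key in sorted_data, or -1 if absent."""
    lo, hi = 0, len(sorted_data)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_data[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_data) and sorted_data[lo] == key:
        return lo
    return -1

def parallel_search(sorted_data, keys):
    """One search per output element; all searches are independent."""
    return [binary_search(sorted_data, k) for k in keys]

hits = parallel_search([1, 3, 5, 7, 9], [5, 4, 9, 1])
```

The point is throughput, not latency: each individual search is no faster than on the CPU.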

Now how about some code ...

There is a good example, Tutorial 0, from the GPGPU site:

http://cg.in.tu-clausthal.de/publications.shtml#shader_maker

This is a good starting point, though for some reason the code doesn't call glutInit, which makes my PowerBook unhappy. But after adding that it works fine (aside from not allowing me to use the escape key to exit).

The project page is http://sourceforge.net/projects/gpgpu/

There is code here http://sourceforge.net/project/showfiles.php?group_id=104004&package_id=117303&release_id=245080

Here are some nice Optimization notes from SIGGRAPH 2005:

http://www.gpgpu.org/s2005/slides/woolley.GPUProgramOptimization.ppt

So what else can you do?

GPUSort (Windows, Linux with an Nvidia card)

http://gamma.cs.unc.edu/GPUSORT/index.html

FFT on a GPU

http://www.cs.unm.edu/~kmorel/documents/fftgpu/

last revision 5/16/08