Lecture 5 - old version (pre-CUDA)
GPGPU Concepts and Examples
Here are some notes on mapping computational problems to image problems, i.e. the
'old' way we used to do these things a couple of years ago.
We will start by looking at this paper:
A Survey of General-Purpose Computation on Graphics Hardware by John
Owens et al
Proceedings of Eurographics 2005, Dublin, Ireland, Aug 29 - Sep 02,
2005, pp 21-51.
available in pdf format from here: http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=844
and course notes from SIGGRAPH 2005: http://www.gpgpu.org/s2005/#outline
and here are some related course notes from IEEE Visualization 2005:
http://www.gpgpu.org/vis2005/
and here is an introductory paper
http://numod.ins.uni-bonn.de/research/papers/public/StDoKo05sim.pdf
A really good site for this sort of thing is
http://www.gpgpu.org
Let's start with:
- http://www.gpgpu.org/s2005/slides/luebke.Introduction.ppt
then
- http://www.gpgpu.org/s2005/slides/harris.Mapping.ppt
and then
- http://www.gpgpu.org/s2005/slides/purcell.SortingAndSearching.ppt
and
- http://www.gpgpu.org/s2005/slides/woolley.GPUProgramOptimization.ppt
The main issue here is that you need to map the computational tasks
into a graphical form.
SIGGRAPH notes on how to map computational concepts to the GPU
http://www.gpgpu.org/s2005/slides/harris.Mapping.ppt
Do the work in the fragment processors (GPUs tend to have many more of them than
vertex processors).
Works well for highly parallel tasks.
Works well for large data sets (but they must fit into texture memory)
Multiple passes may be necessary.
So what's not there right now:
- no real integer data type
- no bitwise logical operations
- no 64-bit support
Branching issues
A stream is an ordered set of data elements of the same type; it can be of any length.
A kernel takes one or more streams as inputs and produces one or more
streams as output
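As a rough CPU-side illustration of these two definitions, the sketch below models a stream as a Python list and a kernel as a function applied independently at every position. The names `Stream`, `run_kernel` and `add_kernel` are hypothetical, and for simplicity the sketch produces a single output stream. On the GPU the inputs would be textures and the kernel a fragment program.

```python
from typing import Callable, List, Sequence

Stream = List[float]  # hypothetical alias: ordered elements, all of the same type


def run_kernel(kernel: Callable[..., float], *inputs: Sequence[float]) -> Stream:
    """Apply `kernel` to corresponding elements of one or more input streams."""
    length = len(inputs[0])
    if any(len(s) != length for s in inputs):
        raise ValueError("all input streams must have the same length")
    # Each output element depends only on the inputs at the same position,
    # so every element could be computed by a different fragment in parallel.
    return [kernel(*elems) for elems in zip(*inputs)]


def add_kernel(a: float, b: float) -> float:
    """A trivial kernel: add the elements of two input streams."""
    return a + b


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
summed = run_kernel(add_kernel, a, b)
```

The important property is that the kernel has no way to see or modify neighbouring outputs, which is what makes the work easy to spread across many fragment processors.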
Stream Operations
- Map (apply)
  - given a stream of data elements and a function, apply the
    function to each data element (e.g. the convolution operation, or a
    simple addition of elements from multiple textures)
- Reduce
  - given a stream of data, compute a smaller stream (e.g. sum or
    maximum). For example, given a 512x512 block of data, a 256x256
    pixel frame buffer would be created, and each of its 256x256
    elements would apply the operation to the (x,y), (x+256,y),
    (x,y+256) and (x+256,y+256) elements. The next iteration could then
    use a 128x128 pixel frame buffer, and so on until a single value remains.
- Scatter and Gather
  - Scatter is indirect write: d[a] = v ... very hard in a fragment
    shader
  - Gather is indirect read: v = d[a] ... pretty easy in a fragment
    shader
- Stream Filtering (non-uniform reduction)
  - given a stream of data, select a subset of the elements.
- Sort
- Search
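The reduction scheme and the gather/scatter distinction described above can be sketched on the CPU as follows. This is a minimal illustration using nested Python lists in place of textures and frame buffers; all function names are hypothetical. The `scatter_via_gather` function shows one simple (and inefficient) way to rewrite a scatter as a gather, offered as an assumption about how one might work around the limitation rather than as the approach used in the referenced slides.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]


def reduce_pass(src: Grid, op: Callable[[float, float], float]) -> Grid:
    """One pass: fold an n x n grid into an (n/2) x (n/2) grid."""
    half = len(src) // 2
    # Output element (x, y) combines (x, y), (x+half, y), (x, y+half), (x+half, y+half).
    return [
        [
            op(op(src[y][x], src[y][x + half]),
               op(src[y + half][x], src[y + half][x + half]))
            for x in range(half)
        ]
        for y in range(half)
    ]


def reduce_grid(src: Grid, op: Callable[[float, float], float]) -> float:
    """Repeat reduce_pass (one 'render pass' each) until a single value remains."""
    n = len(src)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in src):
        raise ValueError("expected a square grid with a power-of-two side")
    while len(src) > 1:
        src = reduce_pass(src, op)
    return src[0][0]


def gather(data: Sequence[float], addresses: Sequence[int]) -> List[float]:
    """Indirect read, v = d[a]: each output position fetches from a computed address."""
    return [data[a] for a in addresses]


def scatter_via_gather(values: Sequence[float], addresses: Sequence[int],
                       size: int, default: float = 0.0) -> List[float]:
    """Emulate d[a[i]] = v[i] by letting each output position look for its writer."""
    out = []
    for j in range(size):              # one "fragment" per fixed output position
        result = default
        for i, a in enumerate(addresses):
            if a == j:
                result = values[i]     # the last matching writer wins
        out.append(result)
    return out


grid = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
total = reduce_grid(grid, lambda p, q: p + q)
largest = reduce_grid(grid, max)
gathered = gather([10.0, 20.0, 30.0, 40.0], [3, 0, 2])
scattered = scatter_via_gather([1.0, 2.0, 3.0], [2, 0, 3], size=4)
```

An 8x8 grid needs three passes (8 to 4 to 2 to 1), and in the same way the 512x512 example needs nine. The gather is a single lookup per output, while the emulated scatter scans every address for every output, which hints at why scatter is the hard case when a fragment can only write to its own position.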
Now how about some code ...
There is a good example as Tutorial 0 from the GPGPU site:
http://cg.in.tu-clausthal.de/publications.shtml#shader_maker
This is a good starting point, though for some reason the code doesn't call
glutInit, which makes my PowerBook unhappy. After adding that call it works fine
(aside from not allowing me to use the escape key to exit).
The project page is http://sourceforge.net/projects/gpgpu/
There is code here: http://sourceforge.net/project/showfiles.php?group_id=104004&package_id=117303&release_id=245080
Here are some nice optimization notes from SIGGRAPH 2005:
http://www.gpgpu.org/s2005/slides/woolley.GPUProgramOptimization.ppt
So what else can you do?
GPUSort (Windows, Linux with an NVIDIA card)
http://gamma.cs.unc.edu/GPUSORT/index.html
FFT on a GPU
http://www.cs.unm.edu/~kmorel/documents/fftgpu/
Coming Next Time
Case Studies
last revision 5/16/08