Final Exam - Fall 2008
In this final exam we are going to take another look at
optimizations under CUDA.
This final exam is a 'take home' exam. You are expected to work
on it completely by yourself. It is due Monday December 8th,
2008 at 11:59pm. By that time you should have set up a web page
with your solution, and emailed the location of that web page to
Andy. During the scheduled final exam time we will meet in class
and you will have 10 minutes to briefly describe your work to
the class. We won't have time for a question / answer period;
this way we can get through everyone within two hours.
The code we are going to look at is the convolution code from CUDA.
There is a very nice paper on optimizing this code included with
the CUDA examples and available on the web here:
The CUDA examples come with 3 convolution examples. The two
important ones here are convolutionTexture and
convolutionSeparable. convolutionTexture has some optimizations;
convolutionSeparable has more optimizations. Your job is to show
how much of an effect those different optimizations have
compared to a naive CUDA version of the algorithm with no
optimizations.
Your code should be designed to run from the command line
(though launching it from a bat file that contains a command
line is fine). The code should read in a single image in raw
format, apply a filter, and write out the new image in raw
format. If it makes things easier you can make use of existing
libraries to read in and write out a standard format (jpeg,
tiff, png, ppm, etc.). The time used to compare optimizations
should be based on the time taken to do the image conversion,
not on the time taken to read in and write out the image. In
order to get large enough values for comparison you
may need to run the convolution kernels multiple times.
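As a rough illustration of the timing rule above, here is a minimal host-side C++ sketch that keeps file I/O outside the timed region and averages over repeated runs. The names readRaw, writeRaw, and averageMs are hypothetical helpers invented for this sketch; they are not part of the CUDA examples. It also assumes the raw file is a headerless byte dump.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Read a headerless raw file into memory.
// Returns an empty vector if the file cannot be opened (or is empty).
std::vector<std::uint8_t> readRaw(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);
    std::vector<std::uint8_t> data(static_cast<std::size_t>(size));
    in.read(reinterpret_cast<char*>(data.data()), size);
    return data;
}

// Write the bytes back out as a headerless raw file; returns true on success.
bool writeRaw(const std::string& path, const std::vector<std::uint8_t>& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Average wall-clock milliseconds per call of `work` over `repeats` calls.
// Only `work` is timed, so reading and writing the image stay outside.
double averageMs(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

In use, the image would be loaded with readRaw, the filter passed to averageMs as the work to repeat, and the result saved with writeRaw afterwards. Kernel launches are asynchronous, so when `work` launches a GPU kernel it needs to wait for the device to finish before returning; otherwise the host clock mostly measures launch overhead. CUDA event timers are an alternative to a host clock.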
Your grade will be based on the number of optimizations that you
evaluate and the quality of the web-based documentation for your
testing. You should start with a naive non-separated version,
and then apply the changes in the paper (separating the
horizontal and vertical work, using shared memory, reducing idle
threads, coalescing memory access, unrolling the loops) or start
from the optimized version and back off those optimizations in
turn. When reading through your web site someone new to CUDA
should get a better idea of where they should focus their energy
in optimizing their code.
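To make the first step (separating the horizontal and vertical work) concrete, the following is a CPU reference sketch in C++, not CUDA and not the code from the paper. It assumes a single-channel float image, a separable kernel built as the outer product of one 1D kernel with itself, and clamp-to-edge borders; all function names are hypothetical. A reference like this can also be used to check the output of each GPU variant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // single channel, row-major, w * h floats

// Fetch a pixel, clamping coordinates to the image edge (one possible border policy).
inline float pixelAt(const Image& img, int w, int h, int x, int y) {
    x = std::max(0, std::min(x, w - 1));
    y = std::max(0, std::min(y, h - 1));
    return img[static_cast<std::size_t>(y) * w + x];
}

// Build the (2r+1) x (2r+1) kernel as the outer product of a 1D kernel with itself.
std::vector<float> outerProduct(const std::vector<float>& k1) {
    const std::size_t n = k1.size();
    std::vector<float> k2(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) k2[i * n + j] = k1[i] * k1[j];
    return k2;
}

// Naive non-separated version: (2r+1)^2 multiplies per output pixel.
Image convolveNaive2D(const Image& src, int w, int h,
                      const std::vector<float>& k2, int r) {
    const int n = 2 * r + 1;
    assert(k2.size() == static_cast<std::size_t>(n) * n);
    assert(src.size() == static_cast<std::size_t>(w) * h);
    Image dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                    sum += k2[(r - dy) * n + (r - dx)] * pixelAt(src, w, h, x + dx, y + dy);
            dst[static_cast<std::size_t>(y) * w + x] = sum;
        }
    return dst;
}

// Separated version: a row pass then a column pass, 2 * (2r+1) multiplies per pixel.
Image convolveSeparable(const Image& src, int w, int h,
                        const std::vector<float>& k1, int r) {
    assert(k1.size() == static_cast<std::size_t>(2 * r + 1));
    assert(src.size() == static_cast<std::size_t>(w) * h);
    Image tmp(src.size()), dst(src.size());
    for (int y = 0; y < h; ++y)          // horizontal pass: src -> tmp
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dx = -r; dx <= r; ++dx)
                sum += k1[r - dx] * pixelAt(src, w, h, x + dx, y);
            tmp[static_cast<std::size_t>(y) * w + x] = sum;
        }
    for (int y = 0; y < h; ++y)          // vertical pass: tmp -> dst
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -r; dy <= r; ++dy)
                sum += k1[r - dy] * pixelAt(tmp, w, h, x, y + dy);
            dst[static_cast<std::size_t>(y) * w + x] = sum;
        }
    return dst;
}
```

For a kernel radius of 8, for example, the naive version does 17 x 17 = 289 multiplies per pixel while the separated version does 2 x 17 = 34. That gap is the benefit of the first optimization alone, before any of the memory-related changes are applied.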
Be sure to detail which graphics card you are using.
A test image is supplied in multiple resolutions in both tif and
raw rgb format in
ftp.evl.uic.edu/pub/INcoming/andy/525/hubble.tar.gz and is about
The archive includes images in the following sizes to allow you
to test the performance of the code on a variety of image sizes:
1024 x 1024
2048 x 2048
4096 x 4096
8192 x 8192
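Because a raw file carries no header, the program has to be told the image dimensions or infer them from the file size. The small C++ sketch below shows that arithmetic, assuming the raw rgb files hold interleaved 8-bit RGB (3 bytes per pixel) with no padding; that layout is an assumption and should be checked against the supplied files. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes in a square image of the given side, assuming 3 bytes per pixel (8-bit RGB).
constexpr std::uint64_t rawRgbBytes(std::uint64_t side) {
    return side * side * 3;
}

// Infer the side of a square 8-bit RGB image from its file size.
// Returns 0 if the size does not correspond to a square RGB image.
std::uint64_t sideFromRawRgbSize(std::uint64_t bytes) {
    if (bytes % 3 != 0) return 0;
    const std::uint64_t pixels = bytes / 3;
    const auto side = static_cast<std::uint64_t>(
        std::llround(std::sqrt(static_cast<double>(pixels))));
    return side * side == pixels ? side : 0;
}
```

Under that assumption the 1024 x 1024 image is 3,145,728 bytes and the 8192 x 8192 image is 201,326,592 bytes (192 MiB), so the largest size is worth checking against the memory available on the graphics card, particularly if the pixels are converted to floats.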
The original image is from
The tutorial is due Monday December 8th at 11:59pm.
At the time of the final we will have two hours, and each person
will give a brief 10 minute talk on their work.
last revision 12/12/11