Final Exam - Spring 2010
In this final exam we are going to
take another look at optimizations under CUDA.
This exam is a 'take home' exam. You are expected to work on it
completely by yourself. It is due Thursday May 6th 2010 at
11:59pm. By that time you should have set up a web page with
your solution, and emailed the location of that web page to
Andy. During the scheduled final exam time we will meet in class
and you will have 10 minutes to briefly describe your work to
the class. We won't have time for a question / answer period;
this way we can get through everyone within two hours.
The code we are going to look at is the Matrix Transposition code from CUDA. There is a
very nice chapter on optimizing this code included with the CUDA
examples and available on the web here:
Your job is to show how much of an effect those different optimizations
have compared to a naive CUDA version of the algorithm with no
optimizations on your card, or multiple cards if you choose.
The current code relies on square matrices with each side being a
multiple of 32. You should relax this requirement in your code.
You should also experiment with the number of threads in a block and
their orientation (for example 32x8 versus 8x32) to see what effect
they have on the speed of the code.
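One way to relax the size requirement and make the block shape adjustable is sketched below. This is only a starting point under stated assumptions, not the code from the CUDA examples: the kernel name `transposeNaive`, the helper `ceilDiv`, the command-line argument order, and the default sizes are all made up for illustration. The grid is rounded up so that partial blocks cover the matrix edges, and a bounds check keeps the extra threads from writing out of range.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Ceiling division: number of blocks of size d needed to cover n elements.
static int ceilDiv(int n, int d) { return (n + d - 1) / d; }

// Naive transpose of a rows x cols row-major matrix into a cols x rows one.
// One thread per element; the bounds check handles partial edge blocks.
// Note: int indexing limits this sketch to matrices under 2^31 elements.
__global__ void transposeNaive(float *out, const float *in, int rows, int cols)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column in the input
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row in the input
    if (x < cols && y < rows)
        out[x * rows + y] = in[y * cols + x];
}

int main(int argc, char **argv)
{
    // Hypothetical argument order: rows cols blockX blockY.
    int rows = argc > 1 ? atoi(argv[1]) : 1000;
    int cols = argc > 2 ? atoi(argv[2]) : 37;
    int bx   = argc > 3 ? atoi(argv[3]) : 16;
    int by   = argc > 4 ? atoi(argv[4]) : 16;
    if (rows <= 0 || cols <= 0 || bx <= 0 || by <= 0) {
        fprintf(stderr, "usage: %s rows cols blockX blockY\n", argv[0]);
        return 1;
    }

    std::vector<float> hIn((size_t)rows * cols), hOut((size_t)rows * cols);
    for (size_t i = 0; i < hIn.size(); ++i) hIn[i] = (float)i;
    size_t bytes = hIn.size() * sizeof(float);

    float *dIn = NULL, *dOut = NULL;
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(bx, by);
    dim3 grid(ceilDiv(cols, bx), ceilDiv(rows, by));  // round up at the edges
    transposeNaive<<<grid, block>>>(dOut, dIn, rows, cols);
    cudaError_t err = cudaGetLastError();  // catches invalid block shapes
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    // Check every element against the definition of a transpose.
    int bad = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (hOut[(size_t)c * rows + r] != hIn[(size_t)r * cols + c]) ++bad;
    printf("%d x %d, block %d x %d: %s\n", rows, cols, bx, by,
           bad ? "MISMATCH" : "ok");

    cudaFree(dIn);
    cudaFree(dOut);
    return bad ? 1 : 0;
}
```

Because the block shape comes from the command line, the same binary can be run with 32x8, 8x32, 16x16 and so on. A shape that exceeds the card's threads-per-block limit is reported by the launch check rather than silently producing wrong results.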
You can either code each of the versions in a single file or in multiple
files. Your code should be designed to run from the command line
(though launching it from a bat file which contains a command
line is fine). Your timing should include the time to do
the transpose as well as the time for the setup and readback. I
would highly suggest setting up an automated way to do multiple
runs with different parameters so you can gather data more
easily.
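A timing harness along these lines might look like the sketch below. It is an illustration under assumptions rather than a required design: the kernel and helper names, the argument order, and the CSV output format are hypothetical, and host wall-clock timing with `std::chrono` is just one option (CUDA events are another). Setup, transpose, and readback are timed separately so each can be reported.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

static int ceilDiv(int n, int d) { return (n + d - 1) / d; }

// Same naive kernel as a baseline; swap in optimized versions to compare.
__global__ void transposeNaive(float *out, const float *in, int rows, int cols)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < cols && y < rows)
        out[x * rows + y] = in[y * cols + x];
}

// Milliseconds elapsed on the host wall clock since t0.
static double msSince(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main(int argc, char **argv)
{
    // Hypothetical argument order: rows cols blockX blockY.
    int rows = argc > 1 ? atoi(argv[1]) : 1024;
    int cols = argc > 2 ? atoi(argv[2]) : 1024;
    int bx   = argc > 3 ? atoi(argv[3]) : 16;
    int by   = argc > 4 ? atoi(argv[4]) : 16;
    if (rows <= 0 || cols <= 0 || bx <= 0 || by <= 0) return 1;

    size_t count = (size_t)rows * cols, bytes = count * sizeof(float);
    std::vector<float> hIn(count, 1.0f), hOut(count);

    cudaFree(0);  // force context creation so it is not billed to setup

    // Setup: device allocation plus host-to-device copy.
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    float *dIn = NULL, *dOut = NULL;
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);
    double setupMs = msSince(t0);

    // Transpose: launches are asynchronous, so wait before reading the clock.
    dim3 block(bx, by);
    dim3 grid(ceilDiv(cols, bx), ceilDiv(rows, by));
    t0 = std::chrono::steady_clock::now();
    transposeNaive<<<grid, block>>>(dOut, dIn, rows, cols);
    cudaError_t err = cudaDeviceSynchronize();
    double kernelMs = msSince(t0);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Readback: device-to-host copy of the result.
    t0 = std::chrono::steady_clock::now();
    cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    double readbackMs = msSince(t0);

    // One CSV line per run: rows,cols,blockX,blockY,setup,kernel,readback (ms)
    printf("%d,%d,%d,%d,%.3f,%.3f,%.3f\n",
           rows, cols, bx, by, setupMs, kernelMs, readbackMs);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Since each run prints a single CSV line, a bat file or shell loop that calls the program with different sizes and block shapes and appends the output to one file is enough to collect a full data set. Repeating each configuration several times and reporting an average or minimum helps smooth out run-to-run noise.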
Your grade will be based on the number of optimizations that you evaluate
and the quality of the web-based documentation for your testing.
When reading through your web site someone new to CUDA should
get a better idea where they should focus their energy in
optimizing their code.
Be sure to
detail which GPU(s) you are using.
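The GPU details can be pulled from the CUDA runtime instead of being typed in by hand. The sketch below is one possible way to do it; the helper `bytesToMiB` and the choice of which properties to print are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Convert a byte count to mebibytes for display.
static double bytesToMiB(size_t bytes) { return bytes / (1024.0 * 1024.0); }

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print a short summary for every CUDA device in the machine.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s\n", dev, prop.name);
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  global memory:      %.0f MiB\n", bytesToMiB(prop.totalGlobalMem));
        printf("  max threads/block:  %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```

Including this output alongside the timing data makes it clear which card produced which numbers, which matters when results from several GPUs are compared.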
last revision 12/12/11