Final Exam - Spring 2010

In this final exam we are going to take another look at optimizations under CUDA.

This final exam is a 'take home' exam. You are expected to work on it completely by yourself. It is due Thursday, May 6th, 2010 at 11:59pm. By that time you should have set up a web page with your solution and emailed the location of that web page to Andy. During the scheduled final exam time we will meet in class and you will have 10 minutes to briefly describe your work to the class. We won't have time for a question / answer period; this way we can get through everyone within two hours.

The code we are going to look at is the Matrix Transposition code from the CUDA SDK. There is a very nice chapter on optimizing this code included with the CUDA examples and available on the web here:

http://developer.download.nvidia.com/compute/cuda/3_0/sdk/website/CUDA/website/C/src/transposeNew/doc/MatrixTranspose.pdf

Your job is to show how much of an effect those different optimizations have, compared to a naive CUDA version of the algorithm with no optimizations, on your card (or on multiple cards if you choose).

The current code relies on square matrices with each side being a multiple of 32. You should relax this requirement in your code. You should also experiment with the number of threads in a block and the orientation of those threads within the block to see what effect they have on the speed of the solution.
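One way to reason about relaxing the size requirement is to round the grid up so the tiles cover the whole matrix, then guard every access with a bounds check. The sketch below is a host-side C++ emulation of that indexing, not the SDK kernel itself: the loop variables stand in for blockIdx and threadIdx, and the names tiles_for and transpose_tiled are hypothetical. It can double as a CPU reference for checking GPU results on rectangular matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of tiles needed to cover n elements: ceil(n / tile).
// (Hypothetical helper; a kernel launch would use this for the grid size.)
inline std::size_t tiles_for(std::size_t n, std::size_t tile) {
    assert(tile > 0);
    return (n + tile - 1) / tile;
}

// Host-side emulation of a tiled transpose launch.
// `in` is rows x cols in row-major order; the result is cols x rows.
// by/bx play the role of blockIdx.y/x, ty/tx the role of threadIdx.y/x.
std::vector<float> transpose_tiled(const std::vector<float>& in,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t tile_w, std::size_t tile_h) {
    assert(in.size() == rows * cols);
    std::vector<float> out(rows * cols);
    const std::size_t grid_x = tiles_for(cols, tile_w);
    const std::size_t grid_y = tiles_for(rows, tile_h);
    for (std::size_t by = 0; by < grid_y; ++by)
        for (std::size_t bx = 0; bx < grid_x; ++bx)
            for (std::size_t ty = 0; ty < tile_h; ++ty)
                for (std::size_t tx = 0; tx < tile_w; ++tx) {
                    const std::size_t x = bx * tile_w + tx;  // input column
                    const std::size_t y = by * tile_h + ty;  // input row
                    // Edge tiles hang over the matrix; skip those "threads".
                    if (x < cols && y < rows)
                        out[x * rows + y] = in[y * cols + x];
                }
    return out;
}
```

Calling transpose_tiled(in, 35, 70, 16, 8), for example, exercises both a non-square shape and edge tiles that are only partly filled. In a real kernel the same x < cols && y < rows guard would wrap the global memory reads and writes.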

You can either code each of the versions in a single file or in multiple files. Your code should be designed to run from the command line (though launching it from a .bat file which contains a command line is fine). Your code timing should include the time to do the transpose, and the time for the setup and readback. I highly suggest setting up an automated way to do multiple runs with different parameters so you can gather data more efficiently.
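As a rough illustration of automating runs, the host-side C++ sketch below builds a list of run configurations and converts a measured time into an effective bandwidth figure. Everything named here (RunConfig, make_sweep, effective_bandwidth_gbs, print_csv_row) is hypothetical, the thread limit is a parameter you would set for your own card, and the timing measurement itself is left out. The bandwidth formula counts one read and one write of the matrix and uses 10^9 bytes per GB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// One run: matrix shape plus thread-block shape (all names are illustrative).
struct RunConfig {
    std::size_t rows, cols;
    std::size_t block_x, block_y;
};

// Cartesian product of matrix sizes and block shapes, skipping block shapes
// with more threads than the card allows (limit supplied by the caller).
std::vector<RunConfig> make_sweep(
    const std::vector<std::pair<std::size_t, std::size_t>>& sizes,
    const std::vector<std::pair<std::size_t, std::size_t>>& blocks,
    std::size_t max_threads_per_block) {
    std::vector<RunConfig> runs;
    for (const auto& s : sizes)
        for (const auto& b : blocks)
            if (b.first * b.second <= max_threads_per_block)
                runs.push_back({s.first, s.second, b.first, b.second});
    return runs;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes): the matrix is read once
// and written once, so 2 * rows * cols * elem_bytes bytes move in `ms`.
double effective_bandwidth_gbs(std::size_t rows, std::size_t cols,
                               std::size_t elem_bytes, double ms) {
    assert(ms > 0.0);
    const double bytes_moved = 2.0 * static_cast<double>(rows) *
                               static_cast<double>(cols) *
                               static_cast<double>(elem_bytes);
    return bytes_moved / (ms * 1.0e-3) / 1.0e9;
}

// One CSV line per run: rows,cols,block_x,block_y,ms,GB/s.
void print_csv_row(const RunConfig& r, double ms) {
    std::printf("%zu,%zu,%zu,%zu,%.4f,%.3f\n", r.rows, r.cols, r.block_x,
                r.block_y, ms,
                effective_bandwidth_gbs(r.rows, r.cols, sizeof(float), ms));
}
```

A driver would loop over make_sweep(...), run and time each configuration, and call print_csv_row so the output can be redirected to a file and charted. If you report setup and readback time as well, the bandwidth figure is probably most meaningful when computed from the transpose time alone.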


Your grade will be based on the number of optimizations that you evaluate and the quality of the web-based documentation for your testing. When reading through your web site, someone new to CUDA should get a better idea of where they should focus their energy in optimizing their code.

Be sure to detail which GPU(s) you are using.



last revision 12/12/11