Getting Loopy with OpenMP (25 points)

Overview

In this assignment, you will extend the provided code in loopy.cc by parallelizing four computational tasks using OpenMP. The original code implements each task in serial, and your goal is to modify the code to introduce parallelism where appropriate. The four tasks are:

  1. Matrix Initialization: Populate a 2D matrix.
  2. Row Sum Calculation: Compute the sum of each row.
  3. Stencil Computation with Loop Unrolling: Update matrix elements based on neighboring values.
  4. Unrolled Sum Reduction: Calculate a total sum of matrix elements with loop unrolling.

Each task is timed using OpenMP’s timing function omp_get_wtime(), and the results (including sample data values, sums, and execution times) are written to a file named loopy.dat.
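
For reference, the timing pattern looks roughly like this (a minimal sketch; the stand-in workload and variable names are illustrative and not taken from loopy.cc):

      #include <cstdio>
      #include <omp.h>

      int main() {
          double start = omp_get_wtime();        // wall-clock time in seconds
          double x = 0.0;
          for (int i = 0; i < 1000000; ++i) {    // stand-in for one of the tasks
              x += 0.5 * i;
          }
          double elapsed = omp_get_wtime() - start;
          std::printf("x = %f, elapsed = %f s\n", x, elapsed);
          return 0;
      }

Remember to compile with -fopenmp so that omp.h and the OpenMP runtime are available.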

Objective and Expected Learning Outcomes

Objectives

  • Introduce Parallelism with OpenMP: Apply OpenMP directives (e.g., #pragma omp parallel for, collapse, reduction) to the provided serial code.
  • Measure Performance: Use OpenMP’s timing functions to compare the execution times of the serial versus parallel implementations.
  • Analyze Speedup: Evaluate the benefits and potential overheads of parallelization by analyzing the output timings.
  • Build Automation: Develop a Makefile that compiles loopy.cc with the necessary OpenMP flag (-fopenmp).

Expected Outcomes

  1. Enhanced Parallel Programming Skills:
    • Successfully convert serial loops to parallel using OpenMP.
    • Understand common challenges such as data races and the correct use of reductions.
  2. Accurate Performance Evaluation:
    • Utilize omp_get_wtime() to capture precise timing for each task.
    • Compare the cumulative execution times for the serial and parallel implementations.
  3. Effective Build Management:
    • Create a robust Makefile that compiles the code with OpenMP support.
    • Ensure reproducible builds on designated systems.
  4. Documentation and Reflection:
    • Write clear, well-commented code.
    • Reflect on your approach, challenges, and outcomes in a reflection file.

Instructions

Part 1: Code Modification and Parallelization

  1. Review the Provided Code:
    • Examine loopy.cc, which includes implementations of four tasks in serial.
    • Understand the purpose and flow of each task, as well as how performance is measured using omp_get_wtime().
  2. Implement Parallel Versions (minimal sketches for each task follow this list):
    • Task 1 (Matrix Initialization): Parallelize the nested loops using #pragma omp parallel for collapse(2).
    • Task 2 (Row Sum Calculation): Parallelize row sum computation using a local sum variable for each thread.
    • Task 3 (Stencil Computation): Apply parallelization to the inner loop (or outer loop as appropriate) while keeping the loop unrolling intact.
    • Task 4 (Unrolled Sum Reduction): Use OpenMP’s reduction clause to parallelize the total sum computation.
    • Verify that the serial and parallel implementations produce consistent results; you can use check.py for this.
  3. Output Results:
    • Ensure that your program writes the following to loopy.dat:
      • A sample matrix value (e.g., Matrix[500][500]).
      • A sample row sum (e.g., RowSums[500]).
      • The total sum.
      • The execution time for each version (serial and parallel).
    • Use the provided writeResults function as a guide.
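
Task 1 (matrix initialization): a minimal sketch of the collapse(2) pattern. The container type, the names mat and n, and the fill formula are illustrative assumptions, not the actual declarations in loopy.cc.

      #include <vector>

      // Hedged sketch: the two nested loops are collapsed into a single parallel
      // iteration space, so all n*n iterations are divided among the threads.
      void initMatrix(std::vector<std::vector<double>>& mat, int n) {
          #pragma omp parallel for collapse(2)
          for (int i = 0; i < n; ++i) {
              for (int j = 0; j < n; ++j) {
                  mat[i][j] = 0.5 * i + 0.25 * j;   // illustrative fill formula
              }
          }
      }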
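
Task 2 (row sums): one common pattern is to parallelize the outer loop over rows and accumulate each row into a local variable, so no two threads ever write the same element. Again, the names are assumptions.

      #include <vector>

      // Hedged sketch: rows are distributed across threads; each iteration sums its
      // own row into a local variable, so there is no shared accumulator to race on.
      void computeRowSums(const std::vector<std::vector<double>>& mat,
                          std::vector<double>& sums) {
          #pragma omp parallel for
          for (int i = 0; i < (int)mat.size(); ++i) {
              double sum = 0.0;                      // private to this iteration
              for (int j = 0; j < (int)mat[i].size(); ++j) {
                  sum += mat[i][j];
              }
              sums[i] = sum;                         // each i is written by exactly one thread
          }
      }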
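
Task 3 (stencil with unrolling): a sketch that parallelizes the outer loop and keeps a 2-way unrolled inner loop. The neighbor formula, the unroll factor, and the separate output matrix are assumptions; preserve whatever the serial version in loopy.cc actually computes.

      #include <vector>

      // Hedged sketch: the outer (row) loop is parallel; the inner loop stays
      // 2-way unrolled as in the serial code, with a remainder loop for odd widths.
      // Writing to a separate output matrix avoids dependencies between rows.
      void stencil(const std::vector<std::vector<double>>& in,
                   std::vector<std::vector<double>>& out, int n) {
          #pragma omp parallel for
          for (int i = 1; i < n - 1; ++i) {
              int j = 1;
              for (; j + 1 < n - 1; j += 2) {        // unrolled by 2
                  out[i][j]   = 0.25 * (in[i-1][j]   + in[i+1][j]   + in[i][j-1] + in[i][j+1]);
                  out[i][j+1] = 0.25 * (in[i-1][j+1] + in[i+1][j+1] + in[i][j]   + in[i][j+2]);
              }
              for (; j < n - 1; ++j) {               // remainder column, if any
                  out[i][j] = 0.25 * (in[i-1][j] + in[i+1][j] + in[i][j-1] + in[i][j+1]);
              }
          }
      }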
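
Task 4 (unrolled sum reduction): a sketch of the reduction clause combined with a 4-way unrolled inner loop and a remainder loop; the unroll factor and names are assumptions.

      #include <vector>

      // Hedged sketch: each thread accumulates a private copy of 'total' that
      // OpenMP combines with + when the loop ends; the inner loop is unrolled
      // by 4, with a remainder loop so any row length is handled.
      double totalSum(const std::vector<std::vector<double>>& mat, int n) {
          double total = 0.0;
          #pragma omp parallel for reduction(+:total)
          for (int i = 0; i < n; ++i) {
              int j = 0;
              for (; j + 3 < n; j += 4) {            // unrolled by 4
                  total += mat[i][j] + mat[i][j+1] + mat[i][j+2] + mat[i][j+3];
              }
              for (; j < n; ++j) {                   // remainder elements
                  total += mat[i][j];
              }
          }
          return total;
      }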

Part 2: Makefile Development

  1. Create a Makefile (a minimal sketch follows this list):
    • Write a Makefile that compiles loopy.cc into an executable named loopy.
    • Include the OpenMP flag -fopenmp in your compilation options.
    • Define at least two targets:
      • all: Compiles the executable.
      • clean: Removes build artifacts and the executable.
  2. Testing:
    • Confirm that your Makefile builds the project correctly on the designated systems.
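
A minimal Makefile along these lines satisfies the two required targets; the compiler choice and optimization flags are illustrative, so adjust them for your environment (the listing is indented here for display, and recipe lines must begin with a tab character in the actual file):

      # Minimal sketch: build loopy.cc into ./loopy with OpenMP enabled.
      CXX      = g++
      CXXFLAGS = -O2 -Wall -fopenmp

      all: loopy

      loopy: loopy.cc
      	$(CXX) $(CXXFLAGS) -o loopy loopy.cc

      clean:
      	rm -f loopy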

Part 3: Experiment Analysis and Reflection

  1. Performance Analysis:
    • Document the execution times for both the serial and parallel versions.
    • Optionally, generate a graph or chart (e.g., loopy.png) to illustrate the performance improvements.
  2. Reflection:
    • Write a brief reflection in a file named loopy.txt covering:
      • Your approach to parallelizing each task.
      • Challenges encountered (e.g., handling data dependencies or synchronization).
      • An analysis of whether the observed speedup met your expectations.
      • Lessons learned and ideas for future optimizations.

Submission Guidelines and Evaluation Criteria

  • Add and/or edit the following files in your GitHub repository:
    • loopy.cc – The source file with your serial and parallel implementations.
    • Makefile – A Makefile that compiles your project with OpenMP support.
    • loopy.dat – The output file generated by your program.
    • loopy.txt – A reflection on your approach, challenges, and insights.
    • (Optional) loopy.png – A graph or chart illustrating performance differences.
  • When your program is ready for grading, commit and push your local repository to the remote GitHub Classroom repository following the Assignment Submission Instructions.
  • Your work will be evaluated on your adherence to instructions, code quality, file naming, completeness, and your reflection on the assignment.

Additional Resources