December 16th, 2015
Achievable networking performance of applications in a supercomputer depends on the exact combination of the communication patterns of the applications and the routing algorithms used by the supercomputer. To achieve the highest networking performance for the applications, the routing algorithms need to be designed optimally for those communication patterns. However, while communication patterns usually vary from application to application and even from phase to phase within an application, routing algorithms offer limited variation and are usually optimized for typical communication patterns. This results in low networking performance for some communication patterns. In this paper, we present approaches for improving communication performance by using multiple paths and rebalancing load on physical links on the Blue Gene/Q supercomputer. We realize our approaches in a framework called OPTIQ and demonstrate the efficacy of our framework via a set of benchmarks. Our results show that we can achieve 43% to 67% higher throughput on average across 91 experiments, and can achieve higher throughput than the default MPI_Alltoallv for certain communication patterns.
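The abstract does not describe how OPTIQ chooses paths or splits data. As a hedged illustration of the general idea of spreading one transfer over multiple paths to balance load on physical links, the Python sketch below uses a simple greedy heuristic. The heuristic and the function name `assign_chunks` are assumptions, and the link names are hypothetical; this is not the paper's algorithm. The sketch assigns each chunk to the candidate path whose most-loaded link is currently least loaded.

```python
from collections import defaultdict

def assign_chunks(paths, total_bytes, chunk_bytes):
    """Greedily split one transfer across candidate paths (illustrative only).

    paths: list of paths, each a list of hypothetical link identifiers.
    Each chunk goes to the path whose bottleneck (most-loaded) link is
    currently least loaded; ties are broken by the path's total link load,
    then by lowest path index.
    Returns (bytes assigned per path index, bytes carried per link).
    """
    link_load = defaultdict(int)
    per_path = [0] * len(paths)
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        # Score each path by (bottleneck load, total load) before placement.
        best = min(
            range(len(paths)),
            key=lambda i: (max(link_load[l] for l in paths[i]),
                           sum(link_load[l] for l in paths[i])),
        )
        per_path[best] += size
        for link in paths[best]:
            link_load[link] += size
        remaining -= size
    return per_path, dict(link_load)

# Two link-disjoint paths: the load alternates and ends up evenly split.
split, loads = assign_chunks([["a", "b"], ["c", "d"]], 100, 10)
print(split)  # [50, 50]
```

A production scheme would also weigh path length, link capacity, and competing flows; the greedy rule here only shows why using several paths can lower the peak load on any single link.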
Bui, H., Malakar, P., Vishwanath, V., Munson, T., Jung, E., Johnson, A., Papka, M., Leigh, J., Improving Communication Throughput by Multipath Load Balancing on Blue Gene/Q, In Proceedings of the 22nd Annual IEEE International Conference on High Performance Computing (HiPC 2015), Bengaluru, India, December 16th, 2015. https://doi.org/10.1109/HiPC.2015.44