Machine Learning Driven Auto-tuning for Non-uniform All-to-All Collectives

December 17th, 2025

Categories: Deep Learning, Machine Learning, Data Science, High Performance Computing

Runtime flow of lookup table query and algorithm dispatch in the auto-tuned non-uniform all-to-all communication.
Runtime flow of lookup table query and algorithm dispatch in the auto-tuned non-uniform all-to-all communication.

Authors

Qi, K., Fan, K., Domke, J., Ba, S., Vishwanath, V., Papka, M.E., Kumar, S.

About

Non-uniform all-to-all communication patterns present optimization challenges in parallel computing due to their irregular data distribution and dynamic behavior. While MPI_Alltoallv provides the standard interface for such exchanges, achieving optimal performance requires careful selection among multiple implementation variants and tuning of algorithm-specific parameters. This paper presents a data-driven autotuning framework that combines machine learning–based runtime prediction with a lookup-table mechanism for fast configuration selection. The ML model estimates the communication time of each algorithm configuration under a given system setup, allowing the framework to identify the optimal implementation and parameter set based on predicted performance. We validate our approach through comprehensive benchmarking of MPI_Alltoallv and two specialized algorithms across varying process counts, message sizes, and tunable parameters. Applied to a real MPI-based transitive closure application on the Fugaku supercomputer, our framework achieves up to 6.03× reduction in communication time over he vendor implementation, providing a detailed understanding of non-uniform collective communication behavior and a practical framework for automatic performance optimization in HPC applications.

Index Terms: MPI Alltoallv, autotuning, collective communication, machine learning, sensitivity analysis

Resources

PDF

Citation

Qi, K., Fan, K., Domke, J., Ba, S., Vishwanath, V., Papka, M.E., Kumar, S., Machine Learning Driven Auto-tuning for Non-uniform All-to-All Collectives, IEEE International Conference on High Performance Computing, Data, Analytics, and Data Science (HiPC) 2025, Hyderabad, India, December 17th, 2025.