November 16th, 2025
Categories: Applications, Supercomputing, Data Science, High Performance Computing
High-performance computing (HPC) systems are essential for scientific discovery and engineering innovation. However, their growing power demands pose significant challenges, particularly as systems scale to the exascale level. Prior uncore frequency tuning studies have primarily focused on conventional HPC workloads running on CPU-only systems. As HPC advances toward heterogeneous computing, integrating diverse GPU workloads on heterogeneous CPU-GPU systems, it becomes imperative to revisit and enhance uncore scaling. Our investigation reveals that uncore frequency scales down only when CPU power approaches its thermal design power (TDP), which is rare in GPU-dominant applications. As a result, modern computing systems experience unnecessary power waste. In this study, we present MAGUS, a user-transparent uncore frequency scaling runtime for heterogeneous computing. MAGUS dynamically adjusts uncore frequencies according to distinct application execution phases, effectively minimizing power waste caused by consistently using maximum uncore frequencies. Our design incorporates several key techniques, including real-time monitoring and prediction of memory accesses, intelligent handling of frequent phase transitions, and leveraging vendor-provided power management features. We evaluate MAGUS with various GPU benchmarks and applications on multiple heterogeneous systems with different CPU and GPU architectures. Experimental results demonstrate that MAGUS achieves up to 27% energy savings compared to the default settings, while maintaining a performance loss of less than 5% and an overhead of under 1%.
Keywords: GPU workloads, heterogeneous CPU-GPU systems, uncore frequency scaling, energy efficiency, performance-power trade-offs
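The phase-driven adjustment described in the abstract can be illustrated with a minimal sketch. This is not the MAGUS algorithm: the class name `UncoreGovernor`, the frequency bounds, the memory-throughput threshold, and the hold count are all hypothetical values chosen for illustration. The sketch only shows the general idea of lowering the uncore frequency during low-memory-traffic phases while avoiding oscillation when phases change frequently. Reading hardware counters and applying the chosen frequency through vendor interfaces are left out.

```python
from dataclasses import dataclass, field


@dataclass
class UncoreGovernor:
    """Hypothetical phase-aware uncore frequency selector (illustration only)."""

    min_ghz: float = 1.2        # assumed lowest uncore frequency
    max_ghz: float = 2.4        # assumed highest uncore frequency
    high_bw_gbs: float = 20.0   # assumed threshold marking a memory-intensive phase
    hold_samples: int = 3       # consecutive low samples required before scaling down
    current_ghz: float = field(init=False)
    _low_streak: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        # Start at the maximum, mirroring the default behavior the abstract describes.
        self.current_ghz = self.max_ghz

    def update(self, mem_bw_gbs: float) -> float:
        """Consume one memory-throughput sample and return the target frequency."""
        if mem_bw_gbs >= self.high_bw_gbs:
            # Memory-intensive phase: raise immediately to protect performance.
            self._low_streak = 0
            self.current_ghz = self.max_ghz
        else:
            # Low-traffic sample: scale down only after a sustained streak,
            # so brief dips between phases do not cause frequency thrashing.
            self._low_streak += 1
            if self._low_streak >= self.hold_samples:
                self.current_ghz = self.min_ghz
        return self.current_ghz


# Synthetic throughput trace in GB/s (made-up values, not measurements).
trace = [30, 5, 4, 25, 3, 2, 1, 1, 40]
gov = UncoreGovernor()
freqs = [gov.update(sample) for sample in trace]
print(freqs)
```

In this sketch the short dip at the second and third samples does not trigger a scale-down, because the streak is reset by the following high-throughput sample; only the longer low-traffic run later in the trace lowers the frequency. The asymmetry (raise at once, lower after a delay) is one simple way to favor performance over savings, and is a design choice of the sketch rather than a claim about the paper's method.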
Zheng, Z., Sultanov, S., Papka, M.E., Lan, Z., Minimizing Power Waste in Heterogenous Computing via Adaptive Uncore Scaling, The International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’25), St. Louis, MO, ACM, New York, NY, pp. 12, November 16th, 2025. https://doi.org/10.1145/3712285.3759879