CQSim+: Symbiotic Simulation for Multi-Resource Scheduling in High-Performance Computing

June 23rd, 2025

Categories: Applications, Supercomputing, Data Science, High Performance Computing

A time diagram of CQSim+ performing meta-scheduling for multiple systems.

Authors

Kurkure, Y., Sharma, S., Wang, X., Papka, M., Lan, Z.

About

Efficient job scheduling is crucial in high-performance computing (HPC), balancing user demands for quick job turnaround with facility goals for high resource utilization. Traditional scheduling requires users to specify a system at job submission, which can lead to inefficiencies. A unified scheduling approach, viewing the resources within a computing facility as an integrated pool, promises improved resource use and reduced job wait times. This paper presents CQSim+, an open-source, discrete event-driven simulator tailored for symbiotic multi-resource scheduling. CQSim+ supports dynamic simulation by continuously integrating real-time data from job schedulers, enabling adaptive scheduling based on the system’s current state. Through extensive experimentation, we demonstrate CQSim+’s ability to enhance resource utilization and decrease job wait times in both homogeneous and heterogeneous HPC environments.

Additionally, we present a case study that coordinates job scheduling between two production systems, illustrating how CQSim+ can effectively optimize job scheduling across distinct systems.
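The contrast the abstract draws between per-system submission and a unified resource pool can be illustrated with a toy discrete-event simulation. The sketch below is not CQSim+ code and does not use its API: the `simulate` function, the job tuples, and the two four-node systems `A` and `B` are all hypothetical, and it assumes strict first-come-first-served dispatch with no backfilling. It also omits the real-time feedback from live schedulers that makes CQSim+ symbiotic.

```python
import heapq
from collections import deque

# Hypothetical workload: (submit_time, nodes, runtime, requested_system).
JOBS = [(0, 4, 10, "A"), (1, 4, 10, "A"), (2, 4, 10, "A"), (3, 2, 5, "B")]
CAPACITY = {"A": 4, "B": 4}  # nodes per system (illustrative values)

def simulate(jobs, capacity, pooled):
    """Return the mean job wait time under strict FCFS (no backfilling).

    pooled=False: each job waits in the queue of the system it requested.
    pooled=True:  one shared queue; the head job starts on the first
                  system with enough free nodes.
    Assumes every job fits on at least one eligible system.
    """
    free = dict(capacity)
    queues = {None: deque()} if pooled else {s: deque() for s in capacity}
    # Event tuple: (time, kind, seq, payload); kind 0 = finish, 1 = arrival,
    # so nodes released at time t are visible to arrivals at time t.
    events = []
    for seq, job in enumerate(jobs):
        heapq.heappush(events, (job[0], 1, seq, job))
    seq = len(jobs)
    waits = []
    while events:
        now, kind, _, payload = heapq.heappop(events)
        if kind == 1:
            queues[None if pooled else payload[3]].append(payload)
        else:
            system, nodes = payload
            free[system] += nodes
        # After every event, start queued jobs in FCFS order.
        for key, queue in queues.items():
            candidates = list(capacity) if pooled else [key]
            while queue:
                submit, nodes, runtime, _ = queue[0]
                target = next((s for s in candidates if free[s] >= nodes), None)
                if target is None:
                    break  # head job is blocked, so later jobs wait too
                queue.popleft()
                free[target] -= nodes
                waits.append(now - submit)
                heapq.heappush(events, (now + runtime, 0, seq, (target, nodes)))
                seq += 1
    return sum(waits) / len(waits)

print("per-system mean wait:", simulate(JOBS, CAPACITY, pooled=False))  # 6.75
print("pooled mean wait:    ", simulate(JOBS, CAPACITY, pooled=True))   # 4.0
```

On this small workload the pooled policy lowers the mean wait because the second job can start immediately on the idle system `B`. The two-node job waits longer than it would on its own system, a reminder that a simple pooled policy trades off individual jobs against the aggregate.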

CCS Concepts: Computing methodologies → Modeling methodologies; Discrete-event simulation; Real-time simulation; Simulation tools; Social and professional topics → System management

Keywords: Multi-resource scheduling, Simulation tool, Resource management systems, High-performance computing

https://doi.org/10.1145/3726301.3728404

Citation

Kurkure, Y., Sharma, S., Wang, X., Papka, M., Lan, Z., CQSim+: Symbiotic Simulation for Multi-Resource Scheduling in High-Performance Computing, In 39th ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (SIGSIM-PADS ’25), Santa Fe, NM, ACM, New York, NY, June 23rd, 2025. https://doi.org/10.1145/3726301.3728404