Overcoming RL Limitations in HPC Scheduling: A Model-Based MCTS Approach for Practical Deployment (poster)

May 8th, 2025

Categories: Applications, Supercomputing, Deep Learning, Data Science, High Performance Computing

Authors

Kurkure, Y., Zhang, Y., Papka, M. E., Lan, Z.

About

High-performance computing (HPC) job scheduling has seen promising advances with Deep Reinforcement Learning (DRL). However, challenges such as low interpretability, instability, and high computational cost hinder DRL’s practical adoption. We explore a model-based alternative using Monte Carlo Tree Search (MCTS) to overcome these limitations. By leveraging existing HPC simulators as models for MCTS and focusing on transparent decision-making, we aim to develop a scalable and interpretable scheduling solution fit for real-world deployment.
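As a rough illustration of the idea, the sketch below runs MCTS with the common UCT selection rule over a toy single-resource scheduling model. It is not the poster's implementation: the poster names neither a simulator nor an MCTS variant, so `simulate_step`, `Node`, `rollout`, `mcts_first_action`, the negative-total-wait reward, and the constants `iterations`, `c`, and `seed` are all hypothetical choices made for this example. A real deployment would replace `simulate_step` with calls into an existing HPC simulator.

```python
import math
import random


def simulate_step(state, job_index):
    """Hypothetical stand-in for an HPC simulator: start one pending job.

    state = (clock, pending_runtimes, total_wait) on a single shared resource.
    """
    clock, pending, total_wait = state
    runtime = pending[job_index]
    remaining = pending[:job_index] + pending[job_index + 1:]
    # The chosen job has waited `clock` time units before it starts.
    return (clock + runtime, remaining, total_wait + clock)


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action  # index into the parent's pending tuple
        self.children = []
        self.untried = list(range(len(state[1])))
        self.visits = 0
        self.total_reward = 0.0


def rollout(state, rng):
    """Finish the schedule with random choices; reward is negative total wait."""
    while state[1]:
        state = simulate_step(state, rng.randrange(len(state[1])))
    return -state[2]


def mcts_first_action(root_state, iterations=2000, c=1.4, seed=0):
    """Return the most-visited first action and per-action statistics.

    Assumes root_state contains at least one pending job.
    """
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using the UCT score.
        while not node.untried and node.children:
            parent = node
            node = max(
                parent.children,
                key=lambda ch: ch.total_reward / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
        # Expansion: add one unexplored child, if any remain.
        if node.untried:
            action = node.untried.pop()
            child = Node(simulate_step(node.state, action), node, action)
            node.children.append(child)
            node = child
        # Simulation, then backpropagation up to the root.
        reward = rollout(node.state, rng)
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Visit counts and mean rewards per action serve as a readable explanation.
    stats = {
        ch.action: (ch.visits, ch.total_reward / ch.visits)
        for ch in root.children
    }
    best = max(stats, key=lambda a: stats[a][0])
    return best, stats


# Three pending jobs with runtimes 8, 2, and 5 time units (illustrative values).
best, stats = mcts_first_action((0, (8, 2, 5), 0))
print(best, stats)
```

For the three runtimes above, the search should favor starting the shortest job (index 1) first, which agrees with the shortest-job-first rule for minimizing total wait on a single resource. The returned `stats` dictionary exposes how often each candidate was explored and its average outcome, which is one simple way a tree search can make its decision inspectable.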

Resources

PDF

URL

Citation

Kurkure, Y., Zhang, Y., Papka, M. E., Lan, Z., Overcoming RL Limitations in HPC Scheduling: A Model-Based MCTS Approach for Practical Deployment (poster), The 12th Greater Chicago Area Systems Research Workshop (GCASR), Chicago, IL, May 8th, 2025. https://gcasr.org/2025/posters