April 29th, 2022
Categories: Applications, Devices, Software, User Groups, Deep Learning, Machine Learning, Instrumentation, Data Science, High Performance Computing
In 2019, the Electronic Visualization Laboratory at the University of Illinois Chicago acquired COMPaaS DLV (Composable Platform as a Service: Instrument for Deep Learning & Visualization), a composable infrastructure compute cluster with 24 compute nodes and 64 GPUs. Since then, more than 40 users have designed and run computations on COMPaaS. To track cluster utilization, evaluate hardware performance, and analyze application resource utilization on this experimental composable infrastructure system, we set out to collect metrics on the hardware and software infrastructure. Users deploy their applications on the cluster using Kubernetes, a container orchestration platform that lets them do so without complete knowledge of the underlying hardware and with minimal assistance from system administrators. We therefore used tooling already adapted for Kubernetes-based clusters to collect hardware and software metrics. The metrics provided by this tooling are useful to system administrators in evaluating cluster performance and utilization, and to users in determining their applications' performance and discovering bottlenecks.
Bargo, T., Monitoring COMPaaS, Final Research Report, April 29th, 2022.