A High-Performance Sensor for Cluster Monitoring and Adaptation

Venkatram Vishwanath1, Wu-chun Feng2, Mark Gardner3, and Jason Leigh1

1 Electronic Visualization Laboratory, University of Illinois at Chicago

2 Department of Computer Science, Virginia Tech

3 Advanced Computing Laboratory, Los Alamos National Laboratory

Version: May 5, 2006

ResearchGear ID: 20060505_vishwanath

As Beowulf clusters have grown in size and complexity, monitoring their performance, status, and health has become both more difficult and more important. Consequently, tools such as Ganglia and Supermon have emerged in recent years to provide the robust support needed for scalable cluster monitoring. However, this scalability comes at the expense of accuracy: these tools obtain data samples only through entries in the /proc filesystem, and only at the granularity of a kernel tick, i.e., 10 milliseconds. As an alternative to using /proc as a sensor for Ganglia and Supermon, we propose a dynamic, high-fidelity, event-based sensor called MAGNET (Monitoring Apparatus for General kerNel-Event Tracing). Unlike the previous incarnation of MAGNET, this one allows instrumentation points to be inserted and deleted dynamically. It also improves performance by approximately 100% over our previous, already low-overhead MAGNET and by approximately 25% over the Linux Trace Toolkit (LTT), while providing greater functionality and robustness than LTT. Furthermore, the latest MAGNET is flexible enough to take on the role of other tools such as tcpdump, yet still performs over 250% better than tcpdump. It can also be used as a diagnostic (or debugging) tool, a performance-tuning tool, or a reflective tool that enables self-adapting applications in clusters or grids.



EVL ResearchGear publishes preliminary software, technical reports, data, or results that the Electronic Visualization Laboratory openly shares with the research community. The work presented here is preliminary, and we are not responsible for any damages that may result from its use or misuse. If you would like to cite any of this information in your research papers, presentations, etc., please reference the ResearchGear ID above. Thank you, and we hope you find the information on this page useful.

