OPTIQ: An optimization and QoS framework for data movement in data-centric applications on supercomputers

May 12th, 2014

Categories: MS / PhD Thesis

H. Bui, EVL

About

Huy Bui PhD Prelim Exam
Monday, May 12, 2:30pm

I propose a holistic approach to optimize throughput and provide prioritization of data flows in data-centric applications on supercomputing systems. My proposed thesis stems from the observation that, on supercomputing systems, data-centric applications need to reliably and rapidly compute and move large amounts of data through interconnect networks. The same trend is also observed in commodity clusters. Thus, optimizing data movement and improving quality of service are essential at extreme scales in order to utilize the systems effectively. However, most optimization work is carried out at the system routing level, not at the application level. The proposed approach will take the system's network routing, the interconnect network topology, and the application's communication patterns into account during optimization, which promises better performance than current data movement mechanisms. The approach will be realized in an Optimization and Quality of Service framework (OPTIQ) that will provide an application programming interface (API) requiring minimal changes to applications.

To optimize data movement, I propose to use linear programming combined with heuristic algorithms. At the current scale of many supercomputers, linear programming alone would not be feasible because of the time needed to search for a solution. Therefore, a supercomputing system needs to be modeled in such a way that it can be divided into a number of subsystems satisfying two conditions. First, an optimal solution can be found for each subsystem using linear programming in a reasonable amount of time. Second, the solutions of the subsystems can be combined heuristically to form a close-to-global optimal solution. Depending on the complexity of the system and the application's communication patterns, the framework supports options to search for solutions offline, on the fly, or with a hybrid combination of the two. The framework also supports data movement for different flows with different priorities. With this support, application developers can assign priorities to data flows so as to achieve the lowest time-to-solution for their applications. Thus, the combination of optimization and quality of service can help improve the performance of data-centric applications on supercomputing systems.
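The idea of prioritized data flows can be illustrated with a minimal sketch. It assumes a strict-priority policy on a single shared link, which is only one possible policy and is not stated above as the framework's actual mechanism. The names `Flow` and `allocate`, the example flows, and the convention that a lower number means higher priority are all hypothetical and are not part of the OPTIQ API.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    demand: float   # requested bandwidth (arbitrary units, e.g. GB/s)
    priority: int   # assumed convention: lower number = higher priority


def allocate(flows, capacity):
    """Strict-priority sharing of one link (an illustrative assumption).

    Flows are served in priority order; each receives its full demand
    until the link capacity is exhausted.
    """
    remaining = capacity
    shares = {}
    for flow in sorted(flows, key=lambda f: f.priority):
        granted = min(flow.demand, remaining)
        shares[flow.name] = granted
        remaining -= granted
    return shares


# Hypothetical flows competing for a link of capacity 10.
flows = [
    Flow("checkpoint", demand=6.0, priority=2),
    Flow("halo_exchange", demand=3.0, priority=0),
    Flow("analysis_output", demand=4.0, priority=1),
]
shares = allocate(flows, capacity=10.0)
for name, granted in shares.items():
    print(f"{name}: {granted}")
```

In this sketch the two higher-priority flows receive their full demand and the lowest-priority flow receives whatever capacity is left. A real system would have to apply such a policy across many links and routes at once, which is where the optimization described above comes in.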