A Distributed Graph Approach for Retrieving Linked RDF Data Using Supercomputing systems

April 21st, 2017

Categories: Networking, Supercomputing


EVL PhD candidate Michael J. Lewis defends his thesis, “A Distributed Graph Approach for Retrieving Linked RDF Data Using Supercomputing systems”:

842 W. Taylor Street, Room 2068 (EVL Cybercommons)
Chicago, IL 60607

Date: Friday, April 21, 2017, 1PM

Committee: Andrew Johnson (advisor and chair), Ugo Buy, Ajay Kshemkalyani, Jason Leigh (University of Hawaii, Manoa), Venkatram Vishwanath (Argonne National Laboratory)

Graph systems play a dominant role in discovering relationships and patterns in large unstructured data sets, in which the data does not explicitly declare its metadata. Graph analytics through queries has been increasingly used by the business and research communities alike, in areas such as biomedical research and genome sequencing, disease research, social media and gaming networks, financial services, IT security (hardening networks against cyber attacks), and pharmaceuticals (helping to discover patterns in drug interactions). The Resource Description Framework (RDF), a graph-based data model, is used by many graph systems to create graph structures that can be queried with both relational and graph-based queries.

My research looks at how pre-processing linked data into memory can improve query retrieval times over a range of query types, compared with a purely graph-exploration method. In this dissertation I review RDF systems that retrieve queried RDF data using a relational, table-based approach, an iterative data-merge approach using MapReduce, and a graph-exploration approach. I introduce Mantona, a distributed RDF system designed to run in a supercomputing environment. I explain in detail the workings of two algorithms implemented in Mantona using MPI. The first algorithm (graph-cache) stores partial joins in a graph in order to expedite query retrieval. The second algorithm (graph-retrieval) answers a query by exploring RDF graph nodes. I conclude by presenting results from Mantona for both the graph-cache and graph-retrieval approaches, and by showing the range of queries for which a graph-cache implementation improves query retrieval times over a purely exploratory graph-retrieval implementation.
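As a rough illustration of the contrast described in the abstract, the following single-process Python sketch compares answering a two-hop query by exploring graph edges at query time with answering it from a join that was computed ahead of time and kept in memory. It is not Mantona's implementation and uses no MPI or distribution; the triples, the predicate names, and the helpers `build_index`, `explore`, and `JoinCache` are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object). All data here is made up.
TRIPLES = [
    ("alice", "worksAt", "evl"),
    ("bob", "worksAt", "evl"),
    ("carol", "worksAt", "anl"),
    ("evl", "locatedIn", "chicago"),
    ("anl", "locatedIn", "lemont"),
]


def build_index(triples):
    """Index triples as predicate -> subject -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[p][s].add(o)
    return index


def explore(index, p1, p2):
    """Exploration-style retrieval: follow p1 edges, then p2 edges, per query.

    Returns every (subject, middle, object) with
    subject --p1--> middle --p2--> object.
    """
    results = set()
    for s, mids in index[p1].items():
        for mid in mids:
            for o in index[p2].get(mid, ()):
                results.add((s, mid, o))
    return results


class JoinCache:
    """Cache-style retrieval: precompute selected two-hop joins in memory."""

    def __init__(self, index, predicate_pairs):
        self.index = index
        # Pre-processing step: materialize each requested join once.
        self.cache = {pair: explore(index, *pair) for pair in predicate_pairs}

    def query(self, p1, p2):
        # Serve from the cache when possible; otherwise fall back to exploring.
        cached = self.cache.get((p1, p2))
        if cached is not None:
            return cached
        return explore(self.index, p1, p2)


index = build_index(TRIPLES)
cache = JoinCache(index, [("worksAt", "locatedIn")])
for row in sorted(cache.query("worksAt", "locatedIn")):
    print(row)
```

In this sketch the cached query does no graph traversal at query time, at the cost of extra memory and an up-front pre-processing pass. That is the general trade-off the abstract refers to, shown here under simplified assumptions.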