Semantic aware partitioning and distribution with partial replication for RDF graph
Abstract
Managing very large RDF datasets is currently one of the major challenges in building modern applications. The relational approach and the graph-based approach are two techniques for RDF data management. This thesis introduces a graph-based partitioning technique, Distributed Semantic Aware Partitioning (DSAP). It has two phases: semantic-aware partitioning, and distribution of partitions with partial replication. The partitioning phase partitions the data using the semantic relations between subjects and objects. The distribution phase distributes the partitions to the available nodes in a distributed environment with partial replication. DSAP is demonstrated using benchmark Linked Open Data (LOD) and a query set. The performance of DSAP is analyzed using a set of quantitative and qualitative parameters while the data is scaled from 1x to 5x. Using these parameters, the performance of DSAP is compared with state-of-the-art relational and graph-based techniques. DSAP queries record a 71% gain in query execution time (QET) when averaged over four types of query. For the most frequent query types, Linear and Star, an average QET gain of 65% over the original configuration is recorded in the scaling experiments. For the other two types, Administrative and Snowflake queries, an average QET gain of 55% over the original configuration is recorded. Algorithm execution time increases rapidly when the data size increases from 4x to 5x. DSAP eliminates joins from queries and does not require workload information. However, scaling beyond 4x needs to be addressed.
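The two phases can be illustrated with a small sketch. The abstract does not give the actual DSAP algorithm, so everything below is an assumption made for illustration only: triples are grouped by subject, each partition gets a primary node by round-robin, and a partition is additionally copied to any node holding a partition whose objects refer to it. The function names `partition_by_subject` and `distribute`, the node names, and the sample triples are all hypothetical.

```python
from collections import defaultdict


def partition_by_subject(triples):
    """Group (subject, predicate, object) triples by subject.

    Assumption: subject grouping stands in for the thesis's
    semantic-aware partitioning, which the abstract does not specify.
    """
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[s].append((s, p, o))
    return dict(partitions)


def distribute(partitions, nodes):
    """Place partitions on nodes with partial replication (illustrative).

    Each partition gets one primary node (round-robin over sorted keys).
    A partition is also replicated to the primary node of every partition
    that references it as an object, so a subject-to-object hop can be
    resolved on a single node.
    """
    keys = sorted(partitions)
    primary = {key: nodes[i % len(nodes)] for i, key in enumerate(keys)}
    placement = {node: set() for node in nodes}
    for key in keys:
        placement[primary[key]].add(key)
    # Partial replication: copy only the partitions that are referenced
    # from another partition, not the whole dataset.
    for key in keys:
        for _, _, o in partitions[key]:
            if o in partitions:
                placement[primary[key]].add(o)
    return primary, placement


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "worksAt", "acme"),
    ("acme", "locatedIn", "paris"),
]
parts = partition_by_subject(triples)
primary, placement = distribute(parts, ["node0", "node1"])
```

In this toy run, the `bob` partition ends up on both nodes because `alice` (on `node1`) refers to it, while `acme` and `alice` are stored once. This is only one plausible reading of "partial replication"; the thesis itself should be consulted for the real placement rule.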
Collections
- M Tech Dissertations [923]