Dynamic partition and allocation for distributed systems
Abstract
Data partitioning and allocation are crucial to improving query performance in distributed systems. The research community has proposed several workload-based partitioning techniques, but these techniques do not perform well when the dataset and the query workload change over time. This thesis presents the Dynamic Partitioning and Allocation (DPA) algorithm, a query-centric strategy for dynamically partitioning and allocating data in distributed systems. The strategy has two phases. The first phase, the Static Partitioning Phase, partitions the data based on a known workload. The second phase, incremental repartitioning, fine-tunes the partitions as the workload changes. To speed up data access further, the strategy also implements a data blocking technique that reduces disk access time. Metadata is maintained for each block of tuples, and a query can skip any block whose metadata indicates that it is not relevant, which leads to faster query execution. The strategy is demonstrated using the TPC-H benchmark data and query set. Performance of the proposed system is evaluated using query execution time (QET), the number of distributed joins, and inter-node communication. Compared with a non-partitioned database, the proposed strategy executes ad-hoc queries 8% faster and reduces distributed joins by 75%. The DPA algorithm answers 38% of queries by accessing only one cluster; 50% of queries require access to two clusters on average, and fewer than 15% of queries require access to three or more clusters. The strategy can be used to build interactive applications that require fast execution of ad-hoc queries.
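The data blocking idea described above can be illustrated with a minimal sketch. It assumes that the per-block metadata is a min/max pair on a single column and that tuples are sorted on that column before being cut into fixed-size blocks. Both are illustrative assumptions, since the abstract does not specify the metadata format. The names `Block`, `build_blocks` and `range_scan` are hypothetical, and `l_orderkey` is used only as a TPC-H-style column name.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A fixed-size group of tuples plus min/max metadata on one column (assumed format)."""
    rows: list
    lo: int
    hi: int


def build_blocks(rows, key, block_size):
    """Sort rows on `key` (assumption) and cut them into blocks, recording each block's min/max."""
    ordered = sorted(rows, key=lambda r: r[key])
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        # Because the chunk is sorted, its first and last values are its min and max.
        blocks.append(Block(chunk, chunk[0][key], chunk[-1][key]))
    return blocks


def range_scan(blocks, key, lo, hi):
    """Answer `lo <= key <= hi`, returning matching rows and the number of blocks read."""
    matches, blocks_read = [], 0
    for b in blocks:
        # Metadata shows no tuple in this block can satisfy the predicate: skip it unread.
        if b.hi < lo or b.lo > hi:
            continue
        blocks_read += 1
        matches.extend(r for r in b.rows if lo <= r[key] <= hi)
    return matches, blocks_read


if __name__ == "__main__":
    # Hypothetical table of 1000 tuples split into 10 blocks of 100.
    table = [{"l_orderkey": k} for k in range(1, 1001)]
    blocks = build_blocks(table, "l_orderkey", 100)
    hits, read = range_scan(blocks, "l_orderkey", 250, 349)
    print(len(hits), read)  # prints "100 2"
```

In this sketch the range query touches only 2 of the 10 blocks, because the min/max metadata rules out the other 8 without reading their tuples. How much is skipped in practice depends on how well the blocking order matches the query predicates.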
Collections
- M Tech Dissertations