Data blocking for partitioned data
Abstract
Over the last few years, the volume of data consumed and produced by applications has increased
tremendously. This thesis aims to achieve faster query processing for such data. The work
is divided into three phases: data partitioning, data blocking, and data skipping. Data
partitioning identifies hot and cold partitions of the data and stores them as separate data
blocks. Partitioned data is stored contiguously on disk and verified. Data blocking stores
the data blocks on disk such that all hot data blocks are stored together and all cold data blocks
are stored together. Data skipping is performed to reduce disk seek time while
accessing the data from disk. Data partitioning and blocking are implemented on a column-oriented
database system. Data blocking resulted in a significant reduction in the amount of data scanned and
in query response time. Query execution time (QET) was measured for three categories of
queries: range queries, nested queries, and aggregate queries. On average across
these three categories, QET became 55 times faster for partitioned data. For the same
query categories, data blocking and skipping reduced the amount of data scanned by 97% on
average, thereby accelerating queries.
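
The three phases can be illustrated with a minimal sketch. It is not the implementation from the thesis. The abstract does not say how hot and cold partitions are identified or how blocks are skipped, so this example assumes a hypothetical per-block access count with a threshold for hotness, and per-block min/max values for skipping. The names Block, layout_blocks, and range_scan are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]   # column values held by this block
    access_count: int   # assumed hotness signal (not specified in the abstract)

    @property
    def min_max(self) -> Tuple[int, int]:
        # Per-block metadata used to decide whether the block can be skipped.
        return min(self.values), max(self.values)


def layout_blocks(blocks: List[Block], hot_threshold: int) -> List[Block]:
    """Order blocks so that all hot blocks are adjacent, followed by all cold blocks."""
    hot = [b for b in blocks if b.access_count >= hot_threshold]
    cold = [b for b in blocks if b.access_count < hot_threshold]
    return hot + cold


def range_scan(blocks: List[Block], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return values in [lo, hi] and the number of blocks actually scanned."""
    matches: List[int] = []
    scanned = 0
    for block in blocks:
        bmin, bmax = block.min_max
        if bmax < lo or bmin > hi:
            continue  # skip: the block cannot contain a qualifying value
        scanned += 1
        matches.extend(v for v in block.values if lo <= v <= hi)
    return matches, scanned


blocks = [
    Block([1, 2, 3], access_count=1),
    Block([10, 11, 12], access_count=9),
    Block([20, 21, 22], access_count=0),
    Block([11, 13, 15], access_count=7),
]
laid_out = layout_blocks(blocks, hot_threshold=5)
matches, scanned = range_scan(laid_out, lo=11, hi=13)
```

In this toy input the range query touches only the two hot blocks and skips both cold ones, which is the kind of scan reduction the abstract reports, although the figures in the abstract come from the thesis experiments and not from this sketch.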
Collections
- M Tech Dissertations [923]