Study of Consistency and Performance Trade-off in Cassandra
Abstract
Cassandra is a column-oriented NoSQL database. It is a distributed database with high scalability and performance that can manage massive amounts of unstructured data. The experiments performed as part of this research analyse Cassandra's performance and investigate the trade-off between data consistency and processing time. The primary objective is to track Cassandra's performance under different consistency settings. The setup is a replicated Cassandra cluster deployed using VMware. Read and write operations are benchmarked both individually and together, and latency and throughput are measured for each consistency setting with thread counts ranging from 10 to 1000. The results show how Cassandra's performance is affected by different consistency settings under varying workloads. Based on these results, an optimal consistency setting is identified such that delays are minimised, performance is maximised and strong data consistency is guaranteed. Understanding this trade-off is necessary to quantify the effective usage of the Cassandra database. Cassandra offers tunable consistency, which allows the consistency level to be set externally for each read and write request. By taking advantage of this feature, we present results showing how Cassandra behaves under different consistency scenarios. One of our primary results is that by coordinating the consistency settings for read and write requests, it is possible to minimise Cassandra's delays while still ensuring high data consistency.
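The idea of coordinating read and write consistency settings can be illustrated with the commonly cited rule that reads are strongly consistent when the number of replicas contacted on read (R) plus the number acknowledged on write (W) exceeds the replication factor (RF). The sketch below is not taken from the dissertation. It assumes a single-datacenter cluster, and the function names `replicas_required` and `is_strongly_consistent` are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch only: maps a few Cassandra consistency levels to the
# number of replicas that must respond, assuming a single datacenter.
# The helper names below are hypothetical, not part of any Cassandra API.

def replicas_required(level: str, rf: int) -> int:
    """Return how many replicas must respond for the given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    name = level.upper()
    if name in fixed:
        needed = fixed[name]
    elif name == "QUORUM":
        needed = rf // 2 + 1  # a majority of the replicas
    elif name == "ALL":
        needed = rf
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    if needed > rf:
        raise ValueError(f"{name} needs {needed} replicas but RF is {rf}")
    return needed


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """Check the R + W > RF overlap condition for a read/write pairing."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the example
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        ok = is_strongly_consistent(read_level, write_level, rf)
        print(f"read={read_level:<6} write={write_level:<6} strong={ok}")
```

Under this rule, pairing a cheap level on one side with a stricter level on the other (for example, read at ONE and write at ALL) still satisfies the overlap condition, which is why the read and write settings are best chosen together rather than independently.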
Collections
- M Tech Dissertations