dc.contributor.advisor: Jat, P M
dc.contributor.author: Sheth, Vinay
dc.date.accessioned: 2024-08-22T05:21:18Z
dc.date.available: 2024-08-22T05:21:18Z
dc.date.issued: 2023
dc.identifier.citation: Sheth, Vinay (2023). Comparative Performance Analysis of Column Family Databases : Cassandra and HBase. Dhirubhai Ambani Institute of Information and Communication Technology. viii, 56 p. (Acc. # T01118).
dc.identifier.uri: http://drsr.daiict.ac.in//handle/123456789/1177
dc.description.abstract: Up until now, relational databases have been unquestionably the most prevalent type of databases used to handle data. The advent of cloud computing and big data has underlined the need for databases that are capable of managing and analyzing big data. By allowing storage and retrieval of structured as well as unstructured data, NoSQL databases circumvent the limitations of relational databases. Because of their support for schema flexibility, rapid data access and potential to scale up quickly, they have emerged as the favored choice for big data processing. These systems have several properties/parameters which can be tuned to achieve specific performance goals based on business needs. Having well-defined performance objectives assists us in articulating the acceptable trade-offs for our application. This motivates us to evaluate the performance of one such frequently used NoSQL system: Cassandra. Apache Cassandra is an open-source, decentralized, distributed, fault-tolerant, highly available, elastically scalable, tunably consistent, row-oriented database. In order to accomplish the performance evaluation, we use the Yahoo! Cloud Serving Benchmark (YCSB) for benchmarking efforts. Our findings highlight that increasing thread count initially improves throughput and CPU utilization but later decreases them. Higher record count, consistency level, and dataset size lead to decreased throughput and increased latency. Stronger consistency level also increases the CPU utilization. Increasing operation count improves throughput but increases latency as well. These findings provide guidance for optimizing Cassandra's performance by adjusting these parameters. We also assess Apache HBase, another well-known NoSQL database, using YCSB. The relative performance of these databases under analytical as well as update-heavy workloads is the primary focus of our investigation. Our test results demonstrate that for both workloads, Cassandra outperforms HBase in read operations, whereas HBase excels in write operations. This research quantifies the performance traits of Cassandra and HBase, assisting developers and architects in choosing the best database system for their big data applications.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Relational databases
dc.subject: SQL database
dc.subject: Apache Cassandra
dc.subject: Big data
dc.classification.ddc: 005.74 SHE
dc.title: Comparative Performance Analysis of Column Family Databases : Cassandra and HBase
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 202111032
dc.accession.number: T01118

