Introduction to Cassandra


May 17, 2021 10:00 Cassandra


Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a NoSQL database, so let's first look at what a NoSQL database is.

NoSQL database

A NoSQL database (sometimes read as "Not Only SQL") is a database that provides a mechanism for storing and retrieving data by means other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple APIs, are eventually consistent, and can handle large amounts of data.

The main objectives of a NoSQL database are:

  • Simplicity of design
  • Horizontal scaling
  • Finer control over availability

A NoSQL database uses different data structures than a relational database, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.

NoSQL databases vs. relational databases

The following table lists the points that distinguish a relational database from a NoSQL database.

Relational database                                            | NoSQL database
---------------------------------------------------------------+----------------------------------------
Supports a powerful query language.                            | Supports a very simple query language.
Has a fixed schema.                                            | Has no fixed schema.
Follows ACID (atomicity, consistency, isolation, durability).  | Offers only eventual consistency.
Supports transactions.                                         | Does not support transactions.

In addition to Cassandra, we have the following NoSQL databases that are quite popular:

  • Apache HBase - HBase is an open source, non-relational, distributed database modeled on Google's BigTable and written in Java. It is developed as part of the Apache Hadoop project and runs on top of HDFS, providing Hadoop with BigTable-like capabilities.

  • MongoDB - MongoDB is a cross-platform, document-oriented database system that avoids the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas, making data integration easier and faster in certain types of applications.

What is Apache Cassandra?

Apache Cassandra is an open source, distributed, and decentralized storage system (database) for managing very large amounts of structured data spread across the world. It provides a highly available service with no single point of failure.

Some notable points about Apache Cassandra:

  • It is scalable, fault-tolerant, and consistent.

  • It is a column-oriented database.

  • Its distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable.

  • Created at Facebook, it differs sharply from relational database management systems.

  • Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful "column family" data model.

  • Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix and others.
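The Dynamo-style replication mentioned above can be pictured with a toy token ring. The sketch below is only an illustration under stated assumptions: the `Ring` class, the `token` helper, the node names, and the use of MD5 as the hash are all hypothetical and are not Cassandra's actual partitioner or API. It shows the general idea that every node sits on the same ring, a key's hash selects a starting node, and the next nodes clockwise hold the additional replicas, so no single node acts as a coordinator for the whole cluster.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns one token and no node is special."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the node count")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the ring can be walked clockwise.
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that would hold copies of the given key."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node cluster keeping three copies of every key.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
```

Because each key lives on several distinct nodes, a read or write for `"user:42"` can still be served when one of the listed nodes is down, which is the property the "no single point of failure" claim rests on.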

Cassandra features

Cassandra has become popular because of its technical features. Here are some of them:

  • Elastic scalability - Cassandra is highly scalable; it allows you to add more hardware to accommodate more customers and more data as required.

  • Always-on architecture - Cassandra has no single point of failure and is continuously available for business-critical applications that cannot afford failures.

  • Fast linear-scale performance - Cassandra scales linearly: throughput increases as you add nodes to the cluster, so response times stay fast.

  • Flexible data storage - Cassandra accommodates all possible data formats, including structured, semi-structured, and unstructured data. It can dynamically adapt to changes in your data structures as your needs change.

  • Convenient data distribution - Cassandra gives you the flexibility to distribute data where you need it by replicating it across multiple data centers.

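  • Transaction support - Cassandra offers atomicity, isolation, and durability for writes within a single partition, together with tunable consistency, rather than the full ACID (atomicity, consistency, isolation, durability) transactions of a relational database.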

  • Fast writes - Cassandra is designed to run on cheap commodity hardware. It performs fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.
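The "flexible data storage" point, together with the "column family" data model mentioned earlier, can be pictured as a map from a row key to that row's own set of named columns. The sketch below is a toy in-memory model under stated assumptions: the `ColumnFamily` class, its `insert` and `get` methods, and the sample rows are hypothetical and are not part of any Cassandra driver or API. It only illustrates that rows in the same column family need not share the same columns, and that a row can gain new columns later without a schema change for the other rows.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy in-memory stand-in for a column family: row key -> {column: value}."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)

    def insert(self, row_key, columns):
        """Add or overwrite columns for one row; other rows are unaffected."""
        self._rows[row_key].update(columns)

    def get(self, row_key):
        """Return the row's columns, sorted by column name for readability."""
        return dict(sorted(self._rows.get(row_key, {}).items()))


users = ColumnFamily("users")
users.insert("alice", {"email": "alice@example.com", "city": "Oslo"})
users.insert("bob", {"email": "bob@example.com"})
# A column can be added to one row later without touching the others.
users.insert("bob", {"phone": "555-0100"})

print(users.get("alice"))
print(users.get("bob"))
```

Here `alice` has a `city` column that `bob` lacks, and `bob` has a `phone` column that `alice` lacks, which a fixed-schema table would only allow by adding nullable columns for every row.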

The history of Cassandra

  • Cassandra was developed at Facebook for inbox search.
  • It was open-sourced by Facebook in July 2008.
  • Cassandra was accepted into the Apache Incubator in March 2009.
  • It has been an Apache top-level project since February 2010.