Cassandra architecture


May 17, 2021 Cassandra



Cassandra is designed to handle big data workloads across multiple nodes without any single point of failure. It uses a peer-to-peer distributed system across its nodes, and data is distributed among all the nodes in a cluster.

  • All nodes in a cluster play the same role. Each node is independent and, at the same time, interconnected with the other nodes.

  • Each node in the cluster can accept read and write requests, regardless of where the data is actually located in the cluster.

  • When a node goes down, read/write requests can be served by other nodes in the network.
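The three points above can be illustrated with a toy token ring. This is only a sketch under stated assumptions: the `Ring` class, its method names, the SHA-256 based token, and the replication factor of 2 are all hypothetical choices for illustration, not Cassandra's actual partitioner or API.

```python
import hashlib
from bisect import bisect_right


class Ring:
    """Toy token ring: every node is a peer and any live node can coordinate."""

    def __init__(self, node_names, replication_factor=2):
        self.replication_factor = replication_factor
        # Place each node on the ring at a token derived from its name.
        self.tokens = sorted((self._token(name), name) for name in node_names)
        self.up = set(node_names)

    @staticmethod
    def _token(value):
        # Stand-in hash (assumption); it only needs to spread keys evenly.
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def replicas_for(self, key):
        """Walk clockwise from the key's token and collect distinct nodes."""
        start = bisect_right([t for t, _ in self.tokens], self._token(key))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

    def route(self, coordinator, key):
        """The coordinator forwards the request to the replicas that are up."""
        if coordinator not in self.up:
            raise ConnectionError(f"{coordinator} is down, pick another node")
        return [n for n in self.replicas_for(key) if n in self.up]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas_for("user:42")
ring.up.discard(owners[0])        # simulate one replica going down
coordinator = sorted(ring.up)[0]  # any live node will do
print(ring.route(coordinator, "user:42"))
```

Because every node knows the ring layout, the client does not need to know where the data lives, and the request still succeeds through the surviving replica.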

Data replication in Cassandra

In Cassandra, one or more nodes in the cluster act as replicas for a given piece of data. If some of the nodes are detected to have responded with an out-of-date value, Cassandra returns the most recent value to the client. After returning the most recent value, Cassandra performs a read repair in the background to update the stale values.
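A minimal sketch of this read-repair idea, assuming each stored value carries a write timestamp and that the newest timestamp wins. The `Replica` class and `read_with_repair` function are hypothetical names, and the repair runs inline here rather than in the background.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


def read_with_repair(replicas, key):
    """Return the newest value and overwrite stale or missing copies."""
    cells = [r.data[key] for r in replicas if key in r.data]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell[0])  # highest timestamp wins
    for replica in replicas:
        if replica.data.get(key) != newest:
            replica.data[key] = newest  # the "repair" step
    return newest[1]


a = Replica("a", {"k": (2, "new")})
b = Replica("b", {"k": (1, "old")})
print(read_with_repair([a, b], "k"), b.data["k"])
```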

The following illustration shows how Cassandra uses data replication among the nodes in a cluster to ensure that there is no single point of failure.

[Figure: Cassandra architecture]

Note - Cassandra uses the Gossip protocol in the background, allowing nodes to communicate with each other and detect any failed nodes in the cluster.
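To give a feel for how gossip can surface failed nodes, here is a toy heartbeat exchange. It is a sketch under loud assumptions: `GossipNode`, `run_rounds`, the random partner choice, and the fixed timeout of 5 rounds are all invented for illustration, and Cassandra's real failure detection is more sophisticated than a fixed threshold.

```python
import random


class GossipNode:
    def __init__(self, name, peers):
        self.name = name
        self.alive = True
        # peer name -> [highest heartbeat seen, round in which it last increased]
        self.view = {p: [0, 0] for p in peers}

    def tick(self, round_no):
        """Bump this node's own heartbeat for the current round."""
        entry = self.view[self.name]
        entry[0] += 1
        entry[1] = round_no

    def merge(self, other_view, round_no):
        """Adopt any heartbeat that is newer than what this node has seen."""
        for peer, (beat, _) in other_view.items():
            if beat > self.view[peer][0]:
                self.view[peer] = [beat, round_no]

    def suspected(self, round_no, timeout=5):
        """Peers whose heartbeat has not advanced for more than `timeout` rounds."""
        return sorted(p for p, (_, seen) in self.view.items()
                      if p != self.name and round_no - seen > timeout)


def run_rounds(nodes, first_round, last_round, rng):
    for round_no in range(first_round, last_round + 1):
        live = [n for n in nodes if n.alive]
        for node in live:
            node.tick(round_no)
        for node in live:
            others = [n for n in live if n is not node]
            if not others:
                continue
            partner = rng.choice(others)
            # Exchange state in both directions.
            partner.merge(node.view, round_no)
            node.merge(partner.view, round_no)


names = ["a", "b", "c", "d"]
nodes = [GossipNode(n, names) for n in names]
rng = random.Random(0)
run_rounds(nodes, 1, 10, rng)
nodes[3].alive = False          # node "d" crashes
run_rounds(nodes, 11, 30, rng)
print(nodes[0].suspected(30))
```

No node is told directly that "d" has failed; each one infers it because no newer heartbeat for "d" arrives through any peer.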

Cassandra's components

The key components of Cassandra are as follows:

  • Node - It is where data is stored.

  • Data Center - It is a collection of related nodes.

  • Cluster - A cluster is a component that contains one or more data centers.

  • Commit log - The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.

  • Mem-table - A mem-table is a memory-resident data structure. After a write has been recorded in the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be more than one mem-table.

  • SSTable - It is a disk file to which data is flushed from the mem-table when its contents reach a threshold.

  • Bloom filter - Bloom filters are fast, probabilistic structures for testing whether an element is a member of a set; they can report a false positive but never a false negative. A Bloom filter is a special kind of cache. Bloom filters are accessed after every query.
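Of the components above, the Bloom filter is the easiest to show in isolation. The following is a minimal sketch, assuming a 1024-bit array and three salted SHA-256 hashes; the class name, method names, and sizes are illustrative choices, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


keys = BloomFilter()
keys.add("alice")
print(keys.might_contain("alice"))
```

A negative answer lets a reader skip a disk file entirely, which is why such a filter is cheap to consult on every query.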

Cassandra query language

Users can access Cassandra through its nodes using the Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers work with CQL either through cqlsh, a command-line prompt, or through a separate application language driver.

Clients can approach any node for their read and write operations. That node (the coordinator) acts as a proxy between the client and the nodes that hold the data.

Write operations

Every write to a node is first recorded in the commit log on that node. The data is then stored in the mem-table. Whenever the mem-table is full, the data is flushed to an SSTable data file. All writes are automatically partitioned and replicated throughout the cluster. Cassandra periodically consolidates the SSTables, discarding unnecessary data.
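The write path described above can be sketched on a single node as follows. This is a simplified model under stated assumptions: `ToyNode` and its methods are hypothetical, the mem-table is "full" after a fixed number of keys, SSTables are in-memory tuples rather than files, and the only "unnecessary data" modeled is overwritten values.

```python
class ToyNode:
    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # append-only record of writes not yet flushed
        self.memtable = {}    # in-memory, latest value per key
        self.sstables = []    # immutable "files", oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write first
        self.memtable[key] = value            # 2. then update the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when it is full

    def flush(self):
        # Rows are stored sorted by key, then the in-memory state is reset.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        if not self.sstables:
            return
        merged = {}
        for table in self.sstables:  # later tables override earlier ones
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


node = ToyNode(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
node.compact()
print(node.sstables)
```

Writing to the log before the mem-table is what makes crash recovery possible: anything in memory that was not yet flushed can be replayed from the log.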

Read operations

During a read operation, Cassandra gets values from the mem-table and checks the Bloom filter to find the appropriate SSTable that holds the required data.
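A rough sketch of that lookup order, assuming the newest copy wins and that each SSTable carries its own filter. The `SSTable` class, the `read` function, and the single-hash 64-bit filter built on `zlib.crc32` are simplifications invented for this example.

```python
import zlib


class SSTable:
    def __init__(self, rows, filter_bits=64):
        self.rows = dict(rows)
        self.filter_bits = filter_bits
        self.filter = 0       # tiny single-hash Bloom filter
        self.disk_reads = 0   # counts simulated disk accesses
        for key in self.rows:
            self.filter |= 1 << self._bit(key)

    def _bit(self, key):
        return zlib.crc32(key.encode()) % self.filter_bits

    def might_contain(self, key):
        return bool((self.filter >> self._bit(key)) & 1)

    def get(self, key):
        self.disk_reads += 1
        return self.rows.get(key)


def read(memtable, sstables, key):
    """Check the mem-table first, then only SSTables whose filter says maybe."""
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):  # newest SSTable first
        if table.might_contain(key):
            value = table.get(key)
            if value is not None:
                return value
    return None


older = SSTable({"b": "old-b", "c": "c1"})
newer = SSTable({"b": "new-b"})
print(read({"a": "mem"}, [older, newer], "b"))
```

The filter check is what keeps a read from touching every file: an SSTable whose filter rules the key out is never opened.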