Coding With Fun

Zookeeper Overview


May 26, 2021 Zookeeper




ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing services in a distributed environment is a complex process. ZooKeeper solves this problem with its simple architecture and API. It lets developers focus on core application logic without worrying about the distributed nature of the application.

The ZooKeeper framework was originally built at Yahoo! to give their applications simple and robust access to shared state. Apache ZooKeeper later became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.

Before going further, it helps to understand a few things about distributed applications, so let's start with a quick overview.

Distributed applications

A distributed application runs on multiple systems in a network at the same time. The systems coordinate among themselves to complete a particular task quickly and efficiently. A complex, time-consuming task can take a non-distributed application (running on a single system) several hours. A distributed application can often finish it in minutes by using the computing power of all the systems involved.

Configuring a distributed application to run on more systems can reduce the completion time further. The group of systems on which a distributed application runs is called a cluster, and each machine in the cluster is called a node.

A distributed application has two parts: server applications and client applications. The server applications are the distributed part and share a common interface, so a client can connect to any server in the cluster and get the same result. Client applications are the tools used to interact with the distributed application.

Zookeeper Overview

ZooKeeper is an open-source framework that provides distributed coordination services. It is mainly used to keep application systems consistent across a distributed cluster. For example, it helps avoid dirty reads caused by several clients operating on the same data at the same time. At its core, ZooKeeper is a distributed store for small files. It organizes data as a tree, much like a file system, and manages the nodes of that tree efficiently. Clients can store data in the tree and monitor it for state changes, and cluster management can be built on those change notifications. Typical uses include:

- a unified naming service (as in Dubbo)
- distributed configuration management (for example, centralized configuration for Solr)
- distributed message queues (publish/subscribe)
- distributed locks
- general distributed coordination
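To make the "file-system-like tree" concrete, here is a minimal in-memory sketch of a ZooKeeper-style hierarchical namespace. It is a toy, single-process model, not the ZooKeeper client API. The class `ZNodeTree` and the exceptions `NodeExistsError` / `NoNodeError` are hypothetical names, although the operations loosely mirror ZooKeeper's create, getData, setData and getChildren.

```python
class NodeExistsError(Exception):
    """Hypothetical: a znode already exists at this path."""


class NoNodeError(Exception):
    """Hypothetical: no znode exists at this path."""


class ZNodeTree:
    """Toy model of a ZooKeeper-style namespace (hypothetical, not the real API)."""

    def __init__(self):
        # Every path maps to its (small) data payload; the root always exists.
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise NodeExistsError(path)
        # As in ZooKeeper, a node can only be created under an existing parent.
        if self._parent(path) not in self._data:
            raise NoNodeError(self._parent(path))
        self._data[path] = data
        return path

    def get_data(self, path):
        if path not in self._data:
            raise NoNodeError(path)
        return self._data[path]

    def set_data(self, path, data):
        if path not in self._data:
            raise NoNodeError(path)
        self._data[path] = data

    def get_children(self, path):
        if path not in self._data:
            raise NoNodeError(path)
        prefix = path.rstrip("/") + "/"
        # Direct children only: strip the prefix and skip deeper descendants.
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
print(tree.get_children("/app"))  # ['config', 'workers']
```

Each path holds only a small payload, which matches the "small file storage" description. Paths give the naming-service use case a natural home, for example `/app/workers/<name>`.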

ZooKeeper architecture diagram

  • Leader: The sole scheduler and processor of transactional requests (write operations) in a ZooKeeper cluster, which guarantees that cluster transactions are applied in order. Write requests such as create, setData, and delete must all be forwarded to the leader. The leader assigns each one a sequence number and then executes it; this process is called a transaction.
  • Follower: Handles non-transactional (read) requests from clients, forwards transactional requests to the leader, and votes in leader elections. A voting ensemble is normally sized with an odd number of servers (2n-1). A ZooKeeper cluster that receives heavy read traffic can also add observers.
  • Observer: Observes the latest state changes in the ZooKeeper cluster and synchronizes them. An observer handles non-transactional requests on its own and forwards transactional requests to the leader, but does not take part in any form of voting. Observers are typically added to increase the cluster's capacity for non-transactional requests without affecting its transaction-processing capacity.
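The voting rules above reduce to majority arithmetic. The sketch below does only the counting and ignores the actual messages of Zab, ZooKeeper's atomic broadcast protocol. It computes how many voting members (leader plus followers; observers excluded) must acknowledge a write, and how many voting members can fail before the ensemble stops working. The function names are hypothetical.

```python
def quorum_size(voters: int) -> int:
    """Smallest strict majority of the voting members (leader + followers)."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting members can fail while a majority is still reachable."""
    return voters - quorum_size(voters)


def write_commits(voter_acks: int, voters: int) -> bool:
    """A proposal commits once a majority of voters (leader included) acknowledges it.
    Observers are not voters, so their acknowledgements are never counted here."""
    return voter_acks >= quorum_size(voters)


for voters in range(3, 8):
    print(f"{voters} voters: quorum {quorum_size(voters)}, "
          f"survives {tolerated_failures(voters)} failure(s)")
```

The output shows that 4 voters survive only one failure, the same as 3, while 5 survive two. In general, 2n-1 voters tolerate n-1 failures, which is why odd ensemble sizes are preferred. Adding observers changes none of these numbers. That is exactly why they can scale reads without slowing down writes.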

Master-slave and master-standby in ZooKeeper:

  • Master-slave: There are few master nodes and many slave nodes. The master assigns tasks and the slaves carry them out.
  • Master-standby: A primary node plus backup nodes. This pattern mainly addresses how to choose a new primary after the current one fails, so that the service as a whole does not go down.
  • In practice, the line between master-slave and master-standby is often blurred, and the two patterns frequently appear together.
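A common way to build the master-standby pattern on ZooKeeper is the leader-election recipe. Each candidate creates an ephemeral sequential znode under an election path, and the candidate with the lowest sequence number acts as master. An ephemeral znode disappears when its owner's session ends, so a crashed master automatically hands the role to the next candidate. The sketch below simulates only that ordering logic in memory. `ElectionPath`, `join`, `leave`, and `master` are hypothetical names, and real sessions, watches, and network failures are not modeled.

```python
class ElectionPath:
    """In-memory stand-in for an election znode such as /election (hypothetical)."""

    def __init__(self):
        self._counter = 0      # ZooKeeper keeps a sequence counter per parent znode
        self._children = {}    # child znode name -> owning candidate

    def join(self, candidate: str) -> str:
        # Mimics creating an EPHEMERAL_SEQUENTIAL child: a 10-digit zero-padded suffix.
        name = f"n_{self._counter:010d}"
        self._counter += 1
        self._children[name] = candidate
        return name

    def leave(self, name: str) -> None:
        # Mimics the ephemeral node vanishing when its owner's session ends.
        self._children.pop(name, None)

    def master(self):
        # Lowest sequence number wins; zero padding makes string order numeric order.
        if not self._children:
            return None
        return self._children[min(self._children)]


election = ElectionPath()
first = election.join("server-a")
election.join("server-b")
election.join("server-c")
print(election.master())  # server-a
election.leave(first)     # the master crashes and its ephemeral node disappears
print(election.master())  # server-b
```

In the real recipe, each standby watches only the znode just ahead of its own instead of re-reading the whole list. This avoids waking every candidate each time one node goes away.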

ZooKeeper's features

  1. Global data consistency: Each server keeps an identical copy of the data, so the data a client sees is the same no matter which server it connects to.
  2. Reliability: If a message is accepted by one server, it will be accepted by all servers.
  3. Ordering: This includes both global order and partial order. Global order means that if message a is published before message b on one server, a is published before b on all servers. Partial order means that if the same sender publishes message b after message a, a is always ordered before b.
  4. Atomic updates: A data update either succeeds or fails, with no intermediate state.
  5. Timeliness: ZooKeeper guarantees that a client receives updates from the server, or learns of a server failure, within a bounded time interval.
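Atomic updates combined with version numbers let clients change shared data safely. In the real client API, `setData` takes an expected version and fails with a "bad version" error if the znode changed since it was read; passing -1 skips the check. The sketch below models only that check in memory. It shows how an optimistic read-modify-write loop avoids lost updates when several writers race. `VersionedNode`, `BadVersionError`, and `increment` are hypothetical names.

```python
import threading


class BadVersionError(Exception):
    """Expected version did not match (modeled on ZooKeeper's BADVERSION error)."""


class VersionedNode:
    """In-memory znode holding an int and a version counter (hypothetical)."""

    def __init__(self, data: int = 0):
        self._data = data
        self._version = 0
        # The lock stands in for the leader applying writes one at a time.
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            return self._data, self._version

    def set(self, data: int, expected_version: int) -> int:
        with self._lock:
            # -1 skips the check, as in ZooKeeper's setData.
            if expected_version not in (-1, self._version):
                raise BadVersionError(f"expected {expected_version}, found {self._version}")
            # All-or-nothing: data and version change together.
            self._data = data
            self._version += 1
            return self._version


def increment(node: VersionedNode) -> None:
    """Optimistic read-modify-write: retry until no other writer got in between."""
    while True:
        value, version = node.get()
        try:
            node.set(value + 1, version)
            return
        except BadVersionError:
            continue  # someone else updated the node; re-read and try again


counter = VersionedNode()


def worker() -> None:
    for _ in range(1000):
        increment(counter)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.get())  # (4000, 4000): no increment was lost
```

Without the version check, two writers could both read the same value and both write value + 1, losing one update. This is the race condition described among the challenges of distributed applications.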

Benefits of distributed applications

  • Reliability - Failure of one or more systems does not cause the entire system to fail.

  • Scalability - Performance can be increased when needed by adding more machines, with only minor changes to the application configuration and no downtime.

  • Transparency - Hides the complexity of the system and displays it as a single entity/application.

The challenge of distributed applications

  • Race conditions - Two or more machines try to perform a particular task that should be performed by only one machine at any given time. For example, a shared resource should be modified by only one machine at any given time.

  • Deadlock - Two or more operations wait for each other to complete indefinitely.

  • Inconsistency - Partial failure of data.

What is Apache ZooKeeper?

Apache ZooKeeper is a service used by a cluster (a group of nodes) to coordinate among its members and maintain shared data through robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for writing distributed applications.

ZooKeeper offers the following common services:

  • Naming service - Identifies the nodes in a cluster by name. It is similar to DNS, but for nodes.

  • Configuration management - Provides the latest, up-to-date system configuration information to a node that joins the cluster.

  • Cluster management - Tracks nodes joining or leaving the cluster and node status in real time.

  • Election Algorithm - Elects a node as the leader for coordination purposes.

  • Locking and synchronization service - Locks data while it is being modified. This mechanism helps with automatic failure recovery when connecting to other distributed applications, such as Apache HBase.

  • Highly reliable data registry - Data remains available even when one or more nodes are down.
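Configuration management is usually built on watches. In ZooKeeper, a standard watch left by a read such as getData is a one-time trigger: it fires on the next change and must then be set again. The sketch below models that behavior in memory to show how every node ends up with the latest configuration. `ConfigNode`, `ConfigClient`, and their methods are hypothetical names. Notifications here are delivered synchronously instead of over the network.

```python
from typing import Callable, List, Optional


class ConfigNode:
    """In-memory model of one config znode with one-shot data watches (hypothetical)."""

    def __init__(self, data: str):
        self._data = data
        self._watchers: List[Callable[[], None]] = []

    def get_data(self, watch: Optional[Callable[[], None]] = None) -> str:
        # A read can leave a watch behind, like a getData call with a watcher.
        if watch is not None:
            self._watchers.append(watch)
        return self._data

    def set_data(self, data: str) -> None:
        self._data = data
        # Each watch fires once and is then cleared (a one-time trigger).
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()


class ConfigClient:
    """A cluster node that keeps its local copy of the configuration current."""

    def __init__(self, node: ConfigNode):
        self.node = node
        self.config: Optional[str] = None
        self.refresh()

    def refresh(self) -> None:
        # Re-read the data and re-register the watch in the same call,
        # so the next change triggers another notification.
        self.config = self.node.get_data(watch=self.refresh)


shared = ConfigNode("timeout=30")
clients = [ConfigClient(shared) for _ in range(3)]
shared.set_data("timeout=60")
shared.set_data("timeout=90")
print([c.config for c in clients])  # ['timeout=90', 'timeout=90', 'timeout=90']
```

Re-reading the data inside the callback, rather than trusting the notification alone, means a client always ends up with the current value even if several changes happen close together.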

Distributed applications offer many benefits, but they also pose complex and difficult challenges. The ZooKeeper framework provides a complete mechanism to overcome these challenges. Race conditions and deadlocks are handled with a fail-safe synchronization approach. Another major drawback, data inconsistency, is resolved through atomicity.

The benefits of ZooKeeper

Here are the benefits of using ZooKeeper:

  • A simple distributed coordination process

  • Synchronization - Mutual exclusion and cooperation between server processes. This helps Apache HBase with configuration management.

  • Ordered messages

  • Serialization - Encodes data according to specific rules so that the application runs consistently. This approach can be used in MapReduce to coordinate the queue of running threads.

  • Reliability

  • Atomicity - A data transfer either succeeds completely or fails completely; no transaction is ever partially applied.
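Ordered messages and atomic deletes are what make the queue use case workable. In a common ZooKeeper queue recipe, producers create sequential znodes under a queue path. Consumers list the children, take the one with the lowest sequence number, and delete it. Only one delete of a given znode can succeed, so two consumers never take the same item. The in-memory sketch below illustrates this idea. `SequentialQueue`, `put`, and `take` are hypothetical names, and the lock stands in for the server applying changes one at a time.

```python
import threading
from typing import Dict, Optional


class SequentialQueue:
    """In-memory model of a queue znode whose children are sequential nodes (hypothetical)."""

    def __init__(self):
        self._counter = 0
        self._children: Dict[str, str] = {}
        # The lock models the server applying one create/delete at a time.
        self._lock = threading.Lock()

    def put(self, item: str) -> str:
        # Like creating a PERSISTENT_SEQUENTIAL child: the name carries a
        # zero-padded, monotonically increasing suffix.
        with self._lock:
            name = f"qn-{self._counter:010d}"
            self._counter += 1
            self._children[name] = item
            return name

    def _delete(self, name: str) -> Optional[str]:
        # Only one delete of a given child can succeed; later attempts find nothing.
        with self._lock:
            return self._children.pop(name, None)

    def take(self) -> Optional[str]:
        while True:
            with self._lock:
                names = sorted(self._children)  # like getChildren, then sort
            if not names:
                return None
            for name in names:
                item = self._delete(name)
                if item is not None:
                    return item
            # Another consumer took every listed child first; list again.


q = SequentialQueue()
for job in ("map-1", "map-2", "reduce-1"):
    q.put(job)
print(q.take(), q.take(), q.take(), q.take())  # map-1 map-2 reduce-1 None
```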