May 26, 2021 ZooKeeper
ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing services in a distributed environment is a complex process. ZooKeeper solves this problem with its simple architecture and API. ZooKeeper allows developers to focus on core application logic without worrying about the distributed nature of the application.
The ZooKeeper framework was originally built at Yahoo! to access their applications in a simple and robust way. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.
Before we go any further, it's important to understand a thing or two about distributed applications. So let's start with a quick overview of distributed applications.
Distributed applications can run on multiple systems in a network at the same time, coordinating among themselves to complete a specific task quickly and efficiently. Typically, a complex and time-consuming task that would take several hours to complete with a non-distributed application (running on a single system) can be done in minutes by a distributed application that uses the computing power of all the systems involved.
The time to complete a task can be reduced further by configuring the distributed application to run on more systems. The set of systems on which a distributed application runs is called a cluster, and each machine running in the cluster is called a node.
A distributed application has two parts: the server application and the client application. Server applications are actually distributed and share a common interface, so that clients can connect to any server in the cluster and get the same results. Client applications are the tools used to interact with a distributed application.
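To make the "connect to any server" idea concrete, here is a minimal Python sketch, assuming every server holds an identical copy of the data. The connection string follows ZooKeeper's comma-separated `host:port` format, but the host names, the `is_up` callback, and the `read_from_any` helper are hypothetical stand-ins for real network connections, and the parser ignores the optional chroot suffix. In a real ensemble, a read served by a follower can lag slightly behind the leader, which this toy model does not show.

```python
import random

def parse_connect_string(connect):
    """Split a 'host:port,host:port' string into (host, port) tuples (no chroot support)."""
    servers = []
    for entry in connect.split(","):
        host, port = entry.strip().rsplit(":", 1)
        servers.append((host, int(port)))
    return servers

def read_from_any(servers, is_up, replica, path, rng=random):
    """Try servers in shuffled order and return (server, value) from the first reachable one.

    is_up:   hypothetical callback standing in for a real connection attempt.
    replica: a single dict used for every server, modelling identical replicated data.
    """
    order = list(servers)
    rng.shuffle(order)  # spread clients across the cluster instead of always hitting the first host
    for server in order:
        if is_up(server):
            return server, replica[path]
    raise ConnectionError("no server in the cluster is reachable")

# Hypothetical three-server cluster with zk1 down.
servers = parse_connect_string("zk1:2181, zk2:2181, zk3:2181")
down = {("zk1", 2181)}
replica = {"/app/config": "v42"}
server, value = read_from_any(servers, lambda s: s not in down, replica, "/app/config")
print(server, value)  # whichever of zk2/zk3 was tried first, always with value v42
```

Because every reachable server returns the same value, the client does not need to care which one it landed on, which is the property the paragraph above describes.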
Distributed applications offer the following benefits:
Reliability - Failure of one or more systems does not cause the entire system to fail.
Scalability - Performance can be increased as and when needed by adding more machines, with only minor changes to the application's configuration and no downtime.
Transparency - Hides the complexity of the system and presents it as a single entity/application.
Distributed applications also face the following challenges:
Race conditions - Two or more machines try to perform a specific task that should be performed by only one machine at any given time. For example, a shared resource should be modified by only one machine at any given time.
Deadlock - Two or more operations wait for each other to complete indefinitely.
Inconsistency - Data is left partially updated when part of an operation fails.
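A race condition usually shows up as a lost update: two machines read the same value, both add one, and one write silently overwrites the other. The minimal sketch below uses threads in one process to stand in for machines and a hypothetical `VersionedStore` class to stand in for a piece of shared data. Its conditional write is modelled on the expected-version check that ZooKeeper's `setData` call performs, but this is an in-memory illustration, not the ZooKeeper API.

```python
import threading

class VersionedStore:
    """Toy stand-in for one shared data item: a value plus a version counter."""

    def __init__(self, data):
        self._lock = threading.Lock()  # models the service applying one write at a time
        self.data = data
        self.version = 0

    def get(self):
        with self._lock:
            return self.data, self.version

    def set_if_version(self, new_data, expected_version):
        """Write only if nobody else has written since the caller read the value."""
        with self._lock:
            if self.version != expected_version:
                return False  # someone else won the race; caller must re-read and retry
            self.data = new_data
            self.version += 1
            return True

def increment(store, times):
    for _ in range(times):
        while True:  # retry until our conditional write succeeds
            value, version = store.get()
            if store.set_if_version(value + 1, version):
                break

store = VersionedStore(0)
workers = [threading.Thread(target=increment, args=(store, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(store.data)  # 4000: no increment is lost
```

The design choice here is optimistic concurrency: writers never block each other while computing, and a stale writer simply retries, so only one machine's modification takes effect per version.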
Apache ZooKeeper is a service used by a cluster (a group of nodes) to coordinate among its members and maintain shared data through robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for writing distributed applications.
ZooKeeper offers the following common services:
Naming service - Identifies the nodes in a cluster by name. It is similar to DNS, but for nodes.
Configuration management - Provides the latest, up-to-date system configuration information to a node when it joins.
Cluster management - Tracks nodes joining and leaving the cluster, and node status, in real time.
Leader election - Elects a node as the leader for coordination purposes.
Locking and synchronization service - Locks data while it is being modified. This mechanism helps with automatic failure recovery when connecting to other distributed applications such as Apache HBase.
Highly reliable data registry - Data remains available even when one or more nodes are down.
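Leader election and locking are commonly built in ZooKeeper from ephemeral sequential znodes: each candidate creates one under a shared parent, the candidate with the lowest sequence number is the leader (or lock holder), and when its session ends its ephemeral node disappears so the next-lowest takes over. The sketch below models only that rule in memory; `ElectionNamespace`, the `/election` path, and the `n_` prefix are hypothetical, and no real ZooKeeper client is involved.

```python
class ElectionNamespace:
    """Toy model of an election parent node whose children are ephemeral sequential nodes."""

    def __init__(self, path="/election"):
        self.path = path
        self.next_seq = 0
        self.children = {}  # child name -> session (candidate) that created it

    def create_ephemeral_sequential(self, session):
        name = f"n_{self.next_seq:010d}"  # zero-padded so string order matches numeric order
        self.next_seq += 1
        self.children[name] = session
        return f"{self.path}/{name}"

    def expire_session(self, session):
        """Ephemeral nodes are removed when the session that created them ends."""
        self.children = {n: s for n, s in self.children.items() if s != session}

    def leader(self):
        """The owner of the lowest-numbered child is the leader (or lock holder)."""
        if not self.children:
            return None
        return self.children[min(self.children)]

ns = ElectionNamespace()
for candidate in ["node-a", "node-b", "node-c"]:
    ns.create_ephemeral_sequential(candidate)
print(ns.leader())  # node-a
ns.expire_session("node-a")  # the leader crashes or disconnects
print(ns.leader())  # node-b
```

In a real deployment each candidate would typically watch only the node just before its own, so that a leader failure wakes one successor rather than every candidate; the toy model skips watches entirely.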
Distributed applications offer many benefits, but they also present complex and difficult challenges. The ZooKeeper framework provides a complete mechanism to overcome these challenges. Race conditions and deadlocks are handled using a fail-safe synchronization approach. Another major drawback is data inconsistency, which ZooKeeper resolves with atomicity.
Here are the benefits of using ZooKeeper:
A simple distributed coordination process
Synchronization - Mutual exclusion and cooperation between server processes. This helps Apache HBase with configuration management.
Ordered messages
Serialization - Encodes data according to specific rules, ensuring that your application runs consistently. This approach can be used in MapReduce to coordinate queues for executing running threads.
Reliability
Atomicity - A data transfer either succeeds completely or fails completely; no transaction is ever partial.
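The atomicity guarantee can be pictured as a batch of updates that is applied in full or not at all. ZooKeeper's client API offers a `multi` operation with this all-or-nothing behaviour; the minimal sketch below only mimics the idea on a plain dictionary. The `(kind, path, value)` tuples, `CheckFailed`, and `apply_atomically` are hypothetical names for illustration.

```python
class CheckFailed(Exception):
    """Raised when one operation in a batch cannot be applied."""

def apply_atomically(store, ops):
    """Return a new store with every (kind, path, value) op applied, or raise and change nothing.

    All work happens on a copy, so a failure part-way through never leaves a partial update.
    """
    staged = dict(store)
    for kind, path, value in ops:
        if kind == "create":
            if path in staged:
                raise CheckFailed(f"{path} already exists")
            staged[path] = value
        elif kind == "set":
            if path not in staged:
                raise CheckFailed(f"{path} does not exist")
            staged[path] = value
        elif kind == "delete":
            if path not in staged:
                raise CheckFailed(f"{path} does not exist")
            del staged[path]
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return staged

store = {"/app/config": "v1"}
store = apply_atomically(store, [("set", "/app/config", "v2"), ("create", "/app/ready", "yes")])
try:
    # The first op would succeed, but the second fails, so neither is applied.
    store = apply_atomically(store, [("set", "/app/config", "v3"), ("create", "/app/ready", "again")])
except CheckFailed as err:
    print("rejected:", err)  # rejected: /app/ready already exists
print(store)  # {'/app/config': 'v2', '/app/ready': 'yes'}
```

Note that `/app/config` stays at `v2` after the rejected batch even though the `set` in that batch was valid on its own: that is exactly the "no partial transaction" property.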