
Hadoop reliability


May 26, 2021 Hadoop


HDFS - Reliability

There are a few key points to HDFS reliability:

  • Redundant replica policy
  • Rack strategy
  • Heartbeat mechanism
  • Safe mode
  • Checksums
  • Recycle bin
  • Metadata protection
  • The snapshot mechanism

1. Redundant replica policy

You can set the replication factor (the number of copies of each data block) in hdfs-site.xml

Every data block has replicas

When a DataNode starts, it traverses its local file system, produces a list of the correspondence between HDFS data blocks and local files (the block report), and reports it to the NameNode
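
As a minimal sketch, the replication factor is set with the dfs.replication property (3 is the shipped default):

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>   <!-- number of replicas kept for each data block -->
    </property>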

2. Rack strategy

HDFS's "rack awareness" sends a packet between nodes to sense whether they are in the same rack

Typically, one copy is stored in the rack and one more in the other rack, which prevents data loss in the event of rack failure and increases bandwidth utilization
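
As a sketch of how the rack mapping is usually supplied, core-site.xml points at a topology script written by the administrator (the script path and rack names below are made up for illustration):

    <!-- core-site.xml -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/topology.sh</value>
    </property>

The script receives one or more IPs or hostnames as arguments and must print one rack path per argument:

    #!/bin/bash
    # topology.sh: map each node argument to a rack path (hypothetical ranges)
    for node in "$@"; do
      case "$node" in
        10.0.1.*) echo "/rack1" ;;
        10.0.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done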

3. Heartbeat mechanism

The NameNode periodically receives heartbeat messages and block reports from each DataNode

The NameNode validates its metadata against the block reports

A DataNode that fails to send heartbeats on time is marked as down, and no further I/O requests are sent to it

If a DataNode failure causes the replica count of some blocks to fall below the preset value, the NameNode detects those blocks and re-replicates them at an appropriate time

Other reasons for re-replication include corruption of a replica itself, disk errors, an increase in the replication factor, and so on
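
The heartbeat timing is configurable in hdfs-site.xml; the values below are the usual defaults (3-second heartbeats, 5-minute recheck interval):

    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value>            <!-- DataNode heartbeat interval, in seconds -->
    </property>
    <property>
      <name>dfs.namenode.heartbeat.recheck-interval</name>
      <value>300000</value>       <!-- NameNode recheck interval, in milliseconds -->
    </property>

With these defaults, a DataNode is commonly declared dead after about 2 × recheck interval + 10 × heartbeat interval, i.e. roughly 10 minutes 30 seconds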

4. Safe mode

The NameNode goes through a "safe mode" phase when it starts

No data writes are allowed during the safe mode phase

In this phase the NameNode collects block reports from the individual DataNodes; a block is considered "safe" once it reaches the minimum number of replicas

After a certain (configurable) percentage of data blocks have been confirmed "safe", and after a further short delay, safe mode ends

When blocks with an insufficient number of replicas are detected, those blocks are copied until the minimum replica count is reached
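
Safe mode can also be inspected and controlled by hand with hdfs dfsadmin (the "safe" percentage mentioned above is the dfs.namenode.safemode.threshold-pct property, 0.999 by default):

    $ hdfs dfsadmin -safemode get     # report whether safe mode is on
    $ hdfs dfsadmin -safemode enter   # put the NameNode into safe mode
    $ hdfs dfsadmin -safemode leave   # force the NameNode out of safe mode
    $ hdfs dfsadmin -safemode wait    # block until safe mode ends on its own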

5. Checksums

When a file is created, a checksum is computed for each data block

The checksums are saved in the namespace as separate hidden files

When a client fetches data, it recomputes the checksums and compares them with the stored ones, so a corrupt block can be detected

If the block being read turns out to be corrupt, the client simply continues by reading another replica
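
From the client side you can ask for a file's checksum directly; hadoop fs -checksum is a standard command, though the path here is hypothetical. The size of each checksummed chunk is governed by the dfs.bytes-per-checksum property (512 bytes by default):

    $ hadoop fs -checksum /data/example.txt   # print the stored checksum info for the file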

6. Recycle bin

When you delete a file, it is actually moved to a trash directory (/user/<username>/.Trash by default)

Files in the recycle bin can be recovered quickly

You can set a time threshold; when files have been in the recycle bin longer than this value, they are deleted permanently and the data blocks they occupied are released
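
As a sketch of the relevant knobs: the retention time is fs.trash.interval in core-site.xml, in minutes (0 disables the trash entirely); the file paths in the shell commands are made up for illustration:

    <!-- core-site.xml -->
    <property>
      <name>fs.trash.interval</name>
      <value>1440</value>   <!-- keep trashed files for one day -->
    </property>

    $ hadoop fs -rm /data/old.log                                  # moves the file into the trash
    $ hadoop fs -mv /user/alice/.Trash/Current/data/old.log /data/ # restore it from the trash
    $ hadoop fs -expunge                                           # purge expired trash checkpoints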

7. Metadata protection

The namespace image (fsimage) and the edit log are the NameNode's core data. You can configure the NameNode to keep multiple copies of them
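
For example (a sketch; the directory paths are hypothetical), dfs.namenode.name.dir accepts a comma-separated list, and the NameNode writes the fsimage and edit log to every directory in the list:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/data/1/dfs/nn,/data/2/dfs/nn</value>
    </property>

A common practice is to make one of these directories an NFS mount, so a copy of the metadata survives the loss of the NameNode machine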

The extra copies slow the NameNode down slightly, but they increase safety

The NameNode is still a single point of failure; if it fails, you have to switch over to a backup manually