May 26, 2021 Hadoop
There are a few key points to HDFS reliability:
You can set the replication factor (the number of copies of each block) in hdfs-site.xml
Every data block can be replicated
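As a rough illustration of that setting, the sketch below renders the property entry that would go inside hdfs-site.xml. `dfs.replication` is the standard property name; the helper `replication_property` is a hypothetical name used only for this example, and the value 3 is just a sample.

```python
import xml.etree.ElementTree as ET

def replication_property(factor):
    """Render a <property> element that sets dfs.replication to `factor`."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(factor)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")

snippet = replication_property(3)
print(snippet)
```

The printed element is meant to be placed inside the `<configuration>` element of hdfs-site.xml.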
When a DataNode starts, it traverses its local file system, builds a list mapping HDFS blocks to local files (the block report), and reports it to the NameNode
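The startup scan can be sketched roughly as follows. This is a simplified model, not the DataNode's actual code: it assumes block data files are named `blk_<id>` and that companion `.meta` files should be skipped, and `build_block_report` is a hypothetical helper name.

```python
import os
import re
import tempfile

# Assumed naming: data files look like "blk_<id>"; "blk_<id>_<n>.meta" files are skipped.
BLOCK_FILE = re.compile(r"^blk_(\d+)$")

def build_block_report(data_dir):
    """Walk data_dir and map each block ID to the local file that stores it."""
    report = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            match = BLOCK_FILE.match(name)
            if match:
                report[int(match.group(1))] = os.path.join(root, name)
    return report

# Usage: build a throwaway directory tree and scan it.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "subdir0"))
    for name in ("blk_1001", "blk_1001_1.meta", "subdir0/blk_1002"):
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(b"data")
    report = build_block_report(data_dir)

print(sorted(report))  # block IDs found on disk
```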
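HDFS's "rack awareness" lets the NameNode know whether two nodes are in the same rack; the rack of each node normally comes from an administrator-supplied topology mapping (a script or class) rather than from probing the network with packets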
Typically, one replica is stored in the local rack and another in a different rack, which prevents data loss if a rack fails and improves bandwidth utilization
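A minimal sketch of rack-aware placement, assuming a replication factor of 3 and a node-to-rack map that is already known. The node names, rack names, and the function `place_replicas` are made up for illustration. The real placement policy also weighs load and free space, which is ignored here.

```python
def place_replicas(writer, topology, replicas=3):
    """Pick target nodes: the writer's node first, then nodes in one other rack."""
    local_rack = topology[writer]
    targets = [writer]  # first replica stays on the writer's own node
    remote = sorted(n for n in topology if topology[n] != local_rack)
    if remote:
        remote_rack = topology[remote[0]]
        # further replicas prefer nodes that share a single remote rack
        candidates = [n for n in remote if topology[n] == remote_rack]
    else:
        candidates = []
    # fall back to any remaining node if the remote rack is too small
    candidates += [n for n in sorted(topology)
                   if n not in candidates and n != writer]
    for node in candidates:
        if len(targets) == replicas:
            break
        targets.append(node)
    return targets

topology = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
targets = place_replicas("dn1", topology)
print(targets, {topology[n] for n in targets})
```

Because the replicas span two racks, losing either rack still leaves at least one copy of the block.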
The NameNode periodically receives heartbeats and block reports from each DataNode
The NameNode validates its metadata against the block reports
A DataNode that does not send a heartbeat on time is marked as down and is not given any further I/O requests
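The heartbeat rule above can be sketched as a small tracker. `HeartbeatMonitor` and the 30-second timeout are hypothetical choices for this example, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time of each DataNode (simplified model)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, node, now):
        # record the time of the most recent heartbeat from this node
        self.last_seen[node] = now

    def live_nodes(self, now):
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.timeout_s)

    def dead_nodes(self, now):
        # nodes that missed the deadline no longer receive I/O requests
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)

monitor = HeartbeatMonitor(timeout_s=30)  # assumed timeout, for illustration only
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=20)          # dn2 stays silent
print(monitor.live_nodes(now=40), monitor.dead_nodes(now=40))
```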
If a DataNode failure causes the number of replicas of a block to fall below a preset value, the NameNode detects the affected data blocks and re-replicates them at an appropriate time
Other reasons for re-replication include corruption of a replica, disk errors, an increased replication factor, and so on
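One way to picture the detection step: compare each block's live replica count with the target and record the shortfall. This is a toy model with made-up block and node names; `under_replicated` is a hypothetical helper, not an HDFS API.

```python
def under_replicated(block_locations, live_nodes, target):
    """Return {block: missing replica count} for blocks below the target."""
    needed = {}
    for block, nodes in block_locations.items():
        live = [n for n in nodes if n in live_nodes]
        if len(live) < target:
            needed[block] = target - len(live)
    return needed

block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}
live_nodes = {"dn1", "dn2", "dn3"}  # dn4 has failed
todo = under_replicated(block_locations, live_nodes, target=3)
print(todo)
```

Raising `target` has the same effect as a node failure: blocks that were fully replicated before now show a shortfall.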
The NameNode starts in a "safe mode" phase
No data writes take place during safe mode
During this phase the NameNode collects reports from the individual DataNodes; a block is considered "safe" once it reaches the minimum number of replicas
Once a configurable percentage of data blocks has been confirmed "safe", and after a further short wait, safe mode ends
When a block with too few replicas is detected, it is copied until the minimum number of replicas is reached
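The exit condition for safe mode can be sketched as a simple ratio check. The function name and the sample values are assumptions; the default threshold of 0.999 mirrors what I believe is the usual HDFS default, and the extra waiting period after the threshold is reached is not modelled.

```python
def can_leave_safe_mode(replica_counts, min_replicas=1, threshold=0.999):
    """True when the share of 'safe' blocks reaches the threshold."""
    if not replica_counts:
        return True  # assumption: an empty namespace counts as safe
    safe = sum(1 for count in replica_counts.values() if count >= min_replicas)
    return safe / len(replica_counts) >= threshold

# Reported replica counts per block; blk_2 has not been reported yet.
replica_counts = {"blk_1": 3, "blk_2": 0, "blk_3": 2, "blk_4": 1}
before = can_leave_safe_mode(replica_counts)

replica_counts["blk_2"] = 1  # a late block report arrives
after = can_leave_safe_mode(replica_counts)
print(before, after)
```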
When a file is created, a checksum is generated for each data block
The checksum is saved in the namespace as a separate hidden file
When the client reads the data, it recomputes the checksum and compares it with the stored one to detect whether the block is corrupt
If the block being read is corrupt, the client can continue reading from another replica
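The read path with checksum verification might look roughly like this. The sketch uses `zlib.crc32` as a stand-in checksum over the whole block; HDFS itself checksums small chunks and may use a different CRC variant. `read_block` and the replica data are hypothetical.

```python
import zlib

def checksum(data):
    """Stand-in checksum: plain CRC32 over the whole block."""
    return zlib.crc32(data)

def read_block(replicas, expected):
    """Return (node, data) for the first replica whose checksum matches."""
    for node, data in replicas:
        if checksum(data) == expected:
            return node, data
        # mismatch: treat this replica as corrupt and try the next one
    raise IOError("all replicas are corrupt")

original = b"hello hdfs"
expected = checksum(original)  # stored when the block was written
replicas = [("dn1", b"hellX hdfs"), ("dn2", original)]  # dn1's copy is damaged
node, data = read_block(replicas, expected)
print(node)
```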
When you delete a file, it is actually moved to the trash (/trash)
Files in the trash can be recovered quickly
You can set a retention time; files that have been in the trash longer than this value are permanently deleted and the data blocks they occupied are released
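The retention rule can be sketched as a periodic purge over trash entries. The paths, timestamps, and the helper `purge_trash` are invented for the example; in a real cluster the retention period is a configuration setting (I believe `fs.trash.interval`, in minutes).

```python
def purge_trash(trash, now, retention_s):
    """Split trash entries into those kept and those past the retention time."""
    kept, purged = {}, []
    for path, deleted_at in trash.items():
        if now - deleted_at > retention_s:
            purged.append(path)       # blocks of this file can be released
        else:
            kept[path] = deleted_at   # still recoverable
    return kept, sorted(purged)

trash = {"/trash/old.log": 0, "/trash/new.csv": 3000}  # path -> deletion time (s)
kept, purged = purge_trash(trash, now=4000, retention_s=3600)
print(kept, purged)
```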
The image file and the transaction log are the core data of the NameNode. You can configure them to be kept in multiple copies
Multiple copies slow the NameNode down, but increase safety
The NameNode is still a single point of failure; if it fails, you have to switch over manually