
What is true about HDFS?


Asked by Kairi Conley on Dec 05, 2021



HDFS is a distributed file system that handles large data sets running on commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.
Consequently, HDFS is mainly designed to run on commodity hardware (inexpensive devices) using a distributed file system design, and it favors storing data in a small number of large blocks rather than in many small ones.
In addition, HDFS stores data in blocks, where each block is 128 MB by default. This size is configurable: you can change it according to your requirements in the hdfs-site.xml file in your Hadoop configuration directory. Files stored in HDFS are easy to access, and HDFS also provides high availability and fault tolerance.
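As an illustration, a minimal hdfs-site.xml override for the block size might look like the sketch below; the value is given in bytes, and 134217728 bytes is the 128 MB default in recent Hadoop versions:

```xml
<!-- hdfs-site.xml: override the HDFS block size (value in bytes; 128 MB shown) -->
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```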
Furthermore, the current HDFS architecture has two layers. The Namespace layer manages files, directories and blocks, and supports the basic file system operations, e.g. listing of files, creation of files, modification of files, and deletion of files and folders. The Block Storage layer handles block management and the physical storage of blocks on DataNodes.
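Here is a minimal sketch of those basic namespace operations using the standard Hadoop FileSystem Java API; the class name and paths are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsNamespaceOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/example/demo");      // hypothetical directory
        fs.mkdirs(dir);                                 // create a folder

        Path file = new Path(dir, "hello.txt");         // hypothetical file
        fs.create(file).close();                        // create an (empty) file

        for (FileStatus status : fs.listStatus(dir)) {  // list files in the directory
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        fs.delete(file, false);                         // delete the file
        fs.delete(dir, true);                           // delete the folder recursively
        fs.close();
    }
}
```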
Subsequently, HDFS stores users' data in files; internally, each file is split into fixed-size blocks, and these blocks are stored on DataNodes. The NameNode is the master node that stores the metadata about the file system, i.e. the block locations and the set of blocks belonging to each file. If a DataNode fails, the NameNode arranges for its blocks to be re-replicated from the remaining copies to other DataNodes, so the data stays available.
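To see this metadata at work, the following sketch asks the NameNode (through the FileSystem API) which DataNodes hold each block of a file; the file path is again a hypothetical example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/example/big-file.dat");  // hypothetical file

        FileStatus status = fs.getFileStatus(file);
        // One BlockLocation per fixed-size block of the file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```

Each printed line corresponds to one block of the file, and the host list shows the DataNodes holding a replica of that block.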