Hadoop - Introduction
Hadoop can be operated on general commercial servers, with high fault tolerance, high reliability, high scalability and so on
Ideal for writing once and reading scenes multiple times
For
-
Large-scale data
-
Streaming data (write once, read multiple times)
-
Commercial hardware (general hardware)
Not suitable
-
Low-latency data access
-
Lots of small files
-
Frequently modify files (basically 1 write)
Hadoop architecture
-
HDFS:
Distributed file storage
-
YARN:
Distributed resource management
-
MapReduce: Distributed
computing
-
Others: Take
advantage of YARN's resource management capabilities for additional data processing
The internal nodes are basically master-Woker architectures