Coding With Fun

Introduction to Hadoop


May 26, 2021 Hadoop




Hadoop - Introduction

Hadoop runs on clusters of general-purpose commodity servers and provides high fault tolerance, high reliability, and high scalability.

It is ideal for write-once, read-many scenarios.

Suitable for

  • Large-scale data
  • Streaming data access (write once, read many times)
  • Commodity hardware (ordinary, general-purpose machines)

Not suitable for

  • Low-latency data access
  • Large numbers of small files
  • Files that are modified frequently (data is essentially written once)
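Why large numbers of small files are a poor fit can be shown with a back-of-the-envelope estimate of NameNode memory. The sketch below assumes the commonly quoted rule of thumb that every file and every block is tracked as an object costing roughly 150 bytes of NameNode heap, plus a 128 MB block size (the default in recent Hadoop releases). Both figures are assumptions for illustration, not measurements, and the function name is hypothetical.

```python
import math

# Assumption: ~150 bytes of NameNode heap per namespace object
# (each file and each block counts as one object). Rule of thumb only.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MiB


def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    """Estimate the heap needed to track num_files files of file_size bytes each."""
    blocks_per_file = math.ceil(file_size / BLOCK_SIZE)
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


TOTAL = 1024 ** 4  # 1 TiB of data in both scenarios

# The same 1 TiB stored as 1 MiB files versus as 1 GiB files.
small = namenode_heap_bytes(TOTAL // 1024 ** 2, 1024 ** 2)
large = namenode_heap_bytes(TOTAL // 1024 ** 3, 1024 ** 3)

print(f"1 MiB files: {small / 1024 ** 2:.1f} MiB of NameNode heap")  # 300.0
print(f"1 GiB files: {large / 1024 ** 2:.3f} MiB of NameNode heap")  # 1.318
```

Under these assumptions the same amount of data costs over 200 times more NameNode memory when stored as small files, because metadata grows with the number of files and blocks rather than with the number of bytes.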

Hadoop architecture


  • HDFS: Distributed file storage
  • YARN: Distributed resource management
  • MapReduce: Distributed computing
  • Others: additional data-processing frameworks that run on top of YARN's resource management
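The MapReduce model is usually illustrated with the classic word-count example. The sketch below runs the map, shuffle, and reduce phases inside a single Python process purely to show how data flows between them. It does not use Hadoop's APIs, and the function names are hypothetical. On a real cluster, many mappers and reducers run in parallel on different nodes, and the framework performs the shuffle by sorting and transferring intermediate pairs between them.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Process keys in sorted order, as Hadoop presents keys to reducers sorted.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["Hadoop stores data", "Hadoop processes data"]))
# {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```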

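Internally, each of these components generally follows a master-worker architecture (for example, the NameNode and DataNodes in HDFS, or the ResourceManager and NodeManagers in YARN).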
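The master-worker split can be sketched as a toy model in the spirit of HDFS, where a master keeps only metadata about which workers hold each block of a file. This is a simplified illustration under stated assumptions: a 128 MB block size and a replication factor of 3 (common HDFS defaults), and plain round-robin placement instead of HDFS's actual rack-aware placement policy. The `Master` class and its methods are hypothetical.

```python
import math
from itertools import cycle


class Master:
    """Toy master node: stores block-to-worker metadata; workers would hold the data."""

    def __init__(self, workers: list[str], block_size: int, replication: int):
        if replication > len(workers):
            raise ValueError("replication factor exceeds number of workers")
        self.workers = workers
        self.block_size = block_size
        self.replication = replication
        # Assumption: round-robin start position; real HDFS is rack-aware.
        self._next = cycle(range(len(workers)))
        self.block_map: dict[str, list[list[str]]] = {}

    def add_file(self, name: str, size: int) -> list[list[str]]:
        """Split a file into fixed-size blocks and pick distinct workers per block."""
        num_blocks = math.ceil(size / self.block_size)
        placement = []
        for _ in range(num_blocks):
            start = next(self._next)
            # Consecutive workers from `start`, so one block's replicas never share a worker.
            replicas = [self.workers[(start + i) % len(self.workers)]
                        for i in range(self.replication)]
            placement.append(replicas)
        self.block_map[name] = placement
        return placement


MB = 1024 * 1024
master = Master(["worker1", "worker2", "worker3", "worker4"],
                block_size=128 * MB, replication=3)
for i, replicas in enumerate(master.add_file("logs.txt", 300 * MB)):
    print(f"block {i}: {replicas}")
# block 0: ['worker1', 'worker2', 'worker3']
# block 1: ['worker2', 'worker3', 'worker4']
# block 2: ['worker3', 'worker4', 'worker1']
```

Keeping only metadata on the master and the data itself on the workers means that losing one worker loses no block, since every block still has replicas elsewhere, while the master remains the single place that knows where everything is.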