May 26, 2021 Hadoop
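The original MapReduce architecture, in which a single JobTracker handles both resource management and job scheduling, has the following issues:
In short, the JobTracker is a single point of failure, and cluster resource utilization is poor.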
YARN splits the JobTracker's two responsibilities, resource management and task scheduling/monitoring, into separate independent processes: a global resource manager (ResourceManager) and a per-job manager (ApplicationMaster). The ResourceManager and the NodeManagers allocate and manage computing resources, while the ApplicationMaster runs the application itself.
The YARN architecture thus provides a general-purpose resource management platform and a general-purpose application computing platform. This avoids the single point of failure and the resource utilization problems of the old architecture, and the applications running on it are no longer limited to MapReduce.
1. Job submission
The client gets an application ID from the ResourceManager, checks the job's output configuration, and computes the input splits. It then copies the job resources (job JAR, configuration file, split information) to HDFS so that later tasks can use them.
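To make the split calculation concrete, here is a minimal Python sketch modeled on how Hadoop's FileInputFormat is generally described: the split size is the block size clamped between a minimum and a maximum, and the last chunk may be up to 10% larger than a normal split. The function names and the 128 MB block size are assumptions for illustration; the sketch ignores empty and non-splittable files.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the configured minimum and maximum."""
    return max(min_size, min(max_size, block_size))


def compute_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Return a list of (offset, length) pairs covering the whole file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    offset = 0
    remaining = file_size
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # Whatever is left (possibly slightly over split_size) is the last split.
    if remaining > 0:
        splits.append((offset, remaining))
    return splits


if __name__ == "__main__":
    for offset, length in compute_splits(300 * MB, 128 * MB):
        print(offset // MB, length // MB)
```

Each resulting split later becomes one map task, so the number of splits determines the number of mappers.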
2. Job initialization
The ResourceManager passes the job to the Scheduler (there are several scheduling algorithms, typically priority-based). The Scheduler allocates a Container for the job, and the ResourceManager launches an ApplicationMaster process in that Container and hands its management over to the NodeManager.
The ApplicationMaster initializes the job by creating a series of monitoring objects to track its progress. It retrieves the input splits, creates one map task per split, and creates the corresponding reduce tasks. The ApplicationMaster also decides how to run the job: if the job is small (the threshold is configurable), it runs the tasks directly in its own JVM.
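The "small job" decision can be sketched as a simple threshold check. The defaults below mirror the `mapreduce.job.ubertask.*` properties as I understand them (fewer than 10 maps, at most one reduce, input no larger than one block, and the feature disabled unless turned on); treat the exact values, the 128 MB block size, and the `UberLimits` and `runs_as_uber_task` names as assumptions. The real check also considers memory and CPU limits, which are left out here.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class UberLimits:
    enabled: bool = False        # mapreduce.job.ubertask.enable
    max_maps: int = 9            # mapreduce.job.ubertask.maxmaps
    max_reduces: int = 1         # mapreduce.job.ubertask.maxreduces
    max_bytes: int = 128 * MB    # mapreduce.job.ubertask.maxbytes (assumed block size)


def runs_as_uber_task(num_maps, num_reduces, input_bytes, limits):
    """True if the job is small enough to run inside the ApplicationMaster's JVM."""
    return (
        limits.enabled
        and num_maps <= limits.max_maps
        and num_reduces <= limits.max_reduces
        and input_bytes <= limits.max_bytes
    )


if __name__ == "__main__":
    limits = UberLimits(enabled=True)
    print(runs_as_uber_task(4, 1, 64 * MB, limits))
    print(runs_as_uber_task(20, 1, 64 * MB, limits))
```

Running a small job in one JVM avoids the overhead of requesting and starting a separate Container for every task.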
3. Task assignment
The ApplicationMaster requests resources from the ResourceManager, one Container per task, stating each task's resource requirements. Containers are typically allocated with data locality in mind.
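Data locality is usually described as a preference order: a node that holds the split's data, then a node in the same rack, then any free node. The sketch below illustrates that order only; it is not the YARN scheduler's actual algorithm, and `pick_node`, the host names, and the rack map are hypothetical.

```python
def pick_node(split_hosts, free_nodes, rack_of):
    """Choose a node for a task and report which locality level was achieved.

    split_hosts: hosts that store a replica of the task's input split
    free_nodes:  nodes that currently have capacity, in scheduler order
    rack_of:     mapping from host name to rack name
    """
    # 1. Node-local: run where the data already is.
    for node in free_nodes:
        if node in split_hosts:
            return node, "node-local"

    # 2. Rack-local: run in a rack that holds a replica.
    split_racks = {rack_of[h] for h in split_hosts if h in rack_of}
    for node in free_nodes:
        if node in rack_of and rack_of[node] in split_racks:
            return node, "rack-local"

    # 3. Off-rack: any node with free capacity.
    if free_nodes:
        return free_nodes[0], "off-rack"

    return None, "none"


if __name__ == "__main__":
    racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
    print(pick_node(["n1"], ["n3", "n2"], racks))
```

The closer the task runs to its data, the less input has to cross the network before the map phase can start.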
4. Task execution
Based on the ResourceManager's allocation, the ApplicationMaster starts a Container on the corresponding NodeManager. The task reads the resources it needs (job JAR, configuration file, and so on) from HDFS and then runs.
5. Progress and status update
Each task periodically reports its progress and status to the ApplicationMaster. The client periodically polls the ApplicationMaster for the progress and status of the whole job.
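The two reporting paths (tasks to ApplicationMaster, client to ApplicationMaster) can be illustrated with a small stand-in. `ApplicationMasterStub` and `poll_until_done` are hypothetical names, and averaging task progress into one job figure is a simplifying assumption; a real client would also sleep between polls.

```python
class ApplicationMasterStub:
    """Toy stand-in that collects per-task progress reports."""

    def __init__(self, task_ids):
        self._progress = {task_id: 0.0 for task_id in task_ids}

    def report(self, task_id, progress):
        """Called by a task; progress is clamped to 1.0 and never moves backwards."""
        clamped = min(1.0, progress)
        self._progress[task_id] = max(self._progress[task_id], clamped)

    def job_progress(self):
        """Called by the client; here simply the mean of all task progress values."""
        if not self._progress:
            return 1.0
        return sum(self._progress.values()) / len(self._progress)


def poll_until_done(am, advance, max_polls=100):
    """Poll the ApplicationMaster until the job reports completion.

    advance(am) simulates the tasks doing some work between two polls.
    """
    history = []
    for _ in range(max_polls):
        advance(am)
        progress = am.job_progress()
        history.append(progress)
        if progress >= 1.0:
            break
    return history


if __name__ == "__main__":
    am = ApplicationMasterStub(["m0", "m1"])
    updates = iter([("m0", 1.0), ("m1", 0.5), ("m1", 1.0)])
    print(poll_until_done(am, lambda a: a.report(*next(updates))))
```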
6. Job completion
The client periodically checks whether the whole job has finished. When the job completes, its temporary files, directories, and other working state are cleaned up.