
What's the right number of reducers for MapReduce?


Asked by Tinsley Acosta on Dec 07, 2021 FAQ



The right number of reducers is 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). With 0.95, all reducers launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and then launch a second wave, which gives much better load balancing.
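As a quick arithmetic sketch of the rule of thumb above (the cluster figures, 10 nodes with 8 containers each, are made-up example values):

```python
# Sketch of the reducer-count rule of thumb: factor * nodes * containers.
# 10 nodes and 8 containers per node are hypothetical example figures.
def reducer_count(nodes, max_containers_per_node, factor):
    return int(factor * nodes * max_containers_per_node)

low = reducer_count(10, 8, 0.95)   # one wave: all reducers start at once
high = reducer_count(10, 8, 1.75)  # two waves: better load balancing
print(low, high)  # 76 140
```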
Subsequently,
The number of reducers can be set in two ways: Using the command line: while submitting the MapReduce job, the number of reducers can be specified through the property mapred.reduce.tasks; for example, -D mapred.reduce.tasks=20 sets the number of reduce tasks to 20. Using the driver code: the same can be done programmatically with job.setNumReduceTasks(20).
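As a minimal sketch of the command-line route, assuming a Hadoop Streaming job launched from Python (the jar path, script names, and input/output paths below are placeholders), the property can be passed as a -D generic option:

```python
# Sketch: assembling a Hadoop Streaming command that sets the number of
# reducers via mapred.reduce.tasks. All paths and script names are
# placeholders, not real files.
num_reducers = 20
cmd = [
    "hadoop", "jar", "hadoop-streaming.jar",
    "-D", f"mapred.reduce.tasks={num_reducers}",
    "-mapper", "mapper.py",
    "-reducer", "reducer.py",
    "-input", "/data/in",
    "-output", "/data/out",
]
print(" ".join(cmd))
```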
Next, the Reducer in Map-Reduce consists of three main phases (shuffle, sort, and reduce). Shuffle: shuffling carries data from the Mappers to the required Reducer. Using HTTP, the framework fetches the relevant partition of the output of all Mappers.
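In pure Python, the shuffle step can be sketched as grouping mapper outputs by key before they reach a reducer (a simplified, single-process stand-in for what the framework actually does over HTTP; the input pairs are toy data):

```python
from collections import defaultdict

# Simplified shuffle: group intermediate (key, value) pairs from all mappers
# so a reducer receives every value for each key assigned to it.
mapper_outputs = [("a", 1), ("b", 1), ("a", 1)]  # toy example data

shuffled = defaultdict(list)
for key, value in mapper_outputs:
    shuffled[key].append(value)

print(dict(shuffled))  # {'a': [1, 1], 'b': [1]}
```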
In fact,
The Reducer in Hadoop MapReduce reduces a set of intermediate values that share a key to a smaller set of values. In the MapReduce job execution flow, the Reducer takes the set of intermediate key-value pairs produced by the mapper as its input. It then aggregates, filters, and combines those key-value pairs, which can require a wide range of processing.
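A word-count style reduce illustrates this: the function below collapses all the values sharing a key into a single sum. This is a toy, single-process sketch of the idea, not Hadoop's actual Reducer API, and the grouped input is made-up data:

```python
def reduce_fn(key, values):
    # Reduce the set of intermediate values sharing one key to a smaller
    # set: here, a single summed count per key.
    return key, sum(values)

grouped = {"hadoop": [1, 1, 1], "reduce": [1]}
results = [reduce_fn(k, v) for k, v in grouped.items()]
print(results)  # [('hadoop', 3), ('reduce', 1)]
```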
In this manner,
The Partitioner decides how the outputs from the combiners are distributed to the reducers. The partitioner's output is shuffled and sorted, and this sorted output is fed as input to the reducer. The reducer then groups all the intermediate values for each intermediate key into a list.