
Apache Pig Foreach operator


May 26, 2021 Apache Pig




The FOREACH operator generates data transformations based on column data. For every tuple in a relation, it produces a new tuple containing the fields listed after GENERATE.

Syntax

The syntax of the FOREACH operator is given below.

grunt> Relation_name2 = FOREACH Relation_name1 GENERATE (required data);
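The semantics of FOREACH ... GENERATE can be modeled outside Pig. The minimal Python sketch below treats a relation as a list of tuples and applies a generate function to each tuple, producing exactly one output tuple per input tuple. The helper name `foreach_generate` and the two-column `people` relation are hypothetical and exist only for illustration. This is not how Pig executes the operator.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # one Pig tuple, modeled as a Python tuple

def foreach_generate(relation: Iterable[Row], generate: Callable[[Row], Row]) -> List[Row]:
    """Model of FOREACH ... GENERATE: one output tuple for every input tuple."""
    return [generate(t) for t in relation]

# Hypothetical relation with fields (name, age)
people = [("Rajiv", 21), ("Rajesh", 22)]

# Roughly what `people2 = FOREACH people GENERATE name, age + 1;` computes
people2 = foreach_generate(people, lambda t: (t[0], t[1] + 1))
print(people2)  # [('Rajiv', 22), ('Rajesh', 23)]
```

GENERATE can list plain fields or expressions over them. In this model, both cases are expressed by the function passed as `generate`.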

Example

Suppose you have a file named student_details.txt in the HDFS directory /pig_data/, with the following contents.

student_details.txt

001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi 
004,Preethi,Agarwal,21,9848022330,Pune 
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 
006,Archana,Mishra,23,9848022335,Chennai 
007,Komal,Nayak,24,9848022334,trivendram 
008,Bharathi,Nambiayar,24,9848022333,Chennai

Load this file into Pig as a relation named student_details, as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray,age:int, phone:chararray, city:chararray);
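The LOAD statement above does two things. PigStorage(',') splits each line on commas, and the AS clause assigns each field a name and a type. The Python sketch below models those two steps, under the assumption that every line has exactly six fields. Python converters stand in for the Pig types, and `None` stands in for Pig's null when a value cannot be cast to int. The names `to_int`, `SCHEMA` and `load_pigstorage` are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

def to_int(value: str) -> Optional[int]:
    """Cast to int; on failure return None, standing in for Pig's null."""
    try:
        return int(value)
    except ValueError:
        return None

# Schema from the LOAD statement; Python converters stand in for Pig types.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("id", to_int), ("firstname", str), ("lastname", str),
    ("age", to_int), ("phone", str), ("city", str),
]

def load_pigstorage(lines: Iterable[str], delimiter: str = ",") -> List[tuple]:
    """Split each line on the delimiter and convert each field per SCHEMA."""
    relation = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Assumes exactly one value per schema field on every line.
        relation.append(tuple(cast(value) for (_, cast), value in zip(SCHEMA, fields)))
    return relation

sample = [
    "001,Rajiv,Reddy,21,9848022337,Hyderabad",
    "002,siddarth,Battacharya,22,9848022338,Kolkata",
]
student_details = load_pigstorage(sample)
print(student_details[0])  # (1, 'Rajiv', 'Reddy', 21, '9848022337', 'Hyderabad')
```

Declaring id as int is why "001" becomes 1. Declaring phone as chararray keeps it as text.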

Now get the id, age, and city of each student from the relation student_details, and store them in another relation called foreach_data using the FOREACH operator, as shown below.

grunt> foreach_data = FOREACH student_details GENERATE id,age,city;
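Selecting fields by name, as in `GENERATE id,age,city`, is a projection: each output tuple keeps only the named columns, in the order listed. The Python sketch below models this by looking up each field's position in the schema. The helper `project` is hypothetical, and the input rows are the first three records of student_details, written as already-typed tuples.

```python
from typing import List, Sequence

# Field order from the LOAD statement's AS clause
FIELDS = ["id", "firstname", "lastname", "age", "phone", "city"]

def project(relation: Sequence[tuple], fields: List[str], wanted: List[str]) -> List[tuple]:
    """Keep only the wanted fields of each tuple, in the order they are listed."""
    positions = [fields.index(name) for name in wanted]
    return [tuple(t[i] for i in positions) for t in relation]

student_details = [
    (1, "Rajiv", "Reddy", 21, "9848022337", "Hyderabad"),
    (2, "siddarth", "Battacharya", 22, "9848022338", "Kolkata"),
    (3, "Rajesh", "Khanna", 22, "9848022339", "Delhi"),
]

# Models: foreach_data = FOREACH student_details GENERATE id,age,city;
foreach_data = project(student_details, FIELDS, ["id", "age", "city"])
print(foreach_data)  # [(1, 21, 'Hyderabad'), (2, 22, 'Kolkata'), (3, 22, 'Delhi')]
```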

Verify

Verify the relation foreach_data using the DUMP operator, as shown below.

grunt> Dump foreach_data;

Output

It produces the following output, which displays the contents of the relation foreach_data. Because id was loaded as int, values such as 001 appear as 1.

(1,21,Hyderabad)
(2,22,Kolkata)
(3,22,Delhi)
(4,21,Pune) 
(5,23,Bhuwaneshwar)
(6,23,Chennai) 
(7,24,trivendram)
(8,24,Chennai)
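DUMP prints each tuple on its own line as comma-separated values inside parentheses, with no spaces. The Python sketch below reproduces that rendering for flat tuples of scalar values. It assumes that a null field (modeled as `None`) prints as an empty value, and it does not handle nested bags or tuples. The helper name `dump` is hypothetical.

```python
from typing import List, Sequence

def dump(relation: Sequence[tuple]) -> List[str]:
    """Render flat tuples as DUMP shows them: (f1,f2,...), null shown as empty."""
    return ["(" + ",".join("" if v is None else str(v) for v in t) + ")" for t in relation]

foreach_data = [(1, 21, "Hyderabad"), (2, 22, "Kolkata")]
for line in dump(foreach_data):
    print(line)
# (1,21,Hyderabad)
# (2,22,Kolkata)
```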