May 26, 2021 Apache Pig
The COGROUP operator works in the same way as the GROUP operator. /b10> The only difference between the two operators is that the group operator is typically used for a relationship, while the cogroup operator is used for statements that involve two or more relationships.
Suppose you have two files in the HDFS directory /pig_data /, namely student_details.txt and employee_details.txt, as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad 002,siddarth,Battacharya,22,9848022338,Kolkata 003,Rajesh,Khanna,22,9848022339,Delhi 004,Preethi,Agarwal,21,9848022330,Pune 005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 006,Archana,Mishra,23,9848022335,Chennai 007,Komal,Nayak,24,9848022334,trivendram 008,Bharathi,Nambiayar,24,9848022333,Chennai
employee_details.txt
001,Robin,22,newyork 002,BOB,23,Kolkata 003,Maya,23,Tokyo 004,Sara,25,London 005,David,23,Bhuwaneshwar 006,Maggy,22,Chennai
Load these files into Pig, with the relationship names student_details and employee_details, as shown below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray); grunt> employee_details = LOAD 'hdfs://localhost:9000/pig_data/employee_details.txt' USING PigStorage(',') as (id:int, name:chararray, age:int, city:chararray);
Now group the student_details/employee_details relationships by keywordage, as shown below.
grunt> cogroup_data = COGROUP student_details by age, employee_details by age;
Use the DUMP operator to validate the relationship cogroup_data, as shown below.
grunt> Dump cogroup_data;
It produces the following output, showing the contents of cogroup_data called the file, as shown below.
(21,{(4,Preethi,Agarwal,21,9848022330,Pune), (1,Rajiv,Reddy,21,9848022337,Hyderabad)}, { }) (22,{ (3,Rajesh,Khanna,22,9848022339,Delhi), (2,siddarth,Battacharya,22,9848022338,Kolkata) }, { (6,Maggy,22,Chennai),(1,Robin,22,newyork) }) (23,{(6,Archana,Mishra,23,9848022335,Chennai),(5,Trupthi,Mohanthy,23,9848022336 ,Bhuwaneshwar)}, {(5,David,23,Bhuwaneshwar),(3,Maya,23,Tokyo),(2,BOB,23,Kolkata)}) (24,{(8,Bharathi,Nambiayar,24,9848022333,Chennai),(7,Komal,Nayak,24,9848022334, trivendram)}, { }) (25,{ }, {(4,Sara,25,London)})
The cogroup operator groups grouped groups from each relationship by age, each of which describes a specific age value.
For example, if we consider the first group of results, which are grouped by age 21, it contains two packages
The first package holds all the metagroups that have a 21-year-old first relationship (student_details the first relationship);
The second package has a second relationship (in this case, employee_details) for all metagroups, which are 21 years old.
If the relationship does not have a group with an age value of 21, an empty package is returned.