May 26, 2021 Apache Pig
The FILTER operator is used to select the required tuples from a relation based on a condition.
The syntax of the FILTER operator is given below.
grunt> Relation2_name = FILTER Relation1_name BY (condition);
Suppose you have a file named student_details.txt in the HDFS directory /pig_data/, as shown below.
student_details.txt
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
Load this file into Pig under the relation name student_details, as shown below.
grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
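Before filtering, it can help to confirm that the schema was applied as intended, because the comparison in a FILTER condition depends on the field types. A minimal sketch, using Pig's DESCRIBE operator on the relation loaded above:

```pig
-- Print the schema of the relation loaded above.
-- id and age should be reported as int, the remaining fields as chararray.
DESCRIBE student_details;
```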
Now use the FILTER operator to get the details of the students who belong to the city Chennai.
grunt> filter_data = FILTER student_details BY city == 'Chennai';
Use the DUMP operator to verify the relation filter_data, as shown below.
grunt> Dump filter_data;
It produces the following output, displaying the contents of the relation filter_data.
(6,Archana,Mishra,23,9848022335,Chennai)
(8,Bharathi,Nambiayar,24,9848022333,Chennai)
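The condition in a FILTER statement is not limited to a single equality test. The sketch below assumes student_details is still loaded with the schema from the LOAD statement above. It combines two conditions with AND and uses Pig's built-in LOWER function for a case-insensitive match. The relation names chennai_over_23 and chennai_any_case are illustrative and not part of the original example.

```pig
-- Combine conditions with AND (OR and NOT are also available).
-- With the sample data, only Bharathi (age 24, Chennai) should match.
chennai_over_23 = FILTER student_details BY (city == 'Chennai') AND (age > 23);

-- String comparison with == is case-sensitive, so normalize the case first
-- to also match values such as 'chennai' or 'CHENNAI'.
chennai_any_case = FILTER student_details BY LOWER(city) == 'chennai';

DUMP chennai_over_23;
DUMP chennai_any_case;
```

Case normalization matters for data like this sample, where city names are not capitalized consistently (for example, 'trivendram' is stored in lowercase).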