Coding With Fun

Apache Pig Order By operator


May 26, 2021 Apache Pig




The ORDER BY operator is used to display the contents of a relation in sorted order based on one or more fields.

Syntax

The syntax of the ORDER BY operator is given below. The sort direction is optional; if it is omitted, the sort is ascending.

grunt> Relation_name2 = ORDER Relation_name1 BY field_name (ASC|DESC);

Example

Suppose you have a file called student_details.txt in the HDFS directory /pig_data/, as shown below.

student_details.txt

001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai

Load this file into Pig under the relation name student_details, as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

Now let's sort the relation in descending order by the student's age and store the result in another relation called order_by_data using the ORDER BY operator, as shown below.

grunt> order_by_data = ORDER student_details BY age DESC;
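Several students in the sample data share the same age, so a second sort field can be useful. The following is a minimal sketch, assuming the student_details relation loaded above; the relation name order_by_age_name is made up for this illustration and does not appear elsewhere in the tutorial.

```pig
-- Sort by age, oldest first; students of the same age are then
-- sorted alphabetically by first name.
-- order_by_age_name is a hypothetical relation name.
order_by_age_name = ORDER student_details BY age DESC, firstname ASC;
```

Each field in the comma-separated list takes its own ASC or DESC, and the fields are applied from left to right.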

Verify

Use the DUMP operator to verify the relation order_by_data, as shown below.

grunt> Dump order_by_data;

Output

It produces the following output, showing the contents of the relation order_by_data.

(8,Bharathi,Nambiayar,24,9848022333,Chennai)
(7,Komal,Nayak,24,9848022334,trivendram)
(6,Archana,Mishra,23,9848022335,Chennai)
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)
(3,Rajesh,Khanna,22,9848022339,Delhi)
(2,siddarth,Battacharya,22,9848022338,Kolkata)
(4,Preethi,Agarwal,21,9848022330,Pune)
(1,Rajiv,Reddy,21,9848022337,Hyderabad)
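Rows with the same age are not guaranteed to come out in any particular order relative to each other. If only the first few rows of a sorted relation are needed, ORDER BY can be followed by LIMIT. The sketch below assumes the student_details relation loaded earlier; the names ordered_students and oldest_three are hypothetical, and id is added as a second sort field only to break ties.

```pig
-- Sort by age (descending), breaking ties by id (ascending),
-- then keep only the first three rows of the sorted relation.
-- ordered_students and oldest_three are hypothetical names.
ordered_students = ORDER student_details BY age DESC, id ASC;
oldest_three = LIMIT ordered_students 3;
DUMP oldest_three;
```

With the sample data this should return the students with ids 7, 8 and 5, in that order.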