
Apache Pig Limit operator


May 26, 2021 Apache Pig




The LIMIT operator is used to get a limited number of tuples from a relation.

Syntax

The syntax of the LIMIT operator is given below.

grunt> Result = LIMIT Relation_name number_of_tuples;

Example

Suppose you have a file called student_details.txt in the HDFS directory /pig_data/, as shown below.

student_details.txt

001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi 
004,Preethi,Agarwal,21,9848022330,Pune 
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 
006,Archana,Mishra,23,9848022335,Chennai 
007,Komal,Nayak,24,9848022334,trivendram 
008,Bharathi,Nambiayar,24,9848022333,Chennai

This file is loaded into Pig under the relation name student_details, as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray,age:int, phone:chararray, city:chararray);

Now, let's take four tuples from the relation student_details and store them in another relation called limit_data using the LIMIT operator, as shown below.

grunt> limit_data = LIMIT student_details 4; 

Verify

Use the DUMP operator to verify the relation limit_data, as shown below.

grunt> Dump limit_data; 

Output

It produces the following output, showing the contents of the relation limit_data.

(1,Rajiv,Reddy,21,9848022337,Hyderabad) 
(2,siddarth,Battacharya,22,9848022338,Kolkata) 
(3,Rajesh,Khanna,22,9848022339,Delhi) 
(4,Preethi,Agarwal,21,9848022330,Pune)
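
The four tuples above happen to match the first four lines of the file, but Pig does not guarantee which tuples LIMIT returns on an unsorted relation, and the result may change from one run to the next. To request a particular set of tuples, apply ORDER first and then LIMIT. The following is a minimal sketch of that pattern, written as a script without the grunt> prompt; the relation names ordered_data and oldest_four are hypothetical and chosen only for illustration.

```pig
-- Load the sample file with the same path and schema used earlier.
student_details = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   AS (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);

-- Sort by age, oldest first (ordered_data is a hypothetical relation name).
ordered_data = ORDER student_details BY age DESC;

-- Keep the first four tuples of the sorted relation (oldest_four is hypothetical).
oldest_four = LIMIT ordered_data 4;

DUMP oldest_four;
```

With the sample data, oldest_four should contain the two students aged 24 and the two aged 23; the order among students of the same age is not specified.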