
Apache Pig: Running Scripts


May 26, 2021 Apache Pig




In this chapter, we'll learn how to run Apache Pig scripts in batch mode.

Comments in the Pig script

When we write a script to a file, we can include comments in it, as shown below.

Multi-line comments

A multi-line comment starts with '/*' and ends with '*/'.

/* These are the multi-line comments 
  In the pig script */ 

Single-line comments

A single-line comment starts with "--".

--we can write single line comments like this.

Executing a Pig script in batch mode

To execute Apache Pig statements in batch mode, follow these steps.

Step 1

Write all the required Pig Latin statements and commands in a single file, and save it with the .pig extension.
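As a minimal sketch of this step, the shell snippet below writes a two-statement script to a file. The file name sample_script.pig and the input path are taken from the example later in this chapter; the heredoc approach is just one convenient way to create the file, and any text editor works equally well.

```shell
# Create a Pig script file; the quoted 'EOF' keeps the shell from
# expanding anything inside the script body.
cat > sample_script.pig <<'EOF'
/* Load the student data.
   PigStorage(',') splits each line on commas. */
student = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',');
-- Print the contents of the relation
DUMP student;
EOF
```

The file contains both comment styles described earlier, so it can also serve as a quick check that comments are accepted where you place them.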

Step 2

Execute the Apache Pig script. You can execute Pig scripts from the shell (Linux), as shown below.

Local mode

$ pig -x local Sample_script.pig

MapReduce mode

$ pig -x mapreduce Sample_script.pig
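If the mode is chosen by another script, a small wrapper can make the choice explicit. The sketch below is hypothetical: the function name pig_command is an assumption, and it accepts only the two modes shown above. It prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical helper: build the pig command line for a given execution mode.
# Only the two modes covered in this chapter are accepted.
pig_command() {
  mode="$1"
  script="$2"
  case "$mode" in
    local|mapreduce)
      printf 'pig -x %s %s\n' "$mode" "$script"
      ;;
    *)
      printf 'unknown mode: %s\n' "$mode" >&2
      return 1
      ;;
  esac
}

# Print the command for local mode.
pig_command local Sample_script.pig
```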

You can use the exec command to execute it from the Grunt shell, as shown below.

grunt> exec /sample_script.pig

Executing Pig scripts from HDFS

We can also execute Pig scripts that reside in HDFS. Suppose there is a Pig script named Sample_script.pig in the HDFS directory named /pig_data/. We can execute it as follows.

$ pig -x mapreduce hdfs://localhost:9000/pig_data/Sample_script.pig 

Example

Suppose there is a file named student_details.txt in HDFS with the following content.

student_details.txt

001,Rajiv,Reddy,21,9848022337,Hyderabad 
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi 
004,Preethi,Agarwal,21,9848022330,Pune 
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar 
006,Archana,Mishra,23,9848022335,Chennai 
007,Komal,Nayak,24,9848022334,trivendram 
008,Bharathi,Nambiayar,24,9848022333,Chennai

We also have a sample script called sample_script.pig in the same HDFS directory. This file contains statements that perform operations and transformations on the student relation, as shown below.

student = LOAD 'hdfs://localhost:9000/pig_data/student_details.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
	
student_order = ORDER student BY age DESC;
  
student_limit = LIMIT student_order 4;
  
Dump student_limit;

  • The first statement of the script loads the data from the file student_details.txt into a relation named student.

  • The second statement of the script sorts the tuples of the relation in descending order by age and stores the result as student_order.

  • The third statement of the script stores the first four tuples of student_order as student_limit.

  • Finally, the fourth statement dumps the contents of the relation student_limit.
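
To cross-check what the ORDER and LIMIT steps should return without a cluster, the same logic can be approximated on a local copy of the data with standard shell tools. This is only a rough sketch, not how Pig evaluates the script: it assumes student_details.txt is available in the current directory, and it uses a stable sort so that students with the same age stay in file order, which Pig itself does not promise.

```shell
# Recreate the sample data locally (same rows as student_details.txt above).
cat > student_details.txt <<'EOF'
001,Rajiv,Reddy,21,9848022337,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
EOF

# Sort on field 4 (age), numerically and in descending order, keeping ties
# in input order (-s), then keep the first four rows.
sort -t, -k4,4nr -s student_details.txt | head -n 4
```

The rows printed are those for students 007, 008, 005, and 006. The ids keep their leading zeros here because the shell treats them as text, whereas the Pig script declares id as int.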

Now, execute sample_script.pig, as shown below.

$ ./pig -x mapreduce hdfs://localhost:9000/pig_data/sample_script.pig

Apache Pig executes the script and produces the following output.

(7,Komal,Nayak,24,9848022334,trivendram)
(8,Bharathi,Nambiayar,24,9848022333,Chennai) 
(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar) 
(6,Archana,Mishra,23,9848022335,Chennai)
2015-10-19 10:31:27,446 [main] INFO  org.apache.pig.Main - Pig script completed in 12 minutes, 32 seconds and 751 milliseconds (752751 ms)