
Apache Pig Grunt Shell


May 26, 2021 Apache Pig




After invoking the Grunt shell, you can run Pig scripts in the shell. In addition, the Grunt shell provides some useful shell and utility commands. This chapter explains those shell and utility commands.

Note: Some parts of this chapter use commands such as Load and Store. See the chapters that cover them for more information.

Shell commands

Apache Pig's Grunt shell is mainly used to write Pig Latin scripts. Before doing that, we can use sh and fs to invoke shell commands from it.

sh command

Using the sh command, we can invoke any shell command from the Grunt shell. However, we cannot execute commands that are part of the shell environment itself (for example, cd).
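The restriction on commands such as cd can be illustrated from an ordinary terminal. The sketch below assumes that a command launched through sh runs as a separate child process (an assumption about the mechanism, not something stated in this chapter): a directory change made in a child process never reaches the process that launched it. The variable names are hypothetical.

```shell
# Remember where the current (parent) shell is.
start_dir=$(pwd)

# Run cd inside a separate child shell and capture where it ends up.
child_dir=$(sh -c 'cd / && pwd')

echo "child process ended up in: $child_dir"
echo "parent shell is still in:  $(pwd)"
```

The parent's working directory is unchanged after the child exits, which is why a command like cd has no useful effect when launched this way.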

Syntax

The sh command syntax is given below.

grunt> sh shell command parameters

Example

We can use the sh command to invoke the ls command of the Linux shell from the Grunt shell, as shown below. In this example, it lists the files in the /pig/bin/ directory.

grunt> sh ls
   
pig 
pig_1444799121955.log 
pig.cmd 
pig.py

fs command

Using the fs command, we can invoke any FsShell command from the Grunt shell.

Syntax

The syntax of the fs command is given below.

grunt> fs File System command parameters

Example

We can use the fs command to invoke the HDFS ls command from the Grunt shell. In the following example, it lists the files in the HDFS root directory.

grunt> fs -ls
  
Found 3 items
drwxrwxrwx   - Hadoop supergroup          0 2015-09-08 14:13 Hbase
drwxr-xr-x   - Hadoop supergroup          0 2015-09-09 14:52 seqgen_data
drwxr-xr-x   - Hadoop supergroup          0 2015-09-08 11:30 twitter_data

In the same way, we can use the fs command to invoke all the other file system shell commands from the Grunt shell.

Utility commands

The Grunt shell provides a set of utility commands. These include clear, help, history, quit, and set, as well as commands such as exec, kill, and run for controlling Pig from the Grunt shell. A description of each utility command is given below.

Clear command

The clear command is used to clear the screen of the Grunt shell.

Syntax

You can clear the screen of the Grunt shell using the clear command, as shown below.

grunt> clear

Help command

The help command provides a list of Pig commands or Pig properties.

Usage

You can use the help command to get a list of Pig commands, as shown below.

grunt> help

Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig

File system commands:
    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic Commands:
    describe <alias>[::<alias] - Show the schema for the alias. Inner aliases can be described as A::B.
    explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml]
        [-param <param_name>=<param_value>]
        [-param_file <file_name>] [<alias>] -
        Show the execution plan to compute the alias or for entire script.
        -script - Explain the entire script.
        -out - Store the output into directory rather than print to stdout.
        -brief - Don't expand nested plans (presenting a smaller graph for overview).
        -dot - Generate the output in .dot format. Default is text format.
        -xml - Generate the output in .xml format. Default is text format.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        alias - Alias to explain.
    dump <alias> - Compute the alias and writes the results to stdout.

Utility Commands:
    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment including aliases.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    sh <shell command> - Invoke a shell command.
    kill <job_id> - Kill the hadoop job specified by the hadoop job id.
    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
        The following keys are supported:
        default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
        debug - Set debug on or off. Default is off.
        job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
        job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
        stream.skippath - String that contains the path. This is used by streaming.
        any hadoop property.
    help - Display this message.
    history [-n] - Display the list statements in cache.
        -n Hide line numbers.
    quit - Quit the grunt shell.

The history command

This command displays a list of the statements executed since the Grunt shell was started.

Usage

Let's say we've executed three statements since we opened the Grunt shell.

grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');
 
grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
 
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
 

Then, using the history command produces the following output.

grunt> history

customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(','); 
  
orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');
   
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
 

Set command

The set command is used to display/assign values to keys used in Pig.

Usage

Using this command, you can set values for the following keys.

Key: Description and values

default_parallel: Sets the number of reducers for the map jobs by passing any whole number as a value to this key.
debug: Turns debugging in Pig on or off by passing on/off to this key.
job.name: Sets the job name to the required name by passing a string value to this key.
job.priority: Sets the priority of a job by passing one of the following values to this key:

  • very_low
  • low
  • normal
  • high
  • very_high

stream.skippath: For streaming, sets the path from which data is not to be transferred, by passing the desired path as a string to this key.
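As an illustration of these keys, the sketch below writes a small Pig script that sets a few of them before loading data. The directory, the file name set_demo.pig, and all values are hypothetical, and the exact quoting each key accepts may vary between Pig versions, so treat the SET lines as an assumption to check against your installation.

```shell
# Hypothetical local directory for the demo script.
demo_dir="${TMPDIR:-/tmp}/pig_set_demo"
mkdir -p "$demo_dir"

# The quoted 'EOF' stops the shell from expanding anything in the script.
cat > "$demo_dir/set_demo.pig" <<'EOF'
-- Ask for 10 reducers for the jobs launched by this script.
SET default_parallel 10;
-- Turn on debug output.
SET debug 'on';
-- Give the jobs a recognizable, single-quoted name.
SET job.name 'student report';
-- Run the jobs with a higher priority than the default (normal).
SET job.priority high;

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
EOF

cat "$demo_dir/set_demo.pig"
```

The same SET lines can also be typed one at a time at the grunt> prompt.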

Quit command

You can use this command to exit the Grunt shell.

Usage

Quit the Grunt shell, as shown below.

grunt> quit

Now let's look at the commands that control Apache Pig from the Grunt shell.

exec command

Using the exec command, we can execute Pig scripts from the Grunt shell.

Syntax

The syntax of the utility command exec is given below.

grunt> exec [-param param_name = param_value] [-param_file file_name] [script]

Example

Let's assume that there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.

Student.txt

001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi

Also, let's say we have a script file named sample_script.pig in the /pig_data/ directory of HDFS with the following content.

Sample_script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',') 
   as (id:int,name:chararray,city:chararray);
  
Dump student;
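To try this example, the two files could first be created from a regular terminal. The sketch below is one way to do that; the local directory pig_demo is a hypothetical name, and the upload commands in the comments assume a running HDFS with the hdfs command-line client installed.

```shell
# Hypothetical local working directory for the sample files.
demo_dir="${TMPDIR:-/tmp}/pig_demo"
mkdir -p "$demo_dir"

# Sample data: one student per line, fields separated by commas.
cat > "$demo_dir/student.txt" <<'EOF'
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
EOF

# The Pig script from the text. The quoted 'EOF' keeps the shell
# from expanding anything inside it.
cat > "$demo_dir/sample_script.pig" <<'EOF'
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
   as (id:int,name:chararray,city:chararray);

Dump student;
EOF

# With a running HDFS, the files could then be uploaded, for example:
#   hdfs dfs -mkdir -p /pig_data
#   hdfs dfs -put "$demo_dir/student.txt" "$demo_dir/sample_script.pig" /pig_data/

ls "$demo_dir"
```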

Now, let's use the exec command to execute the script above from the Grunt shell, as shown below.

grunt> exec /sample_script.pig

Output

The exec command executes the script in sample_script.pig. As directed in the script, it loads the student.txt file into Pig and displays the result of the Dump operator, as shown below.

(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi) 
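The exec syntax also accepts -param, which this example does not use. As a rough sketch of how a parameterized script might look, the block below writes a script containing the placeholder $file_name and then previews the text that results from replacing it with student.txt. The parameter name, the script name param_script.pig, and the directory are all hypothetical, and sed here only imitates the substitution for illustration; Pig performs its own parameter substitution.

```shell
# Hypothetical local directory for the demo script.
demo_dir="${TMPDIR:-/tmp}/pig_param_demo"
mkdir -p "$demo_dir"

# Script with a placeholder; the quoted 'EOF' keeps $file_name literal.
cat > "$demo_dir/param_script.pig" <<'EOF'
student = LOAD 'hdfs://localhost:9000/pig_data/$file_name' USING PigStorage(',')
   as (id:int,name:chararray,city:chararray);
Dump student;
EOF

# Preview the script with the placeholder replaced by a concrete value.
file_name='student.txt'
preview=$(sed "s|\\\$file_name|$file_name|g" "$demo_dir/param_script.pig")
echo "$preview"
```

From the Grunt shell, such a script would then be started with something like exec -param file_name=student.txt followed by the script path.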

Kill command

You can use this command to terminate a job from the Grunt shell.

Syntax

The syntax of the kill command is given below.

grunt> kill JobId

Example

Suppose there is a running Pig job with the id Id_0055. You can terminate it from the Grunt shell using the kill command, as shown below.

grunt> kill Id_0055

Run command

You can run a Pig script from the Grunt shell using the run command.

Syntax

The syntax of the run command is given below.

grunt> run [-param param_name = param_value] [-param_file file_name] script

Example

Suppose there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.

Student.txt

001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi

Also, suppose we have a script file named sample_script.pig in the local file system with the following content.

Sample_script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);

Now, let's run the script above from the Grunt shell using the run command, as shown below.

grunt> run /sample_script.pig

You can use the Dump operator on the student relation to view the output of the script, as shown below.

grunt> Dump student;

(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)

Note: The difference between the exec and run commands is that, when run is used, the statements from the script are available in the command history.