Coding With Fun

Apache Pig Execution


May 26, 2021 Apache Pig



In the last chapter, we explained how to install Apache Pig. In this chapter, we'll discuss how to execute Apache Pig.

Apache Pig execution modes

You can run Apache Pig in two modes: Local mode and MapReduce mode.

Local mode

In this mode, all files are installed and run from the local host and local file system, without Hadoop or HDFS. This mode is typically used for testing purposes.

MapReduce mode

MapReduce mode is where we use Apache Pig to load or process data that exists in the Hadoop Distributed File System (HDFS). In this mode, whenever we execute a Pig Latin statement to process data, a MapReduce job is invoked in the back end to perform the requested operation on the data in HDFS.
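As a rough illustration of choosing between the two modes, the sketch below picks one based on what is available on the machine. The function name `choose_pig_mode` is hypothetical, and treating "the `hadoop` command is on the PATH" as a sign that a cluster is reachable is only an assumption made for this example. The chosen mode is meant to be passed to the `-x` option described later in this chapter.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): suggest an execution mode.
# Assumption: if the `hadoop` command is on PATH, Hadoop/HDFS is usable,
# so MapReduce mode makes sense; otherwise fall back to Local mode.
choose_pig_mode() {
    if command -v hadoop >/dev/null 2>&1; then
        echo "mapreduce"
    else
        echo "local"
    fi
}

# Show the command that would start Pig in the suggested mode,
# without actually launching it.
mode=$(choose_pig_mode)
echo "pig -x $mode"
```

In practice you would usually pick the mode deliberately, for example Local mode for a quick test even on a machine that has Hadoop installed.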

Apache Pig execution mechanisms

Apache Pig scripts can be executed in three ways: interactive mode, batch mode, and embedded mode.

  • Interactive Mode (Grunt shell) - You can use the Grunt shell to run Apache Pig in interactive mode. In this shell, you can enter Pig Latin statements and get the output (using the Dump operator).

  • Batch Mode (Script) - You can run Apache Pig in batch mode by writing the Pig Latin script to a single file with the .pig extension.

  • Embedded Mode (UDF) - Apache Pig allows us to define our own functions (user-defined functions, or UDFs) in programming languages such as Java and use them in our scripts.
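On the command line, the first two mechanisms differ only in whether a script file is passed to the Pig launcher. The sketch below makes that visible with a hypothetical helper, `pig_command`, which builds the command string rather than running it. It assumes the launcher is called `pig` and accepts only the two modes covered in this chapter.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): build a pig command line.
#   $1 = execution mode: "local" or "mapreduce"
#   $2 = optional path to a .pig script
# No script  -> interactive mode (Grunt shell)
# With script -> batch mode
pig_command() {
    mode=$1
    script=${2:-}
    case "$mode" in
        local|mapreduce) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    if [ -n "$script" ]; then
        echo "pig -x $mode $script"
    else
        echo "pig -x $mode"
    fi
}

pig_command local                        # prints: pig -x local
pig_command mapreduce Sample_script.pig  # prints: pig -x mapreduce Sample_script.pig
```

Embedded mode is not shown here because it requires writing the UDF itself in a language such as Java.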

Starting the Grunt shell

You can use the "-x" option to start the Grunt shell in the desired mode (local/MapReduce), as shown below.

Local mode:

$ ./pig -x local

MapReduce mode:

$ ./pig -x mapreduce

Both commands give the Grunt shell prompt, as shown below.

grunt>

You can exit the Grunt shell by pressing Ctrl+D.

Once the Grunt shell is running, you can execute Pig Latin statements by entering them directly at the prompt.

grunt> customers = LOAD 'customers.txt' USING PigStorage(',');
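To try this statement without an existing data file, a small end-to-end sketch may help. It assumes the `pig` launcher is on the PATH and uses its `-e` (execute) option to pass the same statements that would otherwise be typed at the grunt> prompt. The contents of customers.txt and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Create a tiny, made-up customers.txt so the LOAD statement has input.
workdir=$(mktemp -d)
cat > "$workdir/customers.txt" <<'EOF'
1,Alice,London
2,Bob,Paris
EOF

# The LOAD statement from the text (pointed at the sample file),
# followed by Dump to print the relation.
statements="customers = LOAD '$workdir/customers.txt' USING PigStorage(','); Dump customers;"

# Run in local mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local -e "$statements"
else
    echo "pig not found; statements that would be run:"
    echo "$statements"
fi
```

Typing the two statements one after the other at the grunt> prompt should have the same effect. The `-e` form is just convenient for a quick, non-interactive check.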

Executing Apache Pig in batch mode

You can write an entire Pig Latin script in a file and execute it by passing the file to the pig command together with the -x option. Let's assume that there is a sample script in a file called "Sample_script.pig", as shown below.

Sample_script.pig

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);
  
Dump student;

Now you can execute the script in the file above, as shown below.

Local mode:

$ pig -x local Sample_script.pig

MapReduce mode:

$ pig -x mapreduce Sample_script.pig
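The sample script reads from an HDFS URL, so it needs a running Hadoop installation. For a quick test in local mode, a variant that reads a local file can be used instead. The sketch below is one way to set that up. The file name Sample_script_local.pig, the temporary directory, and the student records are all invented for this example, and it assumes the `pig` launcher is on the PATH.

```shell
#!/bin/sh
# Made-up input matching the (id, name, city) schema of the sample script.
workdir=$(mktemp -d)
cat > "$workdir/student.txt" <<'EOF'
1,Rajiv,Hyderabad
2,Siddarth,Kolkata
EOF

# Local-mode variant of the script: same statements as Sample_script.pig,
# but with a local file path in place of the HDFS URL.
cat > "$workdir/Sample_script_local.pig" <<EOF
student = LOAD '$workdir/student.txt' USING
   PigStorage(',') as (id:int,name:chararray,city:chararray);

Dump student;
EOF

# Run the script in batch mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$workdir/Sample_script_local.pig"
else
    echo "pig not found; script written to $workdir/Sample_script_local.pig"
fi
```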

Note: We'll discuss in detail how to run Pig scripts in batch mode and embedded mode in later chapters.