May 26, 2021 Apache Pig
In the last chapter, we explained how to install Apache Pig. In this chapter, we'll discuss how to execute Apache Pig.
You can run Apache Pig in two modes: local mode and MapReduce mode.
In local mode, all files are installed and run from the local host and local file system, without Hadoop or HDFS. This mode is typically used for testing purposes.
In MapReduce mode, we use Apache Pig to load or process data that exists in the Hadoop Distributed File System (HDFS). In this mode, whenever we execute a Pig Latin statement to process data, a MapReduce job is invoked in the back end to perform the operation on the data that exists in HDFS.
Apache Pig scripts can be executed in three ways: interactive mode, batch mode, and embedded mode.
Interactive Mode (Grunt shell) - You can run Apache Pig in interactive mode using the Grunt shell. In this shell, you can enter Pig Latin statements and get the output (using the Dump operator).
Batch Mode (Script) - You can run Apache Pig in batch mode by writing the Pig Latin script in a single file with the .pig extension.
Embedded Mode (UDF) - Apache Pig allows us to define our own functions (user-defined functions, or UDFs) in programming languages such as Java and use them in our scripts.
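As a rough illustration of embedded mode, here is a minimal sketch of a UDF. The text mentions Java; this sketch uses Python instead, which Pig also accepts for UDFs. The file name `upper_udf.py`, the function `to_upper`, and the schema name `name_upper` are all hypothetical, and the fallback decorator is an assumption added only so the file can be imported and tried outside Pig.

```python
# upper_udf.py: a hypothetical Python UDF file for Apache Pig.
try:
    # pig_util ships with Pig and is importable when Pig runs the file
    # through its streaming_python engine.
    from pig_util import outputSchema
except ImportError:
    # Assumption: a stand-in decorator so the file also works outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema  # record the declared output schema
            return func
        return wrap


@outputSchema("name_upper:chararray")
def to_upper(name):
    """Return the upper-cased name, or None when the input is null."""
    if name is None:
        return None
    return name.upper()


if __name__ == "__main__":
    # Quick check outside Pig.
    print(to_upper("alice"))
```

Inside a Pig script, such a file would typically be registered with a statement along the lines of `REGISTER 'upper_udf.py' USING jython AS myfuncs;` and then called as `myfuncs.to_upper(name)`; the alias `myfuncs` is again just an illustrative name.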
You can use the "-x" option to call the Grunt shell in the desired mode (local/MapReduce), as shown below.
Local mode | MapReduce mode |
---|---|
Command: $ ./pig -x local | Command: $ ./pig -x mapreduce |
Both commands give the Grunt shell prompt, as shown below.
grunt>
You can exit the Grunt shell by pressing Ctrl+D.
After invoking the Grunt shell, you can run Pig Latin statements by entering them directly at the prompt.
grunt> customers = LOAD 'customers.txt' USING PigStorage(',');
You can write an entire Pig Latin script in a file and execute it with the pig command and the -x option. Let's assume that there is a Pig script in a file called "Sample_script.pig", as shown below.
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
   AS (id:int, name:chararray, city:chararray);
Dump student;
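To make the LOAD statement more concrete, the following Python sketch imitates, very loosely, what `PigStorage(',')` combined with the schema `(id:int, name:chararray, city:chararray)` produces for each input line. It is an emulation for illustration only, not how Pig is implemented, and it skips details such as extra validation. The function `parse_student_line` and the sample rows are hypothetical; the actual contents of student.txt are not shown in this chapter.

```python
def parse_student_line(line, delimiter=","):
    """Split one text line and cast it to (id:int, name:chararray, city:chararray)."""
    fields = line.rstrip("\n").split(delimiter)
    # Pad short rows with None, similar to Pig filling missing fields with null.
    fields += [None] * (3 - len(fields))
    raw_id, name, city = fields[:3]
    try:
        student_id = int(raw_id)
    except (TypeError, ValueError):
        # Pig yields null when a value cannot be cast to the declared type.
        student_id = None
    return (student_id, name, city)


if __name__ == "__main__":
    # Hypothetical sample rows, used only for this illustration.
    sample_lines = ["1,Alice,Paris", "2,Bob,Berlin", "x,Nobody"]
    for row in sample_lines:
        print(parse_student_line(row))
```

Each returned tuple corresponds to one record of the `student` relation that `Dump student;` would print.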
Now you can execute this Pig script in either mode, as shown below.
Local mode | MapReduce mode |
---|---|
$ pig -x local Sample_script.pig | $ pig -x mapreduce Sample_script.pig |
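If batch runs are launched from another program, the two commands in the table can be assembled programmatically. The sketch below is one possible way to do that in Python; the helper names `build_pig_command` and `run_pig_script` are hypothetical, and the only Pig options it relies on are the `-x local` and `-x mapreduce` forms shown above. Actually running a script assumes that `pig` is on the PATH.

```python
import subprocess

# The two execution modes accepted by the -x option in this chapter.
VALID_MODES = ("local", "mapreduce")


def build_pig_command(mode, script_path, pig_executable="pig"):
    """Return the argument list for running a Pig script in the given mode."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (VALID_MODES, mode))
    return [pig_executable, "-x", mode, script_path]


def run_pig_script(mode, script_path):
    """Run the script and return Pig's exit code (requires pig on the PATH)."""
    return subprocess.run(build_pig_command(mode, script_path)).returncode


if __name__ == "__main__":
    # Only print the command here; call run_pig_script() to execute it.
    print(" ".join(build_pig_command("local", "Sample_script.pig")))
```

Rejecting unknown mode names early avoids passing a value such as "hdfs" to `-x`, which is not one of the two modes used in this chapter.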
Note: We'll discuss in detail how to run Pig scripts in batch mode and embedded mode in later chapters.