May 26, 2021 Apache Pig
Pig Latin is the language used to analyze data in Hadoop using Apache Pig. In this chapter, we'll discuss the basics of Pig Latin: statements, data types, general and relational operators, and Pig Latin UDFs.
As discussed in the previous section, Pig's data model is fully nested. A relation is the outermost structure of the Pig Latin data model. It is a bag, where a bag is a collection of tuples and a tuple is an ordered set of fields.
When working with data using Pig Latin, statements are the basic constructs.
These statements work with relations, and they include expressions and schemas.
Each statement ends with a semicolon (;).
We'll use the operators provided by Pig Latin to perform various operations through statements.
Except for LOAD and STORE, every Pig Latin statement takes a relation as input and produces another relation as output.
As soon as you enter a Load statement in the Grunt shell, a semantic check is performed. To view the contents of the relation, you need to use the Dump operator. The MapReduce job that loads the data runs only when the dump operation is executed.
Here's a Pig Latin statement that loads the data into Apache Pig.
grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
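As a minimal sketch of the deferred execution described above, the two statements below can be entered after the Load statement. They assume the `Student_data` relation defined above and that `student_data.txt` actually exists where Pig looks for it.

```pig
-- Prints the schema declared in the AS clause; no job is launched.
DESCRIBE Student_data;

-- Triggers execution and prints the tuples of the relation on the console.
DUMP Student_data;
```

DESCRIBE is a cheap way to check the schema before paying for a full job with DUMP.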
The table given below describes the Pig Latin data types.
Serial number | Data type | Description and example |
---|---|---|
1 | int | Represents a signed 32-bit integer. Example: 8 |
2 | long | Represents a signed 64-bit integer. Example: 5L |
3 | float | Represents a signed 32-bit floating point number. Example: 5.5F |
4 | double | Represents a 64-bit floating point number. Example: 10.5 |
5 | chararray | Represents a character array (string) in Unicode UTF-8 format. Example: 'w3cschool' |
6 | bytearray | Represents a byte array (blob). |
7 | boolean | Represents a Boolean value. Example: true / false |
8 | datetime | Represents a date-time. Example: 1970-01-01T00:00:00.000+00:00 |
9 | biginteger | Represents a Java BigInteger. Example: 60708090709 |
10 | bigdecimal | Represents a Java BigDecimal. Example: 185.98376256272893883 |
Complex types | | |
11 | tuple | A tuple is an ordered set of fields. Example: (raja,30) |
12 | bag | A bag is a collection of tuples. Example: {(raju,30),(Mohammad,45)} |
13 | map | A map is a set of key-value pairs. Example: ['name'#'Raju', 'age'#30] |
The values of all the above data types can be NULL. Apache Pig handles null values in a similar way to SQL. A null can be an unknown or non-existent value, and it is used as a placeholder for an optional value. These nulls can occur naturally or as the result of an operation.
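The null handling described above can be sketched with the `IS NULL` and `IS NOT NULL` tests. This is an illustration only: it reuses the `student_data.txt` file and schema from the Load example, and the relation names (`students`, `with_city`, `no_phone`, `filled`) are hypothetical.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
           AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Keep only the tuples whose city field is present.
with_city = FILTER students BY city IS NOT NULL;

-- Keep only the tuples whose phone field is missing.
no_phone = FILTER students BY phone IS NULL;

-- Replace a null city with a placeholder value.
filled = FOREACH students GENERATE id, ((city IS NULL) ? 'unknown' : city) AS city;
```

As in SQL, comparing a null with `==` does not yield true, so the explicit `IS NULL` test is the reliable way to detect missing values.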
The following table describes the arithmetic operators of Pig Latin. Suppose a is 10 and b is 20.
Operator | Description | Example |
---|---|---|
+ | Addition - Adds the values on either side of the operator. | a + b will result in 30 |
- | Subtraction - Subtracts the right-hand operand from the left-hand operand. | a - b will result in -10 |
* | Multiplication - Multiplies the values on either side of the operator. | a * b will result in 200 |
/ | Division - Divides the left-hand operand by the right-hand operand. | b / a will result in 2 |
% | Modulus - Divides the left-hand operand by the right-hand operand and returns the remainder. | b % a will result in 0 |
? : | Bincond - Evaluates a Boolean expression. It has three operands, as shown here: variable x = (expression) ? value1 (if true) : value2 (if false). | b = (a == 1) ? 20 : 30; If a is 1, the value of b is 20. If a is not 1, the value of b is 30. |
CASE WHEN THEN ELSE END | Case - The case operator is equivalent to a nested bincond operator. | CASE f2 % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END |
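The operators in the table can be tried out together in one FOREACH statement. The sketch below is illustrative: `numbers.txt` is a hypothetical comma-separated file holding one `a,b` pair per line, the aliases are made up, and it assumes a Pig version that supports the CASE expression.

```pig
-- Hypothetical input file with one "a,b" pair per line, for example: 10,20
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

-- Both fields are int, so b / a is an integer division.
results = FOREACH nums GENERATE
    a + b AS total,
    a - b AS difference,
    a * b AS product,
    b / a AS quotient,
    b % a AS remainder,
    ((a == 1) ? 20 : 30) AS bincond_value,
    (CASE b % 2 WHEN 0 THEN 'even' WHEN 1 THEN 'odd' END) AS parity;

-- For the input line 10,20 this prints (30,-10,200,2,0,30,even)
DUMP results;
```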
The following table describes the comparison operators of Pig Latin. As before, suppose a is 10 and b is 20.
Operator | Description | Example |
---|---|---|
== | Equal to - Checks whether the values of the two operands are equal. If so, the condition becomes true. | (a == b) is not true. |
!= | Not equal to - Checks whether the values of the two operands are equal. If the values are not equal, the condition becomes true. | (a != b) is true. |
> | Greater than - Checks whether the value of the left operand is greater than the value of the right operand. If so, the condition becomes true. | (a > b) is not true. |
< | Less than - Checks whether the value of the left operand is less than the value of the right operand. If so, the condition becomes true. | (a < b) is true. |
>= | Greater than or equal to - Checks whether the value of the left operand is greater than or equal to the value of the right operand. If so, the condition becomes true. | (a >= b) is not true. |
<= | Less than or equal to - Checks whether the value of the left operand is less than or equal to the value of the right operand. If so, the condition becomes true. | (a <= b) is true. |
matches | Pattern matching - Checks whether the string on the left matches the constant pattern on the right. | f1 matches '.*tutorial.*' |
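Comparison operators are typically used as the condition of a FILTER statement. The following is a sketch under stated assumptions: `numbers.txt` and `pages.txt` are hypothetical input files, and all relation names are made up for illustration.

```pig
-- Hypothetical input file with one "a,b" pair per line, for example: 10,20
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

-- Keep the tuples where a is strictly smaller than b.
smaller = FILTER nums BY a < b;

-- Keep the tuples where the two values differ.
different = FILTER nums BY a != b;

-- Hypothetical input file with one string per line.
pages = LOAD 'pages.txt' AS (f1:chararray);

-- matches uses Java regular expressions and must match the whole string,
-- which is why the pattern is wrapped in .* on both sides.
tutorials = FILTER pages BY f1 MATCHES '.*tutorial.*';
```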
The following table describes the type construction operators of Pig Latin.
Operator | Description | Example |
---|---|---|
() | Tuple constructor operator - This operator is used to construct a tuple. | (Raju,30) |
{} | Bag constructor operator - This operator is used to construct a bag. | {(Raju,30),(Mohammad,45)} |
[] | Map constructor operator - This operator is used to construct a map. | [name#Raja,age#30] |
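The constructors can also be applied to fields inside a FOREACH statement. The sketch below assumes a hypothetical `students.txt` file with a name and an age per line; the relation names are made up as well.

```pig
students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);

-- () builds a tuple from the two fields.
as_tuple = FOREACH students GENERATE (name, age);

-- {} builds a bag; here it holds a single (name, age) tuple.
as_bag = FOREACH students GENERATE {(name, age)};

-- [] builds a map; when built from fields, the key and the value are separated by a comma.
as_map = FOREACH students GENERATE [name, age];
```

Note the difference from the constant notation in the table: a map constant is written with `#` between key and value, while a map built from expressions lists the key and the value separated by a comma.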
The following table describes the relational operators of Pig Latin.
Operator | Description |
---|---|
Loading and storing | |
LOAD | Load data from the file system (local/HDFS) into a relation. |
STORE | Save a relation to the file system (local/HDFS). |
Filtering | |
FILTER | Remove unwanted rows from a relation. |
DISTINCT | Remove duplicate rows from a relation. |
FOREACH, GENERATE | Generate data transformations based on columns of data. |
STREAM | Transform a relation using an external program. |
Grouping and joining | |
JOIN | Join two or more relations. |
COGROUP | Group the data in two or more relations. |
GROUP | Group the data in a single relation. |
CROSS | Create the cross product of two or more relations. |
Sorting | |
ORDER | Sort a relation based on one or more fields (ascending or descending). |
LIMIT | Get a limited number of tuples from a relation. |
Combining and splitting | |
UNION | Combine two or more relations into a single relation. |
SPLIT | Split a single relation into two or more relations. |
Diagnostic operators | |
DUMP | Print the contents of a relation on the console. |
DESCRIBE | Describe the schema of a relation. |
EXPLAIN | View the logical, physical, or MapReduce execution plans used to compute a relation. |
ILLUSTRATE | View the step-by-step execution of a series of statements. |
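Several of these operators usually appear together in one script. The following is a minimal sketch, not a definitive recipe: it reuses the `student_data.txt` file and schema from the Load example, the relation names and the `top_cities` output directory are hypothetical, and `COUNT` is one of Pig's built-in functions.

```pig
students = LOAD 'student_data.txt' USING PigStorage(',')
           AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- FILTER: drop the tuples that have no city.
located = FILTER students BY city IS NOT NULL;

-- GROUP: collect the tuples of each city into a bag.
by_city = GROUP located BY city;

-- FOREACH ... GENERATE: produce one output tuple per group.
counts = FOREACH by_city GENERATE group AS city, COUNT(located) AS n;

-- ORDER and LIMIT: keep the three cities with the most students.
by_count = ORDER counts BY n DESC;
top3 = LIMIT by_count 3;

-- DESCRIBE prints the schema; STORE writes the result and triggers execution.
DESCRIBE top3;
STORE top3 INTO 'top_cities' USING PigStorage(',');
```

Each statement takes a relation and produces a new one, so intermediate relations such as `by_city` can be inspected with DUMP or ILLUSTRATE while developing the script.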