Coding With Fun

Pig Latin Basics


May 26, 2021 Apache Pig




Pig Latin is the language used to analyze data in Hadoop with Apache Pig. This chapter covers the basics of Pig Latin: statements, data types, general and relational operators, and Pig Latin UDFs.

Pig Latin - Data model

As discussed in the previous chapter, Pig's data model is fully nested. A relation is the outermost structure of the Pig Latin data model, and it is a bag, where:

  • A bag is a collection of tuples.
  • A tuple is an ordered set of fields.
  • A field is a piece of data.
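The nesting is easiest to see after a GROUP. The minimal sketch below assumes a comma-delimited file named student_data.txt with the same schema as the LOAD example in this chapter. The alias by_city is illustrative. GROUP returns a relation (a bag) in which each tuple holds a group key and an inner bag of the matching student tuples.

```pig
-- Assumption: student_data.txt is a comma-delimited file with these five fields.
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- by_city is a relation (an outer bag). Each of its tuples has two fields:
--   group        : the city value
--   Student_data : an inner bag holding every student tuple for that city
by_city = GROUP Student_data BY city;

-- DESCRIBE prints the nested schema: bag -> tuple -> bag -> tuple -> fields.
DESCRIBE by_city;
```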

Pig Latin - Statements

The statement is the basic construct when you work with data in Pig Latin.

  • Statements work with relations, and they include expressions and schemas.

  • Every statement ends with a semicolon (;).

  • Statements perform operations by using the operators that Pig Latin provides.

  • Except for LOAD and STORE, every Pig Latin statement takes a relation as input and produces another relation as output.

  • When you enter a LOAD statement in the Grunt shell, Pig only performs a semantic check on it. To view the contents of the relation, use the DUMP operator. The MapReduce job that actually reads the data from the file system runs only when the DUMP operation is executed.
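The following minimal sketch illustrates these rules. It assumes the same hypothetical student_data.txt file and an illustrative output directory named hyderabad_students. The LOAD and FILTER statements only build a plan. Pig launches a job when it reaches STORE, and DUMP behaves the same way.

```pig
-- Each statement ends with a semicolon and assigns its result to a relation alias.
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- FILTER takes a relation as input and produces a new relation as output.
-- Nothing has been read yet; Pig has only checked the statement and extended its plan.
hyderabad = FILTER Student_data BY city == 'Hyderabad';

-- STORE (like DUMP) triggers execution: the data is read, filtered, and written out.
STORE hyderabad INTO 'hyderabad_students' USING PigStorage(',');
```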

Example

Here's a Pig Latin statement that loads the data into Apache Pig.

grunt> Student_data = LOAD 'student_data.txt' USING PigStorage(',') as
   ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
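To check the statement above, you can inspect first the schema and then the data in the same Grunt session. This assumes student_data.txt exists in the current file system. DESCRIBE only reports the schema declared in the AS clause. DUMP runs the job and prints each tuple to the console.

```pig
-- Print the schema declared in the AS clause (no job is launched).
grunt> DESCRIBE Student_data;

-- Launch the job that reads student_data.txt and print every tuple.
grunt> DUMP Student_data;
```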

Pig Latin - Data types

The table below describes the Pig Latin data types.

No.  Data type    Description and example
Simple types
1    int          Represents a signed 32-bit integer. Example: 8
2    long         Represents a signed 64-bit integer. Example: 5L
3    float        Represents a signed 32-bit floating-point number. Example: 5.5F
4    double       Represents a 64-bit floating-point number. Example: 10.5
5    chararray    Represents a character array (string) in Unicode UTF-8 format. Example: 'w3cschool'
6    bytearray    Represents a byte array (blob).
7    boolean      Represents a Boolean value. Example: true / false
8    datetime     Represents a date-time. Example: 1970-01-01T00:00:00.000+00:00
9    biginteger   Represents a Java BigInteger. Example: 60708090709
10   bigdecimal   Represents a Java BigDecimal. Example: 185.98376256272893883
Complex types
11   tuple        A tuple is an ordered set of fields. Example: (raja, 30)
12   bag          A bag is a collection of tuples. Example: {(raju, 30), (Mohammad, 45)}
13   map          A map is a set of key-value pairs. Example: ['name'#'Raju', 'age'#30]
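The sketch below declares several of these types in a LOAD schema and then reads values out of the complex ones. The file profiles.txt, its field names, and the sample line in the comment are hypothetical. When no USING clause is given, LOAD uses PigStorage with a tab delimiter, and the text forms of the tuple, bag, and map follow the examples in the table.

```pig
-- Hypothetical input line (fields separated by tabs):
--   Raju<TAB>30<TAB>8.5<TAB>(Hyderabad,500001)<TAB>{(85),(92)}<TAB>[dept#cs]
profiles = LOAD 'profiles.txt'
    AS (name:chararray,
        age:int,
        gpa:double,
        address:tuple(city:chararray, zip:chararray),
        scores:bag{t:tuple(score:int)},
        info:map[]);

-- Project a tuple field with '.', look up a map value with '#',
-- and FLATTEN the bag so each score becomes its own output tuple.
flat = FOREACH profiles GENERATE
    name,
    address.city AS city,
    FLATTEN(scores) AS score,
    info#'dept' AS dept;

DUMP flat;
```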

Null values

The values of all the data types above can be null. Apache Pig handles null values in a similar way to SQL. A null can be an unknown or non-existent value, and it is used as a placeholder for optional values. Nulls can occur naturally in the data or as the result of an operation.
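Comparisons and arithmetic that involve a null also yield null, so it is safer to test for missing values explicitly with IS NULL and IS NOT NULL. This minimal sketch assumes the same hypothetical student_data.txt. One way a null can arise during loading is a value that cannot be cast to the declared type, such as a non-numeric id.

```pig
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- A condition that evaluates to null does not pass FILTER, so test for nulls directly.
with_phone = FILTER Student_data BY phone IS NOT NULL;
bad_ids    = FILTER Student_data BY id IS NULL;

-- Arithmetic with a null operand produces null, as in SQL.
next_ids = FOREACH Student_data GENERATE id, id + 1 AS next_id;

DUMP bad_ids;
```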

Pig Latin - Arithmetic operators

The following table describes the arithmetic operators of Pig Latin. Suppose a = 10 and b = 20.

Operator Description Example
+

Addition - Adds the values on either side of the operator.

a + b gives 30
-

Subtraction - Subtracts the right-hand operand from the left-hand operand.

a - b gives -10
*

Multiplication - Multiplies the values on either side of the operator.

a * b gives 200
/

Division - Divides the left-hand operand by the right-hand operand.

b / a gives 2
%

Modulus - Divides the left-hand operand by the right-hand operand and returns the remainder.

b % a gives 0
? :

Bincond - Evaluates a Boolean expression. It has three operands, as shown below.

variable x = (expression) ? value1 if true : value2 if false

b = (a == 1) ? 20 : 30;

If a is 1, the value of b is 20.

If a is not 1, the value of b is 30.

CASE WHEN THEN ELSE END

Case - The case operator is equivalent to a nested bincond operator.

CASE f2 % 2
   WHEN 0 THEN 'even'
   WHEN 1 THEN 'odd'
END
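These operators are typically used inside FOREACH ... GENERATE. The sketch below assumes a hypothetical comma-delimited file pairs.txt with two int fields, a and b. In Pig, as in Java, / on two int operands performs integer division. The bincond is enclosed in parentheses, which Pig requires, and the CASE expression is parenthesized the same way. The per-line comments give the results for an input line of 10,20.

```pig
-- Assumption: pairs.txt holds lines such as "10,20".
pairs = LOAD 'pairs.txt' USING PigStorage(',') AS (a:int, b:int);

results = FOREACH pairs GENERATE
    a + b AS total,        -- 10 + 20 -> 30
    a - b AS difference,   -- 10 - 20 -> -10
    a * b AS product,      -- 10 * 20 -> 200
    b / a AS quotient,     -- int / int is integer division: 20 / 10 -> 2
    b % a AS remainder,    -- 20 % 10 -> 0
    (a == 1 ? 20 : 30) AS flag,   -- bincond: 20 if a is 1, otherwise 30
    (CASE b % 2
        WHEN 0 THEN 'even'
        WHEN 1 THEN 'odd'
     END) AS parity;              -- CASE: equivalent to a nested bincond

DUMP results;
```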

Pig Latin - Comparison operators

The following table describes the comparison operators of Pig Latin. The examples again assume a = 10 and b = 20.

Operator Description Example
==

Equal to - Checks whether the values of the two operands are equal. If they are, the condition is true.

(a == b) is not true.
!=

Not equal to - Checks whether the values of the two operands are equal. If they are not equal, the condition is true.

(a != b) is true.
>

Greater than - Checks whether the value of the left operand is greater than the value of the right operand. If so, the condition is true.

(a > b) is not true.
<

Less than - Checks whether the value of the left operand is less than the value of the right operand. If so, the condition is true.

(a < b) is true.
>=

Greater than or equal to - Checks whether the value of the left operand is greater than or equal to the value of the right operand. If so, the condition is true.

(a >= b) is not true.
<=

Less than or equal to - Checks whether the value of the left operand is less than or equal to the value of the right operand. If so, the condition is true.

(a <= b) is true.
matches

Pattern matching - Checks whether the string on the left matches the regular-expression constant on the right.

f1 matches '.*tutorial.*'
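In practice, comparison operators usually appear in FILTER conditions. This minimal sketch assumes the same hypothetical student_data.txt, and the city values are illustrative. The matches operator applies a Java regular expression that must match the whole string, which is why the pattern is wrapped in .* on both sides.

```pig
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- Numeric comparisons on the int field id, combined with AND.
middle_ids = FILTER Student_data BY id >= 3 AND id <= 6;

-- == and != also work on chararray fields.
not_chennai = FILTER Student_data BY city != 'Chennai';

-- matches must cover the entire string, so '.*bad.*' keeps any city
-- containing "bad", such as Hyderabad.
bad_cities = FILTER Student_data BY city matches '.*bad.*';

DUMP bad_cities;
```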

Pig Latin - Type construction operators

The following table describes the type construction operators of Pig Latin.

Operator Description Example
()

Tuple constructor operator - Used to construct a tuple.

(Raju, 30)
{}

Bag constructor operator - Used to construct a bag.

{(Raju, 30), (Mohammad, 45)}
[]

Map constructor operator - Used to construct a map.

[name#Raja, age#30]
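The tuple and bag constructors can also build values from fields inside FOREACH ... GENERATE. This minimal sketch assumes a Pig release that supports constructor expressions in GENERATE (check the documentation for your version) and the same hypothetical student_data.txt. The aliases full_name and name_bag are illustrative.

```pig
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- ( ) packs two fields into a single tuple-valued field;
-- { } wraps a tuple in a bag.
packed = FOREACH Student_data GENERATE
    id,
    (firstname, lastname) AS full_name,
    {(firstname, lastname)} AS name_bag;

DESCRIBE packed;
```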

Pig Latin - Relational operators

The following table describes the relational operators of Pig Latin.

Operator Description
Loading and storing
LOAD Load data from the file system (local/HDFS) into a relation.
STORE Save a relation to the file system (local/HDFS).
Filtering
FILTER Remove unwanted rows from a relation.
DISTINCT Remove duplicate rows from a relation.
FOREACH, GENERATE Generate data transformations based on columns of data.
STREAM Transform a relation using an external program.
Grouping and joining
JOIN Join two or more relations.
COGROUP Group the data in two or more relations.
GROUP Group the data in a single relation.
CROSS Create the cross product of two or more relations.
Sorting
ORDER Sort a relation by one or more fields (ascending or descending).
LIMIT Get a limited number of tuples from a relation.
Combining and splitting
UNION Combine two or more relations into a single relation.
SPLIT Split a single relation into two or more relations.
Diagnostic operators
DUMP Print the contents of a relation on the console.
DESCRIBE Describe the schema of a relation.
EXPLAIN View the logical, physical, or MapReduce execution plans used to compute a relation.
ILLUSTRATE View the step-by-step execution of a series of statements.
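The following sketch chains several of these operators into one small script. It assumes the same hypothetical student_data.txt, and the output directory top_cities and all aliases are illustrative. The script counts students per city and keeps the three largest cities.

```pig
Student_data = LOAD 'student_data.txt' USING PigStorage(',')
    AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);

-- FILTER and DISTINCT remove unwanted and duplicate rows.
valid = FILTER Student_data BY id IS NOT NULL;
uniq  = DISTINCT valid;

-- GROUP collects rows per city; FOREACH ... GENERATE produces one row per group.
by_city     = GROUP uniq BY city;
city_counts = FOREACH by_city GENERATE group AS city, COUNT(uniq) AS num_students;

-- ORDER and LIMIT keep the three cities with the most students.
ranked = ORDER city_counts BY num_students DESC;
top3   = LIMIT ranked 3;

-- Diagnostic operators: inspect the schema and the execution plans.
DESCRIBE top3;
EXPLAIN top3;

-- STORE writes the result to the file system (local or HDFS, depending on the execution mode).
STORE top3 INTO 'top_cities' USING PigStorage(',');
```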