Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

impala GROUP BY clause


May 26, 2021 impala


Table of contents


The Impala GROUP BY clause is used in collaboration with the SELECT statement to arrange the same data into groups.

Grammar

The following is the syntax of the GROUP BY clause.

select data from table_name Group BY col_name;

Cases

Suppose we have my_db table called customers in the database database, which reads as follows -

[quickstart.cloudera:21000] > select * from customers; 
Query: select * from customers 
+----+----------+-----+-----------+--------+ 
| id | name     | age | address   | salary | 
+----+----------+-----+-----------+--------+ 
| 1  | Ramesh   | 32  | Ahmedabad | 20000  | 
| 2  | Khilan   | 25  | Delhi     | 15000  | 
| 3  | kaushik  | 23  | Kota      | 30000  | 
| 4  | Chaitali | 25  | Mumbai    | 35000  | 
| 5  | Hardik   | 27  | Bhopal    | 40000  | 
| 6  | Komal    | 22  | MP        | 32000  | 
+----+----------+-----+-----------+--------+ 
Fetched 6 row(s) in 0.51s

You can use group BY queries to get the total salary for each customer, as shown below.

[quickstart.cloudera:21000] > Select name, sum(salary) from customers Group BY name;

When executed, the above query gives the following output.

Query: select name, sum(salary) from customers Group BY name 
+----------+-------------+ 
| name     | sum(salary) | 
+----------+-------------+ 
| Ramesh   | 20000       | 
| Komal    | 32000       | 
| Hardik   | 40000       | 
| Khilan   | 15000       | 
| Chaitali | 35000       | 
| kaushik  | 30000       |
+----------+-------------+ 
Fetched 6 row(s) in 1.75s

Suppose this table has more than one record, as shown below.

+----+----------+-----+-----------+--------+ 
| id | name     | age | address   | salary | 
+----+----------+-----+-----------+--------+ 
| 1  | Ramesh   | 32  | Ahmedabad | 20000  |
| 2  | Ramesh   | 32  | Ahmedabad | 1000   | 
| 3  | Khilan   | 25  | Delhi     | 15000  | 
| 4  | kaushik  | 23  | Kota      | 30000  | 
| 5  | Chaitali | 25  | Mumbai    | 35000  |
| 6  | Chaitali | 25  | Mumbai    | 2000   |
| 7  | Hardik   | 27  | Bhopal    | 40000  | 
| 8  | Komal    | 22  | MP        | 32000  | 
+----+----------+-----+-----------+--------+

You can now use the Group By clause, as shown below, to consider duplicate record entries to get the employee's total salary.

Select name, sum(salary) from customers Group BY name;

When executed, the above query gives the following output.

Query: select name, sum(salary) from customers Group BY name 
+----------+-------------+ 
| name     | sum(salary) | 
+----------+-------------+ 
| Ramesh   | 21000       | 
| Komal    | 32000       | 
| Hardik   | 40000       | 
| Khilan   | 15000       | 
| Chaitali | 37000       | 
| kaushik  | 30000       | 
+----------+-------------+
Fetched 6 row(s) in 1.75s