May 31, 2021 Article blog
This article is from the public number: Code Farmer Knowledge Point, author Monica2333
The ES7.x version of x-pack comes with
ElasticSearch SQL
which we can
use
directly through SQL REST APIs, SQL CLI, and more.
Enter in
Kibana Console
POST /_sql?format=txt
{
"query": "SELECT * FROM library ORDER BY page_count DESC LIMIT 5"
}
Just replace the SQL above with your own SQL statement. The return format is as follows:
author | name | page_count | release_date
-----------------+--------------------+---------------+------------------------
Peter F. Hamilton|Pandora's Star |768 |2004-03-02T00:00:00.000Z
Vernor Vinge |A Fire Upon the Deep|613 |1992-06-01T00:00:00.000Z
Frank Herbert |Dune |604 |1965-06-01T00:00:00.000Z
elasticsearch-sql-cli
is a script file for the bin directory when ES is installed and can also be downloaded separately.
We run in the ES directory
./bin/elasticsearch-sql-cli https://some.server:9200
Enter SQL to query
sql> SELECT * FROM library WHERE page_count > 500 ORDER BY page_count DESC;
author | name | page_count | release_date
-----------------+--------------------+---------------+---------------
Peter F. Hamilton|Pandora's Star |768 |1078185600000
Vernor Vinge |A Fire Upon the Deep|613 |707356800000
Frank Herbert |Dune |604 |-144720000000
Enter in
Kibana
POST /_sql/translate
{
"query": "SELECT * FROM library ORDER BY page_count DESC",
"fetch_size": 10
}
To get the converted DSL query:
{
"size": 10,
"docvalue_fields": [
{
"field": "release_date",
"format": "epoch_millis"
}
],
"_source": {
"includes": [
"author",
"name",
"page_count"
],
"excludes": []
},
"sort": [
{
"page_count": {
"order": "desc",
"missing": "_first",
"unmapped_type": "short"
}
}
]
}
Because query-related statements have been generated, we can happily use DSLs with or without modification on this basis.
Let's go into more detail about the SQL statements supported by ES SQL and how to avoid misuse.
First you need to understand the correspondence between SQL terms and ES terms in the SQL statements supported by ES SQL:
Most of ES SQL's syntax support follows ANSI SQL standards, with supported
SQL
statements such as DML queries and partial DDL queries.
DDL queries such
DESCRIBE table
SHOW COLUMNS IN table
slightly chicken ribs, we mainly look at
SELECT,Function
DML query support.
The syntax structure is as follows:
SELECT [TOP [ count ] ] select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
Indicates that row data is obtained from 0-N tables. SQL is executed in the following order:
FROM
to determine the table name.
WHERE
condition, filter out all non-compliant rows.
GROUP BY
condition, the aggregation is grouped, and if there is a
HAVING
condition, the results of the aggregation are filtered.
select_expr
to determine the specific data returned.
ORDER BY
condition, the returned data is sorted.
LIMIT
or
TOP
condition, a subset of the results of the previous step are returned.
Unlike common SQL, ES SQL supports
TOP [ count ]
andPIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) )
c lause.TOP [ count ]
: IfSELECT TOP 2 first_name FROM emp
indicates that up to two pieces of data are returned and cannot be shared withLIMIT
condition. T hePIVOT
clause rows the results of its aggregation criteria for further operation. I haven't used this, I don't introduce it.
Based on the
above SQL
we are actually able to have filtering, aggregation, sorting, paging function of
SQL
too. B
ut we need to learn more about function support in ES SQL to write a rich SQL with full-text search, aggregation, and
grouping
capabilities.
Show
SHOW FUNCTIONS
allows you to list the supported function names and types.
SHOW FUNCTIONS;
name | type
-----------------+---------------
AVG |AGGREGATE
COUNT |AGGREGATE
FIRST |AGGREGATE
FIRST_VALUE |AGGREGATE
LAST |AGGREGATE
LAST_VALUE |AGGREGATE
MAX |AGGREGATE
MIN |AGGREGATE
SUM |AGGREGATE
........
We mainly look at aggregation, grouping, full-text search related common functions.
Full-text matching function
MATCH
Equivalent to
match and multi_match
query in the DSL.
MATCH(
field_exp, --字段名称
constant_exp, --字段的匹配值
[, options]) --可选项
Examples of use:
SELECT author, name FROM library WHERE MATCH(author, 'frank');
author | name
---------------+-------------------
Frank Herbert |Dune
Frank Herbert |Dune Messiah
SELECT author, name, SCORE() FROM library WHERE MATCH('author^2,name^5', 'frank dune');
author | name | SCORE()
---------------+-------------------+---------------
Frank Herbert |Dune |11.443176
Frank Herbert |Dune Messiah |9.446629
QUERY
Equivalent to query_string query in
query_string
DSL.
QUERY(
constant_exp --匹配值表达式
[, options]) --可选项
Examples of use:
SELECT author, name, page_count, SCORE() FROM library WHERE QUERY('_exists_:"author" AND page_count:>200 AND (name:/star.*/ OR name:duna~)');
author | name | page_count | SCORE()
------------------+-------------------+---------------+---------------
Frank Herbert |Dune |604 |3.7164764
Frank Herbert |Dune Messiah |331 |3.4169943
SCORE()
Returns the relevance of the input data and the returned
relevance
Examples of use:
SELECT SCORE(), * FROM library WHERE MATCH(name, 'dune') ORDER BY SCORE() DESC;
SCORE() | author | name | page_count | release_date
---------------+---------------+-------------------+---------------+--------------------
2.2886353 |Frank Herbert |Dune |604 |1965-06-01T00:00:00Z
1.8893257 |Frank Herbert |Dune Messiah |331 |1969-10-15T00:00:00Z
The aggregation function
AVG(numeric_field)
Calculates the average of fields of a numeric type.
SELECT AVG(salary) AS avg FROM emp;
COUNT(expression)
the total number of input data, including data with a value of null field_name when COUNT().
COUNT(ALL field_name)
Returns the total number of input data, excluding data with field_name corresponding value of null.
COUNT(DISTINCT field_name)
Returns the total number of values field_name in the input data that are not null.
SUM(field_name)
Returns the sum of the corresponding values field_name numeric fields in the input data.
MIN(field_name)
Returns the minimum value of the corresponding value for the numeric field field_name in the input data.
MAX(field_name)
Returns the maximum value of the corresponding value for the numeric field field_name in the input data.
Group functions
The grouping function here is the
bucket
grouping in the corresponding DSL.
HISTOGRAM
The syntax is as follows:
HISTOGRAM(
numeric_exp, --数字表达式,通常是一个field_name
numeric_interval --数字的区间值
)
HISTOGRAM(
date_exp, --date/time表达式,通常是一个field_name
date_time_interval --date/time的区间值
)
The following returns data on births in the early hours of January 1st each year:
ELECT HISTOGRAM(birth_date, INTERVAL 1 YEAR) AS h, COUNT(*) AS c FROM emp GROUP BY h;
h | c
------------------------+---------------
null |10
1952-01-01T00:00:00.000Z|8
1953-01-01T00:00:00.000Z|11
1954-01-01T00:00:00.000Z|8
1955-01-01T00:00:00.000Z|4
1956-01-01T00:00:00.000Z|5
1957-01-01T00:00:00.000Z|4
1958-01-01T00:00:00.000Z|7
1959-01-01T00:00:00.000Z|9
1960-01-01T00:00:00.000Z|8
1961-01-01T00:00:00.000Z|8
1962-01-01T00:00:00.000Z|6
1963-01-01T00:00:00.000Z|7
1964-01-01T00:00:00.000Z|4
1965-01-01T00:00:00.000Z|1
Because
ES SQL
and
ES DSL
do not exactly match functionally, the SQL limitations mentioned in the official documentation are:
Large queries may throw ParsingException
During the parsing phase, a large query consumes too much memory, in which case
Elasticsearch SQL
engine aborts parsing and throws an error.
The representation of the nested type field
Fields of the nested type are not
nested
in SQL and can only be used
[nested_field_name].[sub_field_name]
This form refers to inline subfields. Examples of use:
SELECT dep.dep_name.keyword FROM test_emp GROUP BY languages;
The nested type fields cannot be used on the Scalar functions ofwhere and order by
SQL such as the following is all errors
SELECT * FROM test_emp WHERE LENGTH(dep.dep_name.keyword) > 5;
SELECT * FROM test_emp ORDER BY YEAR(dep.start_date);
Simultaneous queries for multiple nested fields are not supported
Such as nested fields
nested_A
and
nested_B
cannot be used at the same time.
Nested inner field pagination limit
When a paginated query has a
nested
field, the pagination results may be incorrect.
This is because a paginated query in ES occurs on
Root nested document
not on its inner field.
The field of the keyword type does not support normalizer
Fields of array types are not supported
This is because there is only one value for a
field
in SQL, in which case we can convert the API of SQL To DSL described above into a DSL statement, and it would be nice to query with DSL.
The limit of aggregate sorting
Limit
clause, such as:
SELECT * FROM test GROUP BY age ORDER BY COUNT(*) LIMIT 100;
Here's an example of an error:
SELECT age, ROUND(AVG(salary)) AS avg FROM test GROUP BY age ORDER BY avg;
SELECT age, MAX(salary) - MIN(salary) AS diff FROM test GROUP BY age ORDER BY diff;
The limit of the subquery
Subquery contains
GROUP BY or HAVING
or than
SELECT X FROM (SELECT ...) WHERE [simple_condition]
This structure is complex and can be unsuccessful.
The time data type fields do not support GROUP BY conditions and HISTOGRAM functions
The following query is wrong:
SELECT count(*) FROM test GROUP BY CAST(date_created AS TIME);
SELECT HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES) as h, COUNT(*) FROM t GROUP BY h
However, wrapping a field of the TIME type as a
Scalar
function return is supported by GROUP BY, such as:
SELECT count(*) FROM test GROUP BY MINUTE((CAST(date_created AS TIME));
The limit of the return field
If a field is not stored in source, it cannot be queried.
keyword, date, scaled_float, geo_point, geo_shape
These types of fields are not subject to this restriction because they are not returned from
_source
but from
docvalue_fields
The above is
W3Cschool编程狮
about
query ElasticSearch: using SQL instead of DSL
related to the introduction, I hope to help you.