Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

MySQL peddle query method and optimization


May 16, 2021 MySQL



When the data volume of the database is very large, the results of one-time query will become very slow, in order to improve the efficiency of query, we can use MySQL's peddle query function. T his article brings you MySQL peddle query method and optimization.


Recommended reading:

21-minute MySQL Getting Started tutorial

MySQL full tutorial


P-page query method:

In MySQL, peddle queries are generally implemented using the limit clause, which declares the following:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset

The LIMIT clause can be used to specify the number of records returned by the SELECT statement. Here are a few points to note:

1, the first parameter specifies the offset of the first return record line

2, the second parameter specifies the maximum number of rows returned to the record

3, if given only one parameter: it represents the maximum number of record rows returned

4, the second parameter is -1, which means retrieving all record rows from an offset to the end of the record set

5. The offset of the initial record line is 0 (not 1)


Here's an example of an app:

select * from orders_history where type=8 limit 1000,10;

The statement will look up 10 orders_history data, i.e. articles 1001 to 1010, after the data in Table orders_history.

Records in the data table are sorted by default using the primary key (generally id), and the result above is equivalent to:

select * from orders_history where type=8 order by id limit 10000,10;

The three query times are:

3040 ms

3063 ms

3018 ms

For this query, the following tests the impact of query records on time:

select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;

The three queries are as follows:

Query 1 record: 3072ms 3092ms 3002ms

Query 10 records: 3081ms 3077ms 3032ms

Query 100 records: 3118ms 3200ms 3128ms

Enquiries for 1000 records: 3412ms 3468ms 3394ms

Enquiries for 10000 records: 3749ms 3802ms 3696ms

In addition, I also made more than a dozen queries, from the query time point of view, it can be basically determined that when the query record volume is less than 100, the query time is basically no gap, as the query record volume is getting larger and larger, the time spent will be more and more.


Testing for query offsets:

select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;

The three queries are as follows:

Query 100 offset: 25ms 24ms 24ms

Query 1000 offset: 78ms 76ms 77ms

Query 10000 offset: 3092ms 3212ms 3128ms

Query 100000 offset: 3878ms 3812ms 3798ms

Query 1000000 offset: 14608ms 14062ms 14700ms

As the query offset increases, especially after the query offset is greater than 100,000, the query time increases dramatically.

This plyted query starts scanning from the first record in the database, so the slower the query, the slower the query, and the more data the query has, the slower the total query speed.


Use sub-query optimization

This method, which first locates the id of the offset position and then queries later, applies to the case of id increment.

select * from orders_history where type=8 limit 100000,1;

select id from orders_history where type=8 limit 100000,1;

select * from orders_history where type=8 and 
id>=(select id from orders_history where type=8 limit 100000,1) 
limit 100;

select * from orders_history where type=8 limit 100000,100;

The query time for the four statements is as follows:

Article 1: 3674ms

Article 2: 1315ms

Article 3: 1327ms

Article 4: 3710ms

For the above query you need to be aware of:

1, compare the 1st statement and 2nd statement: use select id instead of select speed increased by 3 times

2, compare the 2nd statement and 3rd statement: the speed difference of tens of milliseconds

3, compare the 3rd statement and 4th statement: Thanks to the speed increase of select id, the query speed of the 3rd statement increased threefold

This approach will be several times faster than the original general query method.


Use id to qualify optimization

Assuming that the id of the data table is continuously incremented, we can calculate the scope of the query's id based on the number of pages in the query and the number of records in the query, and we can use id between and to query:

select * from orders_history where type=2 
and id between 1000000 and 1000100 limit 100;

Query time: 15ms 12ms 9ms

This query method can greatly optimize the query speed, basically can be completed in tens of milliseconds. The limitation is that you can only use a clear id, but generally when you set up a table, you add a basic id field, which facilitates ply-page queries.

There is another way to write:

select * from orders_history where id >= 1000001 limit 100;

Of course, you can also use in to query, which is often used when multiple tables are associated, using the id collection of other table queries to query:

select * from orders_history where id in
(select order_id from trade_2 where goods = 'pen')
limit 100;

Note how this in query is made: some mysql versions do not support the use of limit in the in clause.


The id description of the data table

In general, when a table is established in a database, each table forces an id increment field to be added, which makes it easier for us to query the data.

If the amount of data is large, such as orders, it is generally recommended to sub-list. This is not recommended for id as a unique identity at this time, but should be generated using a distributed high-incidence unique id generator and stored in a separate field in the data table.

The range query is used first to locate the id (or index), and then the index is used to locate the data, i.e. select id first, and then select s;


Original address: http://uusama.com/458.html