Cloud Development Index

A database index is similar to the index or table of contents of a book. With an index, the database does not need to page through the whole book: it finds the entry in the index and jumps directly to the location of the target document, which can improve lookup speed by several orders of magnitude. Queries that do not use an index are called full-table scans, meaning the server must read through the entire book to find the query results. For large collections you should avoid full-table scans wherever possible, because they are very inefficient. Indexing is an important means of ensuring database performance and a good user experience. We should create indexes for the fields that are used as query criteria.
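The difference between a full-table scan and an indexed lookup can be sketched in plain JavaScript. This is only a toy model under stated assumptions: the sample records, field names, and helper functions are hypothetical, and a real database index is a far more elaborate on-disk structure.

```javascript
// Hypothetical sample data; the field names are made up for illustration.
const books = [
  { _id: 'b1', title: 'Dune', author: 'Herbert' },
  { _id: 'b2', title: 'Emma', author: 'Austen' },
  { _id: 'b3', title: 'Persuasion', author: 'Austen' },
];

// Full scan: every record is inspected for every query.
function fullScan(records, field, value) {
  return records.filter((record) => record[field] === value);
}

// Build a toy index: field value -> positions of the matching records.
function buildIndex(records, field) {
  const index = new Map();
  records.forEach((record, position) => {
    const key = record[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

// Indexed lookup: find the entry in the index, then jump to the records.
function indexedLookup(records, index, value) {
  return (index.get(value) || []).map((position) => records[position]);
}

const authorIndex = buildIndex(books, 'author');
console.log(indexedLookup(books, authorIndex, 'Austen').map((b) => b.title));
```

Both functions return the same records; the indexed version simply avoids touching records that cannot match.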

First, index types and management

We can create indexes on the fields of each collection from the database tab of the Cloud Development Console, set each index to be unique or non-unique, choose ascending or descending order, and view the space each index occupies and its number of hits. An index is a file that takes up physical space, so be careful not to waste space by over-indexing. The hit count can also be used to judge whether an index is actually being used.

1, _id index and _openid index

The cloud development database provides two default indexes for every collection: the _id index and the _openid index. A db.collection('collection name').doc('_id value') query hits the _id index. When we add _openid as a query condition in where, we hit the _openid index. A db.collection('collection name') query made from the mini program client also hits the _openid index, because by default a client-side query carries the condition that _openid equals the current user's openid.
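The two kinds of query described above could be written as follows. This is a minimal sketch: `db` is assumed to be the handle returned by wx.cloud.database(), and the collection name 'todos' and both function names are hypothetical.

```javascript
// `db` is assumed to come from wx.cloud.database(); 'todos' is a made-up collection.

// Fetching one record by _id goes through the default _id index.
function queryById(db, id) {
  return db.collection('todos').doc(id);
}

// Filtering on _openid in where() goes through the default _openid index.
function queryByOpenid(db, openid) {
  return db.collection('todos').where({ _openid: openid });
}
```

Calling .get() on either returned reference runs the query, for example queryById(db, 'some-id').get().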

2, single-field index

Single-field indexes are the most common kind of index and are not created automatically. For fields that are used as filter conditions in queries, we can create a single-field index. To index a nested field, join the names of the nested fields with dots (dot notation). For example, to index the color field in a record of the following format, use style.color. When you set up a single-field index, you can specify whether it is sorted in ascending or descending order.

{
  "_id": "",
  "style": {
    "color": ""
  }
}
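A query on the nested field can use the same dot notation as the index. The sketch below assumes `db` comes from wx.cloud.database(); the collection name 'products' and the helper getByPath are hypothetical, and getByPath only illustrates how a dotted path addresses a nested value.

```javascript
// Resolve a dotted path such as 'style.color' against a record.
function getByPath(record, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), record);
}

// Build a query on the nested field; 'products' is a made-up collection name.
function queryByColor(db, color) {
  return db.collection('products').where({ 'style.color': color });
}

console.log(getByPath({ _id: '1', style: { color: 'red' } }, 'style.color'));
```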

3, composite index

A composite index is an index that contains multiple fields. When adding a composite index, pay attention to the order of the fields, because different orders give different results. A query hits the index when the fields used in its query criteria are all of the fields defined by the index, or a prefix of them.

The composite index follows the leftmost-prefix principle. For example, for a composite index (A, B, C) defined on the three fields A, B, and C, the query criteria A, (A, B), and (A, B, C) all hit the index, while the query criteria B, C, and (B, C) do not. From the leftmost-prefix principle we can see that the composite index (A, B) and the index with the fields swapped, (B, A), behave differently. When the composite index is defined as (A, B), the index is sorted by field A first and then by field B. So with a composite index (A, B), a query on field A alone can hit the (A, B) index even if field A has no index of its own.
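The leftmost-prefix rule can be expressed as a small check. This is a simplified model, not how the database's query planner is implemented: the function name is hypothetical, and it only reports a hit when the queried fields form a complete leftmost prefix of the index.

```javascript
// Returns true when the queried fields form a leftmost prefix of the index.
// The order in which the query lists its fields does not matter.
function hitsCompositeIndex(indexFields, queryFields) {
  const queried = new Set(queryFields);
  if (queried.size === 0 || queried.size > indexFields.length) return false;
  // The first queried.size index fields must be exactly the queried fields.
  return indexFields
    .slice(0, queried.size)
    .every((field) => queried.has(field));
}

const index = ['A', 'B', 'C'];
console.log(hitsCompositeIndex(index, ['A', 'B'])); // true
console.log(hitsCompositeIndex(index, ['B', 'C'])); // false
```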

The sort direction chosen for each field when defining an index also determines whether a sorted query can use the index. For example, suppose we define the index (A: ascending, B: descending) on fields A and B. When we query on A and B, sorting by A ascending, B descending uses the index, and so does the exact opposite order, A descending, B ascending. Sorting by A ascending, B ascending or by A descending, B descending does not hit the index.
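The sort-direction rule can be sketched as a check that the requested sort either matches the index direction on every field or is its exact reverse. The function name and the array format are assumptions made for this illustration.

```javascript
// indexSpec and sortSpec are arrays like [['A', 'asc'], ['B', 'desc']].
// Directions are assumed to be either 'asc' or 'desc'.
function sortUsesIndex(indexSpec, sortSpec) {
  if (sortSpec.length === 0 || sortSpec.length > indexSpec.length) return false;
  // The sort must follow the index fields in the same order.
  const sameFields = sortSpec.every(([field], i) => indexSpec[i][0] === field);
  if (!sameFields) return false;
  // Either every direction matches (forward walk of the index)
  // or every direction is flipped (backward walk of the index).
  const allSame = sortSpec.every(([, dir], i) => indexSpec[i][1] === dir);
  const allReversed = sortSpec.every(([, dir], i) => indexSpec[i][1] !== dir);
  return allSame || allReversed;
}

const abIndex = [['A', 'asc'], ['B', 'desc']];
console.log(sortUsesIndex(abIndex, [['A', 'desc'], ['B', 'asc']])); // true
console.log(sortUsesIndex(abIndex, [['A', 'asc'], ['B', 'asc']])); // false
```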

Some queries need to filter by range or sort the results. The fields used for range queries and sorting should be placed as far back in the index as possible, because index fields that come after a range-query field cannot be hit. The benefits of composite indexes were mentioned above: if the collection already has an index on a and field b now also needs an index, extend the existing index to (a, b) directly rather than creating a separate one.

4, the uniqueness of the index

When you create an index, you can add a uniqueness constraint. An index with a uniqueness constraint requires that no two records in the indexed collection have the same value for the indexed field. Note that if a record does not contain the field, its value for the indexed field defaults to null. If the index is unique, the collection therefore cannot contain two or more records in which the field is empty or missing.
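The treatment of missing fields under a unique index can be modelled as below. This is a sketch in plain JavaScript rather than a database call; the function name and the sample field 'email' are hypothetical.

```javascript
// Returns the index keys that appear more than once and would therefore
// violate a unique index on `field`.
function findUniqueViolations(records, field) {
  const seen = new Set();
  const duplicates = new Set();
  for (const record of records) {
    // A missing field is indexed as null, the same as an explicit null.
    const key = record[field] === undefined ? null : record[field];
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return [...duplicates];
}

// One record lacks the field and another stores null: both index as null.
console.log(findUniqueViolations([{ email: 'a@x.com' }, {}, { email: null }], 'email'));
```

A single record without the field is accepted; the conflict only arises with the second one.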

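// In a mini program; in a cloud function, get db from wx-server-sdk instead
const db = wx.cloud.database()
const _ = db.command   // query operators such as gt and lt

db.collection("china")
  .where({
    gdp: _.gt(3000),          // gdp greater than 3000
    city_area: _.lt(10000),   // city_area less than 10000
    reg_pop: _.gt(6000)       // reg_pop greater than 6000
  })
  .field({                    // return only the fields we need
    _id: false,
    city: true,
    city_area: true,
    gdp: true
  })
  .orderBy('gdp', 'desc')
  .orderBy('city_area', 'asc')
  .get()                      // run the query; returns a Promise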

Because the query has three conditions, we can create a composite index on the three queried fields in order. Because these values are not guaranteed to be unique and some may be empty (some cities have no data), the index is created as non-unique.

Second, notes on creating indexes

1, the advantages and disadvantages of indexing

Indexes can increase query speed very effectively, but they also slow down updates to the table. An index is in effect also a table: it holds the primary key and the indexed fields and points to the records of the actual table, so indexes consume space too. A lookup through an index takes two steps, one to find the index entry and one to fetch the corresponding document through the index pointer, whereas a full-table scan takes only one. Indexes are therefore best suited to large collections, large documents, and selective queries.

The principle behind indexing is to minimize disk I/O: ordering is handled in memory rather than on disk, and the sort order of the index determines which rows are read from disk, so random reads on disk are not needed.

More indexes are not always better. Having a large number of indexes on a table not only takes up disk space but also hurts the performance of operations such as inserts and deletes, because the indexes must be adjusted and updated whenever the data in the table changes. Avoid creating too many indexes on frequently updated tables, and keep the number of fields in each index as small as possible. Fields that are frequently used in queries should be indexed, but avoid adding unnecessary fields.
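The write cost of extra indexes can be seen in a toy model where every insert has to update every index. ToyCollection and its indexWrites counter are hypothetical names used only for this sketch; real storage engines do considerably more work per index update.

```javascript
// A toy collection that maintains one in-memory index per indexed field.
class ToyCollection {
  constructor(indexedFields) {
    this.records = [];
    this.indexes = new Map(indexedFields.map((field) => [field, new Map()]));
    this.indexWrites = 0; // counts index maintenance operations
  }

  insert(record) {
    const position = this.records.push(record) - 1;
    // Every index must be updated on every insert.
    for (const [field, index] of this.indexes) {
      const key = record[field];
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(position);
      this.indexWrites += 1;
    }
  }
}

const lightlyIndexed = new ToyCollection(['city']);
const heavilyIndexed = new ToyCollection(['city', 'gdp', 'city_area', 'reg_pop']);
for (const collection of [lightlyIndexed, heavilyIndexed]) {
  collection.insert({ city: 'X', gdp: 1, city_area: 2, reg_pop: 3 });
  collection.insert({ city: 'Y', gdp: 4, city_area: 5, reg_pop: 6 });
}
console.log(lightlyIndexed.indexWrites, heavilyIndexed.indexWrites);
```

The same two inserts cost four times as many index updates in the collection with four indexes.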

To reduce the number of indexes, you can create a composite index, which builds one index from multiple fields together. When a new index is needed, prefer extending an existing index into a composite index, or adding fields to an existing composite index, over creating a separate one, because the more indexes there are, the higher the maintenance cost and the slower inserts become.

2, index creation conditions

Where you need to create an index:

Fields that frequently serve as query criteria should generally be indexed.

When a query has more than one condition, a composite index is more cost-effective than several single-field indexes.

Fields used for sorting in a query: accessing them through an index greatly improves sorting speed. Pay attention to the sort order defined in the single-field or composite index.

Fields used for statistics or grouping in queries should also be indexed.

Where you don't need to create an index:

Collections with too few records, or records with too few fields.

Collections or fields that are frequently inserted into or deleted from. Indexes increase query speed but slow down updates, because every update has to write the index file as well as the data. If a collection is written far more often than it is read, adding an index can hurt write performance.

Fields that are not used in where conditions.

Fields whose values are heavily duplicated and evenly distributed, such as gender or true/false flags (fields with too little uniqueness are not suitable for indexing).

Fields that participate in calculations.

3, index space and hit

Each collection in the cloud development console has its own index management page, where, in addition to creating indexes, you can see the space each index occupies and check whether queries are hitting it. It is recommended to create no more than 5 indexes per collection, because indexes occupy space.

Third, principles and precautions for indexing

1, it is best to use a unique index

Specify a unique index when uniqueness is a characteristic of the data itself. A unique index enforces the integrity of the indexed fields and also speeds up queries.

2, index short, simple fields

In an InnoDB table, an ordinary index stores the key value of the primary key, so choosing as short a data type as possible for the primary key effectively reduces the disk footprint of the index and improves how well it caches. An overly long index takes up a lot of disk space and becomes bloated, which slows down index lookups. Looking up a chapter of a book through its table of contents is fast because the table of contents is lightweight; if the index is too long, that advantage is lost. In addition, the data in the index duplicates data in the table, so the longer the index, the more disk space is wasted.

3, index fields with a high degree of differentiation

Fields with many duplicate values give the worst index results. For example, a field that stores ID card numbers indexes well because every value is different and easy to tell apart. A field that records gender holds only "male" and "female", so whichever value is searched for returns about half of the records, and such an index improves performance very little. If several fields are unique, choose the one most often used as an access condition as the primary key of the index. Do not index fields that hold only a handful of enumerated values. Index fields that have many distinct values and are often used in conditional expressions. An index that does not improve query efficiency can still seriously slow down updates. Since we want each index to deliver as much performance as possible, we should consider whether a field is suitable for indexing by judging its degree of differentiation: if it is too low, do not index the field; the higher it is, the more efficient the index.
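One simple way to estimate the degree of differentiation of a field is the ratio of distinct values to total records. The function name and the sample user records below are hypothetical; the calculation is only a rough indicator, since it ignores how unevenly the values may be distributed.

```javascript
// Ratio of distinct values to total records: close to 1 means highly
// differentiated, close to 0 means many duplicates.
function selectivity(records, field) {
  if (records.length === 0) return 0;
  const distinct = new Set(records.map((record) => record[field]));
  return distinct.size / records.length;
}

// Made-up sample data for illustration.
const users = [
  { idCard: '1001', gender: 'male' },
  { idCard: '1002', gender: 'female' },
  { idCard: '1003', gender: 'male' },
  { idCard: '1004', gender: 'female' },
];

console.log(selectivity(users, 'idCard')); // 1
console.log(selectivity(users, 'gender')); // 0.5
```

On a realistic collection the gender ratio would be close to 0, which is why such a field is a poor index candidate.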

4, indexed fields should preferably not have empty values

It is best for an indexed field not to have empty values. When indexing a field that may be empty, choose a non-unique index, because a unique index does not allow more than one record with an empty value.

5, indexed fields should not participate in calculations

Indexed fields should stay "clean" and not take part in calculations. When an indexed field participates in a calculation, the index cannot be used.
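A common way to keep an indexed field out of a calculation is to move the calculation to the constant side of the comparison. The sketch below rewrites "the year of createdAt is 2023" as a plain range on createdAt. The collection name 'orders', the field createdAt, and both function names are hypothetical, and `db` is assumed to come from wx.cloud.database().

```javascript
// Compute the boundaries once, instead of computing a year for every record.
function yearRange(year) {
  return {
    start: new Date(Date.UTC(year, 0, 1)),
    end: new Date(Date.UTC(year + 1, 0, 1)),
  };
}

// The indexed field createdAt appears on its own, compared against constants.
function queryOrdersInYear(db, year) {
  const _ = db.command;
  const { start, end } = yearRange(year);
  return db
    .collection('orders')
    .where({ createdAt: _.and(_.gte(start), _.lt(end)) });
}
```

Because the condition is a simple range on createdAt, an index on that field can serve the query.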

6, make queries use indexes as far as possible

Make sure the indexed fields appear on their own in the query statement, not inside an expression. A query is also more efficient when the fields it returns are all contained in the index, so avoid select * and request only the fields you need. Build indexes on fields with high cardinality (for example email address or user name, not gender).

7. Avoid duplicate and redundant indexes

Duplicate indexes: multiple indexes created on the same field, or on the same fields in the same order, are duplicate indexes. They serve no purpose and should be deleted. Redundant indexes: two or more indexes that cover overlapping fields are called redundant indexes, for example the indexes m (m) and mn (m, n) on fields m and n.
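Duplicate and redundant indexes can be detected by checking whether one index's field list is a prefix of another's. This is a simplified sketch: the function name is hypothetical, and it ignores sort direction and uniqueness, which would also need to be compared before an index is actually dropped.

```javascript
// Each index is an array of field names in index order, e.g. ['m', 'n'].
function findRedundantIndexes(indexes) {
  const isPrefix = (shorter, longer) =>
    shorter.length <= longer.length &&
    shorter.every((field, i) => longer[i] === field);

  const redundant = [];
  indexes.forEach((candidate, i) => {
    const covered = indexes.some((other, j) => {
      if (i === j || !isPrefix(candidate, other)) return false;
      // For exact duplicates keep the first copy and flag the later ones.
      return candidate.length < other.length || j < i;
    });
    if (covered) redundant.push(candidate);
  });
  return redundant;
}

console.log(findRedundantIndexes([['m'], ['m', 'n'], ['n'], ['m', 'n']]));
```

Here (m) is flagged because (m, n) already covers it, and the second (m, n) is flagged as a duplicate.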

8, use indexes to obtain ordered data

The index itself is ordered, so an ordered index can be used to obtain ordered data. Indexing the fields used for sorting lets the database optimize or avoid the sort step, which increases speed. Index the fields that are frequently sorted or grouped (i.e., used in group by or order by operations); if you sort on more than one field, create a composite index on those fields.

9. The composite index follows the leftmost-prefix principle

When building a composite index, consider both how often each field is queried and how differentiated it is; fields with a higher degree of differentiation come first. Take a user table of the global population with fields such as gender, nationality, and age. In general, nationality is more differentiated than gender: for example, there are fewer Chinese nationals than there are men. Nationality should therefore come before gender when building the composite index.

10, index fragmentation and maintenance

As a table changes over a long period, both the index file and the data file develop holes and fragmentation, which can be fixed by repairing the table. The repair process is resource-intensive, so it can be run at relatively long intervals.

11. Pay attention to the uniqueness and non-uniqueness of the index

Use unique indexes with care. A unique index imposes a uniqueness constraint, and because future requirements are not known in advance, we may misjudge which fields are really unique.

12, it is recommended not to use the randomly generated _id as the primary key

It is best to use an auto-increment ID field, rather than the random _id, as the primary key for indexing, because a non-incremental primary key causes frequent page splits, which reduces insertion efficiency. So in general we add an auto-increment ID field to the table (incremented with the atomic update command inc) and use that field as the primary key instead of _id. If you then need to query by _id, the lookup has to go back to the table and is a little less efficient. If the table needs only the one unique index on _id, then _id can be used as the primary key.