Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

Cloud Development Database Performance and Optimization


May 22, 2021 Mini Program Cloud Development Advanced



Although the cloud development database is high-performance, support elastic expansion, but many users in the process of use, pay more attention to the implementation of functions, while ignoring the database design, index creation and statement optimization on the performance of the impact, so will encounter a lot of problems affecting database performance, so here is a special summary of the cloud development database performance optimization considerations.

First, database performance and optimization recommendations

Here are some optimization recommendations that affect database performance, which, of course, should be handled in the light of specific business situations and cannot be generally discussed. In particular, some requests are relatively large, more frequent, such as small program home page data requests, database optimization should pay special attention to.

1, to use the index reasonably

Using indexes can improve the query, update, delete, and sort operations of documents, so you should create an index appropriately in the case of queries. T o avoid full-table scanning, you should first consider indexing the columns involved in where and order by. More index details are described in the section of Index.

2, good at combining query situation to create a combination index

For queries that contain multiple field (key) criteria, creating a composite index that contains those fields is a good solution. T he combined index follows the left-most prefix principle, so it is important to create the order in which it is created, and if you do not understand the composite index, you can determine whether the composite index is effective in combination with the hit of the index. Be good at using composite indexes to cover the most queries with the fewest indexes.

3, query to try to limit the data through conditions and limit

Where in a query can limit the number of documents processed, while in the aggregation operation match is placed before the group, reducing the number of documents to be processed by the group operation. Both normal and aggregate queries should use limit to limit the amount of data returned.

In fact, cloud development for ordinary queries db.collection ('dbName').get() by default has limit limit, in the small terminal limit of 20 (custom upper limit is also 20), in the cloud function limit of 100 (custom upper limit can be set to 1000), aggregation in the small terminal and cloud function side default is 20 (custom no upper limit, tens of thousands of can be, provided that the data taken out can not be greater than 16M), That is, the cloud development database has come with some performance optimization, we should not take these default restrictions as a constraint, but to break through these limitations at will.

4, recommended in the small program side to add and delete the database

In combination with the security rules of the database, the addition and deletion of the database can be carried out on the small terminal, which will be faster and save the resources of the cloud function.

The addition and deletion of the cloud development database can be carried out on the small terminal, can also be carried out on the cloud function side, then in the end should the database addition and deletion check in the small terminal or the cloud function side? I n general, it is recommended to put on the small terminal, which will only consume the number of database requests, without additional consumption of cloud function resource usage GBs, out-of-network traffic. Although the cloud function has higher permissions for database operation, the combination of small terminal security rules can also make the permissions of the database more granular, but also to meet most of the permission requirements.

5, as far as possible to limit the return of fields and other data

If the query does not need to return the entire document or is simply used to determine the existence of key values, normal queries can limit the returned fields through filed, aggregated queries can be made through project, reducing network traffic and client memory usage.

  1. {
  2. "title": "为什么要学习云开发",
  3. "content": "云开发是腾讯云为移动开发者提供的一站式后端云服务",
  4. "comments": [{
  5. "name": "李东bbsky",
  6. "created_on": "2020-03-21T10:01:22Z",
  7. "comment": "云开发是微信生态下的最推荐的后台技术解决方案"
  8. }, {
  9. "name": "小明",
  10. "created_on": "2020-03-21T11:01:22Z",
  11. "comment": "云开发学起来太简单啦"
  12. }]
  13. }

A cloud database is a related database, a record can nest a very large number of arrays and objects, if you take out the entire record of all nested content is too much performance traffic, such as the above nested array, sometimes the business does not need to display some fields in the commons, you can use field point notation to limit the returned fields.

  1. //不显示comments里的created_on
  2. .field({
  3. "comments.created_on":0
  4. })
  5. //只显示comments里的comment,comments里的其他字段都不显示
  6. .field({
  7. "comments.comment":1
  8. })

6, when the number of queries is large, it is recommended not to use the positive query

Regular expression queries cannot use indexes and take longer to execute than most selectors, so where the volume of business is relatively large, you can do not use regular queries without regular queries (try to replace regular queries in other ways possible), and even with regular queries, it is important to blur the range of matches as much as possible, such as using the start matcher or end matcher.

For example, someone is such a regular query, he wants to filter customer source data according to the province and city, but the customer source address address fill is "Shenzhen City, Guangdong Province" or "Guangdong Shenzhen", the provincial and municipal data is not standardized, so use regular fuzzy query, but if you need to often filter customer sources according to the address, then you should be in the database to clean the reorganization data to replace fuzzy query.

7, as far as possible to use update instructions

Modifying a document with an update instruction often results in better performance because the update instruction does not require a query to the record to update the document directly at the field level, especially if you do not need to update the entire document to update only some of the fields.

Or the record above, for example, we need to add comments to the article, that is, add values to the generals _.push to the array field for field-level operations, instead of taking out the entire record, and then adding the comments to the record using the concat or push method of the array, and then updating the entire record:

  1. .update({
  2. data:{
  3. comments:_.push([{
  4. "name": "小明",
  5. "created_on": "2020-03-21T11:01:22Z",
  6. "comment": "云开发学起来太简单啦"
  7. }])
  8. }
  9. })

A record in a cloud development database can be nested in many layers, so it can be large, and it's better to use update instructions for field-level micro-exercise than to use a record-level update like update directly.

8. Don't sort too much data

Don't take out too much data at once and sort the data, if you need to sort, try to limit the amount of data in the result set, for example, we can first use where, match and other operations to limit the amount of data, that is, usually put the orderBy on the last face of a normal query or aggregate query.

In particular, it is emphasized here that many people find that because of the database sorterBy and page-turning skip do not understand, even if the amount of data is only a hundred thousand, this is not the correct way to deal with, should be prohibited to do so. Sorting can be done by using the database's normal or aggregate queries, and the default limit data limit for cloud development does not affect the results of sorting, prohibiting the stupidity of traversing all the data and sorting it again.

Of course, if your business needs to sort multiple fields of the same data frequently, such as items that are often sorted by the latest shelf, price, origin, discount strength, etc., it is recommended to take the data out at once, store it in the cache, and sort it using an array of JavaScript instead of querying it with a database.

9, as little as possible in the business volume of the place with the following query instructions

Some query instructions in a query can cause poor exists nin array, indicating that any number of or not these query instructions as little as possible where business usage is high.

What is said here as little as possible does not mean not to use, but can be replaced in the most direct way, so that database queries as simple as possible rather than too complex, as little as possible to let query instructions do these complex things.

10, the number of Chinese file can be regularly archived

The amount Chinese data in the collection file can affect query performance, and unused or expired data can be archived and deleted periodically. For example, we can also use timed triggers to periodically back up and delete data in the database.

11. Don't let database requests do redundant things and keep as few officers as possible

Can use JavaScript alternative calculations, arrays, object operations, etc. , as far as possible with JavaScript processing;

12, in the database design can use embedded documents to replace lookup

Cloud development database is a non-relationship database, you can often use lookup cross-table query situation to do anti-paradigm of inline document design, in this way to replace the linked table query lookup can improve a lot of performance.

Reduce the use of linked table query lookup way to pay attention to two points, one is through the embedded document is to reduce the relationship between the table and table of the relationship database, such as to link the table out of the blog's latest 10 articles and the corresponding comments in the article, which in the relationship database was originally required to link table query, but when the comments embedded in the collection of articles, there is no need to link tables; Union tables can be replaced by multiple queries.

13. Short field names are recommended

Different from a relationship database, a cloud development database is a document database, and every document in the collection needs to store a field name, so the length of the field name requires more storage space than a relationship database.

  1. "comments": [{
  2. "name": "李东bbsky",
  3. "created_on": "2020-03-21T10:01:22Z",
  4. "comment": "云开发是微信生态下的最推荐的后台技术解决方案"
  5. }, {
  6. "name": "小明",
  7. "created_on": "2020-03-21T11:01:22Z",
  8. "comment": "云开发学起来太简单啦"
  9. }]

Here the field name name, created_on, comment how many records, how many nested objects will be written how many times, sometimes longer than the value of the field, is more space.

Second, database design and processing optimization recommendations

1. Add redundant fields

Some critical data in the business can be obtained through indirect queries, but due to computational, cross-table, and other issues, it is recommended to add some redundant fields at this time.

For example, we want to count the number of comments below the article, perhaps you will be the article's comments independently built a collection such as generals, this time to get the number of comments per article can be based on the article's id conditions to count how many comments the article has. A lternatively, you can nest an array of comments for each article as a sub-document into the generals field of each article record, at which point you can calculate the number of comments for that article by the length of the array. Similar to the number of comments there are like likes, favorites, and so on, although these can be obtained indirectly by counting or arrays, but in the case of a large number of comments, count and arrays of lyngth is very performance-consuming, and count also need to occupy a request independently.

In this case, it is recommended that when designing a database, the so-called redundant field be used to record the likes, comments, favorites of each article, and update the value of the field directly on the small program side in the form of inc atomic self-growth.

  1. {
  2. "title": "为什么要学习云开发",
  3. "content": "云开发是腾讯云为移动开发者提供的一站式后端云服务",
  4. "commentNum":2, //新增一个评论数的字段
  5. "comments": [{
  6. "name": "李东bbsky",
  7. "created_on": "2020-03-21T10:01:22Z",
  8. "comment": "云开发是微信生态下的最推荐的后台技术解决方案"
  9. }, {
  10. "name": "小明",
  11. "created_on": "2020-03-21T11:01:22Z",
  12. "comment": "云开发学起来太简单啦"
  13. }]
  14. }

For example, we want to display a list of articles on the front page of a blog, and each article will show the total number of comments. Although we can get the number of comments through the array length of the generals and, if there are secondary three-level comments (especially in this case), we can get the number of comments by array method, but it is better to commentNum directly.

2, false deletion

Sometimes our business requires users to frequently delete records in a database or arrays in records, but deleting data is a very performance-intensive thing, and during peak business periods, the database experiences performance problems. At this time, it is recommended to add redundant fields to do false deletion, such as adding a delete field to the record, the default value is false, when the deletion is performed, the value of the field can be set true, query only display delete for false records, so that the data on the front end does not show, do a false deletion, in the business trough, such as the early morning can be combined with the time trigger this time of day to clean up.

3, try not to put database requests into the loop body

We often have query the data in the database, and the data processing and then write back to the database requirements, if the query to the data has a lot of data, we will need to cycle processing, but this time we must note, do not put the database request into the loop body, but first query multiple data, in the loop body after processing the data once written back to the database.

Of course, some interfaces of small programs can not carry out array operations, can only be performed one by one, such as sending subscription messages, uploading files and other operations, this avoidable exception. But some can avoid this problem by designing databases, such as databases that often add large numbers of records to an array of data that only requires new inline documents.

4, try to use a database request instead of multiple database requests

In the design of the database and on the code of the database request, as far as possible, one database request instead of multiple database requests, especially the most frequently accessed home page, if a page of the database request too many, will cause the database's problems. Some data can be cached to the small terminal to the small program level without overemphasized data consistency.

5, plan the document to create empty fields in due course

We have a collection that will eventually be used to store the user's personal information, such as when we click on the login to get the user's nickname and avatar, so the general logic is that we create a record in the database, as follows:

  1. _id:"",
  2. userInfo:{
  3. "name":"李东bbsky",
  4. "avatarUrl":"头像地址"
  5. }

But a better way is that we should create a complete record (designed by the final field), and build these empty fields consistently, even if there is no data yet, so that we can fill the data directly with the update later.

  1. _id:"",
  2. userInfo:{
  3. "name":"李东bbsky",
  4. "avatarUrl":"头像地址",
  5. "email":"",
  6. "address":""
  7. ...
  8. },
  9. stars:[],//存储点赞的文章
  10. collect:[] //存储收藏的文章

Third, slow inquiries and alarms

At present, we can not directly view the time spent by the database request, but there are some other data as evidence, in the cloud function side of the database request, if the execution time of the cloud function more than 100ms or more, can basically be determined as slow query, the database needs to be optimized. At this point, slow queries affect not only the performance of the database, but also the performance of cloud functions.

We know that cloud functions and cloud databases are very dependent on their time-consuming, and if database queries slow down, query times can increase from tens of milliseconds to hundreds of milliseconds at a time, or even in seconds, which is resource-intensive and impact-consuming:

  • Cloud function resource usage GBs: resource usage s function configuration memory X run billing time, if there are database requests in the cloud function that consume runtime, cloud function resource usage will also increase;

  • The QPS of the database , the number of simultaneous connections to the database , the execution time of the 1000ms/database request, and if the execution time of the database request increases significantly, the QPS will multiply, affecting the co-production of the database very Connection num overrun

We can set up -alert settings in the cloud development console to set up runtimes for specified cloud functions, especially the most frequently called cloud functions for business, and alerts for errors in cloud functions to keep you informed of the health of the cloud development environment.