Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

Cloud Development Database Design


May 22, 2021 Mini Program Cloud Development Advanced



When we're building a project (application), the first thing we usually do is design the database. Different from a relationship database that stores data in fixed tables that consist of rows and columns, a cloud-developed database uses structured documents to store data, which no longer has to be there and has only one value at every row intersection in a relationship database, which can be an array, an object, or more complex nesting.

First, the design of the database

1, the design of the database needs to think ahead about what problems

Before you implement a cloud-developed database, you need to understand the nature of the data stored, how it is stored, and how it will be accessed, which requires you to make a decision in advance to get the best performance by organizing data and page data interactions. Specifically, you need to think ahead about the following questions:

  • What are the basic objects, time, address, price, rich text, pictures, commodity attributes, and so on that actual information needs to be used in the database?

  • Are the relationships between different object types one-to-one, one-to-many, or many-to-many? How do complex relationships such as product classification, detail pages, comment pages, shopping carts, membership information, user information, shipping addresses, and so on be linked?

  • How often do I add new objects to a cloud database? H ow often are records added to the collection? H ow often do I add the value of a field to an embedded document? H ow often do you modify records and the values of fields in records? How often do I delete a record or field from a database?

  • How often do I query a database based on criteria? I s it a query record list, or is it the value of a field in the record? How would you intend to query a record or the value of a record, through an ID, field, condition, or other means?

  • Which collection you create is most important? W hich one will be placed on the home page? W hich collection user will have a large amount of access to the complexity? How should collections with large amounts of traffic be designed to improve performance?

  • Which operations require high data consistency and require atomic or transactional operations? (Later atomic operations and transactions will be introduced)

  • Which collection or which collection of records will grow faster and have a large amount of data?

  • Which collection or which collection of records will be greatly adjusted as the business grows?

2, behind the function is also the design of the database

Behind the complex business functions of the application are simple data, when designing the database to know clearly which functions will perform what kind of data operations, collections and collections, collections and fields have a relationship.

  • For example, a news app will have a list of articles and article details page, which are two features, the article list emphasizes the query is eligible records, and the article detail page is a single recorded field, what is the difference between the two?

  • For example, in addition to personal information, users also have identity readers and authors, readers and authors of identity is how to reflect? W hat about the role of administrator, editor, etc.? What are the differences between different roles in processing data?

  • For example, the user's likes, favorites, comments, etc. these should be put into the user's collection, or should be put into the article collection, or a separate collection to store this data? What is the basis for choosing this approach?

  • How does the front end reflect the data in the database through form addition and deletion? How do browsing pages, pull-ups, searches, carncasts, menus, etc. are reflected in the database?

  • How do APIs such as file upload, image download, map data acquisition, server time, etc. combine with the database?

Second, anti-paradigm and paradigm design

Paradigmization is the dispersion of data into different collections like a relationship database, where data can be referenced to each other by a unique ID. However, reference to this data often requires multiple queries or a linked table query using lookup.

Denormalization, on the other hand, embeds the data required by the document into the inside of the document, and if you want to update the data, you may have to find out the entire document, modify it, and then store it in the database, and if there is no update instruction, the field-level update can be made, the performance of the new fields for large documents will be poor. On the other hand, because the collection is more dispersed, it is smaller, and only a relatively small document can be updated when the data is updated.

Data can be embedded (anti-paradigm) or reference (paradigm), the two strategies do not have advantages and disadvantages, but also have their own advantages and disadvantages, the key is to choose the right scenario for their own application. A completely anti-paradigm design that embeds all the data needed for a document into a document can significantly reduce the number of document queries. A paradigmized design is a better choice if data is updated more frequently, and if data queries are more frequent and do not need to be updated much, there is no need to spread the data across different collections at the expense of query efficiency. For complex applications such as blog systems, market mall systems, using only one collection (completely anti-paradigm design) will lead to too large collections, more redundant data, poor data write performance and other issues, this time need to carry out a certain paradigm design, that is, with more collections, rather than larger collections.

More suitable for inline Better for references Description
Inline documents will eventually be smaller Inline documents will eventually be large The upper limit of a record is 16M, the business will continue to grow data is not suitable for embedding, for example, a blog article will continue to grow can not be embedded in the record, blog comments will grow, but the growth of limited can be embedded
The record will not change Records often change When a new record is created, you can use inline if the business only needs to update the fields in the record or the fields in the nest, rather than the entire record
The final data is consistent The data for the intermediate stage must be consistent Inline affects data consistency, but most businesses don't need strong consistency, such as embedding user comments in a collection of articles, and the avatar of previous comments doesn't change immediately after the user changes the avatar, which doesn't make much difference
The document data increased slightly Document data has increased significantly If the business needs to substantially update many of the values in the record or significantly add new records, such as a large number of user orders, the user's order data is not embedded, but in the form of records
Data usually requires a secondary query to obtain Data is usually not included in the results Inline documents can get nested arrays and objects through a query, such as article records nested comments, query articles can get all the comments of the article, reduce the number of queries
A quick query is required It needs to be added and deleted quickly If your data additions and deletions are written more frequently, handling them with nested arrays and objects can be cumbersome

Cloud Development Database Design

A non-correlated database, such as a cloud development database, is stored in a document, and the field of a document can nest arrays and objects, an inline way that nests the relationship between a table and a table of a non-relationship database into a document, reducing the association that requires cross-collection operations.

3. Inline documents (inline arrays or objects)

Earlier, we learned that a document in a cloud development database can contain a lot of data, or even a complete application with only one collection. For example, a user, only one shopping cart in the relationship database, we need to build two tables to store data, a table is to store all customer information user list User, and a list of orders to store all user orders Order, but the cloud development database can be embedded in the original multiple tables into a table.

  1. {
  2. "name": "小明",
  3. "age": 27,
  4. "address":"深圳南山腾讯大厦",
  5. "orders": [{
  6. "item":"苹果",
  7. "price":15,
  8. "number":3
  9. },{
  10. "item":"火龙果",
  11. "price":18,
  12. "number":4
  13. }]
  14. }
  15. }

Using this embedded design model, when we want to query a user's information and all his orders, we can only do through a query of the user's information, all orders are obtained, and unlike the relationship database needs to first check the user's information in the User table, and then according to the user's id to check all orders.

The same article will have N users to comment to produce N comments data, and this N comments belong only to this article, there is no comment belongs to both A article and B article. We can still use an anti-paradigm design that embeds comments related to the article in this article:

  1. {
  2. "title": "为什么要学习云开发",
  3. "content": "云开发是腾讯云为移动开发者提供的一站式后端云服务",
  4. "comments": [{
  5. "name": "李东bbsky",
  6. "created_on": "2020-03-21T10:01:22Z",
  7. "comment": "云开发是微信生态下的最推荐的后台技术解决方案"
  8. }, {
  9. "name": "小明",
  10. "created_on": "2020-03-21T11:01:22Z",
  11. "comment": "云开发学起来太简单啦"
  12. }]
  13. }

When we are going to enter the details page of the article, in addition to the information of the article, but also to read the comments at once, this anti-paradigm embedded document can do, that is, you can get all the required data through a single query. But if the article is a big V-like hot spot, often there are thousands of tens of thousands of comments, all the comments embedded in the article record may exist record overflow (such as more than 16M), add-delete check efficiency will also be reduced, this time is not suitable for embedded way, but reference.

Fourth, refer to the document

Sometimes the relationship between data and data is more complex, no longer one-to-one or one-to-many relationships, such as when sharing collaboration, a user can send N documents, and a document has N authors (users), this N-to-N complex relationship, the use of embedded documents is not so good processing.

Imagine if you created only one user table and embedded the documents that A was involved in editing into the fields of the corresponding record, and so did the B users, and if A and B users were involved in editing the same document, then a document would be embedded in a user's record, and if the document had N authors, it would be repeated N times. If we only need to check which documents the user has edited, this is not a problem, but if we want to find out how many authors have edited a document, it is more difficult, if the document updates more frequently, then the operation is more complex, then the embedded document is obviously not appropriate, should adopt a paradigm design.

For example, we store the user in the user collection, store all the documents in the file collection, the collection and the _id the following user collection mainly stores the user's information, and the file collection records that need to be referenced _id _id written to the user collection,

  1. {
  2. "_id": "author10001",
  3. "name": "小云",
  4. "male":"female",
  5. "file": ["file200001","file200002","file200003"]
  6. }
  7. {
  8. "_id": "author10002",
  9. "male":"male",
  10. "name": "小开",
  11. "books": ["file200001","file200004"]
  12. }

In the files collection, information for all documents is stored, and _id in the files collection only a _id referenced:

  1. {
  2. "_id": "file200001",
  3. "title": "云开发实战指南.pdf",
  4. "categories": "PDF文档",
  5. "size":"16M"
  6. }
  7. {
  8. "_id": "file200002",
  9. "title": "云数据库性能优化.doc",
  10. "categories": "Word文档",
  11. "size":"2M"
  12. }
  13. {
  14. "_id": "file200003",
  15. "title": "云开发入门指南.doc",
  16. "categories": "Word文档",
  17. "size":"4M"
  18. }
  19. {
  20. "_id": "file200004",
  21. "title": "云函数实战.doc",
  22. "categories": "Word文档",
  23. "size":"4M"
  24. }

If we want to query at once which files the user is involved in editing and the corresponding file information, we can use aggregated lookups on the cloud function side, which is equivalent to consolidating two collections into one collection.

  1. const cloud = require('wx-server-sdk')
  2. cloud.init({
  3. env: cloud.DYNAMIC_CURRENT_ENV,
  4. })
  5. const db = cloud.database()
  6. const _ = db.command
  7. const $ = db.command.aggregate
  8. exports.main = async (event, context) => {
  9. const res = db.collection('user').aggregate()
  10. .lookup({
  11. from: 'files',
  12. localField: 'file',
  13. foreignField: '_id',
  14. as: 'bookList',
  15. })
  16. .end()
  17. return res
  18. }

And if we're going to modify the information for a specified document, we're going to query _id the files collection. The document is updated once, and all information involved in editing the document is updated to ensure consistency in the contents of the file.

It's worth noting that although we've decentralized data into different collections through paradigm design, we can still nest less complex relationships in arrays and objects, as we do with relationship databases and Excel fields.

If each user participates in the editing of the document is particularly large and each document participates in the co-editing of the user is relatively small, the file is embedded in the user collection is more performance-consuming, this time can be reversed, the user's id embedded in the file collection, so the design of the database and the actual business has a great relationship.

  1. //由于file数组过大,user集合不再内嵌file了
  2. {
  3. "_id": "author10001",
  4. "name": "小云",
  5. "male":"female",
  6. }
  7. //把用户的id嵌入到files集合里,相当于以文档为主,作者为辅
  8. {
  9. "_id": "file200001",
  10. "title": "云开发实战指南.pdf",
  11. "categories": "PDF文档",
  12. "size":"16M",
  13. "author":["author10001","author10002","author10003"]
  14. }

Again here, cross-table queries and union table queries are two different things, cross-table queries we can find results through multiple queries of fields (fields of the same meaning) that are associated between collections and collections, while union table queries combine data from multiple collections into columns through associated fields. If you do not need to return the entire column of data across the collection, it is not recommended to use a union table query, let single N tables, can cross the table query on the cross table query.

V. Precautions for database design

1, the database's data pattern

The data pattern of the cloud development database is flexible, the relationship database requires you to define the pattern structure of a table before inserting data, and the collection collection of cloud databases does not limit the document structure of records. Relationship databases can vary greatly from field to field, type of field, length, and so on, while cloud databases do not need to be predefined and have no restrictions on the structure of records.

This flexibility makes mapping between objects and database documents easy. E ven if there are significant variations between data records, each document can be well mapped to different records. Of course, in practice, it is best to have a similar structure (the same fields, the same embedded document structure) in the same collection for bulk addition and deletion and aggregation.

As application usage time increases and demand changes, the database's data patterns may also need to grow and change accordingly. T he easiest way to do this is to add fields based on the old data patterns, which ensures that the database supports all legacy patterns. F or example, user information table, because the business needs to add some fields, such as gender, age, cloud database can be easily added, but there will be some problems, that is, the user information collected in the past gender, age these fields are empty, and only newly added users have. If the data changes in the business is relatively large, the data pattern of the document will also have a conflict of version confusion, which is also to be considered at the beginning of the database design.

2, pre-populated data

If you already know which fields you're going to use in the future, you'll prefill them the first time you insert them, and you'll be able to update the field level with update instructions later, instead of adding fields to the collection, which is much more efficient.

  1. {
  2. "_id":"user20200001",
  3. "nickname": "小明",
  4. "age": 27,
  5. "address":"",
  6. "school":[{
  7. "middle":""
  8. },{
  9. "college":""
  10. }]
  11. }

For example, the resume website user information form address, school, users do not have to fill in when logging in, but before posting the resume this information is required, if these fields are not pre-set, the collection of this information needs to use doc to record-level updates.

  1. db.collection("user").doc("user20200001")
  2. .update({
  3. data:{
  4. "address":"深圳",
  5. "school":[{
  6. "middle":"华中一附中"
  7. },{
  8. "college":"清华大学"
  9. }]
  10. }
  11. })

However, if these fields are pre-set, the field-level updates are made using the update operator, which can greatly improve performance when the collection is larger and less modified.

  1. db.collection("user").doc("user20200001")
  2. .update({
  3. data:{
  4. "address":_.set("深圳"),
  5. "school.0.middle":_.set("华中一附中"),
  6. "school.1.college":_.set("清华")
  7. }
  8. })

3, consider the growth of documents

The use of inline documents as an anti-paradigm design is a great benefit when querying, but there are some document update operations, will add elements in the array of embedded documents or add a new field, if with the needs of the business such operations lead to the size of the document is longer, such as we in order to facilitate the built-in comments into the embedded documents, early design is no problem, but if the cumulative increase in comments will lead to too large inline documents, the more new comments will affect performance. A nd the upper limit for a record in a cloud database is 16M. If this data growth also affects anti-paradigm design patterns, you may want to redesign the data model to use references between different documents rather than embedded data structures.

Because the update instruction not only micro-manipulates (additions and deletes) at the field level of the data, but also atomic operations, it not only performs well but also supports high synths. What's more, by designing embedded documents in an anti-paradigm, updating the atomic operations of instructions can replace the functionality of some transactions, as described in the Atomic Operations and Transactions section.