How to achieve a good database design?

May 31, 2021 Article blog

1. 1. Why focus on database design?

2. 2. Normalization

3. 3. Data type

4. 4. Constraints

5. 5. Index

6. 6. Conclusion

1. Why focus on database design?

Whether it's an application or a database, data is always the most important part. I n general, data is the primary purpose for which the system exists. That's why we shouldn't just think of database systems as black boxes that hold data, but as tools for verifying and preventing data corruption.

To do this, you need a robust and thoughtful database design. Of course, the business logic is encoding at the application layer, which ensures that the data is formatted correctly before it reaches the database.

But who can guarantee that network failures or defects will not release unreliable "guests"? I n addition, the application layer is not the only "door" to the database. W e can use import scripts, maintenance scripts, and DBAs and developers will interact with them. We can take precautions at the bottom to ensure that the data is always checked before it is stored.

Having robust, reliable data also helps with development and testing. Setting a column to Not Null eliminates many of the test scenarios that assume the column is empty, and simplifies the code so that developers don't have to check values (almost) every time they access it.

After emphasizing the importance of a good database design, let's see what tools we can use to implement it.

2. Normalization

How to achieve a good database design?1

This is undoubtedly the primary principle of good design. Here, we're not going to delve into normalization rules, we're just going to emphasize their importance.

Here's a good piece of information on this topic that you can read further.

https://docs.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description

3. Data type

Another thing to note is to define the appropriate property type. T his not only improves the performance of the database, but also validates the data before it is stored. So, we should save numeric data in the integer, numeric fields, timestamp, timestamptz fields, Boolean values in bit, char(1), or boolean fields, and so on.

The date deserves special attention. I f the Date property assumes that there is only the date part (OrderDate, ReleaseDate), use the Date type that does not have a time section. If you only need to keep time (StartTime, EndTime), use the appropriate time type.

If you do not need to specify precision, specify it as zero ("time(0)"). O ne problem with dates with a time section is that you must always truncate the time section and display only the dates, and when you want to display them in a different place from the time zone in which the database is located, make sure that the format does not appear as yesterday or tomorrow. When you jump to daylight saving time, there may also be problems with date time additions and subtractions with a time portion.

4. Constraints

Constraints are the focus of this article. T hey exclude invalid data and ensure the robustness of the data. Let's look at it one by one.

Non-empty constraints

If business rules require that the property should always exist, set it to Not Null without hesitation. F ields that are suitable for Not Null are Id, Name, AddedDate, IsActive, State, CategoryId (if all items should have a category), ItemCount, Price, and many other fields. T ypically, these properties play an important role in the business logic. Other optional information fields may still be set to Null.

Note, however, that you do not use Not Null constraints on properties that can be empty. For example, a long-running task always has a StartTimestamp, but endTimestamp is updated only when the task is complete.

Another typical example is the ManagerId of the Employee table, where not all employees have managers. D on't try to keep ManagerId blank and insert "0" or "-1" for employees who don't have a manager. This causes other problems when we add foreign key constraints.

Unique constraint

Similarly, according to business rules, some attributes (or combinations of attributes) should be unique, such as Id, PinNumber, BookId, AuthorId, OrderNo, and so on. The uniqueness of these properties should be guaranteed by adding unique constraints.

It's also important to note that you can use unique indexes to achieve the same effect, but adding constraints is a better approach. Because when you add a unique constraint, a non-unique index is automatically created.

Therefore, if for some reason you have to temporarily disable/enable constraints, it will be very easy. In the case of unique indexes, you must delete/recreate the index, which is an expensive operation in terms of performance and time.

The primary key

Not Null, together with the unique constraint, forms the primary key. W hen we think of the primary key, we quickly think of columns like Id or ObjectId. However, primary keys can also be composite, such as BookId and AuthorId.

The challenge here is, do you want to use a separate Id column as the primary key, or do you want to combine the two as the primary key? I n general, using a separate Id column is a better approach because it makes the connection clearer and makes it easy to add another column to a unique combination. However, even with a separate primary key (Id), we still want to add unique constraints to the BookId and AuthorId columns.

Check constraints

Check constraints allow us to define valid values/ranges of data. Properties that are appropriate for Check constraints are percentages (between 0 and 100), status (0, 1, 2), price, amount, total (greater than or equal to 0), PinNumber (fixed length), and so on.

Similarly, don't try to encode business logic into Check constraints. I remember once adding a Check constraint of "greater than or equal to zero" to the AccountBalance column to avoid accidental overdrafts.

The default constraint

Default constraints are also important. They allow us to add new Not Null columns to existing tables and make the "old" API compatible with the new structure until all parties complete the upgrade (although the default constraint should be removed after a full upgrade).

Here's one thing to keep in mind. D o not write business logic in the default constraints. F or example, the function "now()" might be appropriate (although not always) as the default value for timestamp fields in the log table, but not for the OrderDate field of the Orders table. You may prefer to omit OrderDate from the insert statement and rely on the default constraints, but this means extending the business logic to the database layer.

In addition, in some cases, the business may assign values to OrderDate only after the order is approved, because the default constraints are buried deep in the database, so it is less obvious when we make changes to the code at the application layer.

Foreign key constraints

Foreign key constraints are the king of relational database design. F oreign keys work with primary keys to ensure data consistency between tables. N ormalization rules tell us when to extract data into a table and reference it with foreign keys. Here we'll look at detail differences, such as OnDelete and OnUpdate rules.

How to achieve a good database design?2

Let's start with a simple section: OnUpdate. F oreign keys refer to primary keys, which are rarely, if any, modified. T herefore, OnUpdate rules are not very common, but it makes sense to set them to Cascade because we may sometimes have to update the primary keys for some rows (usually after migration). In this way, the database will allow us to update and propagate the new id to the child table.

The OnDelete rule is a bit complicated. D epending on the database, we have NoAction, Restrict, SetNull, SetDefault, and Cascade options. So, which one to choose?

Typically, we select NoAction for entities that are found or do not reference entities by key reference. F or example, Products -> Categories, Books -> Authors, and so on. In most cases, Restrict is the same as NoAction, but for some databases, they are slightly different.

https://www.vertabelo.com/blog/on-delete-restrict-vs-on-delete-no-action/

On the other hand, select Cascade when the child record cannot exist without a parent record. I n the Book and Author examples, when we delete a book, we should also delete records from the BookAuthor table. O ther examples include OrderDetails - > Orders, PostComments - > Posts, and more. H ere, some people may disagree that databases should not automatically delete sub-rows, they should be deleted by the application layer. A ccording to the business logic, this is the case. Sometimes, however, "unimportant" child deletions can be delegated to the database.

SetNull is rarely used. F or example, the foreign key between Employee.ManagerId and Employee.Id can be SetNull. W hen a manager is removed, his subordinates have no manager. Obviously, the rule can only be selected if the column is empty.

SetDefault is the rarest of these rules. W hen the parent record is deleted, it sets the column to its default value. B ecause the foreign key refers to the primary key, it is difficult to imagine a field with a foreign key hard-coded the default value. But in any case, this option exists, and we may need it.

5. Index

Indexes are an important part of good database design, but they are a bit off our discussion because they do little to protect our data (except for unique indexes).

It is important to note that some RDBMS systems, such as Oracle, automatically create indexes when foreign keys are created without us having to worry. Other databases, such as MS SQL Server, do not do so, and we must add our own indexes.

6. Conclusion

A well-thought-out design can save us a lot of coding, testing, and troubleshooting time. I t's nice to write queries and reports on a well-designed database. Publishing data and migrating it to a new system can also be easy.

Above is W3Cschool编程狮 about how to achieve good database design? Related to the introduction, I hope to help you.