The rules of routing in reality can be much more complex than you think

Jun 01, 2021

This article is reproduced from the WeChat official account Little Sister Taste.

Almost every distributed system gives users the ability to customize routing, because simple methods such as range, mod, and hash alone are usually no longer enough to meet their needs. Let's take a real-world scenario as an example and talk through the idea of data routing.

This is about data routing, not Nginx or anything like that.

Scenario

A large toB application stores its data in MySQL. Each table holds hundreds of millions of rows, and both schema changes and queries have hit significant bottlenecks, so the data needs to be sharded.

Implementation steps

Find the sharding key

The first step is to find the dimension to shard on. For example, if the business queries by time, then the creation time is used as the sharding key.

The sharding key for this business is the merchant id (similar to the unique id you are assigned when you open a store on a platform). For historical reasons, this id is the database primary key and is auto-incremented. The business has the following characteristics:

Business operations are initiated by merchants, and every table has a merchant id field
Merchant data is uneven: some merchants have tens of millions of rows, others only a dozen
Some VIP merchants have very large amounts of data
There are many statistical requirements, so the data cannot be split into tables, only into databases
Some operations need to traverse the data, such as certain scheduled jobs

Sharding requirements, phase one

Splitting the database is urgent. Analysis shows that some VIP merchants have so much data that giving each of them a database of its own is not excessive.

We control which database a VIP merchant's data flows to by maintaining a mapping file. This is the point where a custom route is needed.

The pseudocode is as follows:

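function viptable(id){
    // mapping file: vip merchant id => dedicated database
    10 => "mysql-002"
    101 => "mysql-003"
}
function router4vip(id){
    aimDb = viptable(id)
    if(aimDb) return aimDb

    // every other merchant stays in the default database
    return "mysql-001"
}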

If the merchant id is 10, the data goes to mysql-002; if the merchant id is 101, the data goes to mysql-003; all other data is stored in mysql-001 by default.
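As a minimal runnable sketch of the mapping above, the lookup can be written in Python. The names `VIP_TABLE`, `DEFAULT_DB`, and `route_vip` are assumptions made for illustration; only the ids and database names come from the pseudocode.

```python
# Hypothetical in-memory copy of the mapping file: VIP merchant id -> database.
VIP_TABLE = {
    10: "mysql-002",
    101: "mysql-003",
}

DEFAULT_DB = "mysql-001"


def route_vip(merchant_id: int) -> str:
    """Return the dedicated database for a VIP merchant, else the default."""
    return VIP_TABLE.get(merchant_id, DEFAULT_DB)


if __name__ == "__main__":
    for merchant_id in (10, 101, 7):
        print(merchant_id, "->", route_vip(merchant_id))
```

Keeping the mapping as data rather than as branches means a new VIP merchant only needs a new entry, not new code.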

In addition, because the id is an auto-increment field, routing has a chicken-and-egg problem: the id is needed to choose the database, but the database is what generates the id. So the id field is changed to be set manually, backed by a separate numbering system that is not covered here.


Sharding requirements, phase two

The VIP merchant problem is solved; the next step is to deal with mysql-001. As the business grows, more and more data lands in the default database, and it quickly hits a bottleneck.

The obvious approach is to split it in two, scattering the data of mysql-001 across two databases. For this split rule we simply use mod.

Why not split it into three? Mainly for the following reason. Assume the databases after the split are:

mysql-001-1
mysql-001-2

In this case mysql-001 becomes a logical cluster. When mysql-001-1 and mysql-001-2 reach a bottleneck as well, we can split each of them in two again and switch to mod 4. Each database only hands data to its own two children, so no complex data migration is involved.

The databases after the split are:
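The reason a one-to-two split stays simple can be checked with a little arithmetic: every id with remainder r under mod 2 has remainder r or r + 2 under mod 4, so the data of one parent never has to move to the children of another parent. A small sketch of that check, where the helper name `children_after_split` is an assumption:

```python
def children_after_split(parent_remainder: int, old_mod: int) -> set:
    """Remainders under 2 * old_mod that occur for one remainder under old_mod."""
    new_mod = 2 * old_mod
    return {
        merchant_id % new_mod
        for merchant_id in range(10_000)
        if merchant_id % old_mod == parent_remainder
    }


if __name__ == "__main__":
    # Going from mod 2 to mod 4: each parent feeds exactly two children.
    print(children_after_split(0, 2))
    print(children_after_split(1, 2))
```

Splitting into three would break this property, because a remainder under mod 2 does not determine a fixed pair of remainders under mod 3.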

mysql-001-1-1
mysql-001-1-2
mysql-001-2-1
mysql-001-2-2

So far we have used a VIP split and a mod 4 split, with the following pseudocode:

...


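function routertable(pivot){
    // children keep their parent's parity: assuming mysql-001-1 held the
    // even ids, its two children take the mod 4 remainders 0 and 2
    0 => "mysql-001-1-1"
    2 => "mysql-001-1-2"
    1 => "mysql-001-2-1"
    3 => "mysql-001-2-2"
}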


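function router4mod(id){
    // look up the vip mapping directly: router4vip falls back to
    // "mysql-001", so its result would always be non-empty
    aimDb = viptable(id)
    if(aimDb) return aimDb

    pivot = mod4(id)
    return routertable(pivot)
}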

By now we have six databases: two for VIP merchants and four from the mod split. This fission approach scales better.

So can we rest easy now?
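Put together, the VIP table and the mod 4 table might look like the following Python sketch. The function and table names are assumptions, and so is the assignment of remainders to databases: it assumes mysql-001-1 held the even ids, so that its children take remainders 0 and 2. The VIP check reads the mapping table directly, because a lookup that already falls back to mysql-001 would never let the mod branch run.

```python
# VIP merchants get dedicated databases (ids and names from the pseudocode).
VIP_TABLE = {10: "mysql-002", 101: "mysql-003"}

# Remainder under mod 4 -> database. Assumption: mysql-001-1 held the even
# ids, so its two children take remainders 0 and 2.
MOD4_TABLE = {
    0: "mysql-001-1-1",
    2: "mysql-001-1-2",
    1: "mysql-001-2-1",
    3: "mysql-001-2-2",
}


def route_mod(merchant_id: int) -> str:
    """Try the VIP table first, then fall back to the mod 4 table."""
    vip_db = VIP_TABLE.get(merchant_id)
    if vip_db is not None:
        return vip_db
    return MOD4_TABLE[merchant_id % 4]


if __name__ == "__main__":
    for merchant_id in (10, 101, 8, 9, 6, 7):
        print(merchant_id, "->", route_mod(merchant_id))
```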


Sharding requirements, phase three

Unfortunately, every expansion is exponential. The next time it is mod 8, and the time after that it is mod 16. Every expansion moves half of the data, wtf.
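The "half of the data" figure is easy to confirm. The sketch below treats the remainder as the shard index (an assumption; `moved_fraction` is a hypothetical helper) and counts how many ids get a different remainder when the modulus doubles.

```python
def moved_fraction(old_mod: int, new_mod: int, sample: int = 100_000) -> float:
    """Fraction of ids in range(sample) whose remainder changes."""
    moved = sum(1 for i in range(sample) if i % old_mod != i % new_mod)
    return moved / sample


if __name__ == "__main__":
    print(moved_fraction(4, 8))
    print(moved_fraction(8, 16))
```

Under this assumption both calls report that one id in two changes shard, so the cost of each doubling grows with the total data volume rather than with the size of the hot spot.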

In the end, we decided to work with the range of the merchant id.

First, make new merchant ids longer than any id in the existing system, so that the new rules do not affect the old routing rules.

The first layer of virtual clusters is then divided by merchant id range, and the second layer is divided by mod. Our route is now a two-tier route.

For example, we set the merchant id to 9 digits (a Java int holds at most 10 digits) and build the following routing table:

100 000000 - 100 100000 => virtual cluster 1
100 100000 - 100 200000 => virtual cluster 2
...

The first three digits divide the first layer of virtual clusters, which supports 899 of them. Beneath each range there are separate routing rules: some ranges may use mod 2, some mod 3, and some may be split by range again.

Okay, let's add the new cluster:

mysql-range0-0 holds the even ids whose number segment falls in range 1
mysql-range0-1 holds the odd ids in the same range

The pseudocode is as follows:

...
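function router4range(id){
    if(id < 100000000){
        // ids shorter than 9 digits keep the old rules
        return router4mod(id)
    }else if(id in [100000000, 100100000]){
        // virtual cluster 1: the second-layer rule is mod 2
        return "mysql-range0-" + mod2(id)
    }
}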

So far we have eight databases: two for VIP merchants, four for the legacy routing algorithm, and two for the new range rules.
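A runnable sketch of the two-tier route could look like this in Python. Several details are assumptions rather than part of the original design: the ranges are treated as half-open so that a boundary id such as 100100000 belongs to exactly one range, an id with no configured range raises `LookupError`, and the table and function names are invented for illustration.

```python
from typing import Callable, List, Tuple

VIP_TABLE = {10: "mysql-002", 101: "mysql-003"}
MOD4_TABLE = {0: "mysql-001-1-1", 2: "mysql-001-1-2",
              1: "mysql-001-2-1", 3: "mysql-001-2-2"}


def route_legacy(merchant_id: int) -> str:
    """Old rules: VIP table first, then mod 4."""
    return VIP_TABLE.get(merchant_id) or MOD4_TABLE[merchant_id % 4]


# First layer: half-open id ranges [start, end). Second layer: one rule per range.
RANGE_TABLE: List[Tuple[int, int, Callable[[int], str]]] = [
    (100_000_000, 100_100_000, lambda mid: f"mysql-range0-{mid % 2}"),
]


def route_range(merchant_id: int) -> str:
    """Ids shorter than 9 digits use the old rules; longer ids use ranges."""
    if merchant_id < 100_000_000:
        return route_legacy(merchant_id)
    for start, end, rule in RANGE_TABLE:
        if start <= merchant_id < end:
            return rule(merchant_id)
    raise LookupError(f"no route configured for merchant id {merchant_id}")


if __name__ == "__main__":
    for merchant_id in (10, 9, 100_000_000, 100_056_401):
        print(merchant_id, "->", route_range(merchant_id))
```

Storing the second-layer rule as a callable per range is one way to let each range pick mod 2, mod 3, or a nested range without touching the others.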


After three rounds of improvement, our routing satisfies the following:

1. When the merchant id has grown to, say, 100 056400 and a bottleneck is reached, we can add a new range; only the routing table logic needs to change

2. When a merchant within some range grows into a VIP, we can extract it and add a new VIP database

3. When a range has a serious data hot spot, we can expand just that range with mod 4, without affecting data outside the range

4. The merchant id also carries a time dimension, so some old merchants can be archived
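The first two properties can be illustrated with a small Python sketch. Everything here is hypothetical: the `Router` class, the half-open ranges, and the database names `mysql-range1-*` and `mysql-vip-004` are invented for the example. It covers only the routing decision; moving an extracted VIP merchant's existing rows is a separate step.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[int], str]


class Router:
    """Two-tier router sketch: VIP overrides first, then half-open id ranges."""

    def __init__(self) -> None:
        self.vip: Dict[int, str] = {}
        self.ranges: List[Tuple[int, int, Rule]] = []

    def add_vip(self, merchant_id: int, db: str) -> None:
        self.vip[merchant_id] = db

    def add_range(self, start: int, end: int, rule: Rule) -> None:
        self.ranges.append((start, end, rule))

    def route(self, merchant_id: int) -> str:
        if merchant_id in self.vip:
            return self.vip[merchant_id]
        for start, end, rule in self.ranges:
            if start <= merchant_id < end:
                return rule(merchant_id)
        raise LookupError(f"no route for merchant id {merchant_id}")


router = Router()
router.add_range(100_000_000, 100_100_000, lambda mid: f"mysql-range0-{mid % 2}")
before = router.route(100_056_400)

# Property 1: open a new range; ids in the old range keep their route.
router.add_range(100_100_000, 100_200_000, lambda mid: f"mysql-range1-{mid % 3}")
# Property 2: extract one merchant that grew into a VIP.
router.add_vip(100_000_123, "mysql-vip-004")

print(router.route(100_056_400) == before)
```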

Sharding requirements, phase four

The system wants to reserve another number segment to provide test accounts for customers to try out. After the first three rounds of rework, we can plan for this easily.

End

Why do I think the slot design of redis-cluster is "chicken ribs" (of little value)? Because it hard-codes the routing rules. If I were designing it, I would certainly put the routing in the driver layer.

Architects come and go, each leaving an indelible mark. To stay compatible with the routing code of these legacy systems, the branches become more and more complex, and every company has its own pile of stories, which boil down to cursing and being cursed. Stability outweighs everything, and the routing code is probably the most important if else that must not be touched: move it and you die.

So, are you scared?
