Jun 01, 2021
This article is reproduced from the WeChat official account "Little Sister Taste".
Almost every distributed system lets users customize routing, because range, mod, hash, and similar built-in methods alone are often not enough to meet users' needs.
Let's take a real-world scenario as an example and walk through the thinking behind data routing.
To be clear, this is about data routing, not nginx or the like.
A large B2B (toB) application stores its data in MySQL. Each table holds hundreds of millions of rows, and schema changes and queries have hit significant bottlenecks, so the tables need to be split.
The first step is to find the dimension to split on. For example, if the business queries by time, then the creation time is used as the sharding key.
The sharding key for this business is the merchant id (similar to the unique id a platform assigns you when you open a store on it). For historical reasons, this id is the database primary key and auto-increments. The business has the following characteristics:
Splitting into separate databases is urgent. Analysis shows that some VIP merchants have a huge amount of data, so moving each of them to a dedicated database is well justified.
A mapping file controls which database each VIP merchant's data flows to. This is where custom routing becomes necessary.
The pseudocode is as follows:
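function viptable(id){
    // mapping file: VIP merchant id => dedicated database
    10 => "mysql-002"
    101 => "mysql-003"
}
function router4vip(id){
    aimDb = viptable(id)
    if(aimDb) return aimDb
    // every other merchant stays on the default database
    return "mysql-001"
}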
Data for merchant 10 goes to mysql-002, data for merchant 101 goes to mysql-003, and all other merchants' data is stored in mysql-001 by default.
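The mapping-file lookup can be sketched in Python as follows. This is a minimal sketch, not the original system's code: the dict `VIP_TABLE` stands in for the mapping file, and the names `route_vip` and `DEFAULT_DB` are assumptions made for illustration.

```python
# Hypothetical stand-in for the mapping file: VIP merchant id -> dedicated database.
VIP_TABLE = {
    10: "mysql-002",
    101: "mysql-003",
}

# Every merchant without an entry in the mapping stays here.
DEFAULT_DB = "mysql-001"


def route_vip(merchant_id: int) -> str:
    """Return the dedicated database for a VIP merchant, else the default one."""
    return VIP_TABLE.get(merchant_id, DEFAULT_DB)


if __name__ == "__main__":
    for merchant_id in (10, 101, 7):
        print(merchant_id, "->", route_vip(merchant_id))
```

Keeping the mapping as data rather than as branches means promoting a merchant to VIP is a one-line change to the table.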
In addition, because the id is an auto-increment field, routing runs into a chicken-and-egg problem: the id is needed to choose the database, but the database is what generates the id. So the id field was changed to be assigned explicitly, which led to a separate numbering system that is not covered here.
With the VIP merchants taken care of, the next step is to deal with mysql-001.
As the business grows, more and more data lands on the default database, and it quickly hits a bottleneck.
The natural approach is to split it in two: the data in mysql-001 is spread across two databases.
For the splitting rule, we simply use mod.
Why two and not three? Mainly for the following reason. Assume the databases after the split are:
mysql-001-1
mysql-001-2
In this case, mysql-001 becomes a logical cluster.
When mysql-001-1 and mysql-001-2 also reach a bottleneck, we can split each of them in two again. That is simply mod 4, and it involves no complex data migration.
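The reason doubling keeps migration simple can be checked with a little arithmetic: an id whose remainder mod 2 is r can only have remainder r or r + 2 mod 4, so each old database splits into exactly two new ones and no data moves between the old databases. The sketch below verifies this on a sample of ids and, for contrast, shows that going from mod 2 to mod 3 scatters every old shard across all three new ones. The function name `targets_after_resplit` and the sample ids are illustrative assumptions.

```python
from collections import defaultdict


def targets_after_resplit(old_mod: int, new_mod: int, ids: range) -> dict[int, set[int]]:
    """Map each old shard index to the set of new shard indexes its ids land on."""
    targets: dict[int, set[int]] = defaultdict(set)
    for merchant_id in ids:
        targets[merchant_id % old_mod].add(merchant_id % new_mod)
    return dict(targets)


if __name__ == "__main__":
    sample = range(1, 10_001)
    # Doubling: every old shard feeds exactly two new shards.
    print(targets_after_resplit(2, 4, sample))
    # Splitting into three: every old shard feeds all three new shards.
    print(targets_after_resplit(2, 3, sample))
```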
The databases after this split are:
mysql-001-1-1
mysql-001-1-2
mysql-001-2-1
mysql-001-2-2
So far, we have VIP sharding plus mod 4 sharding, with the following pseudocode:
...
function routertable(pivot){
    0 => "mysql-001-1-1"
    1 => "mysql-001-1-2"
    2 => "mysql-001-2-1"
    3 => "mysql-001-2-2"
}
function router4mod(id){
    // check the VIP table directly; router4vip would always return its "mysql-001" default
    aimDb = viptable(id)
    if(aimDb) return aimDb
    pivot = mod4(id)
    return routertable(pivot)
}
By now, we have six databases. Splitting by fission like this scales reasonably well.
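As a rough Python rendering of the VIP plus mod 4 rules, the following sketch looks up the VIP mapping first and falls back to the remainder table. The table names `VIP_TABLE` and `MOD4_TABLE` and the function name `route` are assumptions; the database names come from the pseudocode.

```python
# VIP merchants get dedicated databases (assumed stand-in for the mapping file).
VIP_TABLE = {10: "mysql-002", 101: "mysql-003"}

# Everyone else is spread over four databases by merchant id mod 4.
MOD4_TABLE = {
    0: "mysql-001-1-1",
    1: "mysql-001-1-2",
    2: "mysql-001-2-1",
    3: "mysql-001-2-2",
}


def route(merchant_id: int) -> str:
    """VIP lookup first, then the mod 4 table for all other merchants."""
    vip_db = VIP_TABLE.get(merchant_id)
    if vip_db is not None:
        return vip_db
    return MOD4_TABLE[merchant_id % 4]


if __name__ == "__main__":
    for merchant_id in (10, 101, 7, 8):
        print(merchant_id, "->", route(merchant_id))
```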
So can we rest easy now?
Unfortunately, every expansion is exponential. Next time it is mod 8, and the time after that it is mod 16. Every expansion moves half of the data, wtf.
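The "half the data" figure can be checked directly. The sketch below counts how many ids change shard index when the modulus doubles, assuming the shard with index r keeps the ids that still map to r. The function name `moved_fraction` and the sample range are illustrative.

```python
def moved_fraction(old_mod: int, new_mod: int, ids: range) -> float:
    """Fraction of ids whose shard index changes when the modulus changes."""
    moved = sum(1 for i in ids if i % old_mod != i % new_mod)
    return moved / len(ids)


if __name__ == "__main__":
    sample = range(1, 1_000_001)
    for old_mod, new_mod in ((4, 8), (8, 16)):
        fraction = moved_fraction(old_mod, new_mod, sample)
        print(f"mod {old_mod} -> mod {new_mod}: {fraction:.0%} of ids move")
```

The fraction stays at one half no matter how large the cluster already is, so the absolute amount of data to move grows with every doubling.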
In the end, we decided to work with ranges of the merchant id.
First, make new merchant ids longer than any id in the existing system, so that the new rules do not affect the old routing rules.
The first layer of virtual clusters is then divided by merchant id range, and the second layer within each of them is divided by mod. Our routing is now two-tier.
For example, we set the merchant number to 9 digits (an int in Java has at most 10 digits) and build the following routing table:
100 000000 - 100 100000 => virtual cluster 1
100 100000 - 100 200000 => virtual cluster 2
...
The first three digits are used to divide the first layer of virtual clusters, which supports 899 of them.
Each range has its own routing rule underneath: some may use mod 2, some may use mod 3, and some may be split by range again.
Okay, let's add the new cluster:
mysql-range0-0 holds the even ids in range 1
mysql-range0-1 holds the odd ids in range 1
The pseudocode is as follows:
...
function router4range(id){
    if(id < 100000000){
        // old, shorter ids keep the legacy VIP + mod 4 rules
        return router4mod(id)
    }else if(id in [100000000-100100000]){
        return "mysql-range0-"+mod2(id)
    }
}
So far, we have eight databases: two for VIP merchants, four for the legacy routing algorithm, and two for the new range-based rules.
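Put together, the whole chain might look like the following Python sketch. It is a hedged illustration rather than the original code: the names `route` and `route_legacy` are made up here, the first range is treated as half-open (100000000 up to but not including 100100000), and ids outside every configured range raise an error, a case the pseudocode leaves unspecified.

```python
VIP_TABLE = {10: "mysql-002", 101: "mysql-003"}
MOD4_TABLE = {
    0: "mysql-001-1-1",
    1: "mysql-001-1-2",
    2: "mysql-001-2-1",
    3: "mysql-001-2-2",
}

# Ids below this value were issued under the old numbering scheme.
NEW_ID_START = 100_000_000
# Assumed exclusive upper bound of the first new range.
RANGE0_END = 100_100_000


def route_legacy(merchant_id: int) -> str:
    """Old rules: VIP lookup first, then mod 4."""
    vip_db = VIP_TABLE.get(merchant_id)
    if vip_db is not None:
        return vip_db
    return MOD4_TABLE[merchant_id % 4]


def route(merchant_id: int) -> str:
    """Two-tier routing: pick the id range first, then apply that range's rule."""
    if merchant_id < NEW_ID_START:
        return route_legacy(merchant_id)
    if merchant_id < RANGE0_END:
        return f"mysql-range0-{merchant_id % 2}"
    raise LookupError(f"no routing rule configured for merchant id {merchant_id}")


if __name__ == "__main__":
    for merchant_id in (10, 5, 100_000_002, 100_000_003):
        print(merchant_id, "->", route(merchant_id))
```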
1. When merchant ids grow to 100 056400 and a bottleneck appears, we can add a new range, and only the routing table logic needs to change.
2. When a merchant within a range grows into a VIP, we can extract it separately and add a new VIP database.
3. When a range has a serious data hot spot, we can expand it with mod 4 without affecting data outside the range.
4. The merchant id also carries a time dimension, so some old merchants can be archived.
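The first point, that a new range is only a routing table change, can be illustrated with a table-driven lookup. In this sketch each entry holds the first id of a range, a database name prefix, and a modulus, and a range runs until the next entry begins (the last one is open-ended). The table layout, the functions `route` and `add_range`, and the database name `mysql-range1` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical routing table: (first id of range, database name prefix, modulus).
ROUTES = [
    (100_000_000, "mysql-range0", 2),
]


def route(merchant_id: int, routes: list[tuple[int, str, int]]) -> str:
    """Find the range containing the id, then apply that range's mod rule."""
    starts = [start for start, _, _ in routes]
    index = bisect_right(starts, merchant_id) - 1
    if index < 0:
        raise LookupError(f"id {merchant_id} is below every configured range")
    _, prefix, modulus = routes[index]
    return f"{prefix}-{merchant_id % modulus}"


def add_range(routes, start: int, prefix: str, modulus: int):
    """Return a new table with one more range; existing entries are unchanged."""
    return sorted(routes + [(start, prefix, modulus)])


if __name__ == "__main__":
    # Ids have grown to 100056400: open a new range there, routed by mod 3.
    extended = add_range(ROUTES, 100_056_400, "mysql-range1", 3)
    for merchant_id in (100_056_399, 100_056_400):
        print(merchant_id, route(merchant_id, ROUTES), "->", route(merchant_id, extended))
```

Ids issued before the cut-over keep routing to the same database, so only newly issued ids land on the new cluster and nothing has to be migrated.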
The system also wants to reserve part of the number range for test accounts that customers can try out. After the three rounds of rework above, this is easy to plan for.
Why do I think the slot design of redis-cluster is a "chicken rib" (of little real value)? Because it hard-codes the routing rules. If I were designing it, I would definitely put routing in the driver layer.
Architects come and go, each leaving an indelible mark. To stay compatible with the routing code of these legacy systems, the branches grow more and more complex. Every company has a pile of such stories, and they all come down to cursing others and being cursed in turn. Stability outweighs everything, and the routing code is probably the most important if else that nobody dares to edit: touch it and you are dead.
So, are you scared?
That is W3Cschool编程狮's look at real-world routing rules, which may be more complex than you think. Hopefully it helps.