
Cache Penetration / Cache Breakdown / Cache Avalanche: This One Article Is Enough


May 31, 2021




National Day and the Mid-Autumn Festival have passed. Are you ready to learn?

When Redis is used in a project, it is usually used as a cache.

And once you use it as a cache, you are bound to run into the problems of cache penetration, cache breakdown, and cache avalanche.

This article explains how to deal with each of them when they come up.

Cache penetration

First, let's figure out what cache penetration is. These three terms look so similar that it's easy to mix up the concepts.

Taken literally, the name already hints at the meaning: with cache penetration, a request passes straight through the cache and lands on the database.

Normally, when you query data, the cache should have it. But you can't stop attackers: if an attacker queries data that doesn't exist in the database, then the database has no such data and the cache certainly won't have it either, so the request hits the database directly. That is cache penetration.

Think about it: an attacker won't send just one request. They will send a flood of requests for IDs that don't exist in the database, so all of those requests land directly on the database, and the database may simply go down under the load.

How to solve it?

Let's go back to the scenario that created the problem. Why do so many requests hit the database? Because the corresponding key doesn't exist in the cache, so the request crosses the cache and goes straight to the database.

That points to the solution. The cache has no such key? Fine: if the database doesn't have this key either, I simply store the key in Redis with a null value. Then any request that queries this key gets null straight back from the cache, and the database is never touched.

Note: remember to set an expiration time on these null entries, usually three to five minutes.
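A minimal sketch of this cache-null-value approach, assuming the Jedis client; the key names, TTL values, and the queryFromDb helper are hypothetical placeholders for illustration:

```java
import redis.clients.jedis.Jedis;

public class NullValueCache {
    private static final String NULL_PLACEHOLDER = "NULL";
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String get(String key) {
        String cached = jedis.get(key);
        if (cached != null) {
            // Either a real value, or our placeholder from an earlier miss
            return NULL_PLACEHOLDER.equals(cached) ? null : cached;
        }
        String dbValue = queryFromDb(key);          // hypothetical database lookup
        if (dbValue == null) {
            // Key doesn't exist in the database: cache a placeholder for 5 minutes
            jedis.setex(key, 300, NULL_PLACEHOLDER);
            return null;
        }
        jedis.setex(key, 3600, dbValue);            // cache the real value for longer
        return dbValue;
    }

    private String queryFromDb(String key) {
        return null;                                // placeholder: replace with a real DB query
    }
}
```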

But the other side is an attacker. Will they really keep requesting the same key? More likely they send requests for a huge number of different keys in a short time. If every non-existent key gets a null value stored in Redis, how many null values would we end up storing?

Redis would end up stuffed with a pile of useless null values.

So is there a better solution?

Of course there is! The Bloom filter is worth a try.

What is a Bloom filter? Put simply, it can only tell you that a value definitely does not exist, or that it may exist (emmmm, I'm not sure that was clear, bear with me).

So we can load the keys that exist in the database into the Bloom filter. When a flood of requests comes in and Redis has nothing, that's fine: the requests go through the Bloom filter first, which can tell whether the key exists in the database at all, so requests for non-existent keys never reach the database.

Doesn't that take a lot of pressure off the database? Pretty clever, right?
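A small sketch of putting a Bloom filter in front of the cache, here using Guava's BloomFilter (an assumption, the article doesn't name a library); preload and queryCacheThenDb are hypothetical helpers:

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.StandardCharsets;
import java.util.List;

public class BloomFilterGuard {
    // Expect up to 1 million keys, with roughly a 1% false-positive rate
    private final BloomFilter<String> bloomFilter =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1_000_000, 0.01);

    /** Preload all keys that actually exist in the database. */
    public void preload(List<String> allDbKeys) {
        allDbKeys.forEach(bloomFilter::put);
    }

    public String get(String key) {
        if (!bloomFilter.mightContain(key)) {
            // "Definitely not in the database" -> reject without touching cache or DB
            return null;
        }
        // "May exist" -> go through the normal cache-then-database path
        return queryCacheThenDb(key);   // hypothetical lookup
    }

    private String queryCacheThenDb(String key) {
        return null;                    // placeholder
    }
}
```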

Cache breakdown

Cache breakdown happens when, under high concurrency, many requests are querying the same key, and unfortunately that key has just become invalid for some reason (its expiration time arrived, or the cache server went down), so all of those requests hit the database at once.

If there are enough of those requests, the database may be knocked over on the spot.

Once you know the cause, the solution is easy to find.

Unlike penetration, the many requests hitting the database here are all for a single key, so a mutex (exclusive lock) works well: the first request that finds the key missing from the cache is allowed to query the database while holding the lock, so the second request, the third request, and so on are blocked and wait instead of hitting the database. That greatly reduces the concurrent pressure on the database. A sketch of this pattern follows below.
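A minimal sketch of the mutex approach using a Redis SET NX EX lock via Jedis; the lock key, the timeouts, and the queryFromDb helper are assumptions made for illustration, not the article's own code:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class MutexRebuild {
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String get(String key) throws InterruptedException {
        String value = jedis.get(key);
        if (value != null) {
            return value;
        }
        String lockKey = "lock:" + key;
        // Try to grab the mutex: SET lock:key 1 NX EX 10
        String locked = jedis.set(lockKey, "1", SetParams.setParams().nx().ex(10));
        if ("OK".equals(locked)) {
            try {
                value = queryFromDb(key);          // hypothetical database lookup
                jedis.setex(key, 3600, value);     // rebuild the cache
                return value;
            } finally {
                jedis.del(lockKey);                // release the mutex
            }
        }
        // Someone else is rebuilding the cache: wait briefly and retry
        Thread.sleep(50);
        return get(key);
    }

    private String queryFromDb(String key) {
        return "value-from-db";                    // placeholder
    }
}
```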

Cache avalanche

Cache avalanche is more serious. Breakdown is one key becoming invalid; an avalanche means a large-scale cache failure. That can happen, for example, if the cache server goes down, which instantly invalidates the cache on a large scale. Or, to save effort, you set the same expiration time on many keys, and just as they all expire, a flood of requests for those different keys comes in.

Here, locking is not a suitable solution, because the requests are for many different keys, not a single one.

Since the problem comes from many keys sharing the same expiration time, the fix is simple: don't let the cache entries expire at the same moment. Add a random offset to each expiration time so entries expire at scattered times, which greatly reduces the chance of a large-scale cache failure.
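A tiny sketch of the random-expiration idea; the base TTL and the jitter window are made-up numbers:

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class RandomTtlCache {
    private static final int BASE_TTL_SECONDS = 3600;   // one hour base expiration
    private static final int JITTER_SECONDS = 600;      // up to 10 minutes of random jitter

    private final Jedis jedis = new Jedis("localhost", 6379);

    public void put(String key, String value) {
        // Scatter expirations so many keys don't all expire at the same moment
        int ttl = BASE_TTL_SECONDS + ThreadLocalRandom.current().nextInt(JITTER_SECONDS);
        jedis.setex(key, ttl, value);
    }
}
```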

What if the cache server itself goes down? That's manageable too: deploy a cluster. That's just the direction of the solution, though; how to set it up is not the focus of this article.

Let's talk about the Bloom filter again

OK, if you've read this far, the main content of this article is actually finished.

But I feel I didn't explain the Bloom filter part clearly enough, so let's go through it in detail here. (I know you're silently thinking what a warm-hearted guy I am. Good, just know it, don't actually say it, or I'll get shy.)

A Bloom filter is a data structure, a probabilistic one, that tells you "a value definitely does not exist, or it may exist."

You might say: didn't you just say that? And it's quite a mouthful.

I repeat it because this sentence is the key. If you understand it thoroughly, you understand the Bloom filter.

Come on, to make it concrete, let's walk through an example. A Bloom filter is a bit vector, or bit array, that looks roughly like this:

[Figure: an empty bit array]

Now suppose we need to store the value "AliPay". The rough storage process is: feed the value to several different hash functions to produce several hash values, then set the bit at each resulting position to 1.

For example, we map "AliPay" through three different hash functions, and it might come out like this:

[Figure: "AliPay" hashed by three functions to bits 1, 4, and 6, which are set to 1]

Next, we store another value, "WechatPay", which might map like this:

[Figure: "WechatPay" hashed to bits that include position 4, overlapping with "AliPay"]

If you look carefully, you may notice that position 4 was first set by "AliPay", and then "WechatPay" landed on it too. Isn't that bit being overwritten?

Well, yes, it is.

Next, suppose we query "Ali". After hashing, the Bloom filter might give us positions "0, 1, 2", and it turns out the bit at position 2 is 0, meaning no value has ever been mapped to that position. So we can be certain that "Ali" does not exist in the database.

Now suppose I query "AliPay". It will certainly map back to "1, 4, 6", all set to 1. Can we then say the database definitely contains "AliPay"? No, because the bits at 1, 4, and 6 may have been set by other values, so we can only say that "AliPay" may exist in the database.

That is exactly what the Bloom filter promises: "a value definitely does not exist, or it may exist."
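To make the mechanics concrete, here is a toy Bloom filter built on a BitSet with three simple hash functions. The bit-array size and hash seeds are arbitrary choices for illustration; in practice you would use a library implementation such as Guava's BloomFilter:

```java
import java.util.BitSet;

public class ToyBloomFilter {
    private static final int SIZE = 1 << 20;          // about one million bits
    private static final int[] SEEDS = {7, 31, 131};  // three arbitrary hash seeds

    private final BitSet bits = new BitSet(SIZE);

    /** Map the value through three hash functions and set each resulting bit to 1. */
    public void put(String value) {
        for (int seed : SEEDS) {
            bits.set(hash(value, seed));
        }
    }

    /** false -> the value was definitely never added; true -> it may have been added. */
    public boolean mightContain(String value) {
        for (int seed : SEEDS) {
            if (!bits.get(hash(value, seed))) {
                return false;
            }
        }
        return true;
    }

    private int hash(String value, int seed) {
        int h = 0;
        for (char c : value.toCharArray()) {
            h = seed * h + c;
        }
        return (h & 0x7FFFFFFF) % SIZE;               // keep it non-negative and in range
    }
}
```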

Got it now, dear reader?

This article comes from the WeChat public account Java Geek Technology. Author: Duck Blood Fans.
