
The most complete back-end technology glossary in history


May 29, 2021




Back-end development has always been the apple of Internet technology's eye and a peak that programmers chase. This article introduces the core terms of back-end technology.

Original: http://t.cn/AiQWteI2

System development

1. High cohesion/low coupling

High cohesion means that a software module is made up of highly related code and is responsible for only one task; this is often called the single responsibility principle. A module's cohesion reflects how closely connected things are within the module.

The tighter the connections between modules, the stronger their coupling and the weaker their independence. Coupling between modules depends on the complexity of the interfaces between them, how they call each other, and what information is passed. In a complete system, modules should be kept as independent of one another as possible. In general, the higher the cohesion of the modules in a program structure, the lower the coupling between them.

2. Over-design

Over-design means doing too much future-oriented design, or making relatively simple things complex: an excessive pursuit of modularity, scalability, design patterns, and so on adds unnecessary complexity to the system.

3. Premature optimization

"Premature" refers not to the early stage of the development process, but to a time when the future direction of requirement changes is not yet clear. Your optimizations may not only prevent you from implementing new requirements well; your guesses about what needs optimizing may also be wrong, leaving you with nothing but more complicated code.

The right approach is to implement the requirements well first, write enough test cases, and then profile to find the performance bottlenecks; only then is it time to optimize.
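
A minimal sketch of this workflow using Python's built-in cProfile module; slow_sum is a hypothetical function standing in for real business code:

import cProfile
import pstats


def slow_sum(n):
    # Hypothetical hot path standing in for real business logic.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
slow_sum(1_000_000)
profiler.disable()

# Show the ten most expensive calls; optimize only what appears here.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)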

4. Refactoring

Refactoring improves the quality and performance of software by adjusting program code, making its design patterns and architecture more reasonable and improving the software's scalability and maintainability.

5. Broken window effect

Also known as the broken windows theory, the broken window effect is a criminology theory. It holds that if undesirable phenomena in an environment are allowed to persist, they induce people to imitate them or even escalate. Take a building with a few broken windows: if those windows are not repaired, vandals tend to break more windows. Eventually they may even break into the building and, if they find it uninhabited, perhaps settle there or set a fire.

Applied to software engineering, this means never letting hidden dangers in the system's code or architectural design gain a foothold; otherwise, over time, those dangers become more and more serious. Conversely, a high-quality system in itself makes people involuntarily write high-quality code.

6. The principle of mutual distrust

This means that along the entire link on which a program runs, upstream and downstream, no point is guaranteed to be absolutely reliable: any point can fail or behave unpredictably at any time, including the machine's network, the service itself, dependent environments, inputs, and requests. So guard everywhere.

7. Persistence

Persistence is the mechanism by which program data is converted between a temporary state and a persistent state. In layman's terms, temporary data (such as data in memory, which cannot be saved permanently) is persisted into durable data (for example, into a database or onto local disk, where it can be kept long-term).

8. Critical section

A critical section represents a public resource (shared data) that can be used by multiple threads but by only one thread at a time; once the critical-section resource is occupied, other threads that want it must wait.
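
To make the idea concrete, here is a minimal sketch in Python, where a lock turns the counter update into a critical section that only one thread executes at a time:

import threading

counter = 0
lock = threading.Lock()


def worker():
    global counter
    for _ in range(100_000):
        with lock:        # enter the critical section
            counter += 1  # only one thread runs this at a time


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 400000, because updates are serialized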

9. Blocking/non-blocking

Blocking and non-blocking usually describe the interaction between multiple threads. For example, if one thread occupies a critical-section resource, all other threads needing that resource must wait in the critical section, and waiting causes those threads to hang: this is blocking. If the thread occupying the resource never releases it, all the threads blocked on this critical section can do no work. Non-blocking is the opposite: threads are not forced to hang waiting, and multiple threads are allowed to enter the critical section at the same time.
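
A minimal sketch contrasting the two styles with Python's threading.Lock: the blocking call hangs until the resource is free, while the non-blocking call returns immediately with a success flag:

import threading

lock = threading.Lock()

# Blocking: the caller hangs here until the lock becomes free.
lock.acquire()
lock.release()

# Non-blocking: return immediately instead of waiting.
if lock.acquire(blocking=False):
    try:
        pass  # got the resource; do the work
    finally:
        lock.release()
else:
    pass  # resource busy; do something else instead of hanging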

10. Synchronous/asynchronous

Synchronous and asynchronous usually describe function/method calls.

A synchronous call does not return until the result is obtained. An asynchronous call returns instantly, but that instant return does not mean the task is complete; the callee starts a background thread to continue the task and notifies the caller, through a callback or some other means, when the task is done.
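
A minimal sketch of the two call styles, assuming a hypothetical slow operation fetch_data and a background thread plus callback for the asynchronous variant:

import threading
import time


def fetch_data():
    time.sleep(1)  # simulate a slow operation
    return "result"


# Synchronous: the caller is blocked until the result is ready.
print(fetch_data())


def fetch_data_async(on_done):
    # Asynchronous: return immediately and run the task in a
    # background thread, notifying the caller through a callback.
    threading.Thread(target=lambda: on_done(fetch_data())).start()


fetch_data_async(lambda result: print("callback got:", result))
print("returned immediately; the result arrives later")
time.sleep(1.5)  # keep the main thread alive for the demo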

11. Concurrency/parallelism

  • Parallelism means that multiple instructions execute simultaneously on multiple processors at the same moment; execution is truly simultaneous at both the micro and the macro level.
  • Concurrency means that only one instruction can execute at any given instant, but the instructions of multiple processes are rotated through rapidly, giving the macro-level effect of multiple processes executing simultaneously. At the micro level they are not simultaneous; time is divided into slices that let the processes execute rapidly in alternation.

Architecture design

1. High Concurrency

High concurrency is one of the design goals of distributed systems: ensuring that the system can handle many requests in parallel at the same time. In layman's terms, high concurrency means many users accessing the same API interface or URL address at the same point in time. It often occurs in business scenarios with a large number of active users and a high concentration of users.

2. High Availability

High availability is one of the factors that must be considered in the design of a distributed system architecture. It typically means that a system is specifically designed to reduce downtime and keep its services highly available.

3. Read/write splitting

To ensure the stability of database products, many databases provide dual-machine hot-standby functionality. That is, the first database server is the production server that handles external write operations (inserts, updates, deletes), while a second server synchronizes from it and serves reads.

4. Cold standby/hot standby

  • Cold standby: two servers, one running and one standing by without running. This way, once the running server goes down, the standby server is started up. The cold-standby scheme is relatively easy to implement, but its drawback is that when the host fails, the standby machine does not take over automatically; services must be switched over manually.
  • Hot standby: this is the commonly mentioned active/standby approach, in which server data, including database data, is written to two or more servers at the same time. When the active server fails, the standby machine is activated through software diagnostics (usually heartbeat diagnosis), ensuring the application fully returns to normal use within a short time. When one server goes down, the system automatically switches to the standby.

5. Offsite multi-active

Offsite multi-active generally refers to establishing independent data centers in different cities. "Active" is relative to cold backup: a cold backup backs up the full data set but does not normally serve business traffic, being switched to only when the primary machine room fails, whereas "active" machine rooms also carry traffic and support the business in day-to-day operation.

6. Load Balance

Load balancing is a service that distributes traffic across multiple servers. By automatically spreading an application's external serving capacity across multiple instances, it eliminates single points of failure and improves availability, giving the application a higher level of fault tolerance and seamlessly providing the capacity needed to distribute application traffic efficiently, stably, and securely.
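
A minimal sketch of one of the simplest distribution strategies, round-robin; the backend addresses are hypothetical:

import itertools

backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
rotation = itertools.cycle(backends)


def pick_backend():
    # Hand each incoming request to the next backend in rotation.
    return next(rotation)


for _ in range(6):
    print(pick_backend())  # traffic spreads evenly across the nodes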

7. Dynamic/static separation

Dynamic/static separation is the architectural approach of separating static pages from dynamic pages, or static content interfaces from dynamic content interfaces, in the web server architecture, thereby improving the performance and maintainability of the whole service.

8. Cluster

The concurrent carrying capacity of a single server is always limited. When a single server's processing power reaches its performance bottleneck, multiple servers can be combined to provide the service. This combination is called a cluster, and each server in the cluster is called a "node" of the cluster. Every node provides the same service, multiplying the concurrent processing capacity of the whole system.

9. Distributed

A distributed system is a complete system broken down into many independent subsystems by business function; each subsystem is called a "service". The distributed system sorts requests and distributes them to different subsystems, letting different services handle different requests. In a distributed system, the subsystems run independently and are connected by network communication to achieve data interoperability and composite services.

10. CAP theory

CAP theory states that in a distributed system, Consistency, Availability, and Partition tolerance cannot all hold at the same time.

  • Consistency: requires that all data replicas in the distributed system hold the same values, or be in the same state, at the same moment.
  • Availability: the system can still respond correctly to users' requests even after some of the cluster's nodes go down.
  • Partition tolerance: the system can tolerate failures of network communication between nodes.

Simply put, a distributed system can support at most two of these three properties. But since being distributed dooms us to partitioning, and since partitioning means we cannot 100 percent avoid partition failures, partition tolerance has to be accepted. Therefore, we can only choose between consistency and availability.

In distributed systems we tend to pursue availability, treating it as more important than consistency. As for how to achieve high availability, there is another theory, the BASE theory, which further extends CAP.

11. BASE theory

BASE theory states:

  • Basically Available (basically available)
  • Soft state
  • Eventually consistent (eventual consistency)

BASE theory is the result of a trade-off between consistency and availability in CAP. Its central idea is this: even though we cannot achieve strong consistency, each application can adopt an approach suited to its own business characteristics to achieve eventual consistency for the system.

12. Horizontal/vertical scaling

  • Horizontal scaling (Scale Out) increases storage and computing power by adding more servers or program instances to spread the load.
  • Vertical scaling (Scale Up) increases the processing power of a single machine.

There are two ways to scale vertically:

(1) enhance single-machine hardware performance, for example: add CPU cores (say, to 32 cores), upgrade to a better network card (say, 10 Gigabit), upgrade to better disks (say, SSDs), expand disk capacity (say, to 2 TB), or expand system memory (say, to 128 GB);

(2) improve single-machine software or architectural performance, for example: use a cache to reduce the number of I/O operations, use asynchrony to increase single-service throughput, or use lock-free data structures to reduce response time.

13. Parallel scaling

Similar to horizontal scaling. The nodes in the cluster are parallel peer nodes; when capacity expansion is needed, adding more nodes increases the cluster's service capability. In general, the critical paths in a server (such as login, payment, and core business logic) need to support dynamic parallel scaling at runtime.

14. Elastic scaling

Refers to dynamic, online capacity expansion of a deployed cluster. An elastic scaling system can automatically add more nodes (storage nodes, compute nodes, network nodes) to increase system capacity, improve system performance, or enhance system reliability, or to accomplish all three goals at once, depending on the actual business environment.

15. State sync/frame synchronization

  • State synchronization: the server is responsible for computing all game logic and broadcasting the results of those computations, while the client is responsible only for sending the player's actions and rendering the game results it receives.

Features: state synchronization is highly secure, logic updates are convenient, and reconnection is fast, but development efficiency is lower, network traffic grows with the complexity of the game, and the server bears more load.

  • Frame synchronization: the server only forwards messages and does no logical processing; every client runs the same number of frames per second and processes the same input data on each frame.

Features: frame synchronization requires that the system produce the same output for the same input. Development under frame synchronization is efficient, and traffic consumption is low and stable with very little pressure on the server, but it demands a good network, reconnection takes longer, and the computing pressure on the client is high.

Network communication

1. Connection pool

Pre-establish a buffer pool of connections and provide a set of connection usage, allocation, and management policies, so that the connections in the pool can be reused efficiently and safely, avoiding the overhead of frequently establishing and closing connections.
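
A minimal sketch of a connection pool built on queue.Queue; the factory argument stands in for whatever real connection constructor you use (a database driver, an HTTP client, etc.):

import queue


class ConnectionPool:
    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # pre-establish the connections

    def acquire(self, timeout=None):
        # Block until a pooled connection is free, instead of opening a new one.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._pool.put(conn)


pool = ConnectionPool(factory=lambda: object(), size=4)
conn = pool.acquire()
try:
    pass  # use the connection
finally:
    pool.release(conn)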

2. Disconnection and reconnection

When network fluctuations intermittently disconnect a user from the server, the server attempts, once the network recovers, to reconnect the user to the state and data at the moment of the last disconnection.

3. Session persistence

Session persistence is a mechanism on the load balancer that recognizes the affinity between a client and a server and ensures that a series of related access requests are dispatched to one machine. In plain terms: the multiple requests made during one session all land on the same machine.

4. Long connection/short connection

This usually refers to long and short TCP connections. A long connection means that once a TCP connection is established, it is kept open: the two sides generally send each other heartbeats to confirm the other still exists, multiple business data transfers take place over the connection, and it is generally not closed actively. A short connection generally means that after a connection is established and one transaction is performed (e.g., an HTTP request), the connection is closed.

5. Flow control/congestion control

  • Flow control prevents the sender from sending so fast that it drains the receiver's resources and leaves the receiver unable to keep up.
  • Congestion control prevents the sender from sending so fast that the network cannot keep up. When congestion occurs, the performance of that part of the network, or even the whole network, degrades; in serious cases, network communication can come to a standstill.

6. Thundering herd effect

The thundering herd effect, in short, is this phenomenon: multiple processes (or threads) are blocked waiting on the same event (sleeping), and when the awaited event occurs, all of the waiting processes (or threads) are woken up, but only one of them can gain "control" of the event and handle it, while the others fail to gain control and can only go back to sleep. This phenomenon, and the performance it wastes, is called the thundering herd.

7. NAT

NAT (Network Address Translation) replaces the address information in the header of an IP packet. NAT is typically deployed at an organization's network egress, providing public-network reachability and upper-layer protocol connectivity by replacing internal network IP addresses with the egress IP address.

Faults and exceptions

1. Downtime

Downtime generally refers to an unexpected failure and crash of a host computer. By extension, some problems on a server, such as a database deadlock, can also be called downtime, and a server whose services have hung can likewise be described as down.

2. coredump

When a program errors out and breaks abnormally, the OS stores the program's current working state as a coredump file. Typically, a coredump file contains the program's memory, register state, stack pointer, memory-management information, and so on at the time it was running.

3. Cache penetration/breakdown/avalanche

  • Cache penetration: querying data that does not exist. Because the cache is never hit, the query must go to the database each time, and since no data is found, nothing is written to the cache. Every request for this nonexistent data therefore hits the database, putting pressure on it (see the sketch after this list).
  • Cache breakdown: a hot key expires at a certain point in time, and at exactly that moment there is a large number of concurrent requests for that key, so a flood of requests hits the DB.
  • Cache avalanche: a large batch of cached data reaches its expiration time at once while query volume is huge, putting the database under excessive pressure or even bringing it down.
  • Unlike cache breakdown, which is the failure of a single hot key, a cache avalanche is a large number of keys failing at the same time.
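
A minimal sketch of two common defenses, assuming an in-process dict as a stand-in for a real cache and a hypothetical db_query function: caching "not found" results blunts penetration, and a per-key mutex prevents breakdown by letting only one request rebuild an expired hot key:

import threading

cache = {}       # key -> value; stand-in for Redis or similar
key_locks = {}   # key -> lock, to serialize rebuilds per key
NULL = object()  # sentinel meaning "known to be absent"


def get(key, db_query):
    value = cache.get(key)
    if value is not None:
        return None if value is NULL else value
    # Cache miss: allow only one thread per key to hit the database.
    lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)  # re-check after winning the lock
        if value is not None:
            return None if value is NULL else value
        row = db_query(key)
        # Cache even "not found" (with a short TTL in a real system),
        # so repeated queries for missing keys stop reaching the db.
        cache[key] = row if row is not None else NULL
        return row

Against avalanches, a common extra measure is to add random jitter to each key's TTL so that expirations do not all line up at once.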

4. 500/501/502/503/504/505

  • 500 Internal Server Error: an internal server error, typically returned when the server encounters an unexpected condition and cannot complete the request. Possible causes include program errors, such as ASP or PHP syntax errors.
  • 501 Not Implemented: the server does not understand or support the requested HTTP method.
  • 502 Bad Gateway: a web-server failure, possibly due to insufficient program processes; the requested php-fpm was invoked but did not complete for some reason, causing the php-fpm process to terminate. Possible causes include too few php-cgi processes on an Nginx server.
  • 503 Service Unavailable: the server is currently unavailable. The server is temporarily unable to process client requests, for example during system maintenance; this state is only temporary. You can contact the server provider.
  • 504 Gateway Timeout: a 504 error indicates a timeout, meaning the client's request did not get through the gateway; the request never reached a php-fpm that could execute it. This is generally related to the nginx.conf configuration.
  • 505 HTTP Version Not Supported: the server does not support the HTTP protocol version used in the request.

Apart from 500 errors, which may be errors in the program language, the remaining errors can roughly be understood as server or server-configuration problems.

5. Memory overflow/memory leak

  • Memory overflow: Out Of Memory (OOM) means that when a program requests memory, there is not enough memory for it to use; or that you are given a piece of storage meant for int-type data but store long-type data in it. Either way the memory is insufficient, and an OOM (memory overflow) error is reported.
  • Memory leak: Memory Leak means that heap memory dynamically allocated in a program is, for some reason, not released or cannot be released, wasting system memory, slowing the program down, and even leading to serious consequences such as system crashes.
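
A minimal sketch of a leak in Python: an unbounded module-level list keeps every entry reachable forever, so the garbage collector can never reclaim the memory:

_request_log = []  # lives as long as the process does


def handle_request(payload):
    _request_log.append(payload)  # leak: grows on every request
    return len(payload)

# A typical fix is to bound the structure, e.g. use
# collections.deque(maxlen=1000), or evict entries when done.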

6. Handle leaks

A handle leak occurs when a process fails to release open file handles after making system calls on files. The typical symptoms of a handle leak are that the machine slows down, CPU usage soars, and the CPU usage of the cgi or server process leaking the handles rises.

7. Deadlock

A deadlock is a blocking phenomenon in which two or more threads, in the course of execution, block one another by competing for resources or by waiting on communication from each other. Without external intervention they remain blocked and cannot proceed; the system is then said to be deadlocked, or to have produced a deadlock.
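
A minimal sketch of the classic deadlock: two threads acquire the same two locks in opposite order and then wait on each other forever:

import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()


def worker_1():
    with lock_a:
        time.sleep(0.1)  # give worker_2 time to grab lock_b
        with lock_b:     # blocks forever: worker_2 holds lock_b
            pass


def worker_2():
    with lock_b:
        time.sleep(0.1)
        with lock_a:     # blocks forever: worker_1 holds lock_a
            pass

# The standard fix is to always acquire locks in one global order,
# e.g. lock_a before lock_b in every thread.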

8. Hard interrupt/soft interrupt

  • Hard interrupt: what we usually mean by an interrupt (hardirq). It is primarily used to notify the operating system of changes in the state of peripheral devices.
  • Soft interrupt: usually the hard-interrupt service routine's interruption of the kernel.

To implement this, when an interrupt occurs, Linux has the hard interrupt handle the work that can be completed in a short time, and defers the longer event-handling work until after the interrupt; that deferred part is the soft interrupt (softirq).

9. Spikes

At some point in time, a server performance metric (e.g., traffic, disk I/O, CPU usage) is much higher than in the periods immediately before and after. The appearance of spikes indicates that this server's resources are being used unevenly and insufficiently, which can easily induce other, more serious problems.

10. Replay attacks

An attacker sends a packet that the destination host has already received, mainly to spoof the system during the authentication process and undermine the correctness of authentication. It is a type of attack that maliciously or fraudulently repeats a valid data transmission; a replay attack can be carried out by the original initiator or by an adversary who intercepts and resends the data. Attackers use network sniffing or other means to steal authentication credentials and then resend them to the authentication server.

11. Network silos

Network silos refer to a situation in a cluster environment where some machines lose network connectivity with the rest of the cluster, splitting it into small clusters, and data inconsistency arises.

12. Data skew

In a cluster system, the cache is generally distributed, i.e., different nodes are responsible for different ranges of cached data. When the cached data is not dispersed enough, a large amount of it concentrates on one or a few service nodes; this is called data skew. Generally speaking, data skew is caused by a poor implementation of load balancing.

13. Split-brain

Split-brain refers to a system partition caused by some nodes in a cluster becoming unreachable over the network. The partitioned small clusters each provide service according to their own state, so the original cluster reacts inconsistently at the same moment, causing nodes to compete for resources, system confusion, and data corruption.

Monitoring and alerting

1. Service monitoring

The primary purpose of service monitoring is to detect, accurately and quickly, that a service is having or is about to have a problem, so as to reduce the scope of the impact. Service monitoring generally takes a variety of forms, which can be divided by layer into:

  • System layer (CPU, network status, IO, machine load, etc.)
  • Application layer (process status, error logs, throughput, etc.)
  • Business layer (service/interface error code, response time)
  • User layer (user behavior, public-opinion monitoring, front-end instrumentation)

2. Full link monitoring

  • Service dial testing: a monitoring method that checks service (application) availability by having dial-test nodes periodically probe the target service, measured mainly through availability and response time; there are usually multiple dial-test nodes in different locations.
  • Node probing: a monitoring method used to discover and track network availability and smoothness between nodes in different machine rooms (data centers), measured mainly through response time, packet-loss rate, and hop count; probing is generally done with ping, mtr, or other private protocols.
  • Alarm filtering: filter out certain predictable alarms so that they do not enter the alarm statistics, such as HTTP 500 errors caused by a small number of crawler accesses, or a business system's custom exception messages.
  • Alarm deduplication: once an alarm has been sent to the person in charge, the same alert is not sent again until the alarm recovers.
  • Alarm suppression: to reduce interference caused by system jitter, suppression is also required; for example, a momentary high load on a server may be normal, and only a high load that persists for a period of time deserves attention.
  • Alarm recovery: development/operations personnel need to receive not only the alert notification but also a notification that the fault has been eliminated and the alert has returned to normal.
  • Alarm merging: merge multiple identical alarms generated at the same time; for example, if several sub-services of a microservice cluster raise overload alarms simultaneously, they need to be merged into a single alarm.
  • Alarm convergence: sometimes an alarm is accompanied by other alarms. In that case, only the root cause needs to raise an alert, and the other alarms are converged into sub-alerts sent in the same notification. For example, a CPU-load alarm on a cloud server is often accompanied by availability alarms for all the systems it carries.
  • Fault self-healing: detect alarms in real time, pre-diagnose and analyze them, recover from faults automatically, and integrate with surrounding systems to close the loop on the whole process.

Service governance

1. Microservices

Microservices architecture is an architectural pattern that advocates dividing a single application into a set of small services that coordinate and cooperate with one another to deliver ultimate value to the user. Each service runs in its own process, and services communicate with one another using lightweight mechanisms (usually HTTP-based RESTful APIs). Each service is built around a specific business capability and can be deployed independently to production, production-like environments, and so on.

2. Service discovery

Service discovery is the use of a registry to record information about all services in a distributed system so that other services can quickly locate these registered services. Service discovery is the core module that underpins large-scale SOA and microservices architectures and should be as highly available as possible.
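
A minimal sketch of the register/discover flow; the Registry class and service names here are hypothetical stand-ins for a real registry such as ZooKeeper, etcd, or Consul:

import random


class Registry:
    def __init__(self):
        self._services = {}  # name -> set of "host:port" instances

    def register(self, name, address):
        self._services.setdefault(name, set()).add(address)

    def deregister(self, name, address):
        self._services.get(name, set()).discard(address)

    def discover(self, name):
        # Pick one live instance; real registries also push updates
        # and health-check the registered instances.
        instances = self._services.get(name)
        if not instances:
            raise LookupError(f"no instances of {name}")
        return random.choice(sorted(instances))


registry = Registry()
registry.register("order-service", "10.0.0.5:9000")
registry.register("order-service", "10.0.0.6:9000")
print(registry.discover("order-service"))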

3. Traffic peak shaving

If you look at the request-monitoring curve of a lottery or flash-sale system, you will find that such a system peaks during the period the activity is open, whereas when no activity is open the system's request volume and machine load are generally smooth. To save machine resources, we cannot provision maximum capacity at all times just to support short-lived peak requests. Therefore, some technical means are needed to weaken the instantaneous request peak and keep the system's throughput under control during the peak. Peak shaving can also be used to eliminate spikes and make server resource utilization more even and efficient. Common peak-shaving strategies include queues, rate limits, layered filtering, multi-level caching, and so on.

4. Version compatibility

During an upgrade, you need to consider whether the new data structures will be able to understand and parse the old data, whether the newly modified protocol will understand the old protocol, and whether it will handle them as expected. This requires making versions compatible during service design.

5. Overload protection

Overload means that the current load exceeds the system's maximum processing capacity. Overload causes some services to become unavailable and, if handled poorly, is very likely to make the service completely unavailable or even trigger an avalanche. Overload protection is precisely what you do about this anomaly to prevent the service from becoming completely unavailable.

6. Service circuit breaker

A service circuit breaker works like the fuse in our homes: when a service becomes unavailable or its responses time out, calls to that service are temporarily stopped to prevent an avalanche through the whole system.
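
A minimal sketch of the idea: after a number of consecutive failures the circuit "opens" and calls fail fast; after a cooldown, one trial call is allowed through. The thresholds here are hypothetical:

import time


class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures  # failures before tripping
        self.reset_after = reset_after    # cooldown in seconds
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown over; allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result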

7. Service degradation

Service degradation means that when server pressure surges, some services and pages are strategically degraded according to the current business situation and traffic, releasing server resources to ensure that core tasks function properly. Degradation often defines different levels and performs different handling for different levels of exception.

  • By the manner of service: the service can be rejected, the service can be delayed, and sometimes a random service can be provided.
  • By the scope of service: a feature can be cut off, or some modules can be cut off.

In summary, service degradation requires different degradation strategies for different business needs. The main aim is that a degraded service, though impaired, is better than none.

8. Circuit breaking vs. degradation

  • Similarity: the goals are consistent; both start from availability and reliability and aim to prevent the system from crashing;
  • Difference: the triggers differ; a circuit break is generally caused by the failure of some (downstream) service, while degradation is generally considered from the perspective of overall load;

9. Rate limiting

Rate limiting can be thought of as one kind of service degradation: it protects the system by limiting its input and output traffic. Generally speaking, the system's throughput can be measured; to ensure stable operation, once the threshold that requires limiting is reached, traffic must be restricted and some measures taken to accomplish the limiting, for example, delaying processing, rejecting processing, or rejecting part of the processing.
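
A minimal sketch of one common strategy, the token bucket: requests spend tokens that refill at a fixed rate, so sustained traffic is capped while short bursts up to the bucket's capacity are allowed. The rate and capacity values are hypothetical:

import time


class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to the elapsed time.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject or delay the request


bucket = TokenBucket(rate=100, capacity=200)  # ~100 req/s, bursts of 200
if not bucket.allow():
    print("429 Too Many Requests")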

10. Fault shielding

Remove the faulty machine from the cluster to ensure that new requests are not distributed to the faulty machine.

Testing methods

1. Black box/white box test

Black-box testing does not consider the program's internal structure and logic; it is mainly used to test whether the system's functionality meets the requirement specification. There is usually an input value, an output value, and an expected value for comparison.

White-box testing is mainly used in the unit-testing stage, primarily for code-level testing aimed at the program's internal logical structure. Its techniques include statement coverage, decision coverage, condition coverage, path coverage, and condition-combination coverage.

2. Unit/integration/system/acceptance testing

Software testing is generally divided into four stages: unit testing, integration testing, system testing, acceptance testing.

  • Unit testing: a unit test covers the smallest verifiable unit in the software, such as a module, a procedure, or a method. Unit tests have the finest granularity and are generally performed by the development team in a white-box manner; they mainly test whether the unit conforms to the "design" (see the sketch after this list).
  • Integration testing: also called assembly testing, it usually builds on unit testing by assembling all program modules in an orderly, incremental manner. Sitting between unit testing and system testing, it plays a "bridging" role; it is generally performed by the development team in a white-box plus black-box manner, verifying both the "design" and the "requirements".
  • System testing: the software that passed integration testing is combined, as part of a computer system, with the other parts of the system, and a series of rigorous and effective tests is run in the actual operating environment to find latent problems in the software and ensure the system runs normally. System testing has the coarsest granularity; it is generally performed by an independent test team in a black-box manner, and mainly tests whether the system conforms to the "requirement specification".
  • Acceptance testing: also called delivery testing, it is formal testing against user needs and business processes to determine whether the system satisfies the acceptance criteria; users, customers, or other authorized bodies decide whether to accept the system. Acceptance testing is similar to system testing; the main difference is the tester: acceptance tests are performed by users.
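
A minimal sketch of the smallest stage, a unit test written with Python's built-in unittest; add_money is a hypothetical unit under test:

import unittest


def add_money(a_cents, b_cents):
    # Hypothetical unit under test: add two amounts in cents.
    return a_cents + b_cents


class AddMoneyTest(unittest.TestCase):
    def test_adds_two_amounts(self):
        self.assertEqual(add_money(150, 250), 400)

    def test_zero_is_identity(self):
        self.assertEqual(add_money(0, 990), 990)


if __name__ == "__main__":
    unittest.main()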

3. Regression test

Retest after defects are found and fixed, or after new features are added to the software. Regression testing checks that the discovered defects have been corrected and that the modifications have not raised new problems.

4. Smoke test

The term comes from the hardware industry: after making changes or repairs to a piece of hardware or a hardware component, power the device on directly; if no smoke appears, the component passes the test. In software, the term "smoke testing" describes the process of validating code changes before they are merged into the product's source tree.

In the software development process, smoke testing is a fast, basic functional-verification strategy for a software build; it is a means of confirming and verifying the software's basic functions, not an in-depth test of the build.

For example, in a smoke test of a login system, we only need to test the core function of logging in with a correct username and password; tests of the input box, special characters, and the like can be carried out after the smoke test.

5. Performance testing

Performance testing tests the system's performance metrics by simulating a variety of normal, peak, and abnormal load conditions through automated testing tools. Both load and stress tests are performance tests that can be combined.

  • Load testing determines the system's performance under various workloads; the goal is to measure how the system's performance metrics change as the load gradually increases.
  • Stress testing identifies a system's bottlenecks or unacceptable performance points in order to find the maximum service level the system can provide.

6. Benchmark

Benchmarking is a way to measure a machine's maximum actual hardware performance and the performance improvements from software optimizations, and it can also be used to identify CPU or memory efficiency problems in a piece of code. Many developers use benchmarks to test different concurrency patterns, or to help configure the size of worker pools to ensure maximum system throughput.
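
A minimal sketch of a micro-benchmark using Python's standard timeit module, comparing two ways of building a string:

import timeit


def join_build():
    return "".join(str(i) for i in range(100))


def concat_build():
    s = ""
    for i in range(100):
        s += str(i)
    return s


# Run each candidate many times for a stable measurement.
print("join:  ", timeit.timeit(join_build, number=10_000))
print("concat:", timeit.timeit(concat_build, number=10_000))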

7. A/B test

An A/B test randomly assigns similar numbers of samples to two or more groups. If the experimental group's results on the target metric differ from the control group's with statistical significance, this shows that the experimental group's feature leads to the outcome you want, helping you validate a hypothesis or make a product decision.

8. Code coverage test

Code coverage is a measure in software testing that describes the proportion and extent to which a program's source code is exercised by tests; the resulting ratio is called the code coverage. In unit testing, code coverage is often used as a gauge of how good the testing is, and coverage is even used to assess test-task completion, for example requiring that coverage reach 80% or 90%. As a result, testers go to great pains to design cases that cover the code.

Release and deployment

1. DEV/FAT/UAT/PRO

  • DEV (Development environment): the development environment, used by developers for debugging; its versions change frequently.
  • FAT (Feature Acceptance Test environment): the functional acceptance test environment, used by software testers.
  • UAT (User Acceptance Test environment): the user acceptance test environment, used to verify functionality under production-like conditions; it can serve as the pre-release environment.
  • PRO (Production environment): the production environment, the formal live environment.

2. Grayscale release

Grayscale release means that during a version upgrade, some users are upgraded to the new product features first, through partition control, whitelist control, and other means, while the remaining users stay unchanged. After a period in which the users on the new features report no problems, the scope is gradually expanded until the new features are eventually opened to all users. Grayscale release can ensure the stability of the overall system: problems can be found and fixed during the initial grayscale phase, limiting their impact.
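
A minimal sketch of grayscale routing: a whitelist plus a deterministic hash bucket decide which users see the new version. The whitelist and the 5% ratio are hypothetical:

import hashlib

WHITELIST = {"alice", "qa-bot"}  # always on the new version
GRAY_PERCENT = 5                 # then 5% of all other users


def use_new_version(user_id: str) -> bool:
    if user_id in WHITELIST:
        return True
    # Hash instead of random choice so each user's assignment is
    # stable across requests; widen GRAY_PERCENT to expand the rollout.
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < GRAY_PERCENT


print(use_new_version("alice"))   # True: whitelisted
print(use_new_version("user42"))  # stable per-user True/False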

3. Rollback

Refers to restoring a program or data to the last known-correct state (or the last stable version) when an error occurs in the program or in data processing.

That's the whole of this most complete back-end technology glossary in history.