Nginx load balancing module


May 23, 2021 Nginx Getting started


Load balancing module

The load balancing module selects one host from the list of back-end hosts defined by the upstream directive. Nginx first uses the load balancing module to find a host, and then uses the upstream module to interact with that host. To make the introduction concrete, the following analysis uses Nginx's built-in ip hash module as a practical example.

Configuration

To understand how a load balancing module is developed, we first need to understand how it is used. Because the load balancing module is quite different from the modules covered in earlier sections, it is easiest to start from the configuration.

To use the ip hash load balancing algorithm, we write a configuration similar to the following:

        upstream test {
            ip_hash;

            server 192.168.0.1;
            server 192.168.0.2;
        }

From the configuration we can see how the load balancing module is used:

  1. The core directive ip_hash can only be used inside an upstream block. It tells Nginx to use the ip hash load balancing algorithm. If it is omitted, Nginx uses the default round robin load balancing module. Compare this with the configuration of a handler module: do you see what they have in common?
  2. A directive inside upstream may appear before the server directives, after them, or between two server directives. Readers may wonder whether this makes any difference. Let's try this configuration:
        upstream test {
            server 192.168.0.1 weight=5;
            ip_hash;
            server 192.168.0.2 weight=7;
        }

Something surprising happens:

        nginx: [emerg] invalid parameter "weight=7" in nginx.conf:103
        configuration file nginx.conf test failed

As we can see, the ip_hash directive really does affect how the configuration is parsed.

Directives

The configuration determines the set of directives, so let's now look at the definition of the ip_hash directive:

    static ngx_command_t  ngx_http_upstream_ip_hash_commands[] = {

        { ngx_string("ip_hash"),
          NGX_HTTP_UPS_CONF|NGX_CONF_NOARGS,
          ngx_http_upstream_ip_hash,
          0,
          0,
          NULL },

        ngx_null_command
    };

There is nothing special here except the directive type NGX_HTTP_UPS_CONF, which indicates that the directive is valid inside an upstream block.
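The idea of a context bit on a directive can be illustrated with a small standalone sketch. It does not use Nginx's real types: the CTX_* constants, the directive_t table, and directive_allowed are hypothetical names invented for this illustration, and the contexts listed for proxy_pass are simplified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical context bits, standing in for flags such as NGX_HTTP_UPS_CONF. */
enum { CTX_MAIN = 0x1, CTX_SERVER = 0x2, CTX_LOCATION = 0x4, CTX_UPSTREAM = 0x8 };

typedef struct {
    const char *name;      /* directive name as written in the config file */
    unsigned    contexts;  /* bitmask of blocks where it may appear */
} directive_t;

/* Toy directive table, terminated by a NULL name (like ngx_null_command). */
static const directive_t directives[] = {
    { "ip_hash",    CTX_UPSTREAM },
    { "proxy_pass", CTX_LOCATION },   /* simplified for the sketch */
    { NULL, 0 }
};

/* Returns 1 if allowed in ctx, 0 if known but not allowed here, -1 if unknown. */
int directive_allowed(const char *name, unsigned ctx)
{
    const directive_t *d;

    assert(name != NULL);

    for (d = directives; d->name != NULL; d++) {
        if (strcmp(d->name, name) == 0) {
            return (d->contexts & ctx) != 0;
        }
    }
    return -1;
}
```

With this table, directive_allowed("ip_hash", CTX_UPSTREAM) returns 1, while the same directive checked against CTX_LOCATION returns 0, which corresponds to a "directive is not allowed here" style of error.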

Hook

With the experience gained from the previous sections, you should recognize the directive handler as the module's entry point. The hook code of load balancing modules follows a regular pattern, which we analyze here using the ip_hash module.

    static char *
    ngx_http_upstream_ip_hash(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
    {
        ngx_http_upstream_srv_conf_t  *uscf;

        uscf = ngx_http_conf_get_module_srv_conf(cf, ngx_http_upstream_module);

        uscf->peer.init_upstream = ngx_http_upstream_init_ip_hash;

        uscf->flags = NGX_HTTP_UPSTREAM_CREATE
                    |NGX_HTTP_UPSTREAM_MAX_FAILS
                    |NGX_HTTP_UPSTREAM_FAIL_TIMEOUT
                    |NGX_HTTP_UPSTREAM_DOWN;

        return NGX_CONF_OK;
    }

Two points in this code deserve attention. One is the setting of uscf->flags, and the other is the setting of the init_upstream callback.

Set uscf->flags

  1. NGX_HTTP_UPSTREAM_CREATE: the create flag. If it is set, Nginx checks for duplicate creation and whether the required parameters are filled in;

  2. NGX_HTTP_UPSTREAM_MAX_FAILS: You can use the max_fails property in server;

  3. NGX_HTTP_UPSTREAM_FAIL_TIMEOUT: You can use the fail_timeout property in server;

  4. NGX_HTTP_UPSTREAM_DOWN: You can use the down property in the server;

  5. NGX_HTTP_UPSTREAM_WEIGHT: The weight property can be used in server;

  6. NGX_HTTP_UPSTREAM_BACKUP: You can use the backup property in the server.

An attentive reader who recalls the surprising configuration error above can draw a conclusion: the directive handler of a load balancing module can set and change which properties the server directive supports inside upstream. In the ip_hash hook above, NGX_HTTP_UPSTREAM_WEIGHT is missing from uscf->flags, which is why weight=7 was rejected. This is an important feature. Different load balancing modules support different properties, so it is useful to detect unsupported properties while parsing the configuration file and report an error. However, the mechanism is also flawed. As the earlier example shows, nothing goes back to check server directives that used an unsupported property before the flags were changed, which is why weight=5 on the first server line was accepted.

Set the init_upstream callback
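The order-dependent behavior seen in the earlier configuration test can be reproduced with a small standalone sketch. It is not Nginx's parser: the UPS_* flags, check_server_param, and first_rejected are hypothetical names, the default flag set is an assumption, and each configuration line is reduced to a single string (either "ip_hash" or one server parameter).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the NGX_HTTP_UPSTREAM_* flags. */
enum {
    UPS_CREATE       = 0x01,
    UPS_WEIGHT       = 0x02,
    UPS_MAX_FAILS    = 0x04,
    UPS_FAIL_TIMEOUT = 0x08,
    UPS_DOWN         = 0x10,
    UPS_BACKUP       = 0x20
};

/* Assumed default before any balancer directive is seen: everything allowed. */
#define UPS_DEFAULT_FLAGS \
    (UPS_CREATE | UPS_WEIGHT | UPS_MAX_FAILS | UPS_FAIL_TIMEOUT | UPS_DOWN | UPS_BACKUP)

/* Mirrors the flags assigned in the ip_hash hook shown above. */
#define UPS_IP_HASH_FLAGS \
    (UPS_CREATE | UPS_MAX_FAILS | UPS_FAIL_TIMEOUT | UPS_DOWN)

/* Returns 1 if the server parameter is permitted under flags, else 0. */
int check_server_param(unsigned flags, const char *param)
{
    assert(param != NULL);

    if (param[0] == '\0')                         return 1;  /* no parameter */
    if (strncmp(param, "weight=", 7) == 0)        return (flags & UPS_WEIGHT) != 0;
    if (strncmp(param, "max_fails=", 10) == 0)    return (flags & UPS_MAX_FAILS) != 0;
    if (strncmp(param, "fail_timeout=", 13) == 0) return (flags & UPS_FAIL_TIMEOUT) != 0;
    if (strcmp(param, "down") == 0)               return (flags & UPS_DOWN) != 0;
    if (strcmp(param, "backup") == 0)             return (flags & UPS_BACKUP) != 0;
    return 0;                                     /* unknown parameter */
}

/* Walks the items in file order; "ip_hash" switches the flag set.
 * Returns the index of the first rejected item, or -1 if all pass. */
int first_rejected(const char *const items[], int n)
{
    unsigned flags = UPS_DEFAULT_FLAGS;
    int      i;

    for (i = 0; i < n; i++) {
        if (strcmp(items[i], "ip_hash") == 0) {
            flags = UPS_IP_HASH_FLAGS;
        } else if (!check_server_param(flags, items[i])) {
            return i;
        }
    }
    return -1;
}
```

Feeding it the sequence "weight=5", "ip_hash", "weight=7" rejects only the third item, because the first was checked before the flags changed and is never revisited.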

When Nginx initializes upstream, it calls the configured callback in the ngx_http_upstream_init_main_conf function to initialize the load balancing module. What may be hard to understand here is where exactly uscf is located. The following diagram illustrates the memory layout of the configuration of the upstream load balancing module.

[Figure: memory layout of the upstream load balancing module configuration]

As the figure shows, the MAIN_CONF configuration of the ngx_http_upstream_module module contains an array of pointers, and each element of the array corresponds to one upstream block in the configuration file. The details will be discussed in a later chapter on internals.

Initialize the configuration

When the init_upstream callback runs, it must initialize the configuration of the load balancing module and also install a new hook function. Nginx calls this hook as an initialization function each time it processes a request; its role is described in detail later. Here, we first analyze the configuration initialization code of the IP hash module:

    ngx_http_upstream_init_round_robin(cf, us);
    us->peer.init = ngx_http_upstream_init_ip_hash_peer;

The code is simple: the IP hash module first calls the initialization function of Round Robin, another load balancing module, and then sets its own initialization hook for the request processing phase. In fact, several load balancing modules can form a chain, and processing always starts with the module at the head of the chain. If that module decides not to handle the request, it can hand processing over to the next module in the chain. Here, the IP hash module names the Round Robin module as its successor, so it also initializes the Round Robin module in its own configuration initialization function.
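The "initialize the successor first, then override the per-request hook" pattern can be sketched in isolation as follows. This is a simplified model, not Nginx code: upstream_conf_t, rr_ready, and all function names are hypothetical, and the per-request hooks only return a marker value so that the caller can tell which one is installed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct upstream_conf upstream_conf_t;
typedef int (*init_peer_fn)(upstream_conf_t *us);

/* Hypothetical stand-in for the per-upstream configuration. */
struct upstream_conf {
    int          rr_ready;  /* set once the round robin data has been built */
    init_peer_fn init;      /* per-request hook, playing the role of us->peer.init */
};

/* Per-request hooks; the return value only identifies which one ran. */
int rr_init_peer(upstream_conf_t *us)      { (void) us; return 0; }
int ip_hash_init_peer(upstream_conf_t *us) { (void) us; return 1; }

/* Configuration-time initialization of the round robin balancer. */
int rr_init_upstream(upstream_conf_t *us)
{
    assert(us != NULL);
    us->rr_ready = 1;
    us->init = rr_init_peer;
    return 0;
}

/* Configuration-time initialization of the ip hash balancer:
 * initialize the successor first, then install its own per-request hook. */
int ip_hash_init_upstream(upstream_conf_t *us)
{
    if (rr_init_upstream(us) != 0) {
        return -1;
    }
    us->init = ip_hash_init_peer;
    return 0;
}
```

After ip_hash_init_upstream returns, the round robin state is ready for use as a fallback, but the hook that will run for each request belongs to the ip hash module.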

Initialize the request

After Nginx receives a request and finds that it needs to access upstream, it executes the corresponding peer.init function. This is the callback that was set when the configuration was initialized. Its most important job is to build a table to which the upstream servers available to the current request are added in turn. The main reason for this table is that if an upstream server fails and cannot serve the request, another server can be taken from the table to retry the operation. The table can also be used in load balancing calculations. The table is built here, rather than in the earlier configuration initialization phase, because upstream needs to provide an isolated environment for each request.

To discuss the core of peer.init, let's look at the implementation of the IP hash module:

    r->upstream->peer.data = &iphp->rrp;

    ngx_http_upstream_init_round_robin_peer(r, us);

    r->upstream->peer.get = ngx_http_upstream_get_ip_hash_peer;

The first line sets the data pointer, which points to the table mentioned earlier.

The second line calls the Round Robin module's callback to initialize that module for the request. As mentioned earlier, one load balancing module can call another to supplement its own functionality.

The third line sets a new callback, get, which is responsible for taking a server from the table. Besides get, there is another callback, r->upstream->peer.free, which is called after the upstream request completes and is responsible for cleanup work. For example, if we need to maintain an access counter for an upstream server, we can add 1 to it in get and subtract 1 in free. For SSL, Nginx also provides two callbacks, peer.set_session and peer.save_session. In general, a load balancing algorithm has two entry points: one here and the other in the get callback.
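The access counter mentioned above can be sketched as a pair of functions. This is an illustrative model only: balancer_t, counted_get, and counted_free are hypothetical names, the server list is fixed at two entries, and a plain rotation stands in for a real selection algorithm.

```c
#include <assert.h>
#include <stddef.h>

#define N_SERVERS 2

/* Hypothetical balancer state shared by the get and free callbacks. */
typedef struct {
    int active[N_SERVERS];  /* requests currently in flight per server */
    int next;               /* trivial rotation used as the selection rule */
} balancer_t;

/* "get": choose a server and count the new in-flight request. */
int counted_get(balancer_t *b)
{
    int chosen;

    assert(b != NULL);

    chosen = b->next;
    b->next = (b->next + 1) % N_SERVERS;
    b->active[chosen]++;
    return chosen;
}

/* "free": called once the upstream request has finished. */
void counted_free(balancer_t *b, int server)
{
    assert(b != NULL);

    if (server >= 0 && server < N_SERVERS && b->active[server] > 0) {
        b->active[server]--;
    }
}
```

Because every successful get is paired with exactly one free, active[i] always reflects the number of requests that server i is handling at that moment.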

peer.get and peer.free callback functions

These two functions are the lowest layer of the load balancing module. They perform the preparatory work for actually obtaining a connection and reclaiming a connection. The work is called preparatory because neither function actually establishes or releases a connection; they only obtain the address of a connection or maintain connection state. To be clear, obtaining the address information in peer.get does not mean that the connection has not been established yet. On the contrary, from the return value of get Nginx can tell whether a connection is available and whether it has already been established. The return values are summarized below:

Return value | Description | Nginx follow-up action
--- | --- | ---
NGX_DONE | The connection address information is obtained and the connection is already established. | Use the connection directly to send data.
NGX_OK | The connection address information is obtained, but the connection is not yet established. | Establish the connection. If it cannot be established immediately, set events, suspend this request, and process other requests.
NGX_BUSY | No connection is available. | Return a 502 error to the client.

On seeing this table, readers may have a few questions:

Q: When is the connection already established?

A: When back-end keepalive connections are used, a connection is not closed after use but is stored in a queue. A new request only needs to take a connection from the queue, and that connection is already established.

Q: What does it mean that all connections are unavailable?

A: When the request is initialized, a table is built, and each call to get takes a connection from it without repeating an earlier choice. When no new connection can be obtained from the table, all connections are unavailable.

Q: For a request, could the peer.get function be called multiple times?

A: Exactly. When the address returned by peer.get cannot be connected to, or the corresponding server returns an abnormal response, Nginx executes ngx_http_upstream_next and may then call peer.get again to try another connection. The overall upstream process is as follows:

[Figure: overall upstream processing flow]
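The per-request table and the repeated calls to get can be sketched as a retry loop. This is a minimal model under stated assumptions rather than Nginx's implementation: request_peers_t, pick_untried, and send_with_retry are hypothetical names, the table is a bitmask over three servers, and the up array simulates whether each server answers normally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_SERVERS 3

/* Hypothetical per-request table: bit i is set once server i has been tried. */
typedef struct {
    uint32_t tried;
} request_peers_t;

/* Plays the role of peer.get: returns an untried server index,
 * or -1 when every server has already been handed out ("busy"). */
int pick_untried(request_peers_t *rp)
{
    int i;

    assert(rp != NULL);

    for (i = 0; i < N_SERVERS; i++) {
        if (!(rp->tried & (UINT32_C(1) << i))) {
            rp->tried |= UINT32_C(1) << i;
            return i;
        }
    }
    return -1;
}

/* Tries servers one by one until one succeeds. up[i] != 0 means server i
 * answers normally. Returns the serving index, or -1 when all servers
 * failed (the case in which the client would receive a 502). */
int send_with_retry(const int up[N_SERVERS], int *attempts)
{
    request_peers_t rp = { 0 };   /* fresh table for this request */
    int             s;

    assert(up != NULL && attempts != NULL);

    *attempts = 0;
    while ((s = pick_untried(&rp)) >= 0) {
        (*attempts)++;
        if (up[s]) {
            return s;
        }
    }
    return -1;
}
```

Since the table lives inside send_with_retry, each request starts with every server untried, which is the isolation between requests described earlier.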

Section review

This section describes the basic components of the load balancing module. The configuration of the load balancing module is concentrated in the upstream block. The callback chain of the load balancing module starts with init_upstream, goes through init_peer (the peer.init callback), and eventually reaches peer.get and peer.free. Here init_peer builds the list of servers used for each request, peer.get selects a server from that list (generally without repeating a choice), and peer.free releases resources before the server is released. Finally, this section uses a diagram to show the relationship between the upstream module and the load balancing module during request processing.