
Introduction to the Nginx upstream module


May 23, 2021 Nginx Getting started



Introduction to the upstream module

Nginx modules generally fall into three categories: handler, filter, and upstream. In the previous sections, readers have learned about handlers and filters. With those two types of modules, Nginx can easily perform any task confined to a single machine. The upstream module described in this chapter enables Nginx to receive, process, and forward network data beyond the boundaries of a single machine.

Data forwarding gives Nginx the ability to scale horizontally beyond a single machine, freeing it from the limitation of serving only single-host endpoints and giving it the strategic capability to split, encapsulate, and consolidate applications at the network level. In today's cloud model, data forwarding is a key component of Nginx's ability to build network applications. Of course, given development costs, the key components of a web application are usually written in a high-level programming language first. But once the system reaches a certain scale and performance demands more attention, components written in the high-level language must be restructured to meet the required performance goals. At that point, rewriting with Nginx's upstream module becomes extremely attractive in terms of modification cost, because upstream is inherently fast. Incidentally, Nginx's configuration system provides a high degree of hierarchy and loose coupling, which makes the system very scalable.

With that background, let's look at how to write an upstream module.

upstream module interface

In essence, an upstream module is a handler, but instead of producing its own content, it obtains content by requesting a back-end server, hence the name upstream. The entire process of sending the request and receiving the response is encapsulated inside Nginx, so an upstream module only needs to implement a handful of callback functions for specific tasks such as constructing the request and parsing the response.

These callback functions are shown in the following table:

Callback Description
create_request Generates the request buffer (buffer chain) to be sent to the back-end server; called when upstream is initialized.
reinit_request If an error occurs on one back-end server, Nginx tries another. Once Nginx has selected a new server, this function is called to re-initialize the working state of the upstream module before the upstream connection is made again.
process_header Processes the header of the response returned by the back-end server. The "header" here is whatever the protocol used to communicate with the upstream server defines it to be, such as the header section of the HTTP protocol, or the response-status line of the memcached protocol.
abort_request Called when the client abandons the request. The function does not need to close the connection to the back-end server; the system completes the connection-closing steps automatically, so this function generally does no concrete work.
finalize_request Called when a request to the back-end server completes normally. Like abort_request, it generally does no concrete work.
input_filter Processes the response body returned by the back-end server. Nginx's default input_filter wraps the received content into an ngx_chain buffer chain. The chain is anchored at the out_bufs field of upstream, so developers can retrieve the body data returned by the back-end server from outside the module. The memcached module implements its own input_filter, which is analyzed later.
input_filter_init Initializes the context of the input filter. Nginx's default input_filter_init simply returns.

Memcached module analysis

memcached is a high-performance distributed cache system that is very widely used. memcached defines its own private communication protocol, so it cannot be accessed through HTTP requests. But the protocol itself is simple and efficient, and memcached is so widely used that most modern development languages and platforms provide memcached client support.
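As a concrete illustration of that protocol (a minimal stand-alone sketch, not part of the Nginx source; the function names are hypothetical), a "get" exchange in the memcached text protocol consists of the client sending `get <key>\r\n` and the server replying with a `VALUE <key> <flags> <bytes>\r\n` header line, the data, and a trailing `END\r\n`:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a memcached text-protocol "get" request into buf. */
static int build_get_request(char *buf, size_t size, const char *key) {
    return snprintf(buf, size, "get %s\r\n", key);
}

/* Parse a "VALUE <key> <flags> <bytes>" header line and return the
 * announced body length, or -1 if the line is not a VALUE header. */
static int parse_value_header(const char *line) {
    char     key[64];
    unsigned flags, bytes;

    if (sscanf(line, "VALUE %63s %u %u", key, &flags, &bytes) != 3) {
        return -1;
    }
    return (int) bytes;
}
```

For example, `build_get_request` produces `"get mykey\r\n"` for the key `mykey`, and `parse_value_header("VALUE mykey 0 5\r\n")` returns 5, the length of the data block that follows.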

Nginx's ngx_http_memcached module provides the ability to read data from memcached; it does not write data to memcached. For a web server, this design is acceptable.

Let's start analyzing ngx_http_memcached modules and get a glimpse of the mysteries of upstream.

Handler module?

At first glance, the memcached module doesn't seem special; looking a little closer, it even feels somewhat like a handler module. And when you see the following code, you may well wonder why it is exactly the same as a handler module.

        clcf = ngx_http_conf_get_module_loc_conf(cf, ngx_http_core_module);
        clcf->handler = ngx_http_memcached_handler;

That's because an upstream module is activated through the handler mechanism. At the same time, the directive system of an upstream module follows the basic rules of handler modules: configure the module, and the module takes effect.


        { ngx_string("memcached_pass"),
          NGX_HTTP_LOC_CONF|NGX_HTTP_LIF_CONF|NGX_CONF_TAKE1,
          ngx_http_memcached_pass,
          NGX_HTTP_LOC_CONF_OFFSET,
          0,
          NULL }

This familiarity is a good thing: it means you are already comfortable with how handlers are written.

Upstream module

So what is special about an upstream module? The answer lies in the implementation of the module's handler, which follows a fixed sequence of steps. In the memcached example, looking at the code of ngx_http_memcached_handler, you can see that this fixed sequence is:

  1. Create an upstream data structure.
        if (ngx_http_upstream_create(r) != NGX_OK) {
            return NGX_HTTP_INTERNAL_SERVER_ERROR;
        }
  2. Set the module's tag and schema. The schema is currently used only in logs, and the tag is used for buffer-chain management.
        u = r->upstream;

        ngx_str_set(&u->schema, "memcached://");
        u->output.tag = (ngx_buf_tag_t) &ngx_http_memcached_module;
  3. Set up upstream's back-end server list data structure.
        mlcf = ngx_http_get_module_loc_conf(r, ngx_http_memcached_module);
        u->conf = &mlcf->upstream;
  4. Set the upstream callback functions. (The code listed here slightly adjusts the original order.)
        u->create_request = ngx_http_memcached_create_request;
        u->reinit_request = ngx_http_memcached_reinit_request;
        u->process_header = ngx_http_memcached_process_header;
        u->abort_request = ngx_http_memcached_abort_request;
        u->finalize_request = ngx_http_memcached_finalize_request;
        u->input_filter_init = ngx_http_memcached_filter_init;
        u->input_filter = ngx_http_memcached_filter;
  5. Create and set up the upstream context data structure.
        ctx = ngx_palloc(r->pool, sizeof(ngx_http_memcached_ctx_t));
        if (ctx == NULL) {
            return NGX_HTTP_INTERNAL_SERVER_ERROR;
        }

        ctx->rest = NGX_HTTP_MEMCACHED_END;
        ctx->request = r;

        ngx_http_set_ctx(r, ctx, ngx_http_memcached_module);

        u->input_filter_ctx = ctx;
  6. Complete the upstream initialization and wrap up.
        r->main->count++;
        ngx_http_upstream_init(r);
        return NGX_DONE;

Every upstream module, from one as simple as memcached to ones as complex as proxy and fastcgi, follows these same 6 steps. The biggest differences between upstream modules occur in steps 2, 3, 4, and 5. Steps 2 and 4 are easy to understand: different modules naturally set different flags and use different callback functions. Step 5 is not hard to understand either. Only step 3 is obscure: the back-end server list, and the policies for choosing from it, differ greatly between modules, from the simple and straightforward memcached to the logically complex proxy. Let's note this problem for now, finish the memcached analysis first, and then come back to it separately.

Step 6 is boilerplate. Increment count by 1, then return NGX_DONE. When Nginx encounters this situation, although it considers the processing of the current request to be finished, it does not release the memory resources used by the request or close the connection to the client. This is necessary because Nginx establishes a one-to-one relationship between the upstream request and the client request; later, when the upstream response is sent back to the client via ngx_event_pipe, these data structures holding client information are still needed. This will be described in the principles section later and is not expanded on here.

The one-to-one binding between upstream requests and client requests has both advantages and drawbacks. The advantage is that it simplifies module development, letting you focus on module logic; the drawback is equally obvious: a one-to-one design cannot satisfy more complex logic. This, too, will be explained later in the principles section.

Callback function

The skeleton of the memcached module was dissected above; now let's work through each callback function one by one.

  • ngx_http_memcached_create_request: fairly simple; it generates a key from the configuration, then generates a "get $key" request, which is placed in r->upstream->request_bufs.

  • ngx_http_memcached_reinit_request: No initialization required.

  • ngx_http_memcached_abort_request: No additional operation is required.

  • ngx_http_memcached_finalize_request: No additional operation is required.

  • ngx_http_memcached_process_header: the module's core business function. The memcached protocol defines its header as the first line of text, as this code shows:
        for (p = u->buffer.pos; p < u->buffer.last; p++) {
            if (*p == LF) {
                goto found;
            }
        }

If the LF character is not found in the data read so far, the function returns NGX_AGAIN, indicating that the header has not been fully read and more data needs to be read. Nginx will call the function again after new data arrives.

Nginx uses only a single buffer when processing the back-end server's response header; all header data must fit in that buffer, so there is no need to consider the header spanning multiple buffers when parsing it. If the header is too large to fit in this buffer, Nginx returns an error to the client and writes an error log entry suggesting that the buffer is not large enough.
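The header-scanning logic above can be modeled outside of Nginx (a simplified stand-alone sketch, not the actual Nginx source; the function name and the DEMO_AGAIN value are hypothetical stand-ins): scan the buffer for LF; if none has arrived yet, report "again"; otherwise the body begins one byte past the LF, which is also why process_header must later advance the read pointer past it.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_AGAIN (-1)  /* stand-in for NGX_AGAIN */

/* Scan buf[0..len) for the first LF. Return the offset at which the
 * body starts (one past the LF), or DEMO_AGAIN if the header line is
 * not complete yet and more data must be read. */
static long find_body_start(const char *buf, size_t len) {
    const char *lf = memchr(buf, '\n', len);

    if (lf == NULL) {
        return DEMO_AGAIN;  /* header incomplete: read more data */
    }
    return (long) (lf - buf) + 1;  /* mirrors u->buffer.pos = p + 1 */
}
```

For example, scanning the partial read `"VALUE k 0"` reports "again", while a complete header line `"VALUE k 0 5\r\n"` yields the offset of the first body byte.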

Another job of process_header is to translate the status returned by the back-end server into the status returned to the client. For example, ngx_http_memcached_process_header contains these pieces of code:

        r->headers_out.content_length_n = ngx_atoof(len, p - len - 1);

        /* when the key is found: */
        u->headers_in.status_n = 200;
        u->state->status = 200;

        /* when the key is not found: */
        u->headers_in.status_n = 404;
        u->state->status = 404;

u->state is used to compute upstream-related variables. For example, u->state->status is used to compute the value of the variable upstream_status. u->headers_in.status_n sets the status code returned to the client. The first line above sets the content length of the response returned to the client.

One thing that must not be forgotten in this function: after the header has been processed, the read pointer pos must be moved past it. Otherwise the header data will also be copied into the body of the response returned to the client, corrupting the body content.


        u->buffer.pos = p + 1;

When process_header has processed the response header correctly, it should return NGX_OK. If it returns NGX_AGAIN, the full header has not been read yet and more data must be read from the back-end server. Returning NGX_DECLINED is meaningless here; any other return value is treated as an error state, in which case Nginx ends the upstream request and returns an error message.

  • ngx_http_memcached_filter_init: corrects the expected length of the content to be received from the back-end server, because the length of the trailer was not included when the header was processed.

  • ngx_http_memcached_filter: the memcached module is one of the few modules with a callback that processes the body. Because it needs to filter out the CRLF "END" CRLF trailer at the end of the body, it implements its own filter callback. The practical work of processing the body is to wrap the valid body content received from the back-end server in ngx_chain_t and append it to u->out_bufs. Nginx does not copy the data; instead it sets up ngx_buf_t structures pointing at the existing memory areas and then strings these bufs together with ngx_chain_t. This implementation avoids moving large amounts of memory and is one of the secrets of Nginx's efficiency.
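The trailer-filtering arithmetic can be modeled in isolation (a simplified stand-alone sketch, not the real filter, which operates on ngx_buf_t chains without copying; the function name is hypothetical). Assuming the trailer is CRLF "END" CRLF (7 bytes) as in the memcached text protocol, and that the expected byte count was extended by the trailer length as filter_init does, the filter must pass at most the remaining payload bytes to the client and discard the rest:

```c
#include <stddef.h>

/* Length of the memcached trailer CRLF "END" CRLF. */
#define TRAILER_LEN (sizeof("\r\nEND\r\n") - 1)

/* Given the number of bytes still expected from the back end
 * (payload + trailer) and the number of bytes just received, return
 * how many of the received bytes are payload for the client; the
 * remainder belongs to the trailer and must be filtered out. */
static size_t payload_bytes(size_t expected, size_t received) {
    size_t payload_left = expected > TRAILER_LEN ? expected - TRAILER_LEN : 0;

    return received < payload_left ? received : payload_left;
}
```

For instance, with a 5-byte value the header promises 5 bytes, filter_init raises the expectation to 12, and when all 12 bytes arrive in one read, only 5 of them are handed to the client.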

This section reviews

This section described the basic composition of an upstream module. An upstream module is developed from the handler module; its directive system and the way the module takes effect are the same as a handler's. The difference is that the upstream module's handler function sets up a number of callback functions, and the actual work is done by those callbacks. Each callback is executed at a fixed stage of upstream processing, and most of them generally do little real work. Upstream's most important callbacks are create_request, process_header, and input_filter, which together implement the module's side of the protocol spoken with the back-end server.