
Easily overlooked details when deploying a Kubernetes application for the first time


May 31, 2021




This article comes from the public account Architecture Headline, by Julian Gindi.

In the author's personal experience, most people seem to just dump their application onto Kubernetes, whether through Helm or manually, and then expect everything to run smoothly from there on. But in GumGum's practice we have run into a whole series of "traps" when deploying Kubernetes applications, and we want to share them here to give you a little inspiration for your own Kubernetes journey.

1. Configure Pod requests and limits

Let's start by configuring a simple environment in which your Pods can run. Kubernetes does a good job of handling Pod scheduling and failure states, but we also realized that if the Kubernetes scheduler cannot measure how many resources a Pod needs to run successfully, deployment can sometimes become challenging. That challenge is the root cause behind the design of the resource request and limit mechanism. There is still plenty of controversy about best practices for setting application requests and limits; in practice, the work is more art than science. Let's talk about GumGum's internal view on the issue:

Pod request: This is the primary metric that the scheduler uses to measure the best way to deploy a Pod.

Here's a look at the description in the Kubernetes documentation:

The filtering step finds the set of Nodes where it is feasible to schedule the Pod. For example, the PodFitsResources filter checks whether a candidate Node has enough available resources to meet a Pod's specific resource requests.

Internally, we use application requests this way: we set them to estimate the resources the application needs when running its actual workload. On that basis, the scheduler can place Pods on nodes more sensibly. Initially, we wanted to set requests higher to make sure each Pod had plenty of resources, but we quickly discovered that this approach significantly increased scheduling time and prevented some Pods from being scheduled at all. The result was actually similar to what we saw when we specified no resource request at all: in the latter case, because the control plane does not know how many resources the application needs, the scheduler often "evicts" the Pod and does not reschedule it. It is this key component of the scheduling algorithm that kept us from getting the scheduling behavior we expected.

Pod limit: the limit on a Pod represents the maximum amount of resources that the cluster allows each container to use.

Also look at the description in the official documentation:

If you set a memory limit of 4GiB for the container, the kubelet (and the container runtime) enforces this limit. The runtime prevents the container from using more than the configured resource limit. For example, when a process in the container tries to consume more memory than is allowed, the system kernel terminates the process that attempted the allocation with an out-of-memory (OOM) error.

Containers can use more resources than they request, but never more than the configured limit. Obviously, setting these values correctly is difficult, but also very important. Ideally, we want a Pod's resource requirements to change throughout the process lifecycle without interfering with other processes on the system, and that is exactly what the limit mechanism is for. Unfortunately, we cannot give the most appropriate settings explicitly; we can only adjust them through the following procedure:

  1. Using a load testing tool, we simulate baseline traffic levels and observe the Pod's resource usage, including memory and CPU.
  2. We set the Pod request to a very low level while keeping the Pod resource limit at about 5 times the requested value, and then observe its behavior (see the sketch below). When the request is too low, the process cannot start and often raises mysterious Go runtime errors.
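
As a rough illustration of that rule of thumb, here is a minimal sketch of a container's resources block with limits set at roughly five times the requests. The specific numbers are placeholder assumptions for illustration only, not GumGum's actual values; the right figures come out of your own load tests:

resources:
  requests:
    cpu: 100m        # placeholder: deliberately low starting point
    memory: 256Mi
  limits:
    cpu: 500m        # placeholder: ~5x the request
    memory: 1280Mi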

One thing to emphasize here is that the more resources a Pod demands, the harder it is to schedule. This is because Pod scheduling requires the target node to have sufficient resources available. For example, if a container needs 4GB of memory, even a lightweight web server process can become hard to place: when you scale out, each new container must run on a node with at least 4GB of memory available. If no such node exists, you will need to add a new node to the cluster to handle that Pod, which inevitably increases startup time. In short, it is important to find the minimum "boundary" between resource requests and limits that still ensures fast, balanced scaling.

2. Configure the Liveness and Readiness probes

Another interesting topic often discussed in the Kubernetes community is how to configure Liveness and Readiness probes. Used properly, these two probes give us a mechanism for running fault-tolerant software and minimizing downtime. However, if configured incorrectly, they can also have a serious performance impact on the application. Here is a look at the basics of both probes and how to use them:

Liveness probe: "Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a container does not provide a liveness probe, the default state is Success." - Kubernetes documentation

Liveness probes must be cheap to run, because they run frequently and need to keep telling Kubernetes that the application is alive. Note that if you set one to run once per second, the system has to handle one additional request per second, so it is important to think about how those extra requests are handled and what resources they need. At GumGum, we set the Liveness probe to respond once the main components of the application are running, regardless of whether the data is fully available (for example, data from a remote database or cache). For example, we set up a dedicated "health" endpoint in the application that is solely responsible for returning a 200 response code. As long as that response keeps coming back, the process has started and can handle requests (though it is not yet formally receiving traffic).

Readiness probe: "Indicates whether the container is ready to serve requests. If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services that match the Pod."

Readiness probes are much more expensive to run, because their job is to keep telling the back end that the entire application is running and ready to receive requests. There is much debate in the community about whether this probe should touch the database. Given the overhead of Readiness probes (they run frequently, but their frequency can be adjusted flexibly), we decided that some applications would only "serve traffic" once records could be returned from the database. By carefully designing the Readiness probe, we have been able to achieve higher availability and zero-downtime deployments.

But if you really do need the application's Readiness probe to check database readiness, keep the query as cheap as possible, for example:

SELECT small_item FROM table LIMIT 1

Here are the configuration values we specified for both probes in Kubernetes:

livenessProbe:
  httpGet:
    path: /api/liveness
    port: http
readinessProbe:
  httpGet:
    path: /api/readiness
    port: http
  periodSeconds: 2

You can also add some other configuration options:

  • initialDelaySeconds - how many seconds after the container starts before the probe begins running
  • periodSeconds - the interval between probe runs
  • timeoutSeconds - how many seconds must pass before the probe is considered to have failed; equivalent to a timeout in the traditional sense
  • failureThreshold - how many times the probe must fail before a restart signal is sent to the Pod
  • successThreshold - how many times the probe must succeed before the Pod is considered ready (typically after the Pod starts up or recovers from a failure)
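
As a hedged sketch of how these options fit together (the values below are illustrative assumptions, not recommendations), a readiness probe using them might look like this:

readinessProbe:
  httpGet:
    path: /api/readiness
    port: http
  initialDelaySeconds: 5   # wait 5s after container start before probing
  periodSeconds: 2         # probe every 2 seconds
  timeoutSeconds: 1        # each probe must answer within 1 second
  failureThreshold: 3      # 3 consecutive failures mark the Pod not ready
  successThreshold: 1      # 1 success marks it ready again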

3. Set the default Pod network policy

Kubernetes uses a "flat" network topology: by default, all Pods can communicate with one another directly. In many practical use cases, however, this communication capability is unnecessary or even unacceptable. A potential security risk is that if a vulnerable application is exploited, an attacker gains full access and can send traffic to every Pod on the network. Therefore, it is also necessary to apply the principle of least privilege in the Pod network, ideally using network policies to specify which containers are allowed to connect to which.

Take the following simple policy, for example; it denies all ingress traffic in a particular namespace:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
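
With a default-deny policy in place, you then explicitly allow only the connections you actually need. As a hedged sketch (the labels app: api and app: frontend are hypothetical, not from the article), a policy permitting ingress to one application only from another might look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: api          # placeholder: the Pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend  # placeholder: the only Pods allowed to connect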

4. Perform custom behaviors with Hooks and Init containers

One of the core goals we want to achieve with our Kubernetes system is to give existing developers near-zero-downtime deployments out of the box. However, different applications have different shutdown methods and resource cleanup processes, so the overall zero-downtime goal is hard to reach. The first difficulty we ran into was with Nginx. We noticed that when a rolling deployment of the Pod started, active connections were dropped before they could terminate successfully. After extensive online research, it turned out that Kubernetes does not wait for Nginx to drain its connections before terminating the Pod. With a pre-stop hook, we were able to inject this behavior and thereby achieve zero downtime.

lifecycle:
  preStop:
    exec:
      command: ["/usr/local/bin/nginx-killer.sh"]

nginx-killer.sh:

#!/bin/bash
sleep 3
PID=$(cat /run/nginx.pid)
nginx -s quit
while [ -d /proc/$PID ]; do
    echo "Waiting while shutting down nginx..."
    sleep 10
done
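
One thing worth pairing with a slow pre-stop hook (a general Kubernetes consideration, not something the article states) is the Pod's termination grace period, since Kubernetes force-kills the container once it expires and the pre-stop hook counts against it. The value 60 below is an assumption for illustration:

spec:
  terminationGracePeriodSeconds: 60   # assumption: enough time for the hook to drain connections
  containers:
  - name: nginx
    image: nginx:1.21                 # placeholder image
    lifecycle:
      preStop:
        exec:
          command: ["/usr/local/bin/nginx-killer.sh"]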

Another practical example is handling startup tasks for a particular application with an Init container. Some popular Kubernetes projects, such as Istio, also use init containers to inject handling code into Pods. Init containers are especially useful if you need to run a heavy database migration before the application starts. You can also set a higher resource ceiling for this process so that it is not affected by the limits set for the main application.
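
As a hedged sketch of that pattern (the image names and migration command are hypothetical placeholders, not from the article), a migration init container with its own, higher resource limits might look like this:

initContainers:
- name: db-migrate
  image: registry.example.com/myapp-migrations:latest   # placeholder image
  command: ["./migrate", "up"]                           # placeholder migration command
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 2Gi        # higher ceiling just for the migration step
containers:
- name: myapp
  image: registry.example.com/myapp:latest               # placeholder image
  resources:
    limits:
      memory: 1Gi        # the main application keeps its tighter limit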

Another common pattern is to give the init container access to a secret and have it publish those credentials to the main Pod, which prevents the secret from being accessed through the main application container itself. Again, look at the statement in the documentation:

Init containers can safely run utilities or custom code that would otherwise make the application container image less secure. By keeping these unnecessary tools separate, you can limit the attack surface of the application container image.
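
As a hedged sketch of the secret-delivery pattern described above (the volume layout and fetch command are illustrative assumptions, not GumGum's actual setup), an init container might write credentials into a shared in-memory volume that the main container then reads:

initContainers:
- name: fetch-credentials
  image: registry.example.com/credential-fetcher:latest             # placeholder image
  command: ["sh", "-c", "fetch-secrets --out /secrets/creds.json"]  # placeholder command
  volumeMounts:
  - name: secrets
    mountPath: /secrets
containers:
- name: myapp
  image: registry.example.com/myapp:latest                          # placeholder image
  volumeMounts:
  - name: secrets
    mountPath: /secrets
    readOnly: true      # the main container only reads what the init container wrote
volumes:
- name: secrets
  emptyDir:
    medium: Memory      # keep credentials off disk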

5. Kernel tuning

Finally, let's talk about a more advanced technique. Kubernetes itself is a highly flexible platform that helps you run workloads in the most appropriate way. At GumGum, we have a number of high-performance applications with extreme resource demands. After extensive load testing, we found that one application struggled to handle the necessary traffic load with Kubernetes' default settings. However, Kubernetes allows us to run a privileged container that modifies kernel runtime parameters for a particular Pod. With the following sample code, we modified the maximum number of open connections for the Pod:

initContainers:
- name: sysctl
  image: alpine:3.10
  securityContext:
    privileged: true
  command: ['sh', '-c', "sysctl -w net.core.somaxconn=32768"]

This is an advanced technique that is rarely needed. If your application struggles to stay healthy under high load, however, you may need to adjust some of these parameters. It is recommended that you refer to the official documentation for the details of parameter tuning and the available values.

Summary

While Kubernetes is an almost "out-of-the-box" solution, there are a number of key steps you need to take to keep your application running smoothly. Throughout the process of migrating an application to Kubernetes, it is important to focus on the load-testing "loop": run the application, load-test it, observe the metrics and scaling behavior, adjust the configuration based on the results, and repeat. Try to set the expected traffic objectively, and then push traffic well past that level to see which components break first. With this iterative approach, you may only need some of the steps described in this article to get the application running the way you want. In summary, always focus on the following core questions:

  • What is the resource footprint of my application? How does it change?
  • What are the actual scaling requirements for the service? What average traffic is it expected to handle? What does peak traffic look like?
  • How often might the service need to scale out? How long does it take for a new Pod to start receiving traffic?
  • Is our Pod termination process graceful and controlled? Is that even needed? Can we achieve zero-downtime deployments?
  • How do we minimize security risks and limit the "blast radius" (range of impact) of a compromised Pod? Are there permissions or access capabilities in the service that it does not need?

Kubernetes is an impressively powerful platform on which you can use best practices to deploy thousands of services across a cluster. But all software differs, and sometimes your application needs further adjustment, so Kubernetes gives us plenty of tuning "knobs" so that users can reach their technical goals as easily as possible. By combining resource requests and limits, Liveness and Readiness checks, init containers, network policies, and custom kernel tuning, you can achieve better baseline performance, resilience, and rapid scaling on top of the Kubernetes platform.
