Autoscaling with Kubernetes, Part 1: Pod Autoscaling

Out of the Box

If we don’t specify a number of replicas in the deployment manifest, Kubernetes defaults to 1. We could always guesstimate and manually increase the replica count to some ceiling, but that would not only be expensive in the cloud world, it would also leave us unable to scale beyond that ceiling if we get an unexpected spike in traffic.
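
For reference, pinning the replica count manually just means setting spec.replicas in the deployment manifest, or scaling imperatively (using the author deployment from the examples later in this post):

kubectl scale deployment author --replicas=3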

Configuring Autoscaling

Luckily, Kubernetes supports pod autoscaling, which adds or removes pods based on the average resource consumption across the pods of a target object such as a Deployment. The mechanism is called the Horizontal Pod Autoscaler (HPA). Let’s take a look at a simple example below.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: author
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: author
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50

This targets the pods in the author deployment and specifies the minimum and maximum desired replicas, as well as the CPU utilization threshold that needs to be crossed before new pods are spun up.
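
If you’d rather not write a manifest for quick experiments, the same HPA can be created imperatively:

kubectl autoscale deployment author --cpu-percent=50 --min=1 --max=5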

It’s important to note that the target CPU percentage is the average across all pods in the deployment, and that utilization is measured as a percentage of each pod’s requested CPU.
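
From that average, the HPA controller periodically computes how many replicas it wants, roughly following the formula documented by Kubernetes:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]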

Once we deploy the above manifest, we can check the number of replicas and the current average CPU usage with the command below:

kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   1%/50%    1         5         1          1m

You’ll notice that we currently have 1 replica and the average CPU usage is at 1%. This is 1% of the pod’s requested CPU, which here is the default of 100m (100 millicpu, or millicores), equivalent to 0.1 CPU. This can always be changed by adding a resources node to your deployment manifest like so:
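
One thing worth noting: the HPA gets its numbers from the resource metrics pipeline (typically metrics-server), so if the TARGETS column shows <unknown> instead of a percentage, metrics aren’t being collected. A quick sanity check:

kubectl top pods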

apiVersion: apps/v1
kind: Deployment
metadata:
  name: author
  labels:
    app: author
spec:
  selector:
    matchLabels:
      app: author
  template:
    metadata:
      labels:
        app: author
    spec:
      containers:
      - name: author
        image: gcr.io/brandon-versus/author-service:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m

It’s good practice to specify not just the requests but also the limits, especially if you’re operating on nodes with limited resources, since letting pods run up against a node’s capacity can cause the node to become unstable: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/
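
To see how much of a node’s capacity is already claimed by requests and limits, check the Allocated resources section in the output of:

kubectl describe nodes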

Scaling Up!

To see the HPA in action, I decided to use the ab benchmarking tool. By sending enough load, I hoped to push the average CPU past the 50% threshold and see new pods getting created. I did so by port forwarding to the existing pod and running the following command:

ab -n 10000 -c 100 http://localhost:5000/authors
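
For completeness, the port forward itself would look something like the command below (assuming the author service listens on port 5000, as implied by the URL passed to ab):

kubectl port-forward deployment/author 5000:5000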

After a minute or so, I looked at the HPA stats and saw that the current CPU utilization had surpassed the threshold we set for scaling up:

kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   89%/50%   1         5         1          5m

Shortly after, I checked again and we were up to three replicas!

kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   67%/50%   1         5         3          6m

kubectl get pods

NAME                      READY   STATUS    RESTARTS   AGE
author-7c488dbbd4-88hzc   1/1     Running   0          28m
author-7c488dbbd4-jkr9m   1/1     Running   0          1m
author-7c488dbbd4-tnk7h   1/1     Running   0          1m
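
If you’re curious about the decisions the autoscaler made along the way, describing the HPA shows its recent events; the exact messages vary by version, but you should see entries noting the new size and the reason for the change:

kubectl describe hpa author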

Scaling Down

Some time after my ab command completed, I noticed that the pods had scaled back down to the minimum we set earlier in the HPA manifest.

kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   1%/50%    1         5         1          141m

kubectl get pods

NAME                      READY   STATUS    RESTARTS   AGE
author-7c488dbbd4-jkr9m   1/1     Running   0          37m

The scale-down isn’t immediate because the HPA waits out a cooldown before removing pods: this is controlled by the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization (formerly --horizontal-pod-autoscaler-downscale-delay), which defaults to 5 minutes.
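
In newer Kubernetes versions, the autoscaling/v2 API also lets you tune this per HPA through the behavior field. A minimal sketch (assuming a cluster recent enough to support autoscaling/v2) might look like:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: author
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: author
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes of low usage before removing pods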