By default, if we don’t specify a number of replicas in the deployment manifest, Kubernetes sets it to 1. We could always guesstimate and manually increase the replica count to some ceiling, but that would not only be expensive in the cloud world, it would also leave us unable to scale beyond that ceiling if we get an unexpected spike in traffic.
Luckily, Kubernetes supports pod autoscaling, which adds or removes pods based on the average resource consumption across the pods of a given Kubernetes object. The mechanism is called a Horizontal Pod Autoscaler (HPA). Let’s take a look at a simple example below.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: author
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: author
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
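As a side note, an equivalent HPA can be created imperatively with kubectl autoscale instead of a manifest; a sketch, assuming the author deployment already exists in the current namespace:

```shell
# Creates an HPA equivalent to the manifest above.
kubectl autoscale deployment author --cpu-percent=50 --min=1 --max=5

# Remove it again when you're done experimenting.
kubectl delete hpa author
```

The declarative manifest is usually preferable since it can live in version control alongside the deployment.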
This is targeting the pods in the author deployment and specifying the minimum and maximum replicas desired as well as the CPU utilization threshold that needs to be crossed before spinning up new instances.
It’s important to note that the target CPU percentage is the average across all pods in the deployment.
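To make the averaging concrete, here’s a quick back-of-the-envelope sketch; the per-pod usage numbers are made up for illustration, and each pod is assumed to request 100m of CPU:

```shell
# Hypothetical CPU usage in millicores for three pods, each requesting 100m.
usages="40 60 80"
request=100

total=0; count=0
for u in $usages; do
  total=$((total + u))
  count=$((count + 1))
done

# Utilization the HPA compares against its target: average usage / request.
avg=$((total * 100 / (request * count)))
echo "${avg}%"
```

Here the average works out to 60%, which would put the HPA above a 50% target and trigger a scale-up even though one pod is only at 40%.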
Once we deploy the above manifest, we can check the number of replicas and see what the current average CPU usage is with the command below:
kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   1%/50%    1         5         1          1m
You’ll notice that we currently have 1 replica and the average CPU usage is at 1%. This is 1% of the requested CPU resources, which in this case is the default of 100m (100 millicpu, or millicores), equivalent to 0.1 CPU. This can always be changed by adding a resources node to your deployment manifest like so:
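In other words, the percentage is relative to the pod’s CPU request, not the node’s total capacity. Converting the reported 1% back into absolute terms:

```shell
# 1% utilization against a 100m request works out to 1 millicore of actual usage.
request_millicores=100
reported_pct=1
usage_millicores=$((request_millicores * reported_pct / 100))
echo "${usage_millicores}m"
```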
apiVersion: apps/v1
kind: Deployment
metadata:
  name: author
  labels:
    app: author
spec:
  selector:
    matchLabels:
      app: author
  template:
    metadata:
      labels:
        app: author
    spec:
      containers:
      - name: author
        image: gcr.io/brandon-versus/author-service:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
It’s always good practice to specify not just the requests but also the limits, especially if you’re operating on nodes with limited resources, since nodes that come close to exhausting their resources can become unstable: https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/
To see the HPA in action, I decided to use the ab benchmarking tool. By sending enough load, I’m hoping I can push the average CPU past the 50% threshold and see new pods get created. I did so by port forwarding to the existing pod and running the following command:
ab -n 10000 -c 100 http://localhost:5000/authors
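For completeness, the port-forward that preceded this looked something like the following; port 5000 is assumed from the URL in the ab command, and forwarding to the deployment simply picks one of its pods:

```shell
# Forward local port 5000 to port 5000 on a pod in the author deployment.
kubectl port-forward deployment/author 5000:5000
```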
After a minute or so, I looked up the HPA stats and saw that the current CPU usage had surpassed the 50% threshold we set for scaling up:
kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   89%/50%   1         5         1          5m
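Rather than polling by hand, you can also leave a watch running in another terminal while the load test executes and see the autoscaler react in real time:

```shell
# Stream HPA updates as the targets and replica count change.
kubectl get hpa --watch
```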
Shortly after, I checked again and we were up to three replicas!
kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   67%/50%   1         5         3          6m
kubectl get pods

NAME                      READY   STATUS    RESTARTS   AGE
author-7c488dbbd4-88hzc   1/1     Running   0          28m
author-7c488dbbd4-jkr9m   1/1     Running   0          1m
author-7c488dbbd4-tnk7h   1/1     Running   0          1m
Some time after my ab command completed, I noticed that the pods had scaled back down to the minimum we set earlier in the HPA manifest.
kubectl get hpa

NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
author   Deployment/author   1%/50%    1         5         1          141m
kubectl get pods

NAME                      READY   STATUS    RESTARTS   AGE
author-7c488dbbd4-jkr9m   1/1     Running   0          37m
The scale-down isn’t immediate: the autoscaler waits out a cooldown window after load drops before removing pods. This is controlled on the kube-controller-manager via the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes, so you shouldn’t expect replicas to disappear the moment traffic stops.