Horizontal Pod Autoscaler (HPA) in Kubernetes

Trần_Tuấn_Anh

17 Aug 2024 | CPOL | 2 min read

The Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically adjusts the number of pods in a deployment to match the demand, based on metrics like CPU or memory. This ensures optimal performance and scalability of applications.

1. Concept

The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a Deployment based on observed metrics. This lets your application keep up with its current workload, avoiding performance degradation during peak usage and reducing cost during low-demand periods.

2. Essential parts

Metrics Server: For CPU and memory metrics, the HPA requires a Metrics Server to collect and expose the resource utilization of pods.
Deployment/ReplicaSet: The HPA operates on these objects (and on other scalable workloads such as StatefulSets) to scale the number of pods up or down.

3. How to Use HPA

3.1 Installing Metrics Server

Before you can use the HPA, the Metrics Server must be installed and running in your Kubernetes cluster. It collects CPU and memory usage from nodes and pods and exposes it through the metrics API, which the HPA reads to make scaling decisions. Install it with:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
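
Before creating an HPA, it helps to confirm that the Metrics Server is actually serving data. The sketch below assumes the default manifest, which installs a Deployment named metrics-server into the kube-system namespace; the function name check_metrics_server is hypothetical and only used for illustration.

```shell
# Sketch: confirm the Metrics Server is up before creating an HPA.
# Assumes the default components.yaml, which creates a Deployment named
# "metrics-server" in the kube-system namespace.
check_metrics_server() {
  # Wait until the Deployment reports its replicas as available.
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s || return 1
  # If the metrics API is serving data, this prints CPU and memory per node.
  kubectl top nodes
}
```

Run check_metrics_server after the install. If kubectl top nodes returns an error, the HPA will not be able to read utilization either.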

3.2 Creating an HPA

Here's an example of how to configure an HPA for a deployment. Let's say you have a deployment named "my-app" and you want to automatically scale the number of pods based on CPU utilization.

Create deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app-image
        resources:
          requests:
            cpu: "100m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

Create HPA configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Explanation:

scaleTargetRef: The object whose pod count the HPA adjusts. In this case, it's the Deployment named "my-app".
minReplicas: The minimum number of pods the HPA will scale down to.
maxReplicas: The maximum number of pods the HPA will scale up to.
metrics: The type and threshold of the metric used to adjust the number of pods. In this example, the HPA targets an average CPU utilization of 50%, measured relative to each pod's CPU request (100m in the Deployment above), so the containers must declare a CPU request.
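
To see how the 50% target turns into a replica count, here is a minimal sketch of the scaling rule described in the Kubernetes documentation: desired = ceil(currentReplicas * currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas. The function name desired_replicas is hypothetical, and the real controller also applies a tolerance and stabilization rules that this sketch leaves out.

```shell
# Sketch of the HPA replica calculation (simplified; no tolerance or
# stabilization window).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
desired_replicas() {
  current=$1; current_util=$2; target_util=$3; min=$4; max=$5
  # Integer ceiling of current * current_util / target_util.
  desired=$(( (current * current_util + target_util - 1) / target_util ))
  # Clamp the result to the [min, max] range from the HPA spec.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 2 pods averaging 90% CPU against a 50% target, with min 1 and max 10:
desired_replicas 2 90 50 1 10   # prints 4
```

With these inputs the calculation is ceil(2 * 90 / 50) = ceil(3.6) = 4, so the HPA would scale the Deployment from 2 to 4 pods.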

Apply the configuration (assuming the HPA manifest is saved as hpa.yaml and the Deployment has already been applied):

kubectl apply -f hpa.yaml
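
Once the HPA is applied, you can check whether it is reading metrics and what it has decided. The sketch below uses the my-app-hpa name from the manifest above as a default; the function name show_hpa_status is hypothetical.

```shell
# Sketch: inspect an HPA after applying it. The default name "my-app-hpa"
# matches the manifest above; pass another name as the first argument.
show_hpa_status() {
  hpa_name=${1:-my-app-hpa}
  # One-line summary: the TARGETS column shows current vs. target utilization.
  kubectl get hpa "$hpa_name" || return 1
  # Conditions and recent scaling events, useful when TARGETS shows <unknown>.
  kubectl describe hpa "$hpa_name"
}
```

A TARGETS value of <unknown> usually means the metrics API is not returning data yet or the pods have no CPU request set.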

4. Key Considerations When Using HPA

Ensure Metrics Server Accuracy: HPA relies on the Metrics Server to function. Verify that the Metrics Server is running and providing accurate data.
Optimize Configuration: Set minReplicas high enough to handle baseline traffic and maxReplicas low enough to cap cost and stay within cluster capacity, so the HPA does not scale excessively in either direction.
Monitor and Adjust: Monitor HPA performance and adjust configuration as needed to ensure it operates efficiently.
Check Resource Utilization: Ensure your application is not experiencing resource contention. Use tools like Prometheus to monitor resource usage.
Test and Validate: Before deploying to production, test the HPA configuration in a development or staging environment to ensure it functions as expected.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)