1. Concept
Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically adjusts the number of pod replicas in a workload such as a Deployment, based on observed metrics such as CPU utilization. It helps your application keep enough capacity for the current workload, preventing performance degradation during peak usage periods and saving costs during low-demand periods.
2. Essential parts
- Metrics Server: HPA requires the Metrics Server to collect and expose resource usage data for pods.
- Deployment/ReplicaSet: HPA targets these objects and scales their number of pods up or down.
3. How to Use HPA
3.1 Installing Metrics Server
Before you can use HPA, make sure the Metrics Server is installed and running in your Kubernetes cluster. The Metrics Server collects resource usage metrics from your cluster nodes and pods, and HPA relies on this data to make scaling decisions.
kubectl apply -f https:
3.2 Creating an HPA
Here's an example of how to configure an HPA for a deployment. Let's say you have a deployment named "my-app" and you want to automatically scale the number of pods based on CPU utilization.
Create deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app-image
        resources:
          requests:
            cpu: "100m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
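The CPU request in the manifest above matters for autoscaling: for a Utilization target, HPA measures CPU usage as a percentage of the requested CPU, not of the limit. The following Python sketch illustrates that arithmetic. The helper names and the usage samples are hypothetical, and it assumes every pod has the same request, in which case averaging per-pod usage gives the same result as comparing total usage to total requests.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "100m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def average_cpu_utilization(usage_millicores: list, request: str) -> float:
    """Average CPU usage across pods, as a percentage of the per-pod request."""
    request_m = cpu_to_millicores(request)
    average_usage = sum(usage_millicores) / len(usage_millicores)
    return 100.0 * average_usage / request_m


# Hypothetical usage samples for three pods, in millicores.
samples = [80, 120, 100]
utilization = average_cpu_utilization(samples, "100m")
print(f"{utilization:.0f}% of requested CPU")  # prints "100% of requested CPU"
```

With a request of "100m", a pod using 100 millicores is at 100% utilization even though it is well below its 500m limit.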
Create HPA configuration:
# autoscaling/v2 is stable since Kubernetes 1.23; v2beta2 was removed in 1.26
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Explanation:
- scaleTargetRef: The object whose number of pods HPA will adjust. In this case, it's the Deployment named "my-app".
- minReplicas: The minimum number of pods.
- maxReplicas: The maximum number of pods.
- metrics: The type and threshold of the metric used to adjust the number of pods. In this example, HPA will adjust based on CPU utilization with a target average of 50%.
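To make the 50% target concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: the desired count is the current count scaled by the ratio of observed to target utilization, rounded up, then kept within minReplicas and maxReplicas. The function name is hypothetical, and the 0.1 tolerance assumes the cluster uses the documented default; the real controller also applies stabilization windows and handles missing metrics, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for a single Utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Values mirror the example: target 50%, minReplicas 1, maxReplicas 10.
print(desired_replicas(2, 100, 50, 1, 10))  # 4: load is double the target
print(desired_replicas(3, 52, 50, 1, 10))   # 3: within tolerance, no change
print(desired_replicas(5, 200, 50, 1, 10))  # 10: capped at maxReplicas
```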
Apply the configuration:
kubectl apply -f hpa.yaml
4. Key Considerations When Using HPA
- Ensure Metrics Server Accuracy: HPA relies on the Metrics Server to function. Verify that the Metrics Server is running and providing accurate data.
- Optimize Configuration: Set appropriate values for minReplicas and maxReplicas to avoid excessive scaling up or down.
- Monitor and Adjust: Monitor HPA performance and adjust configuration as needed to ensure it operates efficiently.
- Check Resource Utilization: Ensure your application is not experiencing resource contention. Use tools like Prometheus to monitor resource usage.
- Test and Validate: Before deploying to production, test the HPA configuration in a development or staging environment to ensure it functions as expected.
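Some of the checks above can be automated before a manifest reaches the cluster. The Python sketch below is one possible sanity check, not an official tool: the function name is hypothetical, and it works on plain dictionaries that mirror the YAML from section 3.2. It flags a maxReplicas below minReplicas, a non-positive utilization target, and containers that lack a request for the scaled resource, since utilization is computed relative to requests.

```python
def check_hpa_settings(hpa_spec, containers):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    min_replicas = hpa_spec.get("minReplicas", 1)  # Kubernetes defaults this to 1
    max_replicas = hpa_spec["maxReplicas"]
    if max_replicas < min_replicas:
        problems.append("maxReplicas is smaller than minReplicas")
    for metric in hpa_spec.get("metrics", []):
        if metric.get("type") != "Resource":
            continue
        resource = metric["resource"]
        target = resource["target"]
        if target.get("type") != "Utilization":
            continue
        if target.get("averageUtilization", 0) <= 0:
            problems.append(f"{resource['name']}: averageUtilization must be positive")
        # Utilization targets need a matching request on every container.
        for container in containers:
            requests = container.get("resources", {}).get("requests", {})
            if resource["name"] not in requests:
                problems.append(
                    f"container {container['name']} has no {resource['name']} request")
    return problems


# Dictionaries mirroring the example HPA and Deployment from section 3.2.
hpa_spec = {
    "minReplicas": 1,
    "maxReplicas": 10,
    "metrics": [{"type": "Resource",
                 "resource": {"name": "cpu",
                              "target": {"type": "Utilization",
                                         "averageUtilization": 50}}}],
}
containers = [{"name": "my-app-container",
               "resources": {"requests": {"cpu": "100m", "memory": "256Mi"}}}]
print(check_hpa_settings(hpa_spec, containers))  # prints []
```

A check like this only catches configuration mistakes; it does not replace testing the HPA under load in a staging environment.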