How to set up horizontal pod scaling on DigitalOcean
Horizontal Pod Autoscaler (HPA) on DigitalOcean Kubernetes automatically scales your application pods based on CPU, memory, or custom metrics. Enable the metrics server, create an HPA resource with target metrics, and Kubernetes will automatically add or remove pods to meet demand.
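At its core, the autoscaler compares the observed metric to the target and scales proportionally. A minimal sketch of that rule (the real controller adds tolerances and stabilization on top of this):

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    # HPA's core rule: scale in proportion to how far the observed
    # metric is from its target, rounding up.
    return math.ceil(current_replicas * (current_value / target_value))

# 2 pods averaging 140% CPU utilization against a 70% target -> 4 pods
print(desired_replicas(2, 140, 70))
```

So doubling the distance from the target roughly doubles the replica count, bounded by minReplicas and maxReplicas.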
Prerequisites
- Active DigitalOcean Kubernetes cluster
- kubectl configured for your DOKS cluster
- Basic understanding of Kubernetes pods and deployments
- Application already deployed to the cluster
Step-by-Step Instructions
Verify metrics server is enabled
```
kubectl get deployment metrics-server -n kube-system
```

If the deployment is not found, enable it in the DigitalOcean control panel under Kubernetes > Your Cluster > Settings > Add-ons.
Configure resource requests for your deployment
The HPA calculates utilization as a percentage of these requests, so they must be defined:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
```

Apply the changes:

```
kubectl apply -f deployment.yaml
```

Create the Horizontal Pod Autoscaler
hpa.yaml:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply the HPA:

```
kubectl apply -f hpa.yaml
```

Verify HPA status and configuration
```
kubectl get hpa my-app-hpa
```

View detailed HPA information:

```
kubectl describe hpa my-app-hpa
```

You should see current CPU utilization, target utilization, and current replica count. It may take a few minutes for metrics to appear.
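The utilization figure reported here is measured against the container's resource requests, not its limits, which is why the resource-requests step above is mandatory. A simplified sketch, ignoring averaging across pods:

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    # Utilization targets are evaluated against the pod's resource
    # *request*, not its limit -- hence requests must be defined.
    return 100.0 * usage_millicores / request_millicores

# A pod using 350m CPU with a 100m request reports 350% utilization
print(cpu_utilization_percent(350, 100))
```

This is why a pod can report well over 100% utilization while still being within its CPU limit.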
Configure memory-based scaling (optional)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

When multiple metrics are defined, the HPA computes a desired replica count for each metric and scales to the highest of them. Apply the updated configuration:

```
kubectl apply -f hpa.yaml
```

Test the autoscaling behavior
```
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
```

Inside the pod, generate requests:

```
while true; do wget -q -O- http://my-app-service/; done
```

Monitor the HPA response:

```
kubectl get hpa my-app-hpa --watch
```

Monitor and adjust scaling parameters
- Lower averageUtilization for more aggressive scaling
- Increase maxReplicas if you hit the limit during peak traffic
- Adjust minReplicas based on baseline traffic requirements

Update your HPA configuration and reapply:

```
kubectl apply -f hpa.yaml
```

Common Issues & Troubleshooting
HPA shows 'unknown' for current CPU utilization
Ensure the metrics server is running (kubectl get pods -n kube-system | grep metrics-server) and your deployment has CPU resource requests defined. Wait 2-3 minutes after deployment for metrics to populate.
Pods are not scaling up despite high CPU usage
Check if you've reached the maxReplicas limit and verify your DigitalOcean node pool has sufficient capacity. Use kubectl describe nodes to check resource availability and consider adding more nodes if needed.
Scaling happens too frequently (flapping)
Add stabilization windows to your HPA configuration:
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300
```

This prevents rapid scaling changes.
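Conceptually, during the scale-down window the controller acts on the highest replica recommendation it has seen recently, so a short dip in load does not immediately remove pods. A simplified sketch of the mechanism:

```python
def stabilized_scale_down(window_recommendations: list[int]) -> int:
    # Scale-down stabilization: act on the *highest* recommendation
    # observed inside the stabilization window, smoothing out dips.
    return max(window_recommendations)

# Recommendations over the window dipped briefly to 3, but the HPA holds at 8
print(stabilized_scale_down([8, 7, 3, 7, 8]))
```

Only when every recommendation in the window has dropped does the replica count actually fall.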
HPA cannot scale below minimum replicas during low traffic
This is expected behavior. If you want fewer replicas during off-peak hours, consider using Vertical Pod Autoscaler (VPA) alongside HPA or manually adjust minReplicas in your HPA configuration for different time periods.