VPA

  • Continuously monitors workloads and adjusts CPU and memory based on real-time usage.
  • Dynamically adjusts pod CPU and memory to meet demand.
  • Ensures applications automatically get the resources they require.
  • Helps avoid slowdowns and crashes, and reduces waste.

Recommender:

  • Monitors resource usage.
  • Calculates optimal allocation.
  • Analyzes historical metrics, OOM events, and the specs of the deployments targeted by the VPA.

Updater:

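  • Evicts pods whose resources need to be updated.
  • Applies the Recommender’s suggestions by letting the evicted pods be recreated with the updated resources.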

Admission Controller:

  • Adjusts CPU and memory on the new pods created by an update request before they start.
  • Validates pod status.
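The interplay of the three components can be sketched as a toy loop. This is only an illustration under stated assumptions, not the VPA implementation: the `Resources` type, the peak-based `recommend`, the 10% `tolerance`, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


def recommend(cpu_samples, memory_samples):
    """Recommender stand-in: size for the peak of the observed samples (assumption)."""
    return Resources(max(cpu_samples), max(memory_samples))


def should_evict(current, target, tolerance=0.10):
    """Updater stand-in: evict when either resource deviates from the target
    by more than `tolerance` (a hypothetical threshold)."""
    def deviates(cur, tgt):
        return abs(cur - tgt) / tgt > tolerance
    return (deviates(current.cpu_millicores, target.cpu_millicores)
            or deviates(current.memory_mib, target.memory_mib))


def admit(target):
    """Admission-controller stand-in: the replacement pod starts with the target resources."""
    return Resources(target.cpu_millicores, target.memory_mib)


running = Resources(cpu_millicores=100, memory_mib=128)
target = recommend([120, 250, 180], [200, 310, 290])
if should_evict(running, target):
    running = admit(target)  # pod is recreated with the recommended resources
print(running)
```

The point of the split is that no single component both decides and mutates: the Recommender only computes, the Updater only evicts, and the new values are applied when the replacement pod is admitted.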

Vertical Pod Autoscaler CRD:

  • Monitors container CPU.
  • Monitors container memory.
  • Adjusts resources over time.

Vertical Pod Autoscaler Checkpoint CRD:

  • This custom resource checkpoints what the VPA does.
  • Tracks historical container CPU and memory.
  • Tracks performance and usage, which is valuable because it informs the VPA’s decision making.
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: NAME
spec:
  recommenders:
    - name: default # Change if using custom one
      config: # Optional config map for tweaked recommender params
        policies:
          cpu:
            containerAggregation:
              percentile: VALUE # [0.1, 1.0], use Nth percentile CPU usage for recommendations
            usageAggregation:
              mode: SampleMean | SampleMax | Percentile
          memory:
            containerAggregation:
              percentile: VALUE # [0.1, 1.0], use Nth percentile memory usage for recommendations
            usageAggregation:
              mode: SampleMean | SampleMax | Percentile
      disabled: false | true # Boolean, disable this recommender if true
      ...
    ...
  ...
  targetRef:
    apiVersion: apps/v1 | apps/v1 | v1 | apps/v1 # In the same order as the kinds below
    kind: Deployment | ReplicaSet | ReplicationController | StatefulSet
    name: RESOURCE_NAME
  updatePolicy:
    updateMode: Auto | Initial | Off | Recreate
    evictionRequirements:
      - resources: ["cpu", "memory"] # Evict if target is higher than requests
        changeRequirement: TargetHigherThanRequests
      - resources: ["cpu", "memory"] # Evict if target is lower than requests
        changeRequirement: TargetLowerThanRequests
  resourcePolicy:
    containerPolicies:
      - containerName: '*' # Container name to which this config applies, "*" means all containers
        minAllowed:
          cpu: VALUE
          memory: VALUE
        maxAllowed:
          cpu: VALUE
          memory: VALUE
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits | RequestsOnly # Default is RequestsAndLimits
        mode: Auto | Off # Controls whether the VPA actively manages (autosizes) the resource requests and limits of a container.
        ...
      ...
  ...
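The effect of minAllowed and maxAllowed in the manifest above can be pictured as clamping each recommended value into a range. The following is a minimal sketch of that idea, assuming plain integers for the values (CPU in millicores, memory in MiB); `clamp_recommendation` and the sample numbers are hypothetical and not part of any VPA API.

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    """Clamp each recommended value into [min_allowed, max_allowed].

    A resource with no bound configured on one side is left unbounded on that side.
    """
    clamped = {}
    for resource, value in recommended.items():
        low = min_allowed.get(resource, value)
        high = max_allowed.get(resource, value)
        clamped[resource] = min(max(value, low), high)
    return clamped


policy_min = {"cpu": 100, "memory": 256}     # hypothetical minAllowed
policy_max = {"cpu": 2000, "memory": 4096}   # hypothetical maxAllowed

# CPU is raised to the minimum, memory is capped at the maximum.
print(clamp_recommendation({"cpu": 50, "memory": 8192}, policy_min, policy_max))
# → {'cpu': 100, 'memory': 4096}
```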
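updatePolicy.updateMode
  • Initial: VPA assigns resources only at pod creation and does not change them during the pod’s lifetime.
  • Auto: VPA assigns resources at pod creation and updates them during the pod’s lifetime, evicting and rescheduling the pod when needed.
  • Recreate: VPA assigns resources at pod creation and updates running pods by evicting them so they are recreated with the new values.
  • Off: VPA does not change pod resources; it only computes recommendations and records them in the VPA object, which is useful for a dry run.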
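The update modes can be summarized as a small lookup of when the VPA acts. This is only a sketch: the `UPDATE_MODES` table and `vpa_actions` helper are hypothetical names, and treating Recreate the same as Auto here is an assumption made for illustration.

```python
# mode: (set resources at pod creation, update pods during their lifetime)
UPDATE_MODES = {
    "Off": (False, False),      # recommendations only, a dry run
    "Initial": (True, False),   # applied once, when the pod is created
    "Recreate": (True, True),   # assumption: behaves like Auto in this sketch
    "Auto": (True, True),       # applied at creation and on later changes
}


def vpa_actions(update_mode):
    """Return what the VPA does for a given updateMode value."""
    at_creation, during_lifetime = UPDATE_MODES[update_mode]
    return {
        "recommend": True,  # every mode still produces a recommendation
        "apply_at_creation": at_creation,
        "update_running_pods": during_lifetime,
    }


print(vpa_actions("Initial"))
```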
usageAggregation.mode
  • SampleMean:
    • Aggregates usage samples over time by calculating the average (mean) of resource usage. This produces a smooth estimate of typical usage.
    • Use when you want your recommendations to reflect typical average workload usage over time, avoiding overprovisioning from transient spikes.
  • SampleMax:
    • Aggregates usage samples by taking the maximum observed value in the usage timeframe. This captures peak utilization for conservative sizing.
    • Use when you want to allocate enough resources to sustain the peak loads seen in the sample window, trading some overprovisioning for fewer out-of-memory (OOM) kills and CPU throttling events.
  • Percentile:
    • Aggregates usage samples by calculating a specific percentile of resource usage over time (e.g., 90th percentile). This allows tuning recommendations to cover most usage spikes without overprovisioning.
    • Use when you want more customizable control to select a usage percentile appropriate for your workload, balancing between average and peak resource usage.
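The difference between the three aggregation modes is easiest to see on one set of samples. The sketch below is illustrative only: the `aggregate` function is hypothetical, and the nearest-rank percentile is an assumed definition, since the text does not say how the percentile is interpolated.

```python
import math
from statistics import mean


def aggregate(samples, mode, percentile=0.9):
    """Reduce a list of usage samples to one value according to `mode`."""
    if not samples:
        raise ValueError("need at least one usage sample")
    if mode == "SampleMean":
        return mean(samples)
    if mode == "SampleMax":
        return max(samples)
    if mode == "Percentile":
        # Nearest-rank percentile (assumption): the smallest sample such that
        # at least `percentile` of all samples are at or below it.
        ordered = sorted(samples)
        rank = max(1, math.ceil(percentile * len(ordered)))
        return ordered[rank - 1]
    raise ValueError(f"unknown mode: {mode}")


# Ten CPU samples in millicores, with a single transient spike to 900.
cpu = [100, 110, 120, 130, 140, 150, 160, 170, 180, 900]
for mode in ("SampleMean", "SampleMax", "Percentile"):
    print(mode, aggregate(cpu, mode))
```

On this data the mean is pulled up by the spike to 216, the maximum sizes for the spike itself (900), and the 90th percentile (180) covers normal usage while ignoring the single outlier.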
resourcePolicy.containerPolicies.controlledValues
  • RequestsAndLimits:
    • VPA updates both the resource requests and the limits on containers. The pod spec is updated with the VPA-recommended CPU and memory requests, and the corresponding limits are scaled with them so that the request-to-limit ratio from the original pod spec is preserved.
    • This is useful when you want VPA to fully control resource sizing, ensuring pods don’t surpass recommended limits while guaranteeing minimum requests.
    • It’s common for production workloads needing tight resource control and autoscaling safety.
  • RequestsOnly:
    • VPA updates only the resource requests but leaves the limits unchanged (as configured in the original pod spec).
    • This is useful when you want VPA to optimize requests for scheduling and resource allocation but retain manual control over upper limits to avoid unexpected pod behavior or resource spikes beyond predefined limits.
    • This setup is often preferred when running sensitive workloads where limits enforce strict resource caps.
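The two controlledValues settings can be compared with a short sketch. It assumes that under RequestsAndLimits the limit is scaled to keep the original request-to-limit ratio; `apply_recommendation` and the numbers are hypothetical, and the sketch does not handle a new request that exceeds an unchanged limit.

```python
def apply_recommendation(request, limit, recommended_request,
                         controlled_values="RequestsAndLimits"):
    """Return the (request, limit) pair a container would end up with."""
    if controlled_values == "RequestsOnly":
        # Only the request moves; the limit stays as written in the pod spec.
        return recommended_request, limit
    if controlled_values == "RequestsAndLimits":
        # Keep the original request-to-limit ratio (assumption stated above).
        ratio = limit / request
        return recommended_request, recommended_request * ratio
    raise ValueError(f"unknown controlledValues: {controlled_values}")


# Container declared with a 200m CPU request and a 400m limit; VPA recommends 300m.
print(apply_recommendation(200, 400, 300, "RequestsAndLimits"))  # → (300, 600.0)
print(apply_recommendation(200, 400, 300, "RequestsOnly"))       # → (300, 400)
```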
resourcePolicy.containerPolicies.mode
  • Auto: VPA automatically adjusts resource requests and limits for the container based on observed usage.
  • Off: VPA does not provide any recommendations or adjust resources for that container; it effectively ignores it.