VPA

Why VPA Is Needed

VPA continuously monitors workloads and adjusts pod CPU and memory to match real-time demand.
Applications get the resources they require automatically, without manual tuning.
This helps avoid slowdowns and crashes and reduces wasted resources.

Architecture

Recommender:

Monitors resource usage.
Calculates optimal allocation.
Analyzes historical metrics, OOM events, and VPA deployment specs.

Updater:

Evicts pods whose requests differ from the recommender’s suggestions.
Relies on the recreated pods to pick up the new values at admission.

Admission Controller:

Sets CPU and memory on new pods, including those recreated after an eviction, before they start.
Validates pod status.

VPA Modes

Setup

Vertical Pod Autoscaler CRD:

Monitors container CPU.
Monitors container memory.
Adjusts resources over time.

Vertical Pod Autoscaler Checkpoint CRD:

This custom resource checkpoints the VPA’s internal state so that it survives recommender restarts.
Tracks historical container CPU and memory.
Tracks performance and usage; it’s valuable because this history informs the recommender’s decisions.

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: NAME
spec:
  recommenders:
    - name: default # Change if using a custom recommender
      config: # Optional tweaked recommender params; not part of the upstream autoscaling.k8s.io/v1 schema, so verify your recommender supports it
        policies:
          cpu:
            containerAggregation:
              percentile: VALUE # [0.1, 1.0], use the Nth percentile of CPU usage for recommendations
            usageAggregation:
              mode: SampleMean | SampleMax | Percentile
          memory:
            containerAggregation:
              percentile: VALUE # [0.1, 1.0], use the Nth percentile of memory usage for recommendations
            usageAggregation:
              mode: SampleMean | SampleMax | Percentile
      disabled: false | true # Boolean, disable this recommender if true
    ...
    ...
    ...
  targetRef:
    apiVersion: apps/v1 | apps/v1 | v1 | apps/v1
    kind: Deployment | ReplicaSet | ReplicationController | StatefulSet
    name: RESOURCE_NAME
  updatePolicy:
    updateMode: Auto | Initial | Off | Recreate
    evictionRequirements:
      - resources: ["cpu", "memory"] # Evict if target is higher than requests
        changeRequirement: TargetHigherThanRequests
      - resources: ["cpu", "memory"] # Evict if target is lower than requests
        changeRequirement: TargetLowerThanRequests
  resourcePolicy:
    containerPolicies:
      - containerName: '*' # Container this policy applies to; "*" matches every container without its own entry
        minAllowed:
          cpu: VALUE
          memory: VALUE
        maxAllowed:
          cpu: VALUE
          memory: VALUE
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits | RequestsOnly # Default is RequestsAndLimits
        mode: Auto | Off # Controls whether the VPA actively manages (autosizes) the resource requests and limits of a container.
      ...
      ...
      ...

updatePolicy.updateMode

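Initial: VPA assigns resources only at pod creation and never changes them during the pod’s lifetime.
Recreate: VPA assigns resources at pod creation and updates running pods by evicting them when their requests differ significantly from the recommendation.
Auto: VPA assigns resources at pod creation and updates them during the pod’s lifetime, with options for eviction and rescheduling; in upstream VPA this has so far behaved like Recreate.
Off: VPA does not change pod resources; it only computes recommendations and records them in the VPA object’s status, which is useful for a dry run.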
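
As a minimal sketch of the dry-run case, the manifest below runs VPA in Off mode against a Deployment. The names web-vpa and web are hypothetical placeholders, not values from these notes.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # hypothetical VPA name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment to observe
  updatePolicy:
    # Quote the value: a bare Off can be parsed as the YAML boolean false,
    # which the API server rejects for this string field.
    updateMode: "Off"
```

Switching updateMode to Initial or Auto later keeps the same object and only changes how the recommendations are acted on.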

usageAggregation.mode

SampleMean:
- Aggregates usage samples over time by calculating the average (mean) of resource usage. This produces a smooth estimate of typical usage.
- Use when you want your recommendations to reflect typical average workload usage over time, avoiding overprovisioning from transient spikes.
SampleMax:
- Aggregates usage samples by taking the maximum observed value in the usage timeframe. This captures peak utilization for conservative sizing.
- Use when you want to allocate enough resources to sustain peak loads seen in the sample window, for more conservative sizing and fewer Out-Of-Memory or CPU throttling events.
Percentile:
- Aggregates usage samples by calculating a specific percentile of resource usage over time (e.g., 90th percentile). This allows tuning recommendations to cover most usage spikes without overprovisioning.
- Use when you want more customizable control to select a usage percentile appropriate for your workload, balancing between average and peak resource usage.

resourcePolicy.containerPolicies.controlledValues

RequestsAndLimits:
- VPA updates both the resource requests and the limits on containers. The pod receives the VPA-recommended CPU and memory requests, and the limits are scaled proportionally so that the request-to-limit ratio from the original pod spec is preserved.
- This is useful when you want VPA to fully control resource sizing, ensuring pods don’t surpass recommended limits while guaranteeing minimum requests.
- It’s common for production workloads needing tight resource control and autoscaling safety.
RequestsOnly:
- VPA updates only the resource requests but leaves the limits unchanged (as configured in the original pod spec).
- This is useful when you want VPA to optimize requests for scheduling and resource allocation but retain manual control over upper limits to avoid unexpected pod behavior or resource spikes beyond predefined limits.
- This setup is often preferred when running sensitive workloads where limits enforce strict resource caps.
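
The following sketch shows one way RequestsOnly might be combined with minAllowed and maxAllowed. The object names, the container name app, and every quantity are illustrative assumptions.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                          # hypothetical VPA name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                            # hypothetical Deployment
  resourcePolicy:
    containerPolicies:
      - containerName: app               # hypothetical container name
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsOnly   # limits stay as written in the pod spec
        minAllowed:                      # illustrative floor for recommendations
          cpu: 100m
          memory: 128Mi
        maxAllowed:                      # illustrative ceiling for recommendations
          cpu: "2"
          memory: 2Gi
```

With RequestsOnly, keeping maxAllowed at or below the container's manually set limits avoids a recommended request that exceeds the limit.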

resourcePolicy.containerPolicies.mode

Auto: VPA automatically adjusts resource requests and limits for the container based on observed usage.
Off: VPA does not provide any recommendations or adjust resources for that container; it effectively ignores it.
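
A short sketch of mixing both container modes in one VPA object follows. The sidecar name log-shipper and the other names are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa                      # hypothetical VPA name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                        # hypothetical Deployment
  resourcePolicy:
    containerPolicies:
      - containerName: "*"           # default for containers without their own entry
        mode: Auto
      - containerName: log-shipper   # hypothetical sidecar excluded from autoscaling
        mode: "Off"                  # quoted so YAML does not read it as boolean false
```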

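To check whether the recommender has produced a recommendation, inspect the status of the VPA object itself (for example with kubectl describe vpa NAME); the values appear under status.recommendation.containerRecommendations.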

Units and recommendations:

target:
- This is the resource request (CPU or memory) that VPA recommends for the container based on observed usage patterns.
- The VPA recommender calculates this value as the allocation needed to run the workload efficiently.
- It reflects what the container’s resource requests should ideally be, and it is the value VPA applies when it updates a pod.
lowerBound:
- The minimum resource amount recommended by VPA for the container.
- The target never falls below this value, even if the observed usage is low.
- Requests below it are likely to cause poor performance or application failure, so the updater treats such pods as candidates for eviction.
upperBound:
- The maximum resource amount recommended by VPA.
- Resources requested beyond this value are likely to be wasted, so the updater also treats pods above it as candidates for eviction.
- It is an estimate derived from usage, not a hard ceiling; a ceiling based on capacity, policy, or budget constraints is set with maxAllowed in the resource policy.
uncappedTarget:
- This is the raw version of the target recommendation, calculated before the resource policy constraints are applied.
- It reflects what the ideal target would be based purely on usage metrics and serves only as a status indication.
- The final target is derived by capping this value to the minAllowed and maxAllowed values of the container’s resource policy.
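
To make the four fields concrete, here is an illustrative status excerpt of the kind found under status.recommendation.containerRecommendations. Every number and the container name app are invented for illustration, and the example assumes a resource policy with minAllowed.cpu set to 100m.

```yaml
# Illustrative values only; assumes minAllowed.cpu: 100m in the resource policy.
status:
  recommendation:
    containerRecommendations:
      - containerName: app     # hypothetical container name
        lowerBound:
          cpu: 100m            # raised to the minAllowed floor
          memory: 200Mi
        target:
          cpu: 100m            # uncappedTarget (40m) raised to minAllowed
          memory: 256Mi        # within policy, so equal to uncappedTarget
        uncappedTarget:
          cpu: 40m             # what usage alone would suggest
          memory: 256Mi
        upperBound:
          cpu: 400m
          memory: 512Mi
```

A target that differs from uncappedTarget, as for CPU here, is a sign that the resource policy rather than observed usage is determining the recommendation.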