⚙️ Tuning resources

In Kubernetes, as in Docker, containers have a limited amount of resources, controlled by cgroups.

Also, unlike in v2, the Helm Charts automatically tune the number of workers based on the amount of available resources. This is of course overridable; we'll cover that later.

CPU/Memory

How to set resource limits and requests

At the time of writing, in-place pod vertical scaling is not supported.

The Vertical Pod Autoscaler (VPA) will evict pods if the requested resources differ significantly from its new recommendation.

In Helm, every service has a resources section. Since we are using Bitnami's helpers, there is also a resourcesPreset parameter; you can find the preset values at bitnami/charts.

Following best practices, we do not set any limits or requests in the default values of the Helm Charts, as the right values depend heavily on the workload and hardware.

Here are the relevant parameters, with recommendations (in comments) based on a 3.7 GHz CPU (a PRO2-XS instance on Scaleway):

yaml: values.override.yaml
# The recommendation is to never use CPU limits, unless you plan to use the Guaranteed QoS class and allocate whole cores (e.g., multi-tenant environments with strict isolation).

# For nginx, you would set low memory (128Mi-256Mi) and cpu requests.
# Nginx is limited by the network, so you would load-balance it.
# Horizontal pod autoscaling is recommended based on memory or traffic.
nginx:
  resourcesPreset: 'none'
  resources: {}

  ## You would put:
  # resources:
  #   requests:
  #     cpu: '100m'
  #     memory: '128Mi'
  #   limits:
  #     memory: '256Mi'

  # HPA
  autoscaling:
    enabled: false
    minReplicas: ''
    maxReplicas: ''
    targetCPU: ''
    targetMemory: ''

# For laputa, you would set high memory (2.5Gi-12Gi) and cpu requests in whole cores (1, 2, or more).
# Memory limits should be set based on traffic, as laputa does heavy computations.
# Laputa is stateful, so it's not possible to enable horizontal pod autoscaling.
laputa:
  resourcesPreset: 'none'
  resources: {}
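
  ## You would put, for example (any values within the ranges above; adjust to your workload):
  # resources:
  #   requests:
  #     cpu: '1'
  #     memory: '2.5Gi'
  #   limits:
  #     memory: '12Gi'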

  autoscaling:
    vpa:
      enabled: false
      annotations: {}
      controlledResources: []
      maxAllowed: {}
      minAllowed: {}
      updatePolicy:
        updateMode: Auto

    ## You would put:
    # vpa:
    #   enabled: true
    #   controlledResources: ["cpu", "memory"]
    #   maxAllowed:
    #     cpu: '2'
    #     memory: '12Gi'
    #   minAllowed:
    #     cpu: '1'
    #     memory: '2.5Gi'

# For layout, you would set a low memory (256Mi-384Mi) and cpu requests (10m-100m).
# Memory limits can easily be guaranteed or slightly burstable.
# Horizontal pod autoscaling is recommended based on memory.
layout:
  resourcesPreset: 'none'
  resources: {}
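
  ## You would put, for example:
  # resources:
  #   requests:
  #     cpu: '100m'
  #     memory: '256Mi'
  #   limits:
  #     memory: '384Mi'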

  autoscaling:
    vpa:
      enabled: false
      annotations: {}
      controlledResources: []
      maxAllowed: {}
      minAllowed: {}
      updatePolicy:
        updateMode: Auto
    hpa:
      enabled: false
      minReplicas: ''
      maxReplicas: ''
      targetCPU: ''
      targetMemory: ''

# For dataset, you would set a medium memory (1.5Gi-3Gi) and low cpu requests (10m-100m).
dataset:
  resourcesPreset: 'none'
  resources: {}
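
  ## You would put, for example:
  # resources:
  #   requests:
  #     cpu: '100m'
  #     memory: '1.5Gi'
  #   limits:
  #     memory: '3Gi'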

  autoscaling:
    vpa:
      enabled: false
      annotations: {}
      controlledResources: []
      maxAllowed: {}
      minAllowed: {}
      updatePolicy:
        updateMode: Auto
    hpa:
      enabled: false
      minReplicas: ''
      maxReplicas: ''
      targetCPU: ''
      targetMemory: ''

# For impersonate, you would set a very low memory (16Mi-64Mi) and cpu requests (10m-50m).
# Memory limits can be slightly burstable.
impersonate:
  resourcesPreset: 'none'
  resources: {}
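
  ## You would put, for example:
  # resources:
  #   requests:
  #     cpu: '10m'
  #     memory: '32Mi'
  #   limits:
  #     memory: '64Mi'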

  autoscaling:
    vpa:
      enabled: false
      annotations: {}
      controlledResources: []
      maxAllowed: {}
      minAllowed: {}
      updatePolicy:
        updateMode: Auto
    hpa:
      enabled: false
      minReplicas: ''
      maxReplicas: ''
      targetCPU: ''
      targetMemory: ''

# For spicedb, the authorization service, you would set a low memory (64Mi-128Mi) and cpu requests (10m-50m).
# It is important for this service to have very low latency.
# Replication is recommended, and we suggest placing spicedb alongside the layout and laputa services.
# Horizontal pod autoscaling is strongly recommended, based on average CPU usage or traffic.
spicedb:
  resourcesPreset: 'none'
  resources: {}
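
  ## You would put, for example:
  # resources:
  #   requests:
  #     cpu: '50m'
  #     memory: '64Mi'
  #   limits:
  #     memory: '128Mi'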

  autoscaling:
    vpa:
      enabled: false
      annotations: {}
      controlledResources: []
      maxAllowed: {}
      minAllowed: {}
      updatePolicy:
        updateMode: Auto
    hpa:
      enabled: false
      minReplicas: ''
      maxReplicas: ''
      targetCPU: ''
      targetMemory: ''

# For vault, the secret management service, you would set a medium memory (128Mi-512Mi) and cpu requests (100m-500m).
vault:
  server:
    resourcesPreset: 'none'
    resources: {}
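
    ## You would put, for example:
    # resources:
    #   requests:
    #     cpu: '250m'
    #     memory: '256Mi'
    #   limits:
    #     memory: '512Mi'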

    # HPA
    autoscaling:
      enabled: false
      minReplicas: ''
      maxReplicas: ''
      targetCPU: ''
      targetMemory: ''

# Curity, the authentication service, has two components: the admin and the runtime.
# For the admin, you would set memory requests slightly higher than the -Xmx option (2Gi) and low cpu requests (10m-100m).
# For the runtime, you would set memory requests slightly higher than the -Xmx option (2Gi) and medium cpu requests (100m-500m).
# Because Java has a memory pool, you can set the limit equal to the request.
# Horizontal pod autoscaling on the runtime is recommended based on average CPU usage, or traffic.
curity:
  admin:
    resourcesPreset: 'none'
    resources: {}
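
    ## You would put, for example (memory slightly above the -Xmx value set below, limit equal to the request):
    # resources:
    #   requests:
    #     cpu: '100m'
    #     memory: '2.5Gi'
    #   limits:
    #     memory: '2.5Gi'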

    extraEnvVars:
      # You have to restore the original .extraEnvVars to be able to add new env vars.
      - name: PG_PASSWORD
        valueFrom:
          secretKeyRef:
            name: '{{ include "toucan-stack.database.secretName" . }}'
            key: '{{ include "toucan-stack.database.keyName" . }}'

      - name: JAVA_OPTS
        # -Xms is the starting memory.
        # -Xmx is the maximum memory, which should match the memory limit or request.
        value: -Xms256m -Xmx2g

    autoscaling:
      vpa:
        enabled: false
        annotations: {}
        controlledResources: []
        maxAllowed: {}
        minAllowed: {}
        updatePolicy:
          updateMode: Auto

  runtime:
    resourcesPreset: 'none'
    resources: {}
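
    ## You would put, for example (memory slightly above the -Xmx value set below, limit equal to the request):
    # resources:
    #   requests:
    #     cpu: '500m'
    #     memory: '2.5Gi'
    #   limits:
    #     memory: '2.5Gi'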

    extraEnvVars:
      # You have to restore the original .extraEnvVars to be able to add new env vars.
      - name: PG_PASSWORD
        valueFrom:
          secretKeyRef:
            name: '{{ include "toucan-stack.database.secretName" . }}'
            key: '{{ include "toucan-stack.database.keyName" . }}'

      - name: JAVA_OPTS
        # -Xms is the starting memory.
        # -Xmx is the maximum memory, which should match the memory limit or request.
        value: -Xms256m -Xmx2g

    autoscaling:
      vpa:
        enabled: false
        annotations: {}
        controlledResources: []
        maxAllowed: {}
        minAllowed: {}
        updatePolicy:
          updateMode: Auto
      hpa:
        enabled: false
        minReplicas: ''
        maxReplicas: ''
        targetCPU: ''
        targetMemory: ''

# For gotenberg, the screenshot service, you would set a low memory (64Mi-128Mi) and cpu requests (10m-50m).
# Gotenberg works by receiving jobs, so the Burstable QoS class is heavily recommended.
# Set the memory limit up to 2GiB. No CPU limits.
# Horizontal pod autoscaling is recommended based on queue length.
gotenberg:
  # gotenberg doesn't have a resourcesPreset.
  resources: {}
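
  ## You would put, for example (Burstable: memory limit well above the request, no CPU limit):
  # resources:
  #   requests:
  #     cpu: '50m'
  #     memory: '128Mi'
  #   limits:
  #     memory: '2Gi'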

  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 100
    behavior: {}
    extraMetrics: []
    targetCPUUtilizationPercentage: 80
    # targetMemoryUtilizationPercentage: 80

# For postgresql, you would set a low memory (128Mi-1Gi) and cpu requests (100m-500m).
# As this service is critical, we recommend setting up alerts with a threshold of 1Gi and a very high memory limit like 2Gi.
# As the database is stateful, horizontal pod autoscaling is not possible.
postgresql:
  primary:
    resourcesPreset: 'none'
    resources: {}
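
    ## You would put, for example:
    # resources:
    #   requests:
    #     cpu: '500m'
    #     memory: '512Mi'
    #   limits:
    #     memory: '2Gi'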

# For mongodb, you would set a high memory (256Mi-4Gi) and cpu requests (200m or 1 core) depending on your dataset.
# From our experience, mongodb eats a lot of memory; in combination with high memory requests, we recommend setting a memory limit of 4Gi or more.
# As the database is stateful, horizontal pod autoscaling is not possible.
mongodb:
  resourcesPreset: 'none'
  resources: {}
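
  ## You would put, for example (depending on your dataset):
  # resources:
  #   requests:
  #     cpu: '1'
  #     memory: '2Gi'
  #   limits:
  #     memory: '4Gi'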

  configuration: |-
    storage:
      wiredTiger:
        engineConfig:
          # 0.25 is the minimum. Recommendation is 50% of (RAM - 1GiB).
          cacheSizeGB: <value>
    # OR:
    hostInfo:
      system:
        # Set this to the value of the Kubernetes memory limit, in MB.
        # It will be used to compute the WiredTiger cache size.
        memLimitMB: <value>

## REDIS ##
# For redis, you would set a low memory (64Mi-128Mi) and cpu requests (50m-100m).
# We recommend setting memory limits as you wish.
# As the database is stateful, horizontal pod autoscaling is not possible.

layout-redis:
  resourcesPreset: 'none'
  resources: {}
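
  ## You would put, for example (the same idea applies to the other Redis instances):
  # resources:
  #   requests:
  #     cpu: '100m'
  #     memory: '64Mi'
  #   limits:
  #     memory: '128Mi'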

laputa-redis:
  resourcesPreset: 'none'
  resources: {}

impersonate-redis:
  resourcesPreset: 'none'
  resources: {}

Understanding the relationship between Kubernetes resources and physical resources

CPU (in Kubernetes) represents the number of CPU cores allocated to a container. A value of 1 corresponds to one full physical core. The CPU resource determines the compute power available to the container. Kubernetes allows CPU resources to be shared via time-slicing; for instance, 0.1 or 100m means the container is allocated 10% of a single core's processing time.

Memory (in Kubernetes) refers to the amount of RAM allocated to a container, measured in bytes. For example, 1Gi represents 1 gibibyte (GiB) of RAM. This memory is used by the application to store variables, in-memory data structures, and buffers required during runtime. Unlike CPU, memory is not compressible or time-shared. If a container exceeds its memory limit, it will be terminated (OOMKilled).

Some services, like Curity, the authentication service, are memory-intensive but may not require a lot of CPU (since its job is simply to authenticate). Other services, like MongoDB, the database service, can be CPU-intensive because of heavy queries, caching, and compactions.
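
As a quick illustration of these units, here is a minimal (purely illustrative) resources block allocating a tenth of a core and half a gibibyte of RAM to a container:

yaml
resources:
  requests:
    cpu: '100m'     # 10% of one core's processing time
    memory: '512Mi' # 512 mebibytes of RAM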

Understanding the concepts

In Kubernetes, resource configuration is defined using requests and limits.

Requests guarantee the minimum amount of CPU or memory that a container requires. This value is used by the Kubernetes scheduler to determine on which node to place the pod. It is purely a scheduling concept.

Limits define the maximum amount of resources a container is allowed to consume. This is enforced at runtime by the Linux kernel using cgroups.

Best practices are:

  • Set requests to reflect the average usage of your application.

  • Set limits to reflect the expected peak usage, or as a safeguard.

To tune resources accurately, it is recommended to use a monitoring tool such as Prometheus.

However, limits behave differently depending on the resource type:

  • For CPU, when a container reaches its limit, it is throttled. Execution slows down, but the container continues running.

  • For memory, when a container exceeds its limit, it is terminated by the system (OOMKilled).

Understanding these differences is essential because they directly influence how you configure and tune resource usage for your applications.

Tuning strategies

Kubernetes defines three common Quality of Service (QoS) classes based on how CPU and memory requests and limits are configured:

  • BestEffort: Neither requests nor limits are set.

  • Guaranteed: CPU and memory limits are equal to their respective requests.

  • Burstable: Requests are set, and limits are higher than the requests.
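
To make these classes concrete, here is what each one looks like on a container's resources block (the numbers are placeholders, not recommendations for any particular service):

yaml
# Guaranteed: limits equal requests for both CPU and memory.
resources:
  requests:
    cpu: '1'
    memory: '1Gi'
  limits:
    cpu: '1'
    memory: '1Gi'

# Burstable: requests are set, the memory limit is higher than the request (no CPU limit).
resources:
  requests:
    cpu: '100m'
    memory: '256Mi'
  limits:
    memory: '512Mi'

# BestEffort: neither requests nor limits are set.
resources: {}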

Recommended best practices:

  • Always set requests to ensure that the service receives a guaranteed minimum amount of resources. This also enables the Kubernetes scheduler to place the pod on a node with sufficient capacity.

  • Always set a memory limit to prevent the container from consuming excessive memory, which could affect other services or destabilize the host.

    • Use the Burstable class if the service has occasional memory spikes, or use Guaranteed if memory usage is consistent and predictable.

  • Rarely set CPU limits, or only use whole numbers when you do. CPU is a compressible resource, meaning containers are throttled rather than killed when limits are hit. In many cases, omitting CPU limits can improve overall scheduling and performance by giving the Linux CPU scheduler more flexibility.

Here's an example of a resource profile fetched from Prometheus:

(left) CPU profile. (right) Memory profile.

A valid strategy here is:

yaml
resources:
  requests:
    cpu: '150m'
    memory: '128Mi'
  limits:
    # no cpu limits
    memory: '256Mi'

Configuring the threads/workers/connection pool of the components

Every program uses threads to implement parallelism or concurrency. It is often best practice to use 1 thread/worker per physical thread.

However, Laputa doesn't use asynchronous workers, so the number of workers is also linked to the number of active connections.

Here's a list of parameters that you can tune:

yaml: values.override.yaml
laputa:
  config:
    common:
      # We recommend setting this based on the average number of active connections (theoretically 5 minimum, but we recommend at least 15)
      TOUCAN_GUNICORN_WORKERS: <SET NUMBER OF WORKERS>
      # We recommend setting this based on the number of cores (2 minimum)
      TOUCAN_CELERY_MAX_WORKERS: <SET NUMBER OF WORKERS>
      # We recommend setting this based on the number of cores (10 minimum)
      TOUCAN_CELERY_QUICK_MAX_WORKERS: <SET NUMBER OF WORKERS>
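
      ## For example, on a 4-core node serving around 15 concurrent users (adjust to your workload):
      # TOUCAN_GUNICORN_WORKERS: 15
      # TOUCAN_CELERY_MAX_WORKERS: 4
      # TOUCAN_CELERY_QUICK_MAX_WORKERS: 10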

dataset:
  gunicornConfig: |-
    # We recommend setting this based on the number of cores (5 minimum)
    workers = <SET NUMBER OF WORKERS>
    worker_class = 'uvicorn.workers.UvicornWorker'
    keepalive = 10
    bind = '0.0.0.0:{{ .Values.dataset.containerPorts.http }}'
    wsgi_app = 'dataset_service.main:app'
    max_requests_jitter = 200

# layout is single-threaded and uses an event loop. No tuning is possible at the moment.
layout:

impersonate:
  extraEnvVars:
    - name: GOMAXPROCS
      # Theoretically, we recommend setting this based on the number of cores, which Go will use by default.
      # This service is almost never overloaded, so you could use an even smaller number.
      value: <SET NUMBER OF WORKERS>
    # OR, easily set based on limits.cpu:
    - name: GOMAXPROCS
      valueFrom:
        resourceFieldRef:
          resource: limits.cpu
          divisor: '1'

spicedb:
  ## There is no reason to tune SpiceDB, as it is aware of Kubernetes limits:
  ## https://github.com/authzed/spicedb/issues/498
  extraEnvVars:
    # You have to restore the original .extraEnvVars to be able to add new env vars.
    - name: PG_PASSWORD
      valueFrom:
        secretKeyRef:
          name: '{{ include "toucan-stack.database.secretName" . }}'
          key: '{{ include "toucan-stack.database.keyName" . }}'

    - name: GOMAXPROCS
      # Theoretically, we recommend setting this based on the number of cores, which Go will use by default.
      value: <SET NUMBER OF WORKERS>
    # OR, easily set based on limits.cpu:
    - name: GOMAXPROCS
      valueFrom:
        resourceFieldRef:
          resource: limits.cpu
          divisor: '1'

vault:
  server:
    extraEnvVars:
      # You have to restore the original .extraEnvVars to be able to add new env vars.
      - name: ADMIN_CLIENT_SECRET
        valueFrom:
          secretKeyRef:
            name: '{{- include "toucan-stack.curity.oauth2.secretName" . -}}'
            key: curity-toucan-admin-management-client-secret

      - name: MICRO_SERVICE_CLIENT_SECRET
        valueFrom:
          secretKeyRef:
            name: '{{- include "toucan-stack.curity.oauth2.secretName" . -}}'
            key: curity-toucan-micro-service-client-secret

      - name: TOUCAN_VAULT_TOKEN
        valueFrom:
          secretKeyRef:
            name: '{{- include "toucan-stack.vault.oauthapp.secretName" . -}}'
            key: vault-token

      # Then, add the new env var to tune:
      - name: GOMAXPROCS
        # Theoretically, we recommend setting this based on the number of cores, which Go will use by default.
        value: <SET NUMBER OF WORKERS>
      # OR, easily set based on limits.cpu:
      - name: GOMAXPROCS
        valueFrom:
          resourceFieldRef:
            resource: limits.cpu
            divisor: '1'

# We do not offer any options for tuning Curity at the moment. This is planned for the future.
curity:

# Please refer to: https://github.com/bitnami/charts/tree/main/bitnami/postgresql/
# And: https://www.postgresql.org/docs/current/runtime-config.html
postgresql:
  primary:
    configuration: |-
      max_connections = 100
      shared_buffers = 256MB
