๐Ÿ”งTroubleshooting Bootstraping

This guide will help you debug common bootstraping issues.

Issues appearing here only happen on bootstrap. Other issues happening at runtime are not documented here.

Post-install job is failing

The post-install job is failing.

The issue is a configuration issue, or a network issue.

Detection

bash
kubectl get jobs --all-namespaces

With jobs looking like:

Example
NAMESPACE       NAME                              STATUS     COMPLETIONS   DURATION   AGE
<namespace>     toucan-stack-postinstall          Failed     0/1           1m         1d

Possible causes

The post-install job initializes:

  • The admin account.

  • The permissions of the admin account.

  • The permissions of the backend.

Therefore, the issues could be linked to:

  • The reachability of the authentication server Curity.

  • The reachability of the authorization server SpiceDB.

  • A programming error.

The first issue is very likely caused by configuration or network issue.

The second issue is very unlikely because the authorization server can only be accessible internally. Meaning, the Helm Chart should have already been configured for the authorization server to be reachable.

Investigation

Logs should indicates the errors. Search the errors on Google.

Issues could be linked to (but not limited to):

  • A network issue: DNS configuration.

  • A configuration issue:

    • Check curity.runtime.hostname and global.hostname

    • Check if TLS is properly configured:

      • kubectl describe ingress -n <namespace> toucan-stack-curity-runtime

      • kubectl describe ingress -n <namespace> toucan-stack-tucana

      • If using cert-manager:

        • kubectl get certificates -n <namespace>

        • kubectl describe certificate -n <namespace> <certificate-name>

        • kubectl get certificaterequests -n <namespace>

        • kubectl get challenges -n <namespace>

        • kubectl get issuer -n <namespace>

  • A programming error.

Mitigation

Fix the configuration or the DNS configuration, or contact the support for help.

Vault is in CrashLoopBackOff state

The vault container is crashing repeatedly, stuck in crash loop.

The issue is a configuration issue, or a network issue.

Detection

With events looking like:

Possible causes

On bootstrap, if the vault container is crashing, it's very likely linked to the PostStart lifecycle Hook.

The lifecycle Hook is responsible for:

  1. Fetching or inserting the vault bootstraping secret.

  2. Install the vault oauthapp plugin.

  3. Connect to the auth server to setup the oauthapp client.

  4. Setup the KV2 secrets engine.

Step 1 can fail if RBAC has been disabled: configuration issue.

Step 2 can fail if the plugin cannot be downloaded: network issue.

Step 3 can fail if the auth server cannot be reached: configuration or network issue.

Lastly, it can also fail due to implementation errors.

Investigation

Mitigation

Failed at step 1

Check if the RBAC is properly configured:

  • A Role and RoleBinding both named toucan-stack-vault-server-secret-manager (or similar) must be present.

  • A ServiceAccount named toucan-stack-vault-server (or similar) must be present.

Failed at step 2

Check your network and make sure that the plugin can be downloaded from the machine.

Find the plugin URL and checksum at vault.oauthapp.pluginURL and vault.oauthapp.checksum in the values.yaml file:

If you are working in air-gapped, you must host this file somewhere and replace the vault.oauthapp.pluginURL.

Failed at step 3

The authentication server cannot be reached.

  • Make sure the hostname defined at curity.runtime.hostname is accessible from outside Kubernetes:

    • Make sure the DNS is properly configured to redirect the hostname to the LoadBalancer IP:

      {% code title="bash" overflow="wrap" %}

      {% endcode %} - Make sure no firewall is blocking the connection:

      • The Kubernetes gateway (traffic FROM kubernetes GOING TO the external network) should not be blocked.

Unknown failures

Contact the support.

Last updated

Was this helpful?