🔧Troubleshooting Bootstraping

This guide will help you debug common bootstraping issues.

Issues appearing here only happen on bootstrap. Other issues happening at runtime are not documented here.

Post-install job is failing

The post-install job is failing.

The issue is a configuration issue, or a network issue.

Detection

bash

kubectl get jobs --all-namespaces

With jobs looking like:

Example

NAMESPACE       NAME                              STATUS     COMPLETIONS   DURATION   AGE
<namespace>     toucan-stack-postinstall          Failed     0/1           1m         1d

Possible causes

The post-install job initializes:

The admin account.
The permissions of the admin account.
The permissions of the backend.

Therefore, the issues could be linked to:

The reachability of the authentication server Curity.
The reachability of the authorization server SpiceDB.
A programming error.

The first issue is very likely caused by configuration or network issue.

The second issue is very unlikely because the authorization server can only be accessible internally. Meaning, the Helm Chart should have already been configured for the authorization server to be reachable.

Investigation

bash

kubectl logs -n demo 'toucan-stack-postinstall-...' -c <create-admin-account/create-relationship/init-workspace>

Logs should indicates the errors. Search the errors on Google.

Issues could be linked to (but not limited to):

A network issue: DNS configuration.
A configuration issue:
- Check curity.runtime.hostname and global.hostname
- Check if TLS is properly configured:
  - kubectl describe ingress -n <namespace> toucan-stack-curity-runtime
  - kubectl describe ingress -n <namespace> toucan-stack-tucana
  - If using cert-manager:
    kubectl get certificates -n <namespace>
    kubectl describe certificate -n <namespace> <certificate-name>
    kubectl get certificaterequests -n <namespace>
    kubectl get challenges -n <namespace>
    kubectl get issuer -n <namespace>
A programming error.

Mitigation

Fix the configuration or the DNS configuration, or contact the support for help.

Vault is in `CrashLoopBackOff` state

The vault container is crashing repeatedly, stuck in crash loop.

The issue is a configuration issue, or a network issue.

Detection

bash

kubectl get pods --all-namespaces
kubectl get events --all-namespaces

With events looking like:

Example

Events:
  Type     Reason               Age                 From               Message
  ----     ------               ----                ----               -------
  Warning  FailedPostStartHook  35s (x3 over 117s)  kubelet            PostStartHook failed
  Normal   Killing              35s (x3 over 117s)  kubelet            FailedPostStartHook
  Warning  BackOff              16s (x7 over 86s)   kubelet            Back-off restarting failed container server in pod toucan-stack-vault-server-0_demo(7b241ae3-4338-4dbb-9017-a74b22d6bfb4)

Possible causes

On bootstrap, if the vault container is crashing, it's very likely linked to the PostStart lifecycle Hook.

The lifecycle Hook is responsible for:

Fetching or inserting the vault bootstraping secret.
Install the vault oauthapp plugin.
Connect to the auth server to setup the oauthapp client.
Setup the KV2 secrets engine.

Step 1 can fail if RBAC has been disabled: configuration issue.

Step 2 can fail if the plugin cannot be downloaded: network issue.

Step 3 can fail if the auth server cannot be reached: configuration or network issue.

Lastly, it can also fail due to implementation errors.

Investigation

bash

# Check the log of the lifecycle hook.
# NOTE: It might not show anything.
kubectl logs toucan-stack-vault-server-0 -c tail-vault-init -n <namespace>
kubectl describe pod toucan-stack-vault-server-0 -n <namespace>

Mitigation

Failed at step 1

Check if the RBAC is properly configured:

A Role and RoleBinding both named toucan-stack-vault-server-secret-manager (or similar) must be present.
A ServiceAccount named toucan-stack-vault-server (or similar) must be present.

Failed at step 2

Check your network and make sure that the plugin can be downloaded from the machine.

Find the plugin URL and checksum at vault.oauthapp.pluginURL and vault.oauthapp.checksum in the values.yaml file:

bash

helm get values -a toucan-stack -n <namespace> | grep pluginURL # toucan-stack is the release name

If you are working in air-gapped, you must host this file somewhere and replace the vault.oauthapp.pluginURL.

Failed at step 3

The authentication server cannot be reached.

Make sure the hostname defined at curity.runtime.hostname is accessible from outside Kubernetes:

Make sure the DNS is properly configured to redirect the hostname to the LoadBalancer IP:

{% code title="bash" overflow="wrap" %}

# DNS
dig @<DNS_SERVER_IP> <curity.runtime.hostname>
# ANSWER SECTION must look like:
# auth.example.com.    30      IN      A    192.168.0.100
# with "192.168.0.100" replaced by the LoadBalancer IP.

# Ingress Controller
kubectl get svc ingress-nginx-controller -n ingress-nginx
# Must look like:
# NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
# ingress-nginx-controller   LoadBalancer   10.43.103.131   192.168.0.100   8443/TCP,4466/TCP,4465/TCP   68d
# with "192.168.0.100" replaced by the LoadBalancer IP.

# Ingress rule
kubectl get ingress toucan-stack-curity-runtime -n <namespace>
# Must look like:
# NAME                          CLASS   HOSTS              ADDRESS          PORTS     AGE
# toucan-stack-curity-runtime   nginx   auth.example.com   192.168.50.128   80, 443   30d
# with "CLASS" being the NGINX controller, "HOSTS" matching the DNS rule.

{% endcode %} - Make sure no firewall is blocking the connection:

The Kubernetes gateway (traffic FROM kubernetes GOING TO the external network) should not be blocked.

Unknown failures

Contact the support.

Last updated 3 months ago

Was this helpful?

Post-install job is failing

Detection

Possible causes

Investigation

Mitigation

Vault is in CrashLoopBackOff state

Detection

Possible causes

Investigation

Mitigation

Failed at step 1

Failed at step 2

Failed at step 3

Unknown failures

Vault is in `CrashLoopBackOff` state