# Troubleshooting Bootstraping

This guide will help you debug common bootstraping issues.

Issues appearing here only happen on bootstrap. Other issues happening at runtime are not documented here.

## Post-install job is failing

The post-install job is failing.

The issue is a **configuration** issue, or a **network** issue.

### Detection

{% code title="bash" overflow="wrap" %}

```bash
kubectl get jobs --all-namespaces
```

{% endcode %}

With jobs looking like:

{% code title="Example" overflow="wrap" %}

```bash
NAMESPACE       NAME                              STATUS     COMPLETIONS   DURATION   AGE
<namespace>     toucan-stack-postinstall          Failed     0/1           1m         1d
```

{% endcode %}

### Possible causes

The post-install job initializes:

* The admin account.
* The permissions of the admin account.
* The permissions of the backend.

Therefore, the issues could be linked to:

* The reachability of the authentication server Curity.
* The reachability of the authorization server SpiceDB.
* A programming error.

The first issue is very likely caused by configuration or network issue.

The second issue is very unlikely because the authorization server can only be accessible internally. Meaning, the Helm Chart should have already been configured for the authorization server to be reachable.

### Investigation

{% code title="bash" overflow="wrap" %}

```bash
kubectl logs -n demo 'toucan-stack-postinstall-...' -c <create-admin-account/create-relationship/init-workspace>
```

{% endcode %}

Logs should indicates the errors. Search the errors on Google.

Issues could be linked to (but not limited to):

* A network issue: DNS configuration.
* A configuration issue:
  * Check `curity.runtime.hostname` and `global.hostname`
  * Check if TLS is properly configured:
    * `kubectl describe ingress -n <namespace> toucan-stack-curity-runtime`
    * `kubectl describe ingress -n <namespace> toucan-stack-tucana`
    * If using cert-manager:
      * `kubectl get certificates -n <namespace>`
      * `kubectl describe certificate -n <namespace> <certificate-name>`
      * `kubectl get certificaterequests -n <namespace>`
      * `kubectl get challenges -n <namespace>`
      * `kubectl get issuer -n <namespace>`
* A programming error.

### Mitigation

Fix the configuration or the DNS configuration, or contact the support for help.

## Vault is in `CrashLoopBackOff` state

The vault container is crashing repeatedly, stuck in crash loop.

The issue is a **configuration** issue, or a **network** issue.

### Detection

{% code title="bash" overflow="wrap" %}

```bash
kubectl get pods --all-namespaces
kubectl get events --all-namespaces
```

{% endcode %}

With events looking like:

{% code title="Example" overflow="wrap" %}

```bash
Events:
  Type     Reason               Age                 From               Message
  ----     ------               ----                ----               -------
  Warning  FailedPostStartHook  35s (x3 over 117s)  kubelet            PostStartHook failed
  Normal   Killing              35s (x3 over 117s)  kubelet            FailedPostStartHook
  Warning  BackOff              16s (x7 over 86s)   kubelet            Back-off restarting failed container server in pod toucan-stack-vault-server-0_demo(7b241ae3-4338-4dbb-9017-a74b22d6bfb4)
```

{% endcode %}

### Possible causes

On bootstrap, if the `vault` container is crashing, it's very likely linked to the `PostStart` lifecycle Hook.

The lifecycle Hook is responsible for:

1. Fetching or inserting the vault bootstraping secret.
2. Install the vault oauthapp plugin.
3. Connect to the auth server to setup the oauthapp client.
4. Setup the KV2 secrets engine.

Step 1 can fail if **RBAC has been disabled**: configuration issue.

Step 2 can fail if **the plugin cannot be downloaded**: network issue.

Step 3 can fail if **the auth server cannot be reached**: configuration or network issue.

Lastly, it can also fail due to implementation errors.

### Investigation

{% code title="bash" overflow="wrap" %}

```bash
# Check the log of the lifecycle hook.
# NOTE: It might not show anything.
kubectl logs toucan-stack-vault-server-0 -c tail-vault-init -n <namespace>
kubectl describe pod toucan-stack-vault-server-0 -n <namespace>
```

{% endcode %}

### Mitigation

#### Failed at step 1

Check if the RBAC is properly configured:

* A `Role` and `RoleBinding` both named `toucan-stack-vault-server-secret-manager` (or similar) must be present.
* A `ServiceAccount` named `toucan-stack-vault-server` (or similar) must be present.

#### Failed at step 2

Check your network and make sure that the plugin can be downloaded from the machine.

Find the plugin URL and checksum at `vault.oauthapp.pluginURL` and `vault.oauthapp.checksum` in the `values.yaml` file:

{% code title="bash" overflow="wrap" %}

```bash
helm get values -a toucan-stack -n <namespace> | grep pluginURL # toucan-stack is the release name
```

{% endcode %}

If you are working in air-gapped, you must host this file somewhere and replace the `vault.oauthapp.pluginURL`.

#### Failed at step 3

The authentication server cannot be reached.

* Make sure the hostname defined at `curity.runtime.hostname` is accessible from outside Kubernetes:
  * Make sure the DNS is properly configured to redirect the hostname to the `LoadBalancer` IP:

    {% code title="bash" overflow="wrap" %}

    ```bash
    # DNS
    dig @<DNS_SERVER_IP> <curity.runtime.hostname>
    # ANSWER SECTION must look like:
    # auth.example.com.    30      IN      A    192.168.0.100
    # with "192.168.0.100" replaced by the LoadBalancer IP.

    # Ingress Controller
    kubectl get svc ingress-nginx-controller -n ingress-nginx
    # Must look like:
    # NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
    # ingress-nginx-controller   LoadBalancer   10.43.103.131   192.168.0.100   8443/TCP,4466/TCP,4465/TCP   68d
    # with "192.168.0.100" replaced by the LoadBalancer IP.

    # Ingress rule
    kubectl get ingress toucan-stack-curity-runtime -n <namespace>
    # Must look like:
    # NAME                          CLASS   HOSTS              ADDRESS          PORTS     AGE
    # toucan-stack-curity-runtime   nginx   auth.example.com   192.168.50.128   80, 443   30d
    # with "CLASS" being the NGINX controller, "HOSTS" matching the DNS rule.
    ```

    {% endcode %} - Make sure no firewall is blocking the connection:

    * The Kubernetes gateway (traffic FROM kubernetes GOING TO the external network) should not be blocked.

#### Unknown failures

Contact the support.
