# 🔧 Troubleshooting Bootstrapping

This guide will help you debug common bootstrapping issues.

The issues documented here only happen on bootstrap. Issues happening at runtime are not covered here.
## Post-install job is failing

The post-install job fails to complete. This is typically a configuration issue or a network issue.
### Detection

```shell
kubectl get jobs --all-namespaces
```

With jobs looking like:

```
NAMESPACE     NAME                       STATUS   COMPLETIONS   DURATION   AGE
<namespace>   toucan-stack-postinstall   Failed   0/1           1m         1d
```
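To surface only the failed jobs in a busy cluster, a jsonpath filter can help (a sketch; the exact filter syntax depends on your kubectl version):

```shell
# Print "namespace<TAB>name" for every job whose status reports at least one failure.
kubectl get jobs --all-namespaces \
  -o jsonpath='{range .items[?(@.status.failed>0)]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'
```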
### Possible causes

The post-install job initializes:

- The admin account.
- The permissions of the admin account.
- The permissions of the backend.

Therefore, the issues could be linked to:

1. The reachability of the authentication server, Curity.
2. The reachability of the authorization server, SpiceDB.
3. A programming error.

The first issue is very likely caused by a configuration or network issue. The second issue is very unlikely, because the authorization server is only accessible internally: the Helm chart should already be configured so that the authorization server is reachable.
### Investigation

```shell
kubectl logs -n demo 'toucan-stack-postinstall-...' -c <create-admin-account/create-relationship/init-workspace>
```

Logs should indicate the errors. Search for the errors online.
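If you do not know the exact pod name, it can be resolved through the `job-name` label that Kubernetes sets on job pods (a sketch; the label value assumes the job name shown above):

```shell
# Resolve the post-install pod name from the job-name label,
# then dump logs from all of its containers (init containers included).
POD=$(kubectl get pods -n <namespace> -l job-name=toucan-stack-postinstall \
  -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n <namespace> "$POD" --all-containers --prefix
```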
Issues could be linked to (but not limited to):

- A network issue: DNS configuration.
- A configuration issue:
  - Check `curity.runtime.hostname` and `global.hostname`.
  - Check if TLS is properly configured:

    ```shell
    kubectl describe ingress -n <namespace> toucan-stack-curity-runtime
    kubectl describe ingress -n <namespace> toucan-stack-tucana
    ```

  - If using cert-manager:

    ```shell
    kubectl get certificates -n <namespace>
    kubectl describe certificate -n <namespace> <certificate-name>
    kubectl get certificaterequests -n <namespace>
    kubectl get challenges -n <namespace>
    kubectl get issuer -n <namespace>
    ```

- A programming error.
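For the DNS case, one way to test resolution from inside the cluster is a throwaway pod (a sketch; `busybox` is just a convenient test image, and `auth.example.com` stands for your `curity.runtime.hostname` value):

```shell
# Run a short-lived pod to check that the hostname resolves from inside the cluster.
kubectl run dns-test --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup auth.example.com
```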
### Mitigation

Fix the Helm configuration or the DNS configuration, or contact support for help.
## Vault is in `CrashLoopBackOff` state

The vault container is crashing repeatedly, stuck in a crash loop. This is typically a configuration issue or a network issue.
### Detection

```shell
kubectl get pods --all-namespaces
kubectl get events --all-namespaces
```

With events looking like:

```
Events:
  Type     Reason               Age                  From     Message
  ----     ------               ----                 ----     -------
  Warning  FailedPostStartHook  35s (x3 over 117s)   kubelet  PostStartHook failed
  Normal   Killing              35s (x3 over 117s)   kubelet  FailedPostStartHook
  Warning  BackOff              16s (x7 over 86s)    kubelet  Back-off restarting failed container server in pod toucan-stack-vault-server-0_demo(7b241ae3-4338-4dbb-9017-a74b22d6bfb4)
```
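To see only the events of the vault pod rather than the whole cluster, a field selector can help (a sketch):

```shell
# Show only events attached to the vault pod, most recent last.
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=toucan-stack-vault-server-0 \
  --sort-by=.lastTimestamp
```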
### Possible causes

On bootstrap, if the vault container is crashing, it is very likely linked to the `PostStart` lifecycle hook.

The lifecycle hook is responsible for:

1. Fetching or inserting the vault bootstrapping secret.
2. Installing the vault oauthapp plugin.
3. Connecting to the auth server to set up the oauthapp client.
4. Setting up the KV2 secrets engine.

Step 1 can fail if RBAC has been disabled: configuration issue.

Step 2 can fail if the plugin cannot be downloaded: network issue.

Step 3 can fail if the auth server cannot be reached: configuration or network issue.

Lastly, it can also fail due to implementation errors.
### Investigation

```shell
# Check the log of the lifecycle hook.
# NOTE: It might not show anything.
kubectl logs toucan-stack-vault-server-0 -c tail-vault-init -n <namespace>

kubectl describe pod toucan-stack-vault-server-0 -n <namespace>
```
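To inspect what the hook actually runs, the `PostStart` definition can be dumped from the pod spec (a sketch; the container name `server` is taken from the events above):

```shell
# Print the PostStart lifecycle hook of the "server" container.
kubectl get pod toucan-stack-vault-server-0 -n <namespace> \
  -o jsonpath='{.spec.containers[?(@.name=="server")].lifecycle.postStart}'
```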
### Mitigation

#### Failed at step 1

Check that RBAC is properly configured:

- A `Role` and a `RoleBinding`, both named `toucan-stack-vault-server-secret-manager` (or similar), must be present.
- A `ServiceAccount` named `toucan-stack-vault-server` (or similar) must be present.
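A quick way to check these objects and the resulting permission (a sketch, assuming the resource names above; `kubectl auth can-i` reports whether the ServiceAccount may manage Secrets):

```shell
# List the RBAC objects the lifecycle hook depends on.
kubectl get role,rolebinding,serviceaccount -n <namespace> | grep vault

# Check that the ServiceAccount can create Secrets
# (names assumed from the defaults above; adjust if your release differs).
kubectl auth can-i create secrets -n <namespace> \
  --as=system:serviceaccount:<namespace>:toucan-stack-vault-server
```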
#### Failed at step 2

Check your network and make sure that the plugin can be downloaded from the machine.

Find the plugin URL and checksum at `vault.oauthapp.pluginURL` and `vault.oauthapp.checksum` in the `values.yaml` file:

```shell
helm get values -a toucan-stack -n <namespace> | grep pluginURL # toucan-stack is the release name
```

If you are working in an air-gapped environment, you must host this file somewhere and replace `vault.oauthapp.pluginURL`.
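To confirm that a manually downloaded copy matches the expected digest, a small helper can be used (a sketch; `verify_checksum` is an illustrative name, and this assumes the chart checksum is a SHA-256 digest):

```shell
# Illustrative helper: verify a file against an expected SHA-256 digest.
# Returns 0 on match, non-zero otherwise.
verify_checksum() {
  # sha256sum -c expects "<digest>  <file>" (two spaces) on stdin.
  echo "$2  $1" | sha256sum -c --quiet -
}

# Example usage (placeholders for the values read from the chart):
# curl -fsSL -o /tmp/oauthapp-plugin "<vault.oauthapp.pluginURL>"
# verify_checksum /tmp/oauthapp-plugin "<vault.oauthapp.checksum>" && echo "checksum OK"
```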
#### Failed at step 3

The authentication server cannot be reached.

Make sure the hostname defined at `curity.runtime.hostname` is accessible from outside Kubernetes:

- Make sure the DNS is properly configured to redirect the hostname to the `LoadBalancer` IP:

  {% code title="bash" overflow="wrap" %}
  ```bash
  # DNS
  dig @<DNS_SERVER_IP> <curity.runtime.hostname>
  # ANSWER SECTION must look like:
  #   auth.example.com. 30 IN A 192.168.0.100
  # with "192.168.0.100" replaced by the LoadBalancer IP.

  # Ingress Controller
  kubectl get svc ingress-nginx-controller -n ingress-nginx
  # Must look like:
  #   NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                      AGE
  #   ingress-nginx-controller   LoadBalancer   10.43.103.131   192.168.0.100   8443/TCP,4466/TCP,4465/TCP   68d
  # with "192.168.0.100" replaced by the LoadBalancer IP.

  # Ingress rule
  kubectl get ingress toucan-stack-curity-runtime -n <namespace>
  # Must look like:
  #   NAME                          CLASS   HOSTS              ADDRESS          PORTS     AGE
  #   toucan-stack-curity-runtime   nginx   auth.example.com   192.168.50.128   80, 443   30d
  # with "CLASS" being the NGINX controller and "HOSTS" matching the DNS rule.
  ```
  {% endcode %}

- Make sure no firewall is blocking the connection:
  - The Kubernetes gateway (traffic FROM Kubernetes GOING TO the external network) should not be blocked.
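One way to verify egress from inside the cluster is a throwaway curl pod (a sketch; `curlimages/curl` is just a convenient test image, and `auth.example.com` stands for your `curity.runtime.hostname` value):

```shell
# Run a short-lived pod to test HTTPS egress to the auth server from inside the cluster.
kubectl run egress-test --rm -it --restart=Never \
  --image=curlimages/curl:8.7.1 -- \
  curl -sv https://auth.example.com/ -o /dev/null
```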
#### Unknown failures

Contact support.