Creating an automated system to detect and restart Kubernetes pods stuck in a CrashLoopBackOff state.
Deploy the Crashy Application
-
Apply the crashy-app.yaml manifest::
kubectl apply -f manifests/crashy-app.yaml
-
Verify that the pod enters a CrashLoopBackOff state:
kubectl get pods
-
Run the CrashLoop Detection Script
-
Make the script executable::
chmod +x scripts/watch_crashloop.sh
-
Run the script:.
./scripts/watch_crashloop.sh **Observe the script detecting and restarting the pod.**
-
Deploy the CronJob (Optional)
Apply the restart_crashloop_cronjob.yaml manifest:
kubectl apply -f scripts/restart_crashloop_cronjob.yaml
-
Verify the CronJob:
kubectl get cronjobs **Deploy the Kubernetes Operator (Optional)**
-
Build the operator Docker image:
docker build -t crashloop-operator:latest manifests/operator/
-
Push the image to a container registry (e.g., Docker Hub):
docker tag crashloop-operator:latest your-dockerhub-username/crashloop-operator:latest docker push your-dockerhub-username/crashloop-operator:latest
-
Deploy the operator to your Kubernetes cluster:
apiVersion: apps/v1 kind: Deployment metadata: name: crashloop-operator spec: replicas: 1 selector: matchLabels: app: crashloop-operator template: metadata: labels: app: crashloop-operator spec: containers: - name: operator image: your-dockerhub-username/crashloop-operator:latest
-
Apply the operator deployment:
kubectl apply -f operator-deployment.yaml
The crashy-app pod enters a CrashLoopBackOff state.
The script, CronJob, or operator detects the CrashLoopBackOff and restarts the pod.
The pod is recreated and continues to crash, but the system ensures it is restarted automatically.