Splunk

CI/CD Demo Manager

Canary deployments with auto-remediation

  • Bad Canary - Auto-rollback via Splunk O11y
  • Unauthorized Deploy - Detection via Splunk
  • DORA Metrics - Frequency, failure rate, MTTR


Demo Guide

Bad Canary Auto-Remediation

Overview

Deploy a canary with intentionally degraded performance. Splunk Observability Cloud detects the latency spike via a detector and fires a webhook to the listener, which automatically rolls the canary back.
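The detect-and-rollback loop can be sketched in a few lines of Python. The payload fields (`detector`, `status`) and the return values are illustrative assumptions, not the demo's actual listener code:

```python
import json

# Minimal sketch of the listener's rollback decision, assuming a
# hypothetical webhook payload shape ({"detector": ..., "status": ...});
# the real Splunk O11y alert payload may differ.

def should_rollback(alert: dict) -> bool:
    """Roll back only when a latency detector reports a firing alert."""
    return (
        alert.get("status") == "anomalous"                 # detector is firing
        and "latency" in alert.get("detector", "").lower()
    )

def handle_webhook(body: bytes) -> str:
    """Decide the action for an incoming detector webhook."""
    alert = json.loads(body)
    if should_rollback(alert):
        # The real listener would delete the canary Deployment here,
        # e.g. via the Kubernetes API; omitted in this sketch.
        return "rollback"
    return "ignore"
```

For example, `handle_webhook(b'{"detector": "Canary latency", "status": "anomalous"}')` returns `"rollback"`, while a cleared alert (`"status": "ok"`) is ignored.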

Demo Steps

  1. Click Initialize to create 3 stable pods (Hooli, Acme Corp, Pied Piper)
  2. Open the Splunk O11y dashboard and show the baseline metrics (low latency, steady throughput)
  3. Click the Bad canary button to deploy a resource-constrained pod
  4. Watch the dashboard — latency spikes to 4000-5000ms after ~10 iterations
  5. The detector fires after 3 seconds of sustained high latency (>3500ms)
  6. Detector triggers a webhook → listener auto-rolls back the canary
  7. Show DORA Metrics — deployment frequency, change failure rate, MTTR

What to Show in Splunk O11y

  1. Dashboard: Latency spike on canary pod vs stable pods
  2. Detector: Threshold breach alert (>3500ms sustained for 3s)
  3. Events feed: Deployment and rollback custom events
  4. DORA Metrics: Deployment frequency, change failure rate, MTTR charts

After the Rollback

  1. Deploy a Good canary to show successful recovery
  2. Show the DORA metrics updating — MTTR calculated, failure rate adjusts
  3. Point out the deployment timeline in the events feed
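The DORA numbers shown in the UI can be derived from the deployment/rollback event stream. A hedged sketch of that calculation, with an illustrative event schema that is not the demo's actual one:

```python
from datetime import datetime, timedelta

# Sketch of DORA metric derivation from a list of events, each a dict with
# 'type' ('deploy' or 'rollback'), 'time' (datetime), and optional 'failed'.
# Field names are assumptions for illustration.

def dora_metrics(events):
    deploys = [e for e in events if e["type"] == "deploy"]
    rollbacks = [e for e in events if e["type"] == "rollback"]
    span = max(e["time"] for e in events) - min(e["time"] for e in events)
    days = max(span / timedelta(days=1), 1.0)       # avoid div-by-zero on short demos
    frequency = len(deploys) / days                 # deployments per day
    failure_rate = len(rollbacks) / len(deploys)    # change failure rate
    # MTTR: mean seconds from each failed deploy to its rollback, paired in order
    restores = [
        (r["time"] - d["time"]).total_seconds()
        for d, r in zip([d for d in deploys if d.get("failed")], rollbacks)
    ]
    mttr = sum(restores) / len(restores) if restores else 0.0
    return frequency, failure_rate, mttr
```

With two deploys in an hour, one of which fails and is rolled back 5 minutes later, this yields a frequency of 2/day, a 50% change failure rate, and a 300s MTTR.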

lightbulb Hints & Tips

  • Bad canary runs normally for the first ~10 iterations, then degrades — give it a moment before switching to the dashboard
  • Latency jumps to 4000-5000ms vs normal 30-50ms — the spike is very visible on the dashboard
  • The detector needs ~3 seconds of sustained high latency before firing
  • After rollback, deploy a Good canary to show the contrast with a successful deployment
  • Show the events feed to highlight the full lifecycle — deploy, detect, rollback
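The degradation profile described above (healthy for ~10 iterations, then a sustained spike past the 3500ms threshold) can be summarized as a small sketch; the thresholds mirror this guide's numbers, and the actual demo pod's logic may differ:

```python
import random

# Illustrative latency profile of the bad canary: a healthy baseline for
# the first 10 iterations, then a sustained spike that trips the >3500ms
# detector. Numbers are taken from this guide, not from the demo's code.

def canary_latency_ms(iteration: int) -> float:
    if iteration < 10:
        return random.uniform(30, 50)       # healthy baseline
    return random.uniform(4000, 5000)       # degraded: above the 3500ms threshold

latencies = [canary_latency_ms(i) for i in range(15)]
```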

Unauthorized Deployment Detection

Overview

Detect rogue Kubernetes deployments that bypass the CI/CD pipeline. Authorized deployments emit structured deployment markers to stdout. Detection happens in Splunk Platform by correlating K8s events with deployment marker logs — deployments without markers are flagged as unauthorized.
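A deployment marker is just a structured log line on stdout. A minimal sketch of an emitter, assuming hypothetical field names (the demo's actual marker schema may differ):

```python
import json
import sys
import time
import uuid

# Sketch of how an authorized deploy step might emit a DEPLOYMENT_MARKER
# to stdout for the OTel Collector to forward to Splunk. Field names are
# illustrative assumptions.

def emit_marker(deployment_name, phase, deployment_id=None):
    """Emit one marker line ('start' or 'finish') and return it."""
    marker = {
        "marker": "DEPLOYMENT_MARKER",
        "deployment_name": deployment_name,
        "deployment_id": deployment_id or str(uuid.uuid4()),
        "phase": phase,
        "timestamp": time.time(),
    }
    line = json.dumps(marker)
    print(line, file=sys.stdout)   # stdout -> container log -> Splunk
    return line
```

An authorized pipeline would call `emit_marker("pied-piper", "start")` before applying manifests and `emit_marker("pied-piper", "finish")` afterwards; a rogue `kubectl apply` emits neither.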

Prerequisites

  1. OTel Collector configured with k8s_events receiver (forwards K8s events to Splunk)
  2. OTel Collector collecting container stdout (forwards deployment markers to Splunk)
  3. Splunk saved search with the unauthorized detection SPL query
  4. Splunk alert action with webhook URL pointing to the listener
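Prerequisites 1 and 2 correspond roughly to a collector config like the following sketch; receiver options, exporter settings, and pipeline wiring will differ per environment, and the endpoint shown is a placeholder:

```yaml
# Hedged sketch of the relevant OTel Collector pieces (not a complete config)
receivers:
  k8s_events: {}            # forwards Kubernetes events (e.g. ScalingReplicaSet)
  filelog:
    include: [/var/log/pods/*/*/*.log]   # container stdout, incl. DEPLOYMENT_MARKER lines

exporters:
  splunk_hec:
    token: ${SPLUNK_HEC_TOKEN}
    endpoint: https://splunk.example.com:8088/services/collector   # placeholder

service:
  pipelines:
    logs:
      receivers: [k8s_events, filelog]
      exporters: [splunk_hec]
```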

Demo Steps

  1. Click Initialize — deployment markers (DEPLOYMENT_MARKER) are emitted for each stable pod
  2. Deploy a Good canary — markers are emitted (start/finish), proving authorized deployments are tracked
  3. Show Splunk — all deployments have matching start/finish marker logs
  4. Click Simulate Unauthorized Deploy — a rogue pod appears with no markers emitted
  5. Show Splunk — the saved search detects a K8s ScalingReplicaSet event without matching markers
  6. The Splunk alert fires → webhook calls the listener's rollback endpoint
  7. The listener deletes the rogue deployment and logs UNAUTHORIZED_ROLLBACK
  8. Show the rollback log in Splunk
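Steps 6-7 imply a small amount of listener-side handling. A sketch of the request parsing and audit-log line, where the payload fields and the fixed-default namespace fallback are assumptions (the real listener auto-discovers the namespace, and the actual Deployment deletion via the Kubernetes API is omitted):

```python
import json

# Sketch of the listener's unauthorized-rollback endpoint logic. The payload
# shape and DEFAULT_NAMESPACE are illustrative assumptions.

DEFAULT_NAMESPACE = "cicd-demo"   # stand-in for the listener's auto-discovery

def parse_rollback_request(body: bytes):
    """Return (deployment_name, namespace) for the rogue deployment."""
    payload = json.loads(body)
    name = payload["deployment_name"]
    # The real listener auto-discovers the namespace when it is omitted;
    # here we simply fall back to a fixed default.
    namespace = payload.get("namespace") or DEFAULT_NAMESPACE
    return name, namespace

def rollback_unauthorized(body: bytes) -> str:
    name, ns = parse_rollback_request(body)
    # A real implementation would delete the Deployment via the Kubernetes
    # API, then emit the audit line below for Splunk to ingest.
    return f"UNAUTHORIZED_ROLLBACK deployment={name} namespace={ns}"
```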

How Detection Works

  1. Authorized deployments emit DEPLOYMENT_MARKER start/finish logs via stdout
  2. OTel Collector forwards both container logs and K8s events to Splunk
  3. SPL left joins K8s ScalingReplicaSet events with DEPLOYMENT_MARKER logs
  4. Deployments without matching markers are flagged as unauthorized
  5. Splunk alert fires a webhook → listener deletes the rogue deployment

Key SPL Concepts

  1. K8s events index: Contains ScalingReplicaSet events from the k8s_events receiver
  2. CI/CD logs index: Contains DEPLOYMENT_MARKER logs from container stdout
  3. Left join: Matches K8s events to markers by deployment_name
  4. where isnull(deployment_id): Finds events with no matching marker
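Putting those pieces together, the saved search might look like the following sketch. The index names, sourcetypes, and the rex extraction are assumptions and will differ per environment:

```
index=k8s_events sourcetype=kube:events reason=ScalingReplicaSet
| rex field=message "Scaled up replica set (?<deployment_name>\S+)"
| join type=left deployment_name
    [ search index=cicd_logs "DEPLOYMENT_MARKER"
      | spath
      | fields deployment_name deployment_id ]
| where isnull(deployment_id)
| table _time deployment_name
```

The left join keeps every scaling event; rows where `deployment_id` is null had no matching marker and are therefore unauthorized.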

lightbulb Hints & Tips

  • The rogue pod uses version v4 to visually distinguish it from legitimate pods (v1/v2/v3)
  • Detection speed depends on your Splunk saved search schedule interval
  • The listener auto-discovers the namespace if not included in the webhook payload
  • You can manually test the webhook endpoint with curl: POST /tenant/rollback/unauthorized
  • Show the deployment marker audit query to demonstrate the full authorization trail
