On-Call Schedule · DuckControl

Current Shift

ALPHA

Primary On-Call

T-JOSH

Operator lead · UTC+10

Backup On-Call

T-BOT

Automated agent · 24/7

Escalations Today

No active incidents

Alpha Shift

🕐 00:00 – 08:00 UTC · Mon–Fri

🦆

T-JOSH (Josh) Primary operator · Galaxy infra lead PRIMARY

🤖

T-BOT (Automation) Alert router · auto-escalation BACKUP

Bravo Shift

🕗 08:00 – 16:00 UTC · Mon–Fri

🦆

T-JOSH (Josh) Primary operator · core hours PRIMARY

🤖

T-BOT (Automation) Monitoring agent · fallback BACKUP

Charlie Shift

🕓 16:00 – 24:00 UTC · Mon–Fri

🤖

T-BOT (Automation) Primary monitoring · low-traffic window PRIMARY

🦆

T-JOSH (Josh) On-call backup · escalations only BACKUP

Weekend / Holiday

🕐 All hours UTC · Sat–Sun + holidays

🤖

T-BOT (Automation) 24/7 automated monitoring PRIMARY

🦆

T-JOSH (Josh) P0 escalations only · 30 min SLA P0 ONLY
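
The shift cards above amount to a lookup from a UTC timestamp to a primary and backup. A minimal sketch of that lookup follows, assuming the roster is transcribed by hand into a table; the names `WEEKDAY_SHIFTS`, `WEEKEND_SHIFT`, `on_call`, and the `holidays` parameter are hypothetical and not part of DuckControl.

```python
from datetime import datetime, timezone

# Roster transcribed from the shift cards:
# (start hour, end hour, shift name, primary, backup), all in UTC.
WEEKDAY_SHIFTS = [
    (0, 8, "Alpha", "T-JOSH", "T-BOT"),
    (8, 16, "Bravo", "T-JOSH", "T-BOT"),
    (16, 24, "Charlie", "T-BOT", "T-JOSH"),
]
# On weekends and holidays T-JOSH is backup for P0 escalations only.
WEEKEND_SHIFT = ("Weekend / Holiday", "T-BOT", "T-JOSH")


def on_call(now, holidays=frozenset()):
    """Return (shift, primary, backup) for a timezone-aware datetime.

    `holidays` is an assumed set of datetime.date values; the page does
    not say where the holiday calendar comes from.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    utc = now.astimezone(timezone.utc)
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if utc.weekday() >= 5 or utc.date() in holidays:
        return WEEKEND_SHIFT
    for start, end, name, primary, backup in WEEKDAY_SHIFTS:
        if start <= utc.hour < end:
            return (name, primary, backup)
    raise AssertionError("unreachable: weekday shifts cover 00:00 to 24:00")
```

For example, `on_call(datetime(2024, 1, 3, 17, 0, tzinfo=timezone.utc))` falls on a Wednesday in the Charlie window, so T-BOT is primary and T-JOSH is backup.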

Primary Contact

Operator: T-JOSH

Telegram: @spaceduck_ops

Email: ops@spaceduck.bot

Response SLA: 15 min (P0)

AWS Emergency

AWS Support: console.aws.amazon.com

Account ID: 121546003735

Region: us-east-1

Support plan: Developer+

T+0 min · Automated alert fires: CloudWatch alarm triggers → T-BOT receives event → logs to alert history → attempts auto-remediation if playbook exists

T+5–15 min · Primary on-call notified: T-JOSH receives Telegram notification → acknowledges within 15 min → begins triage using mission-control.html

T+30 min · Incident declared (if unresolved): Incident mode enabled on Mission Control → Operator Shift Log updated → handoff bundle exported → governance log entry created

T+60 min · AWS Support escalated (if infra failure): Open case via AWS Support Console → attach CloudWatch logs and Lambda version info → request priority callback

T+24 h · Post-incident review: Complete postmortem within 24h → update GOVERNANCE-LOG.md → update runbooks and alert thresholds → share shift handoff pack
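
The timeline above can be read as a set of time thresholds measured from the moment the alert fires. The sketch below maps elapsed minutes to the latest stage reached, assuming the incident is still open and that the conditional stages ("if unresolved", "if infra failure") apply; `ESCALATION_STAGES` and `stage_at` are hypothetical names used only for illustration.

```python
# (minutes after the alert fired, stage label), transcribed from the timeline.
# The "T+5–15 min" notification window is represented by its lower bound.
ESCALATION_STAGES = [
    (0, "Automated alert fires"),
    (5, "Primary on-call notified"),
    (30, "Incident declared"),
    (60, "AWS Support escalated"),
    (24 * 60, "Post-incident review"),
]


def stage_at(minutes_elapsed):
    """Return the latest escalation stage whose threshold has passed."""
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    current = ESCALATION_STAGES[0][1]
    for threshold, label in ESCALATION_STAGES:
        if minutes_elapsed >= threshold:
            current = label
    return current
```

A real router would also check whether the incident was resolved or whether the failure is infrastructure-related before advancing; this sketch only captures the timing.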

P0 — Critical

Complete service outage or data loss risk

Response: 15 min · 24/7 · All hands

Examples: Lambda down, auth broken, DynamoDB unavailable

P1 — High

Degraded functionality, elevated error rate

Response: 1 hour · primary on-call during business hours

Examples: SES sandbox, peck failures >10%, slow agents

P2 — Medium

Non-critical issue, workaround available

Response: next business day

Examples: UI glitches, stale caches, minor config drift
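
The three severity levels imply different response deadlines. The sketch below computes one from a report time under several assumptions that the page does not state: the P1 clock is treated as a plain one-hour wall-clock window (the business-hours qualifier is ignored), "next business day" means the end of the next Monday to Friday day in UTC, and holidays are ignored. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_business_day(day):
    """Return the first Mon-Fri date strictly after `day` (holidays ignored)."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
        nxt += timedelta(days=1)
    return nxt


def response_deadline(severity, reported):
    """Return the assumed response deadline in UTC for a P0, P1, or P2 report."""
    if reported.tzinfo is None:
        raise ValueError("reported must be timezone-aware")
    reported = reported.astimezone(timezone.utc)
    if severity == "P0":
        return reported + timedelta(minutes=15)
    if severity == "P1":
        return reported + timedelta(hours=1)
    if severity == "P2":
        due = next_business_day(reported.date())
        # End of the due day, expressed as midnight starting the following day.
        start_of_due = datetime(due.year, due.month, due.day, tzinfo=timezone.utc)
        return start_of_due + timedelta(days=1)
    raise ValueError(f"unknown severity: {severity!r}")
```

Under these assumptions, a P2 reported on a Friday is due by the end of the following Monday.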