TERMINAL · SNAPSHOTS · 03:47

Emergency production fix from your phone

PagerDuty wakes you. You don't get up. Open the bookmark for the production terminal. PATCH the container back to the snapshot taken before the bad deploy. Production is back. No bastion, no VPN, no laptop.


Four moves from pager to flat

On-call is a triage job, not a debugging job. The terminal URL gets you in. The snapshot PATCH gets you out. The morning is for the actual fix.

Phone-only incident path · 4 STEPS · 5 MINUTES

01 · 03:42 · PAGER

Alert arrives. Phone screen on, bed light off.

02 · 03:43 · TERMINAL

Open the terminal-1 URL. Tail the log. Spot the env-var change from the 11pm deploy.

03 · 03:46 · RESTORE

PATCH /containers/[id]/snapshots/pre-deploy-2255. The container reverts.

04 · 03:47 · FLAT

Error rate falls back to the baseline. Channel update sent. Lights off.

Edit-on-phone is hell, so the lazy fix is the right fix. Restore the container to the snapshot you took before the bad deploy. The 11am post-mortem can decide what to actually change.
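The restore in step 03 is a single HTTP request, so it can be sketched in a few lines. This is a minimal illustration, not the documented client: only the path `/containers/[id]/snapshots/pre-deploy-2255` and the PATCH verb come from the steps above. The host `api.example.invalid`, the bearer-token header, and the container id `abc123` are assumptions standing in for whatever your account actually uses.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical values: this page does not state the API host or auth scheme.
API_BASE = "https://api.example.invalid"
TOKEN = "REPLACE_ME"


def build_restore_request(container_id: str, snapshot: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH that restores a container to a snapshot."""
    url = (
        f"{API_BASE}/containers/{quote(container_id, safe='')}"
        f"/snapshots/{quote(snapshot, safe='')}"
    )
    return urllib.request.Request(
        url,
        method="PATCH",
        headers={"Authorization": f"Bearer {TOKEN}"},  # assumed auth scheme
    )


# "abc123" is a placeholder container id.
req = build_restore_request("abc123", "pre-deploy-2255")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it keeps the one destructive call explicit, which is the kind of friction you want at 03:46.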


What the dashboard showed

The same window, embedded in your phone browser. Baseline, deploy, spike, restore, flat. Twenty-eight seconds for the snapshot to come back.

[Chart: errors / minute, last 6 hours, at dashboard.…hoody.com/error-rate. Markers: 23:00 bad deploy, 03:42 pager, 03:47 snapshot restored. Annotations: spike from the bad env var; PATCH on /snapshots/pre-deploy-2255.]
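"Flat" can be checked rather than eyeballed. The sketch below is one possible rule, not how the dashboard decides anything: the function name `is_flat`, the three-sample window, and the 1.5x tolerance are all assumptions, and the numbers in the usage lines are made-up illustrative values, not data from the chart.

```python
from statistics import median


def is_flat(errors_per_minute, baseline_window, recent=3, tolerance=1.5):
    """Return True when the last `recent` samples all sit at or below
    `tolerance` times the median of the pre-incident baseline window."""
    ceiling = median(baseline_window) * tolerance
    tail = errors_per_minute[-recent:]
    # Require a full window so one lucky sample does not count as recovery.
    return len(tail) == recent and all(sample <= ceiling for sample in tail)


# Illustrative numbers only.
baseline = [4, 5, 4, 6, 5]            # errors/minute before the deploy
during_spike = [5, 120, 140]          # still spiking
after_restore = [5, 120, 140, 90, 6, 5, 4]

print(is_flat(during_spike, baseline))
print(is_flat(after_restore, baseline))
```

Using the median of the baseline keeps a single noisy minute before the deploy from inflating the ceiling.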

At 03:47 you don't fix bugs. You fix availability.

The on-call rotation isn't a debugging session. It's a triage session. Snapshots make triage instantaneous so the actual debugging happens at 11am, by humans who slept.

  • triage now
  • diagnose later
  • sleep tonight

What this replaces

Most on-call rituals are scar tissue from infrastructure that wasn't browsable on a phone. The HTTPS URL plus a snapshot PATCH replaces a stack of them.

  • The bastion box: an extra hop with its own credentials
  • VPN tunnel from bed: two factors and a timeout to start fighting
  • Wake-up-the-laptop ritual: five minutes of friction before any keystroke counts
  • On-call binder PDF: page 14 of the runbook on a 6-inch screen
  • Homegrown jump-host scripts: brittle SSH chains the new hire can't run
  • Pager-the-senior-engineer: wake a second human to share the URL

You opened a URL on your phone and fixed production.
