Skip to content
use-cases / snapshot-before-migration / hero
USE CASE / SNAPSHOT BEFORE MIGRATION

Snapshot the container right before the nightly migration

Add a hoody-cron entry that fires five minutes before the 03:00 migration job. It curls the snapshots URL and tags the artifact as the rollback point. If the migration fails, you restore in 30 seconds with a single PATCH.

use-cases / snapshot-before-migration / mechanism

Two URLs, one bedtime habit

The cron service schedules a curl. The snapshots service does the freezing. Neither knows about the migration job that runs five minutes later, and that's the whole point.

POST cron/entries
scheduler
# register the recurring snapshot job (one-time setup)
curl -X POST \
  cron.containers.hoody.com/users/root/entries \
  -H "Content-Type: application/json" \
  -d '{
    "schedule": "55 2 * * *",
    "command": "curl -X POST $SNAP_URL -d '{\"alias\":\"rollback-point\"}'",
    "comment": "pre-migration snapshot"
  }'
FIRES
POST containers/[id]/snapshots
freezer
# what the cron entry curls every night at 02:55 UTC
curl -X POST \
  api.hoody.com/api/v1/containers/$ID/snapshots \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"alias": "rollback-point", "expiry": 7}'

# response from the snapshots service
200 OK · snap-2026-05-04 created in 8s

The cron entry is a row in a Postgres table somewhere in Hoody. The snapshots URL writes a content-addressed blob to the container's storage backend. Both are durable, both are versioned, and neither requires a long-lived process on your laptop.

use-cases / snapshot-before-migration / anatomy

Anatomy of last night

Four moments, four URLs, and a five-minute gap between the safety net and the change. The migration finishes before most engineers' first alarm.

0102:55:00ZCron firesschedule: 55 2 * * *
0202:55:08ZSnapshot landsalias rollback-point
0303:00:00ZMigration runsALTER TABLE invoices
0403:01:42ZAudit closessnapshot retained 7d

If step 03 fails, the rollback is `PATCH /snapshots/snap-2026-05-04` and you are back at 02:55:08Z. The audit timeline above is the same data, served as JSON.

use-cases / snapshot-before-migration / powers

What this shape unlocks

Not the snapshot itself. The shape: a backup that exists before the change, addressed by a URL, with a name that includes today's date.

01 / SAFETY

The backup precedes the change, by exactly five minutes

Most outage post-mortems start with "we forgot to take a backup." When the backup is a cron entry, you can't forget. The 02:55 snapshot is the runbook's first sentence, written in advance.

02 / SPEED

Rollback is one PATCH, not a 40-step runbook

Restoring snap-2026-05-04 is a single HTTP call against api.hoody.com. The container reverts to its 02:55:08Z state in under 30 seconds. No ticket, no on-call escalation, no "who has the AWS console."

03 / IDLE COST

If the migration succeeds, the snapshot costs nothing

Snapshots are content-addressed and stored as deltas. A 412 MB delta on top of an unchanged base disk is what you pay for, and only for the 7-day retention window. Successful migrations leave behind almost no footprint.

use-cases / snapshot-before-migration / rollback

The rollback, before vs after

What the runbook used to look like, and what it collapses into when the snapshot is named for today's date and addressable as a URL.

OLD RUNBOOK9 manual steps, 20-40 minutes, 1-2 humans
  • 01
    Page the on-call DBA, confirm the migration broke productionmedian: 4 minutes to acknowledge
  • 02
    Find the most recent backup in the AWS consolehope it's less than 24 hours old
  • 03
    Spin up a restore instance, wait for it to come online8-15 minutes on db.r6g.xlarge
  • 04
    Reverse the ALTER TABLE manually with a hand-written DOWN scriptif the DOWN script exists, which it usually doesn't
  • 05
    Restart application servers, flush caches, watch error ratesand write the post-mortem
Average rollback time, ops@acme 2024 incidents: 27 minutes
NEW RUNBOOK1 HTTP call, ~30 seconds, 0 humans
# rollback last night's migrationcurl -X PATCH api.hoody.com/api/v1/containers/$ID/snapshots/snap-2026-05-04# response200 OK · container reverted to snap-2026-05-04 in 28s
PATCH is idempotent and reversible

The new column on the right is not a tool. It is one sentence in a runbook. The sentence does not start with "first, take a backup" because the backup already exists.

use-cases / snapshot-before-migration / punchline

The rollback plan is a URL you scheduled to exist.

BEFORE: a Notion page nobody readsAFTER: a row in cron and a blob in object storage
OLDfirst, take a backupthe runbook's first sentence, manually executed at 03:02 by a tired human
NEWthe backup already exists, named for today's date, taken five minutes before the changethe runbook's first sentence, written in advance by a cron entry
use-cases / snapshot-before-migration / replaces

What this replaces

The cargo-cult around pre-migration backups. Schedule the snapshot, sleep with the migration window.

  • Manual pre-migration snapshotsWake at 02:54 to click the backup button
  • Runbook checklist itemsStep 1 of 14 on the page no one finds
  • On-call standby for rollbackPay an engineer to wait for nothing
  • Ad-hoc pg_dump before deploysHope you remembered, hope it finished
  • "We will just be careful" releasesThe optimism that ships incidents
  • AWS RDS automated snapshotsFixed 5-minute window, paid by the GB-day
use-cases / snapshot-before-migration / cta

The rollback plan is a URL you scheduled to exist.

use-cases / snapshot-before-migration / related

Read the others