Rollback. Branch. Share. The state model underneath every container.
Five filesystem primitives and a BTRFS copy-on-write layer make container state survive, time-travel, and move between servers — without leaving HTTP.
/hoody/storage · /hoody/databases · /ramdisk · /hoody/shares · BTRFS snapshots
```
/
├── hoody/
│   ├── storage/     ← persistent, per-container
│   ├── databases/   ← concurrent-safe SQLite (FUSE)
│   └── shares/      ← inter-container directory mounts
├── ramdisk/         ← RAM-backed, 50% of container memory
└── ...              ← standard Linux FS (ext4, POSIX)
```
The filesystem map.
Each path has a different persistence and concurrency story. Picking the right one is the whole mental model. Full deep-dives in the sections below.
/hoody/storage
Persistent per-container directory. Survives restarts; snapshots capture it. A regular ext4 directory — no FUSE, no concurrency safety beyond what your app provides.
/hoody/databases
FUSE mount. Many processes, and many containers on the same server, can write to SQLite concurrently without 'database is locked' errors. The only change is the path: move the file from /app/data.db to /hoody/databases/data.db. Host-level; not replicated across servers.
/ramdisk
RAM-backed tmpfs at 10–20 GB/s, <1µs latency. Ceiling of 50% container memory, on-demand allocation (0 bytes used when empty). Persists through container restart, cleared on host reboot. Your usage competes with application memory.
/hoody/shares
Inter-container directory mounts via the Storage Shares API. Read-only or read-write, 1-to-1 or project-wide. Cross-server shares use automatic NFS — no mount setup. Lifecycle (accept / reject / mount / revoke) lives on /platform/control-plane.
/ (ext4)
Standard Linux filesystem everywhere else. POSIX, ext4, full semantics. Behaves like any VPS outside the /hoody/* paths.
BTRFS under it all
Copy-on-write layer beneath the disk-backed paths. Block-level snapshots, deduplication across containers. Instant creation, space-efficient restore. (Not /ramdisk — that's tmpfs in RAM.)
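As a concrete sketch of how the paths split a single job (the file and directory names here are illustrative, nothing the platform requires): scratch work goes to /ramdisk, the SQLite file lives under /hoody/databases, and the finished artifact lands in /hoody/storage so a snapshot captures it.

```python
import shutil
import sqlite3
from pathlib import Path

# Persistent per-container output: survives restarts, captured by snapshots.
reports = Path("/hoody/storage/reports")              # illustrative directory
reports.mkdir(parents=True, exist_ok=True)

# Concurrent-safe SQLite: only the path changes, the sqlite3 API does not.
db = sqlite3.connect("/hoody/databases/app.db")        # illustrative filename
db.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, status TEXT)")
db.commit()
db.close()

# RAM-backed scratch: fast, cleared on host reboot, counts against memory.
scratch = Path("/ramdisk/tmp")
scratch.mkdir(parents=True, exist_ok=True)
(scratch / "intermediate.bin").write_bytes(b"\x00" * 1024)

# Promote the finished artifact from scratch to persistent storage.
shutil.copy(scratch / "intermediate.bin", reports / "intermediate.bin")
```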
Copy-on-write, block-level, instant.
BTRFS stores only the blocks that changed since the snapshot point. Creating a snapshot adds a marker, not data. Cost scales with how much a container actually changes — not with how many snapshots or containers sit on the same base image.
t0 — snapshot A
Container filesystem has blocks a, b, c. Snapshot A references all three.
t1 — block b changes
Write-modify copies b to b'. Original b stays referenced by snapshot A. Container now sees a, b', c.
t2 — snapshot B
Snapshot B captures a, b', c. A and B share a and c. Only b/b' diverges. Total storage: 4 blocks, not 6.
Running or stopped — the container state picks the snapshot type.
Take a snapshot while running and you get the processes, memory, terminal history, browser tabs, and network connections along with the filesystem. Take one while stopped and you get the filesystem only. The API call is the same; the type is automatic.
Running → stateful
The full machine state, frozen.
- Filesystem (everything in /, including /hoody/*)
- Running processes (PIDs, parent relationships)
- Memory (full RAM dump)
- Terminal history and open sessions
- Browser tabs and active display content
- Database connection state
- Network connections (sockets, established TCP)
- Open files (fd tables)
- Environment variables
Stopped → stateless
Filesystem only. Restore = fresh start from that FS.
- Filesystem only
- No processes — restore brings up a cold container
- No memory — no RAM dump in the snapshot
- No network state — connections must re-establish
Restore is destructive: it overwrites current live state. If you want to keep the present, snapshot it first, then restore the target.
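In API terms, the keep-the-present-then-restore sequence is two calls. A minimal sketch: the POST and PATCH endpoint shapes follow the paths quoted on this page, while the base URL, auth header, container ID, and the alias field in the request body are assumptions for illustration.

```python
import requests

BASE = "https://api.example-hoody-host/api/v1"   # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}    # assumed auth scheme
CONTAINER = "my-container-id"                    # illustrative container ID

# 1. Preserve the current live state under its own alias first.
requests.post(
    f"{BASE}/containers/{CONTAINER}/snapshots",
    json={"alias": "before-restore"},            # payload field assumed
    headers=HEADERS,
).raise_for_status()

# 2. Restore the target snapshot. This overwrites live state.
requests.patch(
    f"{BASE}/containers/{CONTAINER}/snapshots/pre-auth-refactor",
    headers=HEADERS,
).raise_for_status()
```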
Let the agent try. Keep the undo button.
LLMs that touch auth middleware, database migrations, or broad refactors benefit most from a snapshot-before-run pattern. Cheap to take. Fast to restore. One API call in each direction.
Without snapshots
1. Agent refactors your auth middleware. First smoke tests pass.
2. You merge and deploy. Everything looks fine for days.
3. Sessions start dropping silently in production.
4. Bisecting recent agent commits takes hours — the change is buried in a large diff.
5. Rollback means reverting every merged agent PR by hand and redeploying.
With Hoody snapshots
1. Snapshot the container before the agent runs. Give it an alias like pre-auth-refactor.
2. Let the agent work. It edits files, restarts services, runs smoke tests.
3. Something looks wrong in production a week later.
4. Snapshot the broken state for offline investigation, then PATCH /snapshots/pre-auth-refactor. The container restores to the pre-agent state in 5–15s.
5. Service is back on the pre-agent baseline; the broken-state snapshot is there to diff and debug whenever you get to it.
The safety-net pattern is why every AI-assisted workflow — code generation, infrastructure refactoring, database migrations — should run inside a snapshotted container. The snapshot is cheap; the discovery cost of a bad AI change is not.
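The whole pattern fits in a small wrapper around whatever launches the agent. Same assumptions as the sketch above (base URL, auth, and payload fields are illustrative); run_agent is a placeholder for your own tooling.

```python
import requests

BASE = "https://api.example-hoody-host/api/v1"   # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}    # assumed auth scheme
CONTAINER = "my-container-id"                    # illustrative container ID
ALIAS = "pre-auth-refactor"

def snapshot(alias: str) -> None:
    """Take a snapshot of the container under the given alias."""
    requests.post(
        f"{BASE}/containers/{CONTAINER}/snapshots",
        json={"alias": alias},                   # payload field assumed
        headers=HEADERS,
    ).raise_for_status()

def restore(alias: str) -> None:
    """Restore the container to the named snapshot (5–15s)."""
    requests.patch(
        f"{BASE}/containers/{CONTAINER}/snapshots/{alias}",
        headers=HEADERS,
    ).raise_for_status()

def run_agent() -> None:
    """Placeholder: kick off the AI agent, migration, or refactor here."""

snapshot(ALIAS)          # cheap insurance before the agent touches anything
try:
    run_agent()
except Exception:
    restore(ALIAS)       # back to the pre-agent state, RAM and processes included
    raise
```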
The workflow is a commit graph for entire machines.
Snapshot before a risky change. Iterate. If the result is good, keep going — the snapshot is cheap and expirable. If it breaks, one PATCH call puts the container back exactly where it was — RAM, processes, open files and all.
t0 — baseline
POST /snapshots — tagged v1.4.0-pre
t1 — risky work
AI agent refactors, migrations run, services restart
t2 — broke something
Smoke tests fail. Need to go back.
t3 — restore
PATCH /snapshots/v1.4.0-pre — 5–15s restore
t4 — identical to t0
RAM, processes, FS all match t0. Zero drift.
PATCH /api/v1/containers/ID/snapshots/v1.4.0-pre
When SSD is the bottleneck, /ramdisk is the answer.
Half the container's memory, reachable as /ramdisk, allocated on-demand. It's there when you use it, disappears when you don't. Persists through container restart. Clears on host reboot.
⚠ /ramdisk usage counts against container memory. If the container has 4 GB and /ramdisk holds 3 GB, the application has 1 GB to work with. Monitor with `free -h` and enforce your own budget below the 50% ceiling.
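A short sketch of treating /ramdisk as metered scratch space. shutil.disk_usage is standard library; the directory name and the 512 MB budget are arbitrary examples, not platform limits.

```python
import shutil
from pathlib import Path

SCRATCH = Path("/ramdisk/build-cache")   # illustrative directory name
BUDGET = 512 * 1024 * 1024               # example self-imposed cap: 512 MB

used = shutil.disk_usage("/ramdisk").used
print(f"/ramdisk currently holds {used / 2**20:.0f} MiB")

# Everything written here counts against container memory, so only spill
# to RAM-backed scratch while under the budget; otherwise fall back to disk.
target = SCRATCH if used < BUDGET else Path("/hoody/storage/build-cache")
target.mkdir(parents=True, exist_ok=True)
(target / "hot-intermediate.bin").write_bytes(b"\x00" * 4096)
```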
Five snapshot strategies teams actually use.
Pick one and your state discipline becomes a one-line decision, not a policy doc. Most teams run two or three of these in parallel.
1 · Pre-operation safety
Snapshot before anything destructive: migrations, AI code generation, incident response, manual hotfixes.
2 · Versioned milestones
Alias snapshots at release points — v1.4.0, v1.5.0-rc. Expiry weeks out. Instant rollback to any named version.
3 · Daily automated
Cron-snapshot with auto-expiry = self-pruning history. Seven days of yesterdays, thirty days of last month. (Sketched below.)
4 · Git-style branching
Snapshot + container copy = an alternate timeline on a different project or server. Try a risky path on the copy. If it works, rebuild the baseline there; sync is one-way so the copy is where the new truth lives.
5 · Golden-image templates
Seed a snapshot, copy-from-snapshot for every new dev container. Onboarding becomes one POST call.
Bonus · Forensic preservation
When production is compromised: snapshot the compromised state for investigation, restore production from a clean earlier snapshot, diff the two offline. Incident response without losing evidence.
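Strategy 3 reduces to one scheduled call a day. A sketch: the snapshot endpoint shape follows the paths quoted earlier, while the base URL, auth header, and the alias / expires_at payload fields are assumptions.

```python
from datetime import date, timedelta

import requests

BASE = "https://api.example-hoody-host/api/v1"   # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}    # assumed auth scheme
CONTAINER = "my-container-id"                    # illustrative container ID

# Daily snapshot named after today's date, expiring a week out, so the
# history prunes itself. Run this from cron or any scheduler.
today = date.today()
requests.post(
    f"{BASE}/containers/{CONTAINER}/snapshots",
    json={
        "alias": f"nightly-{today.isoformat()}",                 # e.g. nightly-2025-06-01
        "expires_at": (today + timedelta(days=7)).isoformat(),   # payload field assumed
    },
    headers=HEADERS,
).raise_for_status()
```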
What you would otherwise stitch together.
Rollback, stateful capture, concurrent-safe SQLite, cross-container shares, RAM-backed scratch — each has a traditional answer. Here's the honest side-by-side.
| Concern | Hoody Data & State | Traditional stack |
|---|---|---|
| Roll back an entire machine | PATCH /containers/ID/snapshots/NAME | Tarball + hand-redeploy + pray |
| Capture running memory state | Stateful snapshot (automatic) | VMware suspend + custom tooling |
| Cross-container directory share | /hoody/shares + Shares API | Run NFS or SMB server yourself |
| Concurrent SQLite writes | /hoody/databases (FUSE mount) | Rewrite your data layer on Postgres |
| RAM-backed scratch space | /ramdisk (ceiling 50% memory) | tmpfs + careful ulimits |
| Storage dedup across similar containers | BTRFS copy-on-write (built in) | rsync --link-dest, manual policy |
| Cross-server state replication | POST /containers/ID/copy + /sync | DIY rsync loops + service restart |
If you are already on a managed VM snapshot system for a specific workload, stay there for that workload. Hoody's state model earns its place when the primitive you want is actually container-level time travel.
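For the last table row, the cross-server story is two calls rather than an rsync loop. A heavily hedged sketch: the copy and sync endpoints are named in the table, but every payload field and identifier below is an assumption for illustration.

```python
import requests

BASE = "https://api.example-hoody-host/api/v1"   # assumed base URL
HEADERS = {"Authorization": "Bearer <token>"}    # assumed auth scheme
SOURCE = "prod-container-id"                     # illustrative container ID

# One-time copy of the container to another server (target field assumed).
requests.post(
    f"{BASE}/containers/{SOURCE}/copy",
    json={"target_server": "server-2"},
    headers=HEADERS,
).raise_for_status()

# Later: push state from the source to its copy. Sync is one-way.
requests.post(
    f"{BASE}/containers/{SOURCE}/sync",
    headers=HEADERS,
).raise_for_status()
```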
Your state is already a commit graph. Learn to use it.
The filesystem is already there. The snapshots are already there. The mounts are already there. Spin up a container and the whole state model is live.
See also — /platform/control-plane for the snapshot and copy/sync APIs, /kit/files for cloud backends, /kit/sqlite for SQLite as HTTP.