use-cases / agent-grades-agents / hero

CRON · AGENT · SQLITE

An agent that grades yesterday's agents

Your product runs hundreds of agent sessions a day. Each one writes its transcript to a SQLite URL. At 6am, a cron entry POSTs to a supervisor agent with one prompt: read yesterday's transcripts, score them, flag the worst three. By the time you sit down, the report card is already open.

Read the agent docs

agent.containers.hoody.com · grades / 2026-05-03

https://agent.containers.hoody.com/grades/2026-05-03 · graded · 06:21

VERDICT · 4 agents reviewed · 1 needs attention

by supervisor-agent

AGENT | RUNS | GRADE | NOTE
email-drafter (drafts customer replies) | 142 yesterday | A | perfect ground truth · tone matched in 138 / 142
pr-reviewer (comments on github diffs) | 38 yesterday | B | drift toward verbosity · avg 412 words vs 280 baseline
support-triage (labels tickets · routes to queue) | 217 yesterday | C | hallucinated tool args twice · misrouted 6 tickets · Needs attention
weekly-digest (summarises sales pipeline) | 1 yesterday | A | all 14 deals cited · numbers match crm export

/CRONTAB · 0 6 * * * curl -X POST $AGENT/api/v1/agent/tasks -d '{"description":"Read the sessions from yesterday in sqlite, sample 50, score each against the rubric, write findings, flag the worst three for human review."}'

the cron job is the supervisor · the supervisor is also an agent

use-cases / agent-grades-agents / mechanism

One cron line, one prompt, one verdict

A single 5-field cron entry POSTs to the agent service with a prompt. The supervisor container wakes, reads yesterday's SQLite traces, writes its grades back to the same database, and exits. There is no orchestrator, no rubric service, no eval pipeline.

POST /cron/users/me/entries

POST · scheduler

# POST /api/v1/cron/users/me/entries
{
  "schedule": "0 6 * * *",
  "command": "curl -X POST $AGENT/api/v1/agent/tasks -d @grade.json",
  "comment": "nightly-supervisor"
}

grade.json · supervisor prompt

POST · supervisor

# grade.json — the supervisor's instructions
{
  "description": "Read yesterday's transcripts from /sqlite/sessions WHERE day = '2026-05-03'. Sample 50. Score each on factuality, tool correctness, tone drift. Write findings to the report table. Flag the worst three for human review.",
  "mode": "code"
}

The cron line decides WHEN. The prompt decides WHAT. The supervisor container does the work in ~20 minutes overnight and then disappears. The graded sample is on disk by the time anyone is at their desk.
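The two request bodies above can also be assembled in code. The following is a minimal sketch, assuming only the field names shown in the examples (schedule, command, comment, description, mode). The helper names are hypothetical, `$AGENT` is left as the same placeholder the cron command uses, and nothing is sent over the network.

```python
import json
from datetime import date, timedelta


def build_cron_entry(schedule: str = "0 6 * * *") -> dict:
    """Body for the scheduler POST; field names copied from the example above."""
    return {
        "schedule": schedule,
        "command": "curl -X POST $AGENT/api/v1/agent/tasks -d @grade.json",
        "comment": "nightly-supervisor",
    }


def build_grade_task(today: date, sample_size: int = 50) -> dict:
    """Body for grade.json, with yesterday's date computed relative to `today`."""
    day = (today - timedelta(days=1)).isoformat()
    description = (
        f"Read yesterday's transcripts from /sqlite/sessions WHERE day = '{day}'. "
        f"Sample {sample_size}. Score each on factuality, tool correctness, tone drift. "
        "Write findings to the report table. Flag the worst three for human review."
    )
    return {"description": description, "mode": "code"}


if __name__ == "__main__":
    print(json.dumps(build_cron_entry(), indent=2))
    print(json.dumps(build_grade_task(date.today()), indent=2))
```

Regenerating grade.json before each run is one way to keep the date in the prompt from going stale; leaving the word "yesterday" for the agent to resolve is another.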

use-cases / agent-grades-agents / powers

Three things a supervisor agent does that a dashboard can't

AgentOps screens show you logs. LangSmith rubrics give you scores. A grading supervisor closes the loop: it reads the transcripts, decides what is bad, and writes the verdict.

READS

It actually reads the transcripts

Not just metrics. The supervisor opens each session, reads tool calls, checks ground truth, weighs tone. A spreadsheet rubric counts; an agent supervisor judges.

DECIDES

It picks the three you should see

Out of 400 runs, 397 are fine. The supervisor's job is to surface the three that aren't, by name, with a one-line note. You don't scroll a dashboard; you read four lines.

WRITES

It writes findings back to SQLite

Every grade and every note lands in the same SQLite URL the agents use. Tomorrow's supervisor compares. Drift becomes a query, not a vibe.
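To make "drift becomes a query" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `grades` table, its columns, and the sample rows are assumptions made for illustration; the page does not specify the real schema.

```python
import sqlite3

# Hypothetical schema: one row per agent per day, written by the supervisor.
SCHEMA = "CREATE TABLE grades (day TEXT, agent TEXT, grade TEXT, note TEXT)"

# Agents whose letter grade got worse between two days.
# Letters compare as text in SQLite, so 'B' > 'A' means "worse than".
DRIFT_SQL = """
SELECT t.agent, y.grade, t.grade
FROM grades AS t
JOIN grades AS y ON y.agent = t.agent
WHERE y.day = :yesterday AND t.day = :today AND t.grade > y.grade
ORDER BY t.agent
"""


def find_drift(conn: sqlite3.Connection, yesterday: str, today: str) -> list:
    """Return (agent, grade_before, grade_after) for every agent that slipped."""
    params = {"yesterday": yesterday, "today": today}
    return conn.execute(DRIFT_SQL, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up rows for illustration only.
    conn.executemany(
        "INSERT INTO grades VALUES (?, ?, ?, ?)",
        [
            ("2026-05-02", "pr-reviewer", "A", "concise"),
            ("2026-05-03", "pr-reviewer", "B", "drift toward verbosity"),
            ("2026-05-02", "email-drafter", "A", "tone matched"),
            ("2026-05-03", "email-drafter", "A", "tone matched"),
        ],
    )
    print(find_drift(conn, "2026-05-02", "2026-05-03"))
```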

use-cases / agent-grades-agents / flow

From transcripts to verdict in twenty minutes

Three things happen between 6:00am and 6:21am. None of them require you.

/cron/0 6 * * * → agent/tasks → /grades/2026-05-03 · RUNS WHILE YOU SLEEP

READ

Open yesterday's transcripts

The supervisor agent queries the same SQLite URL the workers wrote to. SELECT * FROM sessions WHERE day = yesterday. Sample 50 at random.
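The READ step could look roughly like this in Python's sqlite3 module. This is a sketch under assumptions: the `sessions` table layout is hypothetical, and `ORDER BY RANDOM() LIMIT n` is just one simple way to draw a uniform sample in SQLite.

```python
import sqlite3

# Hypothetical layout for the table the worker agents write to.
SCHEMA = (
    "CREATE TABLE sessions ("
    "id INTEGER PRIMARY KEY, agent TEXT, day TEXT, transcript TEXT)"
)


def sample_sessions(conn: sqlite3.Connection, day: str, n: int = 50) -> list:
    """Return up to n randomly chosen session rows for the given day."""
    return conn.execute(
        "SELECT id, agent, day, transcript FROM sessions "
        "WHERE day = ? ORDER BY RANDOM() LIMIT ?",
        (day, n),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Placeholder transcripts, for illustration only.
    conn.executemany(
        "INSERT INTO sessions (agent, day, transcript) VALUES (?, ?, ?)",
        [("email-drafter", "2026-05-03", f"transcript {i}") for i in range(120)],
    )
    print(len(sample_sessions(conn, "2026-05-03")))
```

If a day has fewer than 50 sessions, the query simply returns all of them.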

SCORE

Grade each rubric

Per session: factuality, tool-call correctness, tone drift, hallucination count. Letter grade + one-line reason. Cost: a single agent task.
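The page leaves the actual judging to the agent. Purely as an illustration of how per-session rubric numbers might collapse into a letter grade, here is a small sketch; the score ranges, the hallucination penalty, and the thresholds are all invented for this example and are not part of any Hoody API.

```python
from dataclasses import dataclass


@dataclass
class RubricScore:
    factuality: float        # 0.0 to 1.0
    tool_correctness: float  # 0.0 to 1.0
    tone_match: float        # 0.0 to 1.0, where 1.0 means no tone drift
    hallucinations: int      # count of hallucinated facts or tool args


def letter_grade(score: RubricScore) -> str:
    """Average the three rubric scores, subtract a flat penalty per hallucination."""
    mean = (score.factuality + score.tool_correctness + score.tone_match) / 3
    adjusted = mean - 0.1 * score.hallucinations  # illustrative weight
    if adjusted >= 0.9:
        return "A"
    if adjusted >= 0.8:
        return "B"
    if adjusted >= 0.7:
        return "C"
    if adjusted >= 0.6:
        return "D"
    return "F"


if __name__ == "__main__":
    print(letter_grade(RubricScore(0.95, 0.95, 0.95, hallucinations=1)))
```

In practice the one-line reason would come from the agent's own reading of the transcript; a formula like this only standardises the letter.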

FLAG

Write findings · flag the bottom three

INSERT into the report table. Mark the worst three for human review. The page at /grades/[date] is just a SELECT on that table.
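The FLAG step can be sketched as one INSERT, one UPDATE, and one SELECT. The `report` table layout, the numeric `score` column (lower is worse), and both function names are assumptions for illustration only.

```python
import sqlite3


def write_report(conn: sqlite3.Connection, day: str, findings: list) -> None:
    """Store findings as (session_id, agent, score, note) and flag the worst three."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS report ("
        "day TEXT, session_id TEXT, agent TEXT, score REAL, note TEXT, "
        "flagged INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO report (day, session_id, agent, score, note) "
        "VALUES (?, ?, ?, ?, ?)",
        [(day, *finding) for finding in findings],
    )
    # Mark the three lowest-scoring sessions of the day for human review.
    conn.execute(
        "UPDATE report SET flagged = 1 WHERE day = ? AND rowid IN ("
        "SELECT rowid FROM report WHERE day = ? ORDER BY score ASC LIMIT 3)",
        (day, day),
    )
    conn.commit()


def grades_page(conn: sqlite3.Connection, day: str) -> list:
    """What a /grades/[date] page could run: a plain SELECT on the report table."""
    return conn.execute(
        "SELECT session_id, agent, score, note, flagged FROM report "
        "WHERE day = ? ORDER BY score ASC",
        (day,),
    ).fetchall()
```

Keeping the flag as a column in the same table means the review queue is also just a SELECT with `WHERE flagged = 1`.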

By 6:21am there is a graded sample on disk and three flagged transcripts queued. The grader doesn't watch the agents — it runs on a cadence and judges them, like a teacher reading homework overnight.

use-cases / agent-grades-agents / capacity

What the cadence buys you

Numbers grounded in the cron + agent + SQLite surfaces. Not invented benchmarks.

ONE CRON LINE · 0 6 * * *
Five fields decide when the supervisor wakes. Change the schedule, change the cadence: hourly, daily, on-demand. The line is the entire scheduler.

GRADE WINDOW · ~20 min
A supervisor task that samples 50 sessions, reads each, and writes verdicts typically finishes inside 20 minutes. The container exits when the task does.

ORCHESTRATOR DAEMONS · 0
No Airflow, no eval service, no DAG scheduler. The cron entry is a row in /etc/crontab. The verdict is a row in SQLite. There is no third thing.

Standard 5-field cron expressions per Hoody Cron API. Supervisor session length depends on sample size and rubric complexity. SQLite is the same hoody-sqlite URL the worker agents already write to — no second store.
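For readers less familiar with the five fields, a small sketch that names them and lists a few alternative cadences. The field order (minute, hour, day of month, month, day of week) is standard cron; the helper name and the `CADENCES` table are illustrative, and the function checks only the field count, not the values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

# A few standard expressions; changing the line changes the cadence.
CADENCES = {
    "daily at 06:00": "0 6 * * *",
    "hourly": "0 * * * *",
    "weekdays at 06:00": "0 6 * * 1-5",
}


def split_cron(expr: str) -> dict:
    """Split a 5-field cron expression into named fields."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    for label, expr in CADENCES.items():
        print(label, split_cron(expr))
```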

use-cases / agent-grades-agents / punchline

The cron job is the supervisor; the supervisor is also an agent.

yesterday · running blind
today · graded by 6:21

WHAT THE OLD LOOP LOOKED LIKE
human reads logs · weekly meeting · post-hoc rubric in a sheet
noticed drift after a week · reviewed 0.5% of runs

WHAT IT LOOKS LIKE NOW

Read the cron + agent spec

use-cases / agent-grades-agents / replaces

What this replaces

The standard agent-quality stack: read-only dashboards, manual log review, and rubric tools that score but never act. The supervisor cron does all three in twenty minutes.

human-only agent reviews: An engineer reading logs by hand · 0.5% sample · catches drift after a week
weekly-meeting agent retrospectives: The drift was already a week old by the time you discussed it
manual log inspection: grep, scroll, hope · no rubric, no score, no record
AgentOps quality dashboards (read-only): Charts you have to open · the verdict was never written down
LangSmith eval rubrics that don't act: Scores get computed · no one is paged · no one is told
post-hoc spreadsheet rubrics: A Google Sheet someone fills out on Friday · stale by Monday

use-cases / agent-grades-agents / cta

Stop reading logs at 11pm. Schedule an agent to do it overnight, and read its report card with your coffee.

Read the agent docs

use-cases / agent-grades-agents / related

Read the others

Sixty containers on one server

One bare-metal box runs dozens to hundreds of Hoody containers. KSM and BTRFS dedup make the marginal cost near zero.

Containers·Snapshots

Onboard a developer with one link

A new engineer joins on Monday. You send one URL. They open it on whatever laptop they have and they're in a fresh container cloned from your developer-baseline snapshot — code, deps, env, seed data, VSCode-in-browser. Writing code in five minutes, not setting up.

Snapshots·Containers·Terminal·Files

API endpoints that materialize on demand

A wildcard exec script catches the call, asks an LLM to write the handler, runs it in a V8 sandbox, and saves the route. The next call is native.

Exec·Agent·Code·Files

Branch computers like Git

Snapshot a running container — files, processes, memory. Restore in seconds. Fork via /copy. Branching, but for the entire machine.

Snapshots·Containers

Real VS Code on your phone

The Code Orchestrator spawns a VS Code instance on the container and serves the editor over a normal HTTPS URL. Any device with a browser can open it. The work lives in the container, not on the device.

Display·Terminal·Files·Containers+1

AI agents that spawn other AI agents

A research agent posts to /api/v1/projects/$PID/containers to start a child container, then calls the child's agent URL like any other HTTP service. Sub-agents spawn their own sub-agents the same way. No orchestrator framework, just URLs.

Agent·Exec·Containers

One sandbox per customer, automatically

An exec script catches your signup webhook, copies a fresh-customer container, and hands the new tenant their own URL. Isolation is the operating system, not a tenant_id column.

Containers·Snapshots·Exec·Files

Wake up to a finished prototype

Hand the agent a paragraph at midnight. It spawns its own containers, snapshots before risky steps, and posts to your notification webhook at sunrise.

Agent·Snapshots·Containers·Browser+2

Emergency production fix from your phone

PagerDuty wakes you. Open the terminal URL on your phone. PATCH the snapshot from before the bad deploy. Production is back. No bastion, no VPN, no laptop.

Terminal·Snapshots·Network

Tail production logs to a URL anyone can curl

One pipe URL. Up to 256 readers. Three engineers tail the same incident at once with no bastion, no Datadog seat, no log forwarder.

Pipe

Push one build to thirty CI workers at once

The build container streams the tarball to a pipe path with ?n=30. All thirty test workers curl the same URL. Bytes go through once, fanned out.

Pipe

Watch your agent think from the coffee shop

Your agent runs at home. You're at a café. Pipe each event of the loop through Hoody Pipe and curl the same path from your phone — the trace lands character by character. No SSH, no dashboard, no upload.

Pipe·Agent

Move 200GB between clouds with two curls

pg_dump | gzip | curl from Frankfurt. curl | gunzip | psql in Singapore. Bytes flow through the pipe with zero disk in the middle.

Pipe

Send a teammate a database state in one line

pg_dump streams straight into their psql. No file uploaded, no link shared, no download. The pipe routes the bytes through.

Pipe

Stream LLM tokens to anything that reads HTTP

Step 3 streams tokens with curl -T -. Step 4 curls the same path. Tokens move generator to consumer at line speed. No SSE plumbing, no broker.

Pipe·Agent

A progress bar your boss can spectate without joining

Append ?progress to the pipe URL. Anyone who opens it gets a live HTML dashboard — bytes, speed, ETA, state. Up to fifty spectators, none consuming a receiver slot, none touching the stream.

Pipe

The webhook fan-out you didn't have to build

Stripe POSTs to a pipe path with ?n=12. Twelve subscribers curl the receiver URL with ?n=12. The pipe holds the event until everyone is connected.

Pipe·Exec

A CI cache that's just two curl commands

tar | zstd | curl puts node_modules into a pipe. Twenty downstream jobs curl | zstd -d | tar x. No S3 bucket, no cache action, no egress bill.

Pipe·Containers

Drag-drop uploads into your script

hoody-pipe serves a web upload form at every path. Drag a file onto the page, your script reads the bytes from stdin. Zero upload code, no S3 bucket, no presigned URLs.

Pipe·Exec

Broadcast a workshop to 200 viewers from your laptop

ffmpeg streams your screen to a pipe path with ?n=200. Each attendee curls the URL into a browser tab. No platform, no logins, no upload.

Pipe

Inter-container IPC without the message broker

Container A writes to a pipe path. Container B reads from the same path. Backpressure is the connection. No Redis, no queue, no broker.

Pipe·Containers

Tail your agent on the train, get pinged when it lands

The agent streams its trace to a pipe path you can curl from your phone. When it finishes, its last act hits hoody-notifications and your phone buzzes. Two URLs and a buzz — no SDK, no client app, no dashboard.

Pipe·Agent·Notifications

A microphone over HTTP, in two terminals

ffmpeg captures the mic, pipes to a URL. The other end curls and plays the audio. No Zoom, no SDK, no signaling server.

Pipe

Five agents, five pipes, one verdict

A panel of five models reviews the same input. Each runs in its own container and streams its verdict to its own pipe path. A judge process curls all five in parallel and tallies the result.

Pipe·Agent·Containers

Replay this morning's incident to the whole team

Snapshot the incident-time logs in hoody-files. Replay them through a Hoody Pipe URL with ?n=8. Eight engineers curl the same path and watch the cascade fire in lockstep — the post-mortem is a synchronized stream, not a Confluence doc.

Pipe·Files

The fastest 'send me that file' you've ever typed

A teammate pings for a 4 GB dump. Slack rejects it, Drive needs a share request. You type curl -T file …; they type curl … > file. The bytes move directly between disks — no upload bar, no link to share.

Pipe

Run a local LLM, serve it to your whole fleet

One GPU runs llama.cpp. Its tokens stream into a pipe path with ?n=50. Fifty containers curl the same URL and split the stream.

Pipe·Daemon

A live metrics dashboard with no metrics backend

Each container's monitoring loop curls a metric to a pipe URL. The dashboard curls the same URL with ?progress and renders the SSE stream.

Pipe

The cron job that deletes itself when you're done

POST a managed cron entry with expires_at set 48 hours out. The job runs on schedule, then removes itself — no reminder, no cleanup PR, no stale entry.

Cron

Snapshot the container right before the nightly migration

A hoody-cron entry that fires at 02:55 UTC, curls the snapshots URL, and names the artifact pre-migration-2026-05-04. Five minutes later the migration runs. If it succeeds, the snapshot sits idle and costs nothing. If it fails, you restore in 30 seconds with a single PATCH.

Cron·Snapshots

A separate crontab for every customer, automatically

Each tenant gets their own container and their own hoody-cron service. Customer A's 9am digest fires on time even when customer B's job hangs for 40 minutes, because they aren't on the same crontab.

Cron·Containers

Wake an agent at 3am, retire it at 4

A nightly cron POSTs a spawn request, the agent does its hour of work, then a second cron tears the container down. The agent exists only when there is work for it to do.

Cron·Agent·Containers

Daily rollups without an orchestrator

Raw events pile up in a sqlite URL. Every night a cron entry curls an exec endpoint, the script runs the rollup SQL, writes the daily table back. No DAG, no Airflow Postgres, no scheduler dashboard.

Cron·SQLite·Exec

A crontab per branch, deployed with the code

Your repo checks in `.hoody/crontab`. The deploy script PUTs that file to the new container's Cron API. Each branch gets its container, its filesystem, its schedule.

Cron·Containers

On-call escalation that ages out with the shift

POST a cron entry with expires_at = shift end. When the shift ends, the entry deletes itself. The next on-call posts their own.

Cron·Notifications

Hourly scrape, daily digest, weekly archive — one container

Three lines in one crontab: hourly browser scrape into SQLite, daily exec digest, weekly archive to files. Flat-rate server, three rhythms, no scheduler service.

Cron·Browser·SQLite·Files

Let your customers BYO their own cron schedule

Customers POST their own 5-field expressions; their crontab lives in their container, isolated. You don't validate against a global queue.

Cron·Containers

Schedule the agent, not the script

A 5-field cron entry curls hoody-agent with a prompt instead of running a fixed script. Today is the last day of the month — the agent figures it out. The data shape changed — the agent figures it out.

Cron·Agent

A heartbeat for the silent jobs

Each cron run POSTs a heartbeat to a notifications endpoint. A second cron checks last-heartbeat and pages on silence. Silence is the alert.

Cron·Notifications

Keep the last 24 hours as 24 snapshots

An hourly cron POSTs a snapshot named with the hour. After 24 hours each new snapshot overwrites yesterday's at the same hour. The 24-floor time machine.

Cron·Snapshots

Replay this morning's webhooks at the same time tomorrow

You captured 30 minutes of real Stripe traffic into a hoody-files folder. One cron entry replays it against staging at 9am every weekday — same volume, same payloads, same time-of-day pressure.

Cron·Files·Exec

Edit your crontab from a phone, in the airport

Open the cron URL on your phone in the gate area. Tap a row, change a single field of the cron expression, hit Save. PATCH lands. The job fires tonight on the new schedule. No SSH session, no jump box, no laptop.

Cron·Terminal

A scheduled digest that fans out to 200 inboxes

Cron at 9am POSTs to an exec script that builds the digest and curls a pipe URL with ?n=200. Two hundred recipients hit the same URL once.

Cron·Exec·Pipe

Mute the flaky job without losing it

PATCH /entries/[id] with { "enabled": false }. The job stays in your crontab, waiting to be fixed. No deletion, no rewrite, no lost context.

Cron

Cleanup jobs that schedule their own retirement

The cleanup script checks if there's anything left to clean. When the directory is empty, it DELETEs its own cron entry. Job's done, job's gone.

Cron·Files

Roll your TLS certificates without an SSH session

Cron weekly: POST to an exec script that runs certbot, posts the new cert to the proxy via PATCH. No shell session, no key, no jump host.

Cron·Exec

A weekly canary that tries to break production

Sunday 7am cron wakes a Hoody Agent in a fresh container against a snapshot of prod. It runs through the OWASP Top 10, fuzzes the API, and writes a findings report to a URL by 9am. Container retires.

Cron·Agent·Browser·Snapshots

The hobby-project graveyard you can afford to keep alive

Eleven half-finished side projects on Heroku is eleven dynos at $5–7 each. On Hoody it's eleven containers on one $29 bare-metal box. Idle costs zero, the URL wakes the container in milliseconds, and the chess engine nobody uses still runs.

Containers

A preview environment per pull request, all month

Each open PR gets its own clone of a snapshot. The container wakes when reviewers click the link; idle PRs cost nothing.

Containers·Snapshots

Run a 12-product portfolio from one bare-metal box

Twelve isolated containers, each its own SaaS, share one $49 bare-metal server, a step above the $29 entry tier, chosen here for the RAM headroom twelve containers want. Per-product margins go from negative to nice.

Containers

Kill the staging-server tax

Stop paying for a duplicate of production. Snapshot the prod container, branch staging from it on demand, freeze it back to disk when nobody's testing. Three environments, one machine, one bill.

Containers·Snapshots

Forty client sites, one rent, one dashboard

Each client site lives in its own container; you bill them per-site, you pay the host once. The math finally works for agencies.

Containers·Workspaces

Replace the E2B bill with the bare metal you already rent

Your agents stop renting compute by the second from E2B/Modal/Daytona. They use containers on the box you already have.

Containers·Agent·Exec

Idle staging costs nothing, so staging stops getting deleted

Staging used to die because it was expensive to keep around. When idle is free, staging gets to live — even the one a teammate touched 90 days ago.

Containers·Snapshots

Per-customer sandboxes at fleet scale

Eight hundred isolated customers on three bare-metal servers — one flat-rate monthly bill, no per-tenant meter. Each tenant gets a real container with its own kernel namespace, filesystem, and URL. Idle containers cost nothing on top of the server you already pay for.

Containers·Snapshots·Exec

The CI cache that's not an S3 line item

Cache files live in /files on the box you already rent. Workers PUT and GET tarballs over HTTP. No S3 bucket, no egress, no third vendor — the bytes never leave the box.

Files·Containers

Fifty demo environments for fifty sales calls

Each prospect gets a real, isolated copy of your product seeded with their data. Cloned from a snapshot. Theirs to keep for a week.

Containers·Snapshots
