
Step 3 of your agent generates tokens. Step 4 needs to start consuming them before step 3 is done. Pipe the model's output straight into a path; the next process curls the same path. No SSE plumbing, no broker, no callback wrangling — bytes move at line speed.
# stream tokens upward
ai.generate({ stream: true }) | curl -T - /pipe/tokens
no buffer · no broker · no re-encode
# read at line speed
curl /pipe/tokens | jq -c .delta | apply()
# no buffer between us
Most streaming stacks need an SSE endpoint, a queue, a pub/sub bus, and a framework callback to move tokens four feet. The pipe replaces all of it: the producer writes to a path with PUT, the consumer reads from the same path with GET. Bytes flow directly between the two, with no intermediate storage on the server.
curl -T - /pipe/tokens
curl /pipe/tokens
Server-side storage: zero. Bytes stream from sender to receiver as soon as both connect, with backpressure handled per-receiver. The endpoint exists only because two curls touched it.
# Step 3 — agent generates and pipes tokens upward.
ai.generate({ model, stream: true }) \
| jq -c '{delta: .text}' \
| curl -T - https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3
# Step 4 — three readers GET the same path. The pipe fans out.
curl https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3 | tee evaluator.log
curl https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3 | jq -c .delta
websocketd --port=8080 curl -Ns https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3
# All four processes block until the n=3 readers connect, then bytes flow.
PUT pushes the bytes up, GET pulls them down. The ?n parameter says how many readers to wait for; the pipe blocks until that many connect, then fans out to all of them simultaneously. No client SDK, no broker, nothing to install: only HTTP.
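A minimal sketch of the same handshake with a single reader; the path below is hypothetical, and ?n=1 just tells the pipe to wait for one GET before the PUT unblocks.

# terminal 1: producer blocks until one reader connects (hypothetical demo path)
echo '{"delta":"hello"}' | curl -T - 'https://pipe.hoody.com/api/v1/pipe/demo/hello?n=1'

# terminal 2: the single reader; the moment it connects, bytes flow
curl -N 'https://pipe.hoody.com/api/v1/pipe/demo/hello?n=1'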
Once the producer is piping, anything that speaks HTTP can subscribe. Up to 256 readers on the same stream, fanned out by the pipe with backpressure handled per-receiver. No client library to install, no relay to provision.
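As a rough sketch, wide fan-out is just more GETs against the same path; the path, reader count, and filenames below are illustrative, with 256 readers as the stated ceiling.

# launch 20 readers on one hypothetical path; the producer PUTs with ?n=20
for i in $(seq 1 20); do
  curl -Ns 'https://pipe.hoody.com/api/v1/pipe/run-42/fanout?n=20' > "reader-$i.jsonl" &
done
wait  # each reader receives the same bytes at its own pace (per-receiver backpressure)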
An EventSource or fetch reader hits the pipe path and gets the same byte stream the agent is producing. No SSE framing on your server — the pipe carries the bytes the model emits, raw.
An evaluator process subscribes to the same path. It can interrupt the producer the moment the output drifts. Two agents on the same wire, no orchestrator framework brokering between them.
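A hedged sketch of that evaluator: the OFF_TOPIC marker and the PRODUCER_PID variable are stand-ins for whatever drift check and interrupt mechanism you actually run, not part of the pipe API.

# evaluator: read the live stream, kill the producer on the first drift marker
# (OFF_TOPIC and PRODUCER_PID are assumptions supplied by your own launcher)
curl -Ns 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' \
  | grep -q 'OFF_TOPIC' \
  && kill "$PRODUCER_PID"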
A logging consumer reads, gzips, and writes to disk. A debugger UI reads in parallel. None of them know the others exist — the pipe just hands every reader the same bytes.
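The logging consumer is one line of shell; the output filename here is illustrative.

# read, gzip, write to disk; the other readers never notice
curl -Ns 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' | gzip > run-42.tokens.jsonl.gz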
The LLM streams. The pipe streams. The reader streams. No middle layer.
SSE endpoints, message queues, pub/sub buses, framework callbacks: the wiring you reach for when one process needs to stream tokens to another in real time. Each one ships its own framing, its own SDK, its own ops surface. The pipe is the wire.
Stop wiring streaming infrastructure between two processes that already speak HTTP. Open a path. Pipe into it. Read out of it.