use-cases / stream-llm-tokens-to-anything / hero
PIPE · AGENT · STREAMING

Stream LLM tokens to anything that reads HTTP

Step 3 of your agent generates tokens. Step 4 needs to start consuming them before step 3 is done. Pipe the model's output straight into a path; the next process curls the same path. No SSE plumbing, no broker, no callback wrangling — bytes move at line speed.

Read the pipe API
use-cases / stream-llm-tokens-to-anything / mechanism

Two curls, one path, no middle layer

Most streaming stacks need an SSE endpoint, a queue, a pub/sub bus, and a framework callback to move tokens four feet. The pipe replaces all of it: the producer writes to a path with PUT, the consumer reads from the same path with GET. Bytes flow directly between the two — no intermediate storage on the server.

THE USUAL STACK

Six layers between generator and reader

  • LangChain streaming abstraction · callback hell
  • Server-Sent Events plumbing · framing + heartbeats
  • Redis pub/sub · broker to operate
  • Custom WebSocket relay · auth + reconnect
  • Message broker (Kafka/RabbitMQ) · topics + partitions
  • Agent framework callbacks · vendor-specific
THE PIPE

Two curls touching the same path

PRODUCER · curl -T - /pipe/tokens
SAME PATH
CONSUMER · curl /pipe/tokens

Server-side storage: zero. Bytes stream from sender to receiver as soon as both connect, with backpressure handled per-receiver. The endpoint exists only because two curls touched it.

agent-step-3.sh
# Step 3 — agent generates and pipes tokens upward.
# ai-generate stands in for whatever command streams model output as JSON lines.
ai-generate --model "$MODEL" --stream \
  | jq -c '{delta: .text}' \
  | curl -sT - 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' &

# Step 4 — three readers GET the same path. The pipe fans out.
curl -s 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' | tee evaluator.log &
curl -s 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' | jq -c '.delta' &
websocketd --port=8080 curl -s 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' &

# All four processes block until the n=3 readers connect, then bytes flow.
# (websocketd attaches its reader once a WebSocket client dials in.)
wait

PUT pushes the bytes up, GET pulls them down. The ?n parameter says how many readers to wait for; the pipe blocks until that many connect, then fans out to all of them simultaneously. No client SDK to install, no broker to run — only HTTP.
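
The simplest possible wiring, sketched with an arbitrary path name (demo) and assuming a single reader when ?n is omitted; check the pipe API for the actual default:

two-terminals.sh
# terminal A: producer. The PUT blocks until a reader arrives.
echo '{"delta":"hello"}' | curl -T - 'https://pipe.hoody.com/api/v1/pipe/demo'

# terminal B: consumer. The moment it connects, bytes flow and both exit.
curl 'https://pipe.hoody.com/api/v1/pipe/demo'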

use-cases / stream-llm-tokens-to-anything / listeners

Same path, many readers, no SDK

Once the producer is piping, anything that speaks HTTP can subscribe. Up to 256 readers on the same stream, fanned out by the pipe with backpressure handled per-receiver. No client library to install, no relay to provision.
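
A quick fan-out smoke test, sketched under the same assumptions as the example above; the path name and the reader count of 10 are ours, the 256 ceiling is the pipe's:

fan-out-test.sh
# ten readers on one path, each writing what it receives to its own file
for i in $(seq 1 10); do
  curl -s 'https://pipe.hoody.com/api/v1/pipe/fanout-test?n=10' > "reader-$i.out" &
done

# one producer; the transfer starts once the tenth reader attaches
seq 1 1000 | curl -sT - 'https://pipe.hoody.com/api/v1/pipe/fanout-test?n=10'
wait

# every reader should have received identical bytes
md5sum reader-*.out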

FOR THE FRONTEND

The browser reads the same URL

A fetch reader hits the pipe path and gets the same byte stream the agent is producing; EventSource works too, provided the bytes are SSE-framed. No SSE framing on your server — the pipe carries the bytes the model emits, raw.
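
If you do want EventSource in the browser, one hedged option is to add the framing on the producer side before the bytes enter the pipe. ai-generate is the same stand-in as above, and this assumes the reader's response carries an SSE-compatible content type; check the pipe API:

sse-frame.sh
# minimal SSE framing: a 'data: ' prefix and a blank line per event
ai-generate --model "$MODEL" --stream \
  | jq -c '{delta: .text}' \
  | sed 's/^/data: /; G' \
  | curl -T - 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3'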

FOR THE EVALUATOR

A second agent listens and decides

An evaluator process subscribes to the same path. It can interrupt the producer the moment the output drifts. Two agents on the same wire, no orchestrator framework brokering between them.
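
A sketch of that evaluator, assuming it knows the producer's PID and that a drift marker such as POLICY_VIOLATION shows up in the stream; both are our conventions, not the pipe's:

evaluator.sh
# read the live stream; kill the producer at the first drift marker
# PRODUCER_PID is exported by whoever launched step 3
curl -s 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' \
  | grep -m1 -q 'POLICY_VIOLATION' \
  && kill "$PRODUCER_PID"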

FOR THE LOG TRAIL

Tee the stream into a container that watches

A logging consumer reads, gzips, and writes to disk. A debugger UI reads in parallel. None of them know the others exist — the pipe just hands every reader the same bytes.
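
Sketched as two independent readers; the file names are arbitrary, and neither process knows the other exists:

log-trail.sh
# logging consumer: gzip the stream to disk as it arrives
curl -s 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' | gzip > run-42.tokens.gz &

# debugger view: pretty-print the same bytes in parallel
curl -s 'https://pipe.hoody.com/api/v1/pipe/run-42/tokens?n=3' | jq .
wait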

FAN-OUT CAP · 256 · Per-path receiver ceiling enforced by the pipe — set ?n to wait for that many before the transfer starts.
LATENCY OVERHEAD · 0 · Bytes traverse the pipe as they arrive. No buffering on the server — backpressure is handled per-receiver.
SDK FOOTPRINT · 0 kb · Producer and consumer are curl. Anything that speaks HTTP can subscribe — browser, container, agent, shell.
use-cases / stream-llm-tokens-to-anything / punchline

The LLM streams. The pipe streams. The reader streams. No middle layer.

01 · the model emits tokens
02 · the pipe forwards bytes
03 · the reader applies them
no broker between steps · the path is the protocol
use-cases / stream-llm-tokens-to-anything / replaces

What this replaces

The wiring you reach for when one process needs to stream tokens to another in real time. Each one ships its own framing, its own SDK, its own ops surface. The pipe is the wire.

  • LangChain streaming abstractions · Callback chains, framework lock-in
  • Server-sent events plumbing · Framing + heartbeats + reconnect logic
  • Redis pub/sub · Broker to install, operate, and pay for
  • Custom WebSocket relays · Auth, reconnect, backpressure all DIY
  • Message brokers (Kafka, RabbitMQ) · Topics, partitions, consumer groups for one stream
  • Agent framework callbacks · Vendor-specific, only readable from the same SDK
use-cases / stream-llm-tokens-to-anything / cta

Stop wiring streaming infrastructure between two processes that already speak HTTP. Open a path. Pipe into it. Read out of it.

Read the pipe API
use-cases / stream-llm-tokens-to-anything / related

Read the others