Open source · v2.2.2

Stop runaway agents before the bill hits

Real-time AI token metering & budget enforcement. Call GET /budget/{id}/check before every LLM request — sub-second enforcement, self-hostable, 1M+ events/sec.

<10ms budget check 1M+ eps Full mode Open spec SDKs + schema

Quick Start — make demo Star on GitHub

FluxMeter demo showing real-time token metering and budget enforcement

bash

git clone https://github.com/10kshuaizhang/fluxmeter.git
cd fluxmeter
make demo

Billing systems weren't built for agent loops

Token usage can run away much faster than traditional metering can react.

The problem

An agent loop burned through about $200 of tokens in under a minute because usage was only checked periodically. By the time the system noticed, the budget was already gone.

If your customers prepay for tokens and you need to cut them off the instant they run out — not 30 seconds later — you need pre-request enforcement, not batch analytics.

The FluxMeter answer

Check budget before each LLM call. Ingest usage after. For streaming, reserve an estimate, then reconcile when the stream ends.

available = balance − held

Effective balance accounts for in-flight streaming holds — so customers are stopped before the next call, not after a delayed batch query catches up.

Budget enforcement in two layers

Pre-request check blocks spend before tokens burn. Post-window deduction keeps balances accurate.

Layer	Latency	What it does
Pre-request check	<10ms	`GET /budget/{id}/check` — blocks before tokens burn
Post-window deduction	10–15s	Flink aggregates → atomic Lua deduction → kill signals

Standard flow

1. GET /budget/{id}/check
2. If allowed → make LLM call
3. POST /ingest with token counts

Streaming flow

1. POST /budget/{id}/reserve
2. Stream tokens
3. POST /ingest + POST /reconcile

bash

# Set $50 budget, alert at $5 remaining
curl -X POST localhost:8000/budget/cust_123 \
  -H 'Content-Type: application/json' \
  -d '{"balance_usd": 50.0, "alert_threshold_usd": 5.0, "max_rpm": 100}'

# Pre-request check — call BEFORE every LLM request
curl "localhost:8000/budget/cust_123/check?estimated_cost_usd=0.05"
# → {"allowed": true, "balance_usd": 47.23, ...}
# → {"allowed": false, "reason": "budget_exhausted"}

Three deployment paths

Start with Lite in one minute. Scale to Flink when volume demands it.

Lite

Default

make demo

API → Redis Lua. No Flink. Side projects & <100K eps.

<10ms budget check
Rollup worker
Stripe Meters export

Full

High volume

make demo-full

Kafka → Flink → Redis. 100K–1M eps with span attribution.

1M+ eps bursts
DLQ replay
Budget kill signals

SaaS

Multi-tenant

make start-saas

Control plane on :8001. Tenant CRUD + plan limits scaffold.

Tenant isolation
API keys
Plan limits

Integrate in three ways

Python SDK, HTTP API, or direct Kafka — pick what fits your stack.

python

from fluxmeter import FluxMeter

meter = FluxMeter(kafka_brokers="localhost:9094")
meter.track_openai("cust_123", openai_response, latency_ms=1200)

Full API reference on GitHub docs · OpenAPI at spec/openapi/openapi.yaml

Streaming-first architecture

Incremental aggregation, atomic Lua deduction, exactly-once sinks.

[Your App] → [Kafka] → [Flink: aggregation] → [Redis] → [API]
     │              │              │                │
  SDK/HTTP     budget-alerts   keyed by         Budget check
  ingest       ← kill signals  (customer,model) (3-layer cache)
                               10s windows

Incremental aggregation

O(keys) memory, not O(events)

Atomic Lua deduction

Microdollar precision — no float drift

Sink idempotency

SHA-256 + SET NX — no double-billing on replay

Three-layer check

Cache → Redis → fail policy — never blocks hot path

Design rationale in docs/DESIGN.md

Load tested at scale

make load-test runs staged benchmarks from 10K to 1M events/sec.

Environment	10K eps	50K eps	500K+ target
Local docker-compose	~9K avg / ~18K peak	~49K avg / ~92K peak	~40–45K avg
Reference cluster (2 TM)	Stable	Stable	500K indefinite; 1M bursts

Methodology and profiles in docs/load-testing.md

OpenCore layout

Spec and SDKs are the product surface. Engine is the reference implementation.

Layer	Path	Purpose
Spec	spec/	Event schema, OpenAPI, semantic conventions
SDKs	sdk/	Python (PyPI) + JS clients
Community	contrib/	Provider mappings, pricing, connectors
Engine	src/	Flink reference implementation

Connect to your billing stack

Export usage to Lago, Orb, Metronome, Stripe, and more.

Stripe Lago Orb Metronome OpenMeter Zuora

Integration guides for billing platforms at docs/integrations.md