FluxMeter

Open source · v2.2.2

Stop runaway agents before the bill hits

Real-time AI token metering & budget enforcement. Call GET /budget/{id}/check before every LLM request: <10ms budget checks, self-hostable, 1M+ events/sec in Full mode.

<10ms budget check · 1M+ eps (Full mode) · Open spec (SDKs + schema)
bash
git clone https://github.com/10kshuaizhang/fluxmeter.git
cd fluxmeter
make demo

Billing systems weren't built for agent loops

Token usage can run away much faster than traditional metering can react.

The problem

An agent loop burned through about $200 of tokens in under a minute because usage was only checked periodically. By the time the system noticed, the budget was already gone.

If your customers prepay for tokens and you need to cut them off the instant they run out — not 30 seconds later — you need pre-request enforcement, not batch analytics.

The FluxMeter answer

Check budget before each LLM call. Ingest usage after. For streaming, reserve an estimate, then reconcile when the stream ends.

available = balance − held

The available balance already subtracts in-flight streaming holds, so a customer is stopped before the next call rather than after a delayed batch query catches up.
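
A minimal in-memory sketch of that rule, assuming amounts are stored as integer microdollars. The `Budget` class and its method names are hypothetical and are not FluxMeter's API; the source describes the real deduction as running in Redis Lua.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical in-memory stand-in for one customer's budget."""
    balance: int   # microdollars remaining
    held: int = 0  # microdollars reserved by in-flight streams

    @property
    def available(self) -> int:
        # available = balance - held
        return self.balance - self.held

    def check(self, estimated_cost: int) -> bool:
        """Allow a call only if the estimate fits in the available balance."""
        return estimated_cost <= self.available

    def reserve(self, estimate: int) -> bool:
        """Place a hold before streaming starts; refuse if it does not fit."""
        if not self.check(estimate):
            return False
        self.held += estimate
        return True

    def reconcile(self, estimate: int, actual: int) -> None:
        """Release the hold and deduct what the stream really cost."""
        self.held -= estimate
        self.balance -= actual


budget = Budget(balance=50_000_000)            # $50.00
assert budget.reserve(2_000_000)               # hold $2.00 for a stream
print(budget.available)                        # 48000000
budget.reconcile(2_000_000, actual=1_250_000)  # stream really cost $1.25
print(budget.available)                        # 48750000
```

While the hold is active, a second call that needs more than the available amount is refused even though the raw balance would still cover it.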

Budget enforcement in two layers

Pre-request check blocks spend before tokens burn. Post-window deduction keeps balances accurate.

| Layer | Latency | What it does |
| --- | --- | --- |
| Pre-request check | <10ms | GET /budget/{id}/check: blocks before tokens burn |
| Post-window deduction | 10–15s | Flink aggregates → atomic Lua deduction → kill signals |

Standard flow

  1. GET /budget/{id}/check
  2. If allowed → make LLM call
  3. POST /ingest with token counts

Streaming flow

  1. POST /budget/{id}/reserve
  2. Stream tokens
  3. POST /ingest + POST /reconcile
bash
# Set $50 budget, alert at $5 remaining
curl -X POST localhost:8000/budget/cust_123 \
  -H 'Content-Type: application/json' \
  -d '{"balance_usd": 50.0, "alert_threshold_usd": 5.0, "max_rpm": 100}'

# Pre-request check — call BEFORE every LLM request
curl "localhost:8000/budget/cust_123/check?estimated_cost_usd=0.05"
# → {"allowed": true, "balance_usd": 47.23, ...}
# → {"allowed": false, "reason": "budget_exhausted"}
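
The standard flow (check, call, ingest) could be wrapped in Python roughly as follows. This is a sketch using only the standard library, not the FluxMeter SDK: `guarded_call`, `check_url`, and `http_get_json` are hypothetical names, and because the `/ingest` payload is not shown here, usage reporting is left to a caller-supplied `ingest` hook.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # host and port from the curl example


def check_url(customer_id: str, estimated_cost_usd: float) -> str:
    """Build the pre-request check URL used in the curl example."""
    query = urlencode({"estimated_cost_usd": estimated_cost_usd})
    return f"{BASE_URL}/budget/{quote(customer_id, safe='')}/check?{query}"


def http_get_json(url: str) -> dict:
    """Default transport: a plain GET whose body is parsed as JSON."""
    with urlopen(url, timeout=1.0) as resp:
        return json.load(resp)


def guarded_call(customer_id, estimated_cost_usd, call_llm, ingest,
                 fetch_json=http_get_json):
    """Check the budget, call the model only if allowed, then report usage."""
    decision = fetch_json(check_url(customer_id, estimated_cost_usd))
    if not decision.get("allowed", False):
        # e.g. {"allowed": false, "reason": "budget_exhausted"}
        raise RuntimeError(decision.get("reason", "budget check denied"))
    response = call_llm()
    ingest(customer_id, response)  # hypothetical hook wrapping POST /ingest
    return response
```

Passing `fetch_json` as a parameter keeps the gate testable without a running server; in production the default transport hits the endpoint directly.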

Three deployment paths

Start with Lite in one minute. Scale to Flink when volume demands it.

Lite

Default
make demo

API → Redis Lua. No Flink. Side projects & <100K eps.

  • <10ms budget check
  • Rollup worker
  • Stripe Meters export

Full

High volume
make demo-full

Kafka → Flink → Redis. 100K–1M eps with span attribution.

  • 1M+ eps bursts
  • DLQ replay
  • Budget kill signals

SaaS

Multi-tenant
make start-saas

Control plane on :8001. Tenant CRUD + plan limits scaffold.

  • Tenant isolation
  • API keys
  • Plan limits

Integrate in three ways

Python SDK, HTTP API, or direct Kafka — pick what fits your stack.

python
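from fluxmeter import FluxMeter

# Point the SDK at your Kafka brokers.
meter = FluxMeter(kafka_brokers="localhost:9094")
# openai_response is assumed to be the response object from an earlier OpenAI API call.
meter.track_openai("cust_123", openai_response, latency_ms=1200)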

Full API reference on GitHub docs · OpenAPI at spec/openapi/openapi.yaml

Streaming-first architecture

Incremental aggregation, atomic Lua deduction, exactly-once sinks.

[Your App] → [Kafka] → [Flink: aggregation] → [Redis] → [API]
     │              │              │                │
  SDK/HTTP     budget-alerts   keyed by         Budget check
  ingest       ← kill signals  (customer,model) (3-layer cache)
                               10s windows

Incremental aggregation

O(keys) memory, not O(events)
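
To illustrate the idea, here is a small sketch that folds each event into a running total per (customer, model, window) key instead of buffering events. The `WindowAggregator` class and the event fields are hypothetical; in the Full path this job belongs to Flink, and the 10-second window size is taken from the diagram.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # window size taken from the architecture diagram


class WindowAggregator:
    """Keeps one running total per (customer, model, window) key."""

    def __init__(self):
        self.totals = defaultdict(int)  # key -> tokens seen so far

    def add(self, customer: str, model: str, ts: float, tokens: int) -> None:
        window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        # Fold the event into its running total; the event is not stored.
        self.totals[(customer, model, window_start)] += tokens

    def flush(self, now: float) -> dict:
        """Emit and forget every window that has fully elapsed."""
        closed = {k: v for k, v in self.totals.items()
                  if k[2] + WINDOW_SECONDS <= now}
        for key in closed:
            del self.totals[key]
        return closed


agg = WindowAggregator()
for ts in (0.5, 3.0, 9.9, 10.1):
    agg.add("cust_123", "model-a", ts, tokens=100)
print(agg.flush(now=10.0))  # {('cust_123', 'model-a', 0): 300}
```

Memory grows with the number of open keys, not with the number of events that arrive inside a window.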

Atomic Lua deduction

Microdollar precision — no float drift
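
A short illustration of why integer microdollars avoid drift. The constant and the `usd_to_micros` helper are hypothetical names used only for this sketch.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000


def usd_to_micros(amount: str) -> int:
    """Convert a decimal dollar string to integer microdollars."""
    return int(Decimal(amount) * MICROS_PER_USD)


# Float dollars: ten charges of $0.10 do not sum to exactly $1.00.
float_total = sum(0.10 for _ in range(10))
print(float_total == 1.0)  # False

# Integer microdollars: the same ten charges sum exactly.
micro_total = sum(usd_to_micros("0.10") for _ in range(10))
print(micro_total == MICROS_PER_USD)  # True
```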

Sink idempotency

SHA-256 + SET NX — no double-billing on replay
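
A sketch of the pattern, under stated assumptions: which fields go into the hash is a guess, and `FakeRedis` is an in-memory stand-in that only mimics the result of `SET key value NX`.

```python
import hashlib


def idempotency_key(customer: str, model: str, window_start: int) -> str:
    """Derive a stable key from fields that identify one aggregated window."""
    raw = f"{customer}|{model}|{window_start}".encode()
    return hashlib.sha256(raw).hexdigest()


class FakeRedis:
    """In-memory stand-in that mimics the result of SET key value NX."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key: str, value: str) -> bool:
        if key in self.data:
            return False  # key already exists: nothing is written
        self.data[key] = value
        return True


def deduct_once(store, balances, customer, model, window_start, cost):
    """Deduct a window's cost only the first time its key is seen."""
    key = idempotency_key(customer, model, window_start)
    if not store.set_nx(key, "1"):
        return False  # replayed window: already billed
    balances[customer] -= cost
    return True


store, balances = FakeRedis(), {"cust_123": 50_000_000}
deduct_once(store, balances, "cust_123", "model-a", 0, 30_000)
deduct_once(store, balances, "cust_123", "model-a", 0, 30_000)  # replay
print(balances["cust_123"])  # 49970000
```

Against a real Redis the key claim and the deduction would need to happen atomically, for example inside one Lua script; with redis-py, `set(key, value, nx=True)` plays the role of `set_nx` here.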

Three-layer check

Cache → Redis → fail policy — never blocks hot path
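
One way to read this chain is sketched below. The `BudgetChecker` class, the cache TTL, and the fail-open/fail-closed semantics are assumptions made for illustration; the source names only the three layers.

```python
import time


class BudgetChecker:
    """Hypothetical three-layer check: local cache, store, fail policy."""

    def __init__(self, fetch_balance, fail_open=True, ttl=1.0,
                 clock=time.monotonic):
        self.fetch_balance = fetch_balance  # callable: customer -> microdollars
        self.fail_open = fail_open          # answer to give when the store is down
        self.ttl = ttl                      # seconds a cached balance stays fresh
        self.clock = clock
        self.cache = {}                     # customer -> (balance, expires_at)

    def allowed(self, customer: str, estimated_cost: int) -> bool:
        # Layer 1: answer from the local cache while the entry is fresh.
        hit = self.cache.get(customer)
        if hit is not None and hit[1] > self.clock():
            return estimated_cost <= hit[0]
        # Layer 2: ask the backing store (Redis in the real system).
        try:
            balance = self.fetch_balance(customer)
        except Exception:
            # Layer 3: the store is unreachable, so apply the fail policy.
            return self.fail_open
        self.cache[customer] = (balance, self.clock() + self.ttl)
        return estimated_cost <= balance
```

Whether to fail open (keep serving) or fail closed (block spend) during an outage is a product decision; the sketch simply makes that choice explicit.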

Design rationale in docs/DESIGN.md

Load tested at scale

make load-test runs staged benchmarks from 10K to 1M events/sec.

| Environment | 10K eps | 50K eps | 500K+ target |
| --- | --- | --- | --- |
| Local docker-compose | ~9K avg / ~18K peak | ~49K avg / ~92K peak | ~40–45K avg |
| Reference cluster (2 TM) | Stable | Stable | 500K indefinite; 1M bursts |

Methodology and profiles in docs/load-testing.md

OpenCore layout

Spec and SDKs are the product surface. Engine is the reference implementation.

| Layer | Path | Purpose |
| --- | --- | --- |
| Spec | spec/ | Event schema, OpenAPI, semantic conventions |
| SDKs | sdk/ | Python (PyPI) + JS clients |
| Community | contrib/ | Provider mappings, pricing, connectors |
| Engine | src/ | Flink reference implementation |

Connect to your billing stack

Export usage to Lago, Orb, Metronome, Stripe, and more.