Lucair — AI Inference & Routing
⚜ Luca Inferencing & Routing

The right model for every request.

Lucair runs open models on our own GPUs and routes each call to the cheapest one that still clears the quality bar — local-first, cloud only when the work demands it. The same answers, for a fraction of the cost.

0.2× baseline cost vs cloud OpenAI-compatible  drop-in API Per-token billing, auditable
The economics

Same answer. A fraction of the cost.

A one-line summary shouldn't be billed like a multi-step proof. Lucair reads the task, sends the easy majority to a local model, and escalates only the hard few — so a mixed workload costs a fraction of sending everything to a frontier model.

Cost to serve 1M mixed tokens · illustrative

Routed, not flat-rated

All Opus
$10.0
All Sonnet
$6.0
All Haiku
$2.0
Lucair
$1.5

A typical mix lands near 0.74× Haiku — about 75% under all-Sonnet and 85% under all-Opus, at matched quality.

Where the tokens go

Most work stays local

Baseline · local 70%
Fast · Haiku 18%
Balanced · Sonnet 9%
Frontier · Opus 3%
The ladder

One ladder, climbed only when needed.

Every request starts at the baseline and rises only as far as the task requires — never above your plan's ceiling. When confidence is low, Lucair routes up, never down.

TierModel× HaikuDefault for
Baselineministral-8b · local0.2×Summaries, translation, extraction, low-reasoning chat
Fastclaude-haiku-4-5Mid-low reasoning, low-confidence escalation
Balancedclaude-sonnet-4-6Mid reasoning, longer generation
Frontierclaude-opus-4-8High reasoning, extended thinking, mastery
How it works

Read the room, seat the request.

I

Assess

Reasoning level, whether it needs thinking, summarize vs generate — read in milliseconds.

II

Route

The cheapest model that clears the task's quality bar, clamped to your plan ceiling.

III

Serve

Local-first on our GPUs; fall through to a cloud tier if local is busy. Never a hard fail.

IV

Meter

Every token recorded to an immutable ledger and your wallet — to the cent.

Your tenancy

Stand up your inference in an afternoon.

Internal team or outside company — set up a tenancy, mint a key, and point your existing OpenAI client at Lucair. We can run the local inference for you or stand up dedicated APIs, exactly as we did for our first client, RW Exprès.

Self-serve setup

Create a tenancy, generate scoped API keys, set quotas — internal or external, isolated by default.

Managed local inference

We host and operate the models on our GPUs; you consume them. No infra to run.

Dedicated APIs on request

Need a bespoke surface like RW Exprès? We stand up tenant-specific endpoints and routing.

Data stays home

Private content is embedded and served locally — it never leaves to a third party unless you allow it.

# Point any OpenAI client at Lucair
from openai import OpenAI

client = OpenAI(
  base_url="https://lucair.ai/v1",
  api_key="sk-luc-…your-tenant-key…",
)

r = client.chat.completions.create(
  model="auto",            # let Lucair route
  messages=[{"role":"user",
             "content":"Résume ce contrat."}],
)
# served by ministral-8b · local · 0.2×
print(r.choices[0].message.content)
Billing you can audit

Every token, on the record.

Every token is metered to an immutable, hash-chained ledger. External tenants pay by card through Stripe — top up or auto-recharge, charged before credited, idempotent, fail-closed at zero, reconciled to the cent. Internal teams run on token meters alone — usage tracked for chargeback, no card.

External · Stripe card Internal · meters only Hash-chained ledger Per-request usage records Quotas & rate limits Reconcile to the cent
RW Exprès

Our first tenant — a French AI-seat product — runs its terminal chat and document RAG on Lucair, with Ministral-8B as the Starter baseline and escalation only when a task earns it.

⚜ Lucair · lucair.ai

Spend the minimum
that still clears the bar.

Set up a tenancy, point your client at lucair.ai/v1, and let every request find its right model.