The right model for every request.
Lucair runs open models on our own GPUs and routes each call to the cheapest one that still clears the quality bar — local-first, cloud only when the work demands it. The same answers, for a fraction of the cost.
Same answer. A fraction of the cost.
A one-line summary shouldn't be billed like a multi-step proof. Lucair reads the task, sends the easy majority to a local model, and escalates only the hard few — so a mixed workload costs a fraction of sending everything to a frontier model.
Routed, not flat-rated
A typical mix lands near 0.74× Haiku — about 75% under all-Sonnet and 85% under all-Opus, at matched quality.
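As a rough check on how the percentages relate to the 0.74× figure, the sketch below converts a blended cost expressed in Haiku units into savings against a single-model baseline. The price ratios (Sonnet at 3× Haiku, Opus at 5× Haiku) are assumptions made for illustration and are not stated on this page; only the 0.74× blend comes from the text above.

```python
# Assumed price ratios relative to Haiku (illustrative, not taken from this page).
SONNET_VS_HAIKU = 3.0
OPUS_VS_HAIKU = 5.0

# The "typical mix" figure quoted above, in multiples of Haiku's price.
BLENDED_VS_HAIKU = 0.74


def savings_vs(baseline_ratio: float, blended_ratio: float = BLENDED_VS_HAIKU) -> float:
    """Fraction saved versus sending every request to the baseline model."""
    return 1.0 - blended_ratio / baseline_ratio


sonnet_savings = savings_vs(SONNET_VS_HAIKU)
opus_savings = savings_vs(OPUS_VS_HAIKU)

print(f"vs all-Sonnet: {sonnet_savings:.0%}")
print(f"vs all-Opus:   {opus_savings:.0%}")
```

Under those assumed ratios the arithmetic lands on roughly 75% and 85%, in line with the figures quoted; a different price sheet or workload mix would shift both numbers.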
Most work stays local
One ladder, climbed only when needed.
Every request starts at the baseline and rises only as far as the task requires — never above your plan's ceiling. When confidence is low, Lucair routes up, never down.
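The ladder rule can be sketched in a few lines. This is a minimal illustration, not Lucair's actual router: the rung names other than ministral-8b, the `Assessment` fields, and the 0.7 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ladder, cheapest first. Only "ministral-8b" appears on this page;
# the other rung names are placeholders.
LADDER = ["ministral-8b", "mid-tier", "frontier"]


@dataclass
class Assessment:
    required_rung: int  # lowest rung judged able to clear the quality bar
    confidence: float   # 0.0 to 1.0, how sure the assessor is of that judgment


def choose_rung(a: Assessment, plan_ceiling: int, min_confidence: float = 0.7) -> int:
    """Pick the cheapest adequate rung, routing up (never down) when unsure."""
    rung = a.required_rung
    if a.confidence < min_confidence:
        rung += 1  # low confidence moves one rung up
    # Clamp: never below the baseline (0), never above the plan ceiling.
    return max(0, min(rung, plan_ceiling, len(LADDER) - 1))


easy = choose_rung(Assessment(required_rung=0, confidence=0.9), plan_ceiling=2)
unsure = choose_rung(Assessment(required_rung=0, confidence=0.5), plan_ceiling=2)
capped = choose_rung(Assessment(required_rung=2, confidence=0.5), plan_ceiling=1)
print(LADDER[easy], LADDER[unsure], LADDER[capped])
```

The clamp is applied last so that a low-confidence bump can never push a request past the plan ceiling.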
Read the room, seat the request.
Assess
How much reasoning the task needs, whether it calls for extended thinking, and whether it is a summary or a generation job, all read in milliseconds.
Route
The cheapest model that clears the task's quality bar, clamped to your plan ceiling.
Serve
Local-first on our GPUs; fall through to a cloud tier if local is busy. Never a hard fail.
Meter
Every token recorded to an immutable ledger and your wallet — to the cent.
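The Serve step's fall-through behaviour might look like the sketch below. The `Busy` exception, the `serve` helper, and the two stand-in backends are hypothetical names used for illustration; the sketch also assumes the last tier in the list is always able to answer.

```python
from typing import Callable, List, Tuple


class Busy(Exception):
    """Raised by a backend that cannot take the request right now."""


Backend = Tuple[str, Callable[[str], str]]


def serve(prompt: str, backends: List[Backend]) -> Tuple[str, str]:
    """Try backends in order (local first) and return (backend_name, answer)."""
    for name, call in backends:
        try:
            return name, call(prompt)
        except Busy:
            continue  # fall through to the next tier
    # Only reached if every tier reported Busy.
    raise RuntimeError("all backends busy")


def local_backend(prompt: str) -> str:
    raise Busy()  # simulate a saturated local GPU pool


def cloud_backend(prompt: str) -> str:
    return f"cloud answer to: {prompt}"


used, answer = serve("Summarize this.", [("local", local_backend), ("cloud", cloud_backend)])
print(used, "->", answer)
```

Ordering the list local-first keeps the cheap path as the default, with the cloud tier acting purely as overflow.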
Stand up your inference in an afternoon.
Internal team or outside company — set up a tenancy, mint a key, and point your existing OpenAI client at Lucair. We can run the local inference for you or stand up dedicated APIs, exactly as we did for our first client, RW Exprès.
Self-serve setup
Create a tenancy, generate scoped API keys, set quotas — internal or external, isolated by default.
Managed local inference
We host and operate the models on our GPUs; you consume them. No infra to run.
Dedicated APIs on request
Need a bespoke surface like RW Exprès? We stand up tenant-specific endpoints and routing.
Data stays home
Private content is embedded and served locally — it never leaves to a third party unless you allow it.
# Point any OpenAI client at Lucair
from openai import OpenAI

client = OpenAI(
    base_url="https://lucair.ai/v1",
    api_key="sk-luc-…your-tenant-key…",
)

r = client.chat.completions.create(
    model="auto",  # let Lucair route
    messages=[{"role": "user", "content": "Résume ce contrat."}],
)
# served by ministral-8b · local · 0.2×
print(r.choices[0].message.content)
Every token, on the record.
Every token is metered to an immutable, hash-chained ledger. External tenants pay by card through Stripe, by manual top-up or auto-recharge: the card is charged before the wallet is credited, payments are idempotent, service fails closed at a zero balance, and everything is reconciled to the cent. Internal teams run on token meters alone: usage is tracked for chargeback, with no card.
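To make "hash-chained" concrete, here is a minimal sketch of an append-only ledger in which each entry's hash covers the previous entry's hash, so editing any past record breaks verification. The record fields, the `Ledger` class, and the choice of SHA-256 over canonical JSON are assumptions for illustration, not a description of Lucair's implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class Ledger:
    def __init__(self) -> None:
        self.entries = []  # list of (record, hash) pairs, oldest first

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        h = entry_hash(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        """Recompute the chain from the start; any edited record breaks it."""
        prev = GENESIS
        for record, h in self.entries:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True


ledger = Ledger()
ledger.append({"tenant": "demo", "model": "ministral-8b", "tokens": 412})
ledger.append({"tenant": "demo", "model": "ministral-8b", "tokens": 95})
intact = ledger.verify()

ledger.entries[0][0]["tokens"] = 1  # tamper with an old record
tampered_ok = ledger.verify()
print(intact, tampered_ok)
```

Because each hash depends on the one before it, an auditor only needs the latest hash to detect a change anywhere earlier in the chain.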
Our first tenant — a French AI-seat product — runs its terminal chat and document RAG on Lucair, with Ministral-8B as the Starter baseline and escalation only when a task earns it.
Spend the minimum that still clears the bar.
Set up a tenancy, point your client at lucair.ai/v1, and let every request find its right model.