Every workload. One sharp API.
jusInfer reads the shape of every request (its modality, its tools, its tempo) and sorts it into the workload it actually is: agentic, robotics, cowork, or multimodal. Each lands on the cheapest capable tier. You point at one API, we resolve the right model for the moment, and we bill you once.
$1 per user / month · first user free · pre-paid credits, no expiry
Every workload gets its own tuning and endpoints. That is the difference.
Most gateways give every request the same treatment. jusInfer sorts each request into a workload class first, because a tool-looping agent, a robot control loop, a human drafting in a chat window, and a document to parse all want very different things from a model. Each class owns its own roster, router, system prompt, parameters, and upstream endpoints.
Autonomous agents that plan, loop, and call tools.
- Many sequential calls per task
- Heavy tool / MCP function-calling
- Context accumulates across steps
- Reasoning where it counts, cheap where it doesn't
Serves Claude Code, OpenCode, Cursor, Aider, Cline, Continue, and your own agents.
Real-time control and perception for embodied agents.
- Tight, low-latency control loops
- Vision-language-action and policy models
- Served on open weights and vLLM
- Throughput at the edge, where it matters
Serves robot training and inference, simulation, and embodied agents.
Collaborative work: humans and agents, together.
- Turn-based, latency-sensitive chat and drafting
- Multi-agent collaboration over shared state
- Tuned for conversational quality, not tool loops
- Home of the OpenCowork harness
Serves interactive assist, multi-agent teams, and the first-party OpenCowork client.
Vision, audio, and cross-modal understanding.
- Image, document, and screen parsing
- Audio transcription and understanding
- Cross-modal reasoning in a single call
- Tuned for fidelity, not tool loops
Serves document parsing, transcription, vision pipelines, and screen agents.
Fig. A. Set workload on the request, target a workload alias, or bind it to your API key. Unset requests are inferred.
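The three ways to pick a workload, plus the inferred fallback, can be sketched as a small resolver. This is a toy illustration, not jusInfer's classifier: the `Request` fields, the precedence order (request, then alias, then key binding, then inference), and the heuristic thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

WORKLOADS = ("agentic", "robotics", "cowork", "multimodal")


@dataclass
class Request:
    # All field names are hypothetical, chosen only for this sketch.
    workload: Optional[str] = None          # explicit per-request setting
    model: Optional[str] = None             # may carry a workload alias
    tools: list = field(default_factory=list)
    modalities: tuple = ("text",)
    latency_budget_ms: Optional[int] = None


def infer_workload(req: Request) -> str:
    """Toy heuristic standing in for the real classifier (assumption)."""
    if req.latency_budget_ms is not None and req.latency_budget_ms <= 50:
        return "robotics"      # tight control loop
    if any(m != "text" for m in req.modalities):
        return "multimodal"    # image, audio, or document input
    if req.tools:
        return "agentic"       # tool / function calling present
    return "cowork"            # plain conversational turn


def resolve_workload(req: Request, key_binding: Optional[str] = None) -> str:
    # Assumed precedence: request field, then alias, then API-key binding.
    for candidate in (req.workload, req.model, key_binding):
        if candidate in WORKLOADS:
            return candidate
    # Nothing was set explicitly, so fall back to inference.
    return infer_workload(req)
```

An explicit setting always wins in this sketch; inference only runs when nothing else names a workload.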
Two pillars, one abstraction. You see one bill.
The workload class is the seam where both of our pillars meet. Per request, we tune for your kind of work, then route to the model that wins on cost and capability, and that choice keeps improving as the market does. Your code never changes.
No model names to memorize, no price tables, no per-workload SDKs. You see one bill.
- Per-class system prompt + guardrails
- Per-class parameters (reasoning, tool encouragement)
- Per-class model roster + endpoints
- The right model, tuned for your kind of work
- Cheapest capable model per request
- Cache and reuse across the team
- Capacity arbitrage across providers
- Opaque by default, inspectable on demand
Fig. B. The two pillars. We don't sell you a model menu; we sell the right model for the work, chosen on every call.
Our mission is to make intelligence affordable. The frontier keeps moving. What should not move is the cost of using it.
Built for a billion agents, aimed at a trillion. Autonomous agents, robots, and collaborators are about to outnumber every human API consumer that came before them. One endpoint holds its shape from the first billion calls to the trillionth.
Old-world supply chain. Neo-world delivery.
We borrow what works from a century of supply-chain optimization (inventory, routing, bin-packing, hedging) and apply it to the new substrate of LLM inference.
Right-model routing
Every task is graded on difficulty before it runs. Most tasks don't need the biggest model, and don't get one.
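A minimal sketch of "cheapest capable tier", assuming difficulty is an integer grade and each tier advertises the highest grade it can handle. The roster, tier names, and costs below are made up for illustration; how tasks are actually graded is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int        # highest difficulty grade this tier handles
    cost_per_call: float   # illustrative units, not real prices


# Hypothetical roster, for illustration only.
ROSTER = [
    Tier("small", 3, 0.001),
    Tier("medium", 6, 0.010),
    Tier("large", 10, 0.080),
]


def route(difficulty: int, roster=ROSTER):
    """Return the cheapest tier able to handle the task, plus the
    capable but more expensive tiers that were passed over."""
    capable = [t for t in roster if t.capability >= difficulty]
    if not capable:
        raise ValueError(f"no tier can handle difficulty {difficulty}")
    chosen = min(capable, key=lambda t: t.cost_per_call)
    passed_over = [t.name for t in capable if t is not chosen]
    return chosen, passed_over
```

Returning the passed-over tiers alongside the choice is what makes the decision inspectable after the fact.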
Cache and reuse
Prompts, tool calls, and intermediate plans are cached aggressively across the team. The same work is never paid for twice.
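One common way to avoid paying for the same work twice is a content-addressed cache: hash a canonical form of the request and reuse the stored result on a match. The sketch below assumes exact-match keys over the workload and message list; the class name and key layout are hypothetical, and a real gateway would also need expiry and tenant isolation.

```python
import hashlib
import json


class TeamCache:
    """In-memory, exact-match cache shared by every caller on a team."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(workload, messages):
        # Canonical JSON so the same content always hashes identically.
        canonical = json.dumps(
            {"workload": workload, "messages": messages},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, workload, messages, compute):
        k = self.key(workload, messages)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        # Only a miss reaches the (billable) compute function.
        self.misses += 1
        result = compute(messages)
        self._store[k] = result
        return result
```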
Capacity arbitrage
We buy across providers, regions, and time-of-day. When one provider spikes, we route around it, and you never notice.
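Routing around a spike can be pictured as re-running a cheapest-usable-offer selection whenever prices or health change. This is a sketch under stated assumptions: the `Offer` fields, the latency ceiling, and the tie-break on latency are illustrative, not a description of the actual purchasing logic.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str            # hypothetical provider label
    region: str
    price_per_mtok: float    # current price per million tokens
    p95_latency_ms: int
    healthy: bool = True


def pick_offer(offers, max_latency_ms):
    """Pick the cheapest healthy offer within the latency ceiling."""
    usable = [
        o for o in offers
        if o.healthy and o.p95_latency_ms <= max_latency_ms
    ]
    if not usable:
        raise RuntimeError("no usable capacity right now")
    # Cheapest first; lower latency breaks price ties.
    return min(usable, key=lambda o: (o.price_per_mtok, o.p95_latency_ms))
```

When one provider's price or latency spikes, the same call simply returns a different offer, so the caller sees no change.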
Fig. D. Every route is the cheapest tier that can actually do the job, and you can see the tiers it beat. Proof, not faith.
One platform fee. Pre-paid credits for inference.
$1 per user per month is the platform fee. First user free. Credits are pre-paid in packs ($5, $10, larger). The gateway debits the actual inference cost per request. Credits roll over and don't expire. Hit zero, requests pause until you top up. No negative bill is possible.
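The credit rules above (top up, debit actual cost, pause at zero, never go negative) fit in a few lines. This is a sketch, not the billing implementation: the class names are hypothetical, and capping the final debit at the remaining balance is one assumed way to guarantee the balance never drops below zero.

```python
from decimal import Decimal


class Paused(Exception):
    """Raised when the wallet is empty and requests must wait for a top-up."""


class Wallet:
    def __init__(self):
        self.balance = Decimal("0")

    def top_up(self, pack_usd: str) -> None:
        amount = Decimal(pack_usd)
        if amount <= 0:
            raise ValueError("pack must be positive")
        # Credits accumulate; nothing here resets or expires them.
        self.balance += amount

    def debit(self, cost_usd: str) -> Decimal:
        if self.balance <= 0:
            raise Paused("credits exhausted; top up to resume")
        # ASSUMPTION: the charge is capped at the remaining balance,
        # so the wallet can reach zero but never go negative.
        charged = min(Decimal(cost_usd), self.balance)
        self.balance -= charged
        return charged
```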
Pay for the task, not the workload.
A robot control step, a coding task, a parsed blueprint, a teammate's chat turn. Each routes to the cheapest tier in its class, and the ledger itemizes it. Four workloads, one ledger, one invoice. We watch the meter so finance does not have to.
Fig. C, a live routing receipt. Strikethrough is flat-rate; the figure beside it is what you actually pay.
Platform · $1 / user / month
Billed monthly
First user free. Each additional teammate is $1/mo, billed via Stripe and prorated when you add or remove members. Solo accounts stay free forever.
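The seat arithmetic is simple enough to write down. The monthly fee follows directly from "first user free, $1 per additional teammate"; the proration function is only an assumption (a linear share of the days left in the billing cycle, rounded to the cent), since the exact proration is handled by Stripe and not spelled out here.

```python
from decimal import Decimal, ROUND_HALF_UP

SEAT_PRICE = Decimal("1.00")  # USD per user per month


def monthly_platform_fee(members: int) -> Decimal:
    """First user is free; every additional teammate costs one seat."""
    if members < 0:
        raise ValueError("members cannot be negative")
    return SEAT_PRICE * max(0, members - 1)


def prorated_seat_charge(days_remaining: int, days_in_cycle: int) -> Decimal:
    """ASSUMPTION: linear proration over the days left in the cycle."""
    if days_in_cycle <= 0 or not 0 <= days_remaining <= days_in_cycle:
        raise ValueError("invalid billing window")
    raw = SEAT_PRICE * days_remaining / days_in_cycle
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under these assumptions a solo account pays nothing, a team of five pays $4 a month, and a seat added halfway through a 30-day cycle costs $0.50 for that cycle.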
Credits · pre-paid, pay per request
Pre-paid · no expiry
Buy a pack ($5, $10, larger) and the gateway debits actual inference cost per request. Credits roll over and don't expire. Hit zero, requests pause until you top up. No negative bill is possible.
Plans
Drop-in for your coding agent. OpenAI- and Anthropic-API compatible. Tuned for coding workloads.
- Works with Claude Code, OpenCode, Cursor, Aider, Cline
- OpenAI-compatible API
- Anthropic-API compatible
- Best-model-per-task routing, inspectable on any call
- Pay only for what runs
Less hand-holding. Multi-repo context. Background learning.
- Everything in Just
- Multi-repo aware: the agent reasons across repo boundaries
- RL loop on your repo that learns your patterns over time
- Background agents for long-running plans
- Custom routing policies
Autonomous execution with audit-grade compliance.
- Everything in Pro
- SOC 2 · EU AI Act
- Self-improving codebase: patches, deps, refactors
- Org-wide agent fleets with role boundaries
- Custom SLAs and dedicated capacity
Bolt-ons for teams ready to coordinate.
Optional add-ons that layer on top of any plan. Priced per tenant. Both ship after the core plans stabilize.
Shared configs
One config, every workstation.
Routing rules, prompt templates, tool allowlists, and review policies pushed instantly to every developer. No drift between workstations.
AgenticPM
Issue tracker becomes the execution plan.
Agents pick up tickets, draft PRs, attach evidence, and report progress on the kanban your team already uses. Humans stay in the loop on merges.
Plain answers.
How does billing actually work?
Two things: a small seat fee ($1/user/month, first user free), and pre-paid credit packs ($5, $10, more) that cover inference. The seat fee pays for collaboration; credits cover the model compute. You can't go negative; if credits hit zero, requests pause until you top up.
What happens to unused credits?
They roll over forever. There's no monthly reset, no expiry. The wallet just accumulates until you spend it.
How are you cheaper?
We optimize execution across models, applying old-world supply-chain techniques such as inventory, hedging, and bin-packing to neo-world intelligence delivery. Most of your tasks do not need the biggest model; we make sure they do not get one.
Which models do you use?
The ones that win the cost-vs-capability tradeoff for your task, right now. The mix improves as the market improves. The model behind any call is opaque by default but inspectable on demand. You can ask "what ran?" anytime.
What's the difference between the plans?
Just (today) is the drop-in: point your existing agent at api.jusinfer.com and you get best-model-per-task routing on a single repo. Pro adds multi-repo awareness and an RL loop that learns your patterns over time. Org layers SOC 2 / EU AI Act compliance and a self-improving codebase on top. You only pay for what's shipping.
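To make "point your existing agent at api.jusinfer.com" concrete, here is a sketch that builds (but does not send) an OpenAI-style chat request using only the standard library. The host comes from the text above; the `/v1/chat/completions` path, the bearer-token header, and using a workload name as the `model` value are assumptions borrowed from the OpenAI convention, so check the actual API reference before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://api.jusinfer.com"   # host named in the text
CHAT_PATH = "/v1/chat/completions"      # ASSUMPTION: OpenAI-style path


def build_chat_request(api_key, messages, model="agentic"):
    """Build an OpenAI-style POST request without sending it.

    ASSUMPTION: `model` is shown carrying a workload alias; the real
    alias names and auth scheme may differ.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + CHAT_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In most OpenAI-compatible tools the equivalent change is just overriding the base URL and API key in the tool's settings; no request code needs to be written by hand.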
What are you launching next?
A CLI and a desktop app are on the roadmap. The first focus is a VS Code plugin, which is where most coding actually happens.
Can I see which model ran?
Yes. By default we keep the model opaque so you can focus on shipping. On any call, you can prompt for the routing decision or flip the inspector on for the session.
What about my data?
Your code is yours. We do not train on it. Routing metadata is kept only as long as needed to make the next decision better.
No lock-in, by construction.
OpenAI-compatible in, OpenAI-compatible out. Your keys, your data, your exit. One endpoint that spans every workload and every provider is the strongest anti-lock-in there is. You are not even committed to a single workload paradigm.
jusInfer is one piece of a bigger toolkit.
Same team, same opinionated defaults, built to work together. Route inference here, ship coding agents with jusCode, and level up the people who run them at jusCode Academy.
The gateway for coding agents.
A model-agnostic endpoint tuned for Claude Code, OpenCode, Cursor and friends. Stop wiring providers per tool.
Visit juscode.co →
Get certified on agentic workflows.
Hands-on paths that teach how to design, ship and operate AI coding agents in production. For you and your team.
Explore the Academy →