Inference endpoints · agentic · robotics · cowork · multimodal

Every workload. One sharp API.

jusInfer reads the shape of every request, its modality, its tools, its tempo, and sorts it into the workload it actually is: agentic, robotics, cowork, or multimodal. Each lands on the cheapest capable tier. You point at one API, we resolve the right model for the moment, and we bill you once.

Start building →See the workloads

$1 per user / month · first user free · pre-paid credits, no expiry

4Workload classes

0Vendor lock-in

1API, one bill

$0Until you ship

01 · The workloads

Every workload gets its own tuning and endpoints. That is the difference.

Most gateways give every request the same treatment. jusInfer sorts each request into a workload class first, because a tool-looping agent, a robot control loop, a human drafting in a chat window, and a document to parse all want very different things from a model. Each class owns its own roster, router, system prompt, parameters, and upstream endpoints.

WORKLOAD CLASS

Agentic

Autonomous agents that plan, loop, and call tools.

▸Many sequential calls per task
▸Heavy tool / MCP function-calling
▸Context accumulates across steps
▸Reasoning where it counts, cheap where it doesn't

Serves Claude Code, OpenCode, Cursor, Aider, Cline, Continue, and your own agents.

WORKLOAD CLASS

Robotics

Real-time control and perception for embodied agents.

▸Tight, low-latency control loops
▸Vision-language-action and policy models
▸Served on open weights and vLLM
▸Throughput at the edge, where it matters

Serves Robot training and inference, simulation, and embodied agents.

WORKLOAD CLASS

CoWork

Collaborative work: humans and agents, together.

▸Turn-based, latency-sensitive chat and drafting
▸Multi-agent collaboration over shared state
▸Tuned for conversational quality, not tool loops
▸Home of the OpenCowork harness

Serves Interactive assist, multi-agent teams, and the first-party OpenCowork client.

WORKLOAD CLASS

Multimodal

Vision, audio, and cross-modal understanding.

▸Image, document, and screen parsing
▸Audio transcription and understanding
▸Cross-modal reasoning in a single call
▸Tuned for fidelity, not tool loops

Serves Document parsing, transcription, vision pipelines, and screen agents.

Fig. A. Set workload on the request, target a workload alias, or bind it to your API key. Unset requests are inferred.

02 · How it works

Two pillars, one abstraction. You see one bill.

The workload class is the seam where both of our pillars hang. Per request, we tune for your kind of work, then route to the model that wins on cost and capability, and the answer keeps improving as the market does. Your code never changes.

No model names to memorize, no price tables, no per-workload SDKs. You see one bill.

PILLAR

Workload-tuned optimization

·Per-class system prompt + guardrails
·Per-class parameters (reasoning, tool encouragement)
·Per-class model roster + endpoints
·The right model, tuned for your kind of work

PILLAR

Cost routing

·Cheapest capable model per request
·Cache and reuse across the team
·Capacity arbitrage across providers
·Opaque by default, inspectable on demand

Fig. B. The two pillars. We don't sell you a model menu; we sell the right model for the work, chosen on every call.

03 · Mission

Our mission is to make intelligence affordable. The frontier keeps moving. What should not move is the cost of using it.

Built for a billion agents, aimed at a trillion. Autonomous agents, robots, and collaborators are about to outnumber every human API consumer that came before them. One endpoint holds its shape from the first billion calls to the trillionth.

0agent requests routed, cumulative

10⁶Million10⁹Billion10¹²Trillion

04 · How we're cheaper

Old-world supply chain. Neo-world delivery.

We borrow what works from a century of supply-chain optimization (inventory, routing, bin-packing, hedging) and apply it to the new substrate of LLM inference.

Right-model routing

Every task is graded on difficulty before it runs. Most tasks don't need the biggest model, and don't get one.

Cache and reuse

Prompts, tool calls, and intermediate plans are cached aggressively across the team. The same work is never paid for twice.

Capacity arbitrage

We buy across providers, regions, and time-of-day. When one provider spikes, we route around it, and you never notice.

0%routing
confidence

task: parse a blueprint image

A cheaper tier lacks visioncan't

Chosen tier vision-capable, $0.021chosen ✓

A premium tier overkill, pricierskipped

Fig. D. Every route is the cheapest tier that can actually do the job, and you can see the tiers it beat. Proof, not faith.

05 · Pricing

One platform fee. Pre-paid credits for inference.

$1 per user per month is the platform fee. First user free. Credits are pre-paid in packs ($5, $10, larger). The gateway debits the actual inference cost per request. Credits roll over and don't expire. Hit zero, requests pause until you top up. No negative bill is possible.

jusInferrouting receipt · today

Saved today$0.00

one key · four workloads · one invoice

−41%

Pay for the task, not the workload.

A robot control step, a coding task, a parsed blueprint, a teammate's chat turn. Each routes to the cheapest tier in its class, and the ledger itemizes it. Four workloads, one ledger, one invoice. We watch the meter so finance does not have to.

Fig. C, a live routing receipt. Strikethrough is flat-rate; the figure beside it is what you actually pay.

Platform · $1 / user / month

Billed monthly

First user free. Each additional teammate is $1/mo, billed via Stripe and prorated when you add or remove members. Solo accounts stay free forever.

Credits · pre-paid, pay per request

Pre-paid · no expiry

Buy a pack ($5, $10, larger) and the gateway debits actual inference cost per request. Credits roll over and don't expire. Hit zero, requests pause until you top up. No negative bill is possible.

Plans

JustAvailable now

Just

Drop-in for your coding agent. OpenAI- and Anthropic-API compatible. Tuned for coding workloads.

▸Works with Claude Code, OpenCode, Cursor, Aider, Cline
▸OpenAI-compatible API
▸Anthropic-API compatible
▸Best-model-per-task routing, inspectable on any call
▸Pay only for what runs

Start coding

ProNext

Pro

Less hand-holding. Multi-repo context. Background learning.

▸Everything in Just
▸Multi-repo aware, agent reasons across boundaries
▸RL loop on your repo, learns your patterns over time
▸Background agents for long-running plans
▸Custom routing policies

Waitlist

OrgLater

Org

Autonomous execution with audit-grade compliance.

▸Everything in Pro
▸SOC 2 · EU AI Act
▸Self-improving codebase: patches, deps, refactors
▸Org-wide agent fleets with role boundaries
▸Custom SLAs and dedicated capacity

Contact

06 · Add-ons

Bolt-ons for teams ready to coordinate.

Optional add-ons that layer on top of any plan. Priced per tenant. Both ship after the core plans stabilize.

Add-onComing soon

Shared configs

One config, every workstation.

Routing rules, prompt templates, tool allowlists, and review policies pushed instantly to every developer. No drift between workstations.

Add-onComing soon

AgenticPM

Issue tracker becomes the execution plan.

Agents pick up tickets, draft PRs, attach evidence, and report progress on the kanban your team already uses. Humans stay in the loop on merges.

07 · FAQ

Plain answers.

How does billing actually work?

Two things: a small seat fee ($1/user/month, first user free), and pre-paid credit packs ($5, $10, more) that cover inference. The seat fee pays for collaboration; credits cover the model compute. You can't go negative; if credits hit zero, requests pause until you top up.

What happens to unused credits?

They roll over forever. There's no monthly reset, no expiry. The wallet just accumulates until you spend it.

How are you cheaper?

We optimize execution across models. Old-world supply-chain techniques such as inventory, hedging, and bin-packing, applied to neo-world intelligence delivery. Most of your tasks do not need the biggest model; we make sure they do not get one.

Which models do you use?

The ones that win the cost-vs-capability tradeoff for your task, right now. The mix improves as the market improves. The model behind any call is opaque by default but inspectable on demand. You can ask "what ran?" anytime.

What's the difference between the plans?

Just (today) is the drop-in: point your existing agent at api.jusinfer.com and you get best-model-per-task routing on a single repo. Pro adds multi-repo awareness and an RL loop that learns your patterns over time. Org layers SOC 2 / EU AI Act compliance and a self-improving codebase on top. You only pay for what's shipping.

What are you launching next?

CLI and a desktop app are on the roadmap. The first focus is a VS Code plugin, which is where most coding actually happens.

Can I see which model ran?

Yes. By default we keep the model opaque so you can focus on shipping. On any call, you can prompt for the routing decision or flip the inspector on for the session.

What about my data?

Your code is yours. We do not train on it. Routing metadata is kept only as long as needed to make the next decision better.

Yours to leave

No lock-in, by construction.

OpenAI-compatible in, OpenAI-compatible out. Your keys, your data, your exit. One endpoint that spans every workload and every provider is the strongest anti-lock-in there is. You are not even committed to a single workload paradigm.

Read the drop-in guide

Locked in?

OpenAI-compatible, your keys, export and leave anytime

The Kalmantic stack

jusInfer is one piece of a bigger toolkit.

Same team, same opinionated defaults, built to work together. Route inference here, ship coding agents with jusCode, and level up the people who run them at jusCode Academy.

jusCode

The gateway for coding agents.

A model-agnostic endpoint tuned for Claude Code, OpenCode, Cursor and friends. Stop wiring providers per tool.

Visit juscode.co →

jusCode Academy

Get certified on agentic workflows.

Hands-on paths that teach how to design, ship and operate AI coding agents in production. For you and your team.

Explore the Academy →

Ready

Pick your workload. We'll handle the rest.

Start building →See the workloads