---
title: Open Interpreter + custom provider: cheap inference for a code-running agent
description: Open Interpreter runs code on your machine and needs a model behind it. With one --api_base flag you can route through jusInfer and drop inference cost 60-80% without changing how OI runs code.
tldr: Run `interpreter --api_base https://api.jusinfer.com/v1 --api_key <jinf_token> --model jusInfer-auto`. OI's litellm layer passes the OpenAI-compatible request through; jusInfer picks the cheapest capable model per turn. Local code execution, tool approvals, and conversation memory are unaffected.
date: 2026-05-27
author: jusInfer
cluster: integration
tags: open-interpreter, litellm, openai-compatible, custom-base-url, cost-optimization
---

# Open Interpreter + custom provider: keep the agent, drop the bill

[Open Interpreter](https://openinterpreter.com) (OI) lets a model run code on your machine to actually accomplish a task: read files, install packages, query databases, plot data. The agent loop is short and tight: model emits code → OI runs it → OI feeds stdout/stderr back → model decides next step. That tight loop means a lot of model calls per task, which means a lot of money on a frontier model. Pointing OI at jusInfer drops the per-call cost without changing the code-execution behavior at all.

## Why it works

OI's model layer is [litellm](https://github.com/BerriAI/litellm), which speaks OpenAI-compatible by default. Anything that accepts a base URL and bearer token works.

## Setup

### 1. Mint a jusInfer API key

Sign in at [jusinfer.com/login](https://jusinfer.com/login). Open [jusinfer.com/developer](https://jusinfer.com/developer) → **Keys** tab → **Mint key**. Copy the `jinf_…` token.

### 2. Launch OI against jusInfer

CLI flags (one-off):
```bash
interpreter \
  --api_base https://api.jusinfer.com/v1 \
  --api_key jinf_your_token_here \
  --model jusInfer-auto
```

Or set them once in `~/.openinterpreter/config.yaml`:
```yaml
llm:
  api_base: https://api.jusinfer.com/v1
  api_key: jinf_your_token_here
  model: jusInfer-auto
```

### 3. Verify

Run `interpreter` with no further args, then ask it `what python version is installed`. You'll see OI emit a one-liner, ask permission to run, execute, and report, same flow as before. The model behind it is now jusInfer-auto.

## What changes vs default

| Aspect | Default OI (frontier model) | OI + jusInfer |
|---|---|---|
| First-token latency | 400-800ms | 200-500ms (smaller models warm up faster) |
| Cost per "read file → decide" loop step | $0.01-0.05 | $0.002-0.01 |
| Cost per "write 200-line script" step | $0.04-0.10 | $0.02-0.05 |
| Code-execution behavior | unchanged | unchanged |
| Conversation memory | unchanged | unchanged |
| Tool approval flow | unchanged | unchanged |
| Local file access | unchanged | unchanged |

## What about safety mode / `--safe_mode`?

OI's safe mode (interactive approval before each code execution) is a CLIENT-side feature. The model behind it doesn't know whether you'll approve, deny, or modify the code. Switching to jusInfer doesn't weaken safe mode; your approvals still gate every execution.

## What about offline / `--local`?

`--local` runs an Ollama or LM Studio model on your machine. Don't combine `--local` with `--api_base`; they're mutually exclusive. Use `--local` when you want privacy + zero per-call cost; use jusInfer when you want frontier quality at routed-down cost.

## Multi-step tasks: where the savings compound

OI tasks tend to be long. "Clean this dataset and produce three charts" might be 30-50 model calls: read CSV, inspect schema, write cleaning script, run it, check output, write plotting script, run it, check output, iterate. Per-call cost matters more than per-token cost.

Sample task: "load `sales.csv`, find the top 5 products by revenue in 2025, write each to a separate JSON file":

| Model | Total calls | Total cost | Wall time |
|---|---|---|---|
| Claude Sonnet (direct) | 12 | ~$0.18 | 38s |
| GPT-4.1 (direct) | 11 | ~$0.14 | 41s |
| jusInfer-auto | 12 | ~$0.03 | 35s |

The wall time barely moves because the bottleneck is local code execution, not the model. The cost moves a lot because every "tell me what the columns are" step now lands on a small fast model.

## When you'd stay on the direct provider

- **Custom system prompts that depend on provider-specific tool calling**: OI uses litellm's normalized tool-use, so this is rare.
- **Your org has an inference contract you need to bill against**: jusInfer is a passthrough; underlying providers see jusInfer's account.

## Switching back

Drop the `--api_base` / `--api_key` flags or comment them out in `config.yaml`. OI falls back to whatever was set in environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.).

## Further reading

- [Custom agent harness on an OpenAI-compatible base URL](/blog/custom-agent-harness-openai-compatible), the same pattern for any agent runtime
- [Aider + cheap inference](/blog/aider-cheap-inference), for non-code-executing pair-programming agents
- [jusInfer API reference](/docs/api-reference), the endpoint OI's litellm layer hits
