Free Claude Code: Route Claude Code API Calls to Free Alternatives

An open-source proxy that lets you keep using Claude Code CLI, VS Code, and JetBrains interfaces while routing API calls to NVIDIA NIM, OpenRouter, DeepSeek, or local models.

Claude Code’s developer experience is excellent, but the API costs add up fast. free-claude-code is an open-source proxy that lets you keep using Claude Code’s CLI, VS Code extension, and JetBrains integration while routing the underlying API calls to free-tier cloud APIs or self-hosted local models.

How It Works

Every Claude Code operation goes through the Anthropic API. This proxy sits in between:

Claude Code CLI / VS Code / JetBrains
           ↓
    free-claude-code proxy
           ↓
  NVIDIA NIM / OpenRouter / Ollama / ...

The proxy exposes Anthropic-compatible endpoints (/v1/messages, /v1/models, etc.), translates incoming requests to each provider’s format, then translates the responses back to Anthropic’s format. From the Claude Code client’s perspective, it’s just a regular Anthropic API.
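That translation can be pictured as a pair of mapping functions. A simplified Python sketch (text-only, no streaming or tool calls; this is an illustration of the idea, not the proxy's actual code):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Map an Anthropic /v1/messages request to an OpenAI-style
    chat-completions request (simplified)."""
    messages = []
    if "system" in req:
        # Anthropic carries the system prompt as a top-level field;
        # OpenAI-style APIs expect it as the first message.
        messages.append({"role": "system", "content": req["system"]})
    messages.extend(req["messages"])
    return {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req.get("max_tokens", 1024),
    }

def openai_to_anthropic(resp: dict) -> dict:
    """Map an OpenAI-style response back to Anthropic's shape."""
    choice = resp["choices"][0]
    return {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "stop_reason": "end_turn"
        if choice["finish_reason"] == "stop"
        else choice["finish_reason"],
    }
```

The real proxy additionally has to translate streaming chunks, tool-use blocks, and each provider's quirks, which is where most of the work lives.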

Supported Providers

Ten backends are currently supported:

| Provider | Notes |
| --- | --- |
| NVIDIA NIM | Free tier at build.nvidia.com; includes Kimi K2.5, GLM 4.7 |
| OpenRouter | Aggregates many models; some with free tiers |
| DeepSeek | deepseek-chat, much cheaper than Opus |
| Kimi | Moonshot’s platform.moonshot.ai |
| Wafer | wafer.ai; DeepSeek-V4-Pro, GLM-5.1 |
| Z.ai | GLM-5.1, GLM-5-turbo |
| OpenCode Zen | opencode.ai; includes deepseek-v4-flash-free |
| LM Studio | Local server, default localhost:1234 |
| llama.cpp | Local server, default localhost:8080 |
| Ollama | Containerized local models, default localhost:11434 |

Per-Tier Model Routing

Claude Code splits requests into three tiers: Opus (main agent), Sonnet, and Haiku (sub-agents). The proxy lets you route each tier to a different model:

MODEL_OPUS=openrouter/qwen/qwen3-235b-a22b:free
MODEL_SONNET=deepseek/deepseek-chat
MODEL_HAIKU=ollama/llama3.1

Opus requests (typically the most expensive) can be routed to a free model; Haiku requests can run locally.
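Conceptually, this routing is a lookup from the requested Claude model name to the configured backend model. A hypothetical sketch using the example configuration above (the proxy's real matching logic may differ):

```python
# Mirrors the MODEL_OPUS / MODEL_SONNET / MODEL_HAIKU settings above.
TIER_ROUTES = {
    "opus":   "openrouter/qwen/qwen3-235b-a22b:free",
    "sonnet": "deepseek/deepseek-chat",
    "haiku":  "ollama/llama3.1",
}

def route_model(requested: str) -> str:
    """Pick the backend model for an incoming Anthropic model name,
    e.g. 'claude-3-5-haiku-...' routes to the configured Haiku tier."""
    for tier, target in TIER_ROUTES.items():
        if tier in requested.lower():
            return target
    return TIER_ROUTES["sonnet"]  # fall back to the middle tier
```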

Installation and Setup

Prerequisites: the Claude Code CLI and the uv Python package manager.

# Install the proxy
uv tool install --force git+https://github.com/Alishahryar1/free-claude-code.git

# Start the proxy server
fcc-server

After starting, open the displayed localhost address in your browser to access the Admin UI and configure provider API keys.

Then use fcc-claude instead of the regular claude command; the launcher automatically injects the required environment variables.
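In effect, the launcher wraps the stock claude binary with the proxy's environment. A rough Python equivalent, using the three variables the IDE integrations also set (the real launcher may do more):

```python
import os
import subprocess

# Environment injected before exec'ing the real CLI; values match
# the defaults shown in the client-integration examples.
proxy_env = {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc",
    "CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY": "1",
}

def launch_claude(args):
    """Run the stock `claude` CLI against the local proxy."""
    subprocess.run(["claude", *args], env={**os.environ, **proxy_env})
```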

Client Integration

VS Code

Add to settings.json:

{
  "claude.env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc",
    "CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY": "1"
  }
}

JetBrains

Edit the ACP configuration file (path varies by platform) with the same three environment variables.

Once configured, the IDE’s model picker also works: the proxy’s /v1/models endpoint exposes all available models for visual selection.

Optional Features

Discord / Telegram bots: Wrap Claude Code sessions in a bot for remote task management, streaming progress, and conversation branches. Requires bot tokens and channel IDs.

Voice transcription: Connect Whisper or NVIDIA NIM for voice-to-text input via the messaging platforms.

Actual Limitations

A few real constraints to keep in mind:

Model capability gap: Many of Claude Code’s strengths (long context, accurate tool calls, complex reasoning) are specific to Claude models. Switching to alternatives may degrade agentic reliability, especially tool call accuracy, which drives most of Claude Code’s workflow.

Free tier rate limits: NVIDIA NIM and OpenRouter free models typically have RPM/TPD caps. Heavy usage will hit rate limits.
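When a free tier does throttle you, client-side retries with exponential backoff soften the failure mode. A generic sketch, unrelated to the proxy's internals (RateLimitError here stands in for an HTTP 429; delay values are arbitrary):

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 response."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)
```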

Local model resource requirements: Running llama.cpp or Ollama needs sufficient VRAM/RAM. Performance is noticeably slower than cloud APIs.

When It Makes Sense

  • Trying out Claude Code without committing to Anthropic API costs
  • Mostly doing simple tasks (file edits, formatting, small features) that don’t need top-tier models
  • Owning a GPU and preferring to pay with electricity instead of API fees
  • Comparing how different models perform under the Claude Code interface

If your work depends on Claude’s long-context handling or complex agentic tasks, swapping models will likely cause tool call failures or reasoning errors. In that case, analyzing your usage structure and optimizing cache usage may be more practical than switching models; see the earlier posts in this series.
