
CLI-Anything: The Universal Bridge for AI Agents to Control Any Software

No screenshots, no clicking: one command makes Claude Code generate a complete CLI for any software, letting AI agents directly call GIMP, Blender, and LibreOffice

The most common way AI agents operate software is to take a screenshot, run OCR, then simulate mouse clicks. The first time I saw this workflow, something felt off: this isn’t using AI; it’s using AI to impersonate someone with bad eyesight.

Screenshots degrade in quality, resolution is limited, and the whole thing breaks the moment a UI version updates. The more fundamental problem: GUIs were never designed to be called by programs.

CLI-Anything takes a different angle. Instead of teaching AI to navigate a GUI, it lets AI automatically generate a complete CLI for any software. The agent calls the CLI, the CLI drives the backend. That shift in thinking felt right to me.

The Root Problem with GUI Automation

There are three main approaches for AI agents to operate desktop software:

Screenshot + Click (Computer Use): Capture the screen, let a vision model decide where to click. Anthropic’s own Computer Use demo works this way. The problem: Blender’s buttons become unrecognizable after a theme change, LibreOffice has a different UI on Linux vs. macOS, a Docker version upgrade moves panels around, and the whole pipeline breaks.

Limited APIs: Some software has REST APIs or SDKs, but coverage is extremely narrow. GIMP has Script-Fu, Blender has bpy, but these are native scripting interfaces, not designed for agent calls: no standard I/O, no JSON output, no self-description capability.

MCP wrappers: Many people are packaging tools as MCP servers, but writing a good MCP requires enormous manual effort. Every piece of software needs its own implementation. It doesn’t scale.

CLI-Anything’s premise: CLI is the universal interface that both humans and AI can use.

Why CLI Is the Right Answer

CLI has four properties that make it especially well-suited for agent calls:

Structured: command subcommand --flag value is deterministic syntax. Unlike a GUI, the same command produces consistent results across different environments.

Composable: CLI output can be piped to the next command. Chaining like cli-anything-gimp export --format png | cli-anything-blender import is something a GUI simply cannot do.

Self-describing: --help is the manual. An agent doesn’t need to read documentation first; it can just run --help to discover available commands, parameters, and formats. Every command generated by CLI-Anything supports --help.

Deterministic: cli-anything-libreoffice export --format pdf does the same thing every time. Unlike screenshots, it’s unaffected by screen resolution, themes, or DPI.
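The structured and self-describing properties are what make the output agent-friendly in practice. As a minimal sketch of the consumer side, here is how an agent's host code might parse a command's --json output instead of scraping human-readable text. The JSON schema here is my assumption for illustration, not CLI-Anything's actual format:

```python
import json

# Hypothetical --json output from a generated CLI. The exact schema is an
# assumption for illustration; CLI-Anything's real format may differ.
raw = '{"ok": true, "command": "export", "output": "report.pdf", "warnings": []}'

def parse_cli_result(raw_json):
    """Parse machine-readable CLI output, failing loudly on reported errors."""
    result = json.loads(raw_json)
    if not result.get("ok", False):
        raise RuntimeError(f"command failed: {result}")
    return result

result = parse_cli_result(raw)
print(result["output"])  # prints "report.pdf"
```

The point is that the agent gets a typed result to branch on, rather than a screenshot to squint at.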

How CLI-Anything Works: The 7-Step Pipeline

CLI-Anything’s core is a fully automated pipeline. Input: software source code. Output: a complete Python Click CLI. The process has seven steps:

Step 1: Analyze. Scan the source code to find all API calls that correspond to GUI actions. For example, GIMP’s “Export As PNG” maps to gimp-file-overwrite-png; Blender’s “Add Mesh” is bpy.ops.mesh.primitive_cube_add(). This step produces an API mapping document.

Step 2: Design. Based on the API mapping, design the CLI’s command groups, state model, and output format. This step determines the overall CLI structure, similar to how git remote and git commit are organized in a hierarchy.

Step 3: Implement. Build the actual CLI with Click, including REPL mode, --json output, and undo/redo support. Each command connects to the real backend API found in Step 1; there are no stubs.

Step 4: Plan Tests. Automatically generate TEST.md listing unit test and E2E test plans.

Step 5: Write Tests. Implement the full test suite. CLI-Anything currently has 1,508 tests, all passing.

Step 6: Document. Update TEST.md with usage documentation.

Step 7: Publish. Create setup.py and install to system PATH via pip install -e .. After installation, an agent can discover the tool with which cli-anything-blender.

The entire pipeline requires no human intervention. Let Claude Code run it and you get a usable CLI.
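To give a feel for what Step 3 produces, here is a heavily simplified sketch of one generated-style command with a --json flag. I'm using the standard library's argparse in place of Click so the sketch stays dependency-free, and the backend call is stubbed out; the real generated CLIs use Click and call actual backend APIs:

```python
import argparse
import json

def build_parser():
    # Command-group layout in the spirit of Step 2's design phase;
    # all names here are illustrative, not from a real generated CLI.
    parser = argparse.ArgumentParser(prog="cli-anything-demo")
    parser.add_argument("--json", dest="as_json", action="store_true",
                        help="emit machine-readable output")
    sub = parser.add_subparsers(dest="command", required=True)
    export = sub.add_parser("export", help="export the current document")
    export.add_argument("--format", choices=["pdf", "png"], required=True)
    return parser

def run(argv):
    args = build_parser().parse_args(argv)
    # A real generated CLI would call the backend API here (via Step 1's
    # mapping); this stub only reports what it would do.
    payload = {"ok": True, "command": args.command, "format": args.format}
    return json.dumps(payload) if args.as_json else f"exported as {args.format}"

print(run(["--json", "export", "--format", "pdf"]))
```

Multiply this by every GUI action found in Step 1 and you get the 1,000+ line CLIs the pipeline emits.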

Getting Started with Claude Code

Three commands in Claude Code to get going:

# Install the CLI-Anything plugin
/plugin marketplace add HKUDS/CLI-Anything

# Install CLI-Anything itself
/plugin install cli-anything

# Run the pipeline on your target software (GIMP as example)
/cli-anything ./gimp

If the generated CLI doesn’t cover the functionality you need:

/cli-anything:refine ./gimp "batch processing and filters"

refine reruns the pipeline focused on the specified functionality to fill in missing commands.

What the Generated CLI Looks Like

Using LibreOffice as an example, the generated CLI works like this:

# Create a new document
cli-anything-libreoffice document new -o report.json --type writer

# Add a heading
cli-anything-libreoffice --project report.json writer add-heading \
  -t "Q1 Report" --level 1

# Export to PDF
cli-anything-libreoffice --project report.json export render output.pdf -p pdf

# JSON output (for agents)
cli-anything-libreoffice --json document info --project report.json

The backend is real LibreOffice headless, not a toy implementation. --project manages state, and --json makes output parseable by agents.
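The --project file is essentially serialized session state that lets stateless CLI invocations share one document. This post doesn't show how CLI-Anything structures that file, so the field names below are my guess at the general pattern, not the real schema:

```python
import json
from pathlib import Path

def load_project(path):
    """Load project state, starting a fresh writer document if absent."""
    if path.exists():
        return json.loads(path.read_text())
    return {"type": "writer", "elements": []}

def add_heading(state, text, level):
    # Each CLI invocation mutates the state and persists it back to disk,
    # so successive commands build up one document.
    state["elements"].append({"kind": "heading", "text": text, "level": level})
    return state

path = Path("report.json")
state = add_heading(load_project(path), "Q1 Report", level=1)
path.write_text(json.dumps(state, indent=2))
```

Undo/redo support falls out of the same idea: keep prior snapshots of the state file.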

Blender has a REPL mode, well-suited for interactive workflows:

$ cli-anything-blender
blender> scene new --name ProductShot
blender[ProductShot]> object add-mesh --type cube --location 0 0 1
blender[ProductShot]*> render execute --output render.png --engine CYCLES

The * in the prompt indicates unsaved changes, the same logic git uses for status display. That design detail shows the authors actually thought about UX.
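That prompt behavior is easy to picture in code. A toy sketch of a prompt builder using the same dirty-state convention (the function and its names are mine, not from the generated Blender CLI):

```python
def build_prompt(tool, scene=None, dirty=False):
    """Build a REPL prompt like 'blender[ProductShot]*> '."""
    context = f"[{scene}]" if scene else ""
    marker = "*" if dirty else ""  # unsaved changes, git-prompt style
    return f"{tool}{context}{marker}> "

print(build_prompt("blender"))                             # fresh session
print(build_prompt("blender", "ProductShot", dirty=True))  # unsaved edits
```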

Three Key Design Decisions

JSON mode: Every command supports the --json flag, producing machine-readable output. This lets other agents parse results directly without regex-scraping human-readable text. This is standard in agent-native design, but surprisingly many tools skip it.

REPL mode: Running cli-anything-blender with no arguments enters an interactive REPL that maintains session state across commands. Perfect for multi-step workflows like modeling, rendering, and exporting.

Real backends: LibreOffice uses headless mode, Blender uses bpy, Audacity uses sox. The generated CLI doesn’t fake success; it actually calls the software. This makes the test suite meaningful: 1,508 passing tests means the functionality genuinely works.
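For LibreOffice, "real backend" concretely means shelling out to headless mode; soffice --headless --convert-to is a genuine LibreOffice invocation. A sketch of how a generated command might assemble it (the wrapper function is my illustration, not CLI-Anything's code):

```python
from pathlib import Path

def build_convert_command(source, target_format, outdir):
    """Assemble a headless LibreOffice conversion command line."""
    return [
        "soffice", "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]

cmd = build_convert_command(Path("report.odt"), "pdf", Path("."))
print(" ".join(cmd))
# A generated CLI would execute it, e.g. subprocess.run(cmd, check=True),
# and surface failures as structured --json errors.
```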

Currently supported: GIMP (107 tests), Blender (208), Inkscape (202), Audacity (161), LibreOffice (158), OBS Studio (153), Kdenlive (155), Shotcut (154), Draw.io (138), AnyGen (50), Zoom (22).

My Observations and Limitations

The results after running it are genuinely impressive, but there are a few limitations worth being clear about.

Requires frontier-class models. The README explicitly notes that Claude Sonnet 4.6 or higher is required. Generating a 1,000+ line Click CLI and running a complete test suite is a reasoning task of real complexity. With a smaller model, the generated CLI will likely have poor coverage.

Requires source code. Step 1 of the pipeline needs to scan source code to find APIs. With only a binary, results degrade significantly. This makes CLI-Anything better suited for open-source software; closed-source commercial software is harder.

Complex software may need multiple refinements. Blender has hundreds of operators; a single /cli-anything run won’t cover them all. You’ll need to run /cli-anything:refine targeting specific use cases to fill in the gaps.

None of these are design flaws; they’re the reasonable boundaries of this approach under current technical constraints.

Software as Agent Tool

There’s a bigger idea behind CLI-Anything: almost none of the existing software ecosystem has an agent-native interface. GIMP is built for humans, Blender is built for humans, LibreOffice is built for humans. To make these tools accessible to AI agents, there are two paths: wait for each software author to add an API one by one, or automatically generate a bridge layer.

CLI-Anything takes the second path, and uses CLI rather than MCP or REST APIs for the reasons described earlier: a CLI has --help, supports pipes, and behaves deterministically.

If this direction holds, the number of tools available to AI agents could grow by an order of magnitude. Not by waiting for software vendors to add support, but by generating that support yourself.

The more interesting question for me: if every piece of software has an agent-native CLI, how do agents “discover” and “compose” these tools? CLI-Anything’s pip install -e . + which approach solves the discovery problem, but the semantics of tool composition still need more infrastructure. That might be the next direction worth watching.
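The which-based discovery step, at least, is easy to replicate from an agent's host process with the standard library alone. The cli-anything- name prefix comes from the article; the scanning helper itself is my sketch:

```python
import shutil

def discover_tools(candidates):
    """Map each expected CLI name to its absolute path, or None if missing."""
    return {name: shutil.which(name) for name in candidates}

tools = discover_tools(["cli-anything-blender", "cli-anything-gimp"])
available = [name for name, path in tools.items() if path is not None]
print(f"available generated CLIs: {available}")
```

Composition is the harder half: knowing that two tools exist says nothing about whether one's export format is the other's valid import.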
