OmniMCP

OmniMCP is a UI automation system that lets Claude control a computer through the Model Context Protocol (MCP). It combines OmniParser's visual understanding of the screen with Claude's natural language capabilities to automate UI interactions.

Standalone Installation (minimal dependencies)

This standalone package provides OmniMCP with minimal dependencies, letting you use the core functionality without installing all of OpenAdapt's dependencies. It's part of a larger refactoring effort to make components more modular and easier to use.

Prerequisites

  • Python 3.10 or 3.11
  • uv - Fast Python package installer and resolver
    # Install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh

Install OmniMCP

# Clone the OpenAdapt repository
git clone https://github.com/OpenAdaptAI/OpenAdapt.git
cd OpenAdapt/omnimcp

# Run the installation script (creates a virtual environment using uv)
# For Unix/Mac:
./install.sh
# Note: If you get a permission error, run: chmod +x ./install.sh

# For Windows:
install.bat

This installation method:

  1. Creates an isolated virtual environment using uv
  2. Installs only the dependencies that OmniMCP needs
  3. Sets up Python to find the required OpenAdapt modules without installing the full package
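
Step 3 can be pictured with a short sketch. This is not the actual content of omnimcp/setup.py or omnimcp/omnimcp/pathing.py; it only illustrates the general technique of making a parent checkout importable by putting it on sys.path. The function name ensure_on_path and the assumed directory layout are hypothetical.

```python
import sys
from pathlib import Path


def ensure_on_path(repo_root: Path) -> bool:
    """Prepend repo_root to sys.path if it is not already there.

    Returns True if the path was added, False if it was already present.
    """
    root = str(repo_root.resolve())
    if root in sys.path:
        return False
    sys.path.insert(0, root)
    return True


if __name__ == "__main__":
    # Assumption: this file lives in OpenAdapt/omnimcp/, so the parent
    # directory is the OpenAdapt checkout that holds the shared modules.
    added = ensure_on_path(Path(__file__).resolve().parent.parent)
    print("added to sys.path" if added else "already on sys.path")
```

Because the path is only prepended when missing, calling the helper more than once is harmless.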

Usage

After installation, activate the virtual environment:

# For Unix/Mac
source .venv/bin/activate

# For Windows
.venv\Scripts\activate.bat
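
To confirm that activation worked, a quick check from Python can help. The sketch below relies only on the standard library: inside a virtual environment, sys.prefix differs from sys.base_prefix. The function name in_virtualenv is illustrative and not part of OmniMCP.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```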

Development

For development and testing, you can reset the environment with:

# Reset the virtual environment and reinstall dependencies
cd /path/to/OpenAdapt/omnimcp
rm -rf .venv && chmod +x install.sh && ./install.sh

Running OmniMCP

# Run CLI mode (direct command input)
omnimcp cli

# Run MCP server (for Claude Desktop)
omnimcp server

# Run in debug mode to visualize screen elements
omnimcp debug

# Run Computer Use mode (Anthropic's official Computer Use integration)
computer-use

# Connect to a remote OmniParser server
omnimcp cli --server-url=https://your-omniparser-server.example.com

# Deploy OmniParser automatically without confirming
# IMPORTANT: always pass --auto-deploy-parser together with --skip-confirmation
omnimcp cli --auto-deploy-parser --skip-confirmation

# Disable automatic OmniParser deployment attempt
omnimcp cli --auto-deploy-parser=False

# With additional options
omnimcp cli --use-normalized-coordinates
omnimcp debug --debug-dir=/path/to/debug/folder

# Computer Use with specific model
computer-use --model=claude-3-opus-20240229

# Computer Use with auto-deploy of OmniParser
computer-use --auto-deploy-parser --skip-confirmation
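
This README does not define what --use-normalized-coordinates does. Assuming it refers to the common convention of expressing positions as fractions of the screen width and height in the range 0 to 1 (an assumption, not confirmed by the source), converting such a position to pixels might look like the sketch below. The function name to_pixels is hypothetical.

```python
def to_pixels(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Map normalized (0..1) coordinates to pixel coordinates on a width x height screen."""
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("normalized coordinates must lie in [0, 1]")
    # Clamp to the last valid pixel index so that 1.0 does not fall off-screen.
    x = min(int(x_norm * width), width - 1)
    y = min(int(y_norm * height), height - 1)
    return x, y


if __name__ == "__main__":
    # Centre of a 1920x1080 screen.
    print(to_pixels(0.5, 0.5, 1920, 1080))  # (960, 540)
```

Normalized coordinates have the advantage of being independent of screen resolution, which matters when the screenshot is analyzed on a different machine than the one being controlled.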

OmniParser Configuration

OmniMCP requires access to an OmniParser server for analyzing screenshots:

  1. Use a Remote OmniParser Server (Recommended)

    omnimcp cli --server-url=https://your-omniparser-server.example.com
  2. Auto-Deploy OmniParser (Convenient but requires AWS credentials)

    • By default, OmniMCP will offer to deploy OmniParser if no server is available
    • You can control this behavior with these flags:
      # Deploy without asking for confirmation
      omnimcp cli --auto-deploy-parser --skip-confirmation
      
      # Disable auto-deployment completely
      omnimcp cli --auto-deploy-parser=False
  3. Use the Default Local Server

    • OmniMCP will try to connect to http://localhost:8000 by default
    • This requires running an OmniParser server locally
  4. IMPORTANT: Always Use Auto-Deploy with Skip-Confirmation

    • For best results, always use these flags together:
    omnimcp cli --auto-deploy-parser --skip-confirmation
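
Before choosing among these options, it can be useful to check whether anything is already listening at the default address. The sketch below is a plain TCP reachability probe written with the standard library; it does not call any OmniParser endpoint and is not how OmniMCP itself detects the server. The helper names host_and_port and is_reachable are hypothetical.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_SERVER_URL = "http://localhost:8000"  # default stated in this README


def host_and_port(server_url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL, falling back to the scheme's default port."""
    parts = urlsplit(server_url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {server_url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(server_url: str = DEFAULT_SERVER_URL, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the server's host and port succeeds."""
    host, port = host_and_port(server_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    if is_reachable():
        print(f"Something is listening at {DEFAULT_SERVER_URL}")
    else:
        print("No local server found; pass --server-url or use --auto-deploy-parser")
```

A successful connection only shows that a port is open, not that the process behind it is OmniParser.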

Future Direction: Anthropic ComputerUse Integration

OmniMCP and Anthropic's ComputerUse both enable Claude to control computers, but they take different architectural approaches:

Key Differences

Integration Approach:

  • OmniMCP uses OmniParser for understanding UI elements
  • ComputerUse captures screenshots and provides them directly to Claude

Environment:

  • OmniMCP runs directly on the host system with minimal dependencies
  • ComputerUse operates in a containerized virtual desktop environment

MCP vs. Anthropic-defined Tools:

  • OmniMCP uses the Model Context Protocol (MCP), a structured protocol that lets AI models interact with tools
  • ComputerUse uses Anthropic-defined tools (computer, text_editor, and bash) via Claude's tool use API
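
The difference between the two tool styles is easiest to see in the shape of the data. The sketch below builds an MCP tool invocation, which is a JSON-RPC 2.0 request with the method tools/call, next to a list of Anthropic-defined tool declarations. The MCP tool name click_element and its arguments are hypothetical and not taken from OmniMCP. The versioned type strings in the Anthropic list are those of the October 2024 computer use beta and may have changed since, so treat them as an assumption to verify against current documentation.

```python
import json


def mcp_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP tool invocation as a JSON-RPC 2.0 'tools/call' request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Anthropic-defined tools are declared by type in the `tools` list of a
# Messages API request; their schemas are fixed by Anthropic, not by the caller.
# Assumption: type strings as of the October 2024 beta.
ANTHROPIC_TOOLS = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    },
    {"type": "text_editor_20241022", "name": "str_replace_editor"},
    {"type": "bash_20241022", "name": "bash"},
]

if __name__ == "__main__":
    # Hypothetical MCP tool name and arguments, for illustration only.
    request = mcp_tool_call(1, "click_element", {"description": "Submit button"})
    print(json.dumps(request, indent=2))
    print(json.dumps(ANTHROPIC_TOOLS, indent=2))
```

With MCP the server defines and advertises its own tools, while with ComputerUse the tool set is predefined and the client only supplies parameters such as the display size.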

Potential Integration Paths

Future OmniMCP development could:

  1. Dual Protocol Support: Support both MCP and Anthropic-defined tools
  2. Container Option: Provide a containerized deployment similar to ComputerUse
  3. Unified Approach: Create a bridge between MCP and ComputerUse tools
  4. Feature Parity: Incorporate ComputerUse capabilities while maintaining MCP compatibility

Both approaches have merits, and integrating aspects of ComputerUse could enhance OmniMCP's capabilities while preserving its lightweight nature and existing MCP integration.

Features

  • Visual UI analysis with OmniParser
  • Natural language understanding with Claude
  • Keyboard and mouse control with pynput
  • Model Context Protocol (MCP) integration
  • Debug visualizations
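
To give a feel for the keyboard and mouse control listed above, here is a minimal sketch using pynput's mouse and keyboard controllers. It is not OmniMCP's own input code. The bounding-box format (left, top, right, bottom) in pixels and the helper names center_of, click_at, and type_text are assumptions made for illustration.

```python
def center_of(box: tuple[int, int, int, int]) -> tuple[int, int]:
    """Return the centre point of a (left, top, right, bottom) pixel box."""
    left, top, right, bottom = box
    return (left + right) // 2, (top + bottom) // 2


def click_at(x: int, y: int) -> None:
    """Move the pointer to (x, y) and left-click once using pynput."""
    # Imported lazily: pynput needs a desktop session and fails on headless machines.
    from pynput.mouse import Button, Controller

    mouse = Controller()
    mouse.position = (x, y)
    mouse.click(Button.left, 1)


def type_text(text: str) -> None:
    """Type a string through pynput's keyboard controller."""
    from pynput.keyboard import Controller

    Controller().type(text)


if __name__ == "__main__":
    # Hypothetical bounding box for a detected UI element.
    target = center_of((100, 200, 300, 240))
    print("would click at", target)
    # Uncomment on a machine with a desktop session:
    # click_at(*target)
    # type_text("hello")
```

The calls that actually move the pointer are left commented out so that running the file does not take over the mouse unexpectedly.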

Structure

OmniMCP reuses code from the OpenAdapt repository but installs only a minimal set of dependencies. The key components are:

  • omnimcp/pyproject.toml: Minimal dependency list
  • omnimcp/setup.py: Setup script that adds OpenAdapt to the Python path
  • omnimcp/omnimcp/ package:
    • omnimcp/omnimcp/omnimcp.py: Core OmniMCP functionality
    • omnimcp/omnimcp/run_omnimcp.py: CLI interface
    • omnimcp/omnimcp/computer_use.py: Computer Use integration
    • omnimcp/omnimcp/pathing.py: Python path configuration
    • omnimcp/omnimcp/adapters/omniparser.py: OmniParser client and provider
    • omnimcp/omnimcp/mcp/server.py: Model Context Protocol server implementation