Skip to content

Releases: askui/python-sdk

v0.35.0

02 Jun 08:53
89a8cec

Choose a tag to compare

v0.35.0

🎉 Overview

v0.35.0 adds support for OpenAI-compatible APIs as model providers, enabling the use of OpenAI, Ollama, vLLM, LM Studio, Together AI, RunPod, and any other service that exposes an OpenAI-compatible chat completions endpoint. Truncation strategies now preserve the first user message across summarization to retain the original task instructions, and the truncation headroom has been doubled to reduce the chance of hitting context limits immediately after truncation.

✨ New Features

  • OpenAIVlmProvider — VLM provider for any OpenAI-compatible API (OpenAI, vLLM, LM Studio, Together AI, etc.) by @philipph-askui in #268
  • OpenAIImageQAProvider — image Q&A provider for any OpenAI-compatible API by @philipph-askui in #268
  • OllamaVlmProvider — convenience wrapper for local Ollama instances with sensible defaults (base_url=http://localhost:11434/v1, model_id=qwen3.5) by @philipph-askui in #268
  • OllamaImageQAProvider — image Q&A via local Ollama instances by @philipph-askui in #268
  • OpenAICompatibleVlmProvider — VLM provider for endpoints that require an exact URL (e.g., RunPod, custom proxies) where the OpenAI SDK's automatic path appending would break the request by @philipph-askui in #268
  • OpenAIMessagesApi — full translation layer between the internal MessageParam format and OpenAI's chat completions API, handling tool calls, image content, thinking blocks, and role alternation by @philipph-askui in #268
  • OpenAIGetModelGetModel implementation for OpenAI-compatible APIs with structured output support by @philipph-askui in #268
  • Built-in pricing data for gpt-5.4, gpt-5.4-mini, and gpt-5.4-nano models by @philipph-askui in #268

🔧 Improvements

  • Truncation strategies now preserve the first user message across summarization, ensuring the original task instructions are never lost when the conversation is truncated by @philipph-askui in #280
  • MAX_INPUT_TOKENS increased from 100k to 200k and TRUNCATION_THRESHOLD lowered from 0.7 to 0.56, roughly doubling the headroom after truncation to reduce the chance of re-triggering truncation immediately by @philipph-askui in #280
  • process_id parameter in list_process_windows tool is now auto-converted to int, preventing tool errors when the agent passes it as a string by @philipph-askui in #279

🐛 Bug Fixes

  • AgentSpeaker now handles the case where the model returns stop_reason='tool_use' but no actual tool call blocks in the content, preventing stopped executions by prompting the model to retry with a valid tool call by @philipph-askui in #278

Full Changelog: v0.34.0...v0.35.0

v0.34.0

20 May 06:10
12b46f6

Choose a tag to compare

v0.34.0

🎉 Overview

v0.34.0 adds new tools that let agents interact with the file system and display configuration on the automation target: ComputerGetFileTool reads files (text or image), ComputerGetFileNamesTool lists directory contents, and ComputerRemoveVirtualDisplaysTool tears down virtual displays. A new clean_virtual_displays controller setting auto-removes virtual displays on startup. The ComputerAgent docstring now documents per-call tool registration via act(..., tools=[...]).

✨ New Features

  • ComputerGetFileTool (experimental) — reads a file at an absolute path on the automation target, returning UTF-8 text as a string or decoded images as PIL.Image.Image by @mlikasam-askui in #277
  • ComputerGetFileNamesTool (experimental) — lists regular file names (not subdirectories) in a directory on the automation target by @mlikasam-askui in #277
  • ComputerRemoveVirtualDisplaysTool (experimental) — removes all virtual displays from the controller, leaving only physical displays active by @mlikasam-askui in #277
  • clean_virtual_displays setting on AskUiControllerClientSettings — when enabled, automatically removes all virtual displays after the controller connects by @mlikasam-askui in #277

🔧 Improvements

  • ComputerAgent docstring updated with examples for per-call tool registration via act(..., tools=[...]) by @mlikasam-askui in #277
  • Pinned askui-agent-os>=26.4.1 on macOS and >=26.5.1 on other platforms to ensure gRPC compatibility with the new commands by @mlikasam-askui in #277

Full Changelog: v0.33.0...v0.34.0

v0.33.0

12 May 14:29
9b6de63

Choose a tag to compare

v0.33.0

🎉 Overview

v0.33.0 introduces AutomationError — a new exception type for unfixable errors that immediately terminate agent execution instead of being auto-corrected. The conversation control loop now properly cleans up via try/finally, ensuring reporters and teardown always run even when errors propagate. This release also corrects the typing speed unit documentation and fixes a bug where messages could be lost if the truncation strategy crashed.

✨ New Features

  • AutomationError — new exception type for unfixable errors (e.g., missing credentials, unreachable services) that propagates immediately to the caller, bypassing the agent's auto-correction retry loop. Regular exceptions remain fixable by the agent as before. by @philipph-askui in #271
  • Documentation for error handling in tools — added a new "Error Handling in Tools" section to the tools guide explaining the distinction between fixable errors (regular exceptions) and unfixable errors (AutomationError) by @philipph-askui in #271

🔧 Improvements

  • Conversation control loop now uses try/finally to guarantee _on_conversation_end() and _teardown_control_loop() execute even when an AutomationError or other exception propagates, preventing resource leaks by @philipph-askui in #271
  • Messages are now reported to the reporter before being passed to the truncation strategy, preventing data loss if truncation crashes by @philipph-askui in #274
  • Truncation failures are now caught, logged, and reported to the reporter with the message "Truncation Failed with error: {e}" before re-raising, improving observability of context-window management errors by @philipph-askui in #274

🐛 Bug Fixes

  • Corrected typing speed unit in ComputerTypeTool description and AgentOs.type() docstring from "characters per minute" to "characters per second" by @philipph-askui in #272

⚠️ Breaking Changes

  • AgentException renamed to AgentError — if you were catching AgentException directly, update your imports to use AgentError from askui.models.shared.tools

Full Changelog: v0.32.1...v0.33.0

v0.32.1

30 Apr 13:54
de61dbf

Choose a tag to compare

v0.32.1

🎉 Overview

v0.32.1 fixes a bug that led to a crash if the optional "web" dependency group was not installed.

🐛 Bug Fixes

Full Changelog: v0.32.0...v0.32.1

v0.32.0

30 Apr 08:54
e480db5

Choose a tag to compare

v0.32.0

🎉 Overview

v0.32.0 introduces the new WebAgent, a browser automation agent with native Playwright tools for mouse, keyboard, and screenshot interactions. The release also adds numpad key support across the AgentOS keyboard abstraction.

✨ New Features

  • WebAgent — a new browser automation agent with a full suite of Playwright tools (screenshot, move_mouse, mouse_click, mouse_scroll, mouse_hold_down, mouse_release, type, keyboard_tap, keyboard_pressed, keyboard_release) in addition to the existing navigation tools by @philipph-askui in #267
  • Numpad key support — added numpad_lock, numpad_0numpad_9, numpad_+, numpad_-, numpad_*, numpad_/, and numpad_. to PcKey with corresponding Playwright key mappings by @mlikasam-askui in #269

🔧 Improvements

⚠️ Breaking Changes

  • WebVisionAgent is deprecated — use WebAgent instead. WebVisionAgent still works but emits a DeprecationWarning
  • WebAgent now extends Agent directly instead of ComputerAgent, with a new constructor signature that accepts callbacks and truncation_strategy parameters
  • Playwright navigation tools (PlaywrightGotoTool, PlaywrightBackTool, etc.) now inherit from PlaywrightBaseTool instead of Tool and require a PlaywrightAgentOs (or compatible) instance as their agent OS

Full Changelog: v0.31.0...v0.32.0

v0.31.0

22 Apr 10:08
301dab3

Choose a tag to compare

v0.31.0

🎉 Overview

v0.31.0 substantially improves the memory efficiency of askui. The SimpleHtmlReporter has been rearchitected to stream message rows (including base64-encoded screenshots) to a temporary file on disk instead of accumulating them in memory, significantly reducing memory usage during long-running sessions. Further, reporters are now wrapped with automatic error handling so that a failure in one reporter no longer crashes the agent.

✨ New Features

  • ReporterErrorHandler — a decorator that wraps any Reporter with try/except error handling; on first failure the reporter is disabled for the rest of the session, preventing reporting errors from interrupting agent execution by @mlikasam-askui in #258

🔧 Improvements

  • SimpleHtmlReporter now streams HTML message rows to a temporary file as they arrive instead of holding all base64 image data in memory, reducing peak memory usage for screenshot-heavy sessions by @mlikasam-askui in #258
  • CompositeReporter now automatically wraps all reporters in ReporterErrorHandler, making error resilience the default behavior by @mlikasam-askui in #258

Full Changelog: v0.30.0...v0.31.0

v0.30.0

15 Apr 12:11

Choose a tag to compare

v0.30.0

🎉 Overview

v0.30.0 introduces a new infrastructure-error handling prompt that prevents agents from entering unfixable retry loops when the underlying controller, session, or RPC connection fails. It also enriches the HTML report's conversation breakdown with per-conversation step counts, durations, and cache token statistics, and quiets noisy tool-failure logs by demoting them from WARNING to INFO.

✨ New Features

  • Infrastructure / tool error prompt added to the computer, Android, and multi-device agent capabilities — instructs agents to retry infrastructure failures (connection lost, session expired, RPC errors, stream closed, service unavailable, controller timeouts) at most once and otherwise stop immediately with a BROKEN report status instead of looping on unfixable errors by @philipph-askui in #265
  • Step count, cache_creation_input_tokens, and cache_read_input_tokens added to the per-conversation usage breakdown in SimpleHtmlReporter by @philipph-askui in #264
  • Per-conversation duration added to the HTML report breakdown — started_at / ended_at timestamps are captured on conversation summaries and rendered in a human-readable elapsed-time format by @philipph-askui in #266

🔧 Improvements

  • Tool failed logs in ToolCollection demoted from WARNING to INFO to reduce log noise during normal agent operation by @philipph-askui in #264

⚠️ Breaking Changes

  • UsageTrackingCallback renamed to ConversationStatisticsCallback

Full Changelog: v0.29.0...v0.30.0

v0.29.0

10 Apr 12:02
dfc4b51

Choose a tag to compare

v0.29.0

🎉 Overview

v0.29.0 replaces the simple message-dropping truncation strategy with a new VLM-based SummarizingTruncationStrategy that summarizes older conversation history to preserve context while staying within token limits. It also fixes mouse scroll coordinate scaling issues, improves scroll tool descriptions with OS-specific guidance, removes get and locate from the default agent tools, hardens the move_mouse tool against malformed coordinate inputs, and makes base64 image truncation in html reports more robust.

✨ New Features

  • SummarizingTruncationStrategy — new default truncation strategy that uses the VLM to summarize older conversation history instead of dropping messages, with prompt caching support during summarization for cost efficiency by @philipph-askui in #257
  • SlidingImageWindowSummarizingTruncationStrategy (experimental) — extends summarization with dynamic image removal from older messages to reduce network traffic and latencies while staying compatible with prompt caching by @philipph-askui in #257
  • truncation_strategy init parameter on ComputerAgent, AndroidAgent, and Agent — allows passing a custom truncation strategy with auto-injection of conversation dependencies (vlm_provider, reporter, callbacks) by @philipph-askui in #257

🔧 Improvements

  • Mouse scroll tool description now includes OS-dependent scroll guidance (start with dy=150/dy=-150, macOS direction info) by @programminx-askui in #260
  • truncate_content in reporting replaced by truncate_base64_images — only base64 image data is replaced with placeholders, leaving all other content (prompts, tool outputs) untouched by @philipph-askui in #259
  • move_mouse tool now robustly parses coordinates when the agent passes them as strings or comma-separated values, with clearer tool description and improved error messages by @philipph-askui in #262

🐛 Bug Fixes

  • Fix incorrect coordinate scaling on mouse scroll deltas — ComputerAgentOsFacade.mouse_scroll no longer applies display scaling to scroll amounts (SOLENG-332) by @programminx-askui in #260

⚠️ Breaking Changes

  • SimpleTruncationStrategy and SimpleTruncationStrategyFactory removed — replaced by SummarizingTruncationStrategy as the new default
  • Conversation constructor parameter truncation_strategy_factory replaced by truncation_strategy (a strategy instance instead of a factory)
  • get and locate tools removed from Agent's default tool list — they are no longer auto-added when an agent_os is provided
  • mouse_scroll parameters renamed from x/y to dx/dy across all AgentOs implementations (AskUiControllerClient, PlaywrightAgentOs, ComputerAgentOsFacade, ComputerAgent)
  • truncate_content function in reporting.py removed — replaced by truncate_base64_images

Full Changelog: v0.28.0...v0.29.0

v0.28.0

03 Apr 10:05
927aa6d

Choose a tag to compare

v0.28.0

🎉 Overview

v0.28.0 integrates AgentOS as a Python package dependency (no more manual installation), adds a UIAutomator hierarchy tool for Android agents, improves support for Anthropic prompt caching to reduce inference cost, introduces Tool.from_mcp_tool() for wrapping FastMCP tools, and overhauls usage tracking with per-step and per-conversation cost breakdowns including cache token costs in the HTML reports.

✨ New Features

  • AgentOS shipped as Python package (askui-agent-os) — no manual installation needed by @mlikasam-askui in #246
  • Anthropic prompt caching (auto strategy) with cache_control parameter by @philipph-askui in #253
  • AndroidGetUIAutomatorHierarchyTool — accessibility hierarchy dump for Android agents, providing structured UI element data (text, resource IDs, tap centers) as an alternative to screenshot-based inference by @mlikasam-askui in #251
  • Hierarchical usage tracking with per-step, per-conversation, and aggregate cost breakdowns including cache token costs in HTML reports by @mlikasam-askui in #253

🔧 Improvements

  • Tool.from_mcp_tool() to wrap FastMCP tools as AskUI Tools by @mlikasam-askui in #250
  • markitdown and bson moved to optional dependencies (office-document) and pure-python-adb promoted to core to streamline the installation by @mlikasam-askui in #255
  • Documented optional install extras (office-document, bedrock, vertex, otel, web) in README by @mlikasam-askui in #255
  • Workspace ID (askui.workspace.id) added to OTEL trace resource attributes by @philipph-askui in #256
  • Improved tracing structure with _get_next_message() span for better observability by @philipph-askui in #256

🐛 Bug Fixes

  • Fix prompt caching breakpoints to improve prompt caching efficiency by @philipph-askui in #253
  • Fix report formatting and cache statistics accumulation by @philipph-askui in #253
  • Constrain grpcio<1.80.0 to avoid compatibility issues by @philipph-askui in #250
  • Clean up OTEL tracing: remove stale cluster_name config and unnecessary SQLAlchemy instrumentation by @philipph-askui in #256

⚠️ Breaking Changes

  • ASKUI_COMPONENT_REGISTRY_FILE, ASKUI_INSTALLATION_DIRECTORY, and ASKUI_CONTROLLER_PATH environment variables are no longer recognized — AgentOS is now auto-discovered via the askui-agent-os package
  • OtelSettings.cluster_name field and ASKUI__OTEL_CLUSTER_NAME env var removed; replaced by workspace_id / ASKUI_WORKSPACE_ID
  • Minimum anthropic SDK version bumped from >=0.72.0 to >=0.86.0
  • android optional extra removed — pure-python-adb is now a core dependency; use office-document extra for MarkItDown features previously bundled by default
  • bson and markitdown removed from default dependencies — install askui[office-document] if you need Office file conversion

Full Changelog: v0.27.0...v0.28.0

v0.27.0

18 Mar 09:55
6f6b83b

Choose a tag to compare

v0.27.0

🎉 Overview

v0.27.0 adds a MultiDeviceAgent that can operate android and computer devices simultaneously, improves the SDK structure by introducing default tool lists for the ComputerAgent, and AndroidAgent, and fixes a bug with single-display handling on android.

✨ New Features

🔧 Improvements

🐛 Bug Fixes

Full Changelog: v0.26.1...v0.27.0