Local control: in-SDK sidecar + desktop/browser drivers by abonneth · Pull Request #161 · hcompai/hai-agents-python

abonneth · 2026-06-29T16:08:55Z

Made with Cursor

Note

High Risk
Enables real local browser/desktop control and subprocess execution on the user machine; incorrect wiring or driver bugs could affect the host OS, though optional installs and a per-session lease limit blast radius.

Overview
Adds local computer-use so agents can drive the user’s machine via an in-process sidecar that long-polls the platform for commands and runs them on local drivers, instead of only remote hosted environments.

Packaging: optional extras hai-agents[desktop] (pyautogui, pillow) and hai-agents[browser] (selenium, markdownify); the all extra composes both; the wheel force-includes the bundled browser JS (defuddle / h.js).

SDK wiring: Client / AsyncClient now expose agents and sessions subclasses that, on create/update agent and create session, rewrite user_device environments (and nested subagents) to inject a deterministic session_id derived from environment id, API key, and capability (web → browser, desktop → desktop).
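One way to picture the deterministic session_id is a hash over the three inputs named above. This is a minimal sketch only: the function name `derive_session_id`, the use of SHA-256, the length-prefixing, and the 32-character truncation are all assumptions for illustration, not the PR's actual derivation.

```python
import hashlib

# Mapping stated in the overview: web -> browser, desktop -> desktop.
KIND_TO_CAPABILITY = {"web": "browser", "desktop": "desktop"}


def derive_session_id(environment_id: str, api_key: str, kind: str) -> str:
    """Return a stable id for one (environment, key, capability) triple (hypothetical)."""
    capability = KIND_TO_CAPABILITY[kind]
    digest = hashlib.sha256()
    for part in (environment_id, api_key, capability):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    # Truncating to 32 hex characters is an arbitrary choice for this sketch.
    return digest.hexdigest()[:32]
```

Because the id depends only on its inputs, the SDK and a separately started sidecar could both compute it without exchanging state, which is presumably why a deterministic scheme was chosen.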

Sidecar: SidecarClient ensures a trajectory channel, polls /api/v1/commands/..., dispatches by method name to LocalDesktopDriver or SeleniumWebDriver, posts results with idempotent caching by command_uid, and uses a machine lease so only one sidecar owns a session. CLI: hai local browser (Chrome debugger port) and hai local desktop.
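The dispatch-and-replay behaviour described above can be sketched without any networking. The command shape (`method`, `args`) and the `RecordingDriver` class are hypothetical; only `command_uid` comes from the PR text.

```python
from typing import Any, Dict


class RecordingDriver:
    """Stand-in driver that counts side effects (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.clicks = 0

    def click(self, x: int, y: int) -> Dict[str, Any]:
        self.clicks += 1
        return {"clicked": [x, y]}


def handle_command(driver: Any, cache: Dict[str, Any], command: Dict[str, Any]) -> Any:
    """Run a command once; replay the stored result if the same uid arrives again."""
    uid = command["command_uid"]
    if uid in cache:
        # Redelivery: return the stored result instead of re-running side effects.
        return cache[uid]
    method = getattr(driver, command["method"])
    result = method(**command.get("args", {}))
    cache[uid] = result
    return result
```

Delivering the same command twice to `handle_command` clicks only once, which is the property the idempotent cache is meant to provide when a result POST fails and the platform redelivers.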

Drivers: Desktop driver covers screenshots/observation snapshots, pointer/keyboard, files, and run_command (including detach on Windows/macOS). Browser driver attaches to Chrome via CDP, blocks risky URL schemes, injects page helper JS for viewport/DOM work, and implements navigation, input, tabs, cookies, and observation bundles (screenshot + markdown).
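A scheme block list of the kind the browser driver applies could look like the following sketch. The scheme names are taken from the commit messages in this PR (with "js" read as `javascript`); the constant and function names are assumptions.

```python
from urllib.parse import urlsplit

# Schemes named in this PR's commit messages as rejected by the browser driver.
BLOCKED_SCHEMES = frozenset(
    {"file", "chrome", "javascript", "data", "chrome-extension", "devtools", "filesystem"}
)


def is_navigation_allowed(url: str) -> bool:
    """Return False when the URL uses a scheme on the block list (hypothetical helper)."""
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme not in BLOCKED_SCHEMES
```

A block list keeps ordinary `http`/`https` navigation working, but it has to be extended whenever a new risky scheme is identified; an allow list would be stricter at the cost of rejecting unusual but harmless URLs.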

Reviewed by Cursor Bugbot for commit 1c267dc. Bugbot is set up for automated code reviews on this repo.

Add a deny-by-default CapabilityPolicy that gates which command names a local browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage, and secrets are opt-in), a name-keyed driver registry so one package can host many drivers, and the command-name contract mirroring the hai_drivers interfaces. Co-authored-by: Cursor <cursoragent@cursor.com>

Long-polling sidecar (single-owner lease, connect-time drain, command_uid replay cache + echo), capability policy (deny-by-default with opt-ins), driver registry, pyautogui desktop driver and Selenium browser driver. Co-authored-by: Cursor <cursoragent@cursor.com>

…e open Co-authored-by: Cursor <cursoragent@cursor.com>

…+ config knobs Policy now derives allowed commands from the driver's public methods minus the danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method lists that duplicated the drivers. Replace the driver registry with a direct lazy factory and trim SidecarConfig to essentials. Co-authored-by: Cursor <cursoragent@cursor.com>
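The idea of deriving the allowed set from the driver's public methods minus the danger sets might be sketched as below. `_COOKIES` and `_SECRETS` appear in the diff quoted later in this PR; `_SHELL`, `_SCRIPTS`, the set members, and both function names are assumptions. The MRO walk reflects a later commit in this PR that gates inherited methods as well.

```python
from typing import Set

# Danger sets. The member names are assumed for illustration.
_SHELL = frozenset({"run_command"})
_SCRIPTS = frozenset({"execute_script"})
_COOKIES = frozenset({"get_cookies", "set_cookies"})
_SECRETS = frozenset({"enter_secret"})


def public_methods(driver_cls: type) -> Set[str]:
    """Collect public callables declared anywhere in the class hierarchy."""
    names: Set[str] = set()
    for klass in driver_cls.__mro__:
        if klass is object:
            continue
        for name, attr in vars(klass).items():
            if callable(attr) and not name.startswith("_"):
                names.add(name)
    return names


def allowed_commands(
    driver_cls: type,
    *,
    allow_shell: bool = False,
    allow_scripts: bool = False,
    allow_cookies: bool = False,
    allow_secrets: bool = False,
) -> Set[str]:
    """Deny-by-default: dangerous groups are removed unless explicitly opted in."""
    allowed = public_methods(driver_cls)
    if not allow_shell:
        allowed -= _SHELL
    if not allow_scripts:
        allowed -= _SCRIPTS
    if not allow_cookies:
        allowed -= _COOKIES
    if not allow_secrets:
        allowed -= _SECRETS
    return allowed
```

Gating by name only constrains which entry points are reachable; it says nothing about what an allowed method does internally.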

- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages

Co-authored-by: Cursor <cursoragent@cursor.com>
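The first item above, a serializer that recurses into dicts, can be illustrated with a small sketch. Encoding bytes as base64 text is an assumption made here so the result is JSON-safe; the PR does not show the real function body.

```python
import base64
from typing import Any


def serialize_result(value: Any) -> Any:
    """Convert a driver result into JSON-compatible data (illustrative sketch)."""
    if isinstance(value, (bytes, bytearray)):
        # Assumption: raw bytes such as screenshots travel as base64 text.
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, dict):
        # Recursing here is the point: nested bytes would otherwise break json.dumps.
        return {str(key): serialize_result(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_result(item) for item in value]
    return value
```

Without the dict branch, a snapshot such as `{"screenshot": b"..."}` would reach the JSON encoder with bytes still inside it and fail, which matches the crash the commit describes.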

…constants Co-authored-by: Cursor <cursoragent@cursor.com>

…down

- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with iframe guard
- extract_markdown -> Defuddle (main-content, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (screen-bounds pruned DOM)
- viewport_markdown -> collectViewportHTML then CustomMarkdownify (markdownify), full-page fallback
- ship js assets via wheel force-include

Co-authored-by: Cursor <cursoragent@cursor.com>

…l` CLI Client now injects the local session_id for any source:"local" environment on create_agent/update_agent/patch_agent and on inline-agent create_session, so callers only pass source:"local" and the env id. Adds `hai local browser` and `hai local desktop` to run the sidecar from the CLI. Co-authored-by: Cursor <cursoragent@cursor.com>

…e, typed envs)

- enter_secret clicks (x, y) to focus the target before typing, so the secret lands in the field the agent pointed at instead of stale focus.
- get_tab_title honors tab_id by switching, reading, and restoring the tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they arrive as dicts or typed Pydantic models.

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-06-29T21:38:35Z

+        if not allow_cookies:
+            allowed -= _COOKIES
+        if not allow_secrets:
+            allowed -= _SECRETS


Script policy bypass via helpers

High Severity

With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.

Reviewed by Cursor Bugbot for commit 0d1c561.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ce SidecarBusyError in CLI update_agent/patch_agent take agent_name positionally; the **kwargs-only wrappers raised TypeError on a positional call. Accept *args and pass through. `hai local browser/desktop` acquired the lease inside asyncio.run, outside the guarded block, so a busy sidecar dumped a raw traceback; route it to the CLI error. Co-authored-by: Cursor <cursoragent@cursor.com>

…ingle-source capability

- sidecar: cache command_uid -> result and re-post on redelivery instead of re-running side effects (a transient result-POST failure left the command pending and re-executed it on the next poll).
- desktop driver: route keyboard through pyautogui (matches the Key contract and the remote executor); drop pynput, whose member names (enter/esc) diverge from the contract names (return/escape) and silently failed.
- policy: walk the full MRO so inherited driver methods are gated, not just those declared on the concrete class.
- config/wiring: KIND_TO_CAPABILITY is the single source; _CAPABILITIES derives from it.
- pyproject: drop unused pynput; collapse the all extra to a self-reference.

Co-authored-by: Cursor <cursoragent@cursor.com>

…shutdown

Remove CapabilityPolicy and the --allow-* CLI flags: the GUI keystroke and script paths reach the same surface anyway, so the gate was a formality.

- _dispatch rejects unknown/private names cleanly instead of crashing the poll loop
- build the driver only after the machine lease is acquired (no leak on busy lease)
- SIGINT/SIGTERM now stop the sidecar cooperatively so in-flight commands finish
- floor the 429 backoff so Retry-After: 0 can't busy-loop
- guard malformed fetch bodies so a bad json() doesn't kill the loop

Co-authored-by: Cursor <cursoragent@cursor.com>
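Two of the hardening steps in this commit, rejecting unknown or private command names and flooring the 429 backoff, can be sketched as small helpers. All names, the 0.5 s floor, and the 30 s ceiling are assumptions; the PR does not state the actual values.

```python
import math
from typing import Any, Callable, Optional


class UnknownCommandError(Exception):
    """Raised for command names the sidecar refuses to dispatch (hypothetical)."""


def resolve_command(driver: Any, name: Any) -> Callable[..., Any]:
    """Look up a driver method, refusing private or missing names."""
    if not isinstance(name, str) or not name or name.startswith("_"):
        raise UnknownCommandError(f"refusing to dispatch {name!r}")
    method = getattr(driver, name, None)
    if not callable(method):
        raise UnknownCommandError(f"unknown command {name!r}")
    return method


def backoff_seconds(
    retry_after: Optional[str], floor: float = 0.5, ceiling: float = 30.0
) -> float:
    """Clamp a Retry-After value so that 0 (or garbage) cannot cause a busy loop."""
    try:
        delay = float(retry_after)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        # Missing header or an HTTP-date form: fall back to the floor in this sketch.
        delay = floor
    if not math.isfinite(delay):
        delay = floor
    return min(max(delay, floor), ceiling)
```

Raising a dedicated exception lets the poll loop report the failure as a command result rather than letting an `AttributeError` escape and end the loop.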

- desktop snapshot emits screenshot_b64 (str), the field ObservationSnapshot requires; the old screenshot_png key raised a validation error on every observe
- release_key clears its modifier bit instead of XOR (stray release no longer flips it back on); key mask mutates only after a successful perform
- CDP mouse events carry the buttons bitmask so drags register, and moves use button "none"
- _run_script keeps the iframe guard on during retries so transient blocks retry
- _focus_new_tab switches to the genuinely new handle, not window_handles[-1]
- block chrome-extension/devtools/filesystem URL schemes
- a kind-less dict env defaults to web so session_id still autowires

Co-authored-by: Cursor <cursoragent@cursor.com>
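The release_key fix above comes down to the difference between toggling a bit and clearing it. A minimal sketch, using the modifier bit values from the Chrome DevTools Protocol Input domain (Alt=1, Ctrl=2, Meta=4, Shift=8); the helper names are assumptions:

```python
# Modifier bits as defined by the CDP Input domain.
ALT, CTRL, META, SHIFT = 1, 2, 4, 8


def press_modifier(mask: int, bit: int) -> int:
    """Set the bit; pressing an already-held modifier leaves the mask unchanged."""
    return mask | bit


def release_modifier(mask: int, bit: int) -> int:
    """Clear the bit; releasing a modifier that is not held is a no-op."""
    return mask & ~bit


def release_modifier_xor(mask: int, bit: int) -> int:
    """The buggy variant: XOR toggles, so a stray release turns the bit on."""
    return mask ^ bit
```

With XOR, releasing Shift when nothing is held yields a mask that claims Shift is down, so every later key event would be sent as shifted; AND-NOT makes release idempotent.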

cursor · 2026-06-30T20:57:21Z

+        self._action_builder = ActionBuilder
+        self._destroyed = False
+        self.cursor_x = 0
+        self.cursor_y = 0


Mouse position never synced

Medium Severity

SeleniumWebDriver keeps cursor_x/cursor_y for CDP mouse events but initializes them to (0, 0) and never reads the browser’s actual pointer. webpage_metadata and observation_bundle expose that stale position, and click, mouse_press, and scroll can act at the wrong coordinates when no prior mouse_move_to ran.

Additional Locations (1)

src/hai_agents/local/browser/driver.py#L538-L545

Reviewed by Cursor Bugbot for commit 7345734.

…rt.py Three single-purpose helper modules become one; the defuddle bundle is now read lazily and cached on first extract_markdown instead of at import. Co-authored-by: Cursor <cursoragent@cursor.com>

Cosmetic: module tunables read as plain UPPER_CASE. Class-private methods and driver internals keep the underscore (the dispatch firewall keys off it). Co-authored-by: Cursor <cursoragent@cursor.com>

…sktop->pyautogui_desktop Package dirs now name their implementation. CLI commands, capability strings, install extras, and class names are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-06-30T22:51:18Z

+
+    def click(self, x: int, y: int, button: str = "left") -> None:
+        self._pyautogui.click(x=x, y=y, button=button)
+        self._settle_after_click()


Desktop click coordinate space mismatch

High Severity

get_observation_snapshot reports the cursor in screenshot pixel space (including after width downscaling via screenshot_max_width), but click, mouse_move_to, and related input helpers forward those coordinates unchanged to PyAutoGUI, which expects logical screen coordinates from _screen_size. Agents aiming from the observation image will miss clicks whenever capture dimensions differ from the stored screen size.

Additional Locations (1)

src/hai_agents/local/pyautogui_desktop/driver.py#L79-L81

Reviewed by Cursor Bugbot for commit db90a85.
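One possible remedy for the mismatch described in this comment is to rescale coordinates from screenshot space to logical screen space before handing them to the input layer. The helper below is a sketch under that assumption; its name and the clamping behaviour are not from the PR.

```python
from typing import Tuple


def screenshot_to_screen(
    x: float,
    y: float,
    screenshot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point in screenshot pixels to logical screen coordinates."""
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    if shot_w <= 0 or shot_h <= 0:
        raise ValueError("screenshot dimensions must be positive")
    scaled_x = round(x * screen_w / shot_w)
    scaled_y = round(y * screen_h / shot_h)
    # Clamp so rounding at the edges never produces an off-screen point.
    return (
        min(max(scaled_x, 0), screen_w - 1),
        min(max(scaled_y, 0), screen_h - 1),
    )
```

For example, the centre of a 1280x720 downscaled capture of a 2560x1440 screen maps to (1280, 720). The alternative design is to report observation coordinates in logical screen space so no conversion is needed on input.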

… cached

- localize_agent/create_agent now recurse into inline subagents, so a local browser/desktop child gets its session_id (was only top-level environments)
- the result cache is now LRU: a cache hit refreshes recency so an actively redelivered command_uid is not evicted and re-executed mid-retry

Co-authored-by: Cursor <cursoragent@cursor.com>
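The LRU behaviour in the second item can be sketched with `collections.OrderedDict`. The class name, method names, and default size are assumptions; the point shown is that a hit refreshes recency.

```python
from collections import OrderedDict
from typing import Any


class ResultCache:
    """Bounded command_uid -> result cache with LRU eviction (illustrative sketch)."""

    def __init__(self, max_entries: int = 256) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def __contains__(self, command_uid: str) -> bool:
        return command_uid in self._entries

    def get(self, command_uid: str, default: Any = None) -> Any:
        if command_uid not in self._entries:
            return default
        # A hit refreshes recency, so a uid under active redelivery is not evicted.
        self._entries.move_to_end(command_uid)
        return self._entries[command_uid]

    def put(self, command_uid: str, result: Any) -> None:
        self._entries[command_uid] = result
        self._entries.move_to_end(command_uid)
        while len(self._entries) > self._max_entries:
            # Evict the least recently used entry (front of the ordering).
            self._entries.popitem(last=False)
```

With a plain insertion-ordered (FIFO) cache, the oldest uid is evicted even if it is still being redelivered, and the next poll would re-execute its side effects.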

Stored but never read (vestigial in the upstream driver too); removing it so the constructor doesn't advertise an option that does nothing. Co-authored-by: Cursor <cursoragent@cursor.com>

Matches the consolidated hai_drivers desktop interface (single screenshot_b64 method); the command proxy forwards screenshot_b64, so the desktop driver must expose it rather than screenshot_png_bytes. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 1c267dc.

cursor · 2026-07-01T11:52:55Z

+    def click(self, button: str = "left", delay_before_release: float = 0.05) -> None:
+        self.mouse_press(button=button)
+        time.sleep(delay_before_release)
+        self.mouse_release(button=button)


Browser sidecar click args mismatch

High Severity

The sidecar invokes driver methods by RPC name with JSON args. Desktop control uses click with x/y, and sidecar tests dispatch the same shape, but SeleniumWebDriver.click only accepts button and delay_before_release. Browser click requests carrying coordinates raise a TypeError or never move the pointer before clicking.

Reviewed by Cursor Bugbot for commit 1c267dc.
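A mismatch like this one can be detected before the call by binding the JSON args against the method signature. The sketch below uses `inspect.signature(...).bind`; the two stand-in classes copy the `click` signatures quoted in this PR, while `argument_error` is a hypothetical helper, not part of the SDK.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def argument_error(method: Callable[..., Any], args: Dict[str, Any]) -> Optional[str]:
    """Return a readable error if args cannot be bound to method, else None."""
    try:
        inspect.signature(method).bind(**args)
    except TypeError as exc:
        return str(exc)
    return None


class DesktopLike:
    # Same parameters as the desktop driver's click quoted above.
    def click(self, x: int, y: int, button: str = "left") -> None:
        ...


class BrowserLike:
    # Same parameters as SeleniumWebDriver.click quoted above.
    def click(self, button: str = "left", delay_before_release: float = 0.05) -> None:
        ...
```

Here `argument_error(BrowserLike().click, {"x": 1, "y": 2})` returns a message instead of None, so a sidecar could post a clear error result rather than letting a TypeError surface from inside the driver call.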

abonneth and others added 7 commits June 29, 2026 17:31

fix(local): browser destroy stops chromedriver, leaves attached Chrom…

f6d3585

…e open Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(local): drop leading underscore on new module-level helpers/…

a26e295

…constants Co-authored-by: Cursor <cursoragent@cursor.com>

abonneth marked this pull request as ready for review June 29, 2026 18:15

abonneth requested a review from adeprezh as a code owner June 29, 2026 18:15

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/selenium_browser/driver.py

Comment thread src/hai_agents/local/browser/driver.py Outdated

Comment thread src/hai_agents/local/browser/driver.py Outdated

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/wiring.py

abonneth and others added 2 commits June 29, 2026 23:10

refactor(local): source values user_device/cloud (was local/remote)

0d1c561

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 29, 2026

refactor(local): source->host in autowiring

8a1810f

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/browser/driver.py Outdated

abonneth and others added 4 commits June 30, 2026 13:57

cursor Bot reviewed Jun 30, 2026

abonneth and others added 2 commits June 30, 2026 23:01

refactor(local/browser): consolidate hjs/defuddle/markdown into suppo…

32ce8a5

…rt.py Three single-purpose helper modules become one; the defuddle bundle is now read lazily and cached on first extract_markdown instead of at import. Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(local): drop leading underscore on module-level constants

aeca68a

Cosmetic: module tunables read as plain UPPER_CASE. Class-private methods and driver internals keep the underscore (the dispatch firewall keys off it). Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 30, 2026

Comment thread src/hai_agents/local/wiring.py Outdated

Comment thread src/hai_agents/local/sidecar.py

refactor(local): rename driver packages browser->selenium_browser, de…

db90a85

…sktop->pyautogui_desktop Package dirs now name their implementation. CLI commands, capability strings, install extras, and class names are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 30, 2026

Comment thread src/hai_agents/local/selenium_browser/driver.py Outdated

refactor(local/browser): drop unused disable_html flag

2a055f1

Stored but never read (vestigial in the upstream driver too); removing it so the constructor doesn't advertise an option that does nothing. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jul 1, 2026

abonneth wants to merge 21 commits into main from antoine/local-control
