WIP: web_fetch_content tool by p12tic · Pull Request #749 · Zoo-Code-Org/Zoo-Code

p12tic · 2026-06-27T19:07:00Z

Related GitHub Issue

Closes: #748

Description

Implements web_fetch_content tool.

Test Procedure

Manual testing.

Pre-Submission Checklist

Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
Scope: My changes are focused on the linked issue (one major feature/fix per PR).
Self-Review: I have performed a thorough self-review of my code.
Testing: New and/or updated tests have been added to cover my changes (if applicable).
Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Documentation Updates

Additional Notes

Please let me know if this is something that can be added to ZooCode and I will finish polishing it. I suspect at least docs will need updates.

Summary by CodeRabbit

New Features
- Added support for fetching web content as a new tool, including URL-based requests and optional prompt context.
- Web content actions now appear in the chat UI with a clearer label and target URL display.
- Added a new web tool group and updated available modes to include web access.
Bug Fixes
- Improved handling of web-fetch tool calls in progress updates and final results.
- Added localized chat messages and tool labels across supported languages.

coderabbitai · 2026-06-27T19:07:08Z

📝 Walkthrough

Walkthrough

Adds a new built-in fetch_web_content native tool that fetches HTTP/HTTPS URLs with user-approval gating, byte and character limits, HTML-to-text conversion via Cheerio, and JSON pretty-printing. The tool is registered in the OpenAI tool list, wired into the assistant message parser and dispatcher, exposed as a new web tool group in the code and debug modes, rendered in the chat UI with a Globe icon, and localized across 20 languages.

Changes

fetch_web_content built-in tool

Layer / File(s)	Summary
Type and schema contracts `packages/types/src/tool.ts`, `packages/types/src/mode.ts`, `packages/types/src/vscode-extension-host.ts`, `src/shared/tools.ts`, `schemas/roomodes.json`	Adds `"web"` to `toolGroups`, `"fetch_web_content"` to `toolNames`, a typed `NativeToolArgs` entry, `fetchWebContent` variant on `ClineSayTool` with optional `url`, a new `TOOL_GROUPS.web` entry, `"web"` in `code` and `debug` mode groups, and expands the roomodes JSON schema to allow `"web"` in groups enums.
OpenAI tool definition and registry `src/core/prompts/tools/native-tools/fetch_web_content.ts`, `src/core/prompts/tools/native-tools/index.ts`, `src/package.json`	Creates the OpenAI function tool definition with strict `url`/`prompt` parameter schema, registers it in `getNativeTools()`, and adds `cheerio` as a runtime dependency.
FetchWebContentTool core `src/core/tools/FetchWebContentTool.ts`	Implements `FetchWebContentTool` with URL validation, HTTP/HTTPS-only enforcement, `AbortController` timeout, streaming response reads capped at `MAX_RESPONSE_BYTES`, `htmlToText` via Cheerio, JSON pretty-printing, structured result formatting, and `handlePartial` URL-stabilization logic. Exports a singleton `fetchWebContentTool`.
Parser and dispatcher wiring `src/core/assistant-message/NativeToolCallParser.ts`, `src/core/assistant-message/presentAssistantMessage.ts`, `src/core/auto-approval/tools.ts`	Adds `fetch_web_content` branches to `createPartialToolUse` and `parseToolCall`, wires tool dispatch and description in `presentAssistantMessage`, and adds `fetchWebContent` to the read-only auto-approval list.
Tests `src/core/tools/__tests__/fetchWebContentTool.spec.ts`, `src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts`, `src/shared/__tests__/modes.spec.ts`	Adds 724-line spec for `FetchWebContentTool` covering execute, handlePartial, and htmlToText; adds NativeToolCallParser tests for parse/streaming/finalization of `fetch_web_content`; updates modes spec to assert `"web"` in debug mode groups.
Chat UI rendering `webview-ui/src/components/chat/ChatRow.tsx`, `webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx`	Adds a `fetchWebContent` case to `ChatRow` using the `Globe` icon and `wantsToFetch`/`didFetch` localized strings, with URL display in a `ToolUseBlockHeader`; adds UI tests asserting icon, prompt text, and URL rendering.
i18n across 20 locales `webview-ui/src/i18n/locales//chat.json`, `webview-ui/src/i18n/locales//prompts.json`	Adds `webFetch.wantsToFetch` and `webFetch.didFetch` to chat locale files and `tools.toolNames.web` to prompts locale files for ca, de, en, es, fr, hi, id, it, ja, ko, nl, pl, pt-BR, ru, tr, vi, zh-CN, zh-TW.

Sequence Diagram(s)

sequenceDiagram
  participant NativeToolCallParser
  participant presentAssistantMessage
  participant FetchWebContentTool
  participant User as User Approval
  participant fetch as HTTP fetch

  NativeToolCallParser->>presentAssistantMessage: parsed fetch_web_content ToolUse (url, prompt)
  presentAssistantMessage->>FetchWebContentTool: handle(params, task, callbacks)
  FetchWebContentTool->>FetchWebContentTool: validate URL + enforce HTTP/HTTPS
  FetchWebContentTool->>User: askApproval(fetchWebContent, url)
  User-->>FetchWebContentTool: approved
  FetchWebContentTool->>fetch: GET url (AbortController timeout, max bytes)
  fetch-->>FetchWebContentTool: streaming response body
  FetchWebContentTool->>FetchWebContentTool: htmlToText / JSON.stringify / raw text
  FetchWebContentTool-->>presentAssistantMessage: pushToolResult(formatted content)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

awaiting-review

Suggested reviewers

JamesRobert20
navedmerchant
hannesrudolph
edelauna
taltas

Poem

🐇 A rabbit hops across the web,
fetching pages, ebb by ebb,
HTML stripped to plain-text treasure,
JSON pretty-printed for good measure,
Globe icon glows in the chat UI—
no MCP needed, just Zoo nearby! 🌐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description check	✅ Passed	The PR description includes the required issue link, summary, testing, checklist, screenshot, and docs notes, though briefly.
Linked Issues check	✅ Passed	The changes match `#748` by adding a built-in web content fetch tool plus the wiring, parsing, UI, and tests it needs.
Out of Scope Changes check	✅ Passed	The changes stay focused on the new fetch_web_content feature and its supporting types, tests, UI, and locales.
Title check	✅ Passed	The title identifies the main change, the new web_fetch_content tool, and stays concise.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-06-27T19:46:10Z

Codecov Report

❌ Patch coverage is 93.65079% with 8 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
.../core/assistant-message/presentAssistantMessage.ts	0.00%	3 Missing ⚠️
src/core/tools/FetchWebContentTool.ts	97.32%	0 Missing and 3 partials ⚠️
src/core/assistant-message/NativeToolCallParser.ts	83.33%	0 Missing and 1 partial ⚠️
webview-ui/src/components/chat/ChatRow.tsx	50.00%	0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

p12tic · 2026-06-28T19:17:02Z

@coderabbitai please review

coderabbitai · 2026-06-28T19:17:24Z

@p12tic The PR description image shows the Zoo Code UI. At the top, there’s a user prompt: “Please fetch bbc.com and tell me what news are there”. Below that, the interface shows an “API Request” section, then a message from Zoo: “No skills apply to this request — it's a simple web content fetch task.” A second section says “Zoo wants to fetch web content” and displays the URL https://www.bbc.com. Lower down, a green “Task Completed” status appears with returned content headed “Here are the current top news stories from BBC.com (June 27, 2026):”. Visible headlines include “Venezuela Earthquake Death Toll Tops 1,400”, “US Strikes Iran After Attack on Cargo Ship”, “Burkina Faso Severs Diplomatic Ties with France”, “Meloni and Trump: A Very Public Fall-Out”, and “Europe's Deadly Heatwave Breaks German Record”.

I’ll review the PR now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/shared/__tests__/modes.spec.ts (1)
607-617: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Add the matching code-mode assertion for web.

The new contract is that the web group is available in both debug and code, but this spec only pins debug. If code drops web, this layer will still stay green.

As per coding guidelines, "Use package-local unit tests for pure logic, parsing, state transitions, validation, serialization, request construction, retry decisions, and error handling."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/__tests__/modes.spec.ts` around lines 607 - 617, The debug mode
spec only asserts that the `web` group exists on `debug`, so it can miss
regressions in `code`. Update `modes.spec.ts` by adding a matching assertion for
the `code` mode using the same `modes.find(...)`/`toMatchObject(...)` pattern,
and verify its `groups` also include `web` alongside the existing `debug`
checks.
Source: Coding guidelines

🧹 Nitpick comments (1)

src/shared/tools.ts (1)
119-119: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Align prompt nullability with the native args contract.

FetchWebContentTool.execute and the new tool schema both accept prompt: null, but NativeToolArgs.fetch_web_content narrows it to string | undefined. The parser currently casts through that mismatch, so future callers lose type protection here. Prefer prompt?: string | null if null remains part of the wire format.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/shared/tools.ts` at line 119, The `NativeToolArgs.fetch_web_content` type
is narrower than the actual `FetchWebContentTool.execute` and tool schema
contract because it excludes `null` for `prompt`. Update the `fetch_web_content`
entry in `NativeToolArgs` to allow `prompt?: string | null`, and make sure the
parser and any related type references in `FetchWebContentTool` continue to
align with that wire format so callers keep proper type safety.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts`:
- Around line 376-389: The NativeToolCallParser streaming tests only cover a
complete JSON payload in one call, so they miss accumulator and partial-JSON
behavior in NativeToolCallParser.processStreamingChunk. Update the
fetch_web_content and related streaming specs in NativeToolCallParser.spec.ts to
split url and prompt across multiple fragmented chunks, asserting intermediate
partial results and final assembled nativeArgs so the accumulator logic in
startStreamingToolCall/processStreamingChunk is exercised. Use the existing
NativeToolCallParser and processStreamingChunk test cases as the place to add
these multi-chunk assertions.

In `@src/core/tools/FetchWebContentTool.ts`:
- Around line 175-215: The FetchWebContentTool validation currently only blocks
non-http(s) schemes, so it still allows SSRF targets like localhost,
private/link-local IPs, and metadata endpoints; add host-target filtering in
FetchWebContentTool before the fetch call and in the redirect path. Use the
existing url/parsedUrl handling in FetchWebContentTool to resolve the hostname,
reject internal or loopback destinations, and re-check each redirect destination
before following it. Also update src/core/auto-approval/tools.ts so
fetchWebContent is not treated as read-only auto-approval until these
network-safety checks are in place.

In `@webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx`:
- Around line 60-136: Add a Vitest case in ChatRow.fetch-web-content.spec.tsx to
cover the completed fetch state handled by ChatRow’s fetchWebContent rendering
path. Right now the suite only exercises type: "ask" with ask: "tool", so the
didFetch branch for type: "say" and say: "tool" remains untested; add a local
render assertion that the completed web content UI still shows the expected
URL/message. Use the existing ChatRow test helpers and keep the coverage under
webview-ui/src/components/chat/__tests__ rather than adding an e2e test.

In `@webview-ui/src/components/chat/ChatRow.tsx`:
- Around line 1019-1027: The `didFetch` branch in `ChatRow` is unreachable
because `fetchWebContent` is handled only in the `message.ask === "tool"` path,
while completed fetch events are routed through the later `message.type ===
"say"` switch and never rendered by `case "tool"`. Update the message routing so
`fetchWebContent` is handled in the `message.type === "say"` tool-rendering flow
as well, or otherwise make the `fetchWebContent` case reachable for completed
tool fetches. Use the existing `case "fetchWebContent"` and `case "tool"`
branches in `ChatRow` to keep the want-to-fetch/did-fetch states aligned.

---

Outside diff comments:
In `@src/shared/__tests__/modes.spec.ts`:
- Around line 607-617: The debug mode spec only asserts that the `web` group
exists on `debug`, so it can miss regressions in `code`. Update `modes.spec.ts`
by adding a matching assertion for the `code` mode using the same
`modes.find(...)`/`toMatchObject(...)` pattern, and verify its `groups` also
include `web` alongside the existing `debug` checks.

---

Nitpick comments:
In `@src/shared/tools.ts`:
- Line 119: The `NativeToolArgs.fetch_web_content` type is narrower than the
actual `FetchWebContentTool.execute` and tool schema contract because it
excludes `null` for `prompt`. Update the `fetch_web_content` entry in
`NativeToolArgs` to allow `prompt?: string | null`, and make sure the parser and
any related type references in `FetchWebContentTool` continue to align with that
wire format so callers keep proper type safety.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a036f0c0-1591-4915-a7c7-f740d0b9ce47

📥 Commits

Reviewing files that changed from the base of the PR and between 83fc6bb and ba0ca59.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (53)

packages/types/src/mode.ts
packages/types/src/tool.ts
packages/types/src/vscode-extension-host.ts
schemas/roomodes.json
src/core/assistant-message/NativeToolCallParser.ts
src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts
src/core/assistant-message/presentAssistantMessage.ts
src/core/auto-approval/tools.ts
src/core/prompts/tools/native-tools/fetch_web_content.ts
src/core/prompts/tools/native-tools/index.ts
src/core/tools/FetchWebContentTool.ts
src/core/tools/__tests__/fetchWebContentTool.spec.ts
src/package.json
src/shared/__tests__/modes.spec.ts
src/shared/tools.ts
webview-ui/src/components/chat/ChatRow.tsx
webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx
webview-ui/src/i18n/locales/ca/chat.json
webview-ui/src/i18n/locales/ca/prompts.json
webview-ui/src/i18n/locales/de/chat.json
webview-ui/src/i18n/locales/de/prompts.json
webview-ui/src/i18n/locales/en/chat.json
webview-ui/src/i18n/locales/en/prompts.json
webview-ui/src/i18n/locales/es/chat.json
webview-ui/src/i18n/locales/es/prompts.json
webview-ui/src/i18n/locales/fr/chat.json
webview-ui/src/i18n/locales/fr/prompts.json
webview-ui/src/i18n/locales/hi/chat.json
webview-ui/src/i18n/locales/hi/prompts.json
webview-ui/src/i18n/locales/id/chat.json
webview-ui/src/i18n/locales/id/prompts.json
webview-ui/src/i18n/locales/it/chat.json
webview-ui/src/i18n/locales/it/prompts.json
webview-ui/src/i18n/locales/ja/chat.json
webview-ui/src/i18n/locales/ja/prompts.json
webview-ui/src/i18n/locales/ko/chat.json
webview-ui/src/i18n/locales/ko/prompts.json
webview-ui/src/i18n/locales/nl/chat.json
webview-ui/src/i18n/locales/nl/prompts.json
webview-ui/src/i18n/locales/pl/chat.json
webview-ui/src/i18n/locales/pl/prompts.json
webview-ui/src/i18n/locales/pt-BR/chat.json
webview-ui/src/i18n/locales/pt-BR/prompts.json
webview-ui/src/i18n/locales/ru/chat.json
webview-ui/src/i18n/locales/ru/prompts.json
webview-ui/src/i18n/locales/tr/chat.json
webview-ui/src/i18n/locales/tr/prompts.json
webview-ui/src/i18n/locales/vi/chat.json
webview-ui/src/i18n/locales/vi/prompts.json
webview-ui/src/i18n/locales/zh-CN/chat.json
webview-ui/src/i18n/locales/zh-CN/prompts.json
webview-ui/src/i18n/locales/zh-TW/chat.json
webview-ui/src/i18n/locales/zh-TW/prompts.json

coderabbitai · 2026-06-28T19:23:54Z

+		describe("fetch_web_content tool", () => {
+			it("should emit a partial ToolUse with nativeArgs.url during streaming", () => {
+				const id = "toolu_streaming_fetch_1"
+				NativeToolCallParser.startStreamingToolCall(id, "fetch_web_content")
+
+				const fullArgs = JSON.stringify({ url: "https://example.com", prompt: "Find info" })
+				const result = NativeToolCallParser.processStreamingChunk(id, fullArgs)
+
+				expect(result).not.toBeNull()
+				expect(result?.nativeArgs).toBeDefined()
+				const nativeArgs = result?.nativeArgs as { url: string; prompt?: string }
+				expect(nativeArgs.url).toBe("https://example.com")
+				expect(nativeArgs.prompt).toBe("Find info")
+			})


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Exercise fragmented streaming chunks here.

These cases only pass a complete JSON payload in one processStreamingChunk() call, so they never hit the accumulator/partial-JSON behavior in src/core/assistant-message/NativeToolCallParser.ts:250-288. A bug that only appears when url/prompt arrive across multiple chunks would still pass this suite.

As per coding guidelines, "Use package-local unit tests for pure logic, parsing, state transitions, validation, serialization, request construction, retry decisions, and error handling."

Also applies to: 423-468

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/core/assistant-message/__tests__/NativeToolCallParser.spec.ts` around lines 376 - 389, The NativeToolCallParser streaming tests only cover a complete JSON payload in one call, so they miss accumulator and partial-JSON behavior in NativeToolCallParser.processStreamingChunk. Update the fetch_web_content and related streaming specs in NativeToolCallParser.spec.ts to split url and prompt across multiple fragmented chunks, asserting intermediate partial results and final assembled nativeArgs so the accumulator logic in startStreamingToolCall/processStreamingChunk is exercised. Use the existing NativeToolCallParser and processStreamingChunk test cases as the place to add these multi-chunk assertions.

Source: Coding guidelines

coderabbitai · 2026-06-28T19:23:54Z

+		// Only allow http and https protocols
+		if (!["http:", "https:"].includes(parsedUrl.protocol)) {
+			task.consecutiveMistakeCount++
+			task.recordToolError("fetch_web_content")
+			task.didToolFailInCurrentTurn = true
+			pushToolResult(
+				formatResponse.toolError(`Invalid protocol: ${parsedUrl.protocol}. Only http and https are supported.`),
+			)
+			return
+		}
+
+		task.consecutiveMistakeCount = 0
+
+		// Build the approval message
+		const sharedMessageProps: ClineSayTool = {
+			tool: "fetchWebContent",
+			url: url,
+		}
+
+		const completeMessage = JSON.stringify(sharedMessageProps satisfies ClineSayTool)
+		const didApprove = await askApproval("tool", completeMessage)
+
+		if (!didApprove) {
+			return
+		}
+
+		// Execute the fetch
+		const controller = new AbortController()
+		const timeout = setTimeout(() => controller.abort(), DEFAULT_TIMEOUT_MS)
+
+		try {
+			const response = await fetch(url, {
+				method: "GET",
+				headers: {
+					"User-Agent": "Mozilla/5.0 (compatible; ZooCode/1.0.0)",
+					Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,*/*;q=0.7",
+					"Accept-Language": "en-US,en;q=0.9",
+				},
+				redirect: "follow",
+				signal: controller.signal,
+			})


🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

Block private-network and metadata targets before fetching.

The current validation only checks scheme, so this still permits SSRF to localhost, RFC1918/private ranges, link-local addresses, and cloud metadata endpoints, and then returns that response body back into the chat. That is especially risky here because src/core/auto-approval/tools.ts now treats fetchWebContent as read-only, so auto-approve can bypass the manual consent gate entirely. Deny internal/loopback/link-local targets, resolve and re-check every redirect destination, and keep this tool out of read-only auto-approval until those checks exist.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/core/tools/FetchWebContentTool.ts` around lines 175 - 215, The FetchWebContentTool validation currently only blocks non-http(s) schemes, so it still allows SSRF targets like localhost, private/link-local IPs, and metadata endpoints; add host-target filtering in FetchWebContentTool before the fetch call and in the redirect path. Use the existing url/parsedUrl handling in FetchWebContentTool to resolve the hostname, reject internal or loopback destinations, and re-check each redirect destination before following it. Also update src/core/auto-approval/tools.ts so fetchWebContent is not treated as read-only auto-approval until these network-safety checks are in place.

coderabbitai · 2026-06-28T19:23:54Z

+describe("ChatRow - fetchWebContent tool", () => {
+	beforeEach(() => {
+		vi.clearAllMocks()
+	})
+
+	it("should display fetchWebContent ask message with URL", () => {
+		const message: any = {
+			type: "ask",
+			ask: "tool",
+			ts: Date.now(),
+			text: JSON.stringify({
+				tool: "fetchWebContent",
+				url: "https://example.com",
+			}),
+			partial: false,
+		}
+
+		renderChatRowWithProviders(message)
+
+		expect(screen.getByText("Zoo wants to fetch web content")).toBeInTheDocument()
+		expect(screen.getByText("https://example.com")).toBeInTheDocument()
+	})
+
+	it("should display the Globe icon for fetchWebContent", () => {
+		const message: any = {
+			type: "ask",
+			ask: "tool",
+			ts: Date.now(),
+			text: JSON.stringify({
+				tool: "fetchWebContent",
+				url: "https://docs.example.com/api",
+			}),
+			partial: false,
+		}
+
+		renderChatRowWithProviders(message)
+
+		expect(screen.getByLabelText("Web fetch icon")).toBeInTheDocument()
+	})
+
+	it("should display the URL in the tool use block", () => {
+		const message: any = {
+			type: "ask",
+			ask: "tool",
+			ts: Date.now(),
+			text: JSON.stringify({
+				tool: "fetchWebContent",
+				url: "https://api.github.com/repos/owner/repo",
+			}),
+			partial: false,
+		}
+
+		renderChatRowWithProviders(message)
+
+		expect(screen.getByText("https://api.github.com/repos/owner/repo")).toBeInTheDocument()
+	})
+
+	it("should not return null for fetchWebContent tool (regression test)", () => {
+		const message: any = {
+			type: "ask",
+			ask: "tool",
+			ts: Date.now(),
+			text: JSON.stringify({
+				tool: "fetchWebContent",
+				url: "https://www.delfi.lt",
+			}),
+			partial: false,
+		}
+
+		const { container } = renderChatRowWithProviders(message)
+
+		// The container should have rendered content (not null)
+		expect(container.innerHTML).not.toBe("")
+		expect(screen.getByText("Zoo wants to fetch web content")).toBeInTheDocument()
+		expect(screen.getByText("https://www.delfi.lt")).toBeInTheDocument()
+	})
+})


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a say/tool coverage case for the completed fetch state.

Every test here uses ask/tool, so the didFetch path is untested. A local Vitest case for type: "say", say: "tool" would catch the current regression and matches the webview testing guidance. As per coding guidelines, "Prefer local webview-ui tests for React/webview behavior such as component rendering, local state, hooks, form dirty-state, validation, or prop wiring. Add or update Vitest coverage under webview-ui/src/**/__tests__ instead of reaching for apps/vscode-e2e."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@webview-ui/src/components/chat/__tests__/ChatRow.fetch-web-content.spec.tsx` around lines 60 - 136, Add a Vitest case in ChatRow.fetch-web-content.spec.tsx to cover the completed fetch state handled by ChatRow’s fetchWebContent rendering path. Right now the suite only exercises type: "ask" with ask: "tool", so the didFetch branch for type: "say" and say: "tool" remains untested; add a local render assertion that the completed web content UI still shows the expected URL/message. Use the existing ChatRow test helpers and keep the coverage under webview-ui/src/components/chat/__tests__ rather than adding an e2e test.

Source: Coding guidelines

coderabbitai · 2026-06-28T19:23:54Z

+			case "fetchWebContent":
+				return (
+					<>
+						<div style={headerStyle}>
+							<Globe className="w-4 shrink-0" aria-label="Web fetch icon" />
+							<span style={{ fontWeight: "bold" }}>
+								{message.type === "ask"
+									? t("chat:webFetch.wantsToFetch")
+									: t("chat:webFetch.didFetch")}


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

didFetch is unreachable here.

This branch sits behind tool, but tool is only parsed for message.ask === "tool". say/tool fetch events still go through the later message.type === "say" switch, whose case "tool" does not render fetchWebContent, so completed fetches disappear from the timeline.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@webview-ui/src/components/chat/ChatRow.tsx` around lines 1019 - 1027, The `didFetch` branch in `ChatRow` is unreachable because `fetchWebContent` is handled only in the `message.ask === "tool"` path, while completed fetch events are routed through the later `message.type === "say"` switch and never rendered by `case "tool"`. Update the message routing so `fetchWebContent` is handled in the `message.type === "say"` tool-rendering flow as well, or otherwise make the `fetchWebContent` case reachable for completed tool fetches. Use the existing `case "fetchWebContent"` and `case "tool"` branches in `ChatRow` to keep the want-to-fetch/did-fetch states aligned.

edelauna

So in reviewing this I noticed a couple implementation/security details we should consider, but I also just realized - how is this feature any different than asking the agent to just curl a website?

I wonder if instead this feature should be more stand alone, and a type of structured web parser.

Based on this article: https://mikhail.io/2025/10/claude-code-web-tools/ it seems like Claude fetches a page, converts it to markdown then has a different model go through and parse the contents? I wonder if this feature should be similar to the code embeddings feature where a user can specify a provider and model that will perform a structured parsing of a web fetch?

edelauna · 2026-06-29T02:39:08Z

 		"searchFiles",
 		"codebaseSearch",
 		"runSlashCommand",
+		"fetchWebContent",


Does grouping an outbound network fetch with file reads make sense here? With alwaysAllowReadOnly on, every fetch_web_content call auto-approves with no user prompt. I think network calls warrant their own opt-in flag, or a setting to always require explicit approval?

edelauna · 2026-06-29T02:39:08Z

+				signal: controller.signal,
+			})
+
+			clearTimeout(timeout)


The timeout is cancelled here as soon as headers arrive, so the body-read loop below runs with no deadline. A server that sends headers quickly but streams the body slowly could hold the connection open indefinitely — the 30s limit only covers time-to-first-byte. Should the timer stay active through the full read?

edelauna · 2026-06-29T02:39:08Z

+					Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,text/plain;q=0.8,*/*;q=0.7",
+					"Accept-Language": "en-US,en;q=0.9",
+				},
+				redirect: "follow",


With redirect: "follow", a server can redirect to a private IP or cloud metadata endpoint (169.254.169.254, localhost, etc.) and the redirect destination is never validated — only the original URL was checked. Would limiting redirects to the same host work here?

Suggested change

redirect: "follow",

redirect: "manual",

(Then check response.status >= 300 && response.status < 400 and only follow if new URL(response.headers.get('location')!, url).origin === new URL(url).origin)

edelauna · 2026-06-29T02:39:08Z

+					content = text
+				}
+			} else {
+				content = text


If the server responds with image/jpeg, application/pdf, or another binary type (the Accept header allows */*), this falls through and returns the raw decoded bytes to the model as text. Should binary MIME types be rejected before attempting to decode?

edelauna · 2026-06-29T02:39:08Z

+				`Size: ${totalSize} bytes`,
+				``,
+				`--- Content ---`,
+				content.slice(0, MAX_CONTENT_CHARS),


Web content lands directly in the model's context with no trust boundary marker. If a page contains visible text like "Ignore all previous instructions...", it passes through as-is. Would wrapping this in something like <untrusted_web_content source="..."> (and noting in the tool description that content inside this tag is third-party data, not instructions) help?

edelauna · 2026-06-29T02:39:09Z

+			}
+
+			if (prompt) {
+				outputLines.push(``, `--- Analysis Request ---`, `Prompt: ${prompt}`)


The user's analysis prompt arrives after up to 50k chars of untrusted content. A malicious page could also embed --- Analysis Request --- in its visible text before the real one appears. Would putting the prompt instruction before the content block help with both ordering and injection resistance?

edelauna · 2026-06-29T02:39:09Z

+			clearTimeout(timeout)
+
+			if (!response.ok) {
+				pushToolResult(formatResponse.toolError(`HTTP ${response.status}: ${response.statusText}`))


Other tools (e.g. GenerateImageTool) call task.say("error", ...) before pushToolResult so the error surfaces as a visible bubble in chat — without it, only the model sees this. Same gap exists at the size-limit path a bit further down.

edelauna · 2026-06-29T02:39:09Z

+			// Process content based on type
+			let content: string
+			if (contentType.includes("text/html") || contentType.includes("application/xhtml")) {
+				content = htmlToText(text)


cheerio.load() is synchronous. For responses near the 5 MB limit, parsing can block the extension host event loop for hundreds of milliseconds, freezing IntelliSense and file watchers. Would slicing the HTML to a smaller cap before parsing (e.g. 500 KB) be a reasonable trade-off, since MAX_CONTENT_CHARS will truncate the output anyway?

p12tic · 2026-06-30T14:00:21Z

@edelauna Thanks for quick review. I should've market the PR as RFC, right now it probably used more time than necessary for review, as I just wanted to understand whether this PR makes sense in principle.

So in reviewing this I noticed a couple implementation/security details we should consider, but I also just realized - how is this feature any different than asking the agent to just curl a website?

It also extracts text from the page, so not entirely same as it makes life for the model easier. Important with stupid open-source models. Claude will be able to extract stuff from any format.

Another difference is regarding permissions, it makes sense to allow web fetch to e.g. architect to make better decisions.

I wonder if instead this feature should be more stand alone, and a type of structured web parser.

Based on this article: https://mikhail.io/2025/10/claude-code-web-tools/ it seems like Claude fetches a page, converts it to markdown then has a different model go through and parse the contents? I wonder if this feature should be similar to the code embeddings feature where a user can specify a provider and model that will perform a structured parsing of a web fetch?

Agreed, this makes sense. Would you accept simple web_fetch_content tool as a first step and then add LLM-based parsing on top later. I plan to implement a web search tool as well, so the web access capabilities will improve gradually anyway.

navedmerchant · 2026-07-02T05:20:08Z

I propose we use this library, I did an audit of the code using LLMs and its pretty well engineered. https://github.com/teng-lin/agent-fetch. It is also in JS, and outputs markdown directly. @p12tic @edelauna . Can you guys also take a look?

p12tic force-pushed the web-fetch-content-tool branch 3 times, most recently from 7599218 to fc37493 Compare June 27, 2026 19:38

p12tic force-pushed the web-fetch-content-tool branch 2 times, most recently from 0767e1d to 6b7c05a Compare June 27, 2026 21:00

p12tic added 6 commits June 28, 2026 22:13

feat: register fetch_web_content tool and web group in type system

490cf05

feat: add fetch_web_content native tool definition

f1a4235

feat: implement FetchWebContentTool with tests

d89ef6b

feat: wire fetch_web_content into execution dispatcher

7f2248f

feat: add fetch_web_content timeline rendering in ChatRow

ec01149

Update translations

ba0ca59

p12tic force-pushed the web-fetch-content-tool branch from 6b7c05a to ba0ca59 Compare June 28, 2026 19:13

p12tic marked this pull request as ready for review June 28, 2026 19:16

p12tic requested review from JamesRobert20, edelauna, hannesrudolph, navedmerchant and taltas as code owners June 28, 2026 19:16

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

edelauna requested changes Jun 29, 2026

View reviewed changes

github-actions Bot added the awaiting-author PR is waiting for the author to address requested changes label Jun 29, 2026

p12tic changed the title ~~web_fetch_content tool~~ WIP: web_fetch_content tool Jun 30, 2026

Uh oh!

Conversation

p12tic commented Jun 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Related GitHub Issue

Description

Test Procedure

Pre-Submission Checklist

Screenshots / Videos

Documentation Updates

Additional Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

p12tic commented Jun 28, 2026

Uh oh!

coderabbitai Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

edelauna left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

p12tic commented Jun 30, 2026

Uh oh!

navedmerchant commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

p12tic commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

codecov Bot commented Jun 27, 2026 •

edited

Loading

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading