fix: prevent agent loop stall on write_to_file filesystem errors (#703)#727
fix: prevent agent loop stall on write_to_file filesystem errors (#703)#727awschmeder wants to merge 2 commits into
Conversation
…oo-Code-Org#703) - Remove unguarded createDirectoriesForFile call from handlePartial; the call was a redundant optimization (execute() already creates dirs before open()) and its unguarded throw caused the partial-block advancement gate in presentAssistantMessage to be skipped, permanently stalling the agent loop - Move createDirectoriesForFile in execute() inside the try block so EROFS/ EACCES errors route through handleError with diffViewProvider.reset() cleanup and consecutive-mistake counting, rather than escaping unhandled - Add regression tests covering both failure paths
…_file filesystem failure When write_to_file hits a filesystem error (EROFS/EACCES) the streaming phase left the "Zoo wants to edit this file" spinner running, surfaced the same error twice (handlePartial + execute), and spawned a new partial tool message on every subsequent streaming delta. - Add Task.finalizePartialToolAsk() to finalize a partial tool ask without blocking on user input, dismissing the spinner. - handlePartial swallows streaming filesystem errors (after finalizing the spinner and resetting the diff view) so only the authoritative execute() error is reported, eliminating the duplicate error bubble. - Track partialStreamFailed so later streaming deltas short-circuit instead of re-attempting and spawning repeated partial tool messages. - Add regression tests for spinner finalization, single-error reporting, and no repeated partial messages.
📝 WalkthroughWalkthrough
ChangesPartial write-to-file error recovery
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/core/task/Task.ts`:
- Around line 1739-1745: Persist the finalized tool-ask state in
finalizePartialToolAsk by updating the underlying clineMessages entry, not just
the webview. After clearing the partial flag on the last ask/tool message, make
sure the task state is saved through the same persistence path used by
task.ask(..., false) in EditFileTool so the finalized message is written before
any reload/resume. Use the existing updateClineMessage flow only as the UI
update, and add the missing persistence step around finalizePartialToolAsk and
the clineMessages mutation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fe35d544-0a5c-4962-a19e-ead7a0d45383
📒 Files selected for processing (3)
src/core/task/Task.tssrc/core/tools/WriteToFileTool.tssrc/core/tools/__tests__/writeToFileTool.spec.ts
| async finalizePartialToolAsk(): Promise<void> { | ||
| const lastMessage = this.clineMessages.at(-1) | ||
|
|
||
| if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") { | ||
| lastMessage.partial = false | ||
| await this.updateClineMessage(lastMessage) | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
Persist the finalized tool-ask state.
Line 1744 only calls updateClineMessage(), which updates the webview but does not save this.clineMessages. The existing finalization path in src/core/tools/EditFileTool.ts:150-171 goes through task.ask(..., false), which persists the row first. If a swallowed partial-stream failure is followed by a reload/resume before another save happens, this message is still stored as partial: true and the spinner can come back.
Suggested fix
async finalizePartialToolAsk(): Promise<void> {
const lastMessage = this.clineMessages.at(-1)
if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") {
lastMessage.partial = false
+ await this.saveClineMessages()
await this.updateClineMessage(lastMessage)
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async finalizePartialToolAsk(): Promise<void> { | |
| const lastMessage = this.clineMessages.at(-1) | |
| if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") { | |
| lastMessage.partial = false | |
| await this.updateClineMessage(lastMessage) | |
| } | |
| async finalizePartialToolAsk(): Promise<void> { | |
| const lastMessage = this.clineMessages.at(-1) | |
| if (lastMessage && lastMessage.partial && lastMessage.type === "ask" && lastMessage.ask === "tool") { | |
| lastMessage.partial = false | |
| await this.saveClineMessages() | |
| await this.updateClineMessage(lastMessage) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/core/task/Task.ts` around lines 1739 - 1745, Persist the finalized
tool-ask state in finalizePartialToolAsk by updating the underlying
clineMessages entry, not just the webview. After clearing the partial flag on
the last ask/tool message, make sure the task state is saved through the same
persistence path used by task.ask(..., false) in EditFileTool so the finalized
message is written before any reload/resume. Use the existing updateClineMessage
flow only as the UI update, and add the missing persistence step around
finalizePartialToolAsk and the clineMessages mutation.
write-to-file-error-correct.mov |
edelauna
left a comment
There was a problem hiding this comment.
Thanks for adding this. Since we're touching I/O functionality can we harden the code a bit? I left some comments.
| task.consecutiveMistakeCount = 0 | ||
|
|
||
| // Create parent directories for new files inside the try block so filesystem | ||
| // errors (EROFS, EACCES, etc.) route through handleError with proper cleanup | ||
| // and consecutive-mistake counting, rather than escaping unhandled. | ||
| if (!fileExists) { | ||
| await createDirectoriesForFile(absolutePath) |
There was a problem hiding this comment.
The mistake counter is zeroed on line 107 before the call that can throw on line 113. When createDirectoriesForFile rejects with EROFS, handleError only calls cline.say("error", ...) and pushes a tool result -- it does not re-increment consecutiveMistakeCount. Net effect: a permanently read-only path zeros the runaway-loop guard on every attempt with no increment. Should consecutiveMistakeCount = 0 move to after this call succeeds (e.g. just before diffViewProvider.open at line 152)?
| * doomed open()/update() retry, which would otherwise create a fresh "Zoo wants to edit | ||
| * this file" message on every delta. Cleared by resetPartialState() between invocations. | ||
| */ | ||
| private partialStreamFailed = false |
There was a problem hiding this comment.
writeToFileTool is a module-level singleton. partialStreamFailed is instance state on that one object. The presentAssistantMessageLocked guard only prevents concurrent execution within a single Task -- two simultaneous sessions (or a parent task + subtask) can call writeToFileTool.handle() concurrently. If Task A sets this flag due to an EROFS error, Task B's next streaming delta silently short-circuits even though its write is fine. The pre-existing lastSeenPartialPath in BaseTool has the same architectural risk, but this flag has more severe consequences (actively suppresses UI vs. a momentarily stale path). Worth documenting in the JSDoc at minimum, or using a per-taskId map?
| * spinner does not get stuck in a loading state. | ||
| */ | ||
| async finalizePartialToolAsk(): Promise<void> { | ||
| const lastMessage = this.clineMessages.at(-1) |
There was a problem hiding this comment.
The target is identified by position (at(-1)) with no correlation to the specific partial ask opened during streaming. Is there any async path between the task.ask("tool", ..., block.partial) call in handlePartial and the catch block that could push a new message (e.g. processQueuedMessages)? If so, at(-1) could silently no-op and leave the spinner stuck.
| } | ||
| mockCline.say = vi.fn().mockResolvedValue(undefined) | ||
| mockCline.ask = vi.fn().mockResolvedValue(undefined) | ||
| mockCline.finalizePartialToolAsk = vi.fn().mockResolvedValue(undefined) |
There was a problem hiding this comment.
The mock replaces the full implementation, so these tests verify the method is called but never that lastMessage.partial actually becomes false. A regression removing that mutation from the real implementation would pass. Would a thin direct test on Task.finalizePartialToolAsk with a real clineMessages array be worth adding?
Related GitHub Issue
Closes: #703
Description
The bug (#703): When the model uses
write_to_fileto create a file whose parent directory cannot be created or written (e.g. a read-only mount or a path under a directory lacking write permission, producingEROFS/EACCES), the agent loop stalled permanently.handlePartialcalledcreateDirectoriesForFilewithout a.catch()guard. During streaming (block.partial === true) the unguarded throw was caught byBaseTool.handle->handleError, but the partial-block advancement gate inpresentAssistantMessagewas skipped (block.partialwas stilltrueand neitherdidRejectToolnordidAlreadyUseToolwas set), souserMessageContentReadywas never set and the task hung forever.The fix:
createDirectoriesForFilecall fromhandlePartial(execute()already creates parent directories beforediffViewProvider.open()).createDirectoriesForFileinexecute()inside thetryblock soEROFS/EACCESerrors route throughhandleErrorwithdiffViewProvider.reset()cleanup and consecutive-mistake counting instead of escaping unhandled.Correct streaming UI behavior under a filesystem error. Because the failure now occurs while a partial
write_to_fileblock is still streaming, the implementation is designed so the in-progress UI resolves cleanly rather than leaving artifacts. The following are properties of the implementation:Task.finalizePartialToolAsk()(a non-blocking helper that sets the last partialtoolask topartial: false) dismisses it. It deliberately avoidstask.ask(..., false), which would fall through to a blockingpWaitForthat would result in a non-terminating spinner left in the chat UI.handlePartialfinalizes the partial ask, resets the diff view, and swallows the streaming error (logged to the console) so only the authoritative non-partialexecute()path surfaces it to the user. This is safe because the loop advancement gate does not depend on the streaming throw -- it advances when the non-partial block arrives.partialStreamFailedflag (cleared between invocations viaresetPartialState()) short-circuits later deltas so they do not re-attemptopen()/update()and spawn a new partial tool message each time.Test Procedure
Automated (run from
src/):31 tests pass, including new regression tests that assert:
EROFSinhandlePartialdoes not callcreateDirectoriesForFileand does not stall.EROFS/EACCESinexecute()routes throughhandleErrorwithdiffViewProvider.reset()and does not proceed to open/save.handlePartialopen()/update()throws,finalizePartialToolAsk()anddiffViewProvider.reset()are called and the streaming error is swallowed (not surfaced viahandleError).execute()phases.open().npx tsc --noEmitpasses cleanly.Manual (Extension Development Host / F5):
Pre-Submission Checklist
Documentation Updates
Summary by CodeRabbit