Sprint 3 - Agent Runaway Detection and Timeout Handling

Metadata

Implementation: Change V5.2.0: Plugin Enhancements Proposal (Sprint 3 Hooks)
Issues: #225, #226, #227, #228, #229, #230 ([Sprint 3] feat: Implement breaking change detection for contract-validator)
Sprint: Sprint 3

Context

Background agents were spawned to implement hook functionality. Some agents ran for extended periods without completing.

Problem

Agents made 400+ tool calls over approximately 1 hour without completing their tasks. They got stuck in loops or explored tangential paths instead of finishing the core implementation. Manual intervention was required to stop them and commit the partial work.

Solution

Stopped the runaway agents manually
Reviewed what work had been completed
Committed the completed portions
Finished remaining work in the main session

Prevention

Agent design best practices:

Give agents NARROW, SPECIFIC tasks (not broad "implement feature X")
Include explicit completion criteria in the agent prompt
Set maximum tool call limits when possible
Break large tasks into smaller subtasks with checkpoints
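
The practices above can be sketched as a small task spec that renders an agent prompt. This is a minimal illustration, not the tooling used in the sprint: the AgentTask class, its field names, the example completion criteria, and the default budget of 100 tool calls are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """Hypothetical task spec; class and field names are illustrative only."""
    goal: str                   # one narrow, concrete deliverable
    done_when: list[str]        # explicit completion criteria
    max_tool_calls: int = 100   # assumed budget, stated to the agent in the prompt

    def to_prompt(self) -> str:
        """Render the spec as prompt text with criteria and a stop condition."""
        criteria = "\n".join(f"- {c}" for c in self.done_when)
        return (
            f"Task: {self.goal}\n"
            f"You are done when:\n{criteria}\n"
            f"Stop and report after at most {self.max_tool_calls} tool calls, "
            "even if the task is unfinished."
        )


task = AgentTask(
    goal="Create hooks/hooks.json with a UserPromptSubmit hook that runs detect-vagueness.sh",
    done_when=[
        "hooks/hooks.json exists",
        "the hook command points at detect-vagueness.sh",
    ],
)
prompt = task.to_prompt()
print(prompt)
```

Stating the budget in the prompt only asks the agent to stop; whether a hard limit can be enforced depends on the agent runner, which this note does not specify.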

Monitoring:

Check agent progress periodically (every 15-20 minutes)
If an agent exceeds 100 tool calls, review whether it is making progress
Look for repetitive patterns (same files being read/edited repeatedly)
Be ready to intervene and salvage partial work
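
The monitoring checks above could be automated roughly as follows. This is a sketch under stated assumptions: it presumes each tool call can be observed as a (tool, path) pair, and the RunawayMonitor class and the repeat threshold of 5 are hypothetical. Only the 100-call review point comes from the guidance above.

```python
from collections import Counter


class RunawayMonitor:
    """Hypothetical tracker that flags signs of a runaway agent."""

    def __init__(self, review_after: int = 100, repeat_limit: int = 5):
        self.review_after = review_after   # call count that triggers a progress review
        self.repeat_limit = repeat_limit   # assumed threshold for "repetitive" access
        self.calls = 0
        self.touches = Counter()           # (tool, path) -> number of times seen

    def record(self, tool: str, path: str) -> list[str]:
        """Record one tool call and return any warnings it triggers."""
        self.calls += 1
        self.touches[(tool, path)] += 1
        warnings = []
        if self.calls == self.review_after:
            warnings.append(f"{self.calls} tool calls: review progress")
        if self.touches[(tool, path)] == self.repeat_limit:
            warnings.append(f"{tool} on {path} repeated {self.repeat_limit} times")
        return warnings


monitor = RunawayMonitor(review_after=100, repeat_limit=5)
alerts = []
for _ in range(5):
    # Simulate an agent reading the same file over and over.
    alerts = monitor.record("read", "hooks/hooks.json")
print(alerts)
```

Each warning fires once, when its threshold is first reached, so a long-running agent does not flood the log. A warning is a prompt for human review, not proof that the agent is stuck.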

Task scoping:

BAD: "Implement the vagueness detection hook"
GOOD: "Create hooks/hooks.json with a UserPromptSubmit hook that runs detect-vagueness.sh"