leo-claude-mktplace/plugins/projman/skills/runaway-detection.md
lmiranda 2e65b60725 refactor(projman): extract skills and consolidate commands
Major refactoring of projman plugin architecture:

Skills Extraction (17 new files):
- Extracted reusable knowledge from commands and agents into skills/
- branch-security, dependency-management, git-workflow, input-detection
- issue-conventions, lessons-learned, mcp-tools-reference, planning-workflow
- progress-tracking, repo-validation, review-checklist, runaway-detection
- setup-workflows, sprint-approval, task-sizing, test-standards, wiki-conventions

Command Consolidation (17 → 12 commands):
- /setup: consolidates initial-setup, project-init, project-sync (--full/--quick/--sync)
- /debug: consolidates debug-report, debug-review (report/review modes)
- /test: consolidates test-check, test-gen (run/gen modes)
- /sprint-status: absorbs sprint-diagram via --diagram flag

Architecture Cleanup:
- Remove plugin-level mcp-servers/ symlinks (6 plugins)
- Remove plugin README.md files (12 files, ~2000 lines)
- Update all documentation to reflect new command structure
- Fix documentation drift in CONFIGURATION.md, COMMANDS-CHEATSHEET.md

Commands are now thin dispatchers (~20-50 lines) that reference skills.
Agents reference skills for domain knowledge instead of inline content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:02:16 -05:00


---
name: runaway-detection
description: Detecting and handling stuck agents
---
# Runaway Detection
## Purpose
Defines how to detect stuck agents and the protocol for intervening once one is found.
## When to Use
- **Orchestrator agent**: When monitoring dispatched agents
- **Executor agent**: Self-monitoring during execution
---
## Warning Signs
| Sign | Threshold | Action |
|------|-----------|--------|
| No progress comment | 30+ minutes | Investigate |
| Same phase repeated | 20+ tool calls | Consider stopping |
| Same error repeated | 3+ times | Stop agent immediately |
| Approaching budget | 80% of limit | Post checkpoint |
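
The thresholds in the table can be read as a simple rule check. The following is a minimal sketch, not part of projman: `AgentStatus` and its field names are hypothetical, and it assumes the orchestrator has already collected these four measurements from progress comments. A snapshot with 22 tool calls in one phase and 3 identical errors yields both "Consider stopping" and "Stop agent".

```python
from dataclasses import dataclass


@dataclass
class AgentStatus:
    """Hypothetical snapshot of one dispatched agent (illustrative field names)."""
    minutes_since_progress: float
    calls_in_same_phase: int
    same_error_count: int
    budget_used: float  # fraction of the tool call budget consumed, 0.0 to 1.0


def warning_actions(status: AgentStatus) -> list[str]:
    """Return every action from the Warning Signs table that applies."""
    actions = []
    if status.minutes_since_progress >= 30:
        actions.append("Investigate")
    if status.calls_in_same_phase >= 20:
        actions.append("Consider stopping")
    if status.same_error_count >= 3:
        actions.append("Stop agent")
    if status.budget_used >= 0.8:
        actions.append("Post checkpoint")
    return actions


# Example: stuck in one phase with a repeating error, but recent progress comment.
example_actions = warning_actions(AgentStatus(12, 22, 3, 0.5))
```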
---
## Agent Timeout Guidelines
| Task Size | Expected Duration | Intervention Point |
|-----------|-------------------|-------------------|
| XS | ~5-10 min | 15 min no progress |
| S | ~10-20 min | 30 min no progress |
| M | ~20-40 min | 45 min no progress |
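
As a sketch of how the intervention points might be applied, the table can be stored as a lookup keyed by task size. The names `INTERVENTION_MINUTES` and `should_intervene` are illustrative, and sizes not listed in the table are rejected rather than guessed. With these values, 25 minutes without progress is past the intervention point for an XS task but not for an S task.

```python
# Intervention points in minutes, copied from the table above.
INTERVENTION_MINUTES = {"XS": 15, "S": 30, "M": 45}


def should_intervene(task_size: str, minutes_without_progress: float) -> bool:
    """True once an agent has gone without progress past its size's intervention point."""
    try:
        limit = INTERVENTION_MINUTES[task_size.upper()]
    except KeyError:
        # The table defines no guideline for other sizes, so do not invent one.
        raise ValueError(f"no timeout guideline for task size {task_size!r}") from None
    return minutes_without_progress >= limit


xs_overdue = should_intervene("XS", 25)
s_overdue = should_intervene("S", 25)
```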
---
## Detection Protocol
1. **Read latest progress comment** - Check tool call count and phase
2. **Compare to previous** - Is progress happening?
3. **Check for error patterns** - Same error repeating?
4. **Evaluate time elapsed** - Beyond expected duration?
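
The four steps above could be sketched as a comparison between two parsed progress comments. This is an assumption-laden illustration: `ProgressSnapshot` is a hypothetical structure, not a projman type, and the 20-call and 3-error limits are taken from the Warning Signs table. An empty result means no sign of a stuck agent was found.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ProgressSnapshot:
    """Hypothetical parsed form of one progress comment."""
    phase: str
    tool_calls: int
    steps_completed: int
    errors: list[str] = field(default_factory=list)


def detect(previous: ProgressSnapshot, latest: ProgressSnapshot,
           minutes_elapsed: float, expected_minutes: float) -> list[str]:
    """Steps 1-4: read the latest snapshot, compare, check errors, check time."""
    findings = []
    # Step 2: same phase, no new steps, and many tool calls means no progress.
    calls_since = latest.tool_calls - previous.tool_calls
    stalled = (latest.phase == previous.phase
               and latest.steps_completed <= previous.steps_completed)
    if stalled and calls_since >= 20:
        findings.append(f"no progress in phase {latest.phase!r} over {calls_since} tool calls")
    # Step 3: the same error seen three or more times.
    for error, count in Counter(latest.errors).items():
        if count >= 3:
            findings.append(f"error {error!r} repeated {count} times")
    # Step 4: elapsed time beyond the expected duration for the task size.
    if minutes_elapsed > expected_minutes:
        findings.append(f"{minutes_elapsed} min elapsed, expected at most {expected_minutes}")
    return findings


before = ProgressSnapshot("Testing", 25, 4)
after = ProgressSnapshot("Testing", 45, 4, ["ModuleNotFoundError"] * 3)
stuck_findings = detect(before, after, minutes_elapsed=25, expected_minutes=20)
```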
---
## Intervention Protocol
When you detect that an agent may be stuck:
### Step 1: Assess
```
Agent Status Check for #45:
- Last progress: 25 minutes ago
- Phase: "Testing" (same as 20 tool calls ago)
- Errors: "ModuleNotFoundError" (3 times)
- Assessment: LIKELY STUCK
```
### Step 2: Stop Agent
```python
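# Only if the TaskStop tool is available in this session.
# "agent-id" is a placeholder for the ID of the dispatched agent task.
TaskStop(task_id="agent-id")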
```
### Step 3: Update Issue Status
```python
update_issue(
    repo="org/repo",
    issue_number=45,
    # other_labels: the issue's existing labels, minus the old Status/* label
    labels=["Status/Failed", *other_labels],
)
```
### Step 4: Add Explanation Comment
```python
add_comment(
    repo="org/repo",
    number=45,
    body="""## Agent Intervention
**Reason:** No progress detected for 25 minutes / repeated errors
**Last Status:** Testing phase, ModuleNotFoundError x3
**Action:** Stopped agent, requires human review
### What Was Completed
- [x] Created auth/jwt_service.py
- [x] Implemented generate_token()
### What Remains
- [ ] Fix import issue
- [ ] Write tests
- [ ] Commit
### Recommendation
- Check for missing dependency in requirements.txt
- May need manual intervention to resolve import
"""
)
```
---
## Self-Monitoring (Executor)
Executors should monitor their own progress and stop themselves when a circuit breaker trips:
### Circuit Breakers
- **Same error 3 times**: Stop and report
- **80% of tool call budget**: Post checkpoint
- **File not found 3 times**: Stop and ask for help
- **Test failing same way 5 times**: Stop and report
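
One way an executor might track these breakers is with a small counter object. The sketch below is illustrative only: the class, the category keys (`error`, `file_not_found`, `test_failure`), and the idea of passing an error signature are assumptions, while the limits and actions come from the list above. Failures are counted per signature so that only the same failure repeating trips a breaker.

```python
from collections import Counter
from typing import Optional

# Trip limits and actions from the Circuit Breakers list; keys are illustrative.
LIMITS = {"error": 3, "file_not_found": 3, "test_failure": 5}
ACTIONS = {
    "error": "Stop and report",
    "file_not_found": "Stop and ask for help",
    "test_failure": "Stop and report",
}


class CircuitBreaker:
    def __init__(self, budget: int) -> None:
        self.budget = budget          # total tool calls allowed
        self.calls = 0
        self.failures = Counter()     # (category, signature) -> occurrences
        self.checkpoint_posted = False

    def record_call(self) -> Optional[str]:
        """Count one tool call; ask for a checkpoint once at 80% of budget."""
        self.calls += 1
        # Integer comparison avoids floating-point rounding at the boundary.
        if not self.checkpoint_posted and self.calls * 100 >= self.budget * 80:
            self.checkpoint_posted = True
            return "Post checkpoint"
        return None

    def record_failure(self, category: str, signature: str) -> Optional[str]:
        """Count one failure; return the action once its limit is reached."""
        key = (category, signature)
        self.failures[key] += 1
        if self.failures[key] >= LIMITS[category]:
            return ACTIONS[category]
        return None


breaker = CircuitBreaker(budget=100)
error_results = [breaker.record_failure("error", "ModuleNotFoundError") for _ in range(3)]
call_results = [breaker.record_call() for _ in range(81)]
```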
### Self-Check Template
```
Self-check at tool call 45/100:
- Progress: 4/7 steps completed
- Current phase: Testing
- Errors encountered: 1 (resolved)
- Remaining budget: 55 calls
- Status: ON TRACK
```
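If the self-check is generated rather than typed by hand, the remaining budget can be derived from the call count so the two never disagree. A minimal sketch, assuming a hypothetical helper named `render_self_check` whose parameters mirror the fields of the template:

```python
def render_self_check(calls_used: int, budget: int, steps_done: int,
                      steps_total: int, phase: str, errors: str, status: str) -> str:
    """Render the self-check template; remaining budget is computed, not passed in."""
    return "\n".join([
        f"Self-check at tool call {calls_used}/{budget}:",
        f"- Progress: {steps_done}/{steps_total} steps completed",
        f"- Current phase: {phase}",
        f"- Errors encountered: {errors}",
        f"- Remaining budget: {budget - calls_used} calls",
        f"- Status: {status}",
    ])


report = render_self_check(45, 100, 4, 7, "Testing", "1 (resolved)", "ON TRACK")
```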
---
## Recovery Actions
After stopping a stuck agent:
1. **Preserve work** - Branch and commits remain
2. **Document state** - Checkpoint in issue comment
3. **Identify cause** - What caused the loop?
4. **Plan recovery**:
   - Manual completion
   - Different approach
   - Break down further
   - Assign to human
---
## Common Stuck Patterns
| Pattern | Cause | Solution |
|---------|-------|----------|
| Import loop | Missing dependency | Add to requirements |
| Test loop | Non-deterministic test | Fix test isolation |
| Validation loop | Error message not changing | Improve error specificity |
| File not found | Wrong path | Verify path exists |
| Permission denied | File ownership | Check permissions |
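
Some of these patterns can be recognized from the error text alone. The sketch below maps an error message to a pattern and solution from the table; the marker substrings are assumptions based on standard Python exception names and common OS messages, and the function name is hypothetical. Test loops and validation loops are left out because they show up in behavior over time, not in a single message.

```python
from typing import Optional

# (marker substrings, pattern, solution). Pattern and solution come from the
# table above; the markers are assumptions chosen for illustration.
RULES = [
    (("ModuleNotFoundError", "ImportError"), "Import loop", "Add to requirements"),
    (("FileNotFoundError", "No such file"), "File not found", "Verify path exists"),
    (("PermissionError", "Permission denied"), "Permission denied", "Check permissions"),
]


def suggest_fix(error_text: str) -> Optional[tuple[str, str]]:
    """Return (pattern, solution) for the first matching rule, or None."""
    for markers, pattern, solution in RULES:
        if any(marker in error_text for marker in markers):
            return pattern, solution
    return None


suggestion = suggest_fix("ModuleNotFoundError: No module named 'jwt'")
```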