feat(projman): add checkpoint/resume for interrupted agent work (#237)

Executor checkpointing:
- Standard checkpoint comment format with branch, commit, phase
- Files modified with status (created, modified)
- Completed and pending steps tracking
- State notes for resumption context
- Save checkpoint after major steps, before stopping

Orchestrator resume detection:
- Scan issue comments for "## Checkpoint" markers
- Offer resume options: resume, start fresh, review details
- Verify branch exists and files match before resuming
- Dispatch executor with checkpoint context

Sprint-start integration:
- Checkpoint detection as first workflow step
- Resume flow documentation with example
- Checkpoint format specification

This enables resuming work after:
- Budget exhaustion (100 tool call limit)
- Agent failure/circuit breaker
- Manual interruption
- Session timeout

Closes #237

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@@ -424,6 +424,81 @@ As the executor, you interact with MCP tools for status updates:
 - Apply best practices
 - Deliver quality work
 
+## Checkpointing (Save Progress for Resume)
+
+**CRITICAL: Save checkpoints so work can be resumed if interrupted.**
+
+**Checkpoint Comment Format:**
+```markdown
+## Checkpoint
+**Branch:** feat/45-jwt-service
+**Commit:** abc123 (or "uncommitted")
+**Phase:** [current phase]
+**Tool Calls:** 45
+
+### Files Modified
+- auth/jwt_service.py (created)
+- tests/test_jwt.py (created)
+
+### Completed Steps
+- [x] Created jwt_service.py skeleton
+- [x] Implemented generate_token()
+- [x] Implemented verify_token()
+
+### Pending Steps
+- [ ] Write unit tests
+- [ ] Add token refresh logic
+- [ ] Commit and push
+
+### State Notes
+[Any important context for resumption]
+```
+
+**When to Save Checkpoints:**
+- After completing each major step (every 20-30 tool calls)
+- Before stopping due to budget limit
+- When encountering a blocker
+- After any commit
+
+**Checkpoint Example:**
+```
+add_comment(
+    issue_number=45,
+    body="""## Checkpoint
+**Branch:** feat/45-jwt-service
+**Commit:** uncommitted (changes staged)
+**Phase:** Testing
+**Tool Calls:** 67
+
+### Files Modified
+- auth/jwt_service.py (created, 120 lines)
+- auth/__init__.py (modified, added import)
+- tests/test_jwt.py (created, 50 lines, incomplete)
+
+### Completed Steps
+- [x] Created auth/jwt_service.py
+- [x] Implemented generate_token() with HS256
+- [x] Implemented verify_token()
+- [x] Updated auth/__init__.py exports
+
+### Pending Steps
+- [ ] Complete test_jwt.py (5 tests remaining)
+- [ ] Add token refresh logic
+- [ ] Commit changes
+- [ ] Push to remote
+
+### State Notes
+- Using PyJWT 2.8.0
+- Secret key from JWT_SECRET env var
+- Tests use pytest fixtures in conftest.py
+"""
+)
+```
+
+**Checkpoint on Interruption:**
+
+If you must stop (budget, failure, blocker), ALWAYS post a checkpoint FIRST.
+
 ## Runaway Detection (Self-Monitoring)
 
 **CRITICAL: Monitor yourself to prevent infinite loops and wasted resources.**
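
Note on the checkpoint comment format added in the hunk above: the body is regular enough to be generated from structured data rather than typed by hand. The following is a minimal sketch, assuming Python; the `Checkpoint` dataclass and `render_checkpoint` function are hypothetical names for illustration and are not part of this commit.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical container mirroring the fields of the checkpoint comment."""
    branch: str
    commit: str            # short hash, or "uncommitted"
    phase: str
    tool_calls: int
    files_modified: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


def render_checkpoint(cp: Checkpoint) -> str:
    """Render the checkpoint as a markdown comment body in the documented layout."""
    lines = [
        "## Checkpoint",
        f"**Branch:** {cp.branch}",
        f"**Commit:** {cp.commit}",
        f"**Phase:** {cp.phase}",
        f"**Tool Calls:** {cp.tool_calls}",
        "",
        "### Files Modified",
        *[f"- {path}" for path in cp.files_modified],
        "",
        "### Completed Steps",
        *[f"- [x] {step}" for step in cp.completed],
        "",
        "### Pending Steps",
        *[f"- [ ] {step}" for step in cp.pending],
        "",
        "### State Notes",
        *[f"- {note}" for note in cp.notes],
    ]
    return "\n".join(lines)
```

The returned string would then be passed as the `body` argument of `add_comment`, as in the checkpoint example in the diff.
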
@@ -93,7 +93,44 @@ git branch --show-current
 
 **Workflow:**
 
-**A. Fetch Sprint Issues**
+**A. Fetch Sprint Issues and Detect Checkpoints**
+```
+list_issues(state="open", labels=["sprint-current"])
+```
+
+**For each open issue, check for checkpoint comments:**
+```
+get_issue(issue_number=45) # Comments included
+→ Look for comments containing "## Checkpoint"
+```
+
+**If Checkpoint Found:**
+```
+Checkpoint Detected for #45
+
+Found checkpoint from previous session:
+Branch: feat/45-jwt-service
+Phase: Testing
+Tool Calls: 67
+Files Modified: 3
+Completed: 4/7 steps
+
+Options:
+1. Resume from checkpoint (recommended)
+2. Start fresh (discard previous work)
+3. Review checkpoint details first
+
+Would you like to resume?
+```
+
+**Resume Protocol:**
+1. Verify branch exists: `git branch -a | grep feat/45-jwt-service`
+2. Switch to branch: `git checkout feat/45-jwt-service`
+3. Verify files match checkpoint
+4. Dispatch executor with checkpoint context
+5. Executor continues from pending steps
+
+**B. Fetch Sprint Issues (Standard)**
 ```
 list_issues(state="open", labels=["sprint-current"])
 ```
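
Note on the detection step in the hunk above: scanning for `## Checkpoint` and reading back the header fields and step lists can be done with a few regular expressions. This is a minimal sketch, assuming Python and assuming the comment bodies are already available as a list of strings ordered oldest to newest; `find_latest_checkpoint` and `parse_checkpoint` are hypothetical helper names, not functions from this repository.

```python
import re

CHECKPOINT_MARKER = "## Checkpoint"


def find_latest_checkpoint(comment_bodies):
    """Return the newest comment body containing the marker, or None.

    Assumes comment_bodies is ordered oldest to newest.
    """
    for body in reversed(comment_bodies):
        if CHECKPOINT_MARKER in body:
            return body
    return None


def parse_checkpoint(body):
    """Extract '**Key:** value' header fields and the two step lists."""
    fields = dict(
        re.findall(r"^\*\*(.+?):\*\*[ \t]*(.*)$", body, flags=re.MULTILINE)
    )
    completed = re.findall(r"^- \[x\] (.+)$", body, flags=re.MULTILINE)
    pending = re.findall(r"^- \[ \] (.+)$", body, flags=re.MULTILINE)
    return {"fields": fields, "completed": completed, "pending": pending}


def progress_summary(parsed):
    """Format progress as 'done/total steps' for the resume prompt."""
    done = len(parsed["completed"])
    total = done + len(parsed["pending"])
    return f"{done}/{total} steps"
```

Taking the newest matching comment is a design choice in this sketch: an issue may accumulate several checkpoints, and the latest one presumably reflects the most recent state.
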
@@ -25,7 +25,12 @@ If you are on a production or staging branch, you MUST stop and ask the user to
 
 The orchestrator agent will:
 
-1. **Fetch Sprint Issues**
+1. **Detect Checkpoints (Resume Support)**
+   - Check each open issue for `## Checkpoint` comments
+   - If checkpoint found, offer to resume from that point
+   - Resume preserves: branch, completed work, pending steps
+
+2. **Fetch Sprint Issues**
    - Use `list_issues` to fetch open issues for the sprint
    - Identify priorities based on labels (Priority/Critical, Priority/High, etc.)
 
@@ -300,6 +305,61 @@ Batch 2 (now unblocked):
 Starting #46 while #48 continues...
 ```
 
+## Checkpoint Resume Support
+
+If a previous session was interrupted (agent stopped, failure, budget exhausted), checkpoints enable resumption.
+
+**Checkpoint Detection:**
+The orchestrator scans issue comments for `## Checkpoint` markers containing:
+- Branch name
+- Last commit hash
+- Completed/pending steps
+- Files modified
+
+**Resume Flow:**
+```
+User: /sprint-start
+
+Orchestrator: Checking for checkpoints...
+
+Found checkpoint for #45 (JWT service):
+Branch: feat/45-jwt-service
+Last activity: 2 hours ago
+Progress: 4/7 steps completed
+Pending: Write tests, add refresh, commit
+
+Options:
+1. Resume from checkpoint (recommended)
+2. Start fresh (lose previous work)
+3. Review checkpoint details
+
+User: 1
+
+Orchestrator: Resuming #45 from checkpoint...
+✓ Branch exists
+✓ Files match checkpoint
+✓ Dispatching executor with context
+
+Executor continues from pending steps...
+```
+
+**Checkpoint Format:**
+Executors save checkpoints after major steps:
+```markdown
+## Checkpoint
+**Branch:** feat/45-jwt-service
+**Commit:** abc123
+**Phase:** Testing
+
+### Completed Steps
+- [x] Step 1
+- [x] Step 2
+
+### Pending Steps
+- [ ] Step 3
+- [ ] Step 4
+```
+
 ## Getting Started
 
 Simply invoke `/sprint-start` and the orchestrator will:
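
Note on the "Branch exists" and "Files match checkpoint" checks in the resume flow above: the commit does not define what "match" means, so the following is only a sketch under stated assumptions. It assumes Python, treats "files match" as "every path listed under Files Modified is present", and takes the output of `git branch -a` and the set of existing paths as plain inputs. `branch_exists` and `missing_files` are hypothetical names. One detail worth noting: `git branch -a | grep feat/45-jwt-service` matches substrings, so it would also succeed for a branch such as `feat/45-jwt-service-v2`; the sketch compares names exactly.

```python
import re


def branch_exists(branch, git_branch_output):
    """Exact-match check of a branch name against `git branch -a` output."""
    for line in git_branch_output.splitlines():
        # Drop the current-branch ("*") and worktree ("+") markers.
        name = line.lstrip("*+ ").strip()
        # Symbolic refs look like "remotes/origin/HEAD -> origin/main".
        name = name.split(" -> ")[0]
        if name == branch:
            return True
        # Remote-tracking entries look like "remotes/<remote>/<branch>".
        if name.startswith("remotes/") and name.split("/", 2)[-1] == branch:
            return True
    return False


def missing_files(files_modified_lines, existing_paths):
    """Return paths listed in the checkpoint that are not in existing_paths.

    Entries look like '- auth/jwt_service.py (created, 120 lines)'.
    """
    missing = []
    for line in files_modified_lines:
        match = re.match(r"-\s+(\S+)", line.strip())
        if match and match.group(1) not in existing_paths:
            missing.append(match.group(1))
    return missing
```

An orchestrator following this sketch would proceed to dispatch only when `branch_exists` returns `True` and `missing_files` returns an empty list, and would otherwise fall back to the "review checkpoint details" option.
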