Add "RFC-Perf-Sentinel-Plugin"

2026-01-29 04:33:19 +00:00
parent 8f06f95784
commit 22f276388b
1 changed files with 644 additions and 0 deletions
--- a/RFC-Perf-Sentinel-Plugin.-.md
+++ b/RFC-Perf-Sentinel-Plugin.-.md
@@ -0,0 +1,644 @@
 # RFC: perf-sentinel Plugin
 **Status:** Proposal  
 **Created:** 2026-01-28  
 **Author:** Claude Code Analysis  
 **Priority:** Medium  
 **Estimated Effort:** 2 sprints  
 **Depends On:** None (can be implemented independently)
 ---
 ## Executive Summary
 Create a new plugin dedicated to performance profiling, benchmarking, and optimization of the plugin/hook ecosystem. This plugin automates the kind of manual analysis performed in RFC-Hook-Efficiency-Improvements, providing continuous monitoring and regression detection.
 ---
 ## Motivation
 ### Problem Statement
 1. **Manual analysis is time-consuming** - The hook efficiency review required reading 15+ files and assessing each one by hand
 2. **No regression detection** - Performance degradations go unnoticed until they become painful
 3. **No baseline tracking** - We don't know if changes improve or worsen performance
 4. **Optimization suggestions require expertise** - Not all users can identify inefficiencies
 ### Goals
 1. Automate performance audits
 2. Establish and track performance baselines
 3. Detect regressions before they accumulate
 4. Provide actionable optimization recommendations
 5. Keep monitoring overhead minimal (< 10ms per session)
 ---
 ## Proposed Solution
 ### Plugin Overview
 | Attribute | Value |
 |-----------|-------|
 | Name | `perf-sentinel` |
 | Version | 1.0.0 |
 | Category | Developer Tools |
 | Dependencies | None (external tools optional) |
 ### Directory Structure
 ```
 plugins/perf-sentinel/
 ├── .claude-plugin/
 │   └── plugin.json
 ├── commands/
 │   ├── perf-audit.md           # Full performance audit
 │   ├── perf-baseline.md        # Establish performance baseline
 │   ├── perf-compare.md         # Compare against baseline
 │   ├── benchmark-hooks.md      # Benchmark specific hooks
 │   ├── profile-session.md      # Profile session startup/operations
 │   └── perf-optimize.md        # Generate optimization suggestions
 ├── hooks/
 │   ├── hooks.json
 │   └── session-metrics.sh      # Lightweight session-end metrics
 ├── agents/
 │   ├── perf-analyzer.md        # Analyze data, generate recommendations
 │   └── benchmark-runner.md     # Run systematic benchmarks
 ├── scripts/
 │   ├── hook-profiler.sh        # Profile individual hooks
 │   ├── metrics-collector.sh    # Collect system metrics
 │   └── lib/
 │       ├── timing.sh           # Timing utilities
 │       ├── metrics.sh          # Metrics collection/storage
 │       └── reporting.sh        # Report generation
 ├── skills/
 │   └── perf-patterns/
 │       └── common-issues.md    # Knowledge base of common perf issues
 └── README.md
 ```
 ---
 ## Feature Specifications
 ### Command: `/perf-audit`
 **Purpose:** Automate the performance audit that was carried out by hand for the hook efficiency review
 **Behavior:**
 1. Enumerate all hooks across all plugins
 2. Measure execution time of each hook (dry run with mock input)
 3. Analyze hook scripts for known anti-patterns
 4. Check for redundant operations across hooks
 5. Generate report with recommendations
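 Step 1 could be little more than a glob over the layout shown under Directory Structure. A minimal sketch, assuming every plugin keeps its hook configuration at `plugins/<name>/hooks/hooks.json`; the function name `list_hook_configs` is hypothetical:

 ```bash
 # Hypothetical helper: print one hooks.json path per plugin that defines hooks.
 # Assumes the plugins/<name>/hooks/hooks.json layout used in this RFC.
 list_hook_configs() {
     local root="${1:-plugins}" cfg
     for cfg in "$root"/*/hooks/hooks.json; do
         # An unmatched glob stays literal, so skip anything that is not a file
         [ -f "$cfg" ] && printf '%s\n' "$cfg"
     done
     return 0
 }
 ```

 Each printed path can then be handed to the timing and anti-pattern steps that follow.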
 **Output Example:**
 ```
 [perf-sentinel] Performance Audit - 2026-01-28
 ==============================================
 HOOK ANALYSIS (16 hooks across 12 plugins)
 SessionStart Hooks (7):
 ┌─────────────────────────────────┬──────────┬─────────────────────────────┐
 │ Hook                            │ Time     │ Concerns                    │
 ├─────────────────────────────────┼──────────┼─────────────────────────────┤
 │ projman/startup-check.sh        │ 342ms    │ ⚠️ HTTP calls, version check │
 │ cmdb-assistant/startup-check.sh │ 287ms    │ ⚠️ HTTP calls (NetBox API)   │
 │ contract-validator/auto-valid.. │ 156ms    │ MD5 hashing                 │
 │ data-platform/startup-check.sh  │ 134ms    │ ⚠️ Python + DB connection    │
 │ claude-config-maintainer/enf..  │ 45ms     │ File I/O                    │
 │ pr-review/startup-check.sh      │ 38ms     │ OK                          │
 │ viz-platform (inline echo)      │ 2ms      │ ⚠️ No value                  │
 └─────────────────────────────────┴──────────┴─────────────────────────────┘
 Total SessionStart overhead: 1,004ms
 PostToolUse Hooks (4):
 ┌─────────────────────────────────┬──────────┬─────────────────────────────┐
 │ Hook                            │ Time     │ Concerns                    │
 ├─────────────────────────────────┼──────────┼─────────────────────────────┤
 │ project-hygiene/cleanup.sh      │ 89ms     │ ⚠️ find across project       │
 │ contract-validator/breaking-..  │ 67ms     │ git operations              │
 │ data-platform/schema-diff-ch..  │ 34ms     │ OK                          │
 │ doc-guardian/notify.sh          │ 12ms     │ OK                          │
 └─────────────────────────────────┴──────────┴─────────────────────────────┘
 Avg PostToolUse overhead: 50ms per edit
 REDUNDANCY ANALYSIS
  - MCP venv check: duplicated in 3 hooks
  - Git remote check: duplicated in 2 hooks
  - JSON parsing: 8 independent implementations
 ANTI-PATTERNS DETECTED
  1. HTTP calls in SessionStart (2 hooks)
  2. Full project find in PostToolUse (1 hook)
  3. Broad Bash matcher triggering on all commands (2 hooks)
 RECOMMENDATIONS (see /perf-optimize for details)
  1. Consolidate SessionStart hooks → est. -400ms
  2. Add cooldown to project-hygiene → est. -50ms/edit
  3. Remove viz-platform echo hook → est. -2ms
  4. Add early-exit to git-flow hooks → est. -20ms/cmd
 Overall Score: 62/100 (Needs Improvement)
 ```
 **Anti-patterns to detect:**
 | Pattern | Detection Method | Severity |
 |---------|------------------|----------|
 | HTTP calls in SessionStart | grep for `curl`, `wget`, API URLs | High |
 | Full project `find` | grep for `find "$PROJECT"` or `find .` | High |
 | Python spawn for simple tasks | grep for `python` in shell scripts | Medium |
 | No early exit | Check if script bails out quickly for irrelevant input | Medium |
 | Broad Bash matcher | Check hooks.json for `"matcher": "Bash"` | Medium |
 | Duplicate logic | Compare function signatures across hooks | Low |
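 The grep-based rows of this table could start out as a handful of regular expressions. The sketch below covers only the first three rows; the helper name `scan_hook_antipatterns` and the labels it prints are hypothetical, and the patterns are rough enough to flag matches inside comments:

 ```bash
 # Hypothetical helper: print the name of each anti-pattern found in one hook script.
 # The regexes are deliberately rough; expect false positives (e.g. matches in comments).
 scan_hook_antipatterns() {
     local file="$1"
     grep -Eq '(^|[^[:alnum:]_])(curl|wget)([^[:alnum:]_]|$)' "$file" && echo "http-call"
     grep -Eq 'find[[:space:]]+("\$PROJECT|\.)' "$file" && echo "full-project-find"
     grep -Eq '(^|[^[:alnum:]_])python3?([^[:alnum:]_]|$)' "$file" && echo "python-spawn"
     return 0
 }
 ```

 Whether an HTTP call counts as High severity still depends on which event the hook is registered for, so the output would need to be joined with the matching `hooks.json` entry.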
 ---
 ### Command: `/perf-baseline`
 **Purpose:** Establish performance baseline for current version
 **Behavior:**
 1. Run multiple session startup measurements (default: 10)
 2. Run multiple edit cycle measurements (default: 10)
 3. Calculate statistics (mean, stddev, p50, p95)
 4. Store baseline with version tag
 **Parameters:**
 - `--runs N` - Number of measurement runs (default: 10)
 - `--tag TAG` - Custom tag (default: version from marketplace.json)
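 The statistics step can be sketched with `sort` and `awk` alone. This RFC does not say which estimators to use, so the sample standard deviation and nearest-rank percentiles below are assumptions, as is the helper name `summarize_runs`:

 ```bash
 # Hypothetical helper: summarize millisecond samples passed as arguments.
 # Prints "mean stddev p50 p95"; percentiles use the nearest-rank method.
 summarize_runs() {
     [ "$#" -gt 0 ] || return 1
     printf '%s\n' "$@" | sort -n | awk '
         { v[NR] = $1; sum += $1 }
         END {
             mean = sum / NR
             for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
             sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0   # sample standard deviation
             printf "%.0f %.0f %d %d\n", mean, sd, v[rank(50, NR)], v[rank(95, NR)]
         }
         # Nearest-rank index: ceil(p/100 * n), clamped to at least 1
         function rank(p, n,    r) { r = int(p * n / 100); if (r * 100 < p * n) r++; return (r < 1) ? 1 : r }
     '
 }
 ```

 For example, `summarize_runs 10 20 30 40` prints `25 13 20 40`. With only 10 runs the nearest-rank P95 is simply the slowest sample, which is worth keeping in mind when reading the table.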
 **Output Example:**
 ```
 [perf-sentinel] Creating baseline for v5.4.0
 ============================================
 Running session startup measurements (10 runs)...
  Run 1: 487ms
  Run 2: 512ms
  ...
  Run 10: 478ms
 Running edit cycle measurements (10 runs)...
  Run 1: 156ms
  ...
 Baseline Statistics:
 ┌────────────────────┬─────────┬─────────┬─────────┬─────────┐
 │ Metric             │ Mean    │ StdDev  │ P50     │ P95     │
 ├────────────────────┼─────────┼─────────┼─────────┼─────────┤
 │ Session startup    │ 487ms   │ 42ms    │ 481ms   │ 534ms   │
 │ Edit cycle         │ 156ms   │ 23ms    │ 152ms   │ 189ms   │
 │ Hook count         │ 16      │ -       │ -       │ -       │
 │ Total hook code    │ 1,493   │ -       │ -       │ -       │
 └────────────────────┴─────────┴─────────┴─────────┴─────────┘
 Baseline saved: ~/.cache/claude-plugins/perf-sentinel/baselines/v5.4.0.json
 ```
 **Storage Format:**
 ```json
 {
  "version": "5.4.0",
  "timestamp": "2026-01-28T12:00:00Z",
  "measurements": {
    "session_startup": {
      "runs": [487, 512, 478, ...],
      "mean": 487,
      "stddev": 42,
      "p50": 481,
      "p95": 534
    },
    "edit_cycle": {
      "runs": [156, 162, 148, ...],
      "mean": 156,
      "stddev": 23,
      "p50": 152,
      "p95": 189
    }
  },
  "metadata": {
    "hook_count": 16,
    "total_hook_lines": 1493,
    "plugins": ["projman", "pr-review", ...]
  }
 }
 ```
 ---
 ### Command: `/perf-compare`
 **Purpose:** Compare current performance against baseline
 **Behavior:**
 1. Load baseline (latest or specified)
 2. Run current measurements
 3. Compare with statistical significance testing
 4. Flag regressions exceeding threshold (default: 10%)
 **Parameters:**
 - `--baseline TAG` - Specific baseline to compare against
 - `--threshold N` - Regression threshold percentage (default: 10)
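 The threshold check in step 4 reduces to a percent-change calculation. The sketch below covers only that check and does not attempt the significance testing of step 3; the helper name `check_regression` and its exit codes are hypothetical:

 ```bash
 # Hypothetical helper: compare a current mean against a baseline mean (both in ms).
 # Prints the signed percent change; returns 1 when the slowdown exceeds the threshold.
 check_regression() {
     local baseline="$1" current="$2" threshold="${3:-10}"
     awk -v b="$baseline" -v c="$current" -v t="$threshold" 'BEGIN {
         if (b <= 0) { print "n/a"; exit 2 }   # no usable baseline
         pct = (c - b) * 100 / b
         printf "%+.0f%%\n", pct
         exit (pct > t ? 1 : 0)
     }'
 }
 ```

 With the figures used in the example output, `check_regression 487 612 10` prints `+26%` and returns 1, while `check_regression 156 148 10` prints `-5%` and returns 0.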
 **Output Example:**
 ```
 [perf-sentinel] Performance Comparison
 ======================================
 Baseline: v5.4.0 (2026-01-28)
 Current:  v5.5.0-dev
                      Baseline    Current     Change      Status
 ┌────────────────────┬───────────┬───────────┬───────────┬──────────┐
 │ Session startup    │ 487ms     │ 612ms     │ +125ms    │ ⚠️ +26%   │
 │ Edit cycle         │ 156ms     │ 148ms     │ -8ms      │ ✅ -5%    │
 │ Hook count         │ 16        │ 18        │ +2        │ ℹ️        │
 └────────────────────┴───────────┴───────────┴───────────┴──────────┘
 ⚠️ REGRESSION DETECTED: Session startup
 Root Cause Analysis:
  New hooks since baseline:
    + data-platform/startup-check.sh (SessionStart) → +134ms
    + data-platform/schema-diff-check.sh (PostToolUse)
  Modified hooks:
    ~ projman/startup-check.sh → +12ms (added version check)
 Recommendation:
  Review data-platform SessionStart hook - consider lazy loading
  Run /perf-optimize for detailed suggestions
 ```
 ---
 ### Command: `/benchmark-hooks`
 **Purpose:** Detailed benchmarking of specific hooks
 **Behavior:**
 1. Run specified hook(s) multiple times with mock input
 2. Provide statistical analysis
 3. Optional: use hyperfine if available
 **Parameters:**
 - `HOOK_PATH` - Path to specific hook, or "all"
 - `--runs N` - Number of runs (default: 20)
 - `--warmup N` - Warmup runs (default: 3)
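 When hyperfine is not installed, the warmup and measurement loop might look like the sketch below. It assumes GNU `date` (for `%N`) and that hooks read their payload from stdin; the helper names and the one-field mock payload are illustrative, not a documented schema:

 ```bash
 # Hypothetical helper: time a single hook run in ms, feeding a mock payload on stdin.
 # Assumes GNU date (for %N); the payload shape is illustrative only.
 time_hook_ms() {
     local hook="$1" payload="$2" start end
     start=$(date +%s%N)
     printf '%s' "$payload" | "$hook" > /dev/null 2>&1 || true
     end=$(date +%s%N)
     echo $(( (end - start) / 1000000 ))
 }

 # Discard the warmup runs, then print one timing per measured run.
 bench_hook() {
     local hook="$1" runs="${2:-20}" warmup="${3:-3}" i=0
     local payload='{"hook_event_name":"SessionStart"}'
     while [ "$i" -lt "$warmup" ]; do
         time_hook_ms "$hook" "$payload" > /dev/null
         i=$((i + 1))
     done
     i=0
     while [ "$i" -lt "$runs" ]; do
         time_hook_ms "$hook" "$payload"
         i=$((i + 1))
     done
 }
 ```

 Each measurement includes the cost of two `date` processes, so very fast hooks will read slightly high; that bias is one reason to prefer hyperfine when it is available.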
 **Output Example:**
 ```
 [perf-sentinel] Hook Benchmark: projman/startup-check.sh
 ========================================================
 Configuration:
  Runs: 20 (+ 3 warmup)
  Input: mock SessionStart payload
  Tool: hyperfine (detected)
 Results:
  Mean:     342.3ms
  StdDev:   28.1ms
  Min:      298ms
  Max:      412ms
  P50:      338ms
  P95:      389ms
 Time Breakdown (via strace):
  Network I/O:  267ms (78%)  ← Gitea API calls
  File I/O:      42ms (12%)
  Process:       33ms (10%)
 Recommendation:
  Network calls dominate. Consider:
  - Caching API responses
  - Making calls async/lazy
  - Reducing call frequency
 ```
 ---
 ### Command: `/perf-optimize`
 **Purpose:** Generate actionable optimization plan
 **Behavior:**
 1. Run audit if not recently run
 2. Analyze patterns and anti-patterns
 3. Generate prioritized optimization plan
 4. Estimate impact of each optimization
 **Output Example:**
 ```
 [perf-sentinel] Optimization Plan
 =================================
 Based on audit from 2026-01-28 12:00
 PRIORITY 1 - High Impact, Low Effort
 ────────────────────────────────────
 1. Remove viz-platform SessionStart hook
   File: plugins/viz-platform/hooks/hooks.json
   Action: Delete the hook (provides no value)
   Impact: -2ms startup
   Effort: 1 line change
 2. Make clarity-assist opt-in
   File: plugins/clarity-assist/hooks/vagueness-check.sh
   Action: Change default from true to false
   Impact: -50ms per prompt (when disabled)
   Effort: 1 line change
 PRIORITY 2 - High Impact, Medium Effort
 ───────────────────────────────────────
 3. Add early-exit to git-flow hooks
   Files: plugins/git-flow/hooks/branch-check.sh
          plugins/git-flow/hooks/commit-msg-check.sh
   Action: Check for "git" in command before full parsing
   Impact: -20ms per non-git bash command
   Effort: ~10 lines each
 4. Add cooldown to project-hygiene
   File: plugins/project-hygiene/hooks/cleanup.sh
   Action: Skip if ran in last 5 minutes
   Impact: -89ms per edit (when cooling down)
   Effort: ~15 lines
 PRIORITY 3 - High Impact, High Effort
 ─────────────────────────────────────
 5. Consolidate SessionStart hooks
   Action: Create central dispatcher, remove individual hooks
   Impact: -400ms startup (estimated)
   Effort: New architecture, ~200 lines
 6. Create shared hook library
   Action: Extract common JSON parsing, path filtering
   Impact: Reduced maintenance, consistent behavior
   Effort: ~100 lines library + refactor all hooks
 ESTIMATED TOTAL IMPACT
 ──────────────────────
 If all optimizations implemented:
  Session startup: 487ms → ~180ms (63% reduction)
  Edit cycle: 156ms → ~60ms (62% reduction)
 Implementation order recommendation:
  Sprint 1: Items 1-4 (quick wins)
  Sprint 2: Items 5-6 (architecture)
 ```
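 The cooldown suggested for `project-hygiene` in the plan above could be a small guard at the top of the hook. A minimal sketch, assuming a stamp file is an acceptable place to keep state; the helper name `cooldown_elapsed` and the stamp location are hypothetical:

 ```bash
 # Hypothetical helper: succeed (run) at most once per cooldown window, else fail (skip).
 # The stamp file holds the epoch second of the last run that was allowed through.
 cooldown_elapsed() {
     local stamp="$1" window="${2:-300}" now last=0
     now=$(date +%s)
     if [ -f "$stamp" ]; then
         read -r last < "$stamp" || last=0
     fi
     case "$last" in ''|*[!0-9]*) last=0 ;; esac   # treat a corrupt stamp as "never ran"
     [ $((now - last)) -ge "$window" ] || return 1
     printf '%s\n' "$now" > "$stamp"
 }
 ```

 A hook would call it as `cooldown_elapsed "$stamp_path" 300 || exit 0` before doing any expensive work, so the skipped path costs one `date` call and one file read.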
 ---
 ### Hook: Session Metrics (Lightweight)
 **File:** `hooks/hooks.json`
 ```json
 {
  "hooks": {
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/session-metrics.sh"
          }
        ]
      }
    ]
  }
 }
 ```
 **Design Principles:**
 1. **SessionEnd only** - No per-tool overhead
 2. **Fast execution** - Must complete in < 10ms
 3. **Minimal I/O** - Single file write
 4. **No network** - Never make HTTP calls
 **Script:** `hooks/session-metrics.sh`
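 ```bash
 #!/bin/bash
 # perf-sentinel session metrics collector
 # MUST be lightweight - runs on every session end
 # Honor the documented opt-out switch
 [ "${PERF_SENTINEL_ENABLED:-true}" = "true" ] || exit 0
 # Metrics directory
 METRICS_DIR="$HOME/.cache/claude-plugins/perf-sentinel/sessions"
 TODAY=$(date +%Y-%m-%d)
 mkdir -p "$METRICS_DIR/$TODAY" 2>/dev/null || exit 0
 # Collect only what's cheap to measure
 # NOTE: both CLAUDE_* variables are assumed to be exported by the host
 SESSION_ID="${CLAUDE_SESSION_ID:-$(date +%s)}"
 DURATION="${CLAUDE_SESSION_DURATION_MS:-0}"
 # Keep the record valid JSON even if the duration is not a plain integer
 case "$DURATION" in ''|*[!0-9]*) DURATION=0 ;; esac
 # Single atomic write: stage in a temp file, then rename into place
 OUT="$METRICS_DIR/$TODAY/$SESSION_ID.json"
 printf '{"ts":"%s","dur":%s,"id":"%s"}\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$DURATION" "$SESSION_ID" > "$OUT.tmp" \
     && mv -f "$OUT.tmp" "$OUT"
 exit 0
 ```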
 ---
 ### Agent: perf-analyzer
 **Purpose:** AI-powered performance analysis and recommendations
 **Capabilities:**
 1. Analyze collected metrics for trends
 2. Identify anomalies and regressions
 3. Generate natural language recommendations
 4. Cross-reference with known performance patterns
 **Model:** `sonnet` (good balance of speed and analysis quality)
 ---
 ### Agent: benchmark-runner
 **Purpose:** Execute systematic benchmarks
 **Capabilities:**
 1. Run benchmarks with proper isolation
 2. Handle warmup and cooldown
 3. Statistical analysis of results
 4. Integration with external tools when available
 **Model:** `haiku` (simple task execution)
 ---
 ## Data Storage
 ### Directory Structure
 ```
 ~/.cache/claude-plugins/perf-sentinel/
 ├── baselines/
 │   ├── v5.4.0.json
 │   ├── v5.5.0.json
 │   └── latest -> v5.5.0.json
 ├── sessions/
 │   ├── 2026-01-27/
 │   │   ├── session-001.json
 │   │   └── session-002.json
 │   └── 2026-01-28/
 │       └── session-001.json
 ├── audits/
 │   └── 2026-01-28.json
 ├── benchmarks/
 │   └── hooks-2026-01-28.json
 └── config.json
 ```
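 Keeping the `latest` symlink in step with new baselines could look like the sketch below. It assumes the caller has already produced the JSON document; the helper name `save_baseline` and its optional directory argument are hypothetical:

 ```bash
 # Hypothetical helper: store a baseline file and repoint the `latest` symlink at it.
 # Usage: save_baseline TAG JSON [BASELINE_DIR]
 save_baseline() {
     local tag="$1" json="$2"
     local dir="${3:-$HOME/.cache/claude-plugins/perf-sentinel/baselines}"
     mkdir -p "$dir" || return 1
     # Write to a temp name first so a crash never leaves a truncated baseline
     printf '%s\n' "$json" > "$dir/$tag.json.tmp" || return 1
     mv -f "$dir/$tag.json.tmp" "$dir/$tag.json" || return 1
     # Relative target keeps the link valid if the cache directory is moved
     ln -sfn "$tag.json" "$dir/latest"
 }
 ```

 Using a relative link target means `/perf-compare` can resolve `latest` without knowing the absolute cache path.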
 ### Retention Policy
 - Sessions: 30 days
 - Audits: 90 days
 - Baselines: Indefinite
 - Benchmarks: 90 days
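 Enforcing these windows could be a periodic `find` sweep. A minimal sketch, assuming file modification time is a good enough proxy for age and that a `find` with `-delete`, `-empty`, and `-mindepth` (GNU or BSD) is available; the helper name `prune_metrics` is hypothetical:

 ```bash
 # Hypothetical helper: delete metric files older than the retention windows listed above.
 # Baselines are kept indefinitely, so that directory is never touched.
 prune_metrics() {
     local root="${1:-$HOME/.cache/claude-plugins/perf-sentinel}"
     [ -d "$root/sessions" ]   && find "$root/sessions"   -type f -name '*.json' -mtime +30 -delete
     [ -d "$root/audits" ]     && find "$root/audits"     -type f -name '*.json' -mtime +90 -delete
     [ -d "$root/benchmarks" ] && find "$root/benchmarks" -type f -name '*.json' -mtime +90 -delete
     # Drop per-day session directories that are now empty
     [ -d "$root/sessions" ]   && find "$root/sessions" -mindepth 1 -type d -empty -delete
     return 0
 }
 ```

 Running the sweep from an on-demand command rather than from the SessionEnd hook keeps the hook inside its 10ms budget.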
 ---
 ## External Tool Integration
 ### Detection and Fallback
 ```bash
 # In scripts/lib/timing.sh
 benchmark_command() {
    local cmd="$1"
    local runs="${2:-10}"
    if command -v hyperfine &> /dev/null; then
        # Use hyperfine for statistical benchmarking
        local out
        out=$(mktemp) || return 1   # per-call temp file avoids clashes on /tmp/bench.json
        hyperfine --runs "$runs" --warmup 3 --export-json "$out" "$cmd"
        parse_hyperfine_output "$out"   # helper expected elsewhere in scripts/lib/
        rm -f "$out"
    else
        # Fallback to basic timing: prints the mean in ms, no warmup
        # NOTE: %N (nanoseconds) needs GNU date; BSD/macOS date does not support it
        local total=0 i start end elapsed
        for i in $(seq 1 "$runs"); do
            start=$(date +%s%N)
            eval "$cmd" > /dev/null 2>&1
            end=$(date +%s%N)
            elapsed=$(( (end - start) / 1000000 ))
            total=$((total + elapsed))
        done
        echo $((total / runs))
    fi
 }
 ```
 ### Supported Tools
 | Tool | Purpose | Required | Detection |
 |------|---------|----------|-----------|
 | `time` (bash keyword) | Basic timing | Yes (built-in) | Always available in bash |
 | `hyperfine` | Statistical benchmarks | No | `command -v hyperfine` |
 | `strace` | Syscall analysis | No | `command -v strace` |
 | `ps` | Memory tracking | Yes (standard utility) | Assumed present on POSIX systems |
 | `perf` | CPU profiling | No | `command -v perf` |
 ---
 ## Configuration
 ### Config File: `~/.cache/claude-plugins/perf-sentinel/config.json`
 ```json
 {
  "metrics": {
    "enabled": true,
    "retention_days": 30
  },
  "thresholds": {
    "regression_percent": 10,
    "slow_hook_ms": 100,
    "slow_session_ms": 500
  },
  "external_tools": {
    "prefer_hyperfine": true,
    "use_strace": false
  },
  "notifications": {
    "on_regression": true,
    "on_slow_session": false
  }
 }
 ```
 ### Environment Variables
 | Variable | Purpose | Default |
 |----------|---------|---------|
 | `PERF_SENTINEL_ENABLED` | Enable/disable metrics collection | `true` |
 | `PERF_SENTINEL_VERBOSE` | Verbose output | `false` |
 | `PERF_SENTINEL_THRESHOLD` | Regression threshold % | `10` |
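 This RFC does not state how the environment variables interact with `config.json`. One plausible precedence (environment variable, then config file, then the built-in default) is sketched below for the regression threshold; the helper name `resolve_threshold` is hypothetical, and `jq` is used only if it happens to be installed:

 ```bash
 # Hypothetical helper: resolve the regression threshold (percent).
 # Assumed precedence (not specified by this RFC): env var, then config.json, then 10.
 resolve_threshold() {
     local cfg="${1:-$HOME/.cache/claude-plugins/perf-sentinel/config.json}" value=""
     if [ -n "${PERF_SENTINEL_THRESHOLD:-}" ]; then
         value="$PERF_SENTINEL_THRESHOLD"
     elif [ -f "$cfg" ] && command -v jq > /dev/null 2>&1; then
         value=$(jq -r '.thresholds.regression_percent // empty' "$cfg" 2>/dev/null) || value=""
     fi
     case "$value" in ''|*[!0-9]*) value=10 ;; esac   # fall back on missing or non-numeric input
     echo "$value"
 }
 ```

 A `--threshold N` argument on `/perf-compare` would presumably take priority over all three sources.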
 ---
 ## Implementation Plan
 ### Phase 1: Core Infrastructure (Sprint 1)
 1. Create plugin structure
 2. Implement `/perf-audit` command
 3. Implement lightweight SessionEnd hook
 4. Create scripts/lib/ utilities
 ### Phase 2: Baseline System (Sprint 1)
 5. Implement `/perf-baseline` command
 6. Implement `/perf-compare` command
 7. Set up data storage structure
 ### Phase 3: Advanced Features (Sprint 2)
 8. Implement `/benchmark-hooks` command
 9. Implement `/perf-optimize` command
 10. Create perf-analyzer and benchmark-runner agents
 11. Add external tool integration
 ### Phase 4: Polish (Sprint 2)
 12. Add configuration system
 13. Create README and documentation
 14. Add to marketplace.json
 15. Integration testing
 ---
 ## Success Metrics
 | Metric | Target |
 |--------|--------|
 | Audit execution time | < 5 seconds |
 | Metrics hook overhead | < 10ms |
 | Regression detection accuracy | > 90% |
 | False positive rate | < 5% |
 ---
 ## Risks and Mitigations
 | Risk | Mitigation |
 |------|------------|
 | Metrics collection adds overhead | SessionEnd only, < 10ms budget |
 | Benchmark results vary by system | Statistical analysis, warmup runs |
 | External tools not available | Graceful fallback to built-in |
 | Storage grows unbounded | Retention policy, auto-cleanup |
 ---
 ## Open Questions
 1. Should `/perf-audit` run automatically on version changes?
 2. Should we integrate with CI/CD for automated regression testing?
 3. Should baseline comparisons be part of `/sprint-close`?
 4. Should we track memory usage in addition to time?
 ---
 ## Related RFCs
 - [RFC-Hook-Efficiency-Improvements](./RFC-Hook-Efficiency-Improvements) - The manual analysis that motivated this plugin
 ---
 ## Changelog
 | Date | Change |
 |------|--------|
 | 2026-01-28 | Initial RFC created |