Sprint 4 - Plugin Commands implementation adding 18 new user-facing commands across 8 plugins as part of V5.2.0 Plugin Enhancements. **projman:** - #241: /sprint-diagram - Mermaid visualization of sprint issues **pr-review:** - #242: Confidence threshold config (PR_REVIEW_CONFIDENCE_THRESHOLD) - #243: /pr-diff - Formatted diff with inline review comments **data-platform:** - #244: /data-quality - DataFrame quality checks (nulls, duplicates, outliers) - #245: /lineage-viz - dbt lineage as Mermaid diagrams - #246: /dbt-test - Formatted dbt test runner **viz-platform:** - #247: /chart-export - Export charts to PNG/SVG/PDF via kaleido - #248: /accessibility-check - Color blind validation (WCAG contrast) - #249: /breakpoints - Responsive layout configuration **contract-validator:** - #250: /dependency-graph - Plugin dependency visualization **doc-guardian:** - #251: /changelog-gen - Generate changelog from conventional commits - #252: /doc-coverage - Documentation coverage metrics - #253: /stale-docs - Flag outdated documentation **claude-config-maintainer:** - #254: /config-diff - Track CLAUDE.md changes over time - #255: /config-lint - 31 lint rules for CLAUDE.md best practices **cmdb-assistant:** - #256: /cmdb-topology - Infrastructure topology diagrams - #257: /change-audit - NetBox audit trail queries - #258: /ip-conflicts - Detect IP conflicts and overlaps Closes #241, #242, #243, #244, #245, #246, #247, #248, #249, #250, #251, #252, #253, #254, #255, #256, #257, #258 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
3.5 KiB
3.5 KiB
Confidence Scoring for PR Review
Purpose
Confidence scoring ensures that review findings are calibrated and actionable. By filtering out low-confidence findings, we reduce noise and focus reviewer attention on real issues.
Score Ranges
| Range | Label | Meaning | Action |
|---|---|---|---|
| 0.9 - 1.0 | HIGH | Definite issue | Must address |
| 0.7 - 0.89 | MEDIUM | Likely issue | Should address |
| 0.5 - 0.69 | LOW | Possible concern | Consider addressing |
| < 0.5 | SUPPRESSED | Uncertain | Don't report |
Scoring Factors
Positive Factors (Increase Confidence)
| Factor | Impact |
|---|---|
| Clear data flow from source to sink | +0.3 |
| Pattern matches known vulnerability | +0.2 |
| No intervening validation visible | +0.2 |
| Matches OWASP Top 10 | +0.15 |
| Found in security-sensitive context | +0.1 |
Negative Factors (Decrease Confidence)
| Factor | Impact |
|---|---|
| Validation might exist elsewhere | -0.2 |
| Depends on runtime configuration | -0.15 |
| Pattern is common but often safe | -0.15 |
| Requires multiple conditions to exploit | -0.1 |
| Theoretical impact only | -0.1 |
Calibration Guidelines
Security Issues
Base confidence by pattern:
- SQL string concatenation with user input: 0.95
- Hardcoded credentials: 0.9
- Missing auth check: 0.8
- Generic error exposure: 0.6
- Missing rate limiting: 0.5
Performance Issues
Base confidence by pattern:
- Clear N+1 in loop: 0.9
- SELECT * on large table: 0.7
- Missing index on filtered column: 0.6
- Suboptimal algorithm: 0.5
Maintainability Issues
Base confidence by pattern:
- Function >100 lines: 0.8
- Deep nesting >4 levels: 0.75
- Duplicate code blocks: 0.7
- Unclear naming: 0.6
- Minor style issues: 0.3 (suppress)
Test Coverage
Base confidence by pattern:
- No test file for new module: 0.9
- Security function untested: 0.85
- Edge case not covered: 0.6
- Simple getter untested: 0.3 (suppress)
Threshold Configuration
The default threshold is 0.7 (showing MEDIUM and HIGH confidence findings). This can be adjusted:
PR_REVIEW_CONFIDENCE_THRESHOLD=0.9 # Only definite issues (HIGH)
PR_REVIEW_CONFIDENCE_THRESHOLD=0.7 # Likely issues and above (MEDIUM+HIGH) - default
PR_REVIEW_CONFIDENCE_THRESHOLD=0.5 # Include possible concerns (LOW+)
PR_REVIEW_CONFIDENCE_THRESHOLD=0.3 # Include more speculative
Example Scoring
High Confidence (0.95)
// Clear SQL injection
const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
- User input (req.params.id): +0.3
- Direct to SQL query: +0.3
- No visible validation: +0.2
- Matches OWASP Top 10: +0.15
- Total: 0.95
Medium Confidence (0.72)
// Possible performance issue
users.forEach(async (user) => {
const orders = await db.orders.find({ userId: user.id });
});
- Loop with query: +0.3
- Pattern matches N+1: +0.2
- But might be small dataset: -0.15
- Could have caching: -0.1
- Total: 0.72
Low Confidence (0.55)
// Maybe too complex?
function processOrder(order, user, items, discounts, shipping) {
// 60 lines of logic
}
- Function is long: +0.2
- Many parameters: +0.15
- But might be intentional: -0.1
- Could be refactored later: -0.1
- Total: 0.55
Suppressed (0.35)
// Minor style preference
const x = foo ? bar : baz;
- Ternary could be if/else: +0.1
- Very common pattern: -0.2
- No real impact: -0.1
- Style preference: -0.1
- Total: 0.35 (suppressed)