leo-claude-mktplace/plugins/pr-review/skills/review-patterns/confidence-scoring.md
lmiranda e5ca804692 feat: v3.0.0 architecture overhaul
- Rename marketplace to lm-claude-plugins
- Move MCP servers to root with symlinks
- Add 6 PR tools to Gitea MCP (list_pull_requests, get_pull_request,
  get_pr_diff, get_pr_comments, create_pr_review, add_pr_comment)
- Add clarity-assist plugin (prompt optimization with ND accommodations)
- Add git-flow plugin (workflow automation)
- Add pr-review plugin (multi-agent review with confidence scoring)
- Centralize configuration docs
- Update all documentation for v3.0.0

BREAKING CHANGE: MCP server paths changed, marketplace renamed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 16:56:53 -05:00


Confidence Scoring for PR Review

Purpose

Confidence scoring ensures that review findings are calibrated and actionable. By filtering out low-confidence findings, we reduce noise and focus reviewer attention on real issues.

Score Ranges

| Range | Label | Meaning | Action |
|---|---|---|---|
| 0.90 to 1.00 | HIGH | Definite issue | Must address |
| 0.70 to below 0.90 | MEDIUM | Likely issue | Should address |
| 0.50 to below 0.70 | LOW | Possible concern | Consider addressing |
| Below 0.50 | SUPPRESSED | Uncertain | Don't report |
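As a minimal sketch, the bands above can be expressed as a lookup. Each range is treated as half-open: a score belongs to the first band whose lower bound it meets, so a value such as 0.895 is not left between bands. The `BANDS` array and the `classifyConfidence` function are illustrative names, not part of the plugin.

```javascript
// Bands from the Score Ranges table, ordered from highest lower bound to lowest.
// Names are hypothetical; only the thresholds, labels, and actions come from the table.
const BANDS = [
  { min: 0.9, label: "HIGH", action: "Must address" },
  { min: 0.7, label: "MEDIUM", action: "Should address" },
  { min: 0.5, label: "LOW", action: "Consider addressing" },
];

function classifyConfidence(score) {
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`confidence must be in [0, 1], got ${score}`);
  }
  // The first band whose lower bound the score meets wins; anything below 0.5 is suppressed.
  const band = BANDS.find((b) => score >= b.min);
  return band ?? { min: 0, label: "SUPPRESSED", action: "Don't report" };
}

console.log(classifyConfidence(0.72).label); // MEDIUM
console.log(classifyConfidence(0.35).label); // SUPPRESSED
```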

Scoring Factors

Positive Factors (Increase Confidence)

| Factor | Impact |
|---|---|
| Clear data flow from source to sink | +0.3 |
| Pattern matches known vulnerability | +0.2 |
| No intervening validation visible | +0.2 |
| Matches OWASP Top 10 | +0.15 |
| Found in security-sensitive context | +0.1 |

Negative Factors (Decrease Confidence)

| Factor | Impact |
|---|---|
| Validation might exist elsewhere | -0.2 |
| Depends on runtime configuration | -0.15 |
| Pattern is common but often safe | -0.15 |
| Requires multiple conditions to exploit | -0.1 |
| Theoretical impact only | -0.1 |
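This document does not state exactly how impacts combine. The sketch below assumes that impacts are added to a starting score and that the result is clamped to the 0 to 1 scale. `scoreFinding` is a hypothetical name. Rounding to two decimals only hides floating-point noise. The usage line reuses the four factors from the SQL injection example below, which starts from 0.

```javascript
// Hypothetical helper: sum factor impacts onto a starting score.
// Assumptions: additive impacts, clamping to [0, 1], rounding to 2 decimals.
function scoreFinding(start, impacts) {
  const raw = impacts.reduce((sum, impact) => sum + impact, start);
  const clamped = Math.min(1, Math.max(0, raw));
  // 0.3 + 0.3 + 0.2 + 0.15 is 0.9500000000000001 in floating point; round it back.
  return Math.round(clamped * 100) / 100;
}

// Factors from the SQL injection example, starting from 0.
console.log(scoreFinding(0, [0.3, 0.3, 0.2, 0.15])); // 0.95
```

Clamping keeps a finding with many positive factors from exceeding 1.0, and keeps one with many negative factors from going below 0.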

Calibration Guidelines

Security Issues

Base confidence by pattern:

  • SQL string concatenation with user input: 0.95
  • Hardcoded credentials: 0.9
  • Missing auth check: 0.8
  • Generic error exposure: 0.6
  • Missing rate limiting: 0.5

Performance Issues

Base confidence by pattern:

  • Clear N+1 in loop: 0.9
  • SELECT * on large table: 0.7
  • Missing index on filtered column: 0.6
  • Suboptimal algorithm: 0.5

Maintainability Issues

Base confidence by pattern:

  • Function >100 lines: 0.8
  • Deep nesting >4 levels: 0.75
  • Duplicate code blocks: 0.7
  • Unclear naming: 0.6
  • Minor style issues: 0.3 (suppress)

Test Coverage

Base confidence by pattern:

  • No test file for new module: 0.9
  • Security function untested: 0.85
  • Edge case not covered: 0.6
  • Simple getter untested: 0.3 (suppress)
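One way to keep these calibration values in code is a lookup table keyed by pattern. The pattern identifiers below are hypothetical; only the numbers come from the lists above. Filtering against the default 0.5 threshold recovers the two patterns marked "(suppress)". Patterns at exactly 0.5 stay reportable because the LOW band starts at 0.5.

```javascript
// Base confidences from the Calibration Guidelines (identifiers are hypothetical).
const BASE_CONFIDENCE = new Map([
  // Security
  ["security/sql-concat-user-input", 0.95],
  ["security/hardcoded-credentials", 0.9],
  ["security/missing-auth-check", 0.8],
  ["security/generic-error-exposure", 0.6],
  ["security/missing-rate-limiting", 0.5],
  // Performance
  ["performance/n-plus-one-in-loop", 0.9],
  ["performance/select-star-large-table", 0.7],
  ["performance/missing-index-filtered-column", 0.6],
  ["performance/suboptimal-algorithm", 0.5],
  // Maintainability
  ["maintainability/function-over-100-lines", 0.8],
  ["maintainability/nesting-over-4-levels", 0.75],
  ["maintainability/duplicate-code-blocks", 0.7],
  ["maintainability/unclear-naming", 0.6],
  ["maintainability/minor-style", 0.3],
  // Test coverage
  ["tests/no-test-file-new-module", 0.9],
  ["tests/security-function-untested", 0.85],
  ["tests/edge-case-not-covered", 0.6],
  ["tests/simple-getter-untested", 0.3],
]);

// Pattern ids whose base confidence is below the threshold,
// i.e. those suppressed before any factor adjustments are applied.
function patternsBelow(threshold) {
  return [...BASE_CONFIDENCE]
    .filter(([, base]) => base < threshold)
    .map(([id]) => id);
}

console.log(patternsBelow(0.5)); // the two 0.3 patterns: minor style, simple getter
```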

Threshold Configuration

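The default threshold is 0.5: findings that score below it are not reported. To change it, set PR_REVIEW_CONFIDENCE_THRESHOLD, for example:

```bash
PR_REVIEW_CONFIDENCE_THRESHOLD=0.7  # Report only MEDIUM and HIGH findings
PR_REVIEW_CONFIDENCE_THRESHOLD=0.3  # Also report more speculative findings (0.3 and above)
```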
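A minimal sketch of how a reviewer process might honor this setting follows. It assumes the value is read from `process.env` in Node.js; this document does not say how the plugin actually loads it. The finding shape (`{ title, confidence }`) and the function names are hypothetical. The sample scores are the ones used in the examples below.

```javascript
// Hypothetical: resolve the threshold, falling back to the documented default of 0.5.
const DEFAULT_THRESHOLD = 0.5;

function resolveThreshold(env = process.env) {
  const raw = env.PR_REVIEW_CONFIDENCE_THRESHOLD;
  if (raw === undefined || raw.trim() === "") return DEFAULT_THRESHOLD;
  const value = Number(raw);
  // Reject values that are not numbers or fall outside the 0 to 1 confidence scale.
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new RangeError(`PR_REVIEW_CONFIDENCE_THRESHOLD must be in [0, 1], got "${raw}"`);
  }
  return value;
}

// Keep findings at or above the threshold; everything else is suppressed.
function filterFindings(findings, threshold) {
  return findings.filter((f) => f.confidence >= threshold);
}

const findings = [
  { title: "SQL injection", confidence: 0.95 },
  { title: "N+1 query", confidence: 0.72 },
  { title: "Long function", confidence: 0.55 },
  { title: "Ternary style", confidence: 0.35 },
];

const threshold = resolveThreshold({ PR_REVIEW_CONFIDENCE_THRESHOLD: "0.7" });
console.log(filterFindings(findings, threshold).map((f) => f.title)); // SQL injection, N+1 query
```

Failing loudly on a malformed value, rather than silently falling back to 0.5, makes a typo in the setting visible instead of quietly changing how much gets reported.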

Example Scoring

High Confidence (0.95)

```javascript
// Clear SQL injection
const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
```
  • User input (req.params.id): +0.3
  • Direct to SQL query: +0.3
  • No visible validation: +0.2
  • Matches OWASP Top 10: +0.15
  • Total: 0.95

Medium Confidence (0.72)

```javascript
// Possible performance issue
users.forEach(async (user) => {
  const orders = await db.orders.find({ userId: user.id });
});
```
  • Loop with a query per iteration: raises confidence
  • Pattern matches N+1: raises confidence
  • But the dataset might be small: lowers confidence
  • Results could be cached: lowers confidence
  • Score: 0.72

Low Confidence (0.55)

```javascript
// Maybe too complex?
function processOrder(order, user, items, discounts, shipping) {
  // 60 lines of logic
}
```
  • Function is long: raises confidence
  • Many parameters: raises confidence
  • But the length might be intentional: lowers confidence
  • Could be refactored later: lowers confidence
  • Score: 0.55

Suppressed (0.35)

```javascript
// Minor style preference
const x = foo ? bar : baz;
```
  • Ternary could be if/else: raises confidence
  • Very common pattern: lowers confidence
  • No real impact: lowers confidence
  • Style preference: lowers confidence
  • Score: 0.35 (suppressed)