# Confidence Scoring for PR Review

## Purpose

Confidence scoring ensures that review findings are calibrated and actionable. By filtering out low-confidence findings, we reduce noise and focus reviewer attention on real issues.

## Score Ranges

| Range | Label | Meaning | Action |
|-------|-------|---------|--------|
| 0.9 - 1.0 | HIGH | Definite issue | Must address |
| 0.7 - 0.89 | MEDIUM | Likely issue | Should address |
| 0.5 - 0.69 | LOW | Possible concern | Consider addressing |
| < 0.5 | SUPPRESSED | Uncertain | Don't report |

## Scoring Factors

### Positive Factors (Increase Confidence)

| Factor | Impact |
|--------|--------|
| Clear data flow from source to sink | +0.3 |
| Pattern matches known vulnerability | +0.2 |
| No intervening validation visible | +0.2 |
| Matches OWASP Top 10 | +0.15 |
| Found in security-sensitive context | +0.1 |

### Negative Factors (Decrease Confidence)

| Factor | Impact |
|--------|--------|
| Validation might exist elsewhere | -0.2 |
| Depends on runtime configuration | -0.15 |
| Pattern is common but often safe | -0.15 |
| Requires multiple conditions to exploit | -0.1 |
| Theoretical impact only | -0.1 |

## Calibration Guidelines

### Security Issues

Base confidence by pattern:
- SQL string concatenation with user input: 0.95
- Hardcoded credentials: 0.9
- Missing auth check: 0.8
- Generic error exposure: 0.6
- Missing rate limiting: 0.5

### Performance Issues

Base confidence by pattern:
- Clear N+1 in loop: 0.9
- SELECT * on large table: 0.7
- Missing index on filtered column: 0.6
- Suboptimal algorithm: 0.5

### Maintainability Issues

Base confidence by pattern:
- Function >100 lines: 0.8
- Deep nesting >4 levels: 0.75
- Duplicate code blocks: 0.7
- Unclear naming: 0.6
- Minor style issues: 0.3 (suppress)

### Test Coverage

Base confidence by pattern:
- No test file for new module: 0.9
- Security function untested: 0.85
- Edge case not covered: 0.6
- Simple getter untested: 0.3 (suppress)

## Threshold Configuration

The default threshold is 0.5. This can be adjusted:

```bash
PR_REVIEW_CONFIDENCE_THRESHOLD=0.7  # Only high-confidence
PR_REVIEW_CONFIDENCE_THRESHOLD=0.3  # Include more speculative
```

## Example Scoring

### High Confidence (0.95)

```javascript
// Clear SQL injection
const query = `SELECT * FROM users WHERE id = ${req.params.id}`;
```

- User input (req.params.id): +0.3
- Direct to SQL query: +0.3
- No visible validation: +0.2
- Matches OWASP Top 10: +0.15
- **Total: 0.95**

### Medium Confidence (0.72)

```javascript
// Possible performance issue
users.forEach(async (user) => {
  const orders = await db.orders.find({ userId: user.id });
});
```

- Loop with query: +0.3
- Pattern matches N+1: +0.2
- But might be small dataset: -0.15
- Could have caching: -0.1
- **Total: 0.72**

### Low Confidence (0.55)

```javascript
// Maybe too complex?
function processOrder(order, user, items, discounts, shipping) {
  // 60 lines of logic
}
```

- Function is long: +0.2
- Many parameters: +0.15
- But might be intentional: -0.1
- Could be refactored later: -0.1
- **Total: 0.55**

### Suppressed (0.35)

```javascript
// Minor style preference
const x = foo ? bar : baz;
```

- Ternary could be if/else: +0.1
- Very common pattern: -0.2
- No real impact: -0.1
- Style preference: -0.1
- **Total: 0.35** (suppressed)