Refactored commands to extract reusable skills following the Commands → Skills separation pattern. Each command is now <50 lines and references skill files for detailed knowledge. Plugins refactored: - claude-config-maintainer: 5 commands → 7 skills - code-sentinel: 3 commands → 2 skills - contract-validator: 5 commands → 6 skills - data-platform: 10 commands → 6 skills - doc-guardian: 5 commands → 6 skills (replaced nested dir) - git-flow: 8 commands → 7 skills Skills contain: workflows, validation rules, conventions, reference data, tool documentation Commands now contain: YAML frontmatter, agent assignment, skills list, brief workflow steps, parameters Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
73 lines
1.7 KiB
Markdown
73 lines
1.7 KiB
Markdown
# Data Profiling
|
|
|
|
## Profiling Workflow
|
|
|
|
1. **Get data reference** via `list_data`
|
|
2. **Generate statistics** via `describe`
|
|
3. **Analyze quality** (nulls, duplicates, types, outliers)
|
|
4. **Calculate score** and generate report
|
|
|
|
## Quality Checks
|
|
|
|
### Null Analysis
|
|
- Calculate null percentage per column
|
|
- **PASS**: < 5% nulls
|
|
- **WARN**: 5-20% nulls
|
|
- **FAIL**: > 20% nulls
|
|
|
|
### Duplicate Detection
|
|
- Check for fully duplicated rows
|
|
- **PASS**: 0% duplicates
|
|
- **WARN**: < 1% duplicates
|
|
- **FAIL**: >= 1% duplicates
|
|
|
|
### Type Consistency
|
|
- Identify mixed-type columns
|
|
- Flag numeric columns with string values
|
|
- **PASS**: Consistent types
|
|
- **FAIL**: Mixed types detected
|
|
|
|
### Outlier Detection (IQR Method)
|
|
- Calculate Q1, Q3, IQR = Q3 - Q1
|
|
- Outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
|
|
- **PASS**: < 1% outliers
|
|
- **WARN**: 1-5% outliers
|
|
- **FAIL**: > 5% outliers
|
|
|
|
## Quality Scoring
|
|
|
|
| Component | Weight | Formula |
|
|
|-----------|--------|---------|
|
|
| Nulls | 30% | 100 - (avg_null_pct * 2) |
|
|
| Duplicates | 20% | 100 - (dup_pct * 50) |
|
|
| Type consistency | 25% | 100 if clean, 0 if mixed |
|
|
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |
|
|
|
|
Final score: Weighted average, capped at 0-100
|
|
|
|
## Report Format
|
|
|
|
```
|
|
=== Data Quality Report ===
|
|
Dataset: [data_ref]
|
|
Rows: X | Columns: Y
|
|
Overall Score: XX/100 [PASS/WARN/FAIL]
|
|
|
|
--- Column Analysis ---
|
|
| Column | Nulls | Dups | Type | Outliers | Status |
|
|
|--------|-------|------|------|----------|--------|
|
|
| col1 | X.X% | - | type | X.X% | PASS |
|
|
|
|
--- Issues Found ---
|
|
[WARN/FAIL] Column 'X': Issue description
|
|
|
|
--- Recommendations ---
|
|
1. Suggested remediation steps
|
|
```
|
|
|
|
## Strict Mode
|
|
|
|
With `--strict` flag:
|
|
- **WARN** at 1% nulls (vs 5%)
|
|
- **FAIL** at 5% nulls (vs 20%)
|