refactor: extract skills from commands across 8 plugins
Refactored commands to extract reusable skills following the Commands → Skills separation pattern. Each command is now <50 lines and references skill files for detailed knowledge. Plugins refactored: - claude-config-maintainer: 5 commands → 7 skills - code-sentinel: 3 commands → 2 skills - contract-validator: 5 commands → 6 skills - data-platform: 10 commands → 6 skills - doc-guardian: 5 commands → 6 skills (replaced nested dir) - git-flow: 8 commands → 7 skills Skills contain: workflows, validation rules, conventions, reference data, tool documentation Commands now contain: YAML frontmatter, agent assignment, skills list, brief workflow steps, parameters Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
72
plugins/data-platform/skills/data-profiling.md
Normal file
72
plugins/data-platform/skills/data-profiling.md
Normal file
@@ -0,0 +1,72 @@
|
||||
# Data Profiling
|
||||
|
||||
## Profiling Workflow
|
||||
|
||||
1. **Get data reference** via `list_data`
|
||||
2. **Generate statistics** via `describe`
|
||||
3. **Analyze quality** (nulls, duplicates, types, outliers)
|
||||
4. **Calculate score** and generate report
|
||||
|
||||
## Quality Checks
|
||||
|
||||
### Null Analysis
|
||||
- Calculate null percentage per column
|
||||
- **PASS**: < 5% nulls
|
||||
- **WARN**: 5-20% nulls
|
||||
- **FAIL**: > 20% nulls
|
||||
|
||||
### Duplicate Detection
|
||||
- Check for fully duplicated rows
|
||||
- **PASS**: 0% duplicates
|
||||
- **WARN**: < 1% duplicates
|
||||
- **FAIL**: >= 1% duplicates
|
||||
|
||||
### Type Consistency
|
||||
- Identify mixed-type columns
|
||||
- Flag numeric columns with string values
|
||||
- **PASS**: Consistent types
|
||||
- **FAIL**: Mixed types detected
|
||||
|
||||
### Outlier Detection (IQR Method)
|
||||
- Calculate Q1, Q3, IQR = Q3 - Q1
|
||||
- Outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
|
||||
- **PASS**: < 1% outliers
|
||||
- **WARN**: 1-5% outliers
|
||||
- **FAIL**: > 5% outliers
|
||||
|
||||
## Quality Scoring
|
||||
|
||||
| Component | Weight | Formula |
|
||||
|-----------|--------|---------|
|
||||
| Nulls | 30% | 100 - (avg_null_pct * 2) |
|
||||
| Duplicates | 20% | 100 - (dup_pct * 50) |
|
||||
| Type consistency | 25% | 100 if clean, 0 if mixed |
|
||||
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |
|
||||
|
||||
Final score: Weighted average, capped at 0-100
|
||||
|
||||
## Report Format
|
||||
|
||||
```
|
||||
=== Data Quality Report ===
|
||||
Dataset: [data_ref]
|
||||
Rows: X | Columns: Y
|
||||
Overall Score: XX/100 [PASS/WARN/FAIL]
|
||||
|
||||
--- Column Analysis ---
|
||||
| Column | Nulls | Dups | Type | Outliers | Status |
|
||||
|--------|-------|------|------|----------|--------|
|
||||
| col1 | X.X% | - | type | X.X% | PASS |
|
||||
|
||||
--- Issues Found ---
|
||||
[WARN/FAIL] Column 'X': Issue description
|
||||
|
||||
--- Recommendations ---
|
||||
1. Suggested remediation steps
|
||||
```
|
||||
|
||||
## Strict Mode
|
||||
|
||||
With `--strict` flag:
|
||||
- **WARN** at 1% nulls (vs 5%)
|
||||
- **FAIL** at 5% nulls (vs 20%)
|
||||
Reference in New Issue
Block a user