Files
leo-claude-mktplace/plugins/data-platform/skills/data-profiling.md
lmiranda 7c8a20c804 refactor: extract skills from commands across 8 plugins
Refactored commands to extract reusable skills following the
Commands → Skills separation pattern. Each command is now <50 lines
and references skill files for detailed knowledge.

Plugins refactored:
- claude-config-maintainer: 5 commands → 7 skills
- code-sentinel: 3 commands → 2 skills
- contract-validator: 5 commands → 6 skills
- data-platform: 10 commands → 6 skills
- doc-guardian: 5 commands → 6 skills (replaced nested dir)
- git-flow: 8 commands → 7 skills

Skills contain: workflows, validation rules, conventions,
reference data, tool documentation

Commands now contain: YAML frontmatter, agent assignment,
skills list, brief workflow steps, parameters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 17:32:24 -05:00

73 lines
1.7 KiB
Markdown

# Data Profiling
## Profiling Workflow
1. **Get data reference** via `list_data`
2. **Generate statistics** via `describe`
3. **Analyze quality** (nulls, duplicates, types, outliers)
4. **Calculate score** and generate report
## Quality Checks
### Null Analysis
- Calculate null percentage per column
- **PASS**: < 5% nulls
- **WARN**: 5-20% nulls
- **FAIL**: > 20% nulls
### Duplicate Detection
- Check for fully duplicated rows
- **PASS**: 0% duplicates
- **WARN**: < 1% duplicates
- **FAIL**: >= 1% duplicates
### Type Consistency
- Identify mixed-type columns
- Flag numeric columns with string values
- **PASS**: Consistent types
- **FAIL**: Mixed types detected
### Outlier Detection (IQR Method)
- Calculate Q1, Q3, IQR = Q3 - Q1
- Outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
- **PASS**: < 1% outliers
- **WARN**: 1-5% outliers
- **FAIL**: > 5% outliers
## Quality Scoring
| Component | Weight | Formula |
|-----------|--------|---------|
| Nulls | 30% | 100 - (avg_null_pct * 2) |
| Duplicates | 20% | 100 - (dup_pct * 50) |
| Type consistency | 25% | 100 if clean, 0 if mixed |
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |
Final score: Weighted average, capped at 0-100
## Report Format
```
=== Data Quality Report ===
Dataset: [data_ref]
Rows: X | Columns: Y
Overall Score: XX/100 [PASS/WARN/FAIL]
--- Column Analysis ---
| Column | Nulls | Dups | Type | Outliers | Status |
|--------|-------|------|------|----------|--------|
| col1 | X.X% | - | type | X.X% | PASS |
--- Issues Found ---
[WARN/FAIL] Column 'X': Issue description
--- Recommendations ---
1. Suggested remediation steps
```
## Strict Mode
With `--strict` flag:
- **WARN** at 1% nulls (vs 5%)
- **FAIL** at 5% nulls (vs 20%)