refactor: extract skills from commands across 8 plugins

Refactored commands to extract reusable skills following the
Commands → Skills separation pattern. Each command is now <50 lines
and references skill files for detailed knowledge.

Plugins refactored:
- claude-config-maintainer: 5 commands → 7 skills
- code-sentinel: 3 commands → 2 skills
- contract-validator: 5 commands → 6 skills
- data-platform: 10 commands → 6 skills
- doc-guardian: 5 commands → 6 skills (replaced nested dir)
- git-flow: 8 commands → 7 skills

Skills contain: workflows, validation rules, conventions,
reference data, tool documentation

Commands now contain: YAML frontmatter, agent assignment,
skills list, brief workflow steps, parameters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,18 +1,13 @@
 # /data-quality - Data Quality Assessment
 
+## Skills to Load
+- skills/data-profiling.md
+- skills/mcp-tools-reference.md
+- skills/visual-header.md
+
 ## Visual Output
 
-When executing this command, display the plugin header:
-
-```
-┌──────────────────────────────────────────────────────────────────┐
-│ 📊 DATA-PLATFORM · Data Quality │
-└──────────────────────────────────────────────────────────────────┘
-```
-
-Then proceed with the assessment.
-
-Comprehensive data quality check for DataFrames with pass/warn/fail scoring.
+Display header: `DATA-PLATFORM - Data Quality`
 
 ## Usage
 
@@ -22,72 +17,18 @@ Comprehensive data quality check for DataFrames with pass/warn/fail scoring.
 
 ## Workflow
 
-1. **Get data reference**:
-- If no data_ref provided, use `list_data` to show available options
-- Validate the data_ref exists
+Execute `skills/data-profiling.md` quality assessment:
 
-2. **Null analysis**:
-- Calculate null percentage per column
-- **PASS**: < 5% nulls
-- **WARN**: 5-20% nulls
-- **FAIL**: > 20% nulls
-
-3. **Duplicate detection**:
-- Check for fully duplicated rows
-- **PASS**: 0% duplicates
-- **WARN**: < 1% duplicates
-- **FAIL**: >= 1% duplicates
-
-4. **Type consistency**:
-- Identify mixed-type columns (object columns with mixed content)
-- Flag columns that could be numeric but contain strings
-- **PASS**: All columns have consistent types
-- **FAIL**: Mixed types detected
-
-5. **Outlier detection** (numeric columns):
-- Use IQR method (values beyond 1.5 * IQR)
-- Report percentage of outliers per column
-- **PASS**: < 1% outliers
-- **WARN**: 1-5% outliers
-- **FAIL**: > 5% outliers
-
-6. **Generate quality report**:
-- Overall quality score (0-100)
-- Per-column breakdown
-- Recommendations for remediation
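The null, duplicate, and outlier thresholds listed in the workflow steps above can be sketched as small pure functions. This is only an illustration under stated assumptions: the function names, the plain-Python inputs (percentages as floats, a numeric column as a list), and the choice of linearly interpolated quartiles are all hypothetical and are not taken from the plugin or from `skills/data-profiling.md`.

```python
from statistics import quantiles


def null_status(null_pct: float) -> str:
    """PASS below 5% nulls, WARN from 5% to 20%, FAIL above 20%."""
    if null_pct < 5:
        return "PASS"
    return "WARN" if null_pct <= 20 else "FAIL"


def duplicate_status(dup_pct: float) -> str:
    """PASS at 0% duplicate rows, WARN below 1%, FAIL at 1% or more."""
    if dup_pct == 0:
        return "PASS"
    return "WARN" if dup_pct < 1 else "FAIL"


def outlier_status(outlier_pct: float) -> str:
    """PASS below 1% outliers, WARN from 1% to 5%, FAIL above 5%."""
    if outlier_pct < 1:
        return "PASS"
    return "WARN" if outlier_pct <= 5 else "FAIL"


def iqr_outlier_pct(values: list[float]) -> float:
    """Percentage of values more than 1.5 * IQR below Q1 or above Q3."""
    if len(values) < 2:
        return 0.0  # quartiles are not meaningful for fewer than two points
    # Assumption: "inclusive" (linearly interpolated) quartiles; the source
    # only says "IQR method" and does not name a quartile convention.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    outliers = sum(1 for v in values if v < low or v > high)
    return 100.0 * outliers / len(values)
```

For the column `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, the quartiles are 3.25 and 7.75, so the upper fence is 14.5 and only `100` falls outside it. `iqr_outlier_pct` returns `10.0`, which `outlier_status` maps to `FAIL`.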
-
-## Report Format
-
-```
-=== Data Quality Report ===
-Dataset: sales_data
-Rows: 10,000 | Columns: 15
-Overall Score: 82/100 [PASS]
-
---- Column Analysis ---
-| Column       | Nulls | Dups | Type     | Outliers | Status |
-|--------------|-------|------|----------|----------|--------|
-| customer_id  | 0.0%  | -    | int64    | 0.2%     | PASS   |
-| email        | 2.3%  | -    | object   | -        | PASS   |
-| amount       | 15.2% | -    | float64  | 3.1%     | WARN   |
-| created_at   | 0.0%  | -    | datetime | -        | PASS   |
-
---- Issues Found ---
-[WARN] Column 'amount': 15.2% null values (threshold: 5%)
-[WARN] Column 'amount': 3.1% outliers detected
-[FAIL] 1.2% duplicate rows detected (120 rows)
-
---- Recommendations ---
-1. Investigate null values in 'amount' column
-2. Review outliers in 'amount' - may be data entry errors
-3. Remove or deduplicate 120 duplicate rows
-```
+1. **Get data reference**: Use `list_data` if none provided
+2. **Run quality checks**: Nulls, duplicates, types, outliers
+3. **Calculate score**: Apply weighted scoring formula
+4. **Generate report**: Issues and recommendations
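Step 3 mentions a weighted scoring formula. The weights and component formulas appear in the `## Scoring` table further down in this diff, and they could be combined roughly as follows. This is a sketch, not the plugin's code: the function name and parameters are hypothetical, and clamping only the final result (rather than each component) is one reading of "capped at 0-100".

```python
def quality_score(
    avg_null_pct: float,
    dup_pct: float,
    types_clean: bool,
    avg_outlier_pct: float,
) -> float:
    """Weighted average of the four component scores, clamped to 0-100."""
    nulls = 100 - avg_null_pct * 2          # weight 30%
    duplicates = 100 - dup_pct * 50         # weight 20%
    types = 100 if types_clean else 0       # weight 25%
    outliers = 100 - avg_outlier_pct * 10   # weight 25%

    # Integer weights divided once at the end, to avoid accumulating
    # floating-point error from fractional weights.
    weighted = (30 * nulls + 20 * duplicates + 25 * types + 25 * outliers) / 100

    # Assumption: only the final score is capped; components may go negative.
    return max(0.0, min(100.0, weighted))
```

With 5% average nulls, 0.5% duplicate rows, clean types, and 1% average outliers, the components are 90, 75, 100, and 90, giving a score of 89.5. Because components are not clamped individually in this sketch, a large duplicate percentage can pull the total to the floor of 0.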
 
 ## Options
 
 | Flag | Description |
 |------|-------------|
-| `--strict` | Use stricter thresholds (WARN at 1% nulls, FAIL at 5%) |
+| `--strict` | Stricter thresholds (WARN at 1%, FAIL at 5% nulls) |
 
 ## Examples
 
@@ -96,20 +37,12 @@ Overall Score: 82/100 [PASS]
 /data-quality df_customers --strict
 ```
 
-## Scoring
+## Quality Thresholds
 
-| Component | Weight | Scoring |
-|-----------|--------|---------|
-| Nulls | 30% | 100 - (avg_null_pct * 2) |
-| Duplicates | 20% | 100 - (dup_pct * 50) |
-| Type consistency | 25% | 100 if clean, 0 if mixed |
-| Outliers | 25% | 100 - (avg_outlier_pct * 10) |
+See `skills/data-profiling.md` for detailed thresholds and scoring.
 
-Final score: Weighted average, capped at 0-100
+## Required MCP Tools
 
-## Available Tools
-
-Use these MCP tools:
-- `describe` - Get statistical summary (for outlier detection)
+- `describe` - Get statistical summary
 - `head` - Preview data
 - `list_data` - List available DataFrames