refactor: extract skills from commands across 8 plugins

Refactored commands to extract reusable skills, following the
Commands → Skills separation pattern. Each command is now under
50 lines and references skill files for detailed knowledge.

Plugins refactored:
- claude-config-maintainer: 5 commands → 7 skills
- code-sentinel: 3 commands → 2 skills
- contract-validator: 5 commands → 6 skills
- data-platform: 10 commands → 6 skills
- doc-guardian: 5 commands → 6 skills (replaced nested dir)
- git-flow: 8 commands → 7 skills

Skills contain: workflows, validation rules, conventions,
reference data, tool documentation

Commands now contain: YAML frontmatter, agent assignment,
skills list, brief workflow steps, parameters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 17:32:24 -05:00
parent aad02ef2d9
commit 7c8a20c804
71 changed files with 3896 additions and 3690 deletions


@@ -0,0 +1,72 @@
# Data Profiling
## Profiling Workflow
1. **Get data reference** via `list_data`
2. **Generate statistics** via `describe`
3. **Analyze quality** (nulls, duplicates, types, outliers)
4. **Calculate score** and generate report
## Quality Checks
### Null Analysis
- Calculate null percentage per column
- **PASS**: < 5% nulls
- **WARN**: 5-20% nulls
- **FAIL**: > 20% nulls
### Duplicate Detection
- Check for fully duplicated rows
- **PASS**: 0% duplicates
- **WARN**: < 1% duplicates
- **FAIL**: >= 1% duplicates
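A minimal pandas sketch of this check (illustrative only, not the plugin's actual implementation; the sample frame is made up):
```python
import pandas as pd

# Toy frame where the second and third rows are fully identical
df = pd.DataFrame({"id": [1, 2, 2, 3], "value": [10, 20, 20, 30]})

dup_pct = 100 * df.duplicated().mean()   # share of fully duplicated rows
print(f"{dup_pct:.1f}% duplicates")      # -> 25.0% duplicates (FAIL per thresholds above)
```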
### Type Consistency
- Identify mixed-type columns
- Flag numeric columns with string values
- **PASS**: Consistent types
- **FAIL**: Mixed types detected
### Outlier Detection (IQR Method)
- Calculate Q1, Q3, IQR = Q3 - Q1
- Outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
- **PASS**: < 1% outliers
- **WARN**: 1-5% outliers
- **FAIL**: > 5% outliers
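A small pandas sketch of the IQR check (illustrative only; the sample series is made up, and the status bands simply mirror the thresholds above):
```python
import pandas as pd

def outlier_pct(series: pd.Series) -> float:
    """Percent of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return 100 * mask.mean()

pct = outlier_pct(pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 120]))
status = "PASS" if pct < 1 else "WARN" if pct <= 5 else "FAIL"
print(f"{pct:.1f}% outliers -> {status}")   # -> 10.0% outliers -> FAIL
```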
## Quality Scoring
| Component | Weight | Formula |
|-----------|--------|---------|
| Nulls | 30% | 100 - (avg_null_pct * 2) |
| Duplicates | 20% | 100 - (dup_pct * 50) |
| Type consistency | 25% | 100 if clean, 0 if mixed |
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |
Final score: weighted average of the components above, clamped to the 0-100 range (see the sketch below)
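A minimal scoring sketch using the weights above (hypothetical helper, not the plugin's actual code):
```python
def quality_score(avg_null_pct, dup_pct, types_clean, avg_outlier_pct):
    nulls = 100 - avg_null_pct * 2
    dups = 100 - dup_pct * 50
    types = 100 if types_clean else 0
    outliers = 100 - avg_outlier_pct * 10
    score = 0.30 * nulls + 0.20 * dups + 0.25 * types + 0.25 * outliers
    return max(0.0, min(100.0, score))   # clamp to 0-100

# Example: 4% nulls, 0.5% duplicates, clean types, 2% outliers -> 87.6
print(round(quality_score(4, 0.5, True, 2), 1))
```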
## Report Format
```
=== Data Quality Report ===
Dataset: [data_ref]
Rows: X | Columns: Y
Overall Score: XX/100 [PASS/WARN/FAIL]
--- Column Analysis ---
| Column | Nulls | Dups | Type | Outliers | Status |
|--------|-------|------|------|----------|--------|
| col1 | X.X% | - | type | X.X% | PASS |
--- Issues Found ---
[WARN/FAIL] Column 'X': Issue description
--- Recommendations ---
1. Suggested remediation steps
```
## Strict Mode
With `--strict` flag:
- **WARN** at 1% nulls (vs 5%)
- **FAIL** at 5% nulls (vs 20%)


@@ -0,0 +1,85 @@
# dbt Workflow
## Pre-Validation (MANDATORY)
**Always run `dbt_parse` before any dbt operation.**
This validates:
- dbt_project.yml syntax
- Model SQL syntax
- schema.yml definitions
- Deprecated syntax (dbt 1.9+)
If validation fails, show errors and STOP.
## Model Selection Syntax
| Pattern | Meaning |
|---------|---------|
| `model_name` | Single model |
| `+model_name` | Model and upstream dependencies |
| `model_name+` | Model and downstream dependents |
| `+model_name+` | Model with all dependencies |
| `tag:name` | Models with specific tag |
| `path:models/staging` | Models in path |
| `test_type:schema` | Schema tests only |
| `test_type:data` | Data tests only |
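CLI equivalents of these patterns, useful when reproducing a selection outside the MCP tools (model, tag, and path names below are placeholders):
```bash
dbt run --select +fct_orders           # fct_orders plus upstream dependencies
dbt run --select fct_orders+           # fct_orders plus downstream dependents
dbt test --select tag:nightly          # only resources tagged "nightly"
dbt ls --select path:models/staging    # list models under models/staging
```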
## Execution Workflow
1. **Parse**: `dbt_parse` - Validate project
2. **Run**: `dbt_run` - Execute models
3. **Test**: `dbt_test` - Run tests
4. **Build**: `dbt_build` - Run + test together
## Test Types
### Schema Tests
Defined in `schema.yml`:
- `unique` - No duplicate values
- `not_null` - No null values
- `accepted_values` - Value in allowed list
- `relationships` - Foreign key integrity
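An illustrative `schema.yml` fragment covering all four (model and column names are placeholders; `data_tests:` is the dbt 1.8+ key, older projects use `tests:`):
```yaml
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
      - name: status
        data_tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        data_tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```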
### Data Tests
Custom SQL in `tests/` directory:
- Return rows that fail assertion
- Zero rows = pass, any rows = fail
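For example, a hypothetical `tests/assert_no_negative_totals.sql`:
```sql
-- Returned rows are failures; an empty result means the test passes.
select order_id, total_amount
from {{ ref('fct_orders') }}
where total_amount < 0
```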
## Materialization Types
| Type | Description |
|------|-------------|
| `view` | Virtual table, always fresh |
| `table` | Physical table, full rebuild |
| `incremental` | Append/merge new rows only |
| `ephemeral` | CTE, no physical object |
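A sketch of the least obvious of these, an incremental model, in standard dbt Jinja (model and column names are placeholders):
```sql
{{ config(materialized='incremental', unique_key='order_id') }}

select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- incremental runs only pick up rows newer than what is already in the target
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```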
## Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Test/run failure |
| 2 | dbt error (parse failure) |
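For example, a CI step could branch on the exit code (the selection is a placeholder):
```bash
dbt build --select +fct_orders+
case $? in
  0) echo "success" ;;
  1) echo "model or test failures" ;;
  2) echo "dbt could not parse the project" ;;
esac
```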
## Result Formatting
```
=== dbt [Operation] Results ===
Project: [project_name]
Selection: [selection_pattern]
--- Summary ---
Total: X models/tests
PASS: X (%)
FAIL: X (%)
WARN: X (%)
SKIP: X (%)
--- Details ---
[Model/Test details with status]
--- Failure Details ---
[Error messages and remediation]
```


@@ -0,0 +1,73 @@
# Lineage Analysis
## Lineage Workflow
1. **Get lineage data** via `dbt_lineage`
2. **Build dependency graph** (upstream + downstream)
3. **Visualize** (ASCII tree or Mermaid)
4. **Report** critical path and refresh implications
## ASCII Tree Format
```
Sources:
|-- raw_customers (source)
|-- raw_orders (source)
model_name (materialization)
|-- upstream:
| |-- stg_model (view)
| |-- raw_source (source)
|-- downstream:
  |-- fct_model (incremental)
  |-- rpt_model (table)
```
## Mermaid Diagram Format
```mermaid
flowchart LR
    subgraph Sources
        raw_data[(raw_data)]
    end
    subgraph Staging
        stg_model[stg_model]
    end
    subgraph Marts
        dim_model{{dim_model}}
    end
    raw_data --> stg_model
    stg_model --> dim_model
```
## Mermaid Node Shapes
| Materialization | Shape | Syntax |
|-----------------|-------|--------|
| source | Cylinder | `[(name)]` |
| view | Rectangle | `[name]` |
| table | Double braces | `{{name}}` |
| incremental | Hexagon | `{{name}}` |
| ephemeral | Parallelogram | `[/name/]` |
## Mermaid Options
| Flag | Description |
|------|-------------|
| `--direction TB` | Top-to-bottom (default: LR) |
| `--depth N` | Limit lineage depth |
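An assumed invocation combining both flags (the exact command syntax depends on how the plugin exposes it; the model name is a placeholder):
```
/data-platform:lineage-viz dim_customers --direction TB --depth 2
```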
## Styling Target Model
```mermaid
style target_model fill:#f96,stroke:#333,stroke-width:2px
```
## Usage Tips
1. **Documentation**: Copy Mermaid to README.md
2. **GitHub/GitLab**: Both render Mermaid natively
3. **Live Editor**: https://mermaid.live for interactive editing


@@ -0,0 +1,69 @@
# MCP Tools Reference
## pandas Tools
| Tool | Description |
|------|-------------|
| `read_csv` | Load CSV file into DataFrame |
| `read_parquet` | Load Parquet file into DataFrame |
| `read_json` | Load JSON/JSONL file into DataFrame |
| `to_csv` | Export DataFrame to CSV |
| `to_parquet` | Export DataFrame to Parquet |
| `describe` | Get statistical summary (count, mean, std, min, max) |
| `head` | Preview first N rows |
| `tail` | Preview last N rows |
| `filter` | Filter rows by condition |
| `select` | Select specific columns |
| `groupby` | Aggregate data by columns |
| `join` | Join two DataFrames |
| `list_data` | List all loaded DataFrames |
| `drop_data` | Remove DataFrame from memory |
## PostgreSQL Tools
| Tool | Description |
|------|-------------|
| `pg_connect` | Establish database connection |
| `pg_query` | Execute SELECT query, return DataFrame |
| `pg_execute` | Execute INSERT/UPDATE/DELETE |
| `pg_tables` | List tables in schema |
| `pg_columns` | Get column info for table |
| `pg_schemas` | List available schemas |
## PostGIS Tools
| Tool | Description |
|------|-------------|
| `st_tables` | List tables with geometry columns |
| `st_geometry_type` | Get geometry type for column |
| `st_srid` | Get SRID for geometry column |
| `st_extent` | Get bounding box for geometry |
## dbt Tools
| Tool | Description |
|------|-------------|
| `dbt_parse` | Validate project (ALWAYS RUN FIRST) |
| `dbt_run` | Execute models |
| `dbt_test` | Run tests |
| `dbt_build` | Run + test together |
| `dbt_compile` | Compile SQL without execution |
| `dbt_ls` | List dbt resources |
| `dbt_docs_generate` | Generate documentation manifest |
| `dbt_lineage` | Get model dependencies |
## Tool Selection Guidelines
**For data loading:**
- Files: `read_csv`, `read_parquet`, `read_json`
- Database: `pg_query`
**For data exploration:**
- Schema: `describe`, `pg_columns`, `st_tables`
- Preview: `head`, `tail`
- Available data: `list_data`, `pg_tables`
**For dbt operations:**
- Always start with `dbt_parse` for validation
- Use `dbt_lineage` for dependency analysis
- Use `dbt_compile` to see rendered SQL
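An illustrative exploration sequence tying these together (arguments omitted; exact parameter names come from the MCP server's tool schemas):
```
1. read_csv   -> load the file and note the returned data_ref
2. list_data  -> confirm the DataFrame is registered
3. describe   -> per-column summary statistics
4. head       -> eyeball the first rows
5. drop_data  -> free memory when finished
```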


@@ -0,0 +1,108 @@
# Setup Workflow
## Important Context
- **This workflow uses Bash, Read, Write, AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them
## Phase 1: Environment Validation
### Check Python Version
```bash
python3 --version
```
Requires Python 3.10+. If below, stop and inform user.
## Phase 2: MCP Server Setup
### Locate MCP Server
Check both paths:
```bash
# Installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/
# Source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/
```
### Check/Create Virtual Environment
```bash
# Check
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python
# Create if missing
cd /path/to/mcp-servers/data-platform
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
deactivate
```
## Phase 3: PostgreSQL Configuration (Optional)
### Config Location
`~/.config/claude/postgres.env`
### Config Format
```bash
# PostgreSQL Configuration
POSTGRES_URL=postgresql://user:pass@host:5432/db
```
Set permissions: `chmod 600 ~/.config/claude/postgres.env`
### Test Connection
```bash
source ~/.config/claude/postgres.env && python3 -c "
import asyncio, asyncpg
async def test():
    conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
    ver = await conn.fetchval('SELECT version()')
    await conn.close()
    print(f'SUCCESS: {ver.split(\",\")[0]}')
asyncio.run(test())
"
```
## Phase 4: dbt Configuration (Optional)
dbt is **project-level** (auto-detected via `dbt_project.yml`).
For projects in a subdirectory, set the paths in `.env`:
```
DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
```
### Check dbt Installation
```bash
dbt --version
```
## Phase 5: Validation
### Verify MCP Server
```bash
cd /path/to/mcp-servers/data-platform
.venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('OK')"
```
## Memory Limits
Default: 100,000 rows per DataFrame
Override in project `.env`:
```
DATA_PLATFORM_MAX_ROWS=500000
```
For larger datasets:
- Use chunked processing (`chunk_size` parameter)
- Filter data before loading
- Store to Parquet for efficient re-loading
## Session Restart
After setup, restart Claude Code session for MCP tools to become available.


@@ -0,0 +1,45 @@
# Visual Header
## Standard Format
Display at the start of every command execution:
```
+----------------------------------------------------------------------+
| DATA-PLATFORM - [Command Name] |
+----------------------------------------------------------------------+
```
## Command Headers
| Command | Header Text |
|---------|-------------|
| initial-setup | Setup Wizard |
| ingest | Ingest |
| profile | Data Profile |
| schema | Schema Explorer |
| data-quality | Data Quality |
| run | dbt Run |
| dbt-test | dbt Tests |
| lineage | Lineage |
| lineage-viz | Lineage Visualization |
| explain | Model Explanation |
## Summary Box Format
For completion summaries:
```
+============================================================+
| DATA-PLATFORM [OPERATION] COMPLETE |
+============================================================+
| Component: [Status] |
| Component: [Status] |
+============================================================+
```
## Status Indicators
- Success: `[check]` or `Ready`
- Warning: `[!]` or `Partial`
- Failure: `[X]` or `Failed`