refactor: extract skills from commands across 8 plugins
Refactored commands to extract reusable skills following the Commands → Skills separation pattern. Each command is now <50 lines and references skill files for detailed knowledge.

Plugins refactored:
- claude-config-maintainer: 5 commands → 7 skills
- code-sentinel: 3 commands → 2 skills
- contract-validator: 5 commands → 6 skills
- data-platform: 10 commands → 6 skills
- doc-guardian: 5 commands → 6 skills (replaced nested dir)
- git-flow: 8 commands → 7 skills

Skills contain: workflows, validation rules, conventions, reference data, tool documentation.
Commands now contain: YAML frontmatter, agent assignment, skills list, brief workflow steps, parameters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

72
plugins/data-platform/skills/data-profiling.md
Normal file
@@ -0,0 +1,72 @@

# Data Profiling

## Profiling Workflow

1. **Get data reference** via `list_data`
2. **Generate statistics** via `describe`
3. **Analyze quality** (nulls, duplicates, types, outliers)
4. **Calculate score** and generate report

## Quality Checks

### Null Analysis
- Calculate null percentage per column
- **PASS**: < 5% nulls
- **WARN**: 5-20% nulls
- **FAIL**: > 20% nulls

### Duplicate Detection
- Check for fully duplicated rows
- **PASS**: 0% duplicates
- **WARN**: < 1% duplicates
- **FAIL**: >= 1% duplicates

### Type Consistency
- Identify mixed-type columns
- Flag numeric columns with string values
- **PASS**: Consistent types
- **FAIL**: Mixed types detected

### Outlier Detection (IQR Method)
- Calculate Q1, Q3, IQR = Q3 - Q1
- Outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
- **PASS**: < 1% outliers
- **WARN**: 1-5% outliers
- **FAIL**: > 5% outliers
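
A minimal pandas sketch of these checks (a hypothetical helper, not the plugin's actual implementation; thresholds mirror the defaults above):

```python
import pandas as pd

def check_quality(df: pd.DataFrame) -> dict:
    """Apply the null, duplicate, and IQR outlier checks described above."""
    results = {"outliers": {}}

    # Null analysis: null percentage per column
    null_pct = df.isna().mean() * 100
    results["nulls"] = {
        col: "PASS" if p < 5 else ("WARN" if p <= 20 else "FAIL")
        for col, p in null_pct.items()
    }

    # Duplicate detection: share of fully duplicated rows
    dup_pct = df.duplicated().mean() * 100
    results["duplicates"] = (
        "PASS" if dup_pct == 0 else ("WARN" if dup_pct < 1 else "FAIL")
    )

    # Outlier detection (IQR method) on numeric columns only
    for col in df.select_dtypes("number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        outside = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        pct = outside.mean() * 100
        results["outliers"][col] = (
            "PASS" if pct < 1 else ("WARN" if pct <= 5 else "FAIL")
        )
    return results
```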

## Quality Scoring

| Component | Weight | Formula |
|-----------|--------|---------|
| Nulls | 30% | 100 - (avg_null_pct * 2) |
| Duplicates | 20% | 100 - (dup_pct * 50) |
| Type consistency | 25% | 100 if clean, 0 if mixed |
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |

Final score: weighted average, clamped to 0-100
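
Worked through the table, a sketch of the scoring arithmetic (function and argument names are illustrative):

```python
def _clamp(x: float) -> float:
    return max(0.0, min(100.0, x))

def quality_score(avg_null_pct: float, dup_pct: float,
                  types_clean: bool, avg_outlier_pct: float) -> float:
    """Weighted quality score per the table above, components clamped to 0-100."""
    components = [
        (0.30, _clamp(100 - avg_null_pct * 2)),      # nulls
        (0.20, _clamp(100 - dup_pct * 50)),          # duplicates
        (0.25, 100.0 if types_clean else 0.0),       # type consistency
        (0.25, _clamp(100 - avg_outlier_pct * 10)),  # outliers
    ]
    return sum(weight * score for weight, score in components)

quality_score(3, 0.5, True, 2)  # -> 88.2
```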

## Report Format

```
=== Data Quality Report ===
Dataset: [data_ref]
Rows: X | Columns: Y
Overall Score: XX/100 [PASS/WARN/FAIL]

--- Column Analysis ---
| Column | Nulls | Dups | Type | Outliers | Status |
|--------|-------|------|------|----------|--------|
| col1   | X.X%  | -    | type | X.X%     | PASS   |

--- Issues Found ---
[WARN/FAIL] Column 'X': Issue description

--- Recommendations ---
1. Suggested remediation steps
```

## Strict Mode

With `--strict` flag:
- **WARN** at 1% nulls (vs 5%)
- **FAIL** at 5% nulls (vs 20%)

85
plugins/data-platform/skills/dbt-workflow.md
Normal file
@@ -0,0 +1,85 @@

# dbt Workflow

## Pre-Validation (MANDATORY)

**Always run `dbt_parse` before any dbt operation.**

This validates:
- dbt_project.yml syntax
- Model SQL syntax
- schema.yml definitions
- Deprecated syntax (dbt 1.9+)

If validation fails, show errors and STOP.

## Model Selection Syntax

| Pattern | Meaning |
|---------|---------|
| `model_name` | Single model |
| `+model_name` | Model and upstream dependencies |
| `model_name+` | Model and downstream dependents |
| `+model_name+` | Model with all dependencies |
| `tag:name` | Models with specific tag |
| `path:models/staging` | Models in path |
| `test_type:schema` | Schema tests only |
| `test_type:data` | Data tests only |

## Execution Workflow

1. **Parse**: `dbt_parse` - Validate project
2. **Run**: `dbt_run` - Execute models
3. **Test**: `dbt_test` - Run tests
4. **Build**: `dbt_build` - Run + test together

## Test Types

### Schema Tests
Defined in `schema.yml`:
- `unique` - No duplicate values
- `not_null` - No null values
- `accepted_values` - Value in allowed list
- `relationships` - Foreign key integrity
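
Hedged pandas equivalents of the four built-ins, for intuition only (dbt runs these as SQL in the warehouse; frames and columns here are illustrative):

```python
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "customer_id": [10, 20, 10],
                       "status": ["open", "closed", "open"]})
customers = pd.DataFrame({"customer_id": [10, 20]})

assert orders["order_id"].is_unique                                 # unique
assert orders["order_id"].notna().all()                             # not_null
assert orders["status"].isin(["open", "closed"]).all()              # accepted_values
assert orders["customer_id"].isin(customers["customer_id"]).all()   # relationships
```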

### Data Tests
Custom SQL in `tests/` directory:
- Return rows that fail assertion
- Zero rows = pass, any rows = fail
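
The same zero-rows convention, sketched in pandas rather than SQL (illustrative only):

```python
import pandas as pd

payments = pd.DataFrame({"payment_id": [1, 2, 3], "amount": [10.0, -5.0, 8.0]})

# A data test selects the rows that violate the assertion
failing = payments[payments["amount"] < 0]

# Zero rows = pass, any rows = fail
print("PASS" if failing.empty else f"FAIL ({len(failing)} failing rows)")
```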

## Materialization Types

| Type | Description |
|------|-------------|
| `view` | Virtual table, always fresh |
| `table` | Physical table, full rebuild |
| `incremental` | Append/merge new rows only |
| `ephemeral` | CTE, no physical object |

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Test/run failure |
| 2 | dbt error (parse failure) |
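
A hedged orchestration sketch against the dbt CLI directly (the MCP tools presumably wrap something similar; `my_model` is a placeholder, and the exit-code mapping follows the table above):

```python
import subprocess

def run_dbt(selection: str) -> str:
    """Run `dbt build` on a selection and map the exit code to a status."""
    result = subprocess.run(
        ["dbt", "build", "--select", selection],
        capture_output=True, text=True,
    )
    statuses = {0: "success", 1: "test/run failure", 2: "dbt error (parse failure)"}
    return statuses.get(result.returncode, f"unknown exit code {result.returncode}")

# Model plus all upstream and downstream dependencies:
print(run_dbt("+my_model+"))
```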

## Result Formatting

```
=== dbt [Operation] Results ===
Project: [project_name]
Selection: [selection_pattern]

--- Summary ---
Total: X models/tests
PASS: X (%)
FAIL: X (%)
WARN: X (%)
SKIP: X (%)

--- Details ---
[Model/Test details with status]

--- Failure Details ---
[Error messages and remediation]
```

73
plugins/data-platform/skills/lineage-analysis.md
Normal file
@@ -0,0 +1,73 @@

# Lineage Analysis

## Lineage Workflow

1. **Get lineage data** via `dbt_lineage`
2. **Build dependency graph** (upstream + downstream)
3. **Visualize** (ASCII tree or Mermaid)
4. **Report** critical path and refresh implications

## ASCII Tree Format

```
Sources:
|-- raw_customers (source)
|-- raw_orders (source)

model_name (materialization)
|-- upstream:
|   |-- stg_model (view)
|   |-- raw_source (source)
|-- downstream:
    |-- fct_model (incremental)
    |-- rpt_model (table)
```

## Mermaid Diagram Format

```mermaid
flowchart LR
    subgraph Sources
        raw_data[(raw_data)]
    end

    subgraph Staging
        stg_model[stg_model]
    end

    subgraph Marts
        dim_model{{dim_model}}
    end

    raw_data --> stg_model
    stg_model --> dim_model
```

## Mermaid Node Shapes

| Materialization | Shape | Syntax |
|-----------------|-------|--------|
| source | Cylinder | `[(name)]` |
| view | Rectangle | `[name]` |
| table | Hexagon | `{{name}}` |
| incremental | Hexagon | `{{name}}` |
| ephemeral | Parallelogram | `[/name/]` |
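
A sketch of generating the flowchart from lineage edges (the edge-list shape here is an assumption, not `dbt_lineage`'s actual return format; node shapes follow the table above):

```python
def node(name: str, kind: str) -> str:
    """Render one Mermaid node with the shape for its materialization."""
    if kind == "source":
        return f"{name}[({name})]"
    if kind in ("table", "incremental"):
        return f"{name}{{{{{name}}}}}"   # renders as {{name}}, a hexagon
    if kind == "ephemeral":
        return f"{name}[/{name}/]"
    return f"{name}[{name}]"             # view and default: rectangle

def to_mermaid(target: str, edges: list[tuple[str, str]],
               kinds: dict[str, str]) -> str:
    """Render (parent, child) lineage edges as a Mermaid flowchart."""
    nodes = {n for edge in edges for n in edge} | {target}
    lines = ["flowchart LR"]
    lines += [f"    {node(n, kinds.get(n, 'view'))}" for n in sorted(nodes)]
    lines += [f"    {parent} --> {child}" for parent, child in edges]
    # Highlight the target model (see the styling snippet below)
    lines.append(f"    style {target} fill:#f96,stroke:#333,stroke-width:2px")
    return "\n".join(lines)

print(to_mermaid("dim_model",
                 [("raw_data", "stg_model"), ("stg_model", "dim_model")],
                 {"raw_data": "source", "stg_model": "view", "dim_model": "table"}))
```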

## Mermaid Options

| Flag | Description |
|------|-------------|
| `--direction TB` | Top-to-bottom (default: LR) |
| `--depth N` | Limit lineage depth |

## Styling Target Model

```mermaid
style target_model fill:#f96,stroke:#333,stroke-width:2px
```

## Usage Tips

1. **Documentation**: Copy Mermaid to README.md
2. **GitHub/GitLab**: Both render Mermaid natively
3. **Live Editor**: https://mermaid.live for interactive editing

69
plugins/data-platform/skills/mcp-tools-reference.md
Normal file
@@ -0,0 +1,69 @@

# MCP Tools Reference

## pandas Tools

| Tool | Description |
|------|-------------|
| `read_csv` | Load CSV file into DataFrame |
| `read_parquet` | Load Parquet file into DataFrame |
| `read_json` | Load JSON/JSONL file into DataFrame |
| `to_csv` | Export DataFrame to CSV |
| `to_parquet` | Export DataFrame to Parquet |
| `describe` | Get statistical summary (count, mean, std, min, max) |
| `head` | Preview first N rows |
| `tail` | Preview last N rows |
| `filter` | Filter rows by condition |
| `select` | Select specific columns |
| `groupby` | Aggregate data by columns |
| `join` | Join two DataFrames |
| `list_data` | List all loaded DataFrames |
| `drop_data` | Remove DataFrame from memory |

## PostgreSQL Tools

| Tool | Description |
|------|-------------|
| `pg_connect` | Establish database connection |
| `pg_query` | Execute SELECT query, return DataFrame |
| `pg_execute` | Execute INSERT/UPDATE/DELETE |
| `pg_tables` | List tables in schema |
| `pg_columns` | Get column info for table |
| `pg_schemas` | List available schemas |
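
A plausible sketch of what `pg_query` does under the hood, assuming the server pairs asyncpg with pandas (an assumption; the real implementation may differ):

```python
import asyncio
import asyncpg
import pandas as pd

async def pg_query(url: str, sql: str) -> pd.DataFrame:
    """Run a SELECT and return the rows as a DataFrame."""
    conn = await asyncpg.connect(url, timeout=5)
    try:
        rows = await conn.fetch(sql)
        # asyncpg returns Record objects; each converts cleanly to a dict
        return pd.DataFrame([dict(r) for r in rows])
    finally:
        await conn.close()

# df = asyncio.run(pg_query("postgresql://user:pass@host:5432/db",
#                           "SELECT schemaname, tablename FROM pg_tables LIMIT 5"))
```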

## PostGIS Tools

| Tool | Description |
|------|-------------|
| `st_tables` | List tables with geometry columns |
| `st_geometry_type` | Get geometry type for column |
| `st_srid` | Get SRID for geometry column |
| `st_extent` | Get bounding box for geometry |

## dbt Tools

| Tool | Description |
|------|-------------|
| `dbt_parse` | Validate project (ALWAYS RUN FIRST) |
| `dbt_run` | Execute models |
| `dbt_test` | Run tests |
| `dbt_build` | Run + test together |
| `dbt_compile` | Compile SQL without execution |
| `dbt_ls` | List dbt resources |
| `dbt_docs_generate` | Generate documentation manifest |
| `dbt_lineage` | Get model dependencies |

## Tool Selection Guidelines

**For data loading:**
- Files: `read_csv`, `read_parquet`, `read_json`
- Database: `pg_query`

**For data exploration:**
- Schema: `describe`, `pg_columns`, `st_tables`
- Preview: `head`, `tail`
- Available data: `list_data`, `pg_tables`

**For dbt operations:**
- Always start with `dbt_parse` for validation
- Use `dbt_lineage` for dependency analysis
- Use `dbt_compile` to see rendered SQL
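
In plain pandas terms, the load-then-explore flow these tools map onto looks roughly like this (path and column names are illustrative):

```python
import pandas as pd

# Load: read_csv / read_parquet / read_json equivalents
df = pd.read_csv("data/customers.csv")

# Explore: describe for statistics, head/tail for previews
print(df.describe())
print(df.head(10))

# Reshape: filter / select / groupby equivalents
active = df[df["status"] == "active"][["id", "region"]]
print(active.groupby("region").size())

# Export to Parquet for efficient re-loading
df.to_parquet("data/customers.parquet")
```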

108
plugins/data-platform/skills/setup-workflow.md
Normal file
@@ -0,0 +1,108 @@

# Setup Workflow

## Important Context

- **This workflow uses Bash, Read, Write, AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them

## Phase 1: Environment Validation

### Check Python Version
```bash
python3 --version
```
Requires Python 3.10+. If below, stop and inform the user.

## Phase 2: MCP Server Setup

### Locate MCP Server
Check both paths:
```bash
# Installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/

# Source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/
```

### Check/Create Virtual Environment
```bash
# Check
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python

# Create if missing
cd /path/to/mcp-servers/data-platform
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
deactivate
```

## Phase 3: PostgreSQL Configuration (Optional)

### Config Location
`~/.config/claude/postgres.env`

### Config Format
```bash
# PostgreSQL Configuration
POSTGRES_URL=postgresql://user:pass@host:5432/db
```

Set permissions: `chmod 600 ~/.config/claude/postgres.env`

### Test Connection
```bash
source ~/.config/claude/postgres.env && python3 -c "
import asyncio, asyncpg
async def test():
    conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
    ver = await conn.fetchval('SELECT version()')
    await conn.close()
    print(f'SUCCESS: {ver.split(\",\")[0]}')
asyncio.run(test())
"
```

## Phase 4: dbt Configuration (Optional)

dbt is **project-level** (auto-detected via `dbt_project.yml`).

For subdirectory projects, set in `.env`:
```
DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
```

### Check dbt Installation
```bash
dbt --version
```

## Phase 5: Validation

### Verify MCP Server
```bash
cd /path/to/mcp-servers/data-platform
.venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('OK')"
```

## Memory Limits

Default: 100,000 rows per DataFrame

Override in project `.env`:
```
DATA_PLATFORM_MAX_ROWS=500000
```

For larger datasets (see the sketch below):
- Use chunked processing (`chunk_size` parameter)
- Filter data before loading
- Store to Parquet for efficient re-loading
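
For example, a chunked-ingest sketch using pandas' own `chunksize` (the `chunk_size` parameter presumably maps onto something like this; path and filter column are illustrative):

```python
import pandas as pd

MAX_ROWS = 100_000  # DATA_PLATFORM_MAX_ROWS default

# Stream the file and filter each chunk so the full dataset is never in memory
chunks = [
    chunk[chunk["year"] == 2024]
    for chunk in pd.read_csv("data/big_file.csv", chunksize=50_000)
]
df = pd.concat(chunks, ignore_index=True)
assert len(df) <= MAX_ROWS, "still too large - tighten the filter"

# Store to Parquet for efficient re-loading
df.to_parquet("data/big_file_2024.parquet")
```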

## Session Restart

After setup, restart the Claude Code session for MCP tools to become available.

45
plugins/data-platform/skills/visual-header.md
Normal file
@@ -0,0 +1,45 @@

# Visual Header

## Standard Format

Display at the start of every command execution:

```
+----------------------------------------------------------------------+
| DATA-PLATFORM - [Command Name]                                        |
+----------------------------------------------------------------------+
```
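
The padding arithmetic, as a throwaway sketch (illustrative only; commands emit this header as plain text, not via code):

```python
def header(command_name: str, inner_width: int = 70) -> str:
    """Render the standard visual header box at a fixed inner width."""
    bar = "+" + "-" * inner_width + "+"
    body = f" DATA-PLATFORM - {command_name}".ljust(inner_width)
    return "\n".join([bar, f"|{body}|", bar])

print(header("Data Profile"))
```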

## Command Headers

| Command | Header Text |
|---------|-------------|
| initial-setup | Setup Wizard |
| ingest | Ingest |
| profile | Data Profile |
| schema | Schema Explorer |
| data-quality | Data Quality |
| run | dbt Run |
| dbt-test | dbt Tests |
| lineage | Lineage |
| lineage-viz | Lineage Visualization |
| explain | Model Explanation |

## Summary Box Format

For completion summaries:

```
+============================================================+
| DATA-PLATFORM [OPERATION] COMPLETE                         |
+============================================================+
| Component: [Status]                                        |
| Component: [Status]                                        |
+============================================================+
```

## Status Indicators

- Success: `[check]` or `Ready`
- Warning: `[!]` or `Partial`
- Failure: `[X]` or `Failed`