refactor: extract skills from commands across 8 plugins
Refactored commands to extract reusable skills following the Commands → Skills separation pattern. Each command is now <50 lines and references skill files for detailed knowledge.

Plugins refactored:

- claude-config-maintainer: 5 commands → 7 skills
- code-sentinel: 3 commands → 2 skills
- contract-validator: 5 commands → 6 skills
- data-platform: 10 commands → 6 skills
- doc-guardian: 5 commands → 6 skills (replaced nested dir)
- git-flow: 8 commands → 7 skills

Skills contain: workflows, validation rules, conventions, reference data, tool documentation.

Commands now contain: YAML frontmatter, agent assignment, skills list, brief workflow steps, parameters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -1,18 +1,13 @@
# /data-quality - Data Quality Assessment

## Skills to Load
- skills/data-profiling.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Data Quality                                  │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the assessment.

Comprehensive data quality check for DataFrames with pass/warn/fail scoring.
Display header: `DATA-PLATFORM - Data Quality`

## Usage

@@ -22,72 +17,18 @@ Comprehensive data quality check for DataFrames with pass/warn/fail scoring.

## Workflow

1. **Get data reference**:
   - If no data_ref provided, use `list_data` to show available options
   - Validate the data_ref exists
Execute `skills/data-profiling.md` quality assessment:

2. **Null analysis**:
   - Calculate null percentage per column
   - **PASS**: < 5% nulls
   - **WARN**: 5-20% nulls
   - **FAIL**: > 20% nulls

3. **Duplicate detection**:
   - Check for fully duplicated rows
   - **PASS**: 0% duplicates
   - **WARN**: < 1% duplicates
   - **FAIL**: >= 1% duplicates

4. **Type consistency**:
   - Identify mixed-type columns (object columns with mixed content)
   - Flag columns that could be numeric but contain strings
   - **PASS**: All columns have consistent types
   - **FAIL**: Mixed types detected

5. **Outlier detection** (numeric columns):
   - Use IQR method (values beyond 1.5 * IQR)
   - Report percentage of outliers per column
   - **PASS**: < 1% outliers
   - **WARN**: 1-5% outliers
   - **FAIL**: > 5% outliers

6. **Generate quality report**:
   - Overall quality score (0-100)
   - Per-column breakdown
   - Recommendations for remediation

## Report Format

```
=== Data Quality Report ===
Dataset: sales_data
Rows: 10,000 | Columns: 15
Overall Score: 82/100 [PASS]

--- Column Analysis ---
| Column       | Nulls | Dups | Type     | Outliers | Status |
|--------------|-------|------|----------|----------|--------|
| customer_id  | 0.0%  | -    | int64    | 0.2%     | PASS   |
| email        | 2.3%  | -    | object   | -        | PASS   |
| amount       | 15.2% | -    | float64  | 3.1%     | WARN   |
| created_at   | 0.0%  | -    | datetime | -        | PASS   |

--- Issues Found ---
[WARN] Column 'amount': 15.2% null values (threshold: 5%)
[WARN] Column 'amount': 3.1% outliers detected
[FAIL] 1.2% duplicate rows detected (120 rows)

--- Recommendations ---
1. Investigate null values in 'amount' column
2. Review outliers in 'amount' - may be data entry errors
3. Remove or deduplicate 120 duplicate rows
```
1. **Get data reference**: Use `list_data` if none provided
2. **Run quality checks**: Nulls, duplicates, types, outliers
3. **Calculate score**: Apply weighted scoring formula
4. **Generate report**: Issues and recommendations

## Options

| Flag | Description |
|------|-------------|
| `--strict` | Use stricter thresholds (WARN at 1% nulls, FAIL at 5%) |
| `--strict` | Stricter thresholds (WARN at 1%, FAIL at 5% nulls) |

## Examples

@@ -96,20 +37,12 @@ Overall Score: 82/100 [PASS]
/data-quality df_customers --strict
```

## Scoring
## Quality Thresholds

| Component | Weight | Scoring |
|-----------|--------|---------|
| Nulls | 30% | 100 - (avg_null_pct * 2) |
| Duplicates | 20% | 100 - (dup_pct * 50) |
| Type consistency | 25% | 100 if clean, 0 if mixed |
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |
See `skills/data-profiling.md` for detailed thresholds and scoring.

Final score: Weighted average, capped at 0-100
## Required MCP Tools

## Available Tools

Use these MCP tools:
- `describe` - Get statistical summary (for outlier detection)
- `describe` - Get statistical summary
- `head` - Preview data
- `list_data` - List available DataFrames

@@ -1,18 +1,13 @@
# /dbt-test - Run dbt Tests

## Skills to Load
- skills/dbt-workflow.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · dbt Tests                                     │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the tests.

Execute dbt tests with formatted pass/fail results.
Display header: `DATA-PLATFORM - dbt Tests`

## Usage

@@ -22,75 +17,17 @@ Execute dbt tests with formatted pass/fail results.

## Workflow

1. **Pre-validation** (MANDATORY):
   - Use `dbt_parse` to validate project first
   - If validation fails, show errors and STOP
Execute `skills/dbt-workflow.md` test workflow:

2. **Execute tests**:
   - Use `dbt_test` with provided selection
   - Capture all test results

3. **Format results**:
   - Group by test type (schema vs. data)
   - Show pass/fail status with counts
   - Display failure details

## Report Format

```
=== dbt Test Results ===
Project: my_project
Selection: tag:critical

--- Summary ---
Total: 24 tests
PASS: 22 (92%)
FAIL: 1 (4%)
WARN: 1 (4%)
SKIP: 0 (0%)

--- Schema Tests (18) ---
[PASS] unique_dim_customers_customer_id
[PASS] not_null_dim_customers_customer_id
[PASS] not_null_dim_customers_email
[PASS] accepted_values_dim_customers_status
[FAIL] relationships_fct_orders_customer_id

--- Data Tests (6) ---
[PASS] assert_positive_order_amounts
[PASS] assert_valid_dates
[WARN] assert_recent_orders (threshold: 7 days)

--- Failure Details ---
Test: relationships_fct_orders_customer_id
Type: schema (relationships)
Model: fct_orders
Message: 15 records failed referential integrity check
Query: SELECT * FROM fct_orders WHERE customer_id NOT IN (SELECT customer_id FROM dim_customers)

--- Warning Details ---
Test: assert_recent_orders
Type: data
Message: No orders in last 7 days (expected for dev environment)
Severity: warn
```

## Selection Syntax

| Pattern | Meaning |
|---------|---------|
| (none) | Run all tests |
| `model_name` | Tests for specific model |
| `+model_name` | Tests for model and upstream |
| `tag:critical` | Tests with tag |
| `test_type:schema` | Only schema tests |
| `test_type:data` | Only data tests |
1. **Pre-validation** (MANDATORY): Run `dbt_parse` first
2. **Execute tests**: Use `dbt_test` with selection
3. **Format results**: Group by test type, show pass/fail/warn counts

## Options

| Flag | Description |
|------|-------------|
| `--warn-only` | Treat failures as warnings (don't fail CI) |
| `--warn-only` | Treat failures as warnings |

## Examples

@@ -98,34 +35,9 @@ Severity: warn
/dbt-test                # Run all tests
/dbt-test dim_customers  # Tests for specific model
/dbt-test tag:critical   # Run critical tests only
/dbt-test +fct_orders    # Test model and its upstream
```

## Test Types
## Required MCP Tools

### Schema Tests
Built-in tests defined in `schema.yml`:
- `unique` - No duplicate values
- `not_null` - No null values
- `accepted_values` - Value in allowed list
- `relationships` - Foreign key integrity

### Data Tests
Custom SQL tests in `tests/` directory:
- Return rows that fail the assertion
- Zero rows = pass, any rows = fail

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more tests failed |
| 2 | dbt error (parse failure, etc.) |

## Available Tools

Use these MCP tools:
- `dbt_parse` - Pre-validation (ALWAYS RUN FIRST)
- `dbt_test` - Execute tests (REQUIRED)
- `dbt_build` - Alternative: run + test together

@@ -1,18 +1,14 @@
# /explain - dbt Model Explanation

## Skills to Load
- skills/dbt-workflow.md
- skills/lineage-analysis.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Model Explanation                             │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the explanation.

Explain a dbt model's purpose, dependencies, and SQL logic.
Display header: `DATA-PLATFORM - Model Explanation`

## Usage

@@ -22,24 +18,10 @@ Explain a dbt model's purpose, dependencies, and SQL logic.

## Workflow

1. **Get model info**:
   - Use `dbt_lineage` to get model metadata
   - Extract description, tags, materialization

2. **Analyze dependencies**:
   - Show upstream models (what this depends on)
   - Show downstream models (what depends on this)
   - Visualize as dependency tree

3. **Compile SQL**:
   - Use `dbt_compile` to get rendered SQL
   - Explain key transformations

4. **Report**:
   - Model purpose (from description)
   - Materialization strategy
   - Dependency graph
   - Key SQL logic explained
1. **Get model info**: Use `dbt_lineage` for metadata (description, tags, materialization)
2. **Analyze dependencies**: Show upstream/downstream as tree
3. **Compile SQL**: Use `dbt_compile` to get rendered SQL
4. **Report**: Purpose, materialization, dependencies, key SQL logic

## Examples

@@ -48,9 +30,8 @@ Explain a dbt model's purpose, dependencies, and SQL logic.
/explain fct_orders
```

## Available Tools
## Required MCP Tools

Use these MCP tools:
- `dbt_lineage` - Get model dependencies
- `dbt_compile` - Get compiled SQL
- `dbt_ls` - List related resources

@@ -1,18 +1,12 @@
# /ingest - Data Ingestion

## Skills to Load
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Ingest                                        │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the ingestion.

Load data from files or database into the data platform.
Display header: `DATA-PLATFORM - Ingest`

## Usage

@@ -22,21 +16,17 @@ Load data from files or database into the data platform.

## Workflow

1. **Identify data source**:
   - If source is a file path, determine format (CSV, Parquet, JSON)
   - If source is "db" or a table name, query PostgreSQL
1. **Identify source**:
   - File path: determine format (CSV, Parquet, JSON)
   - SQL query or table name: query PostgreSQL

2. **Load data**:
   - For files: Use `read_csv`, `read_parquet`, or `read_json`
   - For database: Use `pg_query` with appropriate SELECT
   - Files: `read_csv`, `read_parquet`, `read_json`
   - Database: `pg_query`

3. **Validate**:
   - Check row count against limits
   - If exceeds 100k rows, suggest chunking or filtering
3. **Validate**: Check row count against 100k limit

4. **Report**:
   - Show data_ref, row count, columns, and memory usage
   - Preview first few rows
4. **Report**: data_ref, row count, columns, memory usage, preview

## Examples

@@ -46,9 +36,8 @@ Load data from files or database into the data platform.
/ingest "SELECT * FROM orders WHERE created_at > '2024-01-01'"
```

## Available Tools
## Required MCP Tools

Use these MCP tools:
- `read_csv` - Load CSV files
- `read_parquet` - Load Parquet files
- `read_json` - Load JSON/JSONL files

@@ -1,243 +1,49 @@
---
description: Interactive setup wizard for data-platform plugin - configures MCP server and optional PostgreSQL/dbt
---
# /initial-setup - Data Platform Setup Wizard

# Data Platform Setup Wizard
## Skills to Load
- skills/setup-workflow.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:
Display header: `DATA-PLATFORM - Setup Wizard`

## Usage

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Setup Wizard                                  │
└──────────────────────────────────────────────────────────────────┘
/initial-setup
```

Then proceed with the setup.
## Workflow

This command sets up the data-platform plugin with pandas, PostgreSQL, and dbt integration.
Execute `skills/setup-workflow.md` phases in order:

## Important Context
### Phase 1: Environment Validation
- Check Python 3.10+ installed
- Stop if version too old

- **This command uses Bash, Read, Write, and AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them
### Phase 2: MCP Server Setup
- Locate MCP server (installed or source path)
- Check/create virtual environment
- Install dependencies if needed

---
### Phase 3: PostgreSQL Configuration (Optional)
- Ask user if they want PostgreSQL access
- If yes: create `~/.config/claude/postgres.env`
- Test connection and report status

## Phase 1: Environment Validation
### Phase 4: dbt Configuration (Optional)
- Ask user if they use dbt
- If yes: explain auto-detection via `dbt_project.yml`
- Check dbt CLI installation

### Step 1.1: Check Python Version
### Phase 5: Validation
- Verify MCP server can be imported
- Display summary with component status
- Inform user to restart session

```bash
python3 --version
```
## Important Notes

Requires Python 3.10+. If below, stop setup and inform user.

### Step 1.2: Check for Required Libraries

```bash
python3 -c "import sys; print(f'Python {sys.version_info.major}.{sys.version_info.minor}')"
```

---

## Phase 2: MCP Server Setup

### Step 2.1: Locate Data Platform MCP Server

The MCP server should be at the marketplace root:

```bash
# If running from installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_INSTALLED"

# If running from source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_SOURCE"
```

Determine the correct path based on which exists.

### Step 2.2: Check Virtual Environment

```bash
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python 2>/dev/null && echo "VENV_EXISTS" || echo "VENV_MISSING"
```

### Step 2.3: Create Virtual Environment (if missing)

```bash
cd /path/to/mcp-servers/data-platform && python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt && deactivate
```

**Note:** This may take a few minutes due to pandas, pyarrow, and dbt dependencies.

---

## Phase 3: PostgreSQL Configuration (Optional)

### Step 3.1: Ask About PostgreSQL

Use AskUserQuestion:
- Question: "Do you want to configure PostgreSQL database access?"
- Header: "PostgreSQL"
- Options:
  - "Yes, I have a PostgreSQL database"
  - "No, I'll only use pandas/dbt tools"

**If user chooses "No":** Skip to Phase 4.

### Step 3.2: Create Config Directory

```bash
mkdir -p ~/.config/claude
```

### Step 3.3: Check PostgreSQL Configuration

```bash
cat ~/.config/claude/postgres.env 2>/dev/null || echo "FILE_NOT_FOUND"
```

**If file exists with valid URL:** Skip to Step 3.6.
**If missing or has placeholders:** Continue.

### Step 3.4: Gather PostgreSQL Information

Use AskUserQuestion:
- Question: "What is your PostgreSQL connection URL format?"
- Header: "DB Format"
- Options:
  - "Standard: postgresql://user:pass@host:5432/db"
  - "PostGIS: postgresql://user:pass@host:5432/db (with PostGIS extension)"
  - "Other (I'll provide the full URL)"

Ask user to provide the connection URL.

### Step 3.5: Create Configuration File

```bash
cat > ~/.config/claude/postgres.env << 'EOF'
# PostgreSQL Configuration
# Generated by data-platform /initial-setup

POSTGRES_URL=<USER_PROVIDED_URL>
EOF
chmod 600 ~/.config/claude/postgres.env
```

### Step 3.6: Test PostgreSQL Connection (if configured)

```bash
source ~/.config/claude/postgres.env && python3 -c "
import asyncio
import asyncpg
async def test():
    try:
        conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
        ver = await conn.fetchval('SELECT version()')
        await conn.close()
        print(f'SUCCESS: {ver.split(\",\")[0]}')
    except Exception as e:
        print(f'FAILED: {e}')
asyncio.run(test())
"
```

Report result:
- SUCCESS: Connection works
- FAILED: Show error and suggest fixes

---

## Phase 4: dbt Configuration (Optional)

### Step 4.1: Ask About dbt

Use AskUserQuestion:
- Question: "Do you use dbt for data transformations in your projects?"
- Header: "dbt"
- Options:
  - "Yes, I have dbt projects"
  - "No, I don't use dbt"

**If user chooses "No":** Skip to Phase 5.

### Step 4.2: dbt Discovery

dbt configuration is **project-level** (not system-level). The plugin auto-detects dbt projects by looking for `dbt_project.yml`.

Inform user:
```
dbt projects are detected automatically when you work in a directory
containing dbt_project.yml.

If your dbt project is in a subdirectory, you can set DBT_PROJECT_DIR
in your project's .env file:

DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
```

### Step 4.3: Check dbt Installation

```bash
dbt --version 2>/dev/null || echo "DBT_NOT_FOUND"
```

**If not found:** Inform user that dbt CLI tools require dbt-core to be installed globally or in the project.

---

## Phase 5: Validation

### Step 5.1: Verify MCP Server

```bash
cd /path/to/mcp-servers/data-platform && .venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('MCP Server OK')"
```

### Step 5.2: Summary

```
╔════════════════════════════════════════════════════════════╗
║ DATA-PLATFORM SETUP COMPLETE                               ║
╠════════════════════════════════════════════════════════════╣
║ MCP Server: ✓ Ready                                        ║
║ pandas Tools: ✓ Available (14 tools)                       ║
║ PostgreSQL Tools: [✓/✗] [Status based on config]           ║
║ PostGIS Tools: [✓/✗] [Status based on PostGIS]             ║
║ dbt Tools: [✓/✗] [Status based on discovery]               ║
╚════════════════════════════════════════════════════════════╝
```

### Step 5.3: Session Restart Notice

---

**⚠️ Session Restart Required**

Restart your Claude Code session for MCP tools to become available.

**After restart, you can:**
- Run `/ingest` to load data from files or database
- Run `/profile` to analyze DataFrame statistics
- Run `/schema` to explore database/DataFrame schema
- Run `/run` to execute dbt models (if configured)
- Run `/lineage` to view dbt model dependencies

---

## Memory Limits

The data-platform plugin has a default row limit of 100,000 rows per DataFrame. For larger datasets:
- Use chunked processing (`chunk_size` parameter)
- Filter data before loading
- Store to Parquet for efficient re-loading

You can override the limit by setting in your project `.env`:
```
DATA_PLATFORM_MAX_ROWS=500000
```
- Uses Bash, Read, Write, AskUserQuestion tools (NOT MCP tools)
- MCP tools unavailable until session restart
- PostgreSQL and dbt are optional - pandas works without them

@@ -1,18 +1,13 @@
# /lineage-viz - Mermaid Lineage Visualization

## Skills to Load
- skills/lineage-analysis.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Lineage Visualization                         │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the visualization.

Generate Mermaid flowchart syntax for dbt model lineage.
Display header: `DATA-PLATFORM - Lineage Visualization`

## Usage

@@ -22,61 +17,16 @@ Generate Mermaid flowchart syntax for dbt model lineage.

## Workflow

1. **Get lineage data**:
   - Use `dbt_lineage` to fetch model dependencies
   - Capture upstream sources and downstream consumers

2. **Build Mermaid graph**:
   - Create nodes for each model/source
   - Style nodes by materialization type
   - Add directional arrows for dependencies

3. **Output**:
   - Render Mermaid flowchart syntax
   - Include copy-paste ready code block

## Output Format

```mermaid
flowchart LR
    subgraph Sources
        raw_customers[(raw_customers)]
        raw_orders[(raw_orders)]
    end

    subgraph Staging
        stg_customers[stg_customers]
        stg_orders[stg_orders]
    end

    subgraph Marts
        dim_customers{{dim_customers}}
        fct_orders{{fct_orders}}
    end

    raw_customers --> stg_customers
    raw_orders --> stg_orders
    stg_customers --> dim_customers
    stg_orders --> fct_orders
    dim_customers --> fct_orders
```

## Node Styles

| Materialization | Mermaid Shape | Example |
|-----------------|---------------|---------|
| source | Cylinder `[( )]` | `raw_data[(raw_data)]` |
| view | Rectangle `[ ]` | `stg_model[stg_model]` |
| table | Double braces `{{ }}` | `dim_model{{dim_model}}` |
| incremental | Hexagon `{{ }}` | `fct_model{{fct_model}}` |
| ephemeral | Dashed `[/ /]` | `tmp_model[/tmp_model/]` |
1. **Get lineage data**: Use `dbt_lineage` to fetch model dependencies
2. **Build Mermaid graph**: Apply node shapes from `skills/lineage-analysis.md`
3. **Output**: Render copy-paste ready Mermaid flowchart

## Options

| Flag | Description |
|------|-------------|
| `--direction TB` | Top-to-bottom layout (default: LR = left-to-right) |
| `--depth N` | Limit lineage depth (default: unlimited) |
| `--direction TB` | Top-to-bottom layout (default: LR) |
| `--depth N` | Limit lineage depth |

## Examples

@@ -86,52 +36,8 @@ flowchart LR
/lineage-viz rpt_revenue --depth 2
```

## Usage Tips
## Required MCP Tools

1. **Paste in documentation**: Copy the output directly into README.md or docs
2. **GitHub/GitLab rendering**: Both platforms render Mermaid natively
3. **Mermaid Live Editor**: Paste at https://mermaid.live for interactive editing

## Example Output

For `/lineage-viz fct_orders`:

~~~markdown
```mermaid
flowchart LR
    %% Sources
    raw_customers[(raw_customers)]
    raw_orders[(raw_orders)]
    raw_products[(raw_products)]

    %% Staging
    stg_customers[stg_customers]
    stg_orders[stg_orders]
    stg_products[stg_products]

    %% Marts
    dim_customers{{dim_customers}}
    dim_products{{dim_products}}
    fct_orders{{fct_orders}}

    %% Dependencies
    raw_customers --> stg_customers
    raw_orders --> stg_orders
    raw_products --> stg_products
    stg_customers --> dim_customers
    stg_products --> dim_products
    stg_orders --> fct_orders
    dim_customers --> fct_orders
    dim_products --> fct_orders

    %% Highlight target model
    style fct_orders fill:#f96,stroke:#333,stroke-width:2px
```
~~~

## Available Tools

Use these MCP tools:
- `dbt_lineage` - Get model dependencies (REQUIRED)
- `dbt_ls` - List dbt resources
- `dbt_docs_generate` - Generate full manifest if needed

@@ -1,18 +1,13 @@
# /lineage - Data Lineage Visualization

## Skills to Load
- skills/lineage-analysis.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Lineage                                       │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the visualization.

Show data lineage for dbt models or database tables.
Display header: `DATA-PLATFORM - Lineage`

## Usage

@@ -22,24 +17,10 @@ Show data lineage for dbt models or database tables.

## Workflow

1. **Get lineage data**:
   - Use `dbt_lineage` for dbt models
   - For database tables, trace through dbt manifest

2. **Build lineage graph**:
   - Identify all upstream sources
   - Identify all downstream consumers
   - Note materialization at each node

3. **Visualize**:
   - ASCII art dependency tree
   - List format with indentation
   - Show depth levels

4. **Report**:
   - Full dependency chain
   - Critical path identification
   - Refresh implications
1. **Get lineage data**: Use `dbt_lineage` for dbt models
2. **Build lineage graph**: Identify upstream sources and downstream consumers
3. **Visualize**: ASCII tree with depth levels (see `skills/lineage-analysis.md`)
4. **Report**: Full dependency chain and refresh implications

## Examples

@@ -48,25 +29,8 @@ Show data lineage for dbt models or database tables.
/lineage fct_orders --depth 3
```

## Output Format
## Required MCP Tools

```
Sources:
└── raw_customers (source)
└── raw_orders (source)

dim_customers (table)
├── upstream:
│   └── stg_customers (view)
│       └── raw_customers (source)
└── downstream:
    └── fct_orders (incremental)
        └── rpt_customer_lifetime (table)
```

## Available Tools

Use these MCP tools:
- `dbt_lineage` - Get model dependencies
- `dbt_ls` - List dbt resources
- `dbt_docs_generate` - Generate full manifest

@@ -1,18 +1,13 @@
# /profile - Data Profiling

## Skills to Load
- skills/data-profiling.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Data Profile                                  │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the profiling.

Generate statistical profile and quality report for a DataFrame.
Display header: `DATA-PLATFORM - Data Profile`

## Usage

@@ -22,24 +17,12 @@ Generate statistical profile and quality report for a DataFrame.

## Workflow

1. **Get data reference**:
   - If no data_ref provided, use `list_data` to show available options
   - Validate the data_ref exists
Execute `skills/data-profiling.md` profiling workflow:

2. **Generate profile**:
   - Use `describe` for statistical summary
   - Analyze null counts, unique values, data types

3. **Quality assessment**:
   - Identify columns with high null percentage
   - Flag potential data quality issues
   - Suggest cleaning operations if needed

4. **Report**:
   - Summary statistics per column
   - Data type distribution
   - Memory usage
   - Quality score
1. **Get data reference**: Use `list_data` if none provided
2. **Generate profile**: Use `describe` for statistics
3. **Quality assessment**: Identify null columns, potential issues
4. **Report**: Statistics, types, memory usage, quality score

## Examples

@@ -48,9 +31,8 @@ Generate statistical profile and quality report for a DataFrame.
/profile df_a1b2c3d4
```

## Available Tools
## Required MCP Tools

Use these MCP tools:
- `describe` - Get statistical summary
- `head` - Preview first rows
- `list_data` - List available DataFrames

@@ -1,18 +1,13 @@
# /run - Execute dbt Models

## Skills to Load
- skills/dbt-workflow.md
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · dbt Run                                       │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the execution.

Run dbt models with automatic pre-validation.
Display header: `DATA-PLATFORM - dbt Run`

## Usage

@@ -22,46 +17,28 @@ Run dbt models with automatic pre-validation.

## Workflow

1. **Pre-validation** (MANDATORY):
   - Use `dbt_parse` to validate project
   - Check for deprecated syntax (dbt 1.9+)
   - If validation fails, show errors and STOP
Execute `skills/dbt-workflow.md` run workflow:

2. **Execute models**:
   - Use `dbt_run` with provided selection
   - Monitor progress and capture output
1. **Pre-validation** (MANDATORY): Run `dbt_parse` first
2. **Execute models**: Use `dbt_run` with selection
3. **Report results**: Status, execution time, row counts

3. **Report results**:
   - Success/failure status per model
   - Execution time
   - Row counts where available
   - Any warnings or errors
## Selection Syntax

See `skills/dbt-workflow.md` for full selection patterns.

## Examples

```
/run                 # Run all models
/run dim_customers   # Run specific model
/run +fct_orders     # Run model and its upstream
/run +fct_orders     # Run model and upstream
/run tag:daily       # Run models with tag
/run --full-refresh  # Rebuild incremental models
```

## Selection Syntax
## Required MCP Tools

| Pattern | Meaning |
|---------|---------|
| `model_name` | Run single model |
| `+model_name` | Run model and upstream |
| `model_name+` | Run model and downstream |
| `+model_name+` | Run model with all deps |
| `tag:name` | Run by tag |
| `path:models/staging` | Run by path |

## Available Tools

Use these MCP tools:
- `dbt_parse` - Pre-validation (ALWAYS RUN FIRST)
- `dbt_run` - Execute models
- `dbt_build` - Run + test
- `dbt_test` - Run tests only

@@ -1,18 +1,12 @@
# /schema - Schema Exploration

## Skills to Load
- skills/mcp-tools-reference.md
- skills/visual-header.md

## Visual Output

When executing this command, display the plugin header:

```
┌──────────────────────────────────────────────────────────────────┐
│ 📊 DATA-PLATFORM · Schema Explorer                               │
└──────────────────────────────────────────────────────────────────┘
```

Then proceed with the exploration.

Display schema information for database tables or DataFrames.
Display header: `DATA-PLATFORM - Schema Explorer`

## Usage

@@ -23,23 +17,15 @@ Display schema information for database tables or DataFrames.
## Workflow

1. **Determine target**:
   - If argument is a loaded data_ref, show DataFrame schema
   - If argument is a table name, query database schema
   - If no argument, list all available tables and DataFrames
   - DataFrame: show pandas schema via `describe`
   - Database table: query via `pg_columns`
   - No argument: list all tables and DataFrames

2. **For DataFrames**:
   - Use `describe` to get column info
   - Show dtypes, null counts, sample values
2. **For DataFrames**: Show dtypes, null counts, sample values

3. **For database tables**:
   - Use `pg_columns` for column details
   - Use `st_tables` to check for PostGIS columns
   - Show constraints and indexes if available
3. **For database tables**: Show columns, types, constraints, indexes

4. **Report**:
   - Column name, type, nullable, default
   - For PostGIS: geometry type, SRID
   - For DataFrames: pandas dtype, null percentage
4. **For PostGIS**: Include geometry type and SRID via `st_tables`

## Examples

@@ -49,9 +35,8 @@ Display schema information for database tables or DataFrames.
/schema sales_data   # Show DataFrame schema
```

## Available Tools
## Required MCP Tools

Use these MCP tools:
- `pg_tables` - List database tables
- `pg_columns` - Get column info
- `pg_schemas` - List schemas

plugins/data-platform/skills/data-profiling.md (new file, 72 lines)
@@ -0,0 +1,72 @@
# Data Profiling

## Profiling Workflow

1. **Get data reference** via `list_data`
2. **Generate statistics** via `describe`
3. **Analyze quality** (nulls, duplicates, types, outliers)
4. **Calculate score** and generate report

## Quality Checks

### Null Analysis
- Calculate null percentage per column
- **PASS**: < 5% nulls
- **WARN**: 5-20% nulls
- **FAIL**: > 20% nulls

### Duplicate Detection
- Check for fully duplicated rows
- **PASS**: 0% duplicates
- **WARN**: < 1% duplicates
- **FAIL**: >= 1% duplicates

### Type Consistency
- Identify mixed-type columns
- Flag numeric columns with string values
- **PASS**: Consistent types
- **FAIL**: Mixed types detected
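
For reference, a minimal pandas sketch of these three checks (illustrative only, not the plugin's implementation; `df` is any loaded DataFrame):

```python
import pandas as pd

def basic_checks(df: pd.DataFrame) -> dict:
    """Null %, duplicate-row %, and mixed-type columns per the checks above."""
    null_pct = df.isna().mean() * 100        # per-column null percentage
    dup_pct = df.duplicated().mean() * 100   # % of rows duplicating an earlier row
    mixed = [c for c in df.columns
             if df[c].dtype == object
             and df[c].dropna().map(type).nunique() > 1]
    return {"null_pct": null_pct, "dup_pct": dup_pct, "mixed_type_cols": mixed}
```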

### Outlier Detection (IQR Method)
- Calculate Q1, Q3, IQR = Q3 - Q1
- Outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
- **PASS**: < 1% outliers
- **WARN**: 1-5% outliers
- **FAIL**: > 5% outliers
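
A minimal sketch of the IQR rule in pandas (illustrative; `s` is one numeric column):

```python
import pandas as pd

def iqr_outlier_pct(s: pd.Series) -> float:
    """Percent of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    outliers = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
    return 100 * outliers.mean()
```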

## Quality Scoring

| Component | Weight | Formula |
|-----------|--------|---------|
| Nulls | 30% | 100 - (avg_null_pct * 2) |
| Duplicates | 20% | 100 - (dup_pct * 50) |
| Type consistency | 25% | 100 if clean, 0 if mixed |
| Outliers | 25% | 100 - (avg_outlier_pct * 10) |

Final score: Weighted average, capped at 0-100
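
As a sketch, the weighted score can be computed as below (assumes the component percentages were already measured; each term is clamped to 0-100 before weighting):

```python
def quality_score(avg_null_pct: float, dup_pct: float,
                  types_clean: bool, avg_outlier_pct: float) -> int:
    """Weighted average of the four components from the table above."""
    clamp = lambda x: max(0.0, min(100.0, x))
    nulls = clamp(100 - avg_null_pct * 2)         # weight 30%
    dups = clamp(100 - dup_pct * 50)              # weight 20%
    types = 100.0 if types_clean else 0.0         # weight 25%
    outliers = clamp(100 - avg_outlier_pct * 10)  # weight 25%
    return round(0.30 * nulls + 0.20 * dups + 0.25 * types + 0.25 * outliers)
```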

## Report Format

```
=== Data Quality Report ===
Dataset: [data_ref]
Rows: X | Columns: Y
Overall Score: XX/100 [PASS/WARN/FAIL]

--- Column Analysis ---
| Column | Nulls | Dups | Type | Outliers | Status |
|--------|-------|------|------|----------|--------|
| col1   | X.X%  | -    | type | X.X%     | PASS   |

--- Issues Found ---
[WARN/FAIL] Column 'X': Issue description

--- Recommendations ---
1. Suggested remediation steps
```

## Strict Mode

With `--strict` flag:
- **WARN** at 1% nulls (vs 5%)
- **FAIL** at 5% nulls (vs 20%)

plugins/data-platform/skills/dbt-workflow.md (new file, 85 lines)
@@ -0,0 +1,85 @@
# dbt Workflow

## Pre-Validation (MANDATORY)

**Always run `dbt_parse` before any dbt operation.**

This validates:
- dbt_project.yml syntax
- Model SQL syntax
- schema.yml definitions
- Deprecated syntax (dbt 1.9+)

If validation fails, show errors and STOP.

## Model Selection Syntax

| Pattern | Meaning |
|---------|---------|
| `model_name` | Single model |
| `+model_name` | Model and upstream dependencies |
| `model_name+` | Model and downstream dependents |
| `+model_name+` | Model with all dependencies |
| `tag:name` | Models with specific tag |
| `path:models/staging` | Models in path |
| `test_type:schema` | Schema tests only |
| `test_type:data` | Data tests only |

## Execution Workflow

1. **Parse**: `dbt_parse` - Validate project
2. **Run**: `dbt_run` - Execute models
3. **Test**: `dbt_test` - Run tests
4. **Build**: `dbt_build` - Run + test together

## Test Types

### Schema Tests
Defined in `schema.yml`:
- `unique` - No duplicate values
- `not_null` - No null values
- `accepted_values` - Value in allowed list
- `relationships` - Foreign key integrity

### Data Tests
Custom SQL in `tests/` directory:
- Return rows that fail assertion
- Zero rows = pass, any rows = fail

## Materialization Types

| Type | Description |
|------|-------------|
| `view` | Virtual table, always fresh |
| `table` | Physical table, full rebuild |
| `incremental` | Append/merge new rows only |
| `ephemeral` | CTE, no physical object |

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Test/run failure |
| 2 | dbt error (parse failure) |

## Result Formatting

```
=== dbt [Operation] Results ===
Project: [project_name]
Selection: [selection_pattern]

--- Summary ---
Total: X models/tests
PASS: X (%)
FAIL: X (%)
WARN: X (%)
SKIP: X (%)

--- Details ---
[Model/Test details with status]

--- Failure Details ---
[Error messages and remediation]
```

plugins/data-platform/skills/lineage-analysis.md (new file, 73 lines)
@@ -0,0 +1,73 @@
# Lineage Analysis

## Lineage Workflow

1. **Get lineage data** via `dbt_lineage`
2. **Build dependency graph** (upstream + downstream)
3. **Visualize** (ASCII tree or Mermaid)
4. **Report** critical path and refresh implications
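
A minimal sketch of steps 2-3, assuming the `dbt_lineage` output has been reduced to a plain adjacency dict (hypothetical structure, shown for illustration only):

```python
def print_tree(graph: dict, node: str, depth: int = 0, max_depth: int = 3) -> None:
    """Render a dependency tree in the indented ASCII format below."""
    print("    " * depth + f"|-- {node}")
    if depth < max_depth:
        for child in graph.get(node, []):
            print_tree(graph, child, depth + 1, max_depth)

# Hypothetical downstream graph
graph = {"stg_orders": ["fct_orders"], "fct_orders": ["rpt_revenue"]}
print_tree(graph, "stg_orders")
```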

## ASCII Tree Format

```
Sources:
|-- raw_customers (source)
|-- raw_orders (source)

model_name (materialization)
|-- upstream:
|   |-- stg_model (view)
|   |-- raw_source (source)
|-- downstream:
    |-- fct_model (incremental)
    |-- rpt_model (table)
```

## Mermaid Diagram Format

```mermaid
flowchart LR
    subgraph Sources
        raw_data[(raw_data)]
    end

    subgraph Staging
        stg_model[stg_model]
    end

    subgraph Marts
        dim_model{{dim_model}}
    end

    raw_data --> stg_model
    stg_model --> dim_model
```

## Mermaid Node Shapes

| Materialization | Shape | Syntax |
|-----------------|-------|--------|
| source | Cylinder | `[(name)]` |
| view | Rectangle | `[name]` |
| table | Double braces | `{{name}}` |
| incremental | Hexagon | `{{name}}` |
| ephemeral | Dashed | `[/name/]` |

## Mermaid Options

| Flag | Description |
|------|-------------|
| `--direction TB` | Top-to-bottom (default: LR) |
| `--depth N` | Limit lineage depth |

## Styling Target Model

```mermaid
style target_model fill:#f96,stroke:#333,stroke-width:2px
```

## Usage Tips

1. **Documentation**: Copy Mermaid to README.md
2. **GitHub/GitLab**: Both render Mermaid natively
3. **Live Editor**: https://mermaid.live for interactive editing

plugins/data-platform/skills/mcp-tools-reference.md (new file, 69 lines)
@@ -0,0 +1,69 @@
# MCP Tools Reference

## pandas Tools

| Tool | Description |
|------|-------------|
| `read_csv` | Load CSV file into DataFrame |
| `read_parquet` | Load Parquet file into DataFrame |
| `read_json` | Load JSON/JSONL file into DataFrame |
| `to_csv` | Export DataFrame to CSV |
| `to_parquet` | Export DataFrame to Parquet |
| `describe` | Get statistical summary (count, mean, std, min, max) |
| `head` | Preview first N rows |
| `tail` | Preview last N rows |
| `filter` | Filter rows by condition |
| `select` | Select specific columns |
| `groupby` | Aggregate data by columns |
| `join` | Join two DataFrames |
| `list_data` | List all loaded DataFrames |
| `drop_data` | Remove DataFrame from memory |

## PostgreSQL Tools

| Tool | Description |
|------|-------------|
| `pg_connect` | Establish database connection |
| `pg_query` | Execute SELECT query, return DataFrame |
| `pg_execute` | Execute INSERT/UPDATE/DELETE |
| `pg_tables` | List tables in schema |
| `pg_columns` | Get column info for table |
| `pg_schemas` | List available schemas |

## PostGIS Tools

| Tool | Description |
|------|-------------|
| `st_tables` | List tables with geometry columns |
| `st_geometry_type` | Get geometry type for column |
| `st_srid` | Get SRID for geometry column |
| `st_extent` | Get bounding box for geometry |

## dbt Tools

| Tool | Description |
|------|-------------|
| `dbt_parse` | Validate project (ALWAYS RUN FIRST) |
| `dbt_run` | Execute models |
| `dbt_test` | Run tests |
| `dbt_build` | Run + test together |
| `dbt_compile` | Compile SQL without execution |
| `dbt_ls` | List dbt resources |
| `dbt_docs_generate` | Generate documentation manifest |
| `dbt_lineage` | Get model dependencies |

## Tool Selection Guidelines

**For data loading:**
- Files: `read_csv`, `read_parquet`, `read_json`
- Database: `pg_query`

**For data exploration:**
- Schema: `describe`, `pg_columns`, `st_tables`
- Preview: `head`, `tail`
- Available data: `list_data`, `pg_tables`

**For dbt operations:**
- Always start with `dbt_parse` for validation
- Use `dbt_lineage` for dependency analysis
- Use `dbt_compile` to see rendered SQL

plugins/data-platform/skills/setup-workflow.md (new file, 108 lines)
@@ -0,0 +1,108 @@
# Setup Workflow

## Important Context

- **This workflow uses Bash, Read, Write, AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them

## Phase 1: Environment Validation

### Check Python Version
```bash
python3 --version
```
Requires Python 3.10+. If below, stop and inform user.

## Phase 2: MCP Server Setup

### Locate MCP Server
Check both paths:
```bash
# Installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/

# Source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/
```

### Check/Create Virtual Environment
```bash
# Check
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python

# Create if missing
cd /path/to/mcp-servers/data-platform
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
deactivate
```

## Phase 3: PostgreSQL Configuration (Optional)

### Config Location
`~/.config/claude/postgres.env`

### Config Format
```bash
# PostgreSQL Configuration
POSTGRES_URL=postgresql://user:pass@host:5432/db
```

Set permissions: `chmod 600 ~/.config/claude/postgres.env`

### Test Connection
```bash
source ~/.config/claude/postgres.env && python3 -c "
import asyncio, asyncpg
async def test():
    conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
    ver = await conn.fetchval('SELECT version()')
    await conn.close()
    print(f'SUCCESS: {ver.split(\",\")[0]}')
asyncio.run(test())
"
```

## Phase 4: dbt Configuration (Optional)

dbt is **project-level** (auto-detected via `dbt_project.yml`).

For subdirectory projects, set in `.env`:
```
DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
```

### Check dbt Installation
```bash
dbt --version
```

## Phase 5: Validation

### Verify MCP Server
```bash
cd /path/to/mcp-servers/data-platform
.venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('OK')"
```

## Memory Limits

Default: 100,000 rows per DataFrame

Override in project `.env`:
```
DATA_PLATFORM_MAX_ROWS=500000
```

For larger datasets:
- Use chunked processing (`chunk_size` parameter); see the sketch below
- Filter data before loading
- Store to Parquet for efficient re-loading
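
A sketch of the chunked approach in plain pandas (illustrative; the MCP `read_csv` tool's `chunk_size` parameter is the plugin-side equivalent, and the file name and filter column here are hypothetical):

```python
import pandas as pd

# Filter while loading so the combined result stays under the row limit
chunks = pd.read_csv("large_file.csv", chunksize=100_000)
df = pd.concat(chunk[chunk["amount"] > 0] for chunk in chunks)

# Store to Parquet once for efficient re-loading later
df.to_parquet("filtered.parquet")
```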

## Session Restart

After setup, restart Claude Code session for MCP tools to become available.

plugins/data-platform/skills/visual-header.md (new file, 45 lines)
@@ -0,0 +1,45 @@
# Visual Header

## Standard Format

Display at the start of every command execution:

```
+----------------------------------------------------------------------+
| DATA-PLATFORM - [Command Name]                                       |
+----------------------------------------------------------------------+
```

## Command Headers

| Command | Header Text |
|---------|-------------|
| initial-setup | Setup Wizard |
| ingest | Ingest |
| profile | Data Profile |
| schema | Schema Explorer |
| data-quality | Data Quality |
| run | dbt Run |
| dbt-test | dbt Tests |
| lineage | Lineage |
| lineage-viz | Lineage Visualization |
| explain | Model Explanation |

## Summary Box Format

For completion summaries:

```
+============================================================+
| DATA-PLATFORM [OPERATION] COMPLETE                         |
+============================================================+
| Component: [Status]                                        |
| Component: [Status]                                        |
+============================================================+
```

## Status Indicators

- Success: `[check]` or `Ready`
- Warning: `[!]` or `Partial`
- Failure: `[X]` or `Failed`