feat: add data-platform plugin (v4.0.0)
Add new data-platform plugin for data engineering workflows with: MCP Server (32 tools): - pandas operations (14 tools): read_csv, read_parquet, read_json, to_csv, to_parquet, describe, head, tail, filter, select, groupby, join, list_data, drop_data - PostgreSQL/PostGIS (10 tools): pg_connect, pg_query, pg_execute, pg_tables, pg_columns, pg_schemas, st_tables, st_geometry_type, st_srid, st_extent - dbt integration (8 tools): dbt_parse, dbt_run, dbt_test, dbt_build, dbt_compile, dbt_ls, dbt_docs_generate, dbt_lineage Plugin Features: - Arrow IPC data_ref system for DataFrame persistence across tool calls - Pre-execution validation for dbt with `dbt parse` - SessionStart hook for PostgreSQL connectivity check (non-blocking) - Hybrid configuration (system ~/.config/claude/postgres.env + project .env) - Memory management with 100k row limit and chunking support Commands: /initial-setup, /ingest, /profile, /schema, /explain, /lineage, /run Agents: data-ingestion, data-analysis Test suite: 71 tests covering config, data store, pandas, postgres, dbt tools Addresses data workflow issues from personal-portfolio project: - Lost data after multiple interactions (solved by Arrow IPC data_ref) - dbt 1.9+ syntax deprecation (solved by pre-execution validation) - Ungraceful PostgreSQL error handling (solved by SessionStart hook) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
25
plugins/data-platform/.claude-plugin/plugin.json
Normal file
25
plugins/data-platform/.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,25 @@
|
||||
{
|
||||
"name": "data-platform",
|
||||
"version": "1.0.0",
|
||||
"description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration",
|
||||
"author": {
|
||||
"name": "Leo Miranda",
|
||||
"email": "leobmiranda@gmail.com"
|
||||
},
|
||||
"homepage": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace/src/branch/main/plugins/data-platform/README.md",
|
||||
"repository": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace.git",
|
||||
"license": "MIT",
|
||||
"keywords": [
|
||||
"pandas",
|
||||
"postgresql",
|
||||
"postgis",
|
||||
"dbt",
|
||||
"data-engineering",
|
||||
"etl",
|
||||
"dataframe"
|
||||
],
|
||||
"hooks": "hooks/hooks.json",
|
||||
"commands": ["./commands/"],
|
||||
"agents": ["./agents/"],
|
||||
"mcpServers": ["./.mcp.json"]
|
||||
}
|
||||
10
plugins/data-platform/.mcp.json
Normal file
10
plugins/data-platform/.mcp.json
Normal file
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"mcpServers": {
|
||||
"data-platform": {
|
||||
"type": "stdio",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python",
|
||||
"args": ["-m", "mcp_server.server"],
|
||||
"cwd": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform"
|
||||
}
|
||||
}
|
||||
}
|
||||
119
plugins/data-platform/README.md
Normal file
119
plugins/data-platform/README.md
Normal file
@@ -0,0 +1,119 @@
|
||||
# data-platform Plugin
|
||||
|
||||
Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration for Claude Code.
|
||||
|
||||
## Features
|
||||
|
||||
- **pandas Operations**: Load, transform, and export DataFrames with persistent data_ref system
|
||||
- **PostgreSQL/PostGIS**: Database queries with connection pooling and spatial data support
|
||||
- **dbt Integration**: Build tool wrapper with pre-execution validation
|
||||
|
||||
## Installation
|
||||
|
||||
This plugin is part of the leo-claude-mktplace. Install via:
|
||||
|
||||
```bash
|
||||
# From marketplace
|
||||
claude plugins install leo-claude-mktplace/data-platform
|
||||
|
||||
# Setup MCP server venv
|
||||
cd ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform
|
||||
python -m venv .venv
|
||||
source .venv/bin/activate
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
### PostgreSQL (Optional)
|
||||
|
||||
Create `~/.config/claude/postgres.env`:
|
||||
|
||||
```env
|
||||
POSTGRES_URL=postgresql://user:password@host:5432/database
|
||||
```
|
||||
|
||||
### dbt (Optional)
|
||||
|
||||
Add to project `.env`:
|
||||
|
||||
```env
|
||||
DBT_PROJECT_DIR=/path/to/dbt/project
|
||||
DBT_PROFILES_DIR=~/.dbt
|
||||
```
|
||||
|
||||
## Commands
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/initial-setup` | Interactive setup wizard for PostgreSQL and dbt configuration |
|
||||
| `/ingest` | Load data from files or database |
|
||||
| `/profile` | Generate data profile and statistics |
|
||||
| `/schema` | Show database/DataFrame schema |
|
||||
| `/explain` | Explain dbt model lineage |
|
||||
| `/lineage` | Visualize data dependencies |
|
||||
| `/run` | Execute dbt models |
|
||||
|
||||
## Agents
|
||||
|
||||
| Agent | Description |
|
||||
|-------|-------------|
|
||||
| `data-ingestion` | Data loading and transformation specialist |
|
||||
| `data-analysis` | Exploration and profiling specialist |
|
||||
|
||||
## data_ref System
|
||||
|
||||
All DataFrame operations use a `data_ref` system for persistence:
|
||||
|
||||
```
|
||||
# Load returns a reference
|
||||
read_csv("data.csv") → {"data_ref": "sales_data"}
|
||||
|
||||
# Use reference in subsequent operations
|
||||
filter("sales_data", "amount > 100") → {"data_ref": "sales_data_filtered"}
|
||||
describe("sales_data_filtered") → {statistics}
|
||||
```
|
||||
|
||||
## Example Workflow
|
||||
|
||||
```
|
||||
/ingest data/sales.csv
|
||||
# → Loaded 50,000 rows as "sales_data"
|
||||
|
||||
/profile sales_data
|
||||
# → Statistical summary, null counts, quality assessment
|
||||
|
||||
/schema orders
|
||||
# → Column names, types, constraints
|
||||
|
||||
/lineage fct_orders
|
||||
# → Dependency graph showing upstream/downstream models
|
||||
|
||||
/run dim_customers
|
||||
# → Pre-validates then executes dbt model
|
||||
```
|
||||
|
||||
## Tools Summary
|
||||
|
||||
### pandas (14 tools)
|
||||
`read_csv`, `read_parquet`, `read_json`, `to_csv`, `to_parquet`, `describe`, `head`, `tail`, `filter`, `select`, `groupby`, `join`, `list_data`, `drop_data`
|
||||
|
||||
### PostgreSQL (6 tools)
|
||||
`pg_connect`, `pg_query`, `pg_execute`, `pg_tables`, `pg_columns`, `pg_schemas`
|
||||
|
||||
### PostGIS (4 tools)
|
||||
`st_tables`, `st_geometry_type`, `st_srid`, `st_extent`
|
||||
|
||||
### dbt (8 tools)
|
||||
`dbt_parse`, `dbt_run`, `dbt_test`, `dbt_build`, `dbt_compile`, `dbt_ls`, `dbt_docs_generate`, `dbt_lineage`
|
||||
|
||||
## Memory Management
|
||||
|
||||
- Default limit: 100,000 rows per DataFrame
|
||||
- Configure via `DATA_PLATFORM_MAX_ROWS` environment variable
|
||||
- Use `chunk_size` parameter for large files
|
||||
- Monitor with `list_data` tool
|
||||
|
||||
## SessionStart Hook
|
||||
|
||||
On session start, the plugin checks PostgreSQL connectivity and displays a warning if unavailable. This is non-blocking - pandas and dbt tools remain available.
|
||||
98
plugins/data-platform/agents/data-analysis.md
Normal file
98
plugins/data-platform/agents/data-analysis.md
Normal file
@@ -0,0 +1,98 @@
|
||||
# Data Analysis Agent
|
||||
|
||||
You are a data analysis specialist. Your role is to help users explore, profile, and understand their data.
|
||||
|
||||
## Capabilities
|
||||
|
||||
- Profile datasets with statistical summaries
|
||||
- Explore database schemas and structures
|
||||
- Analyze dbt model lineage and dependencies
|
||||
- Provide data quality assessments
|
||||
- Generate insights and recommendations
|
||||
|
||||
## Available Tools
|
||||
|
||||
### Data Exploration
|
||||
- `describe` - Statistical summary
|
||||
- `head` - Preview first rows
|
||||
- `tail` - Preview last rows
|
||||
- `list_data` - List available DataFrames
|
||||
|
||||
### Database Exploration
|
||||
- `pg_connect` - Check database connection
|
||||
- `pg_tables` - List all tables
|
||||
- `pg_columns` - Get column details
|
||||
- `pg_schemas` - List schemas
|
||||
|
||||
### PostGIS Exploration
|
||||
- `st_tables` - List spatial tables
|
||||
- `st_geometry_type` - Get geometry type
|
||||
- `st_srid` - Get coordinate system
|
||||
- `st_extent` - Get bounding box
|
||||
|
||||
### dbt Analysis
|
||||
- `dbt_lineage` - Model dependencies
|
||||
- `dbt_ls` - List resources
|
||||
- `dbt_compile` - View compiled SQL
|
||||
- `dbt_docs_generate` - Generate docs
|
||||
|
||||
## Workflow Guidelines
|
||||
|
||||
1. **Understand the question**:
|
||||
- What does the user want to know?
|
||||
- What data is available?
|
||||
- What level of detail is needed?
|
||||
|
||||
2. **Explore the data**:
|
||||
- Start with `list_data` or `pg_tables`
|
||||
- Get schema info with `describe` or `pg_columns`
|
||||
- Preview with `head` to understand content
|
||||
|
||||
3. **Profile thoroughly**:
|
||||
- Use `describe` for statistics
|
||||
- Check for nulls, outliers, patterns
|
||||
- Note data quality issues
|
||||
|
||||
4. **Analyze dependencies** (for dbt):
|
||||
- Use `dbt_lineage` to trace data flow
|
||||
- Understand transformations
|
||||
- Identify critical paths
|
||||
|
||||
5. **Provide insights**:
|
||||
- Summarize findings clearly
|
||||
- Highlight potential issues
|
||||
- Recommend next steps
|
||||
|
||||
## Analysis Patterns
|
||||
|
||||
### Data Quality Check
|
||||
1. `describe` - Get statistics
|
||||
2. Check null percentages
|
||||
3. Identify outliers (min/max vs mean)
|
||||
4. Flag suspicious patterns
|
||||
|
||||
### Schema Comparison
|
||||
1. `pg_columns` - Get table A schema
|
||||
2. `pg_columns` - Get table B schema
|
||||
3. Compare column names, types
|
||||
4. Identify mismatches
|
||||
|
||||
### Lineage Analysis
|
||||
1. `dbt_lineage` - Get model graph
|
||||
2. Trace upstream sources
|
||||
3. Identify downstream impact
|
||||
4. Document critical path
|
||||
|
||||
## Example Interactions
|
||||
|
||||
**User**: What's in the sales_data DataFrame?
|
||||
**Agent**: Uses `describe`, `head`, explains columns, statistics, patterns
|
||||
|
||||
**User**: What tables are in the database?
|
||||
**Agent**: Uses `pg_tables`, shows list with column counts
|
||||
|
||||
**User**: How does the dim_customers model work?
|
||||
**Agent**: Uses `dbt_lineage`, `dbt_compile`, explains dependencies and SQL
|
||||
|
||||
**User**: Is there any spatial data?
|
||||
**Agent**: Uses `st_tables`, shows PostGIS tables with geometry types
|
||||
81
plugins/data-platform/agents/data-ingestion.md
Normal file
81
plugins/data-platform/agents/data-ingestion.md
Normal file
@@ -0,0 +1,81 @@
|
||||
# Data Ingestion Agent
|
||||
|
||||
You are a data ingestion specialist. Your role is to help users load, transform, and prepare data for analysis.
|
||||
|
||||
## Capabilities
|
||||
|
||||
- Load data from CSV, Parquet, JSON files
|
||||
- Query PostgreSQL databases
|
||||
- Transform data using filter, select, groupby, join operations
|
||||
- Export data to various formats
|
||||
- Handle large datasets with chunking
|
||||
|
||||
## Available Tools
|
||||
|
||||
### File Operations
|
||||
- `read_csv` - Load CSV files with optional chunking
|
||||
- `read_parquet` - Load Parquet files
|
||||
- `read_json` - Load JSON/JSONL files
|
||||
- `to_csv` - Export to CSV
|
||||
- `to_parquet` - Export to Parquet
|
||||
|
||||
### Data Transformation
|
||||
- `filter` - Filter rows by condition
|
||||
- `select` - Select specific columns
|
||||
- `groupby` - Group and aggregate
|
||||
- `join` - Join two DataFrames
|
||||
|
||||
### Database Operations
|
||||
- `pg_query` - Execute SELECT queries
|
||||
- `pg_execute` - Execute INSERT/UPDATE/DELETE
|
||||
- `pg_tables` - List available tables
|
||||
|
||||
### Management
|
||||
- `list_data` - List all stored DataFrames
|
||||
- `drop_data` - Remove DataFrame from store
|
||||
|
||||
## Workflow Guidelines
|
||||
|
||||
1. **Understand the data source**:
|
||||
- Ask about file location/format
|
||||
- For database, understand table structure
|
||||
- Clarify any filters or transformations needed
|
||||
|
||||
2. **Load data efficiently**:
|
||||
- Use appropriate reader for file format
|
||||
- For large files (>100k rows), use chunking
|
||||
- Name DataFrames meaningfully
|
||||
|
||||
3. **Transform as needed**:
|
||||
- Apply filters early to reduce data size
|
||||
- Select only needed columns
|
||||
- Join related datasets
|
||||
|
||||
4. **Validate results**:
|
||||
- Check row counts after transformations
|
||||
- Verify data types are correct
|
||||
- Preview results with `head`
|
||||
|
||||
5. **Store with meaningful names**:
|
||||
- Use descriptive data_ref names
|
||||
- Document the source and transformations
|
||||
|
||||
## Memory Management
|
||||
|
||||
- Default row limit: 100,000 rows
|
||||
- For larger datasets, suggest:
|
||||
- Filtering before loading
|
||||
- Using chunk_size parameter
|
||||
- Aggregating to reduce size
|
||||
- Storing to Parquet for efficient retrieval
|
||||
|
||||
## Example Interactions
|
||||
|
||||
**User**: Load the sales data from data/sales.csv
|
||||
**Agent**: Uses `read_csv` to load, reports data_ref, row count, columns
|
||||
|
||||
**User**: Filter to only Q4 2024 sales
|
||||
**Agent**: Uses `filter` with date condition, stores filtered result
|
||||
|
||||
**User**: Join with customer data
|
||||
**Agent**: Uses `join` to combine, validates result counts
|
||||
90
plugins/data-platform/claude-md-integration.md
Normal file
90
plugins/data-platform/claude-md-integration.md
Normal file
@@ -0,0 +1,90 @@
|
||||
# data-platform Plugin - CLAUDE.md Integration
|
||||
|
||||
Add this section to your project's CLAUDE.md to enable data-platform plugin features.
|
||||
|
||||
## Suggested CLAUDE.md Section
|
||||
|
||||
```markdown
|
||||
## Data Platform Integration
|
||||
|
||||
This project uses the data-platform plugin for data engineering workflows.
|
||||
|
||||
### Configuration
|
||||
|
||||
**PostgreSQL**: Credentials in `~/.config/claude/postgres.env`
|
||||
**dbt**: Project path auto-detected from `dbt_project.yml`
|
||||
|
||||
### Available Commands
|
||||
|
||||
| Command | Purpose |
|
||||
|---------|---------|
|
||||
| `/ingest` | Load data from files or database |
|
||||
| `/profile` | Generate statistical profile |
|
||||
| `/schema` | Show schema information |
|
||||
| `/explain` | Explain dbt model |
|
||||
| `/lineage` | Show data lineage |
|
||||
| `/run` | Execute dbt models |
|
||||
|
||||
### data_ref Convention
|
||||
|
||||
DataFrames are stored with references. Use meaningful names:
|
||||
- `raw_*` for source data
|
||||
- `stg_*` for staged/cleaned data
|
||||
- `dim_*` for dimension tables
|
||||
- `fct_*` for fact tables
|
||||
- `rpt_*` for reports
|
||||
|
||||
### dbt Workflow
|
||||
|
||||
1. Always validate before running: `/run` includes automatic `dbt_parse`
|
||||
2. For dbt 1.9+, check for deprecated syntax before commits
|
||||
3. Use `/lineage` to understand impact of changes
|
||||
|
||||
### Database Access
|
||||
|
||||
PostgreSQL tools require POSTGRES_URL configuration:
|
||||
- Read-only queries: `pg_query`
|
||||
- Write operations: `pg_execute`
|
||||
- Schema exploration: `pg_tables`, `pg_columns`
|
||||
|
||||
PostGIS spatial data:
|
||||
- List spatial tables: `st_tables`
|
||||
- Check geometry: `st_geometry_type`, `st_srid`, `st_extent`
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Add to project `.env` if needed:
|
||||
|
||||
```env
|
||||
# dbt configuration
|
||||
DBT_PROJECT_DIR=./transform
|
||||
DBT_PROFILES_DIR=~/.dbt
|
||||
|
||||
# Memory limits
|
||||
DATA_PLATFORM_MAX_ROWS=100000
|
||||
```
|
||||
|
||||
## Typical Workflows
|
||||
|
||||
### Data Exploration
|
||||
```
|
||||
/ingest data/raw_customers.csv
|
||||
/profile raw_customers
|
||||
/schema
|
||||
```
|
||||
|
||||
### ETL Development
|
||||
```
|
||||
/schema orders # Understand source
|
||||
/explain stg_orders # Understand transformation
|
||||
/run stg_orders # Test the model
|
||||
/lineage fct_orders # Check downstream impact
|
||||
```
|
||||
|
||||
### Database Analysis
|
||||
```
|
||||
/schema # List all tables
|
||||
pg_columns orders # Detailed schema
|
||||
st_tables # Find spatial data
|
||||
```
|
||||
44
plugins/data-platform/commands/explain.md
Normal file
44
plugins/data-platform/commands/explain.md
Normal file
@@ -0,0 +1,44 @@
|
||||
# /explain - dbt Model Explanation
|
||||
|
||||
Explain a dbt model's purpose, dependencies, and SQL logic.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/explain <model_name>
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Get model info**:
|
||||
- Use `dbt_lineage` to get model metadata
|
||||
- Extract description, tags, materialization
|
||||
|
||||
2. **Analyze dependencies**:
|
||||
- Show upstream models (what this depends on)
|
||||
- Show downstream models (what depends on this)
|
||||
- Visualize as dependency tree
|
||||
|
||||
3. **Compile SQL**:
|
||||
- Use `dbt_compile` to get rendered SQL
|
||||
- Explain key transformations
|
||||
|
||||
4. **Report**:
|
||||
- Model purpose (from description)
|
||||
- Materialization strategy
|
||||
- Dependency graph
|
||||
- Key SQL logic explained
|
||||
|
||||
## Examples
|
||||
|
||||
```
|
||||
/explain dim_customers
|
||||
/explain fct_orders
|
||||
```
|
||||
|
||||
## Available Tools
|
||||
|
||||
Use these MCP tools:
|
||||
- `dbt_lineage` - Get model dependencies
|
||||
- `dbt_compile` - Get compiled SQL
|
||||
- `dbt_ls` - List related resources
|
||||
44
plugins/data-platform/commands/ingest.md
Normal file
44
plugins/data-platform/commands/ingest.md
Normal file
@@ -0,0 +1,44 @@
|
||||
# /ingest - Data Ingestion
|
||||
|
||||
Load data from files or database into the data platform.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/ingest [source]
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Identify data source**:
|
||||
- If source is a file path, determine format (CSV, Parquet, JSON)
|
||||
- If source is "db" or a table name, query PostgreSQL
|
||||
|
||||
2. **Load data**:
|
||||
- For files: Use `read_csv`, `read_parquet`, or `read_json`
|
||||
- For database: Use `pg_query` with appropriate SELECT
|
||||
|
||||
3. **Validate**:
|
||||
- Check row count against limits
|
||||
- If exceeds 100k rows, suggest chunking or filtering
|
||||
|
||||
4. **Report**:
|
||||
- Show data_ref, row count, columns, and memory usage
|
||||
- Preview first few rows
|
||||
|
||||
## Examples
|
||||
|
||||
```
|
||||
/ingest data/sales.csv
|
||||
/ingest data/customers.parquet
|
||||
/ingest "SELECT * FROM orders WHERE created_at > '2024-01-01'"
|
||||
```
|
||||
|
||||
## Available Tools
|
||||
|
||||
Use these MCP tools:
|
||||
- `read_csv` - Load CSV files
|
||||
- `read_parquet` - Load Parquet files
|
||||
- `read_json` - Load JSON/JSONL files
|
||||
- `pg_query` - Query PostgreSQL database
|
||||
- `list_data` - List loaded DataFrames
|
||||
231
plugins/data-platform/commands/initial-setup.md
Normal file
231
plugins/data-platform/commands/initial-setup.md
Normal file
@@ -0,0 +1,231 @@
|
||||
---
|
||||
description: Interactive setup wizard for data-platform plugin - configures MCP server and optional PostgreSQL/dbt
|
||||
---
|
||||
|
||||
# Data Platform Setup Wizard
|
||||
|
||||
This command sets up the data-platform plugin with pandas, PostgreSQL, and dbt integration.
|
||||
|
||||
## Important Context
|
||||
|
||||
- **This command uses Bash, Read, Write, and AskUserQuestion tools** - NOT MCP tools
|
||||
- **MCP tools won't work until after setup + session restart**
|
||||
- **PostgreSQL and dbt are optional** - pandas tools work without them
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Environment Validation
|
||||
|
||||
### Step 1.1: Check Python Version
|
||||
|
||||
```bash
|
||||
python3 --version
|
||||
```
|
||||
|
||||
Requires Python 3.10+. If below, stop setup and inform user.
|
||||
|
||||
### Step 1.2: Check for Required Libraries
|
||||
|
||||
```bash
|
||||
python3 -c "import sys; print(f'Python {sys.version_info.major}.{sys.version_info.minor}')"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: MCP Server Setup
|
||||
|
||||
### Step 2.1: Locate Data Platform MCP Server
|
||||
|
||||
The MCP server should be at the marketplace root:
|
||||
|
||||
```bash
|
||||
# If running from installed marketplace
|
||||
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_INSTALLED"
|
||||
|
||||
# If running from source
|
||||
ls -la ~/claude-plugins-work/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_SOURCE"
|
||||
```
|
||||
|
||||
Determine the correct path based on which exists.
|
||||
|
||||
### Step 2.2: Check Virtual Environment
|
||||
|
||||
```bash
|
||||
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python 2>/dev/null && echo "VENV_EXISTS" || echo "VENV_MISSING"
|
||||
```
|
||||
|
||||
### Step 2.3: Create Virtual Environment (if missing)
|
||||
|
||||
```bash
|
||||
cd /path/to/mcp-servers/data-platform && python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt && deactivate
|
||||
```
|
||||
|
||||
**Note:** This may take a few minutes due to pandas, pyarrow, and dbt dependencies.
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: PostgreSQL Configuration (Optional)
|
||||
|
||||
### Step 3.1: Ask About PostgreSQL
|
||||
|
||||
Use AskUserQuestion:
|
||||
- Question: "Do you want to configure PostgreSQL database access?"
|
||||
- Header: "PostgreSQL"
|
||||
- Options:
|
||||
- "Yes, I have a PostgreSQL database"
|
||||
- "No, I'll only use pandas/dbt tools"
|
||||
|
||||
**If user chooses "No":** Skip to Phase 4.
|
||||
|
||||
### Step 3.2: Create Config Directory
|
||||
|
||||
```bash
|
||||
mkdir -p ~/.config/claude
|
||||
```
|
||||
|
||||
### Step 3.3: Check PostgreSQL Configuration
|
||||
|
||||
```bash
|
||||
cat ~/.config/claude/postgres.env 2>/dev/null || echo "FILE_NOT_FOUND"
|
||||
```
|
||||
|
||||
**If file exists with valid URL:** Skip to Step 3.6.
|
||||
**If missing or has placeholders:** Continue.
|
||||
|
||||
### Step 3.4: Gather PostgreSQL Information
|
||||
|
||||
Use AskUserQuestion:
|
||||
- Question: "What is your PostgreSQL connection URL format?"
|
||||
- Header: "DB Format"
|
||||
- Options:
|
||||
- "Standard: postgresql://user:pass@host:5432/db"
|
||||
- "PostGIS: postgresql://user:pass@host:5432/db (with PostGIS extension)"
|
||||
- "Other (I'll provide the full URL)"
|
||||
|
||||
Ask user to provide the connection URL.
|
||||
|
||||
### Step 3.5: Create Configuration File
|
||||
|
||||
```bash
|
||||
cat > ~/.config/claude/postgres.env << 'EOF'
|
||||
# PostgreSQL Configuration
|
||||
# Generated by data-platform /initial-setup
|
||||
|
||||
POSTGRES_URL=<USER_PROVIDED_URL>
|
||||
EOF
|
||||
chmod 600 ~/.config/claude/postgres.env
|
||||
```
|
||||
|
||||
### Step 3.6: Test PostgreSQL Connection (if configured)
|
||||
|
||||
```bash
|
||||
source ~/.config/claude/postgres.env && python3 -c "
|
||||
import asyncio
|
||||
import asyncpg
|
||||
async def test():
|
||||
try:
|
||||
conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
|
||||
ver = await conn.fetchval('SELECT version()')
|
||||
await conn.close()
|
||||
print(f'SUCCESS: {ver.split(\",\")[0]}')
|
||||
except Exception as e:
|
||||
print(f'FAILED: {e}')
|
||||
asyncio.run(test())
|
||||
"
|
||||
```
|
||||
|
||||
Report result:
|
||||
- SUCCESS: Connection works
|
||||
- FAILED: Show error and suggest fixes
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: dbt Configuration (Optional)
|
||||
|
||||
### Step 4.1: Ask About dbt
|
||||
|
||||
Use AskUserQuestion:
|
||||
- Question: "Do you use dbt for data transformations in your projects?"
|
||||
- Header: "dbt"
|
||||
- Options:
|
||||
- "Yes, I have dbt projects"
|
||||
- "No, I don't use dbt"
|
||||
|
||||
**If user chooses "No":** Skip to Phase 5.
|
||||
|
||||
### Step 4.2: dbt Discovery
|
||||
|
||||
dbt configuration is **project-level** (not system-level). The plugin auto-detects dbt projects by looking for `dbt_project.yml`.
|
||||
|
||||
Inform user:
|
||||
```
|
||||
dbt projects are detected automatically when you work in a directory
|
||||
containing dbt_project.yml.
|
||||
|
||||
If your dbt project is in a subdirectory, you can set DBT_PROJECT_DIR
|
||||
in your project's .env file:
|
||||
|
||||
DBT_PROJECT_DIR=./transform
|
||||
DBT_PROFILES_DIR=~/.dbt
|
||||
```
|
||||
|
||||
### Step 4.3: Check dbt Installation
|
||||
|
||||
```bash
|
||||
dbt --version 2>/dev/null || echo "DBT_NOT_FOUND"
|
||||
```
|
||||
|
||||
**If not found:** Inform user that dbt CLI tools require dbt-core to be installed globally or in the project.
|
||||
|
||||
---
|
||||
|
||||
## Phase 5: Validation
|
||||
|
||||
### Step 5.1: Verify MCP Server
|
||||
|
||||
```bash
|
||||
cd /path/to/mcp-servers/data-platform && .venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('MCP Server OK')"
|
||||
```
|
||||
|
||||
### Step 5.2: Summary
|
||||
|
||||
```
|
||||
╔════════════════════════════════════════════════════════════╗
|
||||
║ DATA-PLATFORM SETUP COMPLETE ║
|
||||
╠════════════════════════════════════════════════════════════╣
|
||||
║ MCP Server: ✓ Ready ║
|
||||
║ pandas Tools: ✓ Available (14 tools) ║
|
||||
║ PostgreSQL Tools: [✓/✗] [Status based on config] ║
|
||||
║ PostGIS Tools: [✓/✗] [Status based on PostGIS] ║
|
||||
║ dbt Tools: [✓/✗] [Status based on discovery] ║
|
||||
╚════════════════════════════════════════════════════════════╝
|
||||
```
|
||||
|
||||
### Step 5.3: Session Restart Notice
|
||||
|
||||
---
|
||||
|
||||
**⚠️ Session Restart Required**
|
||||
|
||||
Restart your Claude Code session for MCP tools to become available.
|
||||
|
||||
**After restart, you can:**
|
||||
- Run `/ingest` to load data from files or database
|
||||
- Run `/profile` to analyze DataFrame statistics
|
||||
- Run `/schema` to explore database/DataFrame schema
|
||||
- Run `/run` to execute dbt models (if configured)
|
||||
- Run `/lineage` to view dbt model dependencies
|
||||
|
||||
---
|
||||
|
||||
## Memory Limits
|
||||
|
||||
The data-platform plugin has a default row limit of 100,000 rows per DataFrame. For larger datasets:
|
||||
- Use chunked processing (`chunk_size` parameter)
|
||||
- Filter data before loading
|
||||
- Store to Parquet for efficient re-loading
|
||||
|
||||
You can override the limit by setting in your project `.env`:
|
||||
```
|
||||
DATA_PLATFORM_MAX_ROWS=500000
|
||||
```
|
||||
60
plugins/data-platform/commands/lineage.md
Normal file
60
plugins/data-platform/commands/lineage.md
Normal file
@@ -0,0 +1,60 @@
|
||||
# /lineage - Data Lineage Visualization
|
||||
|
||||
Show data lineage for dbt models or database tables.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/lineage <model_name> [--depth N]
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Get lineage data**:
|
||||
- Use `dbt_lineage` for dbt models
|
||||
- For database tables, trace through dbt manifest
|
||||
|
||||
2. **Build lineage graph**:
|
||||
- Identify all upstream sources
|
||||
- Identify all downstream consumers
|
||||
- Note materialization at each node
|
||||
|
||||
3. **Visualize**:
|
||||
- ASCII art dependency tree
|
||||
- List format with indentation
|
||||
- Show depth levels
|
||||
|
||||
4. **Report**:
|
||||
- Full dependency chain
|
||||
- Critical path identification
|
||||
- Refresh implications
|
||||
|
||||
## Examples
|
||||
|
||||
```
|
||||
/lineage dim_customers
|
||||
/lineage fct_orders --depth 3
|
||||
```
|
||||
|
||||
## Output Format
|
||||
|
||||
```
|
||||
Sources:
|
||||
└── raw_customers (source)
|
||||
└── raw_orders (source)
|
||||
|
||||
dim_customers (table)
|
||||
├── upstream:
|
||||
│ └── stg_customers (view)
|
||||
│ └── raw_customers (source)
|
||||
└── downstream:
|
||||
└── fct_orders (incremental)
|
||||
└── rpt_customer_lifetime (table)
|
||||
```
|
||||
|
||||
## Available Tools
|
||||
|
||||
Use these MCP tools:
|
||||
- `dbt_lineage` - Get model dependencies
|
||||
- `dbt_ls` - List dbt resources
|
||||
- `dbt_docs_generate` - Generate full manifest
|
||||
44
plugins/data-platform/commands/profile.md
Normal file
44
plugins/data-platform/commands/profile.md
Normal file
@@ -0,0 +1,44 @@
|
||||
# /profile - Data Profiling
|
||||
|
||||
Generate statistical profile and quality report for a DataFrame.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/profile <data_ref>
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Get data reference**:
|
||||
- If no data_ref provided, use `list_data` to show available options
|
||||
- Validate the data_ref exists
|
||||
|
||||
2. **Generate profile**:
|
||||
- Use `describe` for statistical summary
|
||||
- Analyze null counts, unique values, data types
|
||||
|
||||
3. **Quality assessment**:
|
||||
- Identify columns with high null percentage
|
||||
- Flag potential data quality issues
|
||||
- Suggest cleaning operations if needed
|
||||
|
||||
4. **Report**:
|
||||
- Summary statistics per column
|
||||
- Data type distribution
|
||||
- Memory usage
|
||||
- Quality score
|
||||
|
||||
## Examples
|
||||
|
||||
```
|
||||
/profile sales_data
|
||||
/profile df_a1b2c3d4
|
||||
```
|
||||
|
||||
## Available Tools
|
||||
|
||||
Use these MCP tools:
|
||||
- `describe` - Get statistical summary
|
||||
- `head` - Preview first rows
|
||||
- `list_data` - List available DataFrames
|
||||
55
plugins/data-platform/commands/run.md
Normal file
55
plugins/data-platform/commands/run.md
Normal file
@@ -0,0 +1,55 @@
|
||||
# /run - Execute dbt Models
|
||||
|
||||
Run dbt models with automatic pre-validation.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/run [model_selection] [--full-refresh]
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Pre-validation** (MANDATORY):
|
||||
- Use `dbt_parse` to validate project
|
||||
- Check for deprecated syntax (dbt 1.9+)
|
||||
- If validation fails, show errors and STOP
|
||||
|
||||
2. **Execute models**:
|
||||
- Use `dbt_run` with provided selection
|
||||
- Monitor progress and capture output
|
||||
|
||||
3. **Report results**:
|
||||
- Success/failure status per model
|
||||
- Execution time
|
||||
- Row counts where available
|
||||
- Any warnings or errors
|
||||
|
||||
## Examples
|
||||
|
||||
```
|
||||
/run # Run all models
|
||||
/run dim_customers # Run specific model
|
||||
/run +fct_orders # Run model and its upstream
|
||||
/run tag:daily # Run models with tag
|
||||
/run --full-refresh # Rebuild incremental models
|
||||
```
|
||||
|
||||
## Selection Syntax
|
||||
|
||||
| Pattern | Meaning |
|
||||
|---------|---------|
|
||||
| `model_name` | Run single model |
|
||||
| `+model_name` | Run model and upstream |
|
||||
| `model_name+` | Run model and downstream |
|
||||
| `+model_name+` | Run model with all deps |
|
||||
| `tag:name` | Run by tag |
|
||||
| `path:models/staging` | Run by path |
|
||||
|
||||
## Available Tools
|
||||
|
||||
Use these MCP tools:
|
||||
- `dbt_parse` - Pre-validation (ALWAYS RUN FIRST)
|
||||
- `dbt_run` - Execute models
|
||||
- `dbt_build` - Run + test
|
||||
- `dbt_test` - Run tests only
|
||||
48
plugins/data-platform/commands/schema.md
Normal file
48
plugins/data-platform/commands/schema.md
Normal file
@@ -0,0 +1,48 @@
|
||||
# /schema - Schema Exploration
|
||||
|
||||
Display schema information for database tables or DataFrames.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
/schema [table_name | data_ref]
|
||||
```
|
||||
|
||||
## Workflow
|
||||
|
||||
1. **Determine target**:
|
||||
- If argument is a loaded data_ref, show DataFrame schema
|
||||
- If argument is a table name, query database schema
|
||||
- If no argument, list all available tables and DataFrames
|
||||
|
||||
2. **For DataFrames**:
|
||||
- Use `describe` to get column info
|
||||
- Show dtypes, null counts, sample values
|
||||
|
||||
3. **For database tables**:
|
||||
- Use `pg_columns` for column details
|
||||
- Use `st_tables` to check for PostGIS columns
|
||||
- Show constraints and indexes if available
|
||||
|
||||
4. **Report**:
|
||||
- Column name, type, nullable, default
|
||||
- For PostGIS: geometry type, SRID
|
||||
- For DataFrames: pandas dtype, null percentage
|
||||
|
||||
## Examples
|
||||
|
||||
```
|
||||
/schema # List all tables and DataFrames
|
||||
/schema customers # Show table schema
|
||||
/schema sales_data # Show DataFrame schema
|
||||
```
|
||||
|
||||
## Available Tools
|
||||
|
||||
Use these MCP tools:
|
||||
- `pg_tables` - List database tables
|
||||
- `pg_columns` - Get column info
|
||||
- `pg_schemas` - List schemas
|
||||
- `st_tables` - List PostGIS tables
|
||||
- `describe` - Get DataFrame info
|
||||
- `list_data` - List DataFrames
|
||||
10
plugins/data-platform/hooks/hooks.json
Normal file
10
plugins/data-platform/hooks/hooks.json
Normal file
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"hooks": {
|
||||
"SessionStart": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/startup-check.sh"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
54
plugins/data-platform/hooks/startup-check.sh
Executable file
54
plugins/data-platform/hooks/startup-check.sh
Executable file
@@ -0,0 +1,54 @@
|
||||
#!/bin/bash
|
||||
# data-platform startup check hook
|
||||
# Checks for common issues at session start
|
||||
# All output MUST have [data-platform] prefix
|
||||
|
||||
PREFIX="[data-platform]"
|
||||
|
||||
# Check if MCP venv exists
|
||||
PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT:-$(dirname "$(dirname "$(realpath "$0")")")}"
|
||||
VENV_PATH="$PLUGIN_ROOT/mcp-servers/data-platform/.venv/bin/python"
|
||||
|
||||
if [[ ! -f "$VENV_PATH" ]]; then
|
||||
echo "$PREFIX MCP venv missing - run /initial-setup or setup.sh"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# Check PostgreSQL configuration (optional - just warn if configured but failing)
|
||||
POSTGRES_CONFIG="$HOME/.config/claude/postgres.env"
|
||||
if [[ -f "$POSTGRES_CONFIG" ]]; then
|
||||
source "$POSTGRES_CONFIG"
|
||||
if [[ -n "${POSTGRES_URL:-}" ]]; then
|
||||
# Quick connection test (5 second timeout)
|
||||
RESULT=$("$VENV_PATH" -c "
|
||||
import asyncio
|
||||
import sys
|
||||
async def test():
|
||||
try:
|
||||
import asyncpg
|
||||
conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
|
||||
await conn.close()
|
||||
return 'OK'
|
||||
except Exception as e:
|
||||
return f'FAIL: {e}'
|
||||
print(asyncio.run(test()))
|
||||
" 2>/dev/null || echo "FAIL: asyncpg not installed")
|
||||
|
||||
if [[ "$RESULT" == "OK" ]]; then
|
||||
# PostgreSQL OK - say nothing
|
||||
:
|
||||
elif [[ "$RESULT" == *"FAIL"* ]]; then
|
||||
echo "$PREFIX PostgreSQL connection failed - check POSTGRES_URL"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# Check dbt project (if in a project with dbt_project.yml)
|
||||
if [[ -f "dbt_project.yml" ]] || [[ -f "transform/dbt_project.yml" ]]; then
|
||||
if ! command -v dbt &> /dev/null; then
|
||||
echo "$PREFIX dbt CLI not found - dbt tools unavailable"
|
||||
fi
|
||||
fi
|
||||
|
||||
# All checks passed - say nothing
|
||||
exit 0
|
||||
1
plugins/data-platform/mcp-servers/data-platform
Symbolic link
1
plugins/data-platform/mcp-servers/data-platform
Symbolic link
@@ -0,0 +1 @@
|
||||
../../../mcp-servers/data-platform
|
||||
Reference in New Issue
Block a user