feat: add data-platform plugin (v4.0.0)

Add new data-platform plugin for data engineering workflows with:

MCP Server (32 tools):
- pandas operations (14 tools): read_csv, read_parquet, read_json, to_csv, to_parquet, describe, head, tail, filter, select, groupby, join, list_data, drop_data
- PostgreSQL/PostGIS (10 tools): pg_connect, pg_query, pg_execute, pg_tables, pg_columns, pg_schemas, st_tables, st_geometry_type, st_srid, st_extent
- dbt integration (8 tools): dbt_parse, dbt_run, dbt_test, dbt_build, dbt_compile, dbt_ls, dbt_docs_generate, dbt_lineage

Plugin Features:
- Arrow IPC data_ref system for DataFrame persistence across tool calls
- Pre-execution validation for dbt with `dbt parse`
- SessionStart hook for PostgreSQL connectivity check (non-blocking)
- Hybrid configuration (system ~/.config/claude/postgres.env + project .env)
- Memory management with 100k row limit and chunking support

Commands: /initial-setup, /ingest, /profile, /schema, /explain, /lineage, /run
Agents: data-ingestion, data-analysis
Test suite: 71 tests covering config, data store, pandas, postgres, dbt tools

Addresses data workflow issues from personal-portfolio project:
- Lost data after multiple interactions (solved by Arrow IPC data_ref)
- dbt 1.9+ syntax deprecation (solved by pre-execution validation)
- Ungraceful PostgreSQL error handling (solved by SessionStart hook)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 14:24:03 -05:00
parent 6a267d074b
commit 89f0354ccc
39 changed files with 5413 additions and 6 deletions
--- a/plugins/data-platform/.claude-plugin/plugin.json
+++ b/plugins/data-platform/.claude-plugin/plugin.json
@@ -0,0 +1,25 @@
+{
+  "name": "data-platform",
+  "version": "1.0.0",
+  "description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration",
+  "author": {
+    "name": "Leo Miranda",
+    "email": "leobmiranda@gmail.com"
+  },
+  "homepage": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace/src/branch/main/plugins/data-platform/README.md",
+  "repository": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace.git",
+  "license": "MIT",
+  "keywords": [
+    "pandas",
+    "postgresql",
+    "postgis",
+    "dbt",
+    "data-engineering",
+    "etl",
+    "dataframe"
+  ],
+  "hooks": "hooks/hooks.json",
+  "commands": ["./commands/"],
+  "agents": ["./agents/"],
+  "mcpServers": ["./.mcp.json"]
+}
--- a/plugins/data-platform/.mcp.json
+++ b/plugins/data-platform/.mcp.json
@@ -0,0 +1,10 @@
+{
+  "mcpServers": {
+    "data-platform": {
+      "type": "stdio",
+      "command": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python",
+      "args": ["-m", "mcp_server.server"],
+      "cwd": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform"
+    }
+  }
+}
--- a/plugins/data-platform/README.md
+++ b/plugins/data-platform/README.md
@@ -0,0 +1,119 @@
+# data-platform Plugin
+
+Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration for Claude Code.
+
+## Features
+
+- **pandas Operations**: Load, transform, and export DataFrames with persistent data_ref system
+- **PostgreSQL/PostGIS**: Database queries with connection pooling and spatial data support
+- **dbt Integration**: Build tool wrapper with pre-execution validation
+
+## Installation
+
+This plugin is part of the leo-claude-mktplace marketplace. Install via:
+
+```bash
+# From marketplace
+claude plugins install leo-claude-mktplace/data-platform
+
+# Setup MCP server venv
+cd ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+## Configuration
+
+### PostgreSQL (Optional)
+
+Create `~/.config/claude/postgres.env`:
+
+```env
+POSTGRES_URL=postgresql://user:password@host:5432/database
+```
+
+### dbt (Optional)
+
+Add to project `.env`:
+
+```env
+DBT_PROJECT_DIR=/path/to/dbt/project
+DBT_PROFILES_DIR=~/.dbt
+```
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `/initial-setup` | Interactive setup wizard for PostgreSQL and dbt configuration |
+| `/ingest` | Load data from files or database |
+| `/profile` | Generate data profile and statistics |
+| `/schema` | Show database/DataFrame schema |
+| `/explain` | Explain a dbt model's purpose, dependencies, and SQL logic |
+| `/lineage` | Visualize data dependencies |
+| `/run` | Execute dbt models |
+
+## Agents
+
+| Agent | Description |
+|-------|-------------|
+| `data-ingestion` | Data loading and transformation specialist |
+| `data-analysis` | Exploration and profiling specialist |
+
+## data_ref System
+
+All DataFrame operations use a `data_ref` system for persistence:
+
+```
+# Load returns a reference
+read_csv("data.csv") → {"data_ref": "sales_data"}
+
+# Use reference in subsequent operations
+filter("sales_data", "amount > 100") → {"data_ref": "sales_data_filtered"}
+describe("sales_data_filtered") → {statistics}
+```
+
+## Example Workflow
+
+```
+/ingest data/sales.csv
+# → Loaded 50,000 rows as "sales_data"
+
+/profile sales_data
+# → Statistical summary, null counts, quality assessment
+
+/schema orders
+# → Column names, types, constraints
+
+/lineage fct_orders
+# → Dependency graph showing upstream/downstream models
+
+/run dim_customers
+# → Pre-validates then executes dbt model
+```
+
+## Tools Summary
+
+### pandas (14 tools)
+`read_csv`, `read_parquet`, `read_json`, `to_csv`, `to_parquet`, `describe`, `head`, `tail`, `filter`, `select`, `groupby`, `join`, `list_data`, `drop_data`
+
+### PostgreSQL (6 tools)
+`pg_connect`, `pg_query`, `pg_execute`, `pg_tables`, `pg_columns`, `pg_schemas`
+
+### PostGIS (4 tools)
+`st_tables`, `st_geometry_type`, `st_srid`, `st_extent`
+
+### dbt (8 tools)
+`dbt_parse`, `dbt_run`, `dbt_test`, `dbt_build`, `dbt_compile`, `dbt_ls`, `dbt_docs_generate`, `dbt_lineage`
+
+## Memory Management
+
+- Default limit: 100,000 rows per DataFrame
+- Configure via `DATA_PLATFORM_MAX_ROWS` environment variable
+- Use `chunk_size` parameter for large files
+- Monitor with `list_data` tool
+
+## SessionStart Hook
+
+On session start, the plugin checks PostgreSQL connectivity and displays a warning if unavailable. This is non-blocking - pandas and dbt tools remain available.
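Review note: the `data_ref` flow and row limit described in this README can be illustrated with a small in-memory registry. This is only a sketch, not the plugin's implementation (the commit message says the real store persists DataFrames via Arrow IPC). `DataStore`, `put`, `get`, and the list-of-dicts row representation are hypothetical names chosen for illustration; the `df_` plus eight hex characters naming is modeled on the `df_a1b2c3d4` example elsewhere in this commit. Only `DATA_PLATFORM_MAX_ROWS`, the 100,000 default, and the `list_data`/`drop_data` tool names come from the source.

```python
import os
import uuid

DEFAULT_MAX_ROWS = 100_000  # default limit stated in the README


class DataStore:
    """Hypothetical in-memory registry mapping data_ref names to row lists."""

    def __init__(self, max_rows=None):
        if max_rows is None:
            # Environment override, as described under "Memory Management"
            max_rows = int(os.environ.get("DATA_PLATFORM_MAX_ROWS", DEFAULT_MAX_ROWS))
        self.max_rows = max_rows
        self._frames = {}

    def put(self, rows, name=None):
        """Store rows under a data_ref and return the reference."""
        if len(rows) > self.max_rows:
            raise ValueError(f"{len(rows)} rows exceeds limit of {self.max_rows}")
        ref = name or f"df_{uuid.uuid4().hex[:8]}"
        self._frames[ref] = list(rows)
        return ref

    def get(self, ref):
        """Return the rows stored under ref, or raise if the ref is unknown."""
        if ref not in self._frames:
            raise KeyError(f"unknown data_ref: {ref}")
        return self._frames[ref]

    def list_data(self):
        """Return {data_ref: row_count} for everything in the store."""
        return {ref: len(rows) for ref, rows in self._frames.items()}

    def drop_data(self, ref):
        """Remove a data_ref; return True if it existed."""
        return self._frames.pop(ref, None) is not None


store = DataStore(max_rows=3)
ref = store.put([{"amount": 50}, {"amount": 150}], name="sales_data")
# A derived result gets its own reference, so the original stays available
filtered = store.put(
    [row for row in store.get(ref) if row["amount"] > 100],
    name=f"{ref}_filtered",
)
print(store.list_data())
```

Returning a new reference from each transformation, rather than mutating in place, is what lets later tool calls refer back to either the original or the filtered data.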
--- a/plugins/data-platform/agents/data-analysis.md
+++ b/plugins/data-platform/agents/data-analysis.md
@@ -0,0 +1,98 @@
+# Data Analysis Agent
+
+You are a data analysis specialist. Your role is to help users explore, profile, and understand their data.
+
+## Capabilities
+
+- Profile datasets with statistical summaries
+- Explore database schemas and structures
+- Analyze dbt model lineage and dependencies
+- Provide data quality assessments
+- Generate insights and recommendations
+
+## Available Tools
+
+### Data Exploration
+- `describe` - Statistical summary
+- `head` - Preview first rows
+- `tail` - Preview last rows
+- `list_data` - List available DataFrames
+
+### Database Exploration
+- `pg_connect` - Check database connection
+- `pg_tables` - List all tables
+- `pg_columns` - Get column details
+- `pg_schemas` - List schemas
+
+### PostGIS Exploration
+- `st_tables` - List spatial tables
+- `st_geometry_type` - Get geometry type
+- `st_srid` - Get coordinate system
+- `st_extent` - Get bounding box
+
+### dbt Analysis
+- `dbt_lineage` - Model dependencies
+- `dbt_ls` - List resources
+- `dbt_compile` - View compiled SQL
+- `dbt_docs_generate` - Generate docs
+
+## Workflow Guidelines
+
+1. **Understand the question**:
+   - What does the user want to know?
+   - What data is available?
+   - What level of detail is needed?
+
+2. **Explore the data**:
+   - Start with `list_data` or `pg_tables`
+   - Get schema info with `describe` or `pg_columns`
+   - Preview with `head` to understand content
+
+3. **Profile thoroughly**:
+   - Use `describe` for statistics
+   - Check for nulls, outliers, patterns
+   - Note data quality issues
+
+4. **Analyze dependencies** (for dbt):
+   - Use `dbt_lineage` to trace data flow
+   - Understand transformations
+   - Identify critical paths
+
+5. **Provide insights**:
+   - Summarize findings clearly
+   - Highlight potential issues
+   - Recommend next steps
+
+## Analysis Patterns
+
+### Data Quality Check
+1. `describe` - Get statistics
+2. Check null percentages
+3. Identify outliers (min/max vs mean)
+4. Flag suspicious patterns
+
+### Schema Comparison
+1. `pg_columns` - Get table A schema
+2. `pg_columns` - Get table B schema
+3. Compare column names, types
+4. Identify mismatches
+
+### Lineage Analysis
+1. `dbt_lineage` - Get model graph
+2. Trace upstream sources
+3. Identify downstream impact
+4. Document critical path
+
+## Example Interactions
+
+**User**: What's in the sales_data DataFrame?
+**Agent**: Uses `describe`, `head`, explains columns, statistics, patterns
+
+**User**: What tables are in the database?
+**Agent**: Uses `pg_tables`, shows list with column counts
+
+**User**: How does the dim_customers model work?
+**Agent**: Uses `dbt_lineage`, `dbt_compile`, explains dependencies and SQL
+
+**User**: Is there any spatial data?
+**Agent**: Uses `st_tables`, shows PostGIS tables with geometry types
--- a/plugins/data-platform/agents/data-ingestion.md
+++ b/plugins/data-platform/agents/data-ingestion.md
@@ -0,0 +1,81 @@
+# Data Ingestion Agent
+
+You are a data ingestion specialist. Your role is to help users load, transform, and prepare data for analysis.
+
+## Capabilities
+
+- Load data from CSV, Parquet, JSON files
+- Query PostgreSQL databases
+- Transform data using filter, select, groupby, join operations
+- Export data to various formats
+- Handle large datasets with chunking
+
+## Available Tools
+
+### File Operations
+- `read_csv` - Load CSV files with optional chunking
+- `read_parquet` - Load Parquet files
+- `read_json` - Load JSON/JSONL files
+- `to_csv` - Export to CSV
+- `to_parquet` - Export to Parquet
+
+### Data Transformation
+- `filter` - Filter rows by condition
+- `select` - Select specific columns
+- `groupby` - Group and aggregate
+- `join` - Join two DataFrames
+
+### Database Operations
+- `pg_query` - Execute SELECT queries
+- `pg_execute` - Execute INSERT/UPDATE/DELETE
+- `pg_tables` - List available tables
+
+### Management
+- `list_data` - List all stored DataFrames
+- `drop_data` - Remove DataFrame from store
+
+## Workflow Guidelines
+
+1. **Understand the data source**:
+   - Ask about file location/format
+   - For database, understand table structure
+   - Clarify any filters or transformations needed
+
+2. **Load data efficiently**:
+   - Use appropriate reader for file format
+   - For large files (>100k rows), use chunking
+   - Name DataFrames meaningfully
+
+3. **Transform as needed**:
+   - Apply filters early to reduce data size
+   - Select only needed columns
+   - Join related datasets
+
+4. **Validate results**:
+   - Check row counts after transformations
+   - Verify data types are correct
+   - Preview results with `head`
+
+5. **Store with meaningful names**:
+   - Use descriptive data_ref names
+   - Document the source and transformations
+
+## Memory Management
+
+- Default row limit: 100,000 rows
+- For larger datasets, suggest:
+  - Filtering before loading
+  - Using chunk_size parameter
+  - Aggregating to reduce size
+  - Storing to Parquet for efficient retrieval
+
+## Example Interactions
+
+**User**: Load the sales data from data/sales.csv
+**Agent**: Uses `read_csv` to load, reports data_ref, row count, columns
+
+**User**: Filter to only Q4 2024 sales
+**Agent**: Uses `filter` with date condition, stores filtered result
+
+**User**: Join with customer data
+**Agent**: Uses `join` to combine, validates result counts
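Review note: the guidance above ("apply filters early", "use chunking for large files") can be sketched with the standard library alone. The plugin itself uses pandas, so this is only an illustration of the idea; `read_csv_chunks` and `load_filtered` are hypothetical helper names, and the `chunk_size` and `max_rows` parameters merely mirror the terms used in this agent file.

```python
import csv
import io
from itertools import islice


def read_csv_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size row dicts from an open CSV file."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def load_filtered(handle, predicate, chunk_size=2, max_rows=100_000):
    """Filter each chunk as it is read so only matching rows are retained."""
    kept = []
    for chunk in read_csv_chunks(handle, chunk_size):
        kept.extend(row for row in chunk if predicate(row))
        if len(kept) > max_rows:
            raise ValueError(f"filtered result still exceeds {max_rows} rows")
    return kept


sample = io.StringIO("id,amount\n1,50\n2,150\n3,75\n4,300\n5,120\n")
rows = load_filtered(sample, lambda row: float(row["amount"]) > 100)
print([row["id"] for row in rows])
```

Because the predicate runs per chunk, the full file never has to fit under the row limit; only the filtered result does.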
--- a/plugins/data-platform/claude-md-integration.md
+++ b/plugins/data-platform/claude-md-integration.md
@@ -0,0 +1,90 @@
+# data-platform Plugin - CLAUDE.md Integration
+
+Add this section to your project's CLAUDE.md to enable data-platform plugin features.
+
+## Suggested CLAUDE.md Section
+
+```markdown
+## Data Platform Integration
+
+This project uses the data-platform plugin for data engineering workflows.
+
+### Configuration
+
+**PostgreSQL**: Credentials in `~/.config/claude/postgres.env`
+**dbt**: Project path auto-detected from `dbt_project.yml`
+
+### Available Commands
+
+| Command | Purpose |
+|---------|---------|
+| `/ingest` | Load data from files or database |
+| `/profile` | Generate statistical profile |
+| `/schema` | Show schema information |
+| `/explain` | Explain dbt model |
+| `/lineage` | Show data lineage |
+| `/run` | Execute dbt models |
+
+### data_ref Convention
+
+DataFrames are stored with references. Use meaningful names:
+- `raw_*` for source data
+- `stg_*` for staged/cleaned data
+- `dim_*` for dimension tables
+- `fct_*` for fact tables
+- `rpt_*` for reports
+
+### dbt Workflow
+
+1. Always validate before running: `/run` includes automatic `dbt_parse`
+2. For dbt 1.9+, check for deprecated syntax before commits
+3. Use `/lineage` to understand impact of changes
+
+### Database Access
+
+PostgreSQL tools require POSTGRES_URL configuration:
+- Read-only queries: `pg_query`
+- Write operations: `pg_execute`
+- Schema exploration: `pg_tables`, `pg_columns`
+
+PostGIS spatial data:
+- List spatial tables: `st_tables`
+- Check geometry: `st_geometry_type`, `st_srid`, `st_extent`
+```
+
+## Environment Variables
+
+Add to project `.env` if needed:
+
+```env
+# dbt configuration
+DBT_PROJECT_DIR=./transform
+DBT_PROFILES_DIR=~/.dbt
+
+# Memory limits
+DATA_PLATFORM_MAX_ROWS=100000
+```
+
+## Typical Workflows
+
+### Data Exploration
+```
+/ingest data/raw_customers.csv
+/profile raw_customers
+/schema
+```
+
+### ETL Development
+```
+/schema orders              # Understand source
+/explain stg_orders         # Understand transformation
+/run stg_orders             # Test the model
+/lineage fct_orders         # Check downstream impact
+```
+
+### Database Analysis
+```
+/schema                     # List all tables
+pg_columns orders           # Detailed schema
+st_tables                   # Find spatial data
+```
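Review note: the commit message describes a hybrid configuration (system `~/.config/claude/postgres.env` plus project `.env`). A minimal sketch of merging two such files follows. The precedence rule (project values override system values) is an assumption, since the source does not state it, and `parse_env_file` and `load_config` are hypothetical names rather than the plugin's API.

```python
import tempfile
from pathlib import Path


def parse_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    path = Path(path)
    if not path.is_file():
        return values  # a missing file simply contributes nothing
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_config(system_env, project_env):
    """Merge system-level and project-level settings.

    Assumption: project values win on conflict.
    """
    config = parse_env_file(system_env)
    config.update(parse_env_file(project_env))
    return config


with tempfile.TemporaryDirectory() as tmp:
    system_file = Path(tmp) / "postgres.env"
    project_file = Path(tmp) / ".env"
    system_file.write_text(
        "# PostgreSQL Configuration\n"
        "POSTGRES_URL=postgresql://user:password@host:5432/database\n"
    )
    project_file.write_text(
        "DBT_PROJECT_DIR=./transform\nDATA_PLATFORM_MAX_ROWS=100000\n"
    )
    config = load_config(system_file, project_file)

print(sorted(config))
```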
--- a/plugins/data-platform/commands/explain.md
+++ b/plugins/data-platform/commands/explain.md
@@ -0,0 +1,44 @@
+# /explain - dbt Model Explanation
+
+Explain a dbt model's purpose, dependencies, and SQL logic.
+
+## Usage
+
+```
+/explain <model_name>
+```
+
+## Workflow
+
+1. **Get model info**:
+   - Use `dbt_lineage` to get model metadata
+   - Extract description, tags, materialization
+
+2. **Analyze dependencies**:
+   - Show upstream models (what this depends on)
+   - Show downstream models (what depends on this)
+   - Visualize as dependency tree
+
+3. **Compile SQL**:
+   - Use `dbt_compile` to get rendered SQL
+   - Explain key transformations
+
+4. **Report**:
+   - Model purpose (from description)
+   - Materialization strategy
+   - Dependency graph
+   - Key SQL logic explained
+
+## Examples
+
+```
+/explain dim_customers
+/explain fct_orders
+```
+
+## Available Tools
+
+Use these MCP tools:
+- `dbt_lineage` - Get model dependencies
+- `dbt_compile` - Get compiled SQL
+- `dbt_ls` - List related resources
--- a/plugins/data-platform/commands/ingest.md
+++ b/plugins/data-platform/commands/ingest.md
@@ -0,0 +1,44 @@
+# /ingest - Data Ingestion
+
+Load data from files or database into the data platform.
+
+## Usage
+
+```
+/ingest [source]
+```
+
+## Workflow
+
+1. **Identify data source**:
+   - If source is a file path, determine format (CSV, Parquet, JSON)
+   - If source is "db" or a table name, query PostgreSQL
+
+2. **Load data**:
+   - For files: Use `read_csv`, `read_parquet`, or `read_json`
+   - For database: Use `pg_query` with appropriate SELECT
+
+3. **Validate**:
+   - Check row count against limits
+   - If the row count exceeds 100k, suggest chunking or filtering
+
+4. **Report**:
+   - Show data_ref, row count, columns, and memory usage
+   - Preview first few rows
+
+## Examples
+
+```
+/ingest data/sales.csv
+/ingest data/customers.parquet
+/ingest "SELECT * FROM orders WHERE created_at > '2024-01-01'"
+```
+
+## Available Tools
+
+Use these MCP tools:
+- `read_csv` - Load CSV files
+- `read_parquet` - Load Parquet files
+- `read_json` - Load JSON/JSONL files
+- `pg_query` - Query PostgreSQL database
+- `list_data` - List loaded DataFrames
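Review note: step 1 of the `/ingest` workflow (deciding between a file reader and a database query) can be sketched as a small dispatch function. The tool names come from this file; `choose_tool`, the extension table, and the rule that anything unrecognized is treated as a database source are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# File extension -> MCP tool name (tool names from the list above)
READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".json": "read_json",
    ".jsonl": "read_json",
}


def choose_tool(source):
    """Map an /ingest argument to the MCP tool name that would load it."""
    text = source.strip()
    # A SQL statement goes straight to PostgreSQL
    if text.lower().startswith(("select ", "with ")):
        return "pg_query"
    suffix = PurePosixPath(text).suffix.lower()
    if suffix in READERS:
        return READERS[suffix]
    # Assumption: "db" or a bare table name is treated as a database source
    return "pg_query"


for example in (
    "data/sales.csv",
    "data/customers.parquet",
    "SELECT * FROM orders WHERE created_at > '2024-01-01'",
):
    print(f"{example!r} -> {choose_tool(example)}")
```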
--- a/plugins/data-platform/commands/initial-setup.md
+++ b/plugins/data-platform/commands/initial-setup.md
@@ -0,0 +1,231 @@
+---
+description: Interactive setup wizard for data-platform plugin - configures MCP server and optional PostgreSQL/dbt
+---
+
+# Data Platform Setup Wizard
+
+This command sets up the data-platform plugin with pandas, PostgreSQL, and dbt integration.
+
+## Important Context
+
+- **This command uses Bash, Read, Write, and AskUserQuestion tools** - NOT MCP tools
+- **MCP tools won't work until after setup + session restart**
+- **PostgreSQL and dbt are optional** - pandas tools work without them
+
+---
+
+## Phase 1: Environment Validation
+
+### Step 1.1: Check Python Version
+
+```bash
+python3 --version
+```
+
+Requires Python 3.10+. If below, stop setup and inform user.
+
+### Step 1.2: Check for Required Libraries
+
+```bash
+python3 -c "import venv, ensurepip; print('venv and ensurepip available')"
+```
+
+---
+
+## Phase 2: MCP Server Setup
+
+### Step 2.1: Locate Data Platform MCP Server
+
+The MCP server should be at the marketplace root:
+
+```bash
+# If running from installed marketplace
+ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_INSTALLED"
+
+# If running from source
+ls -la ~/claude-plugins-work/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_SOURCE"
+```
+
+Determine the correct path based on which exists.
+
+### Step 2.2: Check Virtual Environment
+
+```bash
+ls -la /path/to/mcp-servers/data-platform/.venv/bin/python 2>/dev/null && echo "VENV_EXISTS" || echo "VENV_MISSING"
+```
+
+### Step 2.3: Create Virtual Environment (if missing)
+
+```bash
+cd /path/to/mcp-servers/data-platform && python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt && deactivate
+```
+
+**Note:** This may take a few minutes due to pandas, pyarrow, and dbt dependencies.
+
+---
+
+## Phase 3: PostgreSQL Configuration (Optional)
+
+### Step 3.1: Ask About PostgreSQL
+
+Use AskUserQuestion:
+- Question: "Do you want to configure PostgreSQL database access?"
+- Header: "PostgreSQL"
+- Options:
+  - "Yes, I have a PostgreSQL database"
+  - "No, I'll only use pandas/dbt tools"
+
+**If user chooses "No":** Skip to Phase 4.
+
+### Step 3.2: Create Config Directory
+
+```bash
+mkdir -p ~/.config/claude
+```
+
+### Step 3.3: Check PostgreSQL Configuration
+
+```bash
+cat ~/.config/claude/postgres.env 2>/dev/null || echo "FILE_NOT_FOUND"
+```
+
+**If file exists with valid URL:** Skip to Step 3.6.
+**If missing or has placeholders:** Continue.
+
+### Step 3.4: Gather PostgreSQL Information
+
+Use AskUserQuestion:
+- Question: "What is your PostgreSQL connection URL format?"
+- Header: "DB Format"
+- Options:
+  - "Standard: postgresql://user:pass@host:5432/db"
+  - "PostGIS: postgresql://user:pass@host:5432/db (with PostGIS extension)"
+  - "Other (I'll provide the full URL)"
+
+Ask user to provide the connection URL.
+
+### Step 3.5: Create Configuration File
+
+```bash
+cat > ~/.config/claude/postgres.env << 'EOF'
+# PostgreSQL Configuration
+# Generated by data-platform /initial-setup
+
+POSTGRES_URL=<USER_PROVIDED_URL>
+EOF
+chmod 600 ~/.config/claude/postgres.env
+```
+
+### Step 3.6: Test PostgreSQL Connection (if configured)
+
+```bash
+set -a && source ~/.config/claude/postgres.env && set +a && /path/to/mcp-servers/data-platform/.venv/bin/python -c "
+import asyncio
+import os
+import asyncpg
+async def test():
+    try:
+        # Read the URL from the environment so quotes in the password cannot break this script
+        conn = await asyncpg.connect(os.environ['POSTGRES_URL'], timeout=5)
+        ver = await conn.fetchval('SELECT version()')
+        await conn.close()
+        print(f'SUCCESS: {ver.split(\",\")[0]}')
+    except Exception as e:
+        print(f'FAILED: {e}')
+asyncio.run(test())
+"
+```
+
+Report result:
+- SUCCESS: Connection works
+- FAILED: Show error and suggest fixes
+
+---
+
+## Phase 4: dbt Configuration (Optional)
+
+### Step 4.1: Ask About dbt
+
+Use AskUserQuestion:
+- Question: "Do you use dbt for data transformations in your projects?"
+- Header: "dbt"
+- Options:
+  - "Yes, I have dbt projects"
+  - "No, I don't use dbt"
+
+**If user chooses "No":** Skip to Phase 5.
+
+### Step 4.2: dbt Discovery
+
+dbt configuration is **project-level** (not system-level). The plugin auto-detects dbt projects by looking for `dbt_project.yml`.
+
+Inform user:
+```
+dbt projects are detected automatically when you work in a directory
+containing dbt_project.yml.
+
+If your dbt project is in a subdirectory, you can set DBT_PROJECT_DIR
+in your project's .env file:
+
+  DBT_PROJECT_DIR=./transform
+  DBT_PROFILES_DIR=~/.dbt
+```
+
+### Step 4.3: Check dbt Installation
+
+```bash
+dbt --version 2>/dev/null || echo "DBT_NOT_FOUND"
+```
+
+**If not found:** Inform user that dbt CLI tools require dbt-core to be installed globally or in the project.
+
+---
+
+## Phase 5: Validation
+
+### Step 5.1: Verify MCP Server
+
+```bash
+cd /path/to/mcp-servers/data-platform && .venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('MCP Server OK')"
+```
+
+### Step 5.2: Summary
+
+```
+╔════════════════════════════════════════════════════════════╗
+║            DATA-PLATFORM SETUP COMPLETE                    ║
+╠════════════════════════════════════════════════════════════╣
+║ MCP Server:        ✓ Ready                                 ║
+║ pandas Tools:      ✓ Available (14 tools)                  ║
+║ PostgreSQL Tools:  [✓/✗] [Status based on config]          ║
+║ PostGIS Tools:     [✓/✗] [Status based on PostGIS]         ║
+║ dbt Tools:         [✓/✗] [Status based on discovery]       ║
+╚════════════════════════════════════════════════════════════╝
+```
+
+### Step 5.3: Session Restart Notice
+
+---
+
+**⚠️ Session Restart Required**
+
+Restart your Claude Code session for MCP tools to become available.
+
+**After restart, you can:**
+- Run `/ingest` to load data from files or database
+- Run `/profile` to analyze DataFrame statistics
+- Run `/schema` to explore database/DataFrame schema
+- Run `/run` to execute dbt models (if configured)
+- Run `/lineage` to view dbt model dependencies
+
+---
+
+## Memory Limits
+
+The data-platform plugin has a default row limit of 100,000 rows per DataFrame. For larger datasets:
+- Use chunked processing (`chunk_size` parameter)
+- Filter data before loading
+- Store to Parquet for efficient re-loading
+
+You can override the limit by setting in your project `.env`:
+```
+DATA_PLATFORM_MAX_ROWS=500000
+```
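Review note: Step 1.1 requires Python 3.10+ but leaves the comparison to whoever reads the `python3 --version` output. A minimal sketch of that check follows; `parse_python_version` and `meets_minimum` are hypothetical helpers, not part of the plugin.

```python
import re

MINIMUM = (3, 10)  # "Requires Python 3.10+" per Step 1.1


def parse_python_version(output):
    """Extract (major, minor) from text such as 'Python 3.11.4'."""
    match = re.search(r"Python (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(output, minimum=MINIMUM):
    # Tuples compare element by element, so (3, 9) < (3, 10) as intended;
    # comparing the strings "3.9" and "3.10" would get this wrong.
    return parse_python_version(output) >= minimum


for text in ("Python 3.9.18", "Python 3.10.0", "Python 3.12.1"):
    print(text, "->", "ok" if meets_minimum(text) else "too old")
```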
--- a/plugins/data-platform/commands/lineage.md
+++ b/plugins/data-platform/commands/lineage.md
@@ -0,0 +1,60 @@
+# /lineage - Data Lineage Visualization
+
+Show data lineage for dbt models or database tables.
+
+## Usage
+
+```
+/lineage <model_name> [--depth N]
+```
+
+## Workflow
+
+1. **Get lineage data**:
+   - Use `dbt_lineage` for dbt models
+   - For database tables, trace through dbt manifest
+
+2. **Build lineage graph**:
+   - Identify all upstream sources
+   - Identify all downstream consumers
+   - Note materialization at each node
+
+3. **Visualize**:
+   - ASCII art dependency tree
+   - List format with indentation
+   - Show depth levels
+
+4. **Report**:
+   - Full dependency chain
+   - Critical path identification
+   - Refresh implications
+
+## Examples
+
+```
+/lineage dim_customers
+/lineage fct_orders --depth 3
+```
+
+## Output Format
+
+```
+Sources:
+  ├── raw_customers (source)
+  └── raw_orders (source)
+
+dim_customers (table)
+  ├── upstream:
+  │   └── stg_customers (view)
+  │       └── raw_customers (source)
+  └── downstream:
+      ├── fct_orders (incremental)
+      └── rpt_customer_lifetime (table)
+```
+
+## Available Tools
+
+Use these MCP tools:
+- `dbt_lineage` - Get model dependencies
+- `dbt_ls` - List dbt resources
+- `dbt_docs_generate` - Generate full manifest
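Review note: the upstream/downstream traversal with a `--depth N` limit described above can be sketched as a breadth-first walk over a dependency mapping. This is an illustration only, not how `dbt_lineage` is implemented. The graph contains just the edges visible in the sample output (`raw_orders` is omitted because its edges are not shown), and `invert` and `walk` are hypothetical helper names.

```python
from collections import deque

# Edges point from a node to the nodes that depend on it (downstream).
# Only edges visible in the sample output are included.
DOWNSTREAM = {
    "raw_customers": ["stg_customers"],
    "stg_customers": ["dim_customers"],
    "dim_customers": ["fct_orders", "rpt_customer_lifetime"],
}


def invert(graph):
    """Flip edge direction so the same walk can answer upstream queries."""
    inverted = {}
    for parent, children in graph.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted


def walk(graph, start, depth=None):
    """Breadth-first walk returning {node: distance} within depth hops."""
    seen = {}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if depth is not None and dist >= depth:
            continue  # do not expand beyond the requested depth
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = dist + 1
                queue.append((nxt, dist + 1))
    return seen


UPSTREAM = invert(DOWNSTREAM)
print("upstream:", walk(UPSTREAM, "dim_customers"))
print("downstream:", walk(DOWNSTREAM, "dim_customers"))
print("upstream, depth 1:", walk(UPSTREAM, "dim_customers", depth=1))
```

The recorded distances are what a renderer would use for the indentation levels in the tree output.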
--- a/plugins/data-platform/commands/profile.md
+++ b/plugins/data-platform/commands/profile.md
@@ -0,0 +1,44 @@
+# /profile - Data Profiling
+
+Generate statistical profile and quality report for a DataFrame.
+
+## Usage
+
+```
+/profile <data_ref>
+```
+
+## Workflow
+
+1. **Get data reference**:
+   - If no data_ref provided, use `list_data` to show available options
+   - Validate the data_ref exists
+
+2. **Generate profile**:
+   - Use `describe` for statistical summary
+   - Analyze null counts, unique values, data types
+
+3. **Quality assessment**:
+   - Identify columns with high null percentage
+   - Flag potential data quality issues
+   - Suggest cleaning operations if needed
+
+4. **Report**:
+   - Summary statistics per column
+   - Data type distribution
+   - Memory usage
+   - Quality score
+
+## Examples
+
+```
+/profile sales_data
+/profile df_a1b2c3d4
+```
+
+## Available Tools
+
+Use these MCP tools:
+- `describe` - Get statistical summary
+- `head` - Preview first rows
+- `list_data` - List available DataFrames
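Review note: the quality-assessment step (null percentages, unique values, flagging columns with many nulls) can be sketched over plain row dictionaries. The plugin's `describe` tool works on pandas DataFrames, so this is only an illustration; `profile_rows` and the 50% `null_threshold` are assumptions, and the "quality score" mentioned above is not modeled because the source does not define it.

```python
def profile_rows(rows, null_threshold=0.5):
    """Return per-column null percentage, distinct count, and a high-null flag."""
    # Collect column names in first-seen order
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing
        nulls = sum(1 for v in values if v is None or v == "")
        null_pct = nulls / total if total else 0.0
        report[col] = {
            "null_pct": round(null_pct * 100, 1),
            "unique": len({v for v in values if v is not None and v != ""}),
            "high_nulls": null_pct > null_threshold,
        }
    return report


sample = [
    {"id": 1, "region": "east", "discount": None},
    {"id": 2, "region": "west", "discount": None},
    {"id": 3, "region": "east", "discount": 0.1},
    {"id": 4, "region": None, "discount": None},
]
for column, stats in profile_rows(sample).items():
    print(column, stats)
```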
--- a/plugins/data-platform/commands/run.md
+++ b/plugins/data-platform/commands/run.md
@@ -0,0 +1,55 @@
+# /run - Execute dbt Models
+
+Run dbt models with automatic pre-validation.
+
+## Usage
+
+```
+/run [model_selection] [--full-refresh]
+```
+
+## Workflow
+
+1. **Pre-validation** (MANDATORY):
+   - Use `dbt_parse` to validate project
+   - Check for deprecated syntax (dbt 1.9+)
+   - If validation fails, show errors and STOP
+
+2. **Execute models**:
+   - Use `dbt_run` with provided selection
+   - Monitor progress and capture output
+
+3. **Report results**:
+   - Success/failure status per model
+   - Execution time
+   - Row counts where available
+   - Any warnings or errors
+
+## Examples
+
+```
+/run                           # Run all models
+/run dim_customers             # Run specific model
+/run +fct_orders               # Run model and its upstream
+/run tag:daily                 # Run models with tag
+/run --full-refresh            # Rebuild incremental models
+```
+
+## Selection Syntax
+
+| Pattern | Meaning |
+|---------|---------|
+| `model_name` | Run single model |
+| `+model_name` | Run model and upstream |
+| `model_name+` | Run model and downstream |
+| `+model_name+` | Run model with all deps |
+| `tag:name` | Run by tag |
+| `path:models/staging` | Run by path |
+
+## Available Tools
+
+Use these MCP tools:
+- `dbt_parse` - Pre-validation (ALWAYS RUN FIRST)
+- `dbt_run` - Execute models
+- `dbt_build` - Run + test
+- `dbt_test` - Run tests only
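Review note: the selection patterns in the table above can be sketched as a tiny parser that separates the `+` graph markers from the selector itself. This covers only the six patterns listed in this file; dbt's real selection grammar is richer, and `Selector` and `parse_selector` are hypothetical names rather than dbt or plugin APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selector:
    method: str            # "model", "tag", or "path"
    value: str
    upstream: bool = False    # leading "+": include what the model depends on
    downstream: bool = False  # trailing "+": include what depends on the model


def parse_selector(text):
    """Parse one pattern from the selection table into a Selector."""
    text = text.strip()
    upstream = text.startswith("+")
    downstream = text.endswith("+")
    core = text.strip("+")
    if not core:
        raise ValueError(f"selector has no name: {text!r}")
    for method in ("tag", "path"):
        prefix = method + ":"
        if core.startswith(prefix):
            return Selector(method, core[len(prefix):], upstream, downstream)
    return Selector("model", core, upstream, downstream)


for pattern in ("dim_customers", "+fct_orders", "+fct_orders+", "tag:daily"):
    print(pattern, "->", parse_selector(pattern))
```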
--- a/plugins/data-platform/commands/schema.md
+++ b/plugins/data-platform/commands/schema.md
@@ -0,0 +1,48 @@
+# /schema - Schema Exploration
+
+Display schema information for database tables or DataFrames.
+
+## Usage
+
+```
+/schema [table_name | data_ref]
+```
+
+## Workflow
+
+1. **Determine target**:
+   - If argument is a loaded data_ref, show DataFrame schema
+   - If argument is a table name, query database schema
+   - If no argument, list all available tables and DataFrames
+
+2. **For DataFrames**:
+   - Use `describe` to get column info
+   - Show dtypes, null counts, sample values
+
+3. **For database tables**:
+   - Use `pg_columns` for column details
+   - Use `st_tables` to check for PostGIS columns
+   - Show constraints and indexes if available
+
+4. **Report**:
+   - Column name, type, nullable, default
+   - For PostGIS: geometry type, SRID
+   - For DataFrames: pandas dtype, null percentage
+
+## Examples
+
+```
+/schema                    # List all tables and DataFrames
+/schema customers          # Show table schema
+/schema sales_data         # Show DataFrame schema
+```
+
+## Available Tools
+
+Use these MCP tools:
+- `pg_tables` - List database tables
+- `pg_columns` - Get column info
+- `pg_schemas` - List schemas
+- `st_tables` - List PostGIS tables
+- `describe` - Get DataFrame info
+- `list_data` - List DataFrames
--- a/plugins/data-platform/hooks/hooks.json
+++ b/plugins/data-platform/hooks/hooks.json
@@ -0,0 +1,10 @@
+{
+  "hooks": {
+    "SessionStart": [
+      {
+        "type": "command",
+        "command": "${CLAUDE_PLUGIN_ROOT}/hooks/startup-check.sh"
+      }
+    ]
+  }
+}
--- a/plugins/data-platform/hooks/startup-check.sh
+++ b/plugins/data-platform/hooks/startup-check.sh
@@ -0,0 +1,54 @@
+#!/bin/bash
+# data-platform startup check hook
+# Checks for common issues at session start
+# All output MUST have [data-platform] prefix
+
+PREFIX="[data-platform]"
+
+# Check if MCP venv exists
+PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT:-$(dirname "$(dirname "$(realpath "$0")")")}"
+VENV_PATH="$PLUGIN_ROOT/mcp-servers/data-platform/.venv/bin/python"
+
+if [[ ! -f "$VENV_PATH" ]]; then
+    echo "$PREFIX MCP venv missing - run /initial-setup or setup.sh"
+    exit 0
+fi
+
+# Check PostgreSQL configuration (optional - just warn if configured but failing)
+POSTGRES_CONFIG="$HOME/.config/claude/postgres.env"
+if [[ -f "$POSTGRES_CONFIG" ]]; then
+    # Export the sourced variables so the Python check can read POSTGRES_URL from its environment
+    set -a; source "$POSTGRES_CONFIG"; set +a
+    if [[ -n "${POSTGRES_URL:-}" ]]; then
+        # Quick connection test (5 second timeout)
+        RESULT=$("$VENV_PATH" -c "
+import asyncio
+import os
+async def test():
+    try:
+        import asyncpg
+        conn = await asyncpg.connect(os.environ['POSTGRES_URL'], timeout=5)
+        await conn.close()
+        return 'OK'
+    except Exception as e:
+        return f'FAIL: {e}'
+print(asyncio.run(test()))
+" 2>/dev/null || echo "FAIL: asyncpg not installed")
+
+        if [[ "$RESULT" == "OK" ]]; then
+            # PostgreSQL OK - say nothing
+            :
+        elif [[ "$RESULT" == *"FAIL"* ]]; then
+            echo "$PREFIX PostgreSQL connection failed - check POSTGRES_URL"
+        fi
+    fi
+fi
+
+# Check dbt project (if in a project with dbt_project.yml)
+if [[ -f "dbt_project.yml" ]] || [[ -f "transform/dbt_project.yml" ]]; then
+    if ! command -v dbt &> /dev/null; then
+        echo "$PREFIX dbt CLI not found - dbt tools unavailable"
+    fi
+fi
+
+# Always exit 0 so the hook never blocks session start
+exit 0
--- a/plugins/data-platform/mcp-servers/data-platform
+++ b/plugins/data-platform/mcp-servers/data-platform
@@ -0,0 +1 @@
+../../../mcp-servers/data-platform