Add new data-platform plugin for data engineering workflows with:

MCP Server (32 tools):
- pandas operations (14 tools): read_csv, read_parquet, read_json, to_csv, to_parquet, describe, head, tail, filter, select, groupby, join, list_data, drop_data
- PostgreSQL/PostGIS (10 tools): pg_connect, pg_query, pg_execute, pg_tables, pg_columns, pg_schemas, st_tables, st_geometry_type, st_srid, st_extent
- dbt integration (8 tools): dbt_parse, dbt_run, dbt_test, dbt_build, dbt_compile, dbt_ls, dbt_docs_generate, dbt_lineage

Plugin Features:
- Arrow IPC data_ref system for DataFrame persistence across tool calls
- Pre-execution validation for dbt with `dbt parse`
- SessionStart hook for PostgreSQL connectivity check (non-blocking)
- Hybrid configuration (system ~/.config/claude/postgres.env + project .env)
- Memory management with 100k row limit and chunking support

Commands: /initial-setup, /ingest, /profile, /schema, /explain, /lineage, /run
Agents: data-ingestion, data-analysis
Test suite: 71 tests covering config, data store, pandas, postgres, dbt tools

Addresses data workflow issues from personal-portfolio project:
- Lost data after multiple interactions (solved by Arrow IPC data_ref)
- dbt 1.9+ syntax deprecation (solved by pre-execution validation)
- Ungraceful PostgreSQL error handling (solved by SessionStart hook)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---
description: Interactive setup wizard for data-platform plugin - configures MCP server and optional PostgreSQL/dbt
---

# Data Platform Setup Wizard

This command sets up the data-platform plugin with pandas, PostgreSQL, and dbt integration.

## Important Context

- **This command uses Bash, Read, Write, and AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them

---

## Phase 1: Environment Validation

### Step 1.1: Check Python Version

```bash
python3 --version
```

Requires Python 3.10 or newer. If the version is older, stop setup and inform the user.
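The version gate can also be checked programmatically, which avoids parsing the `python3 --version` text. This is a minimal Python sketch. `check_python` and `MINIMUM_PYTHON` are illustrative names and are not part of the plugin.

```python
import sys

# Minimum version stated in Step 1.1 (illustrative constant name).
MINIMUM_PYTHON = (3, 10)


def check_python(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return (ok, message) for the given interpreter version."""
    current = tuple(version_info[:2])
    label = f"Python {current[0]}.{current[1]}"
    if current >= minimum:
        return True, f"{label} OK"
    need = ".".join(str(part) for part in minimum)
    return False, f"{label} is too old; data-platform requires {need}+"


# Report on the interpreter running this script.
ok, message = check_python()
print(message)
```

Tuple comparison handles the version ordering, so `(3, 9)` sorts below `(3, 10)` even though the string "3.9" sorts above "3.10".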
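
### Step 1.2: Check for Required Libraries

pandas, pyarrow, and dbt are installed into the MCP server's virtual environment in Phase 2. The system Python therefore only needs the standard-library `venv` module used in Step 2.3:

```bash
python3 -c "import sys, venv; print(f'Python {sys.version_info.major}.{sys.version_info.minor}, venv OK')"
```

If the import fails, ask the user to install the `venv` module for their Python before continuing.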

---

## Phase 2: MCP Server Setup

### Step 2.1: Locate Data Platform MCP Server

The MCP server should be under `mcp-servers/data-platform/` at the marketplace root. Check both possible locations:

```bash
# If running from installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_INSTALLED"

# If running from source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_SOURCE"
```

Determine the correct path based on which one exists. Substitute it for `/path/to/mcp-servers/data-platform` in the steps below.
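The same lookup can be sketched in Python. The candidate order mirrors the two `ls` commands: the installed marketplace first, then the source checkout. `locate_server` is a hypothetical helper name.

```python
from pathlib import Path

# The two locations checked in Step 2.1, in the same order.
CANDIDATES = [
    Path("~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform"),
    Path("~/claude-plugins-work/mcp-servers/data-platform"),
]


def locate_server(candidates=CANDIDATES):
    """Return the first candidate directory that exists, or None."""
    for candidate in candidates:
        path = candidate.expanduser()  # resolve the leading ~
        if path.is_dir():
            return path
    return None
```

A `None` result corresponds to both `NOT_FOUND_*` markers being printed.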

### Step 2.2: Check Virtual Environment

```bash
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python 2>/dev/null && echo "VENV_EXISTS" || echo "VENV_MISSING"
```

### Step 2.3: Create Virtual Environment (if missing)

```bash
cd /path/to/mcp-servers/data-platform && python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt && deactivate
```

**Note:** This may take a few minutes due to pandas, pyarrow, and dbt dependencies.
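Steps 2.2 and 2.3 can be sketched with the standard-library `venv` module. The sketch makes two assumptions:

- A POSIX layout, with the interpreter at `.venv/bin/python`. Windows uses `Scripts\python.exe` instead.
- `requirements.txt` sits in the server directory, as the `cd` in Step 2.3 implies.

The helper names are illustrative.

```python
import subprocess
import venv
from pathlib import Path


def venv_python(server_dir: Path) -> Path:
    """Interpreter inside the server's .venv (POSIX layout assumed)."""
    return server_dir / ".venv" / "bin" / "python"


def venv_status(server_dir: Path) -> str:
    """Mirror Step 2.2's VENV_EXISTS / VENV_MISSING markers."""
    return "VENV_EXISTS" if venv_python(server_dir).exists() else "VENV_MISSING"


def create_venv(server_dir: Path) -> None:
    """Mirror Step 2.3: create .venv with pip, then install requirements.txt."""
    venv.EnvBuilder(with_pip=True).create(server_dir / ".venv")
    python = str(venv_python(server_dir))
    # Running the venv's own interpreter replaces activate/deactivate.
    subprocess.run([python, "-m", "pip", "install", "--upgrade", "pip"], check=True)
    subprocess.run(
        [python, "-m", "pip", "install", "-r", "requirements.txt"],
        check=True,
        cwd=server_dir,
    )
```

Calling `.venv/bin/python -m pip` installs into the environment without activating it. This has the same effect as the `source ... && deactivate` sequence in the shell command.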

---

## Phase 3: PostgreSQL Configuration (Optional)

### Step 3.1: Ask About PostgreSQL

Use AskUserQuestion:
- Question: "Do you want to configure PostgreSQL database access?"
- Header: "PostgreSQL"
- Options:
  - "Yes, I have a PostgreSQL database"
  - "No, I'll only use pandas/dbt tools"

**If user chooses "No":** Skip to Phase 4.

### Step 3.2: Create Config Directory

```bash
mkdir -p ~/.config/claude
```

### Step 3.3: Check PostgreSQL Configuration

```bash
cat ~/.config/claude/postgres.env 2>/dev/null || echo "FILE_NOT_FOUND"
```

**If the file exists and contains a valid `POSTGRES_URL`:** Skip to Step 3.6.
**If the file is missing or still contains placeholders:** Continue with Step 3.4.
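The choice between "valid URL" and "placeholders" can be made mechanical. This sketch rests on two assumptions:

- The file uses the plain `KEY=VALUE` layout written in Step 3.5.
- Any `<...>` token is an unfilled placeholder. This is a rough heuristic.

The state names are made up for this example.

```python
import re
from pathlib import Path

# Placeholders look like <USER_PROVIDED_URL>, as written by Step 3.5.
PLACEHOLDER = re.compile(r"<[^>]+>")


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip("'\"")
    return values


def postgres_config_state(path: Path) -> str:
    """Return 'FILE_NOT_FOUND', 'NEEDS_SETUP', or 'CONFIGURED'."""
    if not path.is_file():
        return "FILE_NOT_FOUND"
    url = read_env_file(path).get("POSTGRES_URL", "")
    if not url or PLACEHOLDER.search(url):
        return "NEEDS_SETUP"
    if not url.startswith(("postgresql://", "postgres://")):
        return "NEEDS_SETUP"
    return "CONFIGURED"
```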

### Step 3.4: Gather PostgreSQL Information

Use AskUserQuestion:
- Question: "What is your PostgreSQL connection URL format?"
- Header: "DB Format"
- Options:
  - "Standard: postgresql://user:pass@host:5432/db"
  - "PostGIS: postgresql://user:pass@host:5432/db (with PostGIS extension)"
  - "Other (I'll provide the full URL)"

Ask user to provide the connection URL.
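Before the file is written, the URL the user supplies can be sanity-checked. This sketch uses only `urllib.parse`, and `describe_postgres_url` is a hypothetical helper. It leaves the password out of its result so the password is not echoed back. It does not check that the server is reachable; Step 3.6 does that.

```python
from urllib.parse import urlsplit


def describe_postgres_url(url: str) -> dict:
    """Split a postgresql:// URL into parts and report anything missing."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme {parts.scheme!r}")
    info = {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's default port
        "database": parts.path.lstrip("/") or None,
    }
    missing = [name for name in ("user", "host", "database") if not info[name]]
    if missing:
        raise ValueError(f"URL is missing: {', '.join(missing)}")
    return info
```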

### Step 3.5: Create Configuration File

The heredoc is quoted, so nothing inside it is expanded. Replace `<USER_PROVIDED_URL>` with the URL from Step 3.4 before running:

```bash
cat > ~/.config/claude/postgres.env << 'EOF'
# PostgreSQL Configuration
# Generated by data-platform /initial-setup

POSTGRES_URL=<USER_PROVIDED_URL>
EOF
chmod 600 ~/.config/claude/postgres.env
```
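`cat >` creates the file with the default umask permissions, and `chmod` only tightens them afterwards. The Python sketch below creates the file with mode `0600` from the start, using the same layout as the heredoc. `write_postgres_env` is an illustrative name.

```python
import os
from pathlib import Path


def write_postgres_env(url: str, path: Path) -> None:
    """Write postgres.env so it is never readable by other users."""
    path.parent.mkdir(parents=True, exist_ok=True)  # same as Step 3.2
    content = (
        "# PostgreSQL Configuration\n"
        "# Generated by data-platform /initial-setup\n"
        "\n"
        f"POSTGRES_URL={url}\n"
    )
    # Create with mode 0o600 up front instead of chmod after writing.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)  # also tightens a file that already existed
```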
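
### Step 3.6: Test PostgreSQL Connection (if configured)

Run the test with the MCP server's virtual environment Python, because the system `python3` may not have `asyncpg` installed. `set -a` exports the sourced variables. Python then reads `POSTGRES_URL` from the environment, and the URL is not spliced into the Python source:

```bash
set -a && source ~/.config/claude/postgres.env && set +a && /path/to/mcp-servers/data-platform/.venv/bin/python -c "
import asyncio
import os

import asyncpg

async def test():
    try:
        conn = await asyncpg.connect(os.environ['POSTGRES_URL'], timeout=5)
        ver = await conn.fetchval('SELECT version()')
        await conn.close()
        print(f'SUCCESS: {ver.split(\",\")[0]}')
    except Exception as e:
        print(f'FAILED: {e}')

asyncio.run(test())
"
```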

Report result:
- SUCCESS: Connection works
- FAILED: Show error and suggest fixes
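For the FAILED case, a small lookup can turn the exception into a concrete suggestion. This sketch assumes the Step 3.6 script is changed to pass the exception object, not just its message, to a helper like the hypothetical `suggest_fix` below.

- It matches on class names, so it runs without `asyncpg` installed.
- `InvalidPasswordError` and `InvalidCatalogNameError` are asyncpg exception classes.
- Which exception a given failure raises can vary with the asyncpg version and platform.

```python
# Hints keyed by exception class name (no asyncpg import needed).
HINTS = {
    "InvalidPasswordError": "Authentication failed: check the user and password in POSTGRES_URL.",
    "InvalidCatalogNameError": "Database not found: check the database name at the end of the URL.",
    "ConnectionRefusedError": "Connection refused: check the host and port, and that PostgreSQL is running.",
    "TimeoutError": "Timed out after 5s: check network access or firewall rules for the host.",
    "gaierror": "Host name did not resolve: check the host part of the URL.",
    "OSError": "Could not reach the server: check the host, port, and network.",
}


def suggest_fix(exc: BaseException) -> str:
    """Return the hint for the most specific matching class in the MRO."""
    for cls in type(exc).__mro__:
        if cls.__name__ in HINTS:
            return HINTS[cls.__name__]
    return f"Unrecognized error ({type(exc).__name__}): verify POSTGRES_URL and the server logs."
```

Walking the method resolution order checks the most specific class first. A `ConnectionRefusedError` therefore gets its own hint before falling back to the generic `OSError` one.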

---

## Phase 4: dbt Configuration (Optional)

### Step 4.1: Ask About dbt

Use AskUserQuestion:
- Question: "Do you use dbt for data transformations in your projects?"
- Header: "dbt"
- Options:
  - "Yes, I have dbt projects"
  - "No, I don't use dbt"

**If user chooses "No":** Skip to Phase 5.

### Step 4.2: dbt Discovery

dbt configuration is **project-level** (not system-level). The plugin auto-detects dbt projects by looking for `dbt_project.yml`.

Inform user:
```
dbt projects are detected automatically when you work in a directory
containing dbt_project.yml.

If your dbt project is in a subdirectory, you can set DBT_PROJECT_DIR
in your project's .env file:

DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
```
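The detection rule can be sketched as follows. How the plugin actually combines `DBT_PROJECT_DIR` with auto-detection is not spelled out here, so the precedence is an assumption: an explicit `DBT_PROJECT_DIR` wins, and otherwise the project root must contain `dbt_project.yml`. `find_dbt_project` is an illustrative name.

```python
import os
from pathlib import Path


def find_dbt_project(project_root: Path, env=None):
    """Return the directory holding dbt_project.yml, or None if not found.

    Assumed precedence: DBT_PROJECT_DIR (relative paths resolved against
    the project root) if set, otherwise the project root itself.
    """
    env = os.environ if env is None else env
    override = env.get("DBT_PROJECT_DIR")
    # Joining an absolute override replaces project_root entirely.
    candidate = project_root / Path(override).expanduser() if override else project_root
    if (candidate / "dbt_project.yml").is_file():
        return candidate.resolve()
    return None
```

When `DBT_PROJECT_DIR` is set but wrong, the sketch returns `None` instead of silently falling back to the root. That surfaces a misconfigured `.env` early.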

### Step 4.3: Check dbt Installation

```bash
dbt --version 2>/dev/null || echo "DBT_NOT_FOUND"
```

**If not found:** Inform user that dbt CLI tools require dbt-core to be installed globally or in the project.
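Sometimes the installed version matters and not just presence, for example to flag dbt 1.9+ given the syntax deprecations the plugin's commit notes mention. In that case the `dbt --version` output can be parsed. In this sketch the `installed:` line format is what recent dbt-core releases print, with `installed version:` in older ones. That output is not a stable interface, so treat the regex as an assumption.

```python
import re
import shutil
import subprocess

# Matches "installed: 1.9.1" (newer dbt) and "installed version: 1.4.5" (older).
INSTALLED = re.compile(r"installed(?: version)?:\s*(\d+)\.(\d+)")


def parse_dbt_version(output: str):
    """Return (major, minor) from `dbt --version` output, or None."""
    match = INSTALLED.search(output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_dbt_version():
    """Run `dbt --version` if dbt is on PATH; None means DBT_NOT_FOUND."""
    if shutil.which("dbt") is None:
        return None
    result = subprocess.run(["dbt", "--version"], capture_output=True, text=True)
    # Search both streams, since the version text may go to either one.
    return parse_dbt_version(result.stdout + result.stderr)
```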

---

## Phase 5: Validation

### Step 5.1: Verify MCP Server

```bash
cd /path/to/mcp-servers/data-platform && .venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('MCP Server OK')"
```

### Step 5.2: Summary

```
╔════════════════════════════════════════════════════════════╗
║                DATA-PLATFORM SETUP COMPLETE                ║
╠════════════════════════════════════════════════════════════╣
║  MCP Server:        ✓ Ready                                ║
║  pandas Tools:      ✓ Available (14 tools)                 ║
║  PostgreSQL Tools:  [✓/✗] [Status based on config]         ║
║  PostGIS Tools:     [✓/✗] [Status based on PostGIS]        ║
║  dbt Tools:         [✓/✗] [Status based on discovery]      ║
╚════════════════════════════════════════════════════════════╝
```
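Step 3.6 only confirms that the server answers, so the summary's PostGIS row needs a separate check. This sketch queries the `pg_extension` catalog with asyncpg, the driver Step 3.6 already uses. It assumes asyncpg is available in the MCP server's virtual environment. `fetch_postgis_version` and `postgis_status` are hypothetical names.

```python
POSTGIS_QUERY = "SELECT extversion FROM pg_extension WHERE extname = 'postgis'"


def postgis_status(version) -> str:
    """Format the PostGIS summary status from an extension version (or None)."""
    if version:
        return f"✓ PostGIS {version}"
    return "✗ extension not installed"


async def fetch_postgis_version(url: str):
    """Return the installed PostGIS version string, or None if absent."""
    import asyncpg  # deferred so postgis_status() works without asyncpg

    conn = await asyncpg.connect(url, timeout=5)
    try:
        # fetchval returns None when the query yields no row.
        return await conn.fetchval(POSTGIS_QUERY)
    finally:
        await conn.close()
```

With `POSTGRES_URL` exported, `postgis_status(asyncio.run(fetch_postgis_version(os.environ["POSTGRES_URL"])))` produces the text for that row. An exception from `connect` means the PostgreSQL check itself failed.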

### Step 5.3: Session Restart Notice

---

**⚠️ Session Restart Required**

Restart your Claude Code session for MCP tools to become available.

**After restart, you can:**
- Run `/ingest` to load data from files or database
- Run `/profile` to analyze DataFrame statistics
- Run `/schema` to explore database/DataFrame schema
- Run `/run` to execute dbt models (if configured)
- Run `/lineage` to view dbt model dependencies

---

## Memory Limits

The data-platform plugin has a default row limit of 100,000 rows per DataFrame. For larger datasets:
- Use chunked processing (`chunk_size` parameter)
- Filter data before loading
- Store to Parquet for efficient re-loading

To override the limit, set it in your project `.env`:
```
DATA_PLATFORM_MAX_ROWS=500000
```
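The row limit and the chunking advice can be combined in a sketch. It reads the limit from `DATA_PLATFORM_MAX_ROWS` and loads a CSV in chunks, filtering each chunk before it counts toward the cap. It uses the `chunksize` argument of pandas' own `read_csv`. The plugin's `chunk_size` tool parameter is not shown here, so `max_rows` and `load_csv_capped` are illustrative only.

```python
import os

DEFAULT_MAX_ROWS = 100_000  # the plugin's documented default


def max_rows(env=None) -> int:
    """Read DATA_PLATFORM_MAX_ROWS, falling back to the 100k default."""
    env = os.environ if env is None else env
    try:
        value = int(env.get("DATA_PLATFORM_MAX_ROWS", ""))
    except ValueError:
        return DEFAULT_MAX_ROWS
    return value if value > 0 else DEFAULT_MAX_ROWS


def load_csv_capped(path, limit, chunk_size=10_000, keep=None):
    """Read a CSV in chunks, optionally filtering, stopping at `limit` rows."""
    import pandas as pd  # deferred so max_rows() works without pandas

    kept, total = [], 0
    with pd.read_csv(path, chunksize=chunk_size) as reader:
        for chunk in reader:
            if keep is not None:
                # Filter first so dropped rows do not count toward the limit.
                chunk = chunk[keep(chunk)]
            kept.append(chunk.head(limit - total))
            total += len(kept[-1])
            if total >= limit:
                break
    return pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
```

For example, `load_csv_capped("events.csv", max_rows(), keep=lambda df: df["country"] == "NZ")` keeps at most the configured number of matching rows. `events.csv` and the `country` column are placeholders.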