leo-claude-mktplace/plugins/data-platform/commands/initial-setup.md

---
description: Interactive setup wizard for data-platform plugin - configures MCP server and optional PostgreSQL/dbt
---

# Data Platform Setup Wizard

This command sets up the data-platform plugin with pandas, PostgreSQL, and dbt integration.

## Important Context

- **This command uses Bash, Read, Write, and AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them

---

## Phase 1: Environment Validation

### Step 1.1: Check Python Version

```bash
python3 --version
```

Requires Python 3.10+. If the reported version is lower, stop setup and inform the user.
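
The version gate above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming a hypothetical helper named `python_version_ok` (not part of the plugin) that compares a `major.minor` string against 3.10:

```bash
# Hypothetical helper: succeeds when the given "major.minor[.patch]" version is >= 3.10.
python_version_ok() {
  local major minor
  major="${1%%.*}"      # text before the first dot
  minor="${1#*.}"       # text after the first dot
  minor="${minor%%.*}"  # drop any patch component
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Example usage (uncomment to run against the local interpreter):
# python_version_ok "$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')" \
#   || echo "PYTHON_TOO_OLD"
```

Comparing the numeric parts separately avoids the string-comparison trap where `3.9` sorts after `3.10`.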

### Step 1.2: Check for Required Libraries

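Step 2.3 creates a virtual environment, so confirm that the standard-library `venv` and `ensurepip` modules import cleanly:

```bash
python3 -c "import sys, venv, ensurepip; print(f'Python {sys.version_info.major}.{sys.version_info.minor}: venv and ensurepip available')" || echo "VENV_MODULE_MISSING"
```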

---

## Phase 2: MCP Server Setup

### Step 2.1: Locate Data Platform MCP Server

The MCP server should be at the marketplace root:

```bash
# If running from installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_INSTALLED"

# If running from source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_SOURCE"
```

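Use whichever path exists as the server path in the remaining steps, where it is shown as `/path/to/mcp-servers/data-platform`. If neither exists, stop setup and inform the user.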
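
The path selection can be sketched as a small function that returns the first candidate directory that exists. `find_server_dir` is a hypothetical name used only for illustration; the two candidate paths are the ones listed in this step.

```bash
# Hypothetical helper: print the first existing directory among the arguments.
find_server_dir() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "NOT_FOUND"
  return 1
}

# Example usage with the two locations from this step:
# SERVER_DIR="$(find_server_dir \
#   ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform \
#   ~/claude-plugins-work/mcp-servers/data-platform)"
```

Listing the installed location first means an installed marketplace takes precedence over a source checkout when both exist; swap the order if the opposite is wanted.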

### Step 2.2: Check Virtual Environment

```bash
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python 2>/dev/null && echo "VENV_EXISTS" || echo "VENV_MISSING"
```

### Step 2.3: Create Virtual Environment (if missing)

```bash
cd /path/to/mcp-servers/data-platform && python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt && deactivate
```

**Note:** This may take a few minutes due to pandas, pyarrow, and dbt dependencies.
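
Steps 2.2 and 2.3 can be combined so that the environment is only built when it is missing. This is a rough sketch, assuming a hypothetical function `ensure_venv` that takes the server directory as its only argument; it calls the environment's own `pip` directly instead of activating and deactivating.

```bash
# Hypothetical helper: create the virtual environment only if it does not exist yet.
ensure_venv() {
  local server_dir="$1"
  if [ -x "$server_dir/.venv/bin/python" ]; then
    echo "VENV_EXISTS"
    return 0
  fi
  # Build the environment and install dependencies with the venv's own pip.
  python3 -m venv "$server_dir/.venv" &&
    "$server_dir/.venv/bin/pip" install --upgrade pip &&
    "$server_dir/.venv/bin/pip" install -r "$server_dir/requirements.txt"
}

# Example usage:
# ensure_venv /path/to/mcp-servers/data-platform
```

Because the function returns early when `.venv/bin/python` is already executable, re-running setup does not reinstall the dependencies.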

---

## Phase 3: PostgreSQL Configuration (Optional)

### Step 3.1: Ask About PostgreSQL

Use AskUserQuestion:
- Question: "Do you want to configure PostgreSQL database access?"
- Header: "PostgreSQL"
- Options:
  - "Yes, I have a PostgreSQL database"
  - "No, I'll only use pandas/dbt tools"

**If user chooses "No":** Skip to Phase 4.

### Step 3.2: Create Config Directory

```bash
mkdir -p ~/.config/claude
```

### Step 3.3: Check PostgreSQL Configuration

```bash
cat ~/.config/claude/postgres.env 2>/dev/null || echo "FILE_NOT_FOUND"
```

- **If file exists with valid URL:** Skip to Step 3.6.
- **If missing or has placeholders:** Continue.
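
The branch above can be decided without printing the credentials to the terminal. The sketch below assumes a hypothetical helper `postgres_env_status`, and it assumes that a "placeholder" means an empty value or one still containing `<...>`; the plugin itself may define this differently.

```bash
# Hypothetical helper: classify the config file without echoing its contents.
postgres_env_status() {
  local file="$1" url
  if [ ! -f "$file" ]; then
    echo "FILE_NOT_FOUND"
    return
  fi
  # Take the last POSTGRES_URL assignment; cut -f2- keeps any '=' inside the URL.
  url="$(grep -E '^POSTGRES_URL=' "$file" | tail -n 1 | cut -d= -f2-)"
  case "$url" in
    ''|*'<'*'>'*) echo "PLACEHOLDER" ;;   # assumption: empty or <...> means not filled in
    postgresql://*|postgres://*) echo "CONFIGURED" ;;
    *) echo "INVALID" ;;
  esac
}

# Example usage:
# postgres_env_status ~/.config/claude/postgres.env
```

Only `CONFIGURED` would skip ahead to Step 3.6; every other result continues with Step 3.4.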

### Step 3.4: Gather PostgreSQL Information

Use AskUserQuestion:
- Question: "What is your PostgreSQL connection URL format?"
- Header: "DB Format"
- Options:
  - "Standard: postgresql://user:pass@host:5432/db"
  - "PostGIS: postgresql://user:pass@host:5432/db (with PostGIS extension)"
  - "Other (I'll provide the full URL)"

Ask user to provide the connection URL.

### Step 3.5: Create Configuration File

```bash
cat > ~/.config/claude/postgres.env << 'EOF'
# PostgreSQL Configuration
# Generated by data-platform /initial-setup

POSTGRES_URL=<USER_PROVIDED_URL>
EOF
chmod 600 ~/.config/claude/postgres.env
```
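
If the file is written from a script rather than by hand, the same content can be produced with restrictive permissions from the moment of creation. This is a sketch only: `write_postgres_env` is a hypothetical helper, and it assumes the URL is passed in as a plain string.

```bash
# Hypothetical helper: write the config file so it is never briefly world-readable.
write_postgres_env() {
  local file="$1" url="$2"
  mkdir -p "$(dirname "$file")" &&
    (
      umask 077  # new files in this subshell are created owner-only
      printf '# PostgreSQL Configuration\n# Generated by data-platform /initial-setup\n\nPOSTGRES_URL=%s\n' "$url" > "$file"
    ) &&
    chmod 600 "$file"  # also tightens a file that already existed
}

# Example usage:
# write_postgres_env ~/.config/claude/postgres.env 'postgresql://user:pass@host:5432/db'
```

Setting `umask` inside a subshell keeps the change from leaking into the rest of the session.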

### Step 3.6: Test PostgreSQL Connection (if configured)

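```bash
# Uses the server's virtual environment, where asyncpg is expected to be installed by Step 2.3
source ~/.config/claude/postgres.env && export POSTGRES_URL && /path/to/mcp-servers/data-platform/.venv/bin/python -c "
import asyncio
import os
import asyncpg
async def test():
    try:
        # Read the URL from the environment so quotes or backslashes in it cannot break this script
        conn = await asyncpg.connect(os.environ['POSTGRES_URL'], timeout=5)
        ver = await conn.fetchval('SELECT version()')
        await conn.close()
        print(f'SUCCESS: {ver.split(\",\")[0]}')
    except Exception as e:
        print(f'FAILED: {e}')
asyncio.run(test())
"
```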

Report result:
- SUCCESS: Connection works
- FAILED: Show the error and suggest fixes (for example, verify the host, port, credentials, and database name in the URL)
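
When reporting a failure, it can help to show which connection was attempted without exposing the password. A minimal sketch, assuming a hypothetical helper `redact_postgres_url` that is not part of the plugin:

```bash
# Hypothetical helper: mask the password portion of a connection URL.
redact_postgres_url() {
  # Matches "://user:password@" and replaces the password with ****.
  printf '%s\n' "$1" | sed -E 's#(://[^:/@]+):[^@]*@#\1:****@#'
}

# Example usage:
# echo "Tried: $(redact_postgres_url "$POSTGRES_URL")"
```

URLs without a password pass through unchanged. A password containing a literal `@` is assumed to be percent-encoded, as connection URLs generally require.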

---

## Phase 4: dbt Configuration (Optional)

### Step 4.1: Ask About dbt

Use AskUserQuestion:
- Question: "Do you use dbt for data transformations in your projects?"
- Header: "dbt"
- Options:
  - "Yes, I have dbt projects"
  - "No, I don't use dbt"

**If user chooses "No":** Skip to Phase 5.

### Step 4.2: dbt Discovery

dbt configuration is **project-level** (not system-level). The plugin auto-detects dbt projects by looking for `dbt_project.yml`.

Inform user:
```
dbt projects are detected automatically when you work in a directory
containing dbt_project.yml.

If your dbt project is in a subdirectory, you can set DBT_PROJECT_DIR
in your project's .env file:

  DBT_PROJECT_DIR=./transform
  DBT_PROFILES_DIR=~/.dbt
```
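
To preview what would be detected, the lookup can be approximated from the shell. This sketch reflects an assumption about the order of checks (`DBT_PROJECT_DIR` first, then the working directory); the plugin's actual detection logic may differ, and `find_dbt_project` is a hypothetical name.

```bash
# Hypothetical helper: report where a dbt_project.yml would be found.
find_dbt_project() {
  local start="${1:-$PWD}" candidate
  # Assumed order: explicit DBT_PROJECT_DIR override, then the starting directory.
  for candidate in "${DBT_PROJECT_DIR:-}" "$start"; do
    if [ -n "$candidate" ] && [ -f "$candidate/dbt_project.yml" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "DBT_PROJECT_NOT_FOUND"
  return 1
}

# Example usage from a project root:
# find_dbt_project
```

A relative `DBT_PROJECT_DIR` such as `./transform` is resolved against the current working directory in this sketch.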

### Step 4.3: Check dbt Installation

```bash
dbt --version 2>/dev/null || echo "DBT_NOT_FOUND"
```

**If not found:** Inform user that dbt CLI tools require dbt-core to be installed globally or in the project.

---

## Phase 5: Validation

### Step 5.1: Verify MCP Server

```bash
cd /path/to/mcp-servers/data-platform && .venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('MCP Server OK')"
```

### Step 5.2: Summary

```
╔════════════════════════════════════════════════════════════╗
║            DATA-PLATFORM SETUP COMPLETE                    ║
╠════════════════════════════════════════════════════════════╣
║ MCP Server:        ✓ Ready                                 ║
║ pandas Tools:      ✓ Available (14 tools)                  ║
║ PostgreSQL Tools:  [✓/✗] [Status based on config]          ║
║ PostGIS Tools:     [✓/✗] [Status based on PostGIS]         ║
║ dbt Tools:         [✓/✗] [Status based on discovery]       ║
╚════════════════════════════════════════════════════════════╝
```

### Step 5.3: Session Restart Notice

---

**⚠️ Session Restart Required**

Restart your Claude Code session for MCP tools to become available.

**After restart, you can:**
- Run `/ingest` to load data from files or database
- Run `/profile` to analyze DataFrame statistics
- Run `/schema` to explore database/DataFrame schema
- Run `/run` to execute dbt models (if configured)
- Run `/lineage` to view dbt model dependencies

---

## Memory Limits

The data-platform plugin has a default row limit of 100,000 rows per DataFrame. For larger datasets:
- Use chunked processing (`chunk_size` parameter)
- Filter data before loading
- Store to Parquet for efficient re-loading

To override the limit, set `DATA_PLATFORM_MAX_ROWS` in your project's `.env` file:
```
DATA_PLATFORM_MAX_ROWS=500000
```
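
To confirm which limit a project would end up with, the value can be read back from the `.env` file. This is a sketch under stated assumptions: `max_rows_from_env` is a hypothetical helper, and falling back to 100,000 for a missing or non-numeric value mirrors the documented default rather than the plugin's actual parsing.

```bash
# Hypothetical helper: print the configured row limit, or the documented default.
max_rows_from_env() {
  local file="${1:-.env}" value=""
  if [ -f "$file" ]; then
    value="$(grep -E '^DATA_PLATFORM_MAX_ROWS=' "$file" | tail -n 1 | cut -d= -f2-)"
  fi
  case "$value" in
    ''|*[!0-9]*) echo 100000 ;;  # unset or not a plain integer: use the default
    *) echo "$value" ;;
  esac
}

# Example usage:
# max_rows_from_env ./.env
```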