# data-platform Plugin
Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration for Claude Code.
## Features
- **pandas Operations**: Load, transform, and export DataFrames with a persistent `data_ref` system
- **PostgreSQL/PostGIS**: Database queries with connection pooling and spatial data support
- **dbt Integration**: Build tool wrapper with pre-execution validation
## Installation
This plugin is part of the leo-claude-mktplace. Install via:
```bash
# From marketplace
claude plugins install leo-claude-mktplace/data-platform
# Set up the MCP server venv
cd ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Configuration
### PostgreSQL (Optional)
Create `~/.config/claude/postgres.env`:
```env
POSTGRES_URL=postgresql://user:password@host:5432/database
```
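To confirm the URL works before starting a session, you can open a connection from the MCP server's virtualenv. The snippet below is a minimal sketch, assuming a libpq driver such as `psycopg2` is available there (install with `pip install psycopg2-binary` if it is not) and that the env file lives at the path shown above:
```python
# check_postgres.py: illustrative sketch, not part of the plugin.
# Assumes psycopg2 is installed and ~/.config/claude/postgres.env
# contains the POSTGRES_URL line shown above.
from pathlib import Path
import psycopg2

env_path = Path.home() / ".config" / "claude" / "postgres.env"
env = dict(
    line.strip().split("=", 1)
    for line in env_path.read_text().splitlines()
    if "=" in line and not line.lstrip().startswith("#")
)

# psycopg2 accepts a libpq connection URI directly.
with psycopg2.connect(env["POSTGRES_URL"]) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
```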
### dbt (Optional)
Add to project `.env`:
```env
DBT_PROJECT_DIR=/path/to/dbt/project
DBT_PROFILES_DIR=~/.dbt
```
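To verify that dbt resolves both directories and can reach its target database, `dbt debug` is the standard check. The sketch below simply reads the two variables and shells out to dbt; it assumes dbt is installed on your PATH and that the variables from the project `.env` are exported in your environment:
```python
# dbt_check.py: illustrative sketch, not part of the plugin.
import os
import subprocess

project_dir = os.environ.get("DBT_PROJECT_DIR", ".")
profiles_dir = os.path.expanduser(os.environ.get("DBT_PROFILES_DIR", "~/.dbt"))

# `dbt debug` validates the project, the profiles, and the database connection.
subprocess.run(
    ["dbt", "debug", "--project-dir", project_dir, "--profiles-dir", profiles_dir],
    check=True,
)
```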
## Commands
| Command | Description |
|---------|-------------|
| `/initial-setup` | Interactive setup wizard for PostgreSQL and dbt configuration |
| `/ingest` | Load data from files or database |
| `/profile` | Generate data profile and statistics |
| `/data-quality` | Data quality assessment with pass/warn/fail scoring |
| `/schema` | Show database/DataFrame schema |
| `/explain` | Explain dbt model lineage |
| `/lineage` | Visualize data dependencies (ASCII) |
| `/lineage-viz` | Generate Mermaid flowchart for dbt lineage |
| `/run` | Execute dbt models |
| `/dbt-test` | Run dbt tests with formatted results |
## Agents
| Agent | Description |
|-------|-------------|
| `data-ingestion` | Data loading and transformation specialist |
| `data-analysis` | Exploration and profiling specialist |
## data_ref System
All DataFrame operations use a `data_ref` system so results persist across tool calls:
```
# Load returns a reference
read_csv("data.csv") → {"data_ref": "sales_data"}
# Use reference in subsequent operations
filter("sales_data", "amount > 100") → {"data_ref": "sales_data_filtered"}
describe("sales_data_filtered") → {statistics}
```
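Conceptually, a `data_ref` is a stable name for a DataFrame held in the MCP server's memory, so large tables never travel through the conversation. The following is an illustrative sketch of the pattern only, not the plugin's actual implementation; the registry, naming scheme, and return shapes here are assumptions:
```python
# Illustration of the data_ref pattern (not the plugin's actual code).
from pathlib import Path
import pandas as pd

_registry: dict[str, pd.DataFrame] = {}  # data_ref -> DataFrame, kept for the session

def read_csv(path: str) -> dict:
    """Load a CSV and register it; callers get back a name, not the data."""
    data_ref = Path(path).stem + "_data"          # "sales.csv" -> "sales_data"
    _registry[data_ref] = pd.read_csv(path)
    return {"data_ref": data_ref, "rows": len(_registry[data_ref])}

def filter(data_ref: str, expr: str) -> dict:
    """Filter a registered DataFrame with a pandas query expression."""
    out_ref = f"{data_ref}_filtered"
    _registry[out_ref] = _registry[data_ref].query(expr)
    return {"data_ref": out_ref, "rows": len(_registry[out_ref])}

def describe(data_ref: str) -> dict:
    """Return summary statistics without moving the DataFrame itself."""
    return _registry[data_ref].describe().to_dict()
```
Each tool returns the reference plus lightweight metadata rather than the data itself, which is what allows chained calls like the `filter`/`describe` example above.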
## Example Workflow
```
/ingest data/sales.csv
# → Loaded 50,000 rows as "sales_data"
/profile sales_data
# → Statistical summary, null counts, quality assessment
/schema orders
# → Column names, types, constraints
/lineage fct_orders
# → Dependency graph showing upstream/downstream models
/run dim_customers
# → Pre-validates then executes dbt model
```
## Tools Summary
### pandas (14 tools)
`read_csv`, `read_parquet`, `read_json`, `to_csv`, `to_parquet`, `describe`, `head`, `tail`, `filter`, `select`, `groupby`, `join`, `list_data`, `drop_data`
### PostgreSQL (6 tools)
`pg_connect`, `pg_query`, `pg_execute`, `pg_tables`, `pg_columns`, `pg_schemas`
### PostGIS (4 tools)
`st_tables`, `st_geometry_type`, `st_srid`, `st_extent`
### dbt (8 tools)
`dbt_parse`, `dbt_run`, `dbt_test`, `dbt_build`, `dbt_compile`, `dbt_ls`, `dbt_docs_generate`, `dbt_lineage`
## Memory Management
- Default limit: 100,000 rows per DataFrame
- Configure via `DATA_PLATFORM_MAX_ROWS` environment variable
- Use the `chunk_size` parameter for large files (see the sketch below)
- Monitor with `list_data` tool
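These limits follow standard pandas behaviour. The sketch below shows the underlying pattern of a row-capped, chunked load; it assumes the cap is read from `DATA_PLATFORM_MAX_ROWS`, and how the plugin's own tools enforce the cap may differ:
```python
# Illustrative sketch of row-capped, chunked CSV loading (not the plugin's code).
import os
import pandas as pd

MAX_ROWS = int(os.environ.get("DATA_PLATFORM_MAX_ROWS", "100000"))

def load_capped(path: str, chunk_size: int = 10_000) -> pd.DataFrame:
    """Read a CSV in chunks, stopping once MAX_ROWS rows have been collected."""
    chunks, total = [], 0
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        chunks.append(chunk)
        total += len(chunk)
        if total >= MAX_ROWS:
            break
    return pd.concat(chunks).head(MAX_ROWS) if chunks else pd.DataFrame()
```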
## SessionStart Hook
On session start, the plugin checks PostgreSQL connectivity and displays a warning if the database is unavailable. The check is non-blocking; pandas and dbt tools remain available.
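For reference, a connectivity probe of this kind can be as small as the sketch below. It is an assumption about the general shape of such a check, not the hook's actual source; it reuses `psycopg2` and the `POSTGRES_URL` value from the Configuration section (shown here as a placeholder URL):
```python
# Sketch of a non-blocking PostgreSQL connectivity probe (not the hook's actual source).
import psycopg2

def check_postgres(url: str) -> bool:
    """Return True if a connection opens within the timeout, False otherwise."""
    try:
        conn = psycopg2.connect(url, connect_timeout=3)
        conn.close()
        return True
    except psycopg2.OperationalError:
        return False

# Placeholder URL; the real hook would read POSTGRES_URL from postgres.env.
if not check_postgres("postgresql://user:password@host:5432/database"):
    print("warning: PostgreSQL unavailable; pandas and dbt tools remain available")
```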