# Data Platform MCP Server
MCP Server providing pandas, PostgreSQL/PostGIS, and dbt tools for Claude Code.
## Features
- pandas Tools: DataFrame operations with Arrow IPC data_ref persistence
- PostgreSQL Tools: Database queries with asyncpg connection pooling
- PostGIS Tools: Spatial data operations
- dbt Tools: Build tool wrapper with pre-execution validation
## Installation

```bash
cd mcp-servers/data-platform
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
## Configuration

### System-Level (PostgreSQL credentials)

Create `~/.config/claude/postgres.env`:

```
POSTGRES_URL=postgresql://user:password@host:5432/database
```

### Project-Level (dbt paths)

Create `.env` in your project root:

```
DBT_PROJECT_DIR=/path/to/dbt/project
DBT_PROFILES_DIR=/path/to/.dbt
DATA_PLATFORM_MAX_ROWS=100000
```
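At startup the server reads both layers, so shared database credentials and per-project dbt paths can live in separate files. A minimal sketch of that merge, assuming `python-dotenv` is used and that project values take precedence (both assumptions, not documented behavior):

```python
# Hypothetical sketch of merging system-level and project-level config.
# Assumes python-dotenv; the server's actual loading logic may differ.
from pathlib import Path
from dotenv import load_dotenv

def load_config() -> None:
    # System-level PostgreSQL credentials (shared across projects)
    load_dotenv(Path.home() / ".config" / "claude" / "postgres.env")
    # Project-level dbt paths and limits; override=True assumes the
    # project .env wins when both files define the same variable
    load_dotenv(Path.cwd() / ".env", override=True)
```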
## Tools

### pandas Tools (14 tools)

| Tool | Description |
|---|---|
| `read_csv` | Load CSV file into DataFrame |
| `read_parquet` | Load Parquet file into DataFrame |
| `read_json` | Load JSON/JSONL file into DataFrame |
| `to_csv` | Export DataFrame to CSV file |
| `to_parquet` | Export DataFrame to Parquet file |
| `describe` | Get statistical summary of DataFrame |
| `head` | Get first N rows of DataFrame |
| `tail` | Get last N rows of DataFrame |
| `filter` | Filter DataFrame rows by condition |
| `select` | Select specific columns from DataFrame |
| `groupby` | Group DataFrame and aggregate |
| `join` | Join two DataFrames |
| `list_data` | List all stored DataFrames |
| `drop_data` | Remove a DataFrame from storage |
### PostgreSQL Tools (6 tools)

| Tool | Description |
|---|---|
| `pg_connect` | Test connection and return status |
| `pg_query` | Execute SELECT, return as data_ref |
| `pg_execute` | Execute INSERT/UPDATE/DELETE |
| `pg_tables` | List all tables in schema |
| `pg_columns` | Get column info for table |
| `pg_schemas` | List all schemas |
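The Features list mentions asyncpg connection pooling; the sketch below shows roughly how a pooled SELECT behind `pg_query` could look. The pool setup and function name are illustrative, not the server's actual code:

```python
# Illustrative only: a pooled SELECT in the style of pg_query, using asyncpg.
import os
import asyncpg

async def run_query(sql: str) -> list[dict]:
    pool = await asyncpg.create_pool(
        dsn=os.environ["POSTGRES_URL"], min_size=1, max_size=5
    )
    try:
        async with pool.acquire() as conn:
            rows = await conn.fetch(sql)   # SELECT only; writes go through pg_execute
            return [dict(r) for r in rows]
    finally:
        await pool.close()
```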
### PostGIS Tools (4 tools)

| Tool | Description |
|---|---|
| `st_tables` | List PostGIS-enabled tables |
| `st_geometry_type` | Get geometry type of column |
| `st_srid` | Get SRID of geometry column |
| `st_extent` | Get bounding box of geometries |
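These tools can be answered with standard PostGIS catalog functions; a hedged example of the kind of SQL `st_srid` and `st_extent` might issue (table and column names are placeholders, and the queries are illustrative rather than the server's internals):

```python
# Example PostGIS metadata queries these tools could issue (illustrative).
import asyncpg

async def spatial_summary(conn: asyncpg.Connection, table: str, geom_col: str = "geom"):
    # SRID of the geometry column, via the Find_SRID catalog function
    srid = await conn.fetchval(
        "SELECT Find_SRID('public', $1, $2)", table, geom_col
    )
    # Bounding box of all geometries; identifiers are interpolated here only
    # because this is a sketch — a real tool should validate them first
    extent = await conn.fetchval(
        f"SELECT ST_Extent({geom_col})::text FROM {table}"
    )
    return {"srid": srid, "extent": extent}
```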
### dbt Tools (8 tools)

| Tool | Description |
|---|---|
| `dbt_parse` | Validate project (pre-execution) |
| `dbt_run` | Run models with selection |
| `dbt_test` | Run tests |
| `dbt_build` | Run + test |
| `dbt_compile` | Compile SQL without executing |
| `dbt_ls` | List resources |
| `dbt_docs_generate` | Generate documentation |
| `dbt_lineage` | Get model dependencies |
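Pre-execution validation means `dbt parse` runs before any models execute, so syntax and deprecation problems surface without touching the warehouse. A rough sketch of that flow, assuming dbt is invoked as a subprocess with the paths from the project-level `.env` (the wrapper name and structure are illustrative):

```python
# Sketch: validate with `dbt parse` before running models (illustrative wrapper).
import os
import subprocess

def dbt_run_with_validation(select: str | None = None) -> None:
    common = [
        "--project-dir", os.environ["DBT_PROJECT_DIR"],
        "--profiles-dir", os.environ["DBT_PROFILES_DIR"],
    ]

    # Step 1: `dbt parse` catches syntax and deprecation errors up front.
    parse = subprocess.run(
        ["dbt", "parse", *common], capture_output=True, text=True
    )
    if parse.returncode != 0:
        raise RuntimeError(f"dbt parse failed:\n{parse.stdout}\n{parse.stderr}")

    # Step 2: only run models once the project parses cleanly.
    run_cmd = ["dbt", "run", *common]
    if select:
        run_cmd += ["--select", select]
    subprocess.run(run_cmd, check=True)
```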
## data_ref System

All DataFrame operations use a `data_ref` system to persist data across tool calls:

- Load data: Returns a `data_ref` string (e.g., `"df_a1b2c3d4"`)
- Use data_ref: Pass to other tools (filter, join, export)
- List data: Use `list_data` to see all stored DataFrames
- Clean up: Use `drop_data` when done
### Example Flow

```
read_csv("data.csv") → {"data_ref": "sales_data", "rows": 1000}
filter("sales_data", "amount > 100") → {"data_ref": "sales_data_filtered"}
describe("sales_data_filtered") → {statistics}
to_parquet("sales_data_filtered", "output.parquet") → {success}
```
## Memory Management

- Default row limit: 100,000 rows per DataFrame
- Configure via the `DATA_PLATFORM_MAX_ROWS` environment variable
- Use chunked processing for large files (`chunk_size` parameter)
- Monitor with the `list_data` tool (shows memory usage)
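For files larger than the row limit, chunked processing streams the input instead of loading it whole. A hedged illustration of the approach using pandas' `chunksize`; the actual handling of the `chunk_size` parameter may differ:

```python
# Illustration: enforce the row limit while reading a large CSV in chunks.
import os
import pandas as pd

MAX_ROWS = int(os.environ.get("DATA_PLATFORM_MAX_ROWS", "100000"))

def read_csv_limited(path: str, chunk_size: int = 10_000) -> pd.DataFrame:
    pieces, total = [], 0
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        pieces.append(chunk)
        total += len(chunk)
        if total >= MAX_ROWS:   # stop reading once the cap is reached
            break
    return pd.concat(pieces, ignore_index=True).head(MAX_ROWS)
```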
## Running

```bash
python -m mcp_server.server
```
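To use the tools from Claude Code, the server also has to be registered as an MCP server. One way to do that is a project-level `.mcp.json` entry along these lines (the command path assumes the virtual environment created during installation; adjust to your setup):

```json
{
  "mcpServers": {
    "data-platform": {
      "command": "mcp-servers/data-platform/.venv/bin/python",
      "args": ["-m", "mcp_server.server"]
    }
  }
}
```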
## Development

```bash
pip install -e ".[dev]"
pytest
```