diff --git a/Change-V04.0.0%3A-Proposal-%28Implementation-1%29.md b/Change-V04.0.0%3A-Proposal-%28Implementation-1%29.md deleted file mode 100644 index 79cb738..0000000 --- a/Change-V04.0.0%3A-Proposal-%28Implementation-1%29.md +++ /dev/null @@ -1,452 +0,0 @@ -# data-platform Plugin Implementation Plan (v4.0.0) - -> **Origin:** [Change V04.0.0: Proposal](Change-V04.0.0:-Proposal) -> **Status:** Implemented -> **Date:** 2026-01-25 - ---- - -## Overview - -Implement a new `data-platform` plugin for leo-claude-mktplace that addresses data workflow issues encountered in the personal-portfolio project: -- Lost data after multiple interactions (solved by Arrow IPC data_ref passing) -- dbt 1.9+ syntax deprecation (solved by pre-execution validation with `dbt parse`) -- Ungraceful PostgreSQL error handling (solved by SessionStart hook with warnings) - -## Architecture Decisions - -| Decision | Choice | -|----------|--------| -| Data Passing | Arrow IPC with data_ref | -| DB Auth | Environment variables (~/.config/claude/postgres.env) | -| dbt Discovery | Auto-detect + explicit override | -| dbt Validation | Pre-execution (`dbt parse`) | -| Plugin Structure | Single plugin, 3 MCP servers | -| Server Location | Root mcp-servers/ | -| Memory Management | 100k row limit with chunking | -| PostGIS Support | Yes, with geoalchemy2 | -| Agent Model | 2 agents (Ingestion + Analysis) | -| Commands | Core 6 | -| Startup Hook | Graceful DB warning (non-blocking) | -| MCP Framework | Manual SDK (following gitea pattern) | - -## File Structure - -``` -mcp-servers/ -└── data-platform/ - ├── mcp_server/ - │ ├── __init__.py - │ ├── server.py # Main MCP server with routing - │ ├── config.py # Hybrid config (system + project) - │ ├── data_store.py # Arrow IPC DataFrame registry - │ ├── pandas_tools.py # pandas tool implementations - │ ├── postgres_tools.py # PostgreSQL/PostGIS tools - │ └── dbt_tools.py # dbt CLI wrapper tools - ├── requirements.txt - ├── pyproject.toml - └── README.md - -plugins/ -└── data-platform/ - ├── .claude-plugin/ - │ └── plugin.json - ├── .mcp.json - ├── mcp-servers/ - │ └── data-platform -> ../../../mcp-servers/data-platform # symlink - ├── commands/ - │ ├── ingest.md # /ingest command - │ ├── profile.md # /profile command - │ ├── schema.md # /schema command - │ ├── explain.md # /explain command - │ ├── lineage.md # /lineage command - │ └── run.md # /run command - ├── agents/ - │ ├── data-ingestion.md # Data loading and transformation - │ └── data-analysis.md # Exploration and profiling - ├── hooks/ - │ └── hooks.json # SessionStart DB check - ├── README.md - └── claude-md-integration.md -``` - -## Implementation Phases - -### Phase 1: Foundation (Issues #1-2) - -**Files to create:** -- `mcp-servers/data-platform/mcp_server/__init__.py` -- `mcp-servers/data-platform/mcp_server/config.py` -- `mcp-servers/data-platform/mcp_server/data_store.py` -- `mcp-servers/data-platform/mcp_server/server.py` (skeleton) -- `mcp-servers/data-platform/requirements.txt` -- `mcp-servers/data-platform/pyproject.toml` - -**config.py pattern** (from gitea): -```python -import os -from pathlib import Path - -def load_config(): - # System-level credentials - system_env = Path.home() / ".config/claude/postgres.env" - if system_env.exists(): - load_dotenv(system_env) - - # Project-level settings - project_env = Path.cwd() / ".env" - if project_env.exists(): - load_dotenv(project_env, override=True) - - return { - "postgres_url": os.getenv("POSTGRES_URL"), - "dbt_project_dir": os.getenv("DBT_PROJECT_DIR"), - 
"dbt_profiles_dir": os.getenv("DBT_PROFILES_DIR"), - } -``` - -**data_store.py** (Arrow IPC registry): -```python -import pyarrow as pa -import uuid -from typing import Dict, Optional - -class DataStore: - _instance = None - _dataframes: Dict[str, pa.Table] = {} - - @classmethod - def get_instance(cls): - if cls._instance is None: - cls._instance = cls() - return cls._instance - - def store(self, df: pa.Table, name: Optional[str] = None) -> str: - data_ref = name or f"df_{uuid.uuid4().hex[:8]}" - self._dataframes[data_ref] = df - return data_ref - - def get(self, data_ref: str) -> Optional[pa.Table]: - return self._dataframes.get(data_ref) - - def list_refs(self) -> list: - return [{"ref": k, "rows": v.num_rows, "cols": v.num_columns} - for k, v in self._dataframes.items()] -``` - -### Phase 2: pandas-mcp Tools (Issue #3) - -**Tools to implement in pandas_tools.py:** - -| Tool | Description | -|------|-------------| -| `read_csv` | Load CSV with optional chunking | -| `read_parquet` | Load Parquet files | -| `read_json` | Load JSON/JSONL files | -| `to_csv` | Export DataFrame to CSV | -| `to_parquet` | Export DataFrame to Parquet | -| `describe` | Statistical summary | -| `head` | First N rows | -| `tail` | Last N rows | -| `filter` | Filter rows by condition | -| `select` | Select columns | -| `groupby` | Group and aggregate | -| `join` | Join two DataFrames | -| `list_data` | List all stored DataFrames | -| `drop_data` | Remove DataFrame from store | - -**Memory management:** -```python -MAX_ROWS = 100_000 - -def read_csv(file_path: str, chunk_size: int = None) -> dict: - df = pd.read_csv(file_path) - if len(df) > MAX_ROWS: - return { - "warning": f"DataFrame has {len(df)} rows, exceeds {MAX_ROWS} limit", - "suggestion": f"Use chunk_size={MAX_ROWS} for chunked processing", - "preview": df.head(100).to_dict() - } - # Convert to Arrow and store - table = pa.Table.from_pandas(df) - data_ref = DataStore.get_instance().store(table) - return {"data_ref": data_ref, "rows": len(df), "columns": list(df.columns)} -``` - -### Phase 3: postgres-mcp Tools (Issue #4) - -**Tools to implement in postgres_tools.py:** - -| Tool | Description | -|------|-------------| -| `pg_connect` | Test connection and return status | -| `pg_query` | Execute SELECT, return as data_ref | -| `pg_execute` | Execute INSERT/UPDATE/DELETE | -| `pg_tables` | List all tables in schema | -| `pg_columns` | Get column info for table | -| `pg_schemas` | List all schemas | -| `st_tables` | List PostGIS-enabled tables | -| `st_geometry_type` | Get geometry type of column | -| `st_srid` | Get SRID of geometry column | -| `st_extent` | Get bounding box of geometries | - -**asyncpg implementation:** -```python -import asyncpg -from geoalchemy2 import Geometry - -async def pg_query(query: str, params: list = None) -> dict: - config = load_config() - conn = await asyncpg.connect(config["postgres_url"]) - try: - rows = await conn.fetch(query, *(params or [])) - df = pd.DataFrame([dict(r) for r in rows]) - if len(df) > MAX_ROWS: - return {"warning": "Result truncated", "data_ref": store_truncated(df)} - table = pa.Table.from_pandas(df) - data_ref = DataStore.get_instance().store(table) - return {"data_ref": data_ref, "rows": len(df)} - finally: - await conn.close() -``` - -### Phase 4: dbt-mcp Tools (Issue #5) - -**Tools to implement in dbt_tools.py:** - -| Tool | Description | -|------|-------------| -| `dbt_parse` | Validate project (pre-execution) | -| `dbt_run` | Run models with selection | -| `dbt_test` | Run tests | -| `dbt_build` | 
Run + test | -| `dbt_compile` | Compile SQL without executing | -| `dbt_ls` | List resources | -| `dbt_docs_generate` | Generate documentation | -| `dbt_lineage` | Get model dependencies | - -**Pre-execution validation pattern:** -```python -import subprocess -import json - -def dbt_run(select: str = None, exclude: str = None) -> dict: - config = load_config() - project_dir = config.get("dbt_project_dir") or find_dbt_project() - - # ALWAYS validate first - parse_result = subprocess.run( - ["dbt", "parse", "--project-dir", project_dir], - capture_output=True, text=True - ) - if parse_result.returncode != 0: - return { - "error": "dbt parse failed - fix issues before running", - "details": parse_result.stderr, - "suggestion": "Check for deprecated syntax (dbt 1.9+)" - } - - # Execute run - cmd = ["dbt", "run", "--project-dir", project_dir] - if select: - cmd.extend(["--select", select]) - result = subprocess.run(cmd, capture_output=True, text=True) - return {"success": result.returncode == 0, "output": result.stdout} -``` - -### Phase 5: Plugin Wrapper (Issue #6) - -**plugins/data-platform/.claude-plugin/plugin.json:** -```json -{ - "name": "data-platform", - "version": "1.0.0", - "description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration", - "author": "Leo Miranda", - "license": "MIT", - "hooks": "hooks/hooks.json", - "commands": "commands/", - "agents": "agents/", - "mcp": ".mcp.json" -} -``` - -**plugins/data-platform/.mcp.json:** -```json -{ - "mcpServers": { - "data-platform": { - "type": "stdio", - "command": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python", - "args": ["-m", "mcp_server.server"], - "cwd": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform" - } - } -} -``` - -**plugins/data-platform/hooks/hooks.json:** -```json -{ - "hooks": [ - { - "event": "SessionStart", - "type": "command", - "command": ["${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python", "-c", "from mcp_server.postgres_tools import check_connection; check_connection()"], - "timeout": 5000, - "onError": "warn" - } - ] -} -``` - -**Agents:** - -`agents/data-ingestion.md`: -```markdown -# Data Ingestion Agent - -You are a data ingestion specialist. Your role is to help users load, transform, and prepare data for analysis. - -## Available Tools -- pandas: read_csv, read_parquet, read_json, filter, select, groupby, join -- postgres: pg_query, pg_execute - -## Workflow -1. Understand the data source and format -2. Load data with appropriate chunking for large files -3. Transform as needed (filter, select, aggregate) -4. Store results with meaningful data_ref names -``` - -`agents/data-analysis.md`: -```markdown -# Data Analysis Agent - -You are a data analysis specialist. Your role is to help users explore, profile, and understand their data. - -## Available Tools -- pandas: describe, head, tail, list_data -- postgres: pg_tables, pg_columns -- dbt: dbt_lineage, dbt_docs_generate - -## Workflow -1. List available data (list_data or pg_tables) -2. Profile data structure and statistics -3. Identify patterns and anomalies -4. 
Provide insights and recommendations -``` - -**Commands:** - -| Command | File | Description | -|---------|------|-------------| -| `/ingest` | commands/ingest.md | Load data from files or database | -| `/profile` | commands/profile.md | Generate data profile and statistics | -| `/schema` | commands/schema.md | Show database/DataFrame schema | -| `/explain` | commands/explain.md | Explain dbt model lineage | -| `/lineage` | commands/lineage.md | Visualize data dependencies | -| `/run` | commands/run.md | Execute dbt models | - -### Phase 6: Documentation & Integration - -**Files to update:** -- `.claude-plugin/marketplace.json` - Add data-platform plugin entry -- `CHANGELOG.md` - Add v4.0.0 section under [Unreleased] -- `README.md` - Update plugin table - -**Files to create:** -- `plugins/data-platform/README.md` -- `plugins/data-platform/claude-md-integration.md` -- `mcp-servers/data-platform/README.md` - -## Sprint Structure (projman) - -**Milestone:** Sprint 1 - data-platform Plugin (v4.0.0) - -### Gitea Issues to Create - -| # | Title | Labels | Effort | -|---|-------|--------|--------| -| 1 | [Sprint 01] feat: MCP server foundation and config | Type/Feature, Priority/High, Complexity/Medium, Effort/M, Tech/Python, Component/Backend | 1-2 days | -| 2 | [Sprint 01] feat: Arrow IPC data registry with memory limits | Type/Feature, Priority/High, Complexity/Medium, Effort/M, Tech/Python, Component/Backend | 1-2 days | -| 3 | [Sprint 01] feat: pandas-mcp core data operations (14 tools) | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Component/Backend | 3-5 days | -| 4 | [Sprint 01] feat: postgres-mcp database tools with PostGIS | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Tech/PostgreSQL, Component/Database | 3-5 days | -| 5 | [Sprint 01] feat: dbt-mcp build tools with pre-validation | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Component/Backend | 3-5 days | -| 6 | [Sprint 01] feat: Plugin wrapper, commands, and agents | Type/Feature, Priority/Medium, Complexity/Medium, Effort/M, Component/Docs | 1-2 days | -| 7 | [Sprint 01] docs: Documentation and marketplace integration | Type/Documentation, Priority/Medium, Complexity/Simple, Effort/S, Component/Docs | 2-4 hours | - -### Issue Dependencies - -``` -#1 (foundation) ─┬─> #2 (data registry) - │ - ├─> #3 (pandas-mcp) ──┐ - │ │ - ├─> #4 (postgres-mcp) ├─> #6 (plugin wrapper) ─> #7 (docs) - │ │ - └─> #5 (dbt-mcp) ─────┘ -``` - -**Parallel Execution Batches:** -1. Batch 1: #1 (foundation) -2. Batch 2: #2, #3, #4, #5 (can run in parallel after foundation) -3. Batch 3: #6 (plugin wrapper - needs all tools complete) -4. Batch 4: #7 (docs - final) - -## Verification Steps - -1. **MCP Server starts:** - ```bash - cd mcp-servers/data-platform - python -m venv .venv - source .venv/bin/activate - pip install -r requirements.txt - python -m mcp_server.server - ``` - -2. **Tools are registered:** - - Start Claude Code in a test project - - Run `/ingest` command - - Verify MCP tools appear in tool list - -3. **Data persistence:** - - Load a CSV file with `/ingest` - - Run multiple commands referencing the data_ref - - Verify data persists across tool calls - -4. **PostgreSQL connection:** - - Configure `~/.config/claude/postgres.env` - - Start new session - - Verify SessionStart hook shows connection status (warning if unavailable) - -5. 
**dbt validation:** - - Run `/run` on a dbt project with deprecated syntax - - Verify pre-execution validation catches issues - - Fix syntax and re-run successfully - -6. **Validation script:** - ```bash - ./scripts/validate-marketplace.sh - ``` - -## Dependencies - -``` -# requirements.txt -mcp>=1.0.0 -pandas>=2.0.0 -pyarrow>=14.0.0 -asyncpg>=0.29.0 -geoalchemy2>=0.14.0 -python-dotenv>=1.0.0 -dbt-core>=1.9.0 -dbt-postgres>=1.9.0 -``` - -## Out of Scope (v4.1.0+) - -- Integration with projman sprint tracking -- Cross-plugin DataFrame sharing -- Visualization components (deferred to v5.0.0) -- Advanced dbt features (seeds, snapshots, exposures) diff --git a/unnamed.md b/unnamed.md index 30d8231..8a990b4 100644 --- a/unnamed.md +++ b/unnamed.md @@ -1,666 +1,454 @@ -> **Type:** Change Proposal +> **Type:** Change Proposal Implementation > **Version:** V04.0.0 > **Status:** Implemented > **Date:** 2026-01-25 - -## Implementations - -- [Change V04.0.0: Proposal (Implementation 1)](Change-V04.0.0:-Proposal-(Implementation-1)) - data-platform plugin +> **Origin:** [Change V04.0.0: Proposal](Change-V04.0.0:-Proposal) --- -# MCP Data Platform — Architecture Reference - -*Plugin taxonomy, server responsibilities, and interaction patterns for Leo data marketplace* - ------ +# data-platform Plugin Implementation Plan (v4.0.0) ## Overview -Two plugins serving distinct domains, designed for independent or combined use. +Implement a new `data-platform` plugin for leo-claude-mktplace that addresses data workflow issues encountered in the personal-portfolio project: +- Lost data after multiple interactions (solved by Arrow IPC data_ref passing) +- dbt 1.9+ syntax deprecation (solved by pre-execution validation with `dbt parse`) +- Ungraceful PostgreSQL error handling (solved by SessionStart hook with warnings) -|Plugin |Servers |Domain | -|-----------------|---------------------------------|-----------------------------------------| -|**data-platform**|pandas-mcp, postgres-mcp, dbt-mcp|Ingestion, storage, transformation | -|**viz-platform** |dmc-mcp, dash-mcp |Component validation, dashboards, theming| +## Architecture Decisions -**Key principles:** +| Decision | Choice | +|----------|--------| +| Data Passing | Arrow IPC with data_ref | +| DB Auth | Environment variables (~/.config/claude/postgres.env) | +| dbt Discovery | Auto-detect + explicit override | +| dbt Validation | Pre-execution (`dbt parse`) | +| Plugin Structure | Single plugin, 3 MCP servers | +| Server Location | Root mcp-servers/ | +| Memory Management | 100k row limit with chunking | +| PostGIS Support | Yes, with geoalchemy2 | +| Agent Model | 2 agents (Ingestion + Analysis) | +| Commands | Core 6 | +| Startup Hook | Graceful DB warning (non-blocking) | +| MCP Framework | Manual SDK (following gitea pattern) | -- MCP servers are independent processes—they do not import each other -- Claude orchestrates cross-server data flow at runtime -- Plugins ship multiple servers; projects load only what they need -- Claude.md defines project-specific workflows spanning plugins +## File Structure ------ +``` +mcp-servers/ +└── data-platform/ + ├── mcp_server/ + │ ├── __init__.py + │ ├── server.py # Main MCP server with routing + │ ├── config.py # Hybrid config (system + project) + │ ├── data_store.py # Arrow IPC DataFrame registry + │ ├── pandas_tools.py # pandas tool implementations + │ ├── postgres_tools.py # PostgreSQL/PostGIS tools + │ └── dbt_tools.py # dbt CLI wrapper tools + ├── requirements.txt + ├── pyproject.toml + └── README.md -## 
Component Definitions - -|Component Type|Definition |Runtime Context | -|--------------|--------------------------------------------------------------------------------------------------------------------|----------------------------------------------------| -|**MCP Server**|Standalone service exposing tools via Model Context Protocol. One server = one domain responsibility. |Long-running process, spawned by Claude Desktop/Code| -|**Tool** |Single callable function within an MCP server. Atomic operation with defined input schema and output. |Invoked per-request by LLM | -|**Resource** |Read-only data exposed by MCP server (files, schemas, configs). Discoverable but not executable. |Static or cached | -|**Agent** |Orchestration layer that chains multiple tool calls across servers. Lives in Claude reasoning, not in MCP servers.|LLM-driven, multi-step | -|**Command** |User-facing shortcut (e.g., `/ingest`) that triggers predefined tool sequences. |Chat interface trigger | - ------ - -## Plugin: data-platform - -### Server Loading - -Single plugin ships all three servers. Which servers load is determined by project config—not environment variables. - -|Server |Default|Optional| -|------------|-------|--------| -|pandas-mcp |✓ |— | -|postgres-mcp|✓ |— | -|dbt-mcp |— |✓ | - -**Example project configs:** - -```yaml -# Web app project (no dbt) -mcp_servers: - - pandas-mcp - - postgres-mcp +plugins/ +└── data-platform/ + ├── .claude-plugin/ + │ └── plugin.json + ├── .mcp.json + ├── mcp-servers/ + │ └── data-platform -> ../../../mcp-servers/data-platform # symlink + ├── commands/ + │ ├── ingest.md # /ingest command + │ ├── profile.md # /profile command + │ ├── schema.md # /schema command + │ ├── explain.md # /explain command + │ ├── lineage.md # /lineage command + │ └── run.md # /run command + ├── agents/ + │ ├── data-ingestion.md # Data loading and transformation + │ └── data-analysis.md # Exploration and profiling + ├── hooks/ + │ └── hooks.json # SessionStart DB check + ├── README.md + └── claude-md-integration.md ``` -```yaml -# Data engineering project (full stack) -mcp_servers: - - pandas-mcp - - postgres-mcp - - dbt-mcp +## Implementation Phases + +### Phase 1: Foundation (Issues #1-2) + +**Files to create:** +- `mcp-servers/data-platform/mcp_server/__init__.py` +- `mcp-servers/data-platform/mcp_server/config.py` +- `mcp-servers/data-platform/mcp_server/data_store.py` +- `mcp-servers/data-platform/mcp_server/server.py` (skeleton) +- `mcp-servers/data-platform/requirements.txt` +- `mcp-servers/data-platform/pyproject.toml` + +**config.py pattern** (from gitea): +```python +import os +from pathlib import Path + +def load_config(): + # System-level credentials + system_env = Path.home() / ".config/claude/postgres.env" + if system_env.exists(): + load_dotenv(system_env) + + # Project-level settings + project_env = Path.cwd() / ".env" + if project_env.exists(): + load_dotenv(project_env, override=True) + + return { + "postgres_url": os.getenv("POSTGRES_URL"), + "dbt_project_dir": os.getenv("DBT_PROJECT_DIR"), + "dbt_profiles_dir": os.getenv("DBT_PROFILES_DIR"), + } ``` -Agents check server availability at runtime. If dbt-mcp is not loaded, dbt-related steps are skipped or surface "not available for this project." 
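
To make the `data_ref` hand-off concrete before the registry code below, here is a hedged sketch of how one of the shaping tools might consume a stored reference and register its result as a new one. The tool name, filter semantics, and error shape are illustrative assumptions, not part of the plan:

```python
import pyarrow.compute as pc

from mcp_server.data_store import DataStore  # registry class shown next


def filter_rows(data_ref: str, column: str, value: object) -> dict:
    """Hypothetical shaping tool: filter a stored Arrow table and return a fresh data_ref."""
    store = DataStore.get_instance()
    table = store.get(data_ref)
    if table is None:
        return {"error": f"unknown data_ref: {data_ref}"}
    # Keep the result in Arrow so it can be chained into further tool calls.
    filtered = table.filter(pc.equal(table[column], value))
    new_ref = store.store(filtered)
    return {"data_ref": new_ref, "rows": filtered.num_rows, "columns": filtered.num_columns}
```
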
+**data_store.py** (Arrow IPC registry): +```python +import pyarrow as pa +import uuid +from typing import Dict, Optional ------ +class DataStore: + _instance = None + _dataframes: Dict[str, pa.Table] = {} -### Server: pandas-mcp (Data Shaping Layer) + @classmethod + def get_instance(cls): + if cls._instance is None: + cls._instance = cls() + return cls._instance -**Responsibility:** File ingestion, data profiling, schema inference, and utility shaping operations. + def store(self, df: pa.Table, name: Optional[str] = None) -> str: + data_ref = name or f"df_{uuid.uuid4().hex[:8]}" + self._dataframes[data_ref] = df + return data_ref -**Philosophy:** SQL-first for persistent transforms (use dbt). Pandas for: + def get(self, data_ref: str) -> Optional[pa.Table]: + return self._dataframes.get(data_ref) -- Pre-database ingestion (profiling, validation, schema inference) -- Visualization prep (reshaping query results for chart formats) -- Ad-hoc operations (prototyping, merging with local files) + def list_refs(self) -> list: + return [{"ref": k, "rows": v.num_rows, "cols": v.num_columns} + for k, v in self._dataframes.items()] +``` -#### Tool Categories +### Phase 2: pandas-mcp Tools (Issue #3) -|Category |Tools |Description | -|---------|------------------------------------------------------------------|---------------------------------| -|Ingestion|`read_file`, `write_file`, `detect_encoding` |File I/O with format auto-detection| -|Profiling|`profile`, `validate`, `sample` |Data quality assessment | -|Schema |`infer_schema` |Generate DDL from data structure | -|Shaping |`reshape`, `pivot`, `melt`, `merge`, `add_columns`, `filter_rows` |Transform any data reference | +**Tools to implement in pandas_tools.py:** -#### Data Reference Sources +| Tool | Description | +|------|-------------| +| `read_csv` | Load CSV with optional chunking | +| `read_parquet` | Load Parquet files | +| `read_json` | Load JSON/JSONL files | +| `to_csv` | Export DataFrame to CSV | +| `to_parquet` | Export DataFrame to Parquet | +| `describe` | Statistical summary | +| `head` | First N rows | +| `tail` | Last N rows | +| `filter` | Filter rows by condition | +| `select` | Select columns | +| `groupby` | Group and aggregate | +| `join` | Join two DataFrames | +| `list_data` | List all stored DataFrames | +| `drop_data` | Remove DataFrame from store | -pandas-mcp accepts `data_ref` from multiple origins: +**Memory management:** +```python +MAX_ROWS = 100_000 -|Source |How It Arrives | -|------------------|-------------------------| -|Local file |`read_file` tool | -|Query result |Passed from postgres-mcp | -|dbt model output |Passed from dbt-mcp | -|Previous transform|Chained from shaping tool| +def read_csv(file_path: str, chunk_size: int = None) -> dict: + df = pd.read_csv(file_path) + if len(df) > MAX_ROWS: + return { + "warning": f"DataFrame has {len(df)} rows, exceeds {MAX_ROWS} limit", + "suggestion": f"Use chunk_size={MAX_ROWS} for chunked processing", + "preview": df.head(100).to_dict() + } + # Convert to Arrow and store + table = pa.Table.from_pandas(df) + data_ref = DataStore.get_instance().store(table) + return {"data_ref": data_ref, "rows": len(df), "columns": list(df.columns)} +``` -#### When to Use Shaping Tools +### Phase 3: postgres-mcp Tools (Issue #4) -|Scenario |Use pandas-mcp|Use SQL/dbt| -|--------------------------------------|--------------|-----------| -|Pivot for heatmap chart |✓ |— | -|Join query result with local CSV |✓ |— | -|Prototype transform before formalizing|✓ |— | -|Persistent aggregation 
in pipeline |— |✓ | -|Reusable business logic |— |✓ | -|Needs version control + testing |— |✓ | +**Tools to implement in postgres_tools.py:** ------ +| Tool | Description | +|------|-------------| +| `pg_connect` | Test connection and return status | +| `pg_query` | Execute SELECT, return as data_ref | +| `pg_execute` | Execute INSERT/UPDATE/DELETE | +| `pg_tables` | List all tables in schema | +| `pg_columns` | Get column info for table | +| `pg_schemas` | List all schemas | +| `st_tables` | List PostGIS-enabled tables | +| `st_geometry_type` | Get geometry type of column | +| `st_srid` | Get SRID of geometry column | +| `st_extent` | Get bounding box of geometries | -### Server: postgres-mcp (Database Layer) +**asyncpg implementation:** +```python +import asyncpg +from geoalchemy2 import Geometry -**Responsibility:** Data loading, querying, schema management, performance analysis, and geospatial operations. +async def pg_query(query: str, params: list = None) -> dict: + config = load_config() + conn = await asyncpg.connect(config["postgres_url"]) + try: + rows = await conn.fetch(query, *(params or [])) + df = pd.DataFrame([dict(r) for r in rows]) + if len(df) > MAX_ROWS: + return {"warning": "Result truncated", "data_ref": store_truncated(df)} + table = pa.Table.from_pandas(df) + data_ref = DataStore.get_instance().store(table) + return {"data_ref": data_ref, "rows": len(df)} + finally: + await conn.close() +``` -#### Tool Categories +### Phase 4: dbt-mcp Tools (Issue #5) -|Category|Tools |Description | -|--------|---------------------------------------------------------------------------------------------------|-------------------------------------| -|Query |`list_schemas`, `list_tables`, `get_table_schema`, `execute_query`, `query_geometry` |Read operations | -|Analysis|`explain_query`, `recommend_indexes`, `health_check` |Performance insights | -|Write |`execute_write`, `load_dataframe` |Data modification | -|DDL |`execute_ddl`, `get_schema_snapshot` |Schema management with change tracking| +**Tools to implement in dbt_tools.py:** -#### DDL Change Tracking +| Tool | Description | +|------|-------------| +| `dbt_parse` | Validate project (pre-execution) | +| `dbt_run` | Run models with selection | +| `dbt_test` | Run tests | +| `dbt_build` | Run + test | +| `dbt_compile` | Compile SQL without executing | +| `dbt_ls` | List resources | +| `dbt_docs_generate` | Generate documentation | +| `dbt_lineage` | Get model dependencies | -`execute_ddl` returns structured output for downstream automation: +**Pre-execution validation pattern:** +```python +import subprocess +import json +def dbt_run(select: str = None, exclude: str = None) -> dict: + config = load_config() + project_dir = config.get("dbt_project_dir") or find_dbt_project() + + # ALWAYS validate first + parse_result = subprocess.run( + ["dbt", "parse", "--project-dir", project_dir], + capture_output=True, text=True + ) + if parse_result.returncode != 0: + return { + "error": "dbt parse failed - fix issues before running", + "details": parse_result.stderr, + "suggestion": "Check for deprecated syntax (dbt 1.9+)" + } + + # Execute run + cmd = ["dbt", "run", "--project-dir", project_dir] + if select: + cmd.extend(["--select", select]) + result = subprocess.run(cmd, capture_output=True, text=True) + return {"success": result.returncode == 0, "output": result.stdout} +``` + +### Phase 5: Plugin Wrapper (Issue #6) + +**plugins/data-platform/.claude-plugin/plugin.json:** ```json { - "success": true, - "operation": "CREATE TABLE", - 
"affected_objects": [ - { - "type": "table", - "schema": "public", - "name": "customer_orders", - "change": "created" - } - ], - "timestamp": "2025-01-22T14:30:00Z" + "name": "data-platform", + "version": "1.0.0", + "description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration", + "author": "Leo Miranda", + "license": "MIT", + "hooks": "hooks/hooks.json", + "commands": "commands/", + "agents": "agents/", + "mcp": ".mcp.json" } ``` -This enables documentation updates, ERD regeneration (via Mermaid Chart MCP), or other automated responses. - ------ - -### Server: dbt-mcp (Transform Layer) - -**Responsibility:** Model execution, lineage, documentation, and YAML generation for local dbt-core projects. - -**Note:** Official dbt-mcp is Cloud-only. This server wraps local dbt-core CLI. - -#### Tool Categories - -|Category |Tools |Description | -|-------------|-------------------------------------------------|------------------------| -|Discovery |`parse_manifest`, `list_models`, `list_sources` |Project exploration | -|Model |`get_model`, `get_lineage`, `compile_sql` |Model inspection | -|Execution |`run_model`, `test_model`, `get_run_results` |dbt CLI wrapper | -|Documentation|`generate_yaml` |Auto-generate schema.yml| - -#### Lineage Output - -`get_lineage` outputs Mermaid-formatted DAG, compatible with existing Mermaid Chart MCP for rendering. - ------ - -### Internal Dependency Flow (data-platform) - -``` -files → pandas-mcp → postgres-mcp ↔ dbt-mcp - ↑______________| - (query results for reshaping) +**plugins/data-platform/.mcp.json:** +```json +{ + "mcpServers": { + "data-platform": { + "type": "stdio", + "command": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python", + "args": ["-m", "mcp_server.server"], + "cwd": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform" + } + } +} ``` -|Flow |Description | -|-----------------|-------------------------------------| -|files → pandas |Entry point for raw data | -|pandas → postgres|Schema inference, bulk loading | -|postgres ↔ dbt |dbt queries marts, postgres executes | -|postgres → pandas|Query results for reshaping | -|dbt → pandas |Model outputs for visualization prep | - ------ - -### Agents (data-platform) - -|Agent |Trigger |Sequence | -|----------------|--------------------------|----------------------------------------------------------------------------------| -|`data_ingestion`|User provides file |read_file → profile → infer_schema → execute_ddl → load_dataframe → validate | -|`model_analysis`|User asks about dbt model |get_model → get_lineage → explain_query → test_model → synthesize | -|`full_pipeline` |File to materialized model|data_ingestion → create dbt model → run_model | - -**Behavior when dbt-mcp absent:** - -|Agent |Behavior | -|----------------|-----------------------------------| -|`data_ingestion`|Runs fully (no dbt steps) | -|`model_analysis`|Skipped—surfaces "dbt not configured"| -|`full_pipeline` |Stops after load, prompts user | - ------ - -### Commands (data-platform) - -|Command |Maps To | -|--------------------------------|-------------------------------| -|`/ingest {file}` |`data_ingestion` agent | -|`/profile {file}` |`pandas-mcp.profile` | -|`/pivot {data} by {cols}` |`pandas-mcp.pivot` | -|`/merge {left} {right} on {key}`|`pandas-mcp.merge` | -|`/explain {query}` |`postgres-mcp.explain_query` | -|`/schema {table}` |`postgres-mcp.get_table_schema`| -|`/lineage {model}` |`dbt-mcp.get_lineage` | -|`/run {model}` |`dbt-mcp.run_model` | -|`/test {model}` |`dbt-mcp.test_model` | 
- -dbt commands return graceful "dbt-mcp not loaded" when unavailable. - ------ - -## Plugin: viz-platform - -### Servers - -|Server |Responsibility | -|--------|-----------------------------------------------------------| -|dmc-mcp |Version-locked component registry, prop validation | -|dash-mcp|Charts, layouts, pages, theming—validates against dmc-mcp | - ------ - -### Server: dmc-mcp (Component Constraint Layer) - -**Responsibility:** Single source of truth for Dash Mantine Components API. Prevents Claude from hallucinating deprecated props or non-existent components. - -**Problem solved:** DMC versions introduce breaking changes. Claude training data mixes versions. Runtime errors from invalid props waste cycles. - -#### Tool Categories - -|Category |Tools |Description | -|-------------|----------------------|-----------------------------------------| -|Discovery |`list_components` |What exists in installed version | -|Introspection|`get_component_props` |Valid props, types, defaults | -|Validation |`validate_component` |Check component definition before use | - -#### Usage Pattern - -Claude queries dmc-mcp first: - -1. "What props does `dmc.Select` accept?" → `get_component_props` -1. Build component with valid props -1. Pass to dash-mcp for rendering - -dash-mcp validates against dmc-mcp before rendering. Invalid components fail fast with actionable errors. - ------ - -### Server: dash-mcp (Visualization Layer) - -**Responsibility:** Chart generation, dashboard layouts, page structure, theming system, and export. - -**Philosophy:** Single server, multiple concerns. Tools are namespaced but share context (theme tokens flow to charts automatically). - -#### Tool Categories - -|Category |Tools |Description | -|----------|--------------------------------------------------------------------|---------------------------------| -|`chart_*` |`chart_create`, `chart_configure_interaction` |Data visualization (Plotly) | -|`layout_*`|`layout_create`, `layout_add_filter`, `layout_set_grid` |Dashboard composition | -|`page_*` |`page_create`, `page_add_navbar`, `page_set_auth` |App-level structure | -|`theme_*` |`theme_create`, `theme_extend`, `theme_validate`, `theme_export_css`|Design tokens, component styles | - -#### Design Token Structure - -Themes are built from design tokens—single source of truth for visual consistency: - -```yaml -tokens: - colors: - primary: "#228be6" - secondary: "#868e96" - background: - base: "#ffffff" - subtle: "#f8f9fa" - text: - primary: "#212529" - muted: "#868e96" - - spacing: - xs: "4px" - sm: "8px" - md: "16px" - lg: "24px" - - typography: - fontFamily: "Inter, sans-serif" - fontSize: - sm: "14px" - md: "16px" - - radii: - sm: "4px" - md: "8px" +**plugins/data-platform/hooks/hooks.json:** +```json +{ + "hooks": [ + { + "event": "SessionStart", + "type": "command", + "command": ["${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python", "-c", "from mcp_server.postgres_tools import check_connection; check_connection()"], + "timeout": 5000, + "onError": "warn" + } + ] +} ``` -#### Component Style Registry +**Agents:** -Per-component overrides ensuring consistency: +`agents/data-ingestion.md`: +```markdown +# Data Ingestion Agent -|Component |Registered Style |Purpose | -|--------------|------------------------------|-------------------------------| -|`kpi_card` |Shadow, padding, border-radius|All KPIs look identical | -|`data_table` |Header bg, row hover, border |Tables share appearance | -|`filter_panel`|Background, spacing, alignment|Filters positioned 
consistently| -|`chart_card` |Title typography, padding |Chart containers unified | +You are a data ingestion specialist. Your role is to help users load, transform, and prepare data for analysis. ------ +## Available Tools +- pandas: read_csv, read_parquet, read_json, filter, select, groupby, join +- postgres: pg_query, pg_execute -### Internal Dependency Flow (viz-platform) - -``` -dmc-mcp ← dash-mcp - ↑ | - └──────────┘ - (validation before render) +## Workflow +1. Understand the data source and format +2. Load data with appropriate chunking for large files +3. Transform as needed (filter, select, aggregate) +4. Store results with meaningful data_ref names ``` -dash-mcp always validates component definitions against dmc-mcp. No direct data dependency—data comes from external sources. +`agents/data-analysis.md`: +```markdown +# Data Analysis Agent ------ +You are a data analysis specialist. Your role is to help users explore, profile, and understand their data. -### Agents (viz-platform) +## Available Tools +- pandas: describe, head, tail, list_data +- postgres: pg_tables, pg_columns +- dbt: dbt_lineage, dbt_docs_generate -|Agent |Trigger |Sequence | -|-----------------|-----------------------------------|------------------------------------------------------------------------------------| -|`theme_setup` |New project or brand consistency |list_themes → create_theme → register_component_style → validate_theme | -|`layout_builder` |User wants dashboard structure |create_layout → add_filter → apply_theme → preview | -|`component_check`|Before rendering any DMC component |get_component_props → validate_component → proceed or error | - ------ - -### Commands (viz-platform) - -|Command |Maps To | -|-----------------------|--------------------------------------------| -|`/chart {type}` |`dash-mcp.chart_create` (expects data input)| -|`/dashboard {template}`|`layout_builder` agent | -|`/theme {name}` |`dash-mcp.theme_apply` | -|`/theme new {name}` |`dash-mcp.theme_create` | -|`/theme css {name}` |`dash-mcp.theme_export_css` | -|`/component {name}` |`dmc-mcp.get_component_props` | - ------ - -## Cross-Plugin Interactions - -### How It Works - -MCP servers do not call each other. Claude orchestrates: - -1. Server A returns output to Claude -1. Claude interprets and determines next step -1. 
Claude passes relevant data to Server B - -### Documentation Layers - -|Layer |Location |Purpose | -|------------------|-------------------------|---------------------------------------| -|Plugin docs |Each plugin's README.md |Declares inputs/outputs | -|Claude.md |Project root |Cross-plugin agents for this project | -|contract-validator|Separate plugin |Validates compatibility | -|doc-guardian |Separate plugin |Catches drift within each project | - -### Interface Contracts - -Each plugin declares what it produces and accepts: - -**data-platform outputs:** - -- `data_ref`: In-memory DataFrame reference -- `query_result`: Row set from postgres-mcp -- `model_output`: Materialized table reference from dbt-mcp -- `schema_snapshot`: Full schema state for documentation - -**viz-platform inputs:** - -- Accepts `data_ref`, `query_result`, or `model_output` as data source -- Validates all DMC components against dmc-mcp before rendering - -### Cross-Plugin Agents (defined in Claude.md) - -|Agent |Trigger |Sequence | -|--------------------|---------------------------------------------------|----------------------------------------------------------------------------------------------------------------------| -|`dashboard_builder` |User requests visualization of database content |postgres-mcp.execute_query → pandas-mcp.pivot (if needed) → dmc-mcp.validate → dash-mcp.chart_create → dash-mcp.layout_create| -|`visualization_prep`|Query result needs reshaping |postgres-mcp.execute_query → pandas-mcp.reshape → dash-mcp.chart_create | - -### Validation: contract-validator - -Separate plugin for cross-plugin validation. See **Plugin: contract-validator** section for full specification. - -**Key distinction from doc-guardian:** - -- doc-guardian: "did code change break docs?" (within a project) -- contract-validator: "do plugins work together?" (across plugins) - ------ - -## Plugin: contract-validator - -### Purpose - -Validates cross-plugin compatibility and Claude.md agent definitions. Ensures plugins can actually work together before runtime failures occur. - -**Problem solved:** Plugins declare interfaces in README. Claude.md references tools across plugins. 
Without validation: - -- Agents reference tools that don't exist -- viz-platform expects input format data-platform doesn't produce -- Plugin updates break workflows silently - ------ - -### What It Reads - -|Source |Purpose | -|------------------|-------------------------------------------------| -|Plugin README.md |Extract declared inputs/outputs | -|Claude.md |Extract agent definitions and tool references | -|MCP server schemas|Verify tools actually exist with expected signatures| - ------ - -### Tool Categories - -|Category|Tools |Description | -|--------|---------------------------------------------------------------------|---------------------------------| -|Parse |`parse_plugin_interface`, `parse_claude_md_agents` |Extract structured data from docs| -|Validate|`validate_compatibility`, `validate_agent_refs`, `validate_data_flow`|Check contracts match | -|Report |`generate_compatibility_report`, `list_issues` |Output findings | - -#### Tool Details - -**`parse_plugin_interface`** - -- Input: Plugin path or README content -- Output: Structured interface (inputs accepted, outputs produced, tool names) - -**`parse_claude_md_agents`** - -- Input: Claude.md path or content -- Output: List of agents with their tool sequences - -**`validate_compatibility`** - -- Input: Two plugin interfaces -- Output: Compatibility report (what A produces that B accepts, gaps) - -**`validate_agent_refs`** - -- Input: Agent definition, list of available plugins -- Output: Missing tools, invalid sequences - -**`validate_data_flow`** - -- Input: Agent sequence -- Output: Verification that each step output matches next step expected input - ------ - -### Agents (contract-validator) - -|Agent |Trigger |Sequence | -|-----------------|---------------------------------|---------------------------------------------------------------------------------------------------------------------------------------| -|`full_validation`|User runs `/validate-contracts` |parse all plugin interfaces → parse Claude.md → validate_compatibility for each pair → validate_agent_refs for each agent → generate_compatibility_report| -|`agent_check` |User runs `/check-agent {name}` |parse_claude_md_agents → find agent → validate_agent_refs → validate_data_flow → report issues | - ------ - -### Commands - -|Command |Maps To |Description | -|---------------------|---------------------------------------|-----------------------------------| -|`/validate-contracts`|`full_validation` agent |Full project validation | -|`/check-agent {name}`|`agent_check` agent |Validate single agent definition | -|`/list-interfaces` |`parse_plugin_interface` for all plugins|Show what each plugin produces/accepts| - ------ - -### Output Format - -**Compatibility Report:** - -``` -## Contract Validation Report - -### Plugin Interfaces -- data-platform: produces [data_ref, query_result, model_output, schema_snapshot] -- viz-platform: accepts [data_ref, query_result, model_output] - -### Compatibility Matrix -| Producer | Consumer | Status | -|----------|----------|--------| -| data-platform → viz-platform | ✓ Compatible | All outputs accepted | - -### Agent Validation -| Agent | Status | Issues | -|-------|--------|--------| -| dashboard_builder | ✓ Valid | — | -| model_analysis | ⚠ Warning | dbt-mcp optional; agent fails if not loaded | - -### Issues Found -- None - -### Warnings -- Agent `model_analysis` depends on optional server `dbt-mcp` +## Workflow +1. List available data (list_data or pg_tables) +2. Profile data structure and statistics +3. 
Identify patterns and anomalies +4. Provide insights and recommendations ``` -**Issue Types:** +**Commands:** -|Type |Severity|Example | -|-------------------|--------|--------------------------------------------------------------------------------| -|Missing tool |Error |Agent references `pandas-mcp.transform` but tool is `pandas-mcp.reshape` | -|Interface mismatch |Error |viz-platform expects `chart_data` but data-platform produces `data_ref` | -|Optional dependency|Warning |Agent uses dbt-mcp which may not be loaded | -|Undeclared output |Warning |Plugin produces output not listed in README | +| Command | File | Description | +|---------|------|-------------| +| `/ingest` | commands/ingest.md | Load data from files or database | +| `/profile` | commands/profile.md | Generate data profile and statistics | +| `/schema` | commands/schema.md | Show database/DataFrame schema | +| `/explain` | commands/explain.md | Explain dbt model lineage | +| `/lineage` | commands/lineage.md | Visualize data dependencies | +| `/run` | commands/run.md | Execute dbt models | ------ +### Phase 6: Documentation & Integration -### Integration with doc-guardian +**Files to update:** +- `.claude-plugin/marketplace.json` - Add data-platform plugin entry +- `CHANGELOG.md` - Add v4.0.0 section under [Unreleased] +- `README.md` - Update plugin table -**Separation of concerns:** +**Files to create:** +- `plugins/data-platform/README.md` +- `plugins/data-platform/claude-md-integration.md` +- `mcp-servers/data-platform/README.md` -|Plugin |Scope |Trigger | -|------------------|----------------------------------|----------------------| -|doc-guardian |Code → docs drift within a project|PostToolUse (Write/Edit)| -|contract-validator|Plugin → plugin compatibility |On-demand or CI hook | +## Sprint Structure (projman) -contract-validator does NOT watch for file changes. It runs on-demand or as CI step. +**Milestone:** Sprint 1 - data-platform Plugin (v4.0.0) -**Potential future integration:** doc-guardian could trigger contract-validator when Claude.md or plugin README changes. Not required for v1. +### Gitea Issues to Create ------ +| # | Title | Labels | Effort | +|---|-------|--------|--------| +| 1 | [Sprint 01] feat: MCP server foundation and config | Type/Feature, Priority/High, Complexity/Medium, Effort/M, Tech/Python, Component/Backend | 1-2 days | +| 2 | [Sprint 01] feat: Arrow IPC data registry with memory limits | Type/Feature, Priority/High, Complexity/Medium, Effort/M, Tech/Python, Component/Backend | 1-2 days | +| 3 | [Sprint 01] feat: pandas-mcp core data operations (14 tools) | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Component/Backend | 3-5 days | +| 4 | [Sprint 01] feat: postgres-mcp database tools with PostGIS | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Tech/PostgreSQL, Component/Database | 3-5 days | +| 5 | [Sprint 01] feat: dbt-mcp build tools with pre-validation | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Component/Backend | 3-5 days | +| 6 | [Sprint 01] feat: Plugin wrapper, commands, and agents | Type/Feature, Priority/Medium, Complexity/Medium, Effort/M, Component/Docs | 1-2 days | +| 7 | [Sprint 01] docs: Documentation and marketplace integration | Type/Documentation, Priority/Medium, Complexity/Simple, Effort/S, Component/Docs | 2-4 hours | -## Diagramming Approach - -No diagram-mcp server. Use existing Mermaid Chart MCP. 
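
The hand-off in both cases is plain Mermaid text. As a hedged sketch of the ERD case described below, this is roughly the conversion Claude would perform on a `get_schema_snapshot`-style payload; the snapshot keys (`tables`, `columns`, `foreign_keys`) are assumptions:

```python
def schema_to_mermaid_erd(snapshot: dict) -> str:
    """Sketch: turn a schema snapshot payload into Mermaid erDiagram text."""
    lines = ["erDiagram"]
    for table in snapshot.get("tables", []):
        lines.append(f"    {table['name']} {{")
        for col in table.get("columns", []):
            # Mermaid attribute syntax is "<type> <name>"; types must not contain spaces.
            lines.append(f"        {col['type'].replace(' ', '_')} {col['name']}")
        lines.append("    }")
    for fk in snapshot.get("foreign_keys", []):
        lines.append(f"    {fk['from_table']} }}o--|| {fk['to_table']} : {fk['column']}")
    return "\n".join(lines)
```
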
- -**For ERDs:** - -- postgres-mcp exposes schema metadata via `get_schema_snapshot` -- Claude generates Mermaid syntax -- Mermaid Chart MCP renders - -**For dbt lineage:** - -- dbt-mcp.get_lineage outputs Mermaid-formatted DAG -- Mermaid Chart MCP renders - -This avoids the complexity of draw.io XML generation while maintaining documentation capability. - ------ - -## Implementation Order - -|Phase|Plugin |Server |Rationale | -|-----|------------------|------------|------------------------------------------------| -|1 |data-platform |pandas-mcp |Entry point, no dependencies | -|2 |data-platform |postgres-mcp|Load from Phase 1, query capabilities | -|3 |data-platform |dbt-mcp |Transform layer, requires postgres-mcp | -|4 |viz-platform |dmc-mcp |Constraint layer, no dependencies | -|5 |viz-platform |dash-mcp |Visualization, validates against dmc-mcp | -|6 |contract-validator|— |Validates all above, requires stable interfaces | - -**Notes:** - -- Phases 1-3 (data-platform) and 4-5 (viz-platform) can proceed in parallel -- contract-validator (Phase 6) should wait until plugin interfaces stabilize -- doc-guardian already exists; update scope documentation only - ------ - -## Open Questions - -### Data Reference Passing - -How do servers share `data_ref` objects? Options: - -- **Temporary files with URIs**: Portable but I/O overhead -- **Arrow IPC**: Efficient but requires both servers to support -- **Recommendation**: Arrow IPC for efficiency, file fallback for compatibility - -### Authentication - -Should postgres-mcp handle connection strings directly, or use a secrets manager pattern? - -### Theme Storage - -Where do custom themes persist? - -- Local config file (`~/.dash-mcp/themes/`) -- Project-level (alongside dbt_project.yml) -- Database table (for shared team themes) - -### dbt Project Discovery - -Auto-detect `dbt_project.yml` in common locations, or require explicit path? 
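
The implementation resolves this as auto-detect with an explicit override (see the Architecture Decisions table). A minimal sketch of the `find_dbt_project()` helper referenced in the `dbt_run` example, assuming the fallback is an upward directory walk:

```python
import os
from pathlib import Path


def find_dbt_project(start: Path | None = None) -> str | None:
    """Sketch: explicit DBT_PROJECT_DIR wins; otherwise walk upward looking for dbt_project.yml."""
    override = os.getenv("DBT_PROJECT_DIR")
    if override:
        return override
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / "dbt_project.yml").exists():
            return str(candidate)
    return None  # caller reports "dbt not configured" instead of failing hard
```
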
- ------ - -## Technology Stack - -|Layer |Technology |Notes | -|---------------|-----------------------|---------------------------| -|MCP Framework |FastMCP |Or manual MCP SDK | -|Python |3.11+ |Type hints, async support | -|Data Processing|pandas |Core DataFrame ops | -|Arrow |pyarrow |Parquet, efficient memory | -|Database |psycopg |Async-ready Postgres driver| -|Geospatial |geoalchemy2 |PostGIS integration | -|dbt |dbt-core |CLI wrapper | -|Visualization |plotly |Figure generation | -|UI Components |dash-mantine-components|Version-locked via dmc-mcp | - ------ - -## Summary - -### Core Plugins - -|Plugin |Servers/Scope |Key Characteristic | -|------------------|-------------------------------------------|---------------------------------------------------| -|data-platform |pandas-mcp, postgres-mcp, dbt-mcp |Optional server loading per project | -|viz-platform |dmc-mcp, dash-mcp |dmc-mcp validates before dash-mcp renders | -|contract-validator|Interface parsing, compatibility checks |Validates cross-plugin contracts and agent definitions| - -### Supporting Plugins (Existing) - -|Plugin |Purpose | -|-----------------|------------------------------------| -|doc-guardian |Code-to-docs drift (unchanged scope)| -|Mermaid Chart MCP|Diagram rendering | - -### Interaction Model +### Issue Dependencies ``` -Plugin READMEs → declare inputs/outputs -Claude.md → define cross-plugin agents -contract-validator → validate compatibility -doc-guardian → catch drift within projects +#1 (foundation) ─┬─> #2 (data registry) + │ + ├─> #3 (pandas-mcp) ──┐ + │ │ + ├─> #4 (postgres-mcp) ├─> #6 (plugin wrapper) ─> #7 (docs) + │ │ + └─> #5 (dbt-mcp) ─────┘ ``` -**Flow:** Plugins declare interfaces. Claude.md defines workflows. contract-validator enforces compatibility. doc-guardian handles internal drift. \ No newline at end of file +**Parallel Execution Batches:** +1. Batch 1: #1 (foundation) +2. Batch 2: #2, #3, #4, #5 (can run in parallel after foundation) +3. Batch 3: #6 (plugin wrapper - needs all tools complete) +4. Batch 4: #7 (docs - final) + +## Verification Steps + +1. **MCP Server starts:** + ```bash + cd mcp-servers/data-platform + python -m venv .venv + source .venv/bin/activate + pip install -r requirements.txt + python -m mcp_server.server + ``` + +2. **Tools are registered:** + - Start Claude Code in a test project + - Run `/ingest` command + - Verify MCP tools appear in tool list + +3. **Data persistence:** + - Load a CSV file with `/ingest` + - Run multiple commands referencing the data_ref + - Verify data persists across tool calls + +4. **PostgreSQL connection:** + - Configure `~/.config/claude/postgres.env` + - Start new session + - Verify SessionStart hook shows connection status (warning if unavailable) + +5. **dbt validation:** + - Run `/run` on a dbt project with deprecated syntax + - Verify pre-execution validation catches issues + - Fix syntax and re-run successfully + +6. **Validation script:** + ```bash + ./scripts/validate-marketplace.sh + ``` + +## Dependencies + +``` +# requirements.txt +mcp>=1.0.0 +pandas>=2.0.0 +pyarrow>=14.0.0 +asyncpg>=0.29.0 +geoalchemy2>=0.14.0 +python-dotenv>=1.0.0 +dbt-core>=1.9.0 +dbt-postgres>=1.9.0 +``` + +## Out of Scope (v4.1.0+) + +- Integration with projman sprint tracking +- Cross-plugin DataFrame sharing +- Visualization components (deferred to v5.0.0) +- Advanced dbt features (seeds, snapshots, exposures) \ No newline at end of file
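
One piece the verification steps lean on but the plan does not spell out is the `check_connection()` helper invoked by the SessionStart hook. A hedged sketch, assuming it lives in `postgres_tools.py` and only warns (matching the non-blocking requirement); the exact messages and timeout are assumptions:

```python
import asyncio
import sys

import asyncpg

from mcp_server.config import load_config


def check_connection() -> None:
    """Sketch of the SessionStart check: print a status line, warn on failure, never raise."""
    url = load_config().get("postgres_url")
    if not url:
        print("data-platform: POSTGRES_URL not set - postgres tools disabled", file=sys.stderr)
        return

    async def _ping() -> None:
        conn = await asyncpg.connect(url, timeout=3)
        await conn.close()

    try:
        asyncio.run(_ping())
        print("data-platform: PostgreSQL connection OK")
    except Exception as exc:  # warning only - the hook is configured with onError: warn
        print(f"data-platform: PostgreSQL unavailable ({exc}); continuing without database", file=sys.stderr)
```
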