diff --git a/Change-V04.0.0%3A-Proposal.md b/Change-V04.0.0%3A-Proposal.md new file mode 100644 index 0000000..58157cf --- /dev/null +++ b/Change-V04.0.0%3A-Proposal.md @@ -0,0 +1,638 @@ +# MCP Data Platform — Architecture Reference + +*Plugin taxonomy, server responsibilities, and interaction patterns for Leo's data marketplace* + +--- + +## Overview + +Two plugins serving distinct domains, designed for independent or combined use. + +| Plugin | Servers | Domain | +|--------|---------|--------| +| **data-platform** | pandas-mcp, postgres-mcp, dbt-mcp | Ingestion, storage, transformation | +| **viz-platform** | dmc-mcp, dash-mcp | Component validation, dashboards, theming | + +**Key principles:** +- MCP servers are independent processes—they don't import each other +- Claude orchestrates cross-server data flow at runtime +- Plugins ship multiple servers; projects load only what they need +- Claude.md defines project-specific workflows spanning plugins + +--- + +## Component Definitions + +| Component Type | Definition | Runtime Context | +|----------------|------------|-----------------| +| **MCP Server** | Standalone service exposing tools via Model Context Protocol. One server = one domain responsibility. | Long-running process, spawned by Claude Desktop/Code | +| **Tool** | Single callable function within an MCP server. Atomic operation with defined input schema and output. | Invoked per-request by LLM | +| **Resource** | Read-only data exposed by MCP server (files, schemas, configs). Discoverable but not executable. | Static or cached | +| **Agent** | Orchestration layer that chains multiple tool calls across servers. Lives in Claude's reasoning, not in MCP servers. | LLM-driven, multi-step | +| **Command** | User-facing shortcut (e.g., `/ingest`) that triggers predefined tool sequences. | Chat interface trigger | + +--- + +## Plugin: data-platform + +### Server Loading + +Single plugin ships all three servers. 
Which servers load is determined by project config—not environment variables. + +| Server | Default | Optional | +|--------|---------|----------| +| pandas-mcp | ✓ | — | +| postgres-mcp | ✓ | — | +| dbt-mcp | — | ✓ | + +**Example project configs:** + +```yaml +# Web app project (no dbt) +mcp_servers: + - pandas-mcp + - postgres-mcp +``` + +```yaml +# Data engineering project (full stack) +mcp_servers: + - pandas-mcp + - postgres-mcp + - dbt-mcp +``` + +Agents check server availability at runtime. If dbt-mcp isn't loaded, dbt-related steps are skipped or surface "not available for this project." + +--- + +### Server: pandas-mcp (Data Shaping Layer) + +**Responsibility:** File ingestion, data profiling, schema inference, and utility shaping operations. + +**Philosophy:** SQL-first for persistent transforms (use dbt). Pandas for: +- Pre-database ingestion (profiling, validation, schema inference) +- Visualization prep (reshaping query results for chart formats) +- Ad-hoc operations (prototyping, merging with local files) + +#### Tool Categories + +| Category | Tools | Description | +|----------|-------|-------------| +| Ingestion | `read_file`, `write_file`, `detect_encoding` | File I/O with format auto-detection | +| Profiling | `profile`, `validate`, `sample` | Data quality assessment | +| Schema | `infer_schema` | Generate DDL from data structure | +| Shaping | `reshape`, `pivot`, `melt`, `merge`, `add_columns`, `filter_rows` | Transform any data reference | + +#### Data Reference Sources + +pandas-mcp accepts `data_ref` from multiple origins: + +| Source | How It Arrives | +|--------|----------------| +| Local file | `read_file` tool | +| Query result | Passed from postgres-mcp | +| dbt model output | Passed from dbt-mcp | +| Previous transform | Chained from shaping tool | + +#### When to Use Shaping Tools + +| Scenario | Use pandas-mcp | Use SQL/dbt | +|----------|----------------|-------------| +| Pivot for heatmap chart | ✓ | — | +| Join query result with 
local CSV | ✓ | — | +| Prototype transform before formalizing | ✓ | — | +| Persistent aggregation in pipeline | — | ✓ | +| Reusable business logic | — | ✓ | +| Needs version control + testing | — | ✓ | + +--- + +### Server: postgres-mcp (Database Layer) + +**Responsibility:** Data loading, querying, schema management, performance analysis, and geospatial operations. + +#### Tool Categories + +| Category | Tools | Description | +|----------|-------|-------------| +| Query | `list_schemas`, `list_tables`, `get_table_schema`, `execute_query`, `query_geometry` | Read operations | +| Analysis | `explain_query`, `recommend_indexes`, `health_check` | Performance insights | +| Write | `execute_write`, `load_dataframe` | Data modification | +| DDL | `execute_ddl`, `get_schema_snapshot` | Schema management with change tracking | + +#### DDL Change Tracking + +`execute_ddl` returns structured output for downstream automation: + +```json +{ + "success": true, + "operation": "CREATE TABLE", + "affected_objects": [ + { + "type": "table", + "schema": "public", + "name": "customer_orders", + "change": "created" + } + ], + "timestamp": "2025-01-22T14:30:00Z" +} +``` + +This enables documentation updates, ERD regeneration (via Mermaid Chart MCP), or other automated responses. + +--- + +### Server: dbt-mcp (Transform Layer) + +**Responsibility:** Model execution, lineage, documentation, and YAML generation for local dbt-core projects. + +**Note:** Official dbt-mcp is Cloud-only. This server wraps local dbt-core CLI. 
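
A minimal sketch of how a local dbt-core wrapper might shell out to the CLI, assuming `dbt` is on `PATH`. The helper names `build_dbt_command` and `run_dbt` and the shape of the returned dict are hypothetical, not part of this proposal; `--select` and `--project-dir` are standard dbt CLI flags.

```python
import shlex
import subprocess
from pathlib import Path

# Actions this sketch allows; they loosely mirror run_model, test_model and compile_sql.
ALLOWED_ACTIONS = {"run", "test", "compile"}


def build_dbt_command(action: str, model: str, project_dir: Path) -> list[str]:
    """Build the argv for one dbt-core CLI call (hypothetical helper)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported dbt action: {action}")
    return ["dbt", action, "--select", model, "--project-dir", str(project_dir)]


def run_dbt(action: str, model: str, project_dir: Path, timeout: int = 600) -> dict:
    """Run dbt as a subprocess and return a structured result (assumed shape)."""
    cmd = build_dbt_command(action, model, project_dir)
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=False
        )
    except FileNotFoundError:
        # dbt-core is not installed or not on PATH.
        return {
            "success": False,
            "command": shlex.join(cmd),
            "error": "dbt executable not found",
        }
    return {
        "success": proc.returncode == 0,
        "command": shlex.join(cmd),
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Passing the command as a list rather than a shell string avoids quoting problems when a model name comes from user input.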
+ +#### Tool Categories + +| Category | Tools | Description | +|----------|-------|-------------| +| Discovery | `parse_manifest`, `list_models`, `list_sources` | Project exploration | +| Model | `get_model`, `get_lineage`, `compile_sql` | Model inspection | +| Execution | `run_model`, `test_model`, `get_run_results` | dbt CLI wrapper | +| Documentation | `generate_yaml` | Auto-generate schema.yml | + +#### Lineage Output + +`get_lineage` outputs Mermaid-formatted DAG, compatible with existing Mermaid Chart MCP for rendering. + +--- + +### Internal Dependency Flow (data-platform) + +``` +files → pandas-mcp → postgres-mcp ↔ dbt-mcp + ↑______________| + (query results for reshaping) +``` + +| Flow | Description | +|------|-------------| +| files → pandas | Entry point for raw data | +| pandas → postgres | Schema inference, bulk loading | +| postgres ↔ dbt | dbt queries marts, postgres executes | +| postgres → pandas | Query results for reshaping | +| dbt → pandas | Model outputs for visualization prep | + +--- + +### Agents (data-platform) + +| Agent | Trigger | Sequence | +|-------|---------|----------| +| `data_ingestion` | User provides file | read_file → profile → infer_schema → execute_ddl → load_dataframe → validate | +| `model_analysis` | User asks about dbt model | get_model → get_lineage → explain_query → test_model → synthesize | +| `full_pipeline` | File to materialized model | data_ingestion → create dbt model → run_model | + +**Behavior when dbt-mcp absent:** + +| Agent | Behavior | +|-------|----------| +| `data_ingestion` | Runs fully (no dbt steps) | +| `model_analysis` | Skipped—surfaces "dbt not configured" | +| `full_pipeline` | Stops after load, prompts user | + +--- + +### Commands (data-platform) + +| Command | Maps To | +|---------|---------| +| `/ingest {file}` | `data_ingestion` agent | +| `/profile {file}` | `pandas-mcp.profile` | +| `/pivot {data} by {cols}` | `pandas-mcp.pivot` | +| `/merge {left} {right} on {key}` | `pandas-mcp.merge` | 
+| `/explain {query}` | `postgres-mcp.explain_query` | +| `/schema {table}` | `postgres-mcp.get_table_schema` | +| `/lineage {model}` | `dbt-mcp.get_lineage` | +| `/run {model}` | `dbt-mcp.run_model` | +| `/test {model}` | `dbt-mcp.test_model` | + +dbt commands return graceful "dbt-mcp not loaded" when unavailable. + +--- + +## Plugin: viz-platform + +### Servers + +| Server | Responsibility | +|--------|----------------| +| dmc-mcp | Version-locked component registry, prop validation | +| dash-mcp | Charts, layouts, pages, theming—validates against dmc-mcp | + +--- + +### Server: dmc-mcp (Component Constraint Layer) + +**Responsibility:** Single source of truth for Dash Mantine Components API. Prevents Claude from hallucinating deprecated props or non-existent components. + +**Problem solved:** DMC versions introduce breaking changes. Claude's training data mixes versions. Runtime errors from invalid props waste cycles. + +#### Tool Categories + +| Category | Tools | Description | +|----------|-------|-------------| +| Discovery | `list_components` | What exists in installed version | +| Introspection | `get_component_props` | Valid props, types, defaults | +| Validation | `validate_component` | Check component definition before use | + +#### Usage Pattern + +Claude queries dmc-mcp first: +1. "What props does `dmc.Select` accept?" → `get_component_props` +2. Build component with valid props +3. Pass to dash-mcp for rendering + +dash-mcp validates against dmc-mcp before rendering. Invalid components fail fast with actionable errors. + +--- + +### Server: dash-mcp (Visualization Layer) + +**Responsibility:** Chart generation, dashboard layouts, page structure, theming system, and export. + +**Philosophy:** Single server, multiple concerns. Tools are namespaced but share context (theme tokens flow to charts automatically). 
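
One way the shared context could work, sketched with plain dicts and no Plotly import. `ThemeContext` and the return shapes are assumptions for illustration; `colorway`, `font.family` and `title.text` are existing Plotly layout attributes.

```python
from dataclasses import dataclass, field


@dataclass
class ThemeContext:
    """Server-wide theme tokens shared by every tool namespace (hypothetical)."""
    tokens: dict = field(default_factory=dict)


# A single module-level context stands in for the server's shared state.
CONTEXT = ThemeContext()


def theme_create(name: str, tokens: dict) -> dict:
    """Store the tokens so that later chart_* calls can read them."""
    CONTEXT.tokens = tokens
    return {"theme": name, "token_groups": sorted(tokens)}


def chart_create(chart_type: str, title: str) -> dict:
    """Build a Plotly-style layout dict, applying theme tokens when present."""
    colors = CONTEXT.tokens.get("colors", {})
    typography = CONTEXT.tokens.get("typography", {})
    layout = {"title": {"text": title}}
    if "primary" in colors:
        layout["colorway"] = [colors["primary"]]
    if "fontFamily" in typography:
        layout["font"] = {"family": typography["fontFamily"]}
    return {"type": chart_type, "layout": layout}
```

Because both tools live in one process, no theme argument has to be passed on each chart call; the trade-off is that the server holds mutable state between requests.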
+ +#### Tool Categories + +| Category | Tools | Description | +|----------|-------|-------------| +| `chart_*` | `chart_create`, `chart_configure_interaction` | Data visualization (Plotly) | +| `layout_*` | `layout_create`, `layout_add_filter`, `layout_set_grid` | Dashboard composition | +| `page_*` | `page_create`, `page_add_navbar`, `page_set_auth` | App-level structure | +| `theme_*` | `theme_create`, `theme_extend`, `theme_validate`, `theme_export_css` | Design tokens, component styles | + +#### Design Token Structure + +Themes are built from design tokens—single source of truth for visual consistency: + +```yaml +tokens: + colors: + primary: "#228be6" + secondary: "#868e96" + background: + base: "#ffffff" + subtle: "#f8f9fa" + text: + primary: "#212529" + muted: "#868e96" + + spacing: + xs: "4px" + sm: "8px" + md: "16px" + lg: "24px" + + typography: + fontFamily: "Inter, sans-serif" + fontSize: + sm: "14px" + md: "16px" + + radii: + sm: "4px" + md: "8px" +``` + +#### Component Style Registry + +Per-component overrides ensuring consistency: + +| Component | Registered Style | Purpose | +|-----------|------------------|---------| +| `kpi_card` | Shadow, padding, border-radius | All KPIs look identical | +| `data_table` | Header bg, row hover, border | Tables share appearance | +| `filter_panel` | Background, spacing, alignment | Filters positioned consistently | +| `chart_card` | Title typography, padding | Chart containers unified | + +--- + +### Internal Dependency Flow (viz-platform) + +``` +dmc-mcp ← dash-mcp + ↑ | + └──────────┘ + (validation before render) +``` + +dash-mcp always validates component definitions against dmc-mcp. No direct data dependency—data comes from external sources. 
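
The validate-before-render rule can be sketched as a prop check against a registry. The `REGISTRY` contents below are a small illustrative subset, not an authoritative list of DMC props; a real dmc-mcp would presumably introspect the installed package version. `render_if_valid` is a hypothetical stand-in for the dash-mcp side.

```python
# Illustrative subset only; assumed to be built from the installed DMC version.
REGISTRY = {
    "Select": {"data", "value", "label", "placeholder", "searchable"},
    "Button": {"children", "variant", "color", "disabled"},
}


def validate_component(name: str, props: dict) -> dict:
    """Report unknown components and unknown props as actionable errors."""
    if name not in REGISTRY:
        return {"valid": False, "errors": [f"unknown component: {name}"]}
    unknown = sorted(set(props) - REGISTRY[name])
    errors = [f"{name} has no prop '{prop}'" for prop in unknown]
    return {"valid": not errors, "errors": errors}


def render_if_valid(name: str, props: dict) -> dict:
    """Fail fast before rendering when validation finds problems."""
    result = validate_component(name, props)
    if not result["valid"]:
        raise ValueError("; ".join(result["errors"]))
    return {"component": name, "props": props}
```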
+ +--- + +### Agents (viz-platform) + +| Agent | Trigger | Sequence | +|-------|---------|----------| +| `theme_setup` | New project or brand consistency | list_themes → create_theme → register_component_style → validate_theme | +| `layout_builder` | User wants dashboard structure | create_layout → add_filter → apply_theme → preview | +| `component_check` | Before rendering any DMC component | get_component_props → validate_component → proceed or error | + +--- + +### Commands (viz-platform) + +| Command | Maps To | +|---------|---------| +| `/chart {type}` | `dash-mcp.chart_create` (expects data input) | +| `/dashboard {template}` | `layout_builder` agent | +| `/theme {name}` | `dash-mcp.theme_apply` | +| `/theme new {name}` | `dash-mcp.theme_create` | +| `/theme css {name}` | `dash-mcp.theme_export_css` | +| `/component {name}` | `dmc-mcp.get_component_props` | + +--- + +## Cross-Plugin Interactions + +### How It Works + +MCP servers don't call each other. Claude orchestrates: + +1. Server A returns output to Claude +2. Claude interprets and determines next step +3. 
Claude passes relevant data to Server B + +### Documentation Layers + +| Layer | Location | Purpose | +|-------|----------|---------| +| Plugin docs | Each plugin's README.md | Declares inputs/outputs | +| Claude.md | Project root | Cross-plugin agents for this project | +| contract-validator | Separate plugin | Validates compatibility | +| doc-guardian | Separate plugin | Catches drift within each project | + +### Interface Contracts + +Each plugin declares what it produces and accepts: + +**data-platform outputs:** +- `data_ref`: In-memory DataFrame reference +- `query_result`: Row set from postgres-mcp +- `model_output`: Materialized table reference from dbt-mcp +- `schema_snapshot`: Full schema state for documentation + +**viz-platform inputs:** +- Accepts `data_ref`, `query_result`, or `model_output` as data source +- Validates all DMC components against dmc-mcp before rendering + +### Cross-Plugin Agents (defined in Claude.md) + +| Agent | Trigger | Sequence | +|-------|---------|----------| +| `dashboard_builder` | User requests visualization of database content | postgres-mcp.execute_query → pandas-mcp.pivot (if needed) → dmc-mcp.validate_component → dash-mcp.chart_create → dash-mcp.layout_create | +| `visualization_prep` | Query result needs reshaping | postgres-mcp.execute_query → pandas-mcp.reshape → dash-mcp.chart_create | + +### Validation: contract-validator + +Separate plugin for cross-plugin validation. See the **Plugin: contract-validator** section for the full specification. + +**Key distinction from doc-guardian:** +- doc-guardian: "did code change break docs?" (within a project) +- contract-validator: "do plugins work together?" (across plugins) + +--- + +## Plugin: contract-validator + +### Purpose + +Validates cross-plugin compatibility and Claude.md agent definitions. Ensures plugins can actually work together before runtime failures occur. + +**Problem solved:** Plugins declare interfaces in README. Claude.md references tools across plugins. 
Without validation: +- Agents reference tools that don't exist +- viz-platform expects input format data-platform doesn't produce +- Plugin updates break workflows silently + +--- + +### What It Reads + +| Source | Purpose | +|--------|---------| +| Plugin README.md | Extract declared inputs/outputs | +| Claude.md | Extract agent definitions and tool references | +| MCP server schemas | Verify tools actually exist with expected signatures | + +--- + +### Tool Categories + +| Category | Tools | Description | +|----------|-------|-------------| +| Parse | `parse_plugin_interface`, `parse_claude_md_agents` | Extract structured data from docs | +| Validate | `validate_compatibility`, `validate_agent_refs`, `validate_data_flow` | Check contracts match | +| Report | `generate_compatibility_report`, `list_issues` | Output findings | + +#### Tool Details + +**`parse_plugin_interface`** +- Input: Plugin path or README content +- Output: Structured interface (inputs accepted, outputs produced, tool names) + +**`parse_claude_md_agents`** +- Input: Claude.md path or content +- Output: List of agents with their tool sequences + +**`validate_compatibility`** +- Input: Two plugin interfaces +- Output: Compatibility report (what A produces that B accepts, gaps) + +**`validate_agent_refs`** +- Input: Agent definition, list of available plugins +- Output: Missing tools, invalid sequences + +**`validate_data_flow`** +- Input: Agent sequence +- Output: Verification that each step's output matches next step's expected input + +--- + +### Agents (contract-validator) + +| Agent | Trigger | Sequence | +|-------|---------|----------| +| `full_validation` | User runs `/validate-contracts` | parse all plugin interfaces → parse Claude.md → validate_compatibility for each pair → validate_agent_refs for each agent → generate_compatibility_report | +| `agent_check` | User runs `/check-agent {name}` | parse_claude_md_agents → find agent → validate_agent_refs → validate_data_flow → report issues | 
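
A minimal sketch of what `validate_agent_refs` might do, assuming agent steps are written as `server.tool` strings and that the available tools arrive as a mapping from server name to tool names. Both the input shape and the output keys are assumptions.

```python
def validate_agent_refs(sequence: list[str], available: dict[str, set[str]]) -> dict:
    """Check each 'server.tool' step against the servers and tools that are loaded."""
    missing_servers: list[str] = []
    missing_tools: list[str] = []
    for ref in sequence:
        server, _, tool = ref.partition(".")
        if server not in available:
            missing_servers.append(ref)
        elif tool not in available[server]:
            missing_tools.append(ref)
    return {
        "valid": not missing_servers and not missing_tools,
        "missing_servers": missing_servers,
        "missing_tools": missing_tools,
    }


# Example input: one step references a tool name that does not exist.
available = {
    "postgres-mcp": {"execute_query"},
    "pandas-mcp": {"reshape", "pivot"},
    "dash-mcp": {"chart_create"},
}
result = validate_agent_refs(
    ["postgres-mcp.execute_query", "pandas-mcp.transform", "dash-mcp.chart_create"],
    available,
)
```

Separating missing servers from missing tools lets the report distinguish an optional server that is not loaded (a warning) from a misspelled tool name (an error).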
+ +--- + +### Commands + +| Command | Maps To | Description | +|---------|---------|-------------| +| `/validate-contracts` | `full_validation` agent | Full project validation | +| `/check-agent {name}` | `agent_check` agent | Validate single agent definition | +| `/list-interfaces` | `parse_plugin_interface` for all plugins | Show what each plugin produces/accepts | + +--- + +### Output Format + +**Compatibility Report:** + +``` +## Contract Validation Report + +### Plugin Interfaces +- data-platform: produces [data_ref, query_result, model_output, schema_snapshot] +- viz-platform: accepts [data_ref, query_result, model_output] + +### Compatibility Matrix +| Producer → Consumer | Status | Notes | +|---------------------|--------|-------| +| data-platform → viz-platform | ✓ Compatible | All outputs accepted | + +### Agent Validation +| Agent | Status | Issues | +|-------|--------|--------| +| dashboard_builder | ✓ Valid | — | +| model_analysis | ⚠ Warning | dbt-mcp optional; agent fails if not loaded | + +### Issues Found +- None + +### Warnings +- Agent `model_analysis` depends on optional server `dbt-mcp` +``` + +**Issue Types:** + +| Type | Severity | Example | +|------|----------|---------| +| Missing tool | Error | Agent references `pandas-mcp.transform` but tool is `pandas-mcp.reshape` | +| Interface mismatch | Error | viz-platform expects `chart_data` but data-platform produces `data_ref` | +| Optional dependency | Warning | Agent uses dbt-mcp which may not be loaded | +| Undeclared output | Warning | Plugin produces output not listed in README | + +--- + +### Integration with doc-guardian + +**Separation of concerns:** + +| Plugin | Scope | Trigger | +|--------|-------|---------| +| doc-guardian | Code ↔ docs drift within a project | PostToolUse (Write/Edit) | +| contract-validator | Plugin ↔ plugin compatibility | On-demand or CI hook | + +contract-validator does NOT watch for file changes. It runs on demand or as a CI step. 
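
A sketch of the pairwise check behind the compatibility matrix, plus an exit code for the CI case. It assumes a consumer is compatible when at least one produced output is among its accepted inputs (the "or" in the viz-platform contract); the dict shapes and the `ci_exit_code` helper are hypothetical.

```python
def validate_compatibility(producer: dict, consumer: dict) -> dict:
    """Compare what one plugin produces with what another accepts."""
    produced = set(producer["outputs"])
    accepted = set(consumer["inputs"])
    shared = sorted(produced & accepted)
    return {
        "producer": producer["name"],
        "consumer": consumer["name"],
        # Assumption: one usable data source is enough to be compatible.
        "compatible": bool(shared),
        "accepted": shared,
        # Produced but never consumed: informational, not an error.
        "unconsumed": sorted(produced - accepted),
    }


def ci_exit_code(results: list[dict]) -> int:
    """Return 1 if any pair is incompatible, so a CI step can fail the build."""
    return 0 if all(r["compatible"] for r in results) else 1


data_platform = {
    "name": "data-platform",
    "outputs": ["data_ref", "query_result", "model_output", "schema_snapshot"],
}
viz_platform = {
    "name": "viz-platform",
    "inputs": ["data_ref", "query_result", "model_output"],
}
report = validate_compatibility(data_platform, viz_platform)
```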
+ +**Potential future integration:** doc-guardian could trigger contract-validator when Claude.md or plugin README changes. Not required for v1. + +--- + +## Diagramming Approach + +No diagram-mcp server. Use existing Mermaid Chart MCP. + +**For ERDs:** +- postgres-mcp exposes schema metadata via `get_schema_snapshot` +- Claude generates Mermaid syntax +- Mermaid Chart MCP renders + +**For dbt lineage:** +- dbt-mcp.get_lineage outputs Mermaid-formatted DAG +- Mermaid Chart MCP renders + +This avoids the complexity of draw.io XML generation while maintaining documentation capability. + +--- + +## Implementation Order + +| Phase | Plugin | Server | Rationale | +|-------|--------|--------|-----------| +| 1 | data-platform | pandas-mcp | Entry point, no dependencies | +| 2 | data-platform | postgres-mcp | Load from Phase 1, query capabilities | +| 3 | data-platform | dbt-mcp | Transform layer, requires postgres-mcp | +| 4 | viz-platform | dmc-mcp | Constraint layer, no dependencies | +| 5 | viz-platform | dash-mcp | Visualization, validates against dmc-mcp | +| 6 | contract-validator | — | Validates all above, requires stable interfaces | + +**Notes:** +- Phases 1-3 (data-platform) and 4-5 (viz-platform) can proceed in parallel +- contract-validator (Phase 6) should wait until plugin interfaces stabilize +- doc-guardian already exists; update scope documentation only + +--- + +## Open Questions + +### Data Reference Passing + +How do servers share `data_ref` objects? Options: +- **Temporary files with URIs**: Portable but I/O overhead +- **Arrow IPC**: Efficient but requires both servers to support +- **Recommendation**: Arrow IPC for efficiency, file fallback for compatibility + +### Authentication + +Should postgres-mcp handle connection strings directly, or use a secrets manager pattern? + +### Theme Storage + +Where do custom themes persist? 
+- Local config file (`~/.dash-mcp/themes/`) +- Project-level (alongside dbt_project.yml) +- Database table (for shared team themes) + +### dbt Project Discovery + +Auto-detect `dbt_project.yml` in common locations, or require explicit path? + +--- + +## Technology Stack + +| Layer | Technology | Notes | +|-------|------------|-------| +| MCP Framework | FastMCP | Or manual MCP SDK | +| Python | 3.11+ | Type hints, async support | +| Data Processing | pandas | Core DataFrame ops | +| Arrow | pyarrow | Parquet, efficient memory | +| Database | psycopg | Async-ready Postgres driver | +| Geospatial | geoalchemy2 | PostGIS integration | +| dbt | dbt-core | CLI wrapper | +| Visualization | plotly | Figure generation | +| UI Components | dash-mantine-components | Version-locked via dmc-mcp | + +--- + +## Summary + +### Core Plugins + +| Plugin | Servers/Scope | Key Characteristic | +|--------|---------------|-------------------| +| data-platform | pandas-mcp, postgres-mcp, dbt-mcp | Optional server loading per project | +| viz-platform | dmc-mcp, dash-mcp | dmc-mcp validates before dash-mcp renders | +| contract-validator | Interface parsing, compatibility checks | Validates cross-plugin contracts and agent definitions | + +### Supporting Plugins (Existing) + +| Plugin | Purpose | +|--------|---------| +| doc-guardian | Code-to-docs drift (unchanged scope) | +| Mermaid Chart MCP | Diagram rendering | + +### Interaction Model + +``` +Plugin READMEs → declare inputs/outputs +Claude.md → define cross-plugin agents +contract-validator → validate compatibility +doc-guardian → catch drift within projects +``` + +**Flow:** Plugins declare interfaces. Claude.md defines workflows. contract-validator enforces compatibility. doc-guardian handles internal drift.