From 09bc26c47192d363358cd49b6fa3f6a969a38272 Mon Sep 17 00:00:00 2001
From: Leo Miranda
Date: Sun, 25 Jan 2026 15:31:30 +0000
Subject: [PATCH] Add Change V04.0.0: Proposal

# MCP Data Platform — Architecture Reference

*Plugin taxonomy, server responsibilities, and interaction patterns for Leo’s data marketplace*

-----

## Overview

Two plugins serving distinct domains, designed for independent or combined use.

|Plugin |Servers |Domain |
|-----------------|---------------------------------|-----------------------------------------|
|**data-platform**|pandas-mcp, postgres-mcp, dbt-mcp|Ingestion, storage, transformation |
|**viz-platform** |dmc-mcp, dash-mcp |Component validation, dashboards, theming|

**Key principles:**

- MCP servers are independent processes—they don’t import each other
- Claude orchestrates cross-server data flow at runtime
- Plugins ship multiple servers; projects load only what they need
- Claude.md defines project-specific workflows spanning plugins

-----

## Component Definitions

|Component Type|Definition |Runtime Context |
|--------------|--------------------------------------------------------------------------------------------------------------------|----------------------------------------------------|
|**MCP Server**|Standalone service exposing tools via Model Context Protocol. One server = one domain responsibility. |Long-running process, spawned by Claude Desktop/Code|
|**Tool** |Single callable function within an MCP server. Atomic operation with defined input schema and output. |Invoked per-request by LLM |
|**Resource** |Read-only data exposed by MCP server (files, schemas, configs). Discoverable but not executable. |Static or cached |
|**Agent** |Orchestration layer that chains multiple tool calls across servers. Lives in Claude’s reasoning, not in MCP servers.|LLM-driven, multi-step |
|**Command** |User-facing shortcut (e.g., `/ingest`) that triggers predefined tool sequences. |Chat interface trigger |

-----

## Plugin: data-platform

### Server Loading

Single plugin ships all three servers. Which servers load is determined by project config—not environment variables.

|Server |Default|Optional|
|------------|-------|--------|
|pandas-mcp |✓ |— |
|postgres-mcp|✓ |— |
|dbt-mcp |— |✓ |

**Example project configs:**

```yaml
# Web app project (no dbt)
mcp_servers:
  - pandas-mcp
  - postgres-mcp
```

```yaml
# Data engineering project (full stack)
mcp_servers:
  - pandas-mcp
  - postgres-mcp
  - dbt-mcp
```

Agents check server availability at runtime. If dbt-mcp isn’t loaded, dbt-related steps are skipped or surface “not available for this project.”

-----

### Server: pandas-mcp (Data Shaping Layer)

**Responsibility:** File ingestion, data profiling, schema inference, and utility shaping operations.

**Philosophy:** SQL-first for persistent transforms (use dbt). Pandas for:

- Pre-database ingestion (profiling, validation, schema inference)
- Visualization prep (reshaping query results for chart formats)
- Ad-hoc operations (prototyping, merging with local files)

#### Tool Categories

|Category |Tools |Description |
|---------|-----------------------------------------------------------------|-----------------------------------|
|Ingestion|`read_file`, `write_file`, `detect_encoding` |File I/O with format auto-detection|
|Profiling|`profile`, `validate`, `sample` |Data quality assessment |
|Schema |`infer_schema` |Generate DDL from data structure |
|Shaping |`reshape`, `pivot`, `melt`, `merge`, `add_columns`, `filter_rows`|Transform any data reference |

#### Data Reference Sources

pandas-mcp accepts `data_ref` from multiple origins:

|Source |How It Arrives |
|------------------|-------------------------|
|Local file |`read_file` tool |
|Query result |Passed from postgres-mcp |
|dbt model output |Passed from dbt-mcp |
|Previous transform|Chained from shaping tool|

#### When to Use Shaping Tools

|Scenario |Use pandas-mcp|Use SQL/dbt|
|--------------------------------------|--------------|-----------|
|Pivot for heatmap chart |✓ |— |
|Join query result with local CSV |✓ |— |
|Prototype transform before formalizing|✓ |— |
|Persistent aggregation in pipeline |— |✓ |
|Reusable business logic |— |✓ |
|Needs version control + testing |— |✓ |

-----

### Server: postgres-mcp (Database Layer)

**Responsibility:** Data loading, querying, schema management, performance analysis, and geospatial operations.

#### Tool Categories

|Category|Tools |Description |
|--------|------------------------------------------------------------------------------------|--------------------------------------|
|Query |`list_schemas`, `list_tables`, `get_table_schema`, `execute_query`, `query_geometry`|Read operations |
|Analysis|`explain_query`, `recommend_indexes`, `health_check` |Performance insights |
|Write |`execute_write`, `load_dataframe` |Data modification |
|DDL |`execute_ddl`, `get_schema_snapshot` |Schema management with change tracking|

#### DDL Change Tracking

`execute_ddl` returns structured output for downstream automation:

```json
{
  "success": true,
  "operation": "CREATE TABLE",
  "affected_objects": [
    {
      "type": "table",
      "schema": "public",
      "name": "customer_orders",
      "change": "created"
    }
  ],
  "timestamp": "2025-01-22T14:30:00Z"
}
```

This enables documentation updates, ERD regeneration (via Mermaid Chart MCP), or other automated responses.

-----

### Server: dbt-mcp (Transform Layer)

**Responsibility:** Model execution, lineage, documentation, and YAML generation for local dbt-core projects.

**Note:** This dbt-mcp is a custom server that wraps the local dbt-core CLI. It is distinct from the official dbt Labs `dbt-mcp` server, whose Discovery API and Semantic Layer tools require dbt Cloud.

#### Tool Categories

|Category |Tools |Description |
|-------------|-----------------------------------------------|------------------------|
|Discovery |`parse_manifest`, `list_models`, `list_sources`|Project exploration |
|Model |`get_model`, `get_lineage`, `compile_sql` |Model inspection |
|Execution |`run_model`, `test_model`, `get_run_results` |dbt CLI wrapper |
|Documentation|`generate_yaml` |Auto-generate schema.yml|

#### Lineage Output

`get_lineage` outputs a Mermaid-formatted DAG, compatible with the existing Mermaid Chart MCP for rendering.
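`get_lineage` could build its DAG from the `target/manifest.json` artifact that dbt writes when it parses a project. The sketch below assumes the input is that manifest's `parent_map`, which maps each node's unique_id to its parents' unique_ids. It shows one way to turn the map into Mermaid `flowchart` text. `lineage_to_mermaid` is a hypothetical helper, not part of dbt-core. Reading the manifest this way avoids calling the dbt CLI just to answer lineage questions.

```python
from collections import deque


def lineage_to_mermaid(manifest: dict, model_id: str) -> str:
    """Render the upstream lineage of one dbt node as a Mermaid flowchart (sketch)."""
    parent_map: dict[str, list[str]] = manifest.get("parent_map", {})

    def safe_id(unique_id: str) -> str:
        # Keep Mermaid node IDs to letters, digits and underscores;
        # the full dbt unique_id is preserved as the node label instead.
        return "".join(c if c.isalnum() else "_" for c in unique_id)

    labels: dict[str, str] = {safe_id(model_id): model_id}
    edges: list[str] = []
    seen = {model_id}
    queue = deque([model_id])
    while queue:  # breadth-first walk from the model towards its sources
        child = queue.popleft()
        for parent in parent_map.get(child, []):
            labels[safe_id(parent)] = parent
            edges.append(f"    {safe_id(parent)} --> {safe_id(child)}")
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    node_lines = [f'    {sid}["{uid}"]' for sid, uid in labels.items()]
    return "\n".join(["flowchart LR", *node_lines, *edges])


if __name__ == "__main__":
    # Tiny hand-written stand-in for a real manifest's parent_map.
    demo = {"parent_map": {
        "model.shop.orders": ["model.shop.stg_orders", "model.shop.stg_payments"],
        "model.shop.stg_orders": ["source.shop.raw.orders"],
    }}
    print(lineage_to_mermaid(demo, "model.shop.orders"))
```

The output is plain text, so it can be passed to Mermaid Chart MCP unchanged.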

-----

### Internal Dependency Flow (data-platform)

```
files → pandas-mcp → postgres-mcp ↔ dbt-mcp
            ↑______________|
        (query results for reshaping)
```

|Flow |Description |
|-----------------|------------------------------------|
|files → pandas |Entry point for raw data |
|pandas → postgres|Schema inference, bulk loading |
|postgres ↔ dbt |dbt queries marts, postgres executes|
|postgres → pandas|Query results for reshaping |
|dbt → pandas |Model outputs for visualization prep|

-----

### Agents (data-platform)

|Agent |Trigger |Sequence |
|----------------|--------------------------|----------------------------------------------------------------------------|
|`data_ingestion`|User provides file |read_file → profile → infer_schema → execute_ddl → load_dataframe → validate|
|`model_analysis`|User asks about dbt model |get_model → get_lineage → explain_query → test_model → synthesize |
|`full_pipeline` |File to materialized model|data_ingestion → create dbt model → run_model |

**Behavior when dbt-mcp absent:**

|Agent |Behavior |
|----------------|-------------------------------------|
|`data_ingestion`|Runs fully (no dbt steps) |
|`model_analysis`|Skipped—surfaces “dbt not configured”|
|`full_pipeline` |Stops after load, prompts user |

-----

### Commands (data-platform)

|Command |Maps To |
|--------------------------------|-------------------------------|
|`/ingest {file}` |`data_ingestion` agent |
|`/profile {file}` |`pandas-mcp.profile` |
|`/pivot {data} by {cols}` |`pandas-mcp.pivot` |
|`/merge {left} {right} on {key}`|`pandas-mcp.merge` |
|`/explain {query}` |`postgres-mcp.explain_query` |
|`/schema {table}` |`postgres-mcp.get_table_schema`|
|`/lineage {model}` |`dbt-mcp.get_lineage` |
|`/run {model}` |`dbt-mcp.run_model` |
|`/test {model}` |`dbt-mcp.test_model` |

dbt commands return a graceful “dbt-mcp not loaded” message when the server is unavailable.
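Below is a minimal sketch of how a command router might respect the project's `mcp_servers` list and degrade gracefully. It assumes the loaded servers are known as plain names. `Route`, `COMMANDS` and `resolve` are illustrative names only; they are not part of any MCP SDK or of this proposal's specified interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    server: str  # MCP server that owns the tool
    tool: str    # tool name on that server


# A subset of the data-platform command table above (hypothetical registry format).
COMMANDS: dict[str, Route] = {
    "/profile": Route("pandas-mcp", "profile"),
    "/schema": Route("postgres-mcp", "get_table_schema"),
    "/lineage": Route("dbt-mcp", "get_lineage"),
    "/run": Route("dbt-mcp", "run_model"),
}


def resolve(command: str, loaded_servers: list[str]) -> str:
    """Map a slash command to `server.tool`, or return a graceful message."""
    route = COMMANDS.get(command)
    if route is None:
        return f"unknown command: {command}"
    if route.server not in loaded_servers:
        # Optional servers such as dbt-mcp degrade gracefully instead of raising.
        return f"{route.server} not loaded"
    return f"{route.server}.{route.tool}"


if __name__ == "__main__":
    web_app = ["pandas-mcp", "postgres-mcp"]  # mcp_servers from the web-app config
    print(resolve("/schema", web_app))   # postgres-mcp.get_table_schema
    print(resolve("/lineage", web_app))  # dbt-mcp not loaded
```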

-----

## Plugin: viz-platform

### Servers

|Server |Responsibility |
|--------|---------------------------------------------------------|
|dmc-mcp |Version-locked component registry, prop validation |
|dash-mcp|Charts, layouts, pages, theming—validates against dmc-mcp|

-----

### Server: dmc-mcp (Component Constraint Layer)

**Responsibility:** Single source of truth for the Dash Mantine Components API. Prevents Claude from hallucinating deprecated props or non-existent components.

**Problem solved:** DMC versions introduce breaking changes. Claude’s training data mixes versions. Runtime errors from invalid props waste cycles.

#### Tool Categories

|Category |Tools |Description |
|-------------|---------------------|-------------------------------------|
|Discovery |`list_components` |What exists in the installed version |
|Introspection|`get_component_props`|Valid props, types, defaults |
|Validation |`validate_component` |Check component definition before use|

#### Usage Pattern

Claude queries dmc-mcp first:

1. “What props does `dmc.Select` accept?” → `get_component_props`
2. Build component with valid props
3. Pass to dash-mcp for rendering

dash-mcp validates against dmc-mcp before rendering. Invalid components fail fast with actionable errors.

-----

### Server: dash-mcp (Visualization Layer)

**Responsibility:** Chart generation, dashboard layouts, page structure, theming system, and export.

**Philosophy:** Single server, multiple concerns. Tools are namespaced but share context (theme tokens flow to charts automatically).
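The “namespaced but shared context” idea can be sketched in a few lines. This is a minimal sketch that assumes module-level state inside a single dash-mcp process. `DashContext` and the simplified signatures of `theme_create` and `chart_create` are hypothetical. The returned dict stands in for a figure spec; the sketch does not call Plotly.

```python
from dataclasses import dataclass, field


@dataclass
class DashContext:
    """State shared by every tool in one dash-mcp process (hypothetical)."""
    active_theme: dict = field(default_factory=dict)


CTX = DashContext()


def theme_create(name: str, primary: str) -> dict:
    """theme_* tool: store design tokens in the shared context."""
    CTX.active_theme = {"name": name, "colors": {"primary": primary}}
    return CTX.active_theme


def chart_create(kind: str, rows: list[dict]) -> dict:
    """chart_* tool: reads colours from the active theme instead of taking them as arguments."""
    colour = CTX.active_theme.get("colors", {}).get("primary", "#000000")
    return {"type": kind, "marker_color": colour, "n_points": len(rows)}


# Tools are grouped by the prefix before the first underscore.
TOOLS = {fn.__name__: fn for fn in (theme_create, chart_create)}
NAMESPACES = sorted({name.split("_", 1)[0] for name in TOOLS})

if __name__ == "__main__":
    theme_create("brand", primary="#228be6")
    print(chart_create("bar", [{"x": 1}, {"x": 2}]))
    print(NAMESPACES)
```

With this design, Claude never passes colours to chart tools, so a chart cannot drift from the active theme.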

#### Tool Categories

|Category |Tools |Description |
|----------|----------------------------------------------------------------------------------|-------------------------------|
|`chart_*` |`chart_create`, `chart_configure_interaction` |Data visualization (Plotly) |
|`layout_*`|`layout_create`, `layout_add_filter`, `layout_set_grid` |Dashboard composition |
|`page_*` |`page_create`, `page_add_navbar`, `page_set_auth` |App-level structure |
|`theme_*` |`theme_create`, `theme_extend`, `theme_apply`, `theme_validate`, `theme_export_css`|Design tokens, component styles|

#### Design Token Structure

Themes are built from design tokens—a single source of truth for visual consistency:

```yaml
tokens:
  colors:
    primary: "#228be6"
    secondary: "#868e96"
    background:
      base: "#ffffff"
      subtle: "#f8f9fa"
    text:
      primary: "#212529"
      muted: "#868e96"

  spacing:
    xs: "4px"
    sm: "8px"
    md: "16px"
    lg: "24px"

  typography:
    fontFamily: "Inter, sans-serif"
    fontSize:
      sm: "14px"
      md: "16px"

  radii:
    sm: "4px"
    md: "8px"
```

#### Component Style Registry

Per-component overrides ensuring consistency:

|Component |Registered Style |Purpose |
|--------------|------------------------------|-------------------------------|
|`kpi_card` |Shadow, padding, border-radius|All KPIs look identical |
|`data_table` |Header bg, row hover, border |Tables share appearance |
|`filter_panel`|Background, spacing, alignment|Filters positioned consistently|
|`chart_card` |Title typography, padding |Chart containers unified |

-----

### Internal Dependency Flow (viz-platform)

```
dmc-mcp ← dash-mcp
   ↑          |
   └──────────┘
  (validation before render)
```

dash-mcp always validates component definitions against dmc-mcp. No direct data dependency—data comes from external sources.
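Below is a minimal sketch of the fail-fast check behind `validate_component`. It assumes dmc-mcp holds a version-locked snapshot mapping each component to its props and expected Python types. The registry contents are illustrative, not an authoritative copy of any DMC release. Suggestions come from the standard library's `difflib.get_close_matches`, which keeps the errors actionable.

```python
from difflib import get_close_matches

# Hypothetical snapshot of a version-locked registry: component -> {prop: type}.
REGISTRY: dict[str, dict[str, type]] = {
    "Select": {
        "data": list, "value": str, "label": str,
        "placeholder": str, "searchable": bool, "clearable": bool,
    },
}


def _hint(word: str, options: list[str]) -> str:
    """Return a 'did you mean' suffix when a close match exists."""
    match = get_close_matches(word, options, n=1)
    return f"; did you mean {match[0]!r}?" if match else ""


def validate_component(name: str, props: dict) -> list[str]:
    """Return actionable error strings; an empty list means the definition is valid."""
    spec = REGISTRY.get(name)
    if spec is None:
        return [f"unknown component {name!r}" + _hint(name, list(REGISTRY))]
    errors = []
    for prop, value in props.items():
        if prop not in spec:
            errors.append(f"{name}: unknown prop {prop!r}" + _hint(prop, list(spec)))
        elif not isinstance(value, spec[prop]):
            errors.append(
                f"{name}.{prop}: expected {spec[prop].__name__}, got {type(value).__name__}"
            )
    return errors


if __name__ == "__main__":
    print(validate_component("Select", {"data": ["a", "b"], "searchabel": True}))
```

For a typo such as `searchabel`, the returned list contains a suggestion for `searchable`, which Claude can apply before calling dash-mcp again.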

-----

### Agents (viz-platform)

|Agent |Trigger |Sequence |
|-----------------|----------------------------------|----------------------------------------------------------------------|
|`theme_setup` |New project or brand consistency |list_themes → theme_create → register_component_style → theme_validate|
|`layout_builder` |User wants dashboard structure |layout_create → layout_add_filter → theme_apply → preview |
|`component_check`|Before rendering any DMC component|get_component_props → validate_component → proceed or error |

-----

### Commands (viz-platform)

|Command |Maps To |
|-----------------------|--------------------------------------------|
|`/chart {type}` |`dash-mcp.chart_create` (expects data input)|
|`/dashboard {template}`|`layout_builder` agent |
|`/theme {name}` |`dash-mcp.theme_apply` |
|`/theme new {name}` |`dash-mcp.theme_create` |
|`/theme css {name}` |`dash-mcp.theme_export_css` |
|`/component {name}` |`dmc-mcp.get_component_props` |

-----

## Cross-Plugin Interactions

### How It Works

MCP servers don’t call each other. Claude orchestrates:

1. Server A returns output to Claude
2. Claude interprets and determines next step
3. Claude passes relevant data to Server B

### Documentation Layers

|Layer |Location |Purpose |
|------------------|-----------------------|------------------------------------|
|Plugin docs |Each plugin’s README.md|Declares inputs/outputs |
|Claude.md |Project root |Cross-plugin agents for this project|
|contract-validator|Separate plugin |Validates compatibility |
|doc-guardian |Separate plugin |Catches drift within each project |

### Interface Contracts

Each plugin declares what it produces and accepts:

**data-platform outputs:**

- `data_ref`: In-memory DataFrame reference
- `query_result`: Row set from postgres-mcp
- `model_output`: Materialized table reference from dbt-mcp
- `schema_snapshot`: Full schema state for documentation

**viz-platform inputs:**

- Accepts `data_ref`, `query_result`, or `model_output` as data source
- Validates all DMC components against dmc-mcp before rendering

### Cross-Plugin Agents (defined in Claude.md)

|Agent |Trigger |Sequence |
|--------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
|`dashboard_builder` |User requests visualization of database content|postgres-mcp.execute_query → pandas-mcp.pivot (if needed) → dmc-mcp.validate_component → dash-mcp.chart_create → dash-mcp.layout_create|
|`visualization_prep`|Query result needs reshaping |postgres-mcp.execute_query → pandas-mcp.reshape → dash-mcp.chart_create |

### Validation: contract-validator

Separate plugin for cross-plugin validation. See the **Plugin: contract-validator** section for the full specification.

**Key distinction from doc-guardian:**

- doc-guardian: “did code change break docs?” (within a project)
- contract-validator: “do plugins work together?” (across plugins)

-----

## Plugin: contract-validator

### Purpose

Validates cross-plugin compatibility and Claude.md agent definitions.
Ensures plugins can actually work together before runtime failures occur.

**Problem solved:** Plugins declare interfaces in README. Claude.md references tools across plugins. Without validation:

- Agents reference tools that don’t exist
- viz-platform expects an input format that data-platform doesn’t produce
- Plugin updates break workflows silently

-----

### What It Reads

|Source |Purpose |
|------------------|----------------------------------------------------|
|Plugin README.md |Extract declared inputs/outputs |
|Claude.md |Extract agent definitions and tool references |
|MCP server schemas|Verify tools actually exist with expected signatures|

-----

### Tool Categories

|Category|Tools |Description |
|--------|---------------------------------------------------------------------|---------------------------------|
|Parse |`parse_plugin_interface`, `parse_claude_md_agents` |Extract structured data from docs|
|Validate|`validate_compatibility`, `validate_agent_refs`, `validate_data_flow`|Check contracts match |
|Report |`generate_compatibility_report`, `list_issues` |Output findings |

#### Tool Details

**`parse_plugin_interface`**

- Input: Plugin path or README content
- Output: Structured interface (inputs accepted, outputs produced, tool names)

**`parse_claude_md_agents`**

- Input: Claude.md path or content
- Output: List of agents with their tool sequences

**`validate_compatibility`**

- Input: Two plugin interfaces
- Output: Compatibility report (what A produces that B accepts, gaps)

**`validate_agent_refs`**

- Input: Agent definition, list of available plugins
- Output: Missing tools, invalid sequences

**`validate_data_flow`**

- Input: Agent sequence
- Output: Verification that each step’s output matches the next step’s expected input

-----

### Agents (contract-validator)

|Agent |Trigger |Sequence |
|-----------------|-------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
|`full_validation`|User runs `/validate-contracts`|parse all plugin interfaces → parse Claude.md → validate_compatibility for each pair → validate_agent_refs for each agent → generate_compatibility_report|
|`agent_check` |User runs `/check-agent {name}`|parse_claude_md_agents → find agent → validate_agent_refs → validate_data_flow → report issues |

-----

### Commands

|Command |Maps To |Description |
|---------------------|----------------------------------------|--------------------------------------|
|`/validate-contracts`|`full_validation` agent |Full project validation |
|`/check-agent {name}`|`agent_check` agent |Validate single agent definition |
|`/list-interfaces` |`parse_plugin_interface` for all plugins|Show what each plugin produces/accepts|

-----

### Output Format

**Compatibility Report:**

```
## Contract Validation Report

### Plugin Interfaces
- data-platform: produces [data_ref, query_result, model_output, schema_snapshot]
- viz-platform: accepts [data_ref, query_result, model_output]

### Compatibility Matrix
| Producer | Consumer | Status |
|----------|----------|--------|
| data-platform | viz-platform | ✓ Compatible (all outputs accepted) |

### Agent Validation
| Agent | Status | Issues |
|-------|--------|--------|
| dashboard_builder | ✓ Valid | — |
| model_analysis | ⚠ Warning | dbt-mcp optional; agent skipped if not loaded |

### Issues Found
- None

### Warnings
- Agent `model_analysis` depends on optional server `dbt-mcp`
```

**Issue Types:**

|Type |Severity|Example |
|-------------------|--------|------------------------------------------------------------------------|
|Missing tool |Error |Agent references `pandas-mcp.transform` but tool is `pandas-mcp.reshape`|
|Interface mismatch |Error |viz-platform expects `chart_data` but data-platform produces `data_ref` |
|Optional dependency|Warning |Agent uses dbt-mcp, which may not be loaded |
|Undeclared output |Warning |Plugin produces output not listed in README |

-----

### Integration with doc-guardian

**Separation of concerns:**

|Plugin |Scope |Trigger |
|------------------|----------------------------------|------------------------|
|doc-guardian |Code ↔ docs drift within a project|PostToolUse (Write/Edit)|
|contract-validator|Plugin ↔ plugin compatibility |On-demand or CI hook |

contract-validator does NOT watch for file changes. It runs on demand or as a CI step.

**Potential future integration:** doc-guardian could trigger contract-validator when Claude.md or a plugin README changes. Not required for v1.

-----

## Diagramming Approach

No diagram-mcp server. Use the existing Mermaid Chart MCP.

**For ERDs:**

- postgres-mcp exposes schema metadata via `get_schema_snapshot`
- Claude generates Mermaid syntax
- Mermaid Chart MCP renders

**For dbt lineage:**

- `dbt-mcp.get_lineage` outputs a Mermaid-formatted DAG
- Mermaid Chart MCP renders

This avoids the complexity of draw.io XML generation while maintaining documentation capability.
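This proposal does not specify the output format of `get_schema_snapshot`. The sketch below assumes a hypothetical shape: `{"tables": {name: {"columns": [(col, type)], "primary_key": [...], "foreign_keys": [(col, referenced_table)]}}}`. It shows that the “generate Mermaid syntax” step can also be done deterministically, producing `erDiagram` text. `snapshot_to_erd` is an illustrative name. Column types are emitted as-is, so unusual Postgres type spellings may need normalising before Mermaid accepts them.

```python
def snapshot_to_erd(snapshot: dict) -> str:
    """Translate a (hypothetical) schema snapshot into Mermaid erDiagram text."""
    lines = ["erDiagram"]
    for table, info in snapshot["tables"].items():
        pk_cols = set(info.get("primary_key", []))
        fk_cols = {col for col, _ in info.get("foreign_keys", [])}
        lines.append(f"    {table} {{")
        for col, col_type in info["columns"]:
            # Mermaid attribute syntax: <type> <name> [PK, FK]
            keys = ", ".join(k for k, on in (("PK", col in pk_cols), ("FK", col in fk_cols)) if on)
            lines.append(f"        {col_type} {col}" + (f" {keys}" if keys else ""))
        lines.append("    }")
    for table, info in snapshot["tables"].items():
        for col, ref_table in info.get("foreign_keys", []):
            # One referenced row, zero or more referencing rows; label is the FK column.
            lines.append(f'    {ref_table} ||--o{{ {table} : "{col}"')
    return "\n".join(lines)


if __name__ == "__main__":
    demo = {"tables": {
        "customers": {"columns": [("id", "int"), ("name", "text")], "primary_key": ["id"]},
        "orders": {"columns": [("id", "int"), ("customer_id", "int")],
                   "primary_key": ["id"], "foreign_keys": [("customer_id", "customers")]},
    }}
    print(snapshot_to_erd(demo))
```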

-----

## Implementation Order

|Phase|Plugin |Server |Rationale |
|-----|------------------|------------|-----------------------------------------------|
|1 |data-platform |pandas-mcp |Entry point, no dependencies |
|2 |data-platform |postgres-mcp|Loads Phase 1 output; adds query capabilities |
|3 |data-platform |dbt-mcp |Transform layer, requires postgres-mcp |
|4 |viz-platform |dmc-mcp |Constraint layer, no dependencies |
|5 |viz-platform |dash-mcp |Visualization, validates against dmc-mcp |
|6 |contract-validator|— |Validates all above, requires stable interfaces|

**Notes:**

- Phases 1-3 (data-platform) and 4-5 (viz-platform) can proceed in parallel
- contract-validator (Phase 6) should wait until plugin interfaces stabilize
- doc-guardian already exists; update scope documentation only

-----

## Open Questions

### Data Reference Passing

How do servers share `data_ref` objects? Options:

- **Temporary files with URIs**: Portable, but adds I/O overhead
- **Arrow IPC**: Efficient, but requires both servers to support it

**Recommendation:** Arrow IPC for efficiency, with a file fallback for compatibility.

### Authentication

Should postgres-mcp handle connection strings directly, or use a secrets manager pattern?

### Theme Storage

Where do custom themes persist?

- Local config file (`~/.dash-mcp/themes/`)
- Project-level (alongside dbt_project.yml)
- Database table (for shared team themes)

### dbt Project Discovery

Auto-detect `dbt_project.yml` in common locations, or require an explicit path?

-----

## Technology Stack

|Layer |Technology |Notes |
|---------------|-----------------------|---------------------------|
|MCP Framework |FastMCP |Or manual MCP SDK |
|Python |3.11+ |Type hints, async support |
|Data Processing|pandas |Core DataFrame ops |
|Arrow |pyarrow |Parquet, efficient memory |
|Database |psycopg |Async-ready Postgres driver|
|Geospatial |geoalchemy2 |PostGIS integration |
|dbt |dbt-core |CLI wrapper |
|Visualization |plotly |Figure generation |
|UI Components |dash-mantine-components|Version-locked via dmc-mcp |

-----

## Summary

### Core Plugins

|Plugin |Servers/Scope |Key Characteristic |
|------------------|---------------------------------------|------------------------------------------------------|
|data-platform |pandas-mcp, postgres-mcp, dbt-mcp |Optional server loading per project |
|viz-platform |dmc-mcp, dash-mcp |dmc-mcp validates before dash-mcp renders |
|contract-validator|Interface parsing, compatibility checks|Validates cross-plugin contracts and agent definitions|

### Supporting Plugins (Existing)

|Plugin |Purpose |
|-----------------|------------------------------------|
|doc-guardian |Code-to-docs drift (unchanged scope)|
|Mermaid Chart MCP|Diagram rendering |

### Interaction Model

```
Plugin READMEs     → declare inputs/outputs
Claude.md          → define cross-plugin agents
contract-validator → validate compatibility
doc-guardian       → catch drift within projects
```

**Flow:** Plugins declare interfaces. Claude.md defines workflows. contract-validator enforces compatibility. doc-guardian handles internal drift.