Add Change V04.0.0: Proposal

2026-01-26 14:37:30 +00:00
parent 0b538aebbd
commit 07b23cde10
1 changed files with 638 additions and 0 deletions
--- a/Change-V04.0.0%3A-Proposal.md
+++ b/Change-V04.0.0%3A-Proposal.md
@@ -0,0 +1,638 @@
+# MCP Data Platform — Architecture Reference
+
+*Plugin taxonomy, server responsibilities, and interaction patterns for Leo's data marketplace*
+
+---
+
+## Overview
+
+Two core plugins serve distinct domains and can be used independently or together. A third plugin, contract-validator, checks that they remain compatible.
+
+| Plugin | Servers | Domain |
+|--------|---------|--------|
+| **data-platform** | pandas-mcp, postgres-mcp, dbt-mcp | Ingestion, storage, transformation |
+| **viz-platform** | dmc-mcp, dash-mcp | Component validation, dashboards, theming |
+
+**Key principles:**
+- MCP servers are independent processes—they don't import each other
+- Claude orchestrates cross-server data flow at runtime
+- Plugins ship multiple servers; projects load only what they need
+- Claude.md defines project-specific workflows spanning plugins
+
+---
+
+## Component Definitions
+
+| Component Type | Definition | Runtime Context |
+|----------------|------------|-----------------|
+| **MCP Server** | Standalone service exposing tools via Model Context Protocol. One server = one domain responsibility. | Long-running process, spawned by Claude Desktop/Code |
+| **Tool** | Single callable function within an MCP server. Atomic operation with defined input schema and output. | Invoked per-request by LLM |
+| **Resource** | Read-only data exposed by MCP server (files, schemas, configs). Discoverable but not executable. | Static or cached |
+| **Agent** | Orchestration layer that chains multiple tool calls across servers. Lives in Claude's reasoning, not in MCP servers. | LLM-driven, multi-step |
+| **Command** | User-facing shortcut (e.g., `/ingest`) that triggers predefined tool sequences. | Chat interface trigger |
+
+---
+
+## Plugin: data-platform
+
+### Server Loading
+
+The plugin ships all three servers. Project config, not environment variables, determines which servers load.
+
+| Server | Default | Optional |
+|--------|---------|----------|
+| pandas-mcp | ✓ | — |
+| postgres-mcp | ✓ | — |
+| dbt-mcp | — | ✓ |
+
+**Example project configs:**
+
+```yaml
+# Web app project (no dbt)
+mcp_servers:
+  - pandas-mcp
+  - postgres-mcp
+```
+
+```yaml
+# Data engineering project (full stack)
+mcp_servers:
+  - pandas-mcp
+  - postgres-mcp
+  - dbt-mcp
+```
+
+Agents check server availability at runtime. If dbt-mcp isn't loaded, dbt-related steps are either skipped or surface a "not available for this project" message.
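+
+The availability check can be sketched as a small planning step that runs before any tool call. This is a minimal illustration, not part of the proposal: the `Step` type and `plan_steps` helper are hypothetical names, and the loaded-server set stands in for whatever the project config resolves to.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Step:
+    server: str  # MCP server that owns the tool
+    tool: str    # tool name on that server
+
+
+def plan_steps(sequence, loaded_servers):
+    """Split an agent sequence into runnable steps and skipped notices."""
+    runnable, skipped = [], []
+    for step in sequence:
+        if step.server in loaded_servers:
+            runnable.append(step)
+        else:
+            skipped.append(f"{step.server}.{step.tool}: not available for this project")
+    return runnable, skipped
+
+
+# Hypothetical project without dbt-mcp (mirrors the web app config above).
+loaded = {"pandas-mcp", "postgres-mcp"}
+sequence = [
+    Step("pandas-mcp", "read_file"),
+    Step("postgres-mcp", "load_dataframe"),
+    Step("dbt-mcp", "run_model"),
+]
+runnable, skipped = plan_steps(sequence, loaded)
+```
+
+Planning up front keeps the skip message in one place instead of scattering availability checks through each agent.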
+
+---
+
+### Server: pandas-mcp (Data Shaping Layer)
+
+**Responsibility:** File ingestion, data profiling, schema inference, and utility shaping operations.
+
+**Philosophy:** SQL-first for persistent transforms (use dbt). Pandas for:
+- Pre-database ingestion (profiling, validation, schema inference)
+- Visualization prep (reshaping query results for chart formats)
+- Ad-hoc operations (prototyping, merging with local files)
+
+#### Tool Categories
+
+| Category | Tools | Description |
+|----------|-------|-------------|
+| Ingestion | `read_file`, `write_file`, `detect_encoding` | File I/O with format auto-detection |
+| Profiling | `profile`, `validate`, `sample` | Data quality assessment |
+| Schema | `infer_schema` | Generate DDL from data structure |
+| Shaping | `reshape`, `pivot`, `melt`, `merge`, `add_columns`, `filter_rows` | Transform any data reference |
+
+#### Data Reference Sources
+
+pandas-mcp accepts `data_ref` from multiple origins:
+
+| Source | How It Arrives |
+|--------|----------------|
+| Local file | `read_file` tool |
+| Query result | Passed from postgres-mcp |
+| dbt model output | Passed from dbt-mcp |
+| Previous transform | Chained from shaping tool |
+
+#### When to Use Shaping Tools
+
+| Scenario | Use pandas-mcp | Use SQL/dbt |
+|----------|----------------|-------------|
+| Pivot for heatmap chart | ✓ | — |
+| Join query result with local CSV | ✓ | — |
+| Prototype transform before formalizing | ✓ | — |
+| Persistent aggregation in pipeline | — | ✓ |
+| Reusable business logic | — | ✓ |
+| Needs version control + testing | — | ✓ |
+
+---
+
+### Server: postgres-mcp (Database Layer)
+
+**Responsibility:** Data loading, querying, schema management, performance analysis, and geospatial operations.
+
+#### Tool Categories
+
+| Category | Tools | Description |
+|----------|-------|-------------|
+| Query | `list_schemas`, `list_tables`, `get_table_schema`, `execute_query`, `query_geometry` | Read operations |
+| Analysis | `explain_query`, `recommend_indexes`, `health_check` | Performance insights |
+| Write | `execute_write`, `load_dataframe` | Data modification |
+| DDL | `execute_ddl`, `get_schema_snapshot` | Schema management with change tracking |
+
+#### DDL Change Tracking
+
+`execute_ddl` returns structured output for downstream automation:
+
+```json
+{
+  "success": true,
+  "operation": "CREATE TABLE",
+  "affected_objects": [
+    {
+      "type": "table",
+      "schema": "public",
+      "name": "customer_orders",
+      "change": "created"
+    }
+  ],
+  "timestamp": "2025-01-22T14:30:00Z"
+}
+```
+
+This enables documentation updates, ERD regeneration (via Mermaid Chart MCP), or other automated responses.
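+
+As a rough sketch of such a downstream consumer, the snippet below reads the payload shown above and lists the tables whose ERD would need regenerating. The helper name `tables_needing_erd_refresh` is hypothetical, and the payload is the sample from this section rather than real server output.
+
+```python
+import json
+
+# Sample payload copied from the execute_ddl example above.
+ddl_result = json.loads("""
+{
+  "success": true,
+  "operation": "CREATE TABLE",
+  "affected_objects": [
+    {"type": "table", "schema": "public", "name": "customer_orders", "change": "created"}
+  ],
+  "timestamp": "2025-01-22T14:30:00Z"
+}
+""")
+
+
+def tables_needing_erd_refresh(result):
+    """Return schema-qualified names of tables touched by a successful DDL call."""
+    if not result.get("success"):
+        return []
+    return [
+        f"{obj['schema']}.{obj['name']}"
+        for obj in result.get("affected_objects", [])
+        if obj.get("type") == "table"
+    ]
+
+
+changed = tables_needing_erd_refresh(ddl_result)
+```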
+
+---
+
+### Server: dbt-mcp (Transform Layer)
+
+**Responsibility:** Model execution, lineage, documentation, and YAML generation for local dbt-core projects.
+
+**Note:** Official dbt-mcp is Cloud-only. This server wraps local dbt-core CLI.
+
+#### Tool Categories
+
+| Category | Tools | Description |
+|----------|-------|-------------|
+| Discovery | `parse_manifest`, `list_models`, `list_sources` | Project exploration |
+| Model | `get_model`, `get_lineage`, `compile_sql` | Model inspection |
+| Execution | `run_model`, `test_model`, `get_run_results` | dbt CLI wrapper |
+| Documentation | `generate_yaml` | Auto-generate schema.yml |
+
+#### Lineage Output
+
+`get_lineage` outputs a Mermaid-formatted DAG, compatible with the existing Mermaid Chart MCP for rendering.
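+
+One way the Mermaid output could be produced is shown below. This is a sketch under assumptions: the input shape (a mapping from model to upstream models) and the model names are invented for illustration, and the real tool would presumably derive them from the dbt manifest.
+
+```python
+def lineage_to_mermaid(parents_by_model):
+    """Render {model: [upstream models]} as a left-to-right Mermaid flowchart."""
+    lines = ["graph LR"]
+    for model in sorted(parents_by_model):
+        for parent in sorted(parents_by_model[model]):
+            lines.append(f"    {parent} --> {model}")
+    return "\n".join(lines)
+
+
+# Hypothetical model names, for illustration only.
+lineage = {
+    "fct_orders": ["stg_orders", "stg_customers"],
+    "stg_orders": [],
+    "stg_customers": [],
+}
+diagram = lineage_to_mermaid(lineage)
+```
+
+Sorting both models and parents keeps the output stable between runs, which makes diffs of generated diagrams easier to review.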
+
+---
+
+### Internal Dependency Flow (data-platform)
+
+```
+files → pandas-mcp → postgres-mcp ↔ dbt-mcp
+              ↑______________|
+           (query results for reshaping)
+```
+
+| Flow | Description |
+|------|-------------|
+| files → pandas | Entry point for raw data |
+| pandas → postgres | Schema inference, bulk loading |
+| postgres ↔ dbt | dbt compiles and issues model SQL; postgres executes it and stores the results |
+| postgres → pandas | Query results for reshaping |
+| dbt → pandas | Model outputs for visualization prep |
+
+---
+
+### Agents (data-platform)
+
+| Agent | Trigger | Sequence |
+|-------|---------|----------|
+| `data_ingestion` | User provides file | read_file → profile → infer_schema → execute_ddl → load_dataframe → validate |
+| `model_analysis` | User asks about dbt model | get_model → get_lineage → explain_query → test_model → synthesize |
+| `full_pipeline` | File to materialized model | data_ingestion → create dbt model → run_model |
+
+**Behavior when dbt-mcp absent:**
+
+| Agent | Behavior |
+|-------|----------|
+| `data_ingestion` | Runs fully (no dbt steps) |
+| `model_analysis` | Skipped—surfaces "dbt not configured" |
+| `full_pipeline` | Stops after load, prompts user |
+
+---
+
+### Commands (data-platform)
+
+| Command | Maps To |
+|---------|---------|
+| `/ingest {file}` | `data_ingestion` agent |
+| `/profile {file}` | `pandas-mcp.profile` |
+| `/pivot {data} by {cols}` | `pandas-mcp.pivot` |
+| `/merge {left} {right} on {key}` | `pandas-mcp.merge` |
+| `/explain {query}` | `postgres-mcp.explain_query` |
+| `/schema {table}` | `postgres-mcp.get_table_schema` |
+| `/lineage {model}` | `dbt-mcp.get_lineage` |
+| `/run {model}` | `dbt-mcp.run_model` |
+| `/test {model}` | `dbt-mcp.test_model` |
+
+dbt commands return a graceful "dbt-mcp not loaded" message when the server is unavailable.
+
+---
+
+## Plugin: viz-platform
+
+### Servers
+
+| Server | Responsibility |
+|--------|----------------|
+| dmc-mcp | Version-locked component registry, prop validation |
+| dash-mcp | Charts, layouts, pages, theming—validates against dmc-mcp |
+
+---
+
+### Server: dmc-mcp (Component Constraint Layer)
+
+**Responsibility:** Single source of truth for Dash Mantine Components API. Prevents Claude from hallucinating deprecated props or non-existent components.
+
+**Problem solved:** DMC versions introduce breaking changes. Claude's training data mixes versions. Runtime errors from invalid props waste cycles.
+
+#### Tool Categories
+
+| Category | Tools | Description |
+|----------|-------|-------------|
+| Discovery | `list_components` | What exists in installed version |
+| Introspection | `get_component_props` | Valid props, types, defaults |
+| Validation | `validate_component` | Check component definition before use |
+
+#### Usage Pattern
+
+Claude queries dmc-mcp first:
+1. "What props does `dmc.Select` accept?" → `get_component_props`
+2. Build component with valid props
+3. Pass to dash-mcp for rendering
+
+dash-mcp validates against dmc-mcp before rendering. Invalid components fail fast with actionable errors.
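+
+The fail-fast check can be sketched as a set difference against a registry of known props. Everything here is illustrative: the `REGISTRY` contents are a hand-written, incomplete stand-in, and a real registry would be generated from the installed dash-mantine-components version.
+
+```python
+# Hypothetical registry snapshot; deliberately incomplete and hand-written.
+REGISTRY = {
+    "Select": {"data", "value", "label", "placeholder"},
+}
+
+
+def validate_component(name, props, registry=REGISTRY):
+    """Return a list of error strings; an empty list means the definition is valid."""
+    if name not in registry:
+        return [f"unknown component: {name}"]
+    unknown = sorted(set(props) - registry[name])
+    return [f"{name}: unknown prop '{p}'" for p in unknown]
+
+
+ok = validate_component("Select", {"data": [], "label": "Region"})
+bad = validate_component("Select", {"data": [], "options": []})
+```
+
+Returning a list of messages rather than raising on the first problem lets the caller report every invalid prop in one pass.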
+
+---
+
+### Server: dash-mcp (Visualization Layer)
+
+**Responsibility:** Chart generation, dashboard layouts, page structure, theming system, and export.
+
+**Philosophy:** Single server, multiple concerns. Tools are namespaced but share context (theme tokens flow to charts automatically).
+
+#### Tool Categories
+
+| Category | Tools | Description |
+|----------|-------|-------------|
+| `chart_*` | `chart_create`, `chart_configure_interaction` | Data visualization (Plotly) |
+| `layout_*` | `layout_create`, `layout_add_filter`, `layout_set_grid` | Dashboard composition |
+| `page_*` | `page_create`, `page_add_navbar`, `page_set_auth` | App-level structure |
+| `theme_*` | `theme_create`, `theme_extend`, `theme_validate`, `theme_export_css` | Design tokens, component styles |
+
+#### Design Token Structure
+
+Themes are built from design tokens—single source of truth for visual consistency:
+
+```yaml
+tokens:
+  colors:
+    primary: "#228be6"
+    secondary: "#868e96"
+    background:
+      base: "#ffffff"
+      subtle: "#f8f9fa"
+    text:
+      primary: "#212529"
+      muted: "#868e96"
+  
+  spacing:
+    xs: "4px"
+    sm: "8px"
+    md: "16px"
+    lg: "24px"
+  
+  typography:
+    fontFamily: "Inter, sans-serif"
+    fontSize:
+      sm: "14px"
+      md: "16px"
+  
+  radii:
+    sm: "4px"
+    md: "8px"
+```
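+
+To illustrate how nested tokens might become CSS (the job `theme_export_css` is described as doing), the sketch below flattens a token tree into CSS custom properties. The variable naming scheme (`--group-key`) is an assumption, not something the proposal specifies.
+
+```python
+def flatten_tokens(tokens, prefix=""):
+    """Yield (css_variable_name, value) pairs from a nested token dict."""
+    for key, value in tokens.items():
+        name = f"{prefix}-{key}" if prefix else key
+        if isinstance(value, dict):
+            yield from flatten_tokens(value, name)
+        else:
+            yield f"--{name}", value
+
+
+def tokens_to_css(tokens):
+    """Render tokens as a :root block of CSS custom properties."""
+    body = "\n".join(f"  {name}: {value};" for name, value in flatten_tokens(tokens))
+    return ":root {\n" + body + "\n}"
+
+
+# A subset of the token structure above.
+tokens = {
+    "colors": {"primary": "#228be6", "background": {"base": "#ffffff"}},
+    "spacing": {"md": "16px"},
+}
+css = tokens_to_css(tokens)
+```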
+
+#### Component Style Registry
+
+Per-component overrides ensuring consistency:
+
+| Component | Registered Style | Purpose |
+|-----------|------------------|---------|
+| `kpi_card` | Shadow, padding, border-radius | All KPIs look identical |
+| `data_table` | Header bg, row hover, border | Tables share appearance |
+| `filter_panel` | Background, spacing, alignment | Filters positioned consistently |
+| `chart_card` | Title typography, padding | Chart containers unified |
+
+---
+
+### Internal Dependency Flow (viz-platform)
+
+```
+dmc-mcp ← dash-mcp
+   ↑          |
+   └──────────┘
+   (validation before render)
+```
+
+dash-mcp always validates component definitions against dmc-mcp. No direct data dependency—data comes from external sources.
+
+---
+
+### Agents (viz-platform)
+
+| Agent | Trigger | Sequence |
+|-------|---------|----------|
+| `theme_setup` | New project or brand consistency | list_themes → theme_create → register_component_style → theme_validate |
+| `layout_builder` | User wants dashboard structure | layout_create → layout_add_filter → theme_apply → preview |
+| `component_check` | Before rendering any DMC component | get_component_props → validate_component → proceed or error |
+
+---
+
+### Commands (viz-platform)
+
+| Command | Maps To |
+|---------|---------|
+| `/chart {type}` | `dash-mcp.chart_create` (expects data input) |
+| `/dashboard {template}` | `layout_builder` agent |
+| `/theme {name}` | `dash-mcp.theme_apply` |
+| `/theme new {name}` | `dash-mcp.theme_create` |
+| `/theme css {name}` | `dash-mcp.theme_export_css` |
+| `/component {name}` | `dmc-mcp.get_component_props` |
+
+---
+
+## Cross-Plugin Interactions
+
+### How It Works
+
+MCP servers don't call each other. Claude orchestrates:
+
+1. Server A returns output to Claude
+2. Claude interprets and determines next step
+3. Claude passes relevant data to Server B
+
+### Documentation Layers
+
+| Layer | Location | Purpose |
+|-------|----------|---------|
+| Plugin docs | Each plugin's README.md | Declares inputs/outputs |
+| Claude.md | Project root | Cross-plugin agents for this project |
+| contract-validator | Separate plugin | Validates compatibility |
+| doc-guardian | Separate plugin | Catches drift within each project |
+
+### Interface Contracts
+
+Each plugin declares what it produces and accepts:
+
+**data-platform outputs:**
+- `data_ref`: In-memory DataFrame reference
+- `query_result`: Row set from postgres-mcp
+- `model_output`: Materialized table reference from dbt-mcp
+- `schema_snapshot`: Full schema state for documentation
+
+**viz-platform inputs:**
+- Accepts `data_ref`, `query_result`, or `model_output` as data source
+- Validates all DMC components against dmc-mcp before rendering
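+
+The declared contracts above reduce to a set comparison. The sketch below uses the names from those lists; the `compare_interfaces` helper and its result keys are hypothetical.
+
+```python
+# Declared interfaces, copied from the lists above.
+DATA_PLATFORM_OUTPUTS = {"data_ref", "query_result", "model_output", "schema_snapshot"}
+VIZ_PLATFORM_INPUTS = {"data_ref", "query_result", "model_output"}
+
+
+def compare_interfaces(produced, accepted):
+    """Return what flows between two plugins and what is left over on each side."""
+    return {
+        "accepted": sorted(produced & accepted),
+        "unconsumed": sorted(produced - accepted),
+        "unsatisfied": sorted(accepted - produced),
+    }
+
+
+report = compare_interfaces(DATA_PLATFORM_OUTPUTS, VIZ_PLATFORM_INPUTS)
+```
+
+With these declarations, `schema_snapshot` is produced but not consumed by viz-platform, which is expected since it feeds documentation rather than charts.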
+
+### Cross-Plugin Agents (defined in Claude.md)
+
+| Agent | Trigger | Sequence |
+|-------|---------|----------|
+| `dashboard_builder` | User requests visualization of database content | postgres-mcp.execute_query → pandas-mcp.pivot (if needed) → dmc-mcp.validate_component → dash-mcp.chart_create → dash-mcp.layout_create |
+| `visualization_prep` | Query result needs reshaping | postgres-mcp.execute_query → pandas-mcp.reshape → dash-mcp.chart_create |
+
+### Validation: contract-validator
+
+Separate plugin for cross-plugin validation. See **Plugin: contract-validator** section for full specification.
+
+**Key distinction from doc-guardian:**
+- doc-guardian: "did code change break docs?" (within a project)
+- contract-validator: "do plugins work together?" (across plugins)
+
+---
+
+## Plugin: contract-validator
+
+### Purpose
+
+Validates cross-plugin compatibility and Claude.md agent definitions. Ensures plugins can actually work together before runtime failures occur.
+
+**Problem solved:** Plugins declare interfaces in README. Claude.md references tools across plugins. Without validation:
+- Agents reference tools that don't exist
+- viz-platform expects an input format that data-platform doesn't produce
+- Plugin updates break workflows silently
+
+---
+
+### What It Reads
+
+| Source | Purpose |
+|--------|---------|
+| Plugin README.md | Extract declared inputs/outputs |
+| Claude.md | Extract agent definitions and tool references |
+| MCP server schemas | Verify tools actually exist with expected signatures |
+
+---
+
+### Tool Categories
+
+| Category | Tools | Description |
+|----------|-------|-------------|
+| Parse | `parse_plugin_interface`, `parse_claude_md_agents` | Extract structured data from docs |
+| Validate | `validate_compatibility`, `validate_agent_refs`, `validate_data_flow` | Check contracts match |
+| Report | `generate_compatibility_report`, `list_issues` | Output findings |
+
+#### Tool Details
+
+**`parse_plugin_interface`**
+- Input: Plugin path or README content
+- Output: Structured interface (inputs accepted, outputs produced, tool names)
+
+**`parse_claude_md_agents`**
+- Input: Claude.md path or content
+- Output: List of agents with their tool sequences
+
+**`validate_compatibility`**
+- Input: Two plugin interfaces
+- Output: Compatibility report (what A produces that B accepts, gaps)
+
+**`validate_agent_refs`**
+- Input: Agent definition, list of available plugins
+- Output: Missing tools, invalid sequences
+
+**`validate_data_flow`**
+- Input: Agent sequence
+- Output: Verification that each step's output matches next step's expected input
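+
+A minimal sketch of the reference check is shown below, assuming agent steps are written as `server.tool` strings. The tool table is a subset of the tables in this document, and the message wording is invented for illustration.
+
+```python
+# Tool names taken from the tool tables in this document (subset).
+AVAILABLE_TOOLS = {
+    "postgres-mcp": {"execute_query", "explain_query"},
+    "pandas-mcp": {"pivot", "reshape", "merge"},
+    "dbt-mcp": {"get_model", "get_lineage", "test_model"},
+}
+OPTIONAL_SERVERS = {"dbt-mcp"}
+
+
+def validate_agent_refs(sequence, tools=AVAILABLE_TOOLS, optional=OPTIONAL_SERVERS):
+    """Return (errors, warnings) for a list of 'server.tool' references."""
+    errors, warnings = [], []
+    for ref in sequence:
+        server, _, tool = ref.partition(".")
+        if tool not in tools.get(server, set()):
+            errors.append(f"missing tool: {ref}")
+        elif server in optional:
+            message = f"optional dependency: {server}"
+            if message not in warnings:
+                warnings.append(message)
+    return errors, warnings
+
+
+errors, warnings = validate_agent_refs(
+    ["postgres-mcp.execute_query", "pandas-mcp.transform", "dbt-mcp.get_lineage"]
+)
+```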
+
+---
+
+### Agents (contract-validator)
+
+| Agent | Trigger | Sequence |
+|-------|---------|----------|
+| `full_validation` | User runs `/validate-contracts` | parse all plugin interfaces → parse Claude.md → validate_compatibility for each pair → validate_agent_refs for each agent → generate_compatibility_report |
+| `agent_check` | User runs `/check-agent {name}` | parse_claude_md_agents → find agent → validate_agent_refs → validate_data_flow → report issues |
+
+---
+
+### Commands
+
+| Command | Maps To | Description |
+|---------|---------|-------------|
+| `/validate-contracts` | `full_validation` agent | Full project validation |
+| `/check-agent {name}` | `agent_check` agent | Validate single agent definition |
+| `/list-interfaces` | `parse_plugin_interface` for all plugins | Show what each plugin produces/accepts |
+
+---
+
+### Output Format
+
+**Compatibility Report:**
+
+```
+## Contract Validation Report
+
+### Plugin Interfaces
+- data-platform: produces [data_ref, query_result, model_output, schema_snapshot]
+- viz-platform: accepts [data_ref, query_result, model_output]
+
+### Compatibility Matrix
+| Producer | Consumer | Status | Notes |
+|----------|----------|--------|-------|
+| data-platform | viz-platform | ✓ Compatible | `schema_snapshot` is produced but not consumed |
+
+### Agent Validation
+| Agent | Status | Issues |
+|-------|--------|--------|
+| dashboard_builder | ✓ Valid | — |
+| model_analysis | ⚠ Warning | dbt-mcp optional; agent is skipped if not loaded |
+
+### Issues Found
+- None
+
+### Warnings
+- Agent `model_analysis` depends on optional server `dbt-mcp`
+```
+
+**Issue Types:**
+
+| Type | Severity | Example |
+|------|----------|---------|
+| Missing tool | Error | Agent references `pandas-mcp.transform` but tool is `pandas-mcp.reshape` |
+| Interface mismatch | Error | viz-platform expects `chart_data` but data-platform produces `data_ref` |
+| Optional dependency | Warning | Agent uses dbt-mcp which may not be loaded |
+| Undeclared output | Warning | Plugin produces output not listed in README |
+
+---
+
+### Integration with doc-guardian
+
+**Separation of concerns:**
+
+| Plugin | Scope | Trigger |
+|--------|-------|---------|
+| doc-guardian | Code ↔ docs drift within a project | PostToolUse (Write/Edit) |
+| contract-validator | Plugin ↔ plugin compatibility | On-demand or CI hook |
+
+contract-validator does NOT watch for file changes. It runs on demand or as a CI step.
+
+**Potential future integration:** doc-guardian could trigger contract-validator when Claude.md or plugin README changes. Not required for v1.
+
+---
+
+## Diagramming Approach
+
+No diagram-mcp server. Use existing Mermaid Chart MCP.
+
+**For ERDs:**
+- postgres-mcp exposes schema metadata via `get_schema_snapshot`
+- Claude generates Mermaid syntax
+- Mermaid Chart MCP renders
+
+**For dbt lineage:**
+- dbt-mcp.get_lineage outputs Mermaid-formatted DAG
+- Mermaid Chart MCP renders
+
+This avoids the complexity of draw.io XML generation while maintaining documentation capability.
+
+---
+
+## Implementation Order
+
+| Phase | Plugin | Server | Rationale |
+|-------|--------|--------|-----------|
+| 1 | data-platform | pandas-mcp | Entry point, no dependencies |
+| 2 | data-platform | postgres-mcp | Load from Phase 1, query capabilities |
+| 3 | data-platform | dbt-mcp | Transform layer, requires postgres-mcp |
+| 4 | viz-platform | dmc-mcp | Constraint layer, no dependencies |
+| 5 | viz-platform | dash-mcp | Visualization, validates against dmc-mcp |
+| 6 | contract-validator | — | Validates all above, requires stable interfaces |
+
+**Notes:**
+- Phases 1-3 (data-platform) and 4-5 (viz-platform) can proceed in parallel
+- contract-validator (Phase 6) should wait until plugin interfaces stabilize
+- doc-guardian already exists; update scope documentation only
+
+---
+
+## Open Questions
+
+### Data Reference Passing
+
+How do servers share `data_ref` objects? Options:
+- **Temporary files with URIs**: Portable, but adds I/O overhead
+- **Arrow IPC**: Efficient, but requires both servers to support it
+
+**Recommendation:** Arrow IPC for efficiency, with temporary files as a fallback for compatibility.
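+
+The temporary-file option can be sketched with the standard library alone. This is an illustration under assumptions: CSV is used only to keep the example dependency-free (a real implementation might prefer Parquet or Arrow files), and `write_data_ref` / `read_data_ref` are hypothetical helper names.
+
+```python
+import csv
+import tempfile
+from pathlib import Path
+from urllib.parse import urlparse
+from urllib.request import url2pathname
+
+
+def write_data_ref(rows, fieldnames):
+    """Write rows to a temporary CSV file and return a file:// URI for it."""
+    # delete=False leaves cleanup to the caller, since another process reads the file.
+    with tempfile.NamedTemporaryFile(
+        "w", suffix=".csv", newline="", delete=False, encoding="utf-8"
+    ) as handle:
+        writer = csv.DictWriter(handle, fieldnames=fieldnames)
+        writer.writeheader()
+        writer.writerows(rows)
+    return Path(handle.name).resolve().as_uri()
+
+
+def read_data_ref(uri):
+    """Resolve a file:// URI back to a local path and load the rows."""
+    path = Path(url2pathname(urlparse(uri).path))
+    with path.open(newline="", encoding="utf-8") as handle:
+        return list(csv.DictReader(handle))
+
+
+rows = [{"region": "north", "orders": "12"}, {"region": "south", "orders": "7"}]
+uri = write_data_ref(rows, ["region", "orders"])
+round_trip = read_data_ref(uri)
+```
+
+Only the URI string crosses the server boundary, so Claude never has to carry the row data itself between tool calls.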
+
+### Authentication
+
+Should postgres-mcp handle connection strings directly, or use a secrets manager pattern?
+
+### Theme Storage
+
+Where do custom themes persist?
+- Local config file (`~/.dash-mcp/themes/`)
+- Project-level (alongside dbt_project.yml)
+- Database table (for shared team themes)
+
+### dbt Project Discovery
+
+Auto-detect `dbt_project.yml` in common locations, or require explicit path?
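+
+Both options can coexist, as the sketch below suggests: an explicit path wins, and otherwise the search walks upward from a starting directory. Walking parent directories is an assumed reading of "common locations", and `find_dbt_project` is a hypothetical helper name.
+
+```python
+from pathlib import Path
+
+
+def find_dbt_project(start, explicit=None):
+    """Return the directory containing dbt_project.yml, or None if not found.
+
+    An explicit path is checked on its own; otherwise walk upward from `start`.
+    """
+    if explicit is not None:
+        candidate = Path(explicit)
+        return candidate if (candidate / "dbt_project.yml").is_file() else None
+    start = Path(start).resolve()
+    for directory in (start, *start.parents):
+        if (directory / "dbt_project.yml").is_file():
+            return directory
+    return None
+```
+
+Returning `None` instead of raising lets the caller decide whether a missing project is an error or simply means dbt-mcp stays unloaded.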
+
+---
+
+## Technology Stack
+
+| Layer | Technology | Notes |
+|-------|------------|-------|
+| MCP Framework | FastMCP | Or manual MCP SDK |
+| Python | 3.11+ | Type hints, async support |
+| Data Processing | pandas | Core DataFrame ops |
+| Arrow | pyarrow | Parquet, efficient memory |
+| Database | psycopg | Async-ready Postgres driver |
+| Geospatial | geoalchemy2 | PostGIS integration |
+| dbt | dbt-core | CLI wrapper |
+| Visualization | plotly | Figure generation |
+| UI Components | dash-mantine-components | Version-locked via dmc-mcp |
+
+---
+
+## Summary
+
+### Core Plugins
+
+| Plugin | Servers/Scope | Key Characteristic |
+|--------|---------------|-------------------|
+| data-platform | pandas-mcp, postgres-mcp, dbt-mcp | Optional server loading per project |
+| viz-platform | dmc-mcp, dash-mcp | dmc-mcp validates before dash-mcp renders |
+| contract-validator | Interface parsing, compatibility checks | Validates cross-plugin contracts and agent definitions |
+
+### Supporting Plugins (Existing)
+
+| Plugin | Purpose |
+|--------|---------|
+| doc-guardian | Code-to-docs drift (unchanged scope) |
+| Mermaid Chart MCP | Diagram rendering |
+
+### Interaction Model
+
+```
+Plugin READMEs     →  declare inputs/outputs
+Claude.md          →  define cross-plugin agents  
+contract-validator →  validate compatibility
+doc-guardian       →  catch drift within projects
+```
+
+**Flow:** Plugins declare interfaces. Claude.md defines workflows. contract-validator enforces compatibility. doc-guardian handles internal drift.