Add Change V04.0.0: Proposal

2026-01-26 14:37:30 +00:00
parent 0b538aebbd
commit 07b23cde10

@@ -0,0 +1,638 @@
# MCP Data Platform — Architecture Reference
*Plugin taxonomy, server responsibilities, and interaction patterns for Leo's data marketplace*
---
## Overview
Two plugins serving distinct domains, designed for independent or combined use.
| Plugin | Servers | Domain |
|--------|---------|--------|
| **data-platform** | pandas-mcp, postgres-mcp, dbt-mcp | Ingestion, storage, transformation |
| **viz-platform** | dmc-mcp, dash-mcp | Component validation, dashboards, theming |
**Key principles:**
- MCP servers are independent processes—they don't import each other
- Claude orchestrates cross-server data flow at runtime
- Plugins ship multiple servers; projects load only what they need
- Claude.md defines project-specific workflows spanning plugins
---
## Component Definitions
| Component Type | Definition | Runtime Context |
|----------------|------------|-----------------|
| **MCP Server** | Standalone service exposing tools via Model Context Protocol. One server = one domain responsibility. | Long-running process, spawned by Claude Desktop/Code |
| **Tool** | Single callable function within an MCP server. Atomic operation with defined input schema and output. | Invoked per-request by LLM |
| **Resource** | Read-only data exposed by MCP server (files, schemas, configs). Discoverable but not executable. | Static or cached |
| **Agent** | Orchestration layer that chains multiple tool calls across servers. Lives in Claude's reasoning, not in MCP servers. | LLM-driven, multi-step |
| **Command** | User-facing shortcut (e.g., `/ingest`) that triggers predefined tool sequences. | Chat interface trigger |
---
## Plugin: data-platform
### Server Loading
Single plugin ships all three servers. Which servers load is determined by project config—not environment variables.
| Server | Default | Optional |
|--------|---------|----------|
| pandas-mcp | ✓ | — |
| postgres-mcp | ✓ | — |
| dbt-mcp | — | ✓ |
**Example project configs:**
```yaml
# Web app project (no dbt)
mcp_servers:
- pandas-mcp
- postgres-mcp
```
```yaml
# Data engineering project (full stack)
mcp_servers:
- pandas-mcp
- postgres-mcp
- dbt-mcp
```
Agents check server availability at runtime. If dbt-mcp isn't loaded, dbt-related steps are skipped or surface "not available for this project."
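
The runtime availability check could be sketched roughly as follows. This is only an illustration: `Step`, `plan_steps`, and the in-memory set of loaded servers are hypothetical names, and the real check depends on what the MCP client exposes.

```python
from typing import NamedTuple


class Step(NamedTuple):
    server: str  # MCP server that owns the tool
    tool: str    # tool name on that server


def plan_steps(sequence: list[Step], loaded: set[str]) -> tuple[list[Step], list[str]]:
    """Split a sequence into runnable steps and notices for skipped ones."""
    runnable: list[Step] = []
    notices: list[str] = []
    for step in sequence:
        if step.server in loaded:
            runnable.append(step)
        else:
            notices.append(f"{step.server}.{step.tool}: not available for this project")
    return runnable, notices


# Web app project (no dbt): only two servers are loaded.
loaded_servers = {"pandas-mcp", "postgres-mcp"}
sequence = [
    Step("pandas-mcp", "read_file"),
    Step("postgres-mcp", "load_dataframe"),
    Step("dbt-mcp", "run_model"),
]
runnable, notices = plan_steps(sequence, loaded_servers)
```

Here the dbt step is dropped from the plan and reported as a notice instead of raising an error.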
---
### Server: pandas-mcp (Data Shaping Layer)
**Responsibility:** File ingestion, data profiling, schema inference, and utility shaping operations.
**Philosophy:** SQL-first for persistent transforms (use dbt). Pandas for:
- Pre-database ingestion (profiling, validation, schema inference)
- Visualization prep (reshaping query results for chart formats)
- Ad-hoc operations (prototyping, merging with local files)
#### Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| Ingestion | `read_file`, `write_file`, `detect_encoding` | File I/O with format auto-detection |
| Profiling | `profile`, `validate`, `sample` | Data quality assessment |
| Schema | `infer_schema` | Generate DDL from data structure |
| Shaping | `reshape`, `pivot`, `melt`, `merge`, `add_columns`, `filter_rows` | Transform any data reference |
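
To make the `infer_schema` idea concrete, here is a minimal standard-library sketch that derives a `CREATE TABLE` statement from plain Python rows. The function names and the type mapping are assumptions for illustration; the actual tool would presumably work from pandas dtypes.

```python
from datetime import date, datetime


def infer_sql_type(values: list[object]) -> str:
    """Map a column's Python values to a Postgres type, defaulting to TEXT."""
    present = [v for v in values if v is not None]
    if not present:
        return "TEXT"
    if all(isinstance(v, bool) for v in present):
        return "BOOLEAN"
    # bool is a subclass of int, so exclude it from the numeric checks.
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if len(numeric) == len(present):
        return "BIGINT" if all(isinstance(v, int) for v in numeric) else "DOUBLE PRECISION"
    if all(isinstance(v, date) for v in present):
        # datetime is a subclass of date; any datetime widens the column.
        return "TIMESTAMP" if any(isinstance(v, datetime) for v in present) else "DATE"
    return "TEXT"


def infer_ddl(table: str, rows: list[dict[str, object]]) -> str:
    """Build a CREATE TABLE statement from the keys and values of sample rows."""
    if not rows:
        raise ValueError("need at least one row to infer a schema")
    lines = []
    for col in rows[0]:
        sql_type = infer_sql_type([row.get(col) for row in rows])
        quoted = col.replace('"', '""')  # escape embedded double quotes
        lines.append(f'    "{quoted}" {sql_type}')
    return f'CREATE TABLE "{table}" (\n' + ",\n".join(lines) + "\n);"


rows = [
    {"order_id": 1, "amount": 19.5, "note": None},
    {"order_id": 2, "amount": 20, "note": "rush"},
]
ddl = infer_ddl("customer_orders", rows)
```

The resulting string could then be handed to `execute_ddl` on postgres-mcp.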
#### Data Reference Sources
pandas-mcp accepts `data_ref` from multiple origins:
| Source | How It Arrives |
|--------|----------------|
| Local file | `read_file` tool |
| Query result | Passed from postgres-mcp |
| dbt model output | Passed from dbt-mcp |
| Previous transform | Chained from shaping tool |
#### When to Use Shaping Tools
| Scenario | Use pandas-mcp | Use SQL/dbt |
|----------|----------------|-------------|
| Pivot for heatmap chart | ✓ | — |
| Join query result with local CSV | ✓ | — |
| Prototype transform before formalizing | ✓ | — |
| Persistent aggregation in pipeline | — | ✓ |
| Reusable business logic | — | ✓ |
| Needs version control + testing | — | ✓ |
---
### Server: postgres-mcp (Database Layer)
**Responsibility:** Data loading, querying, schema management, performance analysis, and geospatial operations.
#### Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| Query | `list_schemas`, `list_tables`, `get_table_schema`, `execute_query`, `query_geometry` | Read operations |
| Analysis | `explain_query`, `recommend_indexes`, `health_check` | Performance insights |
| Write | `execute_write`, `load_dataframe` | Data modification |
| DDL | `execute_ddl`, `get_schema_snapshot` | Schema management with change tracking |
#### DDL Change Tracking
`execute_ddl` returns structured output for downstream automation:
```json
{
  "success": true,
  "operation": "CREATE TABLE",
  "affected_objects": [
    {
      "type": "table",
      "schema": "public",
      "name": "customer_orders",
      "change": "created"
    }
  ],
  "timestamp": "2025-01-22T14:30:00Z"
}
```
This enables documentation updates, ERD regeneration (via Mermaid Chart MCP), or other automated responses.
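
A consumer of this output might look something like the sketch below. The `FOLLOW_UPS` mapping and the action names (`update_docs`, `regenerate_erd`) are hypothetical; only the payload shape comes from the example above.

```python
import json

# Illustrative mapping from (object type, change) to follow-up actions.
FOLLOW_UPS = {
    ("table", "created"): ["update_docs", "regenerate_erd"],
    ("table", "dropped"): ["update_docs", "regenerate_erd"],
    ("index", "created"): ["update_docs"],
}


def follow_ups_for(ddl_result_json: str) -> list[str]:
    """Return the de-duplicated follow-up actions for one execute_ddl result."""
    result = json.loads(ddl_result_json)
    if not result.get("success"):
        return []  # failed DDL changes nothing, so nothing to regenerate
    actions: list[str] = []
    for obj in result.get("affected_objects", []):
        for action in FOLLOW_UPS.get((obj["type"], obj["change"]), []):
            if action not in actions:
                actions.append(action)
    return actions


payload = """
{
  "success": true,
  "operation": "CREATE TABLE",
  "affected_objects": [
    {"type": "table", "schema": "public", "name": "customer_orders", "change": "created"}
  ],
  "timestamp": "2025-01-22T14:30:00Z"
}
"""
actions = follow_ups_for(payload)
```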
---
### Server: dbt-mcp (Transform Layer)
**Responsibility:** Model execution, lineage, documentation, and YAML generation for local dbt-core projects.
**Note:** Official dbt-mcp is Cloud-only. This server wraps local dbt-core CLI.
#### Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| Discovery | `parse_manifest`, `list_models`, `list_sources` | Project exploration |
| Model | `get_model`, `get_lineage`, `compile_sql` | Model inspection |
| Execution | `run_model`, `test_model`, `get_run_results` | dbt CLI wrapper |
| Documentation | `generate_yaml` | Auto-generate schema.yml |
#### Lineage Output
`get_lineage` outputs Mermaid-formatted DAG, compatible with existing Mermaid Chart MCP for rendering.
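
As a rough sketch of what "Mermaid-formatted DAG" could mean in practice, the function below turns a model-to-parents mapping into a Mermaid flowchart. The input shape and the model names are assumptions for illustration; the real tool would read dependencies from the dbt manifest.

```python
def lineage_to_mermaid(parents: dict[str, list[str]]) -> str:
    """Render a model -> upstream-parents mapping as a Mermaid flowchart."""
    lines = ["graph LR"]
    for model in sorted(parents):
        for parent in sorted(parents[model]):
            lines.append(f"    {parent} --> {model}")
    return "\n".join(lines)


# Hypothetical project: two staging models feeding one mart.
parents = {
    "fct_orders": ["stg_orders", "stg_customers"],
    "stg_orders": [],
    "stg_customers": [],
}
diagram = lineage_to_mermaid(parents)
```

Sorting keeps the output stable between runs, which makes diffs of generated documentation easier to read.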
---
### Internal Dependency Flow (data-platform)
```
files → pandas-mcp → postgres-mcp ↔ dbt-mcp
            ↑______________|
     (query results for reshaping)
```
| Flow | Description |
|------|-------------|
| files → pandas | Entry point for raw data |
| pandas → postgres | Schema inference, bulk loading |
| postgres ↔ dbt | dbt queries marts, postgres executes |
| postgres → pandas | Query results for reshaping |
| dbt → pandas | Model outputs for visualization prep |
---
### Agents (data-platform)
| Agent | Trigger | Sequence |
|-------|---------|----------|
| `data_ingestion` | User provides file | read_file → profile → infer_schema → execute_ddl → load_dataframe → validate |
| `model_analysis` | User asks about dbt model | get_model → get_lineage → explain_query → test_model → synthesize |
| `full_pipeline` | File to materialized model | data_ingestion → create dbt model → run_model |
**Behavior when dbt-mcp absent:**
| Agent | Behavior |
|-------|----------|
| `data_ingestion` | Runs fully (no dbt steps) |
| `model_analysis` | Skipped—surfaces "dbt not configured" |
| `full_pipeline` | Stops after load, prompts user |
---
### Commands (data-platform)
| Command | Maps To |
|---------|---------|
| `/ingest {file}` | `data_ingestion` agent |
| `/profile {file}` | `pandas-mcp.profile` |
| `/pivot {data} by {cols}` | `pandas-mcp.pivot` |
| `/merge {left} {right} on {key}` | `pandas-mcp.merge` |
| `/explain {query}` | `postgres-mcp.explain_query` |
| `/schema {table}` | `postgres-mcp.get_table_schema` |
| `/lineage {model}` | `dbt-mcp.get_lineage` |
| `/run {model}` | `dbt-mcp.run_model` |
| `/test {model}` | `dbt-mcp.test_model` |
When dbt-mcp is unavailable, dbt commands return a graceful "dbt-mcp not loaded" message.
---
## Plugin: viz-platform
### Servers
| Server | Responsibility |
|--------|----------------|
| dmc-mcp | Version-locked component registry, prop validation |
| dash-mcp | Charts, layouts, pages, theming—validates against dmc-mcp |
---
### Server: dmc-mcp (Component Constraint Layer)
**Responsibility:** Single source of truth for Dash Mantine Components API. Prevents Claude from hallucinating deprecated props or non-existent components.
**Problem solved:** DMC versions introduce breaking changes. Claude's training data mixes versions. Runtime errors from invalid props waste cycles.
#### Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| Discovery | `list_components` | What exists in installed version |
| Introspection | `get_component_props` | Valid props, types, defaults |
| Validation | `validate_component` | Check component definition before use |
#### Usage Pattern
Claude queries dmc-mcp first:
1. "What props does `dmc.Select` accept?" → `get_component_props`
2. Build component with valid props
3. Pass to dash-mcp for rendering
dash-mcp validates against dmc-mcp before rendering. Invalid components fail fast with actionable errors.
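
The fail-fast check could be sketched along these lines. The `REGISTRY` snapshot is hypothetical and its prop list is illustrative only; a real registry would be generated from the installed DMC version.

```python
# Hypothetical registry snapshot (component -> prop name -> type label).
REGISTRY: dict[str, dict[str, str]] = {
    "Select": {"data": "list", "value": "str", "label": "str", "searchable": "bool"},
}


def validate_component(name: str, props: dict[str, object]) -> list[str]:
    """Return human-readable errors; an empty list means the definition passes."""
    if name not in REGISTRY:
        return [f"unknown component: {name}"]
    valid = REGISTRY[name]
    return [f"{name}: unknown prop '{prop}'" for prop in props if prop not in valid]


ok = validate_component("Select", {"data": [], "value": "a"})
bad = validate_component("Select", {"data": [], "foo": True})
```

Returning a list of errors rather than raising on the first one lets Claude fix every invalid prop in a single pass.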
---
### Server: dash-mcp (Visualization Layer)
**Responsibility:** Chart generation, dashboard layouts, page structure, theming system, and export.
**Philosophy:** Single server, multiple concerns. Tools are namespaced but share context (theme tokens flow to charts automatically).
#### Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| `chart_*` | `chart_create`, `chart_configure_interaction` | Data visualization (Plotly) |
| `layout_*` | `layout_create`, `layout_add_filter`, `layout_set_grid` | Dashboard composition |
| `page_*` | `page_create`, `page_add_navbar`, `page_set_auth` | App-level structure |
| `theme_*` | `theme_create`, `theme_extend`, `theme_validate`, `theme_export_css` | Design tokens, component styles |
#### Design Token Structure
Themes are built from design tokens—single source of truth for visual consistency:
```yaml
tokens:
  colors:
    primary: "#228be6"
    secondary: "#868e96"
    background:
      base: "#ffffff"
      subtle: "#f8f9fa"
    text:
      primary: "#212529"
      muted: "#868e96"
  spacing:
    xs: "4px"
    sm: "8px"
    md: "16px"
    lg: "24px"
  typography:
    fontFamily: "Inter, sans-serif"
    fontSize:
      sm: "14px"
      md: "16px"
  radii:
    sm: "4px"
    md: "8px"
```
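
One plausible shape for `theme_export_css` is to flatten the nested tokens into CSS custom properties. The sketch below assumes that approach and a hyphen-joined naming scheme; neither is specified by this document.

```python
def flatten_tokens(tokens: dict, prefix: str = "") -> dict[str, str]:
    """Flatten nested token groups into hyphen-joined names."""
    flat: dict[str, str] = {}
    for key, value in tokens.items():
        name = f"{prefix}-{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_tokens(value, name))
        else:
            flat[name] = str(value)
    return flat


def tokens_to_css(tokens: dict) -> str:
    """Emit the flattened tokens as CSS custom properties on :root."""
    lines = [f"  --{name}: {value};" for name, value in flatten_tokens(tokens).items()]
    return ":root {\n" + "\n".join(lines) + "\n}"


# A small subset of the token structure, kept short for illustration.
tokens = {
    "colors": {"primary": "#228be6", "text": {"muted": "#868e96"}},
    "spacing": {"md": "16px"},
}
css = tokens_to_css(tokens)
```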
#### Component Style Registry
Per-component overrides ensuring consistency:
| Component | Registered Style | Purpose |
|-----------|------------------|---------|
| `kpi_card` | Shadow, padding, border-radius | All KPIs look identical |
| `data_table` | Header bg, row hover, border | Tables share appearance |
| `filter_panel` | Background, spacing, alignment | Filters positioned consistently |
| `chart_card` | Title typography, padding | Chart containers unified |
---
### Internal Dependency Flow (viz-platform)
```
dmc-mcp ← dash-mcp
   ↑          |
   └──────────┘
(validation before render)
```
dash-mcp always validates component definitions against dmc-mcp. No direct data dependency—data comes from external sources.
---
### Agents (viz-platform)
| Agent | Trigger | Sequence |
|-------|---------|----------|
| `theme_setup` | New project or brand consistency | list_themes → theme_create → register_component_style → theme_validate |
| `layout_builder` | User wants dashboard structure | layout_create → layout_add_filter → apply_theme → preview |
| `component_check` | Before rendering any DMC component | get_component_props → validate_component → proceed or error |
---
### Commands (viz-platform)
| Command | Maps To |
|---------|---------|
| `/chart {type}` | `dash-mcp.chart_create` (expects data input) |
| `/dashboard {template}` | `layout_builder` agent |
| `/theme {name}` | `dash-mcp.theme_apply` |
| `/theme new {name}` | `dash-mcp.theme_create` |
| `/theme css {name}` | `dash-mcp.theme_export_css` |
| `/component {name}` | `dmc-mcp.get_component_props` |
---
## Cross-Plugin Interactions
### How It Works
MCP servers don't call each other. Claude orchestrates:
1. Server A returns output to Claude
2. Claude interprets and determines next step
3. Claude passes relevant data to Server B
### Documentation Layers
| Layer | Location | Purpose |
|-------|----------|---------|
| Plugin docs | Each plugin's README.md | Declares inputs/outputs |
| Claude.md | Project root | Cross-plugin agents for this project |
| contract-validator | Separate plugin | Validates compatibility |
| doc-guardian | Separate plugin | Catches drift within each project |
### Interface Contracts
Each plugin declares what it produces and accepts:
**data-platform outputs:**
- `data_ref`: In-memory DataFrame reference
- `query_result`: Row set from postgres-mcp
- `model_output`: Materialized table reference from dbt-mcp
- `schema_snapshot`: Full schema state for documentation
**viz-platform inputs:**
- Accepts `data_ref`, `query_result`, or `model_output` as data source
- Validates all DMC components against dmc-mcp before rendering
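
The declared contracts lend themselves to a simple set comparison. The sketch below is one way to model it; `PluginInterface` and `check_compatibility` are hypothetical names, not part of any plugin described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginInterface:
    name: str
    produces: frozenset[str] = frozenset()
    accepts: frozenset[str] = frozenset()


def check_compatibility(producer: PluginInterface, consumer: PluginInterface) -> dict[str, list[str]]:
    """Compare what one plugin produces with what another accepts."""
    return {
        "usable": sorted(producer.produces & consumer.accepts),
        "unconsumed": sorted(producer.produces - consumer.accepts),
        "unsatisfied": sorted(consumer.accepts - producer.produces),
    }


data_platform = PluginInterface(
    name="data-platform",
    produces=frozenset({"data_ref", "query_result", "model_output", "schema_snapshot"}),
)
viz_platform = PluginInterface(
    name="viz-platform",
    accepts=frozenset({"data_ref", "query_result", "model_output"}),
)
report = check_compatibility(data_platform, viz_platform)
```

With the contracts listed above, every viz-platform input is satisfied, and `schema_snapshot` shows up as produced but not consumed by viz-platform.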
### Cross-Plugin Agents (defined in Claude.md)
| Agent | Trigger | Sequence |
|-------|---------|----------|
| `dashboard_builder` | User requests visualization of database content | postgres-mcp.execute_query → pandas-mcp.pivot (if needed) → dmc-mcp.validate_component → dash-mcp.chart_create → dash-mcp.layout_create |
| `visualization_prep` | Query result needs reshaping | postgres-mcp.execute_query → pandas-mcp.reshape → dash-mcp.chart_create |
### Validation: contract-validator
Separate plugin for cross-plugin validation. See **Plugin: contract-validator** section for full specification.
**Key distinction from doc-guardian:**
- doc-guardian: "did code change break docs?" (within a project)
- contract-validator: "do plugins work together?" (across plugins)
---
## Plugin: contract-validator
### Purpose
Validates cross-plugin compatibility and Claude.md agent definitions. Ensures plugins can actually work together before runtime failures occur.
**Problem solved:** Plugins declare interfaces in README. Claude.md references tools across plugins. Without validation:
- Agents reference tools that don't exist
- viz-platform expects input format data-platform doesn't produce
- Plugin updates break workflows silently
---
### What It Reads
| Source | Purpose |
|--------|---------|
| Plugin README.md | Extract declared inputs/outputs |
| Claude.md | Extract agent definitions and tool references |
| MCP server schemas | Verify tools actually exist with expected signatures |
---
### Tool Categories
| Category | Tools | Description |
|----------|-------|-------------|
| Parse | `parse_plugin_interface`, `parse_claude_md_agents` | Extract structured data from docs |
| Validate | `validate_compatibility`, `validate_agent_refs`, `validate_data_flow` | Check contracts match |
| Report | `generate_compatibility_report`, `list_issues` | Output findings |
#### Tool Details
**`parse_plugin_interface`**
- Input: Plugin path or README content
- Output: Structured interface (inputs accepted, outputs produced, tool names)
**`parse_claude_md_agents`**
- Input: Claude.md path or content
- Output: List of agents with their tool sequences
**`validate_compatibility`**
- Input: Two plugin interfaces
- Output: Compatibility report (what A produces that B accepts, gaps)
**`validate_agent_refs`**
- Input: Agent definition, list of available plugins
- Output: Missing tools, invalid sequences
**`validate_data_flow`**
- Input: Agent sequence
- Output: Verification that each step's output matches next step's expected input
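
A minimal sketch of the `validate_agent_refs` idea follows, assuming agent sequences are written as `server.tool` references joined by `→` as in the tables in this document. The parsing rules and the `available` mapping are assumptions.

```python
def validate_agent_refs(sequence: str, available: dict[str, set[str]]) -> list[str]:
    """Check every 'server.tool' reference in a sequence against known tools."""
    issues: list[str] = []
    for raw in sequence.split("→"):
        parts = raw.split()
        if not parts:
            continue  # tolerate stray arrows or empty segments
        ref = parts[0]  # drop trailing notes such as "(if needed)"
        server, _, tool = ref.partition(".")
        if not tool:
            issues.append(f"{ref}: expected 'server.tool'")
        elif server not in available:
            issues.append(f"{ref}: server '{server}' is not loaded")
        elif tool not in available[server]:
            issues.append(f"{ref}: no tool '{tool}' on {server}")
    return issues


available = {
    "postgres-mcp": {"execute_query"},
    "pandas-mcp": {"reshape", "pivot"},
    "dash-mcp": {"chart_create"},
}
good = validate_agent_refs(
    "postgres-mcp.execute_query → pandas-mcp.pivot (if needed) → dash-mcp.chart_create",
    available,
)
broken = validate_agent_refs(
    "postgres-mcp.execute_query → pandas-mcp.transform → dbt-mcp.run_model",
    available,
)
```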
---
### Agents (contract-validator)
| Agent | Trigger | Sequence |
|-------|---------|----------|
| `full_validation` | User runs `/validate-contracts` | parse all plugin interfaces → parse Claude.md → validate_compatibility for each pair → validate_agent_refs for each agent → generate_compatibility_report |
| `agent_check` | User runs `/check-agent {name}` | parse_claude_md_agents → find agent → validate_agent_refs → validate_data_flow → report issues |
---
### Commands
| Command | Maps To | Description |
|---------|---------|-------------|
| `/validate-contracts` | `full_validation` agent | Full project validation |
| `/check-agent {name}` | `agent_check` agent | Validate single agent definition |
| `/list-interfaces` | `parse_plugin_interface` for all plugins | Show what each plugin produces/accepts |
---
### Output Format
**Compatibility Report:**
```
## Contract Validation Report
### Plugin Interfaces
- data-platform: produces [data_ref, query_result, model_output, schema_snapshot]
- viz-platform: accepts [data_ref, query_result, model_output]
### Compatibility Matrix
| Producer | Consumer | Status | Notes |
|----------|----------|--------|-------|
| data-platform | viz-platform | ✓ Compatible | All accepted inputs are produced |
### Agent Validation
| Agent | Status | Issues |
|-------|--------|--------|
| dashboard_builder | ✓ Valid | — |
| model_analysis | ⚠ Warning | dbt-mcp optional; agent is skipped if not loaded |
### Issues Found
- None
### Warnings
- Agent `model_analysis` depends on optional server `dbt-mcp`
```
**Issue Types:**
| Type | Severity | Example |
|------|----------|---------|
| Missing tool | Error | Agent references `pandas-mcp.transform` but tool is `pandas-mcp.reshape` |
| Interface mismatch | Error | viz-platform expects `chart_data` but data-platform produces `data_ref` |
| Optional dependency | Warning | Agent uses dbt-mcp which may not be loaded |
| Undeclared output | Warning | Plugin produces output not listed in README |
---
### Integration with doc-guardian
**Separation of concerns:**
| Plugin | Scope | Trigger |
|--------|-------|---------|
| doc-guardian | Code ↔ docs drift within a project | PostToolUse (Write/Edit) |
| contract-validator | Plugin ↔ plugin compatibility | On-demand or CI hook |
contract-validator does NOT watch for file changes. It runs on-demand or as CI step.
**Potential future integration:** doc-guardian could trigger contract-validator when Claude.md or plugin README changes. Not required for v1.
---
## Diagramming Approach
No diagram-mcp server. Use existing Mermaid Chart MCP.
**For ERDs:**
- postgres-mcp exposes schema metadata via `get_schema_snapshot`
- Claude generates Mermaid syntax
- Mermaid Chart MCP renders
**For dbt lineage:**
- dbt-mcp.get_lineage outputs Mermaid-formatted DAG
- Mermaid Chart MCP renders
This avoids the complexity of draw.io XML generation while maintaining documentation capability.
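
For the ERD path, the Mermaid generation step could look roughly like this. The snapshot shape (table name mapped to `(column, type)` pairs) is an assumption, since the output format of `get_schema_snapshot` is not defined here.

```python
def snapshot_to_mermaid_erd(snapshot: dict[str, list[tuple[str, str]]]) -> str:
    """Render table -> [(column, type)] metadata as a Mermaid erDiagram."""
    lines = ["erDiagram"]
    for table, columns in snapshot.items():
        lines.append(f"    {table} {{")
        for name, sql_type in columns:
            # Mermaid lists attributes as "type name"; types must not contain spaces.
            lines.append(f"        {sql_type} {name}")
        lines.append("    }")
    return "\n".join(lines)


# Hypothetical snapshot for a single table.
snapshot = {"customer_orders": [("order_id", "bigint"), ("amount", "numeric")]}
erd = snapshot_to_mermaid_erd(snapshot)
```

Relationships between tables would need foreign-key metadata, which this sketch leaves out.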
---
## Implementation Order
| Phase | Plugin | Server | Rationale |
|-------|--------|--------|-----------|
| 1 | data-platform | pandas-mcp | Entry point, no dependencies |
| 2 | data-platform | postgres-mcp | Load from Phase 1, query capabilities |
| 3 | data-platform | dbt-mcp | Transform layer, requires postgres-mcp |
| 4 | viz-platform | dmc-mcp | Constraint layer, no dependencies |
| 5 | viz-platform | dash-mcp | Visualization, validates against dmc-mcp |
| 6 | contract-validator | — | Validates all above, requires stable interfaces |
**Notes:**
- Phases 1-3 (data-platform) and 4-5 (viz-platform) can proceed in parallel
- contract-validator (Phase 6) should wait until plugin interfaces stabilize
- doc-guardian already exists; update scope documentation only
---
## Open Questions
### Data Reference Passing
How do servers share `data_ref` objects? Options:
- **Temporary files with URIs**: Portable, but adds I/O overhead
- **Arrow IPC**: Efficient, but requires both servers to support it

**Recommendation:** Arrow IPC for efficiency, with temporary files as the compatibility fallback.
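
The temporary-file option can be illustrated with the standard library alone. This sketch uses CSV for simplicity, so column types are lost on the round trip; Parquet or Arrow IPC would preserve them. The function names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def write_data_ref(rows: list[dict[str, object]]) -> str:
    """Persist rows to a temp CSV and return a file:// URI as the data_ref."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", newline="", encoding="utf-8", delete=False
    ) as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return Path(handle.name).resolve().as_uri()


def read_data_ref(uri: str) -> list[dict[str, str]]:
    """Resolve a file:// data_ref back to rows (all values come back as strings)."""
    path = Path(url2pathname(urlparse(uri).path))
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


uri = write_data_ref([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
rows_back = read_data_ref(uri)
Path(url2pathname(urlparse(uri).path)).unlink()  # caller owns cleanup
```

Because the file outlives the process that wrote it, some party (here the caller) has to own cleanup, which is part of the I/O overhead noted above.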
### Authentication
Should postgres-mcp handle connection strings directly, or use a secrets manager pattern?
### Theme Storage
Where do custom themes persist?
- Local config file (`~/.dash-mcp/themes/`)
- Project-level (alongside dbt_project.yml)
- Database table (for shared team themes)
### dbt Project Discovery
Auto-detect `dbt_project.yml` in common locations, or require explicit path?
---
## Technology Stack
| Layer | Technology | Notes |
|-------|------------|-------|
| MCP Framework | FastMCP | Or manual MCP SDK |
| Python | 3.11+ | Type hints, async support |
| Data Processing | pandas | Core DataFrame ops |
| Arrow | pyarrow | Parquet, efficient memory |
| Database | psycopg | Async-ready Postgres driver |
| Geospatial | geoalchemy2 | PostGIS integration |
| dbt | dbt-core | CLI wrapper |
| Visualization | plotly | Figure generation |
| UI Components | dash-mantine-components | Version-locked via dmc-mcp |
---
## Summary
### Core Plugins
| Plugin | Servers/Scope | Key Characteristic |
|--------|---------------|-------------------|
| data-platform | pandas-mcp, postgres-mcp, dbt-mcp | Optional server loading per project |
| viz-platform | dmc-mcp, dash-mcp | dmc-mcp validates before dash-mcp renders |
| contract-validator | Interface parsing, compatibility checks | Validates cross-plugin contracts and agent definitions |
### Supporting Plugins (Existing)
| Plugin | Purpose |
|--------|---------|
| doc-guardian | Code-to-docs drift (unchanged scope) |
| Mermaid Chart MCP | Diagram rendering |
### Interaction Model
```
Plugin READMEs → declare inputs/outputs
Claude.md → define cross-plugin agents
contract-validator → validate compatibility
doc-guardian → catch drift within projects
```
**Flow:** Plugins declare interfaces. Claude.md defines workflows. contract-validator enforces compatibility. doc-guardian handles internal drift.