Update "unnamed"

2026-01-25 20:43:15 +00:00
parent 72596476a3
commit 6bdf6612b7

> **Type:** Change Proposal Implementation
> **Version:** V04.0.0
> **Status:** Implemented
> **Date:** 2026-01-25
> **Origin:** [Change V04.0.0: Proposal](Change-V04.0.0:-Proposal)
---
# data-platform Plugin Implementation Plan (v4.0.0)
## Overview
Implement a new `data-platform` plugin for leo-claude-mktplace that addresses data workflow issues encountered in the personal-portfolio project:
- Lost data after multiple interactions (solved by Arrow IPC data_ref passing)
- dbt 1.9+ syntax deprecation (solved by pre-execution validation with `dbt parse`)
- Ungraceful PostgreSQL error handling (solved by SessionStart hook with warnings)
## Architecture Decisions
| Decision | Choice |
|----------|--------|
| Data Passing | Arrow IPC with data_ref |
| DB Auth | Environment variables (~/.config/claude/postgres.env) |
| dbt Discovery | Auto-detect + explicit override |
| dbt Validation | Pre-execution (`dbt parse`) |
| Plugin Structure | Single plugin, 3 MCP servers |
| Server Location | Root mcp-servers/ |
| Memory Management | 100k row limit with chunking |
| PostGIS Support | Yes, with geoalchemy2 |
| Agent Model | 2 agents (Ingestion + Analysis) |
| Commands | Core 6 |
| Startup Hook | Graceful DB warning (non-blocking) |
| MCP Framework | Manual SDK (following gitea pattern) |
## File Structure
```
mcp-servers/
└── data-platform/
├── mcp_server/
│ ├── __init__.py
│ ├── server.py # Main MCP server with routing
│ ├── config.py # Hybrid config (system + project)
│ ├── data_store.py # Arrow IPC DataFrame registry
│ ├── pandas_tools.py # pandas tool implementations
│ ├── postgres_tools.py # PostgreSQL/PostGIS tools
│ └── dbt_tools.py # dbt CLI wrapper tools
├── requirements.txt
├── pyproject.toml
└── README.md
plugins/
└── data-platform/
├── .claude-plugin/
│ └── plugin.json
├── .mcp.json
├── mcp-servers/
│ └── data-platform -> ../../../mcp-servers/data-platform # symlink
├── commands/
│ ├── ingest.md # /ingest command
│ ├── profile.md # /profile command
│ ├── schema.md # /schema command
│ ├── explain.md # /explain command
│ ├── lineage.md # /lineage command
│ └── run.md # /run command
├── agents/
│ ├── data-ingestion.md # Data loading and transformation
│ └── data-analysis.md # Exploration and profiling
├── hooks/
│ └── hooks.json # SessionStart DB check
├── README.md
└── claude-md-integration.md
```
## Implementation Phases
### Phase 1: Foundation (Issues #1-2)
**Files to create:**
- `mcp-servers/data-platform/mcp_server/__init__.py`
- `mcp-servers/data-platform/mcp_server/config.py`
- `mcp-servers/data-platform/mcp_server/data_store.py`
- `mcp-servers/data-platform/mcp_server/server.py` (skeleton)
- `mcp-servers/data-platform/requirements.txt`
- `mcp-servers/data-platform/pyproject.toml`
**config.py pattern** (from gitea):
```python
import os
from pathlib import Path
from dotenv import load_dotenv

def load_config():
    # System-level credentials
    system_env = Path.home() / ".config/claude/postgres.env"
    if system_env.exists():
        load_dotenv(system_env)
    # Project-level settings
    project_env = Path.cwd() / ".env"
    if project_env.exists():
        load_dotenv(project_env, override=True)
    return {
        "postgres_url": os.getenv("POSTGRES_URL"),
        "dbt_project_dir": os.getenv("DBT_PROJECT_DIR"),
        "dbt_profiles_dir": os.getenv("DBT_PROFILES_DIR"),
    }
```
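Phase 1 also calls for a `server.py` skeleton, which this plan does not spell out. The following is only a minimal sketch of what the entry point could look like, assuming the low-level `mcp` Python SDK (`Server`, `stdio_server`) that the gitea servers follow; the single `list_data` tool and the routing are placeholders, not the final tool surface.

```python
# Minimal sketch only, assuming the low-level MCP Python SDK; routing to
# pandas_tools/postgres_tools/dbt_tools is elided and `list_data` is a placeholder.
import asyncio
import json

from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

from mcp_server.data_store import DataStore

app = Server("data-platform")

@app.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="list_data",
            description="List all stored DataFrames",
            inputSchema={"type": "object", "properties": {}},
        ),
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "list_data":
        refs = DataStore.get_instance().list_refs()
        return [TextContent(type="text", text=json.dumps(refs))]
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
```

The real server would expose the pandas, postgres, and dbt tools from the modules below and share the singleton `DataStore`.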
**data_store.py** (Arrow IPC registry):
```python
import pyarrow as pa
import uuid
from typing import Dict, Optional

class DataStore:
    _instance = None
    _dataframes: Dict[str, pa.Table] = {}

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def store(self, df: pa.Table, name: Optional[str] = None) -> str:
        data_ref = name or f"df_{uuid.uuid4().hex[:8]}"
        self._dataframes[data_ref] = df
        return data_ref

    def get(self, data_ref: str) -> Optional[pa.Table]:
        return self._dataframes.get(data_ref)

    def list_refs(self) -> list:
        return [{"ref": k, "rows": v.num_rows, "cols": v.num_columns}
                for k, v in self._dataframes.items()]
```
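For illustration, a round trip through the registry: later tool calls pass only the lightweight `data_ref`, never the serialized rows (the table contents here are made up).

```python
# Illustrative round trip through the DataStore registry (hypothetical data).
import pyarrow as pa

store = DataStore.get_instance()
table = pa.table({"city": ["Lisbon", "Porto"], "population": [545_000, 232_000]})

data_ref = store.store(table, name="cities")   # -> "cities"
assert store.get("cities").num_rows == 2

# Subsequent tool calls reference the table by name only:
print(store.list_refs())
# [{'ref': 'cities', 'rows': 2, 'cols': 2}]
```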
### Phase 2: pandas-mcp Tools (Issue #3)
**Tools to implement in pandas_tools.py:**
| Tool | Description |
|------|-------------|
| `read_csv` | Load CSV with optional chunking |
| `read_parquet` | Load Parquet files |
| `read_json` | Load JSON/JSONL files |
| `to_csv` | Export DataFrame to CSV |
| `to_parquet` | Export DataFrame to Parquet |
| `describe` | Statistical summary |
| `head` | First N rows |
| `tail` | Last N rows |
| `filter` | Filter rows by condition |
| `select` | Select columns |
| `groupby` | Group and aggregate |
| `join` | Join two DataFrames |
| `list_data` | List all stored DataFrames |
| `drop_data` | Remove DataFrame from store |
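Most of these tools share one shape: resolve a `data_ref` to an Arrow table, operate on it with pandas, store the result, and return a new `data_ref`. A sketch of `filter` under that pattern; using a `DataFrame.query` condition string is an assumption for illustration, not settled API.

```python
# Hypothetical data_ref -> data_ref tool; the query-string condition is an
# illustrative choice, not settled design.
import pandas as pd
import pyarrow as pa
from mcp_server.data_store import DataStore

def filter(data_ref: str, condition: str) -> dict:
    store = DataStore.get_instance()
    table = store.get(data_ref)
    if table is None:
        return {"error": f"Unknown data_ref: {data_ref}"}
    df = table.to_pandas()
    result = df.query(condition)  # e.g. "population > 300000"
    new_ref = store.store(pa.Table.from_pandas(result))
    return {"data_ref": new_ref, "rows": len(result), "columns": list(result.columns)}
```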
**Memory management:**
```python
import pandas as pd
import pyarrow as pa

MAX_ROWS = 100_000

def read_csv(file_path: str, chunk_size: int = None) -> dict:
    df = pd.read_csv(file_path)
    if len(df) > MAX_ROWS:
        return {
            "warning": f"DataFrame has {len(df)} rows, exceeds {MAX_ROWS} limit",
            "suggestion": f"Use chunk_size={MAX_ROWS} for chunked processing",
            "preview": df.head(100).to_dict()
        }
    # Convert to Arrow and store
    table = pa.Table.from_pandas(df)
    data_ref = DataStore.get_instance().store(table)
    return {"data_ref": data_ref, "rows": len(df), "columns": list(df.columns)}
```
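When the caller retries with `chunk_size`, the reader can stream the file instead of loading it whole. A possible shape for that branch, assuming the intent is to cap the stored table at `MAX_ROWS` (hypothetical helper, not part of the plan's tool list):

```python
# Hypothetical chunked branch for read_csv; streams the file and caps the
# stored table at MAX_ROWS rather than loading everything into memory.
import pandas as pd
import pyarrow as pa
from mcp_server.data_store import DataStore

def read_csv_chunked(file_path: str, chunk_size: int) -> dict:
    chunks, total = [], 0
    for chunk in pd.read_csv(file_path, chunksize=chunk_size):
        remaining = MAX_ROWS - total
        if remaining <= 0:
            break
        chunks.append(chunk.head(remaining))
        total += min(len(chunk), remaining)
    df = pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
    data_ref = DataStore.get_instance().store(pa.Table.from_pandas(df))
    return {"data_ref": data_ref, "rows": total, "truncated": total >= MAX_ROWS}
```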
### Phase 3: postgres-mcp Tools (Issue #4)
**Tools to implement in postgres_tools.py:**
| Tool | Description |
|------|-------------|
| `pg_connect` | Test connection and return status |
| `pg_query` | Execute SELECT, return as data_ref |
| `pg_execute` | Execute INSERT/UPDATE/DELETE |
| `pg_tables` | List all tables in schema |
| `pg_columns` | Get column info for table |
| `pg_schemas` | List all schemas |
| `st_tables` | List PostGIS-enabled tables |
| `st_geometry_type` | Get geometry type of column |
| `st_srid` | Get SRID of geometry column |
| `st_extent` | Get bounding box of geometries |
**asyncpg implementation:**
```python
import asyncpg
import pandas as pd
import pyarrow as pa
from geoalchemy2 import Geometry

async def pg_query(query: str, params: list = None) -> dict:
    config = load_config()
    conn = await asyncpg.connect(config["postgres_url"])
    try:
        rows = await conn.fetch(query, *(params or []))
        df = pd.DataFrame([dict(r) for r in rows])
        if len(df) > MAX_ROWS:
            return {"warning": "Result truncated", "data_ref": store_truncated(df)}
        table = pa.Table.from_pandas(df)
        data_ref = DataStore.get_instance().store(table)
        return {"data_ref": data_ref, "rows": len(df)}
    finally:
        await conn.close()
```
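The PostGIS tools can reuse the same connection handling; `st_extent`, for example, reduces to a single `ST_Extent` aggregate. A sketch, with identifier quoting simplified (caller-supplied table and column names would need validation in the real tool):

```python
# Hypothetical st_extent tool: bounding box of a geometry column via PostGIS.
# Identifier quoting is simplified; a real tool should validate the names.
import asyncpg

async def st_extent(table: str, geom_column: str = "geom") -> dict:
    config = load_config()
    conn = await asyncpg.connect(config["postgres_url"])
    try:
        # ST_Extent aggregates geometries into a BOX2D; cast to text for transport.
        row = await conn.fetchrow(
            f'SELECT ST_Extent("{geom_column}")::text AS extent FROM "{table}"'
        )
        return {"table": table, "column": geom_column, "extent": row["extent"]}
    finally:
        await conn.close()
```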
### Phase 4: dbt-mcp Tools (Issue #5)
**Tools to implement in dbt_tools.py:**
| Tool | Description |
|------|-------------|
| `dbt_parse` | Validate project (pre-execution) |
| `dbt_run` | Run models with selection |
| `dbt_test` | Run tests |
| `dbt_build` | Run + test |
| `dbt_compile` | Compile SQL without executing |
| `dbt_ls` | List resources |
| `dbt_docs_generate` | Generate documentation |
| `dbt_lineage` | Get model dependencies |
**Pre-execution validation pattern:**
```python
import subprocess

def dbt_run(select: str = None, exclude: str = None) -> dict:
    config = load_config()
    project_dir = config.get("dbt_project_dir") or find_dbt_project()

    # ALWAYS validate first
    parse_result = subprocess.run(
        ["dbt", "parse", "--project-dir", project_dir],
        capture_output=True, text=True
    )
    if parse_result.returncode != 0:
        return {
            "error": "dbt parse failed - fix issues before running",
            "details": parse_result.stderr,
            "suggestion": "Check for deprecated syntax (dbt 1.9+)"
        }

    # Execute run
    cmd = ["dbt", "run", "--project-dir", project_dir]
    if select:
        cmd.extend(["--select", select])
    if exclude:
        cmd.extend(["--exclude", exclude])
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {"success": result.returncode == 0, "output": result.stdout}
```
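`dbt_lineage` does not need to run any models: once `dbt parse` has written `target/manifest.json`, each node's `depends_on` entry gives the dependency graph. A sketch under that assumption; matching nodes by trailing name is a simplification.

```python
# Hypothetical dbt_lineage tool: read the dependency graph from the
# target/manifest.json that `dbt parse` writes. Name matching is simplified.
import json
from pathlib import Path

def dbt_lineage(model_name: str) -> dict:
    config = load_config()
    project_dir = Path(config.get("dbt_project_dir") or find_dbt_project())
    manifest_path = project_dir / "target" / "manifest.json"
    if not manifest_path.exists():
        return {"error": "manifest.json not found - run dbt_parse first"}

    nodes = json.loads(manifest_path.read_text()).get("nodes", {})
    # Node ids look like "model.<project>.<name>"; match on the trailing name.
    matches = [nid for nid in nodes if nid.endswith(f".{model_name}")]
    if not matches:
        return {"error": f"Model not found: {model_name}"}
    target_id = matches[0]

    upstream = nodes[target_id].get("depends_on", {}).get("nodes", [])
    downstream = [nid for nid, node in nodes.items()
                  if target_id in node.get("depends_on", {}).get("nodes", [])]
    return {"model": target_id, "upstream": upstream, "downstream": downstream}
```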
### Phase 5: Plugin Wrapper (Issue #6)
**plugins/data-platform/.claude-plugin/plugin.json:**
```json
{
  "name": "data-platform",
  "version": "1.0.0",
  "description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration",
  "author": "Leo Miranda",
  "license": "MIT",
  "hooks": "hooks/hooks.json",
  "commands": "commands/",
  "agents": "agents/",
  "mcp": ".mcp.json"
}
```
**plugins/data-platform/.mcp.json:**
```json
{
  "mcpServers": {
    "data-platform": {
      "type": "stdio",
      "command": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python",
      "args": ["-m", "mcp_server.server"],
      "cwd": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform"
    }
  }
}
```
**plugins/data-platform/hooks/hooks.json:**
```json
{
  "hooks": [
    {
      "event": "SessionStart",
      "type": "command",
      "command": ["${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python", "-c", "from mcp_server.postgres_tools import check_connection; check_connection()"],
      "timeout": 5000,
      "onError": "warn"
    }
  ]
}
```
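The hook calls `check_connection`, which is not defined elsewhere in this plan. A sketch consistent with the "graceful DB warning" decision: it prints a status line and returns normally in every case, so a missing or unreachable database never blocks session start (the exact messages are illustrative).

```python
# Hypothetical helper in mcp_server/postgres_tools.py for the SessionStart hook.
# Warns and returns cleanly in every case, so DB problems never block startup.
import asyncio
import asyncpg
from mcp_server.config import load_config

def check_connection() -> None:
    url = load_config().get("postgres_url")
    if not url:
        print("data-platform: POSTGRES_URL not configured - postgres tools disabled")
        return
    async def _ping():
        conn = await asyncpg.connect(url, timeout=3)
        await conn.close()
    try:
        asyncio.run(_ping())
        print("data-platform: PostgreSQL connection OK")
    except Exception as exc:
        print(f"data-platform: warning - PostgreSQL unavailable ({exc})")
```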
**Agents:**
`agents/data-ingestion.md`:
```markdown
# Data Ingestion Agent
You are a data ingestion specialist. Your role is to help users load, transform, and prepare data for analysis.
## Available Tools
- pandas: read_csv, read_parquet, read_json, filter, select, groupby, join
- postgres: pg_query, pg_execute
## Workflow
1. Understand the data source and format
2. Load data with appropriate chunking for large files
3. Transform as needed (filter, select, aggregate)
4. Store results with meaningful data_ref names
```
`agents/data-analysis.md`:
```markdown
# Data Analysis Agent
You are a data analysis specialist. Your role is to help users explore, profile, and understand their data.
## Available Tools
- pandas: describe, head, tail, list_data
- postgres: pg_tables, pg_columns
- dbt: dbt_lineage, dbt_docs_generate
## Workflow
1. List available data (list_data or pg_tables)
2. Profile data structure and statistics
3. Identify patterns and anomalies
4. Provide insights and recommendations
```
**Commands:**
| Command | File | Description |
|---------|------|-------------|
| `/ingest` | commands/ingest.md | Load data from files or database |
| `/profile` | commands/profile.md | Generate data profile and statistics |
| `/schema` | commands/schema.md | Show database/DataFrame schema |
| `/explain` | commands/explain.md | Explain dbt model lineage |
| `/lineage` | commands/lineage.md | Visualize data dependencies |
| `/run` | commands/run.md | Execute dbt models |
### Phase 6: Documentation & Integration
**Files to update:**
- `.claude-plugin/marketplace.json` - Add data-platform plugin entry
- `CHANGELOG.md` - Add v4.0.0 section under [Unreleased]
- `README.md` - Update plugin table
**Files to create:**
- `plugins/data-platform/README.md`
- `plugins/data-platform/claude-md-integration.md`
- `mcp-servers/data-platform/README.md`
## Sprint Structure (projman)
**Milestone:** Sprint 1 - data-platform Plugin (v4.0.0)
### Gitea Issues to Create
| # | Title | Labels | Effort |
|---|-------|--------|--------|
| 1 | [Sprint 01] feat: MCP server foundation and config | Type/Feature, Priority/High, Complexity/Medium, Effort/M, Tech/Python, Component/Backend | 1-2 days |
| 2 | [Sprint 01] feat: Arrow IPC data registry with memory limits | Type/Feature, Priority/High, Complexity/Medium, Effort/M, Tech/Python, Component/Backend | 1-2 days |
| 3 | [Sprint 01] feat: pandas-mcp core data operations (14 tools) | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Component/Backend | 3-5 days |
| 4 | [Sprint 01] feat: postgres-mcp database tools with PostGIS | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Tech/PostgreSQL, Component/Database | 3-5 days |
| 5 | [Sprint 01] feat: dbt-mcp build tools with pre-validation | Type/Feature, Priority/High, Complexity/Complex, Effort/L, Tech/Python, Component/Backend | 3-5 days |
| 6 | [Sprint 01] feat: Plugin wrapper, commands, and agents | Type/Feature, Priority/Medium, Complexity/Medium, Effort/M, Component/Docs | 1-2 days |
| 7 | [Sprint 01] docs: Documentation and marketplace integration | Type/Documentation, Priority/Medium, Complexity/Simple, Effort/S, Component/Docs | 2-4 hours |
### Issue Dependencies
```
#1 (foundation) ─┬─> #2 (data registry)
                 ├─> #3 (pandas-mcp) ──┐
                 │                     │
                 ├─> #4 (postgres-mcp) ├─> #6 (plugin wrapper) ─> #7 (docs)
                 │                     │
                 └─> #5 (dbt-mcp) ─────┘
```
**Parallel Execution Batches:**
1. Batch 1: #1 (foundation)
2. Batch 2: #2, #3, #4, #5 (can run in parallel after foundation)
3. Batch 3: #6 (plugin wrapper - needs all tools complete)
4. Batch 4: #7 (docs - final)
## Verification Steps
1. **MCP Server starts:**
```bash
cd mcp-servers/data-platform
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m mcp_server.server
```
2. **Tools are registered:**
- Start Claude Code in a test project
- Run `/ingest` command
- Verify MCP tools appear in tool list
3. **Data persistence:**
- Load a CSV file with `/ingest`
- Run multiple commands referencing the data_ref
- Verify data persists across tool calls
4. **PostgreSQL connection:**
- Configure `~/.config/claude/postgres.env`
- Start new session
- Verify SessionStart hook shows connection status (warning if unavailable)
5. **dbt validation:**
- Run `/run` on a dbt project with deprecated syntax
- Verify pre-execution validation catches issues
- Fix syntax and re-run successfully
6. **Validation script:**
```bash
./scripts/validate-marketplace.sh
```
## Dependencies
```
# requirements.txt
mcp>=1.0.0
pandas>=2.0.0
pyarrow>=14.0.0
asyncpg>=0.29.0
geoalchemy2>=0.14.0
python-dotenv>=1.0.0
dbt-core>=1.9.0
dbt-postgres>=1.9.0
```
## Out of Scope (v4.1.0+)
- Integration with projman sprint tracking
- Cross-plugin DataFrame sharing
- Visualization components (deferred to v5.0.0)
- Advanced dbt features (seeds, snapshots, exposures)
---

# Plugin Manifest Validation - Hooks and Agents Format Requirements

**Date:** 2026-01-25 (Updated)
**Impact:** Plugin installation fails with "invalid input" errors
**Severity:** Critical - blocks plugin usage entirely

## Context

When creating the data-platform plugin (v4.0.0), the plugin.json manifest had invalid formats for `hooks` and `agents` fields, causing installation to fail with:

```
Error: failed to install plugin has an invalid manifest file
validation errors: hooks: invalid input, agents: invalid input
```

## Problem

Two invalid patterns were used in plugin.json:

### 1. Any hooks field in plugin.json (INVALID)

```json
"hooks": "hooks/hooks.json"
```
OR
```json
"hooks": { "SessionStart": [...] }
```

Both are invalid. Hooks should NOT be in plugin.json at all.

### 2. Directory reference for agents (INVALID)

```json
"agents": ["./agents/"]
```

This assumed agent .md files needed to be registered in the manifest.

## Solution

### Hooks: Use separate hooks/hooks.json file (auto-discovered)

**WRONG (in plugin.json):**
```json
"hooks": "hooks/hooks.json"
```
```json
"hooks": { "SessionStart": [...] }
```

**CORRECT:** Create `hooks/hooks.json` file (separate from plugin.json):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "type": "command",
        "command": "${CLAUDE_PLUGIN_ROOT}/hooks/startup-check.sh"
      }
    ]
  }
}
```

The `hooks/hooks.json` file is AUTO-DISCOVERED by Claude Code. Do NOT reference it in plugin.json.

### Agents: Do NOT register .md files in plugin.json

Agent definition files (`.md`) in the `agents/` directory are automatically discovered by Claude Code. They do NOT need to be listed in plugin.json.

**WRONG:**
```json
"agents": ["./agents/"]
```

**CORRECT:** Simply omit the "agents" field entirely. Just have the .md files in the agents/ directory.

## Prevention

### Validation Checklist for New Plugins

1. **Hooks:** Create `hooks/hooks.json` file, NEVER add hooks field to plugin.json
2. **Agents:** Do not add "agents" field - .md files auto-discovered
3. **Run validation:** `./scripts/validate-marketplace.sh` before committing
4. **Test install:** Actually install the plugin before merging

### Reference Working Examples

- **projman:** Has `hooks/hooks.json`, no hooks field in plugin.json
- **pr-review:** Has `hooks/hooks.json`, no hooks field in plugin.json
- **claude-config-maintainer:** Has `hooks/hooks.json`, no hooks field in plugin.json

## Files Changed

- `plugins/data-platform/.claude-plugin/plugin.json` - Removed hooks field entirely
- `plugins/data-platform/hooks/hooks.json` - Created with proper format

---

**Tags:** plugin-development, manifest, validation, hooks, agents, data-platform, critical-fix