27 Commits

Author SHA1 Message Date
d353c74432 chore: release v4.1.0 2026-01-26 10:36:14 -05:00
ab4d940f94 fix(projman): add mandatory CHANGELOG and versioning to sprint-close workflow
Problem: Version workflow was documented but not enforced in sprint-close.
After completing sprints, CHANGELOG wasn't updated and releases weren't created.

Solution:
- Add "Update CHANGELOG" as mandatory step 7 in sprint-close
- Add "Version Check" as step 8 with /suggest-version and release.sh
- Update orchestrator agent with CHANGELOG and version reminders
- Add V04.1.0 wiki workflow changes to CHANGELOG [Unreleased]
- Document MCP bug #160 in Known Issues

This ensures versioning workflow is part of the sprint close process,
not just documented in CLAUDE.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 10:35:06 -05:00
34de0e4e82 [Sprint V04.1.0] feat: Add /proposal-status command for proposal tree view
- Create new proposal-status.md command
- Shows all proposals with status (Pending, In Progress, Implemented, Abandoned)
- Tree view of implementations under each proposal
- Links to issues and lessons learned
- Update README with new command documentation

Phase 4 of V04.1.0 Wiki-Based Planning Enhancement.
Closes #164

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 10:23:53 -05:00
1a0f3aa779 [Sprint V04.1.0] feat: Add cross-linking between issues, wiki, and lessons
- Add Metadata section to lesson structure with Implementation link
- Update lesson examples to include metadata with wiki reference
- Enable bidirectional traceability: lessons ↔ implementation pages

Phase 3 of V04.1.0 Wiki-Based Planning Enhancement.
Closes #163

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 10:21:53 -05:00
30b379b68c [Sprint V04.1.0] feat: Add wiki status updates to sprint-close workflow
- Add step 5: Update wiki implementation page status (Implemented/Partial/Failed)
- Add step 6: Update wiki proposal page when all implementations complete
- Add update_wiki_page to MCP tools in both sprint-close.md and orchestrator.md
- Add critical reminders for wiki status updates

Implements Phase 2 of V04.1.0 Wiki-Based Planning Enhancement.
Closes #162

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 10:15:08 -05:00
842ce3f6bc feat(projman): add wiki-based planning workflow to sprint-plan
Phase 1 implementation for Change V04.1.0:
- Add flexible input source detection (file, wiki, conversation)
- Add wiki proposal and implementation page creation
- Add wiki reference to created issues
- Add cleanup step to delete local files after migration
- Update planner agent with wiki workflow responsibilities

Closes #161

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 10:01:13 -05:00
74ff305b67 Merge pull request 'fix(data-platform): use separate hooks.json for development' (#158) from fix/data-platform-hooks-dev into development
Reviewed-on: #158
2026-01-25 20:52:19 +00:00
f22d49ed07 docs: correct plugin.json hooks format rules
Hooks should be in separate hooks/hooks.json file (auto-discovered),
NOT inline in plugin.json.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 15:48:58 -05:00
3f79288b54 fix(data-platform): use separate hooks.json file (not inline)
Cherry-pick fix from hotfix/data-platform-hooks to development.
Hooks must be in separate hooks/hooks.json file (auto-discovered),
NOT inline in plugin.json.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 15:48:41 -05:00
a910e9327d Merge pull request 'fix(data-platform): correct invalid plugin.json manifest format' (#155) from fix/data-platform-manifest into development
Reviewed-on: #155
2026-01-25 20:36:53 +00:00
9d1fedd3a5 docs: add plugin.json format rules to CLAUDE.md
Add critical warning about hooks and agents field formats
to prevent future manifest validation failures.

References lesson: plugin-manifest-validation---hooks-and-agents-format-requirements

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 15:33:49 -05:00
320ea6b72b fix(data-platform): correct invalid plugin.json manifest format
- Remove invalid "agents": ["./agents/"] - agent .md files don't need registration
- Inline hooks instead of external reference "hooks/hooks.json"
- Delete orphaned hooks.json file (content now in plugin.json)

This fixes "invalid input" validation errors for hooks and agents fields.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 15:32:54 -05:00
69bbffd9cc Merge pull request 'fix: proactive documentation and sprint planning automation' (#153) from fix/proactive-automation-docs into development
Reviewed-on: #153
2026-01-25 20:23:56 +00:00
6b666bff46 fix: proactive documentation and sprint planning automation
- doc-guardian: Hook now tracks documentation dependencies and outputs
  specific files needing updates (e.g., commands → COMMANDS-CHEATSHEET.md)
- projman: SessionStart hook now suggests /sprint-plan when open issues
  exist without milestone, and warns about unreleased CHANGELOG entries
- projman: Add /suggest-version command for semantic version recommendations
- docs: Update COMMANDS-CHEATSHEET.md with data-platform plugin (was missing)
- docs: Update CLAUDE.md with data-platform and version 4.0.0

Fixes documentation drift and lack of proactive workflow suggestions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 15:18:30 -05:00
a77b8ee123 Merge pull request 'chore: release v4.0.0' (#151) from chore/release-v4.0.0 into development
Reviewed-on: #151
2026-01-25 19:36:25 +00:00
498dac5230 chore: release v4.0.0
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 14:33:47 -05:00
af0b92461a Merge pull request 'feat: add data-platform plugin (v4.0.0)' (#149) from feat/data-platform-plugin-v4.0.0 into development
Reviewed-on: #149
2026-01-25 19:27:26 +00:00
89f0354ccc feat: add data-platform plugin (v4.0.0)
Add new data-platform plugin for data engineering workflows with:

MCP Server (32 tools):
- pandas operations (14 tools): read_csv, read_parquet, read_json,
  to_csv, to_parquet, describe, head, tail, filter, select, groupby,
  join, list_data, drop_data
- PostgreSQL/PostGIS (10 tools): pg_connect, pg_query, pg_execute,
  pg_tables, pg_columns, pg_schemas, st_tables, st_geometry_type,
  st_srid, st_extent
- dbt integration (8 tools): dbt_parse, dbt_run, dbt_test, dbt_build,
  dbt_compile, dbt_ls, dbt_docs_generate, dbt_lineage

Plugin Features:
- Arrow IPC data_ref system for DataFrame persistence across tool calls
- Pre-execution validation for dbt with `dbt parse`
- SessionStart hook for PostgreSQL connectivity check (non-blocking)
- Hybrid configuration (system ~/.config/claude/postgres.env + project .env)
- Memory management with 100k row limit and chunking support

Commands: /initial-setup, /ingest, /profile, /schema, /explain, /lineage, /run
Agents: data-ingestion, data-analysis

Test suite: 71 tests covering config, data store, pandas, postgres, dbt tools

Addresses data workflow issues from personal-portfolio project:
- Lost data after multiple interactions (solved by Arrow IPC data_ref)
- dbt 1.9+ syntax deprecation (solved by pre-execution validation)
- Ungraceful PostgreSQL error handling (solved by SessionStart hook)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 14:24:03 -05:00
6a267d074b Merge pull request 'feat: add versioning workflow with release script + release v3.2.0' (#147) from fix/issue-143-versioning-workflow into development
Reviewed-on: #147
2026-01-24 18:16:05 +00:00
bcde33c7d0 chore: release v3.2.0
Version 3.2.0 includes:
- New features: netbox device params, claude-config-maintainer auto-enforce
- Bug fixes: cmdb-assistant schemas, netbox tool names, API URLs
- All hooks converted to command type
- Versioning workflow with release script

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 13:15:30 -05:00
ee3268fbe0 feat: add versioning workflow with release script
- Add scripts/release.sh for consistent version releases
- Fix CHANGELOG.md: consolidate all changes under [Unreleased]
- Update CLAUDE.md with comprehensive versioning documentation
- Include all commits since v3.1.1 in [Unreleased] section

The release script ensures version consistency across:
- Git tags
- README.md title
- marketplace.json version
- CHANGELOG.md sections

Addresses #143

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 13:15:30 -05:00
f6a38ffaa8 Merge pull request 'fix: remove destructive cache-clear instruction from CLAUDE.md' (#146) from fix/issue-145-cache-clear-instructions into development
Reviewed-on: #146
2026-01-24 18:14:03 +00:00
b13ffce0a0 fix: remove destructive cache-clear instruction from CLAUDE.md
The CLAUDE.md mandatory rule instructed clearing the plugin cache
mid-session, which breaks all MCP tools that were already loaded.
MCP tools are loaded with absolute paths to the venv, and deleting
the cache removes the venv while the session still references it.

Changes:
- CLAUDE.md: Replace "ALWAYS CLEAR CACHE" with "VERIFY AND RESTART"
- verify-hooks.sh: Change the cache-existence check from an error to an info message
- DEBUGGING-CHECKLIST.md: Add section explaining cache timing

Fixes #145

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:57:32 -05:00
b39e01efd7 Merge pull request 'feat(projman): add user-reported issue mode to /debug-report' (#141) from feat/issue-139-debug-report-user-mode into development
Reviewed-on: #141
2026-01-24 17:36:16 +00:00
98eea5b6f9 Merge pull request 'chore: remove unreleased v3.1.2 changelog entry' (#140) from chore/remove-unreleased-changelog into development
Reviewed-on: #140
2026-01-24 17:35:58 +00:00
fe36ed91f2 feat(projman): add user-reported issue mode to /debug-report
Adds a new "user-reported" mode alongside the existing automated
diagnostics mode. Users can now choose to:

1. Run automated diagnostics (existing behavior)
2. Report an issue they experienced while using any plugin command

User-reported mode:
- Step 0: Mode selection via AskUserQuestion
- Step 0.1: Structured feedback gathering
  - Which plugin/command was affected
  - What the user was trying to do
  - What went wrong (error, missing feature, unexpected behavior, docs)
  - Expected vs actual behavior
  - Any workarounds found
- Step 5.1: Smart label generation based on problem type
- Step 6.1: User-friendly issue template with investigation hints

This allows capturing UX issues, missing features, and documentation
problems that wouldn't be caught by automated MCP tool tests.

Fixes #139

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:30:37 -05:00
8c85f9ca5f chore: remove unreleased v3.1.2 changelog entry
Tag v3.1.2 was never created. Remove the changelog entry to
avoid confusion between documented and actual releases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:26:08 -05:00
53 changed files with 6787 additions and 161 deletions

View File

@@ -6,12 +6,12 @@
},
"metadata": {
"description": "Project management plugins with Gitea and NetBox integrations",
"version": "3.1.0"
"version": "4.1.0"
},
"plugins": [
{
"name": "projman",
"version": "3.1.0",
"version": "3.2.0",
"description": "Sprint planning and project management with Gitea integration",
"source": "./plugins/projman",
"author": {
@@ -149,6 +149,22 @@
"category": "development",
"tags": ["code-review", "pull-requests", "security", "quality"],
"license": "MIT"
},
{
"name": "data-platform",
"version": "1.0.0",
"description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration",
"source": "./plugins/data-platform",
"author": {
"name": "Leo Miranda",
"email": "leobmiranda@gmail.com"
},
"homepage": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace/src/branch/main/plugins/data-platform/README.md",
"repository": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace.git",
"mcpServers": ["./.mcp.json"],
"category": "data",
"tags": ["pandas", "postgresql", "postgis", "dbt", "data-engineering", "etl"],
"license": "MIT"
}
]
}

View File

@@ -4,14 +4,85 @@ All notable changes to the Leo Claude Marketplace will be documented in this fil
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [3.1.2] - 2026-01-23
## [Unreleased]
---
## [4.1.0] - 2026-01-26
### Added
- **projman:** Wiki-based planning workflow enhancement (V04.1.0)
- Flexible input source detection in `/sprint-plan` (file, wiki, or conversation)
- Wiki proposal and implementation page creation during sprint planning
- Wiki reference linking in created issues
- Wiki status updates in `/sprint-close` (Implemented/Partial/Failed)
- Metadata section in lessons learned with implementation link for traceability
- New `/proposal-status` command for viewing proposal/implementation tree
- **projman:** `/suggest-version` command - Analyzes CHANGELOG and recommends semantic version bump
- **projman:** SessionStart hook now suggests sprint planning when open issues exist without milestone
- **projman:** SessionStart hook now warns about unreleased CHANGELOG entries
### Changed
- **doc-guardian:** Hook now tracks documentation dependencies and queues specific files needing updates
- Outputs which specific docs need updating (e.g., "commands changed → update needed: docs/COMMANDS-CHEATSHEET.md README.md")
- Maintains queue file (`.doc-guardian-queue`) for batch processing
- **docs:** COMMANDS-CHEATSHEET.md updated with data-platform plugin (7 commands + hook)
### Fixed
- Documentation drift: COMMANDS-CHEATSHEET.md was missing data-platform plugin added in v4.0.0
- Proactive sprint planning: projman now suggests `/sprint-plan` at session start when unplanned issues exist
### Known Issues
- **MCP Bug #160:** `update_wiki_page` tool renames pages to "unnamed" when page_name contains URL-encoded characters (`:` encoded as `%3A`). Workaround: use `create_wiki_page` to overwrite instead.
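  For reference, the encoding in question (the page name below is hypothetical):
  ```python
  from urllib.parse import quote

  # A wiki page name containing a colon gets percent-encoded in the URL path.
  page_name = "Proposal: V04.1.0 Wiki Planning"  # hypothetical example
  print(quote(page_name))  # "Proposal%3A%20V04.1.0%20Wiki%20Planning" - ':' becomes '%3A'
  ```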
---
## [4.0.0] - 2026-01-25
### Added
#### New Plugin: data-platform v1.0.0
- **pandas MCP Tools** (14 tools): DataFrame operations with Arrow IPC data_ref persistence
- `read_csv`, `read_parquet`, `read_json` - Load data with chunking support
- `to_csv`, `to_parquet` - Export to various formats
- `describe`, `head`, `tail` - Data exploration
- `filter`, `select`, `groupby`, `join` - Data transformation
- `list_data`, `drop_data` - Memory management
- **PostgreSQL MCP Tools** (10 tools): Database operations with asyncpg connection pooling
- `pg_connect`, `pg_query`, `pg_execute` - Core database operations
- `pg_tables`, `pg_columns`, `pg_schemas` - Schema exploration
- `st_tables`, `st_geometry_type`, `st_srid`, `st_extent` - PostGIS spatial support
- **dbt MCP Tools** (8 tools): Build tool wrapper with pre-execution validation
- `dbt_parse` - Pre-flight validation (catches dbt 1.9+ deprecations)
- `dbt_run`, `dbt_test`, `dbt_build` - Execution with auto-validation
- `dbt_compile`, `dbt_ls`, `dbt_docs_generate`, `dbt_lineage` - Analysis tools
- **Commands**: `/ingest`, `/profile`, `/schema`, `/explain`, `/lineage`, `/run`
- **Agents**: `data-ingestion` (loading/transformation), `data-analysis` (exploration/profiling)
- **SessionStart Hook**: Graceful PostgreSQL connection check (non-blocking warning)
- **Key Features**:
- data_ref system for DataFrame persistence across tool calls
- 100k row limit with chunking support for large datasets
- Hybrid configuration (system: `~/.config/claude/postgres.env`, project: `.env`)
- Auto-detection of dbt projects
- Arrow IPC format for efficient memory management
---
## [3.2.0] - 2026-01-24
### Added
- **git-flow:** `/commit` now detects protected branches before committing
- Warns when on protected branch (main, master, development, staging, production)
- Offers to create feature branch automatically instead of committing directly
- Configurable via `GIT_PROTECTED_BRANCHES` environment variable
- Resolves issue where commits to protected branches would fail on push
- **netbox:** Platform and primary_ip parameters added to device update tools
- **claude-config-maintainer:** Auto-enforce mandatory behavior rules via SessionStart hook
- **scripts:** `release.sh` - Versioning workflow script for consistent releases
- **scripts:** `verify-hooks.sh` - Verify all hooks are command type
### Changed
- **doc-guardian:** Hook switched from `prompt` type to `command` type
@@ -19,14 +90,24 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
- New `notify.sh` bash script guarantees exact output behavior
- Only notifies for config file changes (commands/, agents/, skills/, hooks/)
- Silent exit for all other files - no blocking possible
- **All hooks:** Stricter plugin prefix enforcement
- All prompts now mandate `[plugin-name]` prefix with "NO EXCEPTIONS" rule
- **All hooks:** Converted to command type with stricter plugin prefix enforcement
- All hooks now mandate `[plugin-name]` prefix with "NO EXCEPTIONS" rule
- Simplified output formats with word limits
- Consistent structure across projman, pr-review, code-sentinel, doc-guardian
- **CLAUDE.md:** Replaced destructive "ALWAYS CLEAR CACHE" rule with "VERIFY AND RESTART"
- Cache clearing mid-session breaks MCP tools
- Added guidance for proper plugin development workflow
### Fixed
- Protected branch workflow: Claude no longer commits directly to protected branches and then fails on push (fixes #109)
- doc-guardian hook no longer blocks workflow - switched to command hook that can't be overridden by model (fixes #110)
- **cmdb-assistant:** Complete MCP tool schemas for update operations (#138)
- **netbox:** Shorten tool names to meet 64-char API limit (#134)
- **cmdb-assistant:** Correct NetBox API URL format in setup wizard (#132)
- **gitea/projman:** Type safety for `create_label_smart`, curl-based debug-report (#124)
- **netbox:** Add diagnostic logging for JSON parse errors (#121)
- **labels:** Add duplicate check before creating labels (#116)
- **hooks:** Convert ALL hooks to command type with proper prefixes (#114)
- Protected branch workflow: Claude no longer commits directly to protected branches (fixes #109)
- doc-guardian hook no longer blocks workflow (fixes #110)
---

CLAUDE.md (112 changed lines)
View File

@@ -31,11 +31,16 @@ This file provides guidance to Claude Code when working with code in this reposi
- If user asks for output, show the OUTPUT
- **Don't interpret or summarize unless asked**
### 5. AFTER PLUGIN UPDATES - ALWAYS CLEAR CACHE
```bash
rm -rf ~/.claude/plugins/cache/leo-claude-mktplace/
./scripts/verify-hooks.sh
```
### 5. AFTER PLUGIN UPDATES - VERIFY AND RESTART
**⚠️ DO NOT clear cache mid-session** - this breaks MCP tools that are already loaded.
1. Run `./scripts/verify-hooks.sh` to check hook types
2. If changes affect MCP servers or hooks, inform the user:
> "Plugin changes require a session restart to take effect. Please restart Claude Code."
3. Cache clearing is ONLY safe **before** starting a new session (not during)
See `docs/DEBUGGING-CHECKLIST.md` for details on cache timing.
**FAILURE TO FOLLOW THESE RULES = WASTED USER TIME = UNACCEPTABLE**
@@ -45,7 +50,7 @@ rm -rf ~/.claude/plugins/cache/leo-claude-mktplace/
## Project Overview
**Repository:** leo-claude-mktplace
**Version:** 3.1.2
**Version:** 4.0.0
**Status:** Production Ready
A plugin marketplace for Claude Code containing:
@@ -60,6 +65,7 @@ A plugin marketplace for Claude Code containing:
| `code-sentinel` | Security scanning and code refactoring tools | 1.0.0 |
| `claude-config-maintainer` | CLAUDE.md optimization and maintenance | 1.0.0 |
| `cmdb-assistant` | NetBox CMDB integration for infrastructure management | 1.0.0 |
| `data-platform` | pandas, PostgreSQL, and dbt integration for data engineering | 1.0.0 |
| `project-hygiene` | Post-task cleanup automation via hooks | 0.1.0 |
## Quick Start
@@ -79,10 +85,12 @@ A plugin marketplace for Claude Code containing:
| **Setup** | `/initial-setup`, `/project-init`, `/project-sync` |
| **Sprint** | `/sprint-plan`, `/sprint-start`, `/sprint-status`, `/sprint-close` |
| **Quality** | `/review`, `/test-check`, `/test-gen` |
| **Versioning** | `/suggest-version` |
| **PR Review** | `/pr-review:initial-setup`, `/pr-review:project-init` |
| **Docs** | `/doc-audit`, `/doc-sync` |
| **Security** | `/security-scan`, `/refactor`, `/refactor-dry` |
| **Config** | `/config-analyze`, `/config-optimize` |
| **Data** | `/ingest`, `/profile`, `/schema`, `/explain`, `/lineage`, `/run` |
| **Debug** | `/debug-report`, `/debug-review` |
## Repository Structure
@@ -99,8 +107,8 @@ leo-claude-mktplace/
│ │ ├── .claude-plugin/plugin.json
│ │ ├── .mcp.json
│ │ ├── mcp-servers/gitea -> ../../../mcp-servers/gitea # SYMLINK
│ │ ├── commands/ # 13 commands (incl. setup, debug)
│ │ ├── hooks/ # SessionStart mismatch detection
│ │ ├── commands/ # 14 commands (incl. setup, debug, suggest-version)
│ │ ├── hooks/ # SessionStart: mismatch detection + sprint suggestions
│ │ ├── agents/ # 4 agents
│ │ └── skills/label-taxonomy/
│ ├── git-flow/ # Git workflow automation
@@ -114,10 +122,17 @@ leo-claude-mktplace/
│ │ ├── commands/ # 6 commands (incl. setup)
│ │ ├── hooks/ # SessionStart mismatch detection
│ │ └── agents/ # 5 agents
│ ├── clarity-assist/ # Prompt optimization (NEW v3.0.0)
│ ├── clarity-assist/ # Prompt optimization
│ │ ├── .claude-plugin/plugin.json
│ │ ├── commands/ # 2 commands
│ │ └── agents/
│ ├── data-platform/ # Data engineering (NEW v4.0.0)
│ │ ├── .claude-plugin/plugin.json
│ │ ├── .mcp.json
│ │ ├── mcp-servers/ # pandas, postgresql, dbt MCPs
│ │ ├── commands/ # 7 commands
│ │ ├── hooks/ # SessionStart PostgreSQL check
│ │ └── agents/ # 2 agents
│ ├── doc-guardian/ # Documentation drift detection
│ ├── code-sentinel/ # Security scanning & refactoring
│ ├── claude-config-maintainer/
@@ -125,7 +140,9 @@ leo-claude-mktplace/
│ └── project-hygiene/
├── scripts/
│ ├── setup.sh, post-update.sh
│   ├── validate-marketplace.sh # Marketplace compliance validation
│ ├── verify-hooks.sh # Verify all hooks are command type
│ └── check-venv.sh # Check MCP server venvs exist
└── docs/
├── CANONICAL-PATHS.md # Single source of truth for paths
└── CONFIGURATION.md # Centralized configuration guide
@@ -146,6 +163,14 @@ leo-claude-mktplace/
- **MCP server venv path**: `${CLAUDE_PLUGIN_ROOT}/mcp-servers/{name}/.venv/bin/python`
- **CLI tools forbidden** - Use MCP tools exclusively (never `tea`, `gh`, etc.)
#### ⚠️ plugin.json Format Rules (CRITICAL)
- **Hooks in separate file** - Use `hooks/hooks.json` (auto-discovered), NOT inline in plugin.json
- **NEVER reference hooks** - Don't add `"hooks": "..."` field to plugin.json at all
- **Agents auto-discover** - NEVER add `"agents": ["./agents/"]` - .md files found automatically
- **Always validate** - Run `./scripts/validate-marketplace.sh` before committing
- **Working examples:** projman, pr-review, claude-config-maintainer all use `hooks/hooks.json`
- See lesson: `lessons/patterns/plugin-manifest-validation---hooks-and-agents-format-requirements`
### Hooks (Valid Events Only)
`PreToolUse`, `PostToolUse`, `UserPromptSubmit`, `SessionStart`, `SessionEnd`, `Notification`, `Stop`, `SubagentStop`, `PreCompact`
@@ -172,12 +197,12 @@ leo-claude-mktplace/
| Category | Tools |
|----------|-------|
| Issues | `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment` |
| Labels | `get_labels`, `suggest_labels`, `create_label` |
| Milestones | `list_milestones`, `get_milestone`, `create_milestone`, `update_milestone` |
| Dependencies | `list_issue_dependencies`, `create_issue_dependency`, `get_execution_order` |
| Wiki | `list_wiki_pages`, `get_wiki_page`, `create_wiki_page`, `create_lesson`, `search_lessons` |
| **Pull Requests** | `list_pull_requests`, `get_pull_request`, `get_pr_diff`, `get_pr_comments`, `create_pr_review`, `add_pr_comment` *(NEW v3.0.0)* |
| Issues | `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment`, `aggregate_issues` |
| Labels | `get_labels`, `suggest_labels`, `create_label`, `create_label_smart` |
| Milestones | `list_milestones`, `get_milestone`, `create_milestone`, `update_milestone`, `delete_milestone` |
| Dependencies | `list_issue_dependencies`, `create_issue_dependency`, `remove_issue_dependency`, `get_execution_order` |
| Wiki | `list_wiki_pages`, `get_wiki_page`, `create_wiki_page`, `update_wiki_page`, `create_lesson`, `search_lessons` |
| **Pull Requests** | `list_pull_requests`, `get_pull_request`, `get_pr_diff`, `get_pr_comments`, `create_pr_review`, `add_pr_comment` |
| Validation | `validate_repo_org`, `get_branch_protection` |
### Hybrid Configuration
@@ -286,13 +311,56 @@ See `docs/DEBUGGING-CHECKLIST.md` for systematic troubleshooting.
- `/debug-report` - Run full diagnostics, create issue if needed
- `/debug-review` - Investigate and propose fixes
## Versioning Rules
## Versioning Workflow
- Version displayed ONLY in main `README.md` title: `# Leo Claude Marketplace - vX.Y.Z`
- `CHANGELOG.md` is authoritative for version history
- Follow [SemVer](https://semver.org/): MAJOR.MINOR.PATCH
- On release: Update README title → CHANGELOG → marketplace.json → plugin.json files
This project follows [SemVer](https://semver.org/) and [Keep a Changelog](https://keepachangelog.com).
### Version Locations (must stay in sync)
| Location | Format | Example |
|----------|--------|---------|
| Git tags | `vX.Y.Z` | `v3.2.0` |
| README.md title | `# Leo Claude Marketplace - vX.Y.Z` | `v3.2.0` |
| marketplace.json | `"version": "X.Y.Z"` | `3.2.0` |
| CHANGELOG.md | `## [X.Y.Z] - YYYY-MM-DD` | `[3.2.0] - 2026-01-24` |
### During Development
**All changes go under `[Unreleased]` in CHANGELOG.md.** Never create a versioned section until release time.
```markdown
## [Unreleased]
### Added
- New feature description
### Fixed
- Bug fix description
```
### Creating a Release
Use the release script to ensure consistency:
```bash
./scripts/release.sh 3.2.0
```
The script will:
1. Validate `[Unreleased]` section has content
2. Replace `[Unreleased]` with `[3.2.0] - YYYY-MM-DD`
3. Update README.md title
4. Update marketplace.json version
5. Commit and create git tag
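For illustration only, the five steps map roughly to the Python sketch below. The real `scripts/release.sh` is a shell script; the marketplace.json path and the exact CHANGELOG rewrite here are assumptions, not the script's actual behavior.

```python
"""Illustrative sketch of the release steps (the real scripts/release.sh is bash)."""
import datetime, json, re, subprocess, sys

def release(version: str) -> None:
    changelog = open("CHANGELOG.md").read()
    # 1. Validate that an [Unreleased] section exists.
    if "## [Unreleased]" not in changelog:
        sys.exit("No [Unreleased] section found")
    # 2. Move the Unreleased content under a new versioned heading,
    #    keeping an empty [Unreleased] section on top.
    today = datetime.date.today().isoformat()
    changelog = changelog.replace(
        "## [Unreleased]",
        f"## [Unreleased]\n\n---\n\n## [{version}] - {today}", 1)
    open("CHANGELOG.md", "w").write(changelog)
    # 3. Update the README title.
    readme = open("README.md").read()
    readme = re.sub(r"^# Leo Claude Marketplace - v.*$",
                    f"# Leo Claude Marketplace - v{version}",
                    readme, count=1, flags=re.M)
    open("README.md", "w").write(readme)
    # 4. Update marketplace.json (path assumed).
    path = ".claude-plugin/marketplace.json"
    data = json.load(open(path))
    data["metadata"]["version"] = version
    json.dump(data, open(path, "w"), indent=2)
    # 5. Commit and create the git tag.
    subprocess.run(["git", "commit", "-am", f"chore: release v{version}"], check=True)
    subprocess.run(["git", "tag", f"v{version}"], check=True)

if __name__ == "__main__":
    release(sys.argv[1])
```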
### SemVer Guidelines
| Change Type | Version Bump | Example |
|-------------|--------------|---------|
| Bug fixes only | PATCH (x.y.**Z**) | 3.1.1 → 3.1.2 |
| New features (backwards compatible) | MINOR (x.**Y**.0) | 3.1.2 → 3.2.0 |
| Breaking changes | MAJOR (**X**.0.0) | 3.2.0 → 4.0.0 |
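As a small illustration of the table above (not how `/suggest-version` is actually implemented), a bump can be derived from which sections appear under `[Unreleased]`:

```python
def suggest_bump(unreleased: str) -> str:
    """Map [Unreleased] sections to a SemVer bump per the table above (illustrative only)."""
    text = unreleased.lower()
    if "breaking" in text or "### removed" in text:   # breaking changes -> MAJOR
        return "major"
    if "### added" in text or "### changed" in text:  # new features -> MINOR
        return "minor"
    if "### fixed" in text:                           # bug fixes only -> PATCH
        return "patch"
    return "none"

# An [Unreleased] block with only a Fixed section suggests a PATCH bump.
print(suggest_bump("### Fixed\n- Bug fix description"))  # "patch"
```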
---
**Last Updated:** 2026-01-23
**Last Updated:** 2026-01-24

View File

@@ -1,4 +1,4 @@
# Leo Claude Marketplace - v3.1.1
# Leo Claude Marketplace - v4.1.0
A collection of Claude Code plugins for project management, infrastructure automation, and development workflows.
@@ -96,6 +96,21 @@ Full CRUD operations for network infrastructure management directly from Claude
**Commands:** `/initial-setup`, `/cmdb-search`, `/cmdb-device`, `/cmdb-ip`, `/cmdb-site`
### Data Engineering
#### [data-platform](./plugins/data-platform/README.md) *NEW*
**pandas, PostgreSQL/PostGIS, and dbt Integration**
Comprehensive data engineering toolkit with persistent DataFrame storage.
- 14 pandas tools with Arrow IPC data_ref system
- 10 PostgreSQL/PostGIS tools with connection pooling
- 8 dbt tools with automatic pre-validation
- 100k row limit with chunking support
- Auto-detection of dbt projects
**Commands:** `/ingest`, `/profile`, `/schema`, `/explain`, `/lineage`, `/run`
## MCP Servers
MCP servers are **shared at repository root** with **symlinks** from plugins that use them.
@@ -106,11 +121,11 @@ Full Gitea API integration for project management.
| Category | Tools |
|----------|-------|
| Issues | `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment` |
| Labels | `get_labels`, `suggest_labels`, `create_label` |
| Wiki | `list_wiki_pages`, `get_wiki_page`, `create_wiki_page`, `create_lesson`, `search_lessons` |
| Milestones | `list_milestones`, `get_milestone`, `create_milestone`, `update_milestone` |
| Dependencies | `list_issue_dependencies`, `create_issue_dependency`, `get_execution_order` |
| Issues | `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment`, `aggregate_issues` |
| Labels | `get_labels`, `suggest_labels`, `create_label`, `create_label_smart` |
| Wiki | `list_wiki_pages`, `get_wiki_page`, `create_wiki_page`, `update_wiki_page`, `create_lesson`, `search_lessons` |
| Milestones | `list_milestones`, `get_milestone`, `create_milestone`, `update_milestone`, `delete_milestone` |
| Dependencies | `list_issue_dependencies`, `create_issue_dependency`, `remove_issue_dependency`, `get_execution_order` |
| **Pull Requests** | `list_pull_requests`, `get_pull_request`, `get_pr_diff`, `get_pr_comments`, `create_pr_review`, `add_pr_comment` *(NEW in v3.0.0)* |
| Validation | `validate_repo_org`, `get_branch_protection` |
@@ -126,6 +141,17 @@ Comprehensive NetBox REST API integration for infrastructure management.
| Virtualization | Clusters, VMs, Interfaces |
| Extras | Tags, Custom Fields, Audit Log |
### Data Platform MCP Server (shared) *NEW*
pandas, PostgreSQL/PostGIS, and dbt integration for data engineering.
| Category | Tools |
|----------|-------|
| pandas | `read_csv`, `read_parquet`, `read_json`, `to_csv`, `to_parquet`, `describe`, `head`, `tail`, `filter`, `select`, `groupby`, `join`, `list_data`, `drop_data` |
| PostgreSQL | `pg_connect`, `pg_query`, `pg_execute`, `pg_tables`, `pg_columns`, `pg_schemas` |
| PostGIS | `st_tables`, `st_geometry_type`, `st_srid`, `st_extent` |
| dbt | `dbt_parse`, `dbt_run`, `dbt_test`, `dbt_build`, `dbt_compile`, `dbt_ls`, `dbt_docs_generate`, `dbt_lineage` |
## Installation
### Prerequisites
@@ -222,6 +248,7 @@ After installing plugins, the `/plugin` command may show `(no content)` - this i
| code-sentinel | `/code-sentinel:security-scan` |
| claude-config-maintainer | `/claude-config-maintainer:config-analyze` |
| cmdb-assistant | `/cmdb-assistant:cmdb-search` |
| data-platform | `/data-platform:ingest` |
## Repository Structure
@@ -231,12 +258,14 @@ leo-claude-mktplace/
│ └── marketplace.json
├── mcp-servers/ # SHARED MCP servers (v3.0.0+)
│ ├── gitea/ # Gitea MCP (issues, PRs, wiki)
│   ├── netbox/              # NetBox MCP (CMDB)
│ └── data-platform/ # Data engineering (pandas, PostgreSQL, dbt)
├── plugins/ # All plugins
│ ├── projman/ # Sprint management
│ ├── git-flow/ # Git workflow automation (NEW)
│ ├── pr-review/ # PR review (NEW)
│ ├── clarity-assist/ # Prompt optimization (NEW)
│ ├── git-flow/ # Git workflow automation
│ ├── pr-review/ # PR review
│ ├── clarity-assist/ # Prompt optimization
│ ├── data-platform/ # Data engineering (NEW)
│ ├── claude-config-maintainer/ # CLAUDE.md optimization
│ ├── cmdb-assistant/ # NetBox CMDB integration
│ ├── doc-guardian/ # Documentation drift detection
@@ -245,7 +274,8 @@ leo-claude-mktplace/
├── docs/ # Documentation
│ ├── CANONICAL-PATHS.md # Path reference
│ └── CONFIGURATION.md # Setup guide
├── scripts/                 # Setup scripts
└── CHANGELOG.md # Version history
```
## Documentation

View File

@@ -22,6 +22,7 @@ Quick reference for all commands in the Leo Claude Marketplace.
| **projman** | `/test-gen` | | X | Generate comprehensive tests for specified code |
| **projman** | `/debug-report` | | X | Run diagnostics and create structured issue in marketplace |
| **projman** | `/debug-review` | | X | Investigate diagnostic issues and propose fixes with approval gates |
| **projman** | `/suggest-version` | | X | Analyze CHANGELOG and recommend semantic version bump |
| **git-flow** | `/commit` | | X | Create commit with auto-generated conventional message |
| **git-flow** | `/commit-push` | | X | Commit and push to remote in one operation |
| **git-flow** | `/commit-merge` | | X | Commit current changes, then merge into target branch |
@@ -55,6 +56,14 @@ Quick reference for all commands in the Leo Claude Marketplace.
| **cmdb-assistant** | `/cmdb-ip` | | X | Manage IP addresses and prefixes |
| **cmdb-assistant** | `/cmdb-site` | | X | Manage sites, locations, racks, and regions |
| **project-hygiene** | *PostToolUse hook* | X | | Removes temp files, warns about unexpected root files |
| **data-platform** | `/ingest` | | X | Load data from CSV, Parquet, JSON into DataFrame |
| **data-platform** | `/profile` | | X | Generate data profiling report with statistics |
| **data-platform** | `/schema` | | X | Explore database schemas, tables, columns |
| **data-platform** | `/explain` | | X | Explain query execution plan |
| **data-platform** | `/lineage` | | X | Show dbt model lineage and dependencies |
| **data-platform** | `/run` | | X | Run dbt models with validation |
| **data-platform** | `/initial-setup` | | X | Setup wizard for data-platform MCP servers |
| **data-platform** | *SessionStart hook* | X | | Checks PostgreSQL connection (non-blocking warning) |
---
@@ -62,12 +71,13 @@ Quick reference for all commands in the Leo Claude Marketplace.
| Category | Plugins | Primary Use |
|----------|---------|-------------|
| **Setup** | projman, pr-review, cmdb-assistant | `/initial-setup`, `/project-init` |
| **Setup** | projman, pr-review, cmdb-assistant, data-platform | `/initial-setup`, `/project-init` |
| **Task Planning** | projman, clarity-assist | Sprint management, requirement clarification |
| **Code Quality** | code-sentinel, pr-review | Security scanning, PR reviews |
| **Documentation** | doc-guardian, claude-config-maintainer | Doc sync, CLAUDE.md maintenance |
| **Git Operations** | git-flow | Commits, branches, workflow automation |
| **Infrastructure** | cmdb-assistant | NetBox CMDB management |
| **Data Engineering** | data-platform | pandas, PostgreSQL, dbt operations |
| **Maintenance** | project-hygiene | Automatic cleanup |
---
@@ -76,11 +86,12 @@ Quick reference for all commands in the Leo Claude Marketplace.
| Plugin | Hook Event | Behavior |
|--------|------------|----------|
| **projman** | SessionStart | Checks git remote vs .env; warns if mismatch detected |
| **projman** | SessionStart | Checks git remote vs .env; warns if mismatch detected; suggests sprint planning if issues exist |
| **pr-review** | SessionStart | Checks git remote vs .env; warns if mismatch detected |
| **doc-guardian** | PostToolUse (Write/Edit) | Silently tracks documentation drift |
| **doc-guardian** | PostToolUse (Write/Edit) | Tracks documentation drift; auto-updates dependent docs |
| **code-sentinel** | PreToolUse (Write/Edit) | Scans for security issues; blocks critical vulnerabilities |
| **project-hygiene** | PostToolUse (Write/Edit) | Cleans temp files, warns about misplaced files |
| **data-platform** | SessionStart | Checks PostgreSQL connection; non-blocking warning if unavailable |
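The data-platform MCP server exposes a `check_postgres_connection()` helper in its `config.py`; a minimal sketch of what such a non-blocking SessionStart hook script could look like (module path and output wording are assumptions):

```python
#!/usr/bin/env python3
# Sketch of a non-blocking SessionStart connectivity check.
import sys
from mcp_server.config import check_postgres_connection  # module path assumed

status = check_postgres_connection()
if not status["connected"]:
    # Warn only - never fail the session over a missing database.
    print(f"[data-platform] {status['message']}")
sys.exit(0)
```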
---
@@ -162,6 +173,19 @@ Managing infrastructure with CMDB:
4. /cmdb-site view Y # Check site info
```
### Example 6b: Data Engineering Workflow
Working with data pipelines:
```
1. /ingest file.csv # Load data into DataFrame
2. /profile # Generate data profiling report
3. /schema # Explore database schemas
4. /lineage model_name # View dbt model dependencies
5. /run model_name # Execute dbt models
6. /explain "SELECT ..." # Analyze query execution plan
```
### Example 7: First-Time Setup (New Machine)
Setting up the marketplace for the first time:
@@ -209,9 +233,10 @@ Some plugins require MCP server connectivity:
| projman | Gitea | Issues, PRs, wiki, labels, milestones |
| pr-review | Gitea | PR operations and reviews |
| cmdb-assistant | NetBox | Infrastructure CMDB |
| data-platform | pandas, PostgreSQL, dbt | DataFrames, database queries, dbt builds |
Ensure credentials are configured in `~/.config/claude/gitea.env` or `~/.config/claude/netbox.env`.
Ensure credentials are configured in `~/.config/claude/gitea.env`, `~/.config/claude/netbox.env`, or `~/.config/claude/postgres.env`.
---
*Last Updated: 2026-01-22*
*Last Updated: 2026-01-25*

View File

@@ -197,6 +197,51 @@ echo -e "\n=== Config Files ==="
---
## Cache Clearing: When It's Safe vs Destructive
**⚠️ CRITICAL: Never clear plugin cache mid-session.**
### Why Cache Clearing Breaks MCP Tools
When Claude Code starts a session:
1. MCP tools are loaded from the cache directory
2. Tool definitions include **absolute paths** to the venv (e.g., `~/.claude/plugins/cache/.../venv/`)
3. These paths are cached in the session memory
4. Deleting the cache removes the venv, but the session still references the old paths
5. Any MCP tool making HTTP requests fails with TLS certificate errors
### When Cache Clearing is SAFE
| Scenario | Safe? | Action |
|----------|-------|--------|
| Before starting Claude Code | ✅ Yes | Clear cache, then start session |
| Between sessions | ✅ Yes | Clear cache after `/exit`, before next session |
| During a session | ❌ NO | Never - will break MCP tools |
| After plugin source edits | ❌ NO | Restart session instead |
### Recovery: MCP Tools Broken Mid-Session
If you accidentally cleared cache during a session and MCP tools fail:
```
Error: Could not find a suitable TLS CA certificate bundle, invalid path:
/home/.../.claude/plugins/cache/.../certifi/cacert.pem
```
**Fix:**
1. Exit the current session (`/exit` or Ctrl+C)
2. Start a new Claude Code session
3. MCP tools will reload from the reinstalled cache
### Correct Workflow for Plugin Development
1. Make changes to plugin source files
2. Run `./scripts/verify-hooks.sh` (verifies hook types)
3. Tell user: "Please restart Claude Code for changes to take effect"
4. **Do NOT clear cache** - session restart handles reloading
---
## Automated Diagnostics
Use these commands for automated checking:

View File

@@ -0,0 +1,131 @@
# Data Platform MCP Server
MCP Server providing pandas, PostgreSQL/PostGIS, and dbt tools for Claude Code.
## Features
- **pandas Tools**: DataFrame operations with Arrow IPC data_ref persistence
- **PostgreSQL Tools**: Database queries with asyncpg connection pooling
- **PostGIS Tools**: Spatial data operations
- **dbt Tools**: Build tool wrapper with pre-execution validation
## Installation
```bash
cd mcp-servers/data-platform
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
## Configuration
### System-Level (PostgreSQL credentials)
Create `~/.config/claude/postgres.env`:
```env
POSTGRES_URL=postgresql://user:password@host:5432/database
```
### Project-Level (dbt paths)
Create `.env` in your project root:
```env
DBT_PROJECT_DIR=/path/to/dbt/project
DBT_PROFILES_DIR=/path/to/.dbt
DATA_PLATFORM_MAX_ROWS=100000
```
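Tools resolve this layered configuration through the server's `load_config()` helper in `config.py`; a minimal usage sketch, assuming the package is importable as `mcp_server` (the MCP tool layer normally does this for you):

```python
from mcp_server.config import load_config

cfg = load_config()
# Project-level .env values override the system-level postgres.env;
# dbt_project_dir is auto-detected from dbt_project.yml when not set.
print(cfg["postgres_available"])   # False when POSTGRES_URL is not configured
print(cfg["dbt_project_dir"])      # e.g. "/path/to/dbt/project" or None
print(cfg["max_rows"])             # 100000 unless DATA_PLATFORM_MAX_ROWS overrides it
```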
## Tools
### pandas Tools (14 tools)
| Tool | Description |
|------|-------------|
| `read_csv` | Load CSV file into DataFrame |
| `read_parquet` | Load Parquet file into DataFrame |
| `read_json` | Load JSON/JSONL file into DataFrame |
| `to_csv` | Export DataFrame to CSV file |
| `to_parquet` | Export DataFrame to Parquet file |
| `describe` | Get statistical summary of DataFrame |
| `head` | Get first N rows of DataFrame |
| `tail` | Get last N rows of DataFrame |
| `filter` | Filter DataFrame rows by condition |
| `select` | Select specific columns from DataFrame |
| `groupby` | Group DataFrame and aggregate |
| `join` | Join two DataFrames |
| `list_data` | List all stored DataFrames |
| `drop_data` | Remove a DataFrame from storage |
### PostgreSQL Tools (6 tools)
| Tool | Description |
|------|-------------|
| `pg_connect` | Test connection and return status |
| `pg_query` | Execute SELECT, return as data_ref |
| `pg_execute` | Execute INSERT/UPDATE/DELETE |
| `pg_tables` | List all tables in schema |
| `pg_columns` | Get column info for table |
| `pg_schemas` | List all schemas |
### PostGIS Tools (4 tools)
| Tool | Description |
|------|-------------|
| `st_tables` | List PostGIS-enabled tables |
| `st_geometry_type` | Get geometry type of column |
| `st_srid` | Get SRID of geometry column |
| `st_extent` | Get bounding box of geometries |
### dbt Tools (8 tools)
| Tool | Description |
|------|-------------|
| `dbt_parse` | Validate project (pre-execution) |
| `dbt_run` | Run models with selection |
| `dbt_test` | Run tests |
| `dbt_build` | Run + test |
| `dbt_compile` | Compile SQL without executing |
| `dbt_ls` | List resources |
| `dbt_docs_generate` | Generate documentation |
| `dbt_lineage` | Get model dependencies |
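The run/build tools always call `dbt_parse` first; a hedged sketch of driving them directly from Python, assuming the module path `mcp_server.dbt_tools`:

```python
import asyncio
from mcp_server.dbt_tools import DbtTools  # module path assumed

async def main():
    dbt = DbtTools()
    # Pre-flight validation catches dbt 1.9+ deprecations and compile errors.
    parsed = await dbt.dbt_parse()
    if not parsed.get("valid"):
        print(parsed.get("errors"), parsed.get("suggestion"))
        return
    # dbt_run re-validates internally before executing the selected models.
    result = await dbt.dbt_run(select="my_model")  # "my_model" is a placeholder
    print(result.get("success"), result.get("stdout", "")[:200])

asyncio.run(main())
```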
## data_ref System
All DataFrame operations use a `data_ref` system to persist data across tool calls:
1. **Load data**: Returns a `data_ref` string (e.g., `"df_a1b2c3d4"`)
2. **Use data_ref**: Pass to other tools (filter, join, export)
3. **List data**: Use `list_data` to see all stored DataFrames
4. **Clean up**: Use `drop_data` when done
### Example Flow
```
read_csv("data.csv") → {"data_ref": "sales_data", "rows": 1000}
filter("sales_data", "amount > 100") → {"data_ref": "sales_data_filtered"}
describe("sales_data_filtered") → {statistics}
to_parquet("sales_data_filtered", "output.parquet") → {success}
```
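The registry behind this flow is the `DataStore` class in `data_store.py`; a small Python-level sketch of the same steps, with a made-up DataFrame standing in for `data.csv`:

```python
import pandas as pd
from mcp_server.data_store import DataStore  # module path assumed

store = DataStore.get_instance()

# "Load data" - storing a DataFrame returns a data_ref string.
df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [50, 120, 300]})
ref = store.store(df, name="sales_data", source="example")

# "Use data_ref" - retrieve as pandas, transform, store the result under a new ref.
filtered = store.get_pandas(ref)
filtered = filtered[filtered["amount"] > 100]
filtered_ref = store.store(filtered, name="sales_data_filtered")

# "List data" / "Clean up"
print(store.list_refs())   # metadata: rows, columns, memory_mb, source
store.drop(filtered_ref)
store.drop(ref)
```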
## Memory Management
- Default row limit: 100,000 rows per DataFrame
- Configure via `DATA_PLATFORM_MAX_ROWS` environment variable
- Use chunked processing for large files (`chunk_size` parameter)
- Monitor with `list_data` tool (shows memory usage)
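A hedged sketch of how the limit surfaces, using `check_row_limit` from the registry and plain pandas chunking (the MCP tools' exact `chunk_size` handling may differ; file and column names below are placeholders):

```python
import pandas as pd
from mcp_server.data_store import DataStore  # module path assumed

store = DataStore.get_instance()
store.set_max_rows(100_000)

# check_row_limit reports when a load would exceed the configured limit.
print(store.check_row_limit(250_000))
# {'exceeded': True, 'message': 'Row count (250,000) exceeds limit (100,000)', ...}

# One way to stay under the limit: read in chunks and keep only what you need.
kept = []
for chunk in pd.read_csv("large_file.csv", chunksize=50_000):
    kept.append(chunk[chunk["amount"] > 100])
subset = pd.concat(kept, ignore_index=True)
store.store(subset, name="large_file_filtered")
```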
## Running
```bash
python -m mcp_server.server
```
## Development
```bash
pip install -e ".[dev]"
pytest
```

View File

@@ -0,0 +1,7 @@
"""
Data Platform MCP Server.
Provides pandas, PostgreSQL/PostGIS, and dbt tools to Claude Code via MCP.
"""
__version__ = "1.0.0"

View File

@@ -0,0 +1,195 @@
"""
Configuration loader for Data Platform MCP Server.
Implements hybrid configuration system:
- System-level: ~/.config/claude/postgres.env (credentials)
- Project-level: .env (dbt project paths, overrides)
- Auto-detection: dbt_project.yml discovery
"""
from pathlib import Path
from dotenv import load_dotenv
import os
import logging
from typing import Any, Dict, Optional
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class DataPlatformConfig:
"""Hybrid configuration loader for data platform tools"""
def __init__(self):
self.postgres_url: Optional[str] = None
self.dbt_project_dir: Optional[str] = None
self.dbt_profiles_dir: Optional[str] = None
self.max_rows: int = 100_000
def load(self) -> Dict[str, Optional[str]]:
"""
Load configuration from system and project levels.
Returns:
Dict containing postgres_url, dbt_project_dir, dbt_profiles_dir, max_rows
Note:
PostgreSQL credentials are optional - server can run in pandas-only mode.
"""
# Load system config (PostgreSQL credentials)
system_config = Path.home() / '.config' / 'claude' / 'postgres.env'
if system_config.exists():
load_dotenv(system_config)
logger.info(f"Loaded system configuration from {system_config}")
else:
logger.info(
f"System config not found: {system_config} - "
"PostgreSQL tools will be unavailable"
)
# Find project directory
project_dir = self._find_project_directory()
# Load project config (overrides system)
if project_dir:
project_config = project_dir / '.env'
if project_config.exists():
load_dotenv(project_config, override=True)
logger.info(f"Loaded project configuration from {project_config}")
# Extract values
self.postgres_url = os.getenv('POSTGRES_URL')
self.dbt_project_dir = os.getenv('DBT_PROJECT_DIR')
self.dbt_profiles_dir = os.getenv('DBT_PROFILES_DIR')
self.max_rows = int(os.getenv('DATA_PLATFORM_MAX_ROWS', '100000'))
# Auto-detect dbt project if not specified
if not self.dbt_project_dir and project_dir:
self.dbt_project_dir = self._find_dbt_project(project_dir)
if self.dbt_project_dir:
logger.info(f"Auto-detected dbt project: {self.dbt_project_dir}")
# Default dbt profiles dir to ~/.dbt
if not self.dbt_profiles_dir:
default_profiles = Path.home() / '.dbt'
if default_profiles.exists():
self.dbt_profiles_dir = str(default_profiles)
return {
'postgres_url': self.postgres_url,
'dbt_project_dir': self.dbt_project_dir,
'dbt_profiles_dir': self.dbt_profiles_dir,
'max_rows': self.max_rows,
'postgres_available': self.postgres_url is not None,
'dbt_available': self.dbt_project_dir is not None
}
def _find_project_directory(self) -> Optional[Path]:
"""
Find the user's project directory.
Returns:
Path to project directory, or None if not found
"""
# Strategy 1: Check CLAUDE_PROJECT_DIR environment variable
project_dir = os.getenv('CLAUDE_PROJECT_DIR')
if project_dir:
path = Path(project_dir)
if path.exists():
logger.info(f"Found project directory from CLAUDE_PROJECT_DIR: {path}")
return path
# Strategy 2: Check PWD
pwd = os.getenv('PWD')
if pwd:
path = Path(pwd)
if path.exists() and (
(path / '.git').exists() or
(path / '.env').exists() or
(path / 'dbt_project.yml').exists()
):
logger.info(f"Found project directory from PWD: {path}")
return path
# Strategy 3: Check current working directory
cwd = Path.cwd()
if (cwd / '.git').exists() or (cwd / '.env').exists() or (cwd / 'dbt_project.yml').exists():
logger.info(f"Found project directory from cwd: {cwd}")
return cwd
logger.debug("Could not determine project directory")
return None
def _find_dbt_project(self, start_dir: Path) -> Optional[str]:
"""
Find dbt_project.yml in the project or its subdirectories.
Args:
start_dir: Directory to start searching from
Returns:
Path to dbt project directory, or None if not found
"""
# Check root
if (start_dir / 'dbt_project.yml').exists():
return str(start_dir)
# Check common subdirectories
for subdir in ['dbt', 'transform', 'analytics', 'models']:
candidate = start_dir / subdir
if (candidate / 'dbt_project.yml').exists():
return str(candidate)
# Search one level deep
for item in start_dir.iterdir():
if item.is_dir() and not item.name.startswith('.'):
if (item / 'dbt_project.yml').exists():
return str(item)
return None
def load_config() -> Dict[str, Optional[str]]:
"""
Convenience function to load configuration.
Returns:
Configuration dictionary
"""
config = DataPlatformConfig()
return config.load()
def check_postgres_connection() -> Dict[str, Any]:
"""
Check PostgreSQL connection status for SessionStart hook.
Returns:
Dict with connection status and message
"""
import asyncio
config = load_config()
if not config.get('postgres_url'):
return {
'connected': False,
'message': 'PostgreSQL not configured (POSTGRES_URL not set)'
}
async def test_connection():
try:
import asyncpg
conn = await asyncpg.connect(config['postgres_url'], timeout=5)
version = await conn.fetchval('SELECT version()')
await conn.close()
return {
'connected': True,
'message': f'Connected to PostgreSQL',
'version': version.split(',')[0] if version else 'Unknown'
}
except Exception as e:
return {
'connected': False,
'message': f'PostgreSQL connection failed: {str(e)}'
}
return asyncio.run(test_connection())

View File

@@ -0,0 +1,219 @@
"""
Arrow IPC DataFrame Registry.
Provides persistent storage for DataFrames across tool calls using Apache Arrow
for efficient memory management and serialization.
"""
import pyarrow as pa
import pandas as pd
import uuid
import logging
from typing import Dict, Optional, List, Union
from dataclasses import dataclass
from datetime import datetime
logger = logging.getLogger(__name__)
@dataclass
class DataFrameInfo:
"""Metadata about a stored DataFrame"""
ref: str
rows: int
columns: int
column_names: List[str]
dtypes: Dict[str, str]
memory_bytes: int
created_at: datetime
source: Optional[str] = None
class DataStore:
"""
Singleton registry for Arrow Tables (DataFrames).
Uses Arrow IPC format for efficient memory usage and supports
data_ref based retrieval across multiple tool calls.
"""
_instance = None
_dataframes: Dict[str, pa.Table] = {}
_metadata: Dict[str, DataFrameInfo] = {}
_max_rows: int = 100_000
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
cls._dataframes = {}
cls._metadata = {}
return cls._instance
@classmethod
def get_instance(cls) -> 'DataStore':
"""Get the singleton instance"""
if cls._instance is None:
cls._instance = cls()
return cls._instance
@classmethod
def set_max_rows(cls, max_rows: int):
"""Set the maximum rows limit"""
cls._max_rows = max_rows
def store(
self,
data: Union[pa.Table, pd.DataFrame],
name: Optional[str] = None,
source: Optional[str] = None
) -> str:
"""
Store a DataFrame and return its reference.
Args:
data: Arrow Table or pandas DataFrame
name: Optional name for the reference (auto-generated if not provided)
source: Optional source description (e.g., file path, query)
Returns:
data_ref string to retrieve the DataFrame later
"""
# Convert pandas to Arrow if needed
if isinstance(data, pd.DataFrame):
table = pa.Table.from_pandas(data)
else:
table = data
# Generate reference
data_ref = name or f"df_{uuid.uuid4().hex[:8]}"
# Ensure unique reference
if data_ref in self._dataframes and name is None:
data_ref = f"{data_ref}_{uuid.uuid4().hex[:4]}"
# Store table
self._dataframes[data_ref] = table
# Store metadata
schema = table.schema
self._metadata[data_ref] = DataFrameInfo(
ref=data_ref,
rows=table.num_rows,
columns=table.num_columns,
column_names=[f.name for f in schema],
dtypes={f.name: str(f.type) for f in schema},
memory_bytes=table.nbytes,
created_at=datetime.now(),
source=source
)
logger.info(f"Stored DataFrame '{data_ref}': {table.num_rows} rows, {table.num_columns} cols")
return data_ref
def get(self, data_ref: str) -> Optional[pa.Table]:
"""
Retrieve an Arrow Table by reference.
Args:
data_ref: Reference string from store()
Returns:
Arrow Table or None if not found
"""
return self._dataframes.get(data_ref)
def get_pandas(self, data_ref: str) -> Optional[pd.DataFrame]:
"""
Retrieve a DataFrame as pandas.
Args:
data_ref: Reference string from store()
Returns:
pandas DataFrame or None if not found
"""
table = self.get(data_ref)
if table is not None:
return table.to_pandas()
return None
def get_info(self, data_ref: str) -> Optional[DataFrameInfo]:
"""
Get metadata about a stored DataFrame.
Args:
data_ref: Reference string
Returns:
DataFrameInfo or None if not found
"""
return self._metadata.get(data_ref)
def list_refs(self) -> List[Dict]:
"""
List all stored DataFrame references with metadata.
Returns:
List of dicts with ref, rows, columns, memory info
"""
result = []
for ref, info in self._metadata.items():
result.append({
'ref': ref,
'rows': info.rows,
'columns': info.columns,
'column_names': info.column_names,
'memory_mb': round(info.memory_bytes / (1024 * 1024), 2),
'source': info.source,
'created_at': info.created_at.isoformat()
})
return result
def drop(self, data_ref: str) -> bool:
"""
Remove a DataFrame from the store.
Args:
data_ref: Reference string
Returns:
True if removed, False if not found
"""
if data_ref in self._dataframes:
del self._dataframes[data_ref]
del self._metadata[data_ref]
logger.info(f"Dropped DataFrame '{data_ref}'")
return True
return False
def clear(self):
"""Remove all stored DataFrames"""
count = len(self._dataframes)
self._dataframes.clear()
self._metadata.clear()
logger.info(f"Cleared {count} DataFrames from store")
def total_memory_bytes(self) -> int:
"""Get total memory used by all stored DataFrames"""
return sum(info.memory_bytes for info in self._metadata.values())
def total_memory_mb(self) -> float:
"""Get total memory in MB"""
return round(self.total_memory_bytes() / (1024 * 1024), 2)
def check_row_limit(self, row_count: int) -> Dict:
"""
Check if row count exceeds limit.
Args:
row_count: Number of rows
Returns:
Dict with 'exceeded' bool and 'message' if exceeded
"""
if row_count > self._max_rows:
return {
'exceeded': True,
'message': f"Row count ({row_count:,}) exceeds limit ({self._max_rows:,})",
'suggestion': f"Use chunked processing or filter data first",
'limit': self._max_rows
}
return {'exceeded': False}

View File

@@ -0,0 +1,387 @@
"""
dbt MCP Tools.
Provides dbt CLI wrapper with pre-execution validation.
"""
import subprocess
import json
import logging
import os
from pathlib import Path
from typing import Dict, List, Optional, Any
from .config import load_config
logger = logging.getLogger(__name__)
class DbtTools:
"""dbt CLI wrapper tools with pre-validation"""
def __init__(self):
self.config = load_config()
self.project_dir = self.config.get('dbt_project_dir')
self.profiles_dir = self.config.get('dbt_profiles_dir')
def _get_dbt_command(self, cmd: List[str]) -> List[str]:
"""Build dbt command with project and profiles directories"""
base = ['dbt']
if self.project_dir:
base.extend(['--project-dir', self.project_dir])
if self.profiles_dir:
base.extend(['--profiles-dir', self.profiles_dir])
base.extend(cmd)
return base
def _run_dbt(
self,
cmd: List[str],
timeout: int = 300,
capture_json: bool = False
) -> Dict:
"""
Run dbt command and return result.
Args:
cmd: dbt subcommand and arguments
timeout: Command timeout in seconds
capture_json: If True, parse JSON output
Returns:
Dict with command result
"""
if not self.project_dir:
return {
'error': 'dbt project not found',
'suggestion': 'Set DBT_PROJECT_DIR in project .env or ensure dbt_project.yml exists'
}
full_cmd = self._get_dbt_command(cmd)
logger.info(f"Running: {' '.join(full_cmd)}")
try:
env = os.environ.copy()
# Disable dbt analytics/tracking
env['DBT_SEND_ANONYMOUS_USAGE_STATS'] = 'false'
result = subprocess.run(
full_cmd,
capture_output=True,
text=True,
timeout=timeout,
cwd=self.project_dir,
env=env
)
output = {
'success': result.returncode == 0,
'command': ' '.join(cmd),
'stdout': result.stdout,
'stderr': result.stderr if result.returncode != 0 else None
}
if capture_json and result.returncode == 0:
try:
output['data'] = json.loads(result.stdout)
except json.JSONDecodeError:
pass
return output
except subprocess.TimeoutExpired:
return {
'error': f'Command timed out after {timeout}s',
'command': ' '.join(cmd)
}
except FileNotFoundError:
return {
'error': 'dbt not found in PATH',
'suggestion': 'Install dbt: pip install dbt-core dbt-postgres'
}
except Exception as e:
logger.error(f"dbt command failed: {e}")
return {'error': str(e)}
async def dbt_parse(self) -> Dict:
"""
Validate dbt project without executing (pre-flight check).
Returns:
Dict with validation result and any errors
"""
result = self._run_dbt(['parse'])
# Check if _run_dbt returned an error (e.g., project not found, timeout, dbt not installed)
if 'error' in result:
return result
if not result.get('success'):
# Extract useful error info from stderr
stderr = result.get('stderr', '') or result.get('stdout', '')
errors = []
# Look for common dbt 1.9+ deprecation warnings
if 'deprecated' in stderr.lower():
errors.append({
'type': 'deprecation',
'message': 'Deprecated syntax found - check dbt 1.9+ migration guide'
})
# Look for compilation errors
if 'compilation error' in stderr.lower():
errors.append({
'type': 'compilation',
'message': 'SQL compilation error - check model syntax'
})
return {
'valid': False,
'errors': errors,
'details': stderr[:2000] if stderr else None,
'suggestion': 'Fix issues before running dbt models'
}
return {
'valid': True,
'message': 'dbt project validation passed'
}
async def dbt_run(
self,
select: Optional[str] = None,
exclude: Optional[str] = None,
full_refresh: bool = False
) -> Dict:
"""
Run dbt models with pre-validation.
Args:
select: Model selection (e.g., "model_name", "+model_name", "tag:daily")
exclude: Models to exclude
full_refresh: If True, rebuild incremental models
Returns:
Dict with run result
"""
# ALWAYS validate first
parse_result = await self.dbt_parse()
if not parse_result.get('valid'):
return {
'error': 'Pre-validation failed',
**parse_result
}
cmd = ['run']
if select:
cmd.extend(['--select', select])
if exclude:
cmd.extend(['--exclude', exclude])
if full_refresh:
cmd.append('--full-refresh')
return self._run_dbt(cmd)
async def dbt_test(
self,
select: Optional[str] = None,
exclude: Optional[str] = None
) -> Dict:
"""
Run dbt tests.
Args:
select: Test selection
exclude: Tests to exclude
Returns:
Dict with test results
"""
cmd = ['test']
if select:
cmd.extend(['--select', select])
if exclude:
cmd.extend(['--exclude', exclude])
return self._run_dbt(cmd)
async def dbt_build(
self,
select: Optional[str] = None,
exclude: Optional[str] = None,
full_refresh: bool = False
) -> Dict:
"""
Run dbt build (run + test) with pre-validation.
Args:
select: Model/test selection
exclude: Resources to exclude
full_refresh: If True, rebuild incremental models
Returns:
Dict with build result
"""
# ALWAYS validate first
parse_result = await self.dbt_parse()
if not parse_result.get('valid'):
return {
'error': 'Pre-validation failed',
**parse_result
}
cmd = ['build']
if select:
cmd.extend(['--select', select])
if exclude:
cmd.extend(['--exclude', exclude])
if full_refresh:
cmd.append('--full-refresh')
return self._run_dbt(cmd)
async def dbt_compile(
self,
select: Optional[str] = None
) -> Dict:
"""
Compile dbt models to SQL without executing.
Args:
select: Model selection
Returns:
Dict with compiled SQL info
"""
cmd = ['compile']
if select:
cmd.extend(['--select', select])
return self._run_dbt(cmd)
async def dbt_ls(
self,
select: Optional[str] = None,
resource_type: Optional[str] = None,
output: str = 'name'
) -> Dict:
"""
List dbt resources.
Args:
select: Resource selection
resource_type: Filter by type (model, test, seed, snapshot, source)
output: Output format ('name', 'path', 'json')
Returns:
Dict with list of resources
"""
cmd = ['ls', '--output', output]
if select:
cmd.extend(['--select', select])
if resource_type:
cmd.extend(['--resource-type', resource_type])
result = self._run_dbt(cmd)
if result.get('success') and result.get('stdout'):
lines = [l.strip() for l in result['stdout'].split('\n') if l.strip()]
result['resources'] = lines
result['count'] = len(lines)
return result
async def dbt_docs_generate(self) -> Dict:
"""
Generate dbt documentation.
Returns:
Dict with generation result
"""
result = self._run_dbt(['docs', 'generate'])
if result.get('success') and self.project_dir:
# Check for generated catalog
catalog_path = Path(self.project_dir) / 'target' / 'catalog.json'
manifest_path = Path(self.project_dir) / 'target' / 'manifest.json'
result['catalog_generated'] = catalog_path.exists()
result['manifest_generated'] = manifest_path.exists()
return result
async def dbt_lineage(self, model: str) -> Dict:
"""
Get model dependencies and lineage.
Args:
model: Model name to analyze
Returns:
Dict with upstream and downstream dependencies
"""
if not self.project_dir:
return {'error': 'dbt project not found'}
manifest_path = Path(self.project_dir) / 'target' / 'manifest.json'
# Generate manifest if not exists
if not manifest_path.exists():
compile_result = await self.dbt_compile(select=model)
if not compile_result.get('success'):
return {
'error': 'Failed to compile manifest',
'details': compile_result
}
if not manifest_path.exists():
return {
'error': 'Manifest not found',
'suggestion': 'Run dbt compile first'
}
try:
with open(manifest_path) as f:
manifest = json.load(f)
# Find the model node
model_key = None
for key in manifest.get('nodes', {}):
if key.endswith(f'.{model}') or manifest['nodes'][key].get('name') == model:
model_key = key
break
if not model_key:
return {
'error': f'Model not found: {model}',
'available_models': [
n.get('name') for n in manifest.get('nodes', {}).values()
if n.get('resource_type') == 'model'
][:20]
}
node = manifest['nodes'][model_key]
# Get upstream (depends_on)
upstream = node.get('depends_on', {}).get('nodes', [])
# Get downstream (find nodes that depend on this one)
downstream = []
for key, other_node in manifest.get('nodes', {}).items():
deps = other_node.get('depends_on', {}).get('nodes', [])
if model_key in deps:
downstream.append(key)
return {
'model': model,
'unique_id': model_key,
'materialization': node.get('config', {}).get('materialized'),
'schema': node.get('schema'),
'database': node.get('database'),
'upstream': upstream,
'downstream': downstream,
'description': node.get('description'),
'tags': node.get('tags', [])
}
except Exception as e:
logger.error(f"dbt_lineage failed: {e}")
return {'error': str(e)}
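The lineage helper above reads upstream edges straight from each node's depends_on.nodes list and derives downstream edges by scanning every other node for a reverse reference. A minimal standalone sketch of that traversal, using a hand-built manifest dict (hypothetical data, not from a real project):

# Sketch of the upstream/downstream resolution used in dbt_lineage (hypothetical manifest).
manifest = {
    'nodes': {
        'model.demo.stg_orders': {'name': 'stg_orders', 'depends_on': {'nodes': []}},
        'model.demo.dim_customers': {'name': 'dim_customers', 'depends_on': {'nodes': ['model.demo.stg_orders']}},
        'model.demo.fct_orders': {'name': 'fct_orders', 'depends_on': {'nodes': ['model.demo.dim_customers']}},
    }
}
target = 'model.demo.dim_customers'
upstream = manifest['nodes'][target]['depends_on']['nodes']
downstream = [
    key for key, node in manifest['nodes'].items()
    if target in node.get('depends_on', {}).get('nodes', [])
]
print(upstream)    # ['model.demo.stg_orders']
print(downstream)  # ['model.demo.fct_orders']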


@@ -0,0 +1,500 @@
"""
pandas MCP Tools.
Provides DataFrame operations with Arrow IPC data_ref persistence.
"""
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import json
import logging
from pathlib import Path
from typing import Dict, List, Optional, Any, Union
from .data_store import DataStore
from .config import load_config
logger = logging.getLogger(__name__)
class PandasTools:
"""pandas data manipulation tools with data_ref persistence"""
def __init__(self):
self.store = DataStore.get_instance()
config = load_config()
self.max_rows = config.get('max_rows', 100_000)
self.store.set_max_rows(self.max_rows)
def _check_and_store(
self,
df: pd.DataFrame,
name: Optional[str] = None,
source: Optional[str] = None
) -> Dict:
"""Check row limit and store DataFrame if within limits"""
check = self.store.check_row_limit(len(df))
if check['exceeded']:
return {
'error': 'row_limit_exceeded',
**check,
'preview': df.head(100).to_dict(orient='records')
}
data_ref = self.store.store(df, name=name, source=source)
return {
'data_ref': data_ref,
'rows': len(df),
'columns': list(df.columns),
'dtypes': {col: str(dtype) for col, dtype in df.dtypes.items()}
}
async def read_csv(
self,
file_path: str,
name: Optional[str] = None,
chunk_size: Optional[int] = None,
**kwargs
) -> Dict:
"""
Load CSV file into DataFrame.
Args:
file_path: Path to CSV file
name: Optional name for data_ref
chunk_size: If provided, process in chunks
**kwargs: Additional pandas read_csv arguments
Returns:
Dict with data_ref or error info
"""
path = Path(file_path)
if not path.exists():
return {'error': f'File not found: {file_path}'}
try:
if chunk_size:
# Chunked processing - return iterator info
chunks = []
for i, chunk in enumerate(pd.read_csv(path, chunksize=chunk_size, **kwargs)):
chunk_ref = self.store.store(chunk, name=f"{name or 'chunk'}_{i}", source=file_path)
chunks.append({'ref': chunk_ref, 'rows': len(chunk)})
return {
'chunked': True,
'chunks': chunks,
'total_chunks': len(chunks)
}
df = pd.read_csv(path, **kwargs)
return self._check_and_store(df, name=name, source=file_path)
except Exception as e:
logger.error(f"read_csv failed: {e}")
return {'error': str(e)}
async def read_parquet(
self,
file_path: str,
name: Optional[str] = None,
columns: Optional[List[str]] = None
) -> Dict:
"""
Load Parquet file into DataFrame.
Args:
file_path: Path to Parquet file
name: Optional name for data_ref
columns: Optional list of columns to load
Returns:
Dict with data_ref or error info
"""
path = Path(file_path)
if not path.exists():
return {'error': f'File not found: {file_path}'}
try:
table = pq.read_table(path, columns=columns)
df = table.to_pandas()
return self._check_and_store(df, name=name, source=file_path)
except Exception as e:
logger.error(f"read_parquet failed: {e}")
return {'error': str(e)}
async def read_json(
self,
file_path: str,
name: Optional[str] = None,
lines: bool = False,
**kwargs
) -> Dict:
"""
Load JSON/JSONL file into DataFrame.
Args:
file_path: Path to JSON file
name: Optional name for data_ref
lines: If True, read as JSON Lines format
**kwargs: Additional pandas read_json arguments
Returns:
Dict with data_ref or error info
"""
path = Path(file_path)
if not path.exists():
return {'error': f'File not found: {file_path}'}
try:
df = pd.read_json(path, lines=lines, **kwargs)
return self._check_and_store(df, name=name, source=file_path)
except Exception as e:
logger.error(f"read_json failed: {e}")
return {'error': str(e)}
async def to_csv(
self,
data_ref: str,
file_path: str,
index: bool = False,
**kwargs
) -> Dict:
"""
Export DataFrame to CSV file.
Args:
data_ref: Reference to stored DataFrame
file_path: Output file path
index: Whether to include index
**kwargs: Additional pandas to_csv arguments
Returns:
Dict with success status
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
df.to_csv(file_path, index=index, **kwargs)
return {
'success': True,
'file_path': file_path,
'rows': len(df),
'size_bytes': Path(file_path).stat().st_size
}
except Exception as e:
logger.error(f"to_csv failed: {e}")
return {'error': str(e)}
async def to_parquet(
self,
data_ref: str,
file_path: str,
compression: str = 'snappy'
) -> Dict:
"""
Export DataFrame to Parquet file.
Args:
data_ref: Reference to stored DataFrame
file_path: Output file path
compression: Compression codec
Returns:
Dict with success status
"""
table = self.store.get(data_ref)
if table is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
pq.write_table(table, file_path, compression=compression)
return {
'success': True,
'file_path': file_path,
'rows': table.num_rows,
'size_bytes': Path(file_path).stat().st_size
}
except Exception as e:
logger.error(f"to_parquet failed: {e}")
return {'error': str(e)}
async def describe(self, data_ref: str) -> Dict:
"""
Get statistical summary of DataFrame.
Args:
data_ref: Reference to stored DataFrame
Returns:
Dict with statistical summary
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
desc = df.describe(include='all')
info = self.store.get_info(data_ref)
return {
'data_ref': data_ref,
'shape': {'rows': len(df), 'columns': len(df.columns)},
'columns': list(df.columns),
'dtypes': {col: str(dtype) for col, dtype in df.dtypes.items()},
'memory_mb': info.memory_bytes / (1024 * 1024) if info else None,
'null_counts': df.isnull().sum().to_dict(),
'statistics': desc.to_dict()
}
except Exception as e:
logger.error(f"describe failed: {e}")
return {'error': str(e)}
async def head(self, data_ref: str, n: int = 10) -> Dict:
"""
Get first N rows of DataFrame.
Args:
data_ref: Reference to stored DataFrame
n: Number of rows
Returns:
Dict with rows as records
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
head_df = df.head(n)
return {
'data_ref': data_ref,
'total_rows': len(df),
'returned_rows': len(head_df),
'columns': list(df.columns),
'data': head_df.to_dict(orient='records')
}
except Exception as e:
logger.error(f"head failed: {e}")
return {'error': str(e)}
async def tail(self, data_ref: str, n: int = 10) -> Dict:
"""
Get last N rows of DataFrame.
Args:
data_ref: Reference to stored DataFrame
n: Number of rows
Returns:
Dict with rows as records
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
tail_df = df.tail(n)
return {
'data_ref': data_ref,
'total_rows': len(df),
'returned_rows': len(tail_df),
'columns': list(df.columns),
'data': tail_df.to_dict(orient='records')
}
except Exception as e:
logger.error(f"tail failed: {e}")
return {'error': str(e)}
async def filter(
self,
data_ref: str,
condition: str,
name: Optional[str] = None
) -> Dict:
"""
Filter DataFrame rows by condition.
Args:
data_ref: Reference to stored DataFrame
condition: pandas query string (e.g., "age > 30 and city == 'NYC'")
name: Optional name for result data_ref
Returns:
Dict with new data_ref for filtered result
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
filtered = df.query(condition)
result_name = name or f"{data_ref}_filtered"
return self._check_and_store(
filtered,
name=result_name,
source=f"filter({data_ref}, '{condition}')"
)
except Exception as e:
logger.error(f"filter failed: {e}")
return {'error': str(e)}
async def select(
self,
data_ref: str,
columns: List[str],
name: Optional[str] = None
) -> Dict:
"""
Select specific columns from DataFrame.
Args:
data_ref: Reference to stored DataFrame
columns: List of column names to select
name: Optional name for result data_ref
Returns:
Dict with new data_ref for selected columns
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
# Validate columns exist
missing = [c for c in columns if c not in df.columns]
if missing:
return {
'error': f'Columns not found: {missing}',
'available_columns': list(df.columns)
}
selected = df[columns]
result_name = name or f"{data_ref}_select"
return self._check_and_store(
selected,
name=result_name,
source=f"select({data_ref}, {columns})"
)
except Exception as e:
logger.error(f"select failed: {e}")
return {'error': str(e)}
async def groupby(
self,
data_ref: str,
by: Union[str, List[str]],
agg: Dict[str, Union[str, List[str]]],
name: Optional[str] = None
) -> Dict:
"""
Group DataFrame and aggregate.
Args:
data_ref: Reference to stored DataFrame
by: Column(s) to group by
agg: Aggregation dict (e.g., {"sales": "sum", "count": "mean"})
name: Optional name for result data_ref
Returns:
Dict with new data_ref for aggregated result
"""
df = self.store.get_pandas(data_ref)
if df is None:
return {'error': f'DataFrame not found: {data_ref}'}
try:
grouped = df.groupby(by).agg(agg).reset_index()
# Flatten column names if multi-level
if isinstance(grouped.columns, pd.MultiIndex):
grouped.columns = ['_'.join(col).strip('_') for col in grouped.columns]
result_name = name or f"{data_ref}_grouped"
return self._check_and_store(
grouped,
name=result_name,
source=f"groupby({data_ref}, by={by})"
)
except Exception as e:
logger.error(f"groupby failed: {e}")
return {'error': str(e)}
async def join(
self,
left_ref: str,
right_ref: str,
on: Optional[Union[str, List[str]]] = None,
left_on: Optional[Union[str, List[str]]] = None,
right_on: Optional[Union[str, List[str]]] = None,
how: str = 'inner',
name: Optional[str] = None
) -> Dict:
"""
Join two DataFrames.
Args:
left_ref: Reference to left DataFrame
right_ref: Reference to right DataFrame
on: Column(s) to join on (if same name in both)
left_on: Left join column(s)
right_on: Right join column(s)
how: Join type ('inner', 'left', 'right', 'outer')
name: Optional name for result data_ref
Returns:
Dict with new data_ref for joined result
"""
left_df = self.store.get_pandas(left_ref)
right_df = self.store.get_pandas(right_ref)
if left_df is None:
return {'error': f'DataFrame not found: {left_ref}'}
if right_df is None:
return {'error': f'DataFrame not found: {right_ref}'}
try:
joined = pd.merge(
left_df, right_df,
on=on, left_on=left_on, right_on=right_on,
how=how
)
result_name = name or f"{left_ref}_{right_ref}_joined"
return self._check_and_store(
joined,
name=result_name,
source=f"join({left_ref}, {right_ref}, how={how})"
)
except Exception as e:
logger.error(f"join failed: {e}")
return {'error': str(e)}
async def list_data(self) -> Dict:
"""
List all stored DataFrames.
Returns:
Dict with list of stored DataFrames and their info
"""
refs = self.store.list_refs()
return {
'count': len(refs),
'total_memory_mb': self.store.total_memory_mb(),
'max_rows_limit': self.max_rows,
'dataframes': refs
}
async def drop_data(self, data_ref: str) -> Dict:
"""
Remove a DataFrame from storage.
Args:
data_ref: Reference to drop
Returns:
Dict with success status
"""
if self.store.drop(data_ref):
return {'success': True, 'dropped': data_ref}
return {'error': f'DataFrame not found: {data_ref}'}
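Together these methods form a data_ref workflow: load a file once, keep the frame in the shared store, and pass lightweight references between subsequent calls. A hedged usage sketch, assuming the package is importable as mcp_server and that a local sales.csv with region and amount columns exists (both are assumptions, not shown above):

import asyncio
from mcp_server.pandas_tools import PandasTools  # assumes the package layout of this plugin

async def demo():
    tools = PandasTools()
    loaded = await tools.read_csv('sales.csv', name='sales')  # hypothetical input file
    filtered = await tools.filter(loaded['data_ref'], 'amount > 100', name='big_sales')
    summary = await tools.groupby(filtered['data_ref'], by='region', agg={'amount': 'sum'})
    print(await tools.head(summary['data_ref'], n=5))
    await tools.to_parquet(summary['data_ref'], 'sales_by_region.parquet')

asyncio.run(demo())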


@@ -0,0 +1,538 @@
"""
PostgreSQL/PostGIS MCP Tools.
Provides database operations with connection pooling and PostGIS support.
"""
import asyncio
import logging
from typing import Dict, List, Optional, Any
import json
from .data_store import DataStore
from .config import load_config
logger = logging.getLogger(__name__)
# Optional imports - gracefully handle missing dependencies
try:
import asyncpg
ASYNCPG_AVAILABLE = True
except ImportError:
ASYNCPG_AVAILABLE = False
logger.warning("asyncpg not available - PostgreSQL tools will be disabled")
try:
import pandas as pd
PANDAS_AVAILABLE = True
except ImportError:
PANDAS_AVAILABLE = False
class PostgresTools:
"""PostgreSQL/PostGIS database tools"""
def __init__(self):
self.store = DataStore.get_instance()
self.config = load_config()
self.pool: Optional[Any] = None
self.max_rows = self.config.get('max_rows', 100_000)
async def _get_pool(self):
"""Get or create connection pool"""
if not ASYNCPG_AVAILABLE:
raise RuntimeError("asyncpg not installed - run: pip install asyncpg")
if self.pool is None:
postgres_url = self.config.get('postgres_url')
if not postgres_url:
raise RuntimeError(
"PostgreSQL not configured. Set POSTGRES_URL in "
"~/.config/claude/postgres.env"
)
self.pool = await asyncpg.create_pool(postgres_url, min_size=1, max_size=5)
return self.pool
async def pg_connect(self) -> Dict:
"""
Test PostgreSQL connection and return status.
Returns:
Dict with connection status, version, and database info
"""
if not ASYNCPG_AVAILABLE:
return {
'connected': False,
'error': 'asyncpg not installed',
'suggestion': 'pip install asyncpg'
}
postgres_url = self.config.get('postgres_url')
if not postgres_url:
return {
'connected': False,
'error': 'POSTGRES_URL not configured',
'suggestion': 'Create ~/.config/claude/postgres.env with POSTGRES_URL=postgresql://...'
}
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
version = await conn.fetchval('SELECT version()')
db_name = await conn.fetchval('SELECT current_database()')
user = await conn.fetchval('SELECT current_user')
# Check for PostGIS
postgis_version = None
try:
postgis_version = await conn.fetchval('SELECT PostGIS_Version()')
except Exception:
pass
return {
'connected': True,
'database': db_name,
'user': user,
'version': version.split(',')[0] if version else 'Unknown',
'postgis_version': postgis_version,
'postgis_available': postgis_version is not None
}
except Exception as e:
logger.error(f"pg_connect failed: {e}")
return {
'connected': False,
'error': str(e)
}
async def pg_query(
self,
query: str,
params: Optional[List] = None,
name: Optional[str] = None
) -> Dict:
"""
Execute SELECT query and return results as data_ref.
Args:
query: SQL SELECT query
params: Query parameters (positional, use $1, $2, etc.)
name: Optional name for result data_ref
Returns:
Dict with data_ref for results or error
"""
if not PANDAS_AVAILABLE:
return {'error': 'pandas not available'}
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
if params:
rows = await conn.fetch(query, *params)
else:
rows = await conn.fetch(query)
if not rows:
return {
'data_ref': None,
'rows': 0,
'message': 'Query returned no results'
}
# Convert to DataFrame
df = pd.DataFrame([dict(r) for r in rows])
# Check row limit
check = self.store.check_row_limit(len(df))
if check['exceeded']:
return {
'error': 'row_limit_exceeded',
**check,
'preview': df.head(100).to_dict(orient='records')
}
# Store result
data_ref = self.store.store(df, name=name, source=f"pg_query: {query[:100]}...")
return {
'data_ref': data_ref,
'rows': len(df),
'columns': list(df.columns)
}
except Exception as e:
logger.error(f"pg_query failed: {e}")
return {'error': str(e)}
async def pg_execute(
self,
query: str,
params: Optional[List] = None
) -> Dict:
"""
Execute INSERT/UPDATE/DELETE query.
Args:
query: SQL DML query
params: Query parameters
Returns:
Dict with affected rows count
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
if params:
result = await conn.execute(query, *params)
else:
result = await conn.execute(query)
# Parse result (e.g., "INSERT 0 1" or "UPDATE 5")
parts = result.split()
affected = int(parts[-1]) if parts else 0
return {
'success': True,
'command': parts[0] if parts else 'UNKNOWN',
'affected_rows': affected
}
except Exception as e:
logger.error(f"pg_execute failed: {e}")
return {'error': str(e)}
async def pg_tables(self, schema: str = 'public') -> Dict:
"""
List all tables in schema.
Args:
schema: Schema name (default: public)
Returns:
Dict with list of tables
"""
query = """
SELECT
table_name,
table_type,
(SELECT count(*) FROM information_schema.columns c
WHERE c.table_schema = t.table_schema
AND c.table_name = t.table_name) as column_count
FROM information_schema.tables t
WHERE table_schema = $1
ORDER BY table_name
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch(query, schema)
tables = [
{
'name': r['table_name'],
'type': r['table_type'],
'columns': r['column_count']
}
for r in rows
]
return {
'schema': schema,
'count': len(tables),
'tables': tables
}
except Exception as e:
logger.error(f"pg_tables failed: {e}")
return {'error': str(e)}
async def pg_columns(self, table: str, schema: str = 'public') -> Dict:
"""
Get column information for a table.
Args:
table: Table name
schema: Schema name (default: public)
Returns:
Dict with column details
"""
query = """
SELECT
column_name,
data_type,
udt_name,
is_nullable,
column_default,
character_maximum_length,
numeric_precision
FROM information_schema.columns
WHERE table_schema = $1 AND table_name = $2
ORDER BY ordinal_position
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch(query, schema, table)
columns = [
{
'name': r['column_name'],
'type': r['data_type'],
'udt': r['udt_name'],
'nullable': r['is_nullable'] == 'YES',
'default': r['column_default'],
'max_length': r['character_maximum_length'],
'precision': r['numeric_precision']
}
for r in rows
]
return {
'table': f'{schema}.{table}',
'column_count': len(columns),
'columns': columns
}
except Exception as e:
logger.error(f"pg_columns failed: {e}")
return {'error': str(e)}
async def pg_schemas(self) -> Dict:
"""
List all schemas in database.
Returns:
Dict with list of schemas
"""
query = """
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
ORDER BY schema_name
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch(query)
schemas = [r['schema_name'] for r in rows]
return {
'count': len(schemas),
'schemas': schemas
}
except Exception as e:
logger.error(f"pg_schemas failed: {e}")
return {'error': str(e)}
async def st_tables(self, schema: str = 'public') -> Dict:
"""
List PostGIS-enabled tables.
Args:
schema: Schema name (default: public)
Returns:
Dict with list of tables with geometry columns
"""
query = """
SELECT
f_table_name as table_name,
f_geometry_column as geometry_column,
type as geometry_type,
srid,
coord_dimension
FROM geometry_columns
WHERE f_table_schema = $1
ORDER BY f_table_name
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch(query, schema)
tables = [
{
'table': r['table_name'],
'geometry_column': r['geometry_column'],
'geometry_type': r['geometry_type'],
'srid': r['srid'],
'dimensions': r['coord_dimension']
}
for r in rows
]
return {
'schema': schema,
'count': len(tables),
'postgis_tables': tables
}
except Exception as e:
if 'geometry_columns' in str(e):
return {
'error': 'PostGIS not installed or extension not enabled',
'suggestion': 'Run: CREATE EXTENSION IF NOT EXISTS postgis;'
}
logger.error(f"st_tables failed: {e}")
return {'error': str(e)}
async def st_geometry_type(self, table: str, column: str, schema: str = 'public') -> Dict:
"""
Get geometry type of a column.
Args:
table: Table name
column: Geometry column name
schema: Schema name
Returns:
Dict with geometry type information
"""
        # Note: identifiers (schema/table/column) cannot be bound as asyncpg parameters, so they are interpolated directly into the SQL here.
        query = f"""
SELECT DISTINCT ST_GeometryType({column}) as geom_type
FROM {schema}.{table}
WHERE {column} IS NOT NULL
LIMIT 10
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
rows = await conn.fetch(query)
types = [r['geom_type'] for r in rows]
return {
'table': f'{schema}.{table}',
'column': column,
'geometry_types': types
}
except Exception as e:
logger.error(f"st_geometry_type failed: {e}")
return {'error': str(e)}
async def st_srid(self, table: str, column: str, schema: str = 'public') -> Dict:
"""
Get SRID of geometry column.
Args:
table: Table name
column: Geometry column name
schema: Schema name
Returns:
Dict with SRID information
"""
query = f"""
SELECT DISTINCT ST_SRID({column}) as srid
FROM {schema}.{table}
WHERE {column} IS NOT NULL
LIMIT 1
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(query)
srid = row['srid'] if row else None
# Get SRID description
srid_info = None
if srid:
srid_query = """
SELECT srtext, proj4text
FROM spatial_ref_sys
WHERE srid = $1
"""
srid_row = await conn.fetchrow(srid_query, srid)
if srid_row:
srid_info = {
'description': srid_row['srtext'][:200] if srid_row['srtext'] else None,
'proj4': srid_row['proj4text']
}
return {
'table': f'{schema}.{table}',
'column': column,
'srid': srid,
'info': srid_info
}
except Exception as e:
logger.error(f"st_srid failed: {e}")
return {'error': str(e)}
async def st_extent(self, table: str, column: str, schema: str = 'public') -> Dict:
"""
Get bounding box of all geometries.
Args:
table: Table name
column: Geometry column name
schema: Schema name
Returns:
Dict with bounding box coordinates
"""
query = f"""
SELECT
ST_XMin(extent) as xmin,
ST_YMin(extent) as ymin,
ST_XMax(extent) as xmax,
ST_YMax(extent) as ymax
FROM (
SELECT ST_Extent({column}) as extent
FROM {schema}.{table}
) sub
"""
try:
pool = await self._get_pool()
async with pool.acquire() as conn:
row = await conn.fetchrow(query)
if row and row['xmin'] is not None:
return {
'table': f'{schema}.{table}',
'column': column,
'bbox': {
'xmin': float(row['xmin']),
'ymin': float(row['ymin']),
'xmax': float(row['xmax']),
'ymax': float(row['ymax'])
}
}
return {
'table': f'{schema}.{table}',
'column': column,
'bbox': None,
'message': 'No geometries found or all NULL'
}
except Exception as e:
logger.error(f"st_extent failed: {e}")
return {'error': str(e)}
async def close(self):
"""Close connection pool"""
if self.pool:
await self.pool.close()
self.pool = None
def check_connection() -> None:
"""
Check PostgreSQL connection for SessionStart hook.
Prints warning to stderr if connection fails.
"""
import sys
config = load_config()
if not config.get('postgres_url'):
print(
"[data-platform] PostgreSQL not configured (POSTGRES_URL not set)",
file=sys.stderr
)
return
async def test():
try:
if not ASYNCPG_AVAILABLE:
print(
"[data-platform] asyncpg not installed - PostgreSQL tools unavailable",
file=sys.stderr
)
return
conn = await asyncpg.connect(config['postgres_url'], timeout=5)
await conn.close()
print("[data-platform] PostgreSQL connection OK", file=sys.stderr)
except Exception as e:
print(
f"[data-platform] PostgreSQL connection failed: {e}",
file=sys.stderr
)
asyncio.run(test())
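The pool is created lazily on first use and every query goes through asyncpg's positional $1, $2, ... placeholders rather than string formatting of values. A minimal sketch of the same pattern with bare asyncpg, using a hypothetical DSN (in the tools above it comes from POSTGRES_URL):

import asyncio
import asyncpg

async def demo():
    pool = await asyncpg.create_pool('postgresql://user:pass@localhost:5432/demo', min_size=1, max_size=5)
    async with pool.acquire() as conn:
        rows = await conn.fetch(
            'SELECT table_name FROM information_schema.tables WHERE table_schema = $1',
            'public'
        )
        print([r['table_name'] for r in rows])
    await pool.close()

asyncio.run(demo())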


@@ -0,0 +1,795 @@
"""
MCP Server entry point for Data Platform integration.
Provides pandas, PostgreSQL/PostGIS, and dbt tools to Claude Code via JSON-RPC 2.0 over stdio.
"""
import asyncio
import logging
import json
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
from .config import DataPlatformConfig
from .data_store import DataStore
from .pandas_tools import PandasTools
from .postgres_tools import PostgresTools
from .dbt_tools import DbtTools
# Suppress noisy MCP validation warnings on stderr
logging.basicConfig(level=logging.INFO)
logging.getLogger("root").setLevel(logging.ERROR)
logging.getLogger("mcp").setLevel(logging.ERROR)
logger = logging.getLogger(__name__)
class DataPlatformMCPServer:
"""MCP Server for data platform integration"""
def __init__(self):
self.server = Server("data-platform-mcp")
self.config = None
self.pandas_tools = None
self.postgres_tools = None
self.dbt_tools = None
async def initialize(self):
"""Initialize server and load configuration."""
try:
config_loader = DataPlatformConfig()
self.config = config_loader.load()
self.pandas_tools = PandasTools()
self.postgres_tools = PostgresTools()
self.dbt_tools = DbtTools()
# Log available capabilities
caps = []
caps.append("pandas")
if self.config.get('postgres_available'):
caps.append("PostgreSQL")
if self.config.get('dbt_available'):
caps.append("dbt")
logger.info(f"Data Platform MCP Server initialized with: {', '.join(caps)}")
except Exception as e:
logger.error(f"Failed to initialize: {e}")
raise
def setup_tools(self):
"""Register all available tools with the MCP server"""
@self.server.list_tools()
async def list_tools() -> list[Tool]:
"""Return list of available tools"""
tools = [
# pandas tools - always available
Tool(
name="read_csv",
description="Load CSV file into DataFrame",
inputSchema={
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Path to CSV file"
},
"name": {
"type": "string",
"description": "Optional name for data_ref"
},
"chunk_size": {
"type": "integer",
"description": "Process in chunks of this size"
}
},
"required": ["file_path"]
}
),
Tool(
name="read_parquet",
description="Load Parquet file into DataFrame",
inputSchema={
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Path to Parquet file"
},
"name": {
"type": "string",
"description": "Optional name for data_ref"
},
"columns": {
"type": "array",
"items": {"type": "string"},
"description": "Optional list of columns to load"
}
},
"required": ["file_path"]
}
),
Tool(
name="read_json",
description="Load JSON/JSONL file into DataFrame",
inputSchema={
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Path to JSON file"
},
"name": {
"type": "string",
"description": "Optional name for data_ref"
},
"lines": {
"type": "boolean",
"default": False,
"description": "Read as JSON Lines format"
}
},
"required": ["file_path"]
}
),
Tool(
name="to_csv",
description="Export DataFrame to CSV file",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"file_path": {
"type": "string",
"description": "Output file path"
},
"index": {
"type": "boolean",
"default": False,
"description": "Include index column"
}
},
"required": ["data_ref", "file_path"]
}
),
Tool(
name="to_parquet",
description="Export DataFrame to Parquet file",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"file_path": {
"type": "string",
"description": "Output file path"
},
"compression": {
"type": "string",
"default": "snappy",
"description": "Compression codec"
}
},
"required": ["data_ref", "file_path"]
}
),
Tool(
name="describe",
description="Get statistical summary of DataFrame",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
}
},
"required": ["data_ref"]
}
),
Tool(
name="head",
description="Get first N rows of DataFrame",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"n": {
"type": "integer",
"default": 10,
"description": "Number of rows"
}
},
"required": ["data_ref"]
}
),
Tool(
name="tail",
description="Get last N rows of DataFrame",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"n": {
"type": "integer",
"default": 10,
"description": "Number of rows"
}
},
"required": ["data_ref"]
}
),
Tool(
name="filter",
description="Filter DataFrame rows by condition",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"condition": {
"type": "string",
"description": "pandas query string (e.g., 'age > 30 and city == \"NYC\"')"
},
"name": {
"type": "string",
"description": "Optional name for result data_ref"
}
},
"required": ["data_ref", "condition"]
}
),
Tool(
name="select",
description="Select specific columns from DataFrame",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"columns": {
"type": "array",
"items": {"type": "string"},
"description": "List of column names to select"
},
"name": {
"type": "string",
"description": "Optional name for result data_ref"
}
},
"required": ["data_ref", "columns"]
}
),
Tool(
name="groupby",
description="Group DataFrame and aggregate",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to stored DataFrame"
},
"by": {
"oneOf": [
{"type": "string"},
{"type": "array", "items": {"type": "string"}}
],
"description": "Column(s) to group by"
},
"agg": {
"type": "object",
"description": "Aggregation dict (e.g., {\"sales\": \"sum\", \"count\": \"mean\"})"
},
"name": {
"type": "string",
"description": "Optional name for result data_ref"
}
},
"required": ["data_ref", "by", "agg"]
}
),
Tool(
name="join",
description="Join two DataFrames",
inputSchema={
"type": "object",
"properties": {
"left_ref": {
"type": "string",
"description": "Reference to left DataFrame"
},
"right_ref": {
"type": "string",
"description": "Reference to right DataFrame"
},
"on": {
"oneOf": [
{"type": "string"},
{"type": "array", "items": {"type": "string"}}
],
"description": "Column(s) to join on (if same name in both)"
},
"left_on": {
"oneOf": [
{"type": "string"},
{"type": "array", "items": {"type": "string"}}
],
"description": "Left join column(s)"
},
"right_on": {
"oneOf": [
{"type": "string"},
{"type": "array", "items": {"type": "string"}}
],
"description": "Right join column(s)"
},
"how": {
"type": "string",
"enum": ["inner", "left", "right", "outer"],
"default": "inner",
"description": "Join type"
},
"name": {
"type": "string",
"description": "Optional name for result data_ref"
}
},
"required": ["left_ref", "right_ref"]
}
),
Tool(
name="list_data",
description="List all stored DataFrames",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="drop_data",
description="Remove a DataFrame from storage",
inputSchema={
"type": "object",
"properties": {
"data_ref": {
"type": "string",
"description": "Reference to drop"
}
},
"required": ["data_ref"]
}
),
# PostgreSQL tools
Tool(
name="pg_connect",
description="Test PostgreSQL connection and return status",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="pg_query",
description="Execute SELECT query and return results as data_ref",
inputSchema={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "SQL SELECT query"
},
"params": {
"type": "array",
"items": {},
"description": "Query parameters (use $1, $2, etc.)"
},
"name": {
"type": "string",
"description": "Optional name for result data_ref"
}
},
"required": ["query"]
}
),
Tool(
name="pg_execute",
description="Execute INSERT/UPDATE/DELETE query",
inputSchema={
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "SQL DML query"
},
"params": {
"type": "array",
"items": {},
"description": "Query parameters"
}
},
"required": ["query"]
}
),
Tool(
name="pg_tables",
description="List all tables in schema",
inputSchema={
"type": "object",
"properties": {
"schema": {
"type": "string",
"default": "public",
"description": "Schema name"
}
}
}
),
Tool(
name="pg_columns",
description="Get column information for a table",
inputSchema={
"type": "object",
"properties": {
"table": {
"type": "string",
"description": "Table name"
},
"schema": {
"type": "string",
"default": "public",
"description": "Schema name"
}
},
"required": ["table"]
}
),
Tool(
name="pg_schemas",
description="List all schemas in database",
inputSchema={
"type": "object",
"properties": {}
}
),
# PostGIS tools
Tool(
name="st_tables",
description="List PostGIS-enabled tables",
inputSchema={
"type": "object",
"properties": {
"schema": {
"type": "string",
"default": "public",
"description": "Schema name"
}
}
}
),
Tool(
name="st_geometry_type",
description="Get geometry type of a column",
inputSchema={
"type": "object",
"properties": {
"table": {
"type": "string",
"description": "Table name"
},
"column": {
"type": "string",
"description": "Geometry column name"
},
"schema": {
"type": "string",
"default": "public",
"description": "Schema name"
}
},
"required": ["table", "column"]
}
),
Tool(
name="st_srid",
description="Get SRID of geometry column",
inputSchema={
"type": "object",
"properties": {
"table": {
"type": "string",
"description": "Table name"
},
"column": {
"type": "string",
"description": "Geometry column name"
},
"schema": {
"type": "string",
"default": "public",
"description": "Schema name"
}
},
"required": ["table", "column"]
}
),
Tool(
name="st_extent",
description="Get bounding box of all geometries",
inputSchema={
"type": "object",
"properties": {
"table": {
"type": "string",
"description": "Table name"
},
"column": {
"type": "string",
"description": "Geometry column name"
},
"schema": {
"type": "string",
"default": "public",
"description": "Schema name"
}
},
"required": ["table", "column"]
}
),
# dbt tools
Tool(
name="dbt_parse",
description="Validate dbt project (pre-flight check)",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="dbt_run",
description="Run dbt models with pre-validation",
inputSchema={
"type": "object",
"properties": {
"select": {
"type": "string",
"description": "Model selection (e.g., 'model_name', '+model_name', 'tag:daily')"
},
"exclude": {
"type": "string",
"description": "Models to exclude"
},
"full_refresh": {
"type": "boolean",
"default": False,
"description": "Rebuild incremental models"
}
}
}
),
Tool(
name="dbt_test",
description="Run dbt tests",
inputSchema={
"type": "object",
"properties": {
"select": {
"type": "string",
"description": "Test selection"
},
"exclude": {
"type": "string",
"description": "Tests to exclude"
}
}
}
),
Tool(
name="dbt_build",
description="Run dbt build (run + test) with pre-validation",
inputSchema={
"type": "object",
"properties": {
"select": {
"type": "string",
"description": "Model/test selection"
},
"exclude": {
"type": "string",
"description": "Resources to exclude"
},
"full_refresh": {
"type": "boolean",
"default": False,
"description": "Rebuild incremental models"
}
}
}
),
Tool(
name="dbt_compile",
description="Compile dbt models to SQL without executing",
inputSchema={
"type": "object",
"properties": {
"select": {
"type": "string",
"description": "Model selection"
}
}
}
),
Tool(
name="dbt_ls",
description="List dbt resources",
inputSchema={
"type": "object",
"properties": {
"select": {
"type": "string",
"description": "Resource selection"
},
"resource_type": {
"type": "string",
"enum": ["model", "test", "seed", "snapshot", "source"],
"description": "Filter by type"
},
"output": {
"type": "string",
"enum": ["name", "path", "json"],
"default": "name",
"description": "Output format"
}
}
}
),
Tool(
name="dbt_docs_generate",
description="Generate dbt documentation",
inputSchema={
"type": "object",
"properties": {}
}
),
Tool(
name="dbt_lineage",
description="Get model dependencies and lineage",
inputSchema={
"type": "object",
"properties": {
"model": {
"type": "string",
"description": "Model name to analyze"
}
},
"required": ["model"]
}
)
]
return tools
@self.server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
"""Handle tool invocation."""
try:
# Route to appropriate tool handler
# pandas tools
if name == "read_csv":
result = await self.pandas_tools.read_csv(**arguments)
elif name == "read_parquet":
result = await self.pandas_tools.read_parquet(**arguments)
elif name == "read_json":
result = await self.pandas_tools.read_json(**arguments)
elif name == "to_csv":
result = await self.pandas_tools.to_csv(**arguments)
elif name == "to_parquet":
result = await self.pandas_tools.to_parquet(**arguments)
elif name == "describe":
result = await self.pandas_tools.describe(**arguments)
elif name == "head":
result = await self.pandas_tools.head(**arguments)
elif name == "tail":
result = await self.pandas_tools.tail(**arguments)
elif name == "filter":
result = await self.pandas_tools.filter(**arguments)
elif name == "select":
result = await self.pandas_tools.select(**arguments)
elif name == "groupby":
result = await self.pandas_tools.groupby(**arguments)
elif name == "join":
result = await self.pandas_tools.join(**arguments)
elif name == "list_data":
result = await self.pandas_tools.list_data()
elif name == "drop_data":
result = await self.pandas_tools.drop_data(**arguments)
# PostgreSQL tools
elif name == "pg_connect":
result = await self.postgres_tools.pg_connect()
elif name == "pg_query":
result = await self.postgres_tools.pg_query(**arguments)
elif name == "pg_execute":
result = await self.postgres_tools.pg_execute(**arguments)
elif name == "pg_tables":
result = await self.postgres_tools.pg_tables(**arguments)
elif name == "pg_columns":
result = await self.postgres_tools.pg_columns(**arguments)
elif name == "pg_schemas":
result = await self.postgres_tools.pg_schemas()
# PostGIS tools
elif name == "st_tables":
result = await self.postgres_tools.st_tables(**arguments)
elif name == "st_geometry_type":
result = await self.postgres_tools.st_geometry_type(**arguments)
elif name == "st_srid":
result = await self.postgres_tools.st_srid(**arguments)
elif name == "st_extent":
result = await self.postgres_tools.st_extent(**arguments)
# dbt tools
elif name == "dbt_parse":
result = await self.dbt_tools.dbt_parse()
elif name == "dbt_run":
result = await self.dbt_tools.dbt_run(**arguments)
elif name == "dbt_test":
result = await self.dbt_tools.dbt_test(**arguments)
elif name == "dbt_build":
result = await self.dbt_tools.dbt_build(**arguments)
elif name == "dbt_compile":
result = await self.dbt_tools.dbt_compile(**arguments)
elif name == "dbt_ls":
result = await self.dbt_tools.dbt_ls(**arguments)
elif name == "dbt_docs_generate":
result = await self.dbt_tools.dbt_docs_generate()
elif name == "dbt_lineage":
result = await self.dbt_tools.dbt_lineage(**arguments)
else:
raise ValueError(f"Unknown tool: {name}")
return [TextContent(
type="text",
text=json.dumps(result, indent=2, default=str)
)]
except Exception as e:
logger.error(f"Tool {name} failed: {e}")
return [TextContent(
type="text",
text=json.dumps({"error": str(e)}, indent=2)
)]
async def run(self):
"""Run the MCP server"""
await self.initialize()
self.setup_tools()
async with stdio_server() as (read_stream, write_stream):
await self.server.run(
read_stream,
write_stream,
self.server.create_initialization_options()
)
async def main():
"""Main entry point"""
server = DataPlatformMCPServer()
await server.run()
if __name__ == "__main__":
asyncio.run(main())
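Over stdio the registered tools are driven by standard MCP JSON-RPC messages: a tools/list request returns the schemas declared above, and a tools/call request carries the tool name plus arguments that call_tool unpacks and routes. Roughly, a request as the server sees it looks like this (shape only; the id and arguments are illustrative):

# Illustrative tools/call request delivered over stdio as JSON-RPC 2.0.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "pg_query",
        "arguments": {"query": "SELECT 1 AS one", "name": "probe"},
    },
}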


@@ -0,0 +1,49 @@
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "data-platform-mcp"
version = "1.0.0"
description = "MCP Server for data engineering with pandas, PostgreSQL/PostGIS, and dbt"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
authors = [
{name = "Leo Miranda"}
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: MIT License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
dependencies = [
"mcp>=0.9.0",
"pandas>=2.0.0",
"pyarrow>=14.0.0",
"asyncpg>=0.29.0",
"geoalchemy2>=0.14.0",
"shapely>=2.0.0",
"dbt-core>=1.9.0",
"dbt-postgres>=1.9.0",
"python-dotenv>=1.0.0",
"pydantic>=2.5.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.4.3",
"pytest-asyncio>=0.23.0",
]
[tool.setuptools.packages.find]
where = ["."]
include = ["mcp_server*"]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]


@@ -0,0 +1,23 @@
# MCP SDK
mcp>=0.9.0
# Data Processing
pandas>=2.0.0
pyarrow>=14.0.0
# PostgreSQL/PostGIS
asyncpg>=0.29.0
geoalchemy2>=0.14.0
shapely>=2.0.0
# dbt
dbt-core>=1.9.0
dbt-postgres>=1.9.0
# Utilities
python-dotenv>=1.0.0
pydantic>=2.5.0
# Testing
pytest>=7.4.3
pytest-asyncio>=0.23.0


@@ -0,0 +1,3 @@
"""
Tests for Data Platform MCP Server.
"""


@@ -0,0 +1,239 @@
"""
Unit tests for configuration loader.
"""
import pytest
from pathlib import Path
import os
def test_load_system_config(tmp_path, monkeypatch):
"""Test loading system-level PostgreSQL configuration"""
# Import here to avoid import errors before setup
from mcp_server.config import DataPlatformConfig
# Mock home directory
config_dir = tmp_path / '.config' / 'claude'
config_dir.mkdir(parents=True)
config_file = config_dir / 'postgres.env'
config_file.write_text(
"POSTGRES_URL=postgresql://user:pass@localhost:5432/testdb\n"
)
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(tmp_path)
config = DataPlatformConfig()
result = config.load()
assert result['postgres_url'] == 'postgresql://user:pass@localhost:5432/testdb'
assert result['postgres_available'] is True
def test_postgres_optional(tmp_path, monkeypatch):
"""Test that PostgreSQL configuration is optional"""
from mcp_server.config import DataPlatformConfig
# No postgres.env file
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(tmp_path)
# Clear any existing env vars
monkeypatch.delenv('POSTGRES_URL', raising=False)
config = DataPlatformConfig()
result = config.load()
assert result['postgres_url'] is None
assert result['postgres_available'] is False
def test_project_config_override(tmp_path, monkeypatch):
"""Test that project config overrides system config"""
from mcp_server.config import DataPlatformConfig
# Set up system config
system_config_dir = tmp_path / '.config' / 'claude'
system_config_dir.mkdir(parents=True)
system_config = system_config_dir / 'postgres.env'
system_config.write_text(
"POSTGRES_URL=postgresql://system:pass@localhost:5432/systemdb\n"
)
# Set up project config
project_dir = tmp_path / 'project'
project_dir.mkdir()
project_config = project_dir / '.env'
project_config.write_text(
"POSTGRES_URL=postgresql://project:pass@localhost:5432/projectdb\n"
"DBT_PROJECT_DIR=/path/to/dbt\n"
)
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(project_dir)
config = DataPlatformConfig()
result = config.load()
# Project config should override
assert result['postgres_url'] == 'postgresql://project:pass@localhost:5432/projectdb'
assert result['dbt_project_dir'] == '/path/to/dbt'
def test_max_rows_config(tmp_path, monkeypatch):
"""Test max rows configuration"""
from mcp_server.config import DataPlatformConfig
project_dir = tmp_path / 'project'
project_dir.mkdir()
project_config = project_dir / '.env'
project_config.write_text("DATA_PLATFORM_MAX_ROWS=50000\n")
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(project_dir)
config = DataPlatformConfig()
result = config.load()
assert result['max_rows'] == 50000
def test_default_max_rows(tmp_path, monkeypatch):
"""Test default max rows value"""
from mcp_server.config import DataPlatformConfig
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(tmp_path)
# Clear any existing env vars
monkeypatch.delenv('DATA_PLATFORM_MAX_ROWS', raising=False)
config = DataPlatformConfig()
result = config.load()
assert result['max_rows'] == 100_000 # Default value
def test_dbt_auto_detection(tmp_path, monkeypatch):
"""Test automatic dbt project detection"""
from mcp_server.config import DataPlatformConfig
# Create project with dbt_project.yml
project_dir = tmp_path / 'project'
project_dir.mkdir()
(project_dir / 'dbt_project.yml').write_text("name: test_project\n")
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(project_dir)
# Clear PWD and DBT_PROJECT_DIR to ensure auto-detection
monkeypatch.delenv('PWD', raising=False)
monkeypatch.delenv('DBT_PROJECT_DIR', raising=False)
monkeypatch.delenv('CLAUDE_PROJECT_DIR', raising=False)
config = DataPlatformConfig()
result = config.load()
assert result['dbt_project_dir'] == str(project_dir)
assert result['dbt_available'] is True
def test_dbt_subdirectory_detection(tmp_path, monkeypatch):
"""Test dbt project detection in subdirectory"""
from mcp_server.config import DataPlatformConfig
# Create project with dbt in subdirectory
project_dir = tmp_path / 'project'
project_dir.mkdir()
# Need a marker file for _find_project_directory to find the project
(project_dir / '.git').mkdir()
dbt_dir = project_dir / 'transform'
dbt_dir.mkdir()
(dbt_dir / 'dbt_project.yml').write_text("name: test_project\n")
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(project_dir)
# Clear env vars to ensure auto-detection
monkeypatch.delenv('PWD', raising=False)
monkeypatch.delenv('DBT_PROJECT_DIR', raising=False)
monkeypatch.delenv('CLAUDE_PROJECT_DIR', raising=False)
config = DataPlatformConfig()
result = config.load()
assert result['dbt_project_dir'] == str(dbt_dir)
assert result['dbt_available'] is True
def test_no_dbt_project(tmp_path, monkeypatch):
"""Test when no dbt project exists"""
from mcp_server.config import DataPlatformConfig
project_dir = tmp_path / 'project'
project_dir.mkdir()
monkeypatch.setenv('HOME', str(tmp_path))
monkeypatch.chdir(project_dir)
# Clear any existing env vars
monkeypatch.delenv('DBT_PROJECT_DIR', raising=False)
config = DataPlatformConfig()
result = config.load()
assert result['dbt_project_dir'] is None
assert result['dbt_available'] is False
def test_find_project_directory_from_env(tmp_path, monkeypatch):
"""Test finding project directory from CLAUDE_PROJECT_DIR env var"""
from mcp_server.config import DataPlatformConfig
project_dir = tmp_path / 'my-project'
project_dir.mkdir()
(project_dir / '.git').mkdir()
monkeypatch.setenv('CLAUDE_PROJECT_DIR', str(project_dir))
config = DataPlatformConfig()
result = config._find_project_directory()
assert result == project_dir
def test_find_project_directory_from_cwd(tmp_path, monkeypatch):
"""Test finding project directory from cwd with .env file"""
from mcp_server.config import DataPlatformConfig
project_dir = tmp_path / 'project'
project_dir.mkdir()
(project_dir / '.env').write_text("TEST=value")
monkeypatch.chdir(project_dir)
monkeypatch.delenv('CLAUDE_PROJECT_DIR', raising=False)
monkeypatch.delenv('PWD', raising=False)
config = DataPlatformConfig()
result = config._find_project_directory()
assert result == project_dir
def test_find_project_directory_none_when_no_markers(tmp_path, monkeypatch):
"""Test returns None when no project markers found"""
from mcp_server.config import DataPlatformConfig
empty_dir = tmp_path / 'empty'
empty_dir.mkdir()
monkeypatch.chdir(empty_dir)
monkeypatch.delenv('CLAUDE_PROJECT_DIR', raising=False)
monkeypatch.delenv('PWD', raising=False)
monkeypatch.delenv('DBT_PROJECT_DIR', raising=False)
config = DataPlatformConfig()
result = config._find_project_directory()
assert result is None
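These tests pin down the precedence the loader is expected to follow: CLAUDE_PROJECT_DIR wins when set, a project-level .env overrides the system-wide ~/.config/claude/postgres.env, and defaults apply when neither provides a value. A rough sketch of that layering with python-dotenv (the merge order is the behavior asserted above, not a verbatim copy of DataPlatformConfig):

from pathlib import Path
from dotenv import dotenv_values

def load_layered_config(project_dir: Path) -> dict:
    # System-wide defaults first, then project overrides (later dict wins on key collisions).
    system_env = dotenv_values(Path.home() / '.config' / 'claude' / 'postgres.env')
    project_env = dotenv_values(project_dir / '.env')
    merged = {**system_env, **project_env}
    return {
        'postgres_url': merged.get('POSTGRES_URL'),
        'max_rows': int(merged.get('DATA_PLATFORM_MAX_ROWS', 100_000)),
    }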


@@ -0,0 +1,240 @@
"""
Unit tests for Arrow IPC DataFrame registry.
"""
import pytest
import pandas as pd
import pyarrow as pa
def test_store_pandas_dataframe():
"""Test storing pandas DataFrame"""
from mcp_server.data_store import DataStore
# Create fresh instance for test
store = DataStore()
store._dataframes = {}
store._metadata = {}
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
data_ref = store.store(df, name='test_df')
assert data_ref == 'test_df'
assert 'test_df' in store._dataframes
assert store._metadata['test_df'].rows == 3
assert store._metadata['test_df'].columns == 2
def test_store_arrow_table():
"""Test storing Arrow Table directly"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
table = pa.table({'x': [1, 2, 3], 'y': [4, 5, 6]})
data_ref = store.store(table, name='arrow_test')
assert data_ref == 'arrow_test'
assert store._dataframes['arrow_test'].num_rows == 3
def test_store_auto_name():
"""Test auto-generated data_ref names"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
df = pd.DataFrame({'a': [1, 2]})
data_ref = store.store(df)
assert data_ref.startswith('df_')
assert len(data_ref) == 11 # df_ + 8 hex chars
def test_get_dataframe():
"""Test retrieving stored DataFrame"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
df = pd.DataFrame({'a': [1, 2, 3]})
store.store(df, name='get_test')
result = store.get('get_test')
assert result is not None
assert result.num_rows == 3
def test_get_pandas():
"""Test retrieving as pandas DataFrame"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
store.store(df, name='pandas_test')
result = store.get_pandas('pandas_test')
assert isinstance(result, pd.DataFrame)
assert list(result.columns) == ['a', 'b']
assert len(result) == 3
def test_get_nonexistent():
"""Test getting nonexistent data_ref returns None"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
assert store.get('nonexistent') is None
assert store.get_pandas('nonexistent') is None
def test_list_refs():
"""Test listing all stored DataFrames"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
store.store(pd.DataFrame({'a': [1, 2]}), name='df1')
store.store(pd.DataFrame({'b': [3, 4, 5]}), name='df2')
refs = store.list_refs()
assert len(refs) == 2
ref_names = [r['ref'] for r in refs]
assert 'df1' in ref_names
assert 'df2' in ref_names
def test_drop_dataframe():
"""Test dropping a DataFrame"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
store.store(pd.DataFrame({'a': [1]}), name='drop_test')
assert store.get('drop_test') is not None
result = store.drop('drop_test')
assert result is True
assert store.get('drop_test') is None
def test_drop_nonexistent():
"""Test dropping nonexistent data_ref"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
result = store.drop('nonexistent')
assert result is False
def test_clear():
"""Test clearing all DataFrames"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
store.store(pd.DataFrame({'a': [1]}), name='df1')
store.store(pd.DataFrame({'b': [2]}), name='df2')
store.clear()
assert len(store.list_refs()) == 0
def test_get_info():
"""Test getting DataFrame metadata"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
store.store(df, name='info_test', source='test source')
info = store.get_info('info_test')
assert info.ref == 'info_test'
assert info.rows == 3
assert info.columns == 2
assert info.column_names == ['a', 'b']
assert info.source == 'test source'
assert info.memory_bytes > 0
def test_total_memory():
"""Test total memory calculation"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
store.store(pd.DataFrame({'a': range(100)}), name='df1')
store.store(pd.DataFrame({'b': range(200)}), name='df2')
total = store.total_memory_bytes()
assert total > 0
total_mb = store.total_memory_mb()
assert total_mb >= 0
def test_check_row_limit():
"""Test row limit checking"""
from mcp_server.data_store import DataStore
store = DataStore()
store._max_rows = 100
# Under limit
result = store.check_row_limit(50)
assert result['exceeded'] is False
# Over limit
result = store.check_row_limit(150)
assert result['exceeded'] is True
assert 'suggestion' in result
def test_metadata_dtypes():
"""Test that dtypes are correctly recorded"""
from mcp_server.data_store import DataStore
store = DataStore()
store._dataframes = {}
store._metadata = {}
df = pd.DataFrame({
'int_col': [1, 2, 3],
'float_col': [1.1, 2.2, 3.3],
'str_col': ['a', 'b', 'c']
})
store.store(df, name='dtype_test')
info = store.get_info('dtype_test')
assert 'int_col' in info.dtypes
assert 'float_col' in info.dtypes
assert 'str_col' in info.dtypes
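The registry stores every frame as an Arrow table and converts back to pandas on demand, which is the difference the get() and get_pandas() tests exercise above. A standalone sketch of that round-trip with pyarrow, independent of DataStore:

import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
table = pa.Table.from_pandas(df)   # roughly what store() does for pandas input
roundtrip = table.to_pandas()      # roughly what get_pandas() returns
print(table.num_rows, table.schema.names)   # 3 ['a', 'b']
print(roundtrip.equals(df))                 # True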


@@ -0,0 +1,318 @@
"""
Unit tests for dbt MCP tools.
"""
import pytest
from unittest.mock import Mock, patch, MagicMock
import subprocess
import json
import tempfile
import os
@pytest.fixture
def mock_config(tmp_path):
"""Mock configuration with dbt project"""
dbt_dir = tmp_path / 'dbt_project'
dbt_dir.mkdir()
(dbt_dir / 'dbt_project.yml').write_text('name: test_project\n')
return {
'dbt_project_dir': str(dbt_dir),
'dbt_profiles_dir': str(tmp_path / '.dbt')
}
@pytest.fixture
def dbt_tools(mock_config):
"""Create DbtTools instance with mocked config"""
with patch('mcp_server.dbt_tools.load_config', return_value=mock_config):
from mcp_server.dbt_tools import DbtTools
tools = DbtTools()
tools.project_dir = mock_config['dbt_project_dir']
tools.profiles_dir = mock_config['dbt_profiles_dir']
return tools
@pytest.mark.asyncio
async def test_dbt_parse_success(dbt_tools):
"""Test successful dbt parse"""
mock_result = MagicMock()
mock_result.returncode = 0
mock_result.stdout = 'Parsed successfully'
mock_result.stderr = ''
with patch('subprocess.run', return_value=mock_result):
result = await dbt_tools.dbt_parse()
assert result['valid'] is True
@pytest.mark.asyncio
async def test_dbt_parse_failure(dbt_tools):
"""Test dbt parse with errors"""
mock_result = MagicMock()
mock_result.returncode = 1
mock_result.stdout = ''
mock_result.stderr = 'Compilation error: deprecated syntax'
with patch('subprocess.run', return_value=mock_result):
result = await dbt_tools.dbt_parse()
assert result['valid'] is False
assert 'deprecated' in str(result.get('details', '')).lower() or len(result.get('errors', [])) > 0
@pytest.mark.asyncio
async def test_dbt_run_with_prevalidation(dbt_tools):
"""Test dbt run includes pre-validation"""
# First call is parse, second is run
mock_parse = MagicMock()
mock_parse.returncode = 0
mock_parse.stdout = 'OK'
mock_parse.stderr = ''
mock_run = MagicMock()
mock_run.returncode = 0
mock_run.stdout = 'Completed successfully'
mock_run.stderr = ''
with patch('subprocess.run', side_effect=[mock_parse, mock_run]):
result = await dbt_tools.dbt_run()
assert result['success'] is True
@pytest.mark.asyncio
async def test_dbt_run_fails_validation(dbt_tools):
"""Test dbt run fails if validation fails"""
mock_parse = MagicMock()
mock_parse.returncode = 1
mock_parse.stdout = ''
mock_parse.stderr = 'Parse error'
with patch('subprocess.run', return_value=mock_parse):
result = await dbt_tools.dbt_run()
assert 'error' in result
assert 'Pre-validation failed' in result['error']
@pytest.mark.asyncio
async def test_dbt_run_with_selection(dbt_tools):
"""Test dbt run with model selection"""
mock_parse = MagicMock()
mock_parse.returncode = 0
mock_parse.stdout = 'OK'
mock_parse.stderr = ''
mock_run = MagicMock()
mock_run.returncode = 0
mock_run.stdout = 'Completed'
mock_run.stderr = ''
calls = []
def track_calls(*args, **kwargs):
calls.append(args[0] if args else kwargs.get('args', []))
if len(calls) == 1:
return mock_parse
return mock_run
with patch('subprocess.run', side_effect=track_calls):
result = await dbt_tools.dbt_run(select='dim_customers')
# Verify --select was passed
assert any('--select' in str(call) for call in calls)
@pytest.mark.asyncio
async def test_dbt_test(dbt_tools):
"""Test dbt test"""
mock_result = MagicMock()
mock_result.returncode = 0
mock_result.stdout = 'All tests passed'
mock_result.stderr = ''
with patch('subprocess.run', return_value=mock_result):
result = await dbt_tools.dbt_test()
assert result['success'] is True
@pytest.mark.asyncio
async def test_dbt_build(dbt_tools):
"""Test dbt build with pre-validation"""
mock_parse = MagicMock()
mock_parse.returncode = 0
mock_parse.stdout = 'OK'
mock_parse.stderr = ''
mock_build = MagicMock()
mock_build.returncode = 0
mock_build.stdout = 'Build complete'
mock_build.stderr = ''
with patch('subprocess.run', side_effect=[mock_parse, mock_build]):
result = await dbt_tools.dbt_build()
assert result['success'] is True
@pytest.mark.asyncio
async def test_dbt_compile(dbt_tools):
"""Test dbt compile"""
mock_result = MagicMock()
mock_result.returncode = 0
mock_result.stdout = 'Compiled'
mock_result.stderr = ''
with patch('subprocess.run', return_value=mock_result):
result = await dbt_tools.dbt_compile()
assert result['success'] is True
@pytest.mark.asyncio
async def test_dbt_ls(dbt_tools):
"""Test dbt ls"""
mock_result = MagicMock()
mock_result.returncode = 0
mock_result.stdout = 'dim_customers\ndim_products\nfct_orders\n'
mock_result.stderr = ''
with patch('subprocess.run', return_value=mock_result):
result = await dbt_tools.dbt_ls()
assert result['success'] is True
assert result['count'] == 3
assert 'dim_customers' in result['resources']
@pytest.mark.asyncio
async def test_dbt_docs_generate(dbt_tools, tmp_path):
"""Test dbt docs generate"""
mock_result = MagicMock()
mock_result.returncode = 0
mock_result.stdout = 'Done'
mock_result.stderr = ''
# Create fake target directory
target_dir = tmp_path / 'dbt_project' / 'target'
target_dir.mkdir(parents=True)
(target_dir / 'catalog.json').write_text('{}')
(target_dir / 'manifest.json').write_text('{}')
dbt_tools.project_dir = str(tmp_path / 'dbt_project')
with patch('subprocess.run', return_value=mock_result):
result = await dbt_tools.dbt_docs_generate()
assert result['success'] is True
assert result['catalog_generated'] is True
assert result['manifest_generated'] is True
@pytest.mark.asyncio
async def test_dbt_lineage(dbt_tools, tmp_path):
"""Test dbt lineage"""
# Create manifest
target_dir = tmp_path / 'dbt_project' / 'target'
target_dir.mkdir(parents=True)
manifest = {
'nodes': {
'model.test.dim_customers': {
'name': 'dim_customers',
'resource_type': 'model',
'schema': 'public',
'database': 'testdb',
'description': 'Customer dimension',
'tags': ['daily'],
'config': {'materialized': 'table'},
'depends_on': {
'nodes': ['model.test.stg_customers']
}
},
'model.test.stg_customers': {
'name': 'stg_customers',
'resource_type': 'model',
'depends_on': {'nodes': []}
},
'model.test.fct_orders': {
'name': 'fct_orders',
'resource_type': 'model',
'depends_on': {
'nodes': ['model.test.dim_customers']
}
}
}
}
(target_dir / 'manifest.json').write_text(json.dumps(manifest))
dbt_tools.project_dir = str(tmp_path / 'dbt_project')
result = await dbt_tools.dbt_lineage('dim_customers')
assert result['model'] == 'dim_customers'
assert 'model.test.stg_customers' in result['upstream']
assert 'model.test.fct_orders' in result['downstream']
@pytest.mark.asyncio
async def test_dbt_lineage_model_not_found(dbt_tools, tmp_path):
"""Test dbt lineage with nonexistent model"""
target_dir = tmp_path / 'dbt_project' / 'target'
target_dir.mkdir(parents=True)
manifest = {
'nodes': {
'model.test.dim_customers': {
'name': 'dim_customers',
'resource_type': 'model'
}
}
}
(target_dir / 'manifest.json').write_text(json.dumps(manifest))
dbt_tools.project_dir = str(tmp_path / 'dbt_project')
result = await dbt_tools.dbt_lineage('nonexistent_model')
assert 'error' in result
assert 'not found' in result['error'].lower()
@pytest.mark.asyncio
async def test_dbt_no_project():
"""Test dbt tools when no project configured"""
with patch('mcp_server.dbt_tools.load_config', return_value={'dbt_project_dir': None}):
from mcp_server.dbt_tools import DbtTools
tools = DbtTools()
tools.project_dir = None
result = await tools.dbt_run()
assert 'error' in result
assert 'not found' in result['error'].lower()
@pytest.mark.asyncio
async def test_dbt_timeout(dbt_tools):
"""Test dbt command timeout handling"""
with patch('subprocess.run', side_effect=subprocess.TimeoutExpired('dbt', 300)):
result = await dbt_tools.dbt_parse()
assert 'error' in result
assert 'timed out' in result['error'].lower()
@pytest.mark.asyncio
async def test_dbt_not_installed(dbt_tools):
"""Test handling when dbt is not installed"""
with patch('subprocess.run', side_effect=FileNotFoundError()):
result = await dbt_tools.dbt_parse()
assert 'error' in result
assert 'not found' in result['error'].lower()


@@ -0,0 +1,301 @@
"""
Unit tests for pandas MCP tools.
"""
import pytest
import pandas as pd
import tempfile
import os
from pathlib import Path
@pytest.fixture
def temp_csv(tmp_path):
"""Create a temporary CSV file for testing"""
csv_path = tmp_path / 'test.csv'
df = pd.DataFrame({
'id': [1, 2, 3, 4, 5],
'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
'value': [10.5, 20.0, 30.5, 40.0, 50.5]
})
df.to_csv(csv_path, index=False)
return str(csv_path)
@pytest.fixture
def temp_parquet(tmp_path):
"""Create a temporary Parquet file for testing"""
parquet_path = tmp_path / 'test.parquet'
df = pd.DataFrame({
'id': [1, 2, 3],
'data': ['a', 'b', 'c']
})
df.to_parquet(parquet_path)
return str(parquet_path)
@pytest.fixture
def temp_json(tmp_path):
"""Create a temporary JSON file for testing"""
json_path = tmp_path / 'test.json'
df = pd.DataFrame({
'x': [1, 2],
'y': [3, 4]
})
df.to_json(json_path, orient='records')
return str(json_path)
@pytest.fixture
def pandas_tools():
"""Create PandasTools instance with fresh store"""
from mcp_server.pandas_tools import PandasTools
from mcp_server.data_store import DataStore
# Reset store for test isolation
store = DataStore.get_instance()
store._dataframes = {}
store._metadata = {}
return PandasTools()
@pytest.mark.asyncio
async def test_read_csv(pandas_tools, temp_csv):
"""Test reading CSV file"""
result = await pandas_tools.read_csv(temp_csv, name='csv_test')
assert 'data_ref' in result
assert result['data_ref'] == 'csv_test'
assert result['rows'] == 5
assert 'id' in result['columns']
assert 'name' in result['columns']
@pytest.mark.asyncio
async def test_read_csv_nonexistent(pandas_tools):
"""Test reading nonexistent CSV file"""
result = await pandas_tools.read_csv('/nonexistent/path.csv')
assert 'error' in result
assert 'not found' in result['error'].lower()
@pytest.mark.asyncio
async def test_read_parquet(pandas_tools, temp_parquet):
"""Test reading Parquet file"""
result = await pandas_tools.read_parquet(temp_parquet, name='parquet_test')
assert 'data_ref' in result
assert result['rows'] == 3
@pytest.mark.asyncio
async def test_read_json(pandas_tools, temp_json):
"""Test reading JSON file"""
result = await pandas_tools.read_json(temp_json, name='json_test')
assert 'data_ref' in result
assert result['rows'] == 2
@pytest.mark.asyncio
async def test_to_csv(pandas_tools, temp_csv, tmp_path):
"""Test exporting to CSV"""
# First load some data
await pandas_tools.read_csv(temp_csv, name='export_test')
# Export to new file
output_path = str(tmp_path / 'output.csv')
result = await pandas_tools.to_csv('export_test', output_path)
assert result['success'] is True
assert os.path.exists(output_path)
@pytest.mark.asyncio
async def test_to_parquet(pandas_tools, temp_csv, tmp_path):
"""Test exporting to Parquet"""
await pandas_tools.read_csv(temp_csv, name='parquet_export')
output_path = str(tmp_path / 'output.parquet')
result = await pandas_tools.to_parquet('parquet_export', output_path)
assert result['success'] is True
assert os.path.exists(output_path)
@pytest.mark.asyncio
async def test_describe(pandas_tools, temp_csv):
"""Test describe statistics"""
await pandas_tools.read_csv(temp_csv, name='describe_test')
result = await pandas_tools.describe('describe_test')
assert 'data_ref' in result
assert 'shape' in result
assert result['shape']['rows'] == 5
assert 'statistics' in result
assert 'null_counts' in result
@pytest.mark.asyncio
async def test_head(pandas_tools, temp_csv):
"""Test getting first N rows"""
await pandas_tools.read_csv(temp_csv, name='head_test')
result = await pandas_tools.head('head_test', n=3)
assert result['returned_rows'] == 3
assert len(result['data']) == 3
@pytest.mark.asyncio
async def test_tail(pandas_tools, temp_csv):
"""Test getting last N rows"""
await pandas_tools.read_csv(temp_csv, name='tail_test')
result = await pandas_tools.tail('tail_test', n=2)
assert result['returned_rows'] == 2
@pytest.mark.asyncio
async def test_filter(pandas_tools, temp_csv):
"""Test filtering rows"""
await pandas_tools.read_csv(temp_csv, name='filter_test')
result = await pandas_tools.filter('filter_test', 'value > 25')
assert 'data_ref' in result
assert result['rows'] == 3 # 30.5, 40.0, 50.5
@pytest.mark.asyncio
async def test_filter_invalid_condition(pandas_tools, temp_csv):
"""Test filter with invalid condition"""
await pandas_tools.read_csv(temp_csv, name='filter_error')
result = await pandas_tools.filter('filter_error', 'invalid_column > 0')
assert 'error' in result
@pytest.mark.asyncio
async def test_select(pandas_tools, temp_csv):
"""Test selecting columns"""
await pandas_tools.read_csv(temp_csv, name='select_test')
result = await pandas_tools.select('select_test', ['id', 'name'])
assert 'data_ref' in result
assert result['columns'] == ['id', 'name']
@pytest.mark.asyncio
async def test_select_invalid_column(pandas_tools, temp_csv):
"""Test select with invalid column"""
await pandas_tools.read_csv(temp_csv, name='select_error')
result = await pandas_tools.select('select_error', ['id', 'nonexistent'])
assert 'error' in result
assert 'available_columns' in result
@pytest.mark.asyncio
async def test_groupby(pandas_tools, tmp_path):
"""Test groupby aggregation"""
# Create test data with groups
csv_path = tmp_path / 'groupby.csv'
df = pd.DataFrame({
'category': ['A', 'A', 'B', 'B'],
'value': [10, 20, 30, 40]
})
df.to_csv(csv_path, index=False)
await pandas_tools.read_csv(str(csv_path), name='groupby_test')
result = await pandas_tools.groupby(
'groupby_test',
by='category',
agg={'value': 'sum'}
)
assert 'data_ref' in result
assert result['rows'] == 2 # Two groups: A, B
@pytest.mark.asyncio
async def test_join(pandas_tools, tmp_path):
"""Test joining DataFrames"""
# Create left table
left_path = tmp_path / 'left.csv'
pd.DataFrame({
'id': [1, 2, 3],
'name': ['A', 'B', 'C']
}).to_csv(left_path, index=False)
# Create right table
right_path = tmp_path / 'right.csv'
pd.DataFrame({
'id': [1, 2, 4],
'value': [100, 200, 400]
}).to_csv(right_path, index=False)
await pandas_tools.read_csv(str(left_path), name='left')
await pandas_tools.read_csv(str(right_path), name='right')
result = await pandas_tools.join('left', 'right', on='id', how='inner')
assert 'data_ref' in result
assert result['rows'] == 2 # Only id 1 and 2 match
@pytest.mark.asyncio
async def test_list_data(pandas_tools, temp_csv):
"""Test listing all DataFrames"""
await pandas_tools.read_csv(temp_csv, name='list_test1')
await pandas_tools.read_csv(temp_csv, name='list_test2')
result = await pandas_tools.list_data()
assert result['count'] == 2
refs = [df['ref'] for df in result['dataframes']]
assert 'list_test1' in refs
assert 'list_test2' in refs
@pytest.mark.asyncio
async def test_drop_data(pandas_tools, temp_csv):
"""Test dropping DataFrame"""
await pandas_tools.read_csv(temp_csv, name='drop_test')
result = await pandas_tools.drop_data('drop_test')
assert result['success'] is True
# Verify it's gone
list_result = await pandas_tools.list_data()
refs = [df['ref'] for df in list_result['dataframes']]
assert 'drop_test' not in refs
@pytest.mark.asyncio
async def test_drop_nonexistent(pandas_tools):
"""Test dropping nonexistent DataFrame"""
result = await pandas_tools.drop_data('nonexistent')
assert 'error' in result
@pytest.mark.asyncio
async def test_operations_on_nonexistent(pandas_tools):
"""Test operations on nonexistent data_ref"""
result = await pandas_tools.describe('nonexistent')
assert 'error' in result
result = await pandas_tools.head('nonexistent')
assert 'error' in result
result = await pandas_tools.filter('nonexistent', 'x > 0')
assert 'error' in result


@@ -0,0 +1,338 @@
"""
Unit tests for PostgreSQL MCP tools.
"""
import pytest
from unittest.mock import Mock, AsyncMock, patch, MagicMock
import pandas as pd
@pytest.fixture
def mock_config():
"""Mock configuration"""
return {
'postgres_url': 'postgresql://test:test@localhost:5432/testdb',
'max_rows': 100000
}
@pytest.fixture
def postgres_tools(mock_config):
"""Create PostgresTools instance with mocked config"""
with patch('mcp_server.postgres_tools.load_config', return_value=mock_config):
from mcp_server.postgres_tools import PostgresTools
from mcp_server.data_store import DataStore
# Reset store
store = DataStore.get_instance()
store._dataframes = {}
store._metadata = {}
tools = PostgresTools()
tools.config = mock_config
return tools
@pytest.mark.asyncio
async def test_pg_connect_no_config():
"""Test pg_connect when no PostgreSQL configured"""
with patch('mcp_server.postgres_tools.load_config', return_value={'postgres_url': None}):
from mcp_server.postgres_tools import PostgresTools
tools = PostgresTools()
tools.config = {'postgres_url': None}
result = await tools.pg_connect()
assert result['connected'] is False
assert 'not configured' in result['error'].lower()
@pytest.mark.asyncio
async def test_pg_connect_success(postgres_tools):
"""Test successful pg_connect"""
mock_conn = AsyncMock()
mock_conn.fetchval = AsyncMock(side_effect=[
'PostgreSQL 15.1', # version
'testdb', # database name
'testuser', # user
None # PostGIS check fails
])
mock_conn.close = AsyncMock()
# Create proper async context manager
mock_cm = AsyncMock()
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
# Use AsyncMock for create_pool since it's awaited
with patch('asyncpg.create_pool', new=AsyncMock(return_value=mock_pool)):
postgres_tools.pool = None
result = await postgres_tools.pg_connect()
assert result['connected'] is True
assert result['database'] == 'testdb'
@pytest.mark.asyncio
async def test_pg_query_success(postgres_tools):
"""Test successful pg_query"""
mock_rows = [
{'id': 1, 'name': 'Alice'},
{'id': 2, 'name': 'Bob'}
]
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(return_value=mock_rows)
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_query('SELECT * FROM users', name='users_data')
assert 'data_ref' in result
assert result['rows'] == 2
@pytest.mark.asyncio
async def test_pg_query_empty_result(postgres_tools):
"""Test pg_query with no results"""
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(return_value=[])
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_query('SELECT * FROM empty_table')
assert result['data_ref'] is None
assert result['rows'] == 0
@pytest.mark.asyncio
async def test_pg_execute_success(postgres_tools):
"""Test successful pg_execute"""
mock_conn = AsyncMock()
mock_conn.execute = AsyncMock(return_value='INSERT 0 3')
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_execute('INSERT INTO users VALUES (1, 2, 3)')
assert result['success'] is True
assert result['affected_rows'] == 3
assert result['command'] == 'INSERT'
@pytest.mark.asyncio
async def test_pg_tables(postgres_tools):
"""Test listing tables"""
mock_rows = [
{'table_name': 'users', 'table_type': 'BASE TABLE', 'column_count': 5},
{'table_name': 'orders', 'table_type': 'BASE TABLE', 'column_count': 8}
]
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(return_value=mock_rows)
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_tables(schema='public')
assert result['schema'] == 'public'
assert result['count'] == 2
assert len(result['tables']) == 2
@pytest.mark.asyncio
async def test_pg_columns(postgres_tools):
"""Test getting column info"""
mock_rows = [
{
'column_name': 'id',
'data_type': 'integer',
'udt_name': 'int4',
'is_nullable': 'NO',
'column_default': "nextval('users_id_seq'::regclass)",
'character_maximum_length': None,
'numeric_precision': 32
},
{
'column_name': 'name',
'data_type': 'character varying',
'udt_name': 'varchar',
'is_nullable': 'YES',
'column_default': None,
'character_maximum_length': 255,
'numeric_precision': None
}
]
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(return_value=mock_rows)
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_columns(table='users')
assert result['table'] == 'public.users'
assert result['column_count'] == 2
assert result['columns'][0]['name'] == 'id'
assert result['columns'][0]['nullable'] is False
@pytest.mark.asyncio
async def test_pg_schemas(postgres_tools):
"""Test listing schemas"""
mock_rows = [
{'schema_name': 'public'},
{'schema_name': 'app'}
]
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(return_value=mock_rows)
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_schemas()
assert result['count'] == 2
assert 'public' in result['schemas']
@pytest.mark.asyncio
async def test_st_tables(postgres_tools):
"""Test listing PostGIS tables"""
mock_rows = [
{
'table_name': 'locations',
'geometry_column': 'geom',
'geometry_type': 'POINT',
'srid': 4326,
'coord_dimension': 2
}
]
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(return_value=mock_rows)
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.st_tables()
assert result['count'] == 1
assert result['postgis_tables'][0]['table'] == 'locations'
assert result['postgis_tables'][0]['srid'] == 4326
@pytest.mark.asyncio
async def test_st_tables_no_postgis(postgres_tools):
"""Test st_tables when PostGIS not installed"""
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(side_effect=Exception("relation \"geometry_columns\" does not exist"))
# Create proper async context manager
mock_cm = AsyncMock()
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
postgres_tools.pool = mock_pool
result = await postgres_tools.st_tables()
assert 'error' in result
assert 'PostGIS' in result['error']
@pytest.mark.asyncio
async def test_st_extent(postgres_tools):
"""Test getting geometry bounding box"""
mock_row = {
'xmin': -122.5,
'ymin': 37.5,
'xmax': -122.0,
'ymax': 38.0
}
mock_conn = AsyncMock()
mock_conn.fetchrow = AsyncMock(return_value=mock_row)
mock_pool = AsyncMock()
mock_pool.acquire = MagicMock(return_value=AsyncMock(
__aenter__=AsyncMock(return_value=mock_conn),
__aexit__=AsyncMock()
))
postgres_tools.pool = mock_pool
result = await postgres_tools.st_extent(table='locations', column='geom')
assert result['bbox']['xmin'] == -122.5
assert result['bbox']['ymax'] == 38.0
@pytest.mark.asyncio
async def test_error_handling(postgres_tools):
"""Test error handling for database errors"""
mock_conn = AsyncMock()
mock_conn.fetch = AsyncMock(side_effect=Exception("Connection refused"))
# Create proper async context manager
mock_cm = AsyncMock()
mock_cm.__aenter__ = AsyncMock(return_value=mock_conn)
mock_cm.__aexit__ = AsyncMock(return_value=None)
mock_pool = MagicMock()
mock_pool.acquire = MagicMock(return_value=mock_cm)
postgres_tools.pool = mock_pool
result = await postgres_tools.pg_query('SELECT 1')
assert 'error' in result
assert 'Connection refused' in result['error']


@@ -0,0 +1,23 @@
{
"name": "data-platform",
"version": "1.0.0",
"description": "Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration",
"author": {
"name": "Leo Miranda",
"email": "leobmiranda@gmail.com"
},
"homepage": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace/src/branch/main/plugins/data-platform/README.md",
"repository": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace.git",
"license": "MIT",
"keywords": [
"pandas",
"postgresql",
"postgis",
"dbt",
"data-engineering",
"etl",
"dataframe"
],
"commands": ["./commands/"],
"mcpServers": ["./.mcp.json"]
}


@@ -0,0 +1,10 @@
{
"mcpServers": {
"data-platform": {
"type": "stdio",
"command": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform/.venv/bin/python",
"args": ["-m", "mcp_server.server"],
"cwd": "${CLAUDE_PLUGIN_ROOT}/mcp-servers/data-platform"
}
}
}


@@ -0,0 +1,119 @@
# data-platform Plugin
Data engineering tools with pandas, PostgreSQL/PostGIS, and dbt integration for Claude Code.
## Features
- **pandas Operations**: Load, transform, and export DataFrames with a persistent data_ref system
- **PostgreSQL/PostGIS**: Database queries with connection pooling and spatial data support
- **dbt Integration**: Build tool wrapper with pre-execution validation
## Installation
This plugin is part of the leo-claude-mktplace. Install via:
```bash
# From marketplace
claude plugins install leo-claude-mktplace/data-platform
# Setup MCP server venv
cd ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Configuration
### PostgreSQL (Optional)
Create `~/.config/claude/postgres.env`:
```env
POSTGRES_URL=postgresql://user:password@host:5432/database
```
### dbt (Optional)
Add to project `.env`:
```env
DBT_PROJECT_DIR=/path/to/dbt/project
DBT_PROFILES_DIR=~/.dbt
```
## Commands
| Command | Description |
|---------|-------------|
| `/initial-setup` | Interactive setup wizard for PostgreSQL and dbt configuration |
| `/ingest` | Load data from files or database |
| `/profile` | Generate data profile and statistics |
| `/schema` | Show database/DataFrame schema |
| `/explain` | Explain dbt model lineage |
| `/lineage` | Visualize data dependencies |
| `/run` | Execute dbt models |
## Agents
| Agent | Description |
|-------|-------------|
| `data-ingestion` | Data loading and transformation specialist |
| `data-analysis` | Exploration and profiling specialist |
## data_ref System
All DataFrame operations use a `data_ref` system for persistence:
```
# Load returns a reference
read_csv("data.csv") → {"data_ref": "sales_data"}
# Use reference in subsequent operations
filter("sales_data", "amount > 100") → {"data_ref": "sales_data_filtered"}
describe("sales_data_filtered") → {statistics}
```
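For orientation, here is a minimal sketch of the singleton store that could back the `data_ref` system (method names follow the plugin's unit tests; the shipped implementation may differ):
```python
import pandas as pd

class DataStore:
    """Hypothetical in-memory registry of DataFrames keyed by data_ref."""
    _instance = None

    def __init__(self, max_rows: int = 100_000):
        self._dataframes = {}   # data_ref -> DataFrame
        self._metadata = {}     # data_ref -> rows, dtypes, memory
        self._max_rows = max_rows

    @classmethod
    def get_instance(cls) -> "DataStore":
        # One shared store so every tool call sees the same data_refs
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def store(self, df: pd.DataFrame, name: str) -> str:
        self._dataframes[name] = df
        self._metadata[name] = {
            "rows": len(df),
            "dtypes": {c: str(t) for c, t in df.dtypes.items()},
            "memory_bytes": int(df.memory_usage(deep=True).sum()),
        }
        return name

    def check_row_limit(self, rows: int) -> dict:
        exceeded = rows > self._max_rows
        result = {"exceeded": exceeded}
        if exceeded:
            result["suggestion"] = "Filter the data or load it with chunk_size"
        return result

    def total_memory_bytes(self) -> int:
        return sum(m["memory_bytes"] for m in self._metadata.values())

    def total_memory_mb(self) -> float:
        return self.total_memory_bytes() / (1024 * 1024)
```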
## Example Workflow
```
/ingest data/sales.csv
# → Loaded 50,000 rows as "sales_data"
/profile sales_data
# → Statistical summary, null counts, quality assessment
/schema orders
# → Column names, types, constraints
/lineage fct_orders
# → Dependency graph showing upstream/downstream models
/run dim_customers
# → Pre-validates then executes dbt model
```
## Tools Summary
### pandas (14 tools)
`read_csv`, `read_parquet`, `read_json`, `to_csv`, `to_parquet`, `describe`, `head`, `tail`, `filter`, `select`, `groupby`, `join`, `list_data`, `drop_data`
### PostgreSQL (6 tools)
`pg_connect`, `pg_query`, `pg_execute`, `pg_tables`, `pg_columns`, `pg_schemas`
### PostGIS (4 tools)
`st_tables`, `st_geometry_type`, `st_srid`, `st_extent`
### dbt (8 tools)
`dbt_parse`, `dbt_run`, `dbt_test`, `dbt_build`, `dbt_compile`, `dbt_ls`, `dbt_docs_generate`, `dbt_lineage`
## Memory Management
- Default limit: 100,000 rows per DataFrame
- Configure via `DATA_PLATFORM_MAX_ROWS` environment variable
- Use `chunk_size` parameter for large files
- Monitor with `list_data` tool
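To illustrate how `chunk_size` and the row limit can interact, a rough pandas-only sketch (the MCP server's actual chunking logic may differ):
```python
import os
import pandas as pd

MAX_ROWS = int(os.environ.get("DATA_PLATFORM_MAX_ROWS", "100000"))

def load_csv_capped(path: str, chunk_size: int = 50_000) -> pd.DataFrame:
    """Read a CSV in chunks and stop once the configured row limit is reached."""
    chunks, loaded = [], 0
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        remaining = MAX_ROWS - loaded
        if remaining <= 0:
            break
        chunks.append(chunk.head(remaining))
        loaded += min(len(chunk), remaining)
    return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
```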
## SessionStart Hook
On session start, the plugin checks PostgreSQL connectivity and displays a warning if unavailable. This is non-blocking - pandas and dbt tools remain available.


@@ -0,0 +1,98 @@
# Data Analysis Agent
You are a data analysis specialist. Your role is to help users explore, profile, and understand their data.
## Capabilities
- Profile datasets with statistical summaries
- Explore database schemas and structures
- Analyze dbt model lineage and dependencies
- Provide data quality assessments
- Generate insights and recommendations
## Available Tools
### Data Exploration
- `describe` - Statistical summary
- `head` - Preview first rows
- `tail` - Preview last rows
- `list_data` - List available DataFrames
### Database Exploration
- `pg_connect` - Check database connection
- `pg_tables` - List all tables
- `pg_columns` - Get column details
- `pg_schemas` - List schemas
### PostGIS Exploration
- `st_tables` - List spatial tables
- `st_geometry_type` - Get geometry type
- `st_srid` - Get coordinate system
- `st_extent` - Get bounding box
### dbt Analysis
- `dbt_lineage` - Model dependencies
- `dbt_ls` - List resources
- `dbt_compile` - View compiled SQL
- `dbt_docs_generate` - Generate docs
## Workflow Guidelines
1. **Understand the question**:
- What does the user want to know?
- What data is available?
- What level of detail is needed?
2. **Explore the data**:
- Start with `list_data` or `pg_tables`
- Get schema info with `describe` or `pg_columns`
- Preview with `head` to understand content
3. **Profile thoroughly**:
- Use `describe` for statistics
- Check for nulls, outliers, patterns
- Note data quality issues
4. **Analyze dependencies** (for dbt):
- Use `dbt_lineage` to trace data flow
- Understand transformations
- Identify critical paths
5. **Provide insights**:
- Summarize findings clearly
- Highlight potential issues
- Recommend next steps
## Analysis Patterns
### Data Quality Check
1. `describe` - Get statistics
2. Check null percentages
3. Identify outliers (min/max vs mean)
4. Flag suspicious patterns
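For reference, the computation behind steps 2 and 3 could look like this pandas sketch (illustrative only; in practice these numbers come from the `describe` tool):
```python
import pandas as pd

def quality_flags(df: pd.DataFrame, null_threshold: float = 0.2) -> dict:
    """Flag columns with many nulls or with extremes far from the mean."""
    null_pct = df.isna().mean()  # fraction of nulls per column
    numeric = df.select_dtypes("number")
    # An extreme more than 3 standard deviations from the mean is worth flagging
    far = ((numeric.max() - numeric.mean()).abs() > 3 * numeric.std()) | \
          ((numeric.min() - numeric.mean()).abs() > 3 * numeric.std())
    return {
        "high_nulls": null_pct[null_pct > null_threshold].index.tolist(),
        "possible_outliers": far[far].index.tolist(),
    }
```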
### Schema Comparison
1. `pg_columns` - Get table A schema
2. `pg_columns` - Get table B schema
3. Compare column names, types
4. Identify mismatches
### Lineage Analysis
1. `dbt_lineage` - Get model graph
2. Trace upstream sources
3. Identify downstream impact
4. Document critical path
## Example Interactions
**User**: What's in the sales_data DataFrame?
**Agent**: Uses `describe`, `head`, explains columns, statistics, patterns
**User**: What tables are in the database?
**Agent**: Uses `pg_tables`, shows list with column counts
**User**: How does the dim_customers model work?
**Agent**: Uses `dbt_lineage`, `dbt_compile`, explains dependencies and SQL
**User**: Is there any spatial data?
**Agent**: Uses `st_tables`, shows PostGIS tables with geometry types


@@ -0,0 +1,81 @@
# Data Ingestion Agent
You are a data ingestion specialist. Your role is to help users load, transform, and prepare data for analysis.
## Capabilities
- Load data from CSV, Parquet, JSON files
- Query PostgreSQL databases
- Transform data using filter, select, groupby, join operations
- Export data to various formats
- Handle large datasets with chunking
## Available Tools
### File Operations
- `read_csv` - Load CSV files with optional chunking
- `read_parquet` - Load Parquet files
- `read_json` - Load JSON/JSONL files
- `to_csv` - Export to CSV
- `to_parquet` - Export to Parquet
### Data Transformation
- `filter` - Filter rows by condition
- `select` - Select specific columns
- `groupby` - Group and aggregate
- `join` - Join two DataFrames
### Database Operations
- `pg_query` - Execute SELECT queries
- `pg_execute` - Execute INSERT/UPDATE/DELETE
- `pg_tables` - List available tables
### Management
- `list_data` - List all stored DataFrames
- `drop_data` - Remove DataFrame from store
## Workflow Guidelines
1. **Understand the data source**:
- Ask about file location/format
- For database, understand table structure
- Clarify any filters or transformations needed
2. **Load data efficiently**:
- Use appropriate reader for file format
- For large files (>100k rows), use chunking
- Name DataFrames meaningfully
3. **Transform as needed**:
- Apply filters early to reduce data size
- Select only needed columns
- Join related datasets
4. **Validate results**:
- Check row counts after transformations
- Verify data types are correct
- Preview results with `head`
5. **Store with meaningful names**:
- Use descriptive data_ref names
- Document the source and transformations
## Memory Management
- Default row limit: 100,000 rows
- For larger datasets, suggest:
- Filtering before loading
- Using chunk_size parameter
- Aggregating to reduce size
- Storing to Parquet for efficient retrieval
## Example Interactions
**User**: Load the sales data from data/sales.csv
**Agent**: Uses `read_csv` to load, reports data_ref, row count, columns
**User**: Filter to only Q4 2024 sales
**Agent**: Uses `filter` with date condition, stores filtered result
**User**: Join with customer data
**Agent**: Uses `join` to combine, validates result counts


@@ -0,0 +1,90 @@
# data-platform Plugin - CLAUDE.md Integration
Add this section to your project's CLAUDE.md to enable data-platform plugin features.
## Suggested CLAUDE.md Section
```markdown
## Data Platform Integration
This project uses the data-platform plugin for data engineering workflows.
### Configuration
**PostgreSQL**: Credentials in `~/.config/claude/postgres.env`
**dbt**: Project path auto-detected from `dbt_project.yml`
### Available Commands
| Command | Purpose |
|---------|---------|
| `/ingest` | Load data from files or database |
| `/profile` | Generate statistical profile |
| `/schema` | Show schema information |
| `/explain` | Explain dbt model |
| `/lineage` | Show data lineage |
| `/run` | Execute dbt models |
### data_ref Convention
DataFrames are stored with references. Use meaningful names:
- `raw_*` for source data
- `stg_*` for staged/cleaned data
- `dim_*` for dimension tables
- `fct_*` for fact tables
- `rpt_*` for reports
### dbt Workflow
1. Always validate before running: `/run` includes automatic `dbt_parse`
2. For dbt 1.9+, check for deprecated syntax before commits
3. Use `/lineage` to understand impact of changes
### Database Access
PostgreSQL tools require POSTGRES_URL configuration:
- Read-only queries: `pg_query`
- Write operations: `pg_execute`
- Schema exploration: `pg_tables`, `pg_columns`
PostGIS spatial data:
- List spatial tables: `st_tables`
- Check geometry: `st_geometry_type`, `st_srid`, `st_extent`
```
## Environment Variables
Add to project `.env` if needed:
```env
# dbt configuration
DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
# Memory limits
DATA_PLATFORM_MAX_ROWS=100000
```
## Typical Workflows
### Data Exploration
```
/ingest data/raw_customers.csv
/profile raw_customers
/schema
```
### ETL Development
```
/schema orders # Understand source
/explain stg_orders # Understand transformation
/run stg_orders # Test the model
/lineage fct_orders # Check downstream impact
```
### Database Analysis
```
/schema # List all tables
pg_columns orders # Detailed schema
st_tables # Find spatial data
```


@@ -0,0 +1,44 @@
# /explain - dbt Model Explanation
Explain a dbt model's purpose, dependencies, and SQL logic.
## Usage
```
/explain <model_name>
```
## Workflow
1. **Get model info**:
- Use `dbt_lineage` to get model metadata
- Extract description, tags, materialization
2. **Analyze dependencies**:
- Show upstream models (what this depends on)
- Show downstream models (what depends on this)
- Visualize as dependency tree
3. **Compile SQL**:
- Use `dbt_compile` to get rendered SQL
- Explain key transformations
4. **Report**:
- Model purpose (from description)
- Materialization strategy
- Dependency graph
- Key SQL logic explained
## Examples
```
/explain dim_customers
/explain fct_orders
```
## Available Tools
Use these MCP tools:
- `dbt_lineage` - Get model dependencies
- `dbt_compile` - Get compiled SQL
- `dbt_ls` - List related resources


@@ -0,0 +1,44 @@
# /ingest - Data Ingestion
Load data from files or database into the data platform.
## Usage
```
/ingest [source]
```
## Workflow
1. **Identify data source**:
- If source is a file path, determine format (CSV, Parquet, JSON)
- If source is "db" or a table name, query PostgreSQL
2. **Load data**:
- For files: Use `read_csv`, `read_parquet`, or `read_json`
- For database: Use `pg_query` with appropriate SELECT
3. **Validate**:
- Check row count against limits
- If exceeds 100k rows, suggest chunking or filtering
4. **Report**:
- Show data_ref, row count, columns, and memory usage
- Preview first few rows
## Examples
```
/ingest data/sales.csv
/ingest data/customers.parquet
/ingest "SELECT * FROM orders WHERE created_at > '2024-01-01'"
```
## Available Tools
Use these MCP tools:
- `read_csv` - Load CSV files
- `read_parquet` - Load Parquet files
- `read_json` - Load JSON/JSONL files
- `pg_query` - Query PostgreSQL database
- `list_data` - List loaded DataFrames


@@ -0,0 +1,231 @@
---
description: Interactive setup wizard for data-platform plugin - configures MCP server and optional PostgreSQL/dbt
---
# Data Platform Setup Wizard
This command sets up the data-platform plugin with pandas, PostgreSQL, and dbt integration.
## Important Context
- **This command uses Bash, Read, Write, and AskUserQuestion tools** - NOT MCP tools
- **MCP tools won't work until after setup + session restart**
- **PostgreSQL and dbt are optional** - pandas tools work without them
---
## Phase 1: Environment Validation
### Step 1.1: Check Python Version
```bash
python3 --version
```
Requires Python 3.10+. If below, stop setup and inform user.
### Step 1.2: Check venv Support
```bash
python3 -c "import venv" && echo "VENV_SUPPORT_OK" || echo "VENV_SUPPORT_MISSING"
```
If venv support is missing (common on Debian/Ubuntu without the `python3-venv` package), ask the user to install it before continuing to Phase 2.
---
## Phase 2: MCP Server Setup
### Step 2.1: Locate Data Platform MCP Server
The MCP server should be at the marketplace root:
```bash
# If running from installed marketplace
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_INSTALLED"
# If running from source
ls -la ~/claude-plugins-work/mcp-servers/data-platform/ 2>/dev/null || echo "NOT_FOUND_SOURCE"
```
Determine the correct path based on which exists.
### Step 2.2: Check Virtual Environment
```bash
ls -la /path/to/mcp-servers/data-platform/.venv/bin/python 2>/dev/null && echo "VENV_EXISTS" || echo "VENV_MISSING"
```
### Step 2.3: Create Virtual Environment (if missing)
```bash
cd /path/to/mcp-servers/data-platform && python3 -m venv .venv && source .venv/bin/activate && pip install --upgrade pip && pip install -r requirements.txt && deactivate
```
**Note:** This may take a few minutes due to pandas, pyarrow, and dbt dependencies.
---
## Phase 3: PostgreSQL Configuration (Optional)
### Step 3.1: Ask About PostgreSQL
Use AskUserQuestion:
- Question: "Do you want to configure PostgreSQL database access?"
- Header: "PostgreSQL"
- Options:
- "Yes, I have a PostgreSQL database"
- "No, I'll only use pandas/dbt tools"
**If user chooses "No":** Skip to Phase 4.
### Step 3.2: Create Config Directory
```bash
mkdir -p ~/.config/claude
```
### Step 3.3: Check PostgreSQL Configuration
```bash
cat ~/.config/claude/postgres.env 2>/dev/null || echo "FILE_NOT_FOUND"
```
**If file exists with valid URL:** Skip to Step 3.6.
**If missing or has placeholders:** Continue.
### Step 3.4: Gather PostgreSQL Information
Use AskUserQuestion:
- Question: "What is your PostgreSQL connection URL format?"
- Header: "DB Format"
- Options:
- "Standard: postgresql://user:pass@host:5432/db"
- "PostGIS: postgresql://user:pass@host:5432/db (with PostGIS extension)"
- "Other (I'll provide the full URL)"
Ask user to provide the connection URL.
### Step 3.5: Create Configuration File
```bash
cat > ~/.config/claude/postgres.env << 'EOF'
# PostgreSQL Configuration
# Generated by data-platform /initial-setup
POSTGRES_URL=<USER_PROVIDED_URL>
EOF
chmod 600 ~/.config/claude/postgres.env
```
### Step 3.6: Test PostgreSQL Connection (if configured)
```bash
source ~/.config/claude/postgres.env && python3 -c "
import asyncio
import asyncpg
async def test():
try:
conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
ver = await conn.fetchval('SELECT version()')
await conn.close()
print(f'SUCCESS: {ver.split(\",\")[0]}')
except Exception as e:
print(f'FAILED: {e}')
asyncio.run(test())
"
```
Report result:
- SUCCESS: Connection works
- FAILED: Show error and suggest fixes
---
## Phase 4: dbt Configuration (Optional)
### Step 4.1: Ask About dbt
Use AskUserQuestion:
- Question: "Do you use dbt for data transformations in your projects?"
- Header: "dbt"
- Options:
- "Yes, I have dbt projects"
- "No, I don't use dbt"
**If user chooses "No":** Skip to Phase 5.
### Step 4.2: dbt Discovery
dbt configuration is **project-level** (not system-level). The plugin auto-detects dbt projects by looking for `dbt_project.yml`.
Inform user:
```
dbt projects are detected automatically when you work in a directory
containing dbt_project.yml.
If your dbt project is in a subdirectory, you can set DBT_PROJECT_DIR
in your project's .env file:
DBT_PROJECT_DIR=./transform
DBT_PROFILES_DIR=~/.dbt
```
### Step 4.3: Check dbt Installation
```bash
dbt --version 2>/dev/null || echo "DBT_NOT_FOUND"
```
**If not found:** Inform user that dbt CLI tools require dbt-core to be installed globally or in the project.
---
## Phase 5: Validation
### Step 5.1: Verify MCP Server
```bash
cd /path/to/mcp-servers/data-platform && .venv/bin/python -c "from mcp_server.server import DataPlatformMCPServer; print('MCP Server OK')"
```
### Step 5.2: Summary
```
╔════════════════════════════════════════════════════════════╗
║ DATA-PLATFORM SETUP COMPLETE ║
╠════════════════════════════════════════════════════════════╣
║ MCP Server: ✓ Ready ║
║ pandas Tools: ✓ Available (14 tools) ║
║ PostgreSQL Tools: [✓/✗] [Status based on config] ║
║ PostGIS Tools: [✓/✗] [Status based on PostGIS] ║
║ dbt Tools: [✓/✗] [Status based on discovery] ║
╚════════════════════════════════════════════════════════════╝
```
### Step 5.3: Session Restart Notice
---
**⚠️ Session Restart Required**
Restart your Claude Code session for MCP tools to become available.
**After restart, you can:**
- Run `/ingest` to load data from files or database
- Run `/profile` to analyze DataFrame statistics
- Run `/schema` to explore database/DataFrame schema
- Run `/run` to execute dbt models (if configured)
- Run `/lineage` to view dbt model dependencies
---
## Memory Limits
The data-platform plugin has a default row limit of 100,000 rows per DataFrame. For larger datasets:
- Use chunked processing (`chunk_size` parameter)
- Filter data before loading
- Store to Parquet for efficient re-loading
You can override the limit by setting in your project `.env`:
```
DATA_PLATFORM_MAX_ROWS=500000
```


@@ -0,0 +1,60 @@
# /lineage - Data Lineage Visualization
Show data lineage for dbt models or database tables.
## Usage
```
/lineage <model_name> [--depth N]
```
## Workflow
1. **Get lineage data**:
- Use `dbt_lineage` for dbt models
- For database tables, trace through dbt manifest
2. **Build lineage graph**:
- Identify all upstream sources
- Identify all downstream consumers
- Note materialization at each node
3. **Visualize**:
- ASCII art dependency tree
- List format with indentation
- Show depth levels
4. **Report**:
- Full dependency chain
- Critical path identification
- Refresh implications
## Examples
```
/lineage dim_customers
/lineage fct_orders --depth 3
```
## Output Format
```
Sources:
└── raw_customers (source)
└── raw_orders (source)
dim_customers (table)
├── upstream:
│ └── stg_customers (view)
│ └── raw_customers (source)
└── downstream:
└── fct_orders (incremental)
└── rpt_customer_lifetime (table)
```
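To make the graph-building step concrete, here is a small sketch of deriving upstream and downstream nodes from a dbt `manifest.json` (node layout matches dbt's manifest format; the real `dbt_lineage` tool may return richer metadata):
```python
import json

def lineage(manifest_path: str, model_name: str) -> dict:
    """Return direct upstream and downstream node ids for a model."""
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"]

    # Find the node whose name matches the requested model
    target_id = next((nid for nid, n in nodes.items() if n.get("name") == model_name), None)
    if target_id is None:
        return {"error": f"Model '{model_name}' not found in manifest"}

    upstream = nodes[target_id].get("depends_on", {}).get("nodes", [])
    downstream = [
        nid for nid, n in nodes.items()
        if target_id in n.get("depends_on", {}).get("nodes", [])
    ]
    return {"model": model_name, "upstream": upstream, "downstream": downstream}
```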
## Available Tools
Use these MCP tools:
- `dbt_lineage` - Get model dependencies
- `dbt_ls` - List dbt resources
- `dbt_docs_generate` - Generate full manifest


@@ -0,0 +1,44 @@
# /profile - Data Profiling
Generate statistical profile and quality report for a DataFrame.
## Usage
```
/profile <data_ref>
```
## Workflow
1. **Get data reference**:
- If no data_ref provided, use `list_data` to show available options
- Validate the data_ref exists
2. **Generate profile**:
- Use `describe` for statistical summary
- Analyze null counts, unique values, data types
3. **Quality assessment**:
- Identify columns with high null percentage
- Flag potential data quality issues
- Suggest cleaning operations if needed
4. **Report**:
- Summary statistics per column
- Data type distribution
- Memory usage
- Quality score
## Examples
```
/profile sales_data
/profile df_a1b2c3d4
```
## Available Tools
Use these MCP tools:
- `describe` - Get statistical summary
- `head` - Preview first rows
- `list_data` - List available DataFrames


@@ -0,0 +1,55 @@
# /run - Execute dbt Models
Run dbt models with automatic pre-validation.
## Usage
```
/run [model_selection] [--full-refresh]
```
## Workflow
1. **Pre-validation** (MANDATORY):
- Use `dbt_parse` to validate project
- Check for deprecated syntax (dbt 1.9+)
- If validation fails, show errors and STOP
2. **Execute models**:
- Use `dbt_run` with provided selection
- Monitor progress and capture output
3. **Report results**:
- Success/failure status per model
- Execution time
- Row counts where available
- Any warnings or errors
## Examples
```
/run # Run all models
/run dim_customers # Run specific model
/run +fct_orders # Run model and its upstream
/run tag:daily # Run models with tag
/run --full-refresh # Rebuild incremental models
```
## Selection Syntax
| Pattern | Meaning |
|---------|---------|
| `model_name` | Run single model |
| `+model_name` | Run model and upstream |
| `model_name+` | Run model and downstream |
| `+model_name+` | Run model with all deps |
| `tag:name` | Run by tag |
| `path:models/staging` | Run by path |
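For orientation, a hedged sketch of how a wrapper might translate the selection into dbt CLI flags (`--select` and `--full-refresh` are standard dbt flags; the helper name is illustrative):
```python
def build_dbt_run_cmd(select=None, full_refresh=False,
                      project_dir=".", profiles_dir="~/.dbt"):
    """Assemble a `dbt run` invocation; selection syntax is passed through unchanged."""
    cmd = ["dbt", "run", "--project-dir", project_dir, "--profiles-dir", profiles_dir]
    if select:
        cmd += ["--select", select]   # e.g. "dim_customers", "+fct_orders", "tag:daily"
    if full_refresh:
        cmd.append("--full-refresh")
    return cmd

if __name__ == "__main__":
    # Dry illustration: print the command instead of executing it.
    # A wrapper would run it via subprocess.run(..., capture_output=True, text=True, timeout=300)
    print(build_dbt_run_cmd(select="tag:daily"))
```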
## Available Tools
Use these MCP tools:
- `dbt_parse` - Pre-validation (ALWAYS RUN FIRST)
- `dbt_run` - Execute models
- `dbt_build` - Run + test
- `dbt_test` - Run tests only


@@ -0,0 +1,48 @@
# /schema - Schema Exploration
Display schema information for database tables or DataFrames.
## Usage
```
/schema [table_name | data_ref]
```
## Workflow
1. **Determine target**:
- If argument is a loaded data_ref, show DataFrame schema
- If argument is a table name, query database schema
- If no argument, list all available tables and DataFrames
2. **For DataFrames**:
- Use `describe` to get column info
- Show dtypes, null counts, sample values
3. **For database tables**:
- Use `pg_columns` for column details
- Use `st_tables` to check for PostGIS columns
- Show constraints and indexes if available
4. **Report**:
- Column name, type, nullable, default
- For PostGIS: geometry type, SRID
- For DataFrames: pandas dtype, null percentage
## Examples
```
/schema # List all tables and DataFrames
/schema customers # Show table schema
/schema sales_data # Show DataFrame schema
```
## Available Tools
Use these MCP tools:
- `pg_tables` - List database tables
- `pg_columns` - Get column info
- `pg_schemas` - List schemas
- `st_tables` - List PostGIS tables
- `describe` - Get DataFrame info
- `list_data` - List DataFrames


@@ -0,0 +1,10 @@
{
"hooks": {
"SessionStart": [
{
"type": "command",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks/startup-check.sh"
}
]
}
}


@@ -0,0 +1,54 @@
#!/bin/bash
# data-platform startup check hook
# Checks for common issues at session start
# All output MUST have [data-platform] prefix
PREFIX="[data-platform]"
# Check if MCP venv exists
PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT:-$(dirname "$(dirname "$(realpath "$0")")")}"
VENV_PATH="$PLUGIN_ROOT/mcp-servers/data-platform/.venv/bin/python"
if [[ ! -f "$VENV_PATH" ]]; then
echo "$PREFIX MCP venv missing - run /initial-setup or setup.sh"
exit 0
fi
# Check PostgreSQL configuration (optional - just warn if configured but failing)
POSTGRES_CONFIG="$HOME/.config/claude/postgres.env"
if [[ -f "$POSTGRES_CONFIG" ]]; then
source "$POSTGRES_CONFIG"
if [[ -n "${POSTGRES_URL:-}" ]]; then
# Quick connection test (5 second timeout)
RESULT=$("$VENV_PATH" -c "
import asyncio
import sys
async def test():
try:
import asyncpg
conn = await asyncpg.connect('$POSTGRES_URL', timeout=5)
await conn.close()
return 'OK'
except Exception as e:
return f'FAIL: {e}'
print(asyncio.run(test()))
" 2>/dev/null || echo "FAIL: asyncpg not installed")
if [[ "$RESULT" == "OK" ]]; then
# PostgreSQL OK - say nothing
:
elif [[ "$RESULT" == *"FAIL"* ]]; then
echo "$PREFIX PostgreSQL connection failed - check POSTGRES_URL"
fi
fi
fi
# Check dbt project (if in a project with dbt_project.yml)
if [[ -f "dbt_project.yml" ]] || [[ -f "transform/dbt_project.yml" ]]; then
if ! command -v dbt &> /dev/null; then
echo "$PREFIX dbt CLI not found - dbt tools unavailable"
fi
fi
# All checks passed - say nothing
exit 0


@@ -0,0 +1 @@
../../../mcp-servers/data-platform


@@ -1,6 +1,6 @@
#!/bin/bash
# doc-guardian notification hook
# Outputs a single notification for config file changes, nothing otherwise
# Tracks documentation dependencies and queues updates
# This is a command hook - guaranteed not to block workflow
# Read tool input from stdin (JSON with file_path)
@@ -14,10 +14,45 @@ if [ -z "$FILE_PATH" ]; then
exit 0
fi
# Check if file is in a config directory (commands/, agents/, skills/, hooks/)
if echo "$FILE_PATH" | grep -qE '/(commands|agents|skills|hooks)/'; then
echo "[doc-guardian] Config file modified. Run /doc-sync when ready."
# Define documentation dependency mappings
# When these directories change, these docs need updating
declare -A DOC_DEPS
DOC_DEPS["commands"]="docs/COMMANDS-CHEATSHEET.md README.md"
DOC_DEPS["agents"]="README.md CLAUDE.md"
DOC_DEPS["hooks"]="docs/COMMANDS-CHEATSHEET.md README.md"
DOC_DEPS["skills"]="README.md"
DOC_DEPS[".claude-plugin"]="CLAUDE.md .claude-plugin/marketplace.json"
DOC_DEPS["mcp-servers"]="docs/COMMANDS-CHEATSHEET.md CLAUDE.md"
# Check which config directory was modified
MODIFIED_TYPE=""
for dir in commands agents hooks skills .claude-plugin mcp-servers; do
if echo "$FILE_PATH" | grep -qE "/${dir}/|^${dir}/"; then
MODIFIED_TYPE="$dir"
break
fi
done
# Exit silently if not a tracked config directory
if [ -z "$MODIFIED_TYPE" ]; then
exit 0
fi
# Exit silently for all other files (no output = no blocking)
# Get the dependent docs
DEPENDENT_DOCS="${DOC_DEPS[$MODIFIED_TYPE]}"
# Queue file for tracking pending updates
QUEUE_FILE="${CLAUDE_PROJECT_ROOT:-.}/.doc-guardian-queue"
# Add to queue (create if doesn't exist, append if does)
{
echo "$(date +%Y-%m-%dT%H:%M:%S) | $MODIFIED_TYPE | $FILE_PATH | $DEPENDENT_DOCS"
} >> "$QUEUE_FILE" 2>/dev/null || true
# Count pending updates
PENDING_COUNT=$(wc -l < "$QUEUE_FILE" 2>/dev/null | tr -d ' ' || echo "1")
# Output notification with specific docs that need updating
echo "[doc-guardian] $MODIFIED_TYPE changed → update needed: $DEPENDENT_DOCS (${PENDING_COUNT} pending)"
exit 0


@@ -114,6 +114,17 @@ Check current sprint progress.
**When to use:** Daily standup, progress check, deciding what to work on next
### `/proposal-status`
View proposal and implementation hierarchy.
**What it does:**
- Lists all change proposals from Gitea Wiki
- Shows implementations under each proposal with status
- Displays linked issues and lessons learned
- Tree-style formatted output
**When to use:** Review progress on multi-sprint features, track proposal lifecycle
### `/sprint-close`
Complete sprint and capture lessons learned.
@@ -468,6 +479,7 @@ projman/
│ ├── sprint-start.md
│ ├── sprint-status.md
│ ├── sprint-close.md
│ ├── proposal-status.md
│ ├── labels-sync.md
│ ├── initial-setup.md
│ ├── project-init.md


@@ -357,6 +357,11 @@ Let's capture lessons learned. I'll ask some questions:
```markdown
# Sprint {N} - {Clear Title}
## Metadata
- **Implementation:** [Change VXX.X.X (Impl N)](wiki-link)
- **Issues:** #XX, #XX
- **Sprint:** Sprint N
## Context
Brief background - what were you doing?
@@ -373,17 +378,131 @@ How can future sprints avoid this or optimize it?
technology, component, issue-type, pattern
```
**IMPORTANT:** Always include the Metadata section with implementation link for traceability.
**D. Save to Gitea Wiki**
Include the implementation reference in lesson content:
```
create_lesson(
title="Sprint 18 - Claude Code Infinite Loop on Validation Errors",
content="[Full lesson content]",
content="""
# Sprint 18 - Claude Code Infinite Loop on Validation Errors
## Metadata
- **Implementation:** [Change V1.2.0 (Impl 1)](wiki-link)
- **Issues:** #45, #46
- **Sprint:** Sprint 18
## Context
[Lesson context...]
## Problem
[What went wrong...]
## Solution
[How it was solved...]
## Prevention
[How to avoid in future...]
## Tags
testing, claude-code, validation, python
""",
tags=["testing", "claude-code", "validation", "python"],
category="sprints"
)
```
**E. Git Operations**
**E. Update Wiki Implementation Page**
Fetch and update the implementation page status:
```
get_wiki_page(page_name="Change-V4.1.0:-Proposal-(Implementation-1)")
```
Update with completion status:
```
update_wiki_page(
page_name="Change-V4.1.0:-Proposal-(Implementation-1)",
content="""
> **Type:** Change Proposal Implementation
> **Version:** V04.1.0
> **Status:** Implemented ✅
> **Date:** 2026-01-26
> **Completed:** 2026-01-28
> **Origin:** [Proposal](wiki-link)
> **Sprint:** Sprint 17
# Implementation Details
[Original content...]
## Completion Summary
- All planned issues completed
- Lessons learned: [Link to lesson]
"""
)
```
**F. Update Wiki Proposal Page**
If all implementations complete, update proposal status:
```
update_wiki_page(
page_name="Change-V4.1.0:-Proposal",
content="""
> **Type:** Change Proposal
> **Version:** V04.1.0
> **Status:** Implemented ✅
> **Date:** 2026-01-26
# Feature Title
[Original content...]
## Implementations
- [Implementation 1](link) - ✅ Completed (Sprint 17)
"""
)
```
**G. Update CHANGELOG (MANDATORY)**
Add all sprint changes to `[Unreleased]` section:
```markdown
## [Unreleased]
### Added
- **projman:** New feature description
- **plugin-name:** Another feature
### Changed
- **projman:** Modified behavior
### Fixed
- **plugin-name:** Bug fix description
```
**IMPORTANT:** Never skip this step. Every sprint must update CHANGELOG.
**H. Version Check**
Run `/suggest-version` to analyze CHANGELOG and recommend version bump:
```
/suggest-version
```
If release is warranted:
```bash
./scripts/release.sh X.Y.Z
```
This ensures version numbers stay in sync:
- README.md title
- .claude-plugin/marketplace.json
- Git tags
- CHANGELOG.md section header
**I. Git Operations**
Offer to handle git cleanup:
```
@@ -418,7 +537,9 @@ Would you like me to handle git operations?
**Lessons Learned Tools (Gitea Wiki):**
- `search_lessons(query, tags, limit)` - Find relevant past lessons
- `create_lesson(title, content, tags, category)` - Save new lesson
- `get_wiki_page(page_name)` - Fetch specific pages
- `get_wiki_page(page_name)` - Fetch implementation/proposal pages
- `update_wiki_page(page_name, content)` - Update implementation/proposal status
- `list_wiki_pages()` - List all wiki pages
**Validation Tools:**
- `get_branch_protection(branch)` - Check merge rules
@@ -455,6 +576,10 @@ Would you like me to handle git operations?
8. **Auto-check subtasks** - Mark issue subtasks complete on close
9. **Track meticulously** - Update issues immediately, document blockers
10. **Capture lessons** - At sprint close, interview thoroughly
11. **Update wiki status** - At sprint close, update implementation and proposal pages
12. **Link lessons to wiki** - Include lesson links in implementation completion summary
13. **Update CHANGELOG** - MANDATORY at sprint close, never skip
14. **Run suggest-version** - Check if release is needed after CHANGELOG update
## Your Mission


@@ -122,23 +122,34 @@ get_labels(repo="owner/repo")
- Use `create_label` to create them
- Report which labels were created
### 4. docs/changes/ Folder Check
### 4. Input Source Detection
Verify the project has a `docs/changes/` folder for sprint input files.
Detect where the planning input is coming from:
**If folder exists:**
- Check for relevant change files for current sprint
- Reference these files during planning
| Source | Detection | Action |
|--------|-----------|--------|
| **Local file** | `docs/changes/*.md` exists | Parse frontmatter, migrate to wiki, delete local |
| **Existing wiki** | `Change VXX.X.X: Proposal` exists | Use as-is, create new implementation page |
| **Conversation** | Neither file nor wiki exists | Create wiki from discussion context |
**If folder does NOT exist:**
- Prompt user: "Your project doesn't have a `docs/changes/` folder. This folder stores sprint planning inputs and decisions. Would you like me to create it?"
- If user agrees, create the folder structure
**Input File Format** (if using local file):
```yaml
---
version: "4.1.0" # or "sprint-17" for internal work
title: "Feature Name"
plugin: plugin-name # optional
type: feature # feature | bugfix | refactor | infra
---
**If sprint starts with discussion but no input file:**
- Capture the discussion outputs
- Create a change file: `docs/changes/sprint-XX-description.md`
- Structure the file to meet Claude Code standards (concise, focused, actionable)
- Then proceed with sprint planning using that file
# Feature Description
[Free-form content...]
```
**Detection Steps:**
1. Check for `docs/changes/*.md` files with valid frontmatter
2. Use `list_wiki_pages()` to check for existing proposal
3. If neither found, use conversation context
4. If ambiguous (multiple sources), ask user which to use
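A minimal sketch of detection step 1, assuming the frontmatter format above and PyYAML being available (illustrative; the planner normally does this through its file tools):
```python
from pathlib import Path
import yaml  # PyYAML

def find_change_files(changes_dir: str = "docs/changes") -> list[dict]:
    """Return parsed frontmatter for each change file that has a valid YAML header."""
    found = []
    for path in sorted(Path(changes_dir).glob("*.md")):
        text = path.read_text()
        if not text.startswith("---"):
            continue  # no frontmatter block
        header = text[3:].split("\n---", 1)[0]
        try:
            meta = yaml.safe_load(header) or {}
        except yaml.YAMLError:
            continue
        if isinstance(meta, dict) and "version" in meta and "title" in meta:
            found.append({"file": str(path), **meta})
    return found
```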
## Your Responsibilities
@@ -161,7 +172,30 @@ Great! Let me ask a few questions to understand the scope:
5. Should this integrate with existing systems?
```
### 2. Search Relevant Lessons Learned
### 2. Detect Input Source
Before proceeding, identify where the planning input is:
```
# Check for local files
ls docs/changes/*.md
# Check for existing wiki proposal
list_wiki_pages() → filter for "Change V" prefix
```
**Report to user:**
```
Input source detected:
✓ Found: docs/changes/v4.1.0-wiki-planning.md
- Version: 4.1.0
- Title: Wiki-Based Planning Workflow
- Type: feature
I'll use this as the planning input. Proceed? (y/n)
```
### 3. Search Relevant Lessons Learned
**ALWAYS search for past lessons** before planning:
@@ -192,7 +226,59 @@ I searched previous sprint lessons and found these relevant insights:
I'll keep these in mind while planning this sprint.
```
### 3. Architecture Analysis
### 4. Create Wiki Proposal and Implementation Pages
After detecting input and searching lessons, create the wiki structure:
**Create/Update Proposal Page:**
```
# If no proposal exists for this version:
create_wiki_page(
title="Change V4.1.0: Proposal",
content="""
> **Type:** Change Proposal
> **Version:** V04.1.0
> **Plugin:** projman
> **Status:** In Progress
> **Date:** 2026-01-26
# Feature Title
[Content migrated from input source]
## Implementations
- [Implementation 1](link) - Current sprint
"""
)
```
**Create Implementation Page:**
```
create_wiki_page(
title="Change V4.1.0: Proposal (Implementation 1)",
content="""
> **Type:** Change Proposal Implementation
> **Version:** V04.1.0
> **Status:** In Progress
> **Date:** 2026-01-26
> **Origin:** [Proposal](wiki-link)
> **Sprint:** Sprint 17
# Implementation Details
[Technical details, scope, approach]
"""
)
```
**Update Proposal with Implementation Link:**
- Add link to new implementation in the Implementations section
**Cleanup Local File:**
- If input came from `docs/changes/*.md`, delete the file
- Wiki is now the single source of truth
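A minimal sketch of the proposal update, written as a pure text transformation; the surrounding `get_wiki_page` / `update_wiki_page` calls and the local-file cleanup are indicated as comments, and the helper name and argument shapes are assumptions:

```python
def add_implementation_link(proposal_md: str, impl_title: str, impl_link: str) -> str:
    """Append a new bullet under the '## Implementations' heading of the proposal page."""
    lines = proposal_md.splitlines()
    bullet = f"- [{impl_title}]({impl_link}) - Current sprint"
    for i, line in enumerate(lines):
        if line.strip() == "## Implementations":
            lines.insert(i + 1, bullet)
            break
    else:
        lines += ["", "## Implementations", bullet]  # heading missing: add it
    return "\n".join(lines)

# Intended flow (MCP calls shown as comments):
#   content = get_wiki_page("Change V4.1.0: Proposal")
#   content = add_implementation_link(content, "Implementation 1", impl_wiki_url)
#   update_wiki_page("Change V4.1.0: Proposal", content)
#   os.remove("docs/changes/v4.1.0-wiki-planning.md")  # wiki is now the source of truth
```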
### 5. Architecture Analysis
Think through the technical approach:
@@ -204,7 +290,7 @@ Think through the technical approach:
- What's the data flow?
- What are potential risks?
### 4. Create Gitea Issues with Proper Naming
### 6. Create Gitea Issues with Wiki Reference
**Issue Title Format (MANDATORY):**
```
@@ -233,17 +319,29 @@ Think through the technical approach:
**If a task is too large, break it down into smaller tasks.**
Use the `create_issue` and `suggest_labels` MCP tools:
**IMPORTANT: Include the wiki implementation reference in the issue body:**
```
create_issue(
title="[Sprint 17] feat: Implement JWT token generation",
body="## Description\n\n...\n\n## Acceptance Criteria\n\n...",
body="""## Description
[Description here]
## Implementation
**Wiki:** [Change V4.1.0 (Implementation 1)](https://gitea.example.com/org/repo/wiki/Change-V4.1.0%3A-Proposal-(Implementation-1))
## Acceptance Criteria
- [ ] Criteria 1
- [ ] Criteria 2
""",
labels=["Type/Feature", "Priority/High", "Component/Auth", "Tech/Python"]
)
```
### 5. Set Up Dependencies
### 7. Set Up Dependencies
After creating issues, establish dependencies using native Gitea dependencies:
@@ -256,7 +354,7 @@ create_issue_dependency(
This creates a relationship where issue #46 depends on #45 completing first.
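Using the `create_issue_dependency(issue_number, depends_on)` tool from the MCP tool list below, that example would look roughly like this (keyword arguments are shown for clarity; the tool may expect positional values):

```
create_issue_dependency(
    issue_number=46,   # the blocked issue
    depends_on=45      # must be completed first
)
```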
### 6. Create or Select Milestone
### 8. Create or Select Milestone
Use milestones to group sprint issues:
@@ -270,7 +368,7 @@ create_milestone(
Then assign issues to the milestone when creating them.
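For illustration only, a `create_milestone` call could look like the sketch below; the parameter names are assumptions, since only the tool name appears here:

```
create_milestone(
    title="Sprint 17 - User Authentication",  # assumed parameter name
    due_date="2025-02-01"                     # assumed parameter name
)
```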
### 7. Generate Planning Document
### 9. Generate Planning Document
Summarize the sprint plan:
@@ -338,11 +436,13 @@ Sprint 17 - User Authentication (Due: 2025-02-01)
- `create_issue_dependency(issue_number, depends_on)` - Create dependency
- `get_execution_order(issue_numbers)` - Get parallel execution order
**Lessons Learned Tools (Gitea Wiki):**
**Lessons Learned & Wiki Tools:**
- `search_lessons(query, tags, limit)` - Search lessons learned
- `create_lesson(title, content, tags, category)` - Create lesson
- `list_wiki_pages()` - List wiki pages
- `get_wiki_page(page_name)` - Get wiki page content
- `create_wiki_page(title, content)` - Create new wiki page (proposals, implementations)
- `update_wiki_page(page_name, content)` - Update wiki page content
## Communication Style
@@ -370,11 +470,14 @@ Sprint 17 - User Authentication (Due: 2025-02-01)
2. **Always check branch first** - No planning on production!
3. **Always validate repo is under organization** - Fail fast if not
4. **Always validate labels exist** - Create missing ones
5. **Always check for docs/changes/ folder** - Create if missing
6. **Always search lessons learned** - Prevent repeated mistakes
7. **Always use proper naming** - `[Sprint XX] <type>: <description>`
8. **Always set up dependencies** - Use native Gitea dependencies
9. **Always use suggest_labels** - Don't guess labels
10. **Always think through architecture** - Consider edge cases
5. **Always detect input source** - Check file, wiki, or use conversation
6. **Always create wiki proposal and implementation** - Before creating issues
7. **Always search lessons learned** - Prevent repeated mistakes
8. **Always use proper naming** - `[Sprint XX] <type>: <description>`
9. **Always include wiki reference** - Add implementation link to issues
10. **Always set up dependencies** - Use native Gitea dependencies
11. **Always use suggest_labels** - Don't guess labels
12. **Always think through architecture** - Consider edge cases
13. **Always cleanup local files** - Delete after migrating to wiki
You are the thoughtful planner who ensures sprints are well-prepared, architecturally sound, and learn from past experiences. Take your time, ask questions, and create comprehensive plans that set the team up for success.

View File

@@ -4,7 +4,7 @@ description: Run diagnostics and create structured issue in marketplace reposito
# Debug Report
Run diagnostic checks on projman MCP tools and create a structured issue in the marketplace repository for investigation.
Create structured issues in the marketplace repository - either from automated diagnostic tests OR from user-reported problems.
## Prerequisites
@@ -20,6 +20,101 @@ If not configured, ask the user for the marketplace repository path.
You MUST follow these steps in order. Do NOT skip any step.
### Step 0: Select Report Mode
Use AskUserQuestion to determine what the user wants to report:
```
What would you like to report?
[ ] Run automated diagnostics - Test MCP tools and report failures
[ ] Report an issue I experienced - Describe a problem with any plugin command
```
Store the selection as `REPORT_MODE`:
- "automated" → Continue to Step 1
- "user-reported" → Skip to Step 0.1
---
### Step 0.1: Gather User Feedback (User-Reported Mode Only)
If `REPORT_MODE` is "user-reported", gather structured feedback.
**Question 1: What were you trying to do?**
Use AskUserQuestion:
```
Which plugin/command were you using?
[ ] projman (sprint planning, issues, labels)
[ ] git-flow (commits, branches)
[ ] pr-review (pull request review)
[ ] cmdb-assistant (NetBox integration)
[ ] doc-guardian (documentation)
[ ] code-sentinel (security, refactoring)
[ ] Other - I'll describe it
```
Store as `AFFECTED_PLUGIN`.
Then ask for the specific command (free text):
```
What command or tool were you using? (e.g., /sprint-plan, virt_update_vm)
```
Store as `AFFECTED_COMMAND`.
**Question 2: What was your goal?**
```
Briefly describe what you were trying to accomplish:
```
Store as `USER_GOAL`.
**Question 3: What went wrong?**
Use AskUserQuestion:
```
What type of problem did you encounter?
[ ] Error message - Command failed with an error
[ ] Missing feature - Tool doesn't support what I need
[ ] Unexpected behavior - It worked but did the wrong thing
[ ] Documentation issue - Instructions were unclear or wrong
[ ] Other - I'll describe it
```
Store as `PROBLEM_TYPE`.
Then ask for details (free text):
```
Describe what happened. Include any error messages if applicable:
```
Store as `PROBLEM_DESCRIPTION`.
**Question 4: Expected vs Actual**
```
What did you expect to happen?
```
Store as `EXPECTED_BEHAVIOR`.
**Question 5: Workaround (optional)**
```
Did you find a workaround? If so, describe it (or skip):
```
Store as `WORKAROUND` (may be empty).
After gathering feedback, continue to Step 1 for context gathering, then skip to Step 5.1.
---
### Step 1: Gather Project Context
Run these Bash commands to capture project information:
@@ -91,7 +186,9 @@ grep PROJMAN_MARKETPLACE_REPO .env
Store as `MARKETPLACE_REPO`. If not found, ask the user.
### Step 3: Run Diagnostic Suite
### Step 3: Run Diagnostic Suite (Automated Mode Only)
**Skip this step if `REPORT_MODE` is "user-reported"** → Go to Step 5.1
Run each MCP tool with explicit `repo` parameter. Record success/failure and full response.
@@ -131,7 +228,9 @@ For each test, record:
- Status: PASS or FAIL
- Response or error message
### Step 4: Analyze Results
### Step 4: Analyze Results (Automated Mode Only)
**Skip this step if `REPORT_MODE` is "user-reported"** → Go to Step 5.1
Count failures and categorize errors:
@@ -145,7 +244,9 @@ Count failures and categorize errors:
For each failure, write a hypothesis about the likely cause.
### Step 5: Generate Smart Labels
### Step 5: Generate Smart Labels (Automated Mode Only)
**Skip this step if `REPORT_MODE` is "user-reported"** → Go to Step 5.1
Generate appropriate labels based on the diagnostic results.
@@ -180,7 +281,53 @@ The final label set should include:
- **Always**: `Type: Bug`, `Source: Diagnostic`, `Agent: Claude`
- **If detected**: `Component: *`, `Complexity: *`, `Risk: *`, `Priority: *`
### Step 6: Generate Issue Content
After generating labels, continue to Step 6.
---
### Step 5.1: Generate Labels (User-Reported Mode Only)
**Only execute this step if `REPORT_MODE` is "user-reported"**
**1. Map problem type to labels:**
| PROBLEM_TYPE | Labels |
|--------------|--------|
| Error message | `Type: Bug` |
| Missing feature | `Type: Enhancement` |
| Unexpected behavior | `Type: Bug` |
| Documentation issue | `Type: Documentation` |
| Other | `Type: Bug` (default) |
**2. Map plugin to component:**
| AFFECTED_PLUGIN | Component Label |
|-----------------|-----------------|
| projman | `Component: Commands` |
| git-flow | `Component: Commands` |
| pr-review | `Component: Commands` |
| cmdb-assistant | `Component: API` |
| doc-guardian | `Component: Commands` |
| code-sentinel | `Component: Commands` |
| Other | *(no component label)* |
**3. Build final labels:**
```
BASE_LABELS = ["Source: User-Reported", "Agent: Claude"]
TYPE_LABEL = [mapped from PROBLEM_TYPE]
COMPONENT_LABEL = [mapped from AFFECTED_PLUGIN, if any]
FINAL_LABELS = BASE_LABELS + TYPE_LABEL + COMPONENT_LABEL
```
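The same assembly expressed as a small sketch; the dictionary and function names are illustrative, but the mappings mirror the tables above:

```python
TYPE_LABELS = {
    "Error message": "Type: Bug",
    "Missing feature": "Type: Enhancement",
    "Unexpected behavior": "Type: Bug",
    "Documentation issue": "Type: Documentation",
}

COMPONENT_LABELS = {
    "projman": "Component: Commands",
    "git-flow": "Component: Commands",
    "pr-review": "Component: Commands",
    "cmdb-assistant": "Component: API",
    "doc-guardian": "Component: Commands",
    "code-sentinel": "Component: Commands",
}

def build_final_labels(problem_type: str, affected_plugin: str) -> list[str]:
    """Assemble FINAL_LABELS as described above."""
    labels = ["Source: User-Reported", "Agent: Claude"]
    labels.append(TYPE_LABELS.get(problem_type, "Type: Bug"))   # "Other" defaults to Bug
    component = COMPONENT_LABELS.get(affected_plugin)            # "Other" gets no component label
    if component:
        labels.append(component)
    return labels

# build_final_labels("Missing feature", "cmdb-assistant")
# -> ['Source: User-Reported', 'Agent: Claude', 'Type: Enhancement', 'Component: API']
```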
After generating labels, continue to Step 6.1.
---
### Step 6: Generate Issue Content (Automated Mode Only)
**Skip this step if `REPORT_MODE` is "user-reported"** → Go to Step 6.1
Use this exact template:
@@ -254,9 +401,86 @@ Use this exact template:
---
*Generated by /debug-report - Labels: Type: Bug, Source: Diagnostic, Agent: Claude*
*Generated by /debug-report (automated) - Labels: Type: Bug, Source: Diagnostic, Agent: Claude*
```
After generating content, continue to Step 7.
---
### Step 6.1: Generate Issue Content (User-Reported Mode Only)
**Only execute this step if `REPORT_MODE` is "user-reported"**
Use this template for user-reported issues:
```markdown
## User-Reported Issue
**Reported**: [ISO timestamp]
**Reporter**: Claude Code via /debug-report (user feedback)
## Context
| Field | Value |
|-------|-------|
| Plugin | `[AFFECTED_PLUGIN]` |
| Command/Tool | `[AFFECTED_COMMAND]` |
| Repository | `[PROJECT_REPO]` |
| Working Directory | `[WORKING_DIR]` |
| Branch | `[CURRENT_BRANCH]` |
## Problem Description
### Goal
[USER_GOAL]
### What Happened
**Problem Type**: [PROBLEM_TYPE]
[PROBLEM_DESCRIPTION]
### Expected Behavior
[EXPECTED_BEHAVIOR]
## Workaround
[WORKAROUND if provided, otherwise "None identified"]
## Investigation Hints
Based on the affected plugin/command, relevant files to check:
[Generate based on AFFECTED_PLUGIN:]
**projman:**
- `plugins/projman/commands/[AFFECTED_COMMAND].md`
- `mcp-servers/gitea/mcp_server/tools/*.py`
**git-flow:**
- `plugins/git-flow/commands/[AFFECTED_COMMAND].md`
**pr-review:**
- `plugins/pr-review/commands/[AFFECTED_COMMAND].md`
- `mcp-servers/gitea/mcp_server/tools/pull_requests.py`
**cmdb-assistant:**
- `plugins/cmdb-assistant/commands/[AFFECTED_COMMAND].md`
- `mcp-servers/netbox/mcp_server/tools/*.py`
- `mcp-servers/netbox/mcp_server/server.py` (tool schemas)
**doc-guardian / code-sentinel:**
- `plugins/[plugin]/commands/[AFFECTED_COMMAND].md`
- `plugins/[plugin]/hooks/*.md`
---
*Generated by /debug-report (user feedback) - Labels: [FINAL_LABELS]*
```
After generating content, continue to Step 7.
---
### Step 7: Create Issue in Marketplace
**IMPORTANT:** Always use curl to create issues in the marketplace repo. This avoids branch protection restrictions and MCP context issues that can block issue creation when working on protected branches.
@@ -274,46 +498,57 @@ fi
**2. Fetch label IDs from marketplace repo:**
The diagnostic labels to apply are:
Labels depend on `REPORT_MODE`:
**Automated mode:**
- `Source/Diagnostic` (always)
- `Type/Bug` (always)
**User-reported mode:**
- `Source/User-Reported` (always)
- Type label from Step 5.1 (Bug, Enhancement, or Documentation)
- Component label from Step 5.1 (if applicable)
```bash
# Fetch all labels and extract IDs for our target labels
# Fetch all labels from marketplace repo
LABELS_JSON=$(curl -s "${GITEA_API_URL}/repos/${MARKETPLACE_REPO}/labels" \
-H "Authorization: token ${GITEA_API_TOKEN}")
# Extract label IDs (handles both org and repo labels)
SOURCE_DIAG_ID=$(echo "$LABELS_JSON" | jq -r '.[] | select(.name == "Source/Diagnostic") | .id')
TYPE_BUG_ID=$(echo "$LABELS_JSON" | jq -r '.[] | select(.name == "Type/Bug") | .id')
# Extract label IDs based on FINAL_LABELS from Step 5 or 5.1
# Build LABEL_IDS array with IDs of labels that exist in the repo
# Example for automated mode:
SOURCE_ID=$(echo "$LABELS_JSON" | jq -r '.[] | select(.name == "Source/Diagnostic") | .id')
TYPE_ID=$(echo "$LABELS_JSON" | jq -r '.[] | select(.name == "Type/Bug") | .id')
# Build label array (only include IDs that were found)
LABEL_IDS="[]"
if [[ -n "$SOURCE_DIAG_ID" && -n "$TYPE_BUG_ID" ]]; then
LABEL_IDS="[$SOURCE_DIAG_ID, $TYPE_BUG_ID]"
elif [[ -n "$SOURCE_DIAG_ID" ]]; then
LABEL_IDS="[$SOURCE_DIAG_ID]"
elif [[ -n "$TYPE_BUG_ID" ]]; then
LABEL_IDS="[$TYPE_BUG_ID]"
fi
# Example for user-reported mode (adjust based on FINAL_LABELS):
# SOURCE_ID=$(echo "$LABELS_JSON" | jq -r '.[] | select(.name == "Source/User-Reported") | .id')
# TYPE_ID=$(echo "$LABELS_JSON" | jq -r '.[] | select(.name == "[TYPE_LABEL]") | .id')
# Build label array from found IDs
LABEL_IDS="[$(echo "$SOURCE_ID,$TYPE_ID" | sed 's/,,*/,/g; s/^,//; s/,$//')]"
echo "Label IDs to apply: $LABEL_IDS"
```
**3. Create issue with labels via curl:**
**Title format depends on `REPORT_MODE`:**
- Automated: `[Diagnostic] [summary of main failure]`
- User-reported: `[AFFECTED_PLUGIN] [brief summary of PROBLEM_DESCRIPTION]`
```bash
# Create temp files with restrictive permissions
DIAG_TITLE=$(mktemp -t diag-title.XXXXXX)
DIAG_BODY=$(mktemp -t diag-body.XXXXXX)
DIAG_PAYLOAD=$(mktemp -t diag-payload.XXXXXX)
# Save title
echo "[Diagnostic] [summary of main failure]" > "$DIAG_TITLE"
# Save title (format depends on REPORT_MODE)
# Automated: "[Diagnostic] [summary of main failure]"
# User-reported: "[AFFECTED_PLUGIN] [brief summary]"
echo "[Title based on REPORT_MODE]" > "$DIAG_TITLE"
# Save body (paste Step 6 content) - heredoc delimiter prevents shell expansion
# Save body (paste Step 6 or 6.1 content) - heredoc delimiter prevents shell expansion
cat > "$DIAG_BODY" << 'DIAGNOSTIC_EOF'
[Paste the full issue content from Step 6 here]
[Paste the full issue content from Step 6 or 6.1 here]
DIAGNOSTIC_EOF
# Build JSON payload with labels using jq
@@ -360,8 +595,9 @@ To create the issue manually:
### Step 8: Report to User
Display summary:
Display summary based on `REPORT_MODE`:
**Automated Mode:**
```
Debug Report Complete
=====================
@@ -383,18 +619,38 @@ Next Steps:
3. Select issue #[N] to investigate
```
**User-Reported Mode:**
```
Issue Report Complete
=====================
Plugin: [AFFECTED_PLUGIN]
Command: [AFFECTED_COMMAND]
Problem: [PROBLEM_TYPE]
Issue Created: [issue URL]
Your feedback has been captured. The development team will
investigate and may follow up with questions.
Next Steps:
1. Switch to marketplace repo: cd [marketplace path]
2. Run: /debug-review
3. Select issue #[N] to investigate
```
## DO NOT
- **DO NOT** attempt to fix anything - only report
- **DO NOT** create issues if all tests pass (just report success)
- **DO NOT** skip any diagnostic test
- **DO NOT** create issues if all automated tests pass (unless in user-reported mode)
- **DO NOT** skip any diagnostic test in automated mode
- **DO NOT** call MCP tools without the `repo` parameter
- **DO NOT** ask user questions during execution - run autonomously
- **DO NOT** skip user questions in user-reported mode - gather complete feedback
- **DO NOT** use MCP tools to create issues in the marketplace - always use curl (avoids branch restrictions)
## If All Tests Pass
## If All Tests Pass (Automated Mode Only)
If all 5 tests pass, report success without creating an issue:
If all 5 tests pass in automated mode, report success without creating an issue:
```
Debug Report Complete
@@ -407,8 +663,8 @@ Failed: 0
All diagnostics passed. No issues to report.
If you're experiencing a specific problem, please describe it
and I can create a manual bug report.
If you're experiencing a specific problem, run /debug-report again
and select "Report an issue I experienced" to describe it.
```
## Troubleshooting

View File

@@ -0,0 +1,121 @@
---
description: View proposal and implementation hierarchy with status tracking
---
# Proposal Status
View the status of all change proposals and their implementations in the Gitea Wiki.
## Overview
This command provides a tree view of:
- All change proposals (`Change VXX.X.X: Proposal`)
- Their implementations (`Change VXX.X.X: Proposal (Implementation N)`)
- Linked issues and lessons learned
## Workflow
1. **Fetch All Wiki Pages**
- Use `list_wiki_pages()` to get all wiki pages
- Filter for pages matching `Change V*: Proposal*` pattern
2. **Parse Proposal Structure**
- Group implementations under their parent proposals (see the sketch after this list)
- Extract status from page metadata (In Progress, Implemented, Abandoned)
3. **Fetch Linked Artifacts**
- For each implementation, search for issues referencing it
- Search lessons learned that link to the implementation
4. **Display Tree View**
```
Change V04.1.0: Proposal [In Progress]
├── Implementation 1 [In Progress] - 2026-01-26
│ ├── Issues: #161, #162, #163, #164
│ └── Lessons: (pending)
└── Implementation 2 [Not Started]
Change V04.0.0: Proposal [Implemented]
└── Implementation 1 [Implemented] - 2026-01-20
├── Issues: #150, #151
└── Lessons: v4.0.0-impl-1-lessons
```
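A rough sketch of the grouping in step 2, working only from the page titles returned by `list_wiki_pages()`; the regex and function name are illustrative:

```python
import re
from collections import defaultdict

IMPL_RE = re.compile(r"^(Change (?:V[\d.]+|Sprint-\d+): Proposal) \(Implementation (\d+)\)$")

def group_pages(page_titles: list[str]) -> dict[str, list[str]]:
    """Map each proposal page title to its implementation page titles."""
    tree: dict[str, list[str]] = defaultdict(list)
    for title in page_titles:
        match = IMPL_RE.match(title)
        if match:
            tree[match.group(1)].append(title)           # implementation page
        elif title.startswith("Change ") and title.endswith(": Proposal"):
            tree.setdefault(title, [])                   # proposal with no implementations yet
    return dict(tree)

# group_pages(["Change V04.1.0: Proposal",
#              "Change V04.1.0: Proposal (Implementation 1)"])
# -> {"Change V04.1.0: Proposal": ["Change V04.1.0: Proposal (Implementation 1)"]}
```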
## MCP Tools Used
- `list_wiki_pages()` - List all wiki pages
- `get_wiki_page(page_name)` - Get page content for status extraction
- `list_issues(state, labels)` - Find linked issues
- `search_lessons(query, tags)` - Find linked lessons
## Status Definitions
| Status | Meaning |
|--------|---------|
| **Pending** | Proposal created but no implementation started |
| **In Progress** | At least one implementation is active |
| **Implemented** | All planned implementations complete |
| **Abandoned** | Proposal was cancelled or superseded |
## Filtering Options
The command accepts optional filters:
```
/proposal-status # Show all proposals
/proposal-status --version V04.1.0 # Show specific version
/proposal-status --status "In Progress" # Filter by status
```
## Example Output
```
Proposal Status Report
======================
Change V04.1.0: Wiki-Based Planning Workflow [In Progress]
├── Implementation 1 [In Progress] - Started: 2026-01-26
│ ├── Issues: #161 (closed), #162 (closed), #163 (closed), #164 (open)
│ └── Lessons: (pending - sprint not closed)
└── (No additional implementations planned)
Change V04.0.0: MCP Server Consolidation [Implemented]
├── Implementation 1 [Implemented] - 2026-01-15 to 2026-01-20
│ ├── Issues: #150 (closed), #151 (closed), #152 (closed)
│ └── Lessons:
│ • Sprint 15 - MCP Server Symlink Best Practices
│ • Sprint 15 - Venv Path Resolution in Plugins
└── (Complete)
Change V03.2.0: Label Taxonomy Sync [Implemented]
└── Implementation 1 [Implemented] - 2026-01-10 to 2026-01-12
├── Issues: #140 (closed), #141 (closed)
└── Lessons:
• Sprint 14 - Organization vs Repository Labels
Summary:
- Total Proposals: 3
- In Progress: 1
- Implemented: 2
- Pending: 0
```
## Implementation Notes
**Page Name Parsing:**
- Proposals: `Change VXX.X.X: Proposal` or `Change Sprint-NN: Proposal`
- Implementations: `Change VXX.X.X: Proposal (Implementation N)`
**Status Extraction:**
- Parse the `> **Status:**` line from page metadata
- Default to "Unknown" if not found
**Issue Linking:**
- Search for issues containing wiki link in body
- Or search for issues with `[Sprint XX]` prefix matching implementation
**Lesson Linking:**
- Search lessons with implementation link in metadata
- Or search by version/sprint tags
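A minimal sketch of the status extraction, operating on raw page Markdown from `get_wiki_page()`; the regex and the "Unknown" default follow the notes above, and the function name is illustrative:

```python
import re

STATUS_RE = re.compile(r"^>\s*\*\*Status:\*\*\s*(.+?)\s*$", re.MULTILINE)

def extract_status(page_markdown: str) -> str:
    """Pull the status value from the '> **Status:** ...' metadata line."""
    match = STATUS_RE.search(page_markdown)
    return match.group(1) if match else "Unknown"   # default when metadata is missing

# extract_status("> **Status:** In Progress\n> **Date:** 2026-01-26")
# -> "In Progress"
```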

View File

@@ -41,13 +41,36 @@ The orchestrator agent will guide you through:
- Create lessons in project wiki under `lessons-learned/sprints/`
- Make lessons searchable for future sprints
5. **Git Operations**
- Commit any remaining work
5. **Update Wiki Implementation Page**
- Use `get_wiki_page` to fetch the current implementation page
- Update status from "In Progress" to "Implemented" (or "Partial"/"Failed")
- Add completion date
- Link to lessons learned created in step 4
- Use `update_wiki_page` to save changes (see the sketch after this list)
6. **Update Wiki Proposal Page**
- Check if all implementations for this proposal are complete
- If all complete: Update proposal status to "Implemented"
- If partial: Keep status as "In Progress", note completed implementations
- Add summary of what was accomplished
7. **Update CHANGELOG** (MANDATORY)
- Add all sprint changes to `[Unreleased]` section in CHANGELOG.md
- Categorize: Added, Changed, Fixed, Removed, Deprecated
- Include plugin prefix (e.g., `- **projman:** New feature`)
8. **Version Check**
- Run `/suggest-version` to analyze changes and recommend version bump
- If release warranted: run `./scripts/release.sh X.Y.Z`
- Ensures version numbers stay in sync across files
9. **Git Operations**
- Commit any remaining work (including CHANGELOG updates)
- Merge feature branches if needed
- Clean up merged branches
- Tag sprint completion
- Tag sprint completion (if release created)
6. **Close Milestone**
10. **Close Milestone**
- Use `update_milestone` to close the sprint milestone
- Document final completion status
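A sketch of the status update in steps 5-6, written as a pure text transformation with the `get_wiki_page` / `update_wiki_page` calls indicated as comments; the helper name and the exact placement of the completion metadata are assumptions:

```python
import re
from datetime import date

def mark_page_status(page_markdown: str, new_status: str, lessons_link: str = "") -> str:
    """Rewrite the '> **Status:**' metadata line and append completion details."""
    updated = re.sub(r"(>\s*\*\*Status:\*\*\s*).*", rf"\g<1>{new_status}", page_markdown, count=1)
    updated += f"\n\n> **Completed:** {date.today().isoformat()}"
    if lessons_link:
        updated += f"\n> **Lessons:** {lessons_link}"
    return updated

# Intended flow (MCP calls shown as comments):
#   content = get_wiki_page("Change V4.1.0: Proposal (Implementation 1)")
#   content = mark_page_status(content, "Implemented", lessons_link="[Sprint 17 lessons](wiki-link)")
#   update_wiki_page("Change V4.1.0: Proposal (Implementation 1)", content)
```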
@@ -66,7 +89,8 @@ The orchestrator agent will guide you through:
- `create_lesson` - Create lessons learned entry
- `search_lessons` - Check for similar existing lessons
- `list_wiki_pages` - Check existing lessons learned
- `get_wiki_page` - Read existing lessons
- `get_wiki_page` - Read existing lessons or implementation pages
- `update_wiki_page` - Update implementation/proposal status
## Lesson Structure
@@ -75,6 +99,11 @@ Lessons should follow this structure:
```markdown
# Sprint X - [Lesson Title]
## Metadata
- **Implementation:** [Change VXX.X.X (Impl N)](wiki-link)
- **Issues:** #45, #46, #47
- **Sprint:** Sprint X
## Context
[What were you trying to do? What was the sprint goal?]
@@ -91,12 +120,19 @@ Lessons should follow this structure:
[Comma-separated tags for search: technology, component, type]
```
**IMPORTANT:** Always include the Implementation link in the Metadata section. This enables bidirectional traceability between lessons and the work that generated them.
## Example Lessons Learned
**Example 1: Technical Gotcha**
```markdown
# Sprint 16 - Claude Code Infinite Loop on Validation Errors
## Metadata
- **Implementation:** [Change V1.2.0 (Impl 1)](https://gitea.example.com/org/repo/wiki/Change-V1.2.0%3A-Proposal-(Implementation-1))
- **Issues:** #45, #46
- **Sprint:** Sprint 16
## Context
Implementing input validation for authentication API endpoints.
@@ -123,6 +159,11 @@ testing, claude-code, validation, python, pytest, debugging
```markdown
# Sprint 14 - Extracting Services Too Early
## Metadata
- **Implementation:** [Change V2.0.0 (Impl 1)](https://gitea.example.com/org/repo/wiki/Change-V2.0.0%3A-Proposal-(Implementation-1))
- **Issues:** #32, #33, #34
- **Sprint:** Sprint 14
## Context
Planning to extract Intuit Engine service from monolith.

View File

@@ -47,15 +47,34 @@ Verify all required labels exist using `get_labels`:
**If labels are missing:** Use `create_label` to create them.
### 4. docs/changes/ Folder Check
### 4. Input Source Detection
Verify the project has a `docs/changes/` folder for sprint input files.
The planner supports flexible input sources for sprint planning:
**If folder does NOT exist:** Prompt user to create it.
| Source | Detection | Action |
|--------|-----------|--------|
| **Local file** | `docs/changes/*.md` exists | Parse frontmatter, migrate to wiki, delete local |
| **Existing wiki** | `Change VXX.X.X: Proposal` exists | Use as-is, create new implementation page |
| **Conversation** | Neither file nor wiki exists | Create wiki from discussion context |
**If sprint starts with discussion but no input file:**
- Capture the discussion outputs
- Create a change file: `docs/changes/sprint-XX-description.md`
**Input File Format** (if using local file):
```yaml
---
version: "4.1.0" # or "sprint-17" for internal work
title: "Feature Name"
plugin: plugin-name # optional
type: feature # feature | bugfix | refactor | infra
---
# Feature Description
[Free-form content...]
```
**Detection Logic:**
1. Check for `docs/changes/*.md` files
2. Check for existing wiki proposal matching version
3. If neither found, use conversation context
4. If ambiguous, ask user which input to use
## Planning Workflow
@@ -66,36 +85,56 @@ The planner agent will:
- Understand scope, priorities, and constraints
- Never rush - take time to understand requirements fully
2. **Search Relevant Lessons Learned**
2. **Detect Input Source**
- Check for `docs/changes/*.md` files
- Check for existing wiki proposal by version
- If neither: use conversation context
- Ask user if multiple sources found
3. **Search Relevant Lessons Learned**
- Use the `search_lessons` MCP tool to find past experiences
- Search by keywords and tags relevant to the sprint work
- Review patterns and preventable mistakes from previous sprints
3. **Architecture Analysis**
4. **Create/Update Wiki Proposal**
- If local file: migrate content to wiki, create proposal page
- If conversation: create proposal from discussion
- If existing wiki: skip creation, use as-is
- **Page naming:** `Change VXX.X.X: Proposal` or `Change Sprint-NN: Proposal`
5. **Create Wiki Implementation Page**
- Create `Change VXX.X.X: Proposal (Implementation N)`
- Include tags: Type, Version, Status=In Progress, Date, Origin
- Update proposal page with link to this implementation
- This page tracks THIS sprint's work on the proposal
6. **Architecture Analysis**
- Think through technical approach and edge cases
- Identify architectural decisions needed
- Consider dependencies and integration points
- Review existing codebase architecture
4. **Create Gitea Issues**
7. **Create Gitea Issues**
- Use the `create_issue` MCP tool for each planned task
- Apply appropriate labels using `suggest_labels` tool
- **Issue Title Format (MANDATORY):** `[Sprint XX] <type>: <description>`
- **Include wiki reference:** `Implementation: [Change VXX.X.X (Impl N)](wiki-link)`
- Include acceptance criteria and technical notes
5. **Set Up Dependencies**
8. **Set Up Dependencies**
- Use `create_issue_dependency` to establish task dependencies
- This enables parallel execution planning
6. **Create or Select Milestone**
9. **Create or Select Milestone**
- Use `create_milestone` to group sprint issues
- Assign issues to the milestone
7. **Generate Planning Document**
- Summarize architectural decisions
- List created issues with labels
- Document dependency graph
- Provide sprint overview
10. **Cleanup & Summary**
- Delete local input file (wiki is now source of truth)
- Summarize architectural decisions
- List created issues with labels
- Document dependency graph
- Provide sprint overview with wiki links
## Issue Title Format (MANDATORY)
@@ -152,6 +191,8 @@ The planner agent will:
- `search_lessons` - Search lessons learned from previous sprints
- `list_wiki_pages` - List wiki pages
- `get_wiki_page` - Fetch specific documentation page
- `create_wiki_page` - Create new wiki page (proposals, implementations)
- `update_wiki_page` - Update wiki page content
## Label Taxonomy
@@ -181,16 +222,17 @@ The planner agent is thoughtful and methodical:
User: I want to plan a sprint for user authentication
Planner: Great! Let me first run pre-planning validations...
[Checks branch, repo org, labels, docs/changes folder]
[Checks branch, repo org, labels]
All validations passed.
All validations passed. Now let me ask a few questions:
[Checks for input sources]
Found: docs/changes/v1.2.0-user-authentication.md
I'll use this file as the planning input. Let me ask a few questions:
1. What authentication method are you planning? (JWT, OAuth, session-based?)
2. Are there any specific security requirements or compliance needs?
3. Should this integrate with existing user management?
4. What's the priority level for this sprint?
Let me also search for relevant lessons learned about authentication...
[Uses search_lessons to find past authentication work]
@@ -198,22 +240,23 @@ Based on previous experience, I found these relevant lessons:
- Sprint 12: JWT token expiration handling edge cases
- Sprint 8: OAuth integration pitfalls with third-party providers
Now, let me analyze the architecture...
[Creates wiki proposal and implementation pages]
✓ Created: "Change V1.2.0: Proposal" (wiki)
✓ Created: "Change V1.2.0: Proposal (Implementation 1)" (wiki)
✓ Deleted: docs/changes/v1.2.0-user-authentication.md (migrated to wiki)
[Creates issues with appropriate labels and dependencies]
Now, let me analyze the architecture and create issues...
[Creates issues with wiki references]
Created 5 issues for the authentication sprint:
- Issue #45: [Sprint 17] feat: Implement JWT token generation
Labels: Type/Feature, Priority/High, Component/Auth, Tech/Python
Dependencies: None
Implementation: [Change V1.2.0 (Impl 1)](wiki-link)
- Issue #46: [Sprint 17] feat: Build user login endpoint
Labels: Type/Feature, Priority/High, Component/API, Tech/FastAPI
Dependencies: #45
- Issue #47: [Sprint 17] feat: Create user registration form
Labels: Type/Feature, Priority/Medium, Component/Frontend, Tech/Vue
Dependencies: #46
Implementation: [Change V1.2.0 (Impl 1)](wiki-link)
Dependency Graph:
#45 -> #46 -> #47
@@ -222,20 +265,28 @@ Dependency Graph:
#48
Milestone: Sprint 17 - User Authentication (Due: 2025-02-01)
Wiki: https://gitea.example.com/org/repo/wiki/Change-V1.2.0%3A-Proposal
```
## Getting Started
Invoke the planner agent by providing your sprint goals. The agent will guide you through the planning process.
**Input Options:**
1. Create `docs/changes/vX.Y.Z-feature-name.md` with frontmatter before running
2. Create wiki proposal page manually, then run `/sprint-plan`
3. Just start a conversation - the planner will capture context and create wiki pages
**Example:**
> "I want to plan a sprint for extracting the Intuit Engine service from the monolith"
The planner will then:
1. Run pre-planning validations
2. Ask clarifying questions
3. Search lessons learned
4. Create issues with proper naming and labels
5. Set up dependencies
6. Create milestone
7. Generate planning summary
2. Detect input source (file, wiki, or conversation)
3. Ask clarifying questions
4. Search lessons learned
5. Create wiki proposal and implementation pages
6. Create issues with wiki references
7. Set up dependencies
8. Create milestone
9. Cleanup and generate planning summary

View File

@@ -0,0 +1,78 @@
# /suggest-version
Analyze CHANGELOG.md and suggest appropriate semantic version bump.
## Behavior
1. **Read current state**:
- Read `CHANGELOG.md` to find current version and [Unreleased] content
- Read `.claude-plugin/marketplace.json` for current marketplace version
- Check individual plugin versions in `plugins/*/.claude-plugin/plugin.json`
2. **Analyze [Unreleased] section**:
- Extract all entries under `### Added`, `### Changed`, `### Fixed`, `### Removed`, `### Deprecated`
- Categorize changes by impact
3. **Apply SemVer rules** (see the sketch after this list):
| Change Type | Version Bump | Indicators |
|-------------|--------------|------------|
| **MAJOR** (X.0.0) | Breaking changes | `### Removed`, `### Changed` with "BREAKING:", renamed/removed APIs |
| **MINOR** (x.Y.0) | New features, backwards compatible | `### Added` with new commands/plugins/features |
| **PATCH** (x.y.Z) | Bug fixes only | `### Fixed` only, `### Changed` for non-breaking tweaks |
4. **Output recommendation**:
```
## Version Analysis
**Current version:** X.Y.Z
**[Unreleased] summary:**
- Added: N entries (new features/plugins)
- Changed: N entries (M breaking)
- Fixed: N entries
- Removed: N entries
**Recommendation:** MINOR bump → X.(Y+1).0
**Reason:** New features added without breaking changes
**To release:** ./scripts/release.sh X.Y.Z
```
5. **Check version sync**:
- Compare marketplace version with individual plugin versions
- Warn if plugins are out of sync (e.g., marketplace 4.0.0 but projman 3.1.0)
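The rules in step 3 reduce to a small decision function, sketched below over entries already grouped by CHANGELOG heading. Detecting breaking changes via a "BREAKING:" prefix follows the indicators listed above; the function name is illustrative:

```python
def recommend_bump(unreleased: dict[str, list[str]]) -> str:
    """Apply the MAJOR/MINOR/PATCH rules to entries grouped by CHANGELOG heading."""
    if not any(unreleased.values()):
        return "none"                                   # empty [Unreleased]: nothing to release
    breaking = unreleased.get("Removed") or any(
        entry.startswith("BREAKING:") for entry in unreleased.get("Changed", [])
    )
    if breaking:
        return "major"
    if unreleased.get("Added"):
        return "minor"
    return "patch"                                      # Fixed / non-breaking Changed only

# recommend_bump({"Added": ["**projman:** /proposal-status command"], "Changed": [], "Fixed": []})
# -> "minor"
```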
## Examples
**Output when MINOR bump needed:**
```
## Version Analysis
**Current version:** 4.0.0
**[Unreleased] summary:**
- Added: 3 entries (new command, hook improvement, workflow example)
- Changed: 1 entry (0 breaking)
- Fixed: 2 entries
**Recommendation:** MINOR bump → 4.1.0
**Reason:** New features (Added section) without breaking changes
**To release:** ./scripts/release.sh 4.1.0
```
**Output when nothing to release:**
```
## Version Analysis
**Current version:** 4.0.0
**[Unreleased] summary:** Empty - no pending changes
**Recommendation:** No release needed
```
## Integration
This command helps maintain proper versioning workflow:
- Run after completing a sprint to determine version bump
- Run before `/sprint-close` to ensure version is updated
- Integrates with `./scripts/release.sh` for actual release execution

View File

@@ -1,6 +1,6 @@
#!/bin/bash
# projman startup check hook
# Checks for common issues at session start
# Checks for common issues AND suggests sprint planning proactively
# All output MUST have [projman] prefix
PREFIX="[projman]"
@@ -26,5 +26,41 @@ if [[ -f ".env" ]]; then
fi
fi
# All checks passed - say nothing
# Check for open issues (suggests sprint planning)
# Only if .env exists with valid GITEA config
if [[ -f ".env" ]]; then
GITEA_API_URL=$(grep -E "^GITEA_API_URL=" ~/.config/claude/gitea.env 2>/dev/null | cut -d'=' -f2 | tr -d '"' || true)
GITEA_API_TOKEN=$(grep -E "^GITEA_API_TOKEN=" ~/.config/claude/gitea.env 2>/dev/null | cut -d'=' -f2 | tr -d '"' || true)
GITEA_REPO=$(grep -E "^GITEA_REPO=" .env 2>/dev/null | cut -d'=' -f2 | tr -d '"' || true)
if [[ -n "$GITEA_API_URL" && -n "$GITEA_API_TOKEN" && -n "$GITEA_REPO" ]]; then
# Quick check for open issues without milestone (unplanned work)
OPEN_ISSUES=$(curl -s -m 5 \
-H "Authorization: token $GITEA_API_TOKEN" \
"${GITEA_API_URL}/repos/${GITEA_REPO}/issues?state=open&milestone=none&limit=1" 2>/dev/null | \
grep -c '"number"' || echo "0")
if [[ "$OPEN_ISSUES" -gt 0 ]]; then
# Count total unplanned issues
TOTAL_UNPLANNED=$(curl -s -m 5 \
-H "Authorization: token $GITEA_API_TOKEN" \
"${GITEA_API_URL}/repos/${GITEA_REPO}/issues?state=open&milestone=none" 2>/dev/null | \
grep -c '"number"' || echo "?")
echo "$PREFIX ${TOTAL_UNPLANNED} open issues without milestone - consider /sprint-plan"
fi
fi
fi
# Check for CHANGELOG.md [Unreleased] content (version management)
if [[ -f "CHANGELOG.md" ]]; then
# Check if there's content under [Unreleased] that hasn't been released
UNRELEASED_CONTENT=$(sed -n '/## \[Unreleased\]/,/## \[/p' CHANGELOG.md 2>/dev/null | grep -E '^### (Added|Changed|Fixed|Removed|Deprecated)' | head -1 || true)
if [[ -n "$UNRELEASED_CONTENT" ]]; then
UNRELEASED_LINES=$(sed -n '/## \[Unreleased\]/,/## \[/p' CHANGELOG.md 2>/dev/null | grep -E '^- ' | wc -l | tr -d ' ')
if [[ "$UNRELEASED_LINES" -gt 0 ]]; then
echo "$PREFIX ${UNRELEASED_LINES} unreleased changes in CHANGELOG - consider version bump"
fi
fi
fi
exit 0

View File

@@ -67,6 +67,7 @@ main() {
# Shared MCP servers at repository root (v3.0.0+)
update_mcp_server "gitea"
update_mcp_server "netbox"
update_mcp_server "data-platform"
check_changelog

scripts/release.sh Executable file
View File

@@ -0,0 +1,172 @@
#!/bin/bash
# release.sh - Create a new release with version consistency
#
# Usage: ./scripts/release.sh X.Y.Z
#
# This script ensures all version references are updated consistently:
# 1. CHANGELOG.md - [Unreleased] becomes [X.Y.Z] - YYYY-MM-DD
# 2. README.md - Title updated to vX.Y.Z
# 3. marketplace.json - version field updated
# 4. Git commit and tag created
#
# Prerequisites:
# - Clean working directory (no uncommitted changes)
# - [Unreleased] section in CHANGELOG.md with content
# - On development branch
set -e
VERSION=$1
DATE=$(date +%Y-%m-%d)
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
error() { echo -e "${RED}ERROR: $1${NC}" >&2; exit 1; }
warn() { echo -e "${YELLOW}WARNING: $1${NC}"; }
success() { echo -e "${GREEN}$1${NC}"; }
info() { echo -e "$1"; }
# Validate arguments
if [ -z "$VERSION" ]; then
echo "Usage: ./scripts/release.sh X.Y.Z"
echo ""
echo "Example: ./scripts/release.sh 3.2.0"
echo ""
echo "This will:"
echo " 1. Update CHANGELOG.md [Unreleased] -> [X.Y.Z] - $(date +%Y-%m-%d)"
echo " 2. Update README.md title to vX.Y.Z"
echo " 3. Update marketplace.json version to X.Y.Z"
echo " 4. Commit with message 'chore: release vX.Y.Z'"
echo " 5. Create git tag vX.Y.Z"
exit 1
fi
# Validate version format
if ! [[ "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
error "Invalid version format. Use X.Y.Z (e.g., 3.2.0)"
fi
# Check we're in the right directory
if [ ! -f "CHANGELOG.md" ] || [ ! -f "README.md" ] || [ ! -f ".claude-plugin/marketplace.json" ]; then
error "Must run from repository root (CHANGELOG.md, README.md, .claude-plugin/marketplace.json must exist)"
fi
# Check for clean working directory
if [ -n "$(git status --porcelain)" ]; then
warn "Working directory has uncommitted changes"
echo ""
git status --short
echo ""
read -p "Continue anyway? [y/N] " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
# Check current branch
BRANCH=$(git branch --show-current)
if [ "$BRANCH" != "development" ] && [ "$BRANCH" != "main" ]; then
warn "Not on development or main branch (current: $BRANCH)"
read -p "Continue anyway? [y/N] " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
fi
# Check [Unreleased] section has content
if ! grep -q "## \[Unreleased\]" CHANGELOG.md; then
error "CHANGELOG.md missing [Unreleased] section"
fi
# Check if tag already exists
if git tag -l | grep -q "^v$VERSION$"; then
error "Tag v$VERSION already exists"
fi
info ""
info "=== Release v$VERSION ==="
info ""
# Show what will change
info "Changes to be made:"
info " CHANGELOG.md: [Unreleased] -> [$VERSION] - $DATE"
info " README.md: title -> v$VERSION"
info " marketplace.json: version -> $VERSION"
info " Git: commit + tag v$VERSION"
info ""
# Preview CHANGELOG [Unreleased] content
info "Current [Unreleased] content:"
info "---"
sed -n '/^## \[Unreleased\]/,/^## \[/p' CHANGELOG.md | head -30
info "---"
info ""
read -p "Proceed with release? [y/N] " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
info "Aborted"
exit 0
fi
info ""
info "Updating files..."
# 1. Update CHANGELOG.md
# Replace [Unreleased] with [X.Y.Z] - DATE and add new [Unreleased] section
sed -i "s/^## \[Unreleased\]$/## [Unreleased]\n\n*Changes staged for the next release*\n\n---\n\n## [$VERSION] - $DATE/" CHANGELOG.md
# Remove the placeholder text if it exists after the new [Unreleased]
sed -i '/^\*Changes staged for the next release\*$/d' CHANGELOG.md
# Clean up any double blank lines
sed -i '/^$/N;/^\n$/d' CHANGELOG.md
success " CHANGELOG.md updated"
# 2. Update README.md title
sed -i "s/^# Leo Claude Marketplace - v[0-9]\+\.[0-9]\+\.[0-9]\+$/# Leo Claude Marketplace - v$VERSION/" README.md
success " README.md updated"
# 3. Update marketplace.json version
sed -i "s/\"version\": \"[0-9]\+\.[0-9]\+\.[0-9]\+\"/\"version\": \"$VERSION\"/" .claude-plugin/marketplace.json
success " marketplace.json updated"
info ""
info "Files updated. Review changes:"
info ""
git diff --stat
info ""
git diff CHANGELOG.md | head -40
info ""
read -p "Commit and tag? [y/N] " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
warn "Changes made but not committed. Run 'git checkout -- .' to revert."
exit 0
fi
# Commit
git add CHANGELOG.md README.md .claude-plugin/marketplace.json
git commit -m "chore: release v$VERSION"
success " Committed"
# Tag
git tag "v$VERSION"
success " Tagged v$VERSION"
info ""
success "=== Release v$VERSION created ==="
info ""
info "Next steps:"
info " 1. Review the commit: git show HEAD"
info " 2. Push to remote: git push && git push --tags"
info " 3. Merge to main if on development branch"
info ""

View File

@@ -116,6 +116,15 @@ verify_symlinks() {
log_error "pr-review/gitea symlink missing"
log_todo "Run: ln -s ../../../mcp-servers/gitea plugins/pr-review/mcp-servers/gitea"
fi
# Check data-platform -> data-platform symlink
local dataplatform_link="$REPO_ROOT/plugins/data-platform/mcp-servers/data-platform"
if [[ -L "$dataplatform_link" ]]; then
log_success "data-platform symlink exists"
else
log_error "data-platform symlink missing"
log_todo "Run: ln -s ../../../mcp-servers/data-platform plugins/data-platform/mcp-servers/data-platform"
fi
}
# --- Section 3: Config File Templates ---
@@ -178,6 +187,22 @@ EOF
chmod 600 "$config_dir/git-flow.env"
log_success "git-flow.env template created"
fi
# PostgreSQL config (for data-platform, optional)
if [[ -f "$config_dir/postgres.env" ]]; then
log_skip "postgres.env already exists"
else
cat > "$config_dir/postgres.env" << 'EOF'
# PostgreSQL Configuration (for data-platform plugin)
# Update with your PostgreSQL connection URL
# This is OPTIONAL - pandas tools work without it
POSTGRES_URL=postgresql://user:password@localhost:5432/database
EOF
chmod 600 "$config_dir/postgres.env"
log_success "postgres.env template created"
log_todo "Edit ~/.config/claude/postgres.env with your PostgreSQL credentials (optional)"
fi
}
# --- Section 4: Validate Configuration ---
@@ -283,6 +308,7 @@ main() {
# Shared MCP servers at repository root
setup_shared_mcp "gitea"
setup_shared_mcp "netbox"
setup_shared_mcp "data-platform"
# Verify symlinks from plugins to shared MCP servers
verify_symlinks

View File

@@ -168,6 +168,12 @@ if [[ ! -d "$ROOT_DIR/mcp-servers/netbox" ]]; then
fi
echo "✓ Shared netbox MCP server exists"
if [[ ! -d "$ROOT_DIR/mcp-servers/data-platform" ]]; then
echo "ERROR: Shared data-platform MCP server not found at mcp-servers/data-platform/"
exit 1
fi
echo "✓ Shared data-platform MCP server exists"
# Check symlinks for plugins that use MCP servers
check_mcp_symlink() {
local plugin_name="$1"
@@ -195,5 +201,8 @@ check_mcp_symlink "pr-review" "gitea"
# Plugins with netbox MCP dependency
check_mcp_symlink "cmdb-assistant" "netbox"
# Plugins with data-platform MCP dependency
check_mcp_symlink "data-platform" "data-platform"
echo ""
echo "=== All validations passed ==="

View File

@@ -15,15 +15,15 @@ for f in $(find ~/.claude -name "hooks.json" 2>/dev/null); do
fi
done
# Check cache specifically
# Note about cache (informational only - do NOT clear mid-session)
if [ -d ~/.claude/plugins/cache/leo-claude-mktplace ]; then
echo "❌ CACHE EXISTS: ~/.claude/plugins/cache/leo-claude-mktplace"
echo " Run: rm -rf ~/.claude/plugins/cache/leo-claude-mktplace/"
FAILED=1
echo " Cache exists: ~/.claude/plugins/cache/leo-claude-mktplace"
echo " (This is normal - do NOT clear mid-session or MCP tools will break)"
echo " To apply plugin changes: restart Claude Code session"
fi
# Verify installed hooks are command type
for plugin in doc-guardian code-sentinel projman pr-review project-hygiene; do
for plugin in doc-guardian code-sentinel projman pr-review project-hygiene data-platform; do
HOOK_FILE=~/.claude/plugins/marketplaces/leo-claude-mktplace/plugins/$plugin/hooks/hooks.json
if [ -f "$HOOK_FILE" ]; then
if grep -q '"type": "command"' "$HOOK_FILE" || grep -q '"type":"command"' "$HOOK_FILE"; then