Update "lessons%2Fpatterns%2Fmcp-venv-symlinks-lost-on-marketplace-update---5-hour-debug-loop.-"

2026-02-03 03:37:06 +00:00
parent c00a64d9dc
commit cef09b35b8

@@ -6,7 +6,7 @@
## Context
User was working on branch `fix/startup-hook-venv-cache-path` to fix venv path issues in startup hooks. After marketplace reinstalls, plugins repeatedly failed to load with a "1 error" message.
User was working on branch `fix/startup-hook-venv-cache-path` to fix venv path issues in startup hooks. After marketplace reinstalls, plugins repeatedly failed to load with a "1 error" message. User had to reinstall at least 4 times in 30 minutes.
## Problem
@@ -16,18 +16,186 @@ User was working on branch `fix/startup-hook-venv-cache-path` to fix venv path i
4. **No postInstall hooks:** Claude Code marketplaces do NOT support postInstall hooks (feature requested in GitHub issue #11240)
5. **SessionStart hooks can't fix it:** MCP servers fail BEFORE hooks run, so startup-check.sh can't auto-create symlinks in time
## What Was Tried (All Failed to Identify Root Cause)
## Complete Debug Trace - Everything Claude Did
- ✅ Verified plugin.json files are valid JSON
- ✅ Verified hooks.json format is correct
- ✅ Verified hook scripts are executable
- ✅ Verified MCP server references match .mcp.json
- ✅ Verified commands directories exist
- ✅ Ran validate-marketplace.sh - all passed
- ✅ Verified cache venvs exist and work
- ✅ MCP servers start successfully when invoked manually
### 1. Initial Plugin Status Check
User ran `/plugin` command showing:
- contract-validator: failed to load - 1 error
- projman: failed to load - 1 error
- plugin:pyright-lsp:pyright: failed to load - 1 error
- data-platform: enabled but startup hook showed "MCP venv missing"
**Critical Gap:** Claude Code does NOT expose the actual plugin load error message. User sees "failed to load - 1 error" with no way to see WHAT the error is.
### 2. Read Plugin JSON Files
```bash
# Read source plugin.json files
cat /home/lmiranda/claude-plugins-work/plugins/projman/.claude-plugin/plugin.json
cat /home/lmiranda/claude-plugins-work/plugins/contract-validator/.claude-plugin/plugin.json
```
**Result:** Both valid JSON with mcp_servers references
### 3. Ran verify-hooks.sh
```bash
./scripts/verify-hooks.sh
```
**Result:** All 7 hooks verified OK (doc-guardian, code-sentinel, projman, pr-review, project-hygiene, data-platform, cmdb-assistant)
### 4. Checked Installed MCP Servers Directory
```bash
ls -la ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/
```
**Result:** Found 5 servers: contract-validator, data-platform, gitea, netbox, viz-platform
### 5. Checked .venv in Each Installed MCP Server
```bash
for dir in ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/*/; do
ls -la "$dir".venv
done
```
**Result:** ALL .venv directories MISSING in installed path
- contract-validator: NO .venv
- data-platform: NO .venv
- gitea: NO .venv
- netbox: NO .venv
- viz-platform: NO .venv
### 6. Checked .venv in Source MCP Servers
```bash
for dir in /home/lmiranda/claude-plugins-work/mcp-servers/*/; do
ls -la "$dir".venv
done
```
**Result:** ALL have symlinks pointing to cache:
- contract-validator/.venv -> ~/.cache/claude-mcp-venvs/leo-claude-mktplace/contract-validator/.venv
- data-platform/.venv -> ~/.cache/claude-mcp-venvs/leo-claude-mktplace/data-platform/.venv
- gitea/.venv -> ~/.cache/claude-mcp-venvs/leo-claude-mktplace/gitea/.venv
- netbox/.venv -> ~/.cache/claude-mcp-venvs/leo-claude-mktplace/netbox/.venv
- viz-platform/.venv -> ~/.cache/claude-mcp-venvs/leo-claude-mktplace/viz-platform/.venv
### 7. Checked Cache Directory Exists
```bash
ls -la ~/.cache/claude-mcp-venvs/leo-claude-mktplace/
```
**Result:** Cache exists with all 5 server venvs from Jan 27
### 8. Ran setup-venvs.sh in Installed Path
```bash
cd ~/.claude/plugins/marketplaces/leo-claude-mktplace && ./scripts/setup-venvs.sh
```
**Result:** Created all symlinks successfully
### 9. Verified Symlinks Created
```bash
for dir in ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers/*/; do
  readlink "${dir}.venv"
done
```
**Result:** All 5 symlinks now point to cache
### 10. Searched for postInstall Hook Support
Used claude-code-guide agent to research if marketplace.json supports postInstall hooks.
**Result:** NOT SUPPORTED - feature requested in GitHub issue #11240
### 11. Read Startup Hook Scripts
```bash
# Read data-platform startup hook
cat /home/lmiranda/claude-plugins-work/plugins/data-platform/hooks/startup-check.sh
```
**Result:** Hook checks cache first, then local, but only WARNS - doesn't auto-fix
### 12. Modified data-platform Startup Hook (USELESS)
Added code to auto-create symlink if missing. **THIS WAS POINTLESS** because MCP servers fail BEFORE hooks run.
### 13. Checked MCP Server Dependencies per Plugin
```bash
grep "mcp_servers" /home/lmiranda/claude-plugins-work/plugins/*/.claude-plugin/plugin.json
```
**Result:**
- cmdb-assistant → netbox
- contract-validator → contract-validator
- data-platform → data-platform
- projman → gitea
- pr-review → gitea
- viz-platform → viz-platform
### 14. Read .mcp.json Configuration
```bash
cat /home/lmiranda/claude-plugins-work/.mcp.json
```
**Result:** All 5 MCP servers configured with relative paths to run.sh
### 15. Verified Cache Venvs Have Python Binary
```bash
ls -la ~/.cache/claude-mcp-venvs/leo-claude-mktplace/data-platform/.venv/bin/python
```
**Result:** Exists (symlink to python3)
### 16. Tested MCP Server Startup Manually
```bash
cd ~/.claude/plugins/marketplaces/leo-claude-mktplace && ./mcp-servers/gitea/run.sh &
```
**Result:** Server starts and runs successfully
### 17. Checked Plugin JSON Validity
```bash
python3 -c "import json; json.load(open('plugin.json')); print('VALID')"
```
**Result:** Both projman and contract-validator JSON valid
### 18. Checked Commands Directories
```bash
ls ~/.claude/plugins/marketplaces/leo-claude-mktplace/plugins/projman/commands/
ls ~/.claude/plugins/marketplaces/leo-claude-mktplace/plugins/contract-validator/commands/
```
**Result:** Both exist with command files
### 19. Checked Hook Script Permissions
```bash
ls -la ~/.claude/plugins/marketplaces/.../hooks/*.sh
```
**Result:** All executable (755)
### 20. Checked MCP Server References Match
Compared mcp_servers in plugin.json vs entries in .mcp.json
**Result:** All match
### 21. Ran validate-marketplace.sh
```bash
cd ~/.claude/plugins/marketplaces/leo-claude-mktplace && ./scripts/validate-marketplace.sh
```
**Result:** ALL VALIDATIONS PASSED (12 plugins, all hooks, all MCP servers)
### 22. Checked for Claude Logs
```bash
ls ~/.claude/logs/
ls ~/.config/claude-code/logs/
```
**Result:** No log directories exist
### 23. Compared Source vs Installed Versions
**Result:** Version mismatch (not the cause):
- projman: Source 3.4.0, Installed 3.3.0
- contract-validator: Source 1.2.0, Installed 1.1.0
### 24. Read run.sh to Check Venv Logic
```bash
cat /home/lmiranda/claude-plugins-work/mcp-servers/gitea/run.sh
```
**Result:** run.sh ALREADY checks the cache first (lines 10-11), so it should work without a local symlink
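For readers who have not seen run.sh, the cache-first lookup described in this step can be pictured roughly as below. This is a minimal sketch, not the actual contents of run.sh: the function name `resolve_venv` and its three arguments are hypothetical, and the only behavior taken from the log is the order (cache venv first, local `.venv` second).

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT the real run.sh: pick the venv a server should use.
# Usage: resolve_venv <cache_root> <server_dir> <server_name>
resolve_venv() {
    local cache_root="$1" server_dir="$2" name="$3"
    local cached="$cache_root/$name/.venv"
    local fallback="$server_dir/.venv"

    if [ -x "$cached/bin/python" ]; then
        # Cache venv is usable: no local symlink is needed at all.
        printf '%s\n' "$cached"
    elif [ -x "$fallback/bin/python" ]; then
        # Fall back to a .venv (real directory or symlink) next to the server.
        printf '%s\n' "$fallback"
    else
        echo "no usable venv for $name (checked $cached and $fallback)" >&2
        return 1
    fi
}
```

With a lookup shaped like this, a missing local symlink is only a problem when the cache venv is also missing, which matches the observation that the servers start fine when invoked manually.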
## Critical Finding
**The run.sh scripts already check cache FIRST.** If cache venv exists, MCP server should start. But plugins still fail with "1 error".
**Claude Code hides the actual error message.** There is NO way for users to see WHAT the error is. This makes debugging impossible.
## What Claude Got Wrong
1. **Said "Found it!" at step 5** - Wrong. Identified symptom (missing symlinks) but not why plugins fail when cache venv exists
2. **Said "Found it!" at step 6** - Wrong. Symlinks in source don't matter, run.sh checks cache first
3. **Modified startup hook** - Useless. MCP servers fail before hooks run
4. **Spent 20+ tool calls checking things that were fine** - JSON valid, hooks valid, permissions valid, references valid
5. **Never identified the actual error** - Because Claude Code doesn't expose it
6. **Didn't recognize that run.sh already handles cache paths** - Should have stopped investigating after reading run.sh
## Actual Fix (Temporary)
@@ -37,31 +205,22 @@ cd ~/.claude/plugins/marketplaces/leo-claude-mktplace
./scripts/setup-venvs.sh
```
This creates symlinks in the installed path. But they get wiped on next update.
This creates symlinks. But they get wiped on next update.
## Proper Fix Needed
## Real Root Cause (UNKNOWN)
1. **Option A:** Make `run.sh` auto-create symlinks before starting MCP server
2. **Option B:** Have Claude Code preserve symlinks during marketplace updates
3. **Option C:** Wait for postInstall hooks feature (GitHub issue #11240)
The run.sh scripts check cache first. Cache venvs exist. MCP servers start manually. But plugins fail to load with "1 error".
**We never found the actual cause because Claude Code hides error messages.**
## Prevention
1. **After ANY marketplace reinstall/update:** Run `setup-venvs.sh` in installed path
2. **Document this prominently:** Add to README and CLAUDE.md
3. **Consider:** Making all MCP paths absolute to cache instead of relative with symlinks
## Claude AI Failures in This Session
1. Said "Found it!" multiple times without actually finding the root cause
2. Kept checking things that turned out to be fine
3. Did not identify that the actual error message is hidden from users
4. Took 5+ hours of user time without resolving the underlying issue
5. The SessionStart hook fix added won't help because MCP servers fail before hooks run
2. **File a bug report:** Claude Code should expose plugin load errors
3. **Consider:** Making MCP server paths absolute in .mcp.json instead of relative
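
The manual check from step 5 could be turned into a small helper that lists which servers lack a usable `.venv` after a reinstall. This is only a sketch under assumptions: the function name `find_missing_venvs` is hypothetical, and it relies solely on the directory layout shown earlier in this log (one subdirectory per server, each expected to contain `.venv`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every server directory under
# <mcp_servers_dir> that has no usable .venv entry.
# Usage: find_missing_venvs <mcp_servers_dir>
find_missing_venvs() {
    local servers_dir="$1" dir
    for dir in "$servers_dir"/*/; do
        # Skip the unexpanded glob pattern when the directory is empty.
        [ -d "$dir" ] || continue
        # -e follows symlinks, so a dangling symlink also counts as missing.
        if [ ! -e "${dir}.venv" ]; then
            basename "$dir"
        fi
    done
}
```

If `find_missing_venvs ~/.claude/plugins/marketplaces/leo-claude-mktplace/mcp-servers` prints anything, re-running `./scripts/setup-venvs.sh` in the installed path is the temporary fix described above. It does not address the unknown root cause.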
## Tags
plugins, mcp, venv, symlinks, marketplace, installation, critical-bug
plugins, mcp, venv, symlinks, marketplace, installation, critical-bug, time-waste, claude-failure, hidden-errors
---
**Tags:** plugins, mcp, venv, symlinks, marketplace, critical-bug, time-waste, claude-failure