feat(marketplace): command consolidation + 8 new plugins (v8.1.0 → v9.0.0) [BREAKING]

Phase 1b: Rename all ~94 commands across 12 plugins to /<noun> <action>
sub-command pattern. Git-flow consolidated from 8→5 commands (commit
variants absorbed into --push/--merge/--sync flags). Dispatch files,
name: frontmatter, and cross-reference updates for all plugins.

Phase 2: Design documents for 8 new plugins in docs/designs/.

Phase 3: Scaffold 8 new plugins — saas-api-platform, saas-db-migrate,
saas-react-platform, saas-test-pilot, data-seed, ops-release-manager,
ops-deploy-pipeline, debug-mcp. Each with plugin.json, commands, agents,
skills, README, and claude-md-integration. Marketplace grows from 12→20.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 14:52:11 -05:00
parent 5098422858
commit 2d51df7a42
321 changed files with 13582 additions and 1019 deletions


@@ -0,0 +1,25 @@
{
"name": "data-seed",
"version": "1.0.0",
"description": "Test data generation and database seeding with reproducible profiles",
"author": {
"name": "Leo Miranda",
"email": "leobmiranda@gmail.com"
},
"homepage": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace/src/branch/main/plugins/data-seed/README.md",
"repository": "https://gitea.hotserv.cloud/personal-projects/leo-claude-mktplace.git",
"license": "MIT",
"keywords": [
"test-data",
"seeding",
"faker",
"fixtures",
"schema",
"database",
"reproducible"
],
"commands": [
"./commands/"
],
"domain": "data"
}


@@ -0,0 +1,74 @@
# data-seed Plugin
Test data generation and database seeding with reproducible profiles for Claude Code.
## Overview
The data-seed plugin generates realistic test data from schema definitions. It supports multiple schema sources (SQLAlchemy, Prisma, Django ORM, raw SQL DDL, JSON Schema), handles foreign key dependencies automatically, and produces output as SQL, JSON, or CSV, as well as Prisma seed scripts and Python factory objects.
Key features:
- **Schema-first**: Parses your existing schema — no manual configuration needed
- **Realistic data**: Locale-aware faker providers for names, emails, addresses, and more
- **Reproducible**: Deterministic generation from seed profiles
- **Dependency-aware**: Resolves FK relationships and generates in correct insertion order
- **Profile-based**: Reusable profiles for small (unit tests), medium (development), and large (stress tests)
## Installation
This plugin is part of the Leo Claude Marketplace. Install via the marketplace or copy the `plugins/data-seed/` directory to your Claude Code plugins path.
## Commands
| Command | Description |
|---------|-------------|
| `/seed setup` | Setup wizard — detect schema source, configure output format |
| `/seed generate` | Generate seed data from schema or models |
| `/seed apply` | Apply seed data to database or create fixture files |
| `/seed profile` | Define and manage reusable data profiles |
| `/seed validate` | Validate seed data against schema constraints |
## Quick Start
```
/seed setup # Detect schema, configure output
/seed generate # Generate data with medium profile
/seed validate # Verify generated data integrity
/seed apply # Write fixture files
```
## Agents
| Agent | Model | Role |
|-------|-------|------|
| `seed-generator` | Sonnet | Data generation, profile management, and seed application |
| `seed-validator` | Haiku | Read-only validation of seed data integrity |
## Skills
| Skill | Purpose |
|-------|---------|
| `schema-inference` | Parse ORM models and SQL DDL into normalized schema |
| `faker-patterns` | Map columns to realistic faker providers |
| `relationship-resolution` | FK dependency ordering and circular dependency handling |
| `profile-management` | Seed profile CRUD and configuration |
| `visual-header` | Standard visual output formatting |
## Supported Schema Sources
- SQLAlchemy models (2.0+ and legacy 1.x)
- Prisma schema
- Django ORM models
- Raw SQL DDL (CREATE TABLE statements)
- JSON Schema definitions
## Output Formats
- SQL INSERT statements
- JSON fixtures (Django-compatible)
- CSV files
- Prisma seed scripts
- Python factory objects
## License
MIT License — Part of the Leo Claude Marketplace.


@@ -0,0 +1,96 @@
---
name: seed-generator
description: Data generation, profile management, and seed application. Use when generating test data, managing seed profiles, or applying fixtures to databases.
model: sonnet
permissionMode: acceptEdits
---
# Seed Generator Agent
You are a test data generation specialist. Your role is to create realistic, schema-compliant seed data for databases and fixture files using faker patterns, profile-based configuration, and dependency-aware insertion ordering.
## Visual Output Requirements
**MANDATORY: Display header at start of every response.**
```
+----------------------------------------------------------------------+
| DATA-SEED - [Command Name] |
| [Context Line] |
+----------------------------------------------------------------------+
```
## Trigger Conditions
Activate this agent when:
- User runs `/seed setup`
- User runs `/seed generate [options]`
- User runs `/seed apply [options]`
- User runs `/seed profile [action]`
## Skills to Load
- skills/schema-inference.md
- skills/faker-patterns.md
- skills/relationship-resolution.md
- skills/profile-management.md
- skills/visual-header.md
## Core Principles
### Schema-First Approach
Always derive data generation rules from the schema definition, never from assumptions:
- Parse the actual schema source (SQLAlchemy, Prisma, Django, raw SQL)
- Respect every constraint: NOT NULL, UNIQUE, CHECK, foreign keys, defaults
- Map types precisely — do not generate strings for integer columns or vice versa
### Reproducibility
- Seed the random number generator from the profile name + table name for deterministic output
- Same profile + same schema = same data every time
- Document the seed value in output metadata for reproducibility
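A minimal sketch of the seeding rule above, assuming a SHA-256 digest of `profile:table` is an acceptable way to derive the seed. The helper names `derive_seed` and `rng_for` are hypothetical and not part of the plugin. Python's built-in `hash()` is salted per process for strings, so it is not suitable here.

```python
import hashlib
import random


def derive_seed(profile_name: str, table_name: str) -> int:
    """Derive a stable 64-bit seed from the profile and table names."""
    digest = hashlib.sha256(f"{profile_name}:{table_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rng_for(profile_name: str, table_name: str) -> random.Random:
    """Return an isolated RNG so tables do not share generator state."""
    return random.Random(derive_seed(profile_name, table_name))


# Two independent runs with the same profile and table yield the same values.
rng_a = rng_for("medium", "users")
rng_b = rng_for("medium", "users")
sample_a = [rng_a.randint(1, 1000) for _ in range(5)]
sample_b = [rng_b.randint(1, 1000) for _ in range(5)]
```

Giving each table its own generator means that adding or removing a table should not shift the values produced for the others.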
### Realistic Data
- Use locale-aware faker providers for names, addresses, phone numbers
- Generate plausible relationships (not every user has exactly one order)
- Include edge cases at configurable ratios (empty strings, boundary integers, unicode)
- Distribute enum values with realistic skew (not uniform)
### Safety
- Never modify schema or drop tables
- Database operations are always wrapped in transactions
- TRUNCATE operations require explicit user confirmation
- Display execution plan before applying to database
## Operating Modes
### Setup Mode
- Detect project ORM/schema type
- Configure output format and directory
- Initialize default profiles
### Generate Mode
- Parse schema, resolve dependencies, generate data
- Output to configured format (SQL, JSON, CSV, factory objects)
### Apply Mode
- Read generated seed data
- Apply to database or write framework-specific fixture files
- Support clean (TRUNCATE) + seed workflow
### Profile Mode
- CRUD operations on data profiles
- Configure row counts, edge case ratios, custom overrides
## Error Handling
| Error | Response |
|-------|----------|
| Schema source not found | Prompt user to run `/seed setup` |
| Circular FK dependency detected | Use deferred constraint strategy, explain to user |
| UNIQUE constraint collision after 100 retries | FAIL: report column and suggest increasing uniqueness pool |
| Database connection failed (apply mode) | Report error, suggest using file target instead |
| Unsupported ORM dialect | WARN: fall back to raw SQL DDL parsing |
## Communication Style
Clear and structured. Show what will be generated before generating it. Display progress per table during generation. Summarize output with file paths and row counts. For errors, explain the constraint that was violated and suggest a fix.


@@ -0,0 +1,106 @@
---
name: seed-validator
description: Read-only validation of seed data integrity and schema compliance. Use when verifying generated test data against constraints and referential integrity.
model: haiku
permissionMode: plan
disallowedTools: Write, Edit, MultiEdit
---
# Seed Validator Agent
You are a strict seed data integrity auditor. Your role is to validate generated test data against schema definitions, checking type constraints, referential integrity, uniqueness, and statistical properties. You never modify files or data — analysis and reporting only.
## Visual Output Requirements
**MANDATORY: Display header at start of every response.**
```
+----------------------------------------------------------------------+
| DATA-SEED - Validate |
| [Profile Name or Target Path] |
+----------------------------------------------------------------------+
```
## Trigger Conditions
Activate this agent when:
- User runs `/seed validate [options]`
- Generator agent requests post-generation validation
## Skills to Load
- skills/schema-inference.md
- skills/relationship-resolution.md
- skills/visual-header.md
## Validation Categories
### Type Constraints (FAIL on violation)
- Integer columns must contain valid integers within type range
- String columns must not exceed declared max length
- Date/datetime columns must contain parseable ISO 8601 values
- Boolean columns must contain only true/false/null
- Decimal columns must respect declared precision and scale
- UUID columns must match UUID v4 format
- Enum columns must contain only declared valid values
### Referential Integrity (FAIL on violation)
- Every foreign key value must reference an existing parent row
- Self-referential keys must reference rows in the same table
- Many-to-many through tables must have valid references on both sides
- Cascading dependency chains must be intact
### Uniqueness (FAIL on violation)
- Single-column UNIQUE constraints: no duplicates
- Composite unique constraints: no duplicate tuples
- Primary key uniqueness across all rows
### NOT NULL (FAIL on violation)
- Required columns must not contain null values in any row
### Statistical Properties (WARN level, --strict only)
- Null ratio within tolerance of profile target
- Edge case ratio within tolerance of profile target
- Value distribution not unrealistically uniform for enum/category columns
- Date ranges within reasonable bounds
- Numeric values within sensible ranges for domain
## Report Format
```
+----------------------------------------------------------------------+
| DATA-SEED - Validate |
| Profile: [name] |
+----------------------------------------------------------------------+
Tables Validated: N
Rows Checked: N
Constraints Verified: N
FAIL (N)
1. [table.column] Description of violation
Fix: Suggested corrective action
WARN (N)
1. [table.column] Description of concern
Suggestion: Recommended improvement
INFO (N)
1. [table] Statistical observation
Note: Context
VERDICT: PASS | FAIL (N blocking issues)
```
## Error Handling
| Error | Response |
|-------|----------|
| No seed data found | Report error, suggest running `/seed generate` |
| Schema source missing | Report error, suggest running `/seed setup` |
| Malformed seed file | FAIL: report file path and parse error |
| Profile not found | Use default profile, WARN about missing profile |
## Communication Style
Precise and concise. Report exact locations of violations with table name, column name, and row numbers where applicable. Group findings by severity. Always include a clear PASS/FAIL verdict at the end.


@@ -0,0 +1,93 @@
# data-seed Plugin - CLAUDE.md Integration
Add this section to your project's CLAUDE.md to enable data-seed plugin features.
## Suggested CLAUDE.md Section
````markdown
## Test Data Generation (data-seed)
This project uses the data-seed plugin for test data generation and database seeding.
### Configuration
**Schema Source**: Auto-detected from project ORM (SQLAlchemy, Prisma, Django, raw SQL)
**Output Directory**: `seeds/` or `fixtures/` (configurable via `/seed setup`)
**Profiles**: `seed-profiles.json` in output directory
### Available Commands
| Command | Purpose |
|---------|---------|
| `/seed setup` | Configure schema source and output format |
| `/seed generate` | Generate test data from schema |
| `/seed apply` | Apply seed data to database or fixture files |
| `/seed profile` | Manage data profiles (small, medium, large) |
| `/seed validate` | Validate seed data against schema constraints |
### Data Profiles
| Profile | Rows/Table | Edge Cases | Use Case |
|---------|------------|------------|----------|
| `small` | 10 | None | Unit tests |
| `medium` | 100 | 10% | Development |
| `large` | 1000 | 5% | Performance testing |
### Typical Workflow
```
/seed setup # First-time configuration
/seed generate --profile medium # Generate development data
/seed validate # Verify integrity
/seed apply --target file # Write fixture files
```
### Custom Profiles
Create custom profiles for project-specific needs:
```
/seed profile create staging
```
Override row counts per table and set custom value pools for enum columns.
````
## Environment Variables
Add to project `.env` if needed:
```env
# Seed data configuration
SEED_OUTPUT_DIR=./seeds
SEED_DEFAULT_PROFILE=medium
SEED_DEFAULT_LOCALE=en_US
```
## Typical Workflows
### Initial Setup
```
/seed setup # Detect schema, configure output
/seed generate # Generate with default profile
/seed validate # Verify data integrity
```
### CI/CD Integration
```
/seed generate --profile small # Fast, minimal data for tests
/seed apply --target file # Write fixtures
# Run test suite with fixtures
```
### Development Environment
```
/seed generate --profile medium # Realistic development data
/seed apply --target database --clean # Clean and seed database
```
### Performance Testing
```
/seed generate --profile large # High-volume data
/seed apply --target database # Load into test database
# Run performance benchmarks
```


@@ -0,0 +1,70 @@
---
name: seed apply
---
# /seed apply - Apply Seed Data
## Skills to Load
- skills/profile-management.md
- skills/visual-header.md
## Visual Output
Display header: `DATA-SEED - Apply`
## Usage
```
/seed apply [--profile <name>] [--target <database|file>] [--clean] [--dry-run]
```
## Workflow
### 1. Locate Seed Data
- Look for generated seed files in configured output directory
- If no seed data found, prompt user to run `/seed generate` first
- Display available seed datasets with timestamps and profiles
### 2. Determine Target
- `--target database`: Apply directly to connected database via SQL execution
- `--target file` (default): Write fixture files for framework consumption
- Auto-detect framework for file output:
- Django: `fixtures/` directory as JSON fixtures compatible with `loaddata`
- SQLAlchemy: Python factory files or SQL insert scripts
- Prisma: `prisma/seed.ts` compatible format
- Generic: SQL insert statements or CSV files
### 3. Pre-Apply Validation
- If targeting database: verify connection, check table existence
- If `--clean` specified: generate TRUNCATE/DELETE statements for affected tables (respecting FK order)
- Display execution plan showing table order, row counts, and clean operations
- If `--dry-run`: display plan and exit without applying
### 4. Apply Data
- Execute in dependency order (parents before children)
- If targeting database: wrap in transaction, rollback on error
- If targeting files: write all files atomically
- Track progress: display per-table status during application
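The transaction-and-rollback behaviour for database targets can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the real target, and the `users` table, its columns, and the `apply_rows` helper are hypothetical.

```python
import sqlite3


def apply_rows(conn: sqlite3.Connection, table: str, rows: list[tuple]) -> int:
    """Insert all rows in one transaction; roll back everything on any error.

    The table name is interpolated, so it must come from the parsed schema,
    never from untrusted input.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(f"INSERT INTO {table} (id, email) VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")

bad_batch = [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")]
try:
    # The third row violates the UNIQUE constraint, so nothing should persist.
    apply_rows(conn, "users", bad_batch)
except sqlite3.IntegrityError:
    pass
rows_after_failure = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

inserted = apply_rows(conn, "users", [(1, "a@example.com"), (2, "b@example.com")])
rows_after_success = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Other database drivers expose transactions differently, so the `with conn:` idiom shown here should be read as specific to `sqlite3`.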
### 5. Post-Apply Summary
- Report rows inserted per table
- Report any errors or skipped rows
- Display total execution time
- If database target: verify row counts match expectations
## Examples
```
/seed apply # Write fixture files (default)
/seed apply --target database # Insert directly into database
/seed apply --profile small --clean # Clean + apply small dataset
/seed apply --dry-run # Preview without applying
/seed apply --target database --clean # Truncate then seed database
```
## Safety
- Database operations always use transactions
- `--clean` requires explicit confirmation before executing TRUNCATE
- Never drops tables or modifies schema — seed data only
- `--dry-run` is always safe and produces no side effects


@@ -0,0 +1,71 @@
---
name: seed generate
---
# /seed generate - Generate Seed Data
## Skills to Load
- skills/schema-inference.md
- skills/faker-patterns.md
- skills/relationship-resolution.md
- skills/visual-header.md
## Visual Output
Display header: `DATA-SEED - Generate`
## Usage
```
/seed generate [table_name] [--profile <name>] [--rows <count>] [--format <sql|json|csv>] [--locale <locale>]
```
## Workflow
### 1. Parse Schema
- Load schema from configured source (see `/seed setup`)
- Extract tables, columns, types, constraints, and relationships
- Use `skills/schema-inference.md` to normalize types across ORM dialects
### 2. Resolve Generation Order
- Build dependency graph from foreign key relationships
- Use `skills/relationship-resolution.md` to determine insertion order
- Handle circular dependencies via deferred constraint resolution
- If specific `table_name` provided, generate only that table plus its dependencies
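The ordering step above amounts to a topological sort of the foreign key graph. A minimal sketch, assuming the schema has already been reduced to a mapping from each table to the tables it references (the `insertion_order` helper and the sample schema are hypothetical), using the standard-library `graphlib` module available in Python 3.9+:

```python
from graphlib import CycleError, TopologicalSorter


def insertion_order(fk_parents: dict[str, set[str]]) -> list[str]:
    """Order tables so every parent appears before the tables that reference it.

    fk_parents maps each table to the set of tables its foreign keys point to.
    """
    return list(TopologicalSorter(fk_parents).static_order())


schema = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}
order = insertion_order(schema)

# A cycle between two tables is reported instead of silently mis-ordered.
try:
    insertion_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

When a `CycleError` is raised, the generator would fall back to the deferred constraint strategy described above rather than abort.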
### 3. Select Profile
- Load profile from `seed-profiles.json` (default: `medium`)
- Override row count if `--rows` specified
- Apply profile-specific edge case ratios and custom value overrides
### 4. Generate Data
- For each table in dependency order:
- Map column types to faker providers using `skills/faker-patterns.md`
- Respect NOT NULL constraints (never generate null for required fields)
- Respect UNIQUE constraints (track generated values, retry on collision)
- Generate foreign key values from previously generated parent rows
- Apply locale-specific patterns for names, addresses, phone numbers
- Handle enum/check constraints by selecting from valid values only
- Include edge cases per profile settings (empty strings, boundary values, unicode)
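The "track generated values, retry on collision" rule for UNIQUE columns can be sketched like this. The retry budget of 100 mirrors the generator agent's error table; the `unique_value` helper, the exception name, and the email generator are hypothetical, and plain `random` is used in place of a faker provider to keep the example dependency-free.

```python
import random


class UniquenessExhausted(RuntimeError):
    """Raised when no unused value is found within the retry budget."""


def unique_value(generate, seen: set, max_retries: int = 100):
    """Call generate() until it yields a value not in seen, then record it."""
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        value = generate()
        if value not in seen:
            seen.add(value)
            return value
    raise UniquenessExhausted(f"no unique value after {max_retries} retries")


rng = random.Random(42)
seen_emails: set = set()
emails = [
    unique_value(lambda: f"user{rng.randint(1, 50)}@example.com", seen_emails)
    for _ in range(20)
]
```

Raising a dedicated exception lets the caller report the offending column and suggest a larger value pool, as the error table prescribes.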
### 5. Output Results
- Write generated data in requested format to configured output directory
- Display summary: tables generated, row counts, file paths
- Report any constraint violations or generation warnings
## Examples
```
/seed generate # All tables, medium profile
/seed generate users # Only users table + dependencies
/seed generate --profile large # All tables, 1000 rows each
/seed generate orders --rows 50 # 50 order rows
/seed generate --format json # Output as JSON fixtures
/seed generate --locale pt_BR # Brazilian Portuguese data
```
## Edge Cases
- Self-referential foreign keys (e.g., `manager_id` on `employees`): generate root rows first, then assign managers from existing rows
- Many-to-many through tables: generate both sides first, then populate junction table
- Nullable foreign keys: generate null values at the profile's configured null ratio
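The self-referential case can be illustrated with a short sketch. It assumes an `employees` table with `id` and `manager_id` columns as in the example above; the `generate_employees` helper and the `root_count` parameter are hypothetical.

```python
import random


def generate_employees(count: int, root_count: int, rng: random.Random) -> list[dict]:
    """Create root rows with no manager, then point later rows at earlier ones."""
    if count > 0 and root_count < 1:
        raise ValueError("at least one root row is required")
    rows: list[dict] = []
    for emp_id in range(1, count + 1):
        if emp_id <= root_count:
            manager_id = None  # root rows have no parent
        else:
            manager_id = rng.choice(rows)["id"]  # only already-generated rows
        rows.append({"id": emp_id, "manager_id": manager_id})
    return rows


employees = generate_employees(count=10, root_count=2, rng=random.Random(7))
```

Because a manager is always drawn from rows that already exist, every `manager_id` refers to a smaller `id`, so the rows can be inserted in generation order without deferring constraints.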


@@ -0,0 +1,86 @@
---
name: seed profile
---
# /seed profile - Manage Data Profiles
## Skills to Load
- skills/profile-management.md
- skills/visual-header.md
## Visual Output
Display header: `DATA-SEED - Profile Management`
## Usage
```
/seed profile list
/seed profile show <name>
/seed profile create <name>
/seed profile edit <name>
/seed profile delete <name>
```
## Workflow
### list — Show All Profiles
- Read `seed-profiles.json` from configured output directory
- Display table: name, row counts per table, edge case ratio, description
- Highlight the default profile
### show — Profile Details
- Display full profile definition including:
- Per-table row counts
- Edge case configuration (null ratio, boundary values, unicode strings)
- Custom value overrides per column
- Locale settings
- Relationship density settings
### create — New Profile
- Ask user for profile name and description
- Ask for base row count (applies to all tables unless overridden)
- Ask for per-table overrides (optional)
- Ask for edge case ratio (0.0 = no edge cases, 1.0 = all edge cases)
- Ask for custom column overrides (e.g., `users.role` always "admin")
- Save to `seed-profiles.json`
### edit — Modify Profile
- Load existing profile, display current values
- Allow user to modify any field interactively
- Save updated profile
### delete — Remove Profile
- Confirm deletion with user
- Cannot delete the last remaining profile
- Remove from `seed-profiles.json`
## Profile Schema
```json
{
"name": "medium",
"description": "Realistic dataset for development and manual testing",
"default_rows": 100,
"table_overrides": {
"users": 50,
"orders": 200,
"order_items": 500
},
"edge_case_ratio": 0.1,
"null_ratio": 0.05,
"locale": "en_US",
"custom_values": {
"users.status": ["active", "active", "active", "inactive"],
"users.role": ["user", "user", "user", "admin"]
}
}
```
## Built-in Profiles
| Profile | Rows | Edge Cases | Use Case |
|---------|------|------------|----------|
| `small` | 10 | 0% | Unit tests, quick validation |
| `medium` | 100 | 10% | Development, manual testing |
| `large` | 1000 | 5% | Performance testing, stress testing |


@@ -0,0 +1,59 @@
---
name: seed setup
---
# /seed setup - Data Seed Setup Wizard
## Skills to Load
- skills/schema-inference.md
- skills/visual-header.md
## Visual Output
Display header: `DATA-SEED - Setup Wizard`
## Usage
```
/seed setup
```
## Workflow
### Phase 1: Environment Detection
- Detect project type: Python (SQLAlchemy, Django ORM), Node.js (Prisma, TypeORM), or raw SQL
- Check for existing schema files: `schema.prisma`, `models.py`, `*.sql` DDL files
- Identify package manager and installed ORM libraries
### Phase 2: Schema Source Configuration
- Ask user to confirm detected schema source or specify manually
- Supported sources:
- SQLAlchemy models (`models.py`, `models/` directory)
- Prisma schema (`prisma/schema.prisma`)
- Django models (`models.py` with Django imports)
- Raw SQL DDL files (`*.sql` with CREATE TABLE statements)
- JSON Schema definitions (`*.schema.json`)
- Store schema source path for future commands
### Phase 3: Output Configuration
- Ask preferred output format: SQL inserts, JSON fixtures, CSV files, or ORM factory objects
- Ask preferred output directory (default: `seeds/` or `fixtures/`)
- Ask default locale for faker data (default: `en_US`)
### Phase 4: Profile Initialization
- Create default profiles if none exist:
- `small` — 10 rows per table, minimal relationships
- `medium` — 100 rows per table, realistic relationships
- `large` — 1000 rows per table, stress-test volume
- Store profiles in `seed-profiles.json` in output directory
### Phase 5: Validation
- Verify schema can be parsed from detected source
- Display summary with detected tables, column counts, and relationship map
- Inform user of available commands
## Important Notes
- Uses Bash, Read, Write, AskUserQuestion tools
- Does not require database connection (schema-first approach)
- Profile definitions are portable across environments


@@ -0,0 +1,98 @@
---
name: seed validate
---
# /seed validate - Validate Seed Data
## Skills to Load
- skills/schema-inference.md
- skills/relationship-resolution.md
- skills/visual-header.md
## Visual Output
Display header: `DATA-SEED - Validate`
## Usage
```
/seed validate [--profile <name>] [--strict]
```
## Workflow
### 1. Load Schema and Seed Data
- Parse schema from configured source using `skills/schema-inference.md`
- Load generated seed data from output directory
- If no seed data found, report error and suggest running `/seed generate`
### 2. Type Constraint Validation
- For each column in each table, verify generated values match declared type:
- Integer columns contain only integers within range (INT, BIGINT, SMALLINT)
- String columns respect max length constraints (VARCHAR(N))
- Date/datetime columns contain parseable date values
- Boolean columns contain only true/false/null
- Decimal columns respect precision and scale
- UUID columns contain valid UUID format
- Enum columns contain only declared valid values
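A few of the type checks above, sketched with the Python standard library. The function names are hypothetical, the 32-bit range is an assumption about what `INT` means in the target database, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
import uuid
from datetime import datetime

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1  # assumed range for INT columns


def is_uuid4(value) -> bool:
    """True if value parses as a UUID whose version field is 4."""
    try:
        return uuid.UUID(str(value)).version == 4
    except ValueError:
        return False


def is_iso_datetime(value) -> bool:
    """True if value is a string that datetime.fromisoformat can parse."""
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def fits_int32(value) -> bool:
    """True for real integers (not booleans) inside the 32-bit signed range."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and INT32_MIN <= value <= INT32_MAX
    )


def fits_varchar(value, max_length: int) -> bool:
    """True if value is a string no longer than the declared VARCHAR(N)."""
    return isinstance(value, str) and len(value) <= max_length
```

Each check returns a boolean rather than raising, which makes it straightforward to collect every violation for the report instead of stopping at the first one.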
### 3. Referential Integrity Validation
- Use `skills/relationship-resolution.md` to build FK dependency graph
- For every foreign key value in child tables, verify parent row exists
- For self-referential keys, verify referenced row exists in same table
- For many-to-many through tables, verify both sides exist
- Report orphaned references as FAIL
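The orphan check reduces to a set-membership test against the parent keys. A minimal sketch, assuming seed data is held in memory as lists of dictionaries (the `find_orphans` helper and the sample rows are hypothetical):

```python
def find_orphans(child_rows, fk_column, parent_rows, parent_key="id"):
    """Return child rows whose non-null FK value has no matching parent row."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [
        row
        for row in child_rows
        if row[fk_column] is not None and row[fk_column] not in parent_keys
    ]


users = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "user_id": 1},
    {"id": 11, "user_id": 3},     # orphan: no user with id 3
    {"id": 12, "user_id": None},  # nullable FK, not an orphan
]
orphans = find_orphans(orders, "user_id", users)
```

For a self-referential key, the same function can be called with the table passed as both `child_rows` and `parent_rows`.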
### 4. Constraint Compliance
- NOT NULL: verify no null values in required columns
- UNIQUE: verify no duplicate values in unique columns or unique-together groups
- CHECK constraints: evaluate check expressions against generated data
- Default values: verify defaults are applied where column value is omitted
### 5. Statistical Validation (--strict mode)
- Verify null ratio matches profile configuration within tolerance
- Verify edge case ratio matches profile configuration
- Verify row counts match profile specification
- Verify distribution of enum/category values is not unrealistically uniform
- Verify date ranges are within reasonable bounds (not year 9999)
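The null ratio check in strict mode can be sketched as below. The tolerance of 0.02 is an assumed value chosen for illustration; the source does not state what tolerance the validator uses, and the helper names are hypothetical.

```python
def null_ratio(rows: list[dict], column: str) -> float:
    """Fraction of rows in which the given column is None."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[column] is None) / len(rows)


def within_tolerance(observed: float, target: float, tolerance: float = 0.02) -> bool:
    """True if the observed ratio is within +/- tolerance of the target."""
    return abs(observed - target) <= tolerance


# 100 rows, 6 of which have a null note: observed ratio 0.06.
rows = [{"note": None if i < 6 else "text"} for i in range(100)]
observed = null_ratio(rows, "note")
ok_for_medium = within_tolerance(observed, target=0.05)
```

A result outside the tolerance would be reported at WARN level, in line with the severity grouping in the report step.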
### 6. Report
- Display validation results grouped by severity:
- **FAIL**: Type mismatch, FK violation, NOT NULL violation, UNIQUE violation
- **WARN**: Unrealistic distributions, unexpected null ratios, date range issues
- **INFO**: Statistics summary, coverage metrics
```
+----------------------------------------------------------------------+
| DATA-SEED - Validate |
| Profile: medium |
+----------------------------------------------------------------------+
Tables Validated: 8
Rows Checked: 1,450
Constraints Verified: 42
FAIL (0)
No blocking violations found.
WARN (2)
1. [orders.created_at] Date range spans 200 years
Suggestion: Constrain date generator to recent years
2. [users.email] 3 duplicate values detected (column has no UNIQUE constraint)
Suggestion: Increase faker uniqueness retry count
INFO (1)
1. [order_items] Null ratio 0.06 (profile target: 0.05)
Within acceptable tolerance.
VERDICT: PASS (0 blocking issues)
```
## Examples
```
/seed validate # Standard validation
/seed validate --profile large # Validate large profile data
/seed validate --strict # Include statistical checks
```


@@ -0,0 +1,17 @@
---
description: Test data generation — create realistic fake data from schema definitions
---
# /seed
Test data generation and database seeding with reproducible profiles.
## Sub-commands
| Sub-command | Description |
|-------------|-------------|
| `/seed setup` | Setup wizard for data-seed configuration |
| `/seed generate` | Generate seed data from schema or models |
| `/seed apply` | Apply seed data to database or create fixture files |
| `/seed profile` | Define reusable data profiles (small, medium, large) |
| `/seed validate` | Validate seed data against schema constraints |


@@ -0,0 +1,90 @@
---
name: faker-patterns
description: Realistic data generation patterns using faker providers with locale awareness
---
# Faker Patterns
## Purpose
Map schema column types and naming conventions to appropriate faker data generators. This skill ensures generated test data is realistic, locale-aware, and respects type constraints.
---
## Column Name to Provider Mapping
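Use column name heuristics to select the most realistic faker provider. Check the most specific pattern first, so that `first_name`, `last_name`, and `username` are matched before the generic `*name`: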
| Column Name Pattern | Faker Provider | Example Output |
|---------------------|---------------|----------------|
| `*name`, `first_name` | `faker.name()` / `faker.first_name()` | "Alice Johnson" |
| `*last_name`, `surname` | `faker.last_name()` | "Rodriguez" |
| `*email` | `faker.email()` | "alice@example.com" |
| `*phone*`, `*tel*` | `faker.phone_number()` | "+1-555-0123" |
| `*address*`, `*street*` | `faker.street_address()` | "742 Evergreen Terrace" |
| `*city` | `faker.city()` | "Toronto" |
| `*state*`, `*province*` | `faker.state()` | "Ontario" |
| `*country*` | `faker.country()` | "Canada" |
| `*zip*`, `*postal*` | `faker.postcode()` | "M5V 2H1" |
| `*url*`, `*website*` | `faker.url()` | "https://example.com" |
| `*company*`, `*org*` | `faker.company()` | "Acme Corp" |
| `*title*`, `*subject*` | `faker.sentence(nb_words=5)` | "Updated quarterly report summary" |
| `*description*`, `*bio*`, `*body*` | `faker.paragraph()` | Multi-sentence text |
| `*created*`, `*updated*`, `*_at` | `faker.date_time_between(start_date='-2y')` | "2024-06-15T10:30:00" |
| `*dob*`, `*birth*` | `faker.date_of_birth(minimum_age=18)` | "1990-03-22" |
| `*date*` | `faker.date_between(start_date='-2y', end_date='today')` | "2024-06-15" |
| `*price*`, `*amount*`, `*cost*` | `faker.pydecimal(min_value=0.01, max_value=9999.99)` | 49.99 |
| `*quantity*`, `*count*` | `faker.random_int(min=1, max=100)` | 7 |
| `*status*` | Random from enum or `["active", "inactive", "pending"]` | "active" |
| `*uuid*`, `*guid*` | `faker.uuid4()` | "550e8400-e29b-41d4-a716-446655440000" |
| `*ip*`, `*ip_address*` | `faker.ipv4()` | "192.168.1.42" |
| `*color*`, `*colour*` | `faker.hex_color()` | "#3498db" |
| `*password*`, `*hash*` | `faker.sha256()` | Hash string (never plaintext) |
| `*image*`, `*avatar*`, `*photo*` | `faker.image_url()` | "https://picsum.photos/200" |
| `*slug*` | `faker.slug()` | "updated-quarterly-report" |
| `*username*`, `*login*` | `faker.user_name()` | "alice_johnson42" |
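Because several patterns in the table overlap (for example `*name` also matches `last_name` and `username`), the order of evaluation matters. The sketch below shows one way to apply the table with first-match-wins semantics using `fnmatch` from the standard library. The `RULES` list covers only a handful of rows, and both it and `provider_for` are hypothetical; the function returns provider names as strings rather than calling Faker.

```python
from fnmatch import fnmatchcase

# Ordered most specific first; the first matching pattern wins.
RULES = [
    ("first_name", "first_name"),
    ("*last_name", "last_name"),
    ("*username*", "user_name"),
    ("*email", "email"),
    ("*name", "name"),
    ("*_at", "date_time_between"),
]


def provider_for(column: str, default: str = "pystr") -> str:
    """Return the faker provider name for a column, or a type-fallback default."""
    lowered = column.lower()
    for pattern, provider in RULES:
        if fnmatchcase(lowered, pattern):
            return provider
    return default
```

With the generic `*name` rule placed last among the name patterns, `last_name` and `username` resolve to their dedicated providers while `full_name` still falls through to `name`.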
## Type Fallback Mapping
When column name does not match any pattern, fall back to type-based generation:
| Canonical Type | Generator |
|----------------|-----------|
| `string` | `faker.pystr(max_chars=max_length)` |
| `integer` | `faker.random_int(min=0, max=2147483647)` |
| `float` | `faker.pyfloat(min_value=0, max_value=10000)` |
| `decimal` | `faker.pydecimal(left_digits=precision-scale, right_digits=scale)` |
| `boolean` | `faker.pybool()` |
| `datetime` | `faker.date_time_between(start_date='-2y', end_date='now')` |
| `date` | `faker.date_between(start_date='-2y', end_date='today')` |
| `uuid` | `faker.uuid4()` |
| `json` | `{"key": faker.word(), "value": faker.sentence()}` |
## Locale Support
Supported locales affect names, addresses, phone formats, and postal codes:
| Locale | Names | Addresses | Phone | Currency |
|--------|-------|-----------|-------|----------|
| `en_US` | English names | US addresses | US format | USD |
| `en_CA` | English names | Canadian addresses | CA format | CAD |
| `en_GB` | English names | UK addresses | UK format | GBP |
| `pt_BR` | Portuguese names | Brazilian addresses | BR format | BRL |
| `fr_FR` | French names | French addresses | FR format | EUR |
| `de_DE` | German names | German addresses | DE format | EUR |
| `ja_JP` | Japanese names | Japanese addresses | JP format | JPY |
| `es_ES` | Spanish names | Spanish addresses | ES format | EUR |
Default locale: `en_US`. Override per-profile or per-command with `--locale`.
## Edge Case Values
Include at configurable ratio (default 10%):
| Type | Edge Cases |
|------|------------|
| `string` | Empty string `""`, max-length string, unicode characters, emoji, SQL special chars `'; DROP TABLE --` |
| `integer` | 0, -1, MAX_INT, MIN_INT |
| `float` | 0.0, -0.0, very small (0.0001), very large (999999.99) |
| `date` | Today, yesterday, epoch (1970-01-01), leap day (2024-02-29) |
| `boolean` | null (if nullable) |
| `email` | Plus-addressed `user+tag@example.com`, long domain, subdomain email |
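Injecting edge cases at a configurable ratio can be sketched as a single random draw per value. The `maybe_edge_case` helper is hypothetical, and the integer edge cases assume that MAX_INT and MIN_INT refer to the 32-bit signed range.

```python
import random

# Assumes MAX_INT / MIN_INT mean the 32-bit signed bounds.
INTEGER_EDGE_CASES = [0, -1, 2**31 - 1, -(2**31)]


def maybe_edge_case(normal_value, edge_cases: list, ratio: float, rng: random.Random):
    """Return an edge case with probability `ratio`, otherwise the normal value."""
    if edge_cases and rng.random() < ratio:
        return rng.choice(edge_cases)
    return normal_value


rng = random.Random(42)
never = [maybe_edge_case(7, INTEGER_EDGE_CASES, 0.0, rng) for _ in range(50)]
always = [maybe_edge_case(7, INTEGER_EDGE_CASES, 1.0, rng) for _ in range(50)]
```

Since `rng.random()` returns a value in `[0.0, 1.0)`, a ratio of `0.0` never selects an edge case and a ratio of `1.0` always does.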


@@ -0,0 +1,116 @@
---
name: profile-management
description: Seed profile definitions with row counts, edge case ratios, and custom value overrides
---
# Profile Management
## Purpose
Define and manage reusable seed data profiles that control how much data is generated, what edge cases are included, and what custom overrides apply. Profiles enable reproducible, consistent test data across environments.
---
## Profile Storage
Profiles are stored in `seed-profiles.json` in the configured output directory (default: `seeds/` or `fixtures/`).
## Profile Schema
```json
{
"profiles": [
{
"name": "profile-name",
"description": "Human-readable description",
"default_rows": 100,
"table_overrides": {
"table_name": 200
},
"edge_case_ratio": 0.1,
"null_ratio": 0.05,
"locale": "en_US",
"seed_value": 42,
"custom_values": {
"table.column": ["value1", "value2", "value3"]
},
"relationship_density": {
"many_to_many": 0.3,
"self_ref_max_depth": 3
}
}
],
"default_profile": "medium"
}
```
## Field Definitions
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | string | Yes | Unique profile identifier (lowercase, hyphens allowed) |
| `description` | string | No | What this profile is for |
| `default_rows` | integer | Yes | Row count for tables without explicit override |
| `table_overrides` | object | No | Per-table row count overrides |
| `edge_case_ratio` | float | No | Fraction of rows with edge case values (0.0 to 1.0, default 0.1) |
| `null_ratio` | float | No | Fraction of nullable columns set to null (0.0 to 1.0, default 0.05) |
| `locale` | string | No | Faker locale for name/address generation (default "en_US") |
| `seed_value` | integer | No | Random seed for reproducibility (default: hash of profile name) |
| `custom_values` | object | No | Column-specific value pools (table.column -> array of values) |
| `relationship_density` | object | No | Controls many-to-many fill ratio and self-referential depth |
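
The `seed_value` default ("hash of profile name") does not name a hash function. A minimal sketch, assuming a SHA-256 digest truncated to 32 bits; `resolve_seed` is a hypothetical helper. Python's built-in `hash()` is avoided because string hashes are randomized per process unless `PYTHONHASHSEED` is fixed, which would defeat reproducibility.

```python
import hashlib
import random


def resolve_seed(profile):
    """Use the explicit seed_value if set, else derive one from the name."""
    if profile.get("seed_value") is not None:
        return profile["seed_value"]
    digest = hashlib.sha256(profile["name"].encode("utf-8")).digest()
    # First 4 bytes as an unsigned 32-bit integer (an arbitrary choice).
    return int.from_bytes(digest[:4], "big")


rng_a = random.Random(resolve_seed({"name": "medium"}))
rng_b = random.Random(resolve_seed({"name": "medium"}))
# Two generators seeded from the same profile name produce the same stream.
same_stream = [rng_a.random() for _ in range(3)] == [rng_b.random() for _ in range(3)]
```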
## Built-in Profiles
### small
- `default_rows`: 10
- `edge_case_ratio`: 0.0
- `null_ratio`: 0.0
- Use case: unit tests, schema validation, quick smoke tests
- Characteristics: minimal data, no edge cases, all required fields populated
### medium
- `default_rows`: 100
- `edge_case_ratio`: 0.1
- `null_ratio`: 0.05
- Use case: development, manual testing, demo environments
- Characteristics: realistic volume, occasional edge cases, some nulls
### large
- `default_rows`: 1000
- `edge_case_ratio`: 0.05
- `null_ratio`: 0.03
- Use case: performance testing, pagination testing, stress testing
- Characteristics: high volume, lower edge case ratio to avoid noise
## Custom Value Overrides
Override the faker generator for specific columns with a weighted value pool:
```json
{
"custom_values": {
"users.role": ["user", "user", "user", "admin"],
"orders.status": ["completed", "completed", "pending", "cancelled", "refunded"],
"products.currency": ["USD"]
}
}
```
Values are selected uniformly at random from the array, with replacement. Duplicate entries therefore act as weights: in `users.role` above, "user" fills 3 of the 4 slots, so it is chosen 75% of the time and "admin" 25%.
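
The weighting can be checked with a quick simulation. This is a sketch using only the standard library; the variable names are illustrative.

```python
import random
from collections import Counter

pool = ["user", "user", "user", "admin"]

# Probability implied by the pool: count of each value / pool size.
implied = {value: n / len(pool) for value, n in Counter(pool).items()}

# Draw with replacement and compare observed frequencies to the implied ones.
rng = random.Random(42)
sample = [rng.choice(pool) for _ in range(10_000)]
observed = Counter(sample)
```

`implied` evaluates to `{"user": 0.75, "admin": 0.25}`, and the observed share of "user" lands close to 0.75.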
## Profile Operations
### Resolution Order
When determining row count for a table:
1. Command-line `--rows` flag (highest priority)
2. Profile `table_overrides` for that specific table
3. Profile `default_rows`
4. Built-in default: 100
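
The resolution order above can be sketched as a single lookup function. `resolve_row_count` and `BUILTIN_DEFAULT_ROWS` are hypothetical names; the profile is assumed to be a dict shaped like the Profile Schema.

```python
BUILTIN_DEFAULT_ROWS = 100


def resolve_row_count(table, profile=None, cli_rows=None):
    """Apply the documented precedence: --rows, table override, default_rows, 100."""
    if cli_rows is not None:
        return cli_rows
    profile = profile or {}
    overrides = profile.get("table_overrides") or {}
    if table in overrides:
        return overrides[table]
    if profile.get("default_rows") is not None:
        return profile["default_rows"]
    return BUILTIN_DEFAULT_ROWS


profile = {"default_rows": 10, "table_overrides": {"orders": 200}}
```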
### Validation Rules
- Profile name must be unique within `seed-profiles.json`
- `default_rows` must be >= 1
- `edge_case_ratio` must be between 0.0 and 1.0
- `null_ratio` must be between 0.0 and 1.0
- Custom value arrays must not be empty
- Cannot delete the last remaining profile
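
A minimal sketch of the field-level rules, assuming the file has already been parsed into a dict. `validate_profiles` is a hypothetical helper; the "last remaining profile" rule applies to the delete operation and is not covered here.

```python
def validate_profiles(doc):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen = set()
    for profile in doc.get("profiles", []):
        name = profile.get("name", "<unnamed>")
        if name in seen:
            errors.append(f"duplicate profile name: {name}")
        seen.add(name)
        rows = profile.get("default_rows")
        if not isinstance(rows, int) or rows < 1:
            errors.append(f"{name}: default_rows must be an integer >= 1")
        for key in ("edge_case_ratio", "null_ratio"):
            ratio = profile.get(key)
            if ratio is not None and not 0.0 <= ratio <= 1.0:
                errors.append(f"{name}: {key} must be between 0.0 and 1.0")
        for column, values in (profile.get("custom_values") or {}).items():
            if not values:
                errors.append(f"{name}: custom_values[{column}] must not be empty")
    return errors


good = {"profiles": [{"name": "small", "default_rows": 10, "edge_case_ratio": 0.0}]}
bad = {"profiles": [
    {"name": "x", "default_rows": 0, "null_ratio": 1.5},
    {"name": "x", "default_rows": 5, "custom_values": {"users.role": []}},
]}
```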


@@ -0,0 +1,118 @@
---
name: relationship-resolution
description: Foreign key resolution, dependency ordering, and circular dependency handling for seed data
---
# Relationship Resolution
## Purpose
Determine the correct order for generating and inserting seed data across tables with foreign key dependencies. Handle edge cases including circular dependencies, self-referential relationships, and many-to-many through tables.
---
## Dependency Graph Construction
### Step 1: Extract Foreign Keys
For each table, identify all columns with foreign key constraints:
- Direct FK references to other tables
- Self-referential FKs (same table)
- Composite FKs spanning multiple columns
### Step 2: Build Directed Graph
- Each table is a node
- Each FK creates a directed edge: child -> parent (child depends on parent)
- Self-referential edges are noted but excluded from ordering (handled separately)
### Step 3: Topological Sort
- Apply topological sort to determine insertion order
- Tables with no dependencies come first
- Tables depending on others come after their dependencies
- Result: ordered list where every table's dependencies appear before it
## Insertion Order Example
Given schema:
```
users (no FK)
categories (no FK)
products (FK -> categories)
orders (FK -> users)
order_items (FK -> orders, FK -> products)
reviews (FK -> users, FK -> products)
```
Insertion order: `users, categories, products, orders, order_items, reviews`
Deletion order (reverse): `reviews, order_items, orders, products, categories, users`
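
The three steps can be sketched with Python's standard-library `graphlib`, which takes a mapping of node to predecessors. This is an illustration, not the plugin's implementation; `insertion_order` and `find_cycle` are hypothetical helpers.

```python
from graphlib import CycleError, TopologicalSorter

# table -> set of parent tables it references (self-references excluded)
dependencies = {
    "users": set(),
    "categories": set(),
    "products": {"categories"},
    "orders": {"users"},
    "order_items": {"orders", "products"},
    "reviews": {"users", "products"},
}


def insertion_order(deps):
    """Return tables ordered so every parent precedes its children."""
    return list(TopologicalSorter(deps).static_order())


def find_cycle(deps):
    """Return the cycle reported by graphlib, or None if the graph is acyclic."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as exc:
        return exc.args[1]
    return None


order = insertion_order(dependencies)
deletion_order = list(reversed(order))
```

Any valid topological order is acceptable, so the result may list independent tables in a different sequence than the example above while still placing every parent before its children.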
## Circular Dependency Handling
When topological sort detects a cycle:
### Strategy 1: Nullable FK Deferral
If one FK in the cycle is nullable:
1. Insert rows with nullable FK set to NULL
2. Insert the other table's rows, which can now reference the rows from step 1
3. UPDATE the nullable FK to point to the now-existing rows
Example: `departments.manager_id -> employees`, `employees.department_id -> departments`
1. Insert departments with `manager_id = NULL`
2. Insert employees referencing departments
3. UPDATE departments to set `manager_id` to an employee
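
The departments/employees example can be run end to end against an in-memory SQLite database. SQLite is used here only because it ships with Python; the column names beyond `manager_id` and `department_id` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees(id)  -- nullable side of the cycle
    );
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    );
    """
)

# 1. Insert departments with manager_id left NULL.
conn.execute("INSERT INTO departments (id, name, manager_id) VALUES (1, 'Eng', NULL)")
# 2. Insert employees referencing the existing department.
conn.execute("INSERT INTO employees (id, name, department_id) VALUES (10, 'Ada', 1)")
# 3. Back-fill the nullable FK now that the employee row exists.
conn.execute("UPDATE departments SET manager_id = 10 WHERE id = 1")
conn.commit()

manager_id = conn.execute("SELECT manager_id FROM departments WHERE id = 1").fetchone()[0]
```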
### Strategy 2: Deferred Constraints
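If the database supports deferred constraints (e.g., PostgreSQL):
1. Within a transaction, set the FK constraints to DEFERRED (in PostgreSQL, `SET CONSTRAINTS ALL DEFERRED`; this only affects constraints declared `DEFERRABLE`)
2. Insert all rows in any order
3. Constraints are checked at COMMIT time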
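
A runnable sketch of the idea, using SQLite as a stand-in because it ships with Python and also supports `DEFERRABLE INITIALLY DEFERRED` foreign keys. Against PostgreSQL the same effect needs constraints declared `DEFERRABLE`; the table layout here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        manager_id INTEGER NOT NULL
            REFERENCES employees(id) DEFERRABLE INITIALLY DEFERRED
    );
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        department_id INTEGER NOT NULL
            REFERENCES departments(id) DEFERRABLE INITIALLY DEFERRED
    );
    """
)

# Both inserts run inside one implicit transaction; the department row
# references an employee that does not exist yet.
conn.execute("INSERT INTO departments (id, manager_id) VALUES (1, 10)")
conn.execute("INSERT INTO employees (id, department_id) VALUES (10, 1)")
conn.commit()  # deferred FK checks run here

row_counts = (
    conn.execute("SELECT COUNT(*) FROM departments").fetchone()[0],
    conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0],
)
```

Neither FK is nullable in this sketch, which is the case Strategy 1 cannot handle.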
### Strategy 3: Two-Pass Generation
If neither strategy works:
1. First pass: generate all rows without cross-cycle FK values
2. Second pass: update FK values to reference generated rows from the other table
## Self-Referential Relationships
Common pattern: `employees.manager_id -> employees.id`
### Generation Strategy
1. Generate root rows first (manager_id = NULL) — these are top-level managers
2. Generate second tier referencing root rows
3. Generate remaining rows referencing any previously generated row
4. Depth distribution controlled by profile (default: max depth 3, pyramid shape)
### Configuration
```json
{
"self_ref_null_ratio": 0.1,
"self_ref_max_depth": 3,
"self_ref_distribution": "pyramid"
}
```
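
A minimal sketch of depth-limited parent assignment. Several things are assumptions: `assign_parents` is a hypothetical helper, roots are counted as depth 1, and parents are picked uniformly, so the "pyramid" distribution is not modelled.

```python
import random


def assign_parents(row_ids, rng, null_ratio=0.1, max_depth=3):
    """Map each row id to a parent id (or None) without exceeding max_depth.

    Roots have depth 1. Parents are always earlier rows, so no cycles form.
    """
    parent_of, depth_of = {}, {}
    for index, row_id in enumerate(row_ids):
        # Only rows above the depth limit may take on children.
        candidates = [r for r in row_ids[:index] if depth_of[r] < max_depth]
        if not candidates or rng.random() < null_ratio:
            parent_of[row_id], depth_of[row_id] = None, 1
        else:
            parent = rng.choice(candidates)
            parent_of[row_id], depth_of[row_id] = parent, depth_of[parent] + 1
    return parent_of, depth_of


parent_of, depth_of = assign_parents(list(range(1, 51)), random.Random(42))
```

Because every parent is generated before its children, the rows can be inserted in generation order without violating the FK.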
## Many-to-Many Through Tables
Detection: a table with exactly two FK columns and no non-FK data columns (excluding PK and timestamps).
### Generation Strategy
1. Generate both parent tables first
2. Generate through table rows pairing random parents
3. Respect uniqueness on the (FK1, FK2) composite — no duplicate pairings
4. Density controlled by profile: sparse (10% of possible pairs), medium (30%), dense (60%)
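
Sampling without replacement from the set of possible pairs satisfies the uniqueness rule by construction. A sketch, with `generate_pairs` as a hypothetical helper:

```python
import itertools
import random


def generate_pairs(left_ids, right_ids, density, rng):
    """Pick a unique subset of (left, right) pairs covering `density` of all pairs."""
    all_pairs = list(itertools.product(left_ids, right_ids))
    count = round(len(all_pairs) * density)
    return rng.sample(all_pairs, count)


# 10 x 20 parents at medium density (30%) yields 60 through-table rows.
pairs = generate_pairs(range(1, 11), range(1, 21), density=0.3, rng=random.Random(42))
```

Materializing every possible pair is fine for small parent tables; for large ones a different sampling approach would be needed to avoid the memory cost.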
## Deletion Order
When `--clean` is specified for `/seed apply`:
1. Reverse the insertion order
2. TRUNCATE or DELETE in this order to avoid FK violations
3. For circular dependencies: disable FK checks, truncate, re-enable (with user confirmation)
## Error Handling
| Scenario | Response |
|----------|----------|
| Unresolvable cycle (no nullable FKs, no deferred constraints) | FAIL: report cycle, suggest schema modification |
| Missing parent table in schema | FAIL: report orphaned FK reference |
| FK references non-existent column | FAIL: report schema inconsistency |
| Through table detection false positive | WARN: ask user to confirm junction table identification |


@@ -0,0 +1,81 @@
---
name: schema-inference
description: Infer data types, constraints, and relationships from ORM models or raw SQL DDL
---
# Schema Inference
## Purpose
Parse and normalize schema definitions from multiple ORM dialects into a unified internal representation. This skill enables data generation and validation commands to work across SQLAlchemy, Prisma, Django ORM, and raw SQL DDL without dialect-specific logic in every command.
---
## Supported Schema Sources
| Source | Detection | File Patterns |
|--------|-----------|---------------|
| SQLAlchemy | `from sqlalchemy import`, `Column(`, `mapped_column(` | `models.py`, `models/*.py` |
| Prisma | `model` blocks with `@id`, `@relation` | `prisma/schema.prisma` |
| Django ORM | `from django.db import models`, `models.CharField` | `models.py` with Django imports |
| Raw SQL DDL | `CREATE TABLE` statements | `*.sql`, `schema.sql`, `migrations/*.sql` |
| JSON Schema | `"type": "object"`, `"properties":` | `*.schema.json` |
## Type Normalization
Map dialect-specific types to a canonical set:
| Canonical Type | SQLAlchemy | Prisma | Django | SQL |
|----------------|------------|--------|--------|-----|
| `string` | `String(N)`, `Text` | `String` | `CharField`, `TextField` | `VARCHAR(N)`, `TEXT` |
| `integer` | `Integer`, `BigInteger`, `SmallInteger` | `Int`, `BigInt` | `IntegerField`, `BigIntegerField` | `INT`, `BIGINT`, `SMALLINT` |
| `float` | `Float` | `Float` | `FloatField` | `FLOAT`, `REAL`, `DOUBLE` |
| `decimal` | `Numeric(P,S)` | `Decimal` | `DecimalField` | `DECIMAL(P,S)`, `NUMERIC(P,S)` |
| `boolean` | `Boolean` | `Boolean` | `BooleanField` | `BOOLEAN`, `BIT` |
| `datetime` | `DateTime` | `DateTime` | `DateTimeField` | `TIMESTAMP`, `DATETIME` |
| `date` | `Date` | `DateTime` | `DateField` | `DATE` |
| `uuid` | `UUID` | `String @default(uuid())` | `UUIDField` | `UUID` |
| `json` | `JSON` | `Json` | `JSONField` | `JSON`, `JSONB` |
| `enum` | `Enum(...)` | `enum` block | `choices=` | `ENUM(...)`, `CHECK IN (...)` |
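
For the SQL column of the table, normalization can be sketched as a lookup on the base type name plus optional length or precision arguments. `SQL_TO_CANONICAL`, `normalize_sql_type`, and the output key names `precision` and `scale` are assumptions; `ENUM` and `CHECK IN` handling is omitted.

```python
import re

# Abbreviated SQL column of the table above: base type -> canonical type.
SQL_TO_CANONICAL = {
    "VARCHAR": "string", "TEXT": "string",
    "INT": "integer", "BIGINT": "integer", "SMALLINT": "integer",
    "FLOAT": "float", "REAL": "float", "DOUBLE": "float",
    "DECIMAL": "decimal", "NUMERIC": "decimal",
    "BOOLEAN": "boolean", "BIT": "boolean",
    "TIMESTAMP": "datetime", "DATETIME": "datetime",
    "DATE": "date", "UUID": "uuid",
    "JSON": "json", "JSONB": "json",
}

# Base name, then optional "(N)" or "(P, S)".
TYPE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?")


def normalize_sql_type(sql_type):
    """Return a canonical column description for a raw SQL type string."""
    match = TYPE_PATTERN.match(sql_type)
    if not match or match.group(1).upper() not in SQL_TO_CANONICAL:
        raise ValueError(f"unsupported SQL type: {sql_type!r}")
    canonical = SQL_TO_CANONICAL[match.group(1).upper()]
    column = {"type": canonical}
    if canonical == "string" and match.group(2):
        column["max_length"] = int(match.group(2))
    if canonical == "decimal" and match.group(2):
        column["precision"] = int(match.group(2))
        column["scale"] = int(match.group(3) or 0)
    return column
```

For example, `normalize_sql_type("VARCHAR(255)")` returns `{"type": "string", "max_length": 255}`.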
## Constraint Extraction
For each column, extract:
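- **nullable**: Whether NULL values are allowed. For SQL DDL and SQLAlchemy `Column`, the default is true unless the column is a PK or explicitly NOT NULL; Django fields default to `null=False`, and Prisma fields are required unless marked optional with `?`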
- **unique**: Whether values must be unique
- **max_length**: For string types, the maximum character length
- **precision/scale**: For decimal types
- **default**: Default value expression
- **check**: CHECK constraint expressions (e.g., `age >= 0`)
- **primary_key**: Whether this column is part of the primary key
## Relationship Extraction
Identify foreign key relationships:
- **parent_table**: The referenced table
- **parent_column**: The referenced column (usually PK)
- **on_delete**: CASCADE, SET NULL, RESTRICT, NO ACTION
- **self_referential**: True if FK references same table
- **many_to_many**: Detected from junction/through tables with two FKs and no additional non-FK columns
## Output Format
Internal representation used by other skills:
```json
{
"tables": {
"users": {
"columns": {
"id": {"type": "integer", "primary_key": true, "nullable": false},
"email": {"type": "string", "max_length": 255, "unique": true, "nullable": false},
"name": {"type": "string", "max_length": 100, "nullable": false},
"manager_id": {"type": "integer", "nullable": true, "foreign_key": {"table": "users", "column": "id"}}
},
"relationships": [
{"type": "self_referential", "column": "manager_id", "references": "users.id"}
]
}
}
}
```
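
A sketch of how a downstream skill might read this representation to build the dependency map used for insertion ordering. `extract_dependencies` is a hypothetical helper, and the `orders` table is added purely for illustration.

```python
schema = {
    "tables": {
        "users": {
            "columns": {
                "id": {"type": "integer", "primary_key": True, "nullable": False},
                "manager_id": {
                    "type": "integer",
                    "nullable": True,
                    "foreign_key": {"table": "users", "column": "id"},
                },
            }
        },
        "orders": {  # hypothetical table, not part of the example above
            "columns": {
                "id": {"type": "integer", "primary_key": True, "nullable": False},
                "user_id": {
                    "type": "integer",
                    "nullable": False,
                    "foreign_key": {"table": "users", "column": "id"},
                },
            }
        },
    }
}


def extract_dependencies(schema):
    """Return (dependencies, self_refs) from the internal representation."""
    dependencies, self_refs = {}, []
    for table, spec in schema["tables"].items():
        dependencies[table] = set()
        for column, meta in spec["columns"].items():
            fk = meta.get("foreign_key")
            if fk is None:
                continue
            if fk["table"] == table:
                self_refs.append((table, column))  # handled separately
            else:
                dependencies[table].add(fk["table"])
    return dependencies, self_refs


dependencies, self_refs = extract_dependencies(schema)
```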


@@ -0,0 +1,27 @@
# Visual Header Skill
Standard visual header for data-seed commands.
## Header Template
```
+----------------------------------------------------------------------+
| DATA-SEED - [Context] |
+----------------------------------------------------------------------+
```
## Context Values by Command
| Command | Context |
|---------|---------|
| `/seed setup` | Setup Wizard |
| `/seed generate` | Generate |
| `/seed apply` | Apply |
| `/seed profile` | Profile Management |
| `/seed validate` | Validate |
| Agent mode (seed-generator) | Data Generation |
| Agent mode (seed-validator) | Validation |
## Usage
Display header at the start of every command response before proceeding with the operation.
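
If the header is ever produced by a script rather than typed out, a sketch could look like the following. `render_header` is a hypothetical helper, and the inner width of 70 and the two-space left padding are assumptions about the template's layout.

```python
def render_header(context, inner_width=70):
    """Return the three-line box header for a command response."""
    border = "+" + "-" * inner_width + "+"
    title = f"  DATA-SEED - {context}"
    # Pad the title so the right border lines up with the box edge.
    return "\n".join([border, "|" + title.ljust(inner_width) + "|", border])


header = render_header("Setup Wizard")
```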