CLAUDE.md

Working context for Claude Code on the Analytics Portfolio project.


Project Status

**Last Completed Sprint:** 9 (Neighbourhood Dashboard Transition)
**Current State:** Ready for a deployment sprint or new features
**Branch:** `development` (feature branches merge here)


Quick Reference

Run Commands

```bash
# Setup & Database
make setup          # Install deps, create .env, init pre-commit
make docker-up      # Start PostgreSQL + PostGIS (auto-detects x86/ARM)
make docker-down    # Stop containers
make docker-logs    # View container logs
make db-init        # Initialize database schema
make db-reset       # Drop and recreate database (DESTRUCTIVE)

# Data Loading
make load-data      # Load Toronto data from APIs, seed dev data
make load-data-only # Load Toronto data without dbt or seeding
make seed-data      # Seed sample development data

# Application
make run            # Start Dash dev server

# Testing & Quality
make test           # Run pytest
make test-cov       # Run pytest with coverage
make lint           # Run ruff linter
make format         # Run ruff formatter
make typecheck      # Run mypy type checker
make ci             # Run all checks (lint, typecheck, test)

# dbt
make dbt-run        # Run dbt models
make dbt-test       # Run dbt tests
make dbt-docs       # Generate and serve dbt documentation

# Maintenance
make clean          # Remove build artifacts and caches
```

Branch Workflow

  1. Create feature branch FROM development: git checkout -b feature/{sprint}-{description}
  2. Work and commit on feature branch
  3. Merge INTO development when complete
  4. development -> staging -> main for releases

Code Conventions

Import Style

| Context | Style | Example |
| --- | --- | --- |
| Same directory | Single dot | `from .neighbourhood import NeighbourhoodRecord` |
| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
| External packages | Absolute | `import pandas as pd` |

Module Responsibilities

| Directory | Contains | Purpose |
| --- | --- | --- |
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | API/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `services/` | Query functions | dbt mart queries, business logic |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | In `pages/{dashboard}/callbacks/` |
| `errors/` | Exception classes | Custom exceptions |
| `utils/` | Helper modules | Markdown loading, shared utilities |

Type Hints

Use Python 3.10+ style:

```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```

Error Handling

```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""

class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""

class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""

class LoadError(PortfolioError):
    """Database load operation failed."""
```

Code Standards

  • Single responsibility functions with verb naming
  • Early returns over deep nesting
  • Google-style docstrings only for non-obvious behavior
  • Module-level constants for magic values
  • Pydantic BaseSettings for runtime config
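
A short sketch combining several of these standards (the function name and threshold are invented for illustration): a verb-named single-responsibility function, early returns, and a module-level constant:

```python
MAX_MONTHLY_RENT = 10_000  # module-level constant instead of a magic value

def validate_rent(value: float) -> bool:
    """Check a rent observation against basic sanity bounds."""
    if value <= 0:
        return False  # early return instead of nested if/else
    if value > MAX_MONTHLY_RENT:
        return False
    return True
```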

Application Structure

Entry Point: portfolio_app/app.py (Dash app factory with Pages routing)

| Directory | Purpose | Notes |
| --- | --- | --- |
| `pages/` | Dash Pages (file-based routing) | URLs match file paths |
| `pages/toronto/` | Toronto Dashboard | `tabs/` for layouts, `callbacks/` for interactions |
| `components/` | Shared UI components | metric_card, sidebar, map_controls, time_slider |
| `figures/` | Plotly chart factories | choropleth, bar_charts, scatter, radar, time_series |
| `toronto/` | Toronto data logic | `parsers/`, `loaders/`, `schemas/`, `models/` |
| `content/blog/` | Markdown blog articles | Processed by `utils/markdown_loader.py` |
| `notebooks/` | Data documentation | 5 domains: overview, housing, safety, demographics, amenities |

Key URLs: / (home), /toronto (dashboard), /blog (listing), /blog/{slug} (articles)


Tech Stack (Locked)

| Layer | Technology | Version |
| --- | --- | --- |
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | >=2.0 |
| ORM | SQLAlchemy | >=2.0 (2.0-style API only) |
| Transformation | dbt-postgres | >=1.7 |
| Data Processing | Pandas | >=2.1 |
| Geospatial | GeoPandas + Shapely | >=0.14 |
| Visualization | Dash + Plotly | >=2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | >=7.0 |
| Python | 3.11+ | Via pyenv |

Notes:

  • SQLAlchemy 2.0 + Pydantic 2.0 only (never mix 1.x APIs)
  • PostGIS extension required in database
  • Docker Compose V2 format (no version field)
  • Multi-architecture support: make docker-up auto-detects CPU architecture and uses the appropriate PostGIS image (x86_64: postgis/postgis, ARM64: imresamu/postgis)

Data Model Overview

Geographic Reality (Toronto Housing)

```
City Neighbourhoods (158) - Primary geographic unit for analysis
CMHC Zones (~20)          - Rental data (Census Tract aligned)
```

Star Schema

| Table | Type | Keys |
| --- | --- | --- |
| `fact_rentals` | Fact | -> `dim_time`, `dim_cmhc_zone` |
| `dim_time` | Dimension | `date_key` (PK) |
| `dim_cmhc_zone` | Dimension | `zone_key` (PK), geometry |
| `dim_neighbourhood` | Dimension | `neighbourhood_id` (PK), geometry |
| `dim_policy_event` | Dimension | `event_id` (PK) |

dbt Layers

| Layer | Naming | Purpose |
| --- | --- | --- |
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic |
| Marts | `mart_{domain}` | Final analytical tables |

Deferred Features

Stop and flag if a task seems to require these:

| Feature | Reason |
| --- | --- |
| Historical boundary reconciliation (140 -> 158) | 2021+ data only for V1 |
| ML prediction models | Energy project scope (future phase) |
| Multi-project shared infrastructure | Build first, abstract second |

Environment Variables

Required in .env:

```
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO
```
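
The project standard is Pydantic `BaseSettings` for reading these values; as a dependency-free sketch of the same idea, using only the standard library (the fallback values here are illustrative):

```python
import os

# Reads the same variable names listed above, with illustrative defaults.
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "postgresql://user:pass@localhost:5432/portfolio"
)
DASH_DEBUG = os.environ.get("DASH_DEBUG", "false").lower() == "true"
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
```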

Script Standards

All scripts in scripts/:

  • Include usage comments at top
  • Idempotent where possible
  • Exit codes: 0 = success, 1 = error
  • Use set -euo pipefail for bash
  • Log to stdout, errors to stderr

Reference Documents

| Document | Location | Use When |
| --- | --- | --- |
| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions, completed work |
| Developer guide | `docs/CONTRIBUTING.md` | How to add pages, blog posts, tabs |
| Lessons learned | `docs/project-lessons-learned/INDEX.md` | Past issues and solutions |
| Deployment runbook | `docs/runbooks/deployment.md` | Deploying to staging/production |
| Dashboard runbook | `docs/runbooks/adding-dashboard.md` | Adding new data dashboards |

Projman Plugin Workflow

CRITICAL: Always use the projman plugin for sprint and task management.

When to Use Projman Skills

| Skill | Trigger | Purpose |
| --- | --- | --- |
| `/projman:sprint-plan` | New sprint or phase implementation | Architecture analysis + Gitea issue creation |
| `/projman:sprint-start` | Beginning implementation work | Load lessons learned (Wiki.js or local), start execution |
| `/projman:sprint-status` | Check progress | Review blockers and completion status |
| `/projman:sprint-close` | Sprint completion | Capture lessons learned (Wiki.js or local backup) |

Default Behavior

When user requests implementation work:

  1. ALWAYS start with /projman:sprint-plan before writing code
  2. Create Gitea issues with proper labels and acceptance criteria
  3. Use /projman:sprint-start to begin execution with lessons learned
  4. Track progress via Gitea issue comments
  5. Close sprint with /projman:sprint-close to document lessons

Gitea Repository

  • Repo: personal-projects/personal-portfolio
  • Host: gitea.hotserv.cloud
  • SSH: ssh://git@hotserv.tailc9b278.ts.net:2222/personal-projects/personal-portfolio.git
  • Labels: 18 repository-level labels configured (Type, Priority, Complexity, Effort)

MCP Tools Available

Gitea:

  • list_issues, get_issue, create_issue, update_issue, add_comment
  • get_labels, suggest_labels

Wiki.js:

  • search_lessons, create_lesson, search_pages, get_page

Lessons Learned (Backup Method)

When Wiki.js is unavailable, use the local backup in docs/project-lessons-learned/:

At Sprint Start:

  1. Review docs/project-lessons-learned/INDEX.md for relevant past lessons
  2. Search lesson files by tags/keywords before implementation
  3. Apply prevention strategies from applicable lessons

At Sprint Close:

  1. Try Wiki.js create_lesson first
  2. If Wiki.js fails, create lesson in docs/project-lessons-learned/
  3. Use naming convention: {phase-or-sprint}-{short-description}.md
  4. Update INDEX.md with new entry
  5. Follow the lesson template in INDEX.md

Migration: Once Wiki.js is configured, lessons will be migrated there for better searchability.

Issue Structure

Every Gitea issue should include:

  • Overview: Brief description
  • Files to Create/Modify: Explicit paths
  • Acceptance Criteria: Checkboxes
  • Technical Notes: Implementation hints
  • Labels: Listed in body (workaround for label API issues)

Other Available Plugins

Code Quality: code-sentinel

Use for security scanning and refactoring analysis.

| Command | Purpose |
| --- | --- |
| `/code-sentinel:security-scan` | Full security audit of codebase |
| `/code-sentinel:refactor` | Apply refactoring patterns |
| `/code-sentinel:refactor-dry` | Preview refactoring without applying |

When to use: Before major releases, after adding authentication/data handling code, periodic audits.

Documentation: doc-guardian

Use for documentation drift detection and synchronization.

| Command | Purpose |
| --- | --- |
| `/doc-guardian:doc-audit` | Scan project for documentation drift |
| `/doc-guardian:doc-sync` | Synchronize pending documentation updates |

When to use: After significant code changes, before releases, when docs feel stale.

Pull Requests: pr-review

Use for comprehensive PR review with multiple analysis perspectives.

| Command | Purpose |
| --- | --- |
| `/pr-review:initial-setup` | Configure PR review for this project |
| `/pr-review:project-init` | Quick project-level setup |

When to use: Before merging significant PRs to development or main.

Git Workflow: git-flow

Use for git operations assistance.

When to use: Complex merge scenarios, branch management questions.


Last Updated: January 2026 (Post-Sprint 9)