personal-portfolio/CLAUDE.md
lmiranda c7e9b88adb feat: project bootstrap and structure
Sprint 1 initialization:
- Project directory structure (portfolio_app/, tests/, dbt/, data/, scripts/)
- CLAUDE.md with AI assistant context
- pyproject.toml with all dependencies
- docker-compose.yml for PostgreSQL 16 + PostGIS
- Makefile with standard targets
- Pre-commit configuration (ruff, mypy)
- Environment template (.env.example)
- Error handling foundation (PortfolioError hierarchy)
- Test configuration (conftest.py, pytest config)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 13:49:28 -05:00

CLAUDE.md

Working context for Claude Code on the Analytics Portfolio project.


Project Status

Current Sprint: 1 (Project Bootstrap)
Phase: 1 - Toronto Housing Dashboard
Branch: development (feature branches merge here)


Quick Reference

Run Commands

make setup          # Install deps, create .env, init pre-commit
make docker-up      # Start PostgreSQL + PostGIS
make docker-down    # Stop containers
make db-init        # Initialize database schema
make run            # Start Dash dev server
make test           # Run pytest
make lint           # Run ruff linter
make format         # Run ruff formatter
make ci             # Run all checks

Branch Workflow

  1. Create feature branch FROM development: git checkout -b feature/{sprint}-{description}
  2. Work and commit on feature branch
  3. Merge INTO development when complete
  4. development -> staging -> main for releases
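
The naming pattern in step 1 can be sketched as a small helper. This is only an illustration: `build_branch_name` is a hypothetical function, and lowercasing the description into a hyphenated slug is an assumption, since the document only specifies `feature/{sprint}-{description}`.

```python
import re


def build_branch_name(sprint: int, description: str) -> str:
    """Build a feature branch name of the form feature/{sprint}-{description}."""
    # Assumption: descriptions are slugified (lowercase, hyphen-separated).
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain letters or digits")
    return f"feature/{sprint}-{slug}"
```

For example, `build_branch_name(1, "Project Bootstrap")` yields a name that can be passed to `git checkout -b` from the development branch.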

Code Conventions

Import Style

| Context | Style | Example |
|---|---|---|
| Same directory | Single dot | from .trreb import TRREBParser |
| Sibling directory | Double dot | from ..schemas.trreb import TRREBRecord |
| External packages | Absolute | import pandas as pd |

Module Responsibilities

| Directory | Contains | Purpose |
|---|---|---|
| schemas/ | Pydantic models | Data validation |
| models/ | SQLAlchemy ORM | Database persistence |
| parsers/ | PDF/CSV extraction | Raw data ingestion |
| loaders/ | Database operations | Data loading |
| figures/ | Chart factories | Plotly figure generation |
| callbacks/ | Dash callbacks | In pages/{dashboard}/callbacks/ |
| errors/ | Exceptions + handlers | Error handling |

Type Hints

Use Python 3.10+ syntax (built-in generics and X | None unions rather than typing.List or typing.Optional):

def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...

Error Handling

# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""

class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""

class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""

class LoadError(PortfolioError):
    """Database load operation failed."""
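
A minimal sketch of how the hierarchy might be used at a call site, assuming callers want a single handler for every project error. `parse_row` and `run_ingest` are hypothetical names for illustration, not functions from the project; the two exception classes are repeated so the block runs on its own.

```python
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


def parse_row(raw: str) -> list[str]:
    """Hypothetical parser step: split a CSV row, rejecting empty input."""
    if not raw.strip():
        raise ParseError("empty row")
    return [cell.strip() for cell in raw.split(",")]


def run_ingest(rows: list[str]) -> tuple[list[list[str]], list[str]]:
    """Parse rows, collecting failures instead of aborting the whole run."""
    parsed: list[list[str]] = []
    failures: list[str] = []
    for index, raw in enumerate(rows):
        try:
            parsed.append(parse_row(raw))
        except PortfolioError as exc:  # also catches ParseError and siblings
            failures.append(f"row {index}: {exc}")
    return parsed, failures
```

Catching the base class keeps the handler stable as new subclasses are added, while unrelated exceptions (for example a `TypeError` from a bug) still propagate.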

Code Standards

  • Single responsibility functions with verb naming
  • Early returns over deep nesting
  • Google-style docstrings only for non-obvious behavior
  • Module-level constants for magic values
  • Pydantic BaseSettings for runtime config

Application Structure

portfolio_app/
├── app.py                    # Dash app factory with Pages routing
├── config.py                 # Pydantic BaseSettings
├── assets/                   # CSS, images (auto-served)
├── pages/
│   ├── home.py              # Bio landing page -> /
│   └── toronto/
│       ├── dashboard.py     # Layout only -> /toronto
│       └── callbacks/       # Interaction logic
├── components/              # Shared UI (navbar, footer, cards)
├── figures/                 # Shared chart factories
├── toronto/                 # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/             # Pydantic
│   └── models/              # SQLAlchemy
└── errors/

URL Routing

| URL | Page | Sprint |
|---|---|---|
| / | Bio landing page | 2 |
| /toronto | Toronto Housing Dashboard | 6 |

Tech Stack (Locked)

| Layer | Technology | Version |
|---|---|---|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | >=2.0 |
| ORM | SQLAlchemy | >=2.0 (2.0-style API only) |
| Transformation | dbt-postgres | >=1.7 |
| Data Processing | Pandas | >=2.1 |
| Geospatial | GeoPandas + Shapely | >=0.14 |
| Visualization | Dash + Plotly | >=2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | >=7.0 |
| Python | 3.11+ | Via pyenv |

Notes:

  • SQLAlchemy 2.0 + Pydantic 2.0 only (never mix 1.x APIs)
  • PostGIS extension required in database
  • Docker Compose V2 format (no version field)

Data Model Overview

Geographic Reality (Toronto Housing)

TRREB Districts (~35)     - Purchase data (W01, C01, E01...)
CMHC Zones (~20)          - Rental data (Census Tract aligned)
City Neighbourhoods (158) - Enrichment/overlay only

Critical: These geographies do NOT align. Display them as separate layers; do not force crosswalks between them.

Star Schema

| Table | Type | Keys |
|---|---|---|
| fact_purchases | Fact | -> dim_time, dim_trreb_district |
| fact_rentals | Fact | -> dim_time, dim_cmhc_zone |
| dim_time | Dimension | date_key (PK) |
| dim_trreb_district | Dimension | district_key (PK), geometry |
| dim_cmhc_zone | Dimension | zone_key (PK), geometry |
| dim_neighbourhood | Dimension | neighbourhood_id (PK), geometry |
| dim_policy_event | Dimension | event_id (PK) |

V1 Rule: dim_neighbourhood has NO FK to fact tables; it is a reference overlay only.
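
One way to keep the V1 rule from eroding is a small check that could live in the test suite. The sketch below is an assumption about how that might look: the mapping is copied by hand from the Star Schema table above, and `find_overlay_violations` is a hypothetical helper, not existing project code.

```python
# Fact table -> dimensions it references (copied from the Star Schema table).
FACT_DIMENSIONS: dict[str, set[str]] = {
    "fact_purchases": {"dim_time", "dim_trreb_district"},
    "fact_rentals": {"dim_time", "dim_cmhc_zone"},
}

# Dimensions that must stay overlay-only in V1.
OVERLAY_ONLY_DIMENSIONS = frozenset({"dim_neighbourhood"})


def find_overlay_violations(fact_dimensions: dict[str, set[str]]) -> list[str]:
    """List fact tables that reference an overlay-only dimension."""
    return sorted(
        fact
        for fact, dims in fact_dimensions.items()
        if dims & OVERLAY_ONLY_DIMENSIONS
    )
```

An empty result means the rule holds; any table name returned is a fact table that has gained a forbidden foreign key.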

dbt Layers

| Layer | Naming | Purpose |
|---|---|---|
| Staging | stg_{source}__{entity} | 1:1 source, cleaned, typed |
| Intermediate | int_{domain}__{transform} | Business logic |
| Marts | mart_{domain} | Final analytical tables |

DO NOT BUILD (Phase 1)

Stop and flag if a task seems to require these:

| Feature | Reason |
|---|---|
| bridge_district_neighbourhood table | Area-weighted aggregation is Phase 4 |
| Crime data integration | Deferred to Phase 4 |
| Historical boundary reconciliation (140->158) | 2021+ data only for V1 |
| ML prediction models | Energy project scope (Phase 3) |
| Multi-project shared infrastructure | Build first, abstract second (Phase 2) |

Sprint 1 Deliverables

| Category | Tasks |
|---|---|
| Bootstrap | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
| Infrastructure | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
| App Foundation | portfolio_app/ structure, config.py, error handling |
| Tests | tests/ directory, conftest.py, pytest config |
| Data Acquisition | Download TRREB PDFs, START boundary digitization (HUMAN task) |

Human Tasks (Cannot Automate)

| Task | Tool | Effort |
|---|---|---|
| Digitize TRREB district boundaries | QGIS | 3-4 hours |
| Research policy events (10-20) | Manual | 2-3 hours |
| Replace social link placeholders | Manual | 5 minutes |

Environment Variables

Required in .env:

DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO
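
A pre-flight check for these keys could look like the sketch below. It uses only the standard library and is a stand-in for illustration; the project's runtime config is meant to go through Pydantic BaseSettings, which this does not replace. `parse_env_text` and `find_missing_keys` are hypothetical helpers, and the parser handles only plain KEY=VALUE lines (no quoting or export syntax).

```python
REQUIRED_ENV_KEYS = (
    "DATABASE_URL",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "DASH_DEBUG",
    "SECRET_KEY",
    "LOG_LEVEL",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = stripped.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty, in declaration order."""
    return [key for key in REQUIRED_ENV_KEYS if not values.get(key)]
```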

Script Standards

All scripts in scripts/:

  • Include usage comments at top
  • Idempotent where possible
  • Exit codes: 0 = success, 1 = error
  • Use set -euo pipefail for bash
  • Log to stdout, errors to stderr
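
For Python scripts, the standards above might translate into a skeleton like the following (the `set -euo pipefail` rule applies to bash only). The script name, its single argument, and the `main` signature are assumptions for illustration.

```python
"""Usage: python scripts/example.py <name>

Hypothetical script skeleton: prints progress to stdout, errors to stderr,
and returns 0 on success or 1 on error.
"""
import sys

EXIT_SUCCESS = 0
EXIT_ERROR = 1


def main(argv: list[str]) -> int:
    """Run the script with the given arguments and return an exit code."""
    if len(argv) != 1:
        print("usage: example.py <name>", file=sys.stderr)
        return EXIT_ERROR
    print(f"processing {argv[0]}")
    return EXIT_SUCCESS


# In a real script, end the file with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```

Returning the exit code from `main` rather than calling `sys.exit` inside it keeps the function easy to call from tests.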

Reference Documents

| Document | Location | Use When |
|---|---|---|
| Full specification | docs/PROJECT_REFERENCE.md | Architecture decisions |
| Data schemas | docs/toronto_housing_spec.md | Parser/model tasks |
| WBS details | docs/wbs.md | Sprint planning |
| Bio content | docs/bio_content.md | Building home.py |

Last Updated: Sprint 1