# Portfolio Project Reference

**Project**: Analytics Portfolio
**Owner**: Leo
**Status**: Ready for Sprint 1

---

## Project Overview

Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.

| Project | Domain | Key Skills | Phase |
|---------|--------|------------|-------|
| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |

**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).

---

## Branching Strategy

| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases only | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |

**Rules**:
- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`

---

## Tech Stack (Locked)

| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |

**Compatibility Notes**:
- SQLAlchemy 2.0 + Pydantic 2.0 integrate well; never mix 1.x APIs
- PostGIS extension required; enable during db init
- Docker Compose V2 (no `version` field in compose files)

---

## Code Conventions

### Import Style

| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .trreb import TRREBParser` |
| Sibling directory | Double dot | `from ..schemas.trreb import TRREBRecord` |
| External packages | Absolute | `import pandas as pd` |

### Module Separation

| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | PDF/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |

### Code Standards

- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- **Functions**: Single responsibility, verb naming, early returns over nesting
- **Docstrings**: Google style, minimal; only for non-obvious behavior
- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config

### Error Handling

```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""


class LoadError(PortfolioError):
    """Database load operation failed."""
```

- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)

---

## Application Architecture

### Dash Pages Structure

```
portfolio_app/
├── app.py                  # Dash app factory with Pages routing
├── config.py               # Pydantic BaseSettings
├── assets/                 # CSS, images (auto-served by Dash)
├── pages/
│   ├── home.py             # Bio landing page → /
│   ├── toronto/
│   │   ├── dashboard.py    # Layout only → /toronto
│   │   └── callbacks/      # Interaction logic
│   └── energy/             # Phase 3
├── components/             # Shared UI (navbar, footer, cards)
├── figures/                # Shared chart factories
├── toronto/                # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/            # Pydantic
│   └── models/             # SQLAlchemy
└── errors/
```

### URL Routing (Automatic)

| URL | Page | Status |
|-----|------|--------|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |

---

## Phase 1: Toronto Housing Dashboard

### Data Sources

| Track | Source | Format | Geography | Frequency |
|-------|--------|--------|-----------|-----------|
| Purchases | TRREB Monthly Reports | PDF | ~35 Districts | Monthly |
| Rentals | CMHC Rental Market Survey | CSV | ~20 Zones | Annual |
| Enrichment | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |

### Geographic Reality

```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158)                            │  ← Enrichment only
├─────────────────────────────────────────────────────────────────┤
│ TRREB Districts (~35): W01, C01, E01, etc.                      │  ← Purchase data
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20): Census Tract aligned                          │  ← Rental data
└─────────────────────────────────────────────────────────────────┘
```

**Critical**: These geographies do NOT align. Display as separate layers with toggle; do not force crosswalks.

### Data Model (Star Schema)

| Table | Type | Keys |
|-------|------|------|
| `fact_purchases` | Fact | → dim_time, dim_trreb_district |
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_trreb_district` | Dimension | district_key (PK), geometry |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |

**V1 Rule**: `dim_neighbourhood` has NO FK to fact tables; reference overlay only.
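The key relationships in the star schema above, including the V1 rule, can be sketched and checked in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL + PostGIS, stores `geometry` as a plain text placeholder, and the foreign-key column names on the fact tables as well as the helpers `build_schema`, `referenced_tables`, and `tables_referencing` are hypothetical, not part of the project.

```python
import sqlite3

# Hypothetical DDL: only the table names and primary keys come from the
# data model table; fact-table column names are assumptions.
DDL = """
CREATE TABLE dim_time (date_key INTEGER PRIMARY KEY);
CREATE TABLE dim_trreb_district (
    district_key INTEGER PRIMARY KEY,
    geometry TEXT  -- placeholder; a PostGIS geometry column in the real database
);
CREATE TABLE dim_cmhc_zone (
    zone_key INTEGER PRIMARY KEY,
    geometry TEXT
);
CREATE TABLE dim_neighbourhood (
    neighbourhood_id INTEGER PRIMARY KEY,
    geometry TEXT
);
CREATE TABLE dim_policy_event (event_id INTEGER PRIMARY KEY);
CREATE TABLE fact_purchases (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    district_key INTEGER NOT NULL REFERENCES dim_trreb_district (district_key)
);
CREATE TABLE fact_rentals (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    zone_key INTEGER NOT NULL REFERENCES dim_cmhc_zone (zone_key)
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def referenced_tables(conn: sqlite3.Connection, table: str) -> set[str]:
    """Return the tables that `table` points at through foreign keys."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {row[2] for row in rows}  # column 2 holds the referenced table name


def tables_referencing(conn: sqlite3.Connection, target: str) -> list[str]:
    """Return every table holding a foreign key to `target`, sorted by name."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    return sorted(name for name in names if target in referenced_tables(conn, name))
```

Calling `tables_referencing(build_schema(), "dim_neighbourhood")` returns an empty list, which is the V1 rule expressed as a check. A similar assertion could live in the test suite against the real database if the rule needs guarding.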

### dbt Layer Structure

| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic, filtering |
| Marts | `mart_{domain}` | Final analytical tables |

---

## Sprint Overview

| Sprint | Focus | Milestone |
|--------|-------|-----------|
| 1 | Project bootstrap, start TRREB digitization | - |
| 2 | Bio page, data acquisition | **Launch 1: Bio Live** |
| 3 | Parsers, schemas, models | - |
| 4 | Loaders, dbt | - |
| 5 | Visualization | - |
| 6 | Polish, deploy dashboard | **Launch 2: Dashboard Live** |
| 7 | Buffer | - |

### Sprint 1 Deliverables

| Category | Tasks |
|----------|-------|
| **Bootstrap** | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
| **Infrastructure** | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
| **App Foundation** | portfolio_app/ structure, config.py, error handling |
| **Tests** | tests/ directory, conftest.py, pytest config |
| **Data Acquisition** | Download TRREB PDFs, START boundary digitization (HUMAN task) |

### Human Tasks (Cannot Automate)

| Task | Tool | Effort |
|------|------|--------|
| Digitize TRREB district boundaries | QGIS | 3-4 hours |
| Research policy events (10-20) | Manual research | 2-3 hours |
| Replace social link placeholders | Manual | 5 minutes |

---

## Scope Boundaries

### Phase 1: Build These

- Bio landing page with content from bio_content_v2.md
- TRREB PDF parser
- CMHC CSV processor
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer
- Neighbourhood overlay (toggle-able)

### Phase 1: Do NOT Build

| Feature | Reason | When |
|---------|--------|------|
| `bridge_district_neighbourhood` table | Area-weighted aggregation is Phase 4 | After Energy project |
| Crime data integration | Deferred scope | Phase 4 |
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Phase 4 |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Phase 2 |

If a task seems to require Phase 3/4 features, **stop and flag it**.

---

## File Structure

### Root-Level Files (Allowed)

| File | Purpose |
|------|---------|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |

### Directory Structure

```
portfolio/
├── portfolio_app/          # Monolithic Dash application
│   ├── app.py
│   ├── config.py
│   ├── assets/
│   ├── pages/
│   ├── components/
│   ├── figures/
│   ├── toronto/
│   └── errors/
├── tests/
├── dbt/
├── data/
│   └── toronto/
│       ├── raw/
│       ├── processed/      # gitignored
│       └── reference/
├── scripts/
│   ├── db/
│   ├── docker/
│   ├── deploy/
│   ├── dbt/
│   └── dev/
├── docs/
├── notebooks/
├── backups/                # gitignored
└── reports/                # gitignored
```

### Gitignored Directories

- `data/*/processed/`
- `reports/`
- `backups/`
- `notebooks/*.html`
- `.env`
- `__pycache__/`
- `.venv/`

---

## Makefile Targets

| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |

---

## Script Standards

All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr

---

## Environment Variables

Required in `.env`:

```bash
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=
LOG_LEVEL=INFO
```

---

## Success Criteria

### Launch 1 (Sprint 2)

- [ ] Bio page accessible via HTTPS
- [ ] All bio content rendered (from bio_content_v2.md)
- [ ] No placeholder text visible
- [ ] Mobile responsive
- [ ] Social links functional

### Launch 2 (Sprint 6)

- [ ] Choropleth renders TRREB districts and CMHC zones
- [ ] Purchase/rental mode toggle works
- [ ] Time navigation works
- [ ] Policy event markers visible
- [ ] Neighbourhood overlay toggleable
- [ ] Methodology documentation published
- [ ] Data sources cited

---

## Reference Documents

For detailed specifications, see:

| Document | Location | Use When |
|----------|----------|----------|
| Data schemas | `docs/toronto_housing_spec.md` | Parser/model tasks |
| WBS details | `docs/wbs.md` | Sprint planning |
| Bio content | `docs/bio_content.md` | Building home.py |

---

*Reference Version: 1.0*
*Created: January 2026*