# Portfolio Project Reference

**Project**: Analytics Portfolio
**Owner**: Leo
**Status**: Ready for Sprint 1

---

## Project Overview

Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.

| Project | Domain | Key Skills | Phase |
|---------|--------|------------|-------|
| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |

**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).

---

## Branching Strategy

| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases only | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |

**Rules**:

- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`

---

## Tech Stack (Locked)

| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |

**Compatibility Notes**:

- SQLAlchemy 2.0 + Pydantic 2.0 integrate well; never mix 1.x APIs
- PostGIS extension required; enable during db init
- Docker Compose V2 (no `version` field in compose files)

---

## Code Conventions

### Import Style

| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` |
| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
| External packages | Absolute | `import pandas as pd` |

### Module Separation

| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | API/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |

### Code Standards

- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- **Functions**: Single responsibility, verb naming, early returns over nesting
- **Docstrings**: Google style, minimal; only for non-obvious behavior
- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config

### Error Handling

```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""


class LoadError(PortfolioError):
    """Database load operation failed."""
```

- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)

---

## Application Architecture

### Dash Pages Structure

```
portfolio_app/
├── app.py                  # Dash app factory with Pages routing
├── config.py               # Pydantic BaseSettings
├── assets/                 # CSS, images (auto-served by Dash)
├── pages/
│   ├── home.py             # Bio landing page → /
│   ├── toronto/
│   │   ├── dashboard.py    # Layout only → /toronto
│   │   └── callbacks/      # Interaction logic
│   └── energy/             # Phase 3
├── components/             # Shared UI (navbar, footer, cards)
├── figures/                # Shared chart factories
├── toronto/                # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/            # Pydantic
│   └── models/             # SQLAlchemy
└── errors/
```

### URL Routing (Automatic)

| URL | Page | Status |
|-----|------|--------|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |

---

## Phase 1: Toronto Neighbourhood Dashboard

### Data Sources

| Track | Source | Format | Geography | Frequency |
|-------|--------|--------|-----------|-----------|
| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |

### Geographic Reality

```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158)                            │  ← Primary analysis unit
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20) - Census Tract aligned                         │  ← Rental data
└─────────────────────────────────────────────────────────────────┘
```

### Data Model (Star Schema)

| Table | Type | Keys |
|-------|------|------|
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |

### dbt Layer Structure

| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic, filtering |
| Marts | `mart_{domain}` | Final analytical tables |

---

## Sprint Overview

| Sprint | Focus | Milestone |
|--------|-------|-----------|
| 1-6 | Foundation and initial dashboard | **Launch 1: Bio Live** |
| 7 | Navigation & theme modernization | - |
| 8 | Portfolio website expansion | **Launch 2: Website Live** |
| 9 | Neighbourhood dashboard transition | Cleanup complete |
| 10+ | Dashboard implementation | **Launch 3: Dashboard Live** |

---

## Scope Boundaries

### Phase 1: Build These

- Bio landing page and portfolio website
- CMHC rental data processor
- Toronto neighbourhood data integration
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer

### Deferred Features

| Feature | Reason | When |
|---------|--------|------|
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Future |

If a task seems to require deferred features, **stop and flag it**.

---

## File Structure

### Root-Level Files (Allowed)

| File | Purpose |
|------|---------|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |

### Directory Structure

```
portfolio/
├── portfolio_app/          # Monolithic Dash application
│   ├── app.py
│   ├── config.py
│   ├── assets/
│   ├── pages/
│   ├── components/
│   ├── figures/
│   ├── toronto/
│   └── errors/
├── tests/
├── dbt/
├── data/
│   └── toronto/
│       ├── raw/
│       ├── processed/      # gitignored
│       └── reference/
├── scripts/
│   ├── db/
│   ├── docker/
│   ├── deploy/
│   ├── dbt/
│   └── dev/
├── docs/
├── notebooks/
├── backups/                # gitignored
└── reports/                # gitignored
```

### Gitignored Directories

- `data/*/processed/`
- `reports/`
- `backups/`
- `notebooks/*.html`
- `.env`
- `__pycache__/`
- `.venv/`

---

## Makefile Targets

| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |

---

## Script Standards

All scripts in `scripts/`:

- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr

---

## Environment Variables

Required in `.env`:

```bash
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=
LOG_LEVEL=INFO
```

---

## Success Criteria

### Launch 1 (Bio Live)

- [x] Bio page accessible via HTTPS
- [x] All bio content rendered
- [x] No placeholder text visible
- [x] Mobile responsive
- [x] Social links functional

### Launch 2 (Website Live)

- [x] Full portfolio website with navigation
- [x] About, Contact, Projects, Resume, Blog pages
- [x] Dark mode theme support
- [x] Sidebar navigation

### Launch 3 (Dashboard Live)

- [ ] Choropleth renders neighbourhoods and CMHC zones
- [ ] Rental data visualization works
- [ ] Time navigation works
- [ ] Policy event markers visible
- [ ] Methodology documentation published
- [ ] Data sources cited

---

## Reference Documents

For detailed specifications, see:

| Document | Location | Use When |
|----------|----------|----------|
| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |

---

*Reference Version: 2.0*
*Updated: Sprint 9*
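
---

The branch naming rule from the Branching Strategy section (`feature/{sprint}-{description}` or `fix/{issue-id}`) could be enforced in a pre-commit or CI hook. The following is a minimal sketch, assuming sprints and issue IDs are numeric and descriptions are lowercase kebab-case. None of those formats are specified in this reference, and `is_valid_branch` is a hypothetical helper name.

```python
import re

# Long-lived branches from the Branching Strategy table.
PROTECTED_BRANCHES = {"main", "staging", "development"}

# Assumed formats (not specified in this reference):
#   feature/{sprint}-{description} -> numeric sprint, kebab-case description
#   fix/{issue-id}                 -> numeric issue id
FEATURE_PATTERN = re.compile(r"feature/\d+-[a-z0-9]+(?:-[a-z0-9]+)*")
FIX_PATTERN = re.compile(r"fix/\d+")


def is_valid_branch(name: str) -> bool:
    """Return True for a long-lived branch or a name that follows the rule."""
    if name in PROTECTED_BRANCHES:
        return True
    return bool(FEATURE_PATTERN.fullmatch(name) or FIX_PATTERN.fullmatch(name))
```

A hook might call `is_valid_branch` on the current branch name and exit with code 1 on failure, in line with the Script Standards section.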
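
---

The Error Handling section assigns infrastructure concerns such as retry and logging to decorators. The following is a minimal sketch of what such a decorator might look like, assuming a fixed attempt count and a fixed delay. `retry_on` and `flaky_load` are illustrative names, and the exception classes are redefined here only to keep the example self-contained; this is not the project's actual code.

```python
import functools
import logging
import time
from collections.abc import Callable
from typing import Any

logger = logging.getLogger(__name__)


class PortfolioError(Exception):
    """Base exception (mirrors errors/exceptions.py)."""


class LoadError(PortfolioError):
    """Database load operation failed."""


def retry_on(
    exc_type: type[Exception], attempts: int = 3, delay_seconds: float = 0.0
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Retry the wrapped function when exc_type is raised (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type as exc:
                    logger.warning(
                        "%s failed (attempt %d/%d): %s",
                        func.__name__, attempt, attempts, exc,
                    )
                    if attempt == attempts:
                        raise  # Out of attempts: hand the error to domain code.
                    time.sleep(delay_seconds)
            raise AssertionError("unreachable: loop always returns or raises")

        return wrapper

    return decorator


# Demo state: counts how many times flaky_load has been called.
calls = {"count": 0}


@retry_on(LoadError, attempts=3)
def flaky_load() -> str:
    """Fail twice, then succeed, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise LoadError("connection dropped")
    return "loaded"
```

Keeping the retry loop in a decorator leaves the loader body free of infrastructure code, while the final re-raise still lets callers apply explicit, domain-specific recovery.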