personal-portfolio/docs/PROJECT_REFERENCE.md
lmiranda 81993b23a7 docs: Update CLAUDE.md and PROJECT_REFERENCE.md for neighbourhood transition
CLAUDE.md:
- Update project status to Sprint 9
- Remove TRREB references from data model section
- Update star schema to reflect current tables
- Simplify deferred features section
- Update reference documents

PROJECT_REFERENCE.md:
- Update import examples to neighbourhood-based
- Update data sources for neighbourhood dashboard
- Update geographic reality diagram
- Update star schema
- Modernize sprint overview
- Update scope boundaries
- Update success criteria with completed milestones
- Update reference documents

Closes #52

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 10:17:22 -05:00


Portfolio Project Reference

Project: Analytics Portfolio | Owner: Leo | Status: Ready for Sprint 1


Project Overview

Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.

| Project | Domain | Key Skills | Phase |
|---|---|---|---|
| Toronto Housing Dashboard | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| Energy Pricing Analysis | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |

Platform: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).


Branching Strategy

| Branch | Purpose | Deploys To |
|---|---|---|
| main | Production releases only | VPS (production) |
| staging | Pre-production testing | VPS (staging) |
| development | Active development | Local only |

Rules:

  • All feature branches created FROM development
  • All feature branches merge INTO development
  • development → staging for testing
  • staging → main for release
  • Direct commits to main or staging are forbidden
  • Branch naming: feature/{sprint}-{description} or fix/{issue-id}
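
The naming rule above could be checked automatically, for example in a pre-commit hook. The following is a minimal sketch, not the project's actual tooling: the document only gives the templates feature/{sprint}-{description} and fix/{issue-id}, so the numeric sprint, kebab-case description, and numeric issue id are assumptions, and `is_valid_branch_name` is a hypothetical helper.

```python
import re

# Assumed character classes: numeric sprint, kebab-case description,
# numeric issue id. Only the two templates come from the rules above.
BRANCH_PATTERN = re.compile(
    r"(?:feature/\d+-[a-z0-9]+(?:-[a-z0-9]+)*|fix/\d+)"
)
LONG_LIVED_BRANCHES = frozenset({"main", "staging", "development"})


def is_valid_branch_name(name: str) -> bool:
    """Return True for long-lived branches or names matching the convention."""
    if name in LONG_LIVED_BRANCHES:
        return True
    return BRANCH_PATTERN.fullmatch(name) is not None
```

Under these assumptions, `feature/9-neighbourhood-transition` and `fix/52` pass, while `feature/neighbourhood` (no sprint number) and `hotfix/52` are rejected.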

Tech Stack (Locked)

| Layer | Technology | Version |
|---|---|---|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |

Compatibility Notes:

  • SQLAlchemy 2.0 + Pydantic 2.0 integrate well—never mix 1.x APIs
  • PostGIS extension required—enable during db init
  • Docker Compose V2 (no version field in compose files)

Code Conventions

Import Style

| Context | Style | Example |
|---|---|---|
| Same directory | Single dot | from .neighbourhood import NeighbourhoodParser |
| Sibling directory | Double dot | from ..schemas.neighbourhood import CensusRecord |
| External packages | Absolute | import pandas as pd |

Module Separation

| Directory | Contains | Purpose |
|---|---|---|
| schemas/ | Pydantic models | Data validation |
| models/ | SQLAlchemy ORM | Database persistence |
| parsers/ | API/CSV extraction | Raw data ingestion |
| loaders/ | Database operations | Data loading |
| figures/ | Chart factories | Plotly figure generation |
| callbacks/ | Dash callbacks | Per-dashboard, in pages/{dashboard}/callbacks/ |
| errors/ | Exceptions + handlers | Error handling |

Code Standards

  • Type hints: Mandatory, Python 3.10+ style (list[str], dict[str, int], X | None)
  • Functions: Single responsibility, verb naming, early returns over nesting
  • Docstrings: Google style, minimal—only for non-obvious behavior
  • Constants: Module-level for magic values, Pydantic BaseSettings for runtime config

Error Handling

# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""

class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""

class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""

class LoadError(PortfolioError):
    """Database load operation failed."""

  • Decorators for infrastructure concerns (logging, retry, transactions)
  • Explicit handling for domain logic (business rules, recovery strategies)

Application Architecture

Dash Pages Structure

portfolio_app/
├── app.py                    # Dash app factory with Pages routing
├── config.py                 # Pydantic BaseSettings
├── assets/                   # CSS, images (auto-served by Dash)
├── pages/
│   ├── home.py              # Bio landing page → /
│   ├── toronto/
│   │   ├── dashboard.py     # Layout only → /toronto
│   │   └── callbacks/       # Interaction logic
│   └── energy/              # Phase 3
├── components/              # Shared UI (navbar, footer, cards)
├── figures/                 # Shared chart factories
├── toronto/                 # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/             # Pydantic
│   └── models/              # SQLAlchemy
└── errors/

URL Routing (Automatic)

| URL | Page | Status |
|---|---|---|
| / | Bio landing page | Sprint 2 |
| /toronto | Toronto Housing Dashboard | Sprint 6 |
| /energy | Energy Pricing Dashboard | Phase 3 |

Phase 1: Toronto Neighbourhood Dashboard

Data Sources

| Track | Source | Format | Geography | Frequency |
|---|---|---|---|---|
| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |

Geographic Reality

┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158)                            │ ← Primary analysis unit
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20) — Census Tract aligned                         │ ← Rental data
└─────────────────────────────────────────────────────────────────┘

Data Model (Star Schema)

| Table | Type | Keys |
|---|---|---|
| fact_rentals | Fact | → dim_time, dim_cmhc_zone |
| dim_time | Dimension | date_key (PK) |
| dim_cmhc_zone | Dimension | zone_key (PK), geometry |
| dim_neighbourhood | Dimension | neighbourhood_id (PK), geometry |
| dim_policy_event | Dimension | event_id (PK) |
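
To make the fact-to-dimension relationships concrete, the sketch below builds a reduced version of the schema in an in-memory SQLite database and runs one star join. It is illustrative only: SQLite stands in for PostgreSQL + PostGIS (so geometry columns are omitted), only the table names and key columns come from the table above, and the columns `year`, `zone_name`, and `avg_rent`, as well as the sample row, are made up.

```python
import sqlite3

# Table names and keys follow the schema table; all other columns are
# hypothetical placeholders.
DDL = """
CREATE TABLE dim_time (
    date_key INTEGER PRIMARY KEY,
    year INTEGER NOT NULL
);
CREATE TABLE dim_cmhc_zone (
    zone_key INTEGER PRIMARY KEY,
    zone_name TEXT NOT NULL
);
CREATE TABLE fact_rentals (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    zone_key INTEGER NOT NULL REFERENCES dim_cmhc_zone (zone_key),
    avg_rent REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the reduced schema and insert one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    conn.execute("INSERT INTO dim_time VALUES (20240101, 2024)")
    conn.execute("INSERT INTO dim_cmhc_zone VALUES (1, 'Zone A')")
    conn.execute("INSERT INTO fact_rentals VALUES (20240101, 1, 1850.0)")
    conn.commit()
    return conn


def average_rent_by_zone(
    conn: sqlite3.Connection, year: int
) -> list[tuple[str, float]]:
    """Join the fact table to both dimensions and average rent per zone."""
    query = """
        SELECT z.zone_name, AVG(f.avg_rent)
        FROM fact_rentals AS f
        JOIN dim_time AS t ON t.date_key = f.date_key
        JOIN dim_cmhc_zone AS z ON z.zone_key = f.zone_key
        WHERE t.year = ?
        GROUP BY z.zone_name
        ORDER BY z.zone_name
    """
    return conn.execute(query, (year,)).fetchall()
```

The fact table carries only foreign keys and measures; descriptive attributes live in the dimensions, which is what keeps queries like this one to simple key joins.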

dbt Layer Structure

| Layer | Naming | Purpose |
|---|---|---|
| Staging | stg_{source}__{entity} | 1:1 source, cleaned, typed |
| Intermediate | int_{domain}__{transform} | Business logic, filtering |
| Marts | mart_{domain} | Final analytical tables |

Sprint Overview

| Sprint | Focus | Milestone |
|---|---|---|
| 1-6 | Foundation and initial dashboard | Launch 1: Bio Live |
| 7 | Navigation & theme modernization | |
| 8 | Portfolio website expansion | Launch 2: Website Live |
| 9 | Neighbourhood dashboard transition | Cleanup complete |
| 10+ | Dashboard implementation | Launch 3: Dashboard Live |

Scope Boundaries

Phase 1 — Build These

  • Bio landing page and portfolio website
  • CMHC rental data processor
  • Toronto neighbourhood data integration
  • PostgreSQL + PostGIS database layer
  • Star schema (facts + dimensions)
  • dbt models with tests
  • Choropleth visualization (Dash)
  • Policy event annotation layer

Deferred Features

| Feature | Reason | When |
|---|---|---|
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Future |

If a task seems to require deferred features, stop and flag it.


File Structure

Root-Level Files (Allowed)

| File | Purpose |
|---|---|
| README.md | Project overview |
| CLAUDE.md | AI assistant context |
| pyproject.toml | Python packaging |
| .gitignore | Git ignore rules |
| .env.example | Environment template |
| .python-version | pyenv version |
| .pre-commit-config.yaml | Pre-commit hooks |
| docker-compose.yml | Container orchestration |
| Makefile | Task automation |

Directory Structure

portfolio/
├── portfolio_app/           # Monolithic Dash application
│   ├── app.py
│   ├── config.py
│   ├── assets/
│   ├── pages/
│   ├── components/
│   ├── figures/
│   ├── toronto/
│   └── errors/
├── tests/
├── dbt/
├── data/
│   └── toronto/
│       ├── raw/
│       ├── processed/       # gitignored
│       └── reference/
├── scripts/
│   ├── db/
│   ├── docker/
│   ├── deploy/
│   ├── dbt/
│   └── dev/
├── docs/
├── notebooks/
├── backups/                 # gitignored
└── reports/                 # gitignored

Gitignored Directories

  • data/*/processed/
  • reports/
  • backups/
  • notebooks/*.html
  • .env
  • __pycache__/
  • .venv/

Makefile Targets

| Target | Purpose |
|---|---|
| setup | Install deps, create .env, init pre-commit |
| docker-up | Start PostgreSQL + PostGIS |
| docker-down | Stop containers |
| db-init | Initialize database schema |
| run | Start Dash dev server |
| test | Run pytest |
| dbt-run | Run dbt models |
| dbt-test | Run dbt tests |
| lint | Run ruff linter |
| format | Run ruff formatter |
| ci | Run all checks |
| deploy | Deploy to production |

Script Standards

All scripts in scripts/:

  • Include usage comments at top
  • Idempotent where possible
  • Exit codes: 0 = success, 1 = error
  • Use set -euo pipefail for bash
  • Log to stdout, errors to stderr
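
For scripts written in Python rather than bash, the same standards might look like the sketch below. The script name `ensure_dirs.py` and its behaviour are hypothetical; the point is the shape: usage text at the top, an idempotent operation, progress on stdout, errors on stderr, and an integer exit code.

```python
"""Usage: python scripts/dev/ensure_dirs.py DIR [DIR ...]

Hypothetical example script: creates each named directory if missing.
Exit codes: 0 = success, 1 = error.
"""
import sys
from pathlib import Path


def main(argv: list[str]) -> int:
    """Create each directory in argv; safe to re-run."""
    if not argv:
        print("usage: ensure_dirs.py DIR [DIR ...]", file=sys.stderr)
        return 1
    for raw in argv:
        path = Path(raw)
        try:
            # exist_ok makes a second run a no-op instead of an error
            path.mkdir(parents=True, exist_ok=True)
        except OSError as exc:
            print(f"error: cannot create {path}: {exc}", file=sys.stderr)
            return 1
        print(f"ok: {path}")
    return 0
```

A real script would finish with the usual `if __name__ == "__main__":` guard calling `sys.exit(main(sys.argv[1:]))`, so the return value becomes the process exit code.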

Environment Variables

Required in .env:

DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO

Success Criteria

Launch 1 (Bio Live)

  • Bio page accessible via HTTPS
  • All bio content rendered
  • No placeholder text visible
  • Mobile responsive
  • Social links functional

Launch 2 (Website Live)

  • Full portfolio website with navigation
  • About, Contact, Projects, Resume, Blog pages
  • Dark mode theme support
  • Sidebar navigation

Launch 3 (Dashboard Live)

  • Choropleth renders neighbourhoods and CMHC zones
  • Rental data visualization works
  • Time navigation works
  • Policy event markers visible
  • Methodology documentation published
  • Data sources cited

Reference Documents

For detailed specifications, see:

| Document | Location | Use When |
|---|---|---|
| Dashboard vision | docs/changes/Change-Toronto-Analysis.md | Dashboard specification |
| Implementation plan | docs/changes/Change-Toronto-Analysis-Reviewed.md | Sprint planning |

Reference Version: 2.0 | Updated: Sprint 9