Portfolio Project Reference
Project: Analytics Portfolio
Owner: Leo
Status: Ready for Sprint 1
Project Overview
Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.
| Project | Domain | Key Skills | Phase |
|---|---|---|---|
| Toronto Housing Dashboard | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| Energy Pricing Analysis | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |
Platform: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).
Branching Strategy
| Branch | Purpose | Deploys To |
|---|---|---|
| main | Production releases only | VPS (production) |
| staging | Pre-production testing | VPS (staging) |
| development | Active development | Local only |
Rules:
- All feature branches are created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`
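The naming and merge rules could be checked in a pre-commit hook or CI step. The following is a minimal sketch. It assumes `{sprint}` is a sprint number and that descriptions and issue IDs are lowercase kebab-case. The function names are hypothetical and not part of the project.

```python
import re

# Assumption: {sprint} is a number; {description} and {issue-id} are
# lowercase kebab-case slugs such as "project-bootstrap" or "42".
SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"
BRANCH_NAME = re.compile(rf"feature/\d+-{SLUG}|fix/{SLUG}")

# Promotion paths between the long-lived branches, per the rules above.
PROMOTIONS = {("development", "staging"), ("staging", "main")}


def is_valid_branch_name(name: str) -> bool:
    """Check a working branch name against feature/... or fix/... naming."""
    return BRANCH_NAME.fullmatch(name) is not None


def is_allowed_merge(source: str, target: str) -> bool:
    """Feature/fix branches merge only into development; otherwise promote in order."""
    if (source, target) in PROMOTIONS:
        return True
    return target == "development" and is_valid_branch_name(source)
```

Under this sketch, `development` → `main` is rejected because releases must pass through `staging` first.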
Tech Stack (Locked)
| Layer | Technology | Version |
|---|---|---|
| Database | PostgreSQL + PostGIS | 16.x (PostgreSQL) |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 (GeoPandas) |
| Visualization | Dash + Plotly | ≥2.14 (Dash) |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Runtime | Python (via pyenv) | 3.11+ |
Compatibility Notes:
- SQLAlchemy 2.0 and Pydantic 2.0 integrate well; never mix in either library's 1.x APIs
- The PostGIS extension is required; enable it during database initialization
- Use Docker Compose V2 (omit the top-level `version` field in compose files)
Code Conventions
Import Style
| Context | Style | Example |
|---|---|---|
| Same directory | Single dot | from .trreb import TRREBParser |
| Sibling directory | Double dot | from ..schemas.trreb import TRREBRecord |
| External packages | Absolute | import pandas as pd |
Module Separation
| Directory | Contains | Purpose |
|---|---|---|
| schemas/ | Pydantic models | Data validation |
| models/ | SQLAlchemy ORM | Database persistence |
| parsers/ | PDF/CSV extraction | Raw data ingestion |
| loaders/ | Database operations | Data loading |
| figures/ | Chart factories | Plotly figure generation |
| callbacks/ | Dash callbacks | Per-dashboard, in pages/{dashboard}/callbacks/ |
| errors/ | Exceptions + handlers | Error handling |
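The following sketch illustrates the schemas/ and parsers/ split with a hypothetical `TRREBRecord` schema. Parsers emit raw dicts, and the schema validates and normalizes them before anything reaches loaders/. The fields (`district_code`, `period`, `sales`, `avg_price`) are assumptions, not the spec.

```python
from datetime import date

from pydantic import BaseModel, Field, field_validator


class TRREBRecord(BaseModel):
    """One district-month row from a TRREB report (hypothetical fields)."""

    district_code: str
    period: date
    sales: int = Field(ge=0)
    avg_price: float = Field(gt=0)

    @field_validator("district_code")
    @classmethod
    def normalize_district_code(cls, value: str) -> str:
        cleaned = value.strip().upper()
        if not cleaned:
            raise ValueError("district_code must not be blank")
        return cleaned


def validate_rows(raw_rows: list[dict[str, object]]) -> list[TRREBRecord]:
    """Validate raw parser output; raises ValidationError on the first bad row."""
    return [TRREBRecord.model_validate(row) for row in raw_rows]
```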
Code Standards
- Type hints: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- Functions: Single responsibility, verb naming, early returns over nesting
- Docstrings: Google style, minimal; only for non-obvious behavior
- Constants: Module-level for magic values; `BaseSettings` for runtime config (from the `pydantic-settings` package, since Pydantic 2.0 moved it out of the core library)
Error Handling
- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)
Application Architecture
Dash Pages Structure
URL Routing (Automatic)
| URL | Page | Status |
|---|---|---|
| / | Bio landing page | Sprint 2 |
| /toronto | Toronto Housing Dashboard | Sprint 6 |
| /energy | Energy Pricing Dashboard | Phase 3 |
Phase 1: Toronto Housing Dashboard
Data Sources
| Track | Source | Format | Geography | Frequency |
|---|---|---|---|---|
| Purchases | TRREB Monthly Reports | PDF | ~35 Districts | Monthly |
| Rentals | CMHC Rental Market Survey | CSV | ~20 Zones | Annual |
| Enrichment | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |
Geographic Reality
Critical: TRREB districts, CMHC zones, and City of Toronto neighbourhoods do NOT align. Display them as separate layers with a toggle; do not force crosswalks between them.
Data Model (Star Schema)
| Table | Type | Keys |
|---|---|---|
| fact_purchases | Fact | → dim_time, dim_trreb_district |
| fact_rentals | Fact | → dim_time, dim_cmhc_zone |
| dim_time | Dimension | date_key (PK) |
| dim_trreb_district | Dimension | district_key (PK), geometry |
| dim_cmhc_zone | Dimension | zone_key (PK), geometry |
| dim_neighbourhood | Dimension | neighbourhood_id (PK), geometry |
| dim_policy_event | Dimension | event_id (PK) |
V1 Rule: No fact table has a foreign key to dim_neighbourhood; it is a reference overlay only.
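The following sketch shows one way to populate dim_time. It assumes a monthly grain, matching the TRREB reports, and an integer YYYYMMDD smart key for `date_key`. Neither choice is fixed by this reference. With a monthly grain, CMHC's annual rows would still need a convention for which month they key to, which this sketch leaves open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimeRow:
    """One dim_time row at monthly grain (hypothetical columns)."""

    date_key: int  # assumed YYYYMMDD smart key, e.g. 20240101
    month_start: date
    year: int
    quarter: int
    month: int


def to_date_key(day: date) -> int:
    """Encode a date as an integer YYYYMMDD key."""
    return day.year * 10_000 + day.month * 100 + day.day


def build_dim_time(start: date, end: date) -> list[TimeRow]:
    """Return one row per month from start's month through end's month, inclusive."""
    rows: list[TimeRow] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        first = date(year, month, 1)
        rows.append(
            TimeRow(
                date_key=to_date_key(first),
                month_start=first,
                year=year,
                quarter=(month - 1) // 3 + 1,
                month=month,
            )
        )
        # Advance one month, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return rows
```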
dbt Layer Structure
| Layer | Naming | Purpose |
|---|---|---|
| Staging | stg_{source}__{entity} | 1:1 with source; cleaned and typed |
| Intermediate | int_{domain}__{transform} | Business logic, filtering |
| Marts | mart_{domain} | Final analytical tables |
Sprint Overview
| Sprint | Focus | Milestone |
|---|---|---|
| 1 | Project bootstrap, start TRREB digitization | - |
| 2 | Bio page, data acquisition | Launch 1: Bio Live |
| 3 | Parsers, schemas, models | - |
| 4 | Loaders, dbt | - |
| 5 | Visualization | - |
| 6 | Polish, deploy dashboard | Launch 2: Dashboard Live |
| 7 | Buffer | - |
Sprint 1 Deliverables
| Category | Tasks |
|---|---|
| Bootstrap | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
| Infrastructure | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
| App Foundation | portfolio_app/ structure, config.py, error handling |
| Tests | tests/ directory, conftest.py, pytest config |
| Data Acquisition | Download TRREB PDFs, START boundary digitization (HUMAN task) |
Human Tasks (Cannot Automate)
| Task | Tool | Effort |
|---|---|---|
| Digitize TRREB district boundaries | QGIS | 3-4 hours |
| Research policy events (10-20) | Manual research | 2-3 hours |
| Replace social link placeholders | Manual | 5 minutes |
Scope Boundaries
Phase 1 — Build These
- Bio landing page with content from bio_content_v2.md
- TRREB PDF parser
- CMHC CSV processor
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer
- Neighbourhood overlay (toggle-able)
Phase 1 — Do NOT Build
| Feature | Reason | When |
|---|---|---|
| bridge_district_neighbourhood table | Area-weighted aggregation is Phase 4 | After Energy project |
| Crime data integration | Deferred scope | Phase 4 |
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Phase 4 |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Phase 2 |
If a task seems to require Phase 3/4 features, stop and flag it.
File Structure
Root-Level Files (Allowed)
| File | Purpose |
|---|---|
| README.md | Project overview |
| CLAUDE.md | AI assistant context |
| pyproject.toml | Python packaging |
| .gitignore | Git ignore rules |
| .env.example | Environment template |
| .python-version | pyenv version |
| .pre-commit-config.yaml | Pre-commit hooks |
| docker-compose.yml | Container orchestration |
| Makefile | Task automation |
Directory Structure
Gitignored Directories
data/*/processed/
reports/
backups/
notebooks/*.html
.env
__pycache__/
.venv/
Makefile Targets
| Target | Purpose |
|---|---|
| setup | Install deps, create .env, init pre-commit |
| docker-up | Start PostgreSQL + PostGIS |
| docker-down | Stop containers |
| db-init | Initialize database schema |
| run | Start Dash dev server |
| test | Run pytest |
| dbt-run | Run dbt models |
| dbt-test | Run dbt tests |
| lint | Run ruff linter |
| format | Run ruff formatter |
| ci | Run all checks |
| deploy | Deploy to production |
Script Standards
All scripts in scripts/:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
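The following sketch applies the same standards to a Python script: usage text at the top, read-only (and therefore idempotent), exit code 0 or 1, results on stdout and errors on stderr. The script name and its raw-data check are hypothetical.

```python
"""Usage: python scripts/check_raw_data.py <directory>

Hypothetical example: verify that a raw-data directory exists and holds at
least one file. Read-only, so it is safe to re-run (idempotent).
Exit codes: 0 = success, 1 = error.
"""

import sys
from pathlib import Path

USAGE = "usage: check_raw_data.py <directory>"
EXIT_OK = 0
EXIT_ERROR = 1


def main(argv: list[str]) -> int:
    """Run the check and return the process exit code."""
    if len(argv) != 2:
        print(USAGE, file=sys.stderr)
        return EXIT_ERROR

    directory = Path(argv[1])
    if not directory.is_dir():
        print(f"error: {directory} is not a directory", file=sys.stderr)
        return EXIT_ERROR

    files = [path for path in directory.iterdir() if path.is_file()]
    if not files:
        print(f"error: no files in {directory}", file=sys.stderr)
        return EXIT_ERROR

    # Normal progress and results go to stdout.
    print(f"ok: {len(files)} file(s) in {directory}")
    return EXIT_OK
```

To run this as a script, end the file with the usual `if __name__ == "__main__":` guard calling `sys.exit(main(sys.argv))`. The return value then becomes the process exit code.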
Environment Variables
Required in .env:
Success Criteria
Launch 1 (Sprint 2)
Launch 2 (Sprint 6)
Reference Documents
For detailed specifications, see:
| Document | Location | Use When |
|---|---|---|
| Data schemas | docs/toronto_housing_spec.md | Parser/model tasks |
| WBS details | docs/wbs.md | Sprint planning |
| Bio content | docs/bio_content.md | Building home.py |
Reference Version: 1.0
Created: January 2026