Portfolio Project Reference
Project: Analytics Portfolio
Owner: Leo
Status: Ready for Sprint 1
Project Overview
Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.
| Project | Domain | Key Skills | Phase |
|---|---|---|---|
| Toronto Housing Dashboard | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| Energy Pricing Analysis | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |
Platform: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).
Branching Strategy
| Branch | Purpose | Deploys To |
|---|---|---|
| `main` | Production releases only | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |
Rules:
- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`
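
The naming rule above could be checked automatically, for example in a pre-commit hook. The following is a minimal sketch rather than an existing project tool. It assumes that `{sprint}` and `{issue-id}` are integers and that `{description}` is lowercase kebab-case; none of these constraints is stated in the rules, and `is_valid_branch_name` is a hypothetical helper.

```python
import re

# Mirrors the rule: feature/{sprint}-{description} or fix/{issue-id}.
# Assumptions (not stated in the rules): sprint and issue-id are integers,
# description is lowercase kebab-case.
BRANCH_PATTERN = re.compile(
    r"feature/\d+-[a-z0-9]+(?:-[a-z0-9]+)*|fix/\d+"
)

# Long-lived branches from the branching table; exempt from the naming rule.
LONG_LIVED_BRANCHES = {"main", "staging", "development"}


def is_valid_branch_name(name: str) -> bool:
    """Return True for a long-lived branch or a correctly named work branch."""
    if name in LONG_LIVED_BRANCHES:
        return True
    return BRANCH_PATTERN.fullmatch(name) is not None
```

Under these assumptions, `feature/3-choropleth-map` and `fix/42` pass, while `feature/choropleth` (no sprint number) and `hotfix/42` (unknown prefix) are rejected.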
Tech Stack (Locked)
| Layer | Technology | Version |
|---|---|---|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |
Compatibility Notes:
- SQLAlchemy 2.0 + Pydantic 2.0 integrate well—never mix 1.x APIs
- PostGIS extension required—enable during db init
- Docker Compose V2 (no `version` field in compose files)
Code Conventions
Import Style
| Context | Style | Example |
|---|---|---|
| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` |
| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
| External packages | Absolute | `import pandas as pd` |
Module Separation
| Directory | Contains | Purpose |
|---|---|---|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | API/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |
Code Standards
- Type hints: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- Functions: Single responsibility, verb naming, early returns over nesting
- Docstrings: Google style, minimal—only for non-obvious behavior
- Constants: Module-level for magic values, Pydantic `BaseSettings` for runtime config (under Pydantic 2, `BaseSettings` ships in the separate `pydantic-settings` package)
Error Handling
- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)
Application Architecture
Dash Pages Structure
URL Routing (Automatic)
| URL | Page | Status |
|---|---|---|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |
Phase 1: Toronto Neighbourhood Dashboard
Data Sources
| Track | Source | Format | Geography | Frequency |
|---|---|---|---|---|
| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |
Geographic Reality
Data Model (Star Schema)
| Table | Type | Keys |
|---|---|---|
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |
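
To make the fact-to-dimension relationships concrete, here is a small sketch of `fact_rentals` joined to `dim_time` and `dim_cmhc_zone`. It is illustrative only: SQLite stands in for PostgreSQL so the example runs without a server, geometry columns and the other two dimensions are omitted, and the `year`, `zone_name`, and `avg_rent` columns, the `YYYYMMDD` integer format for `date_key`, and every sample row are assumptions rather than the project's actual schema or data.

```python
import sqlite3

# Only the table names and key columns come from the data model table;
# every other column below is a hypothetical placeholder.
SCHEMA = """
CREATE TABLE dim_time (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL
);
CREATE TABLE dim_cmhc_zone (
    zone_key  INTEGER PRIMARY KEY,
    zone_name TEXT NOT NULL
);
CREATE TABLE fact_rentals (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    zone_key INTEGER NOT NULL REFERENCES dim_cmhc_zone (zone_key),
    avg_rent REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_time VALUES (?, ?)",
        [(20211001, 2021), (20221001, 2022)],
    )
    conn.executemany(
        "INSERT INTO dim_cmhc_zone VALUES (?, ?)",
        [(1, "Zone A"), (2, "Zone B")],
    )
    conn.executemany(
        "INSERT INTO fact_rentals VALUES (?, ?, ?)",
        [(20211001, 1, 1500.0), (20221001, 1, 1600.0), (20221001, 2, 1400.0)],
    )
    conn.commit()
    return conn


def fetch_rent_by_zone_and_year(
    conn: sqlite3.Connection,
) -> list[tuple[str, int, float]]:
    """Resolve each fact row's keys through its two dimensions."""
    query = """
        SELECT z.zone_name, t.year, f.avg_rent
        FROM fact_rentals AS f
        JOIN dim_time AS t ON t.date_key = f.date_key
        JOIN dim_cmhc_zone AS z ON z.zone_key = f.zone_key
        ORDER BY z.zone_name, t.year
    """
    return conn.execute(query).fetchall()
```

The fact table holds only keys and measures; descriptive attributes live in the dimensions and are reached through the joins.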
dbt Layer Structure
| Layer | Naming | Purpose |
|---|---|---|
| Staging | stg_{source}__{entity} | 1:1 source, cleaned, typed |
| Intermediate | int_{domain}__{transform} | Business logic, filtering |
| Marts | mart_{domain} | Final analytical tables |
Sprint Overview
| Sprint | Focus | Milestone |
|---|---|---|
| 1-6 | Foundation and initial dashboard | Launch 1: Bio Live |
| 7 | Navigation & theme modernization | - |
| 8 | Portfolio website expansion | Launch 2: Website Live |
| 9 | Neighbourhood dashboard transition | Cleanup complete |
| 10+ | Dashboard implementation | Launch 3: Dashboard Live |
Scope Boundaries
Phase 1 — Build These
- Bio landing page and portfolio website
- CMHC rental data processor
- Toronto neighbourhood data integration
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer
Deferred Features
| Feature | Reason | When |
|---|---|---|
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Future |
If a task seems to require deferred features, stop and flag it.
File Structure
Root-Level Files (Allowed)
| File | Purpose |
|---|---|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |
Directory Structure
Gitignored Directories
data/*/processed/
reports/
backups/
notebooks/*.html
.env
__pycache__/
.venv/
Makefile Targets
| Target | Purpose |
|---|---|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |
Script Standards
All scripts in scripts/:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
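
For scripts written in Python rather than bash, the same standards might look like the sketch below: a usage comment at the top, an idempotent operation, exit codes 0 and 1, and errors routed to stderr. The script name `ensure_dir.py` and its task (creating a directory) are hypothetical, chosen only to show the shape.

```python
# Usage: python scripts/ensure_dir.py <directory>
#
# Hypothetical example script: creates <directory> if it does not exist.
# Safe to run repeatedly (idempotent).
import sys
from pathlib import Path

EXIT_SUCCESS = 0
EXIT_ERROR = 1
USAGE = "usage: python scripts/ensure_dir.py <directory>"


def main(argv: list[str]) -> int:
    """Create the target directory and return a process exit code."""
    if len(argv) != 1:
        print(USAGE, file=sys.stderr)
        return EXIT_ERROR
    target = Path(argv[0])
    try:
        # exist_ok=True makes a second run a no-op instead of a failure.
        target.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        print(f"error: cannot create {target}: {exc}", file=sys.stderr)
        return EXIT_ERROR
    print(f"ok: {target} is present")
    return EXIT_SUCCESS


# A real script would end with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv[1:]))
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the function testable from pytest.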
Environment Variables
Required in .env:
Success Criteria
Launch 1 (Bio Live)
Launch 2 (Website Live)
Launch 3 (Dashboard Live)
Reference Documents
For detailed specifications, see:
| Document | Location | Use When |
|---|---|---|
| Dashboard vision | docs/changes/Change-Toronto-Analysis.md | Dashboard specification |
| Implementation plan | docs/changes/Change-Toronto-Analysis-Reviewed.md | Sprint planning |
Reference Version: 2.0
Updated: Sprint 9