2 Commits

2 changed files with 1603 additions and 0 deletions

View File

@@ -0,0 +1,809 @@
# Toronto Housing Price Dashboard
## Portfolio Project — Data Specification & Architecture
**Version**: 5.1
**Last Updated**: January 2026
**Status**: Specification Complete
---
## Document Context
| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Detailed specification for Toronto Housing Dashboard |
| **Scope** | Data schemas, source URLs, geographic boundaries, V1/V2 decisions |
**Rule**: For overall project scope, phasing, tech stack, and deployment architecture, see `portfolio_project_plan_v5.md`. This document provides implementation-level detail for the Toronto Housing project specifically.
**Terminology Note**: This document uses **Stages 14** to describe Toronto Housing implementation steps. These are distinct from the **Phases 15** in `portfolio_project_plan_v5.md`, which describe the overall portfolio project lifecycle.
---
## Project Overview
A dashboard analyzing housing price variations across Toronto neighbourhoods over time, with dual analysis tracks:
| Track | Data Domain | Primary Source | Geographic Unit |
|-------|-------------|----------------|-----------------|
| **Purchases** | Sales transactions | TRREB Monthly Reports | ~35 Districts |
| **Rentals** | Rental market stats | CMHC Rental Market Survey | ~20 Zones |
**Core Visualization**: Interactive choropleth map of Toronto with toggle between rental/purchase analysis, time-series exploration by month/year.
**Enrichment Layer** (V1: overlay only): Neighbourhood-level demographic and socioeconomic context including population density, education attainment, and income. Crime data deferred to Phase 4 of the portfolio project (post-Energy project).
**Tech Stack & Deployment**: See `portfolio_project_plan_v5.md` → Tech Stack, Deployment Architecture
---
## Geographic Layers
### Layer Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Official Neighbourhoods (158) │ ← Reference overlay + Enrichment data
├─────────────────────────────────────────────────────────────────┤
│ TRREB Districts (~35) — W01, C01, E01, etc. │ ← Purchase data
├─────────────────────────────────────────────────────────────────┤
│ CMHC Survey Zones (~20) — Census Tract aligned │ ← Rental data
└─────────────────────────────────────────────────────────────────┘
```
### Boundary Files
| Layer | Zones | Format | Source | Status |
|-------|-------|--------|--------|--------|
| **City Neighbourhoods** | 158 | GeoJSON, Shapefile | [GitHub - jasonicarter/toronto-geojson](https://github.com/jasonicarter/toronto-geojson) | ✅ Ready to use |
| **TRREB Districts** | ~35 | PDF only | [TRREB Toronto Map PDF](https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf) | ⚠ Requires manual digitization |
| **CMHC Zones** | ~20 | R package | R `cmhc` package via `get_cmhc_geography()` | ✅ Available (see note) |
### Digitization Task: TRREB Districts
**Input**: TRREB Toronto PDF map
**Output**: GeoJSON with district codes (W01-W10, C01-C15, E01-E11)
**Tool**: QGIS
**Process**:
1. Import PDF as raster layer in QGIS
2. Create vector layer with polygon features
3. Trace district boundaries
4. Add attributes: `district_code`, `district_name`, `area_type` (West/Central/East)
5. Export as GeoJSON (WGS84 / EPSG:4326)
### CMHC Zone Boundaries
**Source**: The R `cmhc` package provides CMHC survey geography via the `get_cmhc_geography()` function.
**Extraction Process**:
```r
# In R
library(cmhc)
library(sf)
# Get Toronto CMA zones
toronto_zones <- get_cmhc_geography(
geography_type = "ZONE",
cma = "Toronto"
)
# Export to GeoJSON for Python/PostGIS
st_write(toronto_zones, "cmhc_zones.geojson", driver = "GeoJSON")
```
**Output**: `data/toronto/raw/geo/cmhc_zones.geojson`
**Why R?**: CMHC zone boundaries are not published as standalone files. The `cmhc` R package is the only reliable programmatic source. One-time extraction, then use GeoJSON in Python stack.
### ⚠ Neighbourhood Boundary Change (140 → 158)
The City of Toronto updated from 140 to 158 social planning neighbourhoods in **April 2021**. This affects data alignment:
| Data Source | Pre-2021 | Post-2021 | Handling |
|-------------|----------|-----------|----------|
| Census (2016 and earlier) | 140 neighbourhoods | N/A | Use 140-model files |
| Census (2021+) | N/A | 158 neighbourhoods | Use 158-model files |
**V1 Strategy**: Use 2021 Census on 158 boundaries only. Defer historical trend analysis to portfolio Phase 4.
---
## Data Source #1: TRREB Monthly Market Reports
### Source Details
| Attribute | Value |
|-----------|-------|
| **Provider** | Toronto Regional Real Estate Board |
| **URL** | [TRREB Market Watch](https://trreb.ca/index.php/market-news/market-watch) |
| **Format** | PDF (monthly reports) |
| **Update Frequency** | Monthly |
| **Historical Availability** | 2007Present |
| **Access** | Public (aggregate data in PDFs) |
| **Extraction Method** | PDF parsing (`pdfplumber` or `camelot-py`) |
### Available Tables
#### Table: `trreb_monthly_summary`
**Location in PDF**: Pages 3-4 (Summary by Area)
| Column | Data Type | Description |
|--------|-----------|-------------|
| `report_date` | DATE | First of month (YYYY-MM-01) |
| `area_code` | VARCHAR(3) | District code (W01, C01, E01, etc.) |
| `area_name` | VARCHAR(100) | District name |
| `area_type` | VARCHAR(10) | West / Central / East / North |
| `sales` | INTEGER | Number of transactions |
| `dollar_volume` | DECIMAL | Total sales volume ($) |
| `avg_price` | DECIMAL | Average sale price ($) |
| `median_price` | DECIMAL | Median sale price ($) |
| `new_listings` | INTEGER | New listings count |
| `active_listings` | INTEGER | Active listings at month end |
| `avg_sp_lp` | DECIMAL | Avg sale price / list price ratio (%) |
| `avg_dom` | INTEGER | Average days on market |
### Dimensions
| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Monthly | 2007-01 to present |
| **Geography** | District | ~35 TRREB districts |
| **Property Type** | Aggregate | All residential (no breakdown in summary) |
### Metrics Available
| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `avg_price` | Pre-calculated monthly avg | Primary price indicator |
| `median_price` | Pre-calculated monthly median | Robust price indicator |
| `sales` | Count | Market activity volume |
| `avg_dom` | Average | Market velocity |
| `avg_sp_lp` | Ratio | Buyer/seller market indicator |
| `new_listings` | Count | Supply indicator |
| `active_listings` | Snapshot | Inventory level |
### ⚠ Limitations
- No transaction-level data (aggregates only)
- Property type breakdown requires parsing additional tables
- PDF structure may vary slightly across years
- District boundaries haven't changed since 2011
---
## Data Source #2: CMHC Rental Market Survey
### Source Details
| Attribute | Value |
|-----------|-------|
| **Provider** | Canada Mortgage and Housing Corporation |
| **URL** | [CMHC Housing Market Information Portal](https://www03.cmhc-schl.gc.ca/hmip-pimh/) |
| **Format** | CSV export, API |
| **Update Frequency** | Annual (October survey) |
| **Historical Availability** | 1990Present |
| **Access** | Public, free registration for bulk downloads |
| **Geographic Levels** | CMA → Zone → Neighbourhood → Census Tract |
### Available Tables
#### Table: `cmhc_rental_summary`
**Portal Path**: Toronto → Primary Rental Market → Summary Statistics
| Column | Data Type | Description |
|--------|-----------|-------------|
| `survey_year` | INTEGER | Survey year (October) |
| `zone_code` | VARCHAR(10) | CMHC zone identifier |
| `zone_name` | VARCHAR(100) | Zone name |
| `bedroom_type` | VARCHAR(20) | Bachelor / 1-Bed / 2-Bed / 3-Bed+ / Total |
| `universe` | INTEGER | Total rental units in zone |
| `vacancy_rate` | DECIMAL | Vacancy rate (%) |
| `vacancy_rate_reliability` | VARCHAR(1) | Reliability code (a/b/c/d) |
| `availability_rate` | DECIMAL | Availability rate (%) |
| `average_rent` | DECIMAL | Average monthly rent ($) |
| `average_rent_reliability` | VARCHAR(1) | Reliability code |
| `median_rent` | DECIMAL | Median monthly rent ($) |
| `rent_change_pct` | DECIMAL | YoY rent change (%) |
| `turnover_rate` | DECIMAL | Unit turnover rate (%) |
### Dimensions
| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Annual | 1990 to present (October snapshot) |
| **Geography** | Zone | ~20 CMHC zones in Toronto CMA |
| **Bedroom Type** | Category | Bachelor, 1-Bed, 2-Bed, 3-Bed+, Total |
| **Structure Type** | Category | Row, Apartment (available in detailed tables) |
### Metrics Available
| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `average_rent` | Pre-calculated avg | Primary rent indicator |
| `median_rent` | Pre-calculated median | Robust rent indicator |
| `vacancy_rate` | Percentage | Market tightness |
| `availability_rate` | Percentage | Supply accessibility |
| `turnover_rate` | Percentage | Tenant mobility |
| `rent_change_pct` | YoY % | Rent growth tracking |
| `universe` | Count | Market size |
### Reliability Codes
| Code | Meaning | Coefficient of Variation |
|------|---------|-------------------------|
| `a` | Excellent | CV ≤ 2.5% |
| `b` | Good | 2.5% < CV ≤ 5% |
| `c` | Fair | 5% < CV ≤ 10% |
| `d` | Poor (use with caution) | CV > 10% |
| `**` | Data suppressed | Sample too small |
### ⚠ Limitations
- Annual only (no monthly granularity)
- October snapshot (point-in-time)
- Zones are larger than TRREB districts
- Purpose-built rental only (excludes condo rentals in base survey)
---
## Data Source #3: City of Toronto Open Data
### Source Details
| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto |
| **URL** | [Toronto Open Data Portal](https://open.toronto.ca/) |
| **Format** | GeoJSON, Shapefile, CSV |
| **Use Case** | Reference layer, demographic enrichment |
### Relevant Datasets
#### Dataset: `neighbourhoods`
| Column | Data Type | Description |
|--------|-----------|-------------|
| `area_id` | INTEGER | Neighbourhood ID (1-158) |
| `area_name` | VARCHAR(100) | Official neighbourhood name |
| `geometry` | POLYGON | Boundary geometry |
#### Dataset: `neighbourhood_profiles` (Census-linked)
| Column | Data Type | Description |
|--------|-----------|-------------|
| `neighbourhood_id` | INTEGER | Links to neighbourhoods |
| `population` | INTEGER | Total population |
| `avg_household_income` | DECIMAL | Average household income |
| `dwelling_count` | INTEGER | Total dwellings |
| `owner_pct` | DECIMAL | % owner-occupied |
| `renter_pct` | DECIMAL | % renter-occupied |
### Enrichment Potential
Can overlay demographic context on housing data:
- Income brackets by neighbourhood
- Ownership vs rental ratios
- Population density
- Dwelling type distribution
---
## Data Source #4: Enrichment Data (Density, Education)
### Purpose
Provide socioeconomic context to housing price analysis. Enables questions like:
- Do neighbourhoods with higher education attainment have higher prices?
- How does population density correlate with price per square foot?
### Geographic Alignment Reality
**Critical constraint**: Enrichment data is available at the **158-neighbourhood** level, while core housing data sits at **TRREB districts (~35)** and **CMHC zones (~20)**. These do not align cleanly.
```
158 Neighbourhoods (fine) → Enrichment data lives here
(no clean crosswalk)
~35 TRREB Districts (coarse) → Purchase data lives here
~20 CMHC Zones (coarse) → Rental data lives here
```
### Available Enrichment Datasets
#### Dataset: Neighbourhood Profiles (Census)
| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto (via Statistics Canada Census) |
| **URL** | [Toronto Open Data - Neighbourhood Profiles](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
| **Format** | CSV, JSON, XML, XLSX |
| **Update Frequency** | Every 5 years (Census cycle) |
| **Available Years** | 2001, 2006, 2011, 2016, 2021 |
| **Geographic Unit** | 158 neighbourhoods (140 pre-2021) |
**Key Variables**:
| Variable | Description | Use Case |
|----------|-------------|----------|
| `population` | Total population | Density calculation |
| `land_area_sqkm` | Area in square kilometers | Density calculation |
| `pop_density_per_sqkm` | Population per km | Density metric |
| `pct_bachelors_or_higher` | % age 25-64 with bachelor's+ | Education proxy |
| `median_household_income` | Median total household income | Income metric |
| `avg_household_income` | Average total household income | Income metric |
| `pct_owner_occupied` | % owner-occupied dwellings | Tenure split |
| `pct_renter_occupied` | % renter-occupied dwellings | Tenure split |
**Download URL (2021, 158 neighbourhoods)**:
```
https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx
```
### Crime Data — Deferred to Portfolio Phase 4
Crime data (TPS Neighbourhood Crime Rates) is **not included in V1 scope**. It will be added in portfolio Phase 4 after the Energy Pricing project is complete.
**Rationale**:
- Crime data is socially/politically sensitive and requires careful methodology documentation
- V1 focuses on core housing metrics and policy events
- Deferral reduces scope creep risk
**Future Reference** (Portfolio Phase 4):
- Source: [TPS Public Safety Data Portal](https://data.torontopolice.on.ca/)
- Dataset: Neighbourhood Crime Rates (Major Crime Indicators)
- Geographic Unit: 158 neighbourhoods
### V1 Enrichment Data Summary
| Measure | Source | Geography | Frequency | Format | Status |
|---------|--------|-----------|-----------|--------|--------|
| **Population Density** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Education Attainment** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Median Income** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Crime Rates (MCI)** | TPS Data Portal | 158 neighbourhoods | Annual | GeoJSON/CSV | Deferred to Portfolio Phase 4 |
---
## Data Source #5: Policy Events
### Purpose
Provide temporal context for housing price movements. Display as annotation markers on time series charts. **No causation claims** — correlation/context only.
### Event Schema
#### Table: `dim_policy_event`
| Column | Data Type | Description |
|--------|-----------|-------------|
| `event_id` | INTEGER (PK) | Auto-increment primary key |
| `event_date` | DATE | Date event was announced/occurred |
| `effective_date` | DATE | Date policy took effect (if different) |
| `level` | VARCHAR(20) | `federal` / `provincial` / `municipal` |
| `category` | VARCHAR(20) | `monetary` / `tax` / `regulatory` / `supply` / `economic` |
| `title` | VARCHAR(200) | Short event title for display |
| `description` | TEXT | Longer description for tooltip |
| `expected_direction` | VARCHAR(10) | `bearish` / `bullish` / `neutral` |
| `source_url` | VARCHAR(500) | Link to official announcement/documentation |
| `confidence` | VARCHAR(10) | `high` / `medium` / `low` |
| `created_at` | TIMESTAMP | Record creation timestamp |
### Event Tiers
| Tier | Level | Category Examples | Inclusion Criteria |
|------|-------|-------------------|-------------------|
| **1** | Federal | BoC rate decisions, OSFI stress tests | Always include; objective, documented |
| **1** | Provincial | Fair Housing Plan, foreign buyer tax, rent control | Always include; legislative record |
| **2** | Municipal | Zoning reforms, development charges | Include if material impact expected |
| **2** | Economic | COVID measures, major employer closures | Include if Toronto-specific impact |
| **3** | Market | Major project announcements | Strict criteria; must be verifiable |
### Expected Direction Values
| Value | Meaning | Example |
|-------|---------|---------|
| `bullish` | Expected to increase prices | Rate cut, supply restriction |
| `bearish` | Expected to decrease prices | Rate hike, foreign buyer tax |
| `neutral` | Uncertain or mixed impact | Regulatory clarification |
### ⚠ Caveats
- **No causation claims**: Events are context, not explanation
- **Lag effects**: Policy impact may not be immediate
- **Confounding factors**: Multiple simultaneous influences
- **Display only**: No statistical analysis in V1
### Sample Events (Tier 1)
| Date | Level | Category | Title | Direction |
|------|-------|----------|-------|-----------|
| 2017-04-20 | provincial | tax | Ontario Fair Housing Plan | bearish |
| 2018-01-01 | federal | regulatory | OSFI B-20 Stress Test | bearish |
| 2020-03-27 | federal | monetary | BoC Emergency Rate Cut (0.25%) | bullish |
| 2022-03-02 | federal | monetary | BoC Rate Hike Cycle Begins | bearish |
| 2023-06-01 | federal | tax | Federal 2-Year Foreign Buyer Ban | bearish |
---
## Data Integration Strategy
### Temporal Alignment
| Source | Native Frequency | Alignment Strategy |
|--------|------------------|---------------------|
| TRREB | Monthly | Use as-is |
| CMHC | Annual (October) | Spread to monthly OR display annual overlay |
| Census/Enrichment | 5-year | Static snapshot; display as reference |
| Policy Events | Event-based | Display as vertical markers on time axis |
**Recommendation**: Keep separate time axes. TRREB monthly for purchases, CMHC annual for rentals. Don't force artificial monthly rental data.
### Geographic Alignment
```
┌─────────────────────────────────────────────────────────────────┐
│ VISUALIZATION APPROACH │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Purchase Mode Rental Mode │
│ ───────────────── ────────────── │
│ Map: TRREB Districts Map: CMHC Zones │
│ Time: Monthly slider Time: Annual selector │
│ Metrics: Price, Sales Metrics: Rent, Vacancy │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ City Neighbourhoods Overlay │ │
│ │ (158 boundaries as reference layer) │ │
│ │ + Enrichment data (density, education, income) │ │
│ ──────────────────────────────────────────────────────────┘ │
│ │
────────────────────────────────────────────────────────────────────┘
```
### Enrichment Integration Strategy (Phased)
#### V1: Reference Overlay (Current Scope)
**Approach**: Display neighbourhood enrichment as a separate toggle-able layer. No joins to housing data.
**UX**:
- User hovers over TRREB district → tooltip shows "This district contains neighbourhoods: Annex, Casa Loma, Yorkville..."
- User toggles "Show Enrichment" → choropleth switches to neighbourhood-level density/education/income
- Enrichment and housing metrics displayed side-by-side, not merged
**Pros**:
- No imputation or dodgy aggregations
- Honest about geographic mismatch
- Ships faster
**Cons**:
- Can't do correlation analysis (price vs. enrichment) directly in dashboard
**Implementation**:
- `dim_neighbourhood` as standalone dimension (no FK to fact tables)
- Spatial lookup on hover (point-in-polygon)
#### V2/Portfolio Phase 4: Area-Weighted Aggregation (Future Scope)
**Approach**: Pre-compute area-weighted averages of neighbourhood metrics for each TRREB district and CMHC zone.
**Process**:
1. Spatial join: intersect neighbourhood polygons with TRREB/CMHC polygons
2. Compute overlap area for each neighbourhood-district pair
3. Weight neighbourhood metrics by overlap area proportion
4. User selects aggregation method in UI
**Aggregation Methods to Expose**:
| Method | Description | Best For |
|--------|-------------|----------|
| **Area-weighted mean** | Weight by % overlap area | Continuous metrics (density) |
| **Population-weighted mean** | Weight by population in overlap | Per-capita metrics (education) |
| **Majority assignment** | Assign neighbourhood to district with >50% overlap | Categorical data |
| **Max overlap** | Assign to single district with largest overlap | 1:1 mapping needs |
**Default**: Population-weighted (more defensible for per-capita metrics). Hide selector behind "Advanced" toggle.
### V1 Future-Proofing (Do Now)
| Action | Why |
|--------|-----|
| Store neighbourhood boundaries in same CRS as TRREB/CMHC (WGS84) | Avoids reprojection headaches |
| Keep `dim_neighbourhood` normalized (not denormalized into district tables) | Clean separation for V2 join |
| Document Census year for each metric | Ready for 2026 Census |
| Include `census_year` column in dim_neighbourhood | Enables SCD tracking |
### V1 Defer (Don't Do Yet)
| Action | Why Not |
|--------|---------|
| Pre-compute area-weighted crosswalk | Don't need for V1 |
| Build aggregation method selector UI | No backend to support it |
| Crime data integration | Deferred to Portfolio Phase 4 |
| Historical neighbourhood boundary reconciliation (140→158) | Use 2021+ data only for V1 |
---
## Proposed Data Model
### Star Schema
```
┌──────────────────┐
│ dim_time │
├──────────────────┤
│ date_key (PK) │
│ year │
│ month │
│ quarter │
│ month_name │
───────────────────────┘
┌─────────────────────────────────────────────┐
│ │ │
┌──────────────────┐ │ ┌──────────────────┐
│ dim_trreb_district│ │ │ dim_cmhc_zone │
├──────────────────┤ │ ├──────────────────┤
│ district_key (PK)│ │ │ zone_key (PK) │
│ district_code │ │ │ zone_code │
│ district_name │ │ │ zone_name │
│ area_type │ │ │ geometry │
│ geometry │
───────────────────────┘ │ │
│ │ │
┌──────────────────┐ │ ┌──────────────────┐
│ fact_purchases │ │ │ fact_rentals │
├──────────────────┤ │ ├──────────────────┤
│ date_key (FK) │ │ │ date_key (FK) │
│ district_key (FK)│ │ │ zone_key (FK) │
│ sales_count │ │ │ bedroom_type │
│ avg_price │ │ │ avg_rent │
│ median_price │ │ │ median_rent │
│ new_listings │ │ │ vacancy_rate │
│ active_listings │ │ │ universe │
│ avg_dom │ │ │ turnover_rate │
│ avg_sp_lp │ │ │ reliability_code │
─────────────────────┘ │ ─────────────────────┘
┌───────────────────────────┐
│ dim_neighbourhood │
├───────────────────────────┤
│ neighbourhood_id (PK) │
│ name │
│ geometry │
│ population │
│ land_area_sqkm │
│ pop_density_per_sqkm │
│ pct_bachelors_or_higher │
│ median_household_income │
│ pct_owner_occupied │
│ pct_renter_occupied │
│ census_year │ ← For SCD tracking
──────────────────────────────┘
┌───────────────────────────┐
│ dim_policy_event │
├───────────────────────────┤
│ event_id (PK) │
│ event_date │
│ effective_date │
│ level │ ← federal/provincial/municipal
│ category │ ← monetary/tax/regulatory/supply/economic
│ title │
│ description │
│ expected_direction │ ← bearish/bullish/neutral
│ source_url │
│ confidence │ ← high/medium/low
│ created_at │
──────────────────────────────┘
┌───────────────────────────┐
│ bridge_district_neighbourhood │ ← Portfolio Phase 4 ONLY
├───────────────────────────┤
│ district_key (FK) │
│ neighbourhood_id (FK) │
│ area_overlap_pct │
│ population_overlap │ ← For pop-weighted agg
──────────────────────────────┘
```
**Notes**:
- `dim_neighbourhood` has no FK relationship to fact tables in V1
- `dim_policy_event` is standalone (no FK to facts); used for time-series annotation
- `bridge_district_neighbourhood` is Portfolio Phase 4 scope only
- Similar bridge table needed for CMHC zones in Portfolio Phase 4
---
## File Structure
> **Note**: Toronto Housing data logic lives in `portfolio_app/toronto/`. See `portfolio_project_plan_v5.md` for full project structure.
### Data Directory Structure
```
data/
└── toronto/
├── raw/
│ ├── trreb/
│ │ └── market_watch_YYYY_MM.pdf
│ ├── cmhc/
│ │ └── rental_survey_YYYY.csv
│ ├── enrichment/
│ │ └── neighbourhood_profiles_2021.xlsx
│ └── geo/
│ ├── toronto_neighbourhoods.geojson
│ ├── trreb_districts.geojson ← (to be created via QGIS)
│ └── cmhc_zones.geojson ← (from R cmhc package)
├── processed/ ← gitignored
│ ├── fact_purchases.parquet
│ ├── fact_rentals.parquet
│ ├── dim_time.parquet
│ ├── dim_trreb_district.parquet
│ ├── dim_cmhc_zone.parquet
│ ├── dim_neighbourhood.parquet
│ └── dim_policy_event.parquet
└── reference/
├── policy_events.csv ← Curated event list
└── neighbourhood_boundary_changelog.md ← 140→158 notes
```
### Code Module Structure
```
portfolio_app/toronto/
├── __init__.py
├── parsers/
│ ├── __init__.py
│ ├── trreb.py # PDF extraction
│ └── cmhc.py # CSV processing
├── loaders/
│ ├── __init__.py
│ └── database.py # DB operations
├── schemas/ # Pydantic models
│ ├── __init__.py
│ ├── trreb.py
│ ├── cmhc.py
│ ├── enrichment.py
│ └── policy_event.py
├── models/ # SQLAlchemy ORM
│ ├── __init__.py
│ ├── base.py # DeclarativeBase, engine
│ ├── dimensions.py # dim_time, dim_trreb_district, dim_policy_event, etc.
│ └── facts.py # fact_purchases, fact_rentals
└── transforms/
└── __init__.py
```
### Notebooks
```
notebooks/
├── 01_trreb_pdf_extraction.ipynb
├── 02_cmhc_data_prep.ipynb
├── 03_geo_layer_prep.ipynb
├── 04_enrichment_data_prep.ipynb
├── 05_policy_events_curation.ipynb
└── 06_spatial_crosswalk.ipynb ← Portfolio Phase 4 only
```
---
## ✅ Implementation Checklist
> **Note**: These are **Stages** within the Toronto Housing project (Portfolio Phase 1). They are distinct from the overall portfolio **Phases** defined in `portfolio_project_plan_v5.md`.
### Stage 1: Data Acquisition
- [ ] Download TRREB monthly PDFs (2020-present as MVP)
- [ ] Register for CMHC portal and export Toronto rental data
- [ ] Extract CMHC zone boundaries via R `cmhc` package
- [ ] Download City of Toronto neighbourhood GeoJSON (158 boundaries)
- [ ] Digitize TRREB district boundaries in QGIS
- [ ] Download Neighbourhood Profiles (2021 Census, 158-model)
### Stage 2: Data Processing
- [ ] Build TRREB PDF parser (`portfolio_app/toronto/parsers/trreb.py`)
- [ ] Build Pydantic schemas (`portfolio_app/toronto/schemas/`)
- [ ] Build SQLAlchemy models (`portfolio_app/toronto/models/`)
- [ ] Extract and validate TRREB monthly summaries
- [ ] Clean and structure CMHC rental data
- [ ] Process Neighbourhood Profiles into `dim_neighbourhood`
- [ ] Curate and load policy events into `dim_policy_event`
- [ ] Create dimension tables
- [ ] Build fact tables
- [ ] Validate all geospatial layers use same CRS (WGS84/EPSG:4326)
### Stage 3: Visualization (V1)
- [ ] Create dashboard page (`portfolio_app/pages/toronto/dashboard.py`)
- [ ] Build choropleth figures (`portfolio_app/figures/choropleth.py`)
- [ ] Build time series figures (`portfolio_app/figures/time_series.py`)
- [ ] Design dashboard layout (purchase/rental toggle)
- [ ] Implement choropleth map with layer switching
- [ ] Add time slider/selector
- [ ] Build neighbourhood overlay (toggle-able)
- [ ] Add enrichment layer toggle (density/education/income choropleth)
- [ ] Add policy event markers on time series
- [ ] Add tooltips with cross-reference info ("This district contains...")
- [ ] Add tooltips showing enrichment metrics on hover
### Stage 4: Polish (V1)
- [ ] Add data source citations
- [ ] Document methodology (especially geographic limitations)
- [ ] Write docs (`docs/methodology.md`, `docs/data_sources.md`)
- [ ] Deploy to portfolio
### Future Enhancements (Portfolio Phase 4 — Post-Energy Project)
- [ ] Add crime data to dim_neighbourhood
- [ ] Build spatial crosswalk (neighbourhood ↔ district/zone intersections)
- [ ] Compute area-weighted and population-weighted aggregations
- [ ] Add aggregation method selector to UI
- [ ] Enable correlation analysis (price vs. enrichment metrics)
- [ ] Add historical neighbourhood boundary support (140→158)
**Deployment & dbt Architecture**: See `portfolio_project_plan_v5.md` for:
- dbt layer structure and testing strategy
- Deployment architecture
- Data quality framework
---
## References & Links
### Core Housing Data
| Resource | URL |
|----------|-----|
| TRREB Market Watch | https://trreb.ca/index.php/market-news/market-watch |
| CMHC Housing Portal | https://www03.cmhc-schl.gc.ca/hmip-pimh/ |
### Geographic Boundaries
| Resource | URL |
|----------|-----|
| Toronto Neighbourhoods GeoJSON | https://github.com/jasonicarter/toronto-geojson |
| TRREB District Map (PDF) | https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf |
| Statistics Canada Census Tracts | https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index-eng.cfm |
| R `cmhc` package (CRAN) | https://cran.r-project.org/package=cmhc |
### Enrichment Data
| Resource | URL |
|----------|-----|
| Toronto Open Data Portal | https://open.toronto.ca/ |
| Neighbourhood Profiles (CKAN) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/neighbourhood-profiles |
| Neighbourhood Profiles 2021 (Direct Download) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx |
### Policy Events Research
| Resource | URL |
|----------|-----|
| Bank of Canada Interest Rates | https://www.bankofcanada.ca/rates/interest-rates/ |
| OSFI (Stress Test Rules) | https://www.osfi-bsif.gc.ca/ |
| Ontario Legislature (Bills) | https://www.ola.org/ |
### Reference Documentation
| Resource | URL |
|----------|-----|
| Statistics Canada 2021 Census Reference | https://www12.statcan.gc.ca/census-recensement/2021/ref/index-eng.cfm |
| City of Toronto Neighbourhood Profiles Overview | https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/neighbourhood-profiles/ |
---
## Related Documents
| Document | Relationship | Use For |
|----------|--------------|---------|
| `portfolio_project_plan_v5.md` | Parent document | Overall scope, phasing, tech stack, deployment, dbt architecture, data quality framework |
---
*Document Version: 5.1*
*Updated: January 2026*
*Project: Toronto Housing Price Dashboard — Portfolio Piece*

794
docs/wbs_sprint_plan_v4.md Normal file
View File

@@ -0,0 +1,794 @@
# Work Breakdown Structure & Sprint Plan
**Project**: Toronto Housing Dashboard (Portfolio Phase 1)
**Version**: 4.1
**Date**: January 2026
---
## Document Context
| Attribute | Value |
|-----------|-------|
| **Parent Documents** | `portfolio_project_plan_v5.md`, `toronto_housing_dashboard_spec_v5.md` |
| **Content Source** | `bio_content_v2.md` |
| **Role** | Executable sprint plan for Phase 1 delivery |
---
## Milestones
| Milestone | Deliverable | Target Sprint |
|-----------|-------------|---------------|
| **Launch 1** | Bio Landing Page | Sprint 2 |
| **Launch 2** | Toronto Housing Dashboard | Sprint 6 |
---
## WBS Structure
```
1.0 Launch 1: Bio Landing Page
├── 1.1 Project Bootstrap
├── 1.2 Infrastructure
├── 1.3 Application Foundation
├── 1.4 Bio Page
└── 1.5 Deployment
2.0 Launch 2: Toronto Housing Dashboard
├── 2.1 Data Acquisition
├── 2.2 Data Processing
├── 2.3 Database Layer
├── 2.4 dbt Transformation
├── 2.5 Visualization
├── 2.6 Documentation
└── 2.7 Operations
```
---
## Launch 1: Bio Landing Page
### 1.1 Project Bootstrap
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.1.1 | Git repository initialization | — | Low | Low |
| 1.1.2 | Create `.gitignore` | 1.1.1 | Low | Low |
| 1.1.3 | Create `pyproject.toml` | 1.1.1 | Low | Low |
| 1.1.4 | Create `.python-version` (3.11+) | 1.1.1 | Low | Low |
| 1.1.5 | Create `.env.example` | 1.1.1 | Low | Low |
| 1.1.6 | Create `README.md` (initial) | 1.1.1 | Low | Low |
| 1.1.7 | Create `CLAUDE.md` | 1.1.1 | Low | Low |
| 1.1.8 | Create `Makefile` with all targets | 1.1.3 | Low | Medium |
### 1.2 Infrastructure
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.2.1 | Python env setup (pyenv, venv, deps) | 1.1.3, 1.1.4 | Low | Low |
| 1.2.2 | Create `.pre-commit-config.yaml` | 1.2.1 | Low | Low |
| 1.2.3 | Install pre-commit hooks | 1.2.2 | Low | Low |
| 1.2.4 | Create `docker-compose.yml` (PostgreSQL + PostGIS) | 1.1.5 | Low | Low |
| 1.2.5 | Create `scripts/` directory structure | 1.1.1 | Low | Low |
| 1.2.6 | Create `scripts/docker/up.sh` | 1.2.5 | Low | Low |
| 1.2.7 | Create `scripts/docker/down.sh` | 1.2.5 | Low | Low |
| 1.2.8 | Create `scripts/docker/logs.sh` | 1.2.5 | Low | Low |
| 1.2.9 | Create `scripts/docker/rebuild.sh` | 1.2.5 | Low | Low |
| 1.2.10 | Create `scripts/db/init.sh` (PostGIS extension) | 1.2.5 | Low | Low |
| 1.2.11 | Create `scripts/dev/setup.sh` | 1.2.5 | Low | Low |
| 1.2.12 | Verify Docker + PostGIS working | 1.2.4, 1.2.10 | Low | Low |
### 1.3 Application Foundation
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.3.1 | Create `portfolio_app/` directory structure (full tree) | 1.2.1 | Low | Low |
| 1.3.2 | Create `portfolio_app/__init__.py` | 1.3.1 | Low | Low |
| 1.3.3 | Create `portfolio_app/config.py` (Pydantic BaseSettings) | 1.3.1 | Low | Medium |
| 1.3.4 | Create `portfolio_app/errors/__init__.py` | 1.3.1 | Low | Low |
| 1.3.5 | Create `portfolio_app/errors/exceptions.py` | 1.3.4 | Low | Low |
| 1.3.6 | Create `portfolio_app/errors/handlers.py` | 1.3.5 | Low | Medium |
| 1.3.7 | Create `portfolio_app/app.py` (Dash + Pages routing) | 1.3.3 | Low | Medium |
| 1.3.8 | Configure dash-mantine-components theme | 1.3.7 | Low | Low |
| 1.3.9 | Create `portfolio_app/assets/` directory | 1.3.1 | Low | Low |
| 1.3.10 | Create `portfolio_app/assets/styles.css` | 1.3.9 | Low | Medium |
| 1.3.11 | Create `portfolio_app/assets/variables.css` | 1.3.9 | Low | Low |
| 1.3.12 | Add `portfolio_app/assets/favicon.ico` | 1.3.9 | Low | Low |
| 1.3.13 | Create `portfolio_app/assets/images/` directory | 1.3.9 | Low | Low |
| 1.3.14 | Create `tests/` directory structure | 1.2.1 | Low | Low |
| 1.3.15 | Create `tests/__init__.py` | 1.3.14 | Low | Low |
| 1.3.16 | Create `tests/conftest.py` | 1.3.14 | Low | Medium |
| 1.3.17 | Configure pytest in `pyproject.toml` | 1.1.3, 1.3.14 | Low | Low |
### 1.4 Bio Page
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.4.1 | Create `portfolio_app/components/__init__.py` | 1.3.1 | Low | Low |
| 1.4.2 | Create `portfolio_app/components/navbar.py` | 1.4.1, 1.3.8 | Low | Low |
| 1.4.3 | Create `portfolio_app/components/footer.py` | 1.4.1, 1.3.8 | Low | Low |
| 1.4.4 | Create `portfolio_app/components/cards.py` | 1.4.1, 1.3.8 | Low | Low |
| 1.4.5 | Create `portfolio_app/pages/__init__.py` | 1.3.1 | Low | Low |
| 1.4.6 | Create `portfolio_app/pages/home.py` (layout) | 1.4.5, 1.4.2, 1.4.3 | Low | Low |
| 1.4.7 | Integrate bio content from `bio_content_v2.md` | 1.4.6 | Low | Low |
| 1.4.8 | Replace social link placeholders with real URLs | 1.4.7 | Low | Low |
| 1.4.9 | Implement project cards (deployed/in-dev logic) | 1.4.4, 1.4.6 | Low | Low |
| 1.4.10 | Test bio page renders locally | 1.4.9 | Low | Low |
### 1.5 Deployment
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.5.1 | Install PostgreSQL + PostGIS on VPS | — | Low | Low |
| 1.5.2 | Configure firewall (ufw: SSH, HTTP, HTTPS) | 1.5.1 | Low | Low |
| 1.5.3 | Create application database user | 1.5.1 | Low | Low |
| 1.5.4 | Create Gunicorn systemd service file | 1.4.10 | Low | Low |
| 1.5.5 | Configure Nginx reverse proxy | 1.5.4 | Low | Low |
| 1.5.6 | Configure SSL (certbot) | 1.5.5 | Low | Low |
| 1.5.7 | Create `scripts/deploy/deploy.sh` | 1.2.5 | Low | Low |
| 1.5.8 | Create `scripts/deploy/health-check.sh` | 1.2.5 | Low | Low |
| 1.5.9 | Deploy bio page | 1.5.6, 1.5.7 | Low | Low |
| 1.5.10 | Verify HTTPS access | 1.5.9 | Low | Low |
---
## Launch 2: Toronto Housing Dashboard
### 2.1 Data Acquisition
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.1.1 | Define TRREB year scope + download PDFs | — | Low | Low |
| 2.1.2 | **HUMAN**: Digitize TRREB district boundaries (QGIS) | 2.1.1 | High | High |
| 2.1.3 | Register for CMHC portal | — | Low | Low |
| 2.1.4 | Export CMHC Toronto rental CSVs | 2.1.3 | Low | Low |
| 2.1.5 | Extract CMHC zone boundaries (R cmhc package) | 2.1.3 | Low | Medium |
| 2.1.6 | Download neighbourhoods GeoJSON (158 boundaries) | — | Low | Low |
| 2.1.7 | Download Neighbourhood Profiles 2021 (xlsx) | — | Low | Low |
| 2.1.8 | Validate CRS alignment (all geo files WGS84) | 2.1.2, 2.1.5, 2.1.6 | Low | Medium |
| 2.1.9 | Research Tier 1 policy events (10—20 events) | — | Mid | Medium |
| 2.1.10 | Create `data/toronto/reference/policy_events.csv` | 2.1.9 | Low | Low |
| 2.1.11 | Create `data/` directory structure | 1.3.1 | Low | Low |
| 2.1.12 | Organize raw files into `data/toronto/raw/` | 2.1.11 | Low | Low |
| 2.1.13 | Test TRREB parser across year boundaries | 2.2.3 | Low | Medium |
### 2.2 Data Processing
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.2.1 | Create `portfolio_app/toronto/__init__.py` | 1.3.1 | Low | Low |
| 2.2.2 | Create `portfolio_app/toronto/parsers/__init__.py` | 2.2.1 | Low | Low |
| 2.2.3 | Build TRREB PDF parser (`parsers/trreb.py`) | 2.2.2, 2.1.1 | Mid | High |
| 2.2.4 | TRREB data cleaning/normalization | 2.2.3 | Low | Medium |
| 2.2.5 | TRREB parser unit tests | 2.2.4 | Low | Low |
| 2.2.6 | Build CMHC CSV processor (`parsers/cmhc.py`) | 2.2.2, 2.1.4 | Low | Low |
| 2.2.7 | CMHC reliability code handling | 2.2.6 | Low | Low |
| 2.2.8 | CMHC processor unit tests | 2.2.7 | Low | Low |
| 2.2.9 | Build Neighbourhood Profiles parser | 2.2.1, 2.1.7 | Low | Low |
| 2.2.10 | Policy events CSV loader | 2.2.1, 2.1.10 | Low | Low |
### 2.3 Database Layer
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.3.1 | Create `portfolio_app/toronto/schemas/__init__.py` | 2.2.1 | Low | Low |
| 2.3.2 | Create TRREB Pydantic schemas (`schemas/trreb.py`) | 2.3.1 | Low | Medium |
| 2.3.3 | Create CMHC Pydantic schemas (`schemas/cmhc.py`) | 2.3.1 | Low | Medium |
| 2.3.4 | Create enrichment Pydantic schemas (`schemas/enrichment.py`) | 2.3.1 | Low | Low |
| 2.3.5 | Create policy event Pydantic schema (`schemas/policy_event.py`) | 2.3.1 | Low | Low |
| 2.3.6 | Create `portfolio_app/toronto/models/__init__.py` | 2.2.1 | Low | Low |
| 2.3.7 | Create SQLAlchemy base (`models/base.py`) | 2.3.6, 1.3.3 | Low | Medium |
| 2.3.8 | Create dimension models (`models/dimensions.py`) | 2.3.7 | Low | Medium |
| 2.3.9 | Create fact models (`models/facts.py`) | 2.3.8 | Low | Medium |
| 2.3.10 | Create `portfolio_app/toronto/loaders/__init__.py` | 2.2.1 | Low | Low |
| 2.3.11 | Create dimension loaders (`loaders/database.py`) | 2.3.10, 2.3.8 | Low | Medium |
| 2.3.12 | Create fact loaders | 2.3.11, 2.3.9, 2.2.4, 2.2.7 | Mid | Medium |
| 2.3.13 | Loader integration tests | 2.3.12 | Low | Medium |
| 2.3.14 | Create SQL views for dashboard queries | 2.3.12 | Low | Medium |
### 2.4 dbt Transformation
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.4.1 | Create `dbt/` directory structure | 1.3.1 | Low | Low |
| 2.4.2 | Create `dbt/dbt_project.yml` | 2.4.1 | Low | Low |
| 2.4.3 | Create `dbt/profiles.yml` | 2.4.1, 1.3.3 | Low | Low |
| 2.4.4 | Create `scripts/dbt/run.sh` | 1.2.5 | Low | Low |
| 2.4.5 | Create `scripts/dbt/test.sh` | 1.2.5 | Low | Low |
| 2.4.6 | Create `scripts/dbt/docs.sh` | 1.2.5 | Low | Low |
| 2.4.7 | Create `scripts/dbt/fresh.sh` | 1.2.5 | Low | Low |
| 2.4.8 | Create staging models (`stg_trreb__monthly`, `stg_cmhc__rental`) | 2.4.3, 2.3.12 | Low | Medium |
| 2.4.9 | Create intermediate models | 2.4.8 | Low | Medium |
| 2.4.10 | Create mart models | 2.4.9 | Low | Medium |
| 2.4.11 | Create dbt schema tests (unique, not_null, relationships) | 2.4.10 | Low | Medium |
| 2.4.12 | Create custom dbt tests (anomaly detection) | 2.4.11 | Low | Medium |
| 2.4.13 | Create dbt documentation (schema.yml) | 2.4.10 | Low | Low |
### 2.5 Visualization
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.5.1 | Create `portfolio_app/figures/__init__.py` | 1.3.1 | Low | Low |
| 2.5.2 | Build choropleth factory (`figures/choropleth.py`) | 2.5.1, 2.1.8 | Mid | Medium |
| 2.5.3 | Build time series factory (`figures/time_series.py`) | 2.5.1 | Low | Medium |
| 2.5.4 | Build YoY change chart factory (`figures/statistical.py`) | 2.5.1 | Low | Medium |
| 2.5.5 | Build seasonality decomposition chart | 2.5.4 | Low | Medium |
| 2.5.6 | Build district correlation matrix chart | 2.5.4 | Low | Medium |
| 2.5.7 | Create `portfolio_app/pages/toronto/__init__.py` | 1.4.5 | Low | Low |
| 2.5.8 | Create `portfolio_app/pages/toronto/dashboard.py` (layout only) | 2.5.7, 1.4.2, 1.4.3 | Mid | High |
| 2.5.9 | Implement purchase/rental mode toggle | 2.5.8 | Low | Low |
| 2.5.10 | Implement monthly time slider | 2.5.8 | Low | Medium |
| 2.5.11 | Implement annual time selector (CMHC) | 2.5.8 | Low | Low |
| 2.5.12 | Implement layer toggles (districts/zones/neighbourhoods) | 2.5.8 | Low | Medium |
| 2.5.13 | Create `portfolio_app/pages/toronto/callbacks/__init__.py` | 2.5.7 | Low | Low |
| 2.5.14 | Create `callbacks/map_callbacks.py` | 2.5.13, 2.5.2 | Mid | Medium |
| 2.5.15 | Create `callbacks/filter_callbacks.py` | 2.5.13 | Low | Medium |
| 2.5.16 | Create `callbacks/timeseries_callbacks.py` | 2.5.13, 2.5.3 | Low | Medium |
| 2.5.17 | Implement district/zone tooltips | 2.5.14 | Low | Low |
| 2.5.18 | Implement neighbourhood overlay | 2.5.14, 2.1.6 | Low | Medium |
| 2.5.19 | Implement enrichment layer toggle | 2.5.18 | Low | Medium |
| 2.5.20 | Implement policy event markers on time series | 2.5.16, 2.2.10 | Low | Medium |
| 2.5.21 | Implement "district contains neighbourhoods" tooltip | 2.5.17 | Low | Low |
| 2.5.22 | Test dashboard renders with sample data | 2.5.20 | Low | Medium |
### 2.6 Documentation
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.6.1 | Create `docs/` directory | 1.3.1 | Low | Low |
| 2.6.2 | Write `docs/methodology.md` (geographic limitations) | 2.5.22 | Low | Medium |
| 2.6.3 | Write `docs/data_sources.md` (citations) | 2.5.22 | Low | Low |
| 2.6.4 | Write `docs/user_guide.md` | 2.5.22 | Low | Low |
| 2.6.5 | Update `README.md` (final) | 2.6.2, 2.6.3 | Low | Low |
| 2.6.6 | Update `CLAUDE.md` (final) | 2.6.5 | Low | Low |
### 2.7 Operations
| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.7.1 | Create `scripts/db/backup.sh` | 1.2.5 | Low | Low |
| 2.7.2 | Create `scripts/db/restore.sh` | 1.2.5 | Low | Low |
| 2.7.3 | Create `scripts/db/reset.sh` (dev only) | 1.2.5 | Low | Low |
| 2.7.4 | Create `scripts/deploy/rollback.sh` | 1.2.5 | Low | Medium |
| 2.7.5 | Implement backup retention policy | 2.7.1 | Low | Low |
| 2.7.6 | Add `/health` endpoint | 2.5.8 | Low | Low |
| 2.7.7 | Configure uptime monitoring (external) | 2.7.6 | Low | Low |
| 2.7.8 | Deploy Toronto dashboard | 1.5.9, 2.5.22 | Low | Low |
| 2.7.9 | Verify production deployment | 2.7.8 | Low | Low |
---
## L3 Task Details
### 1.1 Project Bootstrap
#### 1.1.1 Git repository initialization
| Attribute | Value |
|-----------|-------|
| **What** | Initialize git repo with main branch |
| **How** | `git init`, initial commit |
| **Inputs** | — |
| **Outputs** | `.git/` directory |
| **Why** | Version control foundation |
#### 1.1.2 Create `.gitignore`
| Attribute | Value |
|-----------|-------|
| **What** | Git ignore rules per project plan |
| **How** | Create file with patterns for: `.env`, `data/*/processed/`, `reports/`, `backups/`, `notebooks/*.html`, `__pycache__/`, `.venv/` |
| **Inputs** | Project plan → Directory Rules |
| **Outputs** | `.gitignore` |
#### 1.1.3 Create `pyproject.toml`
| Attribute | Value |
|-----------|-------|
| **What** | Python packaging config |
| **How** | Define project metadata, dependencies, tool configs (ruff, mypy, pytest) |
| **Inputs** | Tech stack versions from project plan |
| **Outputs** | `pyproject.toml` |
| **Dependencies** | PostgreSQL 16.x, Pydantic ≥2.0, SQLAlchemy ≥2.0, dbt-postgres ≥1.7, Pandas ≥2.1, GeoPandas ≥0.14, Dash ≥2.14, dash-mantine-components (latest), pytest ≥7.0 |
#### 1.1.4 Create `.python-version`
| Attribute | Value |
|-----------|-------|
| **What** | pyenv version file |
| **How** | Single line: `3.11` or specific patch version |
| **Outputs** | `.python-version` |
#### 1.1.5 Create `.env.example`
| Attribute | Value |
|-----------|-------|
| **What** | Environment variable template |
| **How** | Template with: DATABASE_URL, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB, DASH_DEBUG, SECRET_KEY, LOG_LEVEL |
| **Inputs** | Project plan → Environment Setup |
| **Outputs** | `.env.example` |
#### 1.1.6 Create `README.md` (initial)
| Attribute | Value |
|-----------|-------|
| **What** | Project overview stub |
| **How** | Title, brief description, "Setup coming soon" |
| **Outputs** | `README.md` |
#### 1.1.7 Create `CLAUDE.md`
| Attribute | Value |
|-----------|-------|
| **What** | AI assistant context file |
| **How** | Project context, architecture decisions, patterns, conventions |
| **Inputs** | Project plan → Code Architecture |
| **Outputs** | `CLAUDE.md` |
| **Why** | Claude Code effectiveness from day 1 |
#### 1.1.8 Create `Makefile`
| Attribute | Value |
|-----------|-------|
| **What** | All make targets from project plan |
| **How** | Implement targets: setup, venv, clean, docker-up/down/logs/rebuild, db-init/backup/restore/reset, run, run-prod, dbt-run/test/docs/fresh, test, test-cov, lint, format, typecheck, ci, deploy, rollback |
| **Inputs** | Project plan → Makefile Targets |
| **Outputs** | `Makefile` |
### 1.2 Infrastructure
#### 1.2.4 Create `docker-compose.yml`
| Attribute | Value |
|-----------|-------|
| **What** | Docker Compose V2 for PostgreSQL 16 + PostGIS |
| **How** | Service definition, volume mounts, port 5432, env vars from `.env` |
| **Inputs** | `.env.example` |
| **Outputs** | `docker-compose.yml` |
| **Note** | No `version` field (Docker Compose V2) |
#### 1.2.5 Create `scripts/` directory structure
| Attribute | Value |
|-----------|-------|
| **What** | Full scripts tree per project plan |
| **How** | `mkdir -p scripts/{db,docker,deploy,dbt,dev}` |
| **Outputs** | `scripts/db/`, `scripts/docker/`, `scripts/deploy/`, `scripts/dbt/`, `scripts/dev/` |
#### 1.2.10 Create `scripts/db/init.sh`
| Attribute | Value |
|-----------|-------|
| **What** | Database initialization with PostGIS |
| **How** | `CREATE DATABASE`, `CREATE EXTENSION postgis`, schema creation |
| **Standard** | `set -euo pipefail`, usage comment, idempotent |
| **Outputs** | `scripts/db/init.sh` |
### 1.3 Application Foundation
#### 1.3.1 Create `portfolio_app/` directory structure
| Attribute | Value |
|-----------|-------|
| **What** | Full application tree per project plan |
| **Directories** | `portfolio_app/`, `portfolio_app/assets/`, `portfolio_app/assets/images/`, `portfolio_app/pages/`, `portfolio_app/pages/toronto/`, `portfolio_app/pages/toronto/callbacks/`, `portfolio_app/components/`, `portfolio_app/figures/`, `portfolio_app/toronto/`, `portfolio_app/toronto/parsers/`, `portfolio_app/toronto/loaders/`, `portfolio_app/toronto/schemas/`, `portfolio_app/toronto/models/`, `portfolio_app/toronto/transforms/`, `portfolio_app/errors/` |
| **Pattern** | Callbacks in `pages/{dashboard}/callbacks/` per project plan |
#### 1.3.3 Create `config.py`
| Attribute | Value |
|-----------|-------|
| **What** | Pydantic BaseSettings for config |
| **How** | Settings class loading from `.env` |
| **Fields** | DATABASE_URL, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB, DASH_DEBUG, SECRET_KEY, LOG_LEVEL |
#### 1.3.5 Create `exceptions.py`
| Attribute | Value |
|-----------|-------|
| **What** | Exception hierarchy per project plan |
| **Classes** | `PortfolioError` (base), `ParseError`, `ValidationError`, `LoadError` |
#### 1.3.6 Create `handlers.py`
| Attribute | Value |
|-----------|-------|
| **What** | Error handling decorators |
| **How** | Decorators for: logging/re-raise, retry logic, transaction boundaries, timing |
| **Pattern** | Infrastructure concerns only; domain logic uses explicit handling |
#### 1.3.7 Create `app.py`
| Attribute | Value |
|-----------|-------|
| **What** | Dash app factory with Pages routing |
| **How** | `Dash(__name__, use_pages=True)`, MantineProvider wrapper |
| **Imports** | External: absolute; Internal: relative (dot notation) |
#### 1.3.16 Create `conftest.py`
| Attribute | Value |
|-----------|-------|
| **What** | pytest fixtures |
| **How** | Test database fixture, sample data fixtures, app client fixture |
### 1.4 Bio Page
#### 1.4.7 Integrate bio content
| Attribute | Value |
|-----------|-------|
| **What** | Content from `bio_content_v2.md` |
| **Sections** | Headline, Professional Summary, Tech Stack, Side Project, Availability |
| **Layout** | Hero → Summary → Tech Stack → Project Cards → Social Links → Availability |
#### 1.4.8 Replace social link placeholders
| Attribute | Value |
|-----------|-------|
| **What** | Replace `[USERNAME]` in LinkedIn/GitHub URLs |
| **Source** | `bio_content_v2.md` → Social Links |
| **Acceptance** | No placeholder text in production |
#### 1.4.9 Implement project cards
| Attribute | Value |
|-----------|-------|
| **What** | Dynamic project card display |
| **Logic** | Show deployed projects with links; show "In Development" for in-progress; hide or grey out planned |
| **Source** | `bio_content_v2.md` → Portfolio Projects Section |
### 2.1 Data Acquisition
#### 2.1.1 Define TRREB year scope + download PDFs
| Attribute | Value |
|-----------|-------|
| **What** | Decide which years to parse for V1, download PDFs |
| **Decision** | 2020—present for V1 (manageable scope, consistent PDF format). Expand to 2007+ in future if needed. |
| **Output** | `data/toronto/raw/trreb/market_watch_YYYY_MM.pdf` |
| **Note** | PDF format may vary pre-2018; test before committing to older years |
#### 2.1.2 Digitize TRREB district boundaries
| Attribute | Value |
|-----------|-------|
| **What** | GeoJSON with ~35 district polygons |
| **Tool** | QGIS |
| **Process** | Import PDF as raster → create vector layer → trace polygons → add attributes (district_code, district_name, area_type) → export GeoJSON (WGS84/EPSG:4326) |
| **Input** | TRREB Toronto.pdf map |
| **Output** | `data/toronto/raw/geo/trreb_districts.geojson` |
| **Effort** | High |
| **Complexity** | High |
| **Note** | HUMAN TASK — not automatable |
#### 2.1.5 Extract CMHC zone boundaries
| Attribute | Value |
|-----------|-------|
| **What** | GeoJSON with ~20 zone polygons |
| **Tool** | R with cmhc and sf packages |
| **Process** | `get_cmhc_geography(geography_type="ZONE", cma="Toronto")``st_write()` to GeoJSON |
| **Output** | `data/toronto/raw/geo/cmhc_zones.geojson` |
#### 2.1.9 Research Tier 1 policy events
| Attribute | Value |
|-----------|-------|
| **What** | Federal/provincial policy events with dates, descriptions, expected direction |
| **Sources** | Bank of Canada, OSFI, Ontario Legislature |
| **Schema** | event_date, effective_date, level, category, title, description, expected_direction, source_url, confidence |
| **Acceptance** | Minimum 10 events, maximum 20 |
| **Examples** | BoC rate decisions, OSFI B-20, Ontario Fair Housing Plan, foreign buyer tax |
#### 2.1.13 Test TRREB parser across year boundaries
| Attribute | Value |
|-----------|-------|
| **What** | Verify parser handles PDFs from different years |
| **Test Cases** | 2020 Q1, 2022 Q1, 2024 Q1 (minimum) |
| **Check For** | Table structure changes, column naming variations, page number shifts |
| **Output** | Documented format variations, parser fallbacks if needed |
### 2.2 Data Processing
#### 2.2.3 Build TRREB PDF parser
| Attribute | Value |
|-----------|-------|
| **What** | Extract summary tables from TRREB PDFs |
| **Tool** | pdfplumber or camelot-py |
| **Location** | Pages 3-4 (Summary by Area) |
| **Fields** | report_date, area_code, area_name, area_type, sales, dollar_volume, avg_price, median_price, new_listings, active_listings, avg_sp_lp, avg_dom |
| **Output** | `portfolio_app/toronto/parsers/trreb.py` |
#### 2.2.7 CMHC reliability code handling
| Attribute | Value |
|-----------|-------|
| **What** | Parse reliability codes, handle suppression |
| **Codes** | a (excellent), b (good), c (fair), d (poor/caution), ** (suppressed → NULL) |
| **Implementation** | Pydantic validators, enum type |
### 2.3 Database Layer
#### 2.3.8 Create dimension models
| Attribute | Value |
|-----------|-------|
| **What** | SQLAlchemy 2.0 models for dimensions |
| **Tables** | `dim_time`, `dim_trreb_district`, `dim_cmhc_zone`, `dim_neighbourhood`, `dim_policy_event` |
| **Geometry** | PostGIS geometry columns for districts, zones, neighbourhoods |
| **Note** | `dim_neighbourhood` has no FK to facts in V1 |
#### 2.3.9 Create fact models
| Attribute | Value |
|-----------|-------|
| **What** | SQLAlchemy 2.0 models for facts |
| **Tables** | `fact_purchases`, `fact_rentals` |
| **FKs** | fact_purchases → dim_time, dim_trreb_district; fact_rentals → dim_time, dim_cmhc_zone |
### 2.4 dbt Transformation
#### 2.4.8 Create staging models
| Attribute | Value |
|-----------|-------|
| **What** | 1:1 source mapping, cleaned and typed |
| **Models** | `stg_trreb__monthly`, `stg_cmhc__rental` |
| **Naming** | `stg_{source}__{entity}` |
#### 2.4.11 Create dbt schema tests
| Attribute | Value |
|-----------|-------|
| **What** | Data quality tests |
| **Tests** | `unique` (PKs), `not_null` (required), `accepted_values` (reliability codes, area_type), `relationships` (FK integrity) |
#### 2.4.12 Create custom dbt tests
| Attribute | Value |
|-----------|-------|
| **What** | Anomaly detection rules |
| **Rules** | Price MoM change >30% → flag; missing districts → fail; duplicate records → fail |
### 2.5 Visualization
#### 2.5.2 Build choropleth factory
| Attribute | Value |
|-----------|-------|
| **What** | Reusable choropleth_mapbox figure generator |
| **Inputs** | GeoDataFrame, metric column, color config |
| **Output** | Plotly figure |
| **Location** | `portfolio_app/figures/choropleth.py` |
#### 2.5.4—2.5.6 Statistical chart factories
| Attribute | Value |
|-----------|-------|
| **What** | Statistical analysis visualizations |
| **Charts** | YoY change with variance bands, seasonality decomposition, district correlation matrix |
| **Location** | `portfolio_app/figures/statistical.py` |
| **Why** | Required skill demonstration per project plan |
#### 2.5.8 Create dashboard layout
| Attribute | Value |
|-----------|-------|
| **What** | Toronto dashboard page structure |
| **File** | `portfolio_app/pages/toronto/dashboard.py` |
| **Pattern** | Layout only — no callbacks in this file |
| **Components** | Navbar, choropleth map, time controls, layer toggles, time series panel, statistics panel, footer |
#### 2.5.13—2.5.16 Create callbacks
| Attribute | Value |
|-----------|-------|
| **What** | Dashboard interaction logic |
| **Location** | `portfolio_app/pages/toronto/callbacks/` |
| **Files** | `__init__.py`, `map_callbacks.py`, `filter_callbacks.py`, `timeseries_callbacks.py` |
| **Pattern** | Separate from layout per project plan callback separation pattern |
| **Registration** | Import callback modules in `callbacks/__init__.py`; import that package in `dashboard.py`. Dash Pages auto-discovers callbacks when module is imported. |
#### 2.5.22 Test dashboard renders with sample data
| Attribute | Value |
|-----------|-------|
| **What** | Verify dashboard works end-to-end |
| **Sample Data** | Use output from task 2.3.12 (fact loaders). Run loaders with subset of parsed data before this task. |
| **Verify** | Choropleth renders, time controls work, tooltips display, no console errors |
---
## Sprint Plan
### Sprint 1: Project Bootstrap + Start TRREB Digitization
**Goal**: Dev environment working, repo initialized, TRREB digitization started
| Task ID | Task | Effort |
|---------|------|--------|
| 1.1.1 | Git repo init | Low |
| 1.1.2 | .gitignore | Low |
| 1.1.3 | pyproject.toml | Low |
| 1.1.4 | .python-version | Low |
| 1.1.5 | .env.example | Low |
| 1.1.6 | README.md (initial) | Low |
| 1.1.7 | CLAUDE.md | Low |
| 1.1.8 | Makefile | Low |
| 1.2.1 | Python env setup | Low |
| 1.2.2 | .pre-commit-config.yaml | Low |
| 1.2.3 | Install pre-commit | Low |
| 1.2.4 | docker-compose.yml | Low |
| 1.2.5 | scripts/ directory structure | Low |
| 1.2.6—1.2.9 | Docker scripts | Low |
| 1.2.10 | scripts/db/init.sh | Low |
| 1.2.11 | scripts/dev/setup.sh | Low |
| 1.2.12 | Verify Docker + PostGIS | Low |
| 1.3.1 | portfolio_app/ directory structure | Low |
| 1.3.2—1.3.6 | App foundation files | Low |
| 1.3.14—1.3.17 | Test infrastructure | Low |
| 2.1.1 | Download TRREB PDFs | Low |
| 2.1.2 | **START** TRREB boundaries (HUMAN) | High |
| 2.1.9 | **START** Policy events research | Mid |
---
### Sprint 2: Bio Page + Data Acquisition
**Goal**: Bio live, all raw data downloaded
| Task ID | Task | Effort |
|---------|------|--------|
| 1.3.7 | app.py with Pages | Low |
| 1.3.8 | Theme config | Low |
| 1.3.9—1.3.13 | Assets directory + files | Low |
| 1.4.1—1.4.4 | Components | Low |
| 1.4.5—1.4.10 | Bio page | Low |
| 1.5.1—1.5.3 | VPS setup | Low |
| 1.5.4—1.5.6 | Gunicorn/Nginx/SSL | Low |
| 1.5.7—1.5.8 | Deploy scripts | Low |
| 1.5.9—1.5.10 | Deploy + verify | Low |
| 2.1.2 | **CONTINUE** TRREB boundaries | High |
| 2.1.3—2.1.4 | CMHC registration + export | Low |
| 2.1.5 | CMHC zone boundaries (R) | Low |
| 2.1.6 | Neighbourhoods GeoJSON | Low |
| 2.1.7 | Neighbourhood Profiles download | Low |
| 2.1.9 | **CONTINUE** Policy events research | Mid |
| 2.1.10 | policy_events.csv | Low |
| 2.1.11—2.1.12 | data/ directory + organize | Low |
**Milestone**: **Launch 1 — Bio Live**
---
### Sprint 3: Parsers + Schemas + Models
**Goal**: ETL pipeline working, database layer complete
| Task ID | Task | Effort |
|---------|------|--------|
| 2.1.2 | **COMPLETE** TRREB boundaries | High |
| 2.1.8 | CRS validation | Low |
| 2.2.1—2.2.2 | Toronto module init | Low |
| 2.2.3—2.2.5 | TRREB parser + tests | Mid |
| 2.2.6—2.2.8 | CMHC processor + tests | Low |
| 2.2.9 | Neighbourhood Profiles parser | Low |
| 2.2.10 | Policy events loader | Low |
| 2.3.1—2.3.5 | Pydantic schemas | Low |
| 2.3.6—2.3.9 | SQLAlchemy models | Low |
---
### Sprint 4: Loaders + dbt
**Goal**: Data loaded, transformation layer ready
| Task ID | Task | Effort |
|---------|------|--------|
| 2.3.10—2.3.13 | Loaders + tests | Mid |
| 2.3.14 | SQL views | Low |
| 2.4.1—2.4.7 | dbt setup + scripts | Low |
| 2.4.8—2.4.10 | dbt models | Low |
| 2.4.11—2.4.12 | dbt tests | Low |
| 2.4.13 | dbt documentation | Low |
| 2.7.1—2.7.3 | DB backup/restore scripts | Low |
---
### Sprint 5: Visualization
**Goal**: Dashboard functional
| Task ID | Task | Effort |
|---------|------|--------|
| 2.5.1—2.5.6 | Figure factories | Mid |
| 2.5.7—2.5.12 | Dashboard layout + controls | Mid |
| 2.5.13—2.5.16 | Callbacks | Mid |
| 2.5.17—2.5.21 | Tooltips + overlays + markers | Low |
| 2.5.22 | Test dashboard | Low |
---
### Sprint 6: Polish + Launch 2
**Goal**: Dashboard deployed
| Task ID | Task | Effort |
|---------|------|--------|
| 2.6.1—2.6.6 | Documentation | Low |
| 2.7.4—2.7.5 | Rollback script + retention | Low |
| 2.7.6—2.7.7 | Health endpoint + monitoring | Low |
| 2.7.8—2.7.9 | Deploy + verify | Low |
**Milestone**: **Launch 2 — Toronto Dashboard Live**
---
### Sprint 7: Buffer
**Goal**: Contingency for slippage, bug fixes
| Task ID | Task | Effort |
|---------|------|--------|
| — | Overflow from previous sprints | Varies |
| — | Bug fixes | Varies |
| — | UX polish | Low |
---
## Sprint Summary
| Sprint | Focus | Key Risk | Milestone |
|--------|-------|----------|-----------|
| 1 | Bootstrap + start boundaries | — | — |
| 2 | Bio + data acquisition | TRREB digitization | Launch 1 |
| 3 | Parsers + DB layer | PDF parser, boundaries | — |
| 4 | Loaders + dbt | — | — |
| 5 | Visualization | Choropleth complexity | — |
| 6 | Polish + deploy | — | Launch 2 |
| 7 | Buffer | — | — |
---
## Dependency Graph
### Launch 1 Critical Path
```
1.1.1 → 1.1.3 → 1.2.1 → 1.3.1 → 1.3.7 → 1.4.6 → 1.4.10 → 1.5.9 → 1.5.10
```
### Launch 2 Critical Path
```
2.1.2 (TRREB boundaries) ─┬→ 2.1.8 (CRS) → 2.5.2 (choropleth) → 2.5.8 (layout) → 2.5.22 (test) → 2.7.8 (deploy)
2.1.1 → 2.2.3 (parser) → 2.2.4 → 2.3.12 (loaders) → 2.4.8 (dbt) ─┘
```
### Parallel Tracks (can run simultaneously)
| Track | Tasks | Can Start |
|-------|-------|-----------|
| **A: TRREB Boundaries** | 2.1.1 → 2.1.2 | Sprint 1 |
| **B: TRREB Parser** | 2.2.3—2.2.5 | Sprint 2 (after PDFs) |
| **C: CMHC** | 2.1.3—2.1.5 → 2.2.6—2.2.8 | Sprint 2 |
| **D: Enrichment** | 2.1.6—2.1.7 → 2.2.9 | Sprint 2 |
| **E: Policy Events** | 2.1.9—2.1.10 → 2.2.10 | Sprint 1—2 |
| **F: Schemas/Models** | 2.3.1—2.3.9 | Sprint 3 (after parsers) |
| **G: dbt** | 2.4.* | Sprint 4 (after loaders) |
| **H: Ops Scripts** | 2.7.1—2.7.5 | Sprint 4 |
---
## Risk Register
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| TRREB digitization slips | Medium | High | Start Sprint 1; timebox; accept lower precision initially |
| PDF parser breaks on older years | Medium | Medium | Test multiple years early; build fallbacks |
| PostGIS geometry issues | Low | Medium | Validate CRS before load (2.1.8) |
| Choropleth performance | Low | Medium | Pre-aggregate; simplify geometries |
| Policy events research takes too long | Medium | Low | Cap at 10 events minimum; expand post-launch |
---
## Acceptance Criteria
### Launch 1
- [ ] Bio page accessible via HTTPS
- [ ] All content from `bio_content_v2.md` rendered
- [ ] No placeholder text ([USERNAME]) visible
- [ ] Mobile responsive
- [ ] Social links functional
### Launch 2
- [ ] Choropleth renders TRREB districts
- [ ] Choropleth renders CMHC zones
- [ ] Purchase/rental mode toggle works
- [ ] Time navigation works (monthly for TRREB, annual for CMHC)
- [ ] Policy event markers visible on time series
- [ ] Neighbourhood overlay toggleable
- [ ] Methodology documentation published
- [ ] Data sources cited
- [ ] Health endpoint responds
---
## Effort Legend
| Level | Meaning |
|-------|---------|
| **Low** | Straightforward; minimal iteration expected |
| **Mid** | Requires debugging or multi-step coordination |
| **High** | Complex logic, external tools, or human intervention required |
---
*Document Version: 4.1*
*Created: January 2026*