
# Toronto Housing Price Dashboard
## Portfolio Project — Data Specification & Architecture
**Version**: 5.1
**Last Updated**: January 2026
**Status**: Specification Complete
---
## Document Context
| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Detailed specification for Toronto Housing Dashboard |
| **Scope** | Data schemas, source URLs, geographic boundaries, V1/V2 decisions |
**Rule**: For overall project scope, phasing, tech stack, and deployment architecture, see `portfolio_project_plan_v5.md`. This document provides implementation-level detail for the Toronto Housing project specifically.
**Terminology Note**: This document uses **Stages 1-4** to describe Toronto Housing implementation steps. These are distinct from the **Phases 1-5** in `portfolio_project_plan_v5.md`, which describe the overall portfolio project lifecycle.
---
## Project Overview
A dashboard analyzing housing price variations across Toronto neighbourhoods over time, with dual analysis tracks:
| Track | Data Domain | Primary Source | Geographic Unit |
|-------|-------------|----------------|-----------------|
| **Purchases** | Sales transactions | TRREB Monthly Reports | ~35 Districts |
| **Rentals** | Rental market stats | CMHC Rental Market Survey | ~20 Zones |
**Core Visualization**: Interactive choropleth map of Toronto with toggle between rental/purchase analysis, time-series exploration by month/year.
**Enrichment Layer** (V1: overlay only): Neighbourhood-level demographic and socioeconomic context, including population density, educational attainment, and income. Crime data is deferred to Phase 4 of the portfolio project (post-Energy project).
**Tech Stack & Deployment**: See `portfolio_project_plan_v5.md` → Tech Stack, Deployment Architecture
---
## Geographic Layers
### Layer Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Official Neighbourhoods (158) │ ← Reference overlay + Enrichment data
├─────────────────────────────────────────────────────────────────┤
│ TRREB Districts (~35) — W01, C01, E01, etc. │ ← Purchase data
├─────────────────────────────────────────────────────────────────┤
│ CMHC Survey Zones (~20) — Census Tract aligned │ ← Rental data
└─────────────────────────────────────────────────────────────────┘
```
### Boundary Files
| Layer | Zones | Format | Source | Status |
|-------|-------|--------|--------|--------|
| **City Neighbourhoods** | 158 | GeoJSON, Shapefile | [GitHub - jasonicarter/toronto-geojson](https://github.com/jasonicarter/toronto-geojson) | ✅ Ready to use |
| **TRREB Districts** | ~35 | PDF only | [TRREB Toronto Map PDF](https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf) | ⚠ Requires manual digitization |
| **CMHC Zones** | ~20 | R package | R `cmhc` package via `get_cmhc_geography()` | ✅ Available (see note) |
### Digitization Task: TRREB Districts
**Input**: TRREB Toronto PDF map
**Output**: GeoJSON with district codes (W01-W10, C01-C15, E01-E11)
**Tool**: QGIS
**Process**:
1. Import PDF as raster layer in QGIS
2. Create vector layer with polygon features
3. Trace district boundaries
4. Add attributes: `district_code`, `district_name`, `area_type` (West/Central/East)
5. Export as GeoJSON (WGS84 / EPSG:4326)
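A quick sanity check on the exported file can catch missed or mistyped polygons before the layer is loaded. The sketch below uses only the standard library. It assumes the export is a `FeatureCollection` whose features carry the `district_code` attribute from step 4, and that the full code set is exactly the ranges listed under **Output**. The function names are illustrative and not part of the project code.

```python
import json
from collections import Counter


def expected_district_codes():
    """Codes listed in this spec: W01-W10, C01-C15, E01-E11."""
    ranges = {"W": 10, "C": 15, "E": 11}
    return {
        f"{prefix}{number:02d}"
        for prefix, count in ranges.items()
        for number in range(1, count + 1)
    }


def check_district_geojson(geojson_text):
    """Return missing, unexpected, and duplicated district codes."""
    collection = json.loads(geojson_text)
    # Features without a district_code show up as the string "None".
    counts = Counter(
        str((feature.get("properties") or {}).get("district_code"))
        for feature in collection.get("features", [])
    )
    expected = expected_district_codes()
    seen = set(counts)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
        "duplicated": sorted(code for code, n in counts.items() if n > 1),
    }
```

Running `check_district_geojson(open(path).read())` on the QGIS export should return three empty lists once digitization is complete.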
### CMHC Zone Boundaries
**Source**: The R `cmhc` package provides CMHC survey geography via the `get_cmhc_geography()` function.
**Extraction Process**:
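```r
# One-time extraction in R
library(cmhc)
library(sf)

# Get Toronto CMA zones.
# Assumption: the argument names below follow this spec's draft; confirm them
# against ?get_cmhc_geography for the installed cmhc version before running.
toronto_zones <- get_cmhc_geography(
  geography_type = "ZONE",
  cma = "Toronto"
)

# Reproject to WGS84 (EPSG:4326) so the layer matches the other boundaries
toronto_zones <- st_transform(toronto_zones, 4326)

# Export to GeoJSON for Python/PostGIS (overwrite any earlier export)
st_write(toronto_zones, "data/toronto/raw/geo/cmhc_zones.geojson",
         driver = "GeoJSON", delete_dsn = TRUE)
```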
**Output**: `data/toronto/raw/geo/cmhc_zones.geojson`
**Why R?**: CMHC zone boundaries are not published as standalone files, and the `cmhc` R package is the only reliable programmatic source identified for them. The extraction is a one-time step; the Python stack then works from the exported GeoJSON.
### ⚠ Neighbourhood Boundary Change (140 → 158)
The City of Toronto updated from 140 to 158 social planning neighbourhoods in **April 2021**. This affects data alignment:
| Data Source | Pre-2021 | Post-2021 | Handling |
|-------------|----------|-----------|----------|
| Census (2016 and earlier) | 140 neighbourhoods | N/A | Use 140-model files |
| Census (2021+) | N/A | 158 neighbourhoods | Use 158-model files |
**V1 Strategy**: Use 2021 Census on 158 boundaries only. Defer historical trend analysis to portfolio Phase 4.
---
## Data Source #1: TRREB Monthly Market Reports
### Source Details
| Attribute | Value |
|-----------|-------|
| **Provider** | Toronto Regional Real Estate Board |
| **URL** | [TRREB Market Watch](https://trreb.ca/index.php/market-news/market-watch) |
| **Format** | PDF (monthly reports) |
| **Update Frequency** | Monthly |
| **Historical Availability** | 2007-Present |
| **Access** | Public (aggregate data in PDFs) |
| **Extraction Method** | PDF parsing (`pdfplumber` or `camelot-py`) |
### Available Tables
#### Table: `trreb_monthly_summary`
**Location in PDF**: Pages 3-4 (Summary by Area)
| Column | Data Type | Description |
|--------|-----------|-------------|
| `report_date` | DATE | First of month (YYYY-MM-01) |
| `area_code` | VARCHAR(3) | District code (W01, C01, E01, etc.) |
| `area_name` | VARCHAR(100) | District name |
| `area_type` | VARCHAR(10) | West / Central / East / North |
| `sales` | INTEGER | Number of transactions |
| `dollar_volume` | DECIMAL | Total sales volume ($) |
| `avg_price` | DECIMAL | Average sale price ($) |
| `median_price` | DECIMAL | Median sale price ($) |
| `new_listings` | INTEGER | New listings count |
| `active_listings` | INTEGER | Active listings at month end |
| `avg_sp_lp` | DECIMAL | Avg sale price / list price ratio (%) |
| `avg_dom` | INTEGER | Average days on market |
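Because the values come from parsed PDFs, a cheap internal-consistency check on each extracted row helps flag misaligned columns. The sketch below is a plain `dataclass` stand-in, not the project's Pydantic schema. It covers only a subset of the columns and assumes that `avg_price` is approximately `dollar_volume / sales`; the 1% tolerance for rounding is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class TrrebMonthlySummary:
    """Subset of `trreb_monthly_summary` columns used for validation."""
    report_date: date
    area_code: str
    sales: int
    dollar_volume: Decimal
    avg_price: Decimal

    def problems(self, tolerance=Decimal("0.01")):
        """Return a list of human-readable issues (empty if the row looks sane)."""
        issues = []
        if self.report_date.day != 1:
            issues.append("report_date must be the first of the month")
        if len(self.area_code) != 3:
            issues.append("area_code must be 3 characters")
        if self.sales < 0:
            issues.append("sales must be non-negative")
        if self.sales > 0:
            # Assumed relationship: average price = total volume / transactions.
            implied = self.dollar_volume / self.sales
            if abs(implied - self.avg_price) > tolerance * self.avg_price:
                issues.append("avg_price disagrees with dollar_volume / sales")
        return issues
```

Rows that return a non-empty list are candidates for manual review rather than automatic rejection, since the PDF layout may vary across years.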
### Dimensions
| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Monthly | 2007-01 to present |
| **Geography** | District | ~35 TRREB districts |
| **Property Type** | Aggregate | All residential (no breakdown in summary) |
### Metrics Available
| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `avg_price` | Pre-calculated monthly avg | Primary price indicator |
| `median_price` | Pre-calculated monthly median | Robust price indicator |
| `sales` | Count | Market activity volume |
| `avg_dom` | Average | Market velocity |
| `avg_sp_lp` | Ratio | Buyer/seller market indicator |
| `new_listings` | Count | Supply indicator |
| `active_listings` | Snapshot | Inventory level |
### ⚠ Limitations
- No transaction-level data (aggregates only)
- Property type breakdown requires parsing additional tables
- PDF structure may vary slightly across years
- District boundaries haven't changed since 2011
---
## Data Source #2: CMHC Rental Market Survey
### Source Details
| Attribute | Value |
|-----------|-------|
| **Provider** | Canada Mortgage and Housing Corporation |
| **URL** | [CMHC Housing Market Information Portal](https://www03.cmhc-schl.gc.ca/hmip-pimh/) |
| **Format** | CSV export, API |
| **Update Frequency** | Annual (October survey) |
| **Historical Availability** | 1990-Present |
| **Access** | Public, free registration for bulk downloads |
| **Geographic Levels** | CMA → Zone → Neighbourhood → Census Tract |
### Available Tables
#### Table: `cmhc_rental_summary`
**Portal Path**: Toronto → Primary Rental Market → Summary Statistics
| Column | Data Type | Description |
|--------|-----------|-------------|
| `survey_year` | INTEGER | Survey year (October) |
| `zone_code` | VARCHAR(10) | CMHC zone identifier |
| `zone_name` | VARCHAR(100) | Zone name |
| `bedroom_type` | VARCHAR(20) | Bachelor / 1-Bed / 2-Bed / 3-Bed+ / Total |
| `universe` | INTEGER | Total rental units in zone |
| `vacancy_rate` | DECIMAL | Vacancy rate (%) |
| `vacancy_rate_reliability` | VARCHAR(2) | Reliability code (a/b/c/d, or `**` when suppressed) |
| `availability_rate` | DECIMAL | Availability rate (%) |
| `average_rent` | DECIMAL | Average monthly rent ($) |
| `average_rent_reliability` | VARCHAR(2) | Reliability code (a/b/c/d, or `**` when suppressed) |
| `median_rent` | DECIMAL | Median monthly rent ($) |
| `rent_change_pct` | DECIMAL | YoY rent change (%) |
| `turnover_rate` | DECIMAL | Unit turnover rate (%) |
### Dimensions
| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Annual | 1990 to present (October snapshot) |
| **Geography** | Zone | ~20 CMHC zones in Toronto CMA |
| **Bedroom Type** | Category | Bachelor, 1-Bed, 2-Bed, 3-Bed+, Total |
| **Structure Type** | Category | Row, Apartment (available in detailed tables) |
### Metrics Available
| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `average_rent` | Pre-calculated avg | Primary rent indicator |
| `median_rent` | Pre-calculated median | Robust rent indicator |
| `vacancy_rate` | Percentage | Market tightness |
| `availability_rate` | Percentage | Supply accessibility |
| `turnover_rate` | Percentage | Tenant mobility |
| `rent_change_pct` | YoY % | Rent growth tracking |
| `universe` | Count | Market size |
### Reliability Codes
| Code | Meaning | Coefficient of Variation |
|------|---------|-------------------------|
| `a` | Excellent | CV ≤ 2.5% |
| `b` | Very good | 2.5% < CV ≤ 5% |
| `c` | Good | 5% < CV ≤ 7.5% |
| `d` | Fair (use with caution) | 7.5% < CV ≤ 10% |
| `**` | Data suppressed | CV > 10% or sample too small |
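In practice the reliability code decides whether a value should be plotted at all. The helper below is a minimal sketch of one possible policy: suppressed values are always dropped, and `d` values are dropped unless the caller opts in. The function name and the opt-in flag are assumptions for illustration, not a CMHC or project convention.

```python
from typing import Optional

VALID_CODES = {"a", "b", "c", "d", "**"}
USE_WITH_CAUTION = {"d"}
SUPPRESSED = {"**"}


def usable_value(value: Optional[float], code: str,
                 allow_caution: bool = False) -> Optional[float]:
    """Return the value if its reliability code permits display, else None."""
    code = code.strip().lower()
    if code not in VALID_CODES:
        raise ValueError(f"unknown reliability code: {code!r}")
    if value is None or code in SUPPRESSED:
        return None
    if code in USE_WITH_CAUTION and not allow_caution:
        return None
    return value
```

Returning `None` rather than zero keeps suppressed zones visibly blank on the choropleth instead of rendering them as the lowest value.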
### ⚠ Limitations
- Annual only (no monthly granularity)
- October snapshot (point-in-time)
- Zones are larger than TRREB districts
- Purpose-built rental only (excludes condo rentals in base survey)
---
## Data Source #3: City of Toronto Open Data
### Source Details
| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto |
| **URL** | [Toronto Open Data Portal](https://open.toronto.ca/) |
| **Format** | GeoJSON, Shapefile, CSV |
| **Use Case** | Reference layer, demographic enrichment |
### Relevant Datasets
#### Dataset: `neighbourhoods`
| Column | Data Type | Description |
|--------|-----------|-------------|
| `area_id` | INTEGER | Neighbourhood ID (1-158) |
| `area_name` | VARCHAR(100) | Official neighbourhood name |
| `geometry` | POLYGON | Boundary geometry |
#### Dataset: `neighbourhood_profiles` (Census-linked)
| Column | Data Type | Description |
|--------|-----------|-------------|
| `neighbourhood_id` | INTEGER | Links to neighbourhoods |
| `population` | INTEGER | Total population |
| `avg_household_income` | DECIMAL | Average household income |
| `dwelling_count` | INTEGER | Total dwellings |
| `owner_pct` | DECIMAL | % owner-occupied |
| `renter_pct` | DECIMAL | % renter-occupied |
### Enrichment Potential
Can overlay demographic context on housing data:
- Income brackets by neighbourhood
- Ownership vs rental ratios
- Population density
- Dwelling type distribution
---
## Data Source #4: Enrichment Data (Density, Education)
### Purpose
Provide socioeconomic context to housing price analysis. Enables questions like:
- Do neighbourhoods with higher education attainment have higher prices?
- How does population density correlate with price per square foot?
### Geographic Alignment Reality
**Critical constraint**: Enrichment data is available at the **158-neighbourhood** level, while core housing data sits at **TRREB districts (~35)** and **CMHC zones (~20)**. These do not align cleanly.
```
158 Neighbourhoods (fine) → Enrichment data lives here
(no clean crosswalk)
~35 TRREB Districts (coarse) → Purchase data lives here
~20 CMHC Zones (coarse) → Rental data lives here
```
### Available Enrichment Datasets
#### Dataset: Neighbourhood Profiles (Census)
| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto (via Statistics Canada Census) |
| **URL** | [Toronto Open Data - Neighbourhood Profiles](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
| **Format** | CSV, JSON, XML, XLSX |
| **Update Frequency** | Every 5 years (Census cycle) |
| **Available Years** | 2001, 2006, 2011, 2016, 2021 |
| **Geographic Unit** | 158 neighbourhoods (140 pre-2021) |
**Key Variables**:
| Variable | Description | Use Case |
|----------|-------------|----------|
| `population` | Total population | Density calculation |
| `land_area_sqkm` | Area in square kilometers | Density calculation |
| `pop_density_per_sqkm` | Population per km² | Density metric |
| `pct_bachelors_or_higher` | % age 25-64 with bachelor's+ | Education proxy |
| `median_household_income` | Median total household income | Income metric |
| `avg_household_income` | Average total household income | Income metric |
| `pct_owner_occupied` | % owner-occupied dwellings | Tenure split |
| `pct_renter_occupied` | % renter-occupied dwellings | Tenure split |
**Download URL (2021, 158 neighbourhoods)**:
```
https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx
```
### Crime Data — Deferred to Portfolio Phase 4
Crime data (TPS Neighbourhood Crime Rates) is **not included in V1 scope**. It will be added in portfolio Phase 4 after the Energy Pricing project is complete.
**Rationale**:
- Crime data is socially/politically sensitive and requires careful methodology documentation
- V1 focuses on core housing metrics and policy events
- Deferral reduces scope creep risk
**Future Reference** (Portfolio Phase 4):
- Source: [TPS Public Safety Data Portal](https://data.torontopolice.on.ca/)
- Dataset: Neighbourhood Crime Rates (Major Crime Indicators)
- Geographic Unit: 158 neighbourhoods
### V1 Enrichment Data Summary
| Measure | Source | Geography | Frequency | Format | Status |
|---------|--------|-----------|-----------|--------|--------|
| **Population Density** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Education Attainment** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Median Income** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Crime Rates (MCI)** | TPS Data Portal | 158 neighbourhoods | Annual | GeoJSON/CSV | Deferred to Portfolio Phase 4 |
---
## Data Source #5: Policy Events
### Purpose
Provide temporal context for housing price movements. Display as annotation markers on time series charts. **No causation claims** — correlation/context only.
### Event Schema
#### Table: `dim_policy_event`
| Column | Data Type | Description |
|--------|-----------|-------------|
| `event_id` | INTEGER (PK) | Auto-increment primary key |
| `event_date` | DATE | Date event was announced/occurred |
| `effective_date` | DATE | Date policy took effect (if different) |
| `level` | VARCHAR(20) | `federal` / `provincial` / `municipal` |
| `category` | VARCHAR(20) | `monetary` / `tax` / `regulatory` / `supply` / `economic` |
| `title` | VARCHAR(200) | Short event title for display |
| `description` | TEXT | Longer description for tooltip |
| `expected_direction` | VARCHAR(10) | `bearish` / `bullish` / `neutral` |
| `source_url` | VARCHAR(500) | Link to official announcement/documentation |
| `confidence` | VARCHAR(10) | `high` / `medium` / `low` |
| `created_at` | TIMESTAMP | Record creation timestamp |
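Since the event list is curated by hand, validating the enumerated columns at load time guards against typos such as `bullsh` or `province`. The following sketch uses a standard-library `dataclass` in place of the planned Pydantic model. The allowed values are taken from the table above; the class name, field defaults, and the `high` confidence in the usage example are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ALLOWED = {
    "level": {"federal", "provincial", "municipal"},
    "category": {"monetary", "tax", "regulatory", "supply", "economic"},
    "expected_direction": {"bearish", "bullish", "neutral"},
    "confidence": {"high", "medium", "low"},
}


@dataclass
class PolicyEvent:
    event_date: date
    level: str
    category: str
    title: str
    expected_direction: str
    confidence: str
    effective_date: Optional[date] = None
    description: str = ""
    source_url: str = ""

    def __post_init__(self):
        # Reject any value outside the enumerations in `dim_policy_event`.
        for field_name, allowed in ALLOWED.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} not in {sorted(allowed)}")
        if len(self.title) > 200:
            raise ValueError("title exceeds VARCHAR(200)")
```

For example, `PolicyEvent(date(2018, 1, 1), "federal", "regulatory", "OSFI B-20 Stress Test", "bearish", "high")` constructs cleanly, while a misspelled `level` raises `ValueError`.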
### Event Tiers
| Tier | Level | Category Examples | Inclusion Criteria |
|------|-------|-------------------|-------------------|
| **1** | Federal | BoC rate decisions, OSFI stress tests | Always include; objective, documented |
| **1** | Provincial | Fair Housing Plan, foreign buyer tax, rent control | Always include; legislative record |
| **2** | Municipal | Zoning reforms, development charges | Include if material impact expected |
| **2** | Economic | COVID measures, major employer closures | Include if Toronto-specific impact |
| **3** | Market | Major project announcements | Strict criteria; must be verifiable |
### Expected Direction Values
| Value | Meaning | Example |
|-------|---------|---------|
| `bullish` | Expected to increase prices | Rate cut, supply restriction |
| `bearish` | Expected to decrease prices | Rate hike, foreign buyer tax |
| `neutral` | Uncertain or mixed impact | Regulatory clarification |
### ⚠ Caveats
- **No causation claims**: Events are context, not explanation
- **Lag effects**: Policy impact may not be immediate
- **Confounding factors**: Multiple simultaneous influences
- **Display only**: No statistical analysis in V1
### Sample Events (Tier 1)
| Date | Level | Category | Title | Direction |
|------|-------|----------|-------|-----------|
| 2017-04-20 | provincial | tax | Ontario Fair Housing Plan | bearish |
| 2018-01-01 | federal | regulatory | OSFI B-20 Stress Test | bearish |
| 2020-03-27 | federal | monetary | BoC Emergency Rate Cut (to 0.25%) | bullish |
| 2022-03-02 | federal | monetary | BoC Rate Hike Cycle Begins | bearish |
| 2023-01-01 | federal | regulatory | Federal 2-Year Foreign Buyer Ban | bearish |
---
## Data Integration Strategy
### Temporal Alignment
| Source | Native Frequency | Alignment Strategy |
|--------|------------------|---------------------|
| TRREB | Monthly | Use as-is |
| CMHC | Annual (October) | Spread to monthly OR display annual overlay |
| Census/Enrichment | 5-year | Static snapshot; display as reference |
| Policy Events | Event-based | Display as vertical markers on time axis |
**Recommendation**: Keep separate time axes. TRREB monthly for purchases, CMHC annual for rentals. Don't force artificial monthly rental data.
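If the annual overlay option is chosen, one way to keep the rental series honest is to place each survey value on its October point and leave every other month empty, rather than interpolating. The sketch below illustrates that idea with hypothetical helper names and made-up rent values.

```python
from datetime import date


def monthly_axis(start_year, end_year):
    """First-of-month dates, matching TRREB `report_date` values."""
    return [
        date(year, month, 1)
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]


def annual_overlay(rent_by_year, axis):
    """Put each CMHC survey value on its October point; other months stay None."""
    return [
        rent_by_year.get(point.year) if point.month == 10 else None
        for point in axis
    ]


axis = monthly_axis(2021, 2022)
overlay = annual_overlay({2021: 1500.0, 2022: 1600.0}, axis)  # illustrative rents
```

Most plotting libraries render `None` as a gap, so the rental series appears as isolated annual markers over the monthly purchase line.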
### Geographic Alignment
```
┌─────────────────────────────────────────────────────────────────┐
│                     VISUALIZATION APPROACH                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Purchase Mode                   Rental Mode                   │
│   ─────────────────               ──────────────                │
│   Map: TRREB Districts            Map: CMHC Zones               │
│   Time: Monthly slider            Time: Annual selector         │
│   Metrics: Price, Sales           Metrics: Rent, Vacancy        │
│                                                                 │
│   ┌───────────────────────────────────────────────────────┐     │
│   │              City Neighbourhoods Overlay              │     │
│   │          (158 boundaries as reference layer)          │     │
│   │    + Enrichment data (density, education, income)     │     │
│   └───────────────────────────────────────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### Enrichment Integration Strategy (Phased)
#### V1: Reference Overlay (Current Scope)
**Approach**: Display neighbourhood enrichment as a separate toggle-able layer. No joins to housing data.
**UX**:
- User hovers over TRREB district → tooltip shows "This district contains neighbourhoods: Annex, Casa Loma, Yorkville..."
- User toggles "Show Enrichment" → choropleth switches to neighbourhood-level density/education/income
- Enrichment and housing metrics displayed side-by-side, not merged
**Pros**:
- No imputation or dodgy aggregations
- Honest about geographic mismatch
- Ships faster
**Cons**:
- Can't do correlation analysis (price vs. enrichment) directly in dashboard
**Implementation**:
- `dim_neighbourhood` as standalone dimension (no FK to fact tables)
- Spatial lookup on hover (point-in-polygon)
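The point-in-polygon lookup can be illustrated with the classic ray-casting test. This is a dependency-free sketch only; the deployed app would more plausibly rely on PostGIS or a geometry library. It assumes GeoJSON-style `Polygon` coordinates (an exterior ring followed by optional holes) in lon/lat order, and the function names are hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon, lat, polygon):
    """polygon = [exterior_ring, hole_1, ...] as in GeoJSON Polygon coordinates."""
    exterior, *holes = polygon
    if not point_in_ring(lon, lat, exterior):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in holes)


def find_neighbourhood(lon, lat, neighbourhoods):
    """Return the first neighbourhood name whose polygon contains the point."""
    for name, polygon in neighbourhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

With 158 polygons a linear scan per hover event is likely acceptable; a spatial index only becomes worthwhile if the lookup proves slow in practice.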
#### V2/Portfolio Phase 4: Area-Weighted Aggregation (Future Scope)
**Approach**: Pre-compute area-weighted averages of neighbourhood metrics for each TRREB district and CMHC zone.
**Process**:
1. Spatial join: intersect neighbourhood polygons with TRREB/CMHC polygons
2. Compute overlap area for each neighbourhood-district pair
3. Weight neighbourhood metrics by overlap area proportion
4. User selects aggregation method in UI
**Aggregation Methods to Expose**:
| Method | Description | Best For |
|--------|-------------|----------|
| **Area-weighted mean** | Weight by % overlap area | Continuous metrics (density) |
| **Population-weighted mean** | Weight by population in overlap | Per-capita metrics (education) |
| **Majority assignment** | Assign neighbourhood to district with >50% overlap | Categorical data |
| **Max overlap** | Assign to single district with largest overlap | 1:1 mapping needs |
**Default**: Population-weighted (more defensible for per-capita metrics). Hide selector behind "Advanced" toggle.
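The difference between the area-weighted and population-weighted methods is easiest to see on a toy example. The sketch below assumes the spatial join has already produced one row per neighbourhood-district intersection; the column names `overlap_area_km2` and `overlap_population` and all values are made up for illustration.

```python
from collections import defaultdict


def aggregate_by_district(overlaps, value_key, weight_key):
    """Weighted mean of a neighbourhood metric for each district."""
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for row in overlaps:
        district = row["district_code"]
        weighted_sum[district] += row[value_key] * row[weight_key]
        total_weight[district] += row[weight_key]
    # Districts with zero total weight are omitted rather than divided by zero.
    return {
        district: weighted_sum[district] / weight
        for district, weight in total_weight.items()
        if weight > 0
    }


# Hypothetical intersection rows for a single district
overlaps = [
    {"district_code": "C01", "neighbourhood": "A", "pct_bachelors_or_higher": 10.0,
     "overlap_area_km2": 1.0, "overlap_population": 300},
    {"district_code": "C01", "neighbourhood": "B", "pct_bachelors_or_higher": 20.0,
     "overlap_area_km2": 3.0, "overlap_population": 100},
]
by_area = aggregate_by_district(overlaps, "pct_bachelors_or_higher", "overlap_area_km2")
by_population = aggregate_by_district(overlaps, "pct_bachelors_or_higher", "overlap_population")
print(by_area, by_population)  # {'C01': 17.5} {'C01': 12.5}
```

Neighbourhood B covers most of the area but holds few residents, so the area-weighted figure is pulled toward B while the population-weighted figure stays closer to A. This is the reason a per-capita metric is better served by the population-weighted default.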
### V1 Future-Proofing (Do Now)
| Action | Why |
|--------|-----|
| Store neighbourhood boundaries in same CRS as TRREB/CMHC (WGS84) | Avoids reprojection headaches |
| Keep `dim_neighbourhood` normalized (not denormalized into district tables) | Clean separation for V2 join |
| Document Census year for each metric | Ready for 2026 Census |
| Include `census_year` column in dim_neighbourhood | Enables SCD tracking |
### V1 Defer (Don't Do Yet)
| Action | Why Not |
|--------|---------|
| Pre-compute area-weighted crosswalk | Don't need for V1 |
| Build aggregation method selector UI | No backend to support it |
| Crime data integration | Deferred to Portfolio Phase 4 |
| Historical neighbourhood boundary reconciliation (140→158) | Use 2021+ data only for V1 |
---
## Proposed Data Model
### Star Schema
```
                      ┌────────────────────┐
                      │ dim_time           │
                      ├────────────────────┤
                      │ date_key (PK)      │
                      │ year               │
                      │ month              │
                      │ quarter            │
                      │ month_name         │
                      └─────────┬──────────┘
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
┌─────────┴──────────┐          │           ┌─────────┴──────────┐
│ dim_trreb_district │          │           │ dim_cmhc_zone      │
├────────────────────┤          │           ├────────────────────┤
│ district_key (PK)  │          │           │ zone_key (PK)      │
│ district_code      │          │           │ zone_code          │
│ district_name      │          │           │ zone_name          │
│ area_type          │          │           │ geometry           │
│ geometry           │          │           │                    │
└─────────┬──────────┘          │           └─────────┬──────────┘
          │                     │                     │
┌─────────┴──────────┐          │           ┌─────────┴──────────┐
│ fact_purchases     │          │           │ fact_rentals       │
├────────────────────┤          │           ├────────────────────┤
│ date_key (FK)      │          │           │ date_key (FK)      │
│ district_key (FK)  │          │           │ zone_key (FK)      │
│ sales_count        │          │           │ bedroom_type       │
│ avg_price          │          │           │ avg_rent           │
│ median_price       │          │           │ median_rent        │
│ new_listings       │          │           │ vacancy_rate       │
│ active_listings    │          │           │ universe           │
│ avg_dom            │          │           │ turnover_rate      │
│ avg_sp_lp          │          │           │ reliability_code   │
└────────────────────┘          │           └────────────────────┘

┌───────────────────────────────┐
│ dim_neighbourhood             │
├───────────────────────────────┤
│ neighbourhood_id (PK)         │
│ name                          │
│ geometry                      │
│ population                    │
│ land_area_sqkm                │
│ pop_density_per_sqkm          │
│ pct_bachelors_or_higher       │
│ median_household_income       │
│ pct_owner_occupied            │
│ pct_renter_occupied           │
│ census_year                   │  ← For SCD tracking
└───────────────────────────────┘

┌───────────────────────────────┐
│ dim_policy_event              │
├───────────────────────────────┤
│ event_id (PK)                 │
│ event_date                    │
│ effective_date                │
│ level                         │  ← federal/provincial/municipal
│ category                      │  ← monetary/tax/regulatory/supply/economic
│ title                         │
│ description                   │
│ expected_direction            │  ← bearish/bullish/neutral
│ source_url                    │
│ confidence                    │  ← high/medium/low
│ created_at                    │
└───────────────────────────────┘

┌───────────────────────────────┐
│ bridge_district_neighbourhood │  ← Portfolio Phase 4 ONLY
├───────────────────────────────┤
│ district_key (FK)             │
│ neighbourhood_id (FK)         │
│ area_overlap_pct              │
│ population_overlap            │  ← For pop-weighted agg
└───────────────────────────────┘
```
**Notes**:
- `dim_neighbourhood` has no FK relationship to fact tables in V1
- `dim_policy_event` is standalone (no FK to facts); used for time-series annotation
- `bridge_district_neighbourhood` is Portfolio Phase 4 scope only
- Similar bridge table needed for CMHC zones in Portfolio Phase 4
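As an illustration of how small the time dimension is, `dim_time` can be generated rather than loaded. The sketch below assumes monthly grain with `date_key` stored as the first-of-month date (the spec does not fix the key type, so an integer such as `YYYYMM` would work equally well). The function name is hypothetical, and `month_name` follows the interpreter's default locale.

```python
import calendar
from datetime import date


def build_dim_time(start_year, start_month, end_year, end_month):
    """Return one row per month, inclusive of both endpoints."""
    rows = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        rows.append({
            "date_key": date(year, month, 1),  # assumed key: first of month
            "year": year,
            "month": month,
            "quarter": (month - 1) // 3 + 1,
            "month_name": calendar.month_name[month],
        })
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return rows
```

Calling `build_dim_time(2007, 1, 2026, 12)` would cover the full TRREB history listed in this document; annual CMHC rows can reference the October entry of each year.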
---
## File Structure
> **Note**: Toronto Housing data logic lives in `portfolio_app/toronto/`. See `portfolio_project_plan_v5.md` for full project structure.
### Data Directory Structure
```
data/
└── toronto/
├── raw/
│ ├── trreb/
│ │ └── market_watch_YYYY_MM.pdf
│ ├── cmhc/
│ │ └── rental_survey_YYYY.csv
│ ├── enrichment/
│ │ └── neighbourhood_profiles_2021.xlsx
│ └── geo/
│ ├── toronto_neighbourhoods.geojson
│ ├── trreb_districts.geojson ← (to be created via QGIS)
│ └── cmhc_zones.geojson ← (from R cmhc package)
├── processed/ ← gitignored
│ ├── fact_purchases.parquet
│ ├── fact_rentals.parquet
│ ├── dim_time.parquet
│ ├── dim_trreb_district.parquet
│ ├── dim_cmhc_zone.parquet
│ ├── dim_neighbourhood.parquet
│ └── dim_policy_event.parquet
└── reference/
├── policy_events.csv ← Curated event list
└── neighbourhood_boundary_changelog.md ← 140→158 notes
```
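The `market_watch_YYYY_MM.pdf` naming convention means the report month can be recovered from the path alone, which is handy when the date printed inside a PDF is hard to parse. A minimal sketch, with a hypothetical function name:

```python
import re
from datetime import date
from pathlib import PurePosixPath

TRREB_PDF_NAME = re.compile(r"^market_watch_(\d{4})_(\d{2})\.pdf$")


def report_date_from_path(path):
    """Map 'market_watch_YYYY_MM.pdf' to the first day of that month."""
    name = PurePosixPath(path).name
    match = TRREB_PDF_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected TRREB file name: {name!r}")
    year, month = int(match.group(1)), int(match.group(2))
    # date() itself rejects impossible months such as 13.
    return date(year, month, 1)
```

For example, `report_date_from_path("data/toronto/raw/trreb/market_watch_2024_03.pdf")` yields `date(2024, 3, 1)`, matching the `report_date` convention of the summary table.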
### Code Module Structure
```
portfolio_app/toronto/
├── __init__.py
├── parsers/
│ ├── __init__.py
│ ├── trreb.py # PDF extraction
│ └── cmhc.py # CSV processing
├── loaders/
│ ├── __init__.py
│ └── database.py # DB operations
├── schemas/ # Pydantic models
│ ├── __init__.py
│ ├── trreb.py
│ ├── cmhc.py
│ ├── enrichment.py
│ └── policy_event.py
├── models/ # SQLAlchemy ORM
│ ├── __init__.py
│ ├── base.py # DeclarativeBase, engine
│ ├── dimensions.py # dim_time, dim_trreb_district, dim_policy_event, etc.
│ └── facts.py # fact_purchases, fact_rentals
└── transforms/
└── __init__.py
```
### Notebooks
```
notebooks/
├── 01_trreb_pdf_extraction.ipynb
├── 02_cmhc_data_prep.ipynb
├── 03_geo_layer_prep.ipynb
├── 04_enrichment_data_prep.ipynb
├── 05_policy_events_curation.ipynb
└── 06_spatial_crosswalk.ipynb ← Portfolio Phase 4 only
```
---
## ✅ Implementation Checklist
> **Note**: These are **Stages** within the Toronto Housing project (Portfolio Phase 1). They are distinct from the overall portfolio **Phases** defined in `portfolio_project_plan_v5.md`.
### Stage 1: Data Acquisition
- [ ] Download TRREB monthly PDFs (2020-present as MVP)
- [ ] Register for CMHC portal and export Toronto rental data
- [ ] Extract CMHC zone boundaries via R `cmhc` package
- [ ] Download City of Toronto neighbourhood GeoJSON (158 boundaries)
- [ ] Digitize TRREB district boundaries in QGIS
- [ ] Download Neighbourhood Profiles (2021 Census, 158-model)
### Stage 2: Data Processing
- [ ] Build TRREB PDF parser (`portfolio_app/toronto/parsers/trreb.py`)
- [ ] Build Pydantic schemas (`portfolio_app/toronto/schemas/`)
- [ ] Build SQLAlchemy models (`portfolio_app/toronto/models/`)
- [ ] Extract and validate TRREB monthly summaries
- [ ] Clean and structure CMHC rental data
- [ ] Process Neighbourhood Profiles into `dim_neighbourhood`
- [ ] Curate and load policy events into `dim_policy_event`
- [ ] Create dimension tables
- [ ] Build fact tables
- [ ] Validate all geospatial layers use same CRS (WGS84/EPSG:4326)
### Stage 3: Visualization (V1)
- [ ] Create dashboard page (`portfolio_app/pages/toronto/dashboard.py`)
- [ ] Build choropleth figures (`portfolio_app/figures/choropleth.py`)
- [ ] Build time series figures (`portfolio_app/figures/time_series.py`)
- [ ] Design dashboard layout (purchase/rental toggle)
- [ ] Implement choropleth map with layer switching
- [ ] Add time slider/selector
- [ ] Build neighbourhood overlay (toggle-able)
- [ ] Add enrichment layer toggle (density/education/income choropleth)
- [ ] Add policy event markers on time series
- [ ] Add tooltips with cross-reference info ("This district contains...")
- [ ] Add tooltips showing enrichment metrics on hover
### Stage 4: Polish (V1)
- [ ] Add data source citations
- [ ] Document methodology (especially geographic limitations)
- [ ] Write docs (`docs/methodology.md`, `docs/data_sources.md`)
- [ ] Deploy to portfolio
### Future Enhancements (Portfolio Phase 4 — Post-Energy Project)
- [ ] Add crime data to dim_neighbourhood
- [ ] Build spatial crosswalk (neighbourhood ↔ district/zone intersections)
- [ ] Compute area-weighted and population-weighted aggregations
- [ ] Add aggregation method selector to UI
- [ ] Enable correlation analysis (price vs. enrichment metrics)
- [ ] Add historical neighbourhood boundary support (140→158)
**Deployment & dbt Architecture**: See `portfolio_project_plan_v5.md` for:
- dbt layer structure and testing strategy
- Deployment architecture
- Data quality framework
---
## References & Links
### Core Housing Data
| Resource | URL |
|----------|-----|
| TRREB Market Watch | https://trreb.ca/index.php/market-news/market-watch |
| CMHC Housing Portal | https://www03.cmhc-schl.gc.ca/hmip-pimh/ |
### Geographic Boundaries
| Resource | URL |
|----------|-----|
| Toronto Neighbourhoods GeoJSON | https://github.com/jasonicarter/toronto-geojson |
| TRREB District Map (PDF) | https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf |
| Statistics Canada Census Tracts | https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index-eng.cfm |
| R `cmhc` package (CRAN) | https://cran.r-project.org/package=cmhc |
### Enrichment Data
| Resource | URL |
|----------|-----|
| Toronto Open Data Portal | https://open.toronto.ca/ |
| Neighbourhood Profiles (CKAN) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/neighbourhood-profiles |
| Neighbourhood Profiles 2021 (Direct Download) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx |
### Policy Events Research
| Resource | URL |
|----------|-----|
| Bank of Canada Interest Rates | https://www.bankofcanada.ca/rates/interest-rates/ |
| OSFI (Stress Test Rules) | https://www.osfi-bsif.gc.ca/ |
| Ontario Legislature (Bills) | https://www.ola.org/ |
### Reference Documentation
| Resource | URL |
|----------|-----|
| Statistics Canada 2021 Census Reference | https://www12.statcan.gc.ca/census-recensement/2021/ref/index-eng.cfm |
| City of Toronto Neighbourhood Profiles Overview | https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/neighbourhood-profiles/ |
---
## Related Documents
| Document | Relationship | Use For |
|----------|--------------|---------|
| `portfolio_project_plan_v5.md` | Parent document | Overall scope, phasing, tech stack, deployment, dbt architecture, data quality framework |
---
*Document Version: 5.1*
*Updated: January 2026*
*Project: Toronto Housing Price Dashboard — Portfolio Piece*