# Toronto Housing Price Dashboard
## Portfolio Project — Data Specification & Architecture

**Version**: 5.1
**Last Updated**: January 2026
**Status**: Specification Complete

---

## Document Context

| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Detailed specification for Toronto Housing Dashboard |
| **Scope** | Data schemas, source URLs, geographic boundaries, V1/V2 decisions |

**Rule**: For overall project scope, phasing, tech stack, and deployment architecture, see `portfolio_project_plan_v5.md`. This document provides implementation-level detail for the Toronto Housing project specifically.

**Terminology Note**: This document uses **Stages 1–4** to describe Toronto Housing implementation steps. These are distinct from the **Phases 1–5** in `portfolio_project_plan_v5.md`, which describe the overall portfolio project lifecycle.

---

## Project Overview

A dashboard analyzing housing price variations across Toronto neighbourhoods over time, along two analysis tracks:

| Track | Data Domain | Primary Source | Geographic Unit |
|-------|-------------|----------------|-----------------|
| **Purchases** | Sales transactions | TRREB Monthly Reports | ~35 Districts |
| **Rentals** | Rental market stats | CMHC Rental Market Survey | ~20 Zones |

**Core Visualization**: Interactive choropleth map of Toronto with a toggle between rental and purchase analysis, plus time-series exploration by month or year.

**Enrichment Layer** (V1: overlay only): Neighbourhood-level demographic and socioeconomic context, including population density, educational attainment, and income. Crime data is deferred to Phase 4 of the portfolio project (post-Energy project).

**Tech Stack & Deployment**: See `portfolio_project_plan_v5.md` → Tech Stack, Deployment Architecture

---

## Geographic Layers

### Layer Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│        City of Toronto Official Neighbourhoods (158)            │  ← Reference overlay + Enrichment data
├─────────────────────────────────────────────────────────────────┤
│        TRREB Districts (~35): W01, C01, E01, etc.               │  ← Purchase data
├─────────────────────────────────────────────────────────────────┤
│        CMHC Survey Zones (~20): Census Tract aligned            │  ← Rental data
└─────────────────────────────────────────────────────────────────┘
```

### Boundary Files

| Layer | Zones | Format | Source | Status |
|-------|-------|--------|--------|--------|
| **City Neighbourhoods** | 158 | GeoJSON, Shapefile | [GitHub - jasonicarter/toronto-geojson](https://github.com/jasonicarter/toronto-geojson) | ✅ Ready to use |
| **TRREB Districts** | ~35 | PDF only | [TRREB Toronto Map PDF](https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf) | ⚠ Requires manual digitization |
| **CMHC Zones** | ~20 | R package | R `cmhc` package via `get_cmhc_geography()` | ✅ Available (see CMHC Zone Boundaries) |

### Digitization Task: TRREB Districts

**Input**: TRREB Toronto PDF map
**Output**: GeoJSON with district codes (W01-W10, C01-C15, E01-E11)
**Tool**: QGIS

**Process**:
1. Import the PDF as a raster layer in QGIS and georeference it against known control points (for example, the City neighbourhood layer) so traced shapes carry real coordinates
2. Create a vector layer with polygon features
3. Trace district boundaries
4. Add attributes: `district_code`, `district_name`, `area_type` (West/Central/East)
5. Export as GeoJSON (WGS84 / EPSG:4326)

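Once the digitized file exists, a quick completeness check can catch missed or mistyped districts before the layer is loaded. The following is a minimal sketch, assuming the export is a GeoJSON FeatureCollection whose features carry the `district_code` attribute from step 4; the function names and the sample features are hypothetical and only illustrate the check.

```python
import json


def expected_district_codes():
    """Codes named in the Output line above: W01-W10, C01-C15, E01-E11."""
    counts = {"W": 10, "C": 15, "E": 11}
    return {
        f"{prefix}{number:02d}"
        for prefix, count in counts.items()
        for number in range(1, count + 1)
    }


def check_district_codes(geojson_text):
    """Return (missing, unexpected, duplicated) district codes as sorted lists."""
    collection = json.loads(geojson_text)
    seen = [
        str((feature.get("properties") or {}).get("district_code"))
        for feature in collection["features"]
    ]
    expected = expected_district_codes()
    missing = sorted(expected - set(seen))
    unexpected = sorted(set(seen) - expected)
    duplicated = sorted({code for code in seen if seen.count(code) > 1})
    return missing, unexpected, duplicated


# Hypothetical, deliberately incomplete sample: one duplicate and one typo.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"district_code": "W01"}, "geometry": None},
        {"type": "Feature", "properties": {"district_code": "W01"}, "geometry": None},
        {"type": "Feature", "properties": {"district_code": "X99"}, "geometry": None},
    ],
})
missing, unexpected, duplicated = check_district_codes(sample)
print(len(missing), unexpected, duplicated)
```

A clean digitization would return three empty lists; anything else points at a polygon to revisit in QGIS.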
### CMHC Zone Boundaries

**Source**: The R `cmhc` package provides CMHC survey geography via the `get_cmhc_geography()` function.

**Extraction Process**:
```r
# In R (one-time extraction)
library(cmhc)
library(sf)

# Get CMHC survey zones. The package documents a `level` argument and caches
# downloads under the CMHC_CACHE_PATH environment variable; confirm both with
# ?get_cmhc_geography for the installed version.
all_zones <- get_cmhc_geography(level = "ZONE")

# Keep only the Toronto CMA. The column name and code below are assumptions:
# inspect names(all_zones) and adjust before running.
toronto_zones <- all_zones[all_zones$METCODE == "2270", ]

# Export to GeoJSON for Python/PostGIS
st_write(toronto_zones, "data/toronto/raw/geo/cmhc_zones.geojson", driver = "GeoJSON")
```

**Output**: `data/toronto/raw/geo/cmhc_zones.geojson`

**Why R?**: CMHC zone boundaries are not published as standalone files. The `cmhc` R package is the only reliable programmatic source. This is a one-time extraction; the Python stack then reads the GeoJSON.

### ⚠ Neighbourhood Boundary Change (140 → 158)

The City of Toronto updated from 140 to 158 social planning neighbourhoods in **April 2021**. This affects data alignment:

| Data Source | Pre-2021 | Post-2021 | Handling |
|-------------|----------|-----------|----------|
| Census (2016 and earlier) | 140 neighbourhoods | N/A | Use 140-model files |
| Census (2021+) | N/A | 158 neighbourhoods | Use 158-model files |

**V1 Strategy**: Use 2021 Census on 158 boundaries only. Defer historical trend analysis to portfolio Phase 4.

---

## Data Source #1: TRREB Monthly Market Reports

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | Toronto Regional Real Estate Board |
| **URL** | [TRREB Market Watch](https://trreb.ca/index.php/market-news/market-watch) |
| **Format** | PDF (monthly reports) |
| **Update Frequency** | Monthly |
| **Historical Availability** | 2007–Present |
| **Access** | Public (aggregate data in PDFs) |
| **Extraction Method** | PDF parsing (`pdfplumber` or `camelot-py`) |

### Available Tables

#### Table: `trreb_monthly_summary`
**Location in PDF**: Pages 3-4 (Summary by Area)

| Column | Data Type | Description |
|--------|-----------|-------------|
| `report_date` | DATE | First of month (YYYY-MM-01) |
| `area_code` | VARCHAR(3) | District code (W01, C01, E01, etc.) |
| `area_name` | VARCHAR(100) | District name |
| `area_type` | VARCHAR(10) | West / Central / East / North |
| `sales` | INTEGER | Number of transactions |
| `dollar_volume` | DECIMAL | Total sales volume ($) |
| `avg_price` | DECIMAL | Average sale price ($) |
| `median_price` | DECIMAL | Median sale price ($) |
| `new_listings` | INTEGER | New listings count |
| `active_listings` | INTEGER | Active listings at month end |
| `avg_sp_lp` | DECIMAL | Avg sale price / list price ratio (%) |
| `avg_dom` | INTEGER | Average days on market |

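Because these rows come out of PDF parsing, a light sanity check per row helps catch shifted columns early. The sketch below covers only a subset of the columns and is not the project's Pydantic schema. The class name, the `[WCE]` code pattern (taken from the district codes listed for the digitization task), the 1% tolerance, and the idea that `avg_price` should roughly equal `dollar_volume / sales` are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Assumed pattern: one area letter followed by two digits, e.g. "C01".
AREA_CODE = re.compile(r"^[WCE]\d{2}$")


@dataclass(frozen=True)
class TrrebMonthlySummary:
    """Subset of `trreb_monthly_summary` columns (hypothetical helper)."""
    report_date: date
    area_code: str
    sales: int
    dollar_volume: Decimal
    avg_price: Decimal

    def problems(self, tolerance=Decimal("0.01")):
        """Return a list of human-readable issues; empty means the row looks sane."""
        issues = []
        if self.report_date.day != 1:
            issues.append("report_date must be the first of the month")
        if not AREA_CODE.match(self.area_code):
            issues.append(f"unexpected area_code: {self.area_code!r}")
        if self.sales < 0:
            issues.append("sales must be non-negative")
        if self.sales > 0:
            # Assumed cross-check: reported average vs. volume divided by count.
            implied = self.dollar_volume / self.sales
            if abs(implied - self.avg_price) > tolerance * self.avg_price:
                issues.append("avg_price inconsistent with dollar_volume / sales")
        return issues


good = TrrebMonthlySummary(date(2024, 1, 1), "C01", 100,
                           Decimal("120000000"), Decimal("1200000"))
bad = TrrebMonthlySummary(date(2024, 1, 15), "N01", 100,
                          Decimal("120000000"), Decimal("900000"))
print(good.problems())
print(bad.problems())
```

Returning a list of issues rather than raising keeps a full month's extraction running while still surfacing every suspicious row.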
### Dimensions

| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Monthly | 2007-01 to present |
| **Geography** | District | ~35 TRREB districts |
| **Property Type** | Aggregate | All residential (no breakdown in summary) |

### Metrics Available

| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `avg_price` | Pre-calculated monthly avg | Primary price indicator |
| `median_price` | Pre-calculated monthly median | Robust price indicator |
| `sales` | Count | Market activity volume |
| `avg_dom` | Average | Market velocity |
| `avg_sp_lp` | Ratio | Buyer/seller market indicator |
| `new_listings` | Count | Supply indicator |
| `active_listings` | Snapshot | Inventory level |

### ⚠ Limitations

- No transaction-level data (aggregates only)
- Property type breakdown requires parsing additional tables
- PDF structure may vary slightly across years
- District boundaries have not changed since 2011; reports from before 2011 may not match the digitized boundaries

---

## Data Source #2: CMHC Rental Market Survey

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | Canada Mortgage and Housing Corporation |
| **URL** | [CMHC Housing Market Information Portal](https://www03.cmhc-schl.gc.ca/hmip-pimh/) |
| **Format** | CSV export, API |
| **Update Frequency** | Annual (October survey) |
| **Historical Availability** | 1990–Present |
| **Access** | Public, free registration for bulk downloads |
| **Geographic Levels** | CMA → Zone → Neighbourhood → Census Tract |

### Available Tables

#### Table: `cmhc_rental_summary`
**Portal Path**: Toronto → Primary Rental Market → Summary Statistics

| Column | Data Type | Description |
|--------|-----------|-------------|
| `survey_year` | INTEGER | Survey year (October) |
| `zone_code` | VARCHAR(10) | CMHC zone identifier |
| `zone_name` | VARCHAR(100) | Zone name |
| `bedroom_type` | VARCHAR(20) | Bachelor / 1-Bed / 2-Bed / 3-Bed+ / Total |
| `universe` | INTEGER | Total rental units in zone |
| `vacancy_rate` | DECIMAL | Vacancy rate (%) |
| `vacancy_rate_reliability` | VARCHAR(1) | Reliability code (a/b/c/d) |
| `availability_rate` | DECIMAL | Availability rate (%) |
| `average_rent` | DECIMAL | Average monthly rent ($) |
| `average_rent_reliability` | VARCHAR(1) | Reliability code |
| `median_rent` | DECIMAL | Median monthly rent ($) |
| `rent_change_pct` | DECIMAL | YoY rent change (%) |
| `turnover_rate` | DECIMAL | Unit turnover rate (%) |

### Dimensions

| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Annual | 1990 to present (October snapshot) |
| **Geography** | Zone | ~20 CMHC zones in Toronto CMA |
| **Bedroom Type** | Category | Bachelor, 1-Bed, 2-Bed, 3-Bed+, Total |
| **Structure Type** | Category | Row, Apartment (available in detailed tables) |

### Metrics Available

| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `average_rent` | Pre-calculated avg | Primary rent indicator |
| `median_rent` | Pre-calculated median | Robust rent indicator |
| `vacancy_rate` | Percentage | Market tightness |
| `availability_rate` | Percentage | Supply accessibility |
| `turnover_rate` | Percentage | Tenant mobility |
| `rent_change_pct` | YoY % | Rent growth tracking |
| `universe` | Count | Market size |

### Reliability Codes

| Code | Meaning | Coefficient of Variation |
|------|---------|-------------------------|
| `a` | Excellent | CV ≤ 2.5% |
| `b` | Good | 2.5% < CV ≤ 5% |
| `c` | Fair | 5% < CV ≤ 10% |
| `d` | Poor (use with caution) | CV > 10% |
| `**` | Data suppressed | Sample too small |

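The thresholds in the table translate directly into a lookup, which can be handy for unit-testing a parser or for deciding how to style a value in the dashboard. This is only a sketch of the table above: CMHC assigns the codes itself, and the function names and the "show with caution" policy are assumptions.

```python
def reliability_code(cv_percent):
    """Map a coefficient of variation (in %) to the code from the table above.

    `None` stands in for a suppressed estimate (sample too small).
    """
    if cv_percent is None:
        return "**"
    if cv_percent < 0:
        raise ValueError("coefficient of variation cannot be negative")
    if cv_percent <= 2.5:
        return "a"
    if cv_percent <= 5:
        return "b"
    if cv_percent <= 10:
        return "c"
    return "d"


def display_hint(code):
    """Hypothetical display policy: hide suppressed values, flag poor ones."""
    if code == "**":
        return "hide"
    if code == "d":
        return "show with caution"
    return "show"


for cv in (1.0, 2.5, 4.0, 10.0, 12.5, None):
    code = reliability_code(cv)
    print(cv, code, display_hint(code))
```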
### ⚠ Limitations

- Annual only (no monthly granularity)
- October snapshot (point-in-time)
- Zones are larger than TRREB districts
- Purpose-built rental only (excludes condo rentals in base survey)

---

## Data Source #3: City of Toronto Open Data

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto |
| **URL** | [Toronto Open Data Portal](https://open.toronto.ca/) |
| **Format** | GeoJSON, Shapefile, CSV |
| **Use Case** | Reference layer, demographic enrichment |

### Relevant Datasets

#### Dataset: `neighbourhoods`

| Column | Data Type | Description |
|--------|-----------|-------------|
| `area_id` | INTEGER | Neighbourhood ID (1-158) |
| `area_name` | VARCHAR(100) | Official neighbourhood name |
| `geometry` | POLYGON | Boundary geometry |

#### Dataset: `neighbourhood_profiles` (Census-linked)

| Column | Data Type | Description |
|--------|-----------|-------------|
| `neighbourhood_id` | INTEGER | Links to neighbourhoods |
| `population` | INTEGER | Total population |
| `avg_household_income` | DECIMAL | Average household income |
| `dwelling_count` | INTEGER | Total dwellings |
| `owner_pct` | DECIMAL | % owner-occupied |
| `renter_pct` | DECIMAL | % renter-occupied |

### Enrichment Potential

Can overlay demographic context on housing data:
- Income brackets by neighbourhood
- Ownership vs rental ratios
- Population density
- Dwelling type distribution

---

## Data Source #4: Enrichment Data (Density, Education)

### Purpose

Provide socioeconomic context to housing price analysis. Enables questions like:
- Do neighbourhoods with higher education attainment have higher prices?
- How does population density correlate with price per square foot?

### Geographic Alignment Reality

**Critical constraint**: Enrichment data is available at the **158-neighbourhood** level, while core housing data sits at **TRREB districts (~35)** and **CMHC zones (~20)**. These do not align cleanly.

```
158 Neighbourhoods (fine)     →  Enrichment data lives here
    (no clean crosswalk)
~35 TRREB Districts (coarse)  →  Purchase data lives here
~20 CMHC Zones (coarse)       →  Rental data lives here
```

### Available Enrichment Datasets

#### Dataset: Neighbourhood Profiles (Census)

| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto (via Statistics Canada Census) |
| **URL** | [Toronto Open Data - Neighbourhood Profiles](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
| **Format** | CSV, JSON, XML, XLSX |
| **Update Frequency** | Every 5 years (Census cycle) |
| **Available Years** | 2001, 2006, 2011, 2016, 2021 |
| **Geographic Unit** | 158 neighbourhoods (140 pre-2021) |

**Key Variables**:

| Variable | Description | Use Case |
|----------|-------------|----------|
| `population` | Total population | Density calculation |
| `land_area_sqkm` | Area in square kilometres | Density calculation |
| `pop_density_per_sqkm` | Population per km² | Density metric |
| `pct_bachelors_or_higher` | % age 25-64 with bachelor's+ | Education proxy |
| `median_household_income` | Median total household income | Income metric |
| `avg_household_income` | Average total household income | Income metric |
| `pct_owner_occupied` | % owner-occupied dwellings | Tenure split |
| `pct_renter_occupied` | % renter-occupied dwellings | Tenure split |

**Download URL (2021, 158 neighbourhoods)**:
```
https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx
```

### Crime Data — Deferred to Portfolio Phase 4

Crime data (TPS Neighbourhood Crime Rates) is **not included in V1 scope**. It will be added in portfolio Phase 4 after the Energy Pricing project is complete.

**Rationale**:
- Crime data is socially/politically sensitive and requires careful methodology documentation
- V1 focuses on core housing metrics and policy events
- Deferral reduces scope creep risk

**Future Reference** (Portfolio Phase 4):
- Source: [TPS Public Safety Data Portal](https://data.torontopolice.on.ca/)
- Dataset: Neighbourhood Crime Rates (Major Crime Indicators)
- Geographic Unit: 158 neighbourhoods

### V1 Enrichment Data Summary

| Measure | Source | Geography | Frequency | Format | Status |
|---------|--------|-----------|-----------|--------|--------|
| **Population Density** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Education Attainment** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Median Income** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Crime Rates (MCI)** | TPS Data Portal | 158 neighbourhoods | Annual | GeoJSON/CSV | Deferred to Portfolio Phase 4 |

---

## Data Source #5: Policy Events

### Purpose

Provide temporal context for housing price movements. Display as annotation markers on time series charts. **No causation claims**: events supply correlation and context only.

### Event Schema

#### Table: `dim_policy_event`

| Column | Data Type | Description |
|--------|-----------|-------------|
| `event_id` | INTEGER (PK) | Auto-increment primary key |
| `event_date` | DATE | Date event was announced/occurred |
| `effective_date` | DATE | Date policy took effect (if different) |
| `level` | VARCHAR(20) | `federal` / `provincial` / `municipal` |
| `category` | VARCHAR(20) | `monetary` / `tax` / `regulatory` / `supply` / `economic` |
| `title` | VARCHAR(200) | Short event title for display |
| `description` | TEXT | Longer description for tooltip |
| `expected_direction` | VARCHAR(10) | `bearish` / `bullish` / `neutral` |
| `source_url` | VARCHAR(500) | Link to official announcement/documentation |
| `confidence` | VARCHAR(10) | `high` / `medium` / `low` |
| `created_at` | TIMESTAMP | Record creation timestamp |

### Event Tiers

| Tier | Level | Category Examples | Inclusion Criteria |
|------|-------|-------------------|-------------------|
| **1** | Federal | BoC rate decisions, OSFI stress tests | Always include; objective, documented |
| **1** | Provincial | Fair Housing Plan, foreign buyer tax, rent control | Always include; legislative record |
| **2** | Municipal | Zoning reforms, development charges | Include if material impact expected |
| **2** | Economic | COVID measures, major employer closures | Include if Toronto-specific impact |
| **3** | Market | Major project announcements | Strict criteria; must be verifiable |

### Expected Direction Values

| Value | Meaning | Example |
|-------|---------|---------|
| `bullish` | Expected to increase prices | Rate cut, supply restriction |
| `bearish` | Expected to decrease prices | Rate hike, foreign buyer tax |
| `neutral` | Uncertain or mixed impact | Regulatory clarification |

### ⚠ Caveats

- **No causation claims**: Events are context, not explanation
- **Lag effects**: Policy impact may not be immediate
- **Confounding factors**: Multiple simultaneous influences
- **Display only**: No statistical analysis in V1

### Sample Events (Tier 1)

| Date | Level | Category | Title | Direction |
|------|-------|----------|-------|-----------|
| 2017-04-20 | provincial | tax | Ontario Fair Housing Plan | bearish |
| 2018-01-01 | federal | regulatory | OSFI B-20 Stress Test | bearish |
| 2020-03-27 | federal | monetary | BoC Emergency Rate Cut (0.25%) | bullish |
| 2022-03-02 | federal | monetary | BoC Rate Hike Cycle Begins | bearish |
| 2023-01-01 | federal | regulatory | Federal 2-Year Foreign Buyer Ban | bearish |

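To show how the schema's enumerated columns and the "vertical marker" use case fit together, here is a minimal sketch that validates events on construction and selects those falling inside a chart's visible date range. It models only five of the schema's columns; the class name, the helper `markers_in_window`, and the choice to validate in `__post_init__` are assumptions rather than the project's actual Pydantic or SQLAlchemy code.

```python
from dataclasses import dataclass
from datetime import date

# Allowed values copied from the dim_policy_event schema above.
LEVELS = {"federal", "provincial", "municipal"}
CATEGORIES = {"monetary", "tax", "regulatory", "supply", "economic"}
DIRECTIONS = {"bearish", "bullish", "neutral"}


@dataclass(frozen=True)
class PolicyEvent:
    event_date: date
    level: str
    category: str
    title: str
    expected_direction: str

    def __post_init__(self):
        # Reject values outside the enumerations documented in the schema.
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.expected_direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.expected_direction!r}")


def markers_in_window(events, start, end):
    """Events whose event_date falls in [start, end], oldest first."""
    selected = [e for e in events if start <= e.event_date <= end]
    return sorted(selected, key=lambda e: e.event_date)


# A few rows taken from the Sample Events table.
SAMPLE_EVENTS = [
    PolicyEvent(date(2022, 3, 2), "federal", "monetary", "BoC Rate Hike Cycle Begins", "bearish"),
    PolicyEvent(date(2017, 4, 20), "provincial", "tax", "Ontario Fair Housing Plan", "bearish"),
    PolicyEvent(date(2020, 3, 27), "federal", "monetary", "BoC Emergency Rate Cut (0.25%)", "bullish"),
    PolicyEvent(date(2018, 1, 1), "federal", "regulatory", "OSFI B-20 Stress Test", "bearish"),
]

for event in markers_in_window(SAMPLE_EVENTS, date(2018, 1, 1), date(2020, 12, 31)):
    print(event.event_date.isoformat(), event.title)
```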
---

## Data Integration Strategy

### Temporal Alignment

| Source | Native Frequency | Alignment Strategy |
|--------|------------------|---------------------|
| TRREB | Monthly | Use as-is |
| CMHC | Annual (October) | Spread to monthly OR display annual overlay |
| Census/Enrichment | 5-year | Static snapshot; display as reference |
| Policy Events | Event-based | Display as vertical markers on time axis |

**Recommendation**: Keep separate time axes. TRREB monthly for purchases, CMHC annual for rentals. Don't force artificial monthly rental data.

### Geographic Alignment

```
┌─────────────────────────────────────────────────────────────────┐
│                    VISUALIZATION APPROACH                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Purchase Mode                      Rental Mode                │
│   ─────────────────                  ──────────────             │
│   Map: TRREB Districts               Map: CMHC Zones            │
│   Time: Monthly slider               Time: Annual selector      │
│   Metrics: Price, Sales              Metrics: Rent, Vacancy     │
│                                                                 │
│   ┌───────────────────────────────────────────────────────┐     │
│   │              City Neighbourhoods Overlay              │     │
│   │          (158 boundaries as reference layer)          │     │
│   │    + Enrichment data (density, education, income)     │     │
│   └───────────────────────────────────────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Enrichment Integration Strategy (Phased)

#### V1: Reference Overlay (Current Scope)

**Approach**: Display neighbourhood enrichment as a separate toggle-able layer. No joins to housing data.

**UX**:
- User hovers over TRREB district → tooltip shows "This district contains neighbourhoods: Annex, Casa Loma, Yorkville..."
- User toggles "Show Enrichment" → choropleth switches to neighbourhood-level density/education/income
- Enrichment and housing metrics displayed side-by-side, not merged

**Pros**:
- No imputation or dodgy aggregations
- Honest about geographic mismatch
- Ships faster

**Cons**:
- Can't do correlation analysis (price vs. enrichment) directly in dashboard

**Implementation**:
- `dim_neighbourhood` as standalone dimension (no FK to fact tables)
- Spatial lookup on hover (point-in-polygon)

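The point-in-polygon lookup behind the hover tooltip can be illustrated with the classic ray-casting test. This is a sketch only: it assumes each neighbourhood is represented by a single point (such as a centroid), treats the district as one simple ring without holes, and uses made-up coordinates. A production version would more likely rely on a spatial library or PostGIS.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring [(x0, y0), ...]?"""
    inside = False
    count = len(ring)
    for i in range(count):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % count]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def neighbourhoods_in_district(district_ring, centroids):
    """Names of neighbourhoods whose representative point lies in the district."""
    return sorted(
        name
        for name, (x, y) in centroids.items()
        if point_in_ring(x, y, district_ring)
    )


# Made-up coordinates for illustration; not real Toronto geometry.
district = [(0, 0), (4, 0), (4, 4), (0, 4)]
centroids = {"Annex": (1, 1), "Casa Loma": (3, 2), "Yorkville": (5, 5)}

names = neighbourhoods_in_district(district, centroids)
print("This district contains neighbourhoods: " + ", ".join(names))
```

Because the neighbourhood and district boundaries are static, the resulting name lists could be computed once at load time rather than on every hover.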
#### V2/Portfolio Phase 4: Area-Weighted Aggregation (Future Scope)

**Approach**: Pre-compute area-weighted averages of neighbourhood metrics for each TRREB district and CMHC zone.

**Process**:
1. Spatial join: intersect neighbourhood polygons with TRREB/CMHC polygons
2. Compute overlap area for each neighbourhood-district pair
3. Weight neighbourhood metrics by overlap area proportion
4. User selects aggregation method in UI

**Aggregation Methods to Expose**:

| Method | Description | Best For |
|--------|-------------|----------|
| **Area-weighted mean** | Weight by % overlap area | Continuous metrics (density) |
| **Population-weighted mean** | Weight by population in overlap | Per-capita metrics (education) |
| **Majority assignment** | Assign neighbourhood to district with >50% overlap | Categorical data |
| **Max overlap** | Assign to single district with largest overlap | 1:1 mapping needs |

**Default**: Population-weighted (more defensible for per-capita metrics). Hide selector behind "Advanced" toggle.

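The difference between the first two methods is easiest to see on a small worked example. The sketch below assumes the spatial join has already produced one row per neighbourhood-district overlap; the field names and all numbers are hypothetical.

```python
def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(value * weight for value, weight in pairs) / total_weight


# Hypothetical overlaps between two neighbourhoods and one district.
# `metric` could be, for example, pct_bachelors_or_higher.
overlaps = [
    {"neighbourhood": "A", "metric": 40.0, "overlap_area_km2": 3.0, "overlap_population": 9000},
    {"neighbourhood": "B", "metric": 60.0, "overlap_area_km2": 1.0, "overlap_population": 21000},
]

# Area-weighted: (40*3 + 60*1) / (3 + 1)
area_weighted = weighted_mean(
    [(row["metric"], row["overlap_area_km2"]) for row in overlaps]
)

# Population-weighted: (40*9000 + 60*21000) / (9000 + 21000)
population_weighted = weighted_mean(
    [(row["metric"], row["overlap_population"]) for row in overlaps]
)

print(area_weighted, population_weighted)
```

Here the large but sparsely populated neighbourhood A dominates the area-weighted result (45.0), while the population-weighted result (54.0) leans toward B, where most residents live. That gap is the reason a per-capita metric is usually better served by population weights.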
### V1 Future-Proofing (Do Now)

| Action | Why |
|--------|-----|
| Store neighbourhood boundaries in same CRS as TRREB/CMHC (WGS84) | Avoids reprojection headaches |
| Keep `dim_neighbourhood` normalized (not denormalized into district tables) | Clean separation for V2 join |
| Document Census year for each metric | Ready for 2026 Census |
| Include `census_year` column in `dim_neighbourhood` | Enables slowly changing dimension (SCD) tracking |

### V1 Defer (Don't Do Yet)

| Action | Why Not |
|--------|---------|
| Pre-compute area-weighted crosswalk | Not needed for V1 |
| Build aggregation method selector UI | No backend to support it |
| Crime data integration | Deferred to Portfolio Phase 4 |
| Historical neighbourhood boundary reconciliation (140→158) | Use 2021+ data only for V1 |

---

## Proposed Data Model

### Star Schema

```
                      ┌────────────────────┐
                      │ dim_time           │
                      ├────────────────────┤
                      │ date_key (PK)      │
                      │ year               │
                      │ month              │
                      │ quarter            │
                      │ month_name         │
                      └──────────┬─────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      │                      ▼
┌────────────────────┐           │           ┌────────────────────┐
│ dim_trreb_district │           │           │ dim_cmhc_zone      │
├────────────────────┤           │           ├────────────────────┤
│ district_key (PK)  │           │           │ zone_key (PK)      │
│ district_code      │           │           │ zone_code          │
│ district_name      │           │           │ zone_name          │
│ area_type          │           │           │ geometry           │
│ geometry           │           │           └──────────┬─────────┘
└─────────┬──────────┘           │                      │
          ▼                      │                      ▼
┌────────────────────┐           │           ┌────────────────────┐
│ fact_purchases     │           │           │ fact_rentals       │
├────────────────────┤           │           ├────────────────────┤
│ date_key (FK)      │           │           │ date_key (FK)      │
│ district_key (FK)  │           │           │ zone_key (FK)      │
│ sales_count        │           │           │ bedroom_type       │
│ avg_price          │           │           │ avg_rent           │
│ median_price       │           │           │ median_rent        │
│ new_listings       │           │           │ vacancy_rate       │
│ active_listings    │           │           │ universe           │
│ avg_dom            │           │           │ turnover_rate      │
│ avg_sp_lp          │           │           │ reliability_code   │
└────────────────────┘           │           └────────────────────┘

┌───────────────────────────────┐
│ dim_neighbourhood             │
├───────────────────────────────┤
│ neighbourhood_id (PK)         │
│ name                          │
│ geometry                      │
│ population                    │
│ land_area_sqkm                │
│ pop_density_per_sqkm          │
│ pct_bachelors_or_higher       │
│ median_household_income       │
│ pct_owner_occupied            │
│ pct_renter_occupied           │
│ census_year                   │  ← For SCD tracking
└───────────────────────────────┘

┌───────────────────────────────┐
│ dim_policy_event              │
├───────────────────────────────┤
│ event_id (PK)                 │
│ event_date                    │
│ effective_date                │
│ level                         │  ← federal/provincial/municipal
│ category                      │  ← monetary/tax/regulatory/supply/economic
│ title                         │
│ description                   │
│ expected_direction            │  ← bearish/bullish/neutral
│ source_url                    │
│ confidence                    │  ← high/medium/low
│ created_at                    │
└───────────────────────────────┘

┌───────────────────────────────┐
│ bridge_district_neighbourhood │  ← Portfolio Phase 4 ONLY
├───────────────────────────────┤
│ district_key (FK)             │
│ neighbourhood_id (FK)         │
│ area_overlap_pct              │
│ population_overlap            │  ← For pop-weighted agg
└───────────────────────────────┘
```

**Notes**:
- `dim_time` joins to both fact tables through `date_key`
- `dim_neighbourhood` has no FK relationship to fact tables in V1
- `dim_policy_event` is standalone (no FK to facts); used for time-series annotation
- `bridge_district_neighbourhood` is Portfolio Phase 4 scope only
- Similar bridge table needed for CMHC zones in Portfolio Phase 4

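To make the key relationships concrete, the purchase side of the schema can be sketched as DDL and exercised in memory. This uses Python's built-in `sqlite3` purely as a stand-in for the project's real database. The column types, the composite primary key on the fact table, storing geometry as text, and the sample rows are all assumptions, and only a subset of columns is shown.

```python
import sqlite3

DDL = """
CREATE TABLE dim_time (
    date_key   INTEGER PRIMARY KEY,   -- e.g. 202401 (assumed YYYYMM encoding)
    year       INTEGER NOT NULL,
    month      INTEGER NOT NULL,
    quarter    INTEGER NOT NULL,
    month_name TEXT    NOT NULL
);
CREATE TABLE dim_trreb_district (
    district_key  INTEGER PRIMARY KEY,
    district_code TEXT NOT NULL UNIQUE,
    district_name TEXT,
    area_type     TEXT,
    geometry      TEXT                -- placeholder; a spatial type in practice
);
CREATE TABLE fact_purchases (
    date_key     INTEGER NOT NULL REFERENCES dim_time (date_key),
    district_key INTEGER NOT NULL REFERENCES dim_trreb_district (district_key),
    sales_count  INTEGER,
    avg_price    NUMERIC,
    median_price NUMERIC,
    PRIMARY KEY (date_key, district_key)  -- assumed grain: one row per month and district
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(DDL)

# Illustrative rows only; the values are not real market data.
conn.execute("INSERT INTO dim_time VALUES (202401, 2024, 1, 1, 'January')")
conn.execute(
    "INSERT INTO dim_trreb_district (district_key, district_code, area_type) "
    "VALUES (1, 'C01', 'Central')"
)
conn.execute(
    "INSERT INTO fact_purchases (date_key, district_key, sales_count, avg_price) "
    "VALUES (202401, 1, 100, 1200000)"
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM fact_purchases").fetchone()[0]
print(row_count)
```

With foreign keys switched on, a fact row pointing at an unknown `district_key` is rejected, which is the behaviour the star schema relies on. `dim_neighbourhood` and `dim_policy_event` would be created without any references, matching the notes above.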
---

## File Structure

> **Note**: Toronto Housing data logic lives in `portfolio_app/toronto/`. See `portfolio_project_plan_v5.md` for full project structure.

### Data Directory Structure

```
data/
└── toronto/
    ├── raw/
    │   ├── trreb/
    │   │   └── market_watch_YYYY_MM.pdf
    │   ├── cmhc/
    │   │   └── rental_survey_YYYY.csv
    │   ├── enrichment/
    │   │   └── neighbourhood_profiles_2021.xlsx
    │   └── geo/
    │       ├── toronto_neighbourhoods.geojson
    │       ├── trreb_districts.geojson          ← (to be created via QGIS)
    │       └── cmhc_zones.geojson               ← (from R cmhc package)
    │
    ├── processed/                               ← gitignored
    │   ├── fact_purchases.parquet
    │   ├── fact_rentals.parquet
    │   ├── dim_time.parquet
    │   ├── dim_trreb_district.parquet
    │   ├── dim_cmhc_zone.parquet
    │   ├── dim_neighbourhood.parquet
    │   └── dim_policy_event.parquet
    │
    └── reference/
        ├── policy_events.csv                    ← Curated event list
        └── neighbourhood_boundary_changelog.md  ← 140→158 notes
```

### Code Module Structure

```
portfolio_app/toronto/
├── __init__.py
├── parsers/
│   ├── __init__.py
│   ├── trreb.py             # PDF extraction
│   └── cmhc.py              # CSV processing
├── loaders/
│   ├── __init__.py
│   └── database.py          # DB operations
├── schemas/                 # Pydantic models
│   ├── __init__.py
│   ├── trreb.py
│   ├── cmhc.py
│   ├── enrichment.py
│   └── policy_event.py
├── models/                  # SQLAlchemy ORM
│   ├── __init__.py
│   ├── base.py              # DeclarativeBase, engine
│   ├── dimensions.py        # dim_time, dim_trreb_district, dim_policy_event, etc.
│   └── facts.py             # fact_purchases, fact_rentals
└── transforms/
    └── __init__.py
```

### Notebooks

```
notebooks/
├── 01_trreb_pdf_extraction.ipynb
├── 02_cmhc_data_prep.ipynb
├── 03_geo_layer_prep.ipynb
├── 04_enrichment_data_prep.ipynb
├── 05_policy_events_curation.ipynb
└── 06_spatial_crosswalk.ipynb       ← Portfolio Phase 4 only
```

---

## ✅ Implementation Checklist

> **Note**: These are **Stages** within the Toronto Housing project (Portfolio Phase 1). They are distinct from the overall portfolio **Phases** defined in `portfolio_project_plan_v5.md`.

### Stage 1: Data Acquisition
- [ ] Download TRREB monthly PDFs (2020-present as MVP)
- [ ] Register for CMHC portal and export Toronto rental data
- [ ] Extract CMHC zone boundaries via R `cmhc` package
- [ ] Download City of Toronto neighbourhood GeoJSON (158 boundaries)
- [ ] Digitize TRREB district boundaries in QGIS
- [ ] Download Neighbourhood Profiles (2021 Census, 158-model)

### Stage 2: Data Processing
- [ ] Build TRREB PDF parser (`portfolio_app/toronto/parsers/trreb.py`)
- [ ] Build Pydantic schemas (`portfolio_app/toronto/schemas/`)
- [ ] Build SQLAlchemy models (`portfolio_app/toronto/models/`)
- [ ] Extract and validate TRREB monthly summaries
- [ ] Clean and structure CMHC rental data
- [ ] Process Neighbourhood Profiles into `dim_neighbourhood`
- [ ] Curate and load policy events into `dim_policy_event`
- [ ] Create dimension tables
- [ ] Build fact tables
- [ ] Validate all geospatial layers use same CRS (WGS84/EPSG:4326)

### Stage 3: Visualization (V1)
- [ ] Create dashboard page (`portfolio_app/pages/toronto/dashboard.py`)
- [ ] Build choropleth figures (`portfolio_app/figures/choropleth.py`)
- [ ] Build time series figures (`portfolio_app/figures/time_series.py`)
- [ ] Design dashboard layout (purchase/rental toggle)
- [ ] Implement choropleth map with layer switching
- [ ] Add time slider/selector
- [ ] Build neighbourhood overlay (toggle-able)
- [ ] Add enrichment layer toggle (density/education/income choropleth)
- [ ] Add policy event markers on time series
- [ ] Add tooltips with cross-reference info ("This district contains...")
- [ ] Add tooltips showing enrichment metrics on hover

### Stage 4: Polish (V1)
- [ ] Add data source citations
- [ ] Document methodology (especially geographic limitations)
- [ ] Write docs (`docs/methodology.md`, `docs/data_sources.md`)
- [ ] Deploy to portfolio

### Future Enhancements (Portfolio Phase 4 — Post-Energy Project)
- [ ] Add crime data to `dim_neighbourhood`
- [ ] Build spatial crosswalk (neighbourhood ↔ district/zone intersections)
- [ ] Compute area-weighted and population-weighted aggregations
- [ ] Add aggregation method selector to UI
- [ ] Enable correlation analysis (price vs. enrichment metrics)
- [ ] Add historical neighbourhood boundary support (140→158)

**Deployment & dbt Architecture**: See `portfolio_project_plan_v5.md` for:
- dbt layer structure and testing strategy
- Deployment architecture
- Data quality framework

---

## References & Links

### Core Housing Data

| Resource | URL |
|----------|-----|
| TRREB Market Watch | https://trreb.ca/index.php/market-news/market-watch |
| CMHC Housing Portal | https://www03.cmhc-schl.gc.ca/hmip-pimh/ |

### Geographic Boundaries

| Resource | URL |
|----------|-----|
| Toronto Neighbourhoods GeoJSON | https://github.com/jasonicarter/toronto-geojson |
| TRREB District Map (PDF) | https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf |
| Statistics Canada Census Tracts | https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index-eng.cfm |
| R `cmhc` package (CRAN) | https://cran.r-project.org/package=cmhc |

### Enrichment Data

| Resource | URL |
|----------|-----|
| Toronto Open Data Portal | https://open.toronto.ca/ |
| Neighbourhood Profiles (CKAN) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/neighbourhood-profiles |
| Neighbourhood Profiles 2021 (Direct Download) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx |

### Policy Events Research

| Resource | URL |
|----------|-----|
| Bank of Canada Interest Rates | https://www.bankofcanada.ca/rates/interest-rates/ |
| OSFI (Stress Test Rules) | https://www.osfi-bsif.gc.ca/ |
| Ontario Legislature (Bills) | https://www.ola.org/ |

### Reference Documentation

| Resource | URL |
|----------|-----|
| Statistics Canada 2021 Census Reference | https://www12.statcan.gc.ca/census-recensement/2021/ref/index-eng.cfm |
| City of Toronto Neighbourhood Profiles Overview | https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/neighbourhood-profiles/ |

---

## Related Documents

| Document | Relationship | Use For |
|----------|--------------|---------|
| `portfolio_project_plan_v5.md` | Parent document | Overall scope, phasing, tech stack, deployment, dbt architecture, data quality framework |

---

*Document Version: 5.1*
*Updated: January 2026*
*Project: Toronto Housing Price Dashboard — Portfolio Piece*