# Toronto Housing Price Dashboard
## Portfolio Project — Data Specification & Architecture

**Version**: 5.1
**Last Updated**: January 2026
**Status**: Specification Complete

---

## Document Context

| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Detailed specification for Toronto Housing Dashboard |
| **Scope** | Data schemas, source URLs, geographic boundaries, V1/V2 decisions |

**Rule**: For overall project scope, phasing, tech stack, and deployment architecture, see `portfolio_project_plan_v5.md`. This document provides implementation-level detail for the Toronto Housing project specifically.

**Terminology Note**: This document uses **Stages 1–4** to describe Toronto Housing implementation steps. These are distinct from the **Phases 1–5** in `portfolio_project_plan_v5.md`, which describe the overall portfolio project lifecycle.

---

## Project Overview

A dashboard analyzing housing price variations across Toronto neighbourhoods over time, with dual analysis tracks:

| Track | Data Domain | Primary Source | Geographic Unit |
|-------|-------------|----------------|-----------------|
| **Purchases** | Sales transactions | TRREB Monthly Reports | ~35 Districts |
| **Rentals** | Rental market stats | CMHC Rental Market Survey | ~20 Zones |

**Core Visualization**: Interactive choropleth map of Toronto with toggle between rental/purchase analysis, time-series exploration by month/year.

**Enrichment Layer** (V1: overlay only): Neighbourhood-level demographic and socioeconomic context including population density, education attainment, and income. Crime data deferred to Phase 4 of the portfolio project (post-Energy project).

**Tech Stack & Deployment**: See `portfolio_project_plan_v5.md` → Tech Stack, Deployment Architecture

---

## Geographic Layers

### Layer Architecture

```
┌──────────────────────────────────────────────────┐
│ City of Toronto Official Neighbourhoods (158)    │ ← Reference overlay + Enrichment data
├──────────────────────────────────────────────────┤
│ TRREB Districts (~35) — W01, C01, E01, etc.      │ ← Purchase data
├──────────────────────────────────────────────────┤
│ CMHC Survey Zones (~20) — Census Tract aligned   │ ← Rental data
└──────────────────────────────────────────────────┘
```

### Boundary Files

| Layer | Zones | Format | Source | Status |
|-------|-------|--------|--------|--------|
| **City Neighbourhoods** | 158 | GeoJSON, Shapefile | [GitHub - jasonicarter/toronto-geojson](https://github.com/jasonicarter/toronto-geojson) | ✅ Ready to use |
| **TRREB Districts** | ~35 | PDF only | [TRREB Toronto Map PDF](https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf) | ⚠ Requires manual digitization |
| **CMHC Zones** | ~20 | R package | R `cmhc` package via `get_cmhc_geography()` | ✅ Available (see note) |

### Digitization Task: TRREB Districts

**Input**: TRREB Toronto PDF map
**Output**: GeoJSON with district codes (W01-W10, C01-C15, E01-E11)
**Tool**: QGIS

**Process**:
1. Import PDF as raster layer in QGIS
2. Create vector layer with polygon features
3. Trace district boundaries
4. Add attributes: `district_code`, `district_name`, `area_type` (West/Central/East)
5. Export as GeoJSON (WGS84 / EPSG:4326)

### CMHC Zone Boundaries

**Source**: The R `cmhc` package provides CMHC survey geography via the `get_cmhc_geography()` function.

**Extraction Process**:

```r
# In R
library(cmhc)
library(sf)

# Get Toronto CMA zones.
# NOTE: argument names are as given in this spec; check them against
# ?get_cmhc_geography for the installed package version.
toronto_zones <- get_cmhc_geography(
  geography_type = "ZONE",
  cma = "Toronto"
)

# Reproject to WGS84 (EPSG:4326) so the layer matches the other boundary files
toronto_zones <- st_transform(toronto_zones, 4326)

# Export to GeoJSON for Python/PostGIS (path matches the Output below)
st_write(toronto_zones, "data/toronto/raw/geo/cmhc_zones.geojson", driver = "GeoJSON")
```

**Output**: `data/toronto/raw/geo/cmhc_zones.geojson`

**Why R?**: CMHC zone boundaries are not published as standalone files. The `cmhc` R package is the only reliable programmatic source. This is a one-time extraction; the Python stack then works from the exported GeoJSON.

### ⚠ Neighbourhood Boundary Change (140 → 158)

The City of Toronto updated from 140 to 158 social planning neighbourhoods in **April 2021**. This affects data alignment:

| Data Source | Pre-2021 | Post-2021 | Handling |
|-------------|----------|-----------|----------|
| Census (2016 and earlier) | 140 neighbourhoods | N/A | Use 140-model files |
| Census (2021+) | N/A | 158 neighbourhoods | Use 158-model files |

**V1 Strategy**: Use 2021 Census on 158 boundaries only. Defer historical trend analysis to portfolio Phase 4.

---

## Data Source #1: TRREB Monthly Market Reports

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | Toronto Regional Real Estate Board |
| **URL** | [TRREB Market Watch](https://trreb.ca/index.php/market-news/market-watch) |
| **Format** | PDF (monthly reports) |
| **Update Frequency** | Monthly |
| **Historical Availability** | 2007–Present |
| **Access** | Public (aggregate data in PDFs) |
| **Extraction Method** | PDF parsing (`pdfplumber` or `camelot-py`) |

### Available Tables

#### Table: `trreb_monthly_summary`

**Location in PDF**: Pages 3-4 (Summary by Area)

| Column | Data Type | Description |
|--------|-----------|-------------|
| `report_date` | DATE | First of month (YYYY-MM-01) |
| `area_code` | VARCHAR(3) | District code (W01, C01, E01, etc.) |
| `area_name` | VARCHAR(100) | District name |
| `area_type` | VARCHAR(10) | West / Central / East |
| `sales` | INTEGER | Number of transactions |
| `dollar_volume` | DECIMAL | Total sales volume ($) |
| `avg_price` | DECIMAL | Average sale price ($) |
| `median_price` | DECIMAL | Median sale price ($) |
| `new_listings` | INTEGER | New listings count |
| `active_listings` | INTEGER | Active listings at month end |
| `avg_sp_lp` | DECIMAL | Avg sale price / list price ratio (%) |
| `avg_dom` | INTEGER | Average days on market |

### Dimensions

| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Monthly | 2007-01 to present |
| **Geography** | District | ~35 TRREB districts |
| **Property Type** | Aggregate | All residential (no breakdown in summary) |

### Metrics Available

| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `avg_price` | Pre-calculated monthly avg | Primary price indicator |
| `median_price` | Pre-calculated monthly median | Robust price indicator |
| `sales` | Count | Market activity volume |
| `avg_dom` | Average | Market velocity |
| `avg_sp_lp` | Ratio | Buyer/seller market indicator |
| `new_listings` | Count | Supply indicator |
| `active_listings` | Snapshot | Inventory level |

### ⚠ Limitations

- No transaction-level data (aggregates only)
- Property type breakdown requires parsing additional tables
- PDF structure may vary slightly across years
- District boundaries have not changed since 2011, so reports from 2007–2010 may not map cleanly onto the current districts

---

## Data Source #2: CMHC Rental Market Survey

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | Canada Mortgage and Housing Corporation |
| **URL** | [CMHC Housing Market Information Portal](https://www03.cmhc-schl.gc.ca/hmip-pimh/) |
| **Format** | CSV export, API |
| **Update Frequency** | Annual (October survey) |
| **Historical Availability** | 1990–Present |
| **Access** | Public, free registration for bulk downloads |
| **Geographic Levels** | CMA → Zone → Neighbourhood → Census Tract |

### Available Tables

#### Table: `cmhc_rental_summary`

**Portal Path**: Toronto → Primary Rental Market → Summary Statistics

| Column | Data Type | Description |
|--------|-----------|-------------|
| `survey_year` | INTEGER | Survey year (October) |
| `zone_code` | VARCHAR(10) | CMHC zone identifier |
| `zone_name` | VARCHAR(100) | Zone name |
| `bedroom_type` | VARCHAR(20) | Bachelor / 1-Bed / 2-Bed / 3-Bed+ / Total |
| `universe` | INTEGER | Total rental units in zone |
| `vacancy_rate` | DECIMAL | Vacancy rate (%) |
| `vacancy_rate_reliability` | VARCHAR(1) | Reliability code (a/b/c/d) |
| `availability_rate` | DECIMAL | Availability rate (%) |
| `average_rent` | DECIMAL | Average monthly rent ($) |
| `average_rent_reliability` | VARCHAR(1) | Reliability code |
| `median_rent` | DECIMAL | Median monthly rent ($) |
| `rent_change_pct` | DECIMAL | YoY rent change (%) |
| `turnover_rate` | DECIMAL | Unit turnover rate (%) |

### Dimensions

| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Annual | 1990 to present (October snapshot) |
| **Geography** | Zone | ~20 CMHC zones in Toronto CMA |
| **Bedroom Type** | Category | Bachelor, 1-Bed, 2-Bed, 3-Bed+, Total |
| **Structure Type** | Category | Row, Apartment (available in detailed tables) |

### Metrics Available

| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `average_rent` | Pre-calculated avg | Primary rent indicator |
| `median_rent` | Pre-calculated median | Robust rent indicator |
| `vacancy_rate` | Percentage | Market tightness |
| `availability_rate` | Percentage | Supply accessibility |
| `turnover_rate` | Percentage | Tenant mobility |
| `rent_change_pct` | YoY % | Rent growth tracking |
| `universe` | Count | Market size |

### Reliability Codes

| Code | Meaning | Coefficient of Variation |
|------|---------|-------------------------|
| `a` | Excellent | CV ≤ 2.5% |
| `b` | Good | 2.5% < CV ≤ 5% |
| `c` | Fair | 5% < CV ≤ 10% |
| `d` | Poor (use with caution) | CV > 10% |
| `**` | Data suppressed | Sample too small |

### ⚠ Limitations

- Annual only (no monthly granularity)
- October snapshot (point-in-time)
- Zones are larger than TRREB districts
- Purpose-built rental only (excludes condo rentals in base survey)

---

## Data Source #3: City of Toronto Open Data

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto |
| **URL** | [Toronto Open Data Portal](https://open.toronto.ca/) |
| **Format** | GeoJSON, Shapefile, CSV |
| **Use Case** | Reference layer, demographic enrichment |

### Relevant Datasets

#### Dataset: `neighbourhoods`

| Column | Data Type | Description |
|--------|-----------|-------------|
| `area_id` | INTEGER | Neighbourhood ID (1-158) |
| `area_name` | VARCHAR(100) | Official neighbourhood name |
| `geometry` | POLYGON | Boundary geometry |

#### Dataset: `neighbourhood_profiles` (Census-linked)

| Column | Data Type | Description |
|--------|-----------|-------------|
| `neighbourhood_id` | INTEGER | Links to neighbourhoods |
| `population` | INTEGER | Total population |
| `avg_household_income` | DECIMAL | Average household income |
| `dwelling_count` | INTEGER | Total dwellings |
| `owner_pct` | DECIMAL | % owner-occupied |
| `renter_pct` | DECIMAL | % renter-occupied |

### Enrichment Potential

Can overlay demographic context on housing data:
- Income brackets by neighbourhood
- Ownership vs rental ratios
- Population density
- Dwelling type distribution

---

## Data Source #4: Enrichment Data (Density, Education)

### Purpose

Provide socioeconomic context to housing price analysis. Enables questions like:
- Do neighbourhoods with higher education attainment have higher prices?
- How does population density correlate with price per square foot?

### Geographic Alignment Reality

**Critical constraint**: Enrichment data is available at the **158-neighbourhood** level, while core housing data sits at **TRREB districts (~35)** and **CMHC zones (~20)**. These do not align cleanly.

```
158 Neighbourhoods (fine)     → Enrichment data lives here
        (no clean crosswalk)
~35 TRREB Districts (coarse)  → Purchase data lives here
~20 CMHC Zones (coarse)       → Rental data lives here
```

### Available Enrichment Datasets

#### Dataset: Neighbourhood Profiles (Census)

| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto (via Statistics Canada Census) |
| **URL** | [Toronto Open Data - Neighbourhood Profiles](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
| **Format** | CSV, JSON, XML, XLSX |
| **Update Frequency** | Every 5 years (Census cycle) |
| **Available Years** | 2001, 2006, 2011, 2016, 2021 |
| **Geographic Unit** | 158 neighbourhoods (140 pre-2021) |

**Key Variables**:

| Variable | Description | Use Case |
|----------|-------------|----------|
| `population` | Total population | Density calculation |
| `land_area_sqkm` | Area in square kilometres | Density calculation |
| `pop_density_per_sqkm` | Population per km² | Density metric |
| `pct_bachelors_or_higher` | % age 25-64 with bachelor's+ | Education proxy |
| `median_household_income` | Median total household income | Income metric |
| `avg_household_income` | Average total household income | Income metric |
| `pct_owner_occupied` | % owner-occupied dwellings | Tenure split |
| `pct_renter_occupied` | % renter-occupied dwellings | Tenure split |

**Download URL (2021, 158 neighbourhoods)**:
```
https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx
```

### Crime Data — Deferred to Portfolio Phase 4

Crime data (TPS Neighbourhood Crime Rates) is **not included in V1 scope**. It will be added in portfolio Phase 4 after the Energy Pricing project is complete.

**Rationale**:
- Crime data is socially/politically sensitive and requires careful methodology documentation
- V1 focuses on core housing metrics and policy events
- Deferral reduces scope creep risk

**Future Reference** (Portfolio Phase 4):
- Source: [TPS Public Safety Data Portal](https://data.torontopolice.on.ca/)
- Dataset: Neighbourhood Crime Rates (Major Crime Indicators)
- Geographic Unit: 158 neighbourhoods

### V1 Enrichment Data Summary

| Measure | Source | Geography | Frequency | Format | Status |
|---------|--------|-----------|-----------|--------|--------|
| **Population Density** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Education Attainment** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Median Income** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Crime Rates (MCI)** | TPS Data Portal | 158 neighbourhoods | Annual | GeoJSON/CSV | Deferred to Portfolio Phase 4 |

---

## Data Source #5: Policy Events

### Purpose

Provide temporal context for housing price movements. Display as annotation markers on time series charts. **No causation claims** — correlation/context only.

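
As a rough illustration of the "annotation markers" idea, the sketch below picks the events that fall inside the visible date range of a chart and turns them into `(date, label)` pairs. The field names follow the `dim_policy_event` schema in this section and the three events come from this document's sample list; the `PolicyEvent` class and `markers_in_range` helper are hypothetical names, and how the pairs get drawn depends on the charting library chosen in the parent plan.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyEvent:
    """Hypothetical in-memory view of a few dim_policy_event columns."""
    event_date: date
    title: str
    expected_direction: str  # "bearish" / "bullish" / "neutral"


def markers_in_range(events, start, end):
    """Return (date, label) pairs for events inside [start, end], oldest first."""
    visible = [e for e in events if start <= e.event_date <= end]
    visible.sort(key=lambda e: e.event_date)
    return [(e.event_date, f"{e.title} ({e.expected_direction})") for e in visible]


# Events taken from this document's sample list
events = [
    PolicyEvent(date(2018, 1, 1), "OSFI B-20 Stress Test", "bearish"),
    PolicyEvent(date(2017, 4, 20), "Ontario Fair Housing Plan", "bearish"),
    PolicyEvent(date(2022, 3, 2), "BoC Rate Hike Cycle Begins", "bearish"),
]

# Only events within the chart's visible window become markers
markers = markers_in_range(events, date(2017, 1, 1), date(2019, 12, 31))
```

The label deliberately carries only the title and expected direction, which keeps the marker descriptive without implying that the event caused any price movement.
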
### Event Schema

#### Table: `dim_policy_event`

| Column | Data Type | Description |
|--------|-----------|-------------|
| `event_id` | INTEGER (PK) | Auto-increment primary key |
| `event_date` | DATE | Date event was announced/occurred |
| `effective_date` | DATE | Date policy took effect (if different) |
| `level` | VARCHAR(20) | `federal` / `provincial` / `municipal` |
| `category` | VARCHAR(20) | `monetary` / `tax` / `regulatory` / `supply` / `economic` |
| `title` | VARCHAR(200) | Short event title for display |
| `description` | TEXT | Longer description for tooltip |
| `expected_direction` | VARCHAR(10) | `bearish` / `bullish` / `neutral` |
| `source_url` | VARCHAR(500) | Link to official announcement/documentation |
| `confidence` | VARCHAR(10) | `high` / `medium` / `low` |
| `created_at` | TIMESTAMP | Record creation timestamp |

### Event Tiers

| Tier | Level | Category Examples | Inclusion Criteria |
|------|-------|-------------------|-------------------|
| **1** | Federal | BoC rate decisions, OSFI stress tests | Always include; objective, documented |
| **1** | Provincial | Fair Housing Plan, foreign buyer tax, rent control | Always include; legislative record |
| **2** | Municipal | Zoning reforms, development charges | Include if material impact expected |
| **2** | Economic | COVID measures, major employer closures | Include if Toronto-specific impact |
| **3** | Market | Major project announcements | Strict criteria; must be verifiable |

### Expected Direction Values

| Value | Meaning | Example |
|-------|---------|---------|
| `bullish` | Expected to increase prices | Rate cut, supply restriction |
| `bearish` | Expected to decrease prices | Rate hike, foreign buyer tax |
| `neutral` | Uncertain or mixed impact | Regulatory clarification |

### ⚠ Caveats

- **No causation claims**: Events are context, not explanation
- **Lag effects**: Policy impact may not be immediate
- **Confounding factors**: Multiple simultaneous influences
- **Display only**: No statistical analysis in V1

### Sample Events (Tier 1)

| Date | Level | Category | Title | Direction |
|------|-------|----------|-------|-----------|
| 2017-04-20 | provincial | tax | Ontario Fair Housing Plan | bearish |
| 2018-01-01 | federal | regulatory | OSFI B-20 Stress Test | bearish |
| 2020-03-27 | federal | monetary | BoC Emergency Rate Cut (to 0.25%) | bullish |
| 2022-03-02 | federal | monetary | BoC Rate Hike Cycle Begins | bearish |
| 2023-01-01 | federal | regulatory | Federal 2-Year Foreign Buyer Ban | bearish |

---

## Data Integration Strategy

### Temporal Alignment

| Source | Native Frequency | Alignment Strategy |
|--------|------------------|---------------------|
| TRREB | Monthly | Use as-is |
| CMHC | Annual (October) | Spread to monthly OR display annual overlay |
| Census/Enrichment | 5-year | Static snapshot; display as reference |
| Policy Events | Event-based | Display as vertical markers on time axis |

**Recommendation**: Keep separate time axes. TRREB monthly for purchases, CMHC annual for rentals. Don't force artificial monthly rental data.

### Geographic Alignment

```
┌────────────────────────────────────────────────────────────┐
│                   VISUALIZATION APPROACH                   │
├────────────────────────────────────────────────────────────┤
│                                                            │
│   Purchase Mode                 Rental Mode                │
│   ─────────────────             ──────────────             │
│   Map: TRREB Districts          Map: CMHC Zones            │
│   Time: Monthly slider          Time: Annual selector      │
│   Metrics: Price, Sales         Metrics: Rent, Vacancy     │
│                                                            │
│   ┌──────────────────────────────────────────────────┐     │
│   │ City Neighbourhoods Overlay                      │     │
│   │ (158 boundaries as reference layer)              │     │
│   │ + Enrichment data (density, education, income)   │     │
│   └──────────────────────────────────────────────────┘     │
│                                                            │
└────────────────────────────────────────────────────────────┘
```

### Enrichment Integration Strategy (Phased)

#### V1: Reference Overlay (Current Scope)

**Approach**: Display neighbourhood enrichment as a separate toggle-able layer. No joins to housing data.

**UX**:
- User hovers over TRREB district → tooltip shows "This district contains neighbourhoods: Annex, Casa Loma, Yorkville..."
- User toggles "Show Enrichment" → choropleth switches to neighbourhood-level density/education/income
- Enrichment and housing metrics displayed side-by-side, not merged

**Pros**:
- No imputation or dodgy aggregations
- Honest about geographic mismatch
- Ships faster

**Cons**:
- Can't do correlation analysis (price vs. enrichment) directly in dashboard

**Implementation**:
- `dim_neighbourhood` as standalone dimension (no FK to fact tables)
- Spatial lookup on hover (point-in-polygon)

#### V2/Portfolio Phase 4: Area-Weighted Aggregation (Future Scope)

**Approach**: Pre-compute area-weighted averages of neighbourhood metrics for each TRREB district and CMHC zone.

**Process**:
1. Spatial join: intersect neighbourhood polygons with TRREB/CMHC polygons
2. Compute overlap area for each neighbourhood-district pair
3. Weight neighbourhood metrics by overlap area proportion
4. User selects aggregation method in UI

**Aggregation Methods to Expose**:

| Method | Description | Best For |
|--------|-------------|----------|
| **Area-weighted mean** | Weight by % overlap area | Continuous metrics (density) |
| **Population-weighted mean** | Weight by population in overlap | Per-capita metrics (education) |
| **Majority assignment** | Assign neighbourhood to district with >50% overlap | Categorical data |
| **Max overlap** | Assign to single district with largest overlap | 1:1 mapping needs |

**Default**: Population-weighted (more defensible for per-capita metrics). Hide selector behind "Advanced" toggle.

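
To make the two weighted methods concrete, here is a minimal sketch of step 3, assuming the spatial intersection (steps 1 and 2) has already produced one row per neighbourhood-district overlap. The `weighted_mean_by_district` helper and the `overlap_area_sqkm` field are hypothetical names; `population_overlap` mirrors the bridge table column in the data model, and the numbers are invented purely for illustration, not real Toronto data.

```python
from collections import defaultdict


def weighted_mean_by_district(overlaps, metric, weight):
    """Weighted mean of `metric` per district, using `weight` as the weight column.

    `overlaps` is a list of dicts, one per neighbourhood-district intersection.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in overlaps:
        w = row[weight]
        numerator[row["district_code"]] += row[metric] * w
        denominator[row["district_code"]] += w
    # Skip districts whose total weight is zero to avoid dividing by zero
    return {d: numerator[d] / denominator[d] for d in numerator if denominator[d] > 0}


# Illustrative overlap rows (made-up values)
overlaps = [
    {"district_code": "C01", "neighbourhood_id": 1,
     "pct_bachelors_or_higher": 60.0, "overlap_area_sqkm": 1.0, "population_overlap": 3000},
    {"district_code": "C01", "neighbourhood_id": 2,
     "pct_bachelors_or_higher": 40.0, "overlap_area_sqkm": 3.0, "population_overlap": 1000},
]

area_weighted = weighted_mean_by_district(overlaps, "pct_bachelors_or_higher", "overlap_area_sqkm")
pop_weighted = weighted_mean_by_district(overlaps, "pct_bachelors_or_higher", "population_overlap")
# area_weighted["C01"] is 45.0 and pop_weighted["C01"] is 55.0
```

In this toy case the large but sparsely populated neighbourhood pulls the area-weighted figure down to 45.0, while the population-weighted figure is 55.0. That gap is the reason the population-weighted method is the more defensible default for per-capita metrics.
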
### V1 Future-Proofing (Do Now)

| Action | Why |
|--------|-----|
| Store neighbourhood boundaries in same CRS as TRREB/CMHC (WGS84) | Avoids reprojection headaches |
| Keep `dim_neighbourhood` normalized (not denormalized into district tables) | Clean separation for V2 join |
| Document Census year for each metric | Ready for 2026 Census |
| Include `census_year` column in dim_neighbourhood | Enables SCD tracking |

### V1 Defer (Don't Do Yet)

| Action | Why Not |
|--------|---------|
| Pre-compute area-weighted crosswalk | Don't need for V1 |
| Build aggregation method selector UI | No backend to support it |
| Crime data integration | Deferred to Portfolio Phase 4 |
| Historical neighbourhood boundary reconciliation (140→158) | Use 2021+ data only for V1 |

---

## Proposed Data Model

### Star Schema

```
                 ┌──────────────────┐
                 │ dim_time         │
                 ├──────────────────┤
                 │ date_key (PK)    │
                 │ year             │
                 │ month            │
                 │ quarter          │
                 │ month_name       │
                 └────────┬─────────┘
                          │
┌────────────────────┐    │    ┌────────────────────┐
│ dim_trreb_district │    │    │ dim_cmhc_zone      │
├────────────────────┤    │    ├────────────────────┤
│ district_key (PK)  │    │    │ zone_key (PK)      │
│ district_code      │    │    │ zone_code          │
│ district_name      │    │    │ zone_name          │
│ area_type          │    │    │ geometry           │
│ geometry           │    │    └─────────┬──────────┘
└─────────┬──────────┘    │              │
          │               │              │
┌─────────┴──────────┐    │    ┌─────────┴──────────┐
│ fact_purchases     │    │    │ fact_rentals       │
├────────────────────┤    │    ├────────────────────┤
│ date_key (FK)      │←───┴───→│ date_key (FK)      │
│ district_key (FK)  │         │ zone_key (FK)      │
│ sales_count        │         │ bedroom_type       │
│ avg_price          │         │ avg_rent           │
│ median_price       │         │ median_rent        │
│ new_listings       │         │ vacancy_rate       │
│ active_listings    │         │ universe           │
│ avg_dom            │         │ turnover_rate      │
│ avg_sp_lp          │         │ reliability_code   │
└────────────────────┘         └────────────────────┘

┌───────────────────────────────┐
│ dim_neighbourhood             │
├───────────────────────────────┤
│ neighbourhood_id (PK)         │
│ name                          │
│ geometry                      │
│ population                    │
│ land_area_sqkm                │
│ pop_density_per_sqkm          │
│ pct_bachelors_or_higher       │
│ median_household_income       │
│ pct_owner_occupied            │
│ pct_renter_occupied           │
│ census_year                   │ ← For SCD tracking
└───────────────────────────────┘

┌───────────────────────────────┐
│ dim_policy_event              │
├───────────────────────────────┤
│ event_id (PK)                 │
│ event_date                    │
│ effective_date                │
│ level                         │ ← federal/provincial/municipal
│ category                      │ ← monetary/tax/regulatory/supply/economic
│ title                         │
│ description                   │
│ expected_direction            │ ← bearish/bullish/neutral
│ source_url                    │
│ confidence                    │ ← high/medium/low
│ created_at                    │
└───────────────────────────────┘

┌───────────────────────────────┐
│ bridge_district_neighbourhood │ ← Portfolio Phase 4 ONLY
├───────────────────────────────┤
│ district_key (FK)             │
│ neighbourhood_id (FK)         │
│ area_overlap_pct              │
│ population_overlap            │ ← For pop-weighted agg
└───────────────────────────────┘
```

**Notes**:
- `dim_neighbourhood` has no FK relationship to fact tables in V1
- `dim_policy_event` is standalone (no FK to facts); used for time-series annotation
- `bridge_district_neighbourhood` is Portfolio Phase 4 scope only
- Similar bridge table needed for CMHC zones in Portfolio Phase 4

---

## File Structure

> **Note**: Toronto Housing data logic lives in `portfolio_app/toronto/`. See `portfolio_project_plan_v5.md` for full project structure.

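
As an example of the kind of small, self-contained logic that could live under `portfolio_app/toronto/` (the exact module placement is an assumption), the sketch below generates `dim_time` rows with the columns listed in the star schema. The `build_dim_time` helper is a hypothetical name, and the `date_key` format (an integer `YYYYMMDD` for the first of the month, matching the first-of-month `report_date` convention) is an assumption, since this specification does not fix one.

```python
import calendar


def build_dim_time(start_year, start_month, end_year, end_month):
    """Build one dim_time row per month, inclusive of both endpoints."""
    rows = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        rows.append({
            # Assumed key format: integer YYYYMMDD for the first day of the month
            "date_key": year * 10000 + month * 100 + 1,
            "year": year,
            "month": month,
            "quarter": (month - 1) // 3 + 1,
            "month_name": calendar.month_name[month],
        })
        # Advance one month, rolling over into the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return rows


dim_time = build_dim_time(2020, 11, 2021, 2)
```

Generating the dimension rather than loading it keeps monthly TRREB facts and annual CMHC facts pointing at the same set of keys, with rental rows simply using the October key of each survey year if that convention is adopted.
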
### Data Directory Structure

```
data/
└── toronto/
    ├── raw/
    │   ├── trreb/
    │   │   └── market_watch_YYYY_MM.pdf
    │   ├── cmhc/
    │   │   └── rental_survey_YYYY.csv
    │   ├── enrichment/
    │   │   └── neighbourhood_profiles_2021.xlsx
    │   └── geo/
    │       ├── toronto_neighbourhoods.geojson
    │       ├── trreb_districts.geojson          ← (to be created via QGIS)
    │       └── cmhc_zones.geojson               ← (from R cmhc package)
    │
    ├── processed/                               ← gitignored
    │   ├── fact_purchases.parquet
    │   ├── fact_rentals.parquet
    │   ├── dim_time.parquet
    │   ├── dim_trreb_district.parquet
    │   ├── dim_cmhc_zone.parquet
    │   ├── dim_neighbourhood.parquet
    │   └── dim_policy_event.parquet
    │
    └── reference/
        ├── policy_events.csv                    ← Curated event list
        └── neighbourhood_boundary_changelog.md  ← 140→158 notes
```

### Code Module Structure

```
portfolio_app/toronto/
├── __init__.py
├── parsers/
│   ├── __init__.py
│   ├── trreb.py           # PDF extraction
│   └── cmhc.py            # CSV processing
├── loaders/
│   ├── __init__.py
│   └── database.py        # DB operations
├── schemas/               # Pydantic models
│   ├── __init__.py
│   ├── trreb.py
│   ├── cmhc.py
│   ├── enrichment.py
│   └── policy_event.py
├── models/                # SQLAlchemy ORM
│   ├── __init__.py
│   ├── base.py            # DeclarativeBase, engine
│   ├── dimensions.py      # dim_time, dim_trreb_district, dim_policy_event, etc.
│   └── facts.py           # fact_purchases, fact_rentals
└── transforms/
    └── __init__.py
```

### Notebooks

```
notebooks/
├── 01_trreb_pdf_extraction.ipynb
├── 02_cmhc_data_prep.ipynb
├── 03_geo_layer_prep.ipynb
├── 04_enrichment_data_prep.ipynb
├── 05_policy_events_curation.ipynb
└── 06_spatial_crosswalk.ipynb    ← Portfolio Phase 4 only
```

---

## ✅ Implementation Checklist

> **Note**: These are **Stages** within the Toronto Housing project (Portfolio Phase 1). They are distinct from the overall portfolio **Phases** defined in `portfolio_project_plan_v5.md`.

### Stage 1: Data Acquisition

- [ ] Download TRREB monthly PDFs (2020-present as MVP)
- [ ] Register for CMHC portal and export Toronto rental data
- [ ] Extract CMHC zone boundaries via R `cmhc` package
- [ ] Download City of Toronto neighbourhood GeoJSON (158 boundaries)
- [ ] Digitize TRREB district boundaries in QGIS
- [ ] Download Neighbourhood Profiles (2021 Census, 158-model)

### Stage 2: Data Processing

- [ ] Build TRREB PDF parser (`portfolio_app/toronto/parsers/trreb.py`)
- [ ] Build Pydantic schemas (`portfolio_app/toronto/schemas/`)
- [ ] Build SQLAlchemy models (`portfolio_app/toronto/models/`)
- [ ] Extract and validate TRREB monthly summaries
- [ ] Clean and structure CMHC rental data
- [ ] Process Neighbourhood Profiles into `dim_neighbourhood`
- [ ] Curate and load policy events into `dim_policy_event`
- [ ] Create dimension tables
- [ ] Build fact tables
- [ ] Validate all geospatial layers use same CRS (WGS84/EPSG:4326)

### Stage 3: Visualization (V1)

- [ ] Create dashboard page (`portfolio_app/pages/toronto/dashboard.py`)
- [ ] Build choropleth figures (`portfolio_app/figures/choropleth.py`)
- [ ] Build time series figures (`portfolio_app/figures/time_series.py`)
- [ ] Design dashboard layout (purchase/rental toggle)
- [ ] Implement choropleth map with layer switching
- [ ] Add time slider/selector
- [ ] Build neighbourhood overlay (toggle-able)
- [ ] Add enrichment layer toggle (density/education/income choropleth)
- [ ] Add policy event markers on time series
- [ ] Add tooltips with cross-reference info ("This district contains...")
- [ ] Add tooltips showing enrichment metrics on hover

### Stage 4: Polish (V1)

- [ ] Add data source citations
- [ ] Document methodology (especially geographic limitations)
- [ ] Write docs (`docs/methodology.md`, `docs/data_sources.md`)
- [ ] Deploy to portfolio

### Future Enhancements (Portfolio Phase 4 — Post-Energy Project)

- [ ] Add crime data to dim_neighbourhood
- [ ] Build spatial crosswalk (neighbourhood ↔ district/zone intersections)
- [ ] Compute area-weighted and population-weighted aggregations
- [ ] Add aggregation method selector to UI
- [ ] Enable correlation analysis (price vs. enrichment metrics)
- [ ] Add historical neighbourhood boundary support (140→158)

**Deployment & dbt Architecture**: See `portfolio_project_plan_v5.md` for:
- dbt layer structure and testing strategy
- Deployment architecture
- Data quality framework

---

## References & Links

### Core Housing Data

| Resource | URL |
|----------|-----|
| TRREB Market Watch | https://trreb.ca/index.php/market-news/market-watch |
| CMHC Housing Portal | https://www03.cmhc-schl.gc.ca/hmip-pimh/ |

### Geographic Boundaries

| Resource | URL |
|----------|-----|
| Toronto Neighbourhoods GeoJSON | https://github.com/jasonicarter/toronto-geojson |
| TRREB District Map (PDF) | https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf |
| Statistics Canada Census Tracts | https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index-eng.cfm |
| R `cmhc` package (CRAN) | https://cran.r-project.org/package=cmhc |

### Enrichment Data

| Resource | URL |
|----------|-----|
| Toronto Open Data Portal | https://open.toronto.ca/ |
| Neighbourhood Profiles (CKAN) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/neighbourhood-profiles |
| Neighbourhood Profiles 2021 (Direct Download) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx |

### Policy Events Research

| Resource | URL |
|----------|-----|
| Bank of Canada Interest Rates | https://www.bankofcanada.ca/rates/interest-rates/ |
| OSFI (Stress Test Rules) | https://www.osfi-bsif.gc.ca/ |
| Ontario Legislature (Bills) | https://www.ola.org/ |

### Reference Documentation

| Resource | URL |
|----------|-----|
| Statistics Canada 2021 Census Reference | https://www12.statcan.gc.ca/census-recensement/2021/ref/index-eng.cfm |
| City of Toronto Neighbourhood Profiles Overview | https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/neighbourhood-profiles/ |

---

## Related Documents

| Document | Relationship | Use For |
|----------|--------------|---------|
| `portfolio_project_plan_v5.md` | Parent document | Overall scope, phasing, tech stack, deployment, dbt architecture, data quality framework |

---

*Document Version: 5.1*
*Updated: January 2026*
*Project: Toronto Housing Price Dashboard — Portfolio Piece*