staging #96

Merged
lmiranda merged 90 commits from staging into main 2026-02-01 21:33:13 +00:00
Showing only changes of commit 48b4eeeb62

@@ -1,759 +1,276 @@
# Toronto Neighbourhood Dashboard — Implementation Plan
**Document Type:** Change Implementation Plan
**Document Type:** Execution Guide
**Target:** Transition from TRREB-based to Neighbourhood-based Dashboard
**Author:** Claude Code
**Version:** 1.0 | January 2026
**Status:** Awaiting Approval
**Version:** 2.0 | January 2026
---
## Executive Summary
## Overview
This plan details the transition from the current TRREB district-based housing dashboard to a comprehensive Toronto Neighbourhood Dashboard built around the city's 158 official neighbourhoods. The change simplifies geographic alignment, improves data availability through open APIs, and expands analytical scope to include housing, safety, demographics, and amenities.
Transition from TRREB district-based housing dashboard to a comprehensive Toronto Neighbourhood Dashboard built around the city's 158 official neighbourhoods.
**Key Changes:**
- Geographic foundation shifts from TRREB districts (~35) to City Neighbourhoods (158)
- Data sources transition from PDF parsing to open APIs (Toronto Open Data, CMHC, Toronto Police)
- Dashboard expands from housing-only to 5 thematic tabs
- Star schema redesigned around neighbourhood as the central dimension
---
## Table of Contents
1. [Phase 1: Repository Cleanup](#phase-1-repository-cleanup)
2. [Phase 2: Documentation Updates](#phase-2-documentation-updates)
3. [Phase 3: Data Pipeline Implementation](#phase-3-data-pipeline-implementation)
4. [Phase 4: dbt Model Restructuring](#phase-4-dbt-model-restructuring)
5. [Phase 5: Dashboard Implementation](#phase-5-dashboard-implementation)
6. [Phase 6: Jupyter Notebooks](#phase-6-jupyter-notebooks)
7. [Phase 7: Final Documentation Review](#phase-7-final-documentation-review)
8. [Phase 8: Commit and Merge Strategy](#phase-8-commit-and-merge-strategy)
- Geographic foundation: TRREB districts (~35) → City Neighbourhoods (158)
- Data sources: PDF parsing → Open APIs (Toronto Open Data, Toronto Police, CMHC)
- Scope: Housing-only → 5 thematic tabs (Overview, Housing, Safety, Demographics, Amenities)
---
## Phase 1: Repository Cleanup
### 1.1 Files to DELETE (TRREB-Specific)
### Files to DELETE
These files are specific to the old TRREB district-based approach and will be completely removed:
| File | Reason for Deletion |
|------|---------------------|
| `portfolio_app/toronto/schemas/trreb.py` | TRREB schema obsolete - replacing with neighbourhood-based |
| `portfolio_app/toronto/parsers/trreb.py` | PDF parsing no longer needed - using APIs |
| File | Reason |
|------|--------|
| `portfolio_app/toronto/schemas/trreb.py` | TRREB schema obsolete |
| `portfolio_app/toronto/parsers/trreb.py` | PDF parsing no longer needed |
| `portfolio_app/toronto/loaders/trreb.py` | TRREB loading logic obsolete |
| `dbt/models/staging/stg_trreb__purchases.sql` | TRREB staging model obsolete |
| `dbt/models/intermediate/int_purchases__monthly.sql` | TRREB-based intermediate obsolete |
| `dbt/models/marts/mart_toronto_purchases.sql` | Will be rebuilt for neighbourhood grain |
| `dbt/models/staging/stg_trreb__purchases.sql` | TRREB staging obsolete |
| `dbt/models/intermediate/int_purchases__monthly.sql` | TRREB intermediate obsolete |
| `dbt/models/marts/mart_toronto_purchases.sql` | Will rebuild for neighbourhood grain |
### 1.2 Files to MODIFY (Remove TRREB References)
### Files to MODIFY (Remove TRREB References)
| File | Changes Required |
|------|------------------|
| File | Action |
|------|--------|
| `portfolio_app/toronto/schemas/__init__.py` | Remove TRREB imports |
| `portfolio_app/toronto/parsers/__init__.py` | Remove TRREB parser imports |
| `portfolio_app/toronto/loaders/__init__.py` | Remove TRREB loader imports |
| `portfolio_app/toronto/models/facts.py` | Remove `FactPurchases` model (rebuild later) |
| `portfolio_app/toronto/models/facts.py` | Remove `FactPurchases` model |
| `portfolio_app/toronto/models/dimensions.py` | Remove `DimTRREBDistrict` model |
| `portfolio_app/toronto/demo_data.py` | Remove TRREB demo districts, rebuild for neighbourhoods |
| `portfolio_app/toronto/demo_data.py` | Remove TRREB demo data |
| `dbt/models/sources.yml` | Remove TRREB source definitions |
| `dbt/models/schema.yml` | Remove TRREB model documentation |
### 1.3 Files to KEEP (Reusable Infrastructure)
### Files to KEEP (Reusable)
| File | Why Keep |
|------|----------|
| `portfolio_app/toronto/schemas/cmhc.py` | CMHC data still used (requires zone-to-neighbourhood mapping) |
| `portfolio_app/toronto/parsers/cmhc.py` | CMHC parser reusable with modifications |
| `portfolio_app/toronto/loaders/cmhc.py` | Loader patterns reusable |
| File | Why |
|------|-----|
| `portfolio_app/toronto/schemas/cmhc.py` | CMHC data still used |
| `portfolio_app/toronto/parsers/cmhc.py` | Reusable with modifications |
| `portfolio_app/toronto/loaders/base.py` | Generic database utilities |
| `portfolio_app/toronto/loaders/dimensions.py` | Dimension loading patterns reusable |
| `portfolio_app/toronto/loaders/dimensions.py` | Dimension loading patterns |
| `portfolio_app/toronto/models/base.py` | SQLAlchemy base class |
| `portfolio_app/toronto/models/facts.py` | Keep `FactRentals`, refactor |
| `portfolio_app/toronto/models/dimensions.py` | Keep `DimTime`, `DimNeighbourhood`, refactor others |
| `portfolio_app/figures/*.py` | All chart factories reusable |
| `portfolio_app/components/*.py` | All UI components reusable |
### 1.4 Cleanup Commands
```bash
# Files to delete
rm portfolio_app/toronto/schemas/trreb.py
rm portfolio_app/toronto/parsers/trreb.py
rm portfolio_app/toronto/loaders/trreb.py
rm dbt/models/staging/stg_trreb__purchases.sql
rm dbt/models/intermediate/int_purchases__monthly.sql
rm dbt/models/marts/mart_toronto_purchases.sql
```
---
## Phase 2: Documentation Updates
### 2.1 Primary Documentation Files
| Document | Current State | Required Updates |
|----------|---------------|------------------|
| `CLAUDE.md` (project) | References TRREB, old sprint structure | Complete rewrite of Data Model section |
| `docs/PROJECT_REFERENCE.md` | Full spec references TRREB | Update architecture, data sources |
| `docs/toronto_housing_dashboard_spec_v5.md` | TRREB/CMHC spec | Replace with neighbourhood spec |
| `docs/wbs_sprint_plan_v4.md` | Old sprint plan | New sprint plan for neighbourhood implementation |
### 2.2 CLAUDE.md Updates Required
**Section: Data Model Overview**
- Remove: TRREB Districts (~35) reference
- Remove: "These geographies do NOT align" note (now unified)
- Update: Star schema to neighbourhood-centric model
- Update: dbt layers description
**Section: Star Schema**
Replace with:
| Table | Type | Keys |
|-------|------|------|
| `dim_neighbourhood` | Central Dimension | neighbourhood_id (PK), geometry |
| `dim_time` | Dimension | date_key (PK) |
| `dim_cmhc_zone` | Bridge | zone_key (PK), neighbourhood mapping |
| `fact_census` | Fact | -> dim_neighbourhood, dim_time |
| `fact_crime` | Fact | -> dim_neighbourhood, dim_time |
| `fact_rentals` | Fact | -> dim_cmhc_zone, dim_time |
| `fact_amenities` | Fact | -> dim_neighbourhood |
**Section: DO NOT BUILD**
Update to reflect new scope constraints.
**Section: Module Responsibilities**
Update parsers description: "API extraction" instead of "PDF/CSV extraction"
### 2.3 New Reference Documents to Create
| Document | Purpose |
|----------|---------|
| `docs/neighbourhood_dashboard_spec_v1.md` | New dashboard specification (from Change-Toronto-Analysis.md) |
| `docs/data_source_inventory.md` | API endpoints, data dictionaries, refresh schedules |
| `docs/cmhc_neighbourhood_crosswalk.md` | CMHC zone to neighbourhood mapping methodology |
| Document | Action |
|----------|--------|
| `CLAUDE.md` | Update data model section, mark transition complete |
| `docs/PROJECT_REFERENCE.md` | Update architecture, data sources |
| `docs/toronto_housing_dashboard_spec_v5.md` | Archive or delete |
| `docs/wbs_sprint_plan_v4.md` | Archive or delete |
---
## Phase 3: Data Pipeline Implementation
## Phase 3: New Data Model
### 3.1 New Schema Files
### Star Schema (Neighbourhood-Centric)
#### `portfolio_app/toronto/schemas/neighbourhood.py`
```python
"""Pydantic schemas for neighbourhood-level data."""
from pydantic import BaseModel, Field
from datetime import date
from typing import Optional
| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Central Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension (keep existing) |
| `dim_cmhc_zone` | Bridge Dimension | 15 CMHC zones with neighbourhood mapping |
| `bridge_cmhc_neighbourhood` | Bridge | Zone-to-neighbourhood area weights |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone (keep existing) |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
class NeighbourhoodRecord(BaseModel):
    """Core neighbourhood dimension record."""
    area_id: int = Field(..., description="Toronto Open Data AREA_ID")
    name: str
    population: Optional[int] = None
    land_area_sqkm: Optional[float] = None
### New Schema Files
class CensusRecord(BaseModel):
    """Census indicator for a neighbourhood."""
    neighbourhood_id: int
    census_year: int
    indicator_name: str
    indicator_value: float
| File | Contains |
|------|----------|
| `toronto/schemas/neighbourhood.py` | NeighbourhoodRecord, CensusRecord, CrimeRecord |
| `toronto/schemas/amenities.py` | AmenityType enum, AmenityRecord |
class CrimeRecord(BaseModel):
    """Crime statistics for a neighbourhood."""
    neighbourhood_id: int
    year: int
    mci_category: str  # Assault, B&E, Auto Theft, Robbery, Theft Over
    count: int
    rate_per_100k: float
```
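The `rate_per_100k` field on `CrimeRecord` is a standard per-capita rate. The sketch below shows how it could be derived from an incident count and a neighbourhood population when the source data does not already supply it. The helper name `rate_per_100k` and the choice to return `None` for a missing or non-positive population are assumptions for illustration, not part of the plan.

```python
from typing import Optional


def rate_per_100k(count: int, population: Optional[int]) -> Optional[float]:
    """Return incidents per 100,000 residents, or None if population is unusable.

    Hypothetical helper: the plan only names the field, not how it is computed.
    """
    if population is None or population <= 0:
        return None
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return count * 100_000 / population


if __name__ == "__main__":
    print(rate_per_100k(50, 25_000))
```

Returning `None` rather than raising keeps a single bad population value from aborting a whole load; a stricter loader could raise instead.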
### New Parser Files
#### `portfolio_app/toronto/schemas/amenities.py`
```python
"""Pydantic schemas for amenity data."""
from pydantic import BaseModel
from typing import Optional
from enum import Enum
| File | Data Source | API |
|------|-------------|-----|
| `toronto/parsers/toronto_open_data.py` | Neighbourhoods, Census, Parks, Schools, Childcare | Toronto Open Data Portal |
| `toronto/parsers/toronto_police.py` | Crime Rates, MCI, Shootings | Toronto Police Portal |
class AmenityType(str, Enum):
    PARK = "park"
    SCHOOL = "school"
    CHILDCARE = "childcare"
    TTC_STOP = "ttc_stop"


class AmenityRecord(BaseModel):
    """Point amenity within a neighbourhood."""
    neighbourhood_id: int
    amenity_type: AmenityType
    name: str
    latitude: float
    longitude: float
    attributes: Optional[dict] = None  # Type-specific attributes
```
### 3.2 New Parser Files
#### `portfolio_app/toronto/parsers/toronto_open_data.py`
```python
"""Parser for Toronto Open Data Portal APIs."""
# Endpoints:
# - Neighbourhoods GeoJSON
# - Neighbourhood Profiles CSV
# - Parks
# - Schools
# - Child Care Centres
class TorontoOpenDataParser:
    BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"

    def fetch_neighbourhoods_geojson(self) -> dict: ...
    def fetch_neighbourhood_profiles(self) -> list[CensusRecord]: ...
    def fetch_parks(self) -> list[AmenityRecord]: ...
    def fetch_schools(self) -> list[AmenityRecord]: ...
    def fetch_childcare(self) -> list[AmenityRecord]: ...
```
#### `portfolio_app/toronto/parsers/toronto_police.py`
```python
"""Parser for Toronto Police Service Open Data Portal."""
# Endpoints:
# - Neighbourhood Crime Rates
# - Major Crime Indicators
# - Shootings & Firearm Discharges
class TorontoPoliceParser:
    BASE_URL = "https://data.torontopolice.on.ca"

    def fetch_neighbourhood_crime_rates(self, year: int) -> list[CrimeRecord]: ...
    def fetch_mci_details(self, year: int) -> list[CrimeRecord]: ...
    def fetch_shootings(self, year: int) -> list[dict]: ...
```
### 3.3 New Model Files
#### `portfolio_app/toronto/models/dimensions.py` (Updated)
```python
"""Dimension models - neighbourhood as central dimension."""
from geoalchemy2 import Geometry  # assumed: Geometry comes from GeoAlchemy2
from sqlalchemy import Column, Float, ForeignKey, Integer, String

from .base import Base  # assumed: declarative Base lives in models/base.py


class DimNeighbourhood(Base):
    """158 Toronto neighbourhoods - CENTRAL DIMENSION."""
    __tablename__ = "dim_neighbourhood"

    neighbourhood_id = Column(Integer, primary_key=True)  # AREA_ID from GeoJSON
    name = Column(String(100), nullable=False)
    geometry = Column(Geometry("POLYGON", srid=4326))
    population = Column(Integer)
    land_area_sqkm = Column(Float)
    pop_density_per_sqkm = Column(Float)


class DimCMHCZone(Base):
    """15 CMHC zones with neighbourhood mapping."""
    __tablename__ = "dim_cmhc_zone"

    zone_key = Column(Integer, primary_key=True, autoincrement=True)
    zone_code = Column(String(10), unique=True, nullable=False)
    zone_name = Column(String(100), nullable=False)
    geometry = Column(Geometry("POLYGON", srid=4326))


class BridgeCMHCNeighbourhood(Base):
    """Many-to-many: CMHC zones to neighbourhoods with area weights."""
    __tablename__ = "bridge_cmhc_neighbourhood"

    zone_key = Column(Integer, ForeignKey("dim_cmhc_zone.zone_key"), primary_key=True)
    neighbourhood_id = Column(Integer, ForeignKey("dim_neighbourhood.neighbourhood_id"), primary_key=True)
    area_weight = Column(Float)  # Proportion of neighbourhood in zone
```
#### `portfolio_app/toronto/models/facts.py` (Updated)
```python
"""Fact tables - all keyed to neighbourhood or CMHC zone."""
from sqlalchemy import Column, Date, Float, ForeignKey, Integer, String

from .base import Base  # assumed: declarative Base lives in models/base.py


class FactCensus(Base):
    """Census indicators by neighbourhood."""
    __tablename__ = "fact_census"

    id = Column(Integer, primary_key=True, autoincrement=True)
    neighbourhood_id = Column(Integer, ForeignKey("dim_neighbourhood.neighbourhood_id"))
    date_key = Column(Integer, ForeignKey("dim_time.date_key"))
    indicator_name = Column(String(100))
    indicator_value = Column(Float)


class FactCrime(Base):
    """Crime statistics by neighbourhood."""
    __tablename__ = "fact_crime"

    id = Column(Integer, primary_key=True, autoincrement=True)
    neighbourhood_id = Column(Integer, ForeignKey("dim_neighbourhood.neighbourhood_id"))
    date_key = Column(Integer, ForeignKey("dim_time.date_key"))
    mci_category = Column(String(50))
    incident_count = Column(Integer)
    rate_per_100k = Column(Float)


class FactAmenities(Base):
    """Amenity counts by neighbourhood (snapshot)."""
    __tablename__ = "fact_amenities"

    neighbourhood_id = Column(Integer, ForeignKey("dim_neighbourhood.neighbourhood_id"), primary_key=True)
    parks_count = Column(Integer)
    parks_area_sqm = Column(Float)
    schools_count = Column(Integer)
    childcare_spaces = Column(Integer)
    ttc_stops_count = Column(Integer)
    snapshot_date = Column(Date)
```
### 3.4 New Loader Files
### New Loader Files
| File | Purpose |
|------|---------|
| `portfolio_app/toronto/loaders/neighbourhoods.py` | Load GeoJSON boundaries |
| `portfolio_app/toronto/loaders/census.py` | Load neighbourhood profiles |
| `portfolio_app/toronto/loaders/crime.py` | Load crime statistics |
| `portfolio_app/toronto/loaders/amenities.py` | Load parks, schools, childcare |
| `portfolio_app/toronto/loaders/cmhc_crosswalk.py` | Build CMHC-neighbourhood bridge |
| `toronto/loaders/neighbourhoods.py` | Load GeoJSON boundaries |
| `toronto/loaders/census.py` | Load neighbourhood profiles |
| `toronto/loaders/crime.py` | Load crime statistics |
| `toronto/loaders/amenities.py` | Load parks, schools, childcare |
| `toronto/loaders/cmhc_crosswalk.py` | Build CMHC-neighbourhood bridge |
---
## Phase 4: dbt Model Restructuring
## Phase 4: dbt Restructuring
### 4.1 New Staging Models
### Staging Layer
| Model | Source | Purpose |
|-------|--------|---------|
| `stg_toronto__neighbourhoods` | dim_neighbourhood | Clean neighbourhood dimension |
| `stg_toronto__census` | fact_census | Pivoted census indicators |
| `stg_toronto__crime` | fact_crime | Cleaned crime data |
| `stg_toronto__amenities` | fact_amenities | Amenity counts |
| `stg_cmhc__rentals` | fact_rentals | (Keep existing, modify) |
| `stg_cmhc__zone_crosswalk` | bridge_cmhc_neighbourhood | Zone-neighbourhood mapping |
| Model | Source |
|-------|--------|
| `stg_toronto__neighbourhoods` | dim_neighbourhood |
| `stg_toronto__census` | fact_census |
| `stg_toronto__crime` | fact_crime |
| `stg_toronto__amenities` | fact_amenities |
| `stg_cmhc__rentals` | fact_rentals (modify existing) |
| `stg_cmhc__zone_crosswalk` | bridge_cmhc_neighbourhood |
### 4.2 New Intermediate Models
### Intermediate Layer
| Model | Purpose |
|-------|---------|
| `int_neighbourhood__demographics` | Combined census demographics |
| `int_neighbourhood__housing` | Housing indicators from census |
| `int_neighbourhood__housing` | Housing indicators |
| `int_neighbourhood__crime_summary` | Aggregated crime by type |
| `int_neighbourhood__amenity_scores` | Normalized amenity metrics |
| `int_rentals__neighbourhood_allocated` | CMHC rentals allocated to neighbourhoods |
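The plan describes `int_neighbourhood__amenity_scores` only as "normalized amenity metrics" and does not say which normalization is intended. As one possible reading, the sketch below applies min-max scaling across neighbourhoods so that each metric lands in the range 0 to 1. The function name `min_max_normalize` and the handling of missing values are assumptions; in the real project this logic would presumably live in the dbt model's SQL rather than in Python.

```python
from typing import Dict, Optional


def min_max_normalize(values: Dict[int, Optional[float]]) -> Dict[int, Optional[float]]:
    """Scale one metric to [0, 1] across neighbourhoods (hypothetical helper).

    Keys are neighbourhood IDs. Missing values (None) stay None.
    """
    present = [v for v in values.values() if v is not None]
    if not present:
        return {k: None for k in values}
    lo, hi = min(present), max(present)
    if hi == lo:
        # All neighbourhoods share one value, so there is no spread to scale.
        return {k: (None if v is None else 0.0) for k, v in values.items()}
    return {k: (None if v is None else (v - lo) / (hi - lo)) for k, v in values.items()}


if __name__ == "__main__":
    parks_per_capita = {1: 0.0, 2: 5.0, 3: 10.0, 4: None}
    print(min_max_normalize(parks_per_capita))
```

Min-max scaling is sensitive to outliers; a percentile rank or z-score would be a reasonable alternative if a few neighbourhoods dominate a metric.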
### 4.3 New Mart Models
### Mart Layer (One per Tab)
| Model | Purpose | Tab |
|-------|---------|-----|
| `mart_neighbourhood_overview` | Composite livability scores | Overview |
| `mart_neighbourhood_housing` | Affordability metrics | Housing |
| `mart_neighbourhood_safety` | Crime rates and trends | Safety |
| `mart_neighbourhood_demographics` | Population, income, diversity | Demographics |
| `mart_neighbourhood_amenities` | Parks, schools, transit access | Amenities |
| `mart_dashboard_kpis` | Pre-computed KPI values | All tabs |
### 4.4 dbt Sources Configuration
```yaml
# dbt/models/sources.yml
version: 2
sources:
  - name: toronto
    schema: public
    tables:
      - name: dim_neighbourhood
        identifier: dim_neighbourhood
      - name: dim_time
        identifier: dim_time
      - name: dim_cmhc_zone
        identifier: dim_cmhc_zone
      - name: bridge_cmhc_neighbourhood
        identifier: bridge_cmhc_neighbourhood
      - name: fact_census
        identifier: fact_census
      - name: fact_crime
        identifier: fact_crime
      - name: fact_rentals
        identifier: fact_rentals
      - name: fact_amenities
        identifier: fact_amenities
```
| Model | Tab | Key Metrics |
|-------|-----|-------------|
| `mart_neighbourhood_overview` | Overview | Composite livability score |
| `mart_neighbourhood_housing` | Housing | Affordability index, rent-to-income |
| `mart_neighbourhood_safety` | Safety | Crime rates, YoY change |
| `mart_neighbourhood_demographics` | Demographics | Income, age, diversity |
| `mart_neighbourhood_amenities` | Amenities | Parks, schools, transit per capita |
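The Safety mart lists "YoY change" as a key metric. A minimal sketch of that calculation follows, assuming it means the percentage change in a crime count or rate between two consecutive years. The function name `yoy_change_pct` and the `None` result for a zero or missing baseline are illustrative choices, not something the plan specifies.

```python
from typing import Optional


def yoy_change_pct(current: Optional[float], previous: Optional[float]) -> Optional[float]:
    """Percentage change from the previous year to the current year.

    Hypothetical helper. Returns None when either value is missing or the
    baseline is zero, since the percentage is undefined in that case.
    """
    if current is None or previous is None or previous == 0:
        return None
    return (current - previous) * 100 / previous


if __name__ == "__main__":
    print(yoy_change_pct(110, 100))
    print(yoy_change_pct(90, 100))
```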
---
## Phase 5: Dashboard Implementation
### 5.1 Tab Structure
### Tab Structure
```
pages/toronto/
├── dashboard.py # Main layout with tab navigation
├── methodology.py # Keep existing
├── tabs/
│ ├── __init__.py
│ ├── overview.py # Tab 1: Composite livability
│ ├── housing.py # Tab 2: Affordability
│ ├── safety.py # Tab 3: Crime
│ ├── demographics.py # Tab 4: Population
│ └── amenities.py # Tab 5: Services
│ ├── overview.py # Composite livability
│ ├── housing.py # Affordability
│ ├── safety.py # Crime
│ ├── demographics.py # Population
│ └── amenities.py # Services
└── callbacks/
  ├── __init__.py
  ├── map_callbacks.py # Choropleth interactions
  ├── chart_callbacks.py # Supporting charts
  └── selection_callbacks.py # Neighbourhood selection
  ├── map_callbacks.py
  ├── chart_callbacks.py
  └── selection_callbacks.py
```
### 5.2 Shared Components per Tab
### Layout Pattern (All Tabs)
Each tab follows the same layout pattern:
Each tab follows the same structure:
1. **Choropleth Map** (left) — 158 neighbourhoods, click to select
2. **KPI Cards** (right) — 3-4 contextual metrics
3. **Supporting Charts** (bottom) — Trend + comparison visualizations
4. **Details Panel** (collapsible) — All metrics for selected neighbourhood
1. **Choropleth Map** (left, 60% width)
- 158 neighbourhoods
- Color by selected metric
- Click to select neighbourhood
### Graphs by Tab
2. **KPI Cards** (right, 40% width)
- 3-4 contextual KPIs
- Update on neighbourhood selection
3. **Supporting Charts** (bottom row)
- Chart 1: Context/trend visualization
- Chart 2: Comparison/ranking visualization
4. **Details Panel** (collapsible)
- All metrics for selected neighbourhood
### 5.3 Graphs by Tab
#### Tab 1: Overview
| Graph ID | Type | Data Source |
|----------|------|-------------|
| `overview-choropleth` | Choropleth | mart_neighbourhood_overview |
| `overview-top-bottom` | Horizontal Bar | mart_neighbourhood_overview |
| `overview-income-crime-scatter` | Scatter | mart_neighbourhood_overview |
#### Tab 2: Housing & Affordability
| Graph ID | Type | Data Source |
|----------|------|-------------|
| `housing-choropleth` | Choropleth | mart_neighbourhood_housing |
| `housing-rent-trend` | Line | mart_neighbourhood_housing (historical) |
| `housing-dwelling-types` | Pie/Bar | mart_neighbourhood_housing |
#### Tab 3: Safety
| Graph ID | Type | Data Source |
|----------|------|-------------|
| `safety-choropleth` | Choropleth | mart_neighbourhood_safety |
| `safety-crime-breakdown` | Stacked Bar | mart_neighbourhood_safety |
| `safety-trend` | Line | mart_neighbourhood_safety (5-year) |
#### Tab 4: Demographics
| Graph ID | Type | Data Source |
|----------|------|-------------|
| `demographics-choropleth` | Choropleth | mart_neighbourhood_demographics |
| `demographics-age-pyramid` | Population Pyramid | mart_neighbourhood_demographics |
| `demographics-languages` | Horizontal Bar | mart_neighbourhood_demographics |
#### Tab 5: Amenities & Services
| Graph ID | Type | Data Source |
|----------|------|-------------|
| `amenities-choropleth` | Choropleth | mart_neighbourhood_amenities |
| `amenities-radar` | Radar | mart_neighbourhood_amenities |
| `amenities-transit` | Bar | mart_neighbourhood_amenities |
| Tab | Choropleth Metric | Chart 1 | Chart 2 |
|-----|-------------------|---------|---------|
| Overview | Livability score | Top/Bottom 10 bar | Income vs Crime scatter |
| Housing | Affordability index | Rent trend (5yr line) | Dwelling types (pie/bar) |
| Safety | Crime rate per 100K | Crime breakdown (stacked bar) | Crime trend (5yr line) |
| Demographics | Median income | Age pyramid | Top languages (bar) |
| Amenities | Park area per capita | Amenity radar | Transit accessibility (bar) |
---
## Phase 6: Jupyter Notebooks
### 6.1 Notebook Structure
### Purpose
Create one notebook per graph following this template:
One notebook per graph to document:
1. **Data Reference** — How the data was built (query, transformation steps, sample output)
2. **Data Visualization** — Import figure factory, render the graph
### Directory Structure
```
notebooks/
├── README.md # Notebook index and conventions
├── README.md
├── overview/
│ ├── 01_overview_choropleth.ipynb
│ ├── 02_overview_top_bottom.ipynb
│ └── 03_overview_income_crime_scatter.ipynb
├── housing/
│ ├── 01_housing_choropleth.ipynb
│ ├── 02_housing_rent_trend.ipynb
│ └── 03_housing_dwelling_types.ipynb
├── safety/
│ ├── 01_safety_choropleth.ipynb
│ ├── 02_safety_crime_breakdown.ipynb
│ └── 03_safety_trend.ipynb
├── demographics/
│ ├── 01_demographics_choropleth.ipynb
│ ├── 02_demographics_age_pyramid.ipynb
│ └── 03_demographics_languages.ipynb
└── amenities/
  ├── 01_amenities_choropleth.ipynb
  ├── 02_amenities_radar.ipynb
  └── 03_amenities_transit.ipynb
```
### 6.2 Notebook Template
Each notebook follows this structure:
### Notebook Template
```markdown
# [Graph Name] — Data Reference & Visualization
# [Graph Name]
## 1. Data Reference
### 1.1 Source Tables
- List all source tables/marts used
- Explain the grain of each table
### Source Tables
- List tables/marts used
- Grain of each table
### 1.2 Query
### Query
```sql
-- The exact query that feeds this visualization
SELECT ...
FROM ...
SELECT ... FROM ...
```
### 1.3 Data Pipeline Steps
1. Step 1: Description of transformation
2. Step 2: Description of aggregation
3. Step 3: Description of final shaping
### Transformation Steps
1. Step description
2. Step description
### 1.4 Sample Data
### Sample Data
```python
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(DATABASE_URL)
df = pd.read_sql(query, engine)
df.head(10)
```
## 2. Data Visualization
### 2.1 Import Graph Factory
```python
from portfolio_app.figures.choropleth import create_choropleth_figure
# or appropriate figure factory
```
### 2.2 Create Visualization
```python
fig = create_choropleth_figure(
    geojson=geojson_data,
    data=df.to_dict('records'),
    ...
)
fig = create_choropleth_figure(...)
fig.show()
```
### 2.3 Interpretation Notes
- What this visualization shows
- Key insights from the data
- Caveats or limitations
```
### 6.3 Notebooks to Create (15 Total)
| # | Notebook | Graph | Tab |
|---|----------|-------|-----|
| 1 | `overview/01_overview_choropleth.ipynb` | Livability score map | Overview |
| 2 | `overview/02_overview_top_bottom.ipynb` | Top/Bottom 10 bar chart | Overview |
| 3 | `overview/03_overview_income_crime_scatter.ipynb` | Income vs Crime scatter | Overview |
| 4 | `housing/01_housing_choropleth.ipynb` | Affordability index map | Housing |
| 5 | `housing/02_housing_rent_trend.ipynb` | 5-year rent trend line | Housing |
| 6 | `housing/03_housing_dwelling_types.ipynb` | Dwelling type breakdown | Housing |
| 7 | `safety/01_safety_choropleth.ipynb` | Crime rate map | Safety |
| 8 | `safety/02_safety_crime_breakdown.ipynb` | Crime type stacked bar | Safety |
| 9 | `safety/03_safety_trend.ipynb` | 5-year crime trend line | Safety |
| 10 | `demographics/01_demographics_choropleth.ipynb` | Income distribution map | Demographics |
| 11 | `demographics/02_demographics_age_pyramid.ipynb` | Age distribution pyramid | Demographics |
| 12 | `demographics/03_demographics_languages.ipynb` | Top languages bar chart | Demographics |
| 13 | `amenities/01_amenities_choropleth.ipynb` | Park area per capita map | Amenities |
| 14 | `amenities/02_amenities_radar.ipynb` | Amenity density radar | Amenities |
| 15 | `amenities/03_amenities_transit.ipynb` | Transit accessibility bar | Amenities |
Create one notebook per graph as each is implemented (15 total across 5 tabs).
---
## Phase 7: Final Documentation Review
### 7.1 Documentation Audit Checklist
After all implementation, audit and update:
After all implementation is complete, perform a comprehensive audit:
#### Project CLAUDE.md
- [ ] Project Status reflects current sprint
- [ ] Run Commands are accurate
- [ ] Code Conventions match actual code
- [ ] Application Structure matches filesystem
- [ ] URL Routing matches registered pages
- [ ] Tech Stack versions are accurate
- [ ] Data Model matches SQLAlchemy models
- [ ] dbt Layers match actual models
- [ ] Reference Documents section is current
#### docs/PROJECT_REFERENCE.md
- [ ] Architecture diagrams are current
- [ ] Data source inventory is complete
- [ ] API endpoints documented
- [ ] Refresh schedules documented
#### docs/neighbourhood_dashboard_spec_v1.md
- [ ] All tabs documented
- [ ] All graphs documented
- [ ] Data sources for each graph documented
- [ ] Colour palettes specified
#### README.md
- [ ] Project description accurate
- [ ] Installation instructions work
- [ ] Quick start guide is functional
### 7.2 App Structure Verification
Verify documentation matches actual app by running:
```bash
# Generate actual page routes
grep -r "dash.register_page" portfolio_app/pages/ --include="*.py"
# Generate actual model classes
grep -r "class.*Base" portfolio_app/toronto/models/ --include="*.py"
# Generate actual dbt models
ls -la dbt/models/**/*.sql
```
### 7.3 Final Documentation Updates
Based on audit findings, update:
1. All file path references
2. All URL route tables
3. All model/schema references
4. All dbt model references
5. Sprint number and status
- [ ] `CLAUDE.md` — Project status, app structure, data model, URL routes
- [ ] `README.md` — Project description, installation, quick start
- [ ] `docs/PROJECT_REFERENCE.md` — Architecture matches implementation
- [ ] Remove or archive legacy spec documents
---
## Phase 8: Commit and Merge Strategy
## Data Source Reference
### 8.1 Branch Strategy
```
development (base)
└── feature/neighbourhood-dashboard-transition
    ├── Commit 1: Cleanup - Remove TRREB files
    ├── Commit 2: Schemas - New neighbourhood schemas
    ├── Commit 3: Models - Updated SQLAlchemy models
    ├── Commit 4: Parsers - API parsers implementation
    ├── Commit 5: Loaders - Data loading functions
    ├── Commit 6: dbt - New staging/intermediate/mart models
    ├── Commit 7: Dashboard - Tab implementations
    ├── Commit 8: Callbacks - Dashboard interactivity
    ├── Commit 9: Notebooks - All 15 Jupyter notebooks
    ├── Commit 10: Documentation - Updated docs
    └── Commit 11: Final review - Documentation audit fixes
```
### 8.2 Commit Messages
Follow conventional commits format:
```
feat: Add neighbourhood-based schemas and models
fix: Remove obsolete TRREB pipeline
docs: Update CLAUDE.md for neighbourhood dashboard
refactor: Restructure dbt models for neighbourhood grain
test: Add tests for new parsers and loaders
```
### 8.3 Merge Process
1. Create feature branch from development
2. Implement all phases with atomic commits
3. Run full CI checks: `make ci`
4. Create PR to development
5. Squash merge with comprehensive message
6. Delete feature branch
7. Tag release: `v2.0.0-neighbourhood-dashboard`
| Source | Datasets | URL |
|--------|----------|-----|
| Toronto Open Data | Neighbourhoods, Census Profiles, Parks, Schools, Childcare, TTC | open.toronto.ca |
| Toronto Police | Crime Rates, MCI, Shootings | data.torontopolice.on.ca |
| CMHC | Rental Market Survey | cmhc-schl.gc.ca |
---
## Implementation Timeline
## CMHC Zone Mapping Note
### Sprint 9: Cleanup & Foundation
- [ ] Phase 1: Repository cleanup (delete TRREB files)
- [ ] Phase 2: Documentation updates (CLAUDE.md, specs)
- [ ] Phase 3.1: New schemas created
### Sprint 10: Data Pipeline
- [ ] Phase 3.2: Parsers implementation (API integrations)
- [ ] Phase 3.3: Models implementation (SQLAlchemy)
- [ ] Phase 3.4: Loaders implementation
### Sprint 11: dbt & Dashboard
- [ ] Phase 4: dbt model restructuring
- [ ] Phase 5.1: Dashboard layout with tabs
- [ ] Phase 5.2: Choropleth maps per tab
### Sprint 12: Interactivity & Charts
- [ ] Phase 5.3: Supporting charts implementation
- [ ] Phase 5.4: Callbacks and interactivity
- [ ] Phase 6.1-6.5: Jupyter notebooks (3 per sprint day)
### Sprint 13: Documentation & Release
- [ ] Phase 6.6-6.15: Remaining notebooks
- [ ] Phase 7: Final documentation review
- [ ] Phase 8: Commit, merge, and tag release
CMHC uses 15 zones that don't align with 158 neighbourhoods. Strategy:
- Create `bridge_cmhc_neighbourhood` with area weights
- Allocate rental metrics proportionally to overlapping neighbourhoods
- Document methodology in `/toronto/methodology` page
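The allocation strategy above can be sketched in a few lines. This is one interpretation, assuming that `area_weight` is the proportion of a neighbourhood's area falling inside a zone (as noted on the bridge model) and that the metric being allocated is a rate-like value such as average rent, so each neighbourhood receives the area-weighted mean of its overlapping zones. The function name and input shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def allocate_zone_metric(
    zone_values: Dict[int, float],
    bridge_rows: Iterable[Tuple[int, int, float]],
) -> Dict[int, float]:
    """Area-weighted mean of a zone-level metric per neighbourhood.

    zone_values: {zone_key: metric value, e.g. average rent}
    bridge_rows: (zone_key, neighbourhood_id, area_weight) tuples, mirroring
                 the columns of bridge_cmhc_neighbourhood.
    """
    weighted_sum: Dict[int, float] = defaultdict(float)
    weight_total: Dict[int, float] = defaultdict(float)
    for zone_key, neighbourhood_id, weight in bridge_rows:
        value = zone_values.get(zone_key)
        if value is None or weight <= 0:
            continue  # skip zones with no data and empty overlaps
        weighted_sum[neighbourhood_id] += value * weight
        weight_total[neighbourhood_id] += weight
    # Dividing by the total weight renormalizes when weights do not sum to 1.
    return {n: weighted_sum[n] / weight_total[n] for n in weighted_sum}


if __name__ == "__main__":
    rents = {1: 2000.0, 2: 3000.0}
    bridge = [(1, 10, 1.0), (1, 11, 0.25), (2, 11, 0.75)]
    print(allocate_zone_metric(rents, bridge))
```

Count-like metrics (for example, number of rental units) would need a different treatment, with weights expressing the share of the zone in each neighbourhood rather than the reverse.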
---
## Appendix A: Files Inventory Summary
### Files to DELETE: 6
### Files to MODIFY: 8
### Files to CREATE: ~50
- Schemas: 2
- Parsers: 2
- Models: 2 (modifications)
- Loaders: 5
- dbt models: 15
- Dashboard tabs: 5
- Callbacks: 3
- Notebooks: 15
- Documentation: 3
---
## Appendix B: Risk Mitigation
| Risk | Mitigation |
|------|------------|
| CMHC zone-neighbourhood mapping inaccuracy | Document methodology, use area-weighted allocation |
| API rate limits | Implement caching, respect rate limits, store locally |
| Census data staleness (5-year cycle) | Document data vintage, display last update prominently |
| Geographic boundary changes | Lock to 2024 158-neighbourhood boundaries |
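For the API rate limit mitigation ("implement caching ... store locally"), a minimal file-based cache might look like the sketch below. It uses only the standard library and wraps any zero-argument fetch callable; the name `cached_fetch`, the JSON-on-disk format, and the age-based expiry are all assumptions for illustration rather than the project's actual caching design.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def cached_fetch(cache_path: str, fetch: Callable[[], Any], max_age_seconds: float) -> Any:
    """Return cached JSON if it is fresh enough, otherwise call fetch() and store it.

    Hypothetical helper: fetch is any callable returning JSON-serializable data,
    for example a function that downloads one dataset from an open data API.
    """
    path = Path(cache_path)
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

A parser method could pass its own download function as `fetch`, so repeated pipeline runs within the expiry window never touch the remote API.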
---
## Appendix C: Testing Strategy
### Unit Tests
- Schema validation tests
- Parser output format tests
- Loader idempotency tests
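A loader idempotency test typically runs the same load twice and checks that the table ends up in the same state. The sketch below illustrates the idea with an in-memory SQLite database standing in for the project database and a hypothetical `load_neighbourhoods` function that upserts on the primary key. The table shape is reduced to two columns and the names are placeholders, so treat this as a pattern rather than the project's loader.

```python
import sqlite3
from typing import Iterable, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for dim_neighbourhood (two columns only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_neighbourhood ("
        "neighbourhood_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    return conn


def load_neighbourhoods(conn: sqlite3.Connection, records: Iterable[Tuple[int, str]]) -> None:
    """Upsert records so that re-running the load does not duplicate rows."""
    conn.executemany(
        "INSERT INTO dim_neighbourhood (neighbourhood_id, name) VALUES (?, ?) "
        "ON CONFLICT(neighbourhood_id) DO UPDATE SET name = excluded.name",
        list(records),
    )
    conn.commit()


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM dim_neighbourhood").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    records = [(1, "Example A"), (2, "Example B")]
    load_neighbourhoods(conn, records)
    load_neighbourhoods(conn, records)  # second run must not add rows
    assert row_count(conn) == 2
```

The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer; PostgreSQL supports the same syntax, which is why it is used here as the stand-in pattern.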
### Integration Tests
- End-to-end: API -> Parse -> Load -> dbt -> Dashboard
- Database constraint tests
- dbt model tests (unique, not_null, relationships)
### Visual Regression Tests
- Screenshot comparison for each tab
- Choropleth rendering tests
---
*Document Version: 1.0*
*Created: January 2026*
*Author: Claude Code*
*Document Version: 2.0*
*Trimmed from v1.0 for execution clarity*