diff --git a/docs/changes/Change-Toronto-Analysis.md b/docs/changes/Change-Toronto-Analysis.md
new file mode 100644
index 0000000..3b25890
--- /dev/null
+++ b/docs/changes/Change-Toronto-Analysis.md
@@ -0,0 +1,423 @@
+# Toronto Neighbourhood Dashboard — Deliverables
+
+**Project Type:** Interactive Data Visualization Dashboard
+**Geographic Scope:** City of Toronto, 158 Official Neighbourhoods
+**Author:** Leo Miranda
+**Version:** 1.0 | January 2026
+
+---
+
+## Executive Summary
+
+This project is a multi-tab analytics dashboard built around Toronto's official neighbourhood boundaries. The core interaction is a choropleth map where users explore the city through different thematic lenses—housing affordability, safety, demographics, amenities—with supporting visualizations that tell a cohesive story per theme.
+
+**Primary Goals:**
+1. Demonstrate interactive data visualization skills (Plotly/Dash)
+2. Showcase data engineering capabilities (multi-source ETL, dimensional modeling)
+3. Create a portfolio piece with genuine analytical value
+
+---
+
+## Part 1: Geographic Foundation (Required First)
+
+| Dataset | Source | Format | Last Updated | Download |
+|---------|--------|--------|--------------|----------|
+| **Neighbourhoods Boundaries** | Toronto Open Data | GeoJSON | 2024 | [Link](https://open.toronto.ca/dataset/neighbourhoods/) |
+| **Neighbourhood Profiles** | Toronto Open Data | CSV | 2021 Census | [Link](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
+
+**Critical Notes:**
+- Toronto uses 158 official neighbourhoods (updated in 2022; previously 140)
+- GeoJSON includes `AREA_ID` for joining to tabular data
+- Neighbourhood Profiles has 2,400+ indicators per neighbourhood from Census
+
+---
+
+## Part 2: Tier 1 — MVP Datasets
+
+| Dataset | Source | Measures Available | Update Freq | Granularity |
+|---------|--------|-------------------|-------------|-------------|
+| **Neighbourhoods GeoJSON** | Toronto Open Data | Boundary polygons, area IDs | Static | Neighbourhood |
+| **Neighbourhood Profiles (full)** | Toronto Open Data | 2,400+ Census indicators | Every 5 years | Neighbourhood |
+| **Neighbourhood Crime Rates** | Toronto Police Portal | MCI rates per 100K by year | Annual | Neighbourhood |
+| **CMHC Rental Market Survey** | CMHC Portal | Avg rent by bedroom, vacancy rate | Annual (Oct) | 15 CMHC Zones |
+| **Parks** | Toronto Open Data | Park locations, area, type | Annual | Point/Polygon |
+
+**Total API/Download Calls:** 5
+**Data Volume:** ~50MB combined
+
+### Tier 1 Measures to Extract
+
+**From Neighbourhood Profiles:**
+- Population, population density
+- Median household income
+- Age distribution (0-14, 15-24, 25-44, 45-64, 65+)
+- % Immigrants, % Visible minorities
+- Top languages spoken
+- Unemployment rate
+- Education attainment (% with post-secondary)
+- Housing tenure (own vs rent %)
+- Dwelling types distribution
+- Average rent, housing costs as % of income
+
+**From Crime Rates:**
+- Total MCI rate per 100K population
+- Year-over-year crime trend
+
+**From CMHC:**
+- Average monthly rent (1BR, 2BR, 3BR)
+- Vacancy rates
+
+**From Parks:**
+- Park count per neighbourhood
+- Park area per capita
+
+---
+
+## Part 3: Tier 2 — Expansion Datasets
+
+| Dataset | Source | Measures Available | Update Freq | Granularity |
+|---------|--------|-------------------|-------------|-------------|
+| **Major Crime Indicators (MCI)** | Toronto Police Portal | Assault, B&E, auto theft, robbery, theft over | Quarterly | Neighbourhood |
+| **Shootings & Firearm Discharges** | Toronto Police Portal | Shooting incidents, injuries, fatalities | Quarterly | Neighbourhood |
+| **Building Permits** | Toronto Open Data | New construction, permits by type | Monthly | Address-level |
+| **Schools** | Toronto Open Data | Public/Catholic, elementary/secondary | Annual | Point |
+| **TTC Routes & Stops** | Toronto Open Data | Route geometry, stop locations | Static | Route/Stop |
+| **Licensed Child Care Centres** | Toronto Open Data | Capacity, ages served, locations | Annual | Point |
+
+### Tier 2 Measures to Extract
+
+**From MCI Details:**
+- Breakdown by crime type (assault, B&E, auto theft, robbery, theft over)
+
+**From Shootings:**
+- Shooting incidents count
+- Injuries/fatalities
+
+**From Building Permits:**
+- New construction permits (trailing 12 months)
+- Permit types distribution
+
+**From Schools:**
+- Schools per 1000 children
+- School type breakdown
+
+**From TTC:**
+- Transit stops within neighbourhood
+- Transit accessibility score
+
+**From Child Care:**
+- Child care spaces per capita
+- Coverage by age group
+
+---
+
+## Part 4: Data Sources by Thematic Group
+
+### GROUP A: Housing & Affordability
+
+| Dataset | Tier | Measures | Update Freq |
+|---------|------|----------|-------------|
+| Neighbourhood Profiles (Housing) | 1 | Avg rent, ownership %, dwelling types, housing costs as % of income | Every 5 years |
+| CMHC Rental Market Survey | 1 | Avg rent by bedroom, vacancy rate, rental universe | Annual |
+| Building Permits | 2 | New construction, permits by type | Monthly |
+
+**Calculated Metrics:**
+- Rent-to-Income Ratio (CMHC rent ÷ Census income)
+- Affordability Index (% of income spent on housing)
+
+---
+
+### GROUP B: Safety & Crime
+
+| Dataset | Tier | Measures | Update Freq |
+|---------|------|----------|-------------|
+| Neighbourhood Crime Rates | 1 | MCI rates per 100K pop by year | Annual |
+| Major Crime Indicators (MCI) | 2 | Assault, B&E, auto theft, robbery, theft over | Quarterly |
+| Shootings & Firearm Discharges | 2 | Shooting incidents, injuries, fatalities | Quarterly |
+
+**Calculated Metrics:**
+- Year-over-year crime change %
+- Crime type distribution
+
+---
+
+### GROUP C: Demographics & Community
+
+| Dataset | Tier | Measures | Update Freq |
+|---------|------|----------|-------------|
+| Neighbourhood Profiles (Demographics) | 1 | Age distribution, household composition, income | Every 5 years |
+| Neighbourhood Profiles (Immigration) | 1 | Immigration status, visible minorities, languages | Every 5 years |
+| Neighbourhood Profiles (Education) | 1 | Education attainment, field of study | Every 5 years |
+| Neighbourhood Profiles (Labour) | 1 | Employment rate, occupation, industry | Every 5 years |
+
+---
+
+### GROUP D: Transportation & Mobility
+
+| Dataset | Tier | Measures | Update Freq |
+|---------|------|----------|-------------|
+| Commute Mode (Census) | 1 | % car, transit, walk, bike | Every 5 years |
+| TTC Routes & Stops | 2 | Route geometry, stop locations | Static |
+
+**Calculated Metrics:**
+- Transit accessibility (stops within 500m of neighbourhood centroid)
+
+---
+
+### GROUP E: Amenities & Services
+
+| Dataset | Tier | Measures | Update Freq |
+|---------|------|----------|-------------|
+| Parks | 1 | Park locations, area, type | Annual |
+| Schools | 2 | Public/Catholic, elementary/secondary | Annual |
+| Licensed Child Care Centres | 2 | Capacity, ages served | Annual |
+
+**Calculated Metrics:**
+- Park area per capita
+- Schools per 1000 children (ages 5-17)
+- Child care spaces per 1000 children (ages 0-4)
+
+---
+
+## Part 5: Tab Structure
+
+### Tab Architecture
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│  [Overview]  [Housing]  [Safety]  [Demographics]  [Amenities]  │
+├────────────────────────────────────────────────────────────────┤
+│                                                                │
+│  ┌─────────────────────────────────┐  ┌────────────────┐       │
+│  │                                 │  │   KPI Card 1   │       │
+│  │         CHOROPLETH MAP          │  ├────────────────┤       │
+│  │      (158 Neighbourhoods)       │  │   KPI Card 2   │       │
+│  │                                 │  ├────────────────┤       │
+│  │         Click to select         │  │   KPI Card 3   │       │
+│  │                                 │  └────────────────┘       │
+│  └─────────────────────────────────┘                           │
+│                                                                │
+│  ┌─────────────────────┐  ┌─────────────────────┐              │
+│  │ Supporting Chart 1  │  │ Supporting Chart 2  │              │
+│  │ (Context/Trend)     │  │ (Comparison/Rank)   │              │
+│  └─────────────────────┘  └─────────────────────┘              │
+│                                                                │
+│  [Neighbourhood: Selected Name] ────────────────────────       │
+│  Details panel with all metrics for selected area              │
+└────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+### Tab 1: Overview (Default Landing)
+
+**Story:** "How do Toronto neighbourhoods compare across key livability metrics?"
+
+| Element | Content | Data Source |
+|---------|---------|-------------|
+| Map Colour | Composite livability score | Calculated from weighted metrics |
+| KPI Cards | Population, Median Income, Avg Crime Rate | Neighbourhood Profiles, Crime Rates |
+| Chart 1 | Top 10 / Bottom 10 by livability score | Calculated |
+| Chart 2 | Income vs Crime scatter plot | Neighbourhood Profiles, Crime Rates |
+
+**Metric Selector:** Allow user to change map colour by any single metric.
+
+---
+
+### Tab 2: Housing & Affordability
+
+**Story:** "Where can you afford to live, and what's being built?"
+
+| Element | Content | Data Source |
+|---------|---------|-------------|
+| Map Colour | Rent-to-Income Ratio (Affordability Index) | CMHC + Census income |
+| KPI Cards | Median Rent (1BR), Vacancy Rate, New Permits (12mo) | CMHC, Building Permits |
+| Chart 1 | Rent trend (5-year line chart by bedroom) | CMHC historical |
+| Chart 2 | Dwelling type breakdown (pie/bar) | Neighbourhood Profiles |
+
+**Metric Selector:** Toggle between rent, ownership %, dwelling types.
+
+---
+
+### Tab 3: Safety
+
+**Story:** "How safe is each neighbourhood, and what crimes are most common?"
+
+| Element | Content | Data Source |
+|---------|---------|-------------|
+| Map Colour | Total MCI Rate per 100K | Crime Rates |
+| KPI Cards | Total Crimes, YoY Change %, Shooting Incidents | Crime Rates, Shootings |
+| Chart 1 | Crime type breakdown (stacked bar) | MCI Details |
+| Chart 2 | 5-year crime trend (line chart) | Crime Rates historical |
+
+**Metric Selector:** Toggle between total crime, specific crime types, shootings.
+
+---
+
+### Tab 4: Demographics
+
+**Story:** "Who lives here? Age, income, diversity."
+
+| Element | Content | Data Source |
+|---------|---------|-------------|
+| Map Colour | Median Household Income | Neighbourhood Profiles |
+| KPI Cards | Population, % Immigrant, Unemployment Rate | Neighbourhood Profiles |
+| Chart 1 | Age distribution (population pyramid or bar) | Neighbourhood Profiles |
+| Chart 2 | Top languages spoken (horizontal bar) | Neighbourhood Profiles |
+
+**Metric Selector:** Income, immigrant %, age groups, education.
+
+---
+
+### Tab 5: Amenities & Services
+
+**Story:** "What's nearby? Parks, schools, child care, transit."
+
+| Element | Content | Data Source |
+|---------|---------|-------------|
+| Map Colour | Park Area per Capita | Parks + Population |
+| KPI Cards | Parks Count, Schools Count, Child Care Spaces | Multiple datasets |
+| Chart 1 | Amenity density comparison (radar or bar) | Calculated |
+| Chart 2 | Transit accessibility (stops within 500m) | TTC Stops |
+
+**Metric Selector:** Parks, schools, child care, transit access.
+
+---
+
+## Part 6: Data Pipeline Architecture
+
+### ETL Flow
+
+```
+┌─────────────────┐     ┌─────────────────┐     ┌───────────────────┐
+│  DATA SOURCES   │     │  STAGING LAYER  │     │    MART LAYER     │
+│                 │     │                 │     │                   │
+│  Toronto Open   │────▶│  stg_geography  │────▶│ dim_neighbourhood │
+│  Data Portal    │     │  stg_census     │     │ fact_crime        │
+│                 │     │  stg_crime      │     │ fact_housing      │
+│  CMHC Portal    │────▶│  stg_rental     │     │ fact_amenities    │
+│                 │     │  stg_permits    │     │                   │
+│  Toronto Police │────▶│  stg_amenities  │     │ agg_dashboard     │
+│  Portal         │     │  stg_childcare  │     │ (pre-computed)    │
+└─────────────────┘     └─────────────────┘     └───────────────────┘
+```
+
+### Key Transformations
+
+| Transformation | Description |
+|----------------|-------------|
+| **Geography Standardization** | Ensure all datasets use `neighbourhood_id` (AREA_ID from GeoJSON) |
+| **Census Pivot** | Neighbourhood Profiles is wide format — pivot to metrics per neighbourhood |
+| **CMHC Zone Mapping** | Create crosswalk from 15 CMHC zones to 158 neighbourhoods |
+| **Amenity Aggregation** | Spatial join point data (schools, parks, child care) to neighbourhood polygons |
+| **Rate Calculations** | Normalize counts to per-capita or per-100K |
+
+### Data Refresh Schedule
+
+| Layer | Frequency | Trigger |
+|-------|-----------|---------|
+| Staging (API pulls) | Weekly | Scheduled job |
+| Marts (transforms) | Weekly | Post-staging |
+| Dashboard cache | On-demand | User refresh button |
+
+---
+
+## Part 7: Technical Stack
+
+### Core Stack
+
+| Component | Technology | Rationale |
+|-----------|------------|-----------|
+| **Frontend** | Plotly Dash | Production-ready, rapid iteration |
+| **Mapping** | Plotly `choropleth_mapbox` | Native Dash integration |
+| **Data Store** | PostgreSQL + PostGIS | Spatial queries, existing expertise |
+| **ETL** | Python (Pandas, SQLAlchemy) | Existing stack |
+| **Deployment** | Render / Railway | Free tier, easy Dash hosting |
+
+### Alternative (Portfolio Stretch)
+
+| Component | Technology | Why Consider |
+|-----------|------------|--------------|
+| **Frontend** | React + deck.gl | More "modern" for portfolio |
+| **Data Store** | DuckDB | Serverless, embeddable |
+| **ETL** | dbt | Aligns with skills roadmap |
+
+---
+
+## Appendix A: Data Source URLs
+
+| Source | URL |
+|--------|-----|
+| Toronto Open Data — Neighbourhoods | https://open.toronto.ca/dataset/neighbourhoods/ |
+| Toronto Open Data — Neighbourhood Profiles | https://open.toronto.ca/dataset/neighbourhood-profiles/ |
+| Toronto Police — Neighbourhood Crime Rates | https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-open-data |
+| Toronto Police — MCI | https://data.torontopolice.on.ca/datasets/major-crime-indicators-open-data |
+| Toronto Police — Shootings | https://data.torontopolice.on.ca/datasets/shootings-firearm-discharges-open-data |
+| CMHC Rental Market Survey | https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market |
+| Toronto Open Data — Parks | https://open.toronto.ca/dataset/parks/ |
+| Toronto Open Data — Schools | https://open.toronto.ca/dataset/school-locations-all-types/ |
+| Toronto Open Data — Building Permits | https://open.toronto.ca/dataset/building-permits-cleared-permits/ |
+| Toronto Open Data — Child Care | https://open.toronto.ca/dataset/licensed-child-care-centres/ |
+| Toronto Open Data — TTC Routes | https://open.toronto.ca/dataset/ttc-routes-and-schedules/ |
+
+---
+
+## Appendix B: Colour Palettes
+
+### Affordability (Diverging)
+| Status | Hex | Colour |
+|--------|-----|--------|
+| Affordable (<30% income) | `#2ecc71` | Green |
+| Stretched (30-50%) | `#f1c40f` | Yellow |
+| Unaffordable (>50%) | `#e74c3c` | Red |
+
+### Safety (Sequential)
+| Status | Hex | Colour |
+|--------|-----|--------|
+| Safest (lowest crime) | `#27ae60` | Dark green |
+| Moderate | `#f39c12` | Orange |
+| Highest Crime | `#c0392b` | Dark red |
+
+### Demographics — Income (Sequential)
+| Level | Hex | Colour |
+|-------|-----|--------|
+| Highest Income | `#1a5276` | Dark blue |
+| Mid Income | `#5dade2` | Light blue |
+| Lowest Income | `#ecf0f1` | Light gray |
+
+### General Recommendation
+Use **Viridis** or **Plasma** colorscales for perceptually uniform gradients on continuous metrics.
+
+---
+
+## Appendix C: Glossary
+
+| Term | Definition |
+|------|------------|
+| **MCI** | Major Crime Indicators — Assault, B&E, Auto Theft, Robbery, Theft Over |
+| **CMHC Zone** | Canada Mortgage and Housing Corporation rental market survey zones (15 in Toronto) |
+| **Rent-to-Income Ratio** | Monthly rent ÷ monthly household income; <30% is considered affordable |
+| **PostGIS** | PostgreSQL extension for geographic data |
+| **Choropleth** | Thematic map where areas are shaded based on a statistical variable |
+
+---
+
+## Appendix D: Interview Talking Points
+
+When discussing this project in interviews, emphasize:
+
+1. **Data Engineering:** "I built a multi-source ETL pipeline that standardizes geographic keys across Census data, police data, and CMHC rental surveys—three different granularities I had to reconcile."
+
+2. **Dimensional Modeling:** "The data model follows star schema patterns with a central neighbourhood dimension table and fact tables for crime, housing, and amenities."
+
+3. **dbt Patterns:** "The transformation layer uses staging → intermediate → mart patterns, which I've documented for maintainability."
+
+4. **Business Value:** "The dashboard answers questions like 'Where can a young professional afford to live that's safe and has good transit?' — turning raw data into actionable insights."
+
+5. **Technical Decisions:** "I chose Plotly Dash over a React frontend because it let me iterate faster while maintaining production-quality interactivity. For a portfolio piece, speed to working demo matters."
+
+---
+
+*Document Version: 1.0*
+*Created: January 2026*
+*Author: Leo Miranda / Claude*
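Review note: the Rent-to-Income Ratio (Part 4, Group A, and Appendix C) depends on the CMHC zone crosswalk from Part 6, so a small worked example may help pin down the arithmetic. The sketch below uses only the Python standard library. The zone names, rents, incomes, and the assumption that each neighbourhood maps to exactly one CMHC zone are all hypothetical placeholders, not real CMHC or Census values; the band thresholds follow the Affordability palette in Appendix B.

```python
from dataclasses import dataclass

# Hypothetical sample values for illustration only (not real CMHC/Census data).
ZONE_AVG_RENT_1BR = {"zone_a": 1500.0, "zone_b": 2400.0}       # monthly rent per zone
NEIGHBOURHOOD_TO_ZONE = {1: "zone_a", 2: "zone_b", 3: "zone_b"}  # assumed 1:1 crosswalk
MEDIAN_ANNUAL_INCOME = {1: 72000.0, 2: 60000.0, 3: 48000.0}     # per neighbourhood


@dataclass(frozen=True)
class Affordability:
    neighbourhood_id: int
    rent_to_income: float  # monthly rent / monthly household income
    band: str


def classify(ratio: float) -> str:
    """Map a ratio to the Appendix B bands: <30%, 30-50%, >50%."""
    if ratio < 0.30:
        return "Affordable"
    if ratio <= 0.50:
        return "Stretched"
    return "Unaffordable"


def rent_to_income(neighbourhood_id: int) -> Affordability:
    """Look up the zone rent via the crosswalk and divide by monthly income."""
    zone = NEIGHBOURHOOD_TO_ZONE[neighbourhood_id]
    monthly_rent = ZONE_AVG_RENT_1BR[zone]
    monthly_income = MEDIAN_ANNUAL_INCOME[neighbourhood_id] / 12
    ratio = monthly_rent / monthly_income
    return Affordability(neighbourhood_id, ratio, classify(ratio))


results = [rent_to_income(n) for n in sorted(NEIGHBOURHOOD_TO_ZONE)]
for r in results:
    print(f"{r.neighbourhood_id}: {r.rent_to_income:.1%} ({r.band})")
```

Because zone-level rent is broadcast to every neighbourhood in the zone, neighbourhoods 2 and 3 share a rent but land in different bands purely through income. If real zone boundaries cut across neighbourhoods, the crosswalk would likely need area or population weights rather than a plain lookup.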