3 Commits

Author SHA1 Message Date
3054441630 docs: Add local lessons learned backup system
- Create docs/project-lessons-learned/ for local lesson storage
- Add INDEX.md with lesson template and index table
- Document Phase 4 dbt test syntax deprecation lesson
- Update CLAUDE.md with backup method when Wiki.js unavailable

This provides a fallback for capturing lessons learned while
Wiki.js integration is being configured.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 11:52:06 -05:00
b6d210ec6b feat: Implement Phase 4 dbt model restructuring
Create neighbourhood-centric dbt transformation layer:

Staging (5 models):
- stg_toronto__neighbourhoods - Neighbourhood dimension
- stg_toronto__census - Census demographics
- stg_toronto__crime - Crime statistics
- stg_toronto__amenities - Amenity counts
- stg_cmhc__zone_crosswalk - Zone-to-neighbourhood weights

Intermediate (5 models):
- int_neighbourhood__demographics - Combined census with quintiles
- int_neighbourhood__housing - Housing + affordability indicators
- int_neighbourhood__crime_summary - Aggregated crime with YoY
- int_neighbourhood__amenity_scores - Per-capita amenity metrics
- int_rentals__neighbourhood_allocated - CMHC via area weights

Marts (5 models):
- mart_neighbourhood_overview - Composite livability score
- mart_neighbourhood_housing - Affordability index
- mart_neighbourhood_safety - Crime rates per 100K
- mart_neighbourhood_demographics - Income/age indices
- mart_neighbourhood_amenities - Amenity index

Closes #60, #61, #62, #63

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 11:41:27 -05:00
053acf6436 feat: Implement Phase 3 neighbourhood data model
Add schemas, parsers, loaders, and models for Toronto neighbourhood-centric
data including census profiles, crime statistics, and amenities.

Schemas:
- NeighbourhoodRecord, CensusRecord, CrimeRecord, CrimeType
- AmenityType, AmenityRecord, AmenityCount

Models:
- BridgeCMHCNeighbourhood (zone-to-neighbourhood mapping with weights)
- FactCensus, FactCrime, FactAmenities

Parsers:
- TorontoOpenDataParser (CKAN API for neighbourhoods, census, amenities)
- TorontoPoliceParser (crime rates, MCI data)

Loaders:
- load_census_data, load_crime_data, load_amenities
- build_cmhc_neighbourhood_crosswalk (PostGIS area weights)

Also updates CLAUDE.md with projman plugin workflow documentation.

Closes #53, #54, #55, #56, #57, #58, #59

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 11:07:13 -05:00
36 changed files with 2817 additions and 2 deletions

View File

@@ -261,4 +261,71 @@ All scripts in `scripts/`:
---
## Projman Plugin Workflow
**CRITICAL: Always use the projman plugin for sprint and task management.**
### When to Use Projman Skills
| Skill | Trigger | Purpose |
|-------|---------|---------|
| `/projman:sprint-plan` | New sprint or phase implementation | Architecture analysis + Gitea issue creation |
| `/projman:sprint-start` | Beginning implementation work | Load lessons learned (Wiki.js or local), start execution |
| `/projman:sprint-status` | Check progress | Review blockers and completion status |
| `/projman:sprint-close` | Sprint completion | Capture lessons learned (Wiki.js or local backup) |
### Default Behavior
When user requests implementation work:
1. **ALWAYS start with `/projman:sprint-plan`** before writing code
2. Create Gitea issues with proper labels and acceptance criteria
3. Use `/projman:sprint-start` to begin execution with lessons learned
4. Track progress via Gitea issue comments
5. Close sprint with `/projman:sprint-close` to document lessons
### Gitea Repository
- **Repo**: `lmiranda/personal-portfolio`
- **Host**: `gitea.hotserv.cloud`
- **Note**: `lmiranda` is a user account (not org), so label lookup may require repo-level labels
### MCP Tools Available
**Gitea**:
- `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment`
- `get_labels`, `suggest_labels`
**Wiki.js**:
- `search_lessons`, `create_lesson`, `search_pages`, `get_page`
### Lessons Learned (Backup Method)
**When Wiki.js is unavailable**, use the local backup in `docs/project-lessons-learned/`:
**At Sprint Start:**
1. Review `docs/project-lessons-learned/INDEX.md` for relevant past lessons
2. Search lesson files by tags/keywords before implementation
3. Apply prevention strategies from applicable lessons
**At Sprint Close:**
1. Try Wiki.js `create_lesson` first
2. If Wiki.js fails, create lesson in `docs/project-lessons-learned/`
3. Use naming convention: `{phase-or-sprint}-{short-description}.md`
4. Update `INDEX.md` with new entry
5. Follow the lesson template in INDEX.md
**Migration:** Once Wiki.js is configured, lessons will be migrated there for better searchability.
### Issue Structure
Every Gitea issue should include:
- **Overview**: Brief description
- **Files to Create/Modify**: Explicit paths
- **Acceptance Criteria**: Checkboxes
- **Technical Notes**: Implementation hints
- **Labels**: Listed in body (workaround for label API issues)
---
*Last Updated: Sprint 9*

View File

@@ -11,3 +11,77 @@ models:
- name: zone_code
tests:
- not_null
- name: int_neighbourhood__demographics
description: "Combined census demographics with neighbourhood attributes"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: census_year
description: "Census year"
tests:
- not_null
- name: income_quintile
description: "Income quintile (1-5, city-wide)"
- name: int_neighbourhood__housing
description: "Housing indicators combining census and rental data"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Reference year"
- name: rent_to_income_pct
description: "Rent as percentage of median income"
- name: is_affordable
description: "Boolean: rent <= 30% of income"
- name: int_neighbourhood__crime_summary
description: "Aggregated crime with year-over-year trends"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Statistics year"
tests:
- not_null
- name: crime_rate_per_100k
description: "Total crime rate per 100K population"
- name: yoy_change_pct
description: "Year-over-year change percentage"
- name: int_neighbourhood__amenity_scores
description: "Normalized amenities per capita and per area"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Reference year"
- name: total_amenities_per_1000
description: "Total amenities per 1000 population"
- name: amenities_per_sqkm
description: "Total amenities per square km"
- name: int_rentals__neighbourhood_allocated
description: "CMHC rental data allocated to neighbourhoods via area weights"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Survey year"
tests:
- not_null
- name: avg_rent_2bed
description: "Weighted average 2-bedroom rent"
- name: vacancy_rate
description: "Weighted average vacancy rate"

View File

@@ -0,0 +1,79 @@
-- Intermediate: Normalized amenities per 1000 population
-- Pivots amenity types and calculates per-capita metrics
-- Grain: One row per neighbourhood per year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
amenities as (
select * from {{ ref('stg_toronto__amenities') }}
),
-- Aggregate amenity types
amenities_by_year as (
select
neighbourhood_id,
amenity_year as year,
sum(case when amenity_type = 'Parks' then amenity_count else 0 end) as parks_count,
sum(case when amenity_type = 'Schools' then amenity_count else 0 end) as schools_count,
sum(case when amenity_type = 'Transit Stops' then amenity_count else 0 end) as transit_count,
sum(case when amenity_type = 'Libraries' then amenity_count else 0 end) as libraries_count,
sum(case when amenity_type = 'Community Centres' then amenity_count else 0 end) as community_centres_count,
sum(case when amenity_type = 'Recreation' then amenity_count else 0 end) as recreation_count,
sum(amenity_count) as total_amenities
from amenities
group by neighbourhood_id, amenity_year
),
amenity_scores as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
n.population,
n.land_area_sqkm,
a.year,
-- Raw counts
a.parks_count,
a.schools_count,
a.transit_count,
a.libraries_count,
a.community_centres_count,
a.recreation_count,
a.total_amenities,
-- Per 1000 population
case when n.population > 0
then round(a.parks_count::numeric / n.population * 1000, 3)
else null
end as parks_per_1000,
case when n.population > 0
then round(a.schools_count::numeric / n.population * 1000, 3)
else null
end as schools_per_1000,
case when n.population > 0
then round(a.transit_count::numeric / n.population * 1000, 3)
else null
end as transit_per_1000,
case when n.population > 0
then round(a.total_amenities::numeric / n.population * 1000, 3)
else null
end as total_amenities_per_1000,
-- Per square km
case when n.land_area_sqkm > 0
then round(a.total_amenities::numeric / n.land_area_sqkm, 2)
else null
end as amenities_per_sqkm
from neighbourhoods n
left join amenities_by_year a on n.neighbourhood_id = a.neighbourhood_id
)
select * from amenity_scores
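
The per-capita metrics above are plain rate normalizations guarded against zero population. As a quick, hypothetical worked example (not part of the model):

```python
def per_1000(count: int, population: int) -> float | None:
    """Amenity count per 1,000 residents; None when population is not positive."""
    if population <= 0:  # mirrors the `case when n.population > 0` guard in the SQL
        return None
    return round(count / population * 1000, 3)

print(per_1000(12, 24_000))  # 12 parks among 24,000 residents -> 0.5 per 1,000
```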

View File

@@ -0,0 +1,81 @@
-- Intermediate: Aggregated crime by neighbourhood with YoY change
-- Pivots crime types and calculates year-over-year trends
-- Grain: One row per neighbourhood per year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
crime as (
select * from {{ ref('stg_toronto__crime') }}
),
-- Aggregate crime types
crime_by_year as (
select
neighbourhood_id,
crime_year as year,
sum(incident_count) as total_incidents,
sum(case when crime_type = 'Assault' then incident_count else 0 end) as assault_count,
sum(case when crime_type = 'Auto Theft' then incident_count else 0 end) as auto_theft_count,
sum(case when crime_type = 'Break and Enter' then incident_count else 0 end) as break_enter_count,
sum(case when crime_type = 'Robbery' then incident_count else 0 end) as robbery_count,
sum(case when crime_type = 'Theft Over' then incident_count else 0 end) as theft_over_count,
sum(case when crime_type = 'Homicide' then incident_count else 0 end) as homicide_count,
avg(rate_per_100k) as avg_rate_per_100k
from crime
group by neighbourhood_id, crime_year
),
-- Add year-over-year changes
with_yoy as (
select
c.*,
lag(c.total_incidents, 1) over (
partition by c.neighbourhood_id
order by c.year
) as prev_year_incidents,
round(
(c.total_incidents - lag(c.total_incidents, 1) over (
partition by c.neighbourhood_id
order by c.year
))::numeric /
nullif(lag(c.total_incidents, 1) over (
partition by c.neighbourhood_id
order by c.year
), 0) * 100,
2
) as yoy_change_pct
from crime_by_year c
),
crime_summary as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
n.population,
w.year,
w.total_incidents,
w.assault_count,
w.auto_theft_count,
w.break_enter_count,
w.robbery_count,
w.theft_over_count,
w.homicide_count,
w.avg_rate_per_100k,
w.yoy_change_pct,
-- Crime rate per 100K population
case
when n.population > 0
then round(w.total_incidents::numeric / n.population * 100000, 2)
else null
end as crime_rate_per_100k
from neighbourhoods n
inner join with_yoy w on n.neighbourhood_id = w.neighbourhood_id
)
select * from crime_summary
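
The `yoy_change_pct` expression is the standard percent-change formula with `nullif` guarding a zero prior year. A minimal Python sketch of the same arithmetic (illustrative only):

```python
def yoy_change_pct(current: int, previous: int | None) -> float | None:
    """Year-over-year percent change; None when there is no usable baseline."""
    if previous is None or previous == 0:  # mirrors nullif(lag(...), 0)
        return None
    return round((current - previous) / previous * 100, 2)

print(yoy_change_pct(120, 100))  # 120 incidents vs. 100 last year -> 20.0
```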

View File

@@ -0,0 +1,44 @@
-- Intermediate: Combined census demographics by neighbourhood
-- Joins neighbourhoods with census data for demographic analysis
-- Grain: One row per neighbourhood per census year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
census as (
select * from {{ ref('stg_toronto__census') }}
),
demographics as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
n.land_area_sqkm,
c.census_year,
c.population,
c.population_density,
c.median_household_income,
c.average_household_income,
c.median_age,
c.unemployment_rate,
c.pct_bachelors_or_higher as education_bachelors_pct,
c.average_dwelling_value,
-- Tenure mix
c.pct_owner_occupied,
c.pct_renter_occupied,
-- Income quintile (city-wide comparison)
ntile(5) over (
partition by c.census_year
order by c.median_household_income
) as income_quintile
from neighbourhoods n
left join census c on n.neighbourhood_id = c.neighbourhood_id
)
select * from demographics

View File

@@ -0,0 +1,56 @@
-- Intermediate: Housing indicators by neighbourhood
-- Combines census housing data with allocated CMHC rental data
-- Grain: One row per neighbourhood per year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
census as (
select * from {{ ref('stg_toronto__census') }}
),
allocated_rentals as (
select * from {{ ref('int_rentals__neighbourhood_allocated') }}
),
housing as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
coalesce(r.year, c.census_year) as year,
-- Census housing metrics
c.pct_owner_occupied,
c.pct_renter_occupied,
c.average_dwelling_value,
c.median_household_income,
-- Allocated rental metrics (weighted average from CMHC zones)
r.avg_rent_2bed,
r.vacancy_rate,
-- Affordability calculations
case
when c.median_household_income > 0 and r.avg_rent_2bed > 0
then round((r.avg_rent_2bed * 12 / c.median_household_income) * 100, 2)
else null
end as rent_to_income_pct,
-- Affordability threshold (30% of income)
case
when c.median_household_income > 0 and r.avg_rent_2bed > 0
then r.avg_rent_2bed * 12 <= c.median_household_income * 0.30
else null
end as is_affordable
from neighbourhoods n
left join census c on n.neighbourhood_id = c.neighbourhood_id
left join allocated_rentals r
on n.neighbourhood_id = r.neighbourhood_id
and r.year = c.census_year
)
select * from housing
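
The `is_affordable` flag encodes the common 30%-of-income rule: annualized 2-bedroom rent compared against 30% of median household income. A small sketch of the same check with hypothetical figures:

```python
def is_affordable(avg_rent_2bed: float, median_household_income: float) -> bool | None:
    """True when annual rent is at most 30% of median household income."""
    if median_household_income <= 0 or avg_rent_2bed <= 0:
        return None
    return avg_rent_2bed * 12 <= median_household_income * 0.30

# $2,000/month rent against an $80,000 income: $24,000 <= $24,000 -> True
print(is_affordable(2000.0, 80000.0))
```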

View File

@@ -0,0 +1,73 @@
-- Intermediate: CMHC rentals allocated to neighbourhoods via area weights
-- Disaggregates zone-level rental data to neighbourhood level
-- Grain: One row per neighbourhood per year
with crosswalk as (
select * from {{ ref('stg_cmhc__zone_crosswalk') }}
),
rentals as (
select * from {{ ref('int_rentals__annual') }}
),
neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
-- Allocate rental metrics to neighbourhoods using area weights
allocated as (
select
c.neighbourhood_id,
r.year,
r.bedroom_type,
-- Weighted average rent (using area weight)
sum(r.avg_rent * c.area_weight) as weighted_avg_rent,
sum(r.median_rent * c.area_weight) as weighted_median_rent,
sum(c.area_weight) as total_weight,
-- Weighted vacancy rate
sum(r.vacancy_rate * c.area_weight) / nullif(sum(c.area_weight), 0) as vacancy_rate,
-- Weighted rental universe
sum(r.rental_universe * c.area_weight) as rental_units_estimate
from crosswalk c
inner join rentals r on c.cmhc_zone_code = r.zone_code
group by c.neighbourhood_id, r.year, r.bedroom_type
),
-- Pivot to get 2-bedroom as primary metric
pivoted as (
select
neighbourhood_id,
year,
max(case when bedroom_type = 'Two Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_2bed,
max(case when bedroom_type = 'One Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_1bed,
max(case when bedroom_type = 'Bachelor' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_bachelor,
max(case when bedroom_type = 'Three Bedroom +' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_3bed,
avg(vacancy_rate) as vacancy_rate,
sum(rental_units_estimate) as total_rental_units
from allocated
group by neighbourhood_id, year
),
final as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
p.year,
round(p.avg_rent_bachelor::numeric, 2) as avg_rent_bachelor,
round(p.avg_rent_1bed::numeric, 2) as avg_rent_1bed,
round(p.avg_rent_2bed::numeric, 2) as avg_rent_2bed,
round(p.avg_rent_3bed::numeric, 2) as avg_rent_3bed,
round(p.vacancy_rate::numeric, 2) as vacancy_rate,
round(p.total_rental_units::numeric, 0) as total_rental_units
from neighbourhoods n
inner join pivoted p on n.neighbourhood_id = p.neighbourhood_id
)
select * from final
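
Because a neighbourhood can intersect several CMHC zones, the allocation accumulates `rent * weight` and divides by the summed weights when pivoting. Roughly, with made-up numbers:

```python
# (zone average 2-bed rent, area weight of that zone inside the neighbourhood)
zone_rents = [(2000.0, 0.6), (1800.0, 0.3), (2200.0, 0.1)]

weighted_sum = sum(rent * w for rent, w in zone_rents)  # sum(r.avg_rent * c.area_weight)
total_weight = sum(w for _, w in zone_rents)            # sum(c.area_weight)

avg_rent_2bed = weighted_sum / total_weight if total_weight else None
print(round(avg_rent_2bed, 2))  # 1960.0
```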

View File

@@ -9,3 +9,127 @@ models:
tests:
- unique
- not_null
- name: mart_neighbourhood_overview
description: "Neighbourhood overview with composite livability score"
meta:
dashboard_tab: Overview
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: livability_score
description: "Composite score: safety (30%), affordability (40%), amenities (30%)"
- name: safety_score
description: "Safety component score (0-100)"
- name: affordability_score
description: "Affordability component score (0-100)"
- name: amenity_score
description: "Amenity component score (0-100)"
- name: mart_neighbourhood_housing
description: "Housing and affordability metrics by neighbourhood"
meta:
dashboard_tab: Housing
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: rent_to_income_pct
description: "Rent as percentage of median income"
- name: affordability_index
description: "100 = city average affordability"
- name: rent_yoy_change_pct
description: "Year-over-year rent change"
- name: mart_neighbourhood_safety
description: "Crime rates and safety metrics by neighbourhood"
meta:
dashboard_tab: Safety
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: crime_rate_per_100k
description: "Total crime rate per 100K population"
- name: crime_index
description: "100 = city average crime rate"
- name: safety_tier
description: "Safety tier (1=safest, 5=highest crime)"
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
- name: mart_neighbourhood_demographics
description: "Demographics and income metrics by neighbourhood"
meta:
dashboard_tab: Demographics
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: median_household_income
description: "Median household income"
- name: income_index
description: "100 = city average income"
- name: income_quintile
description: "Income quintile (1-5)"
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
- name: mart_neighbourhood_amenities
description: "Amenity access metrics by neighbourhood"
meta:
dashboard_tab: Amenities
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: total_amenities_per_1000
description: "Total amenities per 1000 population"
- name: amenity_index
description: "100 = city average amenities"
- name: amenity_tier
description: "Amenity tier (1=best, 5=lowest)"
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]

View File

@@ -0,0 +1,89 @@
-- Mart: Neighbourhood Amenities Analysis
-- Dashboard Tab: Amenities
-- Grain: One row per neighbourhood per year
with amenities as (
select * from {{ ref('int_neighbourhood__amenity_scores') }}
),
-- City-wide averages for comparison
city_avg as (
select
year,
avg(parks_per_1000) as city_avg_parks,
avg(schools_per_1000) as city_avg_schools,
avg(transit_per_1000) as city_avg_transit,
avg(total_amenities_per_1000) as city_avg_total_amenities
from amenities
group by year
),
final as (
select
a.neighbourhood_id,
a.neighbourhood_name,
a.geometry,
a.population,
a.land_area_sqkm,
a.year,
-- Raw counts
a.parks_count,
a.schools_count,
a.transit_count,
a.libraries_count,
a.community_centres_count,
a.recreation_count,
a.total_amenities,
-- Per 1000 population
a.parks_per_1000,
a.schools_per_1000,
a.transit_per_1000,
a.total_amenities_per_1000,
-- Per square km
a.amenities_per_sqkm,
-- City averages
round(ca.city_avg_parks::numeric, 3) as city_avg_parks_per_1000,
round(ca.city_avg_schools::numeric, 3) as city_avg_schools_per_1000,
round(ca.city_avg_transit::numeric, 3) as city_avg_transit_per_1000,
-- Amenity index (100 = city average)
case
when ca.city_avg_total_amenities > 0
then round(a.total_amenities_per_1000 / ca.city_avg_total_amenities * 100, 1)
else null
end as amenity_index,
-- Category indices
case
when ca.city_avg_parks > 0
then round(a.parks_per_1000 / ca.city_avg_parks * 100, 1)
else null
end as parks_index,
case
when ca.city_avg_schools > 0
then round(a.schools_per_1000 / ca.city_avg_schools * 100, 1)
else null
end as schools_index,
case
when ca.city_avg_transit > 0
then round(a.transit_per_1000 / ca.city_avg_transit * 100, 1)
else null
end as transit_index,
-- Amenity tier (1 = best, 5 = lowest)
ntile(5) over (
partition by a.year
order by a.total_amenities_per_1000 desc
) as amenity_tier
from amenities a
left join city_avg ca on a.year = ca.year
)
select * from final
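
Each `*_index` column expresses a neighbourhood relative to the city-wide average for that year, with 100 meaning exactly average. The arithmetic, sketched with hypothetical values:

```python
def index_vs_city(value: float, city_average: float) -> float | None:
    """Ratio to the city average, scaled so that 100 = average."""
    if city_average <= 0:
        return None
    return round(value / city_average * 100, 1)

# 9.0 amenities per 1,000 residents against a city average of 7.5 -> 120.0
print(index_vs_city(9.0, 7.5))
```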

View File

@@ -0,0 +1,81 @@
-- Mart: Neighbourhood Demographics Analysis
-- Dashboard Tab: Demographics
-- Grain: One row per neighbourhood per census year
with demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
-- City-wide averages for comparison
city_avg as (
select
census_year,
avg(median_household_income) as city_avg_income,
avg(median_age) as city_avg_age,
avg(unemployment_rate) as city_avg_unemployment,
avg(education_bachelors_pct) as city_avg_education,
avg(population_density) as city_avg_density
from demographics
group by census_year
),
final as (
select
d.neighbourhood_id,
d.neighbourhood_name,
d.geometry,
d.census_year as year,
-- Population
d.population,
d.land_area_sqkm,
d.population_density,
-- Income
d.median_household_income,
d.average_household_income,
d.income_quintile,
-- Income index (100 = city average)
case
when ca.city_avg_income > 0
then round(d.median_household_income / ca.city_avg_income * 100, 1)
else null
end as income_index,
-- Demographics
d.median_age,
d.unemployment_rate,
d.education_bachelors_pct,
-- Age index (100 = city average)
case
when ca.city_avg_age > 0
then round(d.median_age / ca.city_avg_age * 100, 1)
else null
end as age_index,
-- Housing tenure
d.pct_owner_occupied,
d.pct_renter_occupied,
d.average_dwelling_value,
-- Diversity index (using tenure mix as proxy - higher rental = more diverse typically)
round(
1 - (
power(d.pct_owner_occupied / 100, 2) +
power(d.pct_renter_occupied / 100, 2)
),
3
) * 100 as tenure_diversity_index,
-- City comparisons
round(ca.city_avg_income::numeric, 2) as city_avg_income,
round(ca.city_avg_age::numeric, 1) as city_avg_age,
round(ca.city_avg_unemployment::numeric, 2) as city_avg_unemployment
from demographics d
left join city_avg ca on d.census_year = ca.census_year
)
select * from final
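
The tenure diversity index is a Herfindahl-style measure: one minus the sum of squared tenure shares, scaled to 0-100, so a 50/50 owner/renter split scores highest. A quick check of the arithmetic (values are illustrative):

```python
def tenure_diversity_index(pct_owner: float, pct_renter: float) -> float:
    """1 minus the sum of squared tenure shares, scaled to 0-100 (higher = more mixed)."""
    owner, renter = pct_owner / 100, pct_renter / 100
    return round(1 - (owner ** 2 + renter ** 2), 3) * 100

print(tenure_diversity_index(60.0, 40.0))  # 1 - (0.36 + 0.16) = 0.48 -> 48.0
```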

View File

@@ -0,0 +1,93 @@
-- Mart: Neighbourhood Housing Analysis
-- Dashboard Tab: Housing
-- Grain: One row per neighbourhood per year
with housing as (
select * from {{ ref('int_neighbourhood__housing') }}
),
rentals as (
select * from {{ ref('int_rentals__neighbourhood_allocated') }}
),
demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
-- Add year-over-year rent changes
with_yoy as (
select
h.*,
r.avg_rent_bachelor,
r.avg_rent_1bed,
r.avg_rent_3bed,
r.total_rental_units,
d.income_quintile,
-- Previous year rent for YoY calculation
lag(h.avg_rent_2bed, 1) over (
partition by h.neighbourhood_id
order by h.year
) as prev_year_rent_2bed
from housing h
left join rentals r
on h.neighbourhood_id = r.neighbourhood_id
and h.year = r.year
left join demographics d
on h.neighbourhood_id = d.neighbourhood_id
and h.year = d.census_year
),
final as (
select
neighbourhood_id,
neighbourhood_name,
geometry,
year,
-- Tenure mix
pct_owner_occupied,
pct_renter_occupied,
-- Housing values
average_dwelling_value,
median_household_income,
-- Rental metrics
avg_rent_bachelor,
avg_rent_1bed,
avg_rent_2bed,
avg_rent_3bed,
vacancy_rate,
total_rental_units,
-- Affordability
rent_to_income_pct,
is_affordable,
-- Affordability index (100 = city average)
round(
rent_to_income_pct / nullif(
avg(rent_to_income_pct) over (partition by year),
0
) * 100,
1
) as affordability_index,
-- Year-over-year rent change
case
when prev_year_rent_2bed > 0
then round(
(avg_rent_2bed - prev_year_rent_2bed) / prev_year_rent_2bed * 100,
2
)
else null
end as rent_yoy_change_pct,
income_quintile
from with_yoy
)
select * from final

View File

@@ -0,0 +1,110 @@
-- Mart: Neighbourhood Overview with Composite Livability Score
-- Dashboard Tab: Overview
-- Grain: One row per neighbourhood per year
with demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
housing as (
select * from {{ ref('int_neighbourhood__housing') }}
),
crime as (
select * from {{ ref('int_neighbourhood__crime_summary') }}
),
amenities as (
select * from {{ ref('int_neighbourhood__amenity_scores') }}
),
-- Compute percentile ranks for scoring components
percentiles as (
select
d.neighbourhood_id,
d.neighbourhood_name,
d.geometry,
d.census_year as year,
d.population,
d.median_household_income,
-- Safety score: inverse of crime rate (higher = safer)
case
when c.crime_rate_per_100k is not null
then 100 - percent_rank() over (
partition by d.census_year
order by c.crime_rate_per_100k
) * 100
else null
end as safety_score,
-- Affordability score: inverse of rent-to-income ratio
case
when h.rent_to_income_pct is not null
then 100 - percent_rank() over (
partition by d.census_year
order by h.rent_to_income_pct
) * 100
else null
end as affordability_score,
-- Amenity score: based on amenities per capita
case
when a.total_amenities_per_1000 is not null
then percent_rank() over (
partition by d.census_year
order by a.total_amenities_per_1000
) * 100
else null
end as amenity_score,
-- Raw metrics for reference
c.crime_rate_per_100k,
h.rent_to_income_pct,
h.avg_rent_2bed,
a.total_amenities_per_1000
from demographics d
left join housing h
on d.neighbourhood_id = h.neighbourhood_id
and d.census_year = h.year
left join crime c
on d.neighbourhood_id = c.neighbourhood_id
and d.census_year = c.year
left join amenities a
on d.neighbourhood_id = a.neighbourhood_id
and d.census_year = a.year
),
final as (
select
neighbourhood_id,
neighbourhood_name,
geometry,
year,
population,
median_household_income,
-- Component scores (0-100)
round(safety_score::numeric, 1) as safety_score,
round(affordability_score::numeric, 1) as affordability_score,
round(amenity_score::numeric, 1) as amenity_score,
-- Composite livability score: safety (30%), affordability (40%), amenities (30%)
round(
(coalesce(safety_score, 50) * 0.30 +
coalesce(affordability_score, 50) * 0.40 +
coalesce(amenity_score, 50) * 0.30)::numeric,
1
) as livability_score,
-- Raw metrics
crime_rate_per_100k,
rent_to_income_pct,
avg_rent_2bed,
total_amenities_per_1000
from percentiles
)
select * from final
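
The composite score is a weighted blend of the three component scores, with 50 substituted for any missing component so a data gap does not drag the blend to zero. In Python terms (sketch only):

```python
def livability_score(
    safety: float | None, affordability: float | None, amenity: float | None
) -> float:
    """Weighted blend: safety 30%, affordability 40%, amenities 30%; missing parts default to 50."""
    s = 50.0 if safety is None else safety
    f = 50.0 if affordability is None else affordability
    a = 50.0 if amenity is None else amenity
    return round(s * 0.30 + f * 0.40 + a * 0.30, 1)

print(livability_score(80.0, 60.0, None))  # 24 + 24 + 15 = 63.0
```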

View File

@@ -0,0 +1,78 @@
-- Mart: Neighbourhood Safety Analysis
-- Dashboard Tab: Safety
-- Grain: One row per neighbourhood per year
with crime as (
select * from {{ ref('int_neighbourhood__crime_summary') }}
),
-- City-wide averages for comparison
city_avg as (
select
year,
avg(crime_rate_per_100k) as city_avg_crime_rate,
avg(assault_count) as city_avg_assault,
avg(auto_theft_count) as city_avg_auto_theft,
avg(break_enter_count) as city_avg_break_enter
from crime
group by year
),
final as (
select
c.neighbourhood_id,
c.neighbourhood_name,
c.geometry,
c.population,
c.year,
-- Total crime
c.total_incidents,
c.crime_rate_per_100k,
c.yoy_change_pct as crime_yoy_change_pct,
-- Crime breakdown
c.assault_count,
c.auto_theft_count,
c.break_enter_count,
c.robbery_count,
c.theft_over_count,
c.homicide_count,
-- Per 100K rates by type
case when c.population > 0
then round(c.assault_count::numeric / c.population * 100000, 2)
else null
end as assault_rate_per_100k,
case when c.population > 0
then round(c.auto_theft_count::numeric / c.population * 100000, 2)
else null
end as auto_theft_rate_per_100k,
case when c.population > 0
then round(c.break_enter_count::numeric / c.population * 100000, 2)
else null
end as break_enter_rate_per_100k,
-- Comparison to city average
round(ca.city_avg_crime_rate::numeric, 2) as city_avg_crime_rate,
-- Crime index (100 = city average)
case
when ca.city_avg_crime_rate > 0
then round(c.crime_rate_per_100k / ca.city_avg_crime_rate * 100, 1)
else null
end as crime_index,
-- Safety tier based on crime rate percentile (1 = lowest crime, i.e. safest)
ntile(5) over (
partition by c.year
order by c.crime_rate_per_100k asc
) as safety_tier
from crime c
left join city_avg ca on c.year = ca.year
)
select * from final

View File

@@ -41,3 +41,59 @@ sources:
columns:
- name: event_id
description: "Primary key"
- name: fact_census
description: "Census demographics by neighbourhood and year"
columns:
- name: id
description: "Primary key"
- name: neighbourhood_id
description: "Foreign key to dim_neighbourhood"
- name: census_year
description: "Census year (2016, 2021, etc.)"
- name: population
description: "Total population"
- name: median_household_income
description: "Median household income"
- name: fact_crime
description: "Crime statistics by neighbourhood, year, and type"
columns:
- name: id
description: "Primary key"
- name: neighbourhood_id
description: "Foreign key to dim_neighbourhood"
- name: year
description: "Statistics year"
- name: crime_type
description: "Type of crime"
- name: count
description: "Number of incidents"
- name: rate_per_100k
description: "Rate per 100,000 population"
- name: fact_amenities
description: "Amenity counts by neighbourhood and type"
columns:
- name: id
description: "Primary key"
- name: neighbourhood_id
description: "Foreign key to dim_neighbourhood"
- name: amenity_type
description: "Type of amenity (parks, schools, transit)"
- name: count
description: "Number of amenities"
- name: year
description: "Reference year"
- name: bridge_cmhc_neighbourhood
description: "CMHC zone to neighbourhood mapping with area weights"
columns:
- name: id
description: "Primary key"
- name: cmhc_zone_code
description: "CMHC zone code"
- name: neighbourhood_id
description: "Neighbourhood ID"
- name: weight
description: "Proportional area weight (0-1)"

View File

@@ -40,3 +40,90 @@ models:
tests:
- unique
- not_null
- name: stg_toronto__neighbourhoods
description: "Staged Toronto neighbourhood dimension (158 official boundaries)"
columns:
- name: neighbourhood_id
description: "Neighbourhood primary key"
tests:
- unique
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry (POLYGON)"
- name: stg_toronto__census
description: "Staged census demographics by neighbourhood"
columns:
- name: census_id
description: "Census record identifier"
tests:
- unique
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: census_year
description: "Census year (2016, 2021)"
tests:
- not_null
- name: stg_toronto__crime
description: "Staged crime statistics by neighbourhood"
columns:
- name: crime_id
description: "Crime record identifier"
tests:
- unique
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: crime_type
description: "Type of crime"
tests:
- not_null
- name: stg_toronto__amenities
description: "Staged amenity counts by neighbourhood"
columns:
- name: amenity_id
description: "Amenity record identifier"
tests:
- unique
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: amenity_type
description: "Type of amenity"
tests:
- not_null
- name: stg_cmhc__zone_crosswalk
description: "Staged CMHC zone to neighbourhood crosswalk with area weights"
columns:
- name: crosswalk_id
description: "Crosswalk record identifier"
tests:
- unique
- not_null
- name: cmhc_zone_code
description: "CMHC zone code"
tests:
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: area_weight
description: "Proportional area weight (0-1)"
tests:
- not_null

View File

@@ -0,0 +1,18 @@
-- Staged CMHC zone to neighbourhood crosswalk
-- Source: bridge_cmhc_neighbourhood table
-- Grain: One row per zone-neighbourhood intersection
with source as (
select * from {{ source('toronto_housing', 'bridge_cmhc_neighbourhood') }}
),
staged as (
select
id as crosswalk_id,
cmhc_zone_code,
neighbourhood_id,
weight as area_weight
from source
)
select * from staged

View File

@@ -0,0 +1,19 @@
-- Staged amenity counts by neighbourhood
-- Source: fact_amenities table
-- Grain: One row per neighbourhood per amenity type per year
with source as (
select * from {{ source('toronto_housing', 'fact_amenities') }}
),
staged as (
select
id as amenity_id,
neighbourhood_id,
amenity_type,
count as amenity_count,
year as amenity_year
from source
)
select * from staged

View File

@@ -0,0 +1,27 @@
-- Staged census demographics by neighbourhood
-- Source: fact_census table
-- Grain: One row per neighbourhood per census year
with source as (
select * from {{ source('toronto_housing', 'fact_census') }}
),
staged as (
select
id as census_id,
neighbourhood_id,
census_year,
population,
population_density,
median_household_income,
average_household_income,
unemployment_rate,
pct_bachelors_or_higher,
pct_owner_occupied,
pct_renter_occupied,
median_age,
average_dwelling_value
from source
)
select * from staged

View File

@@ -0,0 +1,20 @@
-- Staged crime statistics by neighbourhood
-- Source: fact_crime table
-- Grain: One row per neighbourhood per year per crime type
with source as (
select * from {{ source('toronto_housing', 'fact_crime') }}
),
staged as (
select
id as crime_id,
neighbourhood_id,
year as crime_year,
crime_type,
count as incident_count,
rate_per_100k
from source
)
select * from staged

View File

@@ -0,0 +1,25 @@
-- Staged Toronto neighbourhood dimension
-- Source: dim_neighbourhood table
-- Grain: One row per neighbourhood (158 total)
with source as (
select * from {{ source('toronto_housing', 'dim_neighbourhood') }}
),
staged as (
select
neighbourhood_id,
name as neighbourhood_name,
geometry,
population,
land_area_sqkm,
pop_density_per_sqkm,
pct_bachelors_or_higher,
median_household_income,
pct_owner_occupied,
pct_renter_occupied,
census_year
from source
)
select * from staged

dbt/package-lock.yml Normal file
View File

@@ -0,0 +1,11 @@
packages:
- name: dbt_utils
package: dbt-labs/dbt_utils
version: 1.3.3
- name: dbt_expectations
package: calogica/dbt_expectations
version: 0.10.4
- name: dbt_date
package: calogica/dbt_date
version: 0.10.1
sha1_hash: 51a51ab489f7b302c8745ae3c3781271816b01be

View File

@@ -0,0 +1,50 @@
# Project Lessons Learned
This folder contains lessons learned from sprints and development work. These lessons help prevent repeating mistakes and capture valuable insights.
**Note:** This is a temporary local backup while Wiki.js integration is being configured. Once Wiki.js is ready, lessons will be migrated there for better searchability.
---
## Lessons Index
| Date | Sprint/Phase | Title | Tags |
|------|--------------|-------|------|
| 2026-01-16 | Phase 4 | [dbt Test Syntax Deprecation](./phase-4-dbt-test-syntax.md) | dbt, testing, yaml, deprecation |
---
## How to Use
### When Starting a Sprint
1. Review relevant lessons in this folder before implementation
2. Search by tags or keywords to find applicable insights
3. Apply prevention strategies from past lessons
### When Closing a Sprint
1. Document any significant lessons learned
2. Use the template below
3. Add entry to the index table above
---
## Lesson Template
```markdown
# [Sprint/Phase] - [Lesson Title]
## Context
[What were you trying to do?]
## Problem
[What went wrong or what insight emerged?]
## Solution
[How did you solve it?]
## Prevention
[How can this be avoided in future sprints?]
## Tags
[Comma-separated tags for search]
```

View File

@@ -0,0 +1,38 @@
# Phase 4 - dbt Test Syntax Deprecation
## Context
Implementing dbt mart models with `accepted_values` tests for tier columns (safety_tier, income_quintile, amenity_tier) that should only contain values 1-5.
## Problem
dbt 1.9+ introduced a deprecation warning for generic test arguments. The old syntax:
```yaml
tests:
- accepted_values:
values: [1, 2, 3, 4, 5]
```
Produces deprecation warnings:
```
MissingArgumentsPropertyInGenericTestDeprecation: Arguments to generic tests should be nested under the `arguments` property.
```
## Solution
Nest test arguments under the `arguments` property:
```yaml
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
```
This applies to all generic tests with arguments, not just `accepted_values`.
## Prevention
- When writing dbt schema YAML files, always use the `arguments:` nesting for generic tests
- Run `dbt parse --no-partial-parse` to catch all deprecation warnings before they become errors
- Check dbt changelog when upgrading versions for breaking changes to test syntax
## Tags
dbt, testing, yaml, deprecation, syntax, schema

View File

@@ -1,7 +1,15 @@
"""Database loaders for Toronto housing data.""" """Database loaders for Toronto housing data."""
from .amenities import load_amenities, load_amenity_counts
from .base import bulk_insert, get_session, upsert_by_key
from .census import load_census_data
from .cmhc import load_cmhc_record, load_cmhc_rentals
from .cmhc_crosswalk import (
build_cmhc_neighbourhood_crosswalk,
disaggregate_zone_value,
get_neighbourhood_weights_for_zone,
)
from .crime import load_crime_data
from .dimensions import (
generate_date_key,
load_cmhc_zones,
@@ -24,4 +32,13 @@ __all__ = [
# Fact loaders
"load_cmhc_rentals",
"load_cmhc_record",
# Phase 3 loaders
"load_census_data",
"load_crime_data",
"load_amenities",
"load_amenity_counts",
# CMHC crosswalk
"build_cmhc_neighbourhood_crosswalk",
"get_neighbourhood_weights_for_zone",
"disaggregate_zone_value",
]

View File

@@ -0,0 +1,93 @@
"""Loader for amenities data to fact_amenities table."""
from collections import Counter
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import FactAmenities
from portfolio_app.toronto.schemas import AmenityCount, AmenityRecord
from .base import get_session, upsert_by_key
def load_amenities(
records: list[AmenityRecord],
year: int,
session: Session | None = None,
) -> int:
"""Load amenity records to fact_amenities table.
Aggregates individual amenity records into counts by neighbourhood
and amenity type before loading.
Args:
records: List of validated AmenityRecord schemas.
year: Year to associate with the amenity counts.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
# Aggregate records by neighbourhood and amenity type
counts: Counter[tuple[int, str]] = Counter()
for r in records:
key = (r.neighbourhood_id, r.amenity_type.value)
counts[key] += 1
# Convert aggregated counts to FactAmenities models
def _load(sess: Session) -> int:
models = []
for (neighbourhood_id, amenity_type), count in counts.items():
model = FactAmenities(
neighbourhood_id=neighbourhood_id,
amenity_type=amenity_type,
count=count,
year=year,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactAmenities, models, ["neighbourhood_id", "amenity_type", "year"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_amenity_counts(
records: list[AmenityCount],
session: Session | None = None,
) -> int:
"""Load pre-aggregated amenity counts to fact_amenities table.
Args:
records: List of validated AmenityCount schemas.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
def _load(sess: Session) -> int:
models = []
for r in records:
model = FactAmenities(
neighbourhood_id=r.neighbourhood_id,
amenity_type=r.amenity_type.value,
count=r.count,
year=r.year,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactAmenities, models, ["neighbourhood_id", "amenity_type", "year"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
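
For orientation, `load_amenities` reduces raw per-location records to one count per `(neighbourhood_id, amenity_type)` key before a single upsert. A tiny sketch of that aggregation step with made-up IDs and type strings:

```python
from collections import Counter

# Hypothetical records reduced to the two fields the loader keys on.
parsed = [(70, "park"), (70, "park"), (70, "school"), (71, "park")]

counts: Counter[tuple[int, str]] = Counter(parsed)
print(counts)  # Counter({(70, 'park'): 2, (70, 'school'): 1, (71, 'park'): 1})
# load_amenities() then builds one FactAmenities row per key and upserts on
# (neighbourhood_id, amenity_type, year).
```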

View File

@@ -0,0 +1,68 @@
"""Loader for census data to fact_census table."""
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import FactCensus
from portfolio_app.toronto.schemas import CensusRecord
from .base import get_session, upsert_by_key
def load_census_data(
records: list[CensusRecord],
session: Session | None = None,
) -> int:
"""Load census records to fact_census table.
Args:
records: List of validated CensusRecord schemas.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
def _load(sess: Session) -> int:
models = []
for r in records:
model = FactCensus(
neighbourhood_id=r.neighbourhood_id,
census_year=r.census_year,
population=r.population,
population_density=float(r.population_density)
if r.population_density
else None,
median_household_income=float(r.median_household_income)
if r.median_household_income
else None,
average_household_income=float(r.average_household_income)
if r.average_household_income
else None,
unemployment_rate=float(r.unemployment_rate)
if r.unemployment_rate
else None,
pct_bachelors_or_higher=float(r.pct_bachelors_or_higher)
if r.pct_bachelors_or_higher
else None,
pct_owner_occupied=float(r.pct_owner_occupied)
if r.pct_owner_occupied
else None,
pct_renter_occupied=float(r.pct_renter_occupied)
if r.pct_renter_occupied
else None,
median_age=float(r.median_age) if r.median_age else None,
average_dwelling_value=float(r.average_dwelling_value)
if r.average_dwelling_value
else None,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactCensus, models, ["neighbourhood_id", "census_year"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)

View File

@@ -0,0 +1,131 @@
"""Loader for CMHC zone to neighbourhood crosswalk with area weights."""
from sqlalchemy import text
from sqlalchemy.orm import Session
from .base import get_session
def build_cmhc_neighbourhood_crosswalk(
session: Session | None = None,
) -> int:
"""Calculate area overlap weights between CMHC zones and neighbourhoods.
Uses PostGIS ST_Intersection and ST_Area functions to compute the
proportion of each CMHC zone that overlaps with each neighbourhood.
This enables disaggregation of CMHC zone-level data to neighbourhood level.
The function is idempotent - it clears existing crosswalk data before
rebuilding.
Args:
session: Optional existing session.
Returns:
Number of bridge records created.
Note:
Requires both dim_cmhc_zone and dim_neighbourhood tables to have
geometry columns populated with valid PostGIS geometries.
"""
def _build(sess: Session) -> int:
# Clear existing crosswalk data
sess.execute(text("DELETE FROM bridge_cmhc_neighbourhood"))
# Calculate overlap weights using PostGIS
# Weight = area of intersection / total area of CMHC zone
crosswalk_query = text(
"""
INSERT INTO bridge_cmhc_neighbourhood (cmhc_zone_code, neighbourhood_id, weight)
SELECT
z.zone_code,
n.neighbourhood_id,
CASE
WHEN ST_Area(z.geometry::geography) > 0 THEN
ST_Area(ST_Intersection(z.geometry, n.geometry)::geography) /
ST_Area(z.geometry::geography)
ELSE 0
END as weight
FROM dim_cmhc_zone z
JOIN dim_neighbourhood n
ON ST_Intersects(z.geometry, n.geometry)
WHERE
z.geometry IS NOT NULL
AND n.geometry IS NOT NULL
AND ST_Area(ST_Intersection(z.geometry, n.geometry)::geography) > 0
"""
)
sess.execute(crosswalk_query)
# Count records created
count_result = sess.execute(
text("SELECT COUNT(*) FROM bridge_cmhc_neighbourhood")
)
count = count_result.scalar() or 0
return int(count)
if session:
return _build(session)
with get_session() as sess:
return _build(sess)
def get_neighbourhood_weights_for_zone(
zone_code: str,
session: Session | None = None,
) -> list[tuple[int, float]]:
"""Get neighbourhood weights for a specific CMHC zone.
Args:
zone_code: CMHC zone code.
session: Optional existing session.
Returns:
List of (neighbourhood_id, weight) tuples.
"""
def _get(sess: Session) -> list[tuple[int, float]]:
result = sess.execute(
text(
"""
SELECT neighbourhood_id, weight
FROM bridge_cmhc_neighbourhood
WHERE cmhc_zone_code = :zone_code
ORDER BY weight DESC
"""
),
{"zone_code": zone_code},
)
return [(int(row[0]), float(row[1])) for row in result]
if session:
return _get(session)
with get_session() as sess:
return _get(sess)
def disaggregate_zone_value(
zone_code: str,
value: float,
session: Session | None = None,
) -> dict[int, float]:
"""Disaggregate a CMHC zone value to neighbourhoods using weights.
Args:
zone_code: CMHC zone code.
value: Value to disaggregate (e.g., average rent).
session: Optional existing session.
Returns:
Dictionary mapping neighbourhood_id to weighted value.
Note:
For averages (like rent), the weighted value represents the
contribution from this zone. To get a neighbourhood's total,
sum contributions from all overlapping zones.
"""
weights = get_neighbourhood_weights_for_zone(zone_code, session)
return {neighbourhood_id: value * weight for neighbourhood_id, weight in weights}
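
A hedged usage sketch of the helpers above, summing each zone's weighted contribution into a per-neighbourhood value as the docstring suggests (zone codes and rents are hypothetical):

```python
from collections import defaultdict

from portfolio_app.toronto.loaders import (
    build_cmhc_neighbourhood_crosswalk,
    disaggregate_zone_value,
)

# Rebuild the bridge table from PostGIS geometries (idempotent).
links = build_cmhc_neighbourhood_crosswalk()
print(f"created {links} zone-neighbourhood links")

zone_rents = {"001": 1950.0, "002": 2100.0}  # zone-level average rents

totals: dict[int, float] = defaultdict(float)
for zone_code, rent in zone_rents.items():
    # Each call returns this zone's weighted contribution per neighbourhood.
    for neighbourhood_id, contribution in disaggregate_zone_value(zone_code, rent).items():
        totals[neighbourhood_id] += contribution
```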

View File

@@ -0,0 +1,45 @@
"""Loader for crime data to fact_crime table."""
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import FactCrime
from portfolio_app.toronto.schemas import CrimeRecord
from .base import get_session, upsert_by_key
def load_crime_data(
records: list[CrimeRecord],
session: Session | None = None,
) -> int:
"""Load crime records to fact_crime table.
Args:
records: List of validated CrimeRecord schemas.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
def _load(sess: Session) -> int:
models = []
for r in records:
model = FactCrime(
neighbourhood_id=r.neighbourhood_id,
year=r.year,
crime_type=r.crime_type.value,
count=r.count,
rate_per_100k=float(r.rate_per_100k) if r.rate_per_100k else None,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactCrime, models, ["neighbourhood_id", "year", "crime_type"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)

View File

@@ -7,7 +7,13 @@ from .dimensions import (
DimPolicyEvent,
DimTime,
)
from .facts import (
BridgeCMHCNeighbourhood,
FactAmenities,
FactCensus,
FactCrime,
FactRentals,
)
__all__ = [
# Base # Base
@@ -22,4 +28,9 @@ __all__ = [
"DimPolicyEvent", "DimPolicyEvent",
# Facts # Facts
"FactRentals", "FactRentals",
"FactCensus",
"FactCrime",
"FactAmenities",
# Bridge tables
"BridgeCMHCNeighbourhood",
]

View File

@@ -1,11 +1,117 @@
"""SQLAlchemy models for fact tables.""" """SQLAlchemy models for fact tables."""
from sqlalchemy import ForeignKey, Integer, Numeric, String from sqlalchemy import ForeignKey, Index, Integer, Numeric, String
from sqlalchemy.orm import Mapped, mapped_column, relationship from sqlalchemy.orm import Mapped, mapped_column, relationship
from .base import Base from .base import Base
class BridgeCMHCNeighbourhood(Base):
"""Bridge table for CMHC zone to neighbourhood mapping with area weights.
Enables disaggregation of CMHC zone-level rental data to neighbourhood level
using area-based proportional weights computed via PostGIS.
"""
__tablename__ = "bridge_cmhc_neighbourhood"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
cmhc_zone_code: Mapped[str] = mapped_column(String(10), nullable=False)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
weight: Mapped[float] = mapped_column(
Numeric(5, 4), nullable=False
) # 0.0000 to 1.0000
__table_args__ = (
Index("ix_bridge_cmhc_zone", "cmhc_zone_code"),
Index("ix_bridge_neighbourhood", "neighbourhood_id"),
)
class FactCensus(Base):
"""Census statistics by neighbourhood and year.
Grain: One row per neighbourhood per census year.
"""
__tablename__ = "fact_census"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
census_year: Mapped[int] = mapped_column(Integer, nullable=False)
population: Mapped[int | None] = mapped_column(Integer, nullable=True)
population_density: Mapped[float | None] = mapped_column(
Numeric(10, 2), nullable=True
)
median_household_income: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
average_household_income: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
unemployment_rate: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_bachelors_or_higher: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_owner_occupied: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_renter_occupied: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
median_age: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
average_dwelling_value: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
__table_args__ = (
Index("ix_fact_census_neighbourhood_year", "neighbourhood_id", "census_year"),
)
class FactCrime(Base):
"""Crime statistics by neighbourhood and year.
Grain: One row per neighbourhood per year per crime type.
"""
__tablename__ = "fact_crime"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
year: Mapped[int] = mapped_column(Integer, nullable=False)
crime_type: Mapped[str] = mapped_column(String(50), nullable=False)
count: Mapped[int] = mapped_column(Integer, nullable=False)
rate_per_100k: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
__table_args__ = (
Index("ix_fact_crime_neighbourhood_year", "neighbourhood_id", "year"),
Index("ix_fact_crime_type", "crime_type"),
)
class FactAmenities(Base):
"""Amenity counts by neighbourhood.
Grain: One row per neighbourhood per amenity type per year.
"""
__tablename__ = "fact_amenities"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
amenity_type: Mapped[str] = mapped_column(String(50), nullable=False)
count: Mapped[int] = mapped_column(Integer, nullable=False)
year: Mapped[int] = mapped_column(Integer, nullable=False)
__table_args__ = (
Index("ix_fact_amenities_neighbourhood_year", "neighbourhood_id", "year"),
Index("ix_fact_amenities_type", "amenity_type"),
)
class FactRentals(Base):
"""Fact table for CMHC rental market data.

View File

@@ -6,6 +6,8 @@ from .geo import (
NeighbourhoodParser,
load_geojson,
)
from .toronto_open_data import TorontoOpenDataParser
from .toronto_police import TorontoPoliceParser
__all__ = [
"CMHCParser",
@@ -13,4 +15,7 @@ __all__ = [
"CMHCZoneParser", "CMHCZoneParser",
"NeighbourhoodParser", "NeighbourhoodParser",
"load_geojson", "load_geojson",
# API parsers (Phase 3)
"TorontoOpenDataParser",
"TorontoPoliceParser",
]

View File

@@ -0,0 +1,391 @@
"""Parser for Toronto Open Data CKAN API.
Fetches neighbourhood boundaries, census profiles, and amenities data
from the City of Toronto's Open Data Portal.
API Documentation: https://open.toronto.ca/dataset/
"""
import json
import logging
from decimal import Decimal
from pathlib import Path
from typing import Any
import httpx
from portfolio_app.toronto.schemas import (
AmenityRecord,
AmenityType,
CensusRecord,
NeighbourhoodRecord,
)
logger = logging.getLogger(__name__)
class TorontoOpenDataParser:
"""Parser for Toronto Open Data CKAN API.
Provides methods to fetch and parse neighbourhood boundaries, census profiles,
and amenities (parks, schools, childcare) from the Toronto Open Data portal.
"""
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
API_PATH = "/api/3/action"
# Dataset package IDs
DATASETS = {
"neighbourhoods": "neighbourhoods",
"neighbourhood_profiles": "neighbourhood-profiles",
"parks": "parks",
"schools": "school-locations-all-types",
"childcare": "licensed-child-care-centres",
}
def __init__(
self,
cache_dir: Path | None = None,
timeout: float = 30.0,
) -> None:
"""Initialize parser.
Args:
cache_dir: Optional directory for caching API responses.
timeout: HTTP request timeout in seconds.
"""
self._cache_dir = cache_dir
self._timeout = timeout
self._client: httpx.Client | None = None
@property
def client(self) -> httpx.Client:
"""Lazy-initialize HTTP client."""
if self._client is None:
self._client = httpx.Client(
base_url=self.BASE_URL,
timeout=self._timeout,
headers={"Accept": "application/json"},
)
return self._client
def close(self) -> None:
"""Close HTTP client."""
if self._client is not None:
self._client.close()
self._client = None
def __enter__(self) -> "TorontoOpenDataParser":
return self
def __exit__(self, *args: Any) -> None:
self.close()
def _get_package(self, package_id: str) -> dict[str, Any]:
"""Fetch package metadata from CKAN.
Args:
package_id: The package/dataset ID.
Returns:
Package metadata dictionary.
"""
response = self.client.get(
f"{self.API_PATH}/package_show",
params={"id": package_id},
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"CKAN API error: {result.get('error', 'Unknown error')}")
return dict(result["result"])
def _get_resource_url(
self,
package_id: str,
format_filter: str = "geojson",
) -> str:
"""Get the download URL for a resource in a package.
Args:
package_id: The package/dataset ID.
format_filter: Resource format to filter by (e.g., 'geojson', 'csv').
Returns:
Resource download URL.
Raises:
ValueError: If no matching resource is found.
"""
package = self._get_package(package_id)
resources = package.get("resources", [])
for resource in resources:
resource_format = resource.get("format", "").lower()
if format_filter.lower() in resource_format:
return str(resource["url"])
available = [r.get("format") for r in resources]
raise ValueError(
f"No {format_filter} resource in {package_id}. Available: {available}"
)
def _fetch_geojson(self, package_id: str) -> dict[str, Any]:
"""Fetch GeoJSON data from a package.
Args:
package_id: The package/dataset ID.
Returns:
GeoJSON FeatureCollection.
"""
# Check cache first
if self._cache_dir:
cache_file = self._cache_dir / f"{package_id}.geojson"
if cache_file.exists():
logger.debug(f"Loading {package_id} from cache")
with open(cache_file, encoding="utf-8") as f:
return dict(json.load(f))
url = self._get_resource_url(package_id, format_filter="geojson")
logger.info(f"Fetching GeoJSON from {url}")
response = self.client.get(url)
response.raise_for_status()
data = response.json()
# Cache the response
if self._cache_dir:
self._cache_dir.mkdir(parents=True, exist_ok=True)
cache_file = self._cache_dir / f"{package_id}.geojson"
with open(cache_file, "w", encoding="utf-8") as f:
json.dump(data, f)
return dict(data)
def _fetch_csv_as_json(self, package_id: str) -> list[dict[str, Any]]:
"""Fetch CSV data as JSON records via CKAN datastore.
Args:
package_id: The package/dataset ID.
Returns:
List of records as dictionaries.
"""
package = self._get_package(package_id)
resources = package.get("resources", [])
# Find a datastore-enabled resource
for resource in resources:
if resource.get("datastore_active"):
resource_id = resource["id"]
break
else:
raise ValueError(f"No datastore resource in {package_id}")
# Fetch all records via datastore_search
records: list[dict[str, Any]] = []
offset = 0
limit = 1000
while True:
response = self.client.get(
f"{self.API_PATH}/datastore_search",
params={"id": resource_id, "limit": limit, "offset": offset},
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"Datastore error: {result.get('error')}")
batch = result["result"]["records"]
records.extend(batch)
if len(batch) < limit:
break
offset += limit
return records
def get_neighbourhoods(self) -> list[NeighbourhoodRecord]:
"""Fetch 158 Toronto neighbourhood boundaries.
Returns:
List of validated NeighbourhoodRecord objects.
"""
geojson = self._fetch_geojson(self.DATASETS["neighbourhoods"])
features = geojson.get("features", [])
records = []
for feature in features:
props = feature.get("properties", {})
geometry = feature.get("geometry")
# Extract area_id from various possible property names
area_id = props.get("AREA_ID") or props.get("area_id")
if area_id is None:
# Try AREA_SHORT_CODE as fallback
short_code = props.get("AREA_SHORT_CODE", "")
if short_code:
# Extract numeric part
area_id = int("".join(c for c in short_code if c.isdigit()) or "0")
if area_id is None:
# Neither AREA_ID nor AREA_SHORT_CODE is present; skip this feature
continue
area_name = (
props.get("AREA_NAME")
or props.get("area_name")
or f"Neighbourhood {area_id}"
)
area_short_code = props.get("AREA_SHORT_CODE") or props.get(
"area_short_code"
)
records.append(
NeighbourhoodRecord(
area_id=int(area_id),
area_name=str(area_name),
area_short_code=area_short_code,
geometry=geometry,
)
)
logger.info(f"Parsed {len(records)} neighbourhoods")
return records
def get_census_profiles(self, year: int = 2021) -> list[CensusRecord]:
"""Fetch neighbourhood census profiles.
Note: Census profile data structure varies by year. This method
extracts key demographic indicators where available.
Args:
year: Census year (2016 or 2021).
Returns:
List of validated CensusRecord objects.
"""
# Census profiles are typically in CSV/datastore format
try:
raw_records = self._fetch_csv_as_json(
self.DATASETS["neighbourhood_profiles"]
)
except ValueError as e:
logger.warning(f"Could not fetch census profiles: {e}")
return []
# Census profiles are pivoted - rows are indicators, columns are neighbourhoods
# This requires special handling based on the actual data structure
logger.info(f"Fetched {len(raw_records)} census profile rows")
# For now, return empty list - actual implementation depends on data structure
# TODO: Implement census profile parsing based on actual data format
return []
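# Hypothetical sketch of the pivot handling described above (the indicator label
# and column names are assumptions, not the confirmed dataset layout): each raw
# row resembles {"Characteristic": "Total - Population, 2021",
# "Example Neighbourhood": "1,234", ...}, so a real implementation would match
# the indicator label, strip thousands separators, and emit one CensusRecord per
# neighbourhood column, resolving the column label to its AREA_ID via
# get_neighbourhoods().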
def get_parks(self) -> list[AmenityRecord]:
"""Fetch park locations.
Returns:
List of validated AmenityRecord objects.
"""
return self._fetch_amenities(
self.DATASETS["parks"],
AmenityType.PARK,
name_field="ASSET_NAME",
address_field="ADDRESS_FULL",
)
def get_schools(self) -> list[AmenityRecord]:
"""Fetch school locations.
Returns:
List of validated AmenityRecord objects.
"""
return self._fetch_amenities(
self.DATASETS["schools"],
AmenityType.SCHOOL,
name_field="NAME",
address_field="ADDRESS_FULL",
)
def get_childcare_centres(self) -> list[AmenityRecord]:
"""Fetch licensed childcare centre locations.
Returns:
List of validated AmenityRecord objects.
"""
return self._fetch_amenities(
self.DATASETS["childcare"],
AmenityType.CHILDCARE,
name_field="LOC_NAME",
address_field="ADDRESS",
)
def _fetch_amenities(
self,
package_id: str,
amenity_type: AmenityType,
name_field: str,
address_field: str,
) -> list[AmenityRecord]:
"""Fetch and parse amenity data from GeoJSON.
Args:
package_id: CKAN package ID.
amenity_type: Type of amenity.
name_field: Property name containing amenity name.
address_field: Property name containing address.
Returns:
List of AmenityRecord objects.
"""
try:
geojson = self._fetch_geojson(package_id)
except (httpx.HTTPError, ValueError) as e:
logger.warning(f"Could not fetch {package_id}: {e}")
return []
features = geojson.get("features", [])
records = []
for feature in features:
props = feature.get("properties", {})
geometry = feature.get("geometry")
# Get coordinates from geometry
lat, lon = None, None
if geometry and geometry.get("type") == "Point":
coords = geometry.get("coordinates", [])
if len(coords) >= 2:
lon, lat = coords[0], coords[1]
# Try to determine neighbourhood_id
# Many datasets include AREA_ID or similar
neighbourhood_id = (
props.get("AREA_ID")
or props.get("area_id")
or props.get("NEIGHBOURHOOD_ID")
or 0  # No neighbourhood assignment in the source; skipped below (would need a spatial join)
)
name = props.get(name_field) or props.get(name_field.lower()) or "Unknown"
address = props.get(address_field) or props.get(address_field.lower())
# Skip if we don't have a neighbourhood assignment
if neighbourhood_id == 0:
continue
records.append(
AmenityRecord(
neighbourhood_id=int(neighbourhood_id),
amenity_type=amenity_type,
amenity_name=str(name)[:200],
address=str(address)[:300] if address else None,
latitude=Decimal(str(lat)) if lat is not None else None,
longitude=Decimal(str(lon)) if lon is not None else None,
)
)
logger.info(f"Parsed {len(records)} {amenity_type.value} records")
return records
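# A minimal usage sketch for this parser. The cache_dir keyword below is an
# assumption inferred from the _cache_dir attribute; it is not confirmed here.
if __name__ == "__main__":
    from pathlib import Path

    with TorontoOpenDataParser(cache_dir=Path(".cache/toronto")) as parser:
        neighbourhoods = parser.get_neighbourhoods()
        parks = parser.get_parks()
        schools = parser.get_schools()
    print(
        f"{len(neighbourhoods)} neighbourhoods, "
        f"{len(parks)} parks, {len(schools)} schools"
    )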

View File

@@ -0,0 +1,371 @@
"""Parser for Toronto Police crime data via CKAN API.
Fetches neighbourhood crime rates and major crime indicators from the
Toronto Police Service data hosted on Toronto Open Data Portal.
Data Sources:
- Neighbourhood Crime Rates: Annual crime rates by neighbourhood
- Major Crime Indicators (MCI): Detailed incident-level data
"""
import contextlib
import json
import logging
from decimal import Decimal
from typing import Any
import httpx
from portfolio_app.toronto.schemas import CrimeRecord, CrimeType
logger = logging.getLogger(__name__)
# Mapping from Toronto Police crime categories to CrimeType enum
CRIME_TYPE_MAPPING: dict[str, CrimeType] = {
"assault": CrimeType.ASSAULT,
"assaults": CrimeType.ASSAULT,
"auto theft": CrimeType.AUTO_THEFT,
"autotheft": CrimeType.AUTO_THEFT,
"auto_theft": CrimeType.AUTO_THEFT,
"break and enter": CrimeType.BREAK_AND_ENTER,
"breakenter": CrimeType.BREAK_AND_ENTER,
"break_and_enter": CrimeType.BREAK_AND_ENTER,
"homicide": CrimeType.HOMICIDE,
"homicides": CrimeType.HOMICIDE,
"robbery": CrimeType.ROBBERY,
"robberies": CrimeType.ROBBERY,
"shooting": CrimeType.SHOOTING,
"shootings": CrimeType.SHOOTING,
"theft over": CrimeType.THEFT_OVER,
"theftover": CrimeType.THEFT_OVER,
"theft_over": CrimeType.THEFT_OVER,
"theft from motor vehicle": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
"theftfrommv": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
"theft_from_mv": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
}
def _normalize_crime_type(crime_str: str) -> CrimeType:
"""Normalize crime type string to CrimeType enum.
Args:
crime_str: Raw crime type string from data source.
Returns:
Matched CrimeType enum value, or CrimeType.OTHER if no match.
"""
normalized = crime_str.lower().strip().replace("-", " ").replace("_", " ")
return CRIME_TYPE_MAPPING.get(normalized, CrimeType.OTHER)
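# For illustration: "Auto Theft", "BREAK_AND_ENTER", and "Theft Over" all
# normalize to their CrimeType counterparts above, while an unmapped string such
# as "arson" falls back to CrimeType.OTHER.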
class TorontoPoliceParser:
"""Parser for Toronto Police crime data via CKAN API.
Crime data is hosted on Toronto Open Data Portal but sourced from
Toronto Police Service.
"""
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
API_PATH = "/api/3/action"
# Dataset package IDs
DATASETS = {
"crime_rates": "neighbourhood-crime-rates",
"mci": "major-crime-indicators",
"shootings": "shootings-firearm-discharges",
}
def __init__(self, timeout: float = 30.0) -> None:
"""Initialize parser.
Args:
timeout: HTTP request timeout in seconds.
"""
self._timeout = timeout
self._client: httpx.Client | None = None
@property
def client(self) -> httpx.Client:
"""Lazy-initialize HTTP client."""
if self._client is None:
self._client = httpx.Client(
base_url=self.BASE_URL,
timeout=self._timeout,
headers={"Accept": "application/json"},
)
return self._client
def close(self) -> None:
"""Close HTTP client."""
if self._client is not None:
self._client.close()
self._client = None
def __enter__(self) -> "TorontoPoliceParser":
return self
def __exit__(self, *args: Any) -> None:
self.close()
def _get_package(self, package_id: str) -> dict[str, Any]:
"""Fetch package metadata from CKAN."""
response = self.client.get(
f"{self.API_PATH}/package_show",
params={"id": package_id},
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"CKAN API error: {result.get('error', 'Unknown error')}")
return dict(result["result"])
def _fetch_datastore_records(
self,
package_id: str,
filters: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
"""Fetch records from CKAN datastore.
Args:
package_id: CKAN package ID.
filters: Optional filters to apply.
Returns:
List of records as dictionaries.
"""
package = self._get_package(package_id)
resources = package.get("resources", [])
# Find datastore-enabled resource
resource_id = None
for resource in resources:
if resource.get("datastore_active"):
resource_id = resource["id"]
break
if not resource_id:
raise ValueError(f"No datastore resource in {package_id}")
# Fetch all records
records: list[dict[str, Any]] = []
offset = 0
limit = 1000
while True:
params: dict[str, Any] = {
"id": resource_id,
"limit": limit,
"offset": offset,
}
if filters:
params["filters"] = json.dumps(filters)
response = self.client.get(
f"{self.API_PATH}/datastore_search",
params=params,
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"Datastore error: {result.get('error')}")
batch = result["result"]["records"]
records.extend(batch)
if len(batch) < limit:
break
offset += limit
return records
def get_crime_rates(
self,
years: list[int] | None = None,
) -> list[CrimeRecord]:
"""Fetch neighbourhood crime rates.
The crime rates dataset contains annual counts and rates per 100k
population for each neighbourhood.
Args:
years: Optional list of years to filter. If None, fetches all.
Returns:
List of validated CrimeRecord objects.
"""
try:
raw_records = self._fetch_datastore_records(self.DATASETS["crime_rates"])
except (httpx.HTTPError, ValueError) as e:
logger.warning(f"Could not fetch crime rates: {e}")
return []
records = []
for row in raw_records:
# Extract neighbourhood ID (Hood_ID maps to AREA_ID)
hood_id = row.get("HOOD_ID") or row.get("Hood_ID") or row.get("hood_id")
if not hood_id:
continue
try:
neighbourhood_id = int(hood_id)
except (ValueError, TypeError):
continue
# Crime rate data typically has columns like:
# ASSAULT_2019, ASSAULT_RATE_2019, AUTOTHEFT_2020, etc.
# We need to parse column names to extract crime type and year
for col_name, value in row.items():
if value is None or col_name in (
"_id",
"HOOD_ID",
"Hood_ID",
"hood_id",
"AREA_NAME",
"NEIGHBOURHOOD",
):
continue
# Try to parse column name for crime type and year
# Pattern: CRIMETYPE_YEAR or CRIMETYPE_RATE_YEAR
parts = col_name.upper().split("_")
if len(parts) < 2:
continue
# Check if last part is a year
try:
year = int(parts[-1])
if year < 2014 or year > 2030:
continue
except ValueError:
continue
# Filter by years if specified
if years and year not in years:
continue
# Check if this is a rate column
is_rate = "RATE" in parts
# Extract crime type (everything before RATE/year)
if is_rate:
rate_idx = parts.index("RATE")
crime_type_str = "_".join(parts[:rate_idx])
else:
crime_type_str = "_".join(parts[:-1])
crime_type = _normalize_crime_type(crime_type_str)
try:
numeric_value = Decimal(str(value))
except (ValueError, TypeError):
continue
if is_rate:
# Rate columns are skipped here; each is consulted below when its
# matching count column is processed
continue
# Find corresponding rate if available
rate_col = f"{crime_type_str}_RATE_{year}"
rate_value = row.get(rate_col)
rate_per_100k = None
if rate_value is not None:
with contextlib.suppress(ValueError, TypeError):
rate_per_100k = Decimal(str(rate_value))
records.append(
CrimeRecord(
neighbourhood_id=neighbourhood_id,
year=year,
crime_type=crime_type,
count=int(numeric_value),
rate_per_100k=rate_per_100k,
)
)
logger.info(f"Parsed {len(records)} crime rate records")
return records
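# Illustrative (hypothetical) input shape for the loop above: a row such as
# {"HOOD_ID": 1, "AREA_NAME": "Example Neighbourhood",
#  "ASSAULT_2020": 312, "ASSAULT_RATE_2020": 987.6}
# yields CrimeRecord(neighbourhood_id=1, year=2020, crime_type=CrimeType.ASSAULT,
# count=312, rate_per_100k=Decimal("987.6")); the *_RATE_* column is only
# consulted when its matching count column is processed.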
def get_major_crime_indicators(
self,
years: list[int] | None = None,
) -> list[CrimeRecord]:
"""Fetch major crime indicators (detailed MCI data).
MCI data contains incident-level records that need to be aggregated
by neighbourhood and year.
Args:
years: Optional list of years to filter.
Returns:
List of aggregated CrimeRecord objects.
"""
try:
raw_records = self._fetch_datastore_records(self.DATASETS["mci"])
except (httpx.HTTPError, ValueError) as e:
logger.warning(f"Could not fetch MCI data: {e}")
return []
# Aggregate counts by neighbourhood, year, and crime type
aggregates: dict[tuple[int, int, CrimeType], int] = {}
for row in raw_records:
# Extract neighbourhood ID
hood_id = (
row.get("HOOD_158")
or row.get("HOOD_140")
or row.get("HOOD_ID")
or row.get("Hood_ID")
)
if not hood_id:
continue
try:
neighbourhood_id = int(hood_id)
except (ValueError, TypeError):
continue
# Extract year from occurrence date
occ_year = row.get("OCC_YEAR") or row.get("REPORT_YEAR")
if not occ_year:
continue
try:
year = int(occ_year)
if year < 2014 or year > 2030:
continue
except (ValueError, TypeError):
continue
# Filter by years if specified
if years and year not in years:
continue
# Extract crime type
mci_category = row.get("MCI_CATEGORY") or row.get("OFFENCE") or ""
crime_type = _normalize_crime_type(str(mci_category))
# Aggregate count
key = (neighbourhood_id, year, crime_type)
aggregates[key] = aggregates.get(key, 0) + 1
# Convert aggregates to CrimeRecord objects
records = [
CrimeRecord(
neighbourhood_id=neighbourhood_id,
year=year,
crime_type=crime_type,
count=count,
rate_per_100k=None, # Would need population data to calculate
)
for (neighbourhood_id, year, crime_type), count in aggregates.items()
]
logger.info(f"Parsed {len(records)} MCI records (aggregated)")
return records
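# A minimal usage sketch for this parser; the years passed below are illustrative.
if __name__ == "__main__":
    with TorontoPoliceParser(timeout=60.0) as parser:
        rates = parser.get_crime_rates(years=[2022, 2023])
        mci = parser.get_major_crime_indicators(years=[2023])
    print(f"{len(rates)} crime rate records, {len(mci)} aggregated MCI records")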

View File

@@ -1,5 +1,6 @@
"""Pydantic schemas for Toronto housing data validation."""
from .amenities import AmenityCount, AmenityRecord, AmenityType
from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
from .dimensions import (
CMHCZone,
@@ -11,6 +12,7 @@ from .dimensions import (
PolicyLevel,
TimeDimension,
)
from .neighbourhood import CensusRecord, CrimeRecord, CrimeType, NeighbourhoodRecord
__all__ = [
# CMHC
@@ -28,4 +30,13 @@ __all__ = [
"PolicyCategory",
"ExpectedDirection",
"Confidence",
# Neighbourhood data (Phase 3)
"NeighbourhoodRecord",
"CensusRecord",
"CrimeRecord",
"CrimeType",
# Amenities (Phase 3)
"AmenityType",
"AmenityRecord",
"AmenityCount",
]

View File

@@ -0,0 +1,60 @@
"""Pydantic schemas for Toronto amenities data.
Includes schemas for parks, schools, childcare centres, and transit stops.
"""
from decimal import Decimal
from enum import Enum
from pydantic import BaseModel, Field
class AmenityType(str, Enum):
"""Types of amenities tracked in the neighbourhood dashboard."""
PARK = "park"
SCHOOL = "school"
CHILDCARE = "childcare"
TRANSIT_STOP = "transit_stop"
LIBRARY = "library"
COMMUNITY_CENTRE = "community_centre"
HOSPITAL = "hospital"
class AmenityRecord(BaseModel):
"""Amenity location record for a neighbourhood.
Represents a single amenity (park, school, etc.) with its location
and associated neighbourhood.
"""
neighbourhood_id: int = Field(
ge=1, le=200, description="Neighbourhood ID containing this amenity"
)
amenity_type: AmenityType = Field(description="Type of amenity")
amenity_name: str = Field(max_length=200, description="Name of the amenity")
address: str | None = Field(
default=None, max_length=300, description="Street address"
)
latitude: Decimal | None = Field(
default=None, ge=-90, le=90, description="Latitude (WGS84)"
)
longitude: Decimal | None = Field(
default=None, ge=-180, le=180, description="Longitude (WGS84)"
)
model_config = {"str_strip_whitespace": True}
class AmenityCount(BaseModel):
"""Aggregated amenity count for a neighbourhood.
Used for dashboard metrics showing amenity density per neighbourhood.
"""
neighbourhood_id: int = Field(ge=1, le=200, description="Neighbourhood ID")
amenity_type: AmenityType = Field(description="Type of amenity")
count: int = Field(ge=0, description="Number of amenities of this type")
year: int = Field(ge=2020, le=2030, description="Year of data snapshot")
model_config = {"str_strip_whitespace": True}
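# A small validation sketch for the schemas above; all values are illustrative.
if __name__ == "__main__":
    park = AmenityRecord(
        neighbourhood_id=42,
        amenity_type=AmenityType.PARK,
        amenity_name="  Example Park  ",  # str_strip_whitespace trims this
        latitude=Decimal("43.6532"),
        longitude=Decimal("-79.3832"),
    )
    assert park.amenity_name == "Example Park"
    summary = AmenityCount(
        neighbourhood_id=42, amenity_type=AmenityType.PARK, count=7, year=2024
    )
    print(park.model_dump(), summary.model_dump())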

View File

@@ -0,0 +1,106 @@
"""Pydantic schemas for Toronto neighbourhood data.
Includes schemas for neighbourhood boundaries, census profiles, and crime statistics.
"""
from decimal import Decimal
from enum import Enum
from typing import Any
from pydantic import BaseModel, Field
class CrimeType(str, Enum):
"""Major crime indicator types from Toronto Police data."""
ASSAULT = "assault"
AUTO_THEFT = "auto_theft"
BREAK_AND_ENTER = "break_and_enter"
HOMICIDE = "homicide"
ROBBERY = "robbery"
SHOOTING = "shooting"
THEFT_OVER = "theft_over"
THEFT_FROM_MOTOR_VEHICLE = "theft_from_motor_vehicle"
OTHER = "other"
class NeighbourhoodRecord(BaseModel):
"""Schema for Toronto neighbourhood boundary data.
Based on City of Toronto's 158 neighbourhoods dataset.
AREA_ID maps to neighbourhood_id for consistency with police data (Hood_ID).
"""
area_id: int = Field(description="AREA_ID from Toronto Open Data (1-158)")
area_name: str = Field(max_length=100, description="Official neighbourhood name")
area_short_code: str | None = Field(
default=None, max_length=10, description="Short code (e.g., 'E01')"
)
geometry: dict[str, Any] | None = Field(
default=None, description="GeoJSON geometry object"
)
model_config = {"str_strip_whitespace": True}
class CensusRecord(BaseModel):
"""Census profile data for a neighbourhood.
Contains demographic and socioeconomic indicators from Statistics Canada
census data, aggregated to the neighbourhood level.
"""
neighbourhood_id: int = Field(
ge=1, le=200, description="Neighbourhood ID (AREA_ID)"
)
census_year: int = Field(ge=2016, le=2030, description="Census year")
population: int | None = Field(default=None, ge=0, description="Total population")
population_density: Decimal | None = Field(
default=None, ge=0, description="Population per square kilometre"
)
median_household_income: Decimal | None = Field(
default=None, ge=0, description="Median household income (CAD)"
)
average_household_income: Decimal | None = Field(
default=None, ge=0, description="Average household income (CAD)"
)
unemployment_rate: Decimal | None = Field(
default=None, ge=0, le=100, description="Unemployment rate percentage"
)
pct_bachelors_or_higher: Decimal | None = Field(
default=None, ge=0, le=100, description="Percentage with bachelor's degree+"
)
pct_owner_occupied: Decimal | None = Field(
default=None, ge=0, le=100, description="Percentage owner-occupied dwellings"
)
pct_renter_occupied: Decimal | None = Field(
default=None, ge=0, le=100, description="Percentage renter-occupied dwellings"
)
median_age: Decimal | None = Field(
default=None, ge=0, le=120, description="Median age of residents"
)
average_dwelling_value: Decimal | None = Field(
default=None, ge=0, description="Average dwelling value (CAD)"
)
model_config = {"str_strip_whitespace": True}
class CrimeRecord(BaseModel):
"""Crime statistics for a neighbourhood.
Based on Toronto Police neighbourhood crime rates data.
Hood_ID in source data maps to neighbourhood_id (AREA_ID).
"""
neighbourhood_id: int = Field(
ge=1, le=200, description="Neighbourhood ID (Hood_ID -> AREA_ID)"
)
year: int = Field(ge=2014, le=2030, description="Year of crime statistics")
crime_type: CrimeType = Field(description="Type of crime (MCI category)")
count: int = Field(ge=0, description="Number of incidents")
rate_per_100k: Decimal | None = Field(
default=None, ge=0, description="Rate per 100,000 population"
)
model_config = {"str_strip_whitespace": True}
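# A brief validation sketch for the schemas above; all values are illustrative.
if __name__ == "__main__":
    from pydantic import ValidationError

    hood = NeighbourhoodRecord(
        area_id=1, area_name="Example Neighbourhood", area_short_code="E01"
    )
    incident = CrimeRecord(
        neighbourhood_id=hood.area_id,
        year=2023,
        crime_type=CrimeType.ASSAULT,
        count=12,
        rate_per_100k=Decimal("85.4"),
    )
    try:
        CrimeRecord(neighbourhood_id=999, year=2023, crime_type=CrimeType.ASSAULT, count=1)
    except ValidationError:
        pass  # neighbourhood_id must be between 1 and 200
    print(hood.area_name, incident.crime_type.value)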