3 Commits

Author SHA1 Message Date
3054441630 docs: Add local lessons learned backup system
- Create docs/project-lessons-learned/ for local lesson storage
- Add INDEX.md with lesson template and index table
- Document Phase 4 dbt test syntax deprecation lesson
- Update CLAUDE.md with backup method when Wiki.js unavailable

This provides a fallback for capturing lessons learned while
Wiki.js integration is being configured.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 11:52:06 -05:00
b6d210ec6b feat: Implement Phase 4 dbt model restructuring
Create neighbourhood-centric dbt transformation layer:

Staging (5 models):
- stg_toronto__neighbourhoods - Neighbourhood dimension
- stg_toronto__census - Census demographics
- stg_toronto__crime - Crime statistics
- stg_toronto__amenities - Amenity counts
- stg_cmhc__zone_crosswalk - Zone-to-neighbourhood weights

Intermediate (5 models):
- int_neighbourhood__demographics - Combined census with quintiles
- int_neighbourhood__housing - Housing + affordability indicators
- int_neighbourhood__crime_summary - Aggregated crime with YoY
- int_neighbourhood__amenity_scores - Per-capita amenity metrics
- int_rentals__neighbourhood_allocated - CMHC via area weights

Marts (5 models):
- mart_neighbourhood_overview - Composite livability score
- mart_neighbourhood_housing - Affordability index
- mart_neighbourhood_safety - Crime rates per 100K
- mart_neighbourhood_demographics - Income/age indices
- mart_neighbourhood_amenities - Amenity index

Closes #60, #61, #62, #63

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 11:41:27 -05:00
053acf6436 feat: Implement Phase 3 neighbourhood data model
Add schemas, parsers, loaders, and models for Toronto neighbourhood-centric
data including census profiles, crime statistics, and amenities.

Schemas:
- NeighbourhoodRecord, CensusRecord, CrimeRecord, CrimeType
- AmenityType, AmenityRecord, AmenityCount

Models:
- BridgeCMHCNeighbourhood (zone-to-neighbourhood mapping with weights)
- FactCensus, FactCrime, FactAmenities

Parsers:
- TorontoOpenDataParser (CKAN API for neighbourhoods, census, amenities)
- TorontoPoliceParser (crime rates, MCI data)

Loaders:
- load_census_data, load_crime_data, load_amenities
- build_cmhc_neighbourhood_crosswalk (PostGIS area weights)

Also updates CLAUDE.md with projman plugin workflow documentation.

Closes #53, #54, #55, #56, #57, #58, #59

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 11:07:13 -05:00
36 changed files with 2817 additions and 2 deletions

View File

@@ -261,4 +261,71 @@ All scripts in `scripts/`:
---
## Projman Plugin Workflow
**CRITICAL: Always use the projman plugin for sprint and task management.**
### When to Use Projman Skills
| Skill | Trigger | Purpose |
|-------|---------|---------|
| `/projman:sprint-plan` | New sprint or phase implementation | Architecture analysis + Gitea issue creation |
| `/projman:sprint-start` | Beginning implementation work | Load lessons learned (Wiki.js or local), start execution |
| `/projman:sprint-status` | Check progress | Review blockers and completion status |
| `/projman:sprint-close` | Sprint completion | Capture lessons learned (Wiki.js or local backup) |
### Default Behavior
When user requests implementation work:
1. **ALWAYS start with `/projman:sprint-plan`** before writing code
2. Create Gitea issues with proper labels and acceptance criteria
3. Use `/projman:sprint-start` to begin execution with lessons learned
4. Track progress via Gitea issue comments
5. Close sprint with `/projman:sprint-close` to document lessons
### Gitea Repository
- **Repo**: `lmiranda/personal-portfolio`
- **Host**: `gitea.hotserv.cloud`
- **Note**: `lmiranda` is a user account (not org), so label lookup may require repo-level labels
### MCP Tools Available
**Gitea**:
- `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment`
- `get_labels`, `suggest_labels`
**Wiki.js**:
- `search_lessons`, `create_lesson`, `search_pages`, `get_page`
### Lessons Learned (Backup Method)
**When Wiki.js is unavailable**, use the local backup in `docs/project-lessons-learned/`:
**At Sprint Start:**
1. Review `docs/project-lessons-learned/INDEX.md` for relevant past lessons
2. Search lesson files by tags/keywords before implementation
3. Apply prevention strategies from applicable lessons
**At Sprint Close:**
1. Try Wiki.js `create_lesson` first
2. If Wiki.js fails, create lesson in `docs/project-lessons-learned/`
3. Use naming convention: `{phase-or-sprint}-{short-description}.md`
4. Update `INDEX.md` with new entry
5. Follow the lesson template in INDEX.md
**Migration:** Once Wiki.js is configured, lessons will be migrated there for better searchability.
### Issue Structure
Every Gitea issue should include:
- **Overview**: Brief description
- **Files to Create/Modify**: Explicit paths
- **Acceptance Criteria**: Checkboxes
- **Technical Notes**: Implementation hints
- **Labels**: Listed in body (workaround for label API issues)
---
*Last Updated: Sprint 9*

View File

@@ -11,3 +11,77 @@ models:
- name: zone_code
tests:
- not_null
- name: int_neighbourhood__demographics
description: "Combined census demographics with neighbourhood attributes"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: census_year
description: "Census year"
tests:
- not_null
- name: income_quintile
description: "Income quintile (1-5, city-wide)"
- name: int_neighbourhood__housing
description: "Housing indicators combining census and rental data"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Reference year"
- name: rent_to_income_pct
description: "Rent as percentage of median income"
- name: is_affordable
description: "Boolean: rent <= 30% of income"
- name: int_neighbourhood__crime_summary
description: "Aggregated crime with year-over-year trends"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Statistics year"
tests:
- not_null
- name: crime_rate_per_100k
description: "Total crime rate per 100K population"
- name: yoy_change_pct
description: "Year-over-year change percentage"
- name: int_neighbourhood__amenity_scores
description: "Normalized amenities per capita and per area"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Reference year"
- name: total_amenities_per_1000
description: "Total amenities per 1000 population"
- name: amenities_per_sqkm
description: "Total amenities per square km"
- name: int_rentals__neighbourhood_allocated
description: "CMHC rental data allocated to neighbourhoods via area weights"
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: year
description: "Survey year"
tests:
- not_null
- name: avg_rent_2bed
description: "Weighted average 2-bedroom rent"
- name: vacancy_rate
description: "Weighted average vacancy rate"

View File

@@ -0,0 +1,79 @@
-- Intermediate: Normalized amenities per 1000 population
-- Pivots amenity types and calculates per-capita metrics
-- Grain: One row per neighbourhood per year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
amenities as (
select * from {{ ref('stg_toronto__amenities') }}
),
-- Aggregate amenity types
amenities_by_year as (
select
neighbourhood_id,
amenity_year as year,
sum(case when amenity_type = 'Parks' then amenity_count else 0 end) as parks_count,
sum(case when amenity_type = 'Schools' then amenity_count else 0 end) as schools_count,
sum(case when amenity_type = 'Transit Stops' then amenity_count else 0 end) as transit_count,
sum(case when amenity_type = 'Libraries' then amenity_count else 0 end) as libraries_count,
sum(case when amenity_type = 'Community Centres' then amenity_count else 0 end) as community_centres_count,
sum(case when amenity_type = 'Recreation' then amenity_count else 0 end) as recreation_count,
sum(amenity_count) as total_amenities
from amenities
group by neighbourhood_id, amenity_year
),
amenity_scores as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
n.population,
n.land_area_sqkm,
a.year,
-- Raw counts
a.parks_count,
a.schools_count,
a.transit_count,
a.libraries_count,
a.community_centres_count,
a.recreation_count,
a.total_amenities,
-- Per 1000 population
case when n.population > 0
then round(a.parks_count::numeric / n.population * 1000, 3)
else null
end as parks_per_1000,
case when n.population > 0
then round(a.schools_count::numeric / n.population * 1000, 3)
else null
end as schools_per_1000,
case when n.population > 0
then round(a.transit_count::numeric / n.population * 1000, 3)
else null
end as transit_per_1000,
case when n.population > 0
then round(a.total_amenities::numeric / n.population * 1000, 3)
else null
end as total_amenities_per_1000,
-- Per square km
case when n.land_area_sqkm > 0
then round(a.total_amenities::numeric / n.land_area_sqkm, 2)
else null
end as amenities_per_sqkm
from neighbourhoods n
left join amenities_by_year a on n.neighbourhood_id = a.neighbourhood_id
)
select * from amenity_scores
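
The per-capita metrics above are plain rate normalizations guarded against zero population. As a quick, hypothetical worked example (not part of the model):

```python
def per_1000(count: int, population: int) -> float | None:
    """Amenity count per 1,000 residents; None when population is not positive."""
    if population <= 0:  # mirrors the `case when n.population > 0` guard in the SQL
        return None
    return round(count / population * 1000, 3)

print(per_1000(12, 24_000))  # 12 parks among 24,000 residents -> 0.5 per 1,000
```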

View File

@@ -0,0 +1,81 @@
-- Intermediate: Aggregated crime by neighbourhood with YoY change
-- Pivots crime types and calculates year-over-year trends
-- Grain: One row per neighbourhood per year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
crime as (
select * from {{ ref('stg_toronto__crime') }}
),
-- Aggregate crime types
crime_by_year as (
select
neighbourhood_id,
crime_year as year,
sum(incident_count) as total_incidents,
sum(case when crime_type = 'Assault' then incident_count else 0 end) as assault_count,
sum(case when crime_type = 'Auto Theft' then incident_count else 0 end) as auto_theft_count,
sum(case when crime_type = 'Break and Enter' then incident_count else 0 end) as break_enter_count,
sum(case when crime_type = 'Robbery' then incident_count else 0 end) as robbery_count,
sum(case when crime_type = 'Theft Over' then incident_count else 0 end) as theft_over_count,
sum(case when crime_type = 'Homicide' then incident_count else 0 end) as homicide_count,
avg(rate_per_100k) as avg_rate_per_100k
from crime
group by neighbourhood_id, crime_year
),
-- Add year-over-year changes
with_yoy as (
select
c.*,
lag(c.total_incidents, 1) over (
partition by c.neighbourhood_id
order by c.year
) as prev_year_incidents,
round(
(c.total_incidents - lag(c.total_incidents, 1) over (
partition by c.neighbourhood_id
order by c.year
))::numeric /
nullif(lag(c.total_incidents, 1) over (
partition by c.neighbourhood_id
order by c.year
), 0) * 100,
2
) as yoy_change_pct
from crime_by_year c
),
crime_summary as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
n.population,
w.year,
w.total_incidents,
w.assault_count,
w.auto_theft_count,
w.break_enter_count,
w.robbery_count,
w.theft_over_count,
w.homicide_count,
w.avg_rate_per_100k,
w.yoy_change_pct,
-- Crime rate per 100K population
case
when n.population > 0
then round(w.total_incidents::numeric / n.population * 100000, 2)
else null
end as crime_rate_per_100k
from neighbourhoods n
inner join with_yoy w on n.neighbourhood_id = w.neighbourhood_id
)
select * from crime_summary
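
The `yoy_change_pct` expression is the standard percent-change formula with `nullif` guarding a zero prior year. A minimal Python sketch of the same arithmetic (illustrative only):

```python
def yoy_change_pct(current: int, previous: int | None) -> float | None:
    """Year-over-year percent change; None when there is no usable baseline."""
    if previous is None or previous == 0:  # mirrors nullif(lag(...), 0)
        return None
    return round((current - previous) / previous * 100, 2)

print(yoy_change_pct(120, 100))  # 120 incidents vs. 100 last year -> 20.0
```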

View File

@@ -0,0 +1,44 @@
-- Intermediate: Combined census demographics by neighbourhood
-- Joins neighbourhoods with census data for demographic analysis
-- Grain: One row per neighbourhood per census year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
census as (
select * from {{ ref('stg_toronto__census') }}
),
demographics as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
n.land_area_sqkm,
c.census_year,
c.population,
c.population_density,
c.median_household_income,
c.average_household_income,
c.median_age,
c.unemployment_rate,
c.pct_bachelors_or_higher as education_bachelors_pct,
c.average_dwelling_value,
-- Tenure mix
c.pct_owner_occupied,
c.pct_renter_occupied,
-- Income quintile (city-wide comparison)
ntile(5) over (
partition by c.census_year
order by c.median_household_income
) as income_quintile
from neighbourhoods n
left join census c on n.neighbourhood_id = c.neighbourhood_id
)
select * from demographics

View File

@@ -0,0 +1,56 @@
-- Intermediate: Housing indicators by neighbourhood
-- Combines census housing data with allocated CMHC rental data
-- Grain: One row per neighbourhood per year
with neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
census as (
select * from {{ ref('stg_toronto__census') }}
),
allocated_rentals as (
select * from {{ ref('int_rentals__neighbourhood_allocated') }}
),
housing as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
coalesce(r.year, c.census_year) as year,
-- Census housing metrics
c.pct_owner_occupied,
c.pct_renter_occupied,
c.average_dwelling_value,
c.median_household_income,
-- Allocated rental metrics (weighted average from CMHC zones)
r.avg_rent_2bed,
r.vacancy_rate,
-- Affordability calculations
case
when c.median_household_income > 0 and r.avg_rent_2bed > 0
then round((r.avg_rent_2bed * 12 / c.median_household_income) * 100, 2)
else null
end as rent_to_income_pct,
-- Affordability threshold (30% of income)
case
when c.median_household_income > 0 and r.avg_rent_2bed > 0
then r.avg_rent_2bed * 12 <= c.median_household_income * 0.30
else null
end as is_affordable
from neighbourhoods n
left join census c on n.neighbourhood_id = c.neighbourhood_id
left join allocated_rentals r
on n.neighbourhood_id = r.neighbourhood_id
and r.year = c.census_year
)
select * from housing
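
The `is_affordable` flag encodes the common 30%-of-income rule: annualized 2-bedroom rent compared against 30% of median household income. A small sketch of the same check with hypothetical figures:

```python
def is_affordable(avg_rent_2bed: float, median_household_income: float) -> bool | None:
    """True when annual rent is at most 30% of median household income."""
    if median_household_income <= 0 or avg_rent_2bed <= 0:
        return None
    return avg_rent_2bed * 12 <= median_household_income * 0.30

# $2,000/month rent against an $80,000 income: $24,000 <= $24,000 -> True
print(is_affordable(2000.0, 80000.0))
```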

View File

@@ -0,0 +1,73 @@
-- Intermediate: CMHC rentals allocated to neighbourhoods via area weights
-- Disaggregates zone-level rental data to neighbourhood level
-- Grain: One row per neighbourhood per year
with crosswalk as (
select * from {{ ref('stg_cmhc__zone_crosswalk') }}
),
rentals as (
select * from {{ ref('int_rentals__annual') }}
),
neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
-- Allocate rental metrics to neighbourhoods using area weights
allocated as (
select
c.neighbourhood_id,
r.year,
r.bedroom_type,
-- Weighted average rent (using area weight)
sum(r.avg_rent * c.area_weight) as weighted_avg_rent,
sum(r.median_rent * c.area_weight) as weighted_median_rent,
sum(c.area_weight) as total_weight,
-- Weighted vacancy rate
sum(r.vacancy_rate * c.area_weight) / nullif(sum(c.area_weight), 0) as vacancy_rate,
-- Weighted rental universe
sum(r.rental_universe * c.area_weight) as rental_units_estimate
from crosswalk c
inner join rentals r on c.cmhc_zone_code = r.zone_code
group by c.neighbourhood_id, r.year, r.bedroom_type
),
-- Pivot to get 2-bedroom as primary metric
pivoted as (
select
neighbourhood_id,
year,
max(case when bedroom_type = 'Two Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_2bed,
max(case when bedroom_type = 'One Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_1bed,
max(case when bedroom_type = 'Bachelor' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_bachelor,
max(case when bedroom_type = 'Three Bedroom +' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_3bed,
avg(vacancy_rate) as vacancy_rate,
sum(rental_units_estimate) as total_rental_units
from allocated
group by neighbourhood_id, year
),
final as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
p.year,
round(p.avg_rent_bachelor::numeric, 2) as avg_rent_bachelor,
round(p.avg_rent_1bed::numeric, 2) as avg_rent_1bed,
round(p.avg_rent_2bed::numeric, 2) as avg_rent_2bed,
round(p.avg_rent_3bed::numeric, 2) as avg_rent_3bed,
round(p.vacancy_rate::numeric, 2) as vacancy_rate,
round(p.total_rental_units::numeric, 0) as total_rental_units
from neighbourhoods n
inner join pivoted p on n.neighbourhood_id = p.neighbourhood_id
)
select * from final
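
Because a neighbourhood can intersect several CMHC zones, the allocation accumulates `rent * weight` and divides by the summed weights when pivoting. Roughly, with made-up numbers:

```python
# (zone average 2-bed rent, area weight of that zone inside the neighbourhood)
zone_rents = [(2000.0, 0.6), (1800.0, 0.3), (2200.0, 0.1)]

weighted_sum = sum(rent * w for rent, w in zone_rents)  # sum(r.avg_rent * c.area_weight)
total_weight = sum(w for _, w in zone_rents)            # sum(c.area_weight)

avg_rent_2bed = weighted_sum / total_weight if total_weight else None
print(round(avg_rent_2bed, 2))  # 1960.0
```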

View File

@@ -9,3 +9,127 @@ models:
tests:
- unique
- not_null
- name: mart_neighbourhood_overview
description: "Neighbourhood overview with composite livability score"
meta:
dashboard_tab: Overview
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: livability_score
description: "Composite score: safety (30%), affordability (40%), amenities (30%)"
- name: safety_score
description: "Safety component score (0-100)"
- name: affordability_score
description: "Affordability component score (0-100)"
- name: amenity_score
description: "Amenity component score (0-100)"
- name: mart_neighbourhood_housing
description: "Housing and affordability metrics by neighbourhood"
meta:
dashboard_tab: Housing
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: rent_to_income_pct
description: "Rent as percentage of median income"
- name: affordability_index
description: "100 = city average affordability"
- name: rent_yoy_change_pct
description: "Year-over-year rent change"
- name: mart_neighbourhood_safety
description: "Crime rates and safety metrics by neighbourhood"
meta:
dashboard_tab: Safety
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: crime_rate_per_100k
description: "Total crime rate per 100K population"
- name: crime_index
description: "100 = city average crime rate"
- name: safety_tier
description: "Safety tier (1=safest, 5=highest crime)"
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
- name: mart_neighbourhood_demographics
description: "Demographics and income metrics by neighbourhood"
meta:
dashboard_tab: Demographics
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: median_household_income
description: "Median household income"
- name: income_index
description: "100 = city average income"
- name: income_quintile
description: "Income quintile (1-5)"
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
- name: mart_neighbourhood_amenities
description: "Amenity access metrics by neighbourhood"
meta:
dashboard_tab: Amenities
columns:
- name: neighbourhood_id
description: "Neighbourhood identifier"
tests:
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry for mapping"
- name: total_amenities_per_1000
description: "Total amenities per 1000 population"
- name: amenity_index
description: "100 = city average amenities"
- name: amenity_tier
description: "Amenity tier (1=best, 5=lowest)"
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]

View File

@@ -0,0 +1,89 @@
-- Mart: Neighbourhood Amenities Analysis
-- Dashboard Tab: Amenities
-- Grain: One row per neighbourhood per year
with amenities as (
select * from {{ ref('int_neighbourhood__amenity_scores') }}
),
-- City-wide averages for comparison
city_avg as (
select
year,
avg(parks_per_1000) as city_avg_parks,
avg(schools_per_1000) as city_avg_schools,
avg(transit_per_1000) as city_avg_transit,
avg(total_amenities_per_1000) as city_avg_total_amenities
from amenities
group by year
),
final as (
select
a.neighbourhood_id,
a.neighbourhood_name,
a.geometry,
a.population,
a.land_area_sqkm,
a.year,
-- Raw counts
a.parks_count,
a.schools_count,
a.transit_count,
a.libraries_count,
a.community_centres_count,
a.recreation_count,
a.total_amenities,
-- Per 1000 population
a.parks_per_1000,
a.schools_per_1000,
a.transit_per_1000,
a.total_amenities_per_1000,
-- Per square km
a.amenities_per_sqkm,
-- City averages
round(ca.city_avg_parks::numeric, 3) as city_avg_parks_per_1000,
round(ca.city_avg_schools::numeric, 3) as city_avg_schools_per_1000,
round(ca.city_avg_transit::numeric, 3) as city_avg_transit_per_1000,
-- Amenity index (100 = city average)
case
when ca.city_avg_total_amenities > 0
then round(a.total_amenities_per_1000 / ca.city_avg_total_amenities * 100, 1)
else null
end as amenity_index,
-- Category indices
case
when ca.city_avg_parks > 0
then round(a.parks_per_1000 / ca.city_avg_parks * 100, 1)
else null
end as parks_index,
case
when ca.city_avg_schools > 0
then round(a.schools_per_1000 / ca.city_avg_schools * 100, 1)
else null
end as schools_index,
case
when ca.city_avg_transit > 0
then round(a.transit_per_1000 / ca.city_avg_transit * 100, 1)
else null
end as transit_index,
-- Amenity tier (1 = best, 5 = lowest)
ntile(5) over (
partition by a.year
order by a.total_amenities_per_1000 desc
) as amenity_tier
from amenities a
left join city_avg ca on a.year = ca.year
)
select * from final
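
Each `*_index` column expresses a neighbourhood relative to the city-wide average for that year, with 100 meaning exactly average. The arithmetic, sketched with hypothetical values:

```python
def index_vs_city(value: float, city_average: float) -> float | None:
    """Ratio to the city average, scaled so that 100 = average."""
    if city_average <= 0:
        return None
    return round(value / city_average * 100, 1)

# 9.0 amenities per 1,000 residents against a city average of 7.5 -> 120.0
print(index_vs_city(9.0, 7.5))
```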

View File

@@ -0,0 +1,81 @@
-- Mart: Neighbourhood Demographics Analysis
-- Dashboard Tab: Demographics
-- Grain: One row per neighbourhood per census year
with demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
-- City-wide averages for comparison
city_avg as (
select
census_year,
avg(median_household_income) as city_avg_income,
avg(median_age) as city_avg_age,
avg(unemployment_rate) as city_avg_unemployment,
avg(education_bachelors_pct) as city_avg_education,
avg(population_density) as city_avg_density
from demographics
group by census_year
),
final as (
select
d.neighbourhood_id,
d.neighbourhood_name,
d.geometry,
d.census_year as year,
-- Population
d.population,
d.land_area_sqkm,
d.population_density,
-- Income
d.median_household_income,
d.average_household_income,
d.income_quintile,
-- Income index (100 = city average)
case
when ca.city_avg_income > 0
then round(d.median_household_income / ca.city_avg_income * 100, 1)
else null
end as income_index,
-- Demographics
d.median_age,
d.unemployment_rate,
d.education_bachelors_pct,
-- Age index (100 = city average)
case
when ca.city_avg_age > 0
then round(d.median_age / ca.city_avg_age * 100, 1)
else null
end as age_index,
-- Housing tenure
d.pct_owner_occupied,
d.pct_renter_occupied,
d.average_dwelling_value,
-- Diversity index (using tenure mix as proxy - higher rental = more diverse typically)
round(
1 - (
power(d.pct_owner_occupied / 100, 2) +
power(d.pct_renter_occupied / 100, 2)
),
3
) * 100 as tenure_diversity_index,
-- City comparisons
round(ca.city_avg_income::numeric, 2) as city_avg_income,
round(ca.city_avg_age::numeric, 1) as city_avg_age,
round(ca.city_avg_unemployment::numeric, 2) as city_avg_unemployment
from demographics d
left join city_avg ca on d.census_year = ca.census_year
)
select * from final
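
The tenure diversity index is a Herfindahl-style measure: one minus the sum of squared tenure shares, scaled to 0-100, so a 50/50 owner/renter split scores highest. A quick check of the arithmetic (values are illustrative):

```python
def tenure_diversity_index(pct_owner: float, pct_renter: float) -> float:
    """1 minus the sum of squared tenure shares, scaled to 0-100 (higher = more mixed)."""
    owner, renter = pct_owner / 100, pct_renter / 100
    return round(1 - (owner ** 2 + renter ** 2), 3) * 100

print(tenure_diversity_index(60.0, 40.0))  # 1 - (0.36 + 0.16) = 0.48 -> 48.0
```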

View File

@@ -0,0 +1,93 @@
-- Mart: Neighbourhood Housing Analysis
-- Dashboard Tab: Housing
-- Grain: One row per neighbourhood per year
with housing as (
select * from {{ ref('int_neighbourhood__housing') }}
),
rentals as (
select * from {{ ref('int_rentals__neighbourhood_allocated') }}
),
demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
-- Add year-over-year rent changes
with_yoy as (
select
h.*,
r.avg_rent_bachelor,
r.avg_rent_1bed,
r.avg_rent_3bed,
r.total_rental_units,
d.income_quintile,
-- Previous year rent for YoY calculation
lag(h.avg_rent_2bed, 1) over (
partition by h.neighbourhood_id
order by h.year
) as prev_year_rent_2bed
from housing h
left join rentals r
on h.neighbourhood_id = r.neighbourhood_id
and h.year = r.year
left join demographics d
on h.neighbourhood_id = d.neighbourhood_id
and h.year = d.census_year
),
final as (
select
neighbourhood_id,
neighbourhood_name,
geometry,
year,
-- Tenure mix
pct_owner_occupied,
pct_renter_occupied,
-- Housing values
average_dwelling_value,
median_household_income,
-- Rental metrics
avg_rent_bachelor,
avg_rent_1bed,
avg_rent_2bed,
avg_rent_3bed,
vacancy_rate,
total_rental_units,
-- Affordability
rent_to_income_pct,
is_affordable,
-- Affordability index (100 = city average)
round(
rent_to_income_pct / nullif(
avg(rent_to_income_pct) over (partition by year),
0
) * 100,
1
) as affordability_index,
-- Year-over-year rent change
case
when prev_year_rent_2bed > 0
then round(
(avg_rent_2bed - prev_year_rent_2bed) / prev_year_rent_2bed * 100,
2
)
else null
end as rent_yoy_change_pct,
income_quintile
from with_yoy
)
select * from final

View File

@@ -0,0 +1,110 @@
-- Mart: Neighbourhood Overview with Composite Livability Score
-- Dashboard Tab: Overview
-- Grain: One row per neighbourhood per year
with demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
housing as (
select * from {{ ref('int_neighbourhood__housing') }}
),
crime as (
select * from {{ ref('int_neighbourhood__crime_summary') }}
),
amenities as (
select * from {{ ref('int_neighbourhood__amenity_scores') }}
),
-- Compute percentile ranks for scoring components
percentiles as (
select
d.neighbourhood_id,
d.neighbourhood_name,
d.geometry,
d.census_year as year,
d.population,
d.median_household_income,
-- Safety score: inverse of crime rate (higher = safer)
case
when c.crime_rate_per_100k is not null
then 100 - percent_rank() over (
partition by d.census_year
order by c.crime_rate_per_100k
) * 100
else null
end as safety_score,
-- Affordability score: inverse of rent-to-income ratio
case
when h.rent_to_income_pct is not null
then 100 - percent_rank() over (
partition by d.census_year
order by h.rent_to_income_pct
) * 100
else null
end as affordability_score,
-- Amenity score: based on amenities per capita
case
when a.total_amenities_per_1000 is not null
then percent_rank() over (
partition by d.census_year
order by a.total_amenities_per_1000
) * 100
else null
end as amenity_score,
-- Raw metrics for reference
c.crime_rate_per_100k,
h.rent_to_income_pct,
h.avg_rent_2bed,
a.total_amenities_per_1000
from demographics d
left join housing h
on d.neighbourhood_id = h.neighbourhood_id
and d.census_year = h.year
left join crime c
on d.neighbourhood_id = c.neighbourhood_id
and d.census_year = c.year
left join amenities a
on d.neighbourhood_id = a.neighbourhood_id
and d.census_year = a.year
),
final as (
select
neighbourhood_id,
neighbourhood_name,
geometry,
year,
population,
median_household_income,
-- Component scores (0-100)
round(safety_score::numeric, 1) as safety_score,
round(affordability_score::numeric, 1) as affordability_score,
round(amenity_score::numeric, 1) as amenity_score,
-- Composite livability score: safety (30%), affordability (40%), amenities (30%)
round(
(coalesce(safety_score, 50) * 0.30 +
coalesce(affordability_score, 50) * 0.40 +
coalesce(amenity_score, 50) * 0.30)::numeric,
1
) as livability_score,
-- Raw metrics
crime_rate_per_100k,
rent_to_income_pct,
avg_rent_2bed,
total_amenities_per_1000
from percentiles
)
select * from final
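
The composite score is a weighted blend of the three component scores, with 50 substituted for any missing component so a data gap does not drag the blend to zero. In Python terms (sketch only):

```python
def livability_score(
    safety: float | None, affordability: float | None, amenity: float | None
) -> float:
    """Weighted blend: safety 30%, affordability 40%, amenities 30%; missing parts default to 50."""
    s = 50.0 if safety is None else safety
    f = 50.0 if affordability is None else affordability
    a = 50.0 if amenity is None else amenity
    return round(s * 0.30 + f * 0.40 + a * 0.30, 1)

print(livability_score(80.0, 60.0, None))  # 24 + 24 + 15 = 63.0
```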

View File

@@ -0,0 +1,78 @@
-- Mart: Neighbourhood Safety Analysis
-- Dashboard Tab: Safety
-- Grain: One row per neighbourhood per year
with crime as (
select * from {{ ref('int_neighbourhood__crime_summary') }}
),
-- City-wide averages for comparison
city_avg as (
select
year,
avg(crime_rate_per_100k) as city_avg_crime_rate,
avg(assault_count) as city_avg_assault,
avg(auto_theft_count) as city_avg_auto_theft,
avg(break_enter_count) as city_avg_break_enter
from crime
group by year
),
final as (
select
c.neighbourhood_id,
c.neighbourhood_name,
c.geometry,
c.population,
c.year,
-- Total crime
c.total_incidents,
c.crime_rate_per_100k,
c.yoy_change_pct as crime_yoy_change_pct,
-- Crime breakdown
c.assault_count,
c.auto_theft_count,
c.break_enter_count,
c.robbery_count,
c.theft_over_count,
c.homicide_count,
-- Per 100K rates by type
case when c.population > 0
then round(c.assault_count::numeric / c.population * 100000, 2)
else null
end as assault_rate_per_100k,
case when c.population > 0
then round(c.auto_theft_count::numeric / c.population * 100000, 2)
else null
end as auto_theft_rate_per_100k,
case when c.population > 0
then round(c.break_enter_count::numeric / c.population * 100000, 2)
else null
end as break_enter_rate_per_100k,
-- Comparison to city average
round(ca.city_avg_crime_rate::numeric, 2) as city_avg_crime_rate,
-- Crime index (100 = city average)
case
when ca.city_avg_crime_rate > 0
then round(c.crime_rate_per_100k / ca.city_avg_crime_rate * 100, 1)
else null
end as crime_index,
-- Safety tier based on crime rate percentile (1 = lowest crime, i.e. safest)
ntile(5) over (
partition by c.year
order by c.crime_rate_per_100k asc
) as safety_tier
from crime c
left join city_avg ca on c.year = ca.year
)
select * from final

View File

@@ -41,3 +41,59 @@ sources:
columns:
- name: event_id
description: "Primary key"
- name: fact_census
description: "Census demographics by neighbourhood and year"
columns:
- name: id
description: "Primary key"
- name: neighbourhood_id
description: "Foreign key to dim_neighbourhood"
- name: census_year
description: "Census year (2016, 2021, etc.)"
- name: population
description: "Total population"
- name: median_household_income
description: "Median household income"
- name: fact_crime
description: "Crime statistics by neighbourhood, year, and type"
columns:
- name: id
description: "Primary key"
- name: neighbourhood_id
description: "Foreign key to dim_neighbourhood"
- name: year
description: "Statistics year"
- name: crime_type
description: "Type of crime"
- name: count
description: "Number of incidents"
- name: rate_per_100k
description: "Rate per 100,000 population"
- name: fact_amenities
description: "Amenity counts by neighbourhood and type"
columns:
- name: id
description: "Primary key"
- name: neighbourhood_id
description: "Foreign key to dim_neighbourhood"
- name: amenity_type
description: "Type of amenity (parks, schools, transit)"
- name: count
description: "Number of amenities"
- name: year
description: "Reference year"
- name: bridge_cmhc_neighbourhood
description: "CMHC zone to neighbourhood mapping with area weights"
columns:
- name: id
description: "Primary key"
- name: cmhc_zone_code
description: "CMHC zone code"
- name: neighbourhood_id
description: "Neighbourhood ID"
- name: weight
description: "Proportional area weight (0-1)"

View File

@@ -40,3 +40,90 @@ models:
tests:
- unique
- not_null
- name: stg_toronto__neighbourhoods
description: "Staged Toronto neighbourhood dimension (158 official boundaries)"
columns:
- name: neighbourhood_id
description: "Neighbourhood primary key"
tests:
- unique
- not_null
- name: neighbourhood_name
description: "Official neighbourhood name"
tests:
- not_null
- name: geometry
description: "PostGIS geometry (POLYGON)"
- name: stg_toronto__census
description: "Staged census demographics by neighbourhood"
columns:
- name: census_id
description: "Census record identifier"
tests:
- unique
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: census_year
description: "Census year (2016, 2021)"
tests:
- not_null
- name: stg_toronto__crime
description: "Staged crime statistics by neighbourhood"
columns:
- name: crime_id
description: "Crime record identifier"
tests:
- unique
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: crime_type
description: "Type of crime"
tests:
- not_null
- name: stg_toronto__amenities
description: "Staged amenity counts by neighbourhood"
columns:
- name: amenity_id
description: "Amenity record identifier"
tests:
- unique
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: amenity_type
description: "Type of amenity"
tests:
- not_null
- name: stg_cmhc__zone_crosswalk
description: "Staged CMHC zone to neighbourhood crosswalk with area weights"
columns:
- name: crosswalk_id
description: "Crosswalk record identifier"
tests:
- unique
- not_null
- name: cmhc_zone_code
description: "CMHC zone code"
tests:
- not_null
- name: neighbourhood_id
description: "Neighbourhood foreign key"
tests:
- not_null
- name: area_weight
description: "Proportional area weight (0-1)"
tests:
- not_null

View File

@@ -0,0 +1,18 @@
-- Staged CMHC zone to neighbourhood crosswalk
-- Source: bridge_cmhc_neighbourhood table
-- Grain: One row per zone-neighbourhood intersection
with source as (
select * from {{ source('toronto_housing', 'bridge_cmhc_neighbourhood') }}
),
staged as (
select
id as crosswalk_id,
cmhc_zone_code,
neighbourhood_id,
weight as area_weight
from source
)
select * from staged

View File

@@ -0,0 +1,19 @@
-- Staged amenity counts by neighbourhood
-- Source: fact_amenities table
-- Grain: One row per neighbourhood per amenity type per year
with source as (
select * from {{ source('toronto_housing', 'fact_amenities') }}
),
staged as (
select
id as amenity_id,
neighbourhood_id,
amenity_type,
count as amenity_count,
year as amenity_year
from source
)
select * from staged

View File

@@ -0,0 +1,27 @@
-- Staged census demographics by neighbourhood
-- Source: fact_census table
-- Grain: One row per neighbourhood per census year
with source as (
select * from {{ source('toronto_housing', 'fact_census') }}
),
staged as (
select
id as census_id,
neighbourhood_id,
census_year,
population,
population_density,
median_household_income,
average_household_income,
unemployment_rate,
pct_bachelors_or_higher,
pct_owner_occupied,
pct_renter_occupied,
median_age,
average_dwelling_value
from source
)
select * from staged

View File

@@ -0,0 +1,20 @@
-- Staged crime statistics by neighbourhood
-- Source: fact_crime table
-- Grain: One row per neighbourhood per year per crime type
with source as (
select * from {{ source('toronto_housing', 'fact_crime') }}
),
staged as (
select
id as crime_id,
neighbourhood_id,
year as crime_year,
crime_type,
count as incident_count,
rate_per_100k
from source
)
select * from staged

View File

@@ -0,0 +1,25 @@
-- Staged Toronto neighbourhood dimension
-- Source: dim_neighbourhood table
-- Grain: One row per neighbourhood (158 total)
with source as (
select * from {{ source('toronto_housing', 'dim_neighbourhood') }}
),
staged as (
select
neighbourhood_id,
name as neighbourhood_name,
geometry,
population,
land_area_sqkm,
pop_density_per_sqkm,
pct_bachelors_or_higher,
median_household_income,
pct_owner_occupied,
pct_renter_occupied,
census_year
from source
)
select * from staged

dbt/package-lock.yml Normal file
View File

@@ -0,0 +1,11 @@
packages:
- name: dbt_utils
package: dbt-labs/dbt_utils
version: 1.3.3
- name: dbt_expectations
package: calogica/dbt_expectations
version: 0.10.4
- name: dbt_date
package: calogica/dbt_date
version: 0.10.1
sha1_hash: 51a51ab489f7b302c8745ae3c3781271816b01be

View File

@@ -0,0 +1,50 @@
# Project Lessons Learned
This folder contains lessons learned from sprints and development work. These lessons help prevent repeating mistakes and capture valuable insights.
**Note:** This is a temporary local backup while Wiki.js integration is being configured. Once Wiki.js is ready, lessons will be migrated there for better searchability.
---
## Lessons Index
| Date | Sprint/Phase | Title | Tags |
|------|--------------|-------|------|
| 2026-01-16 | Phase 4 | [dbt Test Syntax Deprecation](./phase-4-dbt-test-syntax.md) | dbt, testing, yaml, deprecation |
---
## How to Use
### When Starting a Sprint
1. Review relevant lessons in this folder before implementation
2. Search by tags or keywords to find applicable insights
3. Apply prevention strategies from past lessons
### When Closing a Sprint
1. Document any significant lessons learned
2. Use the template below
3. Add entry to the index table above
---
## Lesson Template
```markdown
# [Sprint/Phase] - [Lesson Title]
## Context
[What were you trying to do?]
## Problem
[What went wrong or what insight emerged?]
## Solution
[How did you solve it?]
## Prevention
[How can this be avoided in future sprints?]
## Tags
[Comma-separated tags for search]
```

View File

@@ -0,0 +1,38 @@
# Phase 4 - dbt Test Syntax Deprecation
## Context
Implementing dbt mart models with `accepted_values` tests for tier columns (safety_tier, income_quintile, amenity_tier) that should only contain values 1-5.
## Problem
dbt 1.9+ introduced a deprecation warning for generic test arguments. The old syntax:
```yaml
tests:
- accepted_values:
values: [1, 2, 3, 4, 5]
```
Produces deprecation warnings:
```
MissingArgumentsPropertyInGenericTestDeprecation: Arguments to generic tests should be nested under the `arguments` property.
```
## Solution
Nest test arguments under the `arguments` property:
```yaml
tests:
- accepted_values:
arguments:
values: [1, 2, 3, 4, 5]
```
This applies to all generic tests with arguments, not just `accepted_values`.
## Prevention
- When writing dbt schema YAML files, always use the `arguments:` nesting for generic tests
- Run `dbt parse --no-partial-parse` to catch all deprecation warnings before they become errors
- Check dbt changelog when upgrading versions for breaking changes to test syntax
## Tags
dbt, testing, yaml, deprecation, syntax, schema

View File

@@ -1,7 +1,15 @@
"""Database loaders for Toronto housing data.""" """Database loaders for Toronto housing data."""
from .amenities import load_amenities, load_amenity_counts
from .base import bulk_insert, get_session, upsert_by_key
from .census import load_census_data
from .cmhc import load_cmhc_record, load_cmhc_rentals
from .cmhc_crosswalk import (
build_cmhc_neighbourhood_crosswalk,
disaggregate_zone_value,
get_neighbourhood_weights_for_zone,
)
from .crime import load_crime_data
from .dimensions import (
generate_date_key,
load_cmhc_zones,
@@ -24,4 +32,13 @@ __all__ = [
# Fact loaders
"load_cmhc_rentals",
"load_cmhc_record",
# Phase 3 loaders
"load_census_data",
"load_crime_data",
"load_amenities",
"load_amenity_counts",
# CMHC crosswalk
"build_cmhc_neighbourhood_crosswalk",
"get_neighbourhood_weights_for_zone",
"disaggregate_zone_value",
]

View File

@@ -0,0 +1,93 @@
"""Loader for amenities data to fact_amenities table."""
from collections import Counter
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import FactAmenities
from portfolio_app.toronto.schemas import AmenityCount, AmenityRecord
from .base import get_session, upsert_by_key
def load_amenities(
records: list[AmenityRecord],
year: int,
session: Session | None = None,
) -> int:
"""Load amenity records to fact_amenities table.
Aggregates individual amenity records into counts by neighbourhood
and amenity type before loading.
Args:
records: List of validated AmenityRecord schemas.
year: Year to associate with the amenity counts.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
# Aggregate records by neighbourhood and amenity type
counts: Counter[tuple[int, str]] = Counter()
for r in records:
key = (r.neighbourhood_id, r.amenity_type.value)
counts[key] += 1
# Convert aggregated counts to FactAmenities models
def _load(sess: Session) -> int:
models = []
for (neighbourhood_id, amenity_type), count in counts.items():
model = FactAmenities(
neighbourhood_id=neighbourhood_id,
amenity_type=amenity_type,
count=count,
year=year,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactAmenities, models, ["neighbourhood_id", "amenity_type", "year"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_amenity_counts(
records: list[AmenityCount],
session: Session | None = None,
) -> int:
"""Load pre-aggregated amenity counts to fact_amenities table.
Args:
records: List of validated AmenityCount schemas.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
def _load(sess: Session) -> int:
models = []
for r in records:
model = FactAmenities(
neighbourhood_id=r.neighbourhood_id,
amenity_type=r.amenity_type.value,
count=r.count,
year=r.year,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactAmenities, models, ["neighbourhood_id", "amenity_type", "year"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
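
For orientation, `load_amenities` reduces raw per-location records to one count per `(neighbourhood_id, amenity_type)` key before a single upsert. A tiny sketch of that aggregation step with made-up IDs and type strings:

```python
from collections import Counter

# Hypothetical records reduced to the two fields the loader keys on.
parsed = [(70, "park"), (70, "park"), (70, "school"), (71, "park")]

counts: Counter[tuple[int, str]] = Counter(parsed)
print(counts)  # Counter({(70, 'park'): 2, (70, 'school'): 1, (71, 'park'): 1})
# load_amenities() then builds one FactAmenities row per key and upserts on
# (neighbourhood_id, amenity_type, year).
```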

View File

@@ -0,0 +1,68 @@
"""Loader for census data to fact_census table."""
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import FactCensus
from portfolio_app.toronto.schemas import CensusRecord
from .base import get_session, upsert_by_key
def load_census_data(
records: list[CensusRecord],
session: Session | None = None,
) -> int:
"""Load census records to fact_census table.
Args:
records: List of validated CensusRecord schemas.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
def _load(sess: Session) -> int:
models = []
for r in records:
model = FactCensus(
neighbourhood_id=r.neighbourhood_id,
census_year=r.census_year,
population=r.population,
population_density=float(r.population_density)
if r.population_density
else None,
median_household_income=float(r.median_household_income)
if r.median_household_income
else None,
average_household_income=float(r.average_household_income)
if r.average_household_income
else None,
unemployment_rate=float(r.unemployment_rate)
if r.unemployment_rate
else None,
pct_bachelors_or_higher=float(r.pct_bachelors_or_higher)
if r.pct_bachelors_or_higher
else None,
pct_owner_occupied=float(r.pct_owner_occupied)
if r.pct_owner_occupied
else None,
pct_renter_occupied=float(r.pct_renter_occupied)
if r.pct_renter_occupied
else None,
median_age=float(r.median_age) if r.median_age else None,
average_dwelling_value=float(r.average_dwelling_value)
if r.average_dwelling_value
else None,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactCensus, models, ["neighbourhood_id", "census_year"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)

View File

@@ -0,0 +1,131 @@
"""Loader for CMHC zone to neighbourhood crosswalk with area weights."""
from sqlalchemy import text
from sqlalchemy.orm import Session
from .base import get_session
def build_cmhc_neighbourhood_crosswalk(
session: Session | None = None,
) -> int:
"""Calculate area overlap weights between CMHC zones and neighbourhoods.
Uses PostGIS ST_Intersection and ST_Area functions to compute the
proportion of each CMHC zone that overlaps with each neighbourhood.
This enables disaggregation of CMHC zone-level data to neighbourhood level.
The function is idempotent - it clears existing crosswalk data before
rebuilding.
Args:
session: Optional existing session.
Returns:
Number of bridge records created.
Note:
Requires both dim_cmhc_zone and dim_neighbourhood tables to have
geometry columns populated with valid PostGIS geometries.
"""
def _build(sess: Session) -> int:
# Clear existing crosswalk data
sess.execute(text("DELETE FROM bridge_cmhc_neighbourhood"))
# Calculate overlap weights using PostGIS
# Weight = area of intersection / total area of CMHC zone
crosswalk_query = text(
"""
INSERT INTO bridge_cmhc_neighbourhood (cmhc_zone_code, neighbourhood_id, weight)
SELECT
z.zone_code,
n.neighbourhood_id,
CASE
WHEN ST_Area(z.geometry::geography) > 0 THEN
ST_Area(ST_Intersection(z.geometry, n.geometry)::geography) /
ST_Area(z.geometry::geography)
ELSE 0
END as weight
FROM dim_cmhc_zone z
JOIN dim_neighbourhood n
ON ST_Intersects(z.geometry, n.geometry)
WHERE
z.geometry IS NOT NULL
AND n.geometry IS NOT NULL
AND ST_Area(ST_Intersection(z.geometry, n.geometry)::geography) > 0
"""
)
sess.execute(crosswalk_query)
# Count records created
count_result = sess.execute(
text("SELECT COUNT(*) FROM bridge_cmhc_neighbourhood")
)
count = count_result.scalar() or 0
return int(count)
if session:
return _build(session)
with get_session() as sess:
return _build(sess)
def get_neighbourhood_weights_for_zone(
zone_code: str,
session: Session | None = None,
) -> list[tuple[int, float]]:
"""Get neighbourhood weights for a specific CMHC zone.
Args:
zone_code: CMHC zone code.
session: Optional existing session.
Returns:
List of (neighbourhood_id, weight) tuples.
"""
def _get(sess: Session) -> list[tuple[int, float]]:
result = sess.execute(
text(
"""
SELECT neighbourhood_id, weight
FROM bridge_cmhc_neighbourhood
WHERE cmhc_zone_code = :zone_code
ORDER BY weight DESC
"""
),
{"zone_code": zone_code},
)
return [(int(row[0]), float(row[1])) for row in result]
if session:
return _get(session)
with get_session() as sess:
return _get(sess)
def disaggregate_zone_value(
zone_code: str,
value: float,
session: Session | None = None,
) -> dict[int, float]:
"""Disaggregate a CMHC zone value to neighbourhoods using weights.
Args:
zone_code: CMHC zone code.
value: Value to disaggregate (e.g., average rent).
session: Optional existing session.
Returns:
Dictionary mapping neighbourhood_id to weighted value.
Note:
For averages (like rent), the weighted value represents the
contribution from this zone. To get a neighbourhood's total,
sum contributions from all overlapping zones.
"""
weights = get_neighbourhood_weights_for_zone(zone_code, session)
return {neighbourhood_id: value * weight for neighbourhood_id, weight in weights}
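
A hedged usage sketch of the helpers above, summing each zone's weighted contribution into a per-neighbourhood value as the docstring suggests (zone codes and rents are hypothetical):

```python
from collections import defaultdict

from portfolio_app.toronto.loaders import (
    build_cmhc_neighbourhood_crosswalk,
    disaggregate_zone_value,
)

# Rebuild the bridge table from PostGIS geometries (idempotent).
links = build_cmhc_neighbourhood_crosswalk()
print(f"created {links} zone-neighbourhood links")

zone_rents = {"001": 1950.0, "002": 2100.0}  # zone-level average rents

totals: dict[int, float] = defaultdict(float)
for zone_code, rent in zone_rents.items():
    # Each call returns this zone's weighted contribution per neighbourhood.
    for neighbourhood_id, contribution in disaggregate_zone_value(zone_code, rent).items():
        totals[neighbourhood_id] += contribution
```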

View File

@@ -0,0 +1,45 @@
"""Loader for crime data to fact_crime table."""
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import FactCrime
from portfolio_app.toronto.schemas import CrimeRecord
from .base import get_session, upsert_by_key
def load_crime_data(
records: list[CrimeRecord],
session: Session | None = None,
) -> int:
"""Load crime records to fact_crime table.
Args:
records: List of validated CrimeRecord schemas.
session: Optional existing session.
Returns:
Number of records loaded (inserted + updated).
"""
def _load(sess: Session) -> int:
models = []
for r in records:
model = FactCrime(
neighbourhood_id=r.neighbourhood_id,
year=r.year,
crime_type=r.crime_type.value,
count=r.count,
rate_per_100k=float(r.rate_per_100k) if r.rate_per_100k else None,
)
models.append(model)
inserted, updated = upsert_by_key(
sess, FactCrime, models, ["neighbourhood_id", "year", "crime_type"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)

View File

@@ -7,7 +7,13 @@ from .dimensions import (
DimPolicyEvent,
DimTime,
)
from .facts import (
BridgeCMHCNeighbourhood,
FactAmenities,
FactCensus,
FactCrime,
FactRentals,
)
__all__ = [
# Base # Base
@@ -22,4 +28,9 @@ __all__ = [
"DimPolicyEvent", "DimPolicyEvent",
# Facts # Facts
"FactRentals", "FactRentals",
"FactCensus",
"FactCrime",
"FactAmenities",
# Bridge tables
"BridgeCMHCNeighbourhood",
]

View File

@@ -1,11 +1,117 @@
"""SQLAlchemy models for fact tables.""" """SQLAlchemy models for fact tables."""
from sqlalchemy import ForeignKey, Integer, Numeric, String from sqlalchemy import ForeignKey, Index, Integer, Numeric, String
from sqlalchemy.orm import Mapped, mapped_column, relationship from sqlalchemy.orm import Mapped, mapped_column, relationship
from .base import Base from .base import Base
class BridgeCMHCNeighbourhood(Base):
"""Bridge table for CMHC zone to neighbourhood mapping with area weights.
Enables disaggregation of CMHC zone-level rental data to neighbourhood level
using area-based proportional weights computed via PostGIS.
"""
__tablename__ = "bridge_cmhc_neighbourhood"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
cmhc_zone_code: Mapped[str] = mapped_column(String(10), nullable=False)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
weight: Mapped[float] = mapped_column(
Numeric(5, 4), nullable=False
) # 0.0000 to 1.0000
__table_args__ = (
Index("ix_bridge_cmhc_zone", "cmhc_zone_code"),
Index("ix_bridge_neighbourhood", "neighbourhood_id"),
)
class FactCensus(Base):
"""Census statistics by neighbourhood and year.
Grain: One row per neighbourhood per census year.
"""
__tablename__ = "fact_census"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
census_year: Mapped[int] = mapped_column(Integer, nullable=False)
population: Mapped[int | None] = mapped_column(Integer, nullable=True)
population_density: Mapped[float | None] = mapped_column(
Numeric(10, 2), nullable=True
)
median_household_income: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
average_household_income: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
unemployment_rate: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_bachelors_or_higher: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_owner_occupied: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_renter_occupied: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
median_age: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
average_dwelling_value: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
__table_args__ = (
Index("ix_fact_census_neighbourhood_year", "neighbourhood_id", "census_year"),
)
class FactCrime(Base):
"""Crime statistics by neighbourhood and year.
Grain: One row per neighbourhood per year per crime type.
"""
__tablename__ = "fact_crime"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
year: Mapped[int] = mapped_column(Integer, nullable=False)
crime_type: Mapped[str] = mapped_column(String(50), nullable=False)
count: Mapped[int] = mapped_column(Integer, nullable=False)
rate_per_100k: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
__table_args__ = (
Index("ix_fact_crime_neighbourhood_year", "neighbourhood_id", "year"),
Index("ix_fact_crime_type", "crime_type"),
)
class FactAmenities(Base):
"""Amenity counts by neighbourhood.
Grain: One row per neighbourhood per amenity type per year.
"""
__tablename__ = "fact_amenities"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
amenity_type: Mapped[str] = mapped_column(String(50), nullable=False)
count: Mapped[int] = mapped_column(Integer, nullable=False)
year: Mapped[int] = mapped_column(Integer, nullable=False)
__table_args__ = (
Index("ix_fact_amenities_neighbourhood_year", "neighbourhood_id", "year"),
Index("ix_fact_amenities_type", "amenity_type"),
)
class FactRentals(Base):
"""Fact table for CMHC rental market data.

View File

@@ -6,6 +6,8 @@ from .geo import (
NeighbourhoodParser,
load_geojson,
)
from .toronto_open_data import TorontoOpenDataParser
from .toronto_police import TorontoPoliceParser
__all__ = [
"CMHCParser",
@@ -13,4 +15,7 @@ __all__ = [
"CMHCZoneParser", "CMHCZoneParser",
"NeighbourhoodParser", "NeighbourhoodParser",
"load_geojson", "load_geojson",
# API parsers (Phase 3)
"TorontoOpenDataParser",
"TorontoPoliceParser",
]

View File

@@ -0,0 +1,391 @@
"""Parser for Toronto Open Data CKAN API.
Fetches neighbourhood boundaries, census profiles, and amenities data
from the City of Toronto's Open Data Portal.
API Documentation: https://open.toronto.ca/dataset/
"""
import json
import logging
from decimal import Decimal
from pathlib import Path
from typing import Any
import httpx
from portfolio_app.toronto.schemas import (
AmenityRecord,
AmenityType,
CensusRecord,
NeighbourhoodRecord,
)
logger = logging.getLogger(__name__)
class TorontoOpenDataParser:
"""Parser for Toronto Open Data CKAN API.
Provides methods to fetch and parse neighbourhood boundaries, census profiles,
and amenities (parks, schools, childcare) from the Toronto Open Data portal.
"""
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
API_PATH = "/api/3/action"
# Dataset package IDs
DATASETS = {
"neighbourhoods": "neighbourhoods",
"neighbourhood_profiles": "neighbourhood-profiles",
"parks": "parks",
"schools": "school-locations-all-types",
"childcare": "licensed-child-care-centres",
}
def __init__(
self,
cache_dir: Path | None = None,
timeout: float = 30.0,
) -> None:
"""Initialize parser.
Args:
cache_dir: Optional directory for caching API responses.
timeout: HTTP request timeout in seconds.
"""
self._cache_dir = cache_dir
self._timeout = timeout
self._client: httpx.Client | None = None
@property
def client(self) -> httpx.Client:
"""Lazy-initialize HTTP client."""
if self._client is None:
self._client = httpx.Client(
base_url=self.BASE_URL,
timeout=self._timeout,
headers={"Accept": "application/json"},
)
return self._client
def close(self) -> None:
"""Close HTTP client."""
if self._client is not None:
self._client.close()
self._client = None
def __enter__(self) -> "TorontoOpenDataParser":
return self
def __exit__(self, *args: Any) -> None:
self.close()
def _get_package(self, package_id: str) -> dict[str, Any]:
"""Fetch package metadata from CKAN.
Args:
package_id: The package/dataset ID.
Returns:
Package metadata dictionary.
"""
response = self.client.get(
f"{self.API_PATH}/package_show",
params={"id": package_id},
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"CKAN API error: {result.get('error', 'Unknown error')}")
return dict(result["result"])
def _get_resource_url(
self,
package_id: str,
format_filter: str = "geojson",
) -> str:
"""Get the download URL for a resource in a package.
Args:
package_id: The package/dataset ID.
format_filter: Resource format to filter by (e.g., 'geojson', 'csv').
Returns:
Resource download URL.
Raises:
ValueError: If no matching resource is found.
"""
package = self._get_package(package_id)
resources = package.get("resources", [])
for resource in resources:
resource_format = resource.get("format", "").lower()
if format_filter.lower() in resource_format:
return str(resource["url"])
available = [r.get("format") for r in resources]
raise ValueError(
f"No {format_filter} resource in {package_id}. Available: {available}"
)
def _fetch_geojson(self, package_id: str) -> dict[str, Any]:
"""Fetch GeoJSON data from a package.
Args:
package_id: The package/dataset ID.
Returns:
GeoJSON FeatureCollection.
"""
# Check cache first
if self._cache_dir:
cache_file = self._cache_dir / f"{package_id}.geojson"
if cache_file.exists():
logger.debug(f"Loading {package_id} from cache")
with open(cache_file, encoding="utf-8") as f:
return dict(json.load(f))
url = self._get_resource_url(package_id, format_filter="geojson")
logger.info(f"Fetching GeoJSON from {url}")
response = self.client.get(url)
response.raise_for_status()
data = response.json()
# Cache the response
if self._cache_dir:
self._cache_dir.mkdir(parents=True, exist_ok=True)
cache_file = self._cache_dir / f"{package_id}.geojson"
with open(cache_file, "w", encoding="utf-8") as f:
json.dump(data, f)
return dict(data)
def _fetch_csv_as_json(self, package_id: str) -> list[dict[str, Any]]:
"""Fetch CSV data as JSON records via CKAN datastore.
Args:
package_id: The package/dataset ID.
Returns:
List of records as dictionaries.
"""
package = self._get_package(package_id)
resources = package.get("resources", [])
# Find a datastore-enabled resource
for resource in resources:
if resource.get("datastore_active"):
resource_id = resource["id"]
break
else:
raise ValueError(f"No datastore resource in {package_id}")
# Fetch all records via datastore_search
records: list[dict[str, Any]] = []
offset = 0
limit = 1000
while True:
response = self.client.get(
f"{self.API_PATH}/datastore_search",
params={"id": resource_id, "limit": limit, "offset": offset},
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"Datastore error: {result.get('error')}")
batch = result["result"]["records"]
records.extend(batch)
if len(batch) < limit:
break
offset += limit
return records
def get_neighbourhoods(self) -> list[NeighbourhoodRecord]:
"""Fetch 158 Toronto neighbourhood boundaries.
Returns:
List of validated NeighbourhoodRecord objects.
"""
geojson = self._fetch_geojson(self.DATASETS["neighbourhoods"])
features = geojson.get("features", [])
records = []
for feature in features:
props = feature.get("properties", {})
geometry = feature.get("geometry")
# Extract area_id from various possible property names
area_id = props.get("AREA_ID") or props.get("area_id")
if area_id is None:
# Try AREA_SHORT_CODE as fallback
short_code = props.get("AREA_SHORT_CODE", "")
if short_code:
# Extract numeric part
area_id = int("".join(c for c in short_code if c.isdigit()) or "0")
if area_id is None:
# Neither AREA_ID nor AREA_SHORT_CODE is present; skip this feature
continue
area_name = (
props.get("AREA_NAME")
or props.get("area_name")
or f"Neighbourhood {area_id}"
)
area_short_code = props.get("AREA_SHORT_CODE") or props.get(
"area_short_code"
)
records.append(
NeighbourhoodRecord(
area_id=int(area_id),
area_name=str(area_name),
area_short_code=area_short_code,
geometry=geometry,
)
)
logger.info(f"Parsed {len(records)} neighbourhoods")
return records
def get_census_profiles(self, year: int = 2021) -> list[CensusRecord]:
"""Fetch neighbourhood census profiles.
Note: Census profile data structure varies by year. This method
extracts key demographic indicators where available.
Args:
year: Census year (2016 or 2021).
Returns:
List of validated CensusRecord objects.
"""
# Census profiles are typically in CSV/datastore format
try:
raw_records = self._fetch_csv_as_json(
self.DATASETS["neighbourhood_profiles"]
)
except ValueError as e:
logger.warning(f"Could not fetch census profiles: {e}")
return []
# Census profiles are pivoted - rows are indicators, columns are neighbourhoods
# This requires special handling based on the actual data structure
logger.info(f"Fetched {len(raw_records)} census profile rows")
# For now, return empty list - actual implementation depends on data structure
# TODO: Implement census profile parsing based on actual data format
return []
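# Hypothetical sketch of the pivot handling described above (the indicator label
# and column names are assumptions, not the confirmed dataset layout): each raw
# row resembles {"Characteristic": "Total - Population, 2021",
# "Example Neighbourhood": "1,234", ...}, so a real implementation would match
# the indicator label, strip thousands separators, and emit one CensusRecord per
# neighbourhood column, resolving the column label to its AREA_ID via
# get_neighbourhoods().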
def get_parks(self) -> list[AmenityRecord]:
"""Fetch park locations.
Returns:
List of validated AmenityRecord objects.
"""
return self._fetch_amenities(
self.DATASETS["parks"],
AmenityType.PARK,
name_field="ASSET_NAME",
address_field="ADDRESS_FULL",
)
def get_schools(self) -> list[AmenityRecord]:
"""Fetch school locations.
Returns:
List of validated AmenityRecord objects.
"""
return self._fetch_amenities(
self.DATASETS["schools"],
AmenityType.SCHOOL,
name_field="NAME",
address_field="ADDRESS_FULL",
)
def get_childcare_centres(self) -> list[AmenityRecord]:
"""Fetch licensed childcare centre locations.
Returns:
List of validated AmenityRecord objects.
"""
return self._fetch_amenities(
self.DATASETS["childcare"],
AmenityType.CHILDCARE,
name_field="LOC_NAME",
address_field="ADDRESS",
)
def _fetch_amenities(
self,
package_id: str,
amenity_type: AmenityType,
name_field: str,
address_field: str,
) -> list[AmenityRecord]:
"""Fetch and parse amenity data from GeoJSON.
Args:
package_id: CKAN package ID.
amenity_type: Type of amenity.
name_field: Property name containing amenity name.
address_field: Property name containing address.
Returns:
List of AmenityRecord objects.
"""
try:
geojson = self._fetch_geojson(package_id)
except (httpx.HTTPError, ValueError) as e:
logger.warning(f"Could not fetch {package_id}: {e}")
return []
features = geojson.get("features", [])
records = []
for feature in features:
props = feature.get("properties", {})
geometry = feature.get("geometry")
# Get coordinates from geometry
lat, lon = None, None
if geometry and geometry.get("type") == "Point":
coords = geometry.get("coordinates", [])
if len(coords) >= 2:
lon, lat = coords[0], coords[1]
# Try to determine neighbourhood_id
# Many datasets include AREA_ID or similar
neighbourhood_id = (
props.get("AREA_ID")
or props.get("area_id")
or props.get("NEIGHBOURHOOD_ID")
or 0  # No neighbourhood assignment in the source; skipped below (would need a spatial join)
)
name = props.get(name_field) or props.get(name_field.lower()) or "Unknown"
address = props.get(address_field) or props.get(address_field.lower())
# Skip if we don't have a neighbourhood assignment
if neighbourhood_id == 0:
continue
records.append(
AmenityRecord(
neighbourhood_id=int(neighbourhood_id),
amenity_type=amenity_type,
amenity_name=str(name)[:200],
address=str(address)[:300] if address else None,
latitude=Decimal(str(lat)) if lat is not None else None,
longitude=Decimal(str(lon)) if lon is not None else None,
)
)
logger.info(f"Parsed {len(records)} {amenity_type.value} records")
return records
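# A minimal usage sketch for this parser. The cache_dir keyword below is an
# assumption inferred from the _cache_dir attribute; it is not confirmed here.
if __name__ == "__main__":
    from pathlib import Path

    with TorontoOpenDataParser(cache_dir=Path(".cache/toronto")) as parser:
        neighbourhoods = parser.get_neighbourhoods()
        parks = parser.get_parks()
        schools = parser.get_schools()
    print(
        f"{len(neighbourhoods)} neighbourhoods, "
        f"{len(parks)} parks, {len(schools)} schools"
    )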

View File

@@ -0,0 +1,371 @@
"""Parser for Toronto Police crime data via CKAN API.
Fetches neighbourhood crime rates and major crime indicators from the
Toronto Police Service data hosted on Toronto Open Data Portal.
Data Sources:
- Neighbourhood Crime Rates: Annual crime rates by neighbourhood
- Major Crime Indicators (MCI): Detailed incident-level data
"""
import contextlib
import json
import logging
from decimal import Decimal
from typing import Any
import httpx
from portfolio_app.toronto.schemas import CrimeRecord, CrimeType
logger = logging.getLogger(__name__)
# Mapping from Toronto Police crime categories to CrimeType enum
CRIME_TYPE_MAPPING: dict[str, CrimeType] = {
"assault": CrimeType.ASSAULT,
"assaults": CrimeType.ASSAULT,
"auto theft": CrimeType.AUTO_THEFT,
"autotheft": CrimeType.AUTO_THEFT,
"auto_theft": CrimeType.AUTO_THEFT,
"break and enter": CrimeType.BREAK_AND_ENTER,
"breakenter": CrimeType.BREAK_AND_ENTER,
"break_and_enter": CrimeType.BREAK_AND_ENTER,
"homicide": CrimeType.HOMICIDE,
"homicides": CrimeType.HOMICIDE,
"robbery": CrimeType.ROBBERY,
"robberies": CrimeType.ROBBERY,
"shooting": CrimeType.SHOOTING,
"shootings": CrimeType.SHOOTING,
"theft over": CrimeType.THEFT_OVER,
"theftover": CrimeType.THEFT_OVER,
"theft_over": CrimeType.THEFT_OVER,
"theft from motor vehicle": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
"theftfrommv": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
"theft_from_mv": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
}
def _normalize_crime_type(crime_str: str) -> CrimeType:
"""Normalize crime type string to CrimeType enum.
Args:
crime_str: Raw crime type string from data source.
Returns:
Matched CrimeType enum value, or CrimeType.OTHER if no match.
"""
normalized = crime_str.lower().strip().replace("-", " ").replace("_", " ")
return CRIME_TYPE_MAPPING.get(normalized, CrimeType.OTHER)
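# For illustration: "Auto Theft", "BREAK_AND_ENTER", and "Theft Over" all
# normalize to their CrimeType counterparts above, while an unmapped string such
# as "arson" falls back to CrimeType.OTHER.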
class TorontoPoliceParser:
"""Parser for Toronto Police crime data via CKAN API.
Crime data is hosted on Toronto Open Data Portal but sourced from
Toronto Police Service.
"""
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
API_PATH = "/api/3/action"
# Dataset package IDs
DATASETS = {
"crime_rates": "neighbourhood-crime-rates",
"mci": "major-crime-indicators",
"shootings": "shootings-firearm-discharges",
}
def __init__(self, timeout: float = 30.0) -> None:
"""Initialize parser.
Args:
timeout: HTTP request timeout in seconds.
"""
self._timeout = timeout
self._client: httpx.Client | None = None
@property
def client(self) -> httpx.Client:
"""Lazy-initialize HTTP client."""
if self._client is None:
self._client = httpx.Client(
base_url=self.BASE_URL,
timeout=self._timeout,
headers={"Accept": "application/json"},
)
return self._client
def close(self) -> None:
"""Close HTTP client."""
if self._client is not None:
self._client.close()
self._client = None
def __enter__(self) -> "TorontoPoliceParser":
return self
def __exit__(self, *args: Any) -> None:
self.close()
def _get_package(self, package_id: str) -> dict[str, Any]:
"""Fetch package metadata from CKAN."""
response = self.client.get(
f"{self.API_PATH}/package_show",
params={"id": package_id},
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"CKAN API error: {result.get('error', 'Unknown error')}")
return dict(result["result"])
def _fetch_datastore_records(
self,
package_id: str,
filters: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
"""Fetch records from CKAN datastore.
Args:
package_id: CKAN package ID.
filters: Optional filters to apply.
Returns:
List of records as dictionaries.
"""
package = self._get_package(package_id)
resources = package.get("resources", [])
# Find datastore-enabled resource
resource_id = None
for resource in resources:
if resource.get("datastore_active"):
resource_id = resource["id"]
break
if not resource_id:
raise ValueError(f"No datastore resource in {package_id}")
# Fetch all records
records: list[dict[str, Any]] = []
offset = 0
limit = 1000
while True:
params: dict[str, Any] = {
"id": resource_id,
"limit": limit,
"offset": offset,
}
if filters:
params["filters"] = json.dumps(filters)
response = self.client.get(
f"{self.API_PATH}/datastore_search",
params=params,
)
response.raise_for_status()
result = response.json()
if not result.get("success"):
raise ValueError(f"Datastore error: {result.get('error')}")
batch = result["result"]["records"]
records.extend(batch)
if len(batch) < limit:
break
offset += limit
return records
def get_crime_rates(
self,
years: list[int] | None = None,
) -> list[CrimeRecord]:
"""Fetch neighbourhood crime rates.
The crime rates dataset contains annual counts and rates per 100k
population for each neighbourhood.
Args:
years: Optional list of years to filter. If None, fetches all.
Returns:
List of validated CrimeRecord objects.
"""
try:
raw_records = self._fetch_datastore_records(self.DATASETS["crime_rates"])
except (httpx.HTTPError, ValueError) as e:
logger.warning(f"Could not fetch crime rates: {e}")
return []
records = []
for row in raw_records:
# Extract neighbourhood ID (Hood_ID maps to AREA_ID)
hood_id = row.get("HOOD_ID") or row.get("Hood_ID") or row.get("hood_id")
if not hood_id:
continue
try:
neighbourhood_id = int(hood_id)
except (ValueError, TypeError):
continue
# Crime rate data typically has columns like:
# ASSAULT_2019, ASSAULT_RATE_2019, AUTOTHEFT_2020, etc.
# We need to parse column names to extract crime type and year
for col_name, value in row.items():
if value is None or col_name in (
"_id",
"HOOD_ID",
"Hood_ID",
"hood_id",
"AREA_NAME",
"NEIGHBOURHOOD",
):
continue
# Try to parse column name for crime type and year
# Pattern: CRIMETYPE_YEAR or CRIMETYPE_RATE_YEAR
parts = col_name.upper().split("_")
if len(parts) < 2:
continue
# Check if last part is a year
try:
year = int(parts[-1])
if year < 2014 or year > 2030:
continue
except ValueError:
continue
# Filter by years if specified
if years and year not in years:
continue
# Check if this is a rate column
is_rate = "RATE" in parts
# Extract crime type (everything before RATE/year)
if is_rate:
rate_idx = parts.index("RATE")
crime_type_str = "_".join(parts[:rate_idx])
else:
crime_type_str = "_".join(parts[:-1])
crime_type = _normalize_crime_type(crime_type_str)
try:
numeric_value = Decimal(str(value))
except (ValueError, TypeError):
continue
if is_rate:
# Rate columns are skipped here; each is consulted below when its
# matching count column is processed
continue
# Find corresponding rate if available
rate_col = f"{crime_type_str}_RATE_{year}"
rate_value = row.get(rate_col)
rate_per_100k = None
if rate_value is not None:
with contextlib.suppress(ValueError, TypeError):
rate_per_100k = Decimal(str(rate_value))
records.append(
CrimeRecord(
neighbourhood_id=neighbourhood_id,
year=year,
crime_type=crime_type,
count=int(numeric_value),
rate_per_100k=rate_per_100k,
)
)
logger.info(f"Parsed {len(records)} crime rate records")
return records
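# Illustrative (hypothetical) input shape for the loop above: a row such as
# {"HOOD_ID": 1, "AREA_NAME": "Example Neighbourhood",
#  "ASSAULT_2020": 312, "ASSAULT_RATE_2020": 987.6}
# yields CrimeRecord(neighbourhood_id=1, year=2020, crime_type=CrimeType.ASSAULT,
# count=312, rate_per_100k=Decimal("987.6")); the *_RATE_* column is only
# consulted when its matching count column is processed.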
def get_major_crime_indicators(
self,
years: list[int] | None = None,
) -> list[CrimeRecord]:
"""Fetch major crime indicators (detailed MCI data).
MCI data contains incident-level records that need to be aggregated
by neighbourhood and year.
Args:
years: Optional list of years to filter.
Returns:
List of aggregated CrimeRecord objects.
"""
try:
raw_records = self._fetch_datastore_records(self.DATASETS["mci"])
except (httpx.HTTPError, ValueError) as e:
logger.warning(f"Could not fetch MCI data: {e}")
return []
# Aggregate counts by neighbourhood, year, and crime type
aggregates: dict[tuple[int, int, CrimeType], int] = {}
for row in raw_records:
# Extract neighbourhood ID
hood_id = (
row.get("HOOD_158")
or row.get("HOOD_140")
or row.get("HOOD_ID")
or row.get("Hood_ID")
)
if not hood_id:
continue
try:
neighbourhood_id = int(hood_id)
except (ValueError, TypeError):
continue
# Extract year from occurrence date
occ_year = row.get("OCC_YEAR") or row.get("REPORT_YEAR")
if not occ_year:
continue
try:
year = int(occ_year)
if year < 2014 or year > 2030:
continue
except (ValueError, TypeError):
continue
# Filter by years if specified
if years and year not in years:
continue
# Extract crime type
mci_category = row.get("MCI_CATEGORY") or row.get("OFFENCE") or ""
crime_type = _normalize_crime_type(str(mci_category))
# Aggregate count
key = (neighbourhood_id, year, crime_type)
aggregates[key] = aggregates.get(key, 0) + 1
# Convert aggregates to CrimeRecord objects
records = [
CrimeRecord(
neighbourhood_id=neighbourhood_id,
year=year,
crime_type=crime_type,
count=count,
rate_per_100k=None, # Would need population data to calculate
)
for (neighbourhood_id, year, crime_type), count in aggregates.items()
]
logger.info(f"Parsed {len(records)} MCI records (aggregated)")
return records
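# A minimal usage sketch for this parser; the years passed below are illustrative.
if __name__ == "__main__":
    with TorontoPoliceParser(timeout=60.0) as parser:
        rates = parser.get_crime_rates(years=[2022, 2023])
        mci = parser.get_major_crime_indicators(years=[2023])
    print(f"{len(rates)} crime rate records, {len(mci)} aggregated MCI records")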

View File

@@ -1,5 +1,6 @@
"""Pydantic schemas for Toronto housing data validation."""
from .amenities import AmenityCount, AmenityRecord, AmenityType
from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
from .dimensions import (
CMHCZone,
@@ -11,6 +12,7 @@ from .dimensions import (
PolicyLevel,
TimeDimension,
)
from .neighbourhood import CensusRecord, CrimeRecord, CrimeType, NeighbourhoodRecord
__all__ = [
# CMHC
@@ -28,4 +30,13 @@ __all__ = [
"PolicyCategory",
"ExpectedDirection",
"Confidence",
# Neighbourhood data (Phase 3)
"NeighbourhoodRecord",
"CensusRecord",
"CrimeRecord",
"CrimeType",
# Amenities (Phase 3)
"AmenityType",
"AmenityRecord",
"AmenityCount",
]

View File

@@ -0,0 +1,60 @@
"""Pydantic schemas for Toronto amenities data.
Includes schemas for parks, schools, childcare centres, and transit stops.
"""
from decimal import Decimal
from enum import Enum
from pydantic import BaseModel, Field
class AmenityType(str, Enum):
"""Types of amenities tracked in the neighbourhood dashboard."""
PARK = "park"
SCHOOL = "school"
CHILDCARE = "childcare"
TRANSIT_STOP = "transit_stop"
LIBRARY = "library"
COMMUNITY_CENTRE = "community_centre"
HOSPITAL = "hospital"
class AmenityRecord(BaseModel):
"""Amenity location record for a neighbourhood.
Represents a single amenity (park, school, etc.) with its location
and associated neighbourhood.
"""
neighbourhood_id: int = Field(
ge=1, le=200, description="Neighbourhood ID containing this amenity"
)
amenity_type: AmenityType = Field(description="Type of amenity")
amenity_name: str = Field(max_length=200, description="Name of the amenity")
address: str | None = Field(
default=None, max_length=300, description="Street address"
)
latitude: Decimal | None = Field(
default=None, ge=-90, le=90, description="Latitude (WGS84)"
)
longitude: Decimal | None = Field(
default=None, ge=-180, le=180, description="Longitude (WGS84)"
)
model_config = {"str_strip_whitespace": True}
class AmenityCount(BaseModel):
"""Aggregated amenity count for a neighbourhood.
Used for dashboard metrics showing amenity density per neighbourhood.
"""
neighbourhood_id: int = Field(ge=1, le=200, description="Neighbourhood ID")
amenity_type: AmenityType = Field(description="Type of amenity")
count: int = Field(ge=0, description="Number of amenities of this type")
year: int = Field(ge=2020, le=2030, description="Year of data snapshot")
model_config = {"str_strip_whitespace": True}
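# A small validation sketch for the schemas above; all values are illustrative.
if __name__ == "__main__":
    park = AmenityRecord(
        neighbourhood_id=42,
        amenity_type=AmenityType.PARK,
        amenity_name="  Example Park  ",  # str_strip_whitespace trims this
        latitude=Decimal("43.6532"),
        longitude=Decimal("-79.3832"),
    )
    assert park.amenity_name == "Example Park"
    summary = AmenityCount(
        neighbourhood_id=42, amenity_type=AmenityType.PARK, count=7, year=2024
    )
    print(park.model_dump(), summary.model_dump())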

View File

@@ -0,0 +1,106 @@
"""Pydantic schemas for Toronto neighbourhood data.
Includes schemas for neighbourhood boundaries, census profiles, and crime statistics.
"""
from decimal import Decimal
from enum import Enum
from typing import Any
from pydantic import BaseModel, Field
class CrimeType(str, Enum):
"""Major crime indicator types from Toronto Police data."""
ASSAULT = "assault"
AUTO_THEFT = "auto_theft"
BREAK_AND_ENTER = "break_and_enter"
HOMICIDE = "homicide"
ROBBERY = "robbery"
SHOOTING = "shooting"
THEFT_OVER = "theft_over"
THEFT_FROM_MOTOR_VEHICLE = "theft_from_motor_vehicle"
OTHER = "other"
class NeighbourhoodRecord(BaseModel):
"""Schema for Toronto neighbourhood boundary data.
Based on City of Toronto's 158 neighbourhoods dataset.
AREA_ID maps to neighbourhood_id for consistency with police data (Hood_ID).
"""
area_id: int = Field(description="AREA_ID from Toronto Open Data (1-158)")
area_name: str = Field(max_length=100, description="Official neighbourhood name")
area_short_code: str | None = Field(
default=None, max_length=10, description="Short code (e.g., 'E01')"
)
geometry: dict[str, Any] | None = Field(
default=None, description="GeoJSON geometry object"
)
model_config = {"str_strip_whitespace": True}
class CensusRecord(BaseModel):
"""Census profile data for a neighbourhood.
Contains demographic and socioeconomic indicators from Statistics Canada
census data, aggregated to the neighbourhood level.
"""
neighbourhood_id: int = Field(
ge=1, le=200, description="Neighbourhood ID (AREA_ID)"
)
census_year: int = Field(ge=2016, le=2030, description="Census year")
population: int | None = Field(default=None, ge=0, description="Total population")
population_density: Decimal | None = Field(
default=None, ge=0, description="Population per square kilometre"
)
median_household_income: Decimal | None = Field(
default=None, ge=0, description="Median household income (CAD)"
)
average_household_income: Decimal | None = Field(
default=None, ge=0, description="Average household income (CAD)"
)
unemployment_rate: Decimal | None = Field(
default=None, ge=0, le=100, description="Unemployment rate percentage"
)
pct_bachelors_or_higher: Decimal | None = Field(
default=None, ge=0, le=100, description="Percentage with bachelor's degree+"
)
pct_owner_occupied: Decimal | None = Field(
default=None, ge=0, le=100, description="Percentage owner-occupied dwellings"
)
pct_renter_occupied: Decimal | None = Field(
default=None, ge=0, le=100, description="Percentage renter-occupied dwellings"
)
median_age: Decimal | None = Field(
default=None, ge=0, le=120, description="Median age of residents"
)
average_dwelling_value: Decimal | None = Field(
default=None, ge=0, description="Average dwelling value (CAD)"
)
model_config = {"str_strip_whitespace": True}
class CrimeRecord(BaseModel):
"""Crime statistics for a neighbourhood.
Based on Toronto Police neighbourhood crime rates data.
Hood_ID in source data maps to neighbourhood_id (AREA_ID).
"""
neighbourhood_id: int = Field(
ge=1, le=200, description="Neighbourhood ID (Hood_ID -> AREA_ID)"
)
year: int = Field(ge=2014, le=2030, description="Year of crime statistics")
crime_type: CrimeType = Field(description="Type of crime (MCI category)")
count: int = Field(ge=0, description="Number of incidents")
rate_per_100k: Decimal | None = Field(
default=None, ge=0, description="Rate per 100,000 population"
)
model_config = {"str_strip_whitespace": True}
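# A brief validation sketch for the schemas above; all values are illustrative.
if __name__ == "__main__":
    from pydantic import ValidationError

    hood = NeighbourhoodRecord(
        area_id=1, area_name="Example Neighbourhood", area_short_code="E01"
    )
    incident = CrimeRecord(
        neighbourhood_id=hood.area_id,
        year=2023,
        crime_type=CrimeType.ASSAULT,
        count=12,
        rate_per_100k=Decimal("85.4"),
    )
    try:
        CrimeRecord(neighbourhood_id=999, year=2023, crime_type=CrimeType.ASSAULT, count=1)
    except ValidationError:
        pass  # neighbourhood_id must be between 1 and 200
    print(hood.area_name, incident.crime_type.value)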