Compare commits
3 Commits
f69d0c15a7...3054441630
| Author | SHA1 | Date | |
|---|---|---|---|
| | 3054441630 | | |
| | b6d210ec6b | | |
| | 053acf6436 | | |
67
CLAUDE.md
@@ -261,4 +261,71 @@ All scripts in `scripts/`:
---

## Projman Plugin Workflow

**CRITICAL: Always use the projman plugin for sprint and task management.**

### When to Use Projman Skills

| Skill | Trigger | Purpose |
|-------|---------|---------|
| `/projman:sprint-plan` | New sprint or phase implementation | Architecture analysis + Gitea issue creation |
| `/projman:sprint-start` | Beginning implementation work | Load lessons learned (Wiki.js or local), start execution |
| `/projman:sprint-status` | Check progress | Review blockers and completion status |
| `/projman:sprint-close` | Sprint completion | Capture lessons learned (Wiki.js or local backup) |

### Default Behavior

When user requests implementation work:

1. **ALWAYS start with `/projman:sprint-plan`** before writing code
2. Create Gitea issues with proper labels and acceptance criteria
3. Use `/projman:sprint-start` to begin execution with lessons learned
4. Track progress via Gitea issue comments
5. Close sprint with `/projman:sprint-close` to document lessons

### Gitea Repository

- **Repo**: `lmiranda/personal-portfolio`
- **Host**: `gitea.hotserv.cloud`
- **Note**: `lmiranda` is a user account (not org), so label lookup may require repo-level labels

### MCP Tools Available

**Gitea**:
- `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment`
- `get_labels`, `suggest_labels`

**Wiki.js**:
- `search_lessons`, `create_lesson`, `search_pages`, `get_page`

### Lessons Learned (Backup Method)

**When Wiki.js is unavailable**, use the local backup in `docs/project-lessons-learned/`:

**At Sprint Start:**
1. Review `docs/project-lessons-learned/INDEX.md` for relevant past lessons
2. Search lesson files by tags/keywords before implementation
3. Apply prevention strategies from applicable lessons

**At Sprint Close:**
1. Try Wiki.js `create_lesson` first
2. If Wiki.js fails, create lesson in `docs/project-lessons-learned/`
3. Use naming convention: `{phase-or-sprint}-{short-description}.md`
4. Update `INDEX.md` with new entry
5. Follow the lesson template in INDEX.md

**Migration:** Once Wiki.js is configured, lessons will be migrated there for better searchability.

### Issue Structure

Every Gitea issue should include:
- **Overview**: Brief description
- **Files to Create/Modify**: Explicit paths
- **Acceptance Criteria**: Checkboxes
- **Technical Notes**: Implementation hints
- **Labels**: Listed in body (workaround for label API issues)

---

*Last Updated: Sprint 9*
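The lesson naming convention `{phase-or-sprint}-{short-description}.md` in the hunk above could be applied with a small helper like the one below. This is only a sketch: the function name `lesson_filename` and the lowercase, hyphen-separated slug rule are assumptions, since the diff states the pattern but not how the two parts are normalised.

```python
import re


def _slug(text: str) -> str:
    """Lowercase the text and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def lesson_filename(phase_or_sprint: str, short_description: str) -> str:
    """Build a file name following `{phase-or-sprint}-{short-description}.md`.

    Hypothetical helper: the slug rule is an assumption, not part of the commit.
    """
    return f"{_slug(phase_or_sprint)}-{_slug(short_description)}.md"


if __name__ == "__main__":
    print(lesson_filename("Sprint 9", "Label API workaround"))
```

The resulting file would live in `docs/project-lessons-learned/` and still needs a matching entry in `INDEX.md`, as the steps above require.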
@@ -11,3 +11,77 @@ models:
      - name: zone_code
        tests:
          - not_null

  - name: int_neighbourhood__demographics
    description: "Combined census demographics with neighbourhood attributes"
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: census_year
        description: "Census year"
        tests:
          - not_null
      - name: income_quintile
        description: "Income quintile (1-5, city-wide)"

  - name: int_neighbourhood__housing
    description: "Housing indicators combining census and rental data"
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: year
        description: "Reference year"
      - name: rent_to_income_pct
        description: "Rent as percentage of median income"
      - name: is_affordable
        description: "Boolean: rent <= 30% of income"

  - name: int_neighbourhood__crime_summary
    description: "Aggregated crime with year-over-year trends"
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: year
        description: "Statistics year"
        tests:
          - not_null
      - name: crime_rate_per_100k
        description: "Total crime rate per 100K population"
      - name: yoy_change_pct
        description: "Year-over-year change percentage"

  - name: int_neighbourhood__amenity_scores
    description: "Normalized amenities per capita and per area"
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: year
        description: "Reference year"
      - name: total_amenities_per_1000
        description: "Total amenities per 1000 population"
      - name: amenities_per_sqkm
        description: "Total amenities per square km"

  - name: int_rentals__neighbourhood_allocated
    description: "CMHC rental data allocated to neighbourhoods via area weights"
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: year
        description: "Survey year"
        tests:
          - not_null
      - name: avg_rent_2bed
        description: "Weighted average 2-bedroom rent"
      - name: vacancy_rate
        description: "Weighted average vacancy rate"
@@ -0,0 +1,79 @@
-- Intermediate: Normalized amenities per 1000 population
-- Pivots amenity types and calculates per-capita metrics
-- Grain: One row per neighbourhood per year

with neighbourhoods as (
    select * from {{ ref('stg_toronto__neighbourhoods') }}
),

amenities as (
    select * from {{ ref('stg_toronto__amenities') }}
),

-- Aggregate amenity types
amenities_by_year as (
    select
        neighbourhood_id,
        amenity_year as year,
        sum(case when amenity_type = 'Parks' then amenity_count else 0 end) as parks_count,
        sum(case when amenity_type = 'Schools' then amenity_count else 0 end) as schools_count,
        sum(case when amenity_type = 'Transit Stops' then amenity_count else 0 end) as transit_count,
        sum(case when amenity_type = 'Libraries' then amenity_count else 0 end) as libraries_count,
        sum(case when amenity_type = 'Community Centres' then amenity_count else 0 end) as community_centres_count,
        sum(case when amenity_type = 'Recreation' then amenity_count else 0 end) as recreation_count,
        sum(amenity_count) as total_amenities
    from amenities
    group by neighbourhood_id, amenity_year
),

amenity_scores as (
    select
        n.neighbourhood_id,
        n.neighbourhood_name,
        n.geometry,
        n.population,
        n.land_area_sqkm,

        a.year,

        -- Raw counts
        a.parks_count,
        a.schools_count,
        a.transit_count,
        a.libraries_count,
        a.community_centres_count,
        a.recreation_count,
        a.total_amenities,

        -- Per 1000 population
        case when n.population > 0
            then round(a.parks_count::numeric / n.population * 1000, 3)
            else null
        end as parks_per_1000,

        case when n.population > 0
            then round(a.schools_count::numeric / n.population * 1000, 3)
            else null
        end as schools_per_1000,

        case when n.population > 0
            then round(a.transit_count::numeric / n.population * 1000, 3)
            else null
        end as transit_per_1000,

        case when n.population > 0
            then round(a.total_amenities::numeric / n.population * 1000, 3)
            else null
        end as total_amenities_per_1000,

        -- Per square km
        case when n.land_area_sqkm > 0
            then round(a.total_amenities::numeric / n.land_area_sqkm, 2)
            else null
        end as amenities_per_sqkm

    from neighbourhoods n
    left join amenities_by_year a on n.neighbourhood_id = a.neighbourhood_id
)

select * from amenity_scores
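For review purposes, the guarded per-capita and per-area normalisation in the model above can be sketched in plain Python. The function names `per_1000` and `per_sqkm` are illustrative only; `None` stands in for SQL `null`, which is what the `left join` yields for a neighbourhood with no amenity rows. Python's `round` may differ from PostgreSQL's `round` on exact ties, so treat this as an approximation of the arithmetic, not a replacement for the model.

```python
from typing import Optional


def per_1000(count: Optional[int], population: Optional[int]) -> Optional[float]:
    """Amenities per 1000 residents; None when the count is missing or population is not positive."""
    if count is None or population is None or population <= 0:
        return None
    return round(count / population * 1000, 3)


def per_sqkm(count: Optional[int], land_area_sqkm: Optional[float]) -> Optional[float]:
    """Amenities per square kilometre; None when the count is missing or area is not positive."""
    if count is None or land_area_sqkm is None or land_area_sqkm <= 0:
        return None
    return round(count / land_area_sqkm, 2)


if __name__ == "__main__":
    print(per_1000(12, 24000), per_sqkm(30, 4.0))
```

The guards matter because a zero population or area would otherwise raise a division error, and a missing count should stay missing instead of turning into zero.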
81
dbt/models/intermediate/int_neighbourhood__crime_summary.sql
Normal file
@@ -0,0 +1,81 @@
-- Intermediate: Aggregated crime by neighbourhood with YoY change
-- Pivots crime types and calculates year-over-year trends
-- Grain: One row per neighbourhood per year

with neighbourhoods as (
    select * from {{ ref('stg_toronto__neighbourhoods') }}
),

crime as (
    select * from {{ ref('stg_toronto__crime') }}
),

-- Aggregate crime types
crime_by_year as (
    select
        neighbourhood_id,
        crime_year as year,
        sum(incident_count) as total_incidents,
        sum(case when crime_type = 'Assault' then incident_count else 0 end) as assault_count,
        sum(case when crime_type = 'Auto Theft' then incident_count else 0 end) as auto_theft_count,
        sum(case when crime_type = 'Break and Enter' then incident_count else 0 end) as break_enter_count,
        sum(case when crime_type = 'Robbery' then incident_count else 0 end) as robbery_count,
        sum(case when crime_type = 'Theft Over' then incident_count else 0 end) as theft_over_count,
        sum(case when crime_type = 'Homicide' then incident_count else 0 end) as homicide_count,
        avg(rate_per_100k) as avg_rate_per_100k
    from crime
    group by neighbourhood_id, crime_year
),

-- Add year-over-year changes
with_yoy as (
    select
        c.*,
        lag(c.total_incidents, 1) over (
            partition by c.neighbourhood_id
            order by c.year
        ) as prev_year_incidents,
        round(
            (c.total_incidents - lag(c.total_incidents, 1) over (
                partition by c.neighbourhood_id
                order by c.year
            ))::numeric /
            nullif(lag(c.total_incidents, 1) over (
                partition by c.neighbourhood_id
                order by c.year
            ), 0) * 100,
            2
        ) as yoy_change_pct
    from crime_by_year c
),

crime_summary as (
    select
        n.neighbourhood_id,
        n.neighbourhood_name,
        n.geometry,
        n.population,

        w.year,
        w.total_incidents,
        w.assault_count,
        w.auto_theft_count,
        w.break_enter_count,
        w.robbery_count,
        w.theft_over_count,
        w.homicide_count,
        w.avg_rate_per_100k,
        w.yoy_change_pct,

        -- Crime rate per 100K population
        case
            when n.population > 0
            then round(w.total_incidents::numeric / n.population * 100000, 2)
            else null
        end as crime_rate_per_100k

    from neighbourhoods n
    inner join with_yoy w on n.neighbourhood_id = w.neighbourhood_id
)

select * from crime_summary
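The `lag(...)` plus `nullif(..., 0)` pattern in `with_yoy` above can be illustrated for a single neighbourhood with the Python sketch below. The function name `yoy_change_pct` and the sample figures are made up for illustration. Like the SQL, it compares each year with the previous available row, so a gap in the data is compared across the gap and not treated as a missing year.

```python
from typing import Dict, Optional


def yoy_change_pct(totals_by_year: Dict[int, int]) -> Dict[int, Optional[float]]:
    """Percentage change versus the previous available year for one neighbourhood.

    Returns None for the first year (no previous row) and when the previous
    total is zero, mirroring lag() and nullif(..., 0) in the SQL.
    """
    result: Dict[int, Optional[float]] = {}
    previous: Optional[int] = None
    for year in sorted(totals_by_year):
        current = totals_by_year[year]
        if previous is None or previous == 0:
            result[year] = None
        else:
            result[year] = round((current - previous) / previous * 100, 2)
        previous = current
    return result


if __name__ == "__main__":
    # Illustrative incident totals, not real data.
    print(yoy_change_pct({2020: 200, 2021: 250, 2022: 0, 2023: 10}))
```

The year after a zero total gets `None` instead of an infinite percentage, which is the behaviour the `nullif` guard gives the model.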
44
dbt/models/intermediate/int_neighbourhood__demographics.sql
Normal file
@@ -0,0 +1,44 @@
-- Intermediate: Combined census demographics by neighbourhood
-- Joins neighbourhoods with census data for demographic analysis
-- Grain: One row per neighbourhood per census year

with neighbourhoods as (
    select * from {{ ref('stg_toronto__neighbourhoods') }}
),

census as (
    select * from {{ ref('stg_toronto__census') }}
),

demographics as (
    select
        n.neighbourhood_id,
        n.neighbourhood_name,
        n.geometry,
        n.land_area_sqkm,

        c.census_year,
        c.population,
        c.population_density,
        c.median_household_income,
        c.average_household_income,
        c.median_age,
        c.unemployment_rate,
        c.pct_bachelors_or_higher as education_bachelors_pct,
        c.average_dwelling_value,

        -- Tenure mix
        c.pct_owner_occupied,
        c.pct_renter_occupied,

        -- Income quintile (city-wide comparison)
        ntile(5) over (
            partition by c.census_year
            order by c.median_household_income
        ) as income_quintile

    from neighbourhoods n
    left join census c on n.neighbourhood_id = c.neighbourhood_id
)

select * from demographics
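The `ntile(5)` window in the model above assigns quintiles by ascending median income, so quintile 1 holds the lowest incomes. A minimal Python sketch of the standard NTILE bucketing rule follows (bucket sizes differ by at most one, with the larger buckets first). The function name `ntile` and the sample values are illustrative, and the sketch assumes no missing incomes. In the SQL, a neighbourhood without census data has a null income; PostgreSQL sorts nulls last in ascending order by default, so such rows would land in the highest bucket of their partition.

```python
from typing import List, Sequence


def ntile(values: Sequence[float], buckets: int = 5) -> List[int]:
    """Return the 1-based bucket for each input value, ordered ascending.

    Follows the usual NTILE rule: when the row count is not divisible by
    `buckets`, the first (count % buckets) buckets get one extra row.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), buckets)
    result = [0] * len(values)
    position = 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for index in order[position:position + size]:
            result[index] = bucket
        position += size
    return result


if __name__ == "__main__":
    # Illustrative incomes in thousands for one census year.
    print(ntile([50, 10, 40, 20, 30, 60, 70]))
```

Because the buckets are rank-based, ties in income can be split across adjacent quintiles; that is inherent to NTILE and not specific to this sketch.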
56
dbt/models/intermediate/int_neighbourhood__housing.sql
Normal file
@@ -0,0 +1,56 @@
-- Intermediate: Housing indicators by neighbourhood
-- Combines census housing data with allocated CMHC rental data
-- Grain: One row per neighbourhood per year

with neighbourhoods as (
    select * from {{ ref('stg_toronto__neighbourhoods') }}
),

census as (
    select * from {{ ref('stg_toronto__census') }}
),

allocated_rentals as (
    select * from {{ ref('int_rentals__neighbourhood_allocated') }}
),

housing as (
    select
        n.neighbourhood_id,
        n.neighbourhood_name,
        n.geometry,

        coalesce(r.year, c.census_year) as year,

        -- Census housing metrics
        c.pct_owner_occupied,
        c.pct_renter_occupied,
        c.average_dwelling_value,
        c.median_household_income,

        -- Allocated rental metrics (weighted average from CMHC zones)
        r.avg_rent_2bed,
        r.vacancy_rate,

        -- Affordability calculations
        case
            when c.median_household_income > 0 and r.avg_rent_2bed > 0
            then round((r.avg_rent_2bed * 12 / c.median_household_income) * 100, 2)
            else null
        end as rent_to_income_pct,

        -- Affordability threshold (30% of income)
        case
            when c.median_household_income > 0 and r.avg_rent_2bed > 0
            then r.avg_rent_2bed * 12 <= c.median_household_income * 0.30
            else null
        end as is_affordable

    from neighbourhoods n
    left join census c on n.neighbourhood_id = c.neighbourhood_id
    left join allocated_rentals r
        on n.neighbourhood_id = r.neighbourhood_id
        and r.year = c.census_year
)

select * from housing
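The two affordability expressions above annualise the monthly 2-bedroom rent and compare it with median household income. A hedged Python sketch of the same arithmetic is shown below. The function names are illustrative, the sample figures are invented, and `None` again stands in for SQL `null`.

```python
from typing import Optional


def rent_to_income_pct(avg_rent_2bed: Optional[float],
                       median_household_income: Optional[float]) -> Optional[float]:
    """Annual rent (monthly rent * 12) as a percentage of median household income."""
    if not avg_rent_2bed or not median_household_income:
        return None
    if avg_rent_2bed <= 0 or median_household_income <= 0:
        return None
    return round(avg_rent_2bed * 12 / median_household_income * 100, 2)


def is_affordable(avg_rent_2bed: Optional[float],
                  median_household_income: Optional[float]) -> Optional[bool]:
    """True when annual rent is at most 30% of median household income."""
    if not avg_rent_2bed or not median_household_income:
        return None
    if avg_rent_2bed <= 0 or median_household_income <= 0:
        return None
    return avg_rent_2bed * 12 <= median_household_income * 0.30


if __name__ == "__main__":
    # Illustrative figures only.
    print(rent_to_income_pct(1500, 90000), is_affordable(1500, 90000))
```

Both functions return `None` when either input is missing or not positive, so a neighbourhood without rental data is reported as unknown and not as unaffordable.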
@@ -0,0 +1,73 @@
-- Intermediate: CMHC rentals allocated to neighbourhoods via area weights
-- Disaggregates zone-level rental data to neighbourhood level
-- Grain: One row per neighbourhood per year

with crosswalk as (
    select * from {{ ref('stg_cmhc__zone_crosswalk') }}
),

rentals as (
    select * from {{ ref('int_rentals__annual') }}
),

neighbourhoods as (
    select * from {{ ref('stg_toronto__neighbourhoods') }}
),

-- Allocate rental metrics to neighbourhoods using area weights
allocated as (
    select
        c.neighbourhood_id,
        r.year,
        r.bedroom_type,

        -- Weighted average rent (using area weight)
        sum(r.avg_rent * c.area_weight) as weighted_avg_rent,
        sum(r.median_rent * c.area_weight) as weighted_median_rent,
        sum(c.area_weight) as total_weight,

        -- Weighted vacancy rate
        sum(r.vacancy_rate * c.area_weight) / nullif(sum(c.area_weight), 0) as vacancy_rate,

        -- Weighted rental universe
        sum(r.rental_universe * c.area_weight) as rental_units_estimate

    from crosswalk c
    inner join rentals r on c.cmhc_zone_code = r.zone_code
    group by c.neighbourhood_id, r.year, r.bedroom_type
),

-- Pivot to get 2-bedroom as primary metric
pivoted as (
    select
        neighbourhood_id,
        year,
        max(case when bedroom_type = 'Two Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_2bed,
        max(case when bedroom_type = 'One Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_1bed,
        max(case when bedroom_type = 'Bachelor' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_bachelor,
        max(case when bedroom_type = 'Three Bedroom +' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_3bed,
        avg(vacancy_rate) as vacancy_rate,
        sum(rental_units_estimate) as total_rental_units
    from allocated
    group by neighbourhood_id, year
),

final as (
    select
        n.neighbourhood_id,
        n.neighbourhood_name,
        n.geometry,

        p.year,
        round(p.avg_rent_bachelor::numeric, 2) as avg_rent_bachelor,
        round(p.avg_rent_1bed::numeric, 2) as avg_rent_1bed,
        round(p.avg_rent_2bed::numeric, 2) as avg_rent_2bed,
        round(p.avg_rent_3bed::numeric, 2) as avg_rent_3bed,
        round(p.vacancy_rate::numeric, 2) as vacancy_rate,
        round(p.total_rental_units::numeric, 0) as total_rental_units

    from neighbourhoods n
    inner join pivoted p on n.neighbourhood_id = p.neighbourhood_id
)

select * from final
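The area-weighted allocation above (sum of `avg_rent * area_weight` divided by the sum of weights) can be sketched for one year and one bedroom type as follows. The tuple layout of `crosswalk`, the function name `allocate_rent`, and the sample zones are assumptions made for illustration; only the arithmetic mirrors the model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def allocate_rent(
    crosswalk: Iterable[Tuple[int, str, float]],
    zone_rents: Dict[str, float],
) -> Dict[int, Optional[float]]:
    """Area-weighted average rent per neighbourhood.

    crosswalk rows are (neighbourhood_id, zone_code, area_weight).
    Zones without rental data are skipped, like the inner join in the SQL.
    """
    weighted_sum: Dict[int, float] = defaultdict(float)
    total_weight: Dict[int, float] = defaultdict(float)
    for neighbourhood_id, zone_code, area_weight in crosswalk:
        if zone_code not in zone_rents:
            continue
        weighted_sum[neighbourhood_id] += zone_rents[zone_code] * area_weight
        total_weight[neighbourhood_id] += area_weight
    return {
        nid: round(weighted_sum[nid] / total_weight[nid], 2) if total_weight[nid] else None
        for nid in weighted_sum
    }


if __name__ == "__main__":
    # Illustrative crosswalk: neighbourhood 1 straddles two zones.
    rows = [(1, "Z1", 0.75), (1, "Z2", 0.25), (2, "Z2", 1.0), (3, "Z9", 1.0)]
    print(allocate_rent(rows, {"Z1": 2000.0, "Z2": 1600.0}))
```

Dividing by the summed weights keeps the result a true average even when a neighbourhood's weights do not add up to exactly 1, for example because one of its zones has no rental data.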
@@ -9,3 +9,127 @@ models:
        tests:
          - unique
          - not_null

  - name: mart_neighbourhood_overview
    description: "Neighbourhood overview with composite livability score"
    meta:
      dashboard_tab: Overview
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: neighbourhood_name
        description: "Official neighbourhood name"
        tests:
          - not_null
      - name: geometry
        description: "PostGIS geometry for mapping"
      - name: livability_score
        description: "Composite score: safety (30%), affordability (40%), amenities (30%)"
      - name: safety_score
        description: "Safety component score (0-100)"
      - name: affordability_score
        description: "Affordability component score (0-100)"
      - name: amenity_score
        description: "Amenity component score (0-100)"

  - name: mart_neighbourhood_housing
    description: "Housing and affordability metrics by neighbourhood"
    meta:
      dashboard_tab: Housing
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: neighbourhood_name
        description: "Official neighbourhood name"
        tests:
          - not_null
      - name: geometry
        description: "PostGIS geometry for mapping"
      - name: rent_to_income_pct
        description: "Rent as percentage of median income"
      - name: affordability_index
        description: "100 = city average affordability"
      - name: rent_yoy_change_pct
        description: "Year-over-year rent change"

  - name: mart_neighbourhood_safety
    description: "Crime rates and safety metrics by neighbourhood"
    meta:
      dashboard_tab: Safety
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: neighbourhood_name
        description: "Official neighbourhood name"
        tests:
          - not_null
      - name: geometry
        description: "PostGIS geometry for mapping"
      - name: crime_rate_per_100k
        description: "Total crime rate per 100K population"
      - name: crime_index
        description: "100 = city average crime rate"
      - name: safety_tier
        description: "Safety tier (1=safest, 5=highest crime)"
        tests:
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4, 5]

  - name: mart_neighbourhood_demographics
    description: "Demographics and income metrics by neighbourhood"
    meta:
      dashboard_tab: Demographics
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: neighbourhood_name
        description: "Official neighbourhood name"
        tests:
          - not_null
      - name: geometry
        description: "PostGIS geometry for mapping"
      - name: median_household_income
        description: "Median household income"
      - name: income_index
        description: "100 = city average income"
      - name: income_quintile
        description: "Income quintile (1-5)"
        tests:
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4, 5]

  - name: mart_neighbourhood_amenities
    description: "Amenity access metrics by neighbourhood"
    meta:
      dashboard_tab: Amenities
    columns:
      - name: neighbourhood_id
        description: "Neighbourhood identifier"
        tests:
          - not_null
      - name: neighbourhood_name
        description: "Official neighbourhood name"
        tests:
          - not_null
      - name: geometry
        description: "PostGIS geometry for mapping"
      - name: total_amenities_per_1000
        description: "Total amenities per 1000 population"
      - name: amenity_index
        description: "100 = city average amenities"
      - name: amenity_tier
        description: "Amenity tier (1=best, 5=lowest)"
        tests:
          - accepted_values:
              arguments:
                values: [1, 2, 3, 4, 5]
89
dbt/models/marts/mart_neighbourhood_amenities.sql
Normal file
@@ -0,0 +1,89 @@
-- Mart: Neighbourhood Amenities Analysis
-- Dashboard Tab: Amenities
-- Grain: One row per neighbourhood per year

with amenities as (
    select * from {{ ref('int_neighbourhood__amenity_scores') }}
),

-- City-wide averages for comparison
city_avg as (
    select
        year,
        avg(parks_per_1000) as city_avg_parks,
        avg(schools_per_1000) as city_avg_schools,
        avg(transit_per_1000) as city_avg_transit,
        avg(total_amenities_per_1000) as city_avg_total_amenities
    from amenities
    group by year
),

final as (
    select
        a.neighbourhood_id,
        a.neighbourhood_name,
        a.geometry,
        a.population,
        a.land_area_sqkm,
        a.year,

        -- Raw counts
        a.parks_count,
        a.schools_count,
        a.transit_count,
        a.libraries_count,
        a.community_centres_count,
        a.recreation_count,
        a.total_amenities,

        -- Per 1000 population
        a.parks_per_1000,
        a.schools_per_1000,
        a.transit_per_1000,
        a.total_amenities_per_1000,

        -- Per square km
        a.amenities_per_sqkm,

        -- City averages
        round(ca.city_avg_parks::numeric, 3) as city_avg_parks_per_1000,
        round(ca.city_avg_schools::numeric, 3) as city_avg_schools_per_1000,
        round(ca.city_avg_transit::numeric, 3) as city_avg_transit_per_1000,

        -- Amenity index (100 = city average)
        case
            when ca.city_avg_total_amenities > 0
            then round(a.total_amenities_per_1000 / ca.city_avg_total_amenities * 100, 1)
            else null
        end as amenity_index,

        -- Category indices
        case
            when ca.city_avg_parks > 0
            then round(a.parks_per_1000 / ca.city_avg_parks * 100, 1)
            else null
        end as parks_index,

        case
            when ca.city_avg_schools > 0
            then round(a.schools_per_1000 / ca.city_avg_schools * 100, 1)
            else null
        end as schools_index,

        case
            when ca.city_avg_transit > 0
            then round(a.transit_per_1000 / ca.city_avg_transit * 100, 1)
            else null
        end as transit_index,

        -- Amenity tier (1 = best, 5 = lowest)
        ntile(5) over (
            partition by a.year
            order by a.total_amenities_per_1000 desc
        ) as amenity_tier

    from amenities a
    left join city_avg ca on a.year = ca.year
)

select * from final
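The "100 = city average" indices in this mart divide each neighbourhood's rate by the mean of all neighbourhood rates for the same year. The Python sketch below shows that calculation for a single year. The function name `index_vs_city_average` and the sample rates are hypothetical. As in the SQL, the city average is an unweighted mean of neighbourhood rates that ignores missing values; it is not a population-weighted city rate.

```python
from typing import Dict, Optional


def index_vs_city_average(rates: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Express each neighbourhood rate as an index where 100 equals the city average.

    The average skips None values, like avg() in SQL; the index is None when
    the rate is missing or the average is not positive.
    """
    present = [rate for rate in rates.values() if rate is not None]
    city_average = sum(present) / len(present) if present else None
    if city_average is None or city_average <= 0:
        return {name: None for name in rates}
    return {
        name: None if rate is None else round(rate / city_average * 100, 1)
        for name, rate in rates.items()
    }


if __name__ == "__main__":
    # Illustrative amenities-per-1000 rates for one year.
    print(index_vs_city_average({"A": 2.0, "B": 4.0, "C": None}))
```

An index above 100 means the neighbourhood has more amenities per resident than the average neighbourhood; small neighbourhoods count as much as large ones in that average.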
81
dbt/models/marts/mart_neighbourhood_demographics.sql
Normal file
@@ -0,0 +1,81 @@
-- Mart: Neighbourhood Demographics Analysis
-- Dashboard Tab: Demographics
-- Grain: One row per neighbourhood per census year

with demographics as (
    select * from {{ ref('int_neighbourhood__demographics') }}
),

-- City-wide averages for comparison
city_avg as (
    select
        census_year,
        avg(median_household_income) as city_avg_income,
        avg(median_age) as city_avg_age,
        avg(unemployment_rate) as city_avg_unemployment,
        avg(education_bachelors_pct) as city_avg_education,
        avg(population_density) as city_avg_density
    from demographics
    group by census_year
),

final as (
    select
        d.neighbourhood_id,
        d.neighbourhood_name,
        d.geometry,
        d.census_year as year,

        -- Population
        d.population,
        d.land_area_sqkm,
        d.population_density,

        -- Income
        d.median_household_income,
        d.average_household_income,
        d.income_quintile,

        -- Income index (100 = city average)
        case
            when ca.city_avg_income > 0
            then round(d.median_household_income / ca.city_avg_income * 100, 1)
            else null
        end as income_index,

        -- Demographics
        d.median_age,
        d.unemployment_rate,
        d.education_bachelors_pct,

        -- Age index (100 = city average)
        case
            when ca.city_avg_age > 0
            then round(d.median_age / ca.city_avg_age * 100, 1)
            else null
        end as age_index,

        -- Housing tenure
        d.pct_owner_occupied,
        d.pct_renter_occupied,
        d.average_dwelling_value,

        -- Diversity index (using tenure mix as proxy - higher rental = more diverse typically)
        round(
            1 - (
                power(d.pct_owner_occupied / 100, 2) +
                power(d.pct_renter_occupied / 100, 2)
            ),
            3
        ) * 100 as tenure_diversity_index,

        -- City comparisons
        round(ca.city_avg_income::numeric, 2) as city_avg_income,
        round(ca.city_avg_age::numeric, 1) as city_avg_age,
        round(ca.city_avg_unemployment::numeric, 2) as city_avg_unemployment

    from demographics d
    left join city_avg ca on d.census_year = ca.census_year
)

select * from final
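The `tenure_diversity_index` expression above has the form of a Simpson-style diversity measure: one minus the sum of squared shares, scaled by 100. A minimal Python sketch of the same arithmetic follows, with a hypothetical function name and invented inputs. When the owner and renter percentages sum to 100, the value ranges from 0 (a single tenure type) to 50 (an even split).

```python
from typing import Optional


def tenure_diversity_index(pct_owner_occupied: Optional[float],
                           pct_renter_occupied: Optional[float]) -> Optional[float]:
    """1 minus the sum of squared tenure shares, rounded to 3 places, then scaled by 100."""
    if pct_owner_occupied is None or pct_renter_occupied is None:
        return None
    owner_share = pct_owner_occupied / 100
    renter_share = pct_renter_occupied / 100
    return round(1 - (owner_share ** 2 + renter_share ** 2), 3) * 100


if __name__ == "__main__":
    # Illustrative tenure splits.
    print(tenure_diversity_index(50, 50), tenure_diversity_index(80, 20))
```

As in the SQL, the rounding happens before the multiplication by 100, so the result carries at most one decimal place of real precision.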
93
dbt/models/marts/mart_neighbourhood_housing.sql
Normal file
@@ -0,0 +1,93 @@
|
|||||||
|
-- Mart: Neighbourhood Housing Analysis
|
||||||
|
-- Dashboard Tab: Housing
|
||||||
|
-- Grain: One row per neighbourhood per year
|
||||||
|
|
||||||
|
with housing as (
|
||||||
|
select * from {{ ref('int_neighbourhood__housing') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
rentals as (
|
||||||
|
select * from {{ ref('int_rentals__neighbourhood_allocated') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
demographics as (
|
||||||
|
select * from {{ ref('int_neighbourhood__demographics') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
-- Add year-over-year rent changes
|
||||||
|
with_yoy as (
|
||||||
|
select
|
||||||
|
h.*,
|
||||||
|
r.avg_rent_bachelor,
|
||||||
|
r.avg_rent_1bed,
|
||||||
|
r.avg_rent_3bed,
|
||||||
|
r.total_rental_units,
|
||||||
|
d.income_quintile,
|
||||||
|
|
||||||
|
-- Previous year rent for YoY calculation
|
||||||
|
lag(h.avg_rent_2bed, 1) over (
|
||||||
|
partition by h.neighbourhood_id
|
||||||
|
order by h.year
|
||||||
|
) as prev_year_rent_2bed
|
||||||
|
|
||||||
|
from housing h
|
||||||
|
left join rentals r
|
||||||
|
on h.neighbourhood_id = r.neighbourhood_id
|
||||||
|
and h.year = r.year
|
||||||
|
left join demographics d
|
||||||
|
on h.neighbourhood_id = d.neighbourhood_id
|
||||||
|
and h.year = d.census_year
|
||||||
|
),
|
||||||
|
|
||||||
|
final as (
|
||||||
|
select
|
||||||
|
neighbourhood_id,
|
||||||
|
neighbourhood_name,
|
||||||
|
geometry,
|
||||||
|
year,
|
||||||
|
|
||||||
|
-- Tenure mix
|
||||||
|
pct_owner_occupied,
|
||||||
|
pct_renter_occupied,
|
||||||
|
|
||||||
|
-- Housing values
|
||||||
|
average_dwelling_value,
|
||||||
|
median_household_income,
|
||||||
|
|
||||||
|
-- Rental metrics
|
||||||
|
avg_rent_bachelor,
|
||||||
|
avg_rent_1bed,
|
||||||
|
avg_rent_2bed,
|
||||||
|
avg_rent_3bed,
|
||||||
|
vacancy_rate,
|
||||||
|
total_rental_units,
|
||||||
|
|
||||||
|
-- Affordability
|
||||||
|
rent_to_income_pct,
|
||||||
|
is_affordable,
|
||||||
|
|
||||||
|
-- Affordability index: rent-to-income relative to the yearly city average (100 = average; above 100 = less affordable)
|
||||||
|
round(
|
||||||
|
rent_to_income_pct / nullif(
|
||||||
|
avg(rent_to_income_pct) over (partition by year),
|
||||||
|
0
|
||||||
|
) * 100,
|
||||||
|
1
|
||||||
|
) as affordability_index,
|
||||||
|
|
||||||
|
-- Year-over-year rent change
|
||||||
|
case
|
||||||
|
when prev_year_rent_2bed > 0
|
||||||
|
then round(
|
||||||
|
(avg_rent_2bed - prev_year_rent_2bed) / prev_year_rent_2bed * 100,
|
||||||
|
2
|
||||||
|
)
|
||||||
|
else null
|
||||||
|
end as rent_yoy_change_pct,
|
||||||
|
|
||||||
|
income_quintile
|
||||||
|
|
||||||
|
from with_yoy
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from final
|
||||||
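Review note: the two derived metrics in this mart (year-over-year rent change and the affordability index) can be checked by hand with a small Python sketch. The function names `yoy_change_pct` and `affordability_index` are hypothetical and exist only for illustration; the logic mirrors the SQL above under the assumption that NULL maps to `None`.

```python
from typing import Optional, Sequence


def yoy_change_pct(current: Optional[float], previous: Optional[float]) -> Optional[float]:
    """Mirror the mart's CASE: no result unless previous-year rent is positive."""
    if current is None or previous is None or previous <= 0:
        return None
    # Same order of operations as the SQL: (current - previous) / previous * 100
    return round((current - previous) / previous * 100, 2)


def affordability_index(
    value: Optional[float], yearly_values: Sequence[Optional[float]]
) -> Optional[float]:
    """Rent-to-income relative to the yearly average (100 = average)."""
    present = [v for v in yearly_values if v is not None]  # avg() skips NULLs
    if value is None or not present:
        return None
    city_avg = sum(present) / len(present)
    if city_avg == 0:  # nullif(..., 0) in the SQL
        return None
    return round(value / city_avg * 100, 1)
```

Python's `round` may differ from PostgreSQL's on exact ties, so this is a sanity check rather than a bit-for-bit reproduction.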
110
dbt/models/marts/mart_neighbourhood_overview.sql
Normal file
@@ -0,0 +1,110 @@
|
|||||||
|
-- Mart: Neighbourhood Overview with Composite Livability Score
|
||||||
|
-- Dashboard Tab: Overview
|
||||||
|
-- Grain: One row per neighbourhood per year
|
||||||
|
|
||||||
|
with demographics as (
|
||||||
|
select * from {{ ref('int_neighbourhood__demographics') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
housing as (
|
||||||
|
select * from {{ ref('int_neighbourhood__housing') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
crime as (
|
||||||
|
select * from {{ ref('int_neighbourhood__crime_summary') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
amenities as (
|
||||||
|
select * from {{ ref('int_neighbourhood__amenity_scores') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
-- Compute percentile ranks for scoring components
|
||||||
|
percentiles as (
|
||||||
|
select
|
||||||
|
d.neighbourhood_id,
|
||||||
|
d.neighbourhood_name,
|
||||||
|
d.geometry,
|
||||||
|
d.census_year as year,
|
||||||
|
d.population,
|
||||||
|
d.median_household_income,
|
||||||
|
|
||||||
|
-- Safety score: 100 minus the crime-rate percentile rank within the year (higher = safer)
|
||||||
|
case
|
||||||
|
when c.crime_rate_per_100k is not null
|
||||||
|
then 100 - percent_rank() over (
|
||||||
|
partition by d.census_year
|
||||||
|
order by c.crime_rate_per_100k
|
||||||
|
) * 100
|
||||||
|
else null
|
||||||
|
end as safety_score,
|
||||||
|
|
||||||
|
-- Affordability score: 100 minus the rent-to-income percentile rank within the year (higher = more affordable)
|
||||||
|
case
|
||||||
|
when h.rent_to_income_pct is not null
|
||||||
|
then 100 - percent_rank() over (
|
||||||
|
partition by d.census_year
|
||||||
|
order by h.rent_to_income_pct
|
||||||
|
) * 100
|
||||||
|
else null
|
||||||
|
end as affordability_score,
|
||||||
|
|
||||||
|
-- Amenity score: percentile rank of total_amenities_per_1000 within the year (higher = more amenities)
|
||||||
|
case
|
||||||
|
when a.total_amenities_per_1000 is not null
|
||||||
|
then percent_rank() over (
|
||||||
|
partition by d.census_year
|
||||||
|
order by a.total_amenities_per_1000
|
||||||
|
) * 100
|
||||||
|
else null
|
||||||
|
end as amenity_score,
|
||||||
|
|
||||||
|
-- Raw metrics for reference
|
||||||
|
c.crime_rate_per_100k,
|
||||||
|
h.rent_to_income_pct,
|
||||||
|
h.avg_rent_2bed,
|
||||||
|
a.total_amenities_per_1000
|
||||||
|
|
||||||
|
from demographics d
|
||||||
|
left join housing h
|
||||||
|
on d.neighbourhood_id = h.neighbourhood_id
|
||||||
|
and d.census_year = h.year
|
||||||
|
left join crime c
|
||||||
|
on d.neighbourhood_id = c.neighbourhood_id
|
||||||
|
and d.census_year = c.year
|
||||||
|
left join amenities a
|
||||||
|
on d.neighbourhood_id = a.neighbourhood_id
|
||||||
|
and d.census_year = a.year
|
||||||
|
),
|
||||||
|
|
||||||
|
final as (
|
||||||
|
select
|
||||||
|
neighbourhood_id,
|
||||||
|
neighbourhood_name,
|
||||||
|
geometry,
|
||||||
|
year,
|
||||||
|
population,
|
||||||
|
median_household_income,
|
||||||
|
|
||||||
|
-- Component scores (0-100)
|
||||||
|
round(safety_score::numeric, 1) as safety_score,
|
||||||
|
round(affordability_score::numeric, 1) as affordability_score,
|
||||||
|
round(amenity_score::numeric, 1) as amenity_score,
|
||||||
|
|
||||||
|
-- Composite livability score: safety (30%), affordability (40%), amenities (30%); a missing component defaults to 50
|
||||||
|
round(
|
||||||
|
(coalesce(safety_score, 50) * 0.30 +
|
||||||
|
coalesce(affordability_score, 50) * 0.40 +
|
||||||
|
coalesce(amenity_score, 50) * 0.30)::numeric,
|
||||||
|
1
|
||||||
|
) as livability_score,
|
||||||
|
|
||||||
|
-- Raw metrics
|
||||||
|
crime_rate_per_100k,
|
||||||
|
rent_to_income_pct,
|
||||||
|
avg_rent_2bed,
|
||||||
|
total_amenities_per_1000
|
||||||
|
|
||||||
|
from percentiles
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from final
|
||||||
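Review note: a minimal Python sketch of the scoring logic in this mart, useful for checking expected values by hand. The helper names `percent_rank` and `livability_score` are hypothetical. The sketch assumes PostgreSQL semantics: `percent_rank()` is `(rank - 1) / (rows - 1)` over the whole partition, and NULLs sort last in ascending order, so rows with a missing metric still count toward the denominator.

```python
from typing import Optional


def percent_rank(values: list[Optional[float]]) -> list[Optional[float]]:
    """Percentile rank in [0, 1] for each value within one partition (one year)."""
    n = len(values)
    ranks: list[Optional[float]] = []
    for v in values:
        if v is None:
            ranks.append(None)  # the CASE in the SQL returns NULL for these rows
        elif n == 1:
            ranks.append(0.0)  # a single-row partition has percent_rank 0
        else:
            smaller = sum(1 for o in values if o is not None and o < v)
            ranks.append(smaller / (n - 1))
    return ranks


def livability_score(
    safety: Optional[float], affordability: Optional[float], amenity: Optional[float]
) -> float:
    """Weighted composite; a missing component falls back to 50, as in coalesce()."""
    def or_50(x: Optional[float]) -> float:
        return 50.0 if x is None else x

    return round(or_50(safety) * 0.30 + or_50(affordability) * 0.40 + or_50(amenity) * 0.30, 1)
```

One consequence worth noting: when some neighbourhoods lack a metric in a given year, the highest-ranked neighbourhood with data scores below 100 on that component, because the NULL rows enlarge the partition.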
78
dbt/models/marts/mart_neighbourhood_safety.sql
Normal file
@@ -0,0 +1,78 @@
|
|||||||
|
-- Mart: Neighbourhood Safety Analysis
|
||||||
|
-- Dashboard Tab: Safety
|
||||||
|
-- Grain: One row per neighbourhood per year
|
||||||
|
|
||||||
|
with crime as (
|
||||||
|
select * from {{ ref('int_neighbourhood__crime_summary') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
-- City-wide averages for comparison
|
||||||
|
city_avg as (
|
||||||
|
select
|
||||||
|
year,
|
||||||
|
avg(crime_rate_per_100k) as city_avg_crime_rate,
|
||||||
|
avg(assault_count) as city_avg_assault,
|
||||||
|
avg(auto_theft_count) as city_avg_auto_theft,
|
||||||
|
avg(break_enter_count) as city_avg_break_enter
|
||||||
|
from crime
|
||||||
|
group by year
|
||||||
|
),
|
||||||
|
|
||||||
|
final as (
|
||||||
|
select
|
||||||
|
c.neighbourhood_id,
|
||||||
|
c.neighbourhood_name,
|
||||||
|
c.geometry,
|
||||||
|
c.population,
|
||||||
|
c.year,
|
||||||
|
|
||||||
|
-- Total crime
|
||||||
|
c.total_incidents,
|
||||||
|
c.crime_rate_per_100k,
|
||||||
|
c.yoy_change_pct as crime_yoy_change_pct,
|
||||||
|
|
||||||
|
-- Crime breakdown
|
||||||
|
c.assault_count,
|
||||||
|
c.auto_theft_count,
|
||||||
|
c.break_enter_count,
|
||||||
|
c.robbery_count,
|
||||||
|
c.theft_over_count,
|
||||||
|
c.homicide_count,
|
||||||
|
|
||||||
|
-- Per 100K rates by type
|
||||||
|
case when c.population > 0
|
||||||
|
then round(c.assault_count::numeric / c.population * 100000, 2)
|
||||||
|
else null
|
||||||
|
end as assault_rate_per_100k,
|
||||||
|
|
||||||
|
case when c.population > 0
|
||||||
|
then round(c.auto_theft_count::numeric / c.population * 100000, 2)
|
||||||
|
else null
|
||||||
|
end as auto_theft_rate_per_100k,
|
||||||
|
|
||||||
|
case when c.population > 0
|
||||||
|
then round(c.break_enter_count::numeric / c.population * 100000, 2)
|
||||||
|
else null
|
||||||
|
end as break_enter_rate_per_100k,
|
||||||
|
|
||||||
|
-- Comparison to city average
|
||||||
|
round(ca.city_avg_crime_rate::numeric, 2) as city_avg_crime_rate,
|
||||||
|
|
||||||
|
-- Crime index (100 = city average)
|
||||||
|
case
|
||||||
|
when ca.city_avg_crime_rate > 0
|
||||||
|
then round(c.crime_rate_per_100k / ca.city_avg_crime_rate * 100, 1)
|
||||||
|
else null
|
||||||
|
end as crime_index,
|
||||||
|
|
||||||
|
-- Safety tier: crime-rate quintile within each year (1 = highest crime rate, 5 = lowest)
|
||||||
|
ntile(5) over (
|
||||||
|
partition by c.year
|
||||||
|
order by c.crime_rate_per_100k desc
|
||||||
|
) as safety_tier
|
||||||
|
|
||||||
|
from crime c
|
||||||
|
left join city_avg ca on c.year = ca.year
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from final
|
||||||
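Review note: the direction of `safety_tier` is easy to misread, so here is a small Python sketch of how `ntile(5)` with a descending sort assigns tiers. The names `ntile_labels` and `safety_tiers` are hypothetical. The sketch assumes PostgreSQL defaults: buckets are as even as possible with the earlier buckets taking the extra rows, and NULLs sort first under `DESC`.

```python
from typing import Optional


def ntile_labels(n_rows: int, buckets: int) -> list[int]:
    """Bucket number for each row position, in sort order."""
    base, extra = divmod(n_rows, buckets)
    labels: list[int] = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)  # first `extra` buckets get one more row
        labels.extend([bucket] * size)
    return labels


def safety_tiers(rates: dict[str, Optional[float]], buckets: int = 5) -> dict[str, int]:
    """Map neighbourhood -> tier for one year, highest crime rate first."""
    # Missing rates sort first (NULLS FIRST under DESC), then rates descending.
    ordered = sorted(rates, key=lambda k: (rates[k] is not None, -(rates[k] or 0.0)))
    return dict(zip(ordered, ntile_labels(len(ordered), buckets)))
```

Under these assumptions tier 1 holds the highest crime rates, and a neighbourhood with no crime rate also lands in tier 1, which may or may not be the intended behaviour.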
@@ -41,3 +41,59 @@ sources:
|
|||||||
columns:
|
columns:
|
||||||
- name: event_id
|
- name: event_id
|
||||||
description: "Primary key"
|
description: "Primary key"
|
||||||
|
|
||||||
|
- name: fact_census
|
||||||
|
description: "Census demographics by neighbourhood and year"
|
||||||
|
columns:
|
||||||
|
- name: id
|
||||||
|
description: "Primary key"
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Foreign key to dim_neighbourhood"
|
||||||
|
- name: census_year
|
||||||
|
description: "Census year (2016, 2021, etc.)"
|
||||||
|
- name: population
|
||||||
|
description: "Total population"
|
||||||
|
- name: median_household_income
|
||||||
|
description: "Median household income"
|
||||||
|
|
||||||
|
- name: fact_crime
|
||||||
|
description: "Crime statistics by neighbourhood, year, and type"
|
||||||
|
columns:
|
||||||
|
- name: id
|
||||||
|
description: "Primary key"
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Foreign key to dim_neighbourhood"
|
||||||
|
- name: year
|
||||||
|
description: "Statistics year"
|
||||||
|
- name: crime_type
|
||||||
|
description: "Type of crime"
|
||||||
|
- name: count
|
||||||
|
description: "Number of incidents"
|
||||||
|
- name: rate_per_100k
|
||||||
|
description: "Rate per 100,000 population"
|
||||||
|
|
||||||
|
- name: fact_amenities
|
||||||
|
description: "Amenity counts by neighbourhood and type"
|
||||||
|
columns:
|
||||||
|
- name: id
|
||||||
|
description: "Primary key"
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Foreign key to dim_neighbourhood"
|
||||||
|
- name: amenity_type
|
||||||
|
description: "Type of amenity (parks, schools, transit)"
|
||||||
|
- name: count
|
||||||
|
description: "Number of amenities"
|
||||||
|
- name: year
|
||||||
|
description: "Reference year"
|
||||||
|
|
||||||
|
- name: bridge_cmhc_neighbourhood
|
||||||
|
description: "CMHC zone to neighbourhood mapping with area weights"
|
||||||
|
columns:
|
||||||
|
- name: id
|
||||||
|
description: "Primary key"
|
||||||
|
- name: cmhc_zone_code
|
||||||
|
description: "CMHC zone code"
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Neighbourhood ID"
|
||||||
|
- name: weight
|
||||||
|
description: "Proportional area weight (0-1)"
|
||||||
|
|||||||
@@ -40,3 +40,90 @@ models:
|
|||||||
tests:
|
tests:
|
||||||
- unique
|
- unique
|
||||||
- not_null
|
- not_null
|
||||||
|
|
||||||
|
- name: stg_toronto__neighbourhoods
|
||||||
|
description: "Staged Toronto neighbourhood dimension (158 official boundaries)"
|
||||||
|
columns:
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Neighbourhood primary key"
|
||||||
|
tests:
|
||||||
|
- unique
|
||||||
|
- not_null
|
||||||
|
- name: neighbourhood_name
|
||||||
|
description: "Official neighbourhood name"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
- name: geometry
|
||||||
|
description: "PostGIS geometry (POLYGON)"
|
||||||
|
|
||||||
|
- name: stg_toronto__census
|
||||||
|
description: "Staged census demographics by neighbourhood"
|
||||||
|
columns:
|
||||||
|
- name: census_id
|
||||||
|
description: "Census record identifier"
|
||||||
|
tests:
|
||||||
|
- unique
|
||||||
|
- not_null
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Neighbourhood foreign key"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
- name: census_year
|
||||||
|
description: "Census year (2016, 2021)"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
|
||||||
|
- name: stg_toronto__crime
|
||||||
|
description: "Staged crime statistics by neighbourhood"
|
||||||
|
columns:
|
||||||
|
- name: crime_id
|
||||||
|
description: "Crime record identifier"
|
||||||
|
tests:
|
||||||
|
- unique
|
||||||
|
- not_null
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Neighbourhood foreign key"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
- name: crime_type
|
||||||
|
description: "Type of crime"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
|
||||||
|
- name: stg_toronto__amenities
|
||||||
|
description: "Staged amenity counts by neighbourhood"
|
||||||
|
columns:
|
||||||
|
- name: amenity_id
|
||||||
|
description: "Amenity record identifier"
|
||||||
|
tests:
|
||||||
|
- unique
|
||||||
|
- not_null
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Neighbourhood foreign key"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
- name: amenity_type
|
||||||
|
description: "Type of amenity"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
|
||||||
|
- name: stg_cmhc__zone_crosswalk
|
||||||
|
description: "Staged CMHC zone to neighbourhood crosswalk with area weights"
|
||||||
|
columns:
|
||||||
|
- name: crosswalk_id
|
||||||
|
description: "Crosswalk record identifier"
|
||||||
|
tests:
|
||||||
|
- unique
|
||||||
|
- not_null
|
||||||
|
- name: cmhc_zone_code
|
||||||
|
description: "CMHC zone code"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
- name: neighbourhood_id
|
||||||
|
description: "Neighbourhood foreign key"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
- name: area_weight
|
||||||
|
description: "Proportional area weight (0-1)"
|
||||||
|
tests:
|
||||||
|
- not_null
|
||||||
|
|||||||
18
dbt/models/staging/stg_cmhc__zone_crosswalk.sql
Normal file
@@ -0,0 +1,18 @@
|
|||||||
|
-- Staged CMHC zone to neighbourhood crosswalk
|
||||||
|
-- Source: bridge_cmhc_neighbourhood table
|
||||||
|
-- Grain: One row per zone-neighbourhood intersection
|
||||||
|
|
||||||
|
with source as (
|
||||||
|
select * from {{ source('toronto_housing', 'bridge_cmhc_neighbourhood') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
staged as (
|
||||||
|
select
|
||||||
|
id as crosswalk_id,
|
||||||
|
cmhc_zone_code,
|
||||||
|
neighbourhood_id,
|
||||||
|
weight as area_weight
|
||||||
|
from source
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from staged
|
||||||
19
dbt/models/staging/stg_toronto__amenities.sql
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
-- Staged amenity counts by neighbourhood
|
||||||
|
-- Source: fact_amenities table
|
||||||
|
-- Grain: One row per neighbourhood per amenity type per year
|
||||||
|
|
||||||
|
with source as (
|
||||||
|
select * from {{ source('toronto_housing', 'fact_amenities') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
staged as (
|
||||||
|
select
|
||||||
|
id as amenity_id,
|
||||||
|
neighbourhood_id,
|
||||||
|
amenity_type,
|
||||||
|
count as amenity_count,
|
||||||
|
year as amenity_year
|
||||||
|
from source
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from staged
|
||||||
27
dbt/models/staging/stg_toronto__census.sql
Normal file
@@ -0,0 +1,27 @@
|
|||||||
|
-- Staged census demographics by neighbourhood
|
||||||
|
-- Source: fact_census table
|
||||||
|
-- Grain: One row per neighbourhood per census year
|
||||||
|
|
||||||
|
with source as (
|
||||||
|
select * from {{ source('toronto_housing', 'fact_census') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
staged as (
|
||||||
|
select
|
||||||
|
id as census_id,
|
||||||
|
neighbourhood_id,
|
||||||
|
census_year,
|
||||||
|
population,
|
||||||
|
population_density,
|
||||||
|
median_household_income,
|
||||||
|
average_household_income,
|
||||||
|
unemployment_rate,
|
||||||
|
pct_bachelors_or_higher,
|
||||||
|
pct_owner_occupied,
|
||||||
|
pct_renter_occupied,
|
||||||
|
median_age,
|
||||||
|
average_dwelling_value
|
||||||
|
from source
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from staged
|
||||||
20
dbt/models/staging/stg_toronto__crime.sql
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
-- Staged crime statistics by neighbourhood
|
||||||
|
-- Source: fact_crime table
|
||||||
|
-- Grain: One row per neighbourhood per year per crime type
|
||||||
|
|
||||||
|
with source as (
|
||||||
|
select * from {{ source('toronto_housing', 'fact_crime') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
staged as (
|
||||||
|
select
|
||||||
|
id as crime_id,
|
||||||
|
neighbourhood_id,
|
||||||
|
year as crime_year,
|
||||||
|
crime_type,
|
||||||
|
count as incident_count,
|
||||||
|
rate_per_100k
|
||||||
|
from source
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from staged
|
||||||
25
dbt/models/staging/stg_toronto__neighbourhoods.sql
Normal file
@@ -0,0 +1,25 @@
|
|||||||
|
-- Staged Toronto neighbourhood dimension
|
||||||
|
-- Source: dim_neighbourhood table
|
||||||
|
-- Grain: One row per neighbourhood (158 total)
|
||||||
|
|
||||||
|
with source as (
|
||||||
|
select * from {{ source('toronto_housing', 'dim_neighbourhood') }}
|
||||||
|
),
|
||||||
|
|
||||||
|
staged as (
|
||||||
|
select
|
||||||
|
neighbourhood_id,
|
||||||
|
name as neighbourhood_name,
|
||||||
|
geometry,
|
||||||
|
population,
|
||||||
|
land_area_sqkm,
|
||||||
|
pop_density_per_sqkm,
|
||||||
|
pct_bachelors_or_higher,
|
||||||
|
median_household_income,
|
||||||
|
pct_owner_occupied,
|
||||||
|
pct_renter_occupied,
|
||||||
|
census_year
|
||||||
|
from source
|
||||||
|
)
|
||||||
|
|
||||||
|
select * from staged
|
||||||
11
dbt/package-lock.yml
Normal file
@@ -0,0 +1,11 @@
|
|||||||
|
packages:
|
||||||
|
- name: dbt_utils
|
||||||
|
package: dbt-labs/dbt_utils
|
||||||
|
version: 1.3.3
|
||||||
|
- name: dbt_expectations
|
||||||
|
package: calogica/dbt_expectations
|
||||||
|
version: 0.10.4
|
||||||
|
- name: dbt_date
|
||||||
|
package: calogica/dbt_date
|
||||||
|
version: 0.10.1
|
||||||
|
sha1_hash: 51a51ab489f7b302c8745ae3c3781271816b01be
|
||||||
50
docs/project-lessons-learned/INDEX.md
Normal file
@@ -0,0 +1,50 @@
|
|||||||
|
# Project Lessons Learned
|
||||||
|
|
||||||
|
This folder contains lessons learned from sprints and development work. These lessons help prevent repeating mistakes and capture valuable insights.
|
||||||
|
|
||||||
|
**Note:** This is a temporary local backup while Wiki.js integration is being configured. Once Wiki.js is ready, lessons will be migrated there for better searchability.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Lessons Index
|
||||||
|
|
||||||
|
| Date | Sprint/Phase | Title | Tags |
|
||||||
|
|------|--------------|-------|------|
|
||||||
|
| 2026-01-16 | Phase 4 | [dbt Test Syntax Deprecation](./phase-4-dbt-test-syntax.md) | dbt, testing, yaml, deprecation |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How to Use
|
||||||
|
|
||||||
|
### When Starting a Sprint
|
||||||
|
1. Review relevant lessons in this folder before implementation
|
||||||
|
2. Search by tags or keywords to find applicable insights
|
||||||
|
3. Apply prevention strategies from past lessons
|
||||||
|
|
||||||
|
### When Closing a Sprint
|
||||||
|
1. Document any significant lessons learned
|
||||||
|
2. Use the template below
|
||||||
|
3. Add entry to the index table above
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Lesson Template
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
# [Sprint/Phase] - [Lesson Title]
|
||||||
|
|
||||||
|
## Context
|
||||||
|
[What were you trying to do?]
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
[What went wrong or what insight emerged?]
|
||||||
|
|
||||||
|
## Solution
|
||||||
|
[How did you solve it?]
|
||||||
|
|
||||||
|
## Prevention
|
||||||
|
[How can this be avoided in future sprints?]
|
||||||
|
|
||||||
|
## Tags
|
||||||
|
[Comma-separated tags for search]
|
||||||
|
```
|
||||||
38
docs/project-lessons-learned/phase-4-dbt-test-syntax.md
Normal file
@@ -0,0 +1,38 @@
|
|||||||
|
# Phase 4 - dbt Test Syntax Deprecation
|
||||||
|
|
||||||
|
## Context
|
||||||
|
Implementing dbt mart models with `accepted_values` tests for tier columns (safety_tier, income_quintile, amenity_tier) that should only contain values 1-5.
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
dbt 1.9+ introduced a deprecation warning for generic test arguments. The old syntax:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
tests:
|
||||||
|
- accepted_values:
|
||||||
|
values: [1, 2, 3, 4, 5]
|
||||||
|
```
|
||||||
|
|
||||||
|
produces this deprecation warning:
|
||||||
|
```
|
||||||
|
MissingArgumentsPropertyInGenericTestDeprecation: Arguments to generic tests should be nested under the `arguments` property.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Solution
|
||||||
|
Nest test arguments under the `arguments` property:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
tests:
|
||||||
|
- accepted_values:
|
||||||
|
arguments:
|
||||||
|
values: [1, 2, 3, 4, 5]
|
||||||
|
```
|
||||||
|
|
||||||
|
This applies to all generic tests with arguments, not just `accepted_values`.
|
||||||
|
|
||||||
|
## Prevention
|
||||||
|
- When writing dbt schema YAML files, always use the `arguments:` nesting for generic tests
|
||||||
|
- Run `dbt parse --no-partial-parse` to catch all deprecation warnings before they become errors
|
||||||
|
- Check dbt changelog when upgrading versions for breaking changes to test syntax
|
||||||
|
|
||||||
|
## Tags
|
||||||
|
dbt, testing, yaml, deprecation, syntax, schema
|
||||||
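Review note: the rewrite described in this lesson is mechanical, so it can be sketched as a small Python function operating on an already-parsed test entry (a plain string such as `"unique"`, or a one-key dict such as `{"accepted_values": {...}}`). The function name `nest_test_arguments` is hypothetical, and the set of keys treated as non-arguments (`config`, `name`, `description`) is an assumption that should be checked against the dbt documentation for the version in use.

```python
from typing import Any, Union

# Assumed top-level test properties that are NOT test arguments.
RESERVED_KEYS = {"arguments", "config", "name", "description"}


def nest_test_arguments(test: Union[str, dict[str, Any]]) -> Union[str, dict[str, Any]]:
    """Move loose generic-test arguments under an `arguments` key."""
    if isinstance(test, str):
        return test  # e.g. "unique" or "not_null": nothing to nest
    migrated: dict[str, Any] = {}
    for test_name, body in test.items():
        if not isinstance(body, dict) or "arguments" in body:
            migrated[test_name] = body  # already in the new shape
            continue
        kept = {k: v for k, v in body.items() if k in RESERVED_KEYS}
        args = {k: v for k, v in body.items() if k not in RESERVED_KEYS}
        if args:
            kept["arguments"] = args
        migrated[test_name] = kept
    return migrated
```

The function works on parsed data only; reading and writing the schema YAML files is left out so the sketch has no third-party dependencies.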
@@ -1,7 +1,15 @@
|
|||||||
"""Database loaders for Toronto housing data."""
|
"""Database loaders for Toronto housing data."""
|
||||||
|
|
||||||
|
from .amenities import load_amenities, load_amenity_counts
|
||||||
from .base import bulk_insert, get_session, upsert_by_key
|
from .base import bulk_insert, get_session, upsert_by_key
|
||||||
|
from .census import load_census_data
|
||||||
from .cmhc import load_cmhc_record, load_cmhc_rentals
|
from .cmhc import load_cmhc_record, load_cmhc_rentals
|
||||||
|
from .cmhc_crosswalk import (
|
||||||
|
build_cmhc_neighbourhood_crosswalk,
|
||||||
|
disaggregate_zone_value,
|
||||||
|
get_neighbourhood_weights_for_zone,
|
||||||
|
)
|
||||||
|
from .crime import load_crime_data
|
||||||
from .dimensions import (
|
from .dimensions import (
|
||||||
generate_date_key,
|
generate_date_key,
|
||||||
load_cmhc_zones,
|
load_cmhc_zones,
|
||||||
@@ -24,4 +32,13 @@ __all__ = [
|
|||||||
# Fact loaders
|
# Fact loaders
|
||||||
"load_cmhc_rentals",
|
"load_cmhc_rentals",
|
||||||
"load_cmhc_record",
|
"load_cmhc_record",
|
||||||
|
# Phase 3 loaders
|
||||||
|
"load_census_data",
|
||||||
|
"load_crime_data",
|
||||||
|
"load_amenities",
|
||||||
|
"load_amenity_counts",
|
||||||
|
# CMHC crosswalk
|
||||||
|
"build_cmhc_neighbourhood_crosswalk",
|
||||||
|
"get_neighbourhood_weights_for_zone",
|
||||||
|
"disaggregate_zone_value",
|
||||||
]
|
]
|
||||||
|
|||||||
93
portfolio_app/toronto/loaders/amenities.py
Normal file
@@ -0,0 +1,93 @@
|
|||||||
|
"""Loader for amenities data to fact_amenities table."""
|
||||||
|
|
||||||
|
from collections import Counter
|
||||||
|
|
||||||
|
from sqlalchemy.orm import Session
|
||||||
|
|
||||||
|
from portfolio_app.toronto.models import FactAmenities
|
||||||
|
from portfolio_app.toronto.schemas import AmenityCount, AmenityRecord
|
||||||
|
|
||||||
|
from .base import get_session, upsert_by_key
|
||||||
|
|
||||||
|
|
||||||
|
def load_amenities(
|
||||||
|
records: list[AmenityRecord],
|
||||||
|
year: int,
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> int:
|
||||||
|
"""Load amenity records to fact_amenities table.
|
||||||
|
|
||||||
|
Aggregates individual amenity records into counts by neighbourhood
|
||||||
|
and amenity type before loading.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
records: List of validated AmenityRecord schemas.
|
||||||
|
year: Year to associate with the amenity counts.
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of records loaded (inserted + updated).
|
||||||
|
"""
|
||||||
|
# Aggregate records by neighbourhood and amenity type
|
||||||
|
counts: Counter[tuple[int, str]] = Counter()
|
||||||
|
for r in records:
|
||||||
|
key = (r.neighbourhood_id, r.amenity_type.value)
|
||||||
|
counts[key] += 1
|
||||||
|
|
||||||
|
# Convert the aggregated counts directly to FactAmenities models
|
||||||
|
def _load(sess: Session) -> int:
|
||||||
|
models = []
|
||||||
|
for (neighbourhood_id, amenity_type), count in counts.items():
|
||||||
|
model = FactAmenities(
|
||||||
|
neighbourhood_id=neighbourhood_id,
|
||||||
|
amenity_type=amenity_type,
|
||||||
|
count=count,
|
||||||
|
year=year,
|
||||||
|
)
|
||||||
|
models.append(model)
|
||||||
|
|
||||||
|
inserted, updated = upsert_by_key(
|
||||||
|
sess, FactAmenities, models, ["neighbourhood_id", "amenity_type", "year"]
|
||||||
|
)
|
||||||
|
return inserted + updated
|
||||||
|
|
||||||
|
if session:
|
||||||
|
return _load(session)
|
||||||
|
with get_session() as sess:
|
||||||
|
return _load(sess)
|
||||||
|
|
||||||
|
|
||||||
|
def load_amenity_counts(
|
||||||
|
records: list[AmenityCount],
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> int:
|
||||||
|
"""Load pre-aggregated amenity counts to fact_amenities table.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
records: List of validated AmenityCount schemas.
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of records loaded (inserted + updated).
|
||||||
|
"""
|
||||||
|
|
||||||
|
def _load(sess: Session) -> int:
|
||||||
|
models = []
|
||||||
|
for r in records:
|
||||||
|
model = FactAmenities(
|
||||||
|
neighbourhood_id=r.neighbourhood_id,
|
||||||
|
amenity_type=r.amenity_type.value,
|
||||||
|
count=r.count,
|
||||||
|
year=r.year,
|
||||||
|
)
|
||||||
|
models.append(model)
|
||||||
|
|
||||||
|
inserted, updated = upsert_by_key(
|
||||||
|
sess, FactAmenities, models, ["neighbourhood_id", "amenity_type", "year"]
|
||||||
|
)
|
||||||
|
return inserted + updated
|
||||||
|
|
||||||
|
if session:
|
||||||
|
return _load(session)
|
||||||
|
with get_session() as sess:
|
||||||
|
return _load(sess)
|
||||||
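Review note: the aggregation step in `load_amenities` can be exercised without a database. The sketch below reproduces only the counting logic, using plain `(neighbourhood_id, amenity_type)` tuples in place of `AmenityRecord` objects; the function name `count_amenities` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_amenities(records: Iterable[tuple[int, str]]) -> Counter:
    """Count individual amenities per (neighbourhood_id, amenity_type) pair."""
    counts: Counter = Counter()
    for neighbourhood_id, amenity_type in records:
        counts[(neighbourhood_id, amenity_type)] += 1
    return counts


# Three amenities in neighbourhood 1, one in neighbourhood 2.
sample = [(1, "park"), (1, "park"), (1, "school"), (2, "park")]
counts = count_amenities(sample)
```

Each key in `counts` corresponds to one row that the loader would upsert on `neighbourhood_id`, `amenity_type` and `year`.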
68
portfolio_app/toronto/loaders/census.py
Normal file
@@ -0,0 +1,68 @@
|
|||||||
|
"""Loader for census data to fact_census table."""
|
||||||
|
|
||||||
|
from sqlalchemy.orm import Session
|
||||||
|
|
||||||
|
from portfolio_app.toronto.models import FactCensus
|
||||||
|
from portfolio_app.toronto.schemas import CensusRecord
|
||||||
|
|
||||||
|
from .base import get_session, upsert_by_key
|
||||||
|
|
||||||
|
|
||||||
|
def load_census_data(
|
||||||
|
records: list[CensusRecord],
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> int:
|
||||||
|
"""Load census records to fact_census table.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
records: List of validated CensusRecord schemas.
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of records loaded (inserted + updated).
|
||||||
|
"""
|
||||||
|
|
||||||
|
def _load(sess: Session) -> int:
|
||||||
|
models = []
|
||||||
|
for r in records:
|
||||||
|
model = FactCensus(
|
||||||
|
neighbourhood_id=r.neighbourhood_id,
|
||||||
|
census_year=r.census_year,
|
||||||
|
population=r.population,
|
||||||
|
population_density=float(r.population_density)
|
||||||
|
if r.population_density is not None
|
||||||
|
else None,
|
||||||
|
median_household_income=float(r.median_household_income)
|
||||||
|
if r.median_household_income is not None
|
||||||
|
else None,
|
||||||
|
average_household_income=float(r.average_household_income)
|
||||||
|
if r.average_household_income is not None
|
||||||
|
else None,
|
||||||
|
unemployment_rate=float(r.unemployment_rate)
|
||||||
|
if r.unemployment_rate is not None
|
||||||
|
else None,
|
||||||
|
pct_bachelors_or_higher=float(r.pct_bachelors_or_higher)
|
||||||
|
if r.pct_bachelors_or_higher is not None
|
||||||
|
else None,
|
||||||
|
pct_owner_occupied=float(r.pct_owner_occupied)
|
||||||
|
if r.pct_owner_occupied is not None
|
||||||
|
else None,
|
||||||
|
pct_renter_occupied=float(r.pct_renter_occupied)
|
||||||
|
if r.pct_renter_occupied is not None
|
||||||
|
else None,
|
||||||
|
median_age=float(r.median_age) if r.median_age is not None else None,
|
||||||
|
average_dwelling_value=float(r.average_dwelling_value)
|
||||||
|
if r.average_dwelling_value is not None
|
||||||
|
else None,
|
||||||
|
)
|
||||||
|
models.append(model)
|
||||||
|
|
||||||
|
inserted, updated = upsert_by_key(
|
||||||
|
sess, FactCensus, models, ["neighbourhood_id", "census_year"]
|
||||||
|
)
|
||||||
|
return inserted + updated
|
||||||
|
|
||||||
|
if session:
|
||||||
|
return _load(session)
|
||||||
|
with get_session() as sess:
|
||||||
|
return _load(sess)
|
||||||
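Review note: the loader converts many optional numeric fields to `float`, and the handling of zero is the detail to watch. A truthiness check treats `0` and `Decimal("0")` as missing, whereas an explicit `is not None` check keeps them. A minimal sketch of a shared helper, with the hypothetical name `to_float_or_none`, assuming the schema fields are `Decimal | None` or similar:

```python
from decimal import Decimal
from typing import Optional, Union

Number = Union[int, float, Decimal]


def to_float_or_none(value: Optional[Number]) -> Optional[float]:
    """Convert an optional numeric value to float, preserving legitimate zeros."""
    return float(value) if value is not None else None
```

A helper like this would also shorten each keyword argument in the model constructor to a single call, for example `unemployment_rate=to_float_or_none(r.unemployment_rate)`.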
131
portfolio_app/toronto/loaders/cmhc_crosswalk.py
Normal file
@@ -0,0 +1,131 @@
|
|||||||
|
"""Loader for CMHC zone to neighbourhood crosswalk with area weights."""
|
||||||
|
|
||||||
|
from sqlalchemy import text
|
||||||
|
from sqlalchemy.orm import Session
|
||||||
|
|
||||||
|
from .base import get_session
|
||||||
|
|
||||||
|
|
||||||
|
def build_cmhc_neighbourhood_crosswalk(
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> int:
|
||||||
|
"""Calculate area overlap weights between CMHC zones and neighbourhoods.
|
||||||
|
|
||||||
|
Uses PostGIS ST_Intersection and ST_Area functions to compute the
|
||||||
|
proportion of each CMHC zone that overlaps with each neighbourhood.
|
||||||
|
This enables disaggregation of CMHC zone-level data to neighbourhood level.
|
||||||
|
|
||||||
|
The function is idempotent - it clears existing crosswalk data before
|
||||||
|
rebuilding.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of bridge records created.
|
||||||
|
|
||||||
|
Note:
|
||||||
|
Requires both dim_cmhc_zone and dim_neighbourhood tables to have
|
||||||
|
geometry columns populated with valid PostGIS geometries.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def _build(sess: Session) -> int:
|
||||||
|
# Clear existing crosswalk data
|
||||||
|
sess.execute(text("DELETE FROM bridge_cmhc_neighbourhood"))
|
||||||
|
|
||||||
|
# Calculate overlap weights using PostGIS
|
||||||
|
# Weight = area of intersection / total area of CMHC zone
|
||||||
|
crosswalk_query = text(
|
||||||
|
"""
|
||||||
|
INSERT INTO bridge_cmhc_neighbourhood (cmhc_zone_code, neighbourhood_id, weight)
|
||||||
|
SELECT
|
||||||
|
z.zone_code,
|
||||||
|
n.neighbourhood_id,
|
||||||
|
CASE
|
||||||
|
WHEN ST_Area(z.geometry::geography) > 0 THEN
|
||||||
|
ST_Area(ST_Intersection(z.geometry, n.geometry)::geography) /
|
||||||
|
ST_Area(z.geometry::geography)
|
||||||
|
ELSE 0
|
||||||
|
END as weight
|
||||||
|
FROM dim_cmhc_zone z
|
||||||
|
JOIN dim_neighbourhood n
|
||||||
|
ON ST_Intersects(z.geometry, n.geometry)
|
||||||
|
WHERE
|
||||||
|
z.geometry IS NOT NULL
|
||||||
|
AND n.geometry IS NOT NULL
|
||||||
|
AND ST_Area(ST_Intersection(z.geometry, n.geometry)::geography) > 0
|
||||||
|
"""
|
||||||
|
)
|
||||||
|
|
||||||
|
sess.execute(crosswalk_query)
|
||||||
|
|
||||||
|
# Count records created
|
||||||
|
count_result = sess.execute(
|
||||||
|
text("SELECT COUNT(*) FROM bridge_cmhc_neighbourhood")
|
||||||
|
)
|
||||||
|
count = count_result.scalar() or 0
|
||||||
|
|
||||||
|
return int(count)
|
||||||
|
|
||||||
|
if session:
|
||||||
|
return _build(session)
|
||||||
|
with get_session() as sess:
|
||||||
|
return _build(sess)
|
||||||
|
|
||||||
|
|
||||||
|
def get_neighbourhood_weights_for_zone(
|
||||||
|
zone_code: str,
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> list[tuple[int, float]]:
|
||||||
|
"""Get neighbourhood weights for a specific CMHC zone.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
zone_code: CMHC zone code.
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of (neighbourhood_id, weight) tuples.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def _get(sess: Session) -> list[tuple[int, float]]:
|
||||||
|
result = sess.execute(
|
||||||
|
text(
|
||||||
|
"""
|
||||||
|
SELECT neighbourhood_id, weight
|
||||||
|
FROM bridge_cmhc_neighbourhood
|
||||||
|
WHERE cmhc_zone_code = :zone_code
|
||||||
|
ORDER BY weight DESC
|
||||||
|
"""
|
||||||
|
),
|
||||||
|
{"zone_code": zone_code},
|
||||||
|
)
|
||||||
|
return [(int(row[0]), float(row[1])) for row in result]
|
||||||
|
|
||||||
|
if session:
|
||||||
|
return _get(session)
|
||||||
|
with get_session() as sess:
|
||||||
|
return _get(sess)
|
||||||
|
|
||||||
|
|
||||||
|
def disaggregate_zone_value(
|
||||||
|
zone_code: str,
|
||||||
|
value: float,
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> dict[int, float]:
|
||||||
|
"""Disaggregate a CMHC zone value to neighbourhoods using weights.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
zone_code: CMHC zone code.
|
||||||
|
value: Value to disaggregate (e.g., average rent).
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Dictionary mapping neighbourhood_id to weighted value.
|
||||||
|
|
||||||
|
Note:
|
||||||
|
For averages (like rent), the weighted value represents the
|
||||||
|
contribution from this zone. To get a neighbourhood's total,
|
||||||
|
sum contributions from all overlapping zones.
|
||||||
|
"""
|
||||||
|
weights = get_neighbourhood_weights_for_zone(zone_code, session)
|
||||||
|
return {neighbourhood_id: value * weight for neighbourhood_id, weight in weights}
|
||||||
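The note in `disaggregate_zone_value` says a neighbourhood's total is the sum of the contributions from every overlapping zone. A minimal sketch of that aggregation follows, using in-memory weights instead of the `bridge_cmhc_neighbourhood` table. The helpers `disaggregate` and `combine_zones`, the zone codes, and the sample weights are hypothetical and only illustrate the arithmetic.

```python
from collections import defaultdict

# Hypothetical stand-in for rows of bridge_cmhc_neighbourhood:
# zone code -> list of (neighbourhood_id, weight) pairs.
WEIGHTS: dict[str, list[tuple[int, float]]] = {
    "Z01": [(1, 0.75), (2, 0.25)],
    "Z02": [(2, 0.5), (3, 0.5)],
}


def disaggregate(zone_code: str, value: float) -> dict[int, float]:
    """Mirror disaggregate_zone_value: scale the zone value by each weight."""
    return {nid: value * w for nid, w in WEIGHTS.get(zone_code, [])}


def combine_zones(zone_values: dict[str, float]) -> dict[int, float]:
    """Sum the per-zone contributions for every neighbourhood."""
    totals: dict[int, float] = defaultdict(float)
    for zone_code, value in zone_values.items():
        for nid, contribution in disaggregate(zone_code, value).items():
            totals[nid] += contribution
    return dict(totals)


totals = combine_zones({"Z01": 2000.0, "Z02": 1000.0})
print(totals)
```

Whether the summed figure is meaningful for an average such as rent depends on how the weights are normalised, which this excerpt does not show.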
45
portfolio_app/toronto/loaders/crime.py
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
"""Loader for crime data to fact_crime table."""
|
||||||
|
|
||||||
|
from sqlalchemy.orm import Session
|
||||||
|
|
||||||
|
from portfolio_app.toronto.models import FactCrime
|
||||||
|
from portfolio_app.toronto.schemas import CrimeRecord
|
||||||
|
|
||||||
|
from .base import get_session, upsert_by_key
|
||||||
|
|
||||||
|
|
||||||
|
def load_crime_data(
|
||||||
|
records: list[CrimeRecord],
|
||||||
|
session: Session | None = None,
|
||||||
|
) -> int:
|
||||||
|
"""Load crime records to fact_crime table.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
records: List of validated CrimeRecord schemas.
|
||||||
|
session: Optional existing session.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of records loaded (inserted + updated).
|
||||||
|
"""
|
||||||
|
|
||||||
|
def _load(sess: Session) -> int:
|
||||||
|
models = []
|
||||||
|
for r in records:
|
||||||
|
model = FactCrime(
|
||||||
|
neighbourhood_id=r.neighbourhood_id,
|
||||||
|
year=r.year,
|
||||||
|
crime_type=r.crime_type.value,
|
||||||
|
count=r.count,
|
||||||
|
rate_per_100k=float(r.rate_per_100k) if r.rate_per_100k is not None else None,
|
||||||
|
)
|
||||||
|
models.append(model)
|
||||||
|
|
||||||
|
inserted, updated = upsert_by_key(
|
||||||
|
sess, FactCrime, models, ["neighbourhood_id", "year", "crime_type"]
|
||||||
|
)
|
||||||
|
return inserted + updated
|
||||||
|
|
||||||
|
if session:
|
||||||
|
return _load(session)
|
||||||
|
with get_session() as sess:
|
||||||
|
return _load(sess)
|
||||||
@@ -7,7 +7,13 @@ from .dimensions import (
     DimPolicyEvent,
     DimTime,
 )
-from .facts import FactRentals
+from .facts import (
+    BridgeCMHCNeighbourhood,
+    FactAmenities,
+    FactCensus,
+    FactCrime,
+    FactRentals,
+)
 
 __all__ = [
     # Base
@@ -22,4 +28,9 @@ __all__ = [
     "DimPolicyEvent",
     # Facts
     "FactRentals",
+    "FactCensus",
+    "FactCrime",
+    "FactAmenities",
+    # Bridge tables
+    "BridgeCMHCNeighbourhood",
 ]
|
|||||||
@@ -1,11 +1,117 @@
|
|||||||
"""SQLAlchemy models for fact tables."""
|
"""SQLAlchemy models for fact tables."""
|
||||||
|
|
||||||
from sqlalchemy import ForeignKey, Integer, Numeric, String
|
from sqlalchemy import ForeignKey, Index, Integer, Numeric, String
|
||||||
from sqlalchemy.orm import Mapped, mapped_column, relationship
|
from sqlalchemy.orm import Mapped, mapped_column, relationship
|
||||||
|
|
||||||
from .base import Base
|
from .base import Base
|
||||||
|
|
||||||
|
|
||||||
|
class BridgeCMHCNeighbourhood(Base):
|
||||||
|
"""Bridge table for CMHC zone to neighbourhood mapping with area weights.
|
||||||
|
|
||||||
|
Enables disaggregation of CMHC zone-level rental data to neighbourhood level
|
||||||
|
using area-based proportional weights computed via PostGIS.
|
||||||
|
"""
|
||||||
|
|
||||||
|
__tablename__ = "bridge_cmhc_neighbourhood"
|
||||||
|
|
||||||
|
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
|
||||||
|
cmhc_zone_code: Mapped[str] = mapped_column(String(10), nullable=False)
|
||||||
|
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
weight: Mapped[float] = mapped_column(
|
||||||
|
Numeric(5, 4), nullable=False
|
||||||
|
) # 0.0000 to 1.0000
|
||||||
|
|
||||||
|
__table_args__ = (
|
||||||
|
Index("ix_bridge_cmhc_zone", "cmhc_zone_code"),
|
||||||
|
Index("ix_bridge_neighbourhood", "neighbourhood_id"),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class FactCensus(Base):
|
||||||
|
"""Census statistics by neighbourhood and year.
|
||||||
|
|
||||||
|
Grain: One row per neighbourhood per census year.
|
||||||
|
"""
|
||||||
|
|
||||||
|
__tablename__ = "fact_census"
|
||||||
|
|
||||||
|
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
|
||||||
|
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
census_year: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
population: Mapped[int | None] = mapped_column(Integer, nullable=True)
|
||||||
|
population_density: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(10, 2), nullable=True
|
||||||
|
)
|
||||||
|
median_household_income: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(12, 2), nullable=True
|
||||||
|
)
|
||||||
|
average_household_income: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(12, 2), nullable=True
|
||||||
|
)
|
||||||
|
unemployment_rate: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(5, 2), nullable=True
|
||||||
|
)
|
||||||
|
pct_bachelors_or_higher: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(5, 2), nullable=True
|
||||||
|
)
|
||||||
|
pct_owner_occupied: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(5, 2), nullable=True
|
||||||
|
)
|
||||||
|
pct_renter_occupied: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(5, 2), nullable=True
|
||||||
|
)
|
||||||
|
median_age: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
|
||||||
|
average_dwelling_value: Mapped[float | None] = mapped_column(
|
||||||
|
Numeric(12, 2), nullable=True
|
||||||
|
)
|
||||||
|
|
||||||
|
__table_args__ = (
|
||||||
|
Index("ix_fact_census_neighbourhood_year", "neighbourhood_id", "census_year"),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class FactCrime(Base):
|
||||||
|
"""Crime statistics by neighbourhood and year.
|
||||||
|
|
||||||
|
Grain: One row per neighbourhood per year per crime type.
|
||||||
|
"""
|
||||||
|
|
||||||
|
__tablename__ = "fact_crime"
|
||||||
|
|
||||||
|
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
|
||||||
|
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
year: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
crime_type: Mapped[str] = mapped_column(String(50), nullable=False)
|
||||||
|
count: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
rate_per_100k: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
|
||||||
|
|
||||||
|
__table_args__ = (
|
||||||
|
Index("ix_fact_crime_neighbourhood_year", "neighbourhood_id", "year"),
|
||||||
|
Index("ix_fact_crime_type", "crime_type"),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class FactAmenities(Base):
|
||||||
|
"""Amenity counts by neighbourhood.
|
||||||
|
|
||||||
|
Grain: One row per neighbourhood per amenity type per year.
|
||||||
|
"""
|
||||||
|
|
||||||
|
__tablename__ = "fact_amenities"
|
||||||
|
|
||||||
|
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
|
||||||
|
neighbourhood_id: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
amenity_type: Mapped[str] = mapped_column(String(50), nullable=False)
|
||||||
|
count: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
year: Mapped[int] = mapped_column(Integer, nullable=False)
|
||||||
|
|
||||||
|
__table_args__ = (
|
||||||
|
Index("ix_fact_amenities_neighbourhood_year", "neighbourhood_id", "year"),
|
||||||
|
Index("ix_fact_amenities_type", "amenity_type"),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class FactRentals(Base):
|
class FactRentals(Base):
|
||||||
"""Fact table for CMHC rental market data.
|
"""Fact table for CMHC rental market data.
|
||||||
|
|
||||||
|
|||||||
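`BridgeCMHCNeighbourhood.weight` is declared as `Numeric(5, 4)`, so stored weights carry four decimal places. A rough sketch of rounding an area ratio to that scale is shown below. `to_weight` is a hypothetical helper, and `total_area` stands for whichever denominator the crosswalk query uses, which this excerpt does not show.

```python
from decimal import ROUND_HALF_UP, Decimal

FOUR_PLACES = Decimal("0.0001")  # matches the scale of Numeric(5, 4)


def to_weight(intersection_area: float, total_area: float) -> Decimal:
    """Return intersection/total rounded to four decimal places."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    ratio = Decimal(str(intersection_area)) / Decimal(str(total_area))
    return ratio.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


# Three equal slices of one area: each rounds to 0.3333.
weights = [to_weight(1.0, 3.0) for _ in range(3)]
print(weights, sum(weights))
```

Because each weight is rounded independently, the stored weights for one area may sum to slightly less or more than 1, so downstream checks should compare against a tolerance rather than exact equality.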
@@ -6,6 +6,8 @@ from .geo import (
     NeighbourhoodParser,
     load_geojson,
 )
+from .toronto_open_data import TorontoOpenDataParser
+from .toronto_police import TorontoPoliceParser
 
 __all__ = [
     "CMHCParser",
@@ -13,4 +15,7 @@ __all__ = [
     "CMHCZoneParser",
     "NeighbourhoodParser",
     "load_geojson",
+    # API parsers (Phase 3)
+    "TorontoOpenDataParser",
+    "TorontoPoliceParser",
 ]
|
|||||||
391
portfolio_app/toronto/parsers/toronto_open_data.py
Normal file
@@ -0,0 +1,391 @@
|
|||||||
|
"""Parser for Toronto Open Data CKAN API.
|
||||||
|
|
||||||
|
Fetches neighbourhood boundaries, census profiles, and amenities data
|
||||||
|
from the City of Toronto's Open Data Portal.
|
||||||
|
|
||||||
|
API Documentation: https://open.toronto.ca/dataset/
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
from decimal import Decimal
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from portfolio_app.toronto.schemas import (
|
||||||
|
AmenityRecord,
|
||||||
|
AmenityType,
|
||||||
|
CensusRecord,
|
||||||
|
NeighbourhoodRecord,
|
||||||
|
)
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class TorontoOpenDataParser:
|
||||||
|
"""Parser for Toronto Open Data CKAN API.
|
||||||
|
|
||||||
|
Provides methods to fetch and parse neighbourhood boundaries, census profiles,
|
||||||
|
and amenities (parks, schools, childcare) from the Toronto Open Data portal.
|
||||||
|
"""
|
||||||
|
|
||||||
|
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
|
||||||
|
API_PATH = "/api/3/action"
|
||||||
|
|
||||||
|
# Dataset package IDs
|
||||||
|
DATASETS = {
|
||||||
|
"neighbourhoods": "neighbourhoods",
|
||||||
|
"neighbourhood_profiles": "neighbourhood-profiles",
|
||||||
|
"parks": "parks",
|
||||||
|
"schools": "school-locations-all-types",
|
||||||
|
"childcare": "licensed-child-care-centres",
|
||||||
|
}
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
cache_dir: Path | None = None,
|
||||||
|
timeout: float = 30.0,
|
||||||
|
) -> None:
|
||||||
|
"""Initialize parser.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
cache_dir: Optional directory for caching API responses.
|
||||||
|
timeout: HTTP request timeout in seconds.
|
||||||
|
"""
|
||||||
|
self._cache_dir = cache_dir
|
||||||
|
self._timeout = timeout
|
||||||
|
self._client: httpx.Client | None = None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def client(self) -> httpx.Client:
|
||||||
|
"""Lazy-initialize HTTP client."""
|
||||||
|
if self._client is None:
|
||||||
|
self._client = httpx.Client(
|
||||||
|
base_url=self.BASE_URL,
|
||||||
|
timeout=self._timeout,
|
||||||
|
headers={"Accept": "application/json"},
|
||||||
|
)
|
||||||
|
return self._client
|
||||||
|
|
||||||
|
def close(self) -> None:
|
||||||
|
"""Close HTTP client."""
|
||||||
|
if self._client is not None:
|
||||||
|
self._client.close()
|
||||||
|
self._client = None
|
||||||
|
|
||||||
|
def __enter__(self) -> "TorontoOpenDataParser":
|
||||||
|
return self
|
||||||
|
|
||||||
|
def __exit__(self, *args: Any) -> None:
|
||||||
|
self.close()
|
||||||
|
|
||||||
|
def _get_package(self, package_id: str) -> dict[str, Any]:
|
||||||
|
"""Fetch package metadata from CKAN.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
package_id: The package/dataset ID.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Package metadata dictionary.
|
||||||
|
"""
|
||||||
|
response = self.client.get(
|
||||||
|
f"{self.API_PATH}/package_show",
|
||||||
|
params={"id": package_id},
|
||||||
|
)
|
||||||
|
response.raise_for_status()
|
||||||
|
result = response.json()
|
||||||
|
|
||||||
|
if not result.get("success"):
|
||||||
|
raise ValueError(f"CKAN API error: {result.get('error', 'Unknown error')}")
|
||||||
|
|
||||||
|
return dict(result["result"])
|
||||||
|
|
||||||
|
def _get_resource_url(
|
||||||
|
self,
|
||||||
|
package_id: str,
|
||||||
|
format_filter: str = "geojson",
|
||||||
|
) -> str:
|
||||||
|
"""Get the download URL for a resource in a package.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
package_id: The package/dataset ID.
|
||||||
|
format_filter: Resource format to filter by (e.g., 'geojson', 'csv').
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Resource download URL.
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
ValueError: If no matching resource is found.
|
||||||
|
"""
|
||||||
|
package = self._get_package(package_id)
|
||||||
|
resources = package.get("resources", [])
|
||||||
|
|
||||||
|
for resource in resources:
|
||||||
|
resource_format = resource.get("format", "").lower()
|
||||||
|
if format_filter.lower() in resource_format:
|
||||||
|
return str(resource["url"])
|
||||||
|
|
||||||
|
available = [r.get("format") for r in resources]
|
||||||
|
raise ValueError(
|
||||||
|
f"No {format_filter} resource in {package_id}. Available: {available}"
|
||||||
|
)
|
||||||
|
|
||||||
|
def _fetch_geojson(self, package_id: str) -> dict[str, Any]:
|
||||||
|
"""Fetch GeoJSON data from a package.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
package_id: The package/dataset ID.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
GeoJSON FeatureCollection.
|
||||||
|
"""
|
||||||
|
# Check cache first
|
||||||
|
if self._cache_dir:
|
||||||
|
cache_file = self._cache_dir / f"{package_id}.geojson"
|
||||||
|
if cache_file.exists():
|
||||||
|
logger.debug(f"Loading {package_id} from cache")
|
||||||
|
with open(cache_file, encoding="utf-8") as f:
|
||||||
|
return dict(json.load(f))
|
||||||
|
|
||||||
|
url = self._get_resource_url(package_id, format_filter="geojson")
|
||||||
|
logger.info(f"Fetching GeoJSON from {url}")
|
||||||
|
|
||||||
|
response = self.client.get(url)
|
||||||
|
response.raise_for_status()
|
||||||
|
data = response.json()
|
||||||
|
|
||||||
|
# Cache the response
|
||||||
|
if self._cache_dir:
|
||||||
|
self._cache_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
cache_file = self._cache_dir / f"{package_id}.geojson"
|
||||||
|
with open(cache_file, "w", encoding="utf-8") as f:
|
||||||
|
json.dump(data, f)
|
||||||
|
|
||||||
|
return dict(data)
|
||||||
|
|
||||||
|
def _fetch_csv_as_json(self, package_id: str) -> list[dict[str, Any]]:
|
||||||
|
"""Fetch CSV data as JSON records via CKAN datastore.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
package_id: The package/dataset ID.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of records as dictionaries.
|
||||||
|
"""
|
||||||
|
package = self._get_package(package_id)
|
||||||
|
resources = package.get("resources", [])
|
||||||
|
|
||||||
|
# Find a datastore-enabled resource
|
||||||
|
for resource in resources:
|
||||||
|
if resource.get("datastore_active"):
|
||||||
|
resource_id = resource["id"]
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
raise ValueError(f"No datastore resource in {package_id}")
|
||||||
|
|
||||||
|
# Fetch all records via datastore_search
|
||||||
|
records: list[dict[str, Any]] = []
|
||||||
|
offset = 0
|
||||||
|
limit = 1000
|
||||||
|
|
||||||
|
while True:
|
||||||
|
response = self.client.get(
|
||||||
|
f"{self.API_PATH}/datastore_search",
|
||||||
|
params={"id": resource_id, "limit": limit, "offset": offset},
|
||||||
|
)
|
||||||
|
response.raise_for_status()
|
||||||
|
result = response.json()
|
||||||
|
|
||||||
|
if not result.get("success"):
|
||||||
|
raise ValueError(f"Datastore error: {result.get('error')}")
|
||||||
|
|
||||||
|
batch = result["result"]["records"]
|
||||||
|
records.extend(batch)
|
||||||
|
|
||||||
|
if len(batch) < limit:
|
||||||
|
break
|
||||||
|
offset += limit
|
||||||
|
|
||||||
|
return records
|
||||||
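`_fetch_csv_as_json` pages through `datastore_search` with `limit` and `offset` until a short batch arrives. The same loop is sketched below against an in-memory list so it runs without network access. `fetch_all`, `fake_page`, and `ROWS` are hypothetical stand-ins for the method, the HTTP call, and the remote data.

```python
from collections.abc import Callable
from typing import Any

Record = dict[str, Any]


def fetch_all(
    fetch_page: Callable[[int, int], list[Record]],
    limit: int = 1000,
) -> list[Record]:
    """Collect every record by advancing the offset until a short page."""
    records: list[Record] = []
    offset = 0
    while True:
        batch = fetch_page(offset, limit)
        records.extend(batch)
        if len(batch) < limit:  # short (or empty) page: nothing left
            break
        offset += limit
    return records


# Fake datastore with 2500 rows, served in slices like datastore_search.
ROWS: list[Record] = [{"_id": i} for i in range(2500)]
calls: list[int] = []


def fake_page(offset: int, limit: int) -> list[Record]:
    calls.append(offset)
    return ROWS[offset : offset + limit]


result = fetch_all(fake_page)
print(len(result), calls)
```

When the row count is an exact multiple of `limit`, this termination rule costs one extra request that returns an empty page.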
|
|
||||||
|
def get_neighbourhoods(self) -> list[NeighbourhoodRecord]:
|
||||||
|
"""Fetch 158 Toronto neighbourhood boundaries.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of validated NeighbourhoodRecord objects.
|
||||||
|
"""
|
||||||
|
geojson = self._fetch_geojson(self.DATASETS["neighbourhoods"])
|
||||||
|
features = geojson.get("features", [])
|
||||||
|
|
||||||
|
records = []
|
||||||
|
for feature in features:
|
||||||
|
props = feature.get("properties", {})
|
||||||
|
geometry = feature.get("geometry")
|
||||||
|
|
||||||
|
# Extract area_id from various possible property names
|
||||||
|
area_id = props.get("AREA_ID") or props.get("area_id")
|
||||||
|
if area_id is None:
|
||||||
|
# Try AREA_SHORT_CODE as fallback
|
||||||
|
short_code = props.get("AREA_SHORT_CODE", "")
|
||||||
|
if short_code:
|
||||||
|
# Extract numeric part
|
||||||
|
area_id = int("".join(c for c in short_code if c.isdigit()) or "0")
|
||||||
|
|
||||||
|
area_name = (
|
||||||
|
props.get("AREA_NAME")
|
||||||
|
or props.get("area_name")
|
||||||
|
or f"Neighbourhood {area_id}"
|
||||||
|
)
|
||||||
|
area_short_code = props.get("AREA_SHORT_CODE") or props.get(
|
||||||
|
"area_short_code"
|
||||||
|
)
|
||||||
|
|
||||||
|
records.append(
|
||||||
|
NeighbourhoodRecord(
|
||||||
|
area_id=int(area_id),
|
||||||
|
area_name=str(area_name),
|
||||||
|
area_short_code=area_short_code,
|
||||||
|
geometry=geometry,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"Parsed {len(records)} neighbourhoods")
|
||||||
|
return records
|
||||||
|
|
||||||
|
def get_census_profiles(self, year: int = 2021) -> list[CensusRecord]:
|
||||||
|
"""Fetch neighbourhood census profiles.
|
||||||
|
|
||||||
|
Note: Census profile data structure varies by year. This method
|
||||||
|
extracts key demographic indicators where available.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
year: Census year (2016 or 2021).
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of validated CensusRecord objects.
|
||||||
|
"""
|
||||||
|
# Census profiles are typically in CSV/datastore format
|
||||||
|
try:
|
||||||
|
raw_records = self._fetch_csv_as_json(
|
||||||
|
self.DATASETS["neighbourhood_profiles"]
|
||||||
|
)
|
||||||
|
except ValueError as e:
|
||||||
|
logger.warning(f"Could not fetch census profiles: {e}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
# Census profiles are pivoted - rows are indicators, columns are neighbourhoods
|
||||||
|
# This requires special handling based on the actual data structure
|
||||||
|
logger.info(f"Fetched {len(raw_records)} census profile rows")
|
||||||
|
|
||||||
|
# For now, return empty list - actual implementation depends on data structure
|
||||||
|
# TODO: Implement census profile parsing based on actual data format
|
||||||
|
return []
|
||||||
|
|
||||||
|
def get_parks(self) -> list[AmenityRecord]:
|
||||||
|
"""Fetch park locations.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of validated AmenityRecord objects.
|
||||||
|
"""
|
||||||
|
return self._fetch_amenities(
|
||||||
|
self.DATASETS["parks"],
|
||||||
|
AmenityType.PARK,
|
||||||
|
name_field="ASSET_NAME",
|
||||||
|
address_field="ADDRESS_FULL",
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_schools(self) -> list[AmenityRecord]:
|
||||||
|
"""Fetch school locations.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of validated AmenityRecord objects.
|
||||||
|
"""
|
||||||
|
return self._fetch_amenities(
|
||||||
|
self.DATASETS["schools"],
|
||||||
|
AmenityType.SCHOOL,
|
||||||
|
name_field="NAME",
|
||||||
|
address_field="ADDRESS_FULL",
|
||||||
|
)
|
||||||
|
|
||||||
|
def get_childcare_centres(self) -> list[AmenityRecord]:
|
||||||
|
"""Fetch licensed childcare centre locations.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of validated AmenityRecord objects.
|
||||||
|
"""
|
||||||
|
return self._fetch_amenities(
|
||||||
|
self.DATASETS["childcare"],
|
||||||
|
AmenityType.CHILDCARE,
|
||||||
|
name_field="LOC_NAME",
|
||||||
|
address_field="ADDRESS",
|
||||||
|
)
|
||||||
|
|
||||||
|
def _fetch_amenities(
|
||||||
|
self,
|
||||||
|
package_id: str,
|
||||||
|
amenity_type: AmenityType,
|
||||||
|
name_field: str,
|
||||||
|
address_field: str,
|
||||||
|
) -> list[AmenityRecord]:
|
||||||
|
"""Fetch and parse amenity data from GeoJSON.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
package_id: CKAN package ID.
|
||||||
|
amenity_type: Type of amenity.
|
||||||
|
name_field: Property name containing amenity name.
|
||||||
|
address_field: Property name containing address.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of AmenityRecord objects.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
geojson = self._fetch_geojson(package_id)
|
||||||
|
except (httpx.HTTPError, ValueError) as e:
|
||||||
|
logger.warning(f"Could not fetch {package_id}: {e}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
features = geojson.get("features", [])
|
||||||
|
records = []
|
||||||
|
|
||||||
|
for feature in features:
|
||||||
|
props = feature.get("properties", {})
|
||||||
|
geometry = feature.get("geometry")
|
||||||
|
|
||||||
|
# Get coordinates from geometry
|
||||||
|
lat, lon = None, None
|
||||||
|
if geometry and geometry.get("type") == "Point":
|
||||||
|
coords = geometry.get("coordinates", [])
|
||||||
|
if len(coords) >= 2:
|
||||||
|
lon, lat = coords[0], coords[1]
|
||||||
|
|
||||||
|
# Try to determine neighbourhood_id
|
||||||
|
# Many datasets include AREA_ID or similar
|
||||||
|
neighbourhood_id = (
|
||||||
|
props.get("AREA_ID")
|
||||||
|
or props.get("area_id")
|
||||||
|
or props.get("NEIGHBOURHOOD_ID")
|
||||||
|
or 0 # Will need spatial join if not available
|
||||||
|
)
|
||||||
|
|
||||||
|
name = props.get(name_field) or props.get(name_field.lower()) or "Unknown"
|
||||||
|
address = props.get(address_field) or props.get(address_field.lower())
|
||||||
|
|
||||||
|
# Skip if we don't have a neighbourhood assignment
|
||||||
|
if neighbourhood_id == 0:
|
||||||
|
continue
|
||||||
|
|
||||||
|
records.append(
|
||||||
|
AmenityRecord(
|
||||||
|
neighbourhood_id=int(neighbourhood_id),
|
||||||
|
amenity_type=amenity_type,
|
||||||
|
amenity_name=str(name)[:200],
|
||||||
|
address=str(address)[:300] if address else None,
|
||||||
|
latitude=Decimal(str(lat)) if lat is not None else None,
|
||||||
|
longitude=Decimal(str(lon)) if lon is not None else None,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"Parsed {len(records)} {amenity_type.value} records")
|
||||||
|
return records
|
||||||
371
portfolio_app/toronto/parsers/toronto_police.py
Normal file
@@ -0,0 +1,371 @@
|
|||||||
|
"""Parser for Toronto Police crime data via CKAN API.
|
||||||
|
|
||||||
|
Fetches neighbourhood crime rates and major crime indicators from the
|
||||||
|
Toronto Police Service data hosted on Toronto Open Data Portal.
|
||||||
|
|
||||||
|
Data Sources:
|
||||||
|
- Neighbourhood Crime Rates: Annual crime rates by neighbourhood
|
||||||
|
- Major Crime Indicators (MCI): Detailed incident-level data
|
||||||
|
"""
|
||||||
|
|
||||||
|
import contextlib
|
||||||
|
import logging
|
||||||
|
from decimal import Decimal
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
|
||||||
|
from portfolio_app.toronto.schemas import CrimeRecord, CrimeType
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
# Mapping from Toronto Police crime categories to CrimeType enum
|
||||||
|
CRIME_TYPE_MAPPING: dict[str, CrimeType] = {
|
||||||
|
"assault": CrimeType.ASSAULT,
|
||||||
|
"assaults": CrimeType.ASSAULT,
|
||||||
|
"auto theft": CrimeType.AUTO_THEFT,
|
||||||
|
"autotheft": CrimeType.AUTO_THEFT,
|
||||||
|
"auto_theft": CrimeType.AUTO_THEFT,
|
||||||
|
"break and enter": CrimeType.BREAK_AND_ENTER,
|
||||||
|
"breakenter": CrimeType.BREAK_AND_ENTER,
|
||||||
|
"break_and_enter": CrimeType.BREAK_AND_ENTER,
|
||||||
|
"homicide": CrimeType.HOMICIDE,
|
||||||
|
"homicides": CrimeType.HOMICIDE,
|
||||||
|
"robbery": CrimeType.ROBBERY,
|
||||||
|
"robberies": CrimeType.ROBBERY,
|
||||||
|
"shooting": CrimeType.SHOOTING,
|
||||||
|
"shootings": CrimeType.SHOOTING,
|
||||||
|
"theft over": CrimeType.THEFT_OVER,
|
||||||
|
"theftover": CrimeType.THEFT_OVER,
|
||||||
|
"theft_over": CrimeType.THEFT_OVER,
|
||||||
|
"theft from motor vehicle": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
|
||||||
|
"theftfrommv": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
|
||||||
|
"theft_from_mv": CrimeType.THEFT_FROM_MOTOR_VEHICLE,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_crime_type(crime_str: str) -> CrimeType:
|
||||||
|
"""Normalize crime type string to CrimeType enum.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
crime_str: Raw crime type string from data source.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Matched CrimeType enum value, or CrimeType.OTHER if no match.
|
||||||
|
"""
|
||||||
|
normalized = crime_str.lower().strip().replace("-", " ").replace("_", " ")
|
||||||
|
return CRIME_TYPE_MAPPING.get(normalized, CrimeType.OTHER)
|
||||||
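`_normalize_crime_type` lower-cases the input, trims it, and replaces hyphens and underscores with spaces before the dictionary lookup. The sketch below reproduces that step with a trimmed, string-valued copy of the mapping. `MAPPING`, `normalize`, and `lookup` are illustrative names, and plain strings stand in for the `CrimeType` members.

```python
# Trimmed, string-valued copy of the mapping, for illustration only.
MAPPING = {
    "auto theft": "AUTO_THEFT",
    "autotheft": "AUTO_THEFT",
    "auto_theft": "AUTO_THEFT",
    "break and enter": "BREAK_AND_ENTER",
}


def normalize(raw: str) -> str:
    """Lower-case, trim, and turn hyphens/underscores into spaces."""
    return raw.lower().strip().replace("-", " ").replace("_", " ")


def lookup(raw: str) -> str:
    """Return the mapped category, or "OTHER" when nothing matches."""
    return MAPPING.get(normalize(raw), "OTHER")


for raw in ("Auto-Theft", " AUTO_THEFT ", "AutoTheft", "Break_And_Enter", "Fraud"):
    print(repr(raw), "->", lookup(raw))

# Keys containing "_" can never match, because normalize() removes underscores.
unreachable = [key for key in MAPPING if normalize(key) != key]
print(unreachable)
```

One consequence of normalising before the lookup is that the underscore-spelled keys in the mapping are never hit; the space-separated keys already cover those inputs.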
|
|
||||||
|
|
||||||
|
class TorontoPoliceParser:
|
||||||
|
"""Parser for Toronto Police crime data via CKAN API.
|
||||||
|
|
||||||
|
Crime data is hosted on Toronto Open Data Portal but sourced from
|
||||||
|
Toronto Police Service.
|
||||||
|
"""
|
||||||
|
|
||||||
|
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
|
||||||
|
API_PATH = "/api/3/action"
|
||||||
|
|
||||||
|
# Dataset package IDs
|
||||||
|
DATASETS = {
|
||||||
|
"crime_rates": "neighbourhood-crime-rates",
|
||||||
|
"mci": "major-crime-indicators",
|
||||||
|
"shootings": "shootings-firearm-discharges",
|
||||||
|
}
|
||||||
|
|
||||||
|
def __init__(self, timeout: float = 30.0) -> None:
|
||||||
|
"""Initialize parser.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
timeout: HTTP request timeout in seconds.
|
||||||
|
"""
|
||||||
|
self._timeout = timeout
|
||||||
|
self._client: httpx.Client | None = None
|
||||||
|
|
||||||
|
@property
|
||||||
|
def client(self) -> httpx.Client:
|
||||||
|
"""Lazy-initialize HTTP client."""
|
||||||
|
if self._client is None:
|
||||||
|
self._client = httpx.Client(
|
||||||
|
base_url=self.BASE_URL,
|
||||||
|
timeout=self._timeout,
|
||||||
|
headers={"Accept": "application/json"},
|
||||||
|
)
|
||||||
|
return self._client
|
||||||
|
|
||||||
|
def close(self) -> None:
|
||||||
|
"""Close HTTP client."""
|
||||||
|
if self._client is not None:
|
||||||
|
self._client.close()
|
||||||
|
self._client = None
|
||||||
|
|
||||||
|
def __enter__(self) -> "TorontoPoliceParser":
|
||||||
|
return self
|
||||||
|
|
||||||
|
def __exit__(self, *args: Any) -> None:
|
||||||
|
self.close()
|
||||||
|
|
||||||
|
def _get_package(self, package_id: str) -> dict[str, Any]:
|
||||||
|
"""Fetch package metadata from CKAN."""
|
||||||
|
response = self.client.get(
|
||||||
|
f"{self.API_PATH}/package_show",
|
||||||
|
params={"id": package_id},
|
||||||
|
)
|
||||||
|
response.raise_for_status()
|
||||||
|
result = response.json()
|
||||||
|
|
||||||
|
if not result.get("success"):
|
||||||
|
raise ValueError(f"CKAN API error: {result.get('error', 'Unknown error')}")
|
||||||
|
|
||||||
|
return dict(result["result"])
|
||||||
|
|
||||||
|
def _fetch_datastore_records(
|
||||||
|
self,
|
||||||
|
package_id: str,
|
||||||
|
filters: dict[str, Any] | None = None,
|
||||||
|
) -> list[dict[str, Any]]:
|
||||||
|
"""Fetch records from CKAN datastore.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
package_id: CKAN package ID.
|
||||||
|
filters: Optional filters to apply.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of records as dictionaries.
|
||||||
|
"""
|
||||||
|
package = self._get_package(package_id)
|
||||||
|
resources = package.get("resources", [])
|
||||||
|
|
||||||
|
# Find datastore-enabled resource
|
||||||
|
resource_id = None
|
||||||
|
for resource in resources:
|
||||||
|
if resource.get("datastore_active"):
|
||||||
|
resource_id = resource["id"]
|
||||||
|
break
|
||||||
|
|
||||||
|
if not resource_id:
|
||||||
|
raise ValueError(f"No datastore resource in {package_id}")
|
||||||
|
|
||||||
|
# Fetch all records
|
||||||
|
records: list[dict[str, Any]] = []
|
||||||
|
offset = 0
|
||||||
|
limit = 1000
|
||||||
|
|
||||||
|
while True:
|
||||||
|
params: dict[str, Any] = {
|
||||||
|
"id": resource_id,
|
||||||
|
"limit": limit,
|
||||||
|
"offset": offset,
|
||||||
|
}
|
||||||
|
if filters:
|
||||||
|
import json  # local import; CKAN expects `filters` as a JSON string, not a Python repr
params["filters"] = json.dumps(filters)
|
||||||
|
|
||||||
|
response = self.client.get(
|
||||||
|
f"{self.API_PATH}/datastore_search",
|
||||||
|
params=params,
|
||||||
|
)
|
||||||
|
response.raise_for_status()
|
||||||
|
result = response.json()
|
||||||
|
|
||||||
|
if not result.get("success"):
|
||||||
|
raise ValueError(f"Datastore error: {result.get('error')}")
|
||||||
|
|
||||||
|
batch = result["result"]["records"]
|
||||||
|
records.extend(batch)
|
||||||
|
|
||||||
|
if len(batch) < limit:
|
||||||
|
break
|
||||||
|
offset += limit
|
||||||
|
|
||||||
|
return records
|
||||||
|
|
||||||
|
def get_crime_rates(
|
||||||
|
self,
|
||||||
|
years: list[int] | None = None,
|
||||||
|
) -> list[CrimeRecord]:
|
||||||
|
"""Fetch neighbourhood crime rates.
|
||||||
|
|
||||||
|
The crime rates dataset contains annual counts and rates per 100k
|
||||||
|
population for each neighbourhood.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
years: Optional list of years to filter. If None, fetches all.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of validated CrimeRecord objects.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
raw_records = self._fetch_datastore_records(self.DATASETS["crime_rates"])
|
||||||
|
except (httpx.HTTPError, ValueError) as e:
|
||||||
|
logger.warning(f"Could not fetch crime rates: {e}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
records = []
|
||||||
|
|
||||||
|
for row in raw_records:
|
||||||
|
# Extract neighbourhood ID (Hood_ID maps to AREA_ID)
|
||||||
|
hood_id = row.get("HOOD_ID") or row.get("Hood_ID") or row.get("hood_id")
|
||||||
|
if not hood_id:
|
||||||
|
continue
|
||||||
|
|
||||||
|
try:
|
||||||
|
neighbourhood_id = int(hood_id)
|
||||||
|
except (ValueError, TypeError):
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Crime rate data typically has columns like:
|
||||||
|
# ASSAULT_2019, ASSAULT_RATE_2019, AUTOTHEFT_2020, etc.
|
||||||
|
# We need to parse column names to extract crime type and year
|
||||||
|
|
||||||
|
for col_name, value in row.items():
|
||||||
|
if value is None or col_name in (
|
||||||
|
"_id",
|
||||||
|
"HOOD_ID",
|
||||||
|
"Hood_ID",
|
||||||
|
"hood_id",
|
||||||
|
"AREA_NAME",
|
||||||
|
"NEIGHBOURHOOD",
|
||||||
|
):
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Try to parse column name for crime type and year
|
||||||
|
# Pattern: CRIMETYPE_YEAR or CRIMETYPE_RATE_YEAR
|
||||||
|
parts = col_name.upper().split("_")
|
||||||
|
if len(parts) < 2:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Check if last part is a year
|
||||||
|
try:
|
||||||
|
year = int(parts[-1])
|
||||||
|
if year < 2014 or year > 2030:
|
||||||
|
continue
|
||||||
|
except ValueError:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Filter by years if specified
|
||||||
|
if years and year not in years:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Check if this is a rate column
|
||||||
|
is_rate = "RATE" in parts
|
||||||
|
|
||||||
|
# Extract crime type (everything before RATE/year)
|
||||||
|
if is_rate:
|
||||||
|
rate_idx = parts.index("RATE")
|
||||||
|
crime_type_str = "_".join(parts[:rate_idx])
|
||||||
|
else:
|
||||||
|
crime_type_str = "_".join(parts[:-1])
|
||||||
|
|
||||||
|
crime_type = _normalize_crime_type(crime_type_str)
|
||||||
|
|
||||||
|
try:
|
||||||
|
numeric_value = Decimal(str(value))
|
||||||
|
except (ArithmeticError, ValueError, TypeError):  # Decimal() raises InvalidOperation, an ArithmeticError
|
||||||
|
continue
|
||||||
|
|
||||||
|
if is_rate:
|
||||||
|
# Rate columns never produce a record on their own; the rate is
|
||||||
|
# attached below when the matching count column is processed.
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Find corresponding rate if available (crime_type_str is upper-cased, so only upper-case column names match)
|
||||||
|
rate_col = f"{crime_type_str}_RATE_{year}"
|
||||||
|
rate_value = row.get(rate_col)
|
||||||
|
rate_per_100k = None
|
||||||
|
if rate_value is not None:
|
||||||
|
with contextlib.suppress(ArithmeticError, ValueError, TypeError):
|
||||||
|
rate_per_100k = Decimal(str(rate_value))
|
||||||
|
|
||||||
|
records.append(
|
||||||
|
CrimeRecord(
|
||||||
|
neighbourhood_id=neighbourhood_id,
|
||||||
|
year=year,
|
||||||
|
crime_type=crime_type,
|
||||||
|
count=int(numeric_value),
|
||||||
|
rate_per_100k=rate_per_100k,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
logger.info(f"Parsed {len(records)} crime rate records")
|
||||||
|
return records
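The column-name convention handled above (`CRIMETYPE_YEAR` for counts, `CRIMETYPE_RATE_YEAR` for rates) can be sketched in isolation. This is a minimal illustration rather than the project's code: `parse_crime_column` and `to_decimal` are hypothetical helper names, and the 2014-2030 window simply mirrors the bounds used above. The sketch also shows that a failed `Decimal` conversion surfaces as `decimal.InvalidOperation`.

```python
from __future__ import annotations

from decimal import Decimal, InvalidOperation


def parse_crime_column(col_name: str) -> tuple[str, int, bool] | None:
    """Split a column name into (crime_type, year, is_rate).

    Returns None when the name does not end in a year within 2014-2030.
    """
    parts = col_name.upper().split("_")
    if len(parts) < 2:
        return None
    try:
        year = int(parts[-1])
    except ValueError:
        return None
    if not 2014 <= year <= 2030:
        return None

    is_rate = "RATE" in parts
    # The crime type is everything before RATE (rate columns) or before the year.
    end = parts.index("RATE") if is_rate else len(parts) - 1
    return "_".join(parts[:end]), year, is_rate


def to_decimal(value: object) -> Decimal | None:
    """Convert a raw cell to Decimal, returning None for unparseable input."""
    try:
        return Decimal(str(value))
    except InvalidOperation:
        # Decimal signals bad input with InvalidOperation, not ValueError.
        return None
```

Identifier columns such as `AREA_NAME` fall out naturally here, because their last segment is not a year.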
|
||||||
|
|
||||||
|
def get_major_crime_indicators(
|
||||||
|
self,
|
||||||
|
years: list[int] | None = None,
|
||||||
|
) -> list[CrimeRecord]:
|
||||||
|
"""Fetch major crime indicators (detailed MCI data).
|
||||||
|
|
||||||
|
MCI data contains incident-level records that need to be aggregated
|
||||||
|
by neighbourhood and year.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
years: Optional list of years to filter.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
List of aggregated CrimeRecord objects.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
raw_records = self._fetch_datastore_records(self.DATASETS["mci"])
|
||||||
|
except (httpx.HTTPError, ValueError) as e:
|
||||||
|
logger.warning(f"Could not fetch MCI data: {e}")
|
||||||
|
return []
|
||||||
|
|
||||||
|
# Aggregate counts by neighbourhood, year, and crime type
|
||||||
|
aggregates: dict[tuple[int, int, CrimeType], int] = {}
|
||||||
|
|
||||||
|
for row in raw_records:
|
||||||
|
# Extract neighbourhood ID
|
||||||
|
hood_id = (
|
||||||
|
row.get("HOOD_158")
|
||||||
|
or row.get("HOOD_140")
|
||||||
|
or row.get("HOOD_ID")
|
||||||
|
or row.get("Hood_ID")
|
||||||
|
)
|
||||||
|
if not hood_id:
|
||||||
|
continue
|
||||||
|
|
||||||
|
try:
|
||||||
|
neighbourhood_id = int(hood_id)
|
||||||
|
except (ValueError, TypeError):
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Extract year from OCC_YEAR, falling back to REPORT_YEAR
|
||||||
|
occ_year = row.get("OCC_YEAR") or row.get("REPORT_YEAR")
|
||||||
|
if not occ_year:
|
||||||
|
continue
|
||||||
|
|
||||||
|
try:
|
||||||
|
year = int(occ_year)
|
||||||
|
if year < 2014 or year > 2030:
|
||||||
|
continue
|
||||||
|
except (ValueError, TypeError):
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Filter by years if specified
|
||||||
|
if years and year not in years:
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Extract crime type
|
||||||
|
mci_category = row.get("MCI_CATEGORY") or row.get("OFFENCE") or ""
|
||||||
|
crime_type = _normalize_crime_type(str(mci_category))
|
||||||
|
|
||||||
|
# Aggregate count
|
||||||
|
key = (neighbourhood_id, year, crime_type)
|
||||||
|
aggregates[key] = aggregates.get(key, 0) + 1
|
||||||
|
|
||||||
|
# Convert aggregates to CrimeRecord objects
|
||||||
|
records = [
|
||||||
|
CrimeRecord(
|
||||||
|
neighbourhood_id=neighbourhood_id,
|
||||||
|
year=year,
|
||||||
|
crime_type=crime_type,
|
||||||
|
count=count,
|
||||||
|
rate_per_100k=None, # Would need population data to calculate
|
||||||
|
)
|
||||||
|
for (neighbourhood_id, year, crime_type), count in aggregates.items()
|
||||||
|
]
|
||||||
|
|
||||||
|
logger.info(f"Parsed {len(records)} MCI records (aggregated)")
|
||||||
|
return records
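The incident-level aggregation above amounts to counting rows per `(neighbourhood_id, year, crime_type)` key. The following is a minimal sketch using `collections.Counter`, assuming rows are plain dicts carrying `HOOD_158`, `OCC_YEAR`, and `MCI_CATEGORY` keys. `aggregate_incidents` is a hypothetical name and the sample rows are invented for illustration. Unlike the method above, it keeps the raw category string instead of normalising it to a `CrimeType`.

```python
from __future__ import annotations

from collections import Counter


def aggregate_incidents(rows: list[dict[str, object]]) -> Counter:
    """Count incidents per (neighbourhood_id, year, category)."""
    counts: Counter = Counter()
    for row in rows:
        try:
            hood = int(row["HOOD_158"])
            year = int(row["OCC_YEAR"])
        except (KeyError, ValueError, TypeError):
            # Skip rows with a missing or non-numeric neighbourhood or year.
            continue
        category = str(row.get("MCI_CATEGORY") or "")
        counts[(hood, year, category)] += 1
    return counts


# Invented sample rows: two valid incidents and two that get skipped.
rows = [
    {"HOOD_158": "70", "OCC_YEAR": 2021, "MCI_CATEGORY": "Assault"},
    {"HOOD_158": "70", "OCC_YEAR": 2021, "MCI_CATEGORY": "Assault"},
    {"HOOD_158": "NSA", "OCC_YEAR": 2021, "MCI_CATEGORY": "Robbery"},
    {"HOOD_158": "95", "OCC_YEAR": None, "MCI_CATEGORY": "Robbery"},
]
result = aggregate_incidents(rows)
```

A `Counter` avoids the explicit `get(key, 0) + 1` bookkeeping, at the cost of a slightly less specific type annotation than the `dict[tuple[int, int, CrimeType], int]` used above.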
|
||||||
@@ -1,5 +1,6 @@
|
|||||||
"""Pydantic schemas for Toronto housing data validation."""
|
"""Pydantic schemas for Toronto housing data validation."""
|
||||||
|
|
||||||
|
from .amenities import AmenityCount, AmenityRecord, AmenityType
|
||||||
from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
|
from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
|
||||||
from .dimensions import (
|
from .dimensions import (
|
||||||
CMHCZone,
|
CMHCZone,
|
||||||
@@ -11,6 +12,7 @@ from .dimensions import (
|
|||||||
PolicyLevel,
|
PolicyLevel,
|
||||||
TimeDimension,
|
TimeDimension,
|
||||||
)
|
)
|
||||||
|
from .neighbourhood import CensusRecord, CrimeRecord, CrimeType, NeighbourhoodRecord
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
# CMHC
|
# CMHC
|
||||||
@@ -28,4 +30,13 @@ __all__ = [
|
|||||||
"PolicyCategory",
|
"PolicyCategory",
|
||||||
"ExpectedDirection",
|
"ExpectedDirection",
|
||||||
"Confidence",
|
"Confidence",
|
||||||
|
# Neighbourhood data (Phase 3)
|
||||||
|
"NeighbourhoodRecord",
|
||||||
|
"CensusRecord",
|
||||||
|
"CrimeRecord",
|
||||||
|
"CrimeType",
|
||||||
|
# Amenities (Phase 3)
|
||||||
|
"AmenityType",
|
||||||
|
"AmenityRecord",
|
||||||
|
"AmenityCount",
|
||||||
]
|
]
|
||||||
|
|||||||
60
portfolio_app/toronto/schemas/amenities.py
Normal file
@@ -0,0 +1,60 @@
|
|||||||
|
"""Pydantic schemas for Toronto amenities data.
|
||||||
|
|
||||||
|
Includes schemas for parks, schools, childcare centres, and transit stops.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from decimal import Decimal
|
||||||
|
from enum import Enum
|
||||||
|
|
||||||
|
from pydantic import BaseModel, Field
|
||||||
|
|
||||||
|
|
||||||
|
class AmenityType(str, Enum):
|
||||||
|
"""Types of amenities tracked in the neighbourhood dashboard."""
|
||||||
|
|
||||||
|
PARK = "park"
|
||||||
|
SCHOOL = "school"
|
||||||
|
CHILDCARE = "childcare"
|
||||||
|
TRANSIT_STOP = "transit_stop"
|
||||||
|
LIBRARY = "library"
|
||||||
|
COMMUNITY_CENTRE = "community_centre"
|
||||||
|
HOSPITAL = "hospital"
|
||||||
|
|
||||||
|
|
||||||
|
class AmenityRecord(BaseModel):
|
||||||
|
"""Amenity location record for a neighbourhood.
|
||||||
|
|
||||||
|
Represents a single amenity (park, school, etc.) with its location
|
||||||
|
and associated neighbourhood.
|
||||||
|
"""
|
||||||
|
|
||||||
|
neighbourhood_id: int = Field(
|
||||||
|
ge=1, le=200, description="Neighbourhood ID containing this amenity"
|
||||||
|
)
|
||||||
|
amenity_type: AmenityType = Field(description="Type of amenity")
|
||||||
|
amenity_name: str = Field(max_length=200, description="Name of the amenity")
|
||||||
|
address: str | None = Field(
|
||||||
|
default=None, max_length=300, description="Street address"
|
||||||
|
)
|
||||||
|
latitude: Decimal | None = Field(
|
||||||
|
default=None, ge=-90, le=90, description="Latitude (WGS84)"
|
||||||
|
)
|
||||||
|
longitude: Decimal | None = Field(
|
||||||
|
default=None, ge=-180, le=180, description="Longitude (WGS84)"
|
||||||
|
)
|
||||||
|
|
||||||
|
model_config = {"str_strip_whitespace": True}
|
||||||
|
|
||||||
|
|
||||||
|
class AmenityCount(BaseModel):
|
||||||
|
"""Aggregated amenity count for a neighbourhood.
|
||||||
|
|
||||||
|
Used for dashboard metrics showing amenity density per neighbourhood.
|
||||||
|
"""
|
||||||
|
|
||||||
|
neighbourhood_id: int = Field(ge=1, le=200, description="Neighbourhood ID")
|
||||||
|
amenity_type: AmenityType = Field(description="Type of amenity")
|
||||||
|
count: int = Field(ge=0, description="Number of amenities of this type")
|
||||||
|
year: int = Field(ge=2020, le=2030, description="Year of data snapshot")
|
||||||
|
|
||||||
|
model_config = {"str_strip_whitespace": True}
|
||||||
106
portfolio_app/toronto/schemas/neighbourhood.py
Normal file
@@ -0,0 +1,106 @@
|
|||||||
|
"""Pydantic schemas for Toronto neighbourhood data.
|
||||||
|
|
||||||
|
Includes schemas for neighbourhood boundaries, census profiles, and crime statistics.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from decimal import Decimal
|
||||||
|
from enum import Enum
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from pydantic import BaseModel, Field
|
||||||
|
|
||||||
|
|
||||||
|
class CrimeType(str, Enum):
|
||||||
|
"""Major crime indicator types from Toronto Police data."""
|
||||||
|
|
||||||
|
ASSAULT = "assault"
|
||||||
|
AUTO_THEFT = "auto_theft"
|
||||||
|
BREAK_AND_ENTER = "break_and_enter"
|
||||||
|
HOMICIDE = "homicide"
|
||||||
|
ROBBERY = "robbery"
|
||||||
|
SHOOTING = "shooting"
|
||||||
|
THEFT_OVER = "theft_over"
|
||||||
|
THEFT_FROM_MOTOR_VEHICLE = "theft_from_motor_vehicle"
|
||||||
|
OTHER = "other"
|
||||||
|
|
||||||
|
|
||||||
|
class NeighbourhoodRecord(BaseModel):
|
||||||
|
"""Schema for Toronto neighbourhood boundary data.
|
||||||
|
|
||||||
|
Based on City of Toronto's 158 neighbourhoods dataset.
|
||||||
|
AREA_ID maps to neighbourhood_id for consistency with police data (Hood_ID).
|
||||||
|
"""
|
||||||
|
|
||||||
|
area_id: int = Field(description="AREA_ID from Toronto Open Data (1-158)")
|
||||||
|
area_name: str = Field(max_length=100, description="Official neighbourhood name")
|
||||||
|
area_short_code: str | None = Field(
|
||||||
|
default=None, max_length=10, description="Short code (e.g., 'E01')"
|
||||||
|
)
|
||||||
|
geometry: dict[str, Any] | None = Field(
|
||||||
|
default=None, description="GeoJSON geometry object"
|
||||||
|
)
|
||||||
|
|
||||||
|
model_config = {"str_strip_whitespace": True}
|
||||||
|
|
||||||
|
|
||||||
|
class CensusRecord(BaseModel):
|
||||||
|
"""Census profile data for a neighbourhood.
|
||||||
|
|
||||||
|
Contains demographic and socioeconomic indicators from Statistics Canada
|
||||||
|
census data, aggregated to the neighbourhood level.
|
||||||
|
"""
|
||||||
|
|
||||||
|
neighbourhood_id: int = Field(
|
||||||
|
ge=1, le=200, description="Neighbourhood ID (AREA_ID)"
|
||||||
|
)
|
||||||
|
census_year: int = Field(ge=2016, le=2030, description="Census year")
|
||||||
|
population: int | None = Field(default=None, ge=0, description="Total population")
|
||||||
|
population_density: Decimal | None = Field(
|
||||||
|
default=None, ge=0, description="Population per square kilometre"
|
||||||
|
)
|
||||||
|
median_household_income: Decimal | None = Field(
|
||||||
|
default=None, ge=0, description="Median household income (CAD)"
|
||||||
|
)
|
||||||
|
average_household_income: Decimal | None = Field(
|
||||||
|
default=None, ge=0, description="Average household income (CAD)"
|
||||||
|
)
|
||||||
|
unemployment_rate: Decimal | None = Field(
|
||||||
|
default=None, ge=0, le=100, description="Unemployment rate percentage"
|
||||||
|
)
|
||||||
|
pct_bachelors_or_higher: Decimal | None = Field(
|
||||||
|
default=None, ge=0, le=100, description="Percentage with bachelor's degree+"
|
||||||
|
)
|
||||||
|
pct_owner_occupied: Decimal | None = Field(
|
||||||
|
default=None, ge=0, le=100, description="Percentage owner-occupied dwellings"
|
||||||
|
)
|
||||||
|
pct_renter_occupied: Decimal | None = Field(
|
||||||
|
default=None, ge=0, le=100, description="Percentage renter-occupied dwellings"
|
||||||
|
)
|
||||||
|
median_age: Decimal | None = Field(
|
||||||
|
default=None, ge=0, le=120, description="Median age of residents"
|
||||||
|
)
|
||||||
|
average_dwelling_value: Decimal | None = Field(
|
||||||
|
default=None, ge=0, description="Average dwelling value (CAD)"
|
||||||
|
)
|
||||||
|
|
||||||
|
model_config = {"str_strip_whitespace": True}
|
||||||
|
|
||||||
|
|
||||||
|
class CrimeRecord(BaseModel):
|
||||||
|
"""Crime statistics for a neighbourhood.
|
||||||
|
|
||||||
|
Based on Toronto Police neighbourhood crime rates data.
|
||||||
|
Hood_ID in source data maps to neighbourhood_id (AREA_ID).
|
||||||
|
"""
|
||||||
|
|
||||||
|
neighbourhood_id: int = Field(
|
||||||
|
ge=1, le=200, description="Neighbourhood ID (Hood_ID -> AREA_ID)"
|
||||||
|
)
|
||||||
|
year: int = Field(ge=2014, le=2030, description="Year of crime statistics")
|
||||||
|
crime_type: CrimeType = Field(description="Type of crime (MCI category)")
|
||||||
|
count: int = Field(ge=0, description="Number of incidents")
|
||||||
|
rate_per_100k: Decimal | None = Field(
|
||||||
|
default=None, ge=0, description="Rate per 100,000 population"
|
||||||
|
)
|
||||||
|
|
||||||
|
model_config = {"str_strip_whitespace": True}
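The `rate_per_100k` field holds the incident count scaled to a population of 100,000, that is, count / population × 100,000. The following is a minimal sketch of that arithmetic with `Decimal`, assuming the population comes from a census figure for the same neighbourhood. The helper name and the two-decimal rounding are illustrative choices, not something the schema prescribes.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal


def rate_per_100k(count: int, population: int | None) -> Decimal | None:
    """Return incidents per 100,000 residents, or None without a usable population."""
    if not population or population <= 0:
        return None
    rate = Decimal(count) * Decimal(100_000) / Decimal(population)
    # Round to two decimal places for display and storage.
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Returning `None` for a missing or zero population keeps the result compatible with the optional, non-negative `rate_per_100k` field above.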
|
||||||