# Database Schema
This document describes the PostgreSQL/PostGIS database schema for the Toronto Neighbourhood Dashboard.
## Entity Relationship Diagram
```mermaid
erDiagram
    dim_time {
        int date_key PK
        date full_date UK
        int year
        int month
        int quarter
        string month_name
        bool is_month_start
    }
    dim_cmhc_zone {
        int zone_key PK
        string zone_code UK
        string zone_name
        geometry geometry
    }
    dim_neighbourhood {
        int neighbourhood_id PK
        string name
        geometry geometry
        int population
        numeric land_area_sqkm
        numeric pop_density_per_sqkm
        numeric pct_bachelors_or_higher
        numeric median_household_income
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        int census_year
    }
    dim_policy_event {
        int event_id PK
        date event_date
        date effective_date
        string level
        string category
        string title
        text description
        string expected_direction
        string source_url
        string confidence
    }
    fact_rentals {
        int id PK
        int date_key FK
        int zone_key FK
        string bedroom_type
        int universe
        numeric avg_rent
        numeric median_rent
        numeric vacancy_rate
        numeric availability_rate
        numeric turnover_rate
        numeric rent_change_pct
        string reliability_code
    }
    fact_census {
        int id PK
        int neighbourhood_id FK
        int census_year
        int population
        numeric population_density
        numeric median_household_income
        numeric average_household_income
        numeric unemployment_rate
        numeric pct_bachelors_or_higher
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        numeric median_age
        numeric average_dwelling_value
    }
    fact_crime {
        int id PK
        int neighbourhood_id FK
        int year
        string crime_type
        int count
        numeric rate_per_100k
    }
    fact_amenities {
        int id PK
        int neighbourhood_id FK
        string amenity_type
        int count
        int year
    }
    bridge_cmhc_neighbourhood {
        int id PK
        string cmhc_zone_code FK
        int neighbourhood_id FK
        numeric weight
    }
    dim_time ||--o{ fact_rentals : "date_key"
    dim_cmhc_zone ||--o{ fact_rentals : "zone_key"
    dim_neighbourhood ||--o{ fact_census : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_crime : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_amenities : "neighbourhood_id"
    dim_cmhc_zone ||--o{ bridge_cmhc_neighbourhood : "zone_code"
    dim_neighbourhood ||--o{ bridge_cmhc_neighbourhood : "neighbourhood_id"
```
## Schema Layers
### Raw Schema
Raw data is loaded directly from external sources without transformation:
| Table | Source | Description |
|---|---|---|
| `raw.neighbourhoods` | City of Toronto API | GeoJSON neighbourhood boundaries |
| `raw.census_profiles` | City of Toronto API | Census profile data |
| `raw.crime_data` | Toronto Police API | Crime statistics by neighbourhood |
| `raw.cmhc_rentals` | CMHC Data Files | Rental market survey data |
### Staging Schema (dbt)
Staging models provide 1:1 cleaned representations of source data:
| Model | Source Table | Purpose |
|---|---|---|
| `stg_toronto__neighbourhoods` | `raw.neighbourhoods` | Cleaned boundaries with standardized names |
| `stg_toronto__census` | `raw.census_profiles` | Typed census metrics |
| `stg_cmhc__rentals` | `raw.cmhc_rentals` | Validated rental data |
| `stg_toronto__crime` | `raw.crime_data` | Standardized crime categories |
| `stg_toronto__amenities` | `raw.amenities` | Typed amenity counts |
| `stg_dimensions__time` | generated | Time dimension |
| `stg_dimensions__cmhc_zones` | `raw.cmhc_zones` | CMHC zone boundaries |
| `stg_cmhc__zone_crosswalk` | `raw.crosswalk` | Zone-neighbourhood mapping |
### Marts Schema (dbt)
Analytical tables ready for dashboard consumption:
| Model | Grain | Purpose |
|---|---|---|
| `mart_neighbourhood_overview` | neighbourhood | Composite livability scores |
| `mart_neighbourhood_housing` | neighbourhood | Housing and rent metrics |
| `mart_neighbourhood_safety` | neighbourhood × year | Crime rate calculations |
| `mart_neighbourhood_demographics` | neighbourhood | Income, age, population metrics |
| `mart_neighbourhood_amenities` | neighbourhood | Amenity accessibility scores |
| `mart_toronto_rentals` | zone × month | Time-series rental analysis |
## Table Details
### Dimension Tables
#### dim_time
Time dimension for date-based analysis. Grain: one row per month.
| Column | Type | Constraints | Description |
|---|---|---|---|
| date_key | INTEGER | PK | Surrogate key (YYYYMM format) |
| full_date | DATE | UNIQUE, NOT NULL | First day of month |
| year | INTEGER | NOT NULL | Calendar year |
| month | INTEGER | NOT NULL | Month number (1-12) |
| quarter | INTEGER | NOT NULL | Quarter (1-4) |
| month_name | VARCHAR(20) | NOT NULL | Month name |
| is_month_start | BOOLEAN | DEFAULT TRUE | Always true (monthly grain) |
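Because `date_key` is a YYYYMM integer, every column of a `dim_time` row can be derived from the year and month alone. The following is a minimal Python sketch under that assumption; `TimeRow`, `make_time_row`, and `month_range` are hypothetical names, and the `stg_dimensions__time` dbt model may well generate these rows in SQL instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimeRow:
    """One dim_time row (hypothetical Python mirror of the table)."""
    date_key: int
    full_date: date
    year: int
    month: int
    quarter: int
    month_name: str
    is_month_start: bool


def make_time_row(year: int, month: int) -> TimeRow:
    """Build one monthly row; date_key uses the YYYYMM format."""
    full_date = date(year, month, 1)  # first day of the month
    return TimeRow(
        date_key=year * 100 + month,       # e.g. 2024, 3 -> 202403
        full_date=full_date,
        year=year,
        month=month,
        quarter=(month - 1) // 3 + 1,      # months 1-3 -> Q1, ..., 10-12 -> Q4
        month_name=full_date.strftime("%B"),
        is_month_start=True,               # always true at monthly grain
    )


def month_range(start: date, end: date) -> list[TimeRow]:
    """Generate one row per month from start to end, inclusive."""
    rows = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        rows.append(make_time_row(year, month))
        # Advance one month, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return rows
```

Note that `strftime("%B")` follows the process locale, so it yields English month names only under the default C locale or an English one.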
#### dim_cmhc_zone
CMHC rental market zones (~20 zones covering Toronto).
| Column | Type | Constraints | Description |
|---|---|---|---|
| zone_key | INTEGER | PK, AUTO | Surrogate key |
| zone_code | VARCHAR(10) | UNIQUE, NOT NULL | CMHC zone identifier |
| zone_name | VARCHAR(100) | NOT NULL | Zone display name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS zone boundary |
#### dim_neighbourhood
Toronto's 158 official neighbourhoods.
| Column | Type | Constraints | Description |
|---|---|---|---|
| neighbourhood_id | INTEGER | PK | City-assigned ID |
| name | VARCHAR(100) | NOT NULL | Neighbourhood name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS boundary |
| population | INTEGER | | Total population |
| land_area_sqkm | NUMERIC(10,4) | | Area in km² |
| pop_density_per_sqkm | NUMERIC(10,2) | | Population density |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Education rate |
| median_household_income | NUMERIC(12,2) | | Median income |
| pct_owner_occupied | NUMERIC(5,2) | | Owner occupancy rate |
| pct_renter_occupied | NUMERIC(5,2) | | Renter occupancy rate |
| census_year | INTEGER | DEFAULT 2021 | Census reference year |
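The schema does not spell out how `pop_density_per_sqkm` is derived. Assuming it is simply `population / land_area_sqkm`, here is a sketch of the calculation (the function name is hypothetical) that rounds to the `NUMERIC(10,2)` scale and returns `None` wherever SQL would produce NULL.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def pop_density_per_sqkm(
    population: Optional[int], land_area_sqkm: Optional[Decimal]
) -> Optional[Decimal]:
    """People per km², rounded to 2 decimal places (assumed formula).

    Missing inputs or a zero area give None instead of raising,
    mirroring a NULL result in SQL.
    """
    if population is None or land_area_sqkm is None or land_area_sqkm == 0:
        return None
    density = Decimal(population) / land_area_sqkm
    return density.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```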
#### dim_policy_event
Policy events for time-series annotation (rent control, interest rates, etc.).
| Column | Type | Constraints | Description |
|---|---|---|---|
| event_id | INTEGER | PK, AUTO | Surrogate key |
| event_date | DATE | NOT NULL | Announcement date |
| effective_date | DATE | | Implementation date |
| level | VARCHAR(20) | NOT NULL | federal/provincial/municipal |
| category | VARCHAR(20) | NOT NULL | monetary/tax/regulatory/supply/economic |
| title | VARCHAR(200) | NOT NULL | Event title |
| description | TEXT | | Detailed description |
| expected_direction | VARCHAR(10) | NOT NULL | bearish/bullish/neutral |
| source_url | VARCHAR(500) | | Reference link |
| confidence | VARCHAR(10) | DEFAULT 'medium' | high/medium/low |
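The allowed values for `level`, `category`, `expected_direction`, and `confidence` are listed only as descriptions, not as database constraints. A hypothetical Python sketch that enforces them when loading events; `PolicyEvent` and the allowed-value sets are illustrative names, not documented project code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Allowed values as listed in the dim_policy_event column descriptions.
LEVELS = {"federal", "provincial", "municipal"}
CATEGORIES = {"monetary", "tax", "regulatory", "supply", "economic"}
DIRECTIONS = {"bearish", "bullish", "neutral"}
CONFIDENCES = {"high", "medium", "low"}


@dataclass(frozen=True)
class PolicyEvent:
    """Hypothetical validated record for one dim_policy_event row."""
    event_date: date
    level: str
    category: str
    title: str
    expected_direction: str
    effective_date: Optional[date] = None
    description: Optional[str] = None
    source_url: Optional[str] = None
    confidence: str = "medium"  # mirrors DEFAULT 'medium'

    def __post_init__(self) -> None:
        # Reject any enumerated field whose value is outside its allowed set.
        checks = {
            "level": (self.level, LEVELS),
            "category": (self.category, CATEGORIES),
            "expected_direction": (self.expected_direction, DIRECTIONS),
            "confidence": (self.confidence, CONFIDENCES),
        }
        for field_name, (value, allowed) in checks.items():
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} not in {sorted(allowed)}")
        if len(self.title) > 200:
            raise ValueError("title exceeds VARCHAR(200)")
```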
### Fact Tables
#### fact_rentals
CMHC rental market survey data. Grain: zone × bedroom type × survey date.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PK, AUTO | Surrogate key |
| date_key | INTEGER | FK → dim_time | Survey date reference |
| zone_key | INTEGER | FK → dim_cmhc_zone | CMHC zone reference |
| bedroom_type | VARCHAR(20) | NOT NULL | bachelor/1-bed/2-bed/3+bed/total |
| universe | INTEGER | | Total rental units |
| avg_rent | NUMERIC(10,2) | | Average rent |
| median_rent | NUMERIC(10,2) | | Median rent |
| vacancy_rate | NUMERIC(5,2) | | Vacancy percentage |
| availability_rate | NUMERIC(5,2) | | Availability percentage |
| turnover_rate | NUMERIC(5,2) | | Turnover percentage |
| rent_change_pct | NUMERIC(5,2) | | Year-over-year change |
| reliability_code | VARCHAR(2) | | CMHC data quality code |
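Since `date_key` uses the YYYYMM format, the same month one year earlier is simply `date_key - 100`. The sketch below uses that to compare `avg_rent` across years at the `fact_rentals` grain (zone × bedroom type × survey date). It computes a plain percent change in average rent and is only an illustration: this document does not say how `rent_change_pct` is populated, and that column may come directly from CMHC rather than from such a calculation. `prior_year_key` and `yoy_change_pct` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Grain key of fact_rentals: (zone_key, bedroom_type, date_key)
RentKey = tuple[int, str, int]


def prior_year_key(date_key: int) -> int:
    """Same month one year earlier; valid because date_key is YYYYMM."""
    return date_key - 100


def yoy_change_pct(avg_rent: dict[RentKey, Decimal], key: RentKey) -> Optional[Decimal]:
    """Plain percent change in avg_rent versus the same month a year earlier.

    Returns None if either observation is missing or the prior value is zero.
    """
    zone_key, bedroom_type, date_key = key
    current = avg_rent.get(key)
    previous = avg_rent.get((zone_key, bedroom_type, prior_year_key(date_key)))
    if current is None or previous is None or previous == 0:
        return None
    pct = (current - previous) / previous * 100
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```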
#### fact_census
Census statistics. Grain: neighbourhood × census year.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| census_year | INTEGER | NOT NULL | 2016, 2021, etc. |
| population | INTEGER | | Total population |
| population_density | NUMERIC(10,2) | | People per km² |
| median_household_income | NUMERIC(12,2) | | Median income |
| average_household_income | NUMERIC(12,2) | | Average income |
| unemployment_rate | NUMERIC(5,2) | | Unemployment % |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Education rate |
| pct_owner_occupied | NUMERIC(5,2) | | Owner rate |
| pct_renter_occupied | NUMERIC(5,2) | | Renter rate |
| median_age | NUMERIC(5,2) | | Median resident age |
| average_dwelling_value | NUMERIC(12,2) | | Average home value |
#### fact_crime
Crime statistics. Grain: neighbourhood × year × crime type.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| year | INTEGER | NOT NULL | Calendar year |
| crime_type | VARCHAR(50) | NOT NULL | Crime category |
| count | INTEGER | NOT NULL | Number of incidents |
| rate_per_100k | NUMERIC(10,2) | | Rate per 100k population |
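Assuming `rate_per_100k` is the incident count scaled by the neighbourhood's population (this document does not state which population figure or census year is used), the calculation reduces to `count × 100,000 / population`. A hypothetical sketch, rounded to the `NUMERIC(10,2)` scale:

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def rate_per_100k(count: int, population: Optional[int]) -> Optional[Decimal]:
    """Incidents per 100,000 residents (assumed formula), 2 decimal places."""
    if not population:  # None or 0: the rate is undefined
        return None
    rate = Decimal(count) * 100_000 / Decimal(population)
    return rate.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```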
#### fact_amenities
Amenity counts. Grain: neighbourhood × amenity type × year.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| amenity_type | VARCHAR(50) | NOT NULL | parks/schools/transit/etc. |
| count | INTEGER | NOT NULL | Number of amenities |
| year | INTEGER | NOT NULL | Reference year |
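Each fact table states its grain in its description, but none of the listed constraints enforce it. One way to guard a grain is to check that no combination of the grain columns repeats; in a dbt project this is commonly written as a `unique_combination_of_columns` test from the `dbt_utils` package, though this document does not say whether the project uses it. A plain Python sketch of the same check, with a hypothetical function name:

```python
from collections import Counter
from typing import Any, Iterable, Mapping, Sequence


def grain_violations(
    rows: Iterable[Mapping[str, Any]], grain: Sequence[str]
) -> dict[tuple, int]:
    """Return each grain key that occurs more than once, with its count."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

For `fact_amenities` the grain would be `("neighbourhood_id", "amenity_type", "year")`; an empty result means every row is unique at that grain.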
### Bridge Tables
#### bridge_cmhc_neighbourhood
Maps CMHC zones to neighbourhoods with area-based weights for data disaggregation.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | INTEGER | PK, AUTO | Surrogate key |
| cmhc_zone_code | VARCHAR(10) | FK → dim_cmhc_zone | Zone reference |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| weight | NUMERIC(5,4) | NOT NULL | Proportional weight (0-1) |
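This document does not specify the disaggregation formula, nor whether weights sum to 1 per zone or per neighbourhood. A minimal sketch, assuming a zone-level metric such as average rent is spread to neighbourhoods as a weighted average over the zones that overlap each neighbourhood. Dividing by the total weight keeps the result a proper average under either normalization and when some zones have no data. `disaggregate` is a hypothetical name.

```python
from collections import defaultdict
from typing import Iterable


def disaggregate(
    zone_values: dict[str, float],
    bridge: Iterable[tuple[str, int, float]],
) -> dict[int, float]:
    """Weighted average of zone-level values for each neighbourhood.

    bridge rows are (cmhc_zone_code, neighbourhood_id, weight). Zones with
    no value are skipped, and the remaining weights are renormalized so a
    neighbourhood's result stays an average of the zones that do have data.
    """
    weighted_sum: dict[int, float] = defaultdict(float)
    weight_total: dict[int, float] = defaultdict(float)
    for zone_code, neighbourhood_id, weight in bridge:
        value = zone_values.get(zone_code)
        if value is None:
            continue
        weighted_sum[neighbourhood_id] += weight * value
        weight_total[neighbourhood_id] += weight
    return {
        nid: weighted_sum[nid] / weight_total[nid]
        for nid in weighted_sum
        if weight_total[nid] > 0
    }
```

Additive measures such as `universe` would instead need a weighted sum with weights that split each zone, so the averaging approach above applies only to rates and averages.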
## Indexes
| Table | Index | Columns | Purpose |
|---|---|---|---|
| fact_rentals | ix_fact_rentals_date_zone | date_key, zone_key | Time-series queries |
| fact_census | ix_fact_census_neighbourhood_year | neighbourhood_id, census_year | Census lookups |
| fact_crime | ix_fact_crime_neighbourhood_year | neighbourhood_id, year | Crime trends |
| fact_crime | ix_fact_crime_type | crime_type | Crime filtering |
| fact_amenities | ix_fact_amenities_neighbourhood_year | neighbourhood_id, year | Amenity queries |
| fact_amenities | ix_fact_amenities_type | amenity_type | Amenity filtering |
| bridge_cmhc_neighbourhood | ix_bridge_cmhc_zone | cmhc_zone_code | Zone lookups |
| bridge_cmhc_neighbourhood | ix_bridge_neighbourhood | neighbourhood_id | Neighbourhood lookups |
## PostGIS Extensions
The database requires PostGIS for geospatial operations:
```sql
CREATE EXTENSION IF NOT EXISTS postgis;
```
All geometry columns use SRID 4326 (WGS84) for compatibility with web mapping libraries.