# Database Schema

This document describes the PostgreSQL/PostGIS database schema for the Toronto Neighbourhood Dashboard.

## Entity Relationship Diagram

```mermaid
erDiagram
    dim_time {
        int date_key PK
        date full_date UK
        int year
        int month
        int quarter
        string month_name
        bool is_month_start
    }

    dim_cmhc_zone {
        int zone_key PK
        string zone_code UK
        string zone_name
        geometry geometry
    }

    dim_neighbourhood {
        int neighbourhood_id PK
        string name
        geometry geometry
        int population
        numeric land_area_sqkm
        numeric pop_density_per_sqkm
        numeric pct_bachelors_or_higher
        numeric median_household_income
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        int census_year
    }

    dim_policy_event {
        int event_id PK
        date event_date
        date effective_date
        string level
        string category
        string title
        text description
        string expected_direction
        string source_url
        string confidence
    }

    fact_rentals {
        int id PK
        int date_key FK
        int zone_key FK
        string bedroom_type
        int universe
        numeric avg_rent
        numeric median_rent
        numeric vacancy_rate
        numeric availability_rate
        numeric turnover_rate
        numeric rent_change_pct
        string reliability_code
    }

    fact_census {
        int id PK
        int neighbourhood_id FK
        int census_year
        int population
        numeric population_density
        numeric median_household_income
        numeric average_household_income
        numeric unemployment_rate
        numeric pct_bachelors_or_higher
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        numeric median_age
        numeric average_dwelling_value
    }

    fact_crime {
        int id PK
        int neighbourhood_id FK
        int year
        string crime_type
        int count
        numeric rate_per_100k
    }

    fact_amenities {
        int id PK
        int neighbourhood_id FK
        string amenity_type
        int count
        int year
    }

    bridge_cmhc_neighbourhood {
        int id PK
        string cmhc_zone_code FK
        int neighbourhood_id FK
        numeric weight
    }

    dim_time ||--o{ fact_rentals : "date_key"
    dim_cmhc_zone ||--o{ fact_rentals : "zone_key"
    dim_neighbourhood ||--o{ fact_census : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_crime : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_amenities : "neighbourhood_id"
    dim_cmhc_zone ||--o{ bridge_cmhc_neighbourhood : "zone_code"
    dim_neighbourhood ||--o{ bridge_cmhc_neighbourhood : "neighbourhood_id"
```
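
As a rough illustration of how the relationships in the diagram are meant to be traversed, the sketch below builds cut-down versions of `dim_time`, `dim_cmhc_zone`, and `fact_rentals` and joins them. It uses an in-memory SQLite database purely as a stand-in for PostgreSQL so that it runs anywhere; the sample rows (zone code, zone name, rents) are invented, and schema qualifiers and the PostGIS `geometry` column are left out.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; all inserted values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_time (
        date_key INTEGER PRIMARY KEY,
        full_date TEXT UNIQUE NOT NULL,
        year INTEGER NOT NULL,
        month INTEGER NOT NULL
    );
    CREATE TABLE dim_cmhc_zone (
        zone_key INTEGER PRIMARY KEY,
        zone_code TEXT UNIQUE NOT NULL,
        zone_name TEXT NOT NULL
    );
    CREATE TABLE fact_rentals (
        id INTEGER PRIMARY KEY,
        date_key INTEGER REFERENCES dim_time (date_key),
        zone_key INTEGER REFERENCES dim_cmhc_zone (zone_key),
        bedroom_type TEXT NOT NULL,
        avg_rent REAL
    );
    INSERT INTO dim_time VALUES (202310, '2023-10-01', 2023, 10);
    INSERT INTO dim_cmhc_zone VALUES (1, 'Z01', 'Example Zone');
    INSERT INTO fact_rentals VALUES (1, 202310, 1, '1-bed', 1500.0);
    INSERT INTO fact_rentals VALUES (2, 202310, 1, '2-bed', 1900.0);
    """
)


def rents_by_zone(connection):
    """Join each rental fact to its time and zone dimension rows."""
    query = """
        SELECT t.full_date, z.zone_name, f.bedroom_type, f.avg_rent
        FROM fact_rentals AS f
        JOIN dim_time AS t ON t.date_key = f.date_key
        JOIN dim_cmhc_zone AS z ON z.zone_key = f.zone_key
        ORDER BY f.id
    """
    return connection.execute(query).fetchall()


rows = rents_by_zone(conn)
```

Each returned tuple carries the month, the zone name, the bedroom type, and the average rent, which is the shape a time-series chart would typically consume.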

## Schema Layers

### Database Schemas

| Schema | Purpose | Managed By |
|--------|---------|------------|
| `public` | Shared dimensions (dim_time) | SQLAlchemy |
| `raw_toronto` | Toronto dimension and fact tables | SQLAlchemy |
| `stg_toronto` | Toronto staging models | dbt |
| `int_toronto` | Toronto intermediate models | dbt |
| `mart_toronto` | Toronto analytical tables | dbt |


### Raw Toronto Schema (raw_toronto)

Toronto-specific tables loaded by SQLAlchemy:

| Table | Source | Description |
|-------|--------|-------------|
| `dim_neighbourhood` | City of Toronto API | 158 neighbourhood boundaries |
| `dim_cmhc_zone` | CMHC | ~20 rental market zones |
| `dim_policy_event` | Manual | Policy events for annotation |
| `fact_census` | City of Toronto API | Census profile data |
| `fact_crime` | Toronto Police API | Crime statistics |
| `fact_amenities` | City of Toronto API | Amenity counts |
| `fact_rentals` | CMHC Data Files | Rental market survey data |
| `bridge_cmhc_neighbourhood` | Computed | Zone-neighbourhood mapping |


### Public Schema

Shared dimensions used across all projects:

| Table | Description |
|-------|-------------|
| `dim_time` | Time dimension (monthly grain) |


### Staging Schema - stg_toronto (dbt)

Staging models provide 1:1 cleaned representations of source data:

| Model | Source Table | Purpose |
|-------|-------------|---------|
| `stg_toronto__neighbourhoods` | raw.neighbourhoods | Cleaned boundaries with standardized names |
| `stg_toronto__census` | raw.census_profiles | Typed census metrics |
| `stg_cmhc__rentals` | raw.cmhc_rentals | Validated rental data |
| `stg_toronto__crime` | raw.crime_data | Standardized crime categories |
| `stg_toronto__amenities` | raw.amenities | Typed amenity counts |
| `stg_dimensions__time` | generated | Time dimension |
| `stg_dimensions__cmhc_zones` | raw.cmhc_zones | CMHC zone boundaries |
| `stg_cmhc__zone_crosswalk` | raw.crosswalk | Zone-neighbourhood mapping |


### Marts Schema - mart_toronto (dbt)

Analytical tables ready for dashboard consumption:

| Model | Grain | Purpose |
|-------|-------|---------|
| `mart_neighbourhood_overview` | neighbourhood | Composite livability scores |
| `mart_neighbourhood_housing` | neighbourhood | Housing and rent metrics |
| `mart_neighbourhood_safety` | neighbourhood × year | Crime rate calculations |
| `mart_neighbourhood_demographics` | neighbourhood | Income, age, population metrics |
| `mart_neighbourhood_amenities` | neighbourhood | Amenity accessibility scores |
| `mart_toronto_rentals` | zone × month | Time-series rental analysis |


## Table Details

### Dimension Tables

#### dim_time

Time dimension for date-based analysis. Grain: one row per month.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| date_key | INTEGER | PK | Surrogate key (YYYYMM format) |
| full_date | DATE | UNIQUE, NOT NULL | First day of month |
| year | INTEGER | NOT NULL | Calendar year |
| month | INTEGER | NOT NULL | Month number (1-12) |
| quarter | INTEGER | NOT NULL | Quarter (1-4) |
| month_name | VARCHAR(20) | NOT NULL | Month name |
| is_month_start | BOOLEAN | DEFAULT TRUE | Always true (monthly grain) |

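
To make the `YYYYMM` key format concrete, here is a minimal sketch of how one `dim_time` row could be derived from an arbitrary date. The function names `to_date_key` and `dim_time_row` are hypothetical and not part of the project; English month names are also an assumption, since the table only says "Month name".

```python
from datetime import date


def to_date_key(d: date) -> int:
    """Return the YYYYMM integer key for the month containing d."""
    return d.year * 100 + d.month


def dim_time_row(d: date) -> dict:
    """Build one dim_time row (monthly grain) for the month containing d."""
    first = d.replace(day=1)  # full_date is always the first day of the month
    return {
        "date_key": to_date_key(first),
        "full_date": first,
        "year": first.year,
        "month": first.month,
        "quarter": (first.month - 1) // 3 + 1,
        # strftime("%B") follows the active locale; assumed to be English here.
        "month_name": first.strftime("%B"),
        "is_month_start": True,
    }


row = dim_time_row(date(2024, 8, 17))
```

Because the key is a plain integer, sorting by `date_key` gives chronological order, and a range of months can be selected with an ordinary numeric comparison.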

#### dim_cmhc_zone

CMHC rental market zones (~20 zones covering Toronto).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| zone_key | INTEGER | PK, AUTO | Surrogate key |
| zone_code | VARCHAR(10) | UNIQUE, NOT NULL | CMHC zone identifier |
| zone_name | VARCHAR(100) | NOT NULL | Zone display name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS zone boundary |


#### dim_neighbourhood

Toronto's 158 official neighbourhoods.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| neighbourhood_id | INTEGER | PK | City-assigned ID |
| name | VARCHAR(100) | NOT NULL | Neighbourhood name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS boundary |
| population | INTEGER | | Total population |
| land_area_sqkm | NUMERIC(10,4) | | Land area in km² |
| pop_density_per_sqkm | NUMERIC(10,2) | | Population density (people per km²) |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Percent with a bachelor's degree or higher |
| median_household_income | NUMERIC(12,2) | | Median household income |
| pct_owner_occupied | NUMERIC(5,2) | | Percent owner-occupied |
| pct_renter_occupied | NUMERIC(5,2) | | Percent renter-occupied |
| census_year | INTEGER | DEFAULT 2021 | Census reference year |


#### dim_policy_event

Policy events for time-series annotation (rent control, interest rates, etc.).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| event_id | INTEGER | PK, AUTO | Surrogate key |
| event_date | DATE | NOT NULL | Announcement date |
| effective_date | DATE | | Implementation date |
| level | VARCHAR(20) | NOT NULL | federal/provincial/municipal |
| category | VARCHAR(20) | NOT NULL | monetary/tax/regulatory/supply/economic |
| title | VARCHAR(200) | NOT NULL | Event title |
| description | TEXT | | Detailed description |
| expected_direction | VARCHAR(10) | NOT NULL | bearish/bullish/neutral |
| source_url | VARCHAR(500) | | Reference link |
| confidence | VARCHAR(10) | DEFAULT 'medium' | high/medium/low |


### Fact Tables

#### fact_rentals

CMHC rental market survey data. Grain: zone × bedroom type × survey date.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| date_key | INTEGER | FK → dim_time | Survey date reference |
| zone_key | INTEGER | FK → dim_cmhc_zone | CMHC zone reference |
| bedroom_type | VARCHAR(20) | NOT NULL | bachelor/1-bed/2-bed/3+bed/total |
| universe | INTEGER | | Total rental units |
| avg_rent | NUMERIC(10,2) | | Average rent |
| median_rent | NUMERIC(10,2) | | Median rent |
| vacancy_rate | NUMERIC(5,2) | | Vacancy percentage |
| availability_rate | NUMERIC(5,2) | | Availability percentage |
| turnover_rate | NUMERIC(5,2) | | Turnover percentage |
| rent_change_pct | NUMERIC(5,2) | | Year-over-year rent change (%) |
| reliability_code | VARCHAR(2) | | CMHC data quality code |


#### fact_census

Census statistics. Grain: neighbourhood × census year.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| census_year | INTEGER | NOT NULL | 2016, 2021, etc. |
| population | INTEGER | | Total population |
| population_density | NUMERIC(10,2) | | People per km² |
| median_household_income | NUMERIC(12,2) | | Median household income |
| average_household_income | NUMERIC(12,2) | | Average household income |
| unemployment_rate | NUMERIC(5,2) | | Unemployment rate (%) |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Percent with a bachelor's degree or higher |
| pct_owner_occupied | NUMERIC(5,2) | | Percent owner-occupied |
| pct_renter_occupied | NUMERIC(5,2) | | Percent renter-occupied |
| median_age | NUMERIC(5,2) | | Median resident age |
| average_dwelling_value | NUMERIC(12,2) | | Average dwelling value |


#### fact_crime

Crime statistics. Grain: neighbourhood × year × crime type.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| year | INTEGER | NOT NULL | Calendar year |
| crime_type | VARCHAR(50) | NOT NULL | Crime category |
| count | INTEGER | NOT NULL | Number of incidents |
| rate_per_100k | NUMERIC(10,2) | | Rate per 100k population |

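
The relationship between `count` and `rate_per_100k` can be sketched as below. This is an illustration of the standard per-100,000 formula, not the project's loader code; the helper name `rate_per_100k` and the choice to return `None` when no population is available are assumptions, and the document does not say which population column supplies the denominator.

```python
from typing import Optional


def rate_per_100k(count: int, population: int) -> Optional[float]:
    """Incidents per 100,000 residents, rounded to match NUMERIC(10,2).

    Returns None when the population is missing or zero, mirroring the
    nullable rate_per_100k column (an assumption about intended behaviour).
    """
    if population is None or population <= 0:
        return None
    return round(count / population * 100_000, 2)


# Made-up figures: 25 incidents in a neighbourhood of 20,000 residents.
example_rate = rate_per_100k(25, 20_000)
```

Normalising by population is what makes neighbourhoods of different sizes comparable; raw counts alone would favour small neighbourhoods.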

#### fact_amenities

Amenity counts. Grain: neighbourhood × amenity type × year.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| amenity_type | VARCHAR(50) | NOT NULL | parks/schools/transit/etc. |
| count | INTEGER | NOT NULL | Number of amenities |
| year | INTEGER | NOT NULL | Reference year |


### Bridge Tables

#### bridge_cmhc_neighbourhood

Maps CMHC zones to neighbourhoods with area-based weights for data disaggregation.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| cmhc_zone_code | VARCHAR(10) | FK → dim_cmhc_zone | Zone reference |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| weight | NUMERIC(5,4) | NOT NULL | Proportional weight (0-1) |

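
One plausible way to use the weights for disaggregation is a weighted average of zone-level values per neighbourhood, sketched below. Two assumptions are baked in and should be checked against the actual pipeline: `weight` is read as the share of a neighbourhood's area that falls inside a zone, and the metric is an average-like value such as rent rather than a count. The function name `neighbourhood_estimates` and the sample numbers are hypothetical.

```python
from collections import defaultdict


def neighbourhood_estimates(zone_values, bridge_rows):
    """Estimate a per-neighbourhood value from zone-level values.

    zone_values: {cmhc_zone_code: value}
    bridge_rows: iterable of (cmhc_zone_code, neighbourhood_id, weight)
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for zone_code, neighbourhood_id, weight in bridge_rows:
        value = zone_values.get(zone_code)
        if value is None:
            continue  # no data for this zone; skip rather than treat as zero
        weighted_sum[neighbourhood_id] += weight * value
        weight_total[neighbourhood_id] += weight
    # Dividing by the weight total keeps the result an average even when
    # the weights of a neighbourhood do not sum exactly to 1.
    return {
        nid: weighted_sum[nid] / weight_total[nid]
        for nid in weighted_sum
        if weight_total[nid] > 0
    }


# Invented sample data: neighbourhood 2 straddles two zones.
zone_rents = {"Z01": 1500.0, "Z02": 2000.0}
bridge = [("Z01", 1, 1.0), ("Z01", 2, 0.25), ("Z02", 2, 0.75)]
estimates = neighbourhood_estimates(zone_rents, bridge)
```

Here neighbourhood 1 inherits the rent of its single zone, while neighbourhood 2 gets a value three quarters of the way towards the second zone's rent.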

## Indexes

| Table | Index | Columns | Purpose |
|-------|-------|---------|---------|
| fact_rentals | ix_fact_rentals_date_zone | date_key, zone_key | Time-series queries |
| fact_census | ix_fact_census_neighbourhood_year | neighbourhood_id, census_year | Census lookups |
| fact_crime | ix_fact_crime_neighbourhood_year | neighbourhood_id, year | Crime trends |
| fact_crime | ix_fact_crime_type | crime_type | Crime filtering |
| fact_amenities | ix_fact_amenities_neighbourhood_year | neighbourhood_id, year | Amenity queries |
| fact_amenities | ix_fact_amenities_type | amenity_type | Amenity filtering |
| bridge_cmhc_neighbourhood | ix_bridge_cmhc_zone | cmhc_zone_code | Zone lookups |
| bridge_cmhc_neighbourhood | ix_bridge_neighbourhood | neighbourhood_id | Neighbourhood lookups |

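
For readers who want to see what these indexes amount to in DDL, the sketch below renders three rows of the table as `CREATE INDEX` statements. It is only an illustration: in this project the tables are managed by SQLAlchemy, so the real definitions presumably live in the model code, and qualifying the tables with `raw_toronto` is an assumption based on the schema listing. The helper `create_index_sql` is hypothetical.

```python
# (table, index name, indexed columns), copied from the table above.
INDEXES = [
    ("fact_rentals", "ix_fact_rentals_date_zone", ("date_key", "zone_key")),
    ("fact_crime", "ix_fact_crime_type", ("crime_type",)),
    ("bridge_cmhc_neighbourhood", "ix_bridge_cmhc_zone", ("cmhc_zone_code",)),
]


def create_index_sql(table, index, columns, schema="raw_toronto"):
    """Render one CREATE INDEX statement; the schema default is an assumption."""
    column_list = ", ".join(columns)
    return f"CREATE INDEX IF NOT EXISTS {index} ON {schema}.{table} ({column_list});"


statements = [create_index_sql(*row) for row in INDEXES]
```

Column order matters for the composite indexes: an index on `(date_key, zone_key)` helps queries that filter on `date_key` alone, but is much less useful for queries that filter only on `zone_key`.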

## PostGIS Extensions

The database requires PostGIS for geospatial operations:

```sql
CREATE EXTENSION IF NOT EXISTS postgis;
```

All geometry columns use SRID 4326 (WGS84) for compatibility with web mapping libraries.
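
In practice, SRID 4326 means boundaries are stored as longitude/latitude pairs in degrees, which is also what GeoJSON expects. The sketch below shows a basic sanity check before handing a boundary to a mapping library. The helper names and the sample ring (a small made-up rectangle roughly in the Toronto area) are hypothetical, and the check covers only coordinate ranges and ring closure, so it will not catch swapped latitude and longitude.

```python
def is_valid_wgs84_ring(ring):
    """Check a polygon ring given as (longitude, latitude) pairs in degrees."""
    # A GeoJSON linear ring is closed and has at least four positions.
    if len(ring) < 4 or tuple(ring[0]) != tuple(ring[-1]):
        return False
    return all(
        -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0 for lon, lat in ring
    )


def to_geojson_polygon(ring):
    """Wrap a single exterior ring as a GeoJSON Polygon geometry."""
    if not is_valid_wgs84_ring(ring):
        raise ValueError("ring is not a closed WGS84 longitude/latitude ring")
    return {"type": "Polygon", "coordinates": [[list(point) for point in ring]]}


# Invented rectangle; longitude comes first, then latitude.
sample_ring = [
    (-79.40, 43.64),
    (-79.38, 43.64),
    (-79.38, 43.66),
    (-79.40, 43.66),
    (-79.40, 43.64),
]
polygon = to_geojson_polygon(sample_ring)
```

Keeping every geometry in the same SRID avoids per-query reprojection and lets boundaries from `dim_neighbourhood` and `dim_cmhc_zone` be overlaid directly.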