# Database Schema

This document describes the PostgreSQL/PostGIS database schema for the Toronto Neighbourhood Dashboard.

## Entity Relationship Diagram

```mermaid
erDiagram
    dim_time {
        int date_key PK
        date full_date UK
        int year
        int month
        int quarter
        string month_name
        bool is_month_start
    }

    dim_cmhc_zone {
        int zone_key PK
        string zone_code UK
        string zone_name
        geometry geometry
    }

    dim_neighbourhood {
        int neighbourhood_id PK
        string name
        geometry geometry
        int population
        numeric land_area_sqkm
        numeric pop_density_per_sqkm
        numeric pct_bachelors_or_higher
        numeric median_household_income
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        int census_year
    }

    dim_policy_event {
        int event_id PK
        date event_date
        date effective_date
        string level
        string category
        string title
        text description
        string expected_direction
        string source_url
        string confidence
    }

    fact_rentals {
        int id PK
        int date_key FK
        int zone_key FK
        string bedroom_type
        int universe
        numeric avg_rent
        numeric median_rent
        numeric vacancy_rate
        numeric availability_rate
        numeric turnover_rate
        numeric rent_change_pct
        string reliability_code
    }

    fact_census {
        int id PK
        int neighbourhood_id FK
        int census_year
        int population
        numeric population_density
        numeric median_household_income
        numeric average_household_income
        numeric unemployment_rate
        numeric pct_bachelors_or_higher
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        numeric median_age
        numeric average_dwelling_value
    }

    fact_crime {
        int id PK
        int neighbourhood_id FK
        int year
        string crime_type
        int count
        numeric rate_per_100k
    }

    fact_amenities {
        int id PK
        int neighbourhood_id FK
        string amenity_type
        int count
        int year
    }

    bridge_cmhc_neighbourhood {
        int id PK
        string cmhc_zone_code FK
        int neighbourhood_id FK
        numeric weight
    }

    dim_time ||--o{ fact_rentals : "date_key"
    dim_cmhc_zone ||--o{ fact_rentals : "zone_key"
    dim_neighbourhood ||--o{ fact_census : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_crime : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_amenities : "neighbourhood_id"
    dim_cmhc_zone ||--o{ bridge_cmhc_neighbourhood : "zone_code"
    dim_neighbourhood ||--o{ bridge_cmhc_neighbourhood : "neighbourhood_id"
```

## Schema Layers

### Raw Schema

Raw data is loaded directly from external sources without transformation:

| Table | Source | Description |
|-------|--------|-------------|
| `raw.neighbourhoods` | City of Toronto API | GeoJSON neighbourhood boundaries |
| `raw.census_profiles` | City of Toronto API | Census profile data |
| `raw.crime_data` | Toronto Police API | Crime statistics by neighbourhood |
| `raw.cmhc_rentals` | CMHC Data Files | Rental market survey data |

### Staging Schema (dbt)

Staging models provide 1:1 cleaned representations of source data:

| Model | Source Table | Purpose |
|-------|-------------|---------|
| `stg_toronto__neighbourhoods` | `raw.neighbourhoods` | Cleaned boundaries with standardized names |
| `stg_toronto__census` | `raw.census_profiles` | Typed census metrics |
| `stg_cmhc__rentals` | `raw.cmhc_rentals` | Validated rental data |
| `stg_police__crimes` | `raw.crime_data` | Standardized crime categories |

### Marts Schema (dbt)

Analytical tables ready for dashboard consumption:

| Model | Grain | Purpose |
|-------|-------|---------|
| `mart_neighbourhood_summary` | neighbourhood | Composite livability scores |
| `mart_rental_trends` | zone × month | Time-series rental analysis |
| `mart_crime_rates` | neighbourhood × year | Crime rate calculations |
| `mart_amenity_density` | neighbourhood | Amenity accessibility scores |

## Table Details

### Dimension Tables

#### dim_time

Time dimension for date-based analysis. Grain: one row per month.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| date_key | INTEGER | PK | Surrogate key (YYYYMM format) |
| full_date | DATE | UNIQUE, NOT NULL | First day of month |
| year | INTEGER | NOT NULL | Calendar year |
| month | INTEGER | NOT NULL | Month number (1-12) |
| quarter | INTEGER | NOT NULL | Quarter (1-4) |
| month_name | VARCHAR(20) | NOT NULL | Month name |
| is_month_start | BOOLEAN | DEFAULT TRUE | Always true (monthly grain) |

#### dim_cmhc_zone

CMHC rental market zones (~20 zones covering Toronto).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| zone_key | INTEGER | PK, AUTO | Surrogate key |
| zone_code | VARCHAR(10) | UNIQUE, NOT NULL | CMHC zone identifier |
| zone_name | VARCHAR(100) | NOT NULL | Zone display name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS zone boundary |

#### dim_neighbourhood

Toronto's 158 official neighbourhoods.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| neighbourhood_id | INTEGER | PK | City-assigned ID |
| name | VARCHAR(100) | NOT NULL | Neighbourhood name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS boundary |
| population | INTEGER | | Total population |
| land_area_sqkm | NUMERIC(10,4) | | Land area in km² |
| pop_density_per_sqkm | NUMERIC(10,2) | | Population density (people per km²) |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Share with a bachelor's degree or higher (%) |
| median_household_income | NUMERIC(12,2) | | Median household income |
| pct_owner_occupied | NUMERIC(5,2) | | Owner occupancy rate (%) |
| pct_renter_occupied | NUMERIC(5,2) | | Renter occupancy rate (%) |
| census_year | INTEGER | DEFAULT 2021 | Census reference year |

#### dim_policy_event

Policy events for time-series annotation (rent control, interest rates, etc.).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| event_id | INTEGER | PK, AUTO | Surrogate key |
| event_date | DATE | NOT NULL | Announcement date |
| effective_date | DATE | | Implementation date |
| level | VARCHAR(20) | NOT NULL | federal/provincial/municipal |
| category | VARCHAR(20) | NOT NULL | monetary/tax/regulatory/supply/economic |
| title | VARCHAR(200) | NOT NULL | Event title |
| description | TEXT | | Detailed description |
| expected_direction | VARCHAR(10) | NOT NULL | bearish/bullish/neutral |
| source_url | VARCHAR(500) | | Reference link |
| confidence | VARCHAR(10) | DEFAULT 'medium' | high/medium/low |

### Fact Tables

#### fact_rentals

CMHC rental market survey data. Grain: zone × bedroom type × survey date.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| date_key | INTEGER | FK → dim_time | Survey date reference |
| zone_key | INTEGER | FK → dim_cmhc_zone | CMHC zone reference |
| bedroom_type | VARCHAR(20) | NOT NULL | bachelor/1-bed/2-bed/3+bed/total |
| universe | INTEGER | | Total rental units |
| avg_rent | NUMERIC(10,2) | | Average rent |
| median_rent | NUMERIC(10,2) | | Median rent |
| vacancy_rate | NUMERIC(5,2) | | Vacancy rate (%) |
| availability_rate | NUMERIC(5,2) | | Availability rate (%) |
| turnover_rate | NUMERIC(5,2) | | Turnover rate (%) |
| rent_change_pct | NUMERIC(5,2) | | Year-over-year rent change (%) |
| reliability_code | VARCHAR(2) | | CMHC data quality code |

#### fact_census

Census statistics. Grain: neighbourhood × census year.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| census_year | INTEGER | NOT NULL | 2016, 2021, etc. |
| population | INTEGER | | Total population |
| population_density | NUMERIC(10,2) | | People per km² |
| median_household_income | NUMERIC(12,2) | | Median household income |
| average_household_income | NUMERIC(12,2) | | Average household income |
| unemployment_rate | NUMERIC(5,2) | | Unemployment rate (%) |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Share with a bachelor's degree or higher (%) |
| pct_owner_occupied | NUMERIC(5,2) | | Owner occupancy rate (%) |
| pct_renter_occupied | NUMERIC(5,2) | | Renter occupancy rate (%) |
| median_age | NUMERIC(5,2) | | Median resident age |
| average_dwelling_value | NUMERIC(12,2) | | Average home value |

#### fact_crime

Crime statistics. Grain: neighbourhood × year × crime type.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| year | INTEGER | NOT NULL | Calendar year |
| crime_type | VARCHAR(50) | NOT NULL | Crime category |
| count | INTEGER | NOT NULL | Number of incidents |
| rate_per_100k | NUMERIC(10,2) | | Rate per 100k population |

#### fact_amenities

Amenity counts. Grain: neighbourhood × amenity type × year.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| amenity_type | VARCHAR(50) | NOT NULL | parks/schools/transit/etc. |
| count | INTEGER | NOT NULL | Number of amenities |
| year | INTEGER | NOT NULL | Reference year |

### Bridge Tables

#### bridge_cmhc_neighbourhood

Maps CMHC zones to neighbourhoods with area-based weights for data disaggregation.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| cmhc_zone_code | VARCHAR(10) | FK → dim_cmhc_zone | Zone reference |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| weight | NUMERIC(5,4) | NOT NULL | Proportional weight (0-1) |

## Indexes

| Table | Index | Columns | Purpose |
|-------|-------|---------|---------|
| fact_rentals | ix_fact_rentals_date_zone | date_key, zone_key | Time-series queries |
| fact_census | ix_fact_census_neighbourhood_year | neighbourhood_id, census_year | Census lookups |
| fact_crime | ix_fact_crime_neighbourhood_year | neighbourhood_id, year | Crime trends |
| fact_crime | ix_fact_crime_type | crime_type | Crime filtering |
| fact_amenities | ix_fact_amenities_neighbourhood_year | neighbourhood_id, year | Amenity queries |
| fact_amenities | ix_fact_amenities_type | amenity_type | Amenity filtering |
| bridge_cmhc_neighbourhood | ix_bridge_cmhc_zone | cmhc_zone_code | Zone lookups |
| bridge_cmhc_neighbourhood | ix_bridge_neighbourhood | neighbourhood_id | Neighbourhood lookups |

## PostGIS Extensions

The database requires PostGIS for geospatial operations:

```sql
CREATE EXTENSION IF NOT EXISTS postgis;
```

All geometry columns use SRID 4326 (WGS84) for compatibility with web mapping libraries.
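
As an illustration of how the `weight` column in `bridge_cmhc_neighbourhood` could be used to disaggregate zone-level rental data to neighbourhoods, here is a minimal sketch. It rests on several assumptions that are not stated in this document: SQLite (via Python's standard library) stands in for PostgreSQL so the example runs anywhere, the tables are trimmed to the columns the query needs, the sample rows are made up, and each weight is taken to be the share of a neighbourhood's area that falls inside the zone. The output column name `est_avg_rent` is hypothetical.

```python
import sqlite3

# ASSUMPTION: SQLite is a stand-in for PostgreSQL; tables are simplified
# versions of the ones documented above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_cmhc_zone (
        zone_key  INTEGER PRIMARY KEY,
        zone_code TEXT UNIQUE NOT NULL
    );
    CREATE TABLE fact_rentals (
        date_key     INTEGER,
        zone_key     INTEGER,
        bedroom_type TEXT,
        avg_rent     REAL
    );
    CREATE TABLE bridge_cmhc_neighbourhood (
        cmhc_zone_code   TEXT,
        neighbourhood_id INTEGER,
        weight           REAL
    );

    -- Made-up sample rows, for illustration only.
    INSERT INTO dim_cmhc_zone VALUES (1, 'Z01'), (2, 'Z02');
    INSERT INTO fact_rentals VALUES
        (202310, 1, 'total', 2000.0),
        (202310, 2, 'total', 1000.0);
    INSERT INTO bridge_cmhc_neighbourhood VALUES
        ('Z01', 10, 1.0),    -- neighbourhood 10 lies entirely in zone Z01
        ('Z01', 20, 0.25),   -- neighbourhood 20 straddles both zones
        ('Z02', 20, 0.75);
""")

# Weighted average of zone-level rents per neighbourhood and month.
# Dividing by SUM(weight) keeps the estimate sensible even if a
# neighbourhood's weights do not sum exactly to 1.
QUERY = """
    SELECT b.neighbourhood_id,
           r.date_key,
           SUM(b.weight * r.avg_rent) / SUM(b.weight) AS est_avg_rent
    FROM bridge_cmhc_neighbourhood AS b
    JOIN dim_cmhc_zone AS z ON z.zone_code = b.cmhc_zone_code
    JOIN fact_rentals  AS r ON r.zone_key  = z.zone_key
    WHERE r.bedroom_type = 'total'
      AND r.avg_rent IS NOT NULL
    GROUP BY b.neighbourhood_id, r.date_key
    ORDER BY b.neighbourhood_id
"""

# Map neighbourhood_id -> estimated average rent for the sample month.
estimates = {row[0]: row[2] for row in conn.execute(QUERY)}
print(estimates)
```

The join goes through `dim_cmhc_zone` because the bridge table references zones by `zone_code` while `fact_rentals` references them by `zone_key`. If the weights instead represent the share of a zone's area covered by a neighbourhood, the aggregation would need to be adapted accordingly.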