personal-portfolio/docs/DATABASE_SCHEMA.md
lmiranda bf6e392002
2026-01-17 17:10:30 -05:00


Database Schema

This document describes the PostgreSQL/PostGIS database schema for the Toronto Neighbourhood Dashboard.

Entity Relationship Diagram

erDiagram
    dim_time {
        int date_key PK
        date full_date UK
        int year
        int month
        int quarter
        string month_name
        bool is_month_start
    }

    dim_cmhc_zone {
        int zone_key PK
        string zone_code UK
        string zone_name
        geometry geometry
    }

    dim_neighbourhood {
        int neighbourhood_id PK
        string name
        geometry geometry
        int population
        numeric land_area_sqkm
        numeric pop_density_per_sqkm
        numeric pct_bachelors_or_higher
        numeric median_household_income
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        int census_year
    }

    dim_policy_event {
        int event_id PK
        date event_date
        date effective_date
        string level
        string category
        string title
        text description
        string expected_direction
        string source_url
        string confidence
    }

    fact_rentals {
        int id PK
        int date_key FK
        int zone_key FK
        string bedroom_type
        int universe
        numeric avg_rent
        numeric median_rent
        numeric vacancy_rate
        numeric availability_rate
        numeric turnover_rate
        numeric rent_change_pct
        string reliability_code
    }

    fact_census {
        int id PK
        int neighbourhood_id FK
        int census_year
        int population
        numeric population_density
        numeric median_household_income
        numeric average_household_income
        numeric unemployment_rate
        numeric pct_bachelors_or_higher
        numeric pct_owner_occupied
        numeric pct_renter_occupied
        numeric median_age
        numeric average_dwelling_value
    }

    fact_crime {
        int id PK
        int neighbourhood_id FK
        int year
        string crime_type
        int count
        numeric rate_per_100k
    }

    fact_amenities {
        int id PK
        int neighbourhood_id FK
        string amenity_type
        int count
        int year
    }

    bridge_cmhc_neighbourhood {
        int id PK
        string cmhc_zone_code FK
        int neighbourhood_id FK
        numeric weight
    }

    dim_time ||--o{ fact_rentals : "date_key"
    dim_cmhc_zone ||--o{ fact_rentals : "zone_key"
    dim_neighbourhood ||--o{ fact_census : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_crime : "neighbourhood_id"
    dim_neighbourhood ||--o{ fact_amenities : "neighbourhood_id"
    dim_cmhc_zone ||--o{ bridge_cmhc_neighbourhood : "zone_code"
    dim_neighbourhood ||--o{ bridge_cmhc_neighbourhood : "neighbourhood_id"

Schema Layers

Raw Schema

Raw data is loaded directly from external sources without transformation:

| Table | Source | Description |
| --- | --- | --- |
| raw.neighbourhoods | City of Toronto API | GeoJSON neighbourhood boundaries |
| raw.census_profiles | City of Toronto API | Census profile data |
| raw.crime_data | Toronto Police API | Crime statistics by neighbourhood |
| raw.cmhc_rentals | CMHC Data Files | Rental market survey data |

Staging Schema (dbt)

Staging models provide 1:1 cleaned representations of source data:

| Model | Source Table | Purpose |
| --- | --- | --- |
| stg_toronto__neighbourhoods | raw.neighbourhoods | Cleaned boundaries with standardized names |
| stg_toronto__census | raw.census_profiles | Typed census metrics |
| stg_cmhc__rentals | raw.cmhc_rentals | Validated rental data |
| stg_police__crimes | raw.crime_data | Standardized crime categories |

Marts Schema (dbt)

Analytical tables ready for dashboard consumption:

| Model | Grain | Purpose |
| --- | --- | --- |
| mart_neighbourhood_summary | neighbourhood | Composite livability scores |
| mart_rental_trends | zone × month | Time-series rental analysis |
| mart_crime_rates | neighbourhood × year | Crime rate calculations |
| mart_amenity_density | neighbourhood | Amenity accessibility scores |

Table Details

Dimension Tables

dim_time

Time dimension for date-based analysis. Grain: one row per month.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| date_key | INTEGER | PK | Surrogate key (YYYYMM format) |
| full_date | DATE | UNIQUE, NOT NULL | First day of month |
| year | INTEGER | NOT NULL | Calendar year |
| month | INTEGER | NOT NULL | Month number (1-12) |
| quarter | INTEGER | NOT NULL | Quarter (1-4) |
| month_name | VARCHAR(20) | NOT NULL | Month name |
| is_month_start | BOOLEAN | DEFAULT TRUE | Always true (monthly grain) |
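
As a minimal sketch of this monthly grain, the rows could be generated as below. It assumes `date_key` is computed as `year * 100 + month`; the helper name `build_dim_time_rows` is hypothetical and not part of the project.

```python
import calendar
from datetime import date


def build_dim_time_rows(start_year, start_month, months):
    """Build one dim_time row per month (hypothetical helper, not project code)."""
    rows = []
    year, month = start_year, start_month
    for _ in range(months):
        rows.append({
            "date_key": year * 100 + month,      # YYYYMM, e.g. 202101
            "full_date": date(year, month, 1),   # first day of the month
            "year": year,
            "month": month,
            "quarter": (month - 1) // 3 + 1,     # 1-4
            "month_name": calendar.month_name[month],
            "is_month_start": True,              # always true at monthly grain
        })
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return rows


rows = build_dim_time_rows(2020, 11, 3)
print([r["date_key"] for r in rows])  # [202011, 202012, 202101]
```

Because the key is numeric and ordered, a range filter such as `date_key BETWEEN 202001 AND 202012` selects a calendar year without joining on `full_date`.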

dim_cmhc_zone

CMHC rental market zones (~20 zones covering Toronto).

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| zone_key | INTEGER | PK, AUTO | Surrogate key |
| zone_code | VARCHAR(10) | UNIQUE, NOT NULL | CMHC zone identifier |
| zone_name | VARCHAR(100) | NOT NULL | Zone display name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS zone boundary |

dim_neighbourhood

Toronto's 158 official neighbourhoods.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| neighbourhood_id | INTEGER | PK | City-assigned ID |
| name | VARCHAR(100) | NOT NULL | Neighbourhood name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS boundary |
| population | INTEGER | | Total population |
| land_area_sqkm | NUMERIC(10,4) | | Area in km² |
| pop_density_per_sqkm | NUMERIC(10,2) | | Population density |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Education rate |
| median_household_income | NUMERIC(12,2) | | Median income |
| pct_owner_occupied | NUMERIC(5,2) | | Owner occupancy rate |
| pct_renter_occupied | NUMERIC(5,2) | | Renter occupancy rate |
| census_year | INTEGER | DEFAULT 2021 | Census reference year |

dim_policy_event

Policy events for time-series annotation (rent control, interest rates, etc.).

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| event_id | INTEGER | PK, AUTO | Surrogate key |
| event_date | DATE | NOT NULL | Announcement date |
| effective_date | DATE | | Implementation date |
| level | VARCHAR(20) | NOT NULL | federal/provincial/municipal |
| category | VARCHAR(20) | NOT NULL | monetary/tax/regulatory/supply/economic |
| title | VARCHAR(200) | NOT NULL | Event title |
| description | TEXT | | Detailed description |
| expected_direction | VARCHAR(10) | NOT NULL | bearish/bullish/neutral |
| source_url | VARCHAR(500) | | Reference link |
| confidence | VARCHAR(10) | DEFAULT 'medium' | high/medium/low |
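
The enumerated values listed above can be checked before a row is loaded. The sketch below is one way to do that in application code. Whether the database itself enforces these values (for example with CHECK constraints) is not stated here, and `validate_policy_event` is a hypothetical helper.

```python
# Allowed values copied from the column descriptions above.
ALLOWED = {
    "level": {"federal", "provincial", "municipal"},
    "category": {"monetary", "tax", "regulatory", "supply", "economic"},
    "expected_direction": {"bearish", "bullish", "neutral"},
    "confidence": {"high", "medium", "low"},
}


def validate_policy_event(event):
    """Return a list of problems; an empty list means the row looks valid."""
    row = {"confidence": "medium", **event}  # mirror the column default
    problems = []
    for column, allowed in ALLOWED.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: {row.get(column)!r} is not one of {sorted(allowed)}")
    for column in ("event_date", "title"):  # NOT NULL columns without an enumeration
        if not row.get(column):
            problems.append(f"{column}: required")
    return problems


# Made-up event used only to exercise the check.
example = {
    "event_date": "2024-01-01",
    "level": "federal",
    "category": "monetary",
    "title": "Example rate announcement",
    "expected_direction": "neutral",
}
print(validate_policy_event(example))  # []
```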

Fact Tables

fact_rentals

CMHC rental market survey data. Grain: zone × bedroom type × survey date.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | INTEGER | PK, AUTO | Surrogate key |
| date_key | INTEGER | FK → dim_time | Survey date reference |
| zone_key | INTEGER | FK → dim_cmhc_zone | CMHC zone reference |
| bedroom_type | VARCHAR(20) | NOT NULL | bachelor/1-bed/2-bed/3+bed/total |
| universe | INTEGER | | Total rental units |
| avg_rent | NUMERIC(10,2) | | Average rent |
| median_rent | NUMERIC(10,2) | | Median rent |
| vacancy_rate | NUMERIC(5,2) | | Vacancy percentage |
| availability_rate | NUMERIC(5,2) | | Availability percentage |
| turnover_rate | NUMERIC(5,2) | | Turnover percentage |
| rent_change_pct | NUMERIC(5,2) | | Year-over-year change |
| reliability_code | VARCHAR(2) | | CMHC data quality code |
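
To illustrate how this fact table joins to its two dimensions, here is a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, includes only a subset of columns, and all rows are made up for illustration (they are not CMHC figures).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE dim_time (date_key INTEGER PRIMARY KEY, full_date TEXT UNIQUE NOT NULL,
                       year INTEGER NOT NULL, month INTEGER NOT NULL);
CREATE TABLE dim_cmhc_zone (zone_key INTEGER PRIMARY KEY, zone_code TEXT UNIQUE NOT NULL,
                            zone_name TEXT NOT NULL);
CREATE TABLE fact_rentals (id INTEGER PRIMARY KEY,
                           date_key INTEGER REFERENCES dim_time(date_key),
                           zone_key INTEGER REFERENCES dim_cmhc_zone(zone_key),
                           bedroom_type TEXT NOT NULL, avg_rent REAL);
-- Illustrative rows only; the values are invented.
INSERT INTO dim_time VALUES (202010, '2020-10-01', 2020, 10), (202110, '2021-10-01', 2021, 10);
INSERT INTO dim_cmhc_zone VALUES (1, 'Z01', 'Example Zone');
INSERT INTO fact_rentals (date_key, zone_key, bedroom_type, avg_rent) VALUES
    (202010, 1, '1-bed', 1500.0), (202110, 1, '1-bed', 1560.0), (202110, 1, '2-bed', 1900.0);
""")

# One row per survey date for a single bedroom type, ordered as a time series.
trend = conn.execute("""
    SELECT t.full_date, z.zone_name, f.avg_rent
    FROM fact_rentals AS f
    JOIN dim_time AS t ON t.date_key = f.date_key
    JOIN dim_cmhc_zone AS z ON z.zone_key = f.zone_key
    WHERE f.bedroom_type = '1-bed'
    ORDER BY t.full_date
""").fetchall()
print(trend)
```

Filtering on `bedroom_type` matters because the grain includes it: without the filter, each zone and date would return one row per bedroom type, including the `total` row.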

fact_census

Census statistics. Grain: neighbourhood × census year.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| census_year | INTEGER | NOT NULL | 2016, 2021, etc. |
| population | INTEGER | | Total population |
| population_density | NUMERIC(10,2) | | People per km² |
| median_household_income | NUMERIC(12,2) | | Median income |
| average_household_income | NUMERIC(12,2) | | Average income |
| unemployment_rate | NUMERIC(5,2) | | Unemployment % |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Education rate |
| pct_owner_occupied | NUMERIC(5,2) | | Owner rate |
| pct_renter_occupied | NUMERIC(5,2) | | Renter rate |
| median_age | NUMERIC(5,2) | | Median resident age |
| average_dwelling_value | NUMERIC(12,2) | | Average home value |

fact_crime

Crime statistics. Grain: neighbourhood × year × crime type.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| year | INTEGER | NOT NULL | Calendar year |
| crime_type | VARCHAR(50) | NOT NULL | Crime category |
| count | INTEGER | NOT NULL | Number of incidents |
| rate_per_100k | NUMERIC(10,2) | | Rate per 100k population |
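
For reference, a per-100k rate is the incident count divided by the population, scaled by 100,000. The sketch below shows that arithmetic. Which population figure the pipeline actually uses (for example `dim_neighbourhood.population`) is an assumption, and `rate_per_100k` is a hypothetical helper.

```python
from typing import Optional


def rate_per_100k(count: int, population: int) -> Optional[float]:
    """Incidents per 100,000 residents, rounded to 2 decimals like NUMERIC(10,2)."""
    if population <= 0:
        return None  # the rate is undefined without a positive population
    return round(count / population * 100_000, 2)


print(rate_per_100k(45, 30_000))  # 150.0
```

Returning `None` for a missing or zero population maps naturally to a NULL in the nullable `rate_per_100k` column.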

fact_amenities

Amenity counts. Grain: neighbourhood × amenity type × year.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| amenity_type | VARCHAR(50) | NOT NULL | parks/schools/transit/etc. |
| count | INTEGER | NOT NULL | Number of amenities |
| year | INTEGER | NOT NULL | Reference year |

Bridge Tables

bridge_cmhc_neighbourhood

Maps CMHC zones to neighbourhoods, with an area-based weight for each zone-neighbourhood pair, so that zone-level data can be disaggregated to neighbourhoods.

| Column | Type | Constraints | Description |
| --- | --- | --- | --- |
| id | INTEGER | PK, AUTO | Surrogate key |
| cmhc_zone_code | VARCHAR(10) | FK → dim_cmhc_zone | Zone reference |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| weight | NUMERIC(5,4) | NOT NULL | Proportional weight (0-1) |
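
One way to use these weights is a weighted average of zone values per neighbourhood. The sketch below assumes `weight` is the share of a neighbourhood's area that falls inside the zone. The document does not specify this, so treat it as an assumption. The rows, values, and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical bridge rows: (cmhc_zone_code, neighbourhood_id, weight).
# Assumption: weight = share of the neighbourhood's area inside the zone.
bridge = [("Z01", 1, 0.75), ("Z02", 1, 0.25), ("Z02", 2, 1.0)]
zone_avg_rent = {"Z01": 1600.0, "Z02": 2000.0}  # made-up zone-level values


def neighbourhood_estimates(bridge_rows, zone_values):
    """Weighted average of zone values for each neighbourhood."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for zone_code, neighbourhood_id, weight in bridge_rows:
        if zone_code not in zone_values:
            continue  # skip zones without a reported value
        weighted_sum[neighbourhood_id] += weight * zone_values[zone_code]
        weight_total[neighbourhood_id] += weight
    # Divide by the total weight so partial coverage is renormalized.
    return {
        n: weighted_sum[n] / weight_total[n]
        for n in weighted_sum
        if weight_total[n] > 0
    }


estimates = neighbourhood_estimates(bridge, zone_avg_rent)
print(estimates)  # {1: 1700.0, 2: 2000.0}
```

A weighted average suits rates and averages such as `avg_rent`. Counts such as `universe` would instead need weights expressed as a share of the zone, so that the pieces sum back to the zone total.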

Indexes

| Table | Index | Columns | Purpose |
| --- | --- | --- | --- |
| fact_rentals | ix_fact_rentals_date_zone | date_key, zone_key | Time-series queries |
| fact_census | ix_fact_census_neighbourhood_year | neighbourhood_id, census_year | Census lookups |
| fact_crime | ix_fact_crime_neighbourhood_year | neighbourhood_id, year | Crime trends |
| fact_crime | ix_fact_crime_type | crime_type | Crime filtering |
| fact_amenities | ix_fact_amenities_neighbourhood_year | neighbourhood_id, year | Amenity queries |
| fact_amenities | ix_fact_amenities_type | amenity_type | Amenity filtering |
| bridge_cmhc_neighbourhood | ix_bridge_cmhc_zone | cmhc_zone_code | Zone lookups |
| bridge_cmhc_neighbourhood | ix_bridge_neighbourhood | neighbourhood_id | Neighbourhood lookups |
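
As a rough illustration of what a composite index is for, the sketch below creates the two `fact_crime` indexes and asks the query planner how it would run a typical lookup. It uses SQLite as a stand-in, so the plan text differs from what PostgreSQL's `EXPLAIN` would print, and the table definition is trimmed to the columns needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL here
conn.executescript("""
CREATE TABLE fact_crime (
    id INTEGER PRIMARY KEY,
    neighbourhood_id INTEGER,
    year INTEGER NOT NULL,
    crime_type TEXT NOT NULL,
    "count" INTEGER NOT NULL
);
CREATE INDEX ix_fact_crime_neighbourhood_year ON fact_crime (neighbourhood_id, year);
CREATE INDEX ix_fact_crime_type ON fact_crime (crime_type);
""")

# Ask the planner how it would run a typical crime-trend lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    'SELECT crime_type, "count" FROM fact_crime '
    "WHERE neighbourhood_id = ? AND year = ?",
    (1, 2021),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # the last column holds the plan detail
print(plan_text)
```

The plan should name `ix_fact_crime_neighbourhood_year`, since both filtered columns are covered by that index in order.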

PostGIS Extensions

The database requires PostGIS for geospatial operations:

CREATE EXTENSION IF NOT EXISTS postgis;

All geometry columns use SRID 4326 (WGS84) for compatibility with web mapping libraries.
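
A quick sanity check before loading boundaries is to confirm the coordinates are plausible longitude/latitude degrees, as SRID 4326 expects. The sketch below is only a range check with made-up coordinates. It does not inspect the SRID stored on a geometry, and `looks_like_wgs84` is a hypothetical helper.

```python
def looks_like_wgs84(ring):
    """True if every (lon, lat) pair lies within valid WGS84 degree ranges."""
    return all(-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0 for lon, lat in ring)


# GeoJSON orders coordinates as (longitude, latitude).
toronto_ring = [
    (-79.40, 43.64), (-79.38, 43.64), (-79.38, 43.66), (-79.40, 43.66), (-79.40, 43.64),
]
# Coordinates in metres from a projected system fail the check.
projected_ring = [
    (630000.0, 4833000.0), (631000.0, 4833000.0), (631000.0, 4834000.0), (630000.0, 4833000.0),
]

print(looks_like_wgs84(toronto_ring))    # True
print(looks_like_wgs84(projected_ring))  # False
```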