54 Commits

Author SHA1 Message Date
a2c213be5d Merge pull request 'development' (#106) from development into main
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Production / deploy (push) Has been cancelled
Reviewed-on: #106
2026-02-02 22:02:31 +00:00
0455ec69a0 Merge branch 'main' into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-02-02 22:02:26 +00:00
9e216962b1 Merge pull request 'refactor: domain-scoped schema migration for application code' (#104) from feature/domain-scoped-schema-migration into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
Reviewed-on: #104
2026-02-02 22:01:48 +00:00
dfa5f92d8a refactor: update app code for domain-scoped schema migration
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- Update dbt model references to use new schema naming (stg_toronto, int_toronto, mart_toronto)
- Refactor figure factories to use consistent column naming from new schema
- Update callbacks to work with refactored data structures
- Add centralized design tokens module for consistent styling
- Streamline CLAUDE.md documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 17:00:30 -05:00
0c9769fd27 Merge branch 'staging' into main
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Production / deploy (push) Has been cancelled
CI / lint-and-test (pull_request) Has been cancelled
2026-02-02 17:34:58 +00:00
cb908a18c3 Merge pull request 'Merge pull request 'development' (#98) from development into staging' (#101) from staging into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
Reviewed-on: #101
2026-02-02 17:34:27 +00:00
558022f26e Merge branch 'development' into staging
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Staging / deploy (push) Has been cancelled
CI / lint-and-test (pull_request) Has been cancelled
2026-02-02 17:34:14 +00:00
9e27fb8011 Merge pull request 'refactor(dbt): migrate to domain-scoped schema names' (#100) from feature/domain-scoped-schema-migration into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
Reviewed-on: #100
2026-02-02 17:33:40 +00:00
cda2a078d9 refactor(dbt): migrate to domain-scoped schema names
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- Create generate_schema_name macro to use custom schema names directly
- Update dbt_project.yml schemas: staging→stg_toronto, intermediate→int_toronto, marts→mart_toronto
- Add dbt/macros/toronto/ directory for future domain-specific macros
- Fix documentation drift in PROJECT_REFERENCE.md (load-data-only→load-toronto-only)
- Update DATABASE_SCHEMA.md with new schema names
- Update CLAUDE.md database schemas table
- Update adding-dashboard.md runbook with domain-scoped pattern

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 12:32:39 -05:00
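Background on the `generate_schema_name` change: dbt's built-in macro joins the target schema and a model's custom schema as `<target>_<custom>`. That is why marts previously landed in `public_marts` (see commit 69c4216cd5 further down). Overriding the macro to return the custom name verbatim yields `mart_toronto`. The Python sketch below mirrors both rules. The real macro is Jinja under `dbt/macros/`, and the function names and the fallback to the target schema are assumptions, not the project's code.

```python
from __future__ import annotations


def default_schema_name(target_schema: str, custom_schema: str | None) -> str:
    """dbt's built-in rule: prefix any custom schema with the target schema."""
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


def domain_scoped_schema_name(target_schema: str, custom_schema: str | None) -> str:
    """Assumed override: use the custom schema as-is, else fall back to the target."""
    if custom_schema is None:
        return target_schema
    return custom_schema.strip()


if __name__ == "__main__":
    # Target schema "public", as in the old public_marts layout.
    print(default_schema_name("public", "marts"))  # public_marts
    print(domain_scoped_schema_name("public", "mart_toronto"))  # mart_toronto
```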
dd8de9810d Merge pull request 'development' (#99) from development into main
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Production / deploy (push) Has been cancelled
Reviewed-on: #99
2026-02-02 00:39:19 +00:00
56bcc1bb1d Merge branch 'main' into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-02-02 00:39:13 +00:00
ee0a7ef7ad Merge pull request 'development' (#98) from development into staging
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Staging / deploy (push) Has been cancelled
CI / lint-and-test (pull_request) Has been cancelled
Reviewed-on: #98
2026-02-02 00:19:29 +00:00
fd9850778e Merge branch 'staging' into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-02-02 00:19:24 +00:00
01e98103c7 Merge pull request 'refactor: multi-dashboard structural migration' (#97) from feature/multi-dashboard-structure into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
Reviewed-on: #97
2026-02-02 00:18:45 +00:00
62d1a52eed refactor: multi-dashboard structural migration
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- Rename dbt project from toronto_housing to portfolio
- Restructure dbt models into domain subdirectories:
  - shared/ for cross-domain dimensions (dim_time)
  - staging/toronto/, intermediate/toronto/, marts/toronto/
- Update SQLAlchemy models for raw_toronto schema
- Add explicit cross-schema FK relationships for FactRentals
- Namespace figure factories under figures/toronto/
- Namespace notebooks under notebooks/toronto/
- Update Makefile with domain-specific targets and env loading
- Update all documentation for multi-dashboard structure

This enables adding new dashboard projects (e.g., /football, /energy)
without structural conflicts or naming collisions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 19:08:20 -05:00
e37611673f Merge pull request 'staging' (#96) from staging into main
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Production / deploy (push) Has been cancelled
Reviewed-on: #96
2026-02-01 21:33:12 +00:00
33306a911b Merge pull request 'development' (#95) from development into staging
Some checks failed
CI / lint-and-test (push) Has been cancelled
Deploy to Staging / deploy (push) Has been cancelled
Reviewed-on: #95
2026-02-01 21:32:41 +00:00
a5d6866d63 feat(contact): implement Formspree contact form submission
Some checks failed
CI / lint-and-test (push) Has been cancelled
- Enable contact form fields with component IDs
- Add callback for Formspree POST with JSON/AJAX
- Include honeypot spam protection (_gotcha field)
- Handle validation, loading, success/error states
- Clear form on successful submission
- Add lessons learned documentation

Closes #92, #93, #94

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 15:00:04 -05:00
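The submission flow described above validates the fields, then sends a JSON POST that includes the `_gotcha` honeypot. It can be sketched with the standard library alone. This is a hypothetical stand-in for the Dash callback, not the project's code: the form ID, field names, and email pattern are assumptions. Formspree accepts JSON posts to `https://formspree.io/f/<form_id>` when the request asks for a JSON response, and it treats submissions with a non-empty `_gotcha` field as spam.

```python
from __future__ import annotations

import json
import re
import urllib.request

FORMSPREE_URL = "https://formspree.io/f/{form_id}"
# Deliberately loose check; Formspree performs its own validation server-side.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_contact(name: str, email: str, message: str) -> list[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors: list[str] = []
    if not name.strip():
        errors.append("Name is required.")
    if not EMAIL_RE.match(email.strip()):
        errors.append("A valid email address is required.")
    if not message.strip():
        errors.append("Message is required.")
    return errors


def build_formspree_request(
    form_id: str, name: str, email: str, message: str, gotcha: str = ""
) -> urllib.request.Request:
    """Build the AJAX-style JSON POST, forwarding the hidden honeypot value."""
    payload = {"name": name, "email": email, "message": message, "_gotcha": gotcha}
    return urllib.request.Request(
        FORMSPREE_URL.format(form_id=form_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def submit_contact(
    form_id: str, name: str, email: str, message: str,
    gotcha: str = "", timeout: float = 10.0,
) -> tuple[bool, str]:
    """Validate, send, and map the outcome to a (success, user message) pair."""
    errors = validate_contact(name, email, message)
    if errors:
        return False, " ".join(errors)
    request = build_formspree_request(form_id, name, email, message, gotcha)
    try:
        # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout):
            return True, "Thanks! Your message has been sent."
    except OSError:
        return False, "Sorry, the message could not be sent. Please try again."
```

Keeping request construction separate from sending makes the payload testable without network access. The callback would only need to map the returned pair onto its success and error alert states.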
f58b2f70e2 chore(vscode): add workspace settings
Configure Python interpreter path for VSCode.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-01 15:00:04 -05:00
263b52d5e4 docs: sync documentation with codebase
Some checks failed
CI / lint-and-test (push) Has been cancelled
Fixes identified by doc-guardian audit:

Critical fixes:
- DATABASE_SCHEMA.md: Fix staging model name stg_police__crimes → stg_toronto__crime
- DATABASE_SCHEMA.md: Update mart model names to match actual dbt models
- CLAUDE.md: Fix errors/ description (no handlers module exists)
- scripts/etl/toronto.sh: Fix parser module references to actual modules

Stale fixes:
- CONTRIBUTING.md: Add make typecheck, test-cov; fix make ci description
- PROJECT_REFERENCE.md: Document services/, callback modules, all Makefile targets
- CLAUDE.md: Expand Makefile commands, add plugin documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 16:25:29 -05:00
f345d41535 fix: Seed multi-year housing data for rent trend charts
Some checks failed
CI / lint-and-test (push) Has been cancelled
The seed script now inserts housing data for years 2019-2024 to
support rent trend line visualizations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 23:44:08 -05:00
14701f334c fix: Complete seed script with all missing data + add statsmodels
Some checks failed
CI / lint-and-test (push) Has been cancelled
- Seed script now seeds: amenities, population, median_age, census
  housing columns, housing mart (rent/affordability), overview mart
  (safety_score, population)
- Add statsmodels dependency for scatter plot trendlines
- Add dbt/.user.yml to gitignore

All 15 notebooks now pass with valid data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 23:21:14 -05:00
92763a17c4 fix: Use os.environ[] instead of .get() for DATABASE_URL
Some checks failed
CI / lint-and-test (push) Has been cancelled
Fixes Pylance type error - create_engine() expects str, not str | None.
Using direct access raises KeyError if not set, which is correct behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 23:03:23 -05:00
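This minimal sketch shows the distinction the commit relies on. `os.environ.get()` is typed `str | None`. Indexing returns `str` and raises `KeyError` when the variable is missing, so a misconfigured environment fails at startup instead of inside `create_engine()`. The helper names are hypothetical.

```python
from __future__ import annotations

import os


def get_database_url() -> str:
    """Return DATABASE_URL; raises KeyError if it is unset (deliberately fail fast)."""
    return os.environ["DATABASE_URL"]


def get_database_url_or_none() -> str | None:
    """The .get() variant: type checkers see str | None, so callers must narrow it."""
    return os.environ.get("DATABASE_URL")
```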
546ee1cc92 fix: Include seed-data in load-data target
Some checks failed
CI / lint-and-test (push) Has been cancelled
Now `make load-data` automatically seeds development data (amenities,
median_age) after loading Toronto data. Renamed seed-amenities to
seed-data to reflect broader scope.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 22:52:31 -05:00
9cc2cf0e00 fix: Add median_age seeding to development data script
Some checks failed
CI / lint-and-test (push) Has been cancelled
Updates seed_amenity_data.py to also seed median_age values in
fact_census where missing, ensuring demographics notebooks work.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 22:49:57 -05:00
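Filling `median_age` only where it is missing keeps the seed step idempotent: rerunning it leaves existing values alone. The self-contained sketch below uses SQLite in place of the project's PostgreSQL database. The table shape and the constant fill value are simplifications for illustration.

```python
import sqlite3


def seed_missing_median_age(conn: sqlite3.Connection, value: float) -> int:
    """Set median_age only on rows where it is NULL; return how many rows changed."""
    cursor = conn.execute(
        "UPDATE fact_census SET median_age = ? WHERE median_age IS NULL",
        (value,),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_census (neighbourhood_id INTEGER, median_age REAL)")
    conn.executemany(
        "INSERT INTO fact_census VALUES (?, ?)",
        [(1, 41.5), (2, None), (3, None)],
    )
    print(seed_missing_median_age(conn, 40.0))  # 2
    print(seed_missing_median_age(conn, 40.0))  # 0 on the second run
```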
28f239e8cd fix: Update all notebooks to load .env for database credentials
Some checks failed
CI / lint-and-test (push) Has been cancelled
All 15 notebooks now use load_dotenv('../../.env') instead of
hardcoded fallback credentials.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 22:31:07 -05:00
c3de98c4a5 feat: Add seed_amenity_data script for notebook testing
Some checks failed
CI / lint-and-test (push) Has been cancelled
Adds script to populate sample amenity data when Toronto Open Data
API doesn't return neighbourhood IDs (requires spatial join).

Run with: make seed-amenities

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 21:02:57 -05:00
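The "requires spatial join" note refers to assigning each amenity to the neighbourhood polygon that contains its coordinates. In this stack, a GeoPandas spatial join would be the more natural tool. The dependency-free ray-casting sketch below only illustrates the idea. Its function names are hypothetical, and polygons are simple (lon, lat) vertex lists.

```python
from __future__ import annotations

Point = tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: toggle on each polygon edge crossed by a ray running right of the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y can be crossed (this also rules out y1 == y2).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_neighbourhood(point: Point, neighbourhoods: dict[int, list[Point]]) -> int | None:
    """Return the ID of the first neighbourhood polygon containing the point, else None."""
    for neighbourhood_id, polygon in neighbourhoods.items():
        if point_in_polygon(point, polygon):
            return neighbourhood_id
    return None


if __name__ == "__main__":
    # Two hypothetical unit-square "neighbourhoods" side by side.
    polygons = {
        1: [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
        2: [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
    }
    print(assign_neighbourhood((1.5, 0.5), polygons))  # 2
```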
eee015efac fix: Load .env in amenity_radar notebook for database credentials
Some checks failed
CI / lint-and-test (push) Has been cancelled
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 20:43:43 -05:00
941305e71c fix: Update amenity_radar notebook to use correct radar API
Some checks failed
CI / lint-and-test (push) Has been cancelled
Use create_comparison_radar instead of create_radar_figure with
incorrect parameters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 20:39:20 -05:00
54665bac63 revert: Remove unauthorized branch workflow instructions
Some checks failed
CI / lint-and-test (push) Has been cancelled
Removes instructions that were added without user authorization:
- Step about deleting feature branches after merge
- CRITICAL warning about development branch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 20:36:09 -05:00
3eb32a4766 Merge feature/fix-notebook-schema into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 19:45:25 -05:00
69c4216cd5 fix: Update notebooks to use public_marts schema
dbt creates mart tables in public_marts schema, not public.
Updated all notebook SQL queries to use the correct schema.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 19:45:23 -05:00
6e00a17c05 Merge feature/add-dbt-deps into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 12:20:38 -05:00
8f3c5554f9 fix: Run dbt deps before dbt run to install packages
dbt requires packages specified in packages.yml to be installed
before running models. Added dbt deps step to the pipeline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:20:26 -05:00
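Ordering matters because `dbt run` fails when packages listed in `packages.yml` have not been installed by `dbt deps`. The minimal sketch below assumes, as the deploy workflows do, that `dbt_project.yml` and `profiles.yml` both live in the `dbt/` directory. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_dbt_commands(dbt_executable: str, dbt_dir: Path) -> list[list[str]]:
    """Return dbt invocations in execution order: deps before run."""
    # Assumes dbt_project.yml and profiles.yml both live in dbt_dir.
    dirs = ["--project-dir", str(dbt_dir), "--profiles-dir", str(dbt_dir)]
    return [
        [dbt_executable, "deps", *dirs],  # install packages from packages.yml
        [dbt_executable, "run", *dirs],  # build models that may use those packages
    ]


def run_dbt(dbt_executable: str, dbt_dir: Path) -> None:
    """Run each step, stopping at the first failure (check=True raises CalledProcessError)."""
    for command in build_dbt_commands(dbt_executable, dbt_dir):
        subprocess.run(command, check=True)
```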
5839eabf1e Merge feature/fix-dbt-venv-path into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 12:18:28 -05:00
ebe48304d7 fix: Use venv dbt and show full error output
- Use .venv/bin/dbt if available, fall back to system dbt
- Show both stdout and stderr on dbt failures for better debugging

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:18:26 -05:00
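The sketch below covers both changes. It prefers the project's virtualenv `dbt`, falls back to whatever `dbt` is on `PATH`, and prints both output streams on failure, since dbt reports much of its error detail on stdout. The function names are hypothetical, and the POSIX layout `.venv/bin/dbt` is assumed (Windows virtualenvs use `Scripts\` instead).

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def resolve_dbt(project_root: Path) -> str:
    """Prefer the project's virtualenv dbt; otherwise fall back to dbt on PATH."""
    venv_dbt = project_root / ".venv" / "bin" / "dbt"
    if venv_dbt.exists():
        return str(venv_dbt)
    system_dbt = shutil.which("dbt")
    if system_dbt is None:
        raise FileNotFoundError("dbt not found in .venv/bin or on PATH")
    return system_dbt


def run_and_report(command: list[str]) -> bool:
    """Run a command; on a non-zero exit, print stdout and stderr and return False."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        print("STDOUT:\n" + result.stdout)
        print("STDERR:\n" + result.stderr)
        return False
    return True
```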
2fc2a1bdb5 Merge feature/fix-dotenv-path into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 12:15:00 -05:00
6872aa510b fix: Use explicit path for .env file loading
load_dotenv() was searching from cwd, which may not be the project root.
Now explicitly passes PROJECT_ROOT / ".env" path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:14:48 -05:00
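The core of this fix is resolving the `.env` location from the script's own file rather than from wherever the command was launched. In the minimal sketch below, the helper name and the two-level depth (a script under `scripts/data/`) are assumptions. `PROJECT_ROOT` in the actual script may be computed differently.

```python
from pathlib import Path


def env_path_for(module_file: str, levels_up: int) -> Path:
    """Return the .env path `levels_up` directories above the module's own folder.

    Resolving from the module file makes the result independent of the current
    working directory.
    """
    return Path(module_file).resolve().parents[levels_up] / ".env"


if __name__ == "__main__":
    # In a module at <root>/scripts/data/, the root is two levels above its folder:
    #     load_dotenv(dotenv_path=env_path_for(__file__, 2))
    print(env_path_for("/srv/portfolio/scripts/data/load_toronto_data.py", 2))
```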
9a1fc81f79 Merge feature/fix-dbt-env-vars into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 12:10:58 -05:00
cf6e874961 fix: Load .env file for dbt database credentials
dbt uses env_var() in profiles.yml to read POSTGRES_PASSWORD,
but subprocess.run() doesn't automatically load .env files.
Added python-dotenv to load credentials before dbt runs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:10:46 -05:00
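Child processes inherit the parent's `os.environ`. Loading `.env` into the parent before calling `subprocess.run()` is therefore enough for dbt's `env_var('POSTGRES_PASSWORD')` to resolve. The stdlib sketch below shows that mechanism. The project uses python-dotenv's `load_dotenv()`, which by default also leaves existing variables untouched. The minimal parser and helper names here are hypothetical stand-ins.

```python
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments.

    A stand-in for python-dotenv: it ignores quoting, `export`, and interpolation.
    """
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: Path) -> None:
    """Copy values into os.environ without overriding variables already set."""
    for key, value in read_env_file(path).items():
        os.environ.setdefault(key, value)


def child_sees(name: str) -> str:
    """Read a variable from inside a child process, as dbt would via env_var()."""
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```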
451dc10a10 Merge feature/fix-dbt-profiles into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 12:07:10 -05:00
193b9289b9 fix: Configure dbt to use local profiles.yml
- Rename profiles.yml.example to profiles.yml (uses env vars, safe to commit)
- Add --profiles-dir flag to dbt commands in load_toronto_data.py
- Add --profiles-dir flag to dbt targets in Makefile

This fixes the "Path '~/.dbt' does not exist" error when running make load-data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 12:06:58 -05:00
7a16e6d121 Merge feature/fix-db-init-makefile into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 11:59:20 -05:00
ecc50e5d98 fix: Update db-init target to use Python script
The Makefile was looking for scripts/db/init.sh which doesn't exist.
Updated to call scripts/db/init_schema.py instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:59:19 -05:00
ae3742630e Merge feature/add-jupyter-dependency into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 11:25:20 -05:00
e70965b429 fix: Add jupyter and ipykernel to dev dependencies
Required to run the notebooks in notebooks/ directory.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:25:19 -05:00
25954f17bb Merge feature/add-pyproj-dependency into development
Some checks failed
CI / lint-and-test (push) Has been cancelled
2026-01-18 11:20:47 -05:00
bffd44a5a5 fix: Add pyproj as explicit dependency
pyproj is directly imported in portfolio_app/toronto/parsers/geo.py
but was only available as a transitive dependency of geopandas.
Adding it explicitly ensures reliable installation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 11:20:40 -05:00
bf6e392002 feat: Sprint 10 - Architecture docs, CI/CD, operational scripts
Some checks failed
CI / lint-and-test (push) Has been cancelled
Phase 1 - Architecture Documentation:
- Add Architecture section with Mermaid flowchart to README
- Create docs/DATABASE_SCHEMA.md with full ERD

Phase 2 - CI/CD:
- Add CI badge to README
- Create .gitea/workflows/ci.yml for linting and tests
- Create .gitea/workflows/deploy-staging.yml
- Create .gitea/workflows/deploy-production.yml

Phase 3 - Operational Scripts:
- Create scripts/logs.sh for docker compose log following
- Create scripts/run-detached.sh with health check loop
- Create scripts/etl/toronto.sh for Toronto data pipeline
- Add Makefile targets: logs, run-detached, etl-toronto

Phase 4 - Runbooks:
- Create docs/runbooks/adding-dashboard.md
- Create docs/runbooks/deployment.md

Phase 5 - Hygiene:
- Create MIT LICENSE file

Phase 6 - Production:
- Add live demo link to README (leodata.science)

Closes #78, #79, #80, #81, #82, #83, #84, #85, #86, #87, #88, #89, #91

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 17:10:30 -05:00
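The deploy workflows in this changeset wait with a fixed `sleep 10` before a single `curl`, whereas `scripts/run-detached.sh` is described as using a health check loop. The Python rendering below polls until the endpoint succeeds or a deadline passes. It is not the shell script itself, and the function names, timeout, and interval are assumptions.

```python
from __future__ import annotations

import time
import urllib.request
from collections.abc import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError, refused connections and timeouts are all OSErrors
        return False


def wait_for_health(
    check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0
) -> bool:
    """Call check() until it succeeds or `timeout` seconds pass; return the final outcome."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Same endpoint the deploy workflows probe; 8050 is Dash's default port.
    healthy = wait_for_health(lambda: http_ok("http://localhost:8050/health"))
    print("healthy" if healthy else "health check timed out")
```

Passing the check as a callable keeps the loop independent of HTTP, so the same helper could wait on a database port or a container status.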
d0f32edba7 fix: Repair data pipeline with StatCan CMHC rental data
- Add StatCan CMHC parser to fetch rental data from Statistics Canada API
- Create year spine (2014-2025) as time dimension driver instead of census
- Add CMA-level rental and income intermediate models
- Update mart_neighbourhood_overview to use rental years as base
- Fix neighbourhood_service queries to match dbt schema
- Add CMHC data loading to pipeline script

Data now flows correctly: 158 neighbourhoods × 12 years = 1,896 records
Rent data available 2019-2025, crime data 2014-2024

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 15:38:31 -05:00
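Driving the mart from a year spine rather than from census years guarantees one row per neighbourhood per year, even where a source has gaps (rent from 2019, crime through 2024). The Python sketch below shows the cross join and left join that the dbt models would express in SQL. The neighbourhood IDs and function name are hypothetical. Only the counts come from the commit message: 158 × 12 = 1,896.

```python
from __future__ import annotations

from itertools import product

NEIGHBOURHOOD_IDS = range(1, 159)  # hypothetical IDs; 158 neighbourhoods
YEARS = range(2014, 2026)  # 2014-2025 inclusive, i.e. 12 years

# Every neighbourhood paired with every year, whether or not any fact data exists.
SPINE = list(product(NEIGHBOURHOOD_IDS, YEARS))


def overview_rows(rent_by_year: dict[int, float]) -> list[dict[str, object]]:
    """Left-join CMA-level rent (keyed by year) onto the spine; missing years stay None."""
    return [
        {"neighbourhood_id": n, "year": y, "avg_rent": rent_by_year.get(y)}
        for n, y in SPINE
    ]


if __name__ == "__main__":
    print(len(SPINE))  # 1896 = 158 x 12
```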
4818c53fd2 docs: Rewrite documentation with accurate project state
- Delete obsolete change proposals and bio content source
- Rewrite README.md with correct features, data sources, structure
- Update PROJECT_REFERENCE.md with accurate status and completed work
- Update CLAUDE.md references and sprint status
- Add docs/CONTRIBUTING.md developer guide with:
  - How to add blog posts (frontmatter, markdown)
  - How to add new pages (Dash routing)
  - How to add dashboard tabs
  - How to create figure factories
  - Branch workflow and code standards

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 12:27:25 -05:00
1a878313f8 docs: Add Sprint 9 lessons learned
Captured two lessons from Sprint 9:
1. Gitea Labels API requires org context - workaround for user repos
2. Always read CLAUDE.md before asking questions about sprint context

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 12:13:35 -05:00
1eba95d4d1 docs: Complete Phase 6 notebooks and Phase 7 documentation review
Phase 6 - Jupyter Notebooks (15 total):
- Overview tab: livability_choropleth, top_bottom_10_bar, income_safety_scatter
- Housing tab: affordability_choropleth, rent_trend_line, tenure_breakdown_bar
- Safety tab: crime_rate_choropleth, crime_breakdown_bar, crime_trend_line
- Demographics tab: income_choropleth, age_distribution, population_density_bar
- Amenities tab: amenity_index_choropleth, amenity_radar, transit_accessibility_bar

Phase 7 - Documentation:
- Updated CLAUDE.md with Sprint 9 completion status
- Added notebooks directory to application structure
- Expanded figures directory listing

Closes #71, #72, #73, #74, #75, #76, #77

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 12:10:46 -05:00
c9cf744d84 feat: Complete Phase 5 dashboard implementation
Implement full 5-tab Toronto Neighbourhood Dashboard with real data
connectivity:

Dashboard Structure:
- Overview tab with livability scores and rankings
- Housing tab with affordability metrics
- Safety tab with crime statistics
- Demographics tab with population/income data
- Amenities tab with parks, schools, transit

Figure Factories (portfolio_app/figures/):
- bar_charts.py: ranking, stacked, horizontal bars
- scatter.py: scatter plots, bubble charts
- radar.py: spider/radar charts
- demographics.py: donut, age pyramid, income distribution

Service Layer (portfolio_app/toronto/services/):
- neighbourhood_service.py: queries dbt marts for all tab data
- geometry_service.py: generates GeoJSON from PostGIS
- Graceful error handling when database unavailable

Callbacks (portfolio_app/pages/toronto/callbacks/):
- map_callbacks.py: choropleth updates, map click handling
- chart_callbacks.py: supporting chart updates
- selection_callbacks.py: dropdown handlers, KPI updates

Data Pipeline (scripts/data/):
- load_toronto_data.py: orchestration script with CLI flags

Lessons Learned:
- Graceful error handling in service layers
- Modular callback structure for multi-tab dashboards
- Figure factory pattern for reusable charts

Closes: #64, #65, #66, #67, #68, #69, #70

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 11:46:18 -05:00
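The "graceful error handling" lesson for the service layer can be sketched as a decorator. When the database is unreachable, it logs a warning and returns an empty default, so callbacks render empty charts instead of crashing. Everything here is hypothetical. `DatabaseUnavailableError` stands in for a driver error such as SQLAlchemy's `OperationalError`, and the decorated function is a stub, not the real `neighbourhood_service.py` API.

```python
from __future__ import annotations

import functools
import logging
from collections.abc import Callable
from typing import Any, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class DatabaseUnavailableError(Exception):
    """Hypothetical stand-in for a driver error such as a refused connection."""


def with_fallback(
    default_factory: Callable[[], T],
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Decorate a query function so outages return default_factory() instead of raising."""

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            try:
                return func(*args, **kwargs)
            except DatabaseUnavailableError:
                logger.warning("%s: database unavailable, returning fallback", func.__name__)
                return default_factory()

        return wrapper

    return decorator


@with_fallback(list)
def get_neighbourhood_rankings(year: int) -> list[dict[str, Any]]:
    """Stub query: a real service would read a dbt mart here; this one simulates an outage."""
    raise DatabaseUnavailableError(f"could not connect while loading {year}")


if __name__ == "__main__":
    print(get_neighbourhood_rankings(2024))  # [] rather than a traceback
```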
124 changed files with 11853 additions and 3528 deletions

.gitea/workflows/ci.yml (Normal file, 35 lines changed)

@@ -0,0 +1,35 @@
name: CI
on:
  push:
    branches:
      - development
      - staging
      - main
  pull_request:
    branches:
      - development
jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install ruff pytest
      - name: Run linter
        run: ruff check .
      - name: Run tests
        run: pytest tests/ -v --tb=short


@@ -0,0 +1,44 @@
name: Deploy to Production
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Production Server
        uses: appleboy/ssh-action@v1.0.3
        with:
          host: ${{ secrets.PROD_HOST }}
          username: ${{ secrets.PROD_USER }}
          key: ${{ secrets.PROD_SSH_KEY }}
          script: |
            set -euo pipefail
            cd ~/apps/personal-portfolio
            echo "Pulling latest changes..."
            git fetch origin main
            git reset --hard origin/main
            echo "Activating virtual environment..."
            source .venv/bin/activate
            echo "Installing dependencies..."
            pip install -r requirements.txt --quiet
            echo "Running dbt models..."
            cd dbt && dbt run --profiles-dir . && cd ..
            echo "Restarting application..."
            docker compose down
            docker compose up -d
            echo "Waiting for health check..."
            sleep 10
            curl -f http://localhost:8050/health || exit 1
            echo "Production deployment complete!"


@@ -0,0 +1,44 @@
name: Deploy to Staging
on:
  push:
    branches:
      - staging
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Staging Server
        uses: appleboy/ssh-action@v1.0.3
        with:
          host: ${{ secrets.STAGING_HOST }}
          username: ${{ secrets.STAGING_USER }}
          key: ${{ secrets.STAGING_SSH_KEY }}
          script: |
            set -euo pipefail
            cd ~/apps/personal-portfolio
            echo "Pulling latest changes..."
            git fetch origin staging
            git reset --hard origin/staging
            echo "Activating virtual environment..."
            source .venv/bin/activate
            echo "Installing dependencies..."
            pip install -r requirements.txt --quiet
            echo "Running dbt models..."
            cd dbt && dbt run --profiles-dir . && cd ..
            echo "Restarting application..."
            docker compose down
            docker compose up -d
            echo "Waiting for health check..."
            sleep 10
            curl -f http://localhost:8050/health || exit 1
            echo "Staging deployment complete!"

.gitignore (vendored, 1 line changed)

@@ -198,3 +198,4 @@ cython_debug/
# PyPI configuration file
.pypirc
+dbt/.user.yml

.vscode/settings.json (vendored, Normal file, 3 lines changed)

@@ -0,0 +1,3 @@
{
    "python.defaultInterpreterPath": "/home/leomiranda/WorkDev/personal/personal-portfolio/.venv/bin/python"
}

CLAUDE.md (378 lines changed)

@@ -1,13 +1,56 @@
# CLAUDE.md
+## ⛔ MANDATORY BEHAVIOR RULES - READ FIRST
+**These rules are NON-NEGOTIABLE. Violating them wastes the user's time and money.**
+### 1. WHEN USER ASKS YOU TO CHECK SOMETHING - CHECK EVERYTHING
+- Search ALL locations, not just where you think it is
+- Check cache directories: `~/.claude/plugins/cache/`
+- Check installed: `~/.claude/plugins/marketplaces/`
+- Check source directories
+- **NEVER say "no" or "that's not the issue" without exhaustive verification**
+### 2. WHEN USER SAYS SOMETHING IS WRONG - BELIEVE THEM
+- The user knows their system better than you
+- Investigate thoroughly before disagreeing
+- **Your confidence is often wrong. User's instincts are often right.**
+### 3. NEVER SAY "DONE" WITHOUT VERIFICATION
+- Run the actual command/script to verify
+- Show the output to the user
+- **"Done" means VERIFIED WORKING, not "I made changes"**
+### 4. SHOW EXACTLY WHAT USER ASKS FOR
+- If user asks for messages, show the MESSAGES
+- If user asks for code, show the CODE
+- **Do not interpret or summarize unless asked**
+**FAILURE TO FOLLOW THESE RULES = WASTED USER TIME = UNACCEPTABLE**
+---
+## Mandatory Behavior Rules
+**These rules are NON-NEGOTIABLE. Violating them wastes the user's time and money.**
+1. **CHECK EVERYTHING** - Search ALL locations before saying "no" (cache, installed, source directories)
+2. **BELIEVE THE USER** - Investigate thoroughly before disagreeing; user instincts are often right
+3. **VERIFY BEFORE "DONE"** - Run commands, show output; "done" means verified working
+4. **SHOW EXACTLY WHAT'S ASKED** - Do not interpret or summarize unless requested
+---
Working context for Claude Code on the Analytics Portfolio project.
---
## Project Status
-**Current Sprint**: 9 (Neighbourhood Dashboard Transition)
-**Phase**: Toronto Neighbourhood Dashboard
+**Last Completed Sprint**: 9 (Neighbourhood Dashboard Transition)
+**Current State**: Ready for deployment sprint or new features
**Branch**: `development` (feature branches merge here)
---
@@ -17,15 +60,33 @@ Working context for Claude Code on the Analytics Portfolio project.
### Run Commands
```bash
+# Setup & Database
make setup # Install deps, create .env, init pre-commit
-make docker-up # Start PostgreSQL + PostGIS
+make docker-up # Start PostgreSQL + PostGIS (auto-detects x86/ARM)
make docker-down # Stop containers
make db-init # Initialize database schema
+make db-reset # Drop and recreate database (DESTRUCTIVE)
+# Data Loading
+make load-data # Load all project data (currently: Toronto)
+make load-toronto # Load Toronto data from APIs
+# Application
make run # Start Dash dev server
+# Testing & Quality
make test # Run pytest
make lint # Run ruff linter
make format # Run ruff formatter
-make ci # Run all checks
+make typecheck # Run mypy type checker
+make ci # Run all checks (lint, typecheck, test)
+# dbt
+make dbt-run # Run dbt models
+make dbt-test # Run dbt tests
+make dbt-docs # Generate and serve dbt documentation
+# Run `make help` for full target list
```
### Branch Workflow
@@ -33,10 +94,7 @@ make ci # Run all checks
1. Create feature branch FROM `development`: `git checkout -b feature/{sprint}-{description}`
2. Work and commit on feature branch
3. Merge INTO `development` when complete
-4. Delete the feature branch after merge (keep branches clean)
-5. `development` -> `staging` -> `main` for releases
-**CRITICAL: NEVER DELETE the `development` branch. It is the main integration branch.**
+4. `development` -> `staging` -> `main` for releases
---
@@ -52,112 +110,44 @@ make ci # Run all checks
### Module Responsibilities
-| Directory | Contains | Purpose |
-|-----------|----------|---------|
-| `schemas/` | Pydantic models | Data validation |
-| `models/` | SQLAlchemy ORM | Database persistence |
-| `parsers/` | API/CSV extraction | Raw data ingestion |
-| `loaders/` | Database operations | Data loading |
-| `figures/` | Chart factories | Plotly figure generation |
-| `callbacks/` | Dash callbacks | In `pages/{dashboard}/callbacks/` |
-| `errors/` | Exceptions + handlers | Error handling |
-### Type Hints
-Use Python 3.10+ style:
-```python
-def process(items: list[str], config: dict[str, int] | None = None) -> bool:
-    ...
-```
-### Error Handling
-```python
-# errors/exceptions.py
-class PortfolioError(Exception):
-    """Base exception."""
-class ParseError(PortfolioError):
-    """PDF/CSV parsing failed."""
-class ValidationError(PortfolioError):
-    """Pydantic or business rule validation failed."""
-class LoadError(PortfolioError):
-    """Database load operation failed."""
-```
+| Directory | Purpose |
+|-----------|---------|
+| `schemas/` | Pydantic models for data validation |
+| `models/` | SQLAlchemy ORM for database persistence |
+| `parsers/` | API/CSV extraction for raw data ingestion |
+| `loaders/` | Database operations for data loading |
+| `services/` | Query functions for dbt mart queries |
+| `figures/` | Chart factories for Plotly figure generation |
+| `errors/` | Custom exception classes (see `errors/exceptions.py`) |
### Code Standards
+- Python 3.10+ type hints: `list[str]`, `dict[str, int] | None`
- Single responsibility functions with verb naming
- Early returns over deep nesting
- Google-style docstrings only for non-obvious behavior
-- Module-level constants for magic values
-- Pydantic BaseSettings for runtime config
---
## Application Structure
-```
-portfolio_app/
-├── app.py # Dash app factory with Pages routing
-├── config.py # Pydantic BaseSettings
-├── assets/ # CSS, images (auto-served)
-│ └── sidebar.css # Navigation styling
-├── callbacks/ # Global callbacks
-│ ├── sidebar.py # Sidebar toggle
-│ └── theme.py # Dark/light theme
-├── pages/
-│ ├── home.py # Bio landing page -> /
-│ ├── about.py # About page -> /about
-│ ├── contact.py # Contact form -> /contact
-│ ├── health.py # Health endpoint -> /health
-│ ├── projects.py # Project showcase -> /projects
-│ ├── resume.py # Resume/CV -> /resume
-│ ├── blog/
-│ │ ├── index.py # Blog listing -> /blog
-│ │ └── article.py # Blog article -> /blog/{slug}
-│ └── toronto/
-│ ├── dashboard.py # Dashboard -> /toronto
-│ ├── methodology.py # Methodology -> /toronto/methodology
-│ └── callbacks/ # Dashboard interactions
-├── components/ # Shared UI (sidebar, cards, controls)
-│ ├── metric_card.py # KPI card component
-│ ├── map_controls.py # Map control panel
-│ ├── sidebar.py # Navigation sidebar
-│ └── time_slider.py # Time range selector
-├── figures/ # Shared chart factories
-│ ├── choropleth.py # Map visualizations
-│ ├── summary_cards.py # KPI figures
-│ └── time_series.py # Trend charts
-├── content/ # Markdown content
-│ └── blog/ # Blog articles
-├── toronto/ # Toronto data logic
-│ ├── parsers/
-│ ├── loaders/
-│ ├── schemas/ # Pydantic
-│ ├── models/ # SQLAlchemy
-│ └── demo_data.py # Sample data
-├── utils/ # Utilities
-│ └── markdown_loader.py # Markdown processing
-└── errors/
-```
-### URL Routing
-| URL | Page | Sprint |
-|-----|------|--------|
-| `/` | Bio landing page | 2 |
-| `/about` | About page | 8 |
-| `/contact` | Contact form | 8 |
-| `/health` | Health endpoint | 8 |
-| `/projects` | Project showcase | 8 |
-| `/resume` | Resume/CV | 8 |
-| `/blog` | Blog listing | 8 |
-| `/blog/{slug}` | Blog article | 8 |
-| `/toronto` | Toronto Dashboard | 6 |
-| `/toronto/methodology` | Dashboard methodology | 6 |
+**Entry Point:** `portfolio_app/app.py` (Dash app factory with Pages routing)
+| Directory | Purpose |
+|-----------|---------|
+| `pages/` | Dash Pages (file-based routing) |
+| `pages/toronto/` | Toronto Dashboard (`tabs/` for layouts, `callbacks/` for interactions) |
+| `components/` | Shared UI components |
+| `figures/toronto/` | Toronto chart factories |
+| `toronto/` | Toronto data logic (parsers, loaders, schemas, models) |
+**Key URLs:** `/` (home), `/toronto` (dashboard), `/blog` (listing), `/blog/{slug}` (articles), `/health` (status)
+### Multi-Dashboard Architecture
+- **figures/**: Domain-namespaced (`figures/toronto/`, future: `figures/football/`)
+- **dbt models**: Domain subdirectories (`staging/toronto/`, `marts/toronto/`)
+- **Database schemas**: Domain-specific raw data (`raw_toronto`, future: `raw_football`)
---
@@ -169,43 +159,31 @@ portfolio_app/
| Validation | Pydantic | >=2.0 |
| ORM | SQLAlchemy | >=2.0 (2.0-style API only) |
| Transformation | dbt-postgres | >=1.7 |
-| Data Processing | Pandas | >=2.1 |
+| Visualization | Dash + Plotly + dash-mantine-components | >=2.14 |
| Geospatial | GeoPandas + Shapely | >=0.14 |
-| Visualization | Dash + Plotly | >=2.14 |
-| UI Components | dash-mantine-components | Latest stable |
-| Testing | pytest | >=7.0 |
| Python | 3.11+ | Via pyenv |
-**Notes**:
-- SQLAlchemy 2.0 + Pydantic 2.0 only (never mix 1.x APIs)
-- PostGIS extension required in database
-- Docker Compose V2 format (no `version` field)
+**Notes**: SQLAlchemy 2.0 + Pydantic 2.0 only. Docker Compose V2 format (no `version` field).
---
## Data Model Overview
-### Geographic Reality (Toronto Housing)
-```
-City Neighbourhoods (158) - Primary geographic unit for analysis
-CMHC Zones (~20) - Rental data (Census Tract aligned)
-```
-### Star Schema
-| Table | Type | Keys |
-|-------|------|------|
-| `fact_rentals` | Fact | -> dim_time, dim_cmhc_zone |
-| `dim_time` | Dimension | date_key (PK) |
-| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
-| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
-| `dim_policy_event` | Dimension | event_id (PK) |
-### dbt Layers
+### Database Schemas
+| Schema | Purpose |
+|--------|---------|
+| `public` | Shared dimensions (dim_time) |
+| `raw_toronto` | Toronto-specific raw/dimension tables |
+| `stg_toronto` | Toronto dbt staging views |
+| `int_toronto` | Toronto dbt intermediate views |
+| `mart_toronto` | Toronto dbt mart tables |
+### dbt Project: `portfolio`
| Layer | Naming | Purpose |
|-------|--------|---------|
+| Shared | `stg_dimensions__*` | Cross-domain dimensions |
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic |
| Marts | `mart_{domain}` | Final analytical tables |
@@ -214,13 +192,12 @@ CMHC Zones (~20) - Rental data (Census Tract aligned)
## Deferred Features ## Deferred Features
**Stop and flag if a task seems to require these**: **Stop and flag if a task requires these**:
| Feature | Reason | | Feature | Reason |
|---------|--------| |---------|--------|
| Historical boundary reconciliation (140->158) | 2021+ data only for V1 | | Historical boundary reconciliation (140->158) | 2021+ data only for V1 |
| ML prediction models | Energy project scope (future phase) | | ML prediction models | Energy project scope (future phase) |
| Multi-project shared infrastructure | Build first, abstract second |
--- ---
@@ -240,92 +217,123 @@ LOG_LEVEL=INFO
--- ---
## Script Standards
All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
---
## Reference Documents ## Reference Documents
| Document | Location | Use When | | Document | Location | Use When |
|----------|----------|----------| |----------|----------|----------|
| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions | | Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions |
| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification | | Developer guide | `docs/CONTRIBUTING.md` | How to add pages, tabs |
| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning | | Lessons learned | `docs/project-lessons-learned/INDEX.md` | Past issues and solutions |
| Deployment runbook | `docs/runbooks/deployment.md` | Deploying to environments |
--- ---
## Projman Plugin Workflow ## Plugin Reference
**CRITICAL: Always use the projman plugin for sprint and task management.** ### Sprint Management: projman
### When to Use Projman Skills **CRITICAL: Always use projman for sprint and task management.**
| Skill | Trigger | Purpose | | Skill | Trigger | Purpose |
|-------|---------|---------| |-------|---------|---------|
| `/projman:sprint-plan` | New sprint or phase implementation | Architecture analysis + Gitea issue creation | | `/projman:sprint-plan` | New sprint/feature | Architecture analysis + Gitea issue creation |
| `/projman:sprint-start` | Beginning implementation work | Load lessons learned (Wiki.js or local), start execution | | `/projman:sprint-start` | Begin implementation | Load lessons learned, start execution |
| `/projman:sprint-status` | Check progress | Review blockers and completion status | | `/projman:sprint-status` | Check progress | Review blockers and completion |
| `/projman:sprint-close` | Sprint completion | Capture lessons learned (Wiki.js or local backup) | | `/projman:sprint-close` | Sprint completion | Capture lessons learned |
### Default Behavior **Default workflow**: `/projman:sprint-plan` before code -> create issues -> `/projman:sprint-start` -> track via Gitea -> `/projman:sprint-close`
When user requests implementation work: **Gitea**: `personal-projects/personal-portfolio` at `gitea.hotserv.cloud`
1. **ALWAYS start with `/projman:sprint-plan`** before writing code ### Data Platform: data-platform
2. Create Gitea issues with proper labels and acceptance criteria
3. Use `/projman:sprint-start` to begin execution with lessons learned
4. Track progress via Gitea issue comments
5. Close sprint with `/projman:sprint-close` to document lessons
### Gitea Repository Use for dbt, PostgreSQL, and PostGIS operations.
- **Repo**: `lmiranda/personal-portfolio` | Skill | Purpose |
- **Host**: `gitea.hotserv.cloud` |-------|---------|
- **Note**: `lmiranda` is a user account (not org), so label lookup may require repo-level labels | `/data-platform:data-review` | Audit data integrity, schema validity, dbt compliance |
| `/data-platform:data-gate` | CI/CD data quality gate (pass/fail) |
### MCP Tools Available **When to use:** Schema changes, dbt model development, data loading, before merging data PRs.
**Gitea**: **MCP tools available:** `pg_connect`, `pg_query`, `pg_tables`, `pg_columns`, `pg_schemas`, `st_*` (PostGIS), `dbt_*` operations.
- `list_issues`, `get_issue`, `create_issue`, `update_issue`, `add_comment`
- `get_labels`, `suggest_labels`
**Wiki.js**: ### Visualization: viz-platform
- `search_lessons`, `create_lesson`, `search_pages`, `get_page`
### Lessons Learned (Backup Method) Use for Dash/Mantine component validation and chart creation.
**When Wiki.js is unavailable**, use the local backup in `docs/project-lessons-learned/`: | Skill | Purpose |
|-------|---------|
| `/viz-platform:component` | Inspect DMC component props and validation |
| `/viz-platform:chart` | Create themed Plotly charts |
| `/viz-platform:theme` | Apply/validate themes |
| `/viz-platform:dashboard` | Create dashboard layouts |
**At Sprint Start:** **When to use:** Dashboard development, new visualizations, component prop lookup.
1. Review `docs/project-lessons-learned/INDEX.md` for relevant past lessons
2. Search lesson files by tags/keywords before implementation
3. Apply prevention strategies from applicable lessons
**At Sprint Close:** ### Code Quality: code-sentinel
1. Try Wiki.js `create_lesson` first
2. If Wiki.js fails, create lesson in `docs/project-lessons-learned/`
3. Use naming convention: `{phase-or-sprint}-{short-description}.md`
4. Update `INDEX.md` with new entry
5. Follow the lesson template in INDEX.md
**Migration:** Once Wiki.js is configured, lessons will be migrated there for better searchability. Use for security scanning and refactoring analysis.
### Issue Structure | Skill | Purpose |
|-------|---------|
| `/code-sentinel:security-scan` | Full security audit of codebase |
| `/code-sentinel:refactor` | Apply refactoring patterns |
| `/code-sentinel:refactor-dry` | Preview refactoring without applying |
Every Gitea issue should include: **When to use:** Before major releases, after adding auth/data handling code, periodic audits.
- **Overview**: Brief description
- **Files to Create/Modify**: Explicit paths ### Documentation: doc-guardian
- **Acceptance Criteria**: Checkboxes
- **Technical Notes**: Implementation hints Use for documentation drift detection and synchronization.
- **Labels**: Listed in body (workaround for label API issues)
| Skill | Purpose |
|-------|---------|
| `/doc-guardian:doc-audit` | Scan project for documentation drift |
| `/doc-guardian:doc-sync` | Synchronize pending documentation updates |
**When to use:** After significant code changes, before releases.
### Pull Requests: pr-review
Use for comprehensive PR review with multiple analysis perspectives.
| Skill | Purpose |
|-------|---------|
| `/pr-review:initial-setup` | Configure PR review for project |
| Triggered automatically | Security, performance, maintainability, test analysis |
**When to use:** Before merging significant PRs to `development` or `main`.
### Requirement Clarification: clarity-assist
Use when requirements are ambiguous or need decomposition.
**When to use:** Unclear specifications, complex feature requests, conflicting requirements.
### Contract Validation: contract-validator
Use for plugin interface validation.
| Skill | Purpose |
|-------|---------|
| `/contract-validator:agent-check` | Quick agent definition validation |
| `/contract-validator:full-validation` | Full plugin contract validation |
**When to use:** When modifying plugin integrations or agent definitions.
### Git Workflow: git-flow
Use for standardized git operations.
| Skill | Purpose |
|-------|---------|
| `/git-flow:commit` | Auto-generated conventional commit |
| `/git-flow:branch-start` | Create feature/fix/chore branch |
| `/git-flow:git-status` | Comprehensive status with recommendations |
**When to use:** Complex merge scenarios, branch management, standardized commits.
--- ---
*Last Updated: Sprint 9* *Last Updated: February 2026*

LICENSE
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2024-2025 Leo Miranda
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -1,13 +1,25 @@
.PHONY: setup docker-up docker-down db-init run test dbt-run dbt-test lint format ci deploy clean help .PHONY: setup docker-up docker-down db-init load-data load-all load-toronto load-toronto-only seed-data run test dbt-run dbt-test lint format ci deploy clean help logs run-detached etl-toronto
# Default target # Default target
.DEFAULT_GOAL := help .DEFAULT_GOAL := help
# Environment # Environment
PYTHON := python3 VENV := .venv
PIP := pip PYTHON := $(VENV)/bin/python3
PIP := $(VENV)/bin/pip
DOCKER_COMPOSE := docker compose DOCKER_COMPOSE := docker compose
# Architecture detection for Docker images
ARCH := $(shell uname -m)
ifeq ($(ARCH),aarch64)
POSTGIS_IMAGE := imresamu/postgis:16-3.4
else ifeq ($(ARCH),arm64)
POSTGIS_IMAGE := imresamu/postgis:16-3.4
else
POSTGIS_IMAGE := postgis/postgis:16-3.4
endif
export POSTGIS_IMAGE
# Colors for output # Colors for output
BLUE := \033[0;34m BLUE := \033[0;34m
GREEN := \033[0;32m GREEN := \033[0;32m
@@ -39,6 +51,7 @@ setup: ## Install dependencies, create .env, init pre-commit
docker-up: ## Start PostgreSQL + PostGIS containers docker-up: ## Start PostgreSQL + PostGIS containers
@echo "$(GREEN)Starting database containers...$(NC)" @echo "$(GREEN)Starting database containers...$(NC)"
@echo "$(BLUE)Architecture: $(ARCH) -> Using image: $(POSTGIS_IMAGE)$(NC)"
$(DOCKER_COMPOSE) up -d $(DOCKER_COMPOSE) up -d
@echo "$(GREEN)Waiting for database to be ready...$(NC)" @echo "$(GREEN)Waiting for database to be ready...$(NC)"
@sleep 3 @sleep 3
@@ -57,11 +70,7 @@ docker-logs: ## View container logs
db-init: ## Initialize database schema db-init: ## Initialize database schema
@echo "$(GREEN)Initializing database schema...$(NC)" @echo "$(GREEN)Initializing database schema...$(NC)"
@if [ -f scripts/db/init.sh ]; then \ $(PYTHON) scripts/db/init_schema.py
bash scripts/db/init.sh; \
else \
echo "$(YELLOW)scripts/db/init.sh not found - skipping$(NC)"; \
fi
db-reset: ## Drop and recreate database (DESTRUCTIVE) db-reset: ## Drop and recreate database (DESTRUCTIVE)
@echo "$(YELLOW)WARNING: This will delete all data!$(NC)" @echo "$(YELLOW)WARNING: This will delete all data!$(NC)"
@@ -71,6 +80,27 @@ db-reset: ## Drop and recreate database (DESTRUCTIVE)
@sleep 3 @sleep 3
$(MAKE) db-init $(MAKE) db-init
# Domain-specific data loading
load-toronto: ## Load Toronto data from APIs
@echo "$(GREEN)Loading Toronto neighbourhood data...$(NC)"
$(PYTHON) scripts/data/load_toronto_data.py
@echo "$(GREEN)Seeding Toronto development data...$(NC)"
$(PYTHON) scripts/data/seed_amenity_data.py
load-toronto-only: ## Load Toronto data without running dbt or seeding
@echo "$(GREEN)Loading Toronto data (skip dbt)...$(NC)"
$(PYTHON) scripts/data/load_toronto_data.py --skip-dbt
# Aggregate data loading
load-data: load-toronto ## Load all project data (currently: Toronto)
@echo "$(GREEN)All data loaded!$(NC)"
load-all: load-data ## Alias for load-data
seed-data: ## Seed sample development data (amenities, median_age)
@echo "$(GREEN)Seeding development data...$(NC)"
$(PYTHON) scripts/data/seed_amenity_data.py
# ============================================================================= # =============================================================================
# Application # Application
# ============================================================================= # =============================================================================
@@ -97,15 +127,15 @@ test-cov: ## Run pytest with coverage
dbt-run: ## Run dbt models dbt-run: ## Run dbt models
@echo "$(GREEN)Running dbt models...$(NC)" @echo "$(GREEN)Running dbt models...$(NC)"
cd dbt && dbt run @set -a && . ./.env && set +a && cd dbt && dbt run --profiles-dir .
dbt-test: ## Run dbt tests dbt-test: ## Run dbt tests
@echo "$(GREEN)Running dbt tests...$(NC)" @echo "$(GREEN)Running dbt tests...$(NC)"
cd dbt && dbt test @set -a && . ./.env && set +a && cd dbt && dbt test --profiles-dir .
dbt-docs: ## Generate dbt documentation dbt-docs: ## Generate dbt documentation
@echo "$(GREEN)Generating dbt docs...$(NC)" @echo "$(GREEN)Generating dbt docs...$(NC)"
cd dbt && dbt docs generate && dbt docs serve @set -a && . ./.env && set +a && cd dbt && dbt docs generate --profiles-dir . && dbt docs serve --profiles-dir .
# ============================================================================= # =============================================================================
# Code Quality # Code Quality
@@ -131,6 +161,19 @@ ci: ## Run all checks (lint, typecheck, test)
$(MAKE) test $(MAKE) test
@echo "$(GREEN)All checks passed!$(NC)" @echo "$(GREEN)All checks passed!$(NC)"
# =============================================================================
# Operations
# =============================================================================
logs: ## Follow docker compose logs (usage: make logs or make logs SERVICE=postgres)
@./scripts/logs.sh $(SERVICE)
run-detached: ## Start containers and wait for health check
@./scripts/run-detached.sh
etl-toronto: ## Run Toronto ETL pipeline (usage: make etl-toronto MODE=--full)
@./scripts/etl/toronto.sh $(MODE)
# ============================================================================= # =============================================================================
# Deployment # Deployment
# ============================================================================= # =============================================================================
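The architecture detection above picks a PostGIS image from `uname -m`. Below is a minimal Python sketch of the same branching, for a helper script that needs to agree with the Makefile. `postgis_image` is a hypothetical helper and is not part of the repository. The image tags are copied from the Makefile.

```python
import platform
from typing import Optional

ARM_IMAGE = "imresamu/postgis:16-3.4"
DEFAULT_IMAGE = "postgis/postgis:16-3.4"


def postgis_image(machine: Optional[str] = None) -> str:
    """Return the PostGIS image the Makefile would export for this machine."""
    # platform.machine() reports the same value as `uname -m` on POSIX systems
    arch = machine if machine is not None else platform.machine()
    # Linux reports 64-bit ARM as 'aarch64'; macOS on Apple Silicon reports 'arm64'
    if arch in ("aarch64", "arm64"):
        return ARM_IMAGE
    return DEFAULT_IMAGE
```

Both ARM spellings are checked because the Makefile's `ifeq`/`else ifeq` chain handles them separately.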

README.md
@@ -1,36 +1,82 @@
# Analytics Portfolio # Analytics Portfolio
A data analytics portfolio showcasing end-to-end data engineering, visualization, and analysis capabilities. [![CI](https://gitea.hotserv.cloud/lmiranda/personal-portfolio/actions/workflows/ci.yml/badge.svg)](https://gitea.hotserv.cloud/lmiranda/personal-portfolio/actions)
## Projects **Live Demo:** [leodata.science](https://leodata.science)
### Toronto Housing Dashboard A personal portfolio website showcasing data engineering and visualization capabilities, featuring an interactive Toronto Neighbourhood Dashboard.
An interactive choropleth dashboard analyzing Toronto's housing market using multi-source data integration. ## Live Pages
**Features:** | Route | Page | Description |
- Purchase market analysis from TRREB monthly reports |-------|------|-------------|
- Rental market analysis from CMHC annual surveys | `/` | Home | Bio landing page |
- Interactive choropleth maps by district/zone | `/about` | About | Background and experience |
- Time series visualization with policy event annotations | `/projects` | Projects | Portfolio project showcase |
- Purchase/Rental mode toggle | `/resume` | Resume | Professional CV |
| `/contact` | Contact | Contact form |
| `/blog` | Blog | Technical articles |
| `/blog/{slug}` | Article | Individual blog posts |
| `/toronto` | Toronto Dashboard | Neighbourhood analysis (5 tabs) |
| `/toronto/methodology` | Methodology | Dashboard data sources and methods |
| `/health` | Health | API health check endpoint |
**Data Sources:** ## Toronto Neighbourhood Dashboard
- [TRREB Market Watch](https://trreb.ca/market-data/market-watch/) - Monthly purchase statistics
- [CMHC Rental Market Survey](https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market) - Annual rental data
**Tech Stack:** An interactive choropleth dashboard analyzing Toronto's 158 official neighbourhoods across five dimensions:
- Python 3.11+ / Dash / Plotly
- PostgreSQL + PostGIS - **Overview**: Composite livability scores, income vs safety scatter
- dbt for data transformation - **Housing**: Affordability index, rent trends, dwelling types
- Pydantic for validation - **Safety**: Crime rates, breakdowns by type, trend analysis
- SQLAlchemy 2.0 - **Demographics**: Income distribution, age pyramids, population density
- **Amenities**: Parks, schools, transit accessibility
**Data Sources**:
- City of Toronto Open Data Portal (neighbourhoods, census profiles, amenities)
- Toronto Police Service (crime statistics)
- CMHC Rental Market Survey (rental data by zone)
## Architecture
```mermaid
flowchart LR
subgraph Sources
A1[City of Toronto API]
A2[Toronto Police API]
A3[CMHC Data]
end
subgraph ETL
B1[Parsers]
B2[Loaders]
end
subgraph Database
C1[(PostgreSQL/PostGIS)]
C2[dbt Models]
end
subgraph Application
D1[Dash App]
D2[Plotly Figures]
end
A1 & A2 & A3 --> B1 --> B2 --> C1 --> C2 --> D1 --> D2
```
**Pipeline Stages:**
- **Sources**: External APIs and data files (City of Toronto, Toronto Police, CMHC)
- **ETL**: Python parsers extract and validate data; loaders persist to database
- **Database**: PostgreSQL with PostGIS for geospatial; dbt transforms raw → staging → marts
- **Application**: Dash serves interactive dashboards with Plotly visualizations
For detailed database schema, see [docs/DATABASE_SCHEMA.md](docs/DATABASE_SCHEMA.md).
## Quick Start ## Quick Start
```bash ```bash
# Clone and setup # Clone and setup
git clone https://github.com/lmiranda/personal-portfolio.git git clone https://gitea.hotserv.cloud/lmiranda/personal-portfolio.git
cd personal-portfolio cd personal-portfolio
# Install dependencies and configure environment # Install dependencies and configure environment
@@ -55,48 +101,75 @@ portfolio_app/
├── app.py # Dash app factory ├── app.py # Dash app factory
├── config.py # Pydantic settings ├── config.py # Pydantic settings
├── pages/ ├── pages/
│ ├── home.py # Bio landing page (/) │ ├── home.py # Bio landing (/)
── toronto/ # Toronto dashboard (/toronto) ── about.py # About page
│ ├── contact.py # Contact form
│ ├── projects.py # Project showcase
│ ├── resume.py # Resume/CV
│ ├── blog/ # Blog system
│ │ ├── index.py # Article listing
│ │ └── article.py # Article renderer
│ └── toronto/ # Toronto dashboard
│ ├── dashboard.py # Main layout with tabs
│ ├── methodology.py # Data documentation
│ ├── tabs/ # Tab layouts (5)
│ └── callbacks/ # Interaction logic
├── components/ # Shared UI components ├── components/ # Shared UI components
├── figures/ # Plotly figure factories ├── figures/
└── toronto/ # Toronto data logic └── toronto/ # Toronto figure factories
├── parsers/ # PDF/CSV extraction ├── content/
── loaders/ # Database operations ── blog/ # Markdown blog articles
├── schemas/ # Pydantic models ├── toronto/ # Toronto data logic
── models/ # SQLAlchemy ORM ── parsers/ # API data extraction
│ ├── loaders/ # Database operations
│ ├── schemas/ # Pydantic models
│ └── models/ # SQLAlchemy ORM (raw_toronto schema)
└── errors/ # Exception handling
dbt/ dbt/ # dbt project: portfolio
├── models/ ├── models/
│ ├── staging/ # 1:1 source tables │ ├── shared/ # Cross-domain dimensions
│ ├── intermediate/ # Business logic │ ├── staging/toronto/ # Toronto staging models
── marts/ # Analytical tables ── intermediate/toronto/ # Toronto intermediate models
│ └── marts/toronto/ # Toronto analytical tables
notebooks/
└── toronto/ # Toronto documentation (15 notebooks)
├── overview/ # Overview tab visualizations
├── housing/ # Housing tab visualizations
├── safety/ # Safety tab visualizations
├── demographics/ # Demographics tab visualizations
└── amenities/ # Amenities tab visualizations
docs/
├── PROJECT_REFERENCE.md # Architecture reference
├── CONTRIBUTING.md # Developer guide
└── project-lessons-learned/
``` ```
## Tech Stack
| Layer | Technology |
|-------|------------|
| Database | PostgreSQL 16 + PostGIS |
| Validation | Pydantic 2.x |
| ORM | SQLAlchemy 2.x |
| Transformation | dbt-postgres |
| Data Processing | Pandas, GeoPandas |
| Visualization | Dash + Plotly |
| UI Components | dash-mantine-components |
| Testing | pytest |
| Python | 3.11+ |
## Development ## Development
```bash ```bash
make test # Run tests make test # Run pytest
make lint # Run linter make lint # Run ruff linter
make format # Format code make format # Format code
make ci # Run all checks make ci # Run all checks
``` make dbt-run # Run dbt models
make dbt-test # Run dbt tests
## Data Pipeline
```
Raw Files (PDF/Excel)
Parsers (pdfplumber, pandas)
Pydantic Validation
SQLAlchemy Loaders
PostgreSQL + PostGIS
dbt Transformations
Dash Visualization
``` ```
## Environment Variables ## Environment Variables
@@ -109,12 +182,19 @@ POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure> POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio POSTGRES_DB=portfolio
DASH_DEBUG=true DASH_DEBUG=true
SECRET_KEY=<random>
``` ```
## Documentation
- **For developers**: See `docs/CONTRIBUTING.md` for setup and contribution guidelines
- **For Claude Code**: See `CLAUDE.md` for AI assistant context
- **Architecture**: See `docs/PROJECT_REFERENCE.md` for technical details
## License ## License
MIT MIT
## Author ## Author
Leo Miranda - [GitHub](https://github.com/lmiranda) | [LinkedIn](https://linkedin.com/in/yourprofile) Leo Miranda

@@ -1,8 +1,7 @@
name: 'toronto_housing' name: 'portfolio'
version: '1.0.0'
config-version: 2 config-version: 2
profile: 'toronto_housing' profile: 'portfolio'
model-paths: ["models"] model-paths: ["models"]
analysis-paths: ["analyses"] analysis-paths: ["analyses"]
@@ -16,13 +15,19 @@ clean-targets:
- "dbt_packages" - "dbt_packages"
models: models:
toronto_housing: portfolio:
shared:
+materialized: view
+schema: shared
staging: staging:
+materialized: view toronto:
+schema: staging +materialized: view
+schema: stg_toronto
intermediate: intermediate:
+materialized: view toronto:
+schema: intermediate +materialized: view
+schema: int_toronto
marts: marts:
+materialized: table toronto:
+schema: marts +materialized: table
+schema: mart_toronto

@@ -0,0 +1,11 @@
-- Override dbt default schema name generation.
-- Use the custom schema name directly instead of
-- concatenating with the target schema.
-- See: https://docs.getdbt.com/docs/build/custom-schemas
{% macro generate_schema_name(custom_schema_name, node) %}
{%- if custom_schema_name is none -%}
{{ target.schema }}
{%- else -%}
{{ custom_schema_name | trim }}
{%- endif -%}
{% endmacro %}
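This macro works together with the `+schema` settings in `dbt_project.yml` to decide where each model lands. The sketch below compares dbt's built-in behaviour, which prefixes the custom schema with the target schema, against this override, which uses the custom schema verbatim. The function names are hypothetical. The target schema `public` used in the checks is an assumption, since the profile is not shown in this diff.

```python
from typing import Optional

# Folder -> custom schema, as configured under models.portfolio in dbt_project.yml
CUSTOM_SCHEMAS = {
    "shared": "shared",
    "staging/toronto": "stg_toronto",
    "intermediate/toronto": "int_toronto",
    "marts/toronto": "mart_toronto",
}


def dbt_default_schema_name(target_schema: str, custom_schema_name: Optional[str]) -> str:
    """dbt's default: '<target>_<custom>' when a custom schema is set."""
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"


def overridden_schema_name(target_schema: str, custom_schema_name: Optional[str]) -> str:
    """Mirror of the macro above: use the custom schema name directly."""
    if custom_schema_name is None:
        return target_schema
    return custom_schema_name.strip()  # Jinja's `trim` filter strips whitespace
```

Without the override, staging views would land in a schema such as `public_stg_toronto` rather than `stg_toronto`.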

@@ -5,11 +5,11 @@ models:
description: "Rental data enriched with time and zone dimensions" description: "Rental data enriched with time and zone dimensions"
columns: columns:
- name: rental_id - name: rental_id
tests: data_tests:
- unique - unique
- not_null - not_null
- name: zone_code - name: zone_code
tests: data_tests:
- not_null - not_null
- name: int_neighbourhood__demographics - name: int_neighbourhood__demographics
@@ -17,11 +17,11 @@ models:
columns: columns:
- name: neighbourhood_id - name: neighbourhood_id
description: "Neighbourhood identifier" description: "Neighbourhood identifier"
tests: data_tests:
- not_null - not_null
- name: census_year - name: census_year
description: "Census year" description: "Census year"
tests: data_tests:
- not_null - not_null
- name: income_quintile - name: income_quintile
description: "Income quintile (1-5, city-wide)" description: "Income quintile (1-5, city-wide)"
@@ -31,7 +31,7 @@ models:
columns: columns:
- name: neighbourhood_id - name: neighbourhood_id
description: "Neighbourhood identifier" description: "Neighbourhood identifier"
tests: data_tests:
- not_null - not_null
- name: year - name: year
description: "Reference year" description: "Reference year"
@@ -45,11 +45,11 @@ models:
columns: columns:
- name: neighbourhood_id - name: neighbourhood_id
description: "Neighbourhood identifier" description: "Neighbourhood identifier"
tests: data_tests:
- not_null - not_null
- name: year - name: year
description: "Statistics year" description: "Statistics year"
tests: data_tests:
- not_null - not_null
- name: crime_rate_per_100k - name: crime_rate_per_100k
description: "Total crime rate per 100K population" description: "Total crime rate per 100K population"
@@ -61,7 +61,7 @@ models:
columns: columns:
- name: neighbourhood_id - name: neighbourhood_id
description: "Neighbourhood identifier" description: "Neighbourhood identifier"
tests: data_tests:
- not_null - not_null
- name: year - name: year
description: "Reference year" description: "Reference year"
@@ -75,11 +75,11 @@ models:
columns: columns:
- name: neighbourhood_id - name: neighbourhood_id
description: "Neighbourhood identifier" description: "Neighbourhood identifier"
tests: data_tests:
- not_null - not_null
- name: year - name: year
description: "Survey year" description: "Survey year"
tests: data_tests:
- not_null - not_null
- name: avg_rent_2bed - name: avg_rent_2bed
description: "Weighted average 2-bedroom rent" description: "Weighted average 2-bedroom rent"

@@ -0,0 +1,60 @@
-- Intermediate: Toronto CMA census statistics by year
-- Provides city-wide averages for metrics not available at neighbourhood level
-- Used when neighbourhood-level data is unavailable (e.g., median household income)
-- Grain: One row per year
with years as (
select * from {{ ref('int_year_spine') }}
),
census as (
select * from {{ ref('stg_toronto__census') }}
),
-- Census data is only available for 2016 and 2021
-- Map each analysis year to the appropriate census year
year_to_census as (
select
y.year,
case
when y.year <= 2018 then 2016
else 2021
end as census_year
from years y
),
-- Toronto CMA median household income from Statistics Canada
-- Source: Census Profile Table 98-316-X2021001
-- 2016: $65,829 (from Census Profile)
-- 2021: $84,000 (from Census Profile)
cma_income as (
select 2016 as census_year, 65829 as median_household_income union all
select 2021 as census_year, 84000 as median_household_income
),
-- City-wide aggregates from loaded neighbourhood data
city_aggregates as (
select
census_year,
sum(population) as total_population,
avg(population_density) as avg_population_density,
avg(unemployment_rate) as avg_unemployment_rate
from census
where population is not null
group by census_year
),
final as (
select
y.year,
y.census_year,
ci.median_household_income,
ca.total_population,
ca.avg_population_density,
ca.avg_unemployment_rate
from year_to_census y
left join cma_income ci on y.census_year = ci.census_year
left join city_aggregates ca on y.census_year = ca.census_year
)
select * from final
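The year-to-census mapping is easy to misread at the 2018/2019 boundary. Here is a minimal Python sketch of `year_to_census` left-joined to `cma_income`. It assumes the 2014-2025 spine produced by `int_year_spine`, where `generate_series` includes both ends. The helper names are hypothetical. The income figures are the ones hard-coded in the model.

```python
# Mirrors int_year_spine: generate_series(2014, 2025) includes both endpoints.
YEARS = list(range(2014, 2026))

# Values hard-coded in the model's cma_income CTE.
CMA_INCOME = {2016: 65829, 2021: 84000}


def census_year_for(year: int) -> int:
    """Same rule as the year_to_census CTE: 2018 and earlier map to the 2016 census."""
    return 2016 if year <= 2018 else 2021


def cma_rows() -> list[dict]:
    rows = []
    for year in YEARS:
        census_year = census_year_for(year)
        rows.append({
            "year": year,
            "census_year": census_year,
            # left join semantics: no matching income row gives None (NULL)
            "median_household_income": CMA_INCOME.get(census_year),
        })
    return rows
```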

@@ -34,7 +34,7 @@ amenity_scores as (
n.population, n.population,
n.land_area_sqkm, n.land_area_sqkm,
a.year, coalesce(a.year, 2021) as year,
-- Raw counts -- Raw counts
a.parks_count, a.parks_count,

@@ -16,12 +16,12 @@ crime_by_year as (
neighbourhood_id, neighbourhood_id,
crime_year as year, crime_year as year,
sum(incident_count) as total_incidents, sum(incident_count) as total_incidents,
sum(case when crime_type = 'Assault' then incident_count else 0 end) as assault_count, sum(case when crime_type = 'assault' then incident_count else 0 end) as assault_count,
sum(case when crime_type = 'Auto Theft' then incident_count else 0 end) as auto_theft_count, sum(case when crime_type = 'auto_theft' then incident_count else 0 end) as auto_theft_count,
sum(case when crime_type = 'Break and Enter' then incident_count else 0 end) as break_enter_count, sum(case when crime_type = 'break_and_enter' then incident_count else 0 end) as break_enter_count,
sum(case when crime_type = 'Robbery' then incident_count else 0 end) as robbery_count, sum(case when crime_type = 'robbery' then incident_count else 0 end) as robbery_count,
sum(case when crime_type = 'Theft Over' then incident_count else 0 end) as theft_over_count, sum(case when crime_type = 'theft_over' then incident_count else 0 end) as theft_over_count,
sum(case when crime_type = 'Homicide' then incident_count else 0 end) as homicide_count, sum(case when crime_type = 'homicide' then incident_count else 0 end) as homicide_count,
avg(rate_per_100k) as avg_rate_per_100k avg(rate_per_100k) as avg_rate_per_100k
from crime from crime
group by neighbourhood_id, crime_year group by neighbourhood_id, crime_year
@@ -64,15 +64,17 @@ crime_summary as (
w.robbery_count, w.robbery_count,
w.theft_over_count, w.theft_over_count,
w.homicide_count, w.homicide_count,
w.avg_rate_per_100k,
w.yoy_change_pct, w.yoy_change_pct,
-- Crime rate per 100K population -- Crime rate per 100K population (use source data avg, or calculate if population available)
case coalesce(
when n.population > 0 w.avg_rate_per_100k,
then round(w.total_incidents::numeric / n.population * 100000, 2) case
else null when n.population > 0
end as crime_rate_per_100k then round(w.total_incidents::numeric / n.population * 100000, 2)
else null
end
) as crime_rate_per_100k
from neighbourhoods n from neighbourhoods n
inner join with_yoy w on n.neighbourhood_id = w.neighbourhood_id inner join with_yoy w on n.neighbourhood_id = w.neighbourhood_id
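The new `crime_rate_per_100k` expression uses the source-provided rate when it exists. Only when that rate is missing does it compute a rate from incidents and population. Below is a Python sketch of the same `coalesce`/`case` fallback. The function name is hypothetical.

```python
from typing import Optional


def crime_rate_per_100k(
    avg_rate_per_100k: Optional[float],
    total_incidents: int,
    population: Optional[int],
) -> Optional[float]:
    # First coalesce argument: the averaged source rate, if present
    if avg_rate_per_100k is not None:
        return avg_rate_per_100k
    # CASE branch: only compute when population > 0 (a NULL population fails the test)
    if population is not None and population > 0:
        return round(total_incidents / population * 100000, 2)
    return None
```

The `crime_by_year` hunk above also switches to snake_case labels such as `'break_and_enter'`. PostgreSQL text comparison with `=` is case-sensitive, so those branches only count incidents when the upstream `crime` CTE emits exactly those values.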

@@ -17,7 +17,8 @@ demographics as (
n.geometry, n.geometry,
n.land_area_sqkm, n.land_area_sqkm,
c.census_year, -- Use census_year from census data, or fall back to dim_neighbourhood's year
coalesce(c.census_year, n.census_year, 2021) as census_year,
c.population, c.population,
c.population_density, c.population_density,
c.median_household_income, c.median_household_income,

@@ -20,7 +20,7 @@ housing as (
n.neighbourhood_name, n.neighbourhood_name,
n.geometry, n.geometry,
coalesce(r.year, c.census_year) as year, coalesce(r.year, c.census_year, 2021) as year,
-- Census housing metrics -- Census housing metrics
c.pct_owner_occupied, c.pct_owner_occupied,

@@ -42,10 +42,10 @@ pivoted as (
  select
  neighbourhood_id,
  year,
- max(case when bedroom_type = 'Two Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_2bed,
- max(case when bedroom_type = 'One Bedroom' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_1bed,
- max(case when bedroom_type = 'Bachelor' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_bachelor,
- max(case when bedroom_type = 'Three Bedroom +' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_3bed,
+ max(case when bedroom_type = '2bed' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_2bed,
+ max(case when bedroom_type = '1bed' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_1bed,
+ max(case when bedroom_type = 'bachelor' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_bachelor,
+ max(case when bedroom_type = '3bed' then weighted_avg_rent / nullif(total_weight, 0) end) as avg_rent_3bed,
  avg(vacancy_rate) as vacancy_rate,
  sum(rental_units_estimate) as total_rental_units
  from allocated


@@ -0,0 +1,25 @@
-- Intermediate: Toronto CMA rental metrics by year
-- Aggregates rental data to city-wide averages by year
-- Source: StatCan CMHC data at CMA level
-- Grain: One row per year
with rentals as (
select * from {{ ref('stg_cmhc__rentals') }}
),
-- Pivot bedroom types to columns
yearly_rentals as (
select
year,
max(case when bedroom_type = 'bachelor' then avg_rent end) as avg_rent_bachelor,
max(case when bedroom_type = '1bed' then avg_rent end) as avg_rent_1bed,
max(case when bedroom_type = '2bed' then avg_rent end) as avg_rent_2bed,
max(case when bedroom_type = '3bed' then avg_rent end) as avg_rent_3bed,
-- Use 2-bedroom as standard reference
max(case when bedroom_type = '2bed' then avg_rent end) as avg_rent_standard,
max(vacancy_rate) as vacancy_rate
from rentals
group by year
)
select * from yearly_rentals


@@ -0,0 +1,11 @@
-- Intermediate: Year spine for analysis
-- Creates a row for each year from 2014-2025
-- Used to drive time-series analysis across all data sources
with years as (
-- Generate years from available data sources
-- Crime data: 2014-2024, Rentals: 2019-2025
select generate_series(2014, 2025) as year
)
select year from years


@@ -1,110 +0,0 @@
-- Mart: Neighbourhood Overview with Composite Livability Score
-- Dashboard Tab: Overview
-- Grain: One row per neighbourhood per year
with demographics as (
select * from {{ ref('int_neighbourhood__demographics') }}
),
housing as (
select * from {{ ref('int_neighbourhood__housing') }}
),
crime as (
select * from {{ ref('int_neighbourhood__crime_summary') }}
),
amenities as (
select * from {{ ref('int_neighbourhood__amenity_scores') }}
),
-- Compute percentile ranks for scoring components
percentiles as (
select
d.neighbourhood_id,
d.neighbourhood_name,
d.geometry,
d.census_year as year,
d.population,
d.median_household_income,
-- Safety score: inverse of crime rate (higher = safer)
case
when c.crime_rate_per_100k is not null
then 100 - percent_rank() over (
partition by d.census_year
order by c.crime_rate_per_100k
) * 100
else null
end as safety_score,
-- Affordability score: inverse of rent-to-income ratio
case
when h.rent_to_income_pct is not null
then 100 - percent_rank() over (
partition by d.census_year
order by h.rent_to_income_pct
) * 100
else null
end as affordability_score,
-- Amenity score: based on amenities per capita
case
when a.total_amenities_per_1000 is not null
then percent_rank() over (
partition by d.census_year
order by a.total_amenities_per_1000
) * 100
else null
end as amenity_score,
-- Raw metrics for reference
c.crime_rate_per_100k,
h.rent_to_income_pct,
h.avg_rent_2bed,
a.total_amenities_per_1000
from demographics d
left join housing h
on d.neighbourhood_id = h.neighbourhood_id
and d.census_year = h.year
left join crime c
on d.neighbourhood_id = c.neighbourhood_id
and d.census_year = c.year
left join amenities a
on d.neighbourhood_id = a.neighbourhood_id
and d.census_year = a.year
),
final as (
select
neighbourhood_id,
neighbourhood_name,
geometry,
year,
population,
median_household_income,
-- Component scores (0-100)
round(safety_score::numeric, 1) as safety_score,
round(affordability_score::numeric, 1) as affordability_score,
round(amenity_score::numeric, 1) as amenity_score,
-- Composite livability score: safety (30%), affordability (40%), amenities (30%)
round(
(coalesce(safety_score, 50) * 0.30 +
coalesce(affordability_score, 50) * 0.40 +
coalesce(amenity_score, 50) * 0.30)::numeric,
1
) as livability_score,
-- Raw metrics
crime_rate_per_100k,
rent_to_income_pct,
avg_rent_2bed,
total_amenities_per_1000
from percentiles
)
select * from final


@@ -6,7 +6,7 @@ models:
  columns:
  - name: rental_id
  description: "Unique rental record identifier"
- tests:
+ data_tests:
  - unique
  - not_null
@@ -17,11 +17,11 @@ models:
  columns:
  - name: neighbourhood_id
  description: "Neighbourhood identifier"
- tests:
+ data_tests:
  - not_null
  - name: neighbourhood_name
  description: "Official neighbourhood name"
- tests:
+ data_tests:
  - not_null
  - name: geometry
  description: "PostGIS geometry for mapping"
@@ -41,11 +41,11 @@ models:
  columns:
  - name: neighbourhood_id
  description: "Neighbourhood identifier"
- tests:
+ data_tests:
  - not_null
  - name: neighbourhood_name
  description: "Official neighbourhood name"
- tests:
+ data_tests:
  - not_null
  - name: geometry
  description: "PostGIS geometry for mapping"
@@ -63,11 +63,11 @@ models:
  columns:
  - name: neighbourhood_id
  description: "Neighbourhood identifier"
- tests:
+ data_tests:
  - not_null
  - name: neighbourhood_name
  description: "Official neighbourhood name"
- tests:
+ data_tests:
  - not_null
  - name: geometry
  description: "PostGIS geometry for mapping"
@@ -77,7 +77,7 @@ models:
  description: "100 = city average crime rate"
  - name: safety_tier
  description: "Safety tier (1=safest, 5=highest crime)"
- tests:
+ data_tests:
  - accepted_values:
  arguments:
  values: [1, 2, 3, 4, 5]
@@ -89,11 +89,11 @@ models:
  columns:
  - name: neighbourhood_id
  description: "Neighbourhood identifier"
- tests:
+ data_tests:
  - not_null
  - name: neighbourhood_name
  description: "Official neighbourhood name"
- tests:
+ data_tests:
  - not_null
  - name: geometry
  description: "PostGIS geometry for mapping"
@@ -103,7 +103,7 @@ models:
  description: "100 = city average income"
  - name: income_quintile
  description: "Income quintile (1-5)"
- tests:
+ data_tests:
  - accepted_values:
  arguments:
  values: [1, 2, 3, 4, 5]
@@ -115,11 +115,11 @@ models:
  columns:
  - name: neighbourhood_id
  description: "Neighbourhood identifier"
- tests:
+ data_tests:
  - not_null
  - name: neighbourhood_name
  description: "Official neighbourhood name"
- tests:
+ data_tests:
  - not_null
  - name: geometry
  description: "PostGIS geometry for mapping"
@@ -129,7 +129,7 @@ models:
  description: "100 = city average amenities"
  - name: amenity_tier
  description: "Amenity tier (1=best, 5=lowest)"
- tests:
+ data_tests:
  - accepted_values:
  arguments:
  values: [1, 2, 3, 4, 5]


@@ -0,0 +1,153 @@
-- Mart: Neighbourhood Overview with Composite Livability Score
-- Dashboard Tab: Overview
-- Grain: One row per neighbourhood per year
-- Time spine: Years 2014-2025 (driven by crime/rental data availability)
with years as (
select * from {{ ref('int_year_spine') }}
),
neighbourhoods as (
select * from {{ ref('stg_toronto__neighbourhoods') }}
),
-- Create base: all neighbourhoods × all years
neighbourhood_years as (
select
n.neighbourhood_id,
n.neighbourhood_name,
n.geometry,
y.year
from neighbourhoods n
cross join years y
),
-- Census data (available for 2016, 2021)
-- For each year, use the most recent census data available
census as (
select * from {{ ref('stg_toronto__census') }}
),
census_mapped as (
select
ny.neighbourhood_id,
ny.year,
c.population,
c.unemployment_rate,
c.pct_bachelors_or_higher as education_bachelors_pct
from neighbourhood_years ny
left join census c on ny.neighbourhood_id = c.neighbourhood_id
-- Use census year <= analysis year, prefer most recent
and c.census_year = (
select max(c2.census_year)
from {{ ref('stg_toronto__census') }} c2
where c2.neighbourhood_id = ny.neighbourhood_id
and c2.census_year <= ny.year
)
),
-- CMA-level census data (for income - not available at neighbourhood level)
cma_census as (
select * from {{ ref('int_census__toronto_cma') }}
),
-- Crime data (2014-2024)
crime as (
select * from {{ ref('int_neighbourhood__crime_summary') }}
),
-- Rentals (2019-2025) - CMA level applied to all neighbourhoods
rentals as (
select * from {{ ref('int_rentals__toronto_cma') }}
),
-- Compute scores
scored as (
select
ny.neighbourhood_id,
ny.neighbourhood_name,
ny.geometry,
ny.year,
cm.population,
-- Use CMA-level income (neighbourhood-level not available in Toronto Open Data)
cma.median_household_income,
-- Safety score: inverse of crime rate (higher = safer)
case
when cr.crime_rate_per_100k is not null
then 100 - percent_rank() over (
partition by ny.year
order by cr.crime_rate_per_100k
) * 100
else null
end as safety_score,
-- Affordability score: inverse of rent-to-income ratio
-- Using CMA-level income since neighbourhood-level not available
case
when cma.median_household_income > 0 and r.avg_rent_standard > 0
then 100 - percent_rank() over (
partition by ny.year
order by (r.avg_rent_standard * 12 / cma.median_household_income)
) * 100
else null
end as affordability_score,
-- Raw metrics
cr.crime_rate_per_100k,
case
when cma.median_household_income > 0 and r.avg_rent_standard > 0
then round((r.avg_rent_standard * 12 / cma.median_household_income) * 100, 2)
else null
end as rent_to_income_pct,
r.avg_rent_standard as avg_rent_2bed,
r.vacancy_rate
from neighbourhood_years ny
left join census_mapped cm
on ny.neighbourhood_id = cm.neighbourhood_id
and ny.year = cm.year
left join cma_census cma
on ny.year = cma.year
left join crime cr
on ny.neighbourhood_id = cr.neighbourhood_id
and ny.year = cr.year
left join rentals r
on ny.year = r.year
),
final as (
select
neighbourhood_id,
neighbourhood_name,
geometry,
year,
population,
median_household_income,
-- Component scores (0-100)
round(safety_score::numeric, 1) as safety_score,
round(affordability_score::numeric, 1) as affordability_score,
-- TODO: Replace with actual amenity score when fact_amenities is populated
-- Currently uses neutral placeholder (50.0) which affects livability_score accuracy
50.0 as amenity_score,
-- Composite livability score: safety (40%), affordability (40%), amenities (20%)
round(
(coalesce(safety_score, 50) * 0.40 +
coalesce(affordability_score, 50) * 0.40 +
50 * 0.20)::numeric,
1
) as livability_score,
-- Raw metrics
crime_rate_per_100k,
rent_to_income_pct,
avg_rent_2bed,
vacancy_rate,
null::numeric as total_amenities_per_1000
from scored
)
select * from final


@@ -0,0 +1,33 @@
version: 2
models:
- name: stg_dimensions__time
description: "Staged time dimension - shared across all projects"
columns:
- name: date_key
description: "Primary key (YYYYMM format)"
data_tests:
- unique
- not_null
- name: full_date
description: "First day of month"
data_tests:
- not_null
- name: year
description: "Calendar year"
data_tests:
- not_null
- name: month
description: "Month number (1-12)"
data_tests:
- not_null
- name: quarter
description: "Quarter (1-4)"
data_tests:
- not_null
- name: month_name
description: "Month name"
data_tests:
- not_null
- name: is_month_start
description: "Always true (monthly grain)"


@@ -0,0 +1,25 @@
version: 2
sources:
- name: shared
description: "Shared dimension tables used across all dashboards"
database: portfolio
schema: public
tables:
- name: dim_time
description: "Time dimension (monthly grain) - shared across all projects"
columns:
- name: date_key
description: "Primary key (YYYYMM format)"
- name: full_date
description: "First day of month"
- name: year
description: "Calendar year"
- name: month
description: "Month number (1-12)"
- name: quarter
description: "Quarter (1-4)"
- name: month_name
description: "Month name"
- name: is_month_start
description: "Always true (monthly grain)"


@@ -1,9 +1,10 @@
  -- Staged time dimension
- -- Source: dim_time table
+ -- Source: shared.dim_time table
  -- Grain: One row per month
+ -- Note: Shared dimension used across all dashboard projects
  with source as (
- select * from {{ source('toronto_housing', 'dim_time') }}
+ select * from {{ source('shared', 'dim_time') }}
  ),
  staged as (


@@ -1,18 +0,0 @@
-- Staged CMHC zone dimension
-- Source: dim_cmhc_zone table
-- Grain: One row per zone
with source as (
select * from {{ source('toronto_housing', 'dim_cmhc_zone') }}
),
staged as (
select
zone_key,
zone_code,
zone_name,
geometry
from source
)
select * from staged


@@ -1,10 +1,10 @@
  version: 2
  sources:
- - name: toronto_housing
- description: "Toronto housing data loaded from CMHC and City of Toronto sources"
+ - name: toronto
+ description: "Toronto data loaded from CMHC and City of Toronto sources"
  database: portfolio
- schema: public
+ schema: raw_toronto
  tables:
  - name: fact_rentals
  description: "CMHC annual rental survey data by zone and bedroom type"
@@ -16,12 +16,6 @@ sources:
  - name: zone_key
  description: "Foreign key to dim_cmhc_zone"
- - name: dim_time
- description: "Time dimension (monthly grain)"
- columns:
- - name: date_key
- description: "Primary key (YYYYMMDD format)"
  - name: dim_cmhc_zone
  description: "CMHC zone dimension with geometry"
  columns:


@@ -6,25 +6,16 @@ models:
  columns:
  - name: rental_id
  description: "Unique identifier for rental record"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: date_key
  description: "Date dimension key (YYYYMMDD)"
- tests:
+ data_tests:
  - not_null
  - name: zone_key
  description: "CMHC zone dimension key"
- tests:
- - not_null
- - name: stg_dimensions__time
- description: "Staged time dimension"
- columns:
- - name: date_key
- description: "Date dimension key (YYYYMMDD)"
- tests:
- - unique
+ data_tests:
  - not_null
  - name: stg_dimensions__cmhc_zones
@@ -32,12 +23,12 @@ models:
  columns:
  - name: zone_key
  description: "Zone dimension key"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: zone_code
  description: "CMHC zone code"
- tests:
+ data_tests:
  - unique
  - not_null
@@ -46,12 +37,12 @@ models:
  columns:
  - name: neighbourhood_id
  description: "Neighbourhood primary key"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: neighbourhood_name
  description: "Official neighbourhood name"
- tests:
+ data_tests:
  - not_null
  - name: geometry
  description: "PostGIS geometry (POLYGON)"
@@ -61,16 +52,16 @@ models:
  columns:
  - name: census_id
  description: "Census record identifier"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: neighbourhood_id
  description: "Neighbourhood foreign key"
- tests:
+ data_tests:
  - not_null
  - name: census_year
  description: "Census year (2016, 2021)"
- tests:
+ data_tests:
  - not_null
  - name: stg_toronto__crime
@@ -78,16 +69,16 @@ models:
  columns:
  - name: crime_id
  description: "Crime record identifier"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: neighbourhood_id
  description: "Neighbourhood foreign key"
- tests:
+ data_tests:
  - not_null
  - name: crime_type
  description: "Type of crime"
- tests:
+ data_tests:
  - not_null
  - name: stg_toronto__amenities
@@ -95,16 +86,16 @@ models:
  columns:
  - name: amenity_id
  description: "Amenity record identifier"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: neighbourhood_id
  description: "Neighbourhood foreign key"
- tests:
+ data_tests:
  - not_null
  - name: amenity_type
  description: "Type of amenity"
- tests:
+ data_tests:
  - not_null
  - name: stg_cmhc__zone_crosswalk
@@ -112,18 +103,18 @@ models:
  columns:
  - name: crosswalk_id
  description: "Crosswalk record identifier"
- tests:
+ data_tests:
  - unique
  - not_null
  - name: cmhc_zone_code
  description: "CMHC zone code"
- tests:
+ data_tests:
  - not_null
  - name: neighbourhood_id
  description: "Neighbourhood foreign key"
- tests:
+ data_tests:
  - not_null
  - name: area_weight
  description: "Proportional area weight (0-1)"
- tests:
+ data_tests:
  - not_null


@@ -1,9 +1,13 @@
  -- Staged CMHC rental market survey data
- -- Source: fact_rentals table loaded from CMHC CSV exports
+ -- Source: fact_rentals table loaded from CMHC/StatCan
  -- Grain: One row per zone per bedroom type per survey year
  with source as (
- select * from {{ source('toronto_housing', 'fact_rentals') }}
+ select
+ f.*,
+ t.year as survey_year
+ from {{ source('toronto', 'fact_rentals') }} f
+ join {{ source('shared', 'dim_time') }} t on f.date_key = t.date_key
  ),
  staged as (
@@ -11,6 +15,7 @@ staged as (
  id as rental_id,
  date_key,
  zone_key,
+ survey_year as year,
  bedroom_type,
  universe as rental_universe,
  avg_rent,


@@ -3,7 +3,7 @@
  -- Grain: One row per zone-neighbourhood intersection
  with source as (
- select * from {{ source('toronto_housing', 'bridge_cmhc_neighbourhood') }}
+ select * from {{ source('toronto', 'bridge_cmhc_neighbourhood') }}
  ),
  staged as (


@@ -0,0 +1,19 @@
-- Staged CMHC zone dimension
-- Source: dim_cmhc_zone table
-- Grain: One row per zone
with source as (
select * from {{ source('toronto', 'dim_cmhc_zone') }}
),
staged as (
select
zone_key,
zone_code,
zone_name
-- geometry column excluded: CMHC does not provide zone boundaries
-- Spatial analysis uses dim_neighbourhood geometry instead
from source
)
select * from staged


@@ -3,7 +3,7 @@
  -- Grain: One row per neighbourhood per amenity type per year
  with source as (
- select * from {{ source('toronto_housing', 'fact_amenities') }}
+ select * from {{ source('toronto', 'fact_amenities') }}
  ),
  staged as (


@@ -3,7 +3,7 @@
  -- Grain: One row per neighbourhood per census year
  with source as (
- select * from {{ source('toronto_housing', 'fact_census') }}
+ select * from {{ source('toronto', 'fact_census') }}
  ),
  staged as (


@@ -3,7 +3,7 @@
  -- Grain: One row per neighbourhood per year per crime type
  with source as (
- select * from {{ source('toronto_housing', 'fact_crime') }}
+ select * from {{ source('toronto', 'fact_crime') }}
  ),
  staged as (


@@ -3,7 +3,7 @@
  -- Grain: One row per neighbourhood (158 total)
  with source as (
- select * from {{ source('toronto_housing', 'dim_neighbourhood') }}
+ select * from {{ source('toronto', 'dim_neighbourhood') }}
  ),
  staged as (


@@ -1,4 +1,4 @@
- toronto_housing:
+ portfolio:
  target: dev
  outputs:
  dev:


@@ -1,6 +1,6 @@
  services:
  db:
- image: postgis/postgis:16-3.4
+ image: ${POSTGIS_IMAGE:-postgis/postgis:16-3.4}
  container_name: portfolio-db
  restart: unless-stopped
  ports:

docs/CONTRIBUTING.md

@@ -0,0 +1,500 @@
# Developer Guide
Instructions for contributing to the Analytics Portfolio project.

---
## Table of Contents
1. [Development Setup](#development-setup)
2. [Adding a Blog Post](#adding-a-blog-post)
3. [Adding a New Page](#adding-a-new-page)
4. [Adding a Dashboard Tab](#adding-a-dashboard-tab)
5. [Creating Figure Factories](#creating-figure-factories)
6. [Branch Workflow](#branch-workflow)
7. [Code Standards](#code-standards)
---
## Development Setup
### Prerequisites
- Python 3.11+ (via pyenv)
- Docker and Docker Compose
- Git
### Initial Setup
```bash
# Clone repository
git clone https://gitea.hotserv.cloud/lmiranda/personal-portfolio.git
cd personal-portfolio
# Run setup (creates venv, installs deps, copies .env.example)
make setup
# Start PostgreSQL + PostGIS
make docker-up
# Initialize database
make db-init
# Start development server
make run
```
The app runs at `http://localhost:8050`.
### Useful Commands
```bash
make test # Run tests
make test-cov # Run tests with coverage
make lint # Check code style
make format # Auto-format code
make typecheck # Run mypy type checker
make ci # Run all checks (lint, typecheck, test)
make dbt-run # Run dbt transformations
make dbt-test # Run dbt tests
```
---
## Adding a Blog Post
Blog posts are Markdown files with YAML frontmatter, stored in `portfolio_app/content/blog/`.
### Step 1: Create the Markdown File
Create a new file in `portfolio_app/content/blog/`:
```bash
touch portfolio_app/content/blog/your-article-slug.md
```
The filename becomes the URL slug: `/blog/your-article-slug`
### Step 2: Add Frontmatter
Every blog post requires YAML frontmatter at the top:
```markdown
---
title: "Your Article Title"
date: "2026-01-17"
description: "A brief description for the article card (1-2 sentences)"
tags:
- data-engineering
- python
- lessons-learned
status: published
---
Your article content starts here...
```
**Required fields:**
| Field | Description |
|-------|-------------|
| `title` | Article title (displayed on cards and page) |
| `date` | Publication date in `YYYY-MM-DD` format |
| `description` | Short summary for article listing cards |
| `tags` | List of tags (displayed as badges) |
| `status` | `published` or `draft` (drafts are hidden from listing) |
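Before publishing, the rules in this table can be checked mechanically. The sketch below is a hypothetical helper, not part of the project: `validate_frontmatter` and `visible_posts` are illustrative names, and it assumes the frontmatter has already been parsed into a dict (for example by a YAML loader).

```python
from datetime import datetime

# Hypothetical validation helpers; field rules follow the table above.
REQUIRED_FIELDS = ("title", "date", "description", "tags", "status")
ALLOWED_STATUSES = {"published", "draft"}


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems found in parsed frontmatter (empty if valid)."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in meta]
    if "date" in meta:
        try:
            # str() also covers date objects produced by an unquoted YAML date
            datetime.strptime(str(meta["date"]), "%Y-%m-%d")
        except ValueError:
            errors.append(f"bad date: {meta['date']!r} (expected YYYY-MM-DD)")
    if "tags" in meta and not isinstance(meta["tags"], list):
        errors.append("tags must be a list")
    if "status" in meta and meta["status"] not in ALLOWED_STATUSES:
        errors.append(f"bad status: {meta['status']!r}")
    return errors


def visible_posts(posts: list[dict]) -> list[dict]:
    """Keep only published posts, since drafts are hidden from the listing."""
    return [post for post in posts if post.get("status") == "published"]
```

Running `validate_frontmatter` over every post in a unit test would surface a malformed date or an unexpected `status` value early.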
### Step 3: Write Content
Use standard Markdown:
````markdown
## Section Heading

Regular paragraph text.

### Subsection

- Bullet points
- Another point

```python
# Code blocks with syntax highlighting
def example():
    return "Hello"
```

**Bold text** and *italic text*.

> Blockquotes for callouts
````
### Step 4: Test Locally
```bash
make run
```
Visit `http://localhost:8050/blog` to see the article listing.
Visit `http://localhost:8050/blog/your-article-slug` for the full article.
### Example: Complete Blog Post
````markdown
---
title: "Building ETL Pipelines with Python"
date: "2026-01-17"
description: "Lessons from building production data pipelines at scale"
tags:
- python
- etl
- data-engineering
status: published
---

When I started building data pipelines, I made every mistake possible...

## The Problem

Most tutorials show toy examples. Real pipelines are different.

### Error Handling

```python
def safe_transform(df: pd.DataFrame) -> pd.DataFrame:
    try:
        return df.apply(transform_row, axis=1)
    except ValueError as e:
        logger.error(f"Transform failed: {e}")
        raise
```

## Conclusion

Ship something that works, then iterate.
````
---
## Adding a New Page
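Pages are discovered automatically from `portfolio_app/pages/`. Each page's URL comes from the `path` (or `path_template`) passed to `dash.register_page()`; if both are omitted, Dash infers the path from the module name.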
### Step 1: Create the Page File
```bash
touch portfolio_app/pages/your_page.py
```
### Step 2: Register the Page
Every page must call `dash.register_page()`:
```python
"""Your page description."""

import dash
import dash_mantine_components as dmc

dash.register_page(
    __name__,
    path="/your-page",  # URL path
    name="Your Page",  # Display name (for nav)
    title="Your Page Title",  # Browser tab title
)


def layout() -> dmc.Container:
    """Page layout function."""
    return dmc.Container(
        dmc.Stack(
            [
                dmc.Title("Your Page", order=1),
                dmc.Text("Page content here."),
            ],
            gap="lg",
        ),
        size="md",
        py="xl",
    )
```
### Step 3: Page with Dynamic Content
For pages with URL parameters:
```python
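# pages/blog/article.py
import dash
import dash_mantine_components as dmc

# Assumes get_article(slug) is imported from the app's blog content loader
# and returns None when no post matches the slug.

dash.register_page(
    __name__,
    path_template="/blog/<slug>",  # Dynamic parameter
    name="Article",
)


def layout(slug: str = "") -> dmc.Container:
    """Layout receives URL parameters as keyword arguments."""
    article = get_article(slug)
    if not article:
        # Wrap in a Container so the return type matches the annotation
        return dmc.Container(dmc.Text("Article not found"))
    return dmc.Container(
        dmc.Title(article["meta"]["title"]),
        # ...
    )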
```
### Step 4: Add Navigation (Optional)
To add the page to the sidebar, edit `portfolio_app/components/sidebar.py`:
```python
# For main pages (Home, About, Blog, etc.)
NAV_ITEMS_MAIN = [
    {"path": "/", "icon": "tabler:home", "label": "Home"},
    {"path": "/your-page", "icon": "tabler:star", "label": "Your Page"},
    # ...
]

# For project/dashboard pages
NAV_ITEMS_PROJECTS = [
    {"path": "/projects", "icon": "tabler:folder", "label": "Projects"},
    {"path": "/your-dashboard", "icon": "tabler:chart-bar", "label": "Your Dashboard"},
    # ...
]
```
The sidebar uses icon buttons with tooltips. Each item needs `path`, `icon` (Tabler icon name), and `label` (tooltip text).
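A small check like the following could guard those three keys in a unit test. It is a hypothetical helper, not part of the codebase, and the `tabler:` prefix check is an assumption based on the examples above.

```python
REQUIRED_KEYS = {"path", "icon", "label"}


def check_nav_items(items: list[dict]) -> list[str]:
    """Return a list of problems with sidebar nav entries (empty if all valid)."""
    problems: list[str] = []
    seen_paths: set[str] = set()
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append(f"item {i}: missing {sorted(missing)}")
            continue
        if not item["path"].startswith("/"):
            problems.append(f"item {i}: path must start with '/'")
        # Assumption: icon names follow the 'tabler:<name>' form used above
        if not item["icon"].startswith("tabler:"):
            problems.append(f"item {i}: icon should look like 'tabler:home'")
        if item["path"] in seen_paths:
            problems.append(f"item {i}: duplicate path {item['path']}")
        seen_paths.add(item["path"])
    return problems
```

Calling it on `NAV_ITEMS_MAIN + NAV_ITEMS_PROJECTS` would also catch two entries pointing at the same page.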
### URL Routing Summary
| File Location | URL |
|---------------|-----|
| `pages/home.py` | `/` (if `path="/"`) |
| `pages/about.py` | `/about` |
| `pages/blog/index.py` | `/blog` |
| `pages/blog/article.py` | `/blog/<slug>` |
| `pages/toronto/dashboard.py` | `/toronto` |
---
## Adding a Dashboard Tab
Dashboard tabs are in `portfolio_app/pages/toronto/tabs/`.
### Step 1: Create Tab Layout
```python
# pages/toronto/tabs/your_tab.py
"""Your tab description."""

import dash_mantine_components as dmc

from portfolio_app.figures.toronto.choropleth import create_choropleth
from portfolio_app.toronto.demo_data import get_demo_data

# create_kpi_cards() and create_supporting_charts() are placeholders
# for helpers you write for your own tab.


def create_your_tab_layout() -> dmc.Stack:
    """Create the tab layout."""
    data = get_demo_data()
    return dmc.Stack(
        [
            dmc.Grid(
                [
                    dmc.GridCol(
                        # Map on left
                        create_choropleth(data, "your_metric"),
                        span=8,
                    ),
                    dmc.GridCol(
                        # KPI cards on right
                        create_kpi_cards(data),
                        span=4,
                    ),
                ],
            ),
            # Charts below
            create_supporting_charts(data),
        ],
        gap="lg",
    )
```
### Step 2: Register in Dashboard
Edit `pages/toronto/dashboard.py` to add the tab:
```python
from portfolio_app.pages.toronto.tabs.your_tab import create_your_tab_layout
# In the tabs list:
dmc.TabsTab("Your Tab", value="your-tab"),
# In the panels:
dmc.TabsPanel(create_your_tab_layout(), value="your-tab"),
```
---
## Creating Figure Factories
Figure factories are organized by dashboard domain under `portfolio_app/figures/{domain}/`.
### Pattern
```python
# figures/toronto/your_chart.py
"""Your chart type factory for Toronto dashboard."""

import plotly.express as px
import plotly.graph_objects as go
import pandas as pd


def create_your_chart(
    df: pd.DataFrame,
    x_col: str,
    y_col: str,
    title: str = "",
) -> go.Figure:
    """Create a your_chart figure.

    Args:
        df: DataFrame with data.
        x_col: Column for x-axis.
        y_col: Column for y-axis.
        title: Optional chart title.

    Returns:
        Configured Plotly figure.
    """
    fig = px.bar(df, x=x_col, y=y_col, title=title)
    fig.update_layout(
        template="plotly_white",
        margin=dict(l=40, r=40, t=40, b=40),
    )
    return fig
```
### Export from `__init__.py`
```python
# figures/toronto/__init__.py
from .your_chart import create_your_chart

__all__ = [
    "create_your_chart",
    # ...
]
```
### Importing Figure Factories
```python
# In callbacks or tabs
from portfolio_app.figures.toronto import create_choropleth_figure
from portfolio_app.figures.toronto.bar_charts import create_ranking_bar
```
---
## Branch Workflow
```
main (production)
staging (pre-production)
development (integration)
feature/XX-description (your work)
```
### Creating a Feature Branch
```bash
# Start from development
git checkout development
git pull origin development
# Create feature branch
git checkout -b feature/10-add-new-page
# Work, commit, push
git add .
git commit -m "feat: Add new page"
git push -u origin feature/10-add-new-page
```
### Merging
```bash
# Merge into development
git checkout development
git merge feature/10-add-new-page
git push origin development
# Delete feature branch
git branch -d feature/10-add-new-page
git push origin --delete feature/10-add-new-page
```
**Rules:**
- Never commit directly to `main` or `staging`
- Never delete `development`
- Feature branches are temporary
---
## Code Standards
### Type Hints
Use Python 3.10+ style:
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```
### Imports
| Context | Style |
|---------|-------|
| Same directory | `from .module import X` |
| Sibling directory | `from ..schemas.model import Y` |
| External packages | `import pandas as pd` |
### Formatting
```bash
make format # Runs ruff formatter
make lint # Checks style
```
### Docstrings
Google style, only for non-obvious functions:
```python
def calculate_score(values: list[float], weights: list[float]) -> float:
    """Calculate weighted score.

    Args:
        values: Raw metric values.
        weights: Weight for each metric.

    Returns:
        Weighted average score.
    """
    ...
```
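For reference, here is a runnable version of the same function. It assumes "weighted score" means a weighted average normalized by the sum of the weights (the guide doesn't specify), and it adds a `Raises` section in the same Google style:

```python
def calculate_score(values: list[float], weights: list[float]) -> float:
    """Calculate weighted score.

    Args:
        values: Raw metric values.
        weights: Weight for each metric.

    Returns:
        Weighted average score.

    Raises:
        ValueError: If the lists differ in length or the weights sum to zero.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Normalizing by total_weight keeps the result on the same scale as values
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```

For example, with the livability weights used in the overview mart (safety 40%, affordability 40%, amenities 20%), `calculate_score([80, 60, 50], [0.40, 0.40, 0.20])` returns 66.0.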
---
## Questions?
Check `CLAUDE.md` for AI assistant context and architectural decisions.

docs/DATABASE_SCHEMA.md

@@ -0,0 +1,335 @@
# Database Schema
This document describes the PostgreSQL/PostGIS database schema for the Toronto Neighbourhood Dashboard.
## Entity Relationship Diagram
```mermaid
erDiagram
dim_time {
int date_key PK
date full_date UK
int year
int month
int quarter
string month_name
bool is_month_start
}
dim_cmhc_zone {
int zone_key PK
string zone_code UK
string zone_name
geometry geometry
}
dim_neighbourhood {
int neighbourhood_id PK
string name
geometry geometry
int population
numeric land_area_sqkm
numeric pop_density_per_sqkm
numeric pct_bachelors_or_higher
numeric median_household_income
numeric pct_owner_occupied
numeric pct_renter_occupied
int census_year
}
dim_policy_event {
int event_id PK
date event_date
date effective_date
string level
string category
string title
text description
string expected_direction
string source_url
string confidence
}
fact_rentals {
int id PK
int date_key FK
int zone_key FK
string bedroom_type
int universe
numeric avg_rent
numeric median_rent
numeric vacancy_rate
numeric availability_rate
numeric turnover_rate
numeric rent_change_pct
string reliability_code
}
fact_census {
int id PK
int neighbourhood_id FK
int census_year
int population
numeric population_density
numeric median_household_income
numeric average_household_income
numeric unemployment_rate
numeric pct_bachelors_or_higher
numeric pct_owner_occupied
numeric pct_renter_occupied
numeric median_age
numeric average_dwelling_value
}
fact_crime {
int id PK
int neighbourhood_id FK
int year
string crime_type
int count
numeric rate_per_100k
}
fact_amenities {
int id PK
int neighbourhood_id FK
string amenity_type
int count
int year
}
bridge_cmhc_neighbourhood {
int id PK
string cmhc_zone_code FK
int neighbourhood_id FK
numeric weight
}
dim_time ||--o{ fact_rentals : "date_key"
dim_cmhc_zone ||--o{ fact_rentals : "zone_key"
dim_neighbourhood ||--o{ fact_census : "neighbourhood_id"
dim_neighbourhood ||--o{ fact_crime : "neighbourhood_id"
dim_neighbourhood ||--o{ fact_amenities : "neighbourhood_id"
dim_cmhc_zone ||--o{ bridge_cmhc_neighbourhood : "zone_code"
dim_neighbourhood ||--o{ bridge_cmhc_neighbourhood : "neighbourhood_id"
```
## Schema Layers
### Database Schemas
| Schema | Purpose | Managed By |
|--------|---------|------------|
| `public` | Shared dimensions (dim_time) | SQLAlchemy |
| `raw_toronto` | Toronto dimension and fact tables | SQLAlchemy |
| `stg_toronto` | Toronto staging models | dbt |
| `int_toronto` | Toronto intermediate models | dbt |
| `mart_toronto` | Toronto analytical tables | dbt |
### Raw Toronto Schema (raw_toronto)
Toronto-specific tables loaded by SQLAlchemy:
| Table | Source | Description |
|-------|--------|-------------|
| `dim_neighbourhood` | City of Toronto API | 158 neighbourhood boundaries |
| `dim_cmhc_zone` | CMHC | ~20 rental market zones |
| `dim_policy_event` | Manual | Policy events for annotation |
| `fact_census` | City of Toronto API | Census profile data |
| `fact_crime` | Toronto Police API | Crime statistics |
| `fact_amenities` | City of Toronto API | Amenity counts |
| `fact_rentals` | CMHC Data Files | Rental market survey data |
| `bridge_cmhc_neighbourhood` | Computed | Zone-neighbourhood mapping |
### Public Schema
Shared dimensions used across all projects:
| Table | Description |
|-------|-------------|
| `dim_time` | Time dimension (monthly grain) |
### Staging Schema - stg_toronto (dbt)
Staging models provide 1:1 cleaned representations of source data:
| Model | Source Table | Purpose |
|-------|-------------|---------|
| `stg_toronto__neighbourhoods` | raw.neighbourhoods | Cleaned boundaries with standardized names |
| `stg_toronto__census` | raw.census_profiles | Typed census metrics |
| `stg_cmhc__rentals` | raw.cmhc_rentals | Validated rental data |
| `stg_toronto__crime` | raw.crime_data | Standardized crime categories |
| `stg_toronto__amenities` | raw.amenities | Typed amenity counts |
| `stg_dimensions__time` | generated | Time dimension |
| `stg_dimensions__cmhc_zones` | raw.cmhc_zones | CMHC zone boundaries |
| `stg_cmhc__zone_crosswalk` | raw.crosswalk | Zone-neighbourhood mapping |
### Marts Schema - mart_toronto (dbt)
Analytical tables ready for dashboard consumption:
| Model | Grain | Purpose |
|-------|-------|---------|
| `mart_neighbourhood_overview` | neighbourhood | Composite livability scores |
| `mart_neighbourhood_housing` | neighbourhood | Housing and rent metrics |
| `mart_neighbourhood_safety` | neighbourhood × year | Crime rate calculations |
| `mart_neighbourhood_demographics` | neighbourhood | Income, age, population metrics |
| `mart_neighbourhood_amenities` | neighbourhood | Amenity accessibility scores |
| `mart_toronto_rentals` | zone × month | Time-series rental analysis |
## Table Details
### Dimension Tables
#### dim_time
Time dimension for date-based analysis. Grain: one row per month.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| date_key | INTEGER | PK | Surrogate key (YYYYMM format) |
| full_date | DATE | UNIQUE, NOT NULL | First day of month |
| year | INTEGER | NOT NULL | Calendar year |
| month | INTEGER | NOT NULL | Month number (1-12) |
| quarter | INTEGER | NOT NULL | Quarter (1-4) |
| month_name | VARCHAR(20) | NOT NULL | Month name |
| is_month_start | BOOLEAN | DEFAULT TRUE | Always true (monthly grain) |
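The sketch below derives one `dim_time` row from any date in the month. It assumes `date_key` is `year * 100 + month`, which is the YYYYMM form above. It also assumes month names come from Python's default C locale. `build_time_row` is a hypothetical helper, not the project's loader.

```python
from datetime import date


def build_time_row(d: date) -> dict[str, object]:
    """Build a monthly-grain dim_time row for the month containing d."""
    first = d.replace(day=1)  # monthly grain: normalize to the first of the month
    return {
        "date_key": first.year * 100 + first.month,  # YYYYMM surrogate key
        "full_date": first,
        "year": first.year,
        "month": first.month,
        "quarter": (first.month - 1) // 3 + 1,  # months 1-3 -> Q1, ..., 10-12 -> Q4
        "month_name": first.strftime("%B"),  # locale-dependent; English in the C locale
        "is_month_start": True,
    }
```

`build_time_row(date(2024, 3, 15))` yields `date_key` 202403, `full_date` 2024-03-01 and `quarter` 1.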
#### dim_cmhc_zone
CMHC rental market zones (~20 zones covering Toronto).
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| zone_key | INTEGER | PK, AUTO | Surrogate key |
| zone_code | VARCHAR(10) | UNIQUE, NOT NULL | CMHC zone identifier |
| zone_name | VARCHAR(100) | NOT NULL | Zone display name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS zone boundary |
#### dim_neighbourhood
Toronto's 158 official neighbourhoods.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| neighbourhood_id | INTEGER | PK | City-assigned ID |
| name | VARCHAR(100) | NOT NULL | Neighbourhood name |
| geometry | GEOMETRY(POLYGON) | SRID 4326 | PostGIS boundary |
| population | INTEGER | | Total population |
| land_area_sqkm | NUMERIC(10,4) | | Area in km² |
| pop_density_per_sqkm | NUMERIC(10,2) | | Population density |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Education rate |
| median_household_income | NUMERIC(12,2) | | Median income |
| pct_owner_occupied | NUMERIC(5,2) | | Owner occupancy rate |
| pct_renter_occupied | NUMERIC(5,2) | | Renter occupancy rate |
| census_year | INTEGER | DEFAULT 2021 | Census reference year |
#### dim_policy_event
Policy events for time-series annotation (rent control, interest rates, etc.).
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| event_id | INTEGER | PK, AUTO | Surrogate key |
| event_date | DATE | NOT NULL | Announcement date |
| effective_date | DATE | | Implementation date |
| level | VARCHAR(20) | NOT NULL | federal/provincial/municipal |
| category | VARCHAR(20) | NOT NULL | monetary/tax/regulatory/supply/economic |
| title | VARCHAR(200) | NOT NULL | Event title |
| description | TEXT | | Detailed description |
| expected_direction | VARCHAR(10) | NOT NULL | bearish/bullish/neutral |
| source_url | VARCHAR(500) | | Reference link |
| confidence | VARCHAR(10) | DEFAULT 'medium' | high/medium/low |
### Fact Tables
#### fact_rentals
CMHC rental market survey data. Grain: zone × bedroom type × survey date.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| date_key | INTEGER | FK → dim_time | Survey date reference |
| zone_key | INTEGER | FK → dim_cmhc_zone | CMHC zone reference |
| bedroom_type | VARCHAR(20) | NOT NULL | bachelor/1-bed/2-bed/3+bed/total |
| universe | INTEGER | | Total rental units |
| avg_rent | NUMERIC(10,2) | | Average rent |
| median_rent | NUMERIC(10,2) | | Median rent |
| vacancy_rate | NUMERIC(5,2) | | Vacancy percentage |
| availability_rate | NUMERIC(5,2) | | Availability percentage |
| turnover_rate | NUMERIC(5,2) | | Turnover percentage |
| rent_change_pct | NUMERIC(5,2) | | Year-over-year change |
| reliability_code | VARCHAR(2) | | CMHC data quality code |
#### fact_census
Census statistics. Grain: neighbourhood × census year.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| census_year | INTEGER | NOT NULL | 2016, 2021, etc. |
| population | INTEGER | | Total population |
| population_density | NUMERIC(10,2) | | People per km² |
| median_household_income | NUMERIC(12,2) | | Median income |
| average_household_income | NUMERIC(12,2) | | Average income |
| unemployment_rate | NUMERIC(5,2) | | Unemployment % |
| pct_bachelors_or_higher | NUMERIC(5,2) | | Education rate |
| pct_owner_occupied | NUMERIC(5,2) | | Owner rate |
| pct_renter_occupied | NUMERIC(5,2) | | Renter rate |
| median_age | NUMERIC(5,2) | | Median resident age |
| average_dwelling_value | NUMERIC(12,2) | | Average home value |
#### fact_crime
Crime statistics. Grain: neighbourhood × year × crime type.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| year | INTEGER | NOT NULL | Calendar year |
| crime_type | VARCHAR(50) | NOT NULL | Crime category |
| count | INTEGER | NOT NULL | Number of incidents |
| rate_per_100k | NUMERIC(10,2) | | Rate per 100k population |
#### fact_amenities
Amenity counts. Grain: neighbourhood × amenity type × year.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| amenity_type | VARCHAR(50) | NOT NULL | parks/schools/transit/etc. |
| count | INTEGER | NOT NULL | Number of amenities |
| year | INTEGER | NOT NULL | Reference year |
### Bridge Tables
#### bridge_cmhc_neighbourhood
Maps CMHC zones to neighbourhoods with area-based weights for data disaggregation.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | INTEGER | PK, AUTO | Surrogate key |
| cmhc_zone_code | VARCHAR(10) | FK → dim_cmhc_zone | Zone reference |
| neighbourhood_id | INTEGER | FK → dim_neighbourhood | Neighbourhood reference |
| weight | NUMERIC(5,4) | NOT NULL | Proportional weight (0-1) |
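The sketch below is one way to apply the bridge weights. It assumes `weight` is the share of a neighbourhood's area that falls in each CMHC zone. A zone-level metric such as `avg_rent` is then assigned to each neighbourhood as a weight-averaged value, renormalized over the zones that actually have data. The function name and the tuple layout for bridge rows are hypothetical.

```python
from collections import defaultdict


def disaggregate_to_neighbourhoods(
    zone_values: dict[str, float],
    bridge: list[tuple[str, int, float]],
) -> dict[int, float]:
    """Weight-average zone metrics into neighbourhoods.

    bridge rows are (cmhc_zone_code, neighbourhood_id, weight).
    """
    weighted_sum: dict[int, float] = defaultdict(float)
    weight_total: dict[int, float] = defaultdict(float)
    for zone_code, neighbourhood_id, weight in bridge:
        value = zone_values.get(zone_code)
        if value is None:
            continue  # zone has no value (e.g. suppressed); leave it out entirely
        weighted_sum[neighbourhood_id] += weight * value
        weight_total[neighbourhood_id] += weight
    # Dividing by the weights actually used renormalizes around missing zones.
    return {
        nid: weighted_sum[nid] / weight_total[nid]
        for nid in weighted_sum
        if weight_total[nid] > 0
    }
```

Consider a neighbourhood that lies 75% in a zone averaging $2,000 and 25% in a zone averaging $2,400. This approach gives it $2,100.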
## Indexes
| Table | Index | Columns | Purpose |
|-------|-------|---------|---------|
| fact_rentals | ix_fact_rentals_date_zone | date_key, zone_key | Time-series queries |
| fact_census | ix_fact_census_neighbourhood_year | neighbourhood_id, census_year | Census lookups |
| fact_crime | ix_fact_crime_neighbourhood_year | neighbourhood_id, year | Crime trends |
| fact_crime | ix_fact_crime_type | crime_type | Crime filtering |
| fact_amenities | ix_fact_amenities_neighbourhood_year | neighbourhood_id, year | Amenity queries |
| fact_amenities | ix_fact_amenities_type | amenity_type | Amenity filtering |
| bridge_cmhc_neighbourhood | ix_bridge_cmhc_zone | cmhc_zone_code | Zone lookups |
| bridge_cmhc_neighbourhood | ix_bridge_neighbourhood | neighbourhood_id | Neighbourhood lookups |
## PostGIS Extensions
The database requires PostGIS for geospatial operations:
```sql
CREATE EXTENSION IF NOT EXISTS postgis;
```
All geometry columns use SRID 4326 (WGS84) for compatibility with web mapping libraries.

# Portfolio Project Reference
**Project**: Analytics Portfolio
**Owner**: Leo Miranda
**Status**: Sprint 9 Complete (Dashboard Implementation Done)
**Last Updated**: January 2026
---
## Project Overview
Personal portfolio website with an interactive Toronto Neighbourhood Dashboard demonstrating data engineering, visualization, and analytics capabilities.
| Component | Description | Status |
|-----------|-------------|--------|
| Portfolio Website | Bio, About, Projects, Resume, Contact, Blog | Complete |
| Toronto Dashboard | 5-tab neighbourhood analysis | Complete |
| Data Pipeline | dbt models, figure factories | Complete |
| Deployment | Production deployment | Pending |
---
## Completed Work
### Sprint 1-6: Foundation
- Repository setup, Docker, PostgreSQL + PostGIS
- Bio landing page implementation
- Initial data model design
### Sprint 7: Navigation & Theme
- Sidebar navigation
- Dark/light theme toggle
- dash-mantine-components integration
### Sprint 8: Portfolio Website
- About, Contact, Projects, Resume pages
- Blog system with Markdown/frontmatter
- Health endpoint
### Sprint 9: Neighbourhood Dashboard Transition
- Phase 1: Deleted legacy TRREB code
- Phase 2: Documentation cleanup
- Phase 3: New neighbourhood-centric data model
- Phase 4: dbt model restructuring
- Phase 5: 5-tab dashboard implementation
- Phase 6: 15 documentation notebooks
- Phase 7: Final documentation review
---
## Application Architecture
### URL Routes
| URL | Page | File |
|-----|------|------|
| `/` | Home | `pages/home.py` |
| `/about` | About | `pages/about.py` |
| `/contact` | Contact | `pages/contact.py` |
| `/projects` | Projects | `pages/projects.py` |
| `/resume` | Resume | `pages/resume.py` |
| `/blog` | Blog listing | `pages/blog/index.py` |
| `/blog/{slug}` | Article | `pages/blog/article.py` |
| `/toronto` | Dashboard | `pages/toronto/dashboard.py` |
| `/toronto/methodology` | Methodology | `pages/toronto/methodology.py` |
| `/health` | Health check | `pages/health.py` |
### Directory Structure
```
portfolio_app/
├── app.py # Dash app factory
├── config.py # Pydantic BaseSettings
├── assets/ # CSS, images
├── callbacks/ # Global callbacks (sidebar, theme)
├── components/ # Shared UI components
├── content/blog/ # Markdown blog articles
├── errors/ # Exception handling
├── figures/
│ └── toronto/ # Toronto figure factories
├── pages/
│ ├── home.py
│ ├── about.py
│ ├── contact.py
│ ├── projects.py
│ ├── resume.py
│ ├── health.py
│ ├── blog/
│ │ ├── index.py
│ │ └── article.py
│ └── toronto/
│ ├── dashboard.py
│ ├── methodology.py
│ ├── tabs/ # 5 tab layouts
│ └── callbacks/ # Dashboard interactions (map_callbacks, chart_callbacks, selection_callbacks)
├── toronto/ # Data logic
│ ├── parsers/ # API extraction (geo, toronto_open_data, toronto_police, cmhc)
│ ├── loaders/ # Database operations (base, cmhc, cmhc_crosswalk)
│ ├── schemas/ # Pydantic models
│ ├── models/ # SQLAlchemy ORM (raw_toronto schema)
│ ├── services/ # Query functions (neighbourhood_service, geometry_service)
│ └── demo_data.py # Sample data
└── utils/
└── markdown_loader.py # Blog article loading
dbt/ # dbt project: portfolio
├── models/
│ ├── shared/ # Cross-domain dimensions
│ ├── staging/toronto/ # Toronto staging models
│ ├── intermediate/toronto/ # Toronto intermediate models
│ └── marts/toronto/ # Toronto mart tables
notebooks/
└── toronto/ # Toronto documentation notebooks
```
---
## Toronto Dashboard
### Data Sources
| Source | Data | Format |
|--------|------|--------|
| City of Toronto Open Data | Neighbourhoods (158), Census profiles, Parks, Schools, Childcare, TTC | GeoJSON, CSV, API |
| Toronto Police Service | Crime rates, MCI, Shootings | CSV, API |
| CMHC | Rental Market Survey | CSV |
### Geographic Model
```
City of Toronto Neighbourhoods (158) ← Primary analysis unit
CMHC Zones (~20) ← Rental data (Census Tract aligned)
```
### Dashboard Tabs
| Tab | Choropleth Metric | Supporting Charts |
|-----|-------------------|-------------------|
| Overview | Livability score | Top/Bottom 10 bar, Income vs Safety scatter |
| Housing | Affordability index | Rent trend line, Tenure breakdown bar |
| Safety | Crime rate per 100K | Crime breakdown bar, Crime trend line |
| Demographics | Median income | Age distribution, Population density bar |
| Amenities | Amenity index | Amenity radar, Transit accessibility bar |
### Star Schema
| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension |
| `dim_cmhc_zone` | Dimension | ~20 CMHC zones with geometry |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
### dbt Project: `portfolio`
**Model Structure:**
```
dbt/models/
├── shared/ # Cross-domain dimensions (stg_dimensions__time)
├── staging/toronto/ # Toronto staging models
├── intermediate/toronto/ # Toronto intermediate models
└── marts/toronto/ # Toronto mart tables
```
| Layer | Naming | Example |
|-------|--------|---------|
| Shared | `stg_dimensions__*` | `stg_dimensions__time` |
| Staging | `stg_{source}__{entity}` | `stg_toronto__neighbourhoods` |
| Intermediate | `int_{domain}__{transform}` | `int_neighbourhood__demographics` |
| Marts | `mart_{domain}` | `mart_neighbourhood_overview` |
---
## Tech Stack
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | 2.x |
| ORM | SQLAlchemy | 2.x |
| Transformation | dbt-postgres | 1.7+ |
| Data Processing | Pandas, GeoPandas | Latest |
| Visualization | Dash + Plotly | 2.14+ |
| UI Components | dash-mantine-components | Latest |
| Testing | pytest | 7.0+ |
| Python | 3.11+ | Via pyenv |
---
| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |
**Rules:**
- Feature branches from `development`: `feature/{sprint}-{description}`
- Merge into `development` when complete
- `development` → `staging` → `main` for releases
- Never delete `development`
---
## Code Standards
### Type Hints (Python 3.10+)
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```
### Imports
| Context | Style |
|---------|-------|
| Same directory | `from .module import X` |
| Sibling directory | `from ..schemas.model import Y` |
| External | `import pandas as pd` |
### Error Handling
```python
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """Data parsing failed."""


class ValidationError(PortfolioError):
    """Validation failed."""


class LoadError(PortfolioError):
    """Database load failed."""
```
---
## Environment Variables
---
## Makefile Targets
| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS (auto-detects x86/ARM) |
| `docker-down` | Stop containers |
| `docker-logs` | View container logs |
| `db-init` | Initialize database schema |
| `db-reset` | Drop and recreate database (DESTRUCTIVE) |
| `load-data` | Load Toronto data from APIs, seed dev data |
| `load-toronto-only` | Load Toronto data without dbt or seeding |
| `seed-data` | Seed sample development data |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `test-cov` | Run pytest with coverage |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `typecheck` | Run mypy type checker |
| `ci` | Run all checks (lint, typecheck, test) |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `dbt-docs` | Generate and serve dbt documentation |
| `clean` | Remove build artifacts and caches |
---
## Next Steps
### Deployment (Sprint 10+)
- [ ] Production Docker configuration
- [ ] CI/CD pipeline
- [ ] HTTPS/SSL setup
- [ ] Domain configuration
### Data Enhancement
- [ ] Connect to live APIs (currently using demo data)
- [ ] Data refresh automation
- [ ] Historical data loading
### Future Projects
- Energy Pricing Analysis dashboard (planned)
---
## Related Documents
| Document | Purpose |
|----------|---------|
| `README.md` | Quick start guide |
| `CLAUDE.md` | AI assistant context |
| `docs/CONTRIBUTING.md` | Developer guide |
| `notebooks/README.md` | Notebook documentation |
---
*Reference Version: 3.0*
*Updated: January 2026*

# Portfolio Bio Content
**Version**: 2.0
**Last Updated**: January 2026
**Purpose**: Content source for `portfolio_app/pages/home.py`
---
## Document Context
| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Bio content and social links for landing page |
| **Consumed By** | `portfolio_app/pages/home.py` |
---
## Headline
**Primary**: Leo | Data Engineer & Analytics Developer
**Tagline**: I build data infrastructure that actually gets used.
---
## Professional Summary
Over the past 5 years, I've designed and evolved an enterprise analytics platform from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call abandon rates, and dashboards that executives actually open.
My approach: dimensional modeling (star schema), layered transformations (staging → intermediate → marts), and automation that eliminates manual work. I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states. Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the Project Management Institute.
---
## Tech Stack
| Category | Technologies |
|----------|--------------|
| **Languages** | Python, SQL |
| **Data Processing** | Pandas, SQLAlchemy, FastAPI |
| **Databases** | PostgreSQL, MSSQL |
| **Visualization** | Power BI, Plotly, Dash |
| **Patterns** | dbt, dimensional modeling, star schema |
| **Other** | Genesys Cloud |
**Display Format** (for landing page):
```
Python (Pandas, SQLAlchemy, FastAPI) • SQL (MSSQL, PostgreSQL) • Power BI • Plotly/Dash • Genesys Cloud • dbt patterns
```
---
## Side Project
**Bandit Labs** — Building automation and AI tooling for small businesses.
*Note: Keep this brief on portfolio; link only if separate landing page exists.*
---
## Social Links
| Platform | URL | Icon |
|----------|-----|------|
| **LinkedIn** | `https://linkedin.com/in/[USERNAME]` | `lucide-react: Linkedin` |
| **GitHub** | `https://github.com/[USERNAME]` | `lucide-react: Github` |
> **TODO**: Replace `[USERNAME]` placeholders with actual URLs before bio page launch.
---
## Availability Statement
Open to **Senior Data Analyst**, **Analytics Engineer**, and **BI Developer** opportunities in Toronto or remote.
---
## Portfolio Projects Section
*Dynamically populated based on deployed projects.*
| Project | Status | Link |
|---------|--------|------|
| Toronto Housing Dashboard | In Development | `/toronto` |
| Energy Pricing Analysis | Planned | `/energy` |
**Display Logic**:
- Always show projects with `status = deployed`
- Optionally show "In Development" projects as "coming soon", or hide them entirely (user preference)
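Here is a minimal sketch of this display logic in Python, the language `home.py` is written in. The `Project` dataclass, the status strings (`deployed`, `in_development`, `planned`) and the `show_coming_soon` flag are hypothetical. They stand in for however the project list is actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical project record; field names are illustrative."""

    name: str
    status: str  # assumed values: "deployed", "in_development", "planned"
    link: str


def visible_projects(projects: list[Project], show_coming_soon: bool = False) -> list[Project]:
    """Return the projects to render as cards on the landing page."""
    allowed = {"deployed"}
    if show_coming_soon:
        # User preference: surface in-development work as "coming soon" cards
        allowed.add("in_development")
    return [p for p in projects if p.status in allowed]


PROJECTS = [
    Project("Toronto Housing Dashboard", "in_development", "/toronto"),
    Project("Energy Pricing Analysis", "planned", "/energy"),
]
```

With the current table, `visible_projects(PROJECTS)` returns an empty list. Passing `show_coming_soon=True` returns only the Toronto dashboard.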
---
## Implementation Notes
### Content Hierarchy for `home.py`
```
1. Name + Tagline (hero section)
2. Professional Summary (2-3 paragraphs)
3. Tech Stack (horizontal chips or inline list)
4. Portfolio Projects (cards linking to dashboards)
5. Social Links (icon buttons)
6. Availability statement (subtle, bottom)
```
### Styling Recommendations
- Clean, minimal — let the projects speak
- Dark/light mode support via dash-mantine-components theme
- No headshot required (optional)
- Mobile-responsive layout
### Content Updates
When updating bio content:
1. Edit this document
2. Update `home.py` to reflect changes
3. Redeploy
---
## Related Documents
| Document | Relationship |
|----------|--------------|
| `portfolio_project_plan_v5.md` | Parent — references this for bio content |
| `portfolio_app/pages/home.py` | Consumer — implements this content |
---
*Document Version: 2.0*
*Updated: January 2026*

@@ -1,276 +0,0 @@
# Toronto Neighbourhood Dashboard — Implementation Plan
**Document Type:** Execution Guide
**Target:** Transition from TRREB-based to Neighbourhood-based Dashboard
**Version:** 2.0 | January 2026
---
## Overview
Transition from the TRREB district-based housing dashboard to a comprehensive Toronto Neighbourhood Dashboard built around the city's 158 official neighbourhoods.
**Key Changes:**
- Geographic foundation: TRREB districts (~35) → City Neighbourhoods (158)
- Data sources: PDF parsing → Open APIs (Toronto Open Data, Toronto Police, CMHC)
- Scope: Housing-only → 5 thematic tabs (Overview, Housing, Safety, Demographics, Amenities)
---
## Phase 1: Repository Cleanup
### Files to DELETE
| File | Reason |
|------|--------|
| `portfolio_app/toronto/schemas/trreb.py` | TRREB schema obsolete |
| `portfolio_app/toronto/parsers/trreb.py` | PDF parsing no longer needed |
| `portfolio_app/toronto/loaders/trreb.py` | TRREB loading logic obsolete |
| `dbt/models/staging/stg_trreb__purchases.sql` | TRREB staging obsolete |
| `dbt/models/intermediate/int_purchases__monthly.sql` | TRREB intermediate obsolete |
| `dbt/models/marts/mart_toronto_purchases.sql` | Will rebuild for neighbourhood grain |
### Files to MODIFY (Remove TRREB References)
| File | Action |
|------|--------|
| `portfolio_app/toronto/schemas/__init__.py` | Remove TRREB imports |
| `portfolio_app/toronto/parsers/__init__.py` | Remove TRREB parser imports |
| `portfolio_app/toronto/loaders/__init__.py` | Remove TRREB loader imports |
| `portfolio_app/toronto/models/facts.py` | Remove `FactPurchases` model |
| `portfolio_app/toronto/models/dimensions.py` | Remove `DimTRREBDistrict` model |
| `portfolio_app/toronto/demo_data.py` | Remove TRREB demo data |
| `dbt/models/sources.yml` | Remove TRREB source definitions |
| `dbt/models/schema.yml` | Remove TRREB model documentation |
### Files to KEEP (Reusable)
| File | Why |
|------|-----|
| `portfolio_app/toronto/schemas/cmhc.py` | CMHC data still used |
| `portfolio_app/toronto/parsers/cmhc.py` | Reusable with modifications |
| `portfolio_app/toronto/loaders/base.py` | Generic database utilities |
| `portfolio_app/toronto/loaders/dimensions.py` | Dimension loading patterns |
| `portfolio_app/toronto/models/base.py` | SQLAlchemy base class |
| `portfolio_app/figures/*.py` | All chart factories reusable |
| `portfolio_app/components/*.py` | All UI components reusable |
---
## Phase 2: Documentation Updates
| Document | Action |
|----------|--------|
| `CLAUDE.md` | Update data model section, mark transition complete |
| `docs/PROJECT_REFERENCE.md` | Update architecture, data sources |
| `docs/toronto_housing_dashboard_spec_v5.md` | Archive or delete |
| `docs/wbs_sprint_plan_v4.md` | Archive or delete |
---
## Phase 3: New Data Model
### Star Schema (Neighbourhood-Centric)
| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Central Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension (keep existing) |
| `dim_cmhc_zone` | Bridge Dimension | 15 CMHC zones with neighbourhood mapping |
| `bridge_cmhc_neighbourhood` | Bridge | Zone-to-neighbourhood area weights |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone (keep existing) |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
### New Schema Files
| File | Contains |
|------|----------|
| `toronto/schemas/neighbourhood.py` | NeighbourhoodRecord, CensusRecord, CrimeRecord |
| `toronto/schemas/amenities.py` | AmenityType enum, AmenityRecord |
### New Parser Files
| File | Data Source | API |
|------|-------------|-----|
| `toronto/parsers/toronto_open_data.py` | Neighbourhoods, Census, Parks, Schools, Childcare | Toronto Open Data Portal |
| `toronto/parsers/toronto_police.py` | Crime Rates, MCI, Shootings | Toronto Police Portal |
### New Loader Files
| File | Purpose |
|------|---------|
| `toronto/loaders/neighbourhoods.py` | Load GeoJSON boundaries |
| `toronto/loaders/census.py` | Load neighbourhood profiles |
| `toronto/loaders/crime.py` | Load crime statistics |
| `toronto/loaders/amenities.py` | Load parks, schools, childcare |
| `toronto/loaders/cmhc_crosswalk.py` | Build CMHC-neighbourhood bridge |
---
## Phase 4: dbt Restructuring
### Staging Layer
| Model | Source |
|-------|--------|
| `stg_toronto__neighbourhoods` | dim_neighbourhood |
| `stg_toronto__census` | fact_census |
| `stg_toronto__crime` | fact_crime |
| `stg_toronto__amenities` | fact_amenities |
| `stg_cmhc__rentals` | fact_rentals (modify existing) |
| `stg_cmhc__zone_crosswalk` | bridge_cmhc_neighbourhood |
### Intermediate Layer
| Model | Purpose |
|-------|---------|
| `int_neighbourhood__demographics` | Combined census demographics |
| `int_neighbourhood__housing` | Housing indicators |
| `int_neighbourhood__crime_summary` | Aggregated crime by type |
| `int_neighbourhood__amenity_scores` | Normalized amenity metrics |
| `int_rentals__neighbourhood_allocated` | CMHC rentals allocated to neighbourhoods |
### Mart Layer (One per Tab)
| Model | Tab | Key Metrics |
|-------|-----|-------------|
| `mart_neighbourhood_overview` | Overview | Composite livability score |
| `mart_neighbourhood_housing` | Housing | Affordability index, rent-to-income |
| `mart_neighbourhood_safety` | Safety | Crime rates, YoY change |
| `mart_neighbourhood_demographics` | Demographics | Income, age, diversity |
| `mart_neighbourhood_amenities` | Amenities | Parks, schools, transit per capita |
---
## Phase 5: Dashboard Implementation
### Tab Structure
```
pages/toronto/
├── dashboard.py          # Main layout with tab navigation
├── tabs/
│   ├── overview.py       # Composite livability
│   ├── housing.py        # Affordability
│   ├── safety.py         # Crime
│   ├── demographics.py   # Population
│   └── amenities.py      # Services
└── callbacks/
    ├── map_callbacks.py
    ├── chart_callbacks.py
    └── selection_callbacks.py
```
### Layout Pattern (All Tabs)
Each tab follows the same structure:
1. **Choropleth Map** (left) — 158 neighbourhoods, click to select
2. **KPI Cards** (right) — 3-4 contextual metrics
3. **Supporting Charts** (bottom) — Trend + comparison visualizations
4. **Details Panel** (collapsible) — All metrics for selected neighbourhood
### Graphs by Tab
| Tab | Choropleth Metric | Chart 1 | Chart 2 |
|-----|-------------------|---------|---------|
| Overview | Livability score | Top/Bottom 10 bar | Income vs Crime scatter |
| Housing | Affordability index | Rent trend (5yr line) | Dwelling types (pie/bar) |
| Safety | Crime rate per 100K | Crime breakdown (stacked bar) | Crime trend (5yr line) |
| Demographics | Median income | Age pyramid | Top languages (bar) |
| Amenities | Park area per capita | Amenity radar | Transit accessibility (bar) |
---
## Phase 6: Jupyter Notebooks
### Purpose
One notebook per graph to document:
1. **Data Reference** — How the data was built (query, transformation steps, sample output)
2. **Data Visualization** — Import figure factory, render the graph
### Directory Structure
```
notebooks/
├── README.md
├── overview/
├── housing/
├── safety/
├── demographics/
└── amenities/
```
### Notebook Template
````markdown
# [Graph Name]
## 1. Data Reference
### Source Tables
- List tables/marts used
- Grain of each table
### Query
```sql
SELECT ... FROM ...
```
### Transformation Steps
1. Step description
2. Step description
### Sample Data
```python
df = pd.read_sql(query, engine)
df.head(10)
```
## 2. Data Visualization
```python
from portfolio_app.figures.choropleth import create_choropleth_figure
fig = create_choropleth_figure(...)
fig.show()
```
````
Create one notebook per graph as each is implemented (15 total across 5 tabs).
---
## Phase 7: Final Documentation Review
After all implementation, audit and update:
- [ ] `CLAUDE.md` — Project status, app structure, data model, URL routes
- [ ] `README.md` — Project description, installation, quick start
- [ ] `docs/PROJECT_REFERENCE.md` — Architecture matches implementation
- [ ] Remove or archive legacy spec documents
---
## Data Source Reference
| Source | Datasets | URL |
|--------|----------|-----|
| Toronto Open Data | Neighbourhoods, Census Profiles, Parks, Schools, Childcare, TTC | open.toronto.ca |
| Toronto Police | Crime Rates, MCI, Shootings | data.torontopolice.on.ca |
| CMHC | Rental Market Survey | cmhc-schl.gc.ca |
---
## CMHC Zone Mapping Note
CMHC uses 15 zones that don't align with 158 neighbourhoods. Strategy:
- Create `bridge_cmhc_neighbourhood` with area weights
- Allocate rental metrics proportionally to overlapping neighbourhoods
- Document methodology in `/toronto/methodology` page
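The allocation step can be sketched in Python under one assumption: each bridge row's `area_weight` is the share of the neighbourhood's area that falls inside the CMHC zone. The function and column names are hypothetical. The sketch computes an area-weighted average per neighbourhood. That suits intensive metrics such as average rent or vacancy rate. A count such as rental universe would instead be split across neighbourhoods by weight.

```python
from collections import defaultdict


def allocate_zone_metric(
    zone_values: dict[str, float],
    bridge: list[tuple[str, str, float]],
) -> dict[str, float]:
    """Area-weighted average of a zone-level metric for each neighbourhood.

    zone_values: {zone_id: value}, e.g. average 1BR rent per CMHC zone.
    bridge: (zone_id, neighbourhood_id, area_weight) rows, where area_weight is
        assumed to be the share of the neighbourhood's area inside that zone.
    """
    weighted_sum: dict[str, float] = defaultdict(float)
    weight_used: dict[str, float] = defaultdict(float)
    for zone_id, nbhd_id, weight in bridge:
        value = zone_values.get(zone_id)
        if value is None:
            continue  # zone has no published value (e.g. suppressed); skip it
        weighted_sum[nbhd_id] += value * weight
        weight_used[nbhd_id] += weight
    # Divide by the weight actually used so a missing zone doesn't drag the average toward zero
    return {n: weighted_sum[n] / weight_used[n] for n in weighted_sum if weight_used[n] > 0}
```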
---
*Document Version: 2.0*
*Trimmed from v1.0 for execution clarity*

@@ -1,423 +0,0 @@
# Toronto Neighbourhood Dashboard — Deliverables
**Project Type:** Interactive Data Visualization Dashboard
**Geographic Scope:** City of Toronto, 158 Official Neighbourhoods
**Author:** Leo Miranda
**Version:** 1.0 | January 2026
---
## Executive Summary
Multi-tab analytics dashboard built around Toronto's official neighbourhood boundaries. The core interaction is a choropleth map where users explore the city through different thematic lenses—housing affordability, safety, demographics, amenities—with supporting visualizations that tell a cohesive story per theme.
**Primary Goals:**
1. Demonstrate interactive data visualization skills (Plotly/Dash)
2. Showcase data engineering capabilities (multi-source ETL, dimensional modeling)
3. Create a portfolio piece with genuine analytical value
---
## Part 1: Geographic Foundation (Required First)
| Dataset | Source | Format | Last Updated | Download |
|---------|--------|--------|--------------|----------|
| **Neighbourhood Boundaries** | Toronto Open Data | GeoJSON | 2024 | [Link](https://open.toronto.ca/dataset/neighbourhoods/) |
| **Neighbourhood Profiles** | Toronto Open Data | CSV | 2021 Census | [Link](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
**Critical Notes:**
- Toronto uses 158 official neighbourhoods (updated 2024, was 140)
- GeoJSON includes `AREA_ID` for joining to tabular data
- Neighbourhood Profiles has 2,400+ indicators per neighbourhood from Census
---
## Part 2: Tier 1 — MVP Datasets
| Dataset | Source | Measures Available | Update Freq | Granularity |
|---------|--------|-------------------|-------------|-------------|
| **Neighbourhoods GeoJSON** | Toronto Open Data | Boundary polygons, area IDs | Static | Neighbourhood |
| **Neighbourhood Profiles (full)** | Toronto Open Data | 2,400+ Census indicators | Every 5 years | Neighbourhood |
| **Neighbourhood Crime Rates** | Toronto Police Portal | MCI rates per 100K by year | Annual | Neighbourhood |
| **CMHC Rental Market Survey** | CMHC Portal | Avg rent by bedroom, vacancy rate | Annual (Oct) | 15 CMHC Zones |
| **Parks** | Toronto Open Data | Park locations, area, type | Annual | Point/Polygon |
**Total API/Download Calls:** 5
**Data Volume:** ~50MB combined
### Tier 1 Measures to Extract
**From Neighbourhood Profiles:**
- Population, population density
- Median household income
- Age distribution (0-14, 15-24, 25-44, 45-64, 65+)
- % Immigrants, % Visible minorities
- Top languages spoken
- Unemployment rate
- Education attainment (% with post-secondary)
- Housing tenure (own vs rent %)
- Dwelling types distribution
- Average rent, housing costs as % of income
**From Crime Rates:**
- Total MCI rate per 100K population
- Year-over-year crime trend
**From CMHC:**
- Average monthly rent (1BR, 2BR, 3BR)
- Vacancy rates
**From Parks:**
- Park count per neighbourhood
- Park area per capita
---
## Part 3: Tier 2 — Expansion Datasets
| Dataset | Source | Measures Available | Update Freq | Granularity |
|---------|--------|-------------------|-------------|-------------|
| **Major Crime Indicators (MCI)** | Toronto Police Portal | Assault, B&E, auto theft, robbery, theft over | Quarterly | Neighbourhood |
| **Shootings & Firearm Discharges** | Toronto Police Portal | Shooting incidents, injuries, fatalities | Quarterly | Neighbourhood |
| **Building Permits** | Toronto Open Data | New construction, permits by type | Monthly | Address-level |
| **Schools** | Toronto Open Data | Public/Catholic, elementary/secondary | Annual | Point |
| **TTC Routes & Stops** | Toronto Open Data | Route geometry, stop locations | Static | Route/Stop |
| **Licensed Child Care Centres** | Toronto Open Data | Capacity, ages served, locations | Annual | Point |
### Tier 2 Measures to Extract
**From MCI Details:**
- Breakdown by crime type (assault, B&E, auto theft, robbery, theft over)
**From Shootings:**
- Shooting incidents count
- Injuries/fatalities
**From Building Permits:**
- New construction permits (trailing 12 months)
- Permit types distribution
**From Schools:**
- Schools per 1000 children
- School type breakdown
**From TTC:**
- Transit stops within neighbourhood
- Transit accessibility score
**From Child Care:**
- Child care spaces per capita
- Coverage by age group
---
## Part 4: Data Sources by Thematic Group
### GROUP A: Housing & Affordability
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Profiles (Housing) | 1 | Avg rent, ownership %, dwelling types, housing costs as % of income | Every 5 years |
| CMHC Rental Market Survey | 1 | Avg rent by bedroom, vacancy rate, rental universe | Annual |
| Building Permits | 2 | New construction, permits by type | Monthly |
**Calculated Metrics:**
- Rent-to-Income Ratio (CMHC monthly rent ÷ Census household income converted to monthly)
- Affordability Index (% of income spent on housing)
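A minimal sketch of the ratio and its banding follows. It assumes the Census figure is annual median household income and the CMHC figure is average monthly rent. The band thresholds follow the Appendix B affordability palette. Treating exactly 50% as "Stretched" is a choice made here, since the palette doesn't say.

```python
def rent_to_income_ratio(monthly_rent: float, annual_household_income: float) -> float:
    """Monthly rent as a share of monthly household income (0.30 means 30%)."""
    if annual_household_income <= 0:
        raise ValueError("annual_household_income must be positive")
    # Census income is annual, CMHC rent is monthly: convert income to monthly first
    return monthly_rent / (annual_household_income / 12)


def affordability_band(ratio: float) -> str:
    """Map a ratio onto the Appendix B bands: <30%, 30-50%, >50%."""
    if ratio < 0.30:
        return "Affordable"
    if ratio <= 0.50:
        return "Stretched"
    return "Unaffordable"
```

For example, $1,500 rent against a $72,000 annual income gives a ratio of 0.25, which falls in the "Affordable" band.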
---
### GROUP B: Safety & Crime
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Crime Rates | 1 | MCI rates per 100K pop by year | Annual |
| Major Crime Indicators (MCI) | 2 | Assault, B&E, auto theft, robbery, theft over | Quarterly |
| Shootings & Firearm Discharges | 2 | Shooting incidents, injuries, fatalities | Quarterly |
**Calculated Metrics:**
- Year-over-year crime change %
- Crime type distribution
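The two calculations can be sketched in a few lines of Python; the function names are hypothetical. One design note: computing year-over-year change on the per-100K rate, rather than on raw counts, keeps population growth from showing up as a crime trend. The sketch returns `None` when the prior year is zero, because a percentage change is undefined there.

```python
from __future__ import annotations


def rate_per_100k(count: int, population: int) -> float:
    """Incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


def yoy_change_pct(current: float, previous: float) -> float | None:
    """Year-over-year % change; None when the prior year is zero (undefined)."""
    if previous == 0:
        return None
    return (current - previous) * 100 / previous
```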
---
### GROUP C: Demographics & Community
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Profiles (Demographics) | 1 | Age distribution, household composition, income | Every 5 years |
| Neighbourhood Profiles (Immigration) | 1 | Immigration status, visible minorities, languages | Every 5 years |
| Neighbourhood Profiles (Education) | 1 | Education attainment, field of study | Every 5 years |
| Neighbourhood Profiles (Labour) | 1 | Employment rate, occupation, industry | Every 5 years |
---
### GROUP D: Transportation & Mobility
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Commute Mode (Census) | 1 | % car, transit, walk, bike | Every 5 years |
| TTC Routes & Stops | 2 | Route geometry, stop locations | Static |
**Calculated Metrics:**
- Transit accessibility (stops within 500m of neighbourhood centroid)
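As a rough illustration, here is a pure-Python sketch of the 500m check using the haversine great-circle distance. It assumes stops and centroids are `(latitude, longitude)` pairs in degrees, and the function names are hypothetical. In the PostGIS-backed stack described in Part 7, the same filter could run in SQL with `ST_DWithin` on `geography` values, whose distance argument is in metres.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_within(
    centroid: tuple[float, float],
    stops: list[tuple[float, float]],
    radius_m: float = 500.0,
) -> int:
    """Count transit stops within radius_m of a neighbourhood centroid."""
    lat, lon = centroid
    return sum(1 for s_lat, s_lon in stops if haversine_m(lat, lon, s_lat, s_lon) <= radius_m)
```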
---
### GROUP E: Amenities & Services
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Parks | 1 | Park locations, area, type | Annual |
| Schools | 2 | Public/Catholic, elementary/secondary | Annual |
| Licensed Child Care Centres | 2 | Capacity, ages served | Annual |
**Calculated Metrics:**
- Park area per capita
- Schools per 1000 children (ages 5-17)
- Child care spaces per 1000 children (ages 0-4)
---
## Part 5: Tab Structure
### Tab Architecture
```
┌────────────────────────────────────────────────────────────────┐
│ [Overview] [Housing] [Safety] [Demographics] [Amenities]       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌─────────────────────────────────┐  ┌────────────────┐       │
│  │                                 │  │   KPI Card 1   │       │
│  │         CHOROPLETH MAP          │  ├────────────────┤       │
│  │      (158 Neighbourhoods)       │  │   KPI Card 2   │       │
│  │                                 │  ├────────────────┤       │
│  │         Click to select         │  │   KPI Card 3   │       │
│  │                                 │  └────────────────┘       │
│  └─────────────────────────────────┘                           │
│                                                                │
│  ┌─────────────────────┐  ┌─────────────────────┐              │
│  │ Supporting Chart 1  │  │ Supporting Chart 2  │              │
│  │   (Context/Trend)   │  │  (Comparison/Rank)  │              │
│  └─────────────────────┘  └─────────────────────┘              │
│                                                                │
│ [Neighbourhood: Selected Name] ─────────────────────────────── │
│ Details panel with all metrics for selected area               │
└────────────────────────────────────────────────────────────────┘
```
---
### Tab 1: Overview (Default Landing)
**Story:** "How do Toronto neighbourhoods compare across key livability metrics?"
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Composite livability score | Calculated from weighted metrics |
| KPI Cards | Population, Median Income, Avg Crime Rate | Neighbourhood Profiles, Crime Rates |
| Chart 1 | Top 10 / Bottom 10 by livability score | Calculated |
| Chart 2 | Income vs Crime scatter plot | Neighbourhood Profiles, Crime Rates |
**Metric Selector:** Allow user to change map colour by any single metric.
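The source doesn't define the livability weighting, so the following is only one plausible sketch. Each metric is min-max normalised to 0-1 and inverted where lower is better (such as crime rate). A weighted average is then scaled to 0-100. The metric names, the weights and the min-max choice are all assumptions.

```python
from collections import defaultdict


def min_max(values: dict[str, float]) -> dict[str, float]:
    """Rescale values to 0-1; a metric with no spread maps every area to 0.5."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.5 for k, v in values.items()}


def livability_scores(
    metrics: dict[str, dict[str, float]],
    weights: dict[str, float],
    lower_is_better: set[str],
) -> dict[str, float]:
    """Weighted 0-100 composite score per neighbourhood.

    metrics: {metric_name: {neighbourhood_id: raw_value}}; assumes every metric
    covers the same neighbourhoods.
    """
    total_weight = sum(weights.values())
    scores: dict[str, float] = defaultdict(float)
    for name, weight in weights.items():
        for nbhd, value in min_max(metrics[name]).items():
            if name in lower_is_better:
                value = 1 - value  # e.g. crime rate: fewer incidents scores higher
            scores[nbhd] += weight * value
    return {nbhd: 100 * s / total_weight for nbhd, s in scores.items()}
```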
---
### Tab 2: Housing & Affordability
**Story:** "Where can you afford to live, and what's being built?"
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Rent-to-Income Ratio (Affordability Index) | CMHC + Census income |
| KPI Cards | Median Rent (1BR), Vacancy Rate, New Permits (12mo) | CMHC, Building Permits |
| Chart 1 | Rent trend (5-year line chart by bedroom) | CMHC historical |
| Chart 2 | Dwelling type breakdown (pie/bar) | Neighbourhood Profiles |
**Metric Selector:** Toggle between rent, ownership %, dwelling types.
---
### Tab 3: Safety
**Story:** "How safe is each neighbourhood, and what crimes are most common?"
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Total MCI Rate per 100K | Crime Rates |
| KPI Cards | Total Crimes, YoY Change %, Shooting Incidents | Crime Rates, Shootings |
| Chart 1 | Crime type breakdown (stacked bar) | MCI Details |
| Chart 2 | 5-year crime trend (line chart) | Crime Rates historical |
**Metric Selector:** Toggle between total crime, specific crime types, shootings.
---
### Tab 4: Demographics
**Story:** "Who lives here? Age, income, diversity."
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Median Household Income | Neighbourhood Profiles |
| KPI Cards | Population, % Immigrant, Unemployment Rate | Neighbourhood Profiles |
| Chart 1 | Age distribution (population pyramid or bar) | Neighbourhood Profiles |
| Chart 2 | Top languages spoken (horizontal bar) | Neighbourhood Profiles |
**Metric Selector:** Income, immigrant %, age groups, education.
---
### Tab 5: Amenities & Services
**Story:** "What's nearby? Parks, schools, child care, transit."
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Park Area per Capita | Parks + Population |
| KPI Cards | Parks Count, Schools Count, Child Care Spaces | Multiple datasets |
| Chart 1 | Amenity density comparison (radar or bar) | Calculated |
| Chart 2 | Transit accessibility (stops within 500m) | TTC Stops |
**Metric Selector:** Parks, schools, child care, transit access.
---
## Part 6: Data Pipeline Architecture
### ETL Flow
```
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│ DATA SOURCES      │     │ STAGING LAYER     │     │ MART LAYER        │
│                   │     │                   │     │                   │
│ Toronto Open      │────▶│ stg_geography     │────▶│ dim_neighbourhood │
│ Data Portal       │     │ stg_census        │     │ fact_crime        │
│                   │     │ stg_crime         │     │ fact_housing      │
│ CMHC Portal       │────▶│ stg_rental        │     │ fact_amenities    │
│                   │     │ stg_permits       │     │                   │
│ Toronto Police    │────▶│ stg_amenities     │     │ agg_dashboard     │
│ Portal            │     │ stg_childcare     │     │ (pre-computed)    │
└───────────────────┘     └───────────────────┘     └───────────────────┘
```
### Key Transformations
| Transformation | Description |
|----------------|-------------|
| **Geography Standardization** | Ensure all datasets use `neighbourhood_id` (AREA_ID from GeoJSON) |
| **Census Pivot** | Neighbourhood Profiles is wide format — pivot to metrics per neighbourhood |
| **CMHC Zone Mapping** | Create crosswalk from 15 CMHC zones to 158 neighbourhoods |
| **Amenity Aggregation** | Spatial join point data (schools, parks, child care) to neighbourhood polygons |
| **Rate Calculations** | Normalize counts to per-capita or per-100K |
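To show what the amenity spatial join does, here is a pure-Python even-odd ray-casting sketch. It counts points falling inside each neighbourhood polygon. It handles only a single outer ring per neighbourhood, with no holes or multipolygons, and it assumes `(x, y)` coordinates in one shared projection. In the PostGIS store from Part 7, this would normally be a SQL join on `ST_Contains`. The function names are hypothetical.

```python
Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(pt: Point, ring: list[Point]) -> bool:
    """Even-odd ray casting against a single polygon ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray from pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_by_area(points: list[Point], areas: dict[str, list[Point]]) -> dict[str, int]:
    """Count points (e.g. schools, parks) per area_id; unmatched points are dropped."""
    counts = {area_id: 0 for area_id in areas}
    for pt in points:
        for area_id, ring in areas.items():
            if point_in_polygon(pt, ring):
                counts[area_id] += 1
                break  # neighbourhoods don't overlap, so stop at the first match
    return counts
```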
### Data Refresh Schedule
| Layer | Frequency | Trigger |
|-------|-----------|---------|
| Staging (API pulls) | Weekly | Scheduled job |
| Marts (transforms) | Weekly | Post-staging |
| Dashboard cache | On-demand | User refresh button |
---
## Part 7: Technical Stack
### Core Stack
| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend** | Plotly Dash | Production-ready, rapid iteration |
| **Mapping** | Plotly `choropleth_mapbox` | Native Dash integration |
| **Data Store** | PostgreSQL + PostGIS | Spatial queries, existing expertise |
| **ETL** | Python (Pandas, SQLAlchemy) | Existing stack |
| **Deployment** | Render / Railway | Free tier, easy Dash hosting |
### Alternative (Portfolio Stretch)
| Component | Technology | Why Consider |
|-----------|------------|--------------|
| **Frontend** | React + deck.gl | More "modern" for portfolio |
| **Data Store** | DuckDB | Serverless, embeddable |
| **ETL** | dbt | Aligns with skills roadmap |
---
## Appendix A: Data Source URLs
| Source | URL |
|--------|-----|
| Toronto Open Data — Neighbourhoods | https://open.toronto.ca/dataset/neighbourhoods/ |
| Toronto Open Data — Neighbourhood Profiles | https://open.toronto.ca/dataset/neighbourhood-profiles/ |
| Toronto Police — Neighbourhood Crime Rates | https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-open-data |
| Toronto Police — MCI | https://data.torontopolice.on.ca/datasets/major-crime-indicators-open-data |
| Toronto Police — Shootings | https://data.torontopolice.on.ca/datasets/shootings-firearm-discharges-open-data |
| CMHC Rental Market Survey | https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market |
| Toronto Open Data — Parks | https://open.toronto.ca/dataset/parks/ |
| Toronto Open Data — Schools | https://open.toronto.ca/dataset/school-locations-all-types/ |
| Toronto Open Data — Building Permits | https://open.toronto.ca/dataset/building-permits-cleared-permits/ |
| Toronto Open Data — Child Care | https://open.toronto.ca/dataset/licensed-child-care-centres/ |
| Toronto Open Data — TTC Routes | https://open.toronto.ca/dataset/ttc-routes-and-schedules/ |
---
## Appendix B: Colour Palettes
### Affordability (Diverging)
| Status | Hex | Usage |
|--------|-----|-------|
| Affordable (<30% income) | `#2ecc71` | Green |
| Stretched (30-50%) | `#f1c40f` | Yellow |
| Unaffordable (>50%) | `#e74c3c` | Red |
### Safety (Sequential)
| Status | Hex | Usage |
|--------|-----|-------|
| Safest (lowest crime) | `#27ae60` | Dark green |
| Moderate | `#f39c12` | Orange |
| Highest Crime | `#c0392b` | Dark red |
### Demographics — Income (Sequential)
| Level | Hex | Usage |
|-------|-----|-------|
| Highest Income | `#1a5276` | Dark blue |
| Mid Income | `#5dade2` | Light blue |
| Lowest Income | `#ecf0f1` | Light gray |
### General Recommendation
Use **Viridis** or **Plasma** colorscales for perceptually uniform gradients on continuous metrics.
---
## Appendix C: Glossary
| Term | Definition |
|------|------------|
| **MCI** | Major Crime Indicators — Assault, B&E, Auto Theft, Robbery, Theft Over |
| **CMHC Zone** | Canada Mortgage and Housing Corporation rental market survey zones (15 in Toronto) |
| **Rent-to-Income Ratio** | Monthly rent ÷ monthly household income; <30% is considered affordable |
| **PostGIS** | PostgreSQL extension for geographic data |
| **Choropleth** | Thematic map where areas are shaded based on a statistical variable |
---
## Appendix D: Interview Talking Points
When discussing this project in interviews, emphasize:
1. **Data Engineering:** "I built a multi-source ETL pipeline that standardizes geographic keys across Census data, police data, and CMHC rental surveys—three different granularities I had to reconcile."
2. **Dimensional Modeling:** "The data model follows star schema patterns with a central neighbourhood dimension table and fact tables for crime, housing, and amenities."
3. **dbt Patterns:** "The transformation layer uses staging → intermediate → mart patterns, which I've documented for maintainability."
4. **Business Value:** "The dashboard answers questions like 'Where can a young professional afford to live that's safe and has good transit?' — turning raw data into actionable insights."
5. **Technical Decisions:** "I chose Plotly Dash over a React frontend because it let me iterate faster while maintaining production-quality interactivity. For a portfolio piece, speed to working demo matters."
---
*Document Version: 1.0*
*Created: January 2026*
*Author: Leo Miranda / Claude*

@@ -10,6 +10,12 @@ This folder contains lessons learned from sprints and development work. These le
| Date | Sprint/Phase | Title | Tags |
|------|--------------|-------|------|
| 2026-02-01 | Sprint 10 | [Formspree Integration with Dash Callbacks](./sprint-10-formspree-dash-integration.md) | formspree, dash, callbacks, forms, spam-protection, honeypot, ajax |
| 2026-01-17 | Sprint 9 | [Gitea Labels API Requires Org Context](./sprint-9-gitea-labels-user-repos.md) | gitea, mcp, api, labels, projman, configuration |
| 2026-01-17 | Sprint 9 | [Always Read CLAUDE.md Before Asking Questions](./sprint-9-read-claude-md-first.md) | projman, claude-code, context, documentation, workflow |
| 2026-01-17 | Sprint 9-10 | [Graceful Error Handling in Service Layers](./sprint-9-10-graceful-error-handling.md) | python, postgresql, error-handling, dash, graceful-degradation, arm64 |
| 2026-01-17 | Sprint 9-10 | [Modular Callback Structure](./sprint-9-10-modular-callback-structure.md) | dash, callbacks, architecture, python, code-organization |
| 2026-01-17 | Sprint 9-10 | [Figure Factory Pattern](./sprint-9-10-figure-factory-pattern.md) | plotly, dash, design-patterns, python, visualization |
| 2026-01-16 | Phase 4 | [dbt Test Syntax Deprecation](./phase-4-dbt-test-syntax.md) | dbt, testing, yaml, deprecation |
---

@@ -0,0 +1,70 @@
# Sprint 10 - Formspree Integration with Dash Callbacks
## Context
Implementing a contact form on a Dash portfolio site that submits to Formspree, a third-party form handling service.
## Insights
### Formspree AJAX Submission
Formspree supports AJAX submissions (no page redirect) when you:
1. POST with `Content-Type: application/json`
2. Include `Accept: application/json` header
3. Send form data as JSON body
This returns a JSON response instead of redirecting to a thank-you page, which is ideal for single-page Dash applications.
### Dash Multi-Output Callbacks for Forms
When handling form submission with validation and feedback, use a multi-output callback pattern:
```python
from dash import Input, Output, State, callback, no_update


@callback(
    Output("feedback-container", "children"),  # Success/error alert
    Output("submit-button", "loading"),  # Button loading state
    Output("field-1", "value"),  # Clear on success
    Output("field-2", "value"),  # Clear on success
    Output("field-1", "error"),  # Field-level errors
    Output("field-2", "error"),  # Field-level errors
    Input("submit-button", "n_clicks"),
    State("field-1", "value"),
    State("field-2", "value"),
    prevent_initial_call=True,
)
def handle_submit(n_clicks, field_1, field_2):  # hypothetical function name
    # Return one value per Output (6 total); use no_update to leave an output as-is
    ...
```
Use `no_update` for outputs you don't want to change (e.g., keep form values on validation error, only clear on success).
### Honeypot Spam Protection
Simple and effective bot protection without CAPTCHA:
1. Add a hidden text input field (CSS: `position: absolute; left: -9999px`)
2. Set `tabIndex=-1` and `autoComplete="off"` to prevent accidental filling
3. In callback, check if honeypot has value - if yes, it's a bot
4. For bots: return fake success (don't reveal detection)
5. For humans: proceed with real submission
Formspree also accepts `_gotcha` as a honeypot field name in the JSON payload.
## Code Pattern
```python
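import requests


def _submit(honeypot_value: str | None, form_data: dict):  # hypothetical name
    """Excerpt of the submit callback body; FORMSPREE_ENDPOINT and
    _create_success_alert are defined elsewhere in the module."""
    # Honeypot check - bots fill hidden fields
    if honeypot_value:
        # Fake success - don't let bots know they were caught
        return (_create_success_alert(), False, "", "", None, None)
    # Real submission for humans
    response = requests.post(
        FORMSPREE_ENDPOINT,
        json=form_data,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        timeout=10,
    )
    # ... inspect response.ok and build the six callback outputs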
```
## Prevention/Best Practices
- Always use `timeout` parameter with `requests.post()` to avoid hanging
- Wrap external API calls in try/except for network errors
- Return user-friendly error messages, not technical details
- Use DMC's `required=True` and `error` props for form validation feedback
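The practices above can be combined into one small helper. The sketch below is written under stated assumptions. `FORMSPREE_ENDPOINT` is a placeholder URL. The `post` parameter is a hypothetical injection point, added so the function can be exercised without network access. The function name is also hypothetical. It relies on `requests.RequestException` (the base of `ConnectionError` and `Timeout`) subclassing `OSError`, so catching `OSError` covers network failures.

```python
from __future__ import annotations

from typing import Any, Callable

FORMSPREE_ENDPOINT = "https://formspree.io/f/your-form-id"  # placeholder, not a real form

ERROR_MESSAGE = "Sorry, your message could not be sent. Please try again later."


def submit_to_formspree(
    form_data: dict[str, str],
    endpoint: str = FORMSPREE_ENDPOINT,
    post: Callable[..., Any] | None = None,
) -> tuple[bool, str]:
    """Return (success, user-facing message); never raises on network errors."""
    if post is None:
        import requests  # deferred so tests can pass a fake `post`

        post = requests.post
    try:
        response = post(
            endpoint,
            json=form_data,  # json= serialises the body as JSON
            headers={"Accept": "application/json"},  # ask for JSON, not a redirect
            timeout=10,  # never let the callback hang on a slow endpoint
        )
    except OSError:
        # requests.RequestException subclasses OSError; show a friendly message only
        return False, ERROR_MESSAGE
    if response.ok:
        return True, "Thanks! Your message has been sent."
    return False, ERROR_MESSAGE
```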
## Tags
formspree, dash, callbacks, forms, spam-protection, honeypot, ajax, python, requests, validation

@@ -0,0 +1,53 @@
# Sprint 9-10 - Figure Factory Pattern for Reusable Charts
## Context
Creating multiple chart types across 5 dashboard tabs, with consistent styling and behavior needed across all visualizations.
## Problem
Without a standardized approach, each callback would create figures inline with:
- Duplicated styling code (colors, fonts, backgrounds)
- Inconsistent hover templates
- Hard-to-maintain figure creation logic
- No reuse between tabs
## Solution
Created a `figures/` module with factory functions:
```
figures/
├── __init__.py # Exports all factories
├── choropleth.py # Map visualizations
├── bar_charts.py # ranking_bar, stacked_bar, horizontal_bar
├── scatter.py # scatter_figure, bubble_chart
├── radar.py # radar_figure, comparison_radar
└── demographics.py # age_pyramid, donut_chart
```
Factory pattern benefits:
1. **Consistent styling** - dark theme applied once
2. **Type-safe interfaces** - clear parameters for each chart type
3. **Easy testing** - factories can be unit tested with sample data
4. **Reusability** - same factory used across multiple tabs
Example factory signature:
```python
import plotly.graph_objects as go


def create_ranking_bar(
    data: list[dict],
    name_column: str,
    value_column: str,
    title: str = "",
    top_n: int = 5,
    bottom_n: int = 5,
    top_color: str = "#4CAF50",
    bottom_color: str = "#F44336",
) -> go.Figure:
    ...  # body omitted; only the signature is shown here
```
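The "easy testing" benefit is easiest to get when the data-selection step is a plain function that a factory like `create_ranking_bar` could call before building any Plotly traces. The helper below is a hypothetical sketch of that step, not code from the project. It drops rows with missing values, sorts by `value_column`, and takes the bottom group only from rows not already in the top group. That way a small dataset never shows the same neighbourhood twice.

```python
def select_ranking_rows(
    data: list[dict], value_column: str, top_n: int = 5, bottom_n: int = 5
) -> tuple[list[dict], list[dict]]:
    """Split rows into the top-N and bottom-N groups for a ranking bar chart."""
    # Drop rows with missing values, then sort highest first
    ranked = sorted(
        (row for row in data if row.get(value_column) is not None),
        key=lambda row: row[value_column],
        reverse=True,
    )
    top = ranked[:top_n]
    # Bottom group comes only from rows not already in the top group
    remaining = ranked[top_n:]
    bottom = remaining[-bottom_n:] if bottom_n > 0 else []
    return top, bottom
```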
## Prevention
- **Create factories early** - before implementing callbacks
- **Design generic interfaces** - factories should work with any data matching the schema
- **Apply styling in one place** - use constants for colors, fonts
- **Test factories independently** - with synthetic data before integration
## Tags
plotly, dash, design-patterns, python, visualization, reusability, code-organization


@@ -0,0 +1,34 @@
# Sprint 9-10 - Graceful Error Handling in Service Layers
## Context
Building the Toronto Neighbourhood Dashboard with a service layer that queries PostgreSQL/PostGIS dbt marts to provide data to Dash callbacks.
## Problem
Initial service layer implementation let database connection errors propagate as unhandled exceptions. When the PostGIS Docker container was unavailable (common on ARM64 systems where the x86_64 image fails), the entire dashboard would crash instead of gracefully degrading.
## Solution
Wrapped database queries in try/except blocks to return empty DataFrames/lists/dicts when the database is unavailable:
```python
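import pandas as pd
from sqlalchemy import text


def _execute_query(sql: str, params: dict | None = None) -> pd.DataFrame:
    try:
        engine = get_engine()  # project's engine factory (defined elsewhere)
        with engine.connect() as conn:
            return pd.read_sql(text(sql), conn, params=params)
    except Exception:
        # Any connection or query failure degrades to an empty result
        return pd.DataFrame()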
```
This allows:
1. Dashboard to load and display empty states
2. Development/testing without running database
3. Graceful degradation in production
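The same degrade-to-empty idea is shown below in a self-contained form. It uses the standard library's `sqlite3` as a stand-in for PostgreSQL, and `safe_query` is a hypothetical helper. It catches the driver's error class instead of a bare `Exception`. In the SQLAlchemy-based service, the equivalent would likely be `sqlalchemy.exc.SQLAlchemyError`.

```python
import sqlite3
from contextlib import closing
from typing import Any


def safe_query(db_path: str, sql: str, params: tuple[Any, ...] = ()) -> list[tuple[Any, ...]]:
    """Return rows for sql, or [] when the database is missing or the query fails."""
    try:
        # mode=ro makes a missing file an error instead of silently creating a new database
        with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
            return conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        # Catch sqlite3.Error rather than Exception so unrelated bugs (e.g. a NameError) still surface
        return []
```

The UI layer then only has to handle an empty list, which is the same empty-state path it needs anyway.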
## Prevention
- **Always design service layers with graceful degradation** - assume external dependencies can fail
- **Return empty collections, not exceptions** - let UI components handle empty states
- **Test without database** - verify the app doesn't crash when DB is unavailable
- **Consider ARM64 compatibility** - PostGIS images may not support all platforms
## Tags
python, postgresql, service-layer, error-handling, dash, graceful-degradation, arm64


@@ -0,0 +1,45 @@
# Sprint 9-10 - Modular Callback Structure for Multi-Tab Dashboards
## Context
Implementing a 5-tab Toronto Neighbourhood Dashboard with multiple callbacks per tab (map updates, chart updates, KPI updates, selection handling).
## Problem
The initial approach would have placed all callbacks in a single file, leading to:
- A monolithic file with 500+ lines
- Difficult-to-navigate code
- Callbacks for different tabs interleaved
- Testing difficulties
## Solution
Organized callbacks into three focused modules:
```
callbacks/
├── __init__.py # Imports all modules to register callbacks
├── map_callbacks.py # Choropleth updates, map click handling
├── chart_callbacks.py # Supporting chart updates (scatter, trend, donut)
└── selection_callbacks.py # Dropdown population, KPI updates
```
Key patterns:
1. **Group by responsibility**, not by tab - all map-related callbacks together
2. **Use noqa comments** for imports that register callbacks as side effects
3. **Share helper functions** (like `_empty_chart()`) within modules
```python
# callbacks/__init__.py
from . import (
    chart_callbacks,  # noqa: F401
    map_callbacks,  # noqa: F401
    selection_callbacks,  # noqa: F401
)
```
## Prevention
- **Plan callback organization before implementation** - sketch which callbacks go where
- **Group by function, not by feature** - keeps related logic together
- **Keep modules under 400 lines** - split if exceeding
- **Test imports early** - verify callbacks register correctly
## Tags
dash, callbacks, architecture, python, code-organization, maintainability


@@ -0,0 +1,29 @@
# Sprint 9 - Gitea Labels API Requires Org Context
## Context
Creating Gitea issues with labels via MCP tools during Sprint 9 planning for the personal-portfolio project.
## Problem
When calling `create_issue` with a `labels` parameter, received:
```
404 Client Error: Not Found for url: https://gitea.hotserv.cloud/api/v1/orgs/lmiranda/labels
```
The API attempted to fetch labels from an **organization** endpoint, but `lmiranda` is a **user account**, not an organization.
## Solution
Created issues without the `labels` parameter and documented intended labels in the issue body instead:
```markdown
**Labels:** Type/Feature, Priority/Medium, Complexity/Simple, Efforts/XS, Component/Docs, Tech/Python
```
This provides visibility into intended categorization while avoiding the API error.
## Prevention
- When working with user-owned repos (not org repos), avoid using the `labels` parameter in `create_issue`
- Document labels in issue body as a workaround
- Consider creating a repo-level label set for user repos (Gitea supports this)
- Update projman plugin to handle user vs org repos differently
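The sketch below handles user and org owners differently. It assumes the owner type is already known, for example from whether `GET /api/v1/orgs/{owner}` succeeds. Repo-level labels live under `/repos/{owner}/{repo}/labels` for both account types, while only organizations have the `/orgs/{owner}/labels` list. Gitea's create-issue payload takes label IDs rather than names, so the second helper maps names to IDs. Both function names are hypothetical and are not part of the projman plugin.

```python
from typing import Any


def label_list_urls(base_url: str, owner: str, repo: str, owner_is_org: bool) -> list[str]:
    """Label-list endpoints to query: repo labels always, org labels only for organizations."""
    api = base_url.rstrip("/") + "/api/v1"
    urls = [f"{api}/repos/{owner}/{repo}/labels"]
    if owner_is_org:
        # /orgs/{owner}/labels returns 404 when the owner is a personal user account
        urls.append(f"{api}/orgs/{owner}/labels")
    return urls


def resolve_label_ids(
    wanted: list[str], available: list[dict[str, Any]]
) -> tuple[list[int], list[str]]:
    """Map label names to IDs; return (ids_found, names_missing)."""
    by_name = {label["name"]: label["id"] for label in available}
    ids = [by_name[name] for name in wanted if name in by_name]
    missing = [name for name in wanted if name not in by_name]
    return ids, missing
```

Any names reported as missing could still be written into the issue body, matching the workaround above.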
## Tags
gitea, mcp, api, labels, projman, configuration


@@ -0,0 +1,30 @@
# Sprint 9 - Always Read CLAUDE.md Before Asking Questions
## Context
Starting Sprint 9 planning session with `/projman:sprint-plan` command.
## Problem
Asked the user "what should I do?" when all the necessary context was already documented in CLAUDE.md:
- Current sprint number and phase
- Implementation plan location
- Remaining phases to complete
- Project conventions and workflows
This caused user frustration: "why are you asking what to do? cant you see this yourself"
## Solution
Before asking any questions about what to do:
1. Read `CLAUDE.md` in the project root
2. Check "Project Status" section for current sprint/phase
3. Follow references to implementation plans
4. Review "Projman Plugin Workflow" section for expected behavior
## Prevention
- **ALWAYS** read CLAUDE.md at the start of any sprint-related command
- Look for "Current Sprint" and "Phase" indicators
- Check for implementation plan references in `docs/changes/`
- Only ask questions if information is genuinely missing from documentation
- The projman plugin expects autonomous behavior based on documented context
## Tags
projman, claude-code, context, documentation, workflow, sprint-planning


@@ -0,0 +1,265 @@
# Runbook: Adding a New Dashboard
This runbook describes how to add a new data dashboard to the portfolio application.
## Prerequisites
- [ ] Data sources identified and accessible
- [ ] Database schema designed
- [ ] Basic Dash/Plotly familiarity
## Directory Structure
Create the following structure:
### Application Code (`portfolio_app/`)
```
portfolio_app/
├── pages/
│ └── {dashboard_name}/
│ ├── dashboard.py # Main layout with tabs
│ ├── methodology.py # Data sources and methods page
│ ├── tabs/
│ │ ├── __init__.py
│ │ ├── overview.py # Overview tab layout
│ │ └── ... # Additional tab layouts
│ └── callbacks/
│ ├── __init__.py
│ └── ... # Callback modules
├── {dashboard_name}/ # Data logic (outside pages/)
│ ├── __init__.py
│ ├── parsers/ # API/CSV extraction
│ │ └── __init__.py
│ ├── loaders/ # Database operations
│ │ └── __init__.py
│ ├── schemas/ # Pydantic models
│ │ └── __init__.py
│ └── models/ # SQLAlchemy ORM (schema: raw_{dashboard_name})
│ └── __init__.py
└── figures/
└── {dashboard_name}/ # Figure factories for this dashboard
├── __init__.py
└── ... # Chart modules
```
### dbt Models (`dbt/models/`)
```
dbt/models/
├── staging/
│ └── {dashboard_name}/ # Staging models
│ ├── _sources.yml # Source definitions (schema: raw_{dashboard_name})
│ ├── _staging.yml # Model tests/docs
│ └── stg_*.sql # Staging models
├── intermediate/
│ └── {dashboard_name}/ # Intermediate models
│ ├── _intermediate.yml
│ └── int_*.sql
└── marts/
└── {dashboard_name}/ # Mart tables
├── _marts.yml
└── mart_*.sql
```
### Documentation (`notebooks/`)
```
notebooks/
└── {dashboard_name}/ # Domain subdirectories
├── overview/
    └── ...
```
## Step-by-Step Checklist
### 1. Data Layer
- [ ] Create Pydantic schemas in `{dashboard_name}/schemas/`
- [ ] Create SQLAlchemy models in `{dashboard_name}/models/`
- [ ] Create parsers in `{dashboard_name}/parsers/`
- [ ] Create loaders in `{dashboard_name}/loaders/`
- [ ] Add database migrations if needed
### 2. Database Schema
- [ ] Define schema constant in models (e.g., `RAW_FOOTBALL_SCHEMA = "raw_football"`)
- [ ] Add `__table_args__ = {"schema": RAW_FOOTBALL_SCHEMA}` to all models
- [ ] Update `scripts/db/init_schema.py` to create the new schema
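The hypothetical helper below shows the naming rule these steps rely on. The actual contents of `scripts/db/init_schema.py` are not shown here. The helper validates the dashboard name before interpolating it into DDL, because PostgreSQL schema names cannot be passed as bound parameters.

```python
import re

# Conservative identifier rule (an assumption): lowercase start, then letters, digits, underscores
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]*")


def schema_names(dashboard_name: str) -> dict[str, str]:
    """Per-layer schema names for one dashboard, following the raw_/stg_/int_/mart_ prefixes."""
    if not _IDENTIFIER.fullmatch(dashboard_name):
        raise ValueError(f"invalid dashboard name: {dashboard_name!r}")
    return {layer: f"{layer}_{dashboard_name}" for layer in ("raw", "stg", "int", "mart")}


def create_raw_schema_sql(dashboard_name: str) -> str:
    """DDL for the raw schema only; dbt normally creates its own target schemas when building models."""
    return f"CREATE SCHEMA IF NOT EXISTS {schema_names(dashboard_name)['raw']};"
```

For example, `schema_names("football")["raw"]` gives `raw_football`, matching `RAW_FOOTBALL_SCHEMA` above.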
### 3. dbt Models
Create dbt models in `dbt/models/`:
- [ ] `staging/{dashboard_name}/_sources.yml` - Source definitions pointing to `raw_{dashboard_name}` schema
- [ ] `staging/{dashboard_name}/stg_{source}__{entity}.sql` - Raw data cleaning
- [ ] `intermediate/{dashboard_name}/int_{domain}__{transform}.sql` - Business logic
- [ ] `marts/{dashboard_name}/mart_{domain}.sql` - Final analytical tables
Update `dbt/dbt_project.yml` with new subdirectory config:
```yaml
models:
  portfolio:
    staging:
      {dashboard_name}:
        +materialized: view
        +schema: stg_{dashboard_name}
    intermediate:
      {dashboard_name}:
        +materialized: view
        +schema: int_{dashboard_name}
    marts:
      {dashboard_name}:
        +materialized: table
        +schema: mart_{dashboard_name}
```
Follow naming conventions:
- Staging: `stg_{source}__{entity}`
- Intermediate: `int_{domain}__{transform}`
- Marts: `mart_{domain}`
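The checker below enforces these conventions and could run in a pre-commit hook or CI step. The regexes encode an assumption about what counts as a valid `{source}`, `{entity}`, `{domain}`, or `{transform}` (lowercase snake_case with no double underscore inside a segment). The layer keys are the dbt folder names, and `check_model_name` is hypothetical.

```python
import re

# One lowercase snake_case segment; "__" is reserved as the separator between segments
_SEGMENT = r"[a-z0-9]+(?:_[a-z0-9]+)*"

# One pattern per dbt layer folder, mirroring the conventions listed above
_PATTERNS = {
    "staging": re.compile(rf"stg_{_SEGMENT}__{_SEGMENT}"),
    "intermediate": re.compile(rf"int_{_SEGMENT}__{_SEGMENT}"),
    "marts": re.compile(rf"mart_{_SEGMENT}"),
}


def check_model_name(layer: str, model_name: str) -> bool:
    """True if model_name follows the naming convention for the given layer folder."""
    return _PATTERNS[layer].fullmatch(model_name) is not None
```

A single underscore where the convention requires `__` (e.g. `stg_toronto_neighbourhoods`) is rejected, which catches the most common slip.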
### 4. Visualization Layer
- [ ] Create figure factories in `figures/{dashboard_name}/`
- [ ] Create `figures/{dashboard_name}/__init__.py` with exports
- [ ] Follow the factory pattern: `create_{chart_type}_figure(data, **kwargs)`
Import pattern:
```python
from portfolio_app.figures.{dashboard_name} import create_choropleth_figure
```
### 5. Dashboard Pages
#### Main Dashboard (`pages/{dashboard_name}/dashboard.py`)
```python
import dash
import dash_mantine_components as dmc

# Importing the callbacks package registers this dashboard's callbacks
from . import callbacks  # noqa: F401

# Illustrative import: each tab module exports its own layout function
from .tabs.overview import overview_tab

dash.register_page(
    __name__,
    path="/{dashboard_name}",
    title="{Dashboard Title}",
    description="{Description}",
)


def layout():
    return dmc.Container([
        # Header
        dmc.Title("{Dashboard Title}", order=1),
        # Tabs
        dmc.Tabs(
            [
                dmc.TabsList([
                    dmc.TabsTab("Overview", value="overview"),
                    # Add more tabs
                ]),
                dmc.TabsPanel(overview_tab(), value="overview"),
                # Add more panels
            ],
            value="overview",
        ),
    ])
```
#### Tab Layouts (`pages/{dashboard_name}/tabs/`)
- [ ] Create one file per tab
- [ ] Export a layout function from each
#### Callbacks (`pages/{dashboard_name}/callbacks/`)
- [ ] Create callback modules for interactivity
- [ ] Import and register in dashboard.py
### 6. Navigation
Add to sidebar in `components/sidebar.py`:
```python
import dash_mantine_components as dmc
from dash_iconify import DashIconify

dmc.NavLink(
    label="{Dashboard Name}",
    href="/{dashboard_name}",
    leftSection=DashIconify(icon="..."),
)
```
### 7. Documentation
- [ ] Create methodology page (`pages/{dashboard_name}/methodology.py`)
- [ ] Document data sources
- [ ] Document transformation logic
- [ ] Add notebooks to `notebooks/{dashboard_name}/` if needed
### 8. Testing
- [ ] Add unit tests for parsers
- [ ] Add unit tests for loaders
- [ ] Add integration tests for callbacks
- [ ] Run `make test`
### 9. Final Verification
- [ ] All pages render without errors
- [ ] All callbacks respond correctly
- [ ] Data loads successfully
- [ ] dbt models run cleanly (`make dbt-run`)
- [ ] Linting passes (`make lint`)
- [ ] Tests pass (`make test`)
## Example: Toronto Dashboard
Reference implementation: `portfolio_app/pages/toronto/`
Key files:
- `dashboard.py` - Main layout with 5 tabs
- `tabs/overview.py` - Livability scores, scatter plots
- `callbacks/map_callbacks.py` - Choropleth interactions
- `toronto/models/dimensions.py` - Dimension tables
- `toronto/models/facts.py` - Fact tables
## Common Patterns
### Figure Factories
```python
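# figures/{dashboard_name}/choropleth.py
from typing import Any

import plotly.graph_objects as go


def create_choropleth_figure(
    geojson: dict[str, Any],
    data: list[dict[str, Any]],
    location_key: str,
    color_column: str,
    title: str,
    **kwargs: Any,  # e.g. hover_data, color_scale, zoom
) -> go.Figure:
    ...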
```
### Callbacks
```python
# callbacks/map_callbacks.py
from dash import Input, Output, callback


@callback(
    Output("neighbourhood-details", "children"),
    Input("choropleth-map", "clickData"),
)
def update_details(click_data):
    ...
```
### Data Loading
```python
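# {dashboard_name}/loaders/load.py
from sqlalchemy.orm import Session

# Placeholders: swap in this dashboard's parser, Pydantic schema, and ORM model
from ..models import Model
from ..parsers import parse_source_data
from ..schemas import Schema


def load_data(session: Session) -> None:
    # Parse from source
    records = parse_source_data()
    # Validate with Pydantic (raises before anything is written)
    validated = [Schema(**r) for r in records]
    # Load to database
    session.add_all(Model(**record.model_dump()) for record in validated)
    session.commit()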
```
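The key property of this pattern is its ordering: validate everything first, then write. A single bad record aborts the load before any row is written. The runnable sketch below uses the standard library instead of Pydantic and SQLAlchemy: a dataclass with `__post_init__` checks stands in for the schema, and `sqlite3` stands in for the database. The table and field names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class NeighbourhoodRecord:
    """Stand-in for a Pydantic schema: checks simple invariants on construction."""

    neighbourhood_id: int
    name: str
    population: int

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("name must be non-empty")
        if self.population < 0:
            raise ValueError("population must be non-negative")


def load_records(conn: sqlite3.Connection, raw_records: list[dict]) -> int:
    """Validate every record first, then insert them all in one transaction."""
    validated = [NeighbourhoodRecord(**r) for r in raw_records]  # raises before any write
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany(
            "INSERT INTO neighbourhoods (neighbourhood_id, name, population) VALUES (?, ?, ?)",
            [(r.neighbourhood_id, r.name, r.population) for r in validated],
        )
    return len(validated)
```

Committing once per load, rather than once per record, also keeps a failed run from leaving a partially loaded table.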

232
docs/runbooks/deployment.md Normal file

@@ -0,0 +1,232 @@
# Runbook: Deployment
This runbook covers deployment procedures for the Analytics Portfolio application.
## Environments
| Environment | Branch | Server | URL |
|-------------|--------|--------|-----|
| Development | `development` | Local | http://localhost:8050 |
| Staging | `staging` | Homelab (hotserv) | Internal |
| Production | `main` | Bandit Labs VPS | https://leodata.science |
## CI/CD Pipeline
### Automatic Deployment
Deployments are triggered automatically via Gitea Actions:
1. **Push to `staging`** → Deploys to staging server
2. **Push to `main`** → Deploys to production server
### Workflow Files
- `.gitea/workflows/ci.yml` - Runs linting and tests on all branches
- `.gitea/workflows/deploy-staging.yml` - Staging deployment
- `.gitea/workflows/deploy-production.yml` - Production deployment
### Required Secrets
Configure these in Gitea repository settings:
| Secret | Description |
|--------|-------------|
| `STAGING_HOST` | Staging server hostname/IP |
| `STAGING_USER` | SSH username for staging |
| `STAGING_SSH_KEY` | Private key for staging SSH |
| `PROD_HOST` | Production server hostname/IP |
| `PROD_USER` | SSH username for production |
| `PROD_SSH_KEY` | Private key for production SSH |
## Manual Deployment
### Prerequisites
- SSH access to target server
- Repository cloned at `~/apps/personal-portfolio`
- Virtual environment created at `.venv`
- Docker and Docker Compose installed
- PostgreSQL container running
### Steps
```bash
# 1. SSH to server
ssh user@server
# 2. Navigate to app directory
cd ~/apps/personal-portfolio
# 3. Pull latest changes
git fetch origin {branch}
git reset --hard origin/{branch}
# 4. Activate virtual environment
source .venv/bin/activate
# 5. Install dependencies
pip install -r requirements.txt
# 6. Run database migrations (if any)
# python -m alembic upgrade head
# 7. Run dbt models
(cd dbt && dbt run --profiles-dir .)
# 8. Restart application
docker compose down
docker compose up -d
# 9. Verify health
curl http://localhost:8050/health
```
## Rollback Procedure
### Quick Rollback
If deployment fails, rollback to previous commit:
```bash
# 1. Find previous working commit
git log --oneline -10
# 2. Reset to that commit
git reset --hard {commit_hash}
# 3. Restart services
docker compose down
docker compose up -d
# 4. Verify
curl http://localhost:8050/health
```
### Full Rollback (Database)
If database changes need to be reverted:
```bash
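# 1. Stop the application (keep the postgres container running for the restore)
docker compose stop app
# 2. Restore database backup (--clean --if-exists drops existing objects first)
docker compose exec -T postgres pg_restore -U portfolio -d portfolio --clean --if-exists < backup.dump
# 3. Revert code
git reset --hard {commit_hash}
# 4. Run dbt at that version
(cd dbt && dbt run --profiles-dir .)
# 5. Restart
docker compose up -d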
```
## Health Checks
### Application Health
```bash
curl http://localhost:8050/health
```
Expected response:
```json
{"status": "healthy"}
```
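After a restart the container may take a few seconds to answer, so a single `curl` can fail spuriously. The minimal polling sketch below assumes the `/health` endpoint returns the JSON shown above. `wait_for_healthy` and its defaults are hypothetical.

```python
import json
import time
import urllib.request


def wait_for_healthy(
    url: str, attempts: int = 10, delay: float = 3.0, timeout: float = 5.0
) -> bool:
    """Poll url until a successful response carries {"status": "healthy"}, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                body = json.loads(response.read().decode("utf-8"))
            if isinstance(body, dict) and body.get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # OSError: refused connection, timeout, or HTTP error status (URLError/HTTPError)
            # ValueError: the body is not valid JSON yet
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_for_healthy("http://localhost:8050/health")` returns `True` once the app reports healthy. With the defaults, it returns `False` only after all 10 attempts fail, which takes at least 27 seconds of waiting.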
### Database Health
```bash
docker compose exec postgres pg_isready -U portfolio
```
### Container Status
```bash
docker compose ps
```
## Monitoring
### View Logs
```bash
# All services
make logs
# Specific service
make logs SERVICE=postgres
# Or directly
docker compose logs -f
```
### Check Resource Usage
```bash
docker stats
```
## Troubleshooting
### Application Won't Start
1. Check container logs: `docker compose logs app`
2. Verify environment variables: `cat .env`
3. Check database connectivity: `docker compose exec postgres pg_isready`
4. Verify port availability: `lsof -i :8050`
### Database Connection Errors
1. Check postgres container: `docker compose ps postgres`
2. Verify DATABASE_URL in `.env`
3. Check postgres logs: `docker compose logs postgres`
4. Test connection: `docker compose exec postgres psql -U portfolio -c '\l'`
### dbt Failures
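1. Check dbt configuration and connectivity: `cd dbt && dbt debug --profiles-dir .`
2. Verify profiles.yml: `cat dbt/profiles.yml`
3. Run with verbose output: `cd dbt && dbt run --debug --profiles-dir .` (full logs are written to `dbt/logs/dbt.log`)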
### Out of Memory
1. Check memory usage: `free -h`
2. Review container limits in docker-compose.yml
3. Consider increasing swap or server resources
## Backup Procedures
### Database Backup
```bash
# Create backup
docker compose exec -T postgres pg_dump -U portfolio portfolio > backup_$(date +%Y%m%d).sql
# Compressed backup (-T disables the pseudo-TTY so the binary dump is not corrupted)
docker compose exec -T postgres pg_dump -U portfolio -Fc portfolio > backup_$(date +%Y%m%d).dump
```
### Restore from Backup
```bash
# From SQL file
docker compose exec -T postgres psql -U portfolio portfolio < backup.sql
# From dump file
docker compose exec -T postgres pg_restore -U portfolio -d portfolio < backup.dump
```
## Deployment Checklist
Before deploying to production:
- [ ] All tests pass (`make test`)
- [ ] Linting passes (`make lint`)
- [ ] Staging deployment successful
- [ ] Manual testing on staging complete
- [ ] Database backup taken
- [ ] Rollback plan confirmed
- [ ] Team notified of deployment window

70
notebooks/README.md Normal file

@@ -0,0 +1,70 @@
# Dashboard Documentation Notebooks
Documentation notebooks organized by dashboard project. Each notebook documents how data is queried, transformed, and visualized using the figure factory pattern.
## Directory Structure
```
notebooks/
├── README.md # This file
└── toronto/ # Toronto Neighbourhood Dashboard
├── overview/ # Overview tab visualizations
├── housing/ # Housing tab visualizations
├── safety/ # Safety tab visualizations
├── demographics/ # Demographics tab visualizations
└── amenities/ # Amenities tab visualizations
```
## Notebook Template
Each notebook follows a standard two-section structure:
### Section 1: Data Reference
Documents the data pipeline:
- **Source Tables**: List of dbt marts/tables used
- **SQL Query**: The exact query to fetch data
- **Transformation Steps**: Any pandas/python transformations
- **Sample Output**: First 10 rows of the result
### Section 2: Data Visualization
Documents the figure creation:
- **Figure Factory**: Import from the dashboard's figure package (e.g. `portfolio_app.figures.toronto`)
- **Parameters**: Key configuration options
- **Rendered Output**: The actual visualization
## Available Figure Factories
| Factory | Module | Use Case |
|---------|--------|----------|
| `create_choropleth_figure` | `figures.toronto.choropleth` | Map visualizations |
| `create_ranking_bar` | `figures.toronto.bar_charts` | Top/bottom N rankings |
| `create_horizontal_bar` | `figures.toronto.bar_charts` | Single-metric rankings |
| `create_stacked_bar` | `figures.toronto.bar_charts` | Category breakdowns |
| `create_scatter` | `figures.toronto.scatter` | Correlation plots |
| `create_comparison_radar` | `figures.toronto.radar` | Multi-metric comparisons |
| `create_age_pyramid` | `figures.toronto.demographics` | Age distributions |
| `create_time_series` | `figures.toronto.time_series` | Trend lines |
## Usage
1. Ensure the database is running:
```bash
make docker-up
```
2. Start Jupyter from the project root:
```bash
jupyter notebook notebooks/
```
3. Each notebook is self-contained and reads `DATABASE_URL` from the project's `.env`; run all cells top to bottom.
## Notebook Naming Convention
`{metric}_{chart_type}.ipynb`
Examples:
- `livability_choropleth.ipynb`
- `crime_trend_line.ipynb`
- `age_pyramid.ipynb`


@@ -0,0 +1,182 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Amenity Index Choropleth Map\n",
"\n",
"Displays total amenities per 1,000 residents across Toronto's 158 neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_amenities` | neighbourhood × year | amenity_index, total_amenities_per_1000, amenity_tier, geometry |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_id,\n",
" neighbourhood_name,\n",
" geometry,\n",
" year,\n",
" total_amenities_per_1000,\n",
" amenity_index,\n",
" amenity_tier,\n",
" parks_per_1000,\n",
" schools_per_1000,\n",
" transit_per_1000,\n",
" total_amenities,\n",
" population\n",
"FROM public_marts.mart_neighbourhood_amenities\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_amenities)\n",
"ORDER BY total_amenities_per_1000 DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent year\n",
"2. Convert geometry to GeoJSON"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import geopandas as gpd\n",
"\n",
"gdf = gpd.GeoDataFrame(\n",
" df, geometry=gpd.GeoSeries.from_wkb(df[\"geometry\"]), crs=\"EPSG:4326\"\n",
")\n",
"\n",
"geojson = json.loads(gdf.to_json())\n",
"data = df.drop(columns=[\"geometry\"]).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\"neighbourhood_name\", \"total_amenities_per_1000\", \"amenity_index\", \"amenity_tier\"]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_choropleth_figure` from `portfolio_app.figures.toronto.choropleth`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.choropleth import create_choropleth_figure\n",
"\n",
"fig = create_choropleth_figure(\n",
" geojson=geojson,\n",
" data=data,\n",
" location_key=\"neighbourhood_id\",\n",
" color_column=\"total_amenities_per_1000\",\n",
" hover_data=[\n",
" \"neighbourhood_name\",\n",
" \"amenity_index\",\n",
" \"parks_per_1000\",\n",
" \"schools_per_1000\",\n",
" ],\n",
" color_scale=\"Greens\",\n",
" title=\"Toronto Amenities per 1,000 Population\",\n",
" zoom=10,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Amenity Tier Interpretation\n",
"\n",
"| Tier | Meaning |\n",
"|------|--------|\n",
"| 1 | Best served (top 20%) |\n",
"| 2-4 | Middle tiers |\n",
"| 5 | Underserved (bottom 20%) |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,191 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Amenity Radar Chart\n",
"\n",
"Spider/radar chart comparing amenity categories for selected neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_amenities` | neighbourhood × year | parks_index, schools_index, transit_index |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" parks_index,\n",
" schools_index,\n",
" transit_index,\n",
" amenity_index,\n",
" amenity_tier\n",
"FROM public_marts.mart_neighbourhood_amenities\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_amenities)\n",
"ORDER BY amenity_index DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Select top 5 and bottom 5 neighbourhoods by amenity index\n",
"2. Reshape for radar chart format"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Select representative neighbourhoods\n",
"top_5 = df.head(5)\n",
"bottom_5 = df.tail(5)\n",
"\n",
"# Prepare radar data\n",
"categories = [\"Parks\", \"Schools\", \"Transit\"]\n",
"index_columns = [\"parks_index\", \"schools_index\", \"transit_index\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Top 5 Amenity-Rich Neighbourhoods:\")\n",
"display(\n",
" top_5[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"parks_index\",\n",
" \"schools_index\",\n",
" \"transit_index\",\n",
" \"amenity_index\",\n",
" ]\n",
" ]\n",
")\n",
"print(\"\\nBottom 5 Underserved Neighbourhoods:\")\n",
"display(\n",
" bottom_5[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"parks_index\",\n",
" \"schools_index\",\n",
" \"transit_index\",\n",
" \"amenity_index\",\n",
" ]\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_comparison_radar` from `portfolio_app.figures.toronto.radar`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.radar import create_comparison_radar\n",
"\n",
"# Compare top neighbourhood vs city average (100)\n",
"top_hood = top_5.iloc[0]\n",
"metrics = [\"parks_index\", \"schools_index\", \"transit_index\"]\n",
"\n",
"fig = create_comparison_radar(\n",
" selected_data=top_hood.to_dict(),\n",
" average_data={\"parks_index\": 100, \"schools_index\": 100, \"transit_index\": 100},\n",
" metrics=metrics,\n",
" selected_name=top_hood[\"neighbourhood_name\"],\n",
" average_name=\"City Average\",\n",
" title=f\"Amenity Profile: {top_hood['neighbourhood_name']} vs City Average\",\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Index Interpretation\n",
"\n",
"| Value | Meaning |\n",
"|-------|--------|\n",
"| < 100 | Below city average |\n",
"| = 100 | City average |\n",
"| > 100 | Above city average |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,169 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Transit Accessibility Bar Chart\n",
"\n",
"Shows transit stops per 1,000 residents across Toronto neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_amenities` | neighbourhood × year | transit_per_1000, transit_index, transit_count |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" transit_per_1000,\n",
" transit_index,\n",
" transit_count,\n",
" population,\n",
" amenity_tier\n",
"FROM public_marts.mart_neighbourhood_amenities\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_amenities)\n",
" AND transit_per_1000 IS NOT NULL\n",
"ORDER BY transit_per_1000 DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Sort by transit accessibility\n",
"2. Select top 20 for visualization"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = df.head(20).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[[\"neighbourhood_name\", \"transit_per_1000\", \"transit_index\", \"transit_count\"]].head(\n",
" 10\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_horizontal_bar` from `portfolio_app.figures.toronto.bar_charts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.bar_charts import create_horizontal_bar\n",
"\n",
"fig = create_horizontal_bar(\n",
" data=data,\n",
" name_column=\"neighbourhood_name\",\n",
" value_column=\"transit_per_1000\",\n",
" title=\"Top 20 Neighbourhoods by Transit Accessibility\",\n",
" color=\"#00BCD4\",\n",
" value_format=\".2f\",\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transit Statistics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"City-wide Transit Statistics:\")\n",
"print(f\" Total Transit Stops: {df['transit_count'].sum():,.0f}\")\n",
"print(f\" Average per 1,000 pop: {df['transit_per_1000'].mean():.2f}\")\n",
"print(f\" Median per 1,000 pop: {df['transit_per_1000'].median():.2f}\")\n",
"print(f\" Best Access: {df['transit_per_1000'].max():.2f} per 1,000\")\n",
"print(f\" Worst Access: {df['transit_per_1000'].min():.2f} per 1,000\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}


@@ -0,0 +1,183 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Age Distribution Analysis\n",
"\n",
"Compares median age and age index across Toronto neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_demographics` | neighbourhood × year | median_age, age_index, city_avg_age |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" median_age,\n",
" age_index,\n",
" city_avg_age,\n",
" population,\n",
" income_quintile,\n",
" pct_renter_occupied\n",
"FROM public_marts.mart_neighbourhood_demographics\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_demographics)\n",
" AND median_age IS NOT NULL\n",
"ORDER BY median_age DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods with age data\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent census year\n",
"2. Calculate deviation from city average\n",
"3. Classify as younger/older than average"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"city_avg = df[\"city_avg_age\"].iloc[0]\n",
"df[\"age_category\"] = df[\"median_age\"].apply(\n",
" lambda x: \"Younger\" if x < city_avg else \"Older\"\n",
")\n",
"df[\"age_deviation\"] = df[\"median_age\"] - city_avg\n",
"\n",
"data = df.to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"City Average Age: {city_avg:.1f}\")\n",
"print(\"\\nYoungest Neighbourhoods:\")\n",
"display(\n",
" df.tail(5)[[\"neighbourhood_name\", \"median_age\", \"age_index\", \"pct_renter_occupied\"]]\n",
")\n",
"print(\"\\nOldest Neighbourhoods:\")\n",
"display(\n",
" df.head(5)[[\"neighbourhood_name\", \"median_age\", \"age_index\", \"pct_renter_occupied\"]]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_ranking_bar` from `portfolio_app.figures.toronto.bar_charts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.bar_charts import create_ranking_bar\n",
"\n",
"fig = create_ranking_bar(\n",
" data=data,\n",
" name_column=\"neighbourhood_name\",\n",
" value_column=\"median_age\",\n",
" title=\"Youngest & Oldest Neighbourhoods (Median Age)\",\n",
" top_n=10,\n",
" bottom_n=10,\n",
" color_top=\"#FF9800\", # Orange for older\n",
" color_bottom=\"#2196F3\", # Blue for younger\n",
" value_format=\".1f\",\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Median Age by Income Quintile"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Age by income quintile\n",
"print(\"Median Age by Income Quintile:\")\n",
"df.groupby(\"income_quintile\")[\"median_age\"].mean().round(1)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
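
The age notebook passes every neighbourhood to `create_ranking_bar` with `top_n=10` and `bottom_n=10`, and the factory picks the highest and lowest rows itself. Its internals are not shown in this diff. The following is a minimal, hypothetical sketch of that selection step on the same list-of-dicts records that `df.to_dict("records")` produces. The helper name `select_top_bottom` is illustrative and is not part of `portfolio_app`.

```python
from typing import Any


def select_top_bottom(
    records: list[dict[str, Any]],
    value_column: str,
    top_n: int = 10,
    bottom_n: int = 10,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into the top_n highest and bottom_n lowest by value_column.

    Hypothetical helper: illustrates the ranking step, not portfolio_app's code.
    """
    # Drop rows with no value so they cannot land in either group.
    ranked = sorted(
        (r for r in records if r.get(value_column) is not None),
        key=lambda r: r[value_column],
        reverse=True,  # highest first
    )
    top = ranked[:top_n]
    # Take the bottom group only from rows not already in the top group,
    # so a short list never shows the same neighbourhood twice.
    remaining = ranked[top_n:]
    bottom = remaining[-bottom_n:] if bottom_n > 0 else []
    return top, bottom


# Hypothetical median ages, for illustration only.
records = [
    {"neighbourhood_name": "A", "median_age": 30.5},
    {"neighbourhood_name": "B", "median_age": 45.0},
    {"neighbourhood_name": "C", "median_age": 38.2},
    {"neighbourhood_name": "D", "median_age": 52.1},
    {"neighbourhood_name": "E", "median_age": 27.9},
]
oldest, youngest = select_top_bottom(records, "median_age", top_n=2, bottom_n=2)
print([r["neighbourhood_name"] for r in oldest])  # ['D', 'B']
print([r["neighbourhood_name"] for r in youngest])  # ['A', 'E']
```

With fewer than `top_n + bottom_n` rows, this sketch shrinks the bottom group instead of repeating rows. Whether the real factory behaves the same way is not shown here.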

@@ -0,0 +1,182 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Median Income Choropleth Map\n",
"\n",
"Displays median household income across Toronto's 158 neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_demographics` | neighbourhood × year | median_household_income, income_index, income_quintile, geometry |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_id,\n",
" neighbourhood_name,\n",
" geometry,\n",
" year,\n",
" median_household_income,\n",
" income_index,\n",
" income_quintile,\n",
" population,\n",
" unemployment_rate\n",
"FROM public_marts.mart_neighbourhood_demographics\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_demographics)\n",
"ORDER BY median_household_income DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent census year\n",
"2. Convert geometry to GeoJSON\n",
"3. Scale income to thousands for readability"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import geopandas as gpd\n",
"\n",
"df[\"income_thousands\"] = df[\"median_household_income\"] / 1000\n",
"\n",
"gdf = gpd.GeoDataFrame(\n",
" df, geometry=gpd.GeoSeries.from_wkb(df[\"geometry\"]), crs=\"EPSG:4326\"\n",
")\n",
"\n",
"geojson = json.loads(gdf.to_json())\n",
"data = df.drop(columns=[\"geometry\"]).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\"neighbourhood_name\", \"median_household_income\", \"income_index\", \"income_quintile\"]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_choropleth_figure` from `portfolio_app.figures.toronto.choropleth`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.choropleth import create_choropleth_figure\n",
"\n",
"fig = create_choropleth_figure(\n",
" geojson=geojson,\n",
" data=data,\n",
" location_key=\"neighbourhood_id\",\n",
" color_column=\"median_household_income\",\n",
" hover_data=[\"neighbourhood_name\", \"income_index\", \"income_quintile\"],\n",
" color_scale=\"Viridis\",\n",
" title=\"Toronto Median Household Income by Neighbourhood\",\n",
" zoom=10,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Income Quintile Distribution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.groupby(\"income_quintile\")[\"median_household_income\"].agg(\n",
" [\"count\", \"mean\", \"min\", \"max\"]\n",
").round(0)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
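
The mart already supplies `income_quintile`, and this diff does not show how the dbt model derives it. The sketch below is a hypothetical illustration only. It assigns quintiles 1 to 5 to a list of incomes using the standard library's `statistics.quantiles` cut points. The real model may use a different method (for example SQL `NTILE(5)`), so the grouping in the "Income Quintile Distribution" cell remains the authority. `assign_quintiles` is an illustrative name.

```python
import bisect
import statistics


def assign_quintiles(incomes: list[float]) -> list[int]:
    """Return a quintile (1 = lowest, 5 = highest) for each income, in input order.

    Hypothetical helper; cut points come from statistics.quantiles (default
    'exclusive' method), which is only one of several reasonable choices.
    """
    # Four cut points split the distribution into five groups.
    cuts = statistics.quantiles(incomes, n=5)
    # bisect_right counts how many cut points lie at or below each income.
    return [bisect.bisect_right(cuts, income) + 1 for income in incomes]


# Hypothetical median household incomes, for illustration only.
incomes = [
    40_000, 55_000, 70_000, 85_000, 100_000,
    120_000, 150_000, 200_000, 250_000, 300_000,
]
quintiles = assign_quintiles(incomes)
print(quintiles)  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Quantile-based cut points put roughly equal counts in each group, which is what makes the `count` column of the quintile table roughly uniform. Ties and small samples can make the groups uneven.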

@@ -0,0 +1,169 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Population Density Bar Chart\n",
"\n",
"Shows population density (people per sq km) across Toronto neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_demographics` | neighbourhood × year | population_density, population, land_area_sqkm |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" population_density,\n",
" population,\n",
" land_area_sqkm,\n",
" median_household_income,\n",
" pct_renter_occupied\n",
"FROM public_marts.mart_neighbourhood_demographics\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_demographics)\n",
" AND population_density IS NOT NULL\n",
"ORDER BY population_density DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Sort by population density\n",
"2. Select top 20 most dense neighbourhoods"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = df.head(20).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[[\"neighbourhood_name\", \"population_density\", \"population\", \"land_area_sqkm\"]].head(\n",
" 10\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_horizontal_bar` from `portfolio_app.figures.toronto.bar_charts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.bar_charts import create_horizontal_bar\n",
"\n",
"fig = create_horizontal_bar(\n",
" data=data,\n",
" name_column=\"neighbourhood_name\",\n",
" value_column=\"population_density\",\n",
" title=\"Top 20 Most Dense Neighbourhoods\",\n",
" color=\"#9C27B0\",\n",
" value_format=\",.0f\",\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Density Statistics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"City-wide Statistics:\")\n",
"print(f\" Total Population: {df['population'].sum():,.0f}\")\n",
"print(f\" Total Area: {df['land_area_sqkm'].sum():,.1f} sq km\")\n",
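"print(f\" Overall Density: {df['population'].sum() / df['land_area_sqkm'].sum():,.0f} per sq km\")\n",
"print(f\" Mean Neighbourhood Density: {df['population_density'].mean():,.0f} per sq km\")\n",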
"print(f\" Max Density: {df['population_density'].max():,.0f} per sq km\")\n",
"print(f\" Min Density: {df['population_density'].min():,.0f} per sq km\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
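
The density statistics cell averages `population_density` across neighbourhoods with `.mean()`. That unweighted mean generally differs from the city's overall density (total population divided by total land area), because small, dense neighbourhoods count as much as large, sparse ones. The minimal sketch below uses hypothetical figures and plain Python on the same list-of-dicts shape to show how far apart the two can be. It recomputes each neighbourhood's density from population and land area, which is assumed to match the mart's `population_density` column.

```python
def density_summary(records: list[dict[str, float]]) -> dict[str, float]:
    """Compare overall city density with the unweighted mean of neighbourhood densities."""
    total_pop = sum(r["population"] for r in records)
    total_area = sum(r["land_area_sqkm"] for r in records)
    densities = [r["population"] / r["land_area_sqkm"] for r in records]
    return {
        # People per sq km across the whole area: large neighbourhoods count more.
        "overall_density": total_pop / total_area,
        # Every neighbourhood counts equally, regardless of size.
        "mean_neighbourhood_density": sum(densities) / len(densities),
    }


# Hypothetical neighbourhoods: one small and dense, one large and sparse.
records = [
    {"population": 20_000, "land_area_sqkm": 2.0},  # 10,000 per sq km
    {"population": 10_000, "land_area_sqkm": 10.0},  # 1,000 per sq km
]
print(density_summary(records))
# {'overall_density': 2500.0, 'mean_neighbourhood_density': 5500.0}
```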

@@ -0,0 +1,187 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Affordability Index Choropleth Map\n",
"\n",
"Displays housing affordability across Toronto's 158 neighbourhoods. Index of 100 = city average."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_housing` | neighbourhood × year | affordability_index, rent_to_income_pct, avg_rent_2bed, geometry |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_id,\n",
" neighbourhood_name,\n",
" geometry,\n",
" year,\n",
" affordability_index,\n",
" rent_to_income_pct,\n",
" avg_rent_2bed,\n",
" median_household_income,\n",
" is_affordable\n",
"FROM public_marts.mart_neighbourhood_housing\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_housing)\n",
"ORDER BY affordability_index ASC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent year\n",
"2. Convert geometry to GeoJSON\n",
"3. Keep the index unchanged (lower = more affordable); the reversed `RdYlGn_r` color scale renders affordable areas green"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import geopandas as gpd\n",
"\n",
"gdf = gpd.GeoDataFrame(\n",
" df, geometry=gpd.GeoSeries.from_wkb(df[\"geometry\"]), crs=\"EPSG:4326\"\n",
")\n",
"\n",
"geojson = json.loads(gdf.to_json())\n",
"data = df.drop(columns=[\"geometry\"]).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"affordability_index\",\n",
" \"rent_to_income_pct\",\n",
" \"avg_rent_2bed\",\n",
" \"is_affordable\",\n",
" ]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_choropleth_figure` from `portfolio_app.figures.toronto.choropleth`.\n",
"\n",
"**Key Parameters:**\n",
"- `color_column`: 'affordability_index'\n",
"- `color_scale`: 'RdYlGn_r' (reversed: green=affordable, red=expensive)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.choropleth import create_choropleth_figure\n",
"\n",
"fig = create_choropleth_figure(\n",
" geojson=geojson,\n",
" data=data,\n",
" location_key=\"neighbourhood_id\",\n",
" color_column=\"affordability_index\",\n",
" hover_data=[\"neighbourhood_name\", \"rent_to_income_pct\", \"avg_rent_2bed\"],\n",
" color_scale=\"RdYlGn_r\", # Reversed: lower index (affordable) = green\n",
" title=\"Toronto Housing Affordability Index\",\n",
" zoom=10,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Index Interpretation\n",
"\n",
"| Index | Meaning |\n",
"|-------|--------|\n",
"| < 100 | More affordable than city average |\n",
"| = 100 | City average affordability |\n",
"| > 100 | Less affordable than city average |\n",
"\n",
"Affordability calculated as: `rent_to_income_pct / city_avg_rent_to_income * 100`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
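
The affordability notebook gives the index as `rent_to_income_pct / city_avg_rent_to_income * 100` but does not say how the city average is taken. The sketch below assumes a simple unweighted mean of neighbourhood `rent_to_income_pct` values. That is an assumption: the dbt model may weight by households or use a median. The sketch then maps each index to the three bands of the Index Interpretation table. `affordability_index` and `interpret` are hypothetical helper names.

```python
def affordability_index(rent_to_income: list[float]) -> list[float]:
    """Index each rent-to-income ratio against the mean ratio (100 = city average).

    Assumes the city average is an unweighted mean of neighbourhood ratios.
    """
    city_avg = sum(rent_to_income) / len(rent_to_income)
    return [pct / city_avg * 100 for pct in rent_to_income]


def interpret(index: float) -> str:
    """Map an index value to the bands in the Index Interpretation table."""
    if index < 100:
        return "More affordable than city average"
    if index > 100:
        return "Less affordable than city average"
    return "City average affordability"


# Hypothetical rent-to-income percentages for three neighbourhoods.
ratios = [30.0, 40.0, 50.0]
for idx in affordability_index(ratios):
    print(round(idx, 1), interpret(idx))
# 75.0 More affordable than city average
# 100.0 City average affordability
# 125.0 Less affordable than city average
```

With real floating-point data, an index is rarely exactly 100. The middle band is therefore mostly theoretical unless values are rounded before classification.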

@@ -0,0 +1,200 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Rent Trend Line Chart\n",
"\n",
"Shows 5-year rental price trends across Toronto neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_housing` | neighbourhood × year | year, avg_rent_2bed, rent_yoy_change_pct |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"# City-wide average rent by year\n",
"query = \"\"\"\n",
"SELECT\n",
" year,\n",
" AVG(avg_rent_bachelor) as avg_rent_bachelor,\n",
" AVG(avg_rent_1bed) as avg_rent_1bed,\n",
" AVG(avg_rent_2bed) as avg_rent_2bed,\n",
" AVG(avg_rent_3bed) as avg_rent_3bed,\n",
" AVG(rent_yoy_change_pct) as avg_yoy_change\n",
"FROM public_marts.mart_neighbourhood_housing\n",
"WHERE year >= (SELECT MAX(year) - 4 FROM public_marts.mart_neighbourhood_housing)  -- 5 years inclusive\n",
"GROUP BY year\n",
"ORDER BY year\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} years of rent data\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Aggregate rent by year (city-wide average)\n",
"2. Convert year to datetime for proper x-axis\n",
"3. Reshape for multi-line chart by bedroom type"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create date column from year\n",
"df[\"date\"] = pd.to_datetime(df[\"year\"].astype(str) + \"-01-01\")\n",
"\n",
"# Melt for multi-line chart\n",
"df_melted = df.melt(\n",
" id_vars=[\"year\", \"date\"],\n",
" value_vars=[\"avg_rent_bachelor\", \"avg_rent_1bed\", \"avg_rent_2bed\", \"avg_rent_3bed\"],\n",
" var_name=\"bedroom_type\",\n",
" value_name=\"avg_rent\",\n",
")\n",
"\n",
"# Clean labels\n",
"df_melted[\"bedroom_type\"] = df_melted[\"bedroom_type\"].map(\n",
" {\n",
" \"avg_rent_bachelor\": \"Bachelor\",\n",
" \"avg_rent_1bed\": \"1 Bedroom\",\n",
" \"avg_rent_2bed\": \"2 Bedroom\",\n",
" \"avg_rent_3bed\": \"3 Bedroom\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\n",
" \"year\",\n",
" \"avg_rent_bachelor\",\n",
" \"avg_rent_1bed\",\n",
" \"avg_rent_2bed\",\n",
" \"avg_rent_3bed\",\n",
" \"avg_yoy_change\",\n",
" ]\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_price_time_series` from `portfolio_app.figures.toronto.time_series`.\n",
"\n",
"**Key Parameters:**\n",
"- `date_column`: 'date'\n",
"- `price_column`: 'avg_rent'\n",
"- `group_column`: 'bedroom_type' (for multi-line)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.time_series import create_price_time_series\n",
"\n",
"data = df_melted.to_dict(\"records\")\n",
"\n",
"fig = create_price_time_series(\n",
" data=data,\n",
" date_column=\"date\",\n",
" price_column=\"avg_rent\",\n",
" group_column=\"bedroom_type\",\n",
" title=\"Toronto Average Rent Trend (5 Years)\",\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### YoY Change Analysis"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show year-over-year changes\n",
"print(\"Year-over-Year Rent Change (%)\")\n",
"df[[\"year\", \"avg_yoy_change\"]].dropna()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
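
Two details of a multi-year rent trend are easy to get wrong:

- An inclusive window of *n* years starts at `max_year - (n - 1)`, not `max_year - n`.
- A year-over-year change needs the previous year's value.

The standard-library sketch below illustrates both on hypothetical city-average 2-bedroom rents. The helper names are illustrative. The mart's own `rent_yoy_change_pct` may be computed differently, for example per neighbourhood before averaging, which is what the notebook's `AVG(rent_yoy_change_pct)` aggregates.

```python
def last_n_years(rows: list[dict], n: int = 5) -> list[dict]:
    """Keep rows whose year falls in the n most recent years (inclusive window)."""
    max_year = max(r["year"] for r in rows)
    # max_year - (n - 1) .. max_year spans exactly n calendar years.
    return [r for r in rows if r["year"] >= max_year - (n - 1)]


def yoy_change_pct(rows: list[dict], column: str) -> dict[int, float]:
    """Percent change of `column` versus the previous year, keyed by year."""
    ordered = sorted(rows, key=lambda r: r["year"])
    changes = {}
    for prev, curr in zip(ordered, ordered[1:]):
        # Only compare consecutive years; skip gaps in the series.
        if curr["year"] - prev["year"] == 1:
            changes[curr["year"]] = round(
                (curr[column] - prev[column]) / prev[column] * 100, 2
            )
    return changes


# Hypothetical city-average 2-bedroom rents, for illustration only.
history = [
    (2018, 1800), (2019, 1900), (2020, 2000), (2021, 2100),
    (2022, 2310), (2023, 2400), (2024, 2500),
]
rows = [{"year": year, "avg_rent_2bed": rent} for year, rent in history]
window = last_n_years(rows, n=5)
print([r["year"] for r in window])  # [2020, 2021, 2022, 2023, 2024]
print(yoy_change_pct(window, "avg_rent_2bed"))
# {2021: 5.0, 2022: 10.0, 2023: 3.9, 2024: 4.17}
```

The first year of the window has no change value because its predecessor was filtered out. This mirrors why the notebook calls `.dropna()` on `avg_yoy_change`.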

@@ -0,0 +1,202 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Housing Tenure Breakdown Bar Chart\n",
"\n",
"Shows the distribution of owner-occupied vs renter-occupied dwellings across neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_housing` | neighbourhood × year | pct_owner_occupied, pct_renter_occupied, income_quintile |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" pct_owner_occupied,\n",
" pct_renter_occupied,\n",
" income_quintile,\n",
" total_rental_units,\n",
" average_dwelling_value\n",
"FROM public_marts.mart_neighbourhood_housing\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_housing WHERE pct_owner_occupied IS NOT NULL)\n",
" AND pct_owner_occupied IS NOT NULL\n",
"ORDER BY pct_renter_occupied DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods with tenure data\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent year with tenure data\n",
"2. Melt owner/renter columns for stacked bar\n",
"3. Sort by renter percentage (highest first)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prepare for stacked bar\n",
"df_stacked = df.melt(\n",
" id_vars=[\"neighbourhood_name\", \"income_quintile\"],\n",
" value_vars=[\"pct_owner_occupied\", \"pct_renter_occupied\"],\n",
" var_name=\"tenure_type\",\n",
" value_name=\"percentage\",\n",
")\n",
"\n",
"df_stacked[\"tenure_type\"] = df_stacked[\"tenure_type\"].map(\n",
" {\"pct_owner_occupied\": \"Owner\", \"pct_renter_occupied\": \"Renter\"}\n",
")\n",
"\n",
"data = df_stacked.to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Highest Renter Neighbourhoods:\")\n",
"df[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"pct_renter_occupied\",\n",
" \"pct_owner_occupied\",\n",
" \"income_quintile\",\n",
" ]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_stacked_bar` from `portfolio_app.figures.toronto.bar_charts`.\n",
"\n",
"**Key Parameters:**\n",
"- `x_column`: 'neighbourhood_name'\n",
"- `value_column`: 'percentage'\n",
"- `category_column`: 'tenure_type'\n",
"- `show_percentages`: True"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.bar_charts import create_stacked_bar\n",
"\n",
"# Show top 20 by renter percentage\n",
"top_20_names = df.head(20)[\"neighbourhood_name\"].tolist()\n",
"data_filtered = [d for d in data if d[\"neighbourhood_name\"] in top_20_names]\n",
"\n",
"fig = create_stacked_bar(\n",
" data=data_filtered,\n",
" x_column=\"neighbourhood_name\",\n",
" value_column=\"percentage\",\n",
" category_column=\"tenure_type\",\n",
" title=\"Housing Tenure Mix - Top 20 Renter Neighbourhoods\",\n",
" color_map={\"Owner\": \"#4CAF50\", \"Renter\": \"#2196F3\"},\n",
" show_percentages=True,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### City-Wide Distribution"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# City-wide averages\n",
"print(f\"City Average Owner-Occupied: {df['pct_owner_occupied'].mean():.1f}%\")\n",
"print(f\"City Average Renter-Occupied: {df['pct_renter_occupied'].mean():.1f}%\")\n",
"\n",
"# By income quintile\n",
"print(\"\\nTenure by Income Quintile:\")\n",
"df.groupby(\"income_quintile\")[\n",
" [\"pct_owner_occupied\", \"pct_renter_occupied\"]\n",
"].mean().round(1)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
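
The tenure notebook's stacked bar needs one row per neighbourhood and tenure type, which the notebook builds with `df.melt`. The sketch below shows the equivalent wide-to-long reshape on plain records; `to_long` is a hypothetical name.

Two differences from the notebook are worth noting:

- Row order differs from `melt`, which lists all Owner rows before all Renter rows. That should not matter if the factory groups by category.
- `shares_off` adds a sanity check the notebook does not perform. It flags neighbourhoods whose owner and renter shares do not add up to roughly 100%. Its 1-point tolerance is an arbitrary assumption.

```python
TENURE_LABELS = {"pct_owner_occupied": "Owner", "pct_renter_occupied": "Renter"}


def to_long(records: list[dict]) -> list[dict]:
    """Reshape wide tenure records into one row per (neighbourhood, tenure type)."""
    rows = []
    for rec in records:
        for column, label in TENURE_LABELS.items():
            rows.append(
                {
                    "neighbourhood_name": rec["neighbourhood_name"],
                    "income_quintile": rec["income_quintile"],
                    "tenure_type": label,
                    "percentage": rec[column],
                }
            )
    return rows


def shares_off(records: list[dict], tolerance: float = 1.0) -> list[str]:
    """Names whose owner + renter share differs from 100 by more than tolerance points."""
    return [
        rec["neighbourhood_name"]
        for rec in records
        if abs(sum(rec[c] for c in TENURE_LABELS) - 100) > tolerance
    ]


# Hypothetical tenure shares, for illustration only.
wide = [
    {"neighbourhood_name": "A", "income_quintile": 2,
     "pct_owner_occupied": 35.0, "pct_renter_occupied": 65.0},
    {"neighbourhood_name": "B", "income_quintile": 5,
     "pct_owner_occupied": 80.0, "pct_renter_occupied": 15.0},
]
long_rows = to_long(wide)
print(len(long_rows))  # 4
print(shares_off(wide))  # ['B']
```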

@@ -0,0 +1,196 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Income vs Safety Scatter Plot\n",
"\n",
"Explores the correlation between median household income and safety score across Toronto neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_overview` | neighbourhood × year | neighbourhood_name, median_household_income, safety_score, population |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" median_household_income,\n",
" safety_score,\n",
" population,\n",
" livability_score,\n",
" crime_rate_per_100k\n",
"FROM public_marts.mart_neighbourhood_overview\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_overview)\n",
" AND median_household_income IS NOT NULL\n",
" AND safety_score IS NOT NULL\n",
"ORDER BY median_household_income DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods with income and safety data\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter out null values for income and safety\n",
"2. Scale income to thousands ($K) for axis readability\n",
"3. Pass to scatter figure factory with optional trendline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Scale income to thousands for better axis readability\n",
"df[\"income_thousands\"] = df[\"median_household_income\"] / 1000\n",
"\n",
"# Prepare data for figure factory\n",
"data = df.to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"median_household_income\",\n",
" \"safety_score\",\n",
" \"crime_rate_per_100k\",\n",
" ]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_scatter_figure` from `portfolio_app.figures.toronto.scatter`.\n",
"\n",
"**Key Parameters:**\n",
"- `x_column`: 'income_thousands' (median household income in $K)\n",
"- `y_column`: 'safety_score' (0-100 percentile rank)\n",
"- `name_column`: 'neighbourhood_name' (hover label)\n",
"- `size_column`: 'population' (optional, bubble size)\n",
"- `trendline`: True (adds OLS regression line)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.scatter import create_scatter_figure\n",
"\n",
"fig = create_scatter_figure(\n",
" data=data,\n",
" x_column=\"income_thousands\",\n",
" y_column=\"safety_score\",\n",
" name_column=\"neighbourhood_name\",\n",
" size_column=\"population\",\n",
" title=\"Income vs Safety by Neighbourhood\",\n",
" x_title=\"Median Household Income ($K)\",\n",
" y_title=\"Safety Score (0-100)\",\n",
" trendline=True,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"\n",
"This scatter plot reveals the relationship between income and safety:\n",
"\n",
"- **Direction**: a positive correlation means higher-income neighbourhoods tend to have higher safety scores (see the coefficient computed below)\n",
"- **Bubble size**: Represents population (larger = more people)\n",
"- **Trendline**: Orange dashed line shows the overall trend\n",
"- **Outliers**: Neighbourhoods far from the trendline are interesting cases\n",
" - Above line: Safer than income would predict\n",
" - Below line: Less safe than income would predict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Calculate correlation coefficient\n",
"correlation = df[\"median_household_income\"].corr(df[\"safety_score\"])\n",
"print(f\"Correlation coefficient (Income vs Safety): {correlation:.3f}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Livability Score Choropleth Map\n",
"\n",
"Displays neighbourhood livability scores on an interactive map of Toronto's 158 neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_overview` | neighbourhood × year | livability_score, safety_score, affordability_score, amenity_score, geometry |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_id,\n",
" neighbourhood_name,\n",
" geometry,\n",
" year,\n",
" livability_score,\n",
" safety_score,\n",
" affordability_score,\n",
" amenity_score,\n",
" population,\n",
" median_household_income\n",
"FROM public_marts.mart_neighbourhood_overview\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_overview)\n",
"ORDER BY livability_score DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent year of data\n",
"2. Extract GeoJSON from PostGIS geometry column\n",
"3. Pass to choropleth figure factory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Transform geometry to GeoJSON\n",
"import json\n",
"\n",
"import geopandas as gpd\n",
"\n",
"# Convert WKB geometry to GeoDataFrame\n",
"gdf = gpd.GeoDataFrame(\n",
" df, geometry=gpd.GeoSeries.from_wkb(df[\"geometry\"]), crs=\"EPSG:4326\"\n",
")\n",
"\n",
"# Create GeoJSON FeatureCollection\n",
"geojson = json.loads(gdf.to_json())\n",
"\n",
"# Prepare data for figure factory\n",
"data = df.drop(columns=[\"geometry\"]).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"livability_score\",\n",
" \"safety_score\",\n",
" \"affordability_score\",\n",
" \"amenity_score\",\n",
" ]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_choropleth_figure` from `portfolio_app.figures.toronto.choropleth`.\n",
"\n",
"**Key Parameters:**\n",
"- `geojson`: GeoJSON FeatureCollection with neighbourhood boundaries\n",
"- `data`: List of dicts with neighbourhood_id and scores\n",
"- `location_key`: 'neighbourhood_id'\n",
"- `color_column`: 'livability_score' (or safety_score, etc.)\n",
"- `color_scale`: 'RdYlGn' (red=low, yellow=mid, green=high)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.choropleth import create_choropleth_figure\n",
"\n",
"fig = create_choropleth_figure(\n",
" geojson=geojson,\n",
" data=data,\n",
" location_key=\"neighbourhood_id\",\n",
" color_column=\"livability_score\",\n",
" hover_data=[\n",
" \"neighbourhood_name\",\n",
" \"safety_score\",\n",
" \"affordability_score\",\n",
" \"amenity_score\",\n",
" ],\n",
" color_scale=\"RdYlGn\",\n",
" title=\"Toronto Neighbourhood Livability Score\",\n",
" zoom=10,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Score Components\n",
"\n",
"The livability score is a weighted composite:\n",
"\n",
"| Component | Weight | Source |\n",
"|-----------|--------|--------|\n",
"| Safety | 30% | Inverse of crime rate per 100K |\n",
"| Affordability | 40% | Inverse of rent-to-income ratio |\n",
"| Amenities | 30% | Amenities per 1,000 residents |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
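
The livability notebook's Score Components table gives the composite weights: safety 30%, affordability 40%, amenities 30%. The sketch below assumes each component is already on a common 0 to 100 scale. The table implies this, but the diff does not confirm it. Under that assumption, the composite is a weighted average. Integer percentage weights keep the arithmetic exact. `livability_score` here is a hypothetical helper, not the dbt model.

```python
# Weights from the Score Components table, in percent.
WEIGHTS = {"safety_score": 30, "affordability_score": 40, "amenity_score": 30}


def livability_score(record: dict[str, float], weights: dict[str, int] = WEIGHTS) -> float:
    """Weighted composite of component scores, assumed to share a 0-100 scale."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(record[col] * w for col, w in weights.items()) / 100


# Hypothetical component scores for one neighbourhood.
example = {"safety_score": 80.0, "affordability_score": 50.0, "amenity_score": 70.0}
print(livability_score(example))  # 65.0
```

Validating that the weights sum to 100 catches a common slip when the weighting is tuned later.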

@@ -0,0 +1,173 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Top & Bottom 10 Neighbourhoods Bar Chart\n",
"\n",
"Horizontal bar chart showing the highest and lowest scoring neighbourhoods by livability."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_overview` | neighbourhood × year | neighbourhood_name, livability_score |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" livability_score,\n",
" safety_score,\n",
" affordability_score,\n",
" amenity_score\n",
"FROM public_marts.mart_neighbourhood_overview\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_overview)\n",
" AND livability_score IS NOT NULL\n",
"ORDER BY livability_score DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods with scores\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Sort by livability_score descending (done in the SQL query)\n",
"2. Pass all rows to the ranking bar figure factory\n",
"3. The factory selects the top 10 and bottom 10 internally"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The figure factory handles top/bottom selection internally\n",
"# Just prepare as list of dicts\n",
"data = df.to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Top 5:\")\n",
"display(df.head(5))\n",
"print(\"\\nBottom 5:\")\n",
"display(df.tail(5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_ranking_bar` from `portfolio_app.figures.toronto.bar_charts`.\n",
"\n",
"**Key Parameters:**\n",
"- `data`: List of dicts with all neighbourhoods\n",
"- `name_column`: 'neighbourhood_name'\n",
"- `value_column`: 'livability_score'\n",
"- `top_n`: 10 (green bars)\n",
"- `bottom_n`: 10 (red bars)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.bar_charts import create_ranking_bar\n",
"\n",
"fig = create_ranking_bar(\n",
" data=data,\n",
" name_column=\"neighbourhood_name\",\n",
" value_column=\"livability_score\",\n",
" title=\"Top & Bottom 10 Neighbourhoods by Livability\",\n",
" top_n=10,\n",
" bottom_n=10,\n",
" color_top=\"#4CAF50\", # Green for top performers\n",
" color_bottom=\"#F44336\", # Red for bottom performers\n",
" value_format=\".1f\",\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"\n",
"- **Green bars**: Highest livability scores (best combination of safety, affordability, and amenities)\n",
"- **Red bars**: Lowest livability scores (areas that may need targeted investment)\n",
"\n",
"The ranking bar chart provides quick context for which neighbourhoods stand out at either extreme."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
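`create_ranking_bar` sorts the data in descending order and then takes `df.head(top_n)` and `df.tail(bottom_n)`. The stdlib-only sketch below mirrors that selection. The helper name `split_top_bottom` is hypothetical. The sketch also exposes an edge case: when fewer than `top_n + bottom_n` rows are passed, the same neighbourhood lands in both groups, because head/tail slicing does not partition the rows.

```python
from typing import Any


def split_top_bottom(
    rows: list[dict[str, Any]], value_key: str, top_n: int = 10, bottom_n: int = 10
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Mirror the head/tail selection: sort descending, then slice both ends."""
    ordered = sorted(rows, key=lambda row: row[value_key], reverse=True)
    top = ordered[:top_n]
    # Guard bottom_n == 0: ordered[-0:] would return the whole list,
    # whereas pandas' tail(0) returns no rows.
    bottom = ordered[-bottom_n:] if bottom_n > 0 else []
    return top, bottom


# Placeholder rows: five neighbourhoods with scores 0..4.
rows = [{"neighbourhood_name": f"N{i}", "livability_score": float(i)} for i in range(5)]
top, bottom = split_top_bottom(rows, "livability_score", top_n=3, bottom_n=3)
overlap = {r["neighbourhood_name"] for r in top} & {r["neighbourhood_name"] for r in bottom}
print(sorted(overlap))  # ['N2']
```

With the 158 neighbourhoods in the mart, 10 + 10 never overlap. The overlap only appears if the query is filtered down to a small subset.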


@@ -0,0 +1,200 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Crime Type Breakdown Bar Chart\n",
"\n",
"Stacked bar chart showing crime composition by Major Crime Indicator (MCI) categories."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_safety` | neighbourhood × year | assault_count, auto_theft_count, break_enter_count, robbery_count, etc. |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" assault_count,\n",
" auto_theft_count,\n",
" break_enter_count,\n",
" robbery_count,\n",
" theft_over_count,\n",
" homicide_count,\n",
" total_incidents,\n",
" crime_rate_per_100k\n",
"FROM public_marts.mart_neighbourhood_safety\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_safety)\n",
"ORDER BY total_incidents DESC\n",
"LIMIT 15\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded top {len(df)} neighbourhoods by crime volume\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Select top 15 neighbourhoods by total incidents\n",
"2. Melt crime type columns into rows\n",
"3. Pass to stacked bar figure factory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_melted = df.melt(\n",
" id_vars=[\"neighbourhood_name\", \"total_incidents\"],\n",
" value_vars=[\n",
" \"assault_count\",\n",
" \"auto_theft_count\",\n",
" \"break_enter_count\",\n",
" \"robbery_count\",\n",
" \"theft_over_count\",\n",
" \"homicide_count\",\n",
" ],\n",
" var_name=\"crime_type\",\n",
" value_name=\"count\",\n",
")\n",
"\n",
"# Clean labels\n",
"df_melted[\"crime_type\"] = (\n",
" df_melted[\"crime_type\"].str.replace(\"_count\", \"\").str.replace(\"_\", \" \").str.title()\n",
")\n",
"\n",
"data = df_melted.to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"assault_count\",\n",
" \"auto_theft_count\",\n",
" \"break_enter_count\",\n",
" \"total_incidents\",\n",
" ]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_stacked_bar` from `portfolio_app.figures.toronto.bar_charts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.bar_charts import create_stacked_bar\n",
"\n",
"fig = create_stacked_bar(\n",
" data=data,\n",
" x_column=\"neighbourhood_name\",\n",
" value_column=\"count\",\n",
" category_column=\"crime_type\",\n",
" title=\"Crime Type Breakdown - Top 15 Neighbourhoods\",\n",
" color_map={\n",
" \"Assault\": \"#d62728\",\n",
" \"Auto Theft\": \"#ff7f0e\",\n",
" \"Break Enter\": \"#9467bd\",\n",
" \"Robbery\": \"#8c564b\",\n",
" \"Theft Over\": \"#e377c2\",\n",
" \"Homicide\": \"#1f77b4\",\n",
" },\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MCI Categories\n",
"\n",
"| Category | Description |\n",
"|----------|------------|\n",
"| Assault | Physical attacks |\n",
"| Auto Theft | Vehicle theft |\n",
"| Break & Enter | Burglary |\n",
"| Robbery | Theft with force/threat |\n",
"| Theft Over | Theft > $5,000 |\n",
"| Homicide | Murder/manslaughter (reported separately; not one of the five MCIs) |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
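The `melt` step in the crime-type notebook reshapes one wide row per neighbourhood into one row per (neighbourhood, crime type) pair. The label cleanup then turns `break_enter_count` into `Break Enter`, which is the key the `color_map` expects. The MCI table instead spells it "Break & Enter". The dependency-free sketch below shows the same reshaping, emitting records in the same column-major order as `DataFrame.melt`. The helpers `melt_crime_counts` and `clean_label` are hypothetical, and the counts in the usage line are placeholders, not real data.

```python
from typing import Any

CRIME_COLUMNS = [
    "assault_count",
    "auto_theft_count",
    "break_enter_count",
    "robbery_count",
    "theft_over_count",
    "homicide_count",
]


def clean_label(column: str) -> str:
    """Same chain as the notebook: drop "_count", underscores to spaces, title case."""
    return column.replace("_count", "").replace("_", " ").title()


def melt_crime_counts(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Wide-to-long: one record per (neighbourhood, crime type)."""
    long_rows = []
    # Outer loop over value columns matches DataFrame.melt's output order.
    for column in CRIME_COLUMNS:
        for row in rows:
            long_rows.append(
                {
                    "neighbourhood_name": row["neighbourhood_name"],
                    "total_incidents": row["total_incidents"],
                    "crime_type": clean_label(column),
                    "count": row[column],
                }
            )
    return long_rows


sample = [
    {
        "neighbourhood_name": "Example",
        "total_incidents": 21,
        "assault_count": 6,
        "auto_theft_count": 5,
        "break_enter_count": 4,
        "robbery_count": 3,
        "theft_over_count": 2,
        "homicide_count": 1,
    }
]
records = melt_crime_counts(sample)
print([r["crime_type"] for r in records])
```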


@@ -0,0 +1,185 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Crime Rate Choropleth Map\n",
"\n",
"Displays crime rates per 100,000 population across Toronto's 158 neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_safety` | neighbourhood × year | crime_rate_per_100k, crime_index, safety_tier, geometry |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_id,\n",
" neighbourhood_name,\n",
" geometry,\n",
" year,\n",
" crime_rate_per_100k,\n",
" crime_index,\n",
" safety_tier,\n",
" total_incidents,\n",
" population\n",
"FROM public_marts.mart_neighbourhood_safety\n",
"WHERE year = (SELECT MAX(year) FROM public_marts.mart_neighbourhood_safety)\n",
"ORDER BY crime_rate_per_100k DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter to most recent year\n",
"2. Convert geometry to GeoJSON\n",
"3. Use reversed color scale (green=low crime, red=high crime)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import geopandas as gpd\n",
"\n",
"gdf = gpd.GeoDataFrame(\n",
" df, geometry=gpd.GeoSeries.from_wkb(df[\"geometry\"]), crs=\"EPSG:4326\"\n",
")\n",
"\n",
"geojson = json.loads(gdf.to_json())\n",
"data = df.drop(columns=[\"geometry\"]).to_dict(\"records\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\n",
" [\n",
" \"neighbourhood_name\",\n",
" \"crime_rate_per_100k\",\n",
" \"crime_index\",\n",
" \"safety_tier\",\n",
" \"total_incidents\",\n",
" ]\n",
"].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_choropleth_figure` from `portfolio_app.figures.toronto.choropleth`.\n",
"\n",
"**Key Parameters:**\n",
"- `color_column`: 'crime_rate_per_100k'\n",
"- `color_scale`: 'RdYlGn_r' (red=high crime, green=low crime)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.choropleth import create_choropleth_figure\n",
"\n",
"fig = create_choropleth_figure(\n",
" geojson=geojson,\n",
" data=data,\n",
" location_key=\"neighbourhood_id\",\n",
" color_column=\"crime_rate_per_100k\",\n",
" hover_data=[\"neighbourhood_name\", \"crime_index\", \"total_incidents\"],\n",
" color_scale=\"RdYlGn_r\",\n",
" title=\"Toronto Crime Rate per 100,000 Population\",\n",
" zoom=10,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Safety Tier Interpretation\n",
"\n",
"| Tier | Meaning |\n",
"|------|--------|\n",
"| 1 | Highest crime (top 20%) |\n",
"| 2-4 | Middle tiers |\n",
"| 5 | Lowest crime (bottom 20%) |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
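The Safety Tier table describes a quintile split on crime rate, with tier 1 holding the highest-crime 20% and tier 5 the lowest. The mart SQL that computes `safety_tier` is not part of this diff, so the following is only a sketch of one way such tiers could be assigned. It uses NTILE-style bucketing, as SQL `NTILE(5) OVER (ORDER BY crime_rate_per_100k DESC)` would do. `assign_safety_tiers` is hypothetical, and the mart's tie handling and bucketing rule may differ.

```python
def assign_safety_tiers(rates: dict[str, float], tiers: int = 5) -> dict[str, int]:
    """NTILE-style bucketing: tier 1 = highest crime rate, tier `tiers` = lowest."""
    ordered = sorted(rates, key=rates.__getitem__, reverse=True)
    base, extra = divmod(len(ordered), tiers)
    result: dict[str, int] = {}
    start = 0
    for tier in range(1, tiers + 1):
        # As with SQL NTILE, the first `extra` buckets get one additional row.
        size = base + (1 if tier <= extra else 0)
        for name in ordered[start : start + size]:
            result[name] = tier
        start += size
    return result


# Placeholder rates for ten made-up neighbourhoods: N1=100 ... N10=1000.
rates = {f"N{i}": 100.0 * i for i in range(1, 11)}
tiers = assign_safety_tiers(rates)
print(tiers["N10"], tiers["N1"])  # 1 5
```

With 158 neighbourhoods the split is 32, 32, 32, 31, 31, so each tier holds roughly (not exactly) 20%.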


@@ -0,0 +1,198 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Crime Trend Line Chart\n",
"\n",
"Shows 5-year crime rate trends across Toronto."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_safety` | neighbourhood × year | year, crime_rate_per_100k, crime_yoy_change_pct |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"from dotenv import load_dotenv\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Load .env from project root\n",
"load_dotenv(\"../../.env\")\n",
"\n",
"engine = create_engine(os.environ[\"DATABASE_URL\"])\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" year,\n",
" AVG(crime_rate_per_100k) as avg_crime_rate,\n",
" AVG(assault_rate_per_100k) as avg_assault_rate,\n",
" AVG(auto_theft_rate_per_100k) as avg_auto_theft_rate,\n",
" AVG(break_enter_rate_per_100k) as avg_break_enter_rate,\n",
" SUM(total_incidents) as total_city_incidents,\n",
" AVG(crime_yoy_change_pct) as avg_yoy_change\n",
"FROM public_marts.mart_neighbourhood_safety\n",
"WHERE year >= (SELECT MAX(year) - 4 FROM public_marts.mart_neighbourhood_safety)\n",
"GROUP BY year\n",
"ORDER BY year\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} years of crime data\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Aggregate by year: unweighted mean of neighbourhood rates, city-wide sum of incidents\n",
"2. Convert year to datetime\n",
"3. Melt for multi-line by crime type"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[\"date\"] = pd.to_datetime(df[\"year\"].astype(str) + \"-01-01\")\n",
"\n",
"# Melt for multi-line\n",
"df_melted = df.melt(\n",
" id_vars=[\"year\", \"date\"],\n",
" value_vars=[\"avg_assault_rate\", \"avg_auto_theft_rate\", \"avg_break_enter_rate\"],\n",
" var_name=\"crime_type\",\n",
" value_name=\"rate_per_100k\",\n",
")\n",
"\n",
"df_melted[\"crime_type\"] = df_melted[\"crime_type\"].map(\n",
" {\n",
" \"avg_assault_rate\": \"Assault\",\n",
" \"avg_auto_theft_rate\": \"Auto Theft\",\n",
" \"avg_break_enter_rate\": \"Break & Enter\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[[\"year\", \"avg_crime_rate\", \"total_city_incidents\", \"avg_yoy_change\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_price_time_series` from `portfolio_app.figures.toronto.time_series`, reused here for a non-price numeric trend."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"sys.path.insert(0, \"../..\")\n",
"\n",
"from portfolio_app.figures.toronto.time_series import create_price_time_series\n",
"\n",
"data = df_melted.to_dict(\"records\")\n",
"\n",
"fig = create_price_time_series(\n",
" data=data,\n",
" date_column=\"date\",\n",
" price_column=\"rate_per_100k\",\n",
" group_column=\"crime_type\",\n",
" title=\"Toronto Crime Trends by Type (5 Years)\",\n",
")\n",
"\n",
"# Remove dollar sign formatting since this is rate data\n",
"fig.update_layout(yaxis_tickprefix=\"\", yaxis_title=\"Rate per 100K\")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overall Trend"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Total crime rate trend\n",
"total_data = (\n",
" df[[\"date\", \"avg_crime_rate\"]]\n",
" .rename(columns={\"avg_crime_rate\": \"total_rate\"})\n",
" .to_dict(\"records\")\n",
")\n",
"\n",
"fig2 = create_price_time_series(\n",
" data=total_data,\n",
" date_column=\"date\",\n",
" price_column=\"total_rate\",\n",
" title=\"Toronto Overall Crime Rate Trend\",\n",
")\n",
"fig2.update_layout(yaxis_tickprefix=\"\", yaxis_title=\"Rate per 100K\")\n",
"fig2.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
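Two details of the crime trend query are easy to get wrong. First, `year >= MAX(year) - k` uses an inclusive bound, so it keeps k + 1 calendar years; a five-year window needs `k = 4`. Second, `AVG(crime_rate_per_100k)` across neighbourhoods is an unweighted mean of rates. A city-wide rate instead weights each neighbourhood by population (total incidents over total population). The stdlib sketch below contrasts the two, using placeholder inputs and hypothetical helper names.

```python
def years_in_window(max_year: int, k: int) -> list[int]:
    """Calendar years kept by `WHERE year >= max_year - k` (inclusive bound)."""
    return list(range(max_year - k, max_year + 1))


def mean_of_rates(rows: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-neighbourhood rates, like AVG(crime_rate_per_100k)."""
    rates = [incidents / population * 100_000 for incidents, population in rows]
    return sum(rates) / len(rates)


def pooled_rate(rows: list[tuple[int, int]]) -> float:
    """Population-weighted rate: total incidents over total population, per 100K."""
    total_incidents = sum(incidents for incidents, _ in rows)
    total_population = sum(population for _, population in rows)
    return total_incidents / total_population * 100_000


# Placeholder inputs: 2024 is an arbitrary "latest year"; rows are
# (incidents, population) pairs for two made-up neighbourhoods.
print(len(years_in_window(2024, 5)), len(years_in_window(2024, 4)))  # 6 5
rows = [(10, 1_000), (100, 100_000)]
print(mean_of_rates(rows), round(pooled_rate(rows), 1))
```

In this example the small, high-rate neighbourhood pulls the unweighted mean far above the pooled rate. The mart's `population` column would support either measure, so which one the chart shows is a design choice.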


@@ -1,5 +1,5 @@
 """Application-level callbacks for the portfolio app."""
-from . import sidebar, theme
+from . import contact, sidebar, theme
-__all__ = ["sidebar", "theme"]
+__all__ = ["contact", "sidebar", "theme"]


@@ -0,0 +1,214 @@
"""Contact form submission callback with Formspree integration."""

import re
from typing import Any

import dash_mantine_components as dmc
import requests
from dash import Input, Output, State, callback, no_update
from dash_iconify import DashIconify

FORMSPREE_ENDPOINT = "https://formspree.io/f/mqelqzpd"

EMAIL_REGEX = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def _validate_form(
    name: str | None, email: str | None, message: str | None
) -> str | None:
    """Validate form fields and return error message if invalid."""
    if not name or not name.strip():
        return "Please enter your name."

    if not email or not email.strip():
        return "Please enter your email address."

    if not EMAIL_REGEX.match(email.strip()):
        return "Please enter a valid email address."

    if not message or not message.strip():
        return "Please enter a message."

    return None


def _create_success_alert() -> dmc.Alert:
    """Create success feedback alert."""
    return dmc.Alert(
        "Thank you for your message! I'll get back to you soon.",
        title="Message Sent",
        color="green",
        variant="light",
        icon=DashIconify(icon="tabler:check", width=20),
        withCloseButton=True,
    )


def _create_error_alert(message: str) -> dmc.Alert:
    """Create error feedback alert."""
    return dmc.Alert(
        message,
        title="Error",
        color="red",
        variant="light",
        icon=DashIconify(icon="tabler:alert-circle", width=20),
        withCloseButton=True,
    )


@callback(  # type: ignore[misc]
    Output("contact-feedback", "children"),
    Output("contact-submit", "loading"),
    Output("contact-name", "value"),
    Output("contact-email", "value"),
    Output("contact-subject", "value"),
    Output("contact-message", "value"),
    Output("contact-name", "error"),
    Output("contact-email", "error"),
    Output("contact-message", "error"),
    Input("contact-submit", "n_clicks"),
    State("contact-name", "value"),
    State("contact-email", "value"),
    State("contact-subject", "value"),
    State("contact-message", "value"),
    State("contact-gotcha", "value"),
    prevent_initial_call=True,
)
def submit_contact_form(
    n_clicks: int | None,
    name: str | None,
    email: str | None,
    subject: str | None,
    message: str | None,
    gotcha: str | None,
) -> tuple[Any, ...]:
    """Submit contact form to Formspree.

    Args:
        n_clicks: Button click count.
        name: User's name.
        email: User's email address.
        subject: Message subject (optional).
        message: Message content.
        gotcha: Honeypot field value (should be empty for real users).

    Returns:
        Tuple of (feedback, loading, name, email, subject, message,
        name_error, email_error, message_error).
    """
    if not n_clicks:
        return (no_update,) * 9

    # Check honeypot - if filled, silently "succeed" (it's a bot)
    if gotcha:
        return (
            _create_success_alert(),
            False,
            "",
            "",
            None,
            "",
            None,
            None,
            None,
        )

    # Validate form
    validation_error = _validate_form(name, email, message)
    if validation_error:
        # Determine which field has the error
        name_error = "Required" if not name or not name.strip() else None
        email_error = None
        message_error = "Required" if not message or not message.strip() else None

        if not email or not email.strip():
            email_error = "Required"
        elif not EMAIL_REGEX.match(email.strip()):
            email_error = "Invalid email format"

        return (
            _create_error_alert(validation_error),
            False,
            no_update,
            no_update,
            no_update,
            no_update,
            name_error,
            email_error,
            message_error,
        )

    # Prepare form data (validation passed, so name/email/message are not None)
    assert name is not None
    assert email is not None
    assert message is not None

    form_data = {
        "name": name.strip(),
        "email": email.strip(),
        "subject": subject or "General Inquiry",
        "message": message.strip(),
        "_gotcha": "",  # Formspree honeypot
    }

    # Submit to Formspree
    try:
        response = requests.post(
            FORMSPREE_ENDPOINT,
            json=form_data,
            headers={
                "Accept": "application/json",
                "Content-Type": "application/json",
            },
            timeout=10,
        )

        if response.status_code == 200:
            # Success - clear form
            return (
                _create_success_alert(),
                False,
                "",
                "",
                None,
                "",
                None,
                None,
                None,
            )
        else:
            # Formspree returned an error
            return (
                _create_error_alert(
                    "Failed to send message. Please try again or use direct contact."
                ),
                False,
                no_update,
                no_update,
                no_update,
                no_update,
                None,
                None,
                None,
            )

    except requests.exceptions.Timeout:
        return (
            _create_error_alert("Request timed out. Please try again."),
            False,
            no_update,
            no_update,
            no_update,
            no_update,
            None,
            None,
            None,
        )
    except requests.exceptions.RequestException:
        return (
            _create_error_alert(
                "Network error. Please check your connection and try again."
            ),
            False,
            no_update,
            no_update,
            no_update,
            no_update,
            None,
            None,
            None,
        )


@@ -28,7 +28,7 @@ def create_metric_selector(
         label=label,
         data=options,
         value=default_value or (options[0]["value"] if options else None),
-        style={"width": "200px"},
+        w=200,
     )
@@ -64,7 +64,7 @@ def create_map_controls(
             id=f"{id_prefix}-layer-toggle",
             label="Show Boundaries",
             checked=True,
-            style={"marginTop": "10px"},
+            mt="sm",
         )
     )


@@ -5,7 +5,7 @@ from typing import Any
 import dash_mantine_components as dmc
 from dash import dcc
-from portfolio_app.figures.summary_cards import create_metric_card_figure
+from portfolio_app.figures.toronto.summary_cards import create_metric_card_figure
 class MetricCard:


@@ -38,7 +38,7 @@ def create_year_selector(
         label=label,
         data=options,
         value=str(default_year),
-        style={"width": "120px"},
+        w=120,
     )
@@ -83,7 +83,8 @@ def create_time_slider(
                 marks=marks,
                 step=1,
                 minRange=1,
-                style={"marginTop": "20px", "marginBottom": "10px"},
+                mt="md",
+                mb="sm",
             ),
         ],
         p="md",
@@ -131,5 +132,5 @@ def create_month_selector(
         label=label,
         data=options,
         value=str(default_month),
-        style={"width": "140px"},
+        w=140,
     )


@@ -0,0 +1,48 @@
"""Design system tokens and utilities."""

from .tokens import (
    CHART_PALETTE,
    COLOR_ACCENT,
    COLOR_NEGATIVE,
    COLOR_POSITIVE,
    COLOR_WARNING,
    GRID_COLOR,
    GRID_COLOR_DARK,
    PALETTE_COMPARISON,
    PALETTE_GENDER,
    PALETTE_TREND,
    PAPER_BG,
    PLOT_BG,
    POLICY_COLORS,
    TEXT_MUTED,
    TEXT_PRIMARY,
    TEXT_SECONDARY,
    get_colorbar_defaults,
    get_default_layout,
)

__all__ = [
    # Text colors
    "TEXT_PRIMARY",
    "TEXT_SECONDARY",
    "TEXT_MUTED",
    # Chart backgrounds
    "GRID_COLOR",
    "GRID_COLOR_DARK",
    "PAPER_BG",
    "PLOT_BG",
    # Semantic colors
    "COLOR_POSITIVE",
    "COLOR_NEGATIVE",
    "COLOR_WARNING",
    "COLOR_ACCENT",
    # Palettes
    "CHART_PALETTE",
    "PALETTE_COMPARISON",
    "PALETTE_GENDER",
    "PALETTE_TREND",
    "POLICY_COLORS",
    # Utility functions
    "get_default_layout",
    "get_colorbar_defaults",
]


@@ -0,0 +1,162 @@
"""Centralized design tokens for consistent styling across the application.

This module provides a single source of truth for colors, ensuring:
- Consistent styling across all Plotly figures and components
- Accessibility compliance (WCAG color contrast)
- Easy theme updates without hunting through multiple files

Usage:
    from portfolio_app.design import TEXT_PRIMARY, CHART_PALETTE
    fig.update_layout(font_color=TEXT_PRIMARY)
"""

from typing import Any

# =============================================================================
# TEXT COLORS (Dark Theme)
# =============================================================================

TEXT_PRIMARY = "#c9c9c9"
"""Primary text color for labels, titles, and body text."""

TEXT_SECONDARY = "#888888"
"""Secondary text color for subtitles, captions, and muted text."""

TEXT_MUTED = "#666666"
"""Muted text color for disabled states and placeholders."""

# =============================================================================
# CHART BACKGROUND & GRID
# =============================================================================

GRID_COLOR = "rgba(128, 128, 128, 0.2)"
"""Standard grid line color with transparency."""

GRID_COLOR_DARK = "rgba(128, 128, 128, 0.3)"
"""Darker grid for radar charts and polar plots."""

PAPER_BG = "rgba(0, 0, 0, 0)"
"""Transparent paper background for charts."""

PLOT_BG = "rgba(0, 0, 0, 0)"
"""Transparent plot background for charts."""

# =============================================================================
# SEMANTIC COLORS
# =============================================================================

COLOR_POSITIVE = "#40c057"
"""Positive/success indicator (Mantine green-6)."""

COLOR_NEGATIVE = "#fa5252"
"""Negative/error indicator (Mantine red-6)."""

COLOR_WARNING = "#fab005"
"""Warning indicator (Mantine yellow-6)."""

COLOR_ACCENT = "#228be6"
"""Primary accent color (Mantine blue-6)."""

# =============================================================================
# ACCESSIBLE CHART PALETTE
# =============================================================================

# Okabe-Ito palette - optimized for all color vision deficiencies
# Reference: https://jfly.uni-koeln.de/color/
CHART_PALETTE = [
    "#0072B2",  # Blue (primary data series)
    "#E69F00",  # Orange
    "#56B4E9",  # Sky blue
    "#009E73",  # Teal/green
    "#F0E442",  # Yellow
    "#D55E00",  # Vermillion
    "#CC79A7",  # Pink
    "#000000",  # Black (use sparingly)
]
"""
Accessible categorical palette (Okabe-Ito).

Distinguishable for deuteranopia, protanopia, and tritanopia.
Use indices 0-6 for most charts; index 7 (black) for emphasis only.
"""

# Semantic subsets for specific use cases
PALETTE_COMPARISON = [CHART_PALETTE[0], CHART_PALETTE[1]]
"""Two-color palette for A/B comparisons."""

PALETTE_GENDER = {
    "male": "#56B4E9",  # Sky blue
    "female": "#CC79A7",  # Pink
}
"""Gender-specific colors (accessible contrast)."""

PALETTE_TREND = {
    "positive": COLOR_POSITIVE,
    "negative": COLOR_NEGATIVE,
    "neutral": TEXT_SECONDARY,
}
"""Trend indicator colors for sparklines and deltas."""

# =============================================================================
# POLICY/EVENT MARKERS (Time Series)
# =============================================================================

POLICY_COLORS = {
    "policy_change": "#E69F00",  # Orange - policy changes
    "major_event": "#D55E00",  # Vermillion - major events
    "data_note": "#56B4E9",  # Sky blue - data annotations
    "forecast": "#009E73",  # Teal - forecast periods
    "highlight": "#F0E442",  # Yellow - highlighted regions
}
"""Colors for policy markers and event annotations on time series."""

# =============================================================================
# CHART LAYOUT DEFAULTS
# =============================================================================


def get_default_layout() -> dict[str, Any]:
    """Return default Plotly layout settings with design tokens.

    Returns:
        dict: Layout configuration for fig.update_layout()

    Example:
        fig.update_layout(**get_default_layout())
    """
    return {
        "paper_bgcolor": PAPER_BG,
        "plot_bgcolor": PLOT_BG,
        "font": {"color": TEXT_PRIMARY},
        "title": {"font": {"color": TEXT_PRIMARY}},
        "legend": {"font": {"color": TEXT_PRIMARY}},
        "xaxis": {
            "gridcolor": GRID_COLOR,
            "linecolor": GRID_COLOR,
            "tickfont": {"color": TEXT_PRIMARY},
            "title": {"font": {"color": TEXT_PRIMARY}},
        },
        "yaxis": {
            "gridcolor": GRID_COLOR,
            "linecolor": GRID_COLOR,
            "tickfont": {"color": TEXT_PRIMARY},
            "title": {"font": {"color": TEXT_PRIMARY}},
        },
    }


def get_colorbar_defaults() -> dict[str, Any]:
    """Return default colorbar settings with design tokens.

    Returns:
        dict: Colorbar configuration for choropleth/heatmap traces
    """
    return {
        "tickfont": {"color": TEXT_PRIMARY},
        "title": {"font": {"color": TEXT_PRIMARY}},
    }
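The tokens module cites WCAG color contrast. Contrast can be checked directly with the WCAG 2.x relative-luminance and contrast-ratio formulas, as in the sketch below. One assumption is stated loudly here: `PAPER_BG` and `PLOT_BG` are transparent, so the effective background is whatever the dark theme paints behind the chart. Pure black is used below only as a stand-in, and a real check should substitute the actual theme background.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a "#rrggbb" color, from 0.0 (black) to 1.0 (white)."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i : i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black vs white."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hex values copied from tokens.py; the background is an assumption.
TEXT_TOKENS = {
    "TEXT_PRIMARY": "#c9c9c9",
    "TEXT_SECONDARY": "#888888",
    "TEXT_MUTED": "#666666",
}
ASSUMED_BACKGROUND = "#000000"  # stand-in for whatever sits behind PAPER_BG

for token, color in TEXT_TOKENS.items():
    ratio = contrast_ratio(color, ASSUMED_BACKGROUND)
    # WCAG 2.x level AA asks for at least 4.5:1 for normal-size text.
    print(f"{token}: {ratio:.2f}:1, meets AA for normal text: {ratio >= 4.5}")
```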


@@ -1,29 +1,15 @@
-"""Plotly figure factories for data visualization."""
+"""Plotly figure factories for data visualization.
 
-from .choropleth import (
-    create_choropleth_figure,
-    create_zone_map,
-)
-from .summary_cards import create_metric_card_figure, create_summary_metrics
-from .time_series import (
-    add_policy_markers,
-    create_market_comparison_chart,
-    create_price_time_series,
-    create_time_series_with_events,
-    create_volume_time_series,
-)
+Figure factories are organized by dashboard domain:
+- toronto/ : Toronto Neighbourhood Dashboard figures
+
+Usage:
+    from portfolio_app.figures.toronto import create_choropleth_figure
+    from portfolio_app.figures.toronto import create_ranking_bar
+"""
+
+from . import toronto
 
 __all__ = [
-    # Choropleth
-    "create_choropleth_figure",
-    "create_zone_map",
-    # Time series
-    "create_price_time_series",
-    "create_volume_time_series",
-    "create_market_comparison_chart",
-    "create_time_series_with_events",
-    "add_policy_markers",
-    # Summary
-    "create_metric_card_figure",
-    "create_summary_metrics",
+    "toronto",
 ]


@@ -0,0 +1,61 @@
"""Plotly figure factories for Toronto dashboard visualizations."""

from .bar_charts import (
    create_horizontal_bar,
    create_ranking_bar,
    create_stacked_bar,
)
from .choropleth import (
    create_choropleth_figure,
    create_zone_map,
)
from .demographics import (
    create_age_pyramid,
    create_donut_chart,
    create_income_distribution,
)
from .radar import (
    create_comparison_radar,
    create_radar_figure,
)
from .scatter import (
    create_bubble_chart,
    create_scatter_figure,
)
from .summary_cards import create_metric_card_figure, create_summary_metrics
from .time_series import (
    add_policy_markers,
    create_market_comparison_chart,
    create_price_time_series,
    create_time_series_with_events,
    create_volume_time_series,
)

__all__ = [
    # Choropleth
    "create_choropleth_figure",
    "create_zone_map",
    # Time series
    "create_price_time_series",
    "create_volume_time_series",
    "create_market_comparison_chart",
    "create_time_series_with_events",
    "add_policy_markers",
    # Summary
    "create_metric_card_figure",
    "create_summary_metrics",
    # Bar charts
    "create_ranking_bar",
    "create_stacked_bar",
    "create_horizontal_bar",
    # Scatter plots
    "create_scatter_figure",
    "create_bubble_chart",
    # Radar charts
    "create_radar_figure",
    "create_comparison_radar",
    # Demographics
    "create_age_pyramid",
    "create_donut_chart",
    "create_income_distribution",
]


@@ -0,0 +1,249 @@
"""Bar chart figure factories for dashboard visualizations."""
from typing import Any
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from portfolio_app.design import (
CHART_PALETTE,
COLOR_NEGATIVE,
COLOR_POSITIVE,
GRID_COLOR,
    PAPER_BG,
    PLOT_BG,
    TEXT_PRIMARY,
    TEXT_SECONDARY,
)


def create_ranking_bar(
    data: list[dict[str, Any]],
    name_column: str,
    value_column: str,
    title: str | None = None,
    top_n: int = 10,
    bottom_n: int = 10,
    color_top: str = COLOR_POSITIVE,
    color_bottom: str = COLOR_NEGATIVE,
    value_format: str = ",.0f",
) -> go.Figure:
    """Create horizontal bar chart showing top and bottom rankings.

    Args:
        data: List of data records.
        name_column: Column name for labels.
        value_column: Column name for values.
        title: Optional chart title.
        top_n: Number of top items to show.
        bottom_n: Number of bottom items to show.
        color_top: Color for top performers.
        color_bottom: Color for bottom performers.
        value_format: Number format string for values.

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Rankings")

    df = pd.DataFrame(data).sort_values(value_column, ascending=False)

    # Take the top rows first, then draw the bottom rows only from what is
    # left, so a short dataset never shows the same item in both groups.
    top_df = df.head(top_n)
    bottom_df = df.iloc[len(top_df) :].tail(bottom_n)

    fig = go.Figure()

    # Add top bars
    fig.add_trace(
        go.Bar(
            y=top_df[name_column],
            x=top_df[value_column],
            orientation="h",
            marker_color=color_top,
            name="Top",
            text=top_df[value_column].apply(lambda x: f"{x:{value_format}}"),
            textposition="auto",
            hovertemplate=f"%{{y}}<br>{value_column}: %{{x:{value_format}}}<extra></extra>",
        )
    )

    # Add bottom bars
    fig.add_trace(
        go.Bar(
            y=bottom_df[name_column],
            x=bottom_df[value_column],
            orientation="h",
            marker_color=color_bottom,
            name="Bottom",
            text=bottom_df[value_column].apply(lambda x: f"{x:{value_format}}"),
            textposition="auto",
            hovertemplate=f"%{{y}}<br>{value_column}: %{{x:{value_format}}}<extra></extra>",
        )
    )

    fig.update_layout(
        title=title,
        # The two traces share no y categories, so overlay keeps every bar
        # full width; "group" would reserve an empty slot for the other trace.
        barmode="overlay",
        showlegend=True,
        legend={"orientation": "h", "yanchor": "bottom", "y": 1.02},
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"gridcolor": GRID_COLOR, "title": None},
        yaxis={"autorange": "reversed", "title": None},
        margin={"l": 10, "r": 10, "t": 40, "b": 10},
    )
    return fig


def create_stacked_bar(
    data: list[dict[str, Any]],
    x_column: str,
    value_column: str,
    category_column: str,
    title: str | None = None,
    color_map: dict[str, str] | None = None,
    show_percentages: bool = False,
) -> go.Figure:
    """Create stacked bar chart for breakdown visualizations.

    Args:
        data: List of data records.
        x_column: Column name for x-axis categories.
        value_column: Column name for values.
        category_column: Column name for stacking categories.
        title: Optional chart title.
        color_map: Mapping of category to color.
        show_percentages: Whether to normalize each x category to 100%.

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Breakdown")

    df = pd.DataFrame(data)

    if show_percentages:
        # Rescale each x category so its stacked segments sum to 100.
        # Categories whose total is zero become NaN instead of dividing by zero.
        totals = df.groupby(x_column)[value_column].transform("sum")
        df[value_column] = df[value_column] / totals.where(totals != 0) * 100

    # Default color scheme using accessible palette, cycling through it when
    # there are more categories than palette entries
    if color_map is None:
        categories = df[category_column].unique()
        color_map = {
            category: CHART_PALETTE[i % len(CHART_PALETTE)]
            for i, category in enumerate(categories)
        }

    fig = px.bar(
        df,
        x=x_column,
        y=value_column,
        color=category_column,
        color_discrete_map=color_map,
        barmode="stack",
        text=value_column if not show_percentages else None,
    )

    if show_percentages:
        fig.update_traces(texttemplate="%{y:.1f}%", textposition="inside")

    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"gridcolor": GRID_COLOR, "title": None},
        yaxis={
            "gridcolor": GRID_COLOR,
            "title": None,
            "ticksuffix": "%" if show_percentages else "",
        },
        legend={"orientation": "h", "yanchor": "bottom", "y": 1.02},
        margin={"l": 10, "r": 10, "t": 60, "b": 10},
    )
    return fig
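
The `show_percentages` option is documented as normalizing each bar to 100%, which means every segment becomes its share of the total for its x category. A minimal stdlib sketch of that arithmetic, assuming rows shaped like the `data` records; the helper `normalize_to_percent` and the sample keys `ward`, `tenure`, `count` are hypothetical and not part of `portfolio_app`, and treating a zero total as 0% is an assumption of this sketch:

```python
from collections import defaultdict
from typing import Any


def normalize_to_percent(
    rows: list[dict[str, Any]], x_key: str, value_key: str
) -> list[dict[str, Any]]:
    """Return copies of rows with value_key rescaled so each x group sums to 100."""
    # First pass: total value per x category.
    totals: defaultdict[Any, float] = defaultdict(float)
    for row in rows:
        totals[row[x_key]] += float(row[value_key])
    # Second pass: each row's share of its category total, as a percentage.
    # A category whose total is zero maps to 0.0 (assumption for this sketch).
    result = []
    for row in rows:
        total = totals[row[x_key]]
        share = float(row[value_key]) / total * 100 if total else 0.0
        result.append({**row, value_key: share})
    return result


rows = [
    {"ward": "A", "tenure": "Owner", "count": 30},
    {"ward": "A", "tenure": "Renter", "count": 10},
    {"ward": "B", "tenure": "Owner", "count": 5},
    {"ward": "B", "tenure": "Renter", "count": 15},
]
print([r["count"] for r in normalize_to_percent(rows, "ward", "count")])
# [75.0, 25.0, 25.0, 75.0]
```

The input rows are copied rather than mutated, so the same records can still feed an absolute-value chart.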


def create_horizontal_bar(
    data: list[dict[str, Any]],
    name_column: str,
    value_column: str,
    title: str | None = None,
    color: str = CHART_PALETTE[0],
    value_format: str = ",.0f",
    sort: bool = True,
) -> go.Figure:
    """Create simple horizontal bar chart.

    Args:
        data: List of data records.
        name_column: Column name for labels.
        value_column: Column name for values.
        title: Optional chart title.
        color: Bar color.
        value_format: Number format string.
        sort: Whether to sort by value descending.

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Bar Chart")

    df = pd.DataFrame(data)
    if sort:
        df = df.sort_values(value_column, ascending=True)

    fig = go.Figure(
        go.Bar(
            y=df[name_column],
            x=df[value_column],
            orientation="h",
            marker_color=color,
            text=df[value_column].apply(lambda x: f"{x:{value_format}}"),
            textposition="outside",
            hovertemplate=f"%{{y}}<br>Value: %{{x:{value_format}}}<extra></extra>",
        )
    )

    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"gridcolor": GRID_COLOR, "title": None},
        yaxis={"title": None},
        margin={"l": 10, "r": 10, "t": 40, "b": 10},
    )
    return fig


def _create_empty_figure(title: str) -> go.Figure:
    """Create an empty figure with a message."""
    fig = go.Figure()
    fig.add_annotation(
        text="No data available",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"visible": False},
        yaxis={"visible": False},
    )
    return fig
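
`create_ranking_bar` sorts descending, takes the top rows, and then takes the bottom rows. When the dataset has fewer than `top_n + bottom_n` rows, a naive `tail` would repeat items that are already in the top group. A stdlib sketch of a non-overlapping split on `(name, value)` pairs; the function name `split_rankings` and the sample data are hypothetical:

```python
def split_rankings(
    items: list[tuple[str, float]], top_n: int, bottom_n: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Split (name, value) pairs into top and bottom groups with no overlap."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    top = ranked[:top_n]
    # Only rows that are not already in the top group can be "bottom".
    remaining = ranked[len(top):]
    # Guard bottom_n == 0: remaining[-0:] would return the whole list.
    bottom = remaining[-bottom_n:] if bottom_n > 0 else []
    return top, bottom


top, bottom = split_rankings([("A", 5), ("B", 3), ("C", 9), ("D", 1)], 3, 3)
print(top)     # [('C', 9), ('A', 5), ('B', 3)]
print(bottom)  # [('D', 1)]
```

With four items and `top_n = bottom_n = 3`, a plain `tail(3)` would have returned A, B and D, repeating A and B in both groups.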

@@ -5,6 +5,13 @@ from typing import Any
 import plotly.express as px
 import plotly.graph_objects as go
+from portfolio_app.design import (
+    PAPER_BG,
+    PLOT_BG,
+    TEXT_PRIMARY,
+    TEXT_SECONDARY,
+)
 def create_choropleth_figure(
     geojson: dict[str, Any] | None,
@@ -55,9 +62,9 @@ def create_choropleth_figure(
             margin={"l": 0, "r": 0, "t": 40, "b": 0},
             title=title or "Toronto Housing Map",
             height=500,
-            paper_bgcolor="rgba(0,0,0,0)",
-            plot_bgcolor="rgba(0,0,0,0)",
-            font_color="#c9c9c9",
+            paper_bgcolor=PAPER_BG,
+            plot_bgcolor=PLOT_BG,
+            font_color=TEXT_PRIMARY,
         )
         fig.add_annotation(
             text="No geometry data available. Complete QGIS digitization to enable map.",
@@ -66,7 +73,7 @@ def create_choropleth_figure(
             x=0.5,
             y=0.5,
             showarrow=False,
-            font={"size": 14, "color": "#888888"},
+            font={"size": 14, "color": TEXT_SECONDARY},
         )
         return fig
@@ -98,17 +105,17 @@ def create_choropleth_figure(
         margin={"l": 0, "r": 0, "t": 40, "b": 0},
         title=title,
         height=500,
-        paper_bgcolor="rgba(0,0,0,0)",
-        plot_bgcolor="rgba(0,0,0,0)",
-        font_color="#c9c9c9",
+        paper_bgcolor=PAPER_BG,
+        plot_bgcolor=PLOT_BG,
+        font_color=TEXT_PRIMARY,
         coloraxis_colorbar={
             "title": {
                 "text": color_column.replace("_", " ").title(),
-                "font": {"color": "#c9c9c9"},
+                "font": {"color": TEXT_PRIMARY},
             },
             "thickness": 15,
             "len": 0.7,
-            "tickfont": {"color": "#c9c9c9"},
+            "tickfont": {"color": TEXT_PRIMARY},
         },
     )

@@ -0,0 +1,242 @@
"""Demographics-specific chart factories."""

from typing import Any

import pandas as pd
import plotly.graph_objects as go

from portfolio_app.design import (
    CHART_PALETTE,
    GRID_COLOR,
    PALETTE_GENDER,
    PAPER_BG,
    PLOT_BG,
    TEXT_PRIMARY,
    TEXT_SECONDARY,
)


def create_age_pyramid(
    data: list[dict[str, Any]],
    age_groups: list[str],
    male_column: str = "male",
    female_column: str = "female",
    title: str | None = None,
) -> go.Figure:
    """Create population pyramid by age and gender.

    Args:
        data: List with one record per age group containing male/female counts.
        age_groups: List of age group labels in order (youngest to oldest).
        male_column: Column name for male population.
        female_column: Column name for female population.
        title: Optional chart title.

    Returns:
        Plotly Figure object.
    """
    if not data or not age_groups:
        return _create_empty_figure(title or "Age Distribution")

    df = pd.DataFrame(data)

    # Align rows with age_groups so each bar lands on its own label: drop
    # unknown groups and insert empty rows for groups missing from the data.
    if "age_group" in df.columns:
        df = (
            df[df["age_group"].isin(age_groups)]
            .set_index("age_group")
            .reindex(age_groups)
        )

    male_values = (
        df[male_column].fillna(0).tolist() if male_column in df.columns else []
    )
    female_values = (
        df[female_column].fillna(0).tolist() if female_column in df.columns else []
    )

    # Make male values negative for pyramid effect
    male_values_neg = [-v for v in male_values]

    fig = go.Figure()

    # Male bars (left side, negative values)
    fig.add_trace(
        go.Bar(
            y=age_groups,
            x=male_values_neg,
            orientation="h",
            name="Male",
            marker_color=PALETTE_GENDER["male"],
            hovertemplate="%{y}<br>Male: %{customdata:,}<extra></extra>",
            customdata=male_values,
        )
    )

    # Female bars (right side, positive values)
    fig.add_trace(
        go.Bar(
            y=age_groups,
            x=female_values,
            orientation="h",
            name="Female",
            marker_color=PALETTE_GENDER["female"],
            hovertemplate="%{y}<br>Female: %{x:,}<extra></extra>",
        )
    )

    # Calculate max for symmetric axis; fall back to 1 so an all-zero
    # dataset does not produce a zero-width range
    max_val = max(max(male_values, default=0), max(female_values, default=0))
    max_val = max_val or 1

    fig.update_layout(
        title=title,
        barmode="overlay",
        bargap=0.1,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={
            "title": "Population",
            "gridcolor": GRID_COLOR,
            "range": [-max_val * 1.1, max_val * 1.1],
            "tickvals": [-max_val, -max_val / 2, 0, max_val / 2, max_val],
            "ticktext": [
                f"{max_val:,.0f}",
                f"{max_val / 2:,.0f}",
                "0",
                f"{max_val / 2:,.0f}",
                f"{max_val:,.0f}",
            ],
        },
        yaxis={"title": None, "gridcolor": GRID_COLOR},
        legend={"orientation": "h", "yanchor": "bottom", "y": 1.02},
        margin={"l": 10, "r": 10, "t": 60, "b": 10},
    )
    return fig


def create_donut_chart(
    data: list[dict[str, Any]],
    name_column: str,
    value_column: str,
    title: str | None = None,
    colors: list[str] | None = None,
    hole_size: float = 0.4,
) -> go.Figure:
    """Create donut chart for percentage breakdowns.

    Args:
        data: List of data records with name and value.
        name_column: Column name for labels.
        value_column: Column name for values.
        title: Optional chart title.
        colors: List of colors for segments.
        hole_size: Size of center hole (0-1).

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Distribution")

    df = pd.DataFrame(data)

    # Use accessible palette by default
    if colors is None:
        colors = CHART_PALETTE

    fig = go.Figure(
        go.Pie(
            labels=df[name_column],
            values=df[value_column],
            hole=hole_size,
            marker_colors=colors[: len(df)],
            textinfo="percent+label",
            textposition="outside",
            hovertemplate="%{label}<br>%{value:,} (%{percent})<extra></extra>",
        )
    )

    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        font_color=TEXT_PRIMARY,
        showlegend=False,
        margin={"l": 10, "r": 10, "t": 60, "b": 10},
    )
    return fig


def create_income_distribution(
    data: list[dict[str, Any]],
    bracket_column: str,
    count_column: str,
    title: str | None = None,
    color: str = CHART_PALETTE[3],  # Teal
) -> go.Figure:
    """Create histogram-style bar chart for income distribution.

    Args:
        data: List of data records with income brackets and counts.
        bracket_column: Column name for income brackets.
        count_column: Column name for household counts.
        title: Optional chart title.
        color: Bar color.

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Income Distribution")

    df = pd.DataFrame(data)

    fig = go.Figure(
        go.Bar(
            x=df[bracket_column],
            y=df[count_column],
            marker_color=color,
            text=df[count_column].apply(lambda x: f"{x:,}"),
            textposition="outside",
            hovertemplate="%{x}<br>Households: %{y:,}<extra></extra>",
        )
    )

    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={
            "title": "Income Bracket",
            "gridcolor": GRID_COLOR,
            "tickangle": -45,
        },
        yaxis={
            "title": "Households",
            "gridcolor": GRID_COLOR,
        },
        margin={"l": 10, "r": 10, "t": 60, "b": 80},
    )
    return fig


def _create_empty_figure(title: str) -> go.Figure:
    """Create an empty figure with a message."""
    fig = go.Figure()
    fig.add_annotation(
        text="No data available",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"visible": False},
        yaxis={"visible": False},
    )
    return fig
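
The age pyramid works by negating the male counts so they extend left of zero, then relabelling the ticks with absolute values so both sides read as positive populations. A stdlib sketch of that axis arithmetic, independent of Plotly; the function name `pyramid_axis` and the returned keys are hypothetical, and falling back to 1 for all-zero data is an assumption of this sketch:

```python
from typing import Any


def pyramid_axis(male: list[float], female: list[float]) -> dict[str, Any]:
    """Mirror male counts to the left of zero and build symmetric tick labels."""
    # Largest bar on either side; use 1 for empty/all-zero data to avoid a
    # zero-width axis.
    max_val = max(max(male, default=0), max(female, default=0)) or 1
    tickvals = [-max_val, -max_val / 2, 0, max_val / 2, max_val]
    return {
        "male_x": [-v for v in male],
        "range": [-max_val * 1.1, max_val * 1.1],
        "tickvals": tickvals,
        # Labels show magnitudes, so the negative (male) side reads as counts.
        "ticktext": [f"{abs(v):,.0f}" for v in tickvals],
    }


axis = pyramid_axis([1200, 800], [1000, 900])
print(axis["male_x"])    # [-1200, -800]
print(axis["ticktext"])  # ['1,200', '600', '0', '600', '1,200']
```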

@@ -0,0 +1,167 @@
"""Radar/spider chart figure factory for multi-metric comparison."""

from typing import Any

import plotly.graph_objects as go

from portfolio_app.design import (
    CHART_PALETTE,
    GRID_COLOR_DARK,
    PAPER_BG,
    TEXT_PRIMARY,
    TEXT_SECONDARY,
)


def create_radar_figure(
    data: list[dict[str, Any]],
    metrics: list[str],
    name_column: str | None = None,
    title: str | None = None,
    fill: bool = True,
    colors: list[str] | None = None,
) -> go.Figure:
    """Create radar/spider chart for multi-axis comparison.

    Each record in data represents one entity (e.g., a neighbourhood)
    with values for each metric that will be plotted on a separate axis.

    Args:
        data: List of data records, each with values for the metrics.
        metrics: List of metric column names to display on radar axes.
        name_column: Column name for entity labels.
        title: Optional chart title.
        fill: Whether to fill the radar polygons.
        colors: List of colors for each data series.

    Returns:
        Plotly Figure object.
    """
    if not data or not metrics:
        return _create_empty_figure(title or "Radar Chart")

    # Use accessible palette by default
    if colors is None:
        colors = CHART_PALETTE

    fig = go.Figure()

    # Format axis labels
    axis_labels = [m.replace("_", " ").title() for m in metrics]

    for i, record in enumerate(data):
        values = [record.get(m, 0) or 0 for m in metrics]
        # Close the radar polygon
        values_closed = values + [values[0]]
        labels_closed = axis_labels + [axis_labels[0]]

        name = (
            record.get(name_column, f"Series {i + 1}")
            if name_column
            else f"Series {i + 1}"
        )
        color = colors[i % len(colors)]

        fig.add_trace(
            go.Scatterpolar(
                r=values_closed,
                theta=labels_closed,
                name=name,
                line={"color": color, "width": 2},
                fill="toself" if fill else None,
                fillcolor=f"rgba{_hex_to_rgba(color, 0.2)}" if fill else None,
                hovertemplate="%{theta}: %{r:.1f}<extra></extra>",
            )
        )

    fig.update_layout(
        title=title,
        polar={
            "radialaxis": {
                "visible": True,
                "gridcolor": GRID_COLOR_DARK,
                "linecolor": GRID_COLOR_DARK,
                "tickfont": {"color": TEXT_PRIMARY},
            },
            "angularaxis": {
                "gridcolor": GRID_COLOR_DARK,
                "linecolor": GRID_COLOR_DARK,
                "tickfont": {"color": TEXT_PRIMARY},
            },
            "bgcolor": PAPER_BG,
        },
        paper_bgcolor=PAPER_BG,
        font_color=TEXT_PRIMARY,
        showlegend=len(data) > 1,
        legend={"orientation": "h", "yanchor": "bottom", "y": -0.2},
        margin={"l": 40, "r": 40, "t": 60, "b": 40},
    )
    return fig


def create_comparison_radar(
    selected_data: dict[str, Any],
    average_data: dict[str, Any],
    metrics: list[str],
    selected_name: str = "Selected",
    average_name: str = "City Average",
    title: str | None = None,
) -> go.Figure:
    """Create radar chart comparing a selection to city average.

    Args:
        selected_data: Data for the selected entity.
        average_data: Data for the city average.
        metrics: List of metric column names.
        selected_name: Label for selected entity.
        average_name: Label for average.
        title: Optional chart title.

    Returns:
        Plotly Figure object.
    """
    if not selected_data or not average_data:
        return _create_empty_figure(title or "Comparison")

    data = [
        {**selected_data, "__name__": selected_name},
        {**average_data, "__name__": average_name},
    ]

    return create_radar_figure(
        data=data,
        metrics=metrics,
        name_column="__name__",
        title=title,
        colors=[CHART_PALETTE[3], TEXT_SECONDARY],  # Teal for selected, gray for avg
    )
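

def _hex_to_rgba(hex_color: str, alpha: float) -> tuple[int, int, int, float]:
    """Convert a "#RRGGBB" or "#RGB" hex color to an (r, g, b, alpha) tuple."""
    hex_color = hex_color.lstrip("#")
    if len(hex_color) == 3:
        # Expand shorthand such as "888" to "888888"
        hex_color = "".join(ch * 2 for ch in hex_color)
    if len(hex_color) != 6:
        raise ValueError(f"Expected a hex color like '#RRGGBB', got {hex_color!r}")
    r = int(hex_color[0:2], 16)
    g = int(hex_color[2:4], 16)
    b = int(hex_color[4:6], 16)
    return (r, g, b, alpha)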


def _create_empty_figure(title: str) -> go.Figure:
    """Create an empty figure with a message."""
    fig = go.Figure()
    fig.add_annotation(
        text="No data available",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        font_color=TEXT_PRIMARY,
        # Hide the default cartesian axes so only the message is shown
        xaxis={"visible": False},
        yaxis={"visible": False},
    )
    return fig

@@ -0,0 +1,194 @@
"""Scatter plot figure factory for correlation views."""

from typing import Any

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from portfolio_app.design import (
    CHART_PALETTE,
    GRID_COLOR,
    PAPER_BG,
    PLOT_BG,
    TEXT_PRIMARY,
    TEXT_SECONDARY,
)


def create_scatter_figure(
    data: list[dict[str, Any]],
    x_column: str,
    y_column: str,
    name_column: str | None = None,
    size_column: str | None = None,
    color_column: str | None = None,
    title: str | None = None,
    x_title: str | None = None,
    y_title: str | None = None,
    trendline: bool = False,
    color_scale: str = "Blues",
) -> go.Figure:
    """Create scatter plot for correlation visualization.

    Args:
        data: List of data records.
        x_column: Column name for x-axis values.
        y_column: Column name for y-axis values.
        name_column: Column name for point labels (hover).
        size_column: Column name for point sizes.
        color_column: Column name for color encoding.
        title: Optional chart title.
        x_title: X-axis title.
        y_title: Y-axis title.
        trendline: Whether to add OLS trendline.
        color_scale: Plotly color scale for continuous colors.

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Scatter Plot")

    df = pd.DataFrame(data)

    # Pass optional columns to Plotly Express only if they exist;
    # px.scatter raises a ValueError for unknown column names.
    name_col = name_column if name_column and name_column in df.columns else None
    size_col = size_column if size_column and size_column in df.columns else None
    color_col = color_column if color_column and color_column in df.columns else None

    # Create scatter plot (trendline="ols" requires statsmodels)
    fig = px.scatter(
        df,
        x=x_column,
        y=y_column,
        size=size_col,
        color=color_col,
        color_continuous_scale=color_scale,
        hover_name=name_col,
        trendline="ols" if trendline else None,
        opacity=0.7,
    )

    # Outline the data points only, leaving the trendline trace untouched
    fig.update_traces(
        selector={"mode": "markers"},
        marker={"line": {"width": 1, "color": "rgba(255,255,255,0.3)"}},
    )

    # Trendline styling
    if trendline:
        fig.update_traces(
            selector={"mode": "lines"},
            line={"color": CHART_PALETTE[1], "dash": "dash", "width": 2},
        )

    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={
            "gridcolor": GRID_COLOR,
            "title": x_title or x_column.replace("_", " ").title(),
            "zeroline": False,
        },
        yaxis={
            "gridcolor": GRID_COLOR,
            "title": y_title or y_column.replace("_", " ").title(),
            "zeroline": False,
        },
        margin={"l": 10, "r": 10, "t": 40, "b": 10},
        showlegend=color_col is not None,
    )
    return fig


def create_bubble_chart(
    data: list[dict[str, Any]],
    x_column: str,
    y_column: str,
    size_column: str,
    name_column: str | None = None,
    color_column: str | None = None,
    title: str | None = None,
    x_title: str | None = None,
    y_title: str | None = None,
    size_max: int = 50,
) -> go.Figure:
    """Create bubble chart with sized markers.

    Args:
        data: List of data records.
        x_column: Column name for x-axis values.
        y_column: Column name for y-axis values.
        size_column: Column name for bubble sizes.
        name_column: Column name for labels.
        color_column: Column name for colors.
        title: Optional chart title.
        x_title: X-axis title.
        y_title: Y-axis title.
        size_max: Maximum marker size in pixels.

    Returns:
        Plotly Figure object.
    """
    if not data:
        return _create_empty_figure(title or "Bubble Chart")

    df = pd.DataFrame(data)

    fig = px.scatter(
        df,
        x=x_column,
        y=y_column,
        size=size_column,
        color=color_column,
        hover_name=name_column,
        size_max=size_max,
        opacity=0.7,
        color_discrete_sequence=CHART_PALETTE,
    )

    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={
            "gridcolor": GRID_COLOR,
            "title": x_title or x_column.replace("_", " ").title(),
        },
        yaxis={
            "gridcolor": GRID_COLOR,
            "title": y_title or y_column.replace("_", " ").title(),
        },
        margin={"l": 10, "r": 10, "t": 40, "b": 10},
    )
    return fig


def _create_empty_figure(title: str) -> go.Figure:
    """Create an empty figure with a message."""
    fig = go.Figure()
    fig.add_annotation(
        text="No data available",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    fig.update_layout(
        title=title,
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"visible": False},
        yaxis={"visible": False},
    )
    return fig
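
Setting `trendline=True` asks Plotly Express for an ordinary least squares fit (`trendline="ols"`, computed with statsmodels). For one predictor with an intercept, that fit has a closed form. A stdlib sketch that reproduces the slope and intercept, useful for sanity-checking the dashed line; the function name `ols_line` is hypothetical:

```python
def ols_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared x deviations; zero means a vertical cloud with no slope.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    return slope, mean_y - slope * mean_x


print(ols_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```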

@@ -4,6 +4,14 @@ from typing import Any
 import plotly.graph_objects as go
+from portfolio_app.design import (
+    COLOR_NEGATIVE,
+    COLOR_POSITIVE,
+    PAPER_BG,
+    PLOT_BG,
+    TEXT_PRIMARY,
+)
 def create_metric_card_figure(
     value: float | int | str,
@@ -59,8 +67,12 @@ def create_metric_card_figure(
             "relative": False,
             "valueformat": ".1f",
             "suffix": delta_suffix,
-            "increasing": {"color": "green" if positive_is_good else "red"},
-            "decreasing": {"color": "red" if positive_is_good else "green"},
+            "increasing": {
+                "color": COLOR_POSITIVE if positive_is_good else COLOR_NEGATIVE
+            },
+            "decreasing": {
+                "color": COLOR_NEGATIVE if positive_is_good else COLOR_POSITIVE
+            },
         }
     fig.add_trace(go.Indicator(**indicator_config))
@@ -68,9 +80,9 @@ def create_metric_card_figure(
     fig.update_layout(
         height=120,
         margin={"l": 20, "r": 20, "t": 40, "b": 20},
-        paper_bgcolor="rgba(0,0,0,0)",
-        plot_bgcolor="rgba(0,0,0,0)",
-        font={"family": "Inter, sans-serif", "color": "#c9c9c9"},
+        paper_bgcolor=PAPER_BG,
+        plot_bgcolor=PLOT_BG,
+        font={"family": "Inter, sans-serif", "color": TEXT_PRIMARY},
     )
     return fig
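
The `positive_is_good` flag swaps which delta direction gets the favourable color. For a metric such as average rent, a rise is bad news, so "increasing" should render in the negative color. A stdlib sketch of the mapping; the function name `delta_colors` is hypothetical, and the `good`/`bad` arguments stand in for `COLOR_POSITIVE`/`COLOR_NEGATIVE`, whose actual values are not shown in this diff:

```python
def delta_colors(
    positive_is_good: bool, good: str, bad: str
) -> dict[str, dict[str, str]]:
    """Map Indicator delta directions to colors based on the favourable direction."""
    return {
        "increasing": {"color": good if positive_is_good else bad},
        "decreasing": {"color": bad if positive_is_good else good},
    }


# Rent: an increase is unfavourable, so it gets the "bad" color.
print(delta_colors(False, "good-color", "bad-color"))
# {'increasing': {'color': 'bad-color'}, 'decreasing': {'color': 'good-color'}}
```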

@@ -5,6 +5,15 @@ from typing import Any
 import plotly.express as px
 import plotly.graph_objects as go
+from portfolio_app.design import (
+    CHART_PALETTE,
+    GRID_COLOR,
+    PAPER_BG,
+    PLOT_BG,
+    TEXT_PRIMARY,
+    TEXT_SECONDARY,
+)
 def create_price_time_series(
     data: list[dict[str, Any]],
@@ -38,14 +47,14 @@ def create_price_time_series(
             x=0.5,
             y=0.5,
             showarrow=False,
-            font={"color": "#888888"},
+            font={"color": TEXT_SECONDARY},
         )
         fig.update_layout(
             title=title,
             height=350,
-            paper_bgcolor="rgba(0,0,0,0)",
-            plot_bgcolor="rgba(0,0,0,0)",
-            font_color="#c9c9c9",
+            paper_bgcolor=PAPER_BG,
+            plot_bgcolor=PLOT_BG,
+            font_color=TEXT_PRIMARY,
         )
         return fig
@@ -59,6 +68,7 @@ def create_price_time_series(
             y=price_column,
             color=group_column,
             title=title,
+            color_discrete_sequence=CHART_PALETTE,
         )
     else:
         fig = px.line(
@@ -67,6 +77,7 @@ def create_price_time_series(
             y=price_column,
             title=title,
         )
+        fig.update_traces(line_color=CHART_PALETTE[0])
     fig.update_layout(
         height=350,
@@ -76,11 +87,11 @@ def create_price_time_series(
         yaxis_tickprefix="$",
         yaxis_tickformat=",",
         hovermode="x unified",
-        paper_bgcolor="rgba(0,0,0,0)",
-        plot_bgcolor="rgba(0,0,0,0)",
-        font_color="#c9c9c9",
-        xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
-        yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
+        paper_bgcolor=PAPER_BG,
+        plot_bgcolor=PLOT_BG,
+        font_color=TEXT_PRIMARY,
+        xaxis={"gridcolor": GRID_COLOR, "linecolor": GRID_COLOR},
+        yaxis={"gridcolor": GRID_COLOR, "linecolor": GRID_COLOR},
     )
     return fig
@@ -118,14 +129,14 @@ def create_volume_time_series(
             x=0.5,
             y=0.5,
             showarrow=False,
-            font={"color": "#888888"},
+            font={"color": TEXT_SECONDARY},
         )
         fig.update_layout(
             title=title,
             height=350,
-            paper_bgcolor="rgba(0,0,0,0)",
-            plot_bgcolor="rgba(0,0,0,0)",
-            font_color="#c9c9c9",
+            paper_bgcolor=PAPER_BG,
+            plot_bgcolor=PLOT_BG,
+            font_color=TEXT_PRIMARY,
         )
         return fig
@@ -140,6 +151,7 @@ def create_volume_time_series(
                 y=volume_column,
                 color=group_column,
                 title=title,
+                color_discrete_sequence=CHART_PALETTE,
             )
         else:
             fig = px.bar(
@@ -148,6 +160,7 @@ def create_volume_time_series(
                 y=volume_column,
                 title=title,
             )
+            fig.update_traces(marker_color=CHART_PALETTE[0])
     else:
         if group_column and group_column in df.columns:
             fig = px.line(
@@ -156,6 +169,7 @@ def create_volume_time_series(
                 y=volume_column,
                 color=group_column,
                 title=title,
+                color_discrete_sequence=CHART_PALETTE,
             )
         else:
             fig = px.line(
@@ -164,6 +178,7 @@ def create_volume_time_series(
                 y=volume_column,
                 title=title,
             )
+            fig.update_traces(line_color=CHART_PALETTE[0])
     fig.update_layout(
         height=350,
@@ -172,11 +187,11 @@ def create_volume_time_series(
         yaxis_title=volume_column.replace("_", " ").title(),
         yaxis_tickformat=",",
         hovermode="x unified",
-        paper_bgcolor="rgba(0,0,0,0)",
-        plot_bgcolor="rgba(0,0,0,0)",
-        font_color="#c9c9c9",
-        xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
-        yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
+        paper_bgcolor=PAPER_BG,
+        plot_bgcolor=PLOT_BG,
+        font_color=TEXT_PRIMARY,
+        xaxis={"gridcolor": GRID_COLOR, "linecolor": GRID_COLOR},
+        yaxis={"gridcolor": GRID_COLOR, "linecolor": GRID_COLOR},
     )
     return fig
@@ -211,14 +226,14 @@ def create_market_comparison_chart(
             x=0.5,
             y=0.5,
             showarrow=False,
-            font={"color": "#888888"},
+            font={"color": TEXT_SECONDARY},
         )
         fig.update_layout(
             title=title,
             height=400,
-            paper_bgcolor="rgba(0,0,0,0)",
-            plot_bgcolor="rgba(0,0,0,0)",
-            font_color="#c9c9c9",
+            paper_bgcolor=PAPER_BG,
+            plot_bgcolor=PLOT_BG,
+            font_color=TEXT_PRIMARY,
         )
         return fig
@@ -230,8 +245,6 @@ def create_market_comparison_chart(
     fig = make_subplots(specs=[[{"secondary_y": True}]])
-    colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
     for i, metric in enumerate(metrics[:4]):
         if metric not in df.columns:
             continue
@@ -242,7 +255,7 @@ def create_market_comparison_chart(
                 x=df[date_column],
                 y=df[metric],
                 name=metric.replace("_", " ").title(),
-                line={"color": colors[i % len(colors)]},
+                line={"color": CHART_PALETTE[i % len(CHART_PALETTE)]},
             ),
             secondary_y=secondary,
         )
@@ -252,18 +265,18 @@ def create_market_comparison_chart(
         height=400,
         margin={"l": 40, "r": 40, "t": 50, "b": 40},
         hovermode="x unified",
-        paper_bgcolor="rgba(0,0,0,0)",
-        plot_bgcolor="rgba(0,0,0,0)",
-        font_color="#c9c9c9",
-        xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
-        yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
+        paper_bgcolor=PAPER_BG,
+        plot_bgcolor=PLOT_BG,
+        font_color=TEXT_PRIMARY,
+        xaxis={"gridcolor": GRID_COLOR, "linecolor": GRID_COLOR},
+        yaxis={"gridcolor": GRID_COLOR, "linecolor": GRID_COLOR},
         legend={
             "orientation": "h",
             "yanchor": "bottom",
             "y": 1.02,
             "xanchor": "right",
             "x": 1,
-            "font": {"color": "#c9c9c9"},
+            "font": {"color": TEXT_PRIMARY},
         },
     )
@@ -290,13 +303,13 @@ def add_policy_markers(
     if not policy_events:
         return fig
-    # Color mapping for policy categories
+    # Color mapping for policy categories using design tokens
     category_colors = {
-        "monetary": "#1f77b4",  # Blue
-        "tax": "#2ca02c",  # Green
-        "regulatory": "#ff7f0e",  # Orange
-        "supply": "#9467bd",  # Purple
-        "economic": "#d62728",  # Red
+        "monetary": CHART_PALETTE[0],  # Blue
+        "tax": CHART_PALETTE[3],  # Teal/green
+        "regulatory": CHART_PALETTE[1],  # Orange
+        "supply": CHART_PALETTE[6],  # Pink
+        "economic": CHART_PALETTE[5],  # Vermillion
     }
     # Symbol mapping for expected direction
@@ -313,7 +326,7 @@ def add_policy_markers(
         title = event.get("title", "Policy Event")
         level = event.get("level", "federal")
-        color = category_colors.get(category, "#666666")
+        color = category_colors.get(category, TEXT_SECONDARY)
         symbol = direction_symbols.get(direction, "circle")
         # Add vertical line for the event
@@ -335,7 +348,7 @@ def add_policy_markers(
                     "symbol": symbol,
                     "size": 12,
                     "color": color,
-                    "line": {"width": 1, "color": "white"},
+                    "line": {"width": 1, "color": TEXT_PRIMARY},
                 },
                 name=title,
                 hovertemplate=(

@@ -2,6 +2,7 @@
 import dash
 import dash_mantine_components as dmc
+from dash import html
 from dash_iconify import DashIconify
 dash.register_page(__name__, path="/contact", name="Contact")
@@ -51,51 +52,57 @@ def create_intro_section() -> dmc.Stack:
 def create_contact_form() -> dmc.Paper:
-    """Create the contact form (disabled in Phase 1)."""
+    """Create the contact form with Formspree integration."""
     return dmc.Paper(
         dmc.Stack(
             [
                 dmc.Title("Send a Message", order=2, size="h4"),
-                dmc.Alert(
-                    "Contact form submission is coming soon. Please use the direct contact "
-                    "methods below for now.",
-                    title="Form Coming Soon",
-                    color="blue",
-                    variant="light",
-                ),
+                # Feedback container for success/error messages
+                html.Div(id="contact-feedback"),
                 dmc.TextInput(
+                    id="contact-name",
                     label="Name",
                     placeholder="Your name",
                     leftSection=DashIconify(icon="tabler:user", width=18),
-                    disabled=True,
+                    required=True,
                 ),
                 dmc.TextInput(
+                    id="contact-email",
                     label="Email",
                     placeholder="your.email@example.com",
                     leftSection=DashIconify(icon="tabler:mail", width=18),
-                    disabled=True,
+                    required=True,
                 ),
                 dmc.Select(
+                    id="contact-subject",
                     label="Subject",
                     placeholder="Select a subject",
                     data=SUBJECT_OPTIONS,
                     leftSection=DashIconify(icon="tabler:tag", width=18),
-                    disabled=True,
                 ),
                 dmc.Textarea(
+                    id="contact-message",
                     label="Message",
                     placeholder="Your message...",
                     minRows=4,
-                    disabled=True,
+                    required=True,
+                ),
+                # Honeypot field for spam protection (hidden from users)
+                dmc.TextInput(
+                    id="contact-gotcha",
+                    style={"position": "absolute", "left": "-9999px"},
+                    tabIndex=-1,
+                    autoComplete="off",
                 ),
                 dmc.Button(
                     "Send Message",
+                    id="contact-submit",
                     fullWidth=True,
                     leftSection=DashIconify(icon="tabler:send", width=18),
-                    disabled=True,
                 ),
             ],
             gap="md",
+            style={"position": "relative"},
         ),
         p="xl",
         radius="md",

File diff suppressed because it is too large.

@@ -0,0 +1,417 @@
"""Chart callbacks for supporting visualizations."""
# mypy: disable-error-code="misc,no-untyped-def,arg-type"
import pandas as pd
import plotly.graph_objects as go
from dash import Input, Output, callback
from portfolio_app.design import (
CHART_PALETTE,
GRID_COLOR,
PAPER_BG,
PLOT_BG,
TEXT_PRIMARY,
TEXT_SECONDARY,
)
from portfolio_app.figures.toronto import (
create_donut_chart,
create_horizontal_bar,
create_radar_figure,
create_scatter_figure,
)
from portfolio_app.toronto.services import (
get_amenities_data,
get_city_averages,
get_demographics_data,
get_housing_data,
get_neighbourhood_details,
get_safety_data,
)


@callback(
    Output("overview-scatter-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_overview_scatter(year: str) -> go.Figure:
    """Update income vs safety scatter plot."""
    year_int = int(year) if year else 2021
    df = get_demographics_data(year_int)
    safety_df = get_safety_data(year_int)

    if df.empty or safety_df.empty:
        return _empty_chart("No data available")

    # Merge demographics with safety
    merged = df.merge(
        safety_df[["neighbourhood_id", "total_crime_rate"]],
        on="neighbourhood_id",
        how="left",
    )

    # Compute safety score (inverse of crime rate)
    if "total_crime_rate" in merged.columns:
        max_crime = merged["total_crime_rate"].max()
        if max_crime and max_crime > 0:
            merged["safety_score"] = 100 - (
                merged["total_crime_rate"] / max_crime * 100
            )
        else:
            merged["safety_score"] = 50  # Default if no crime data

    # Fill missing population with the median (or a default) for marker sizing
    if "population" in merged.columns:
        median_pop = merged["population"].median()
        default_pop = median_pop if pd.notna(median_pop) else 10000
        merged["population"] = merged["population"].fillna(default_pop)

    # Keep only rows that have both scatter-plot coordinates
    merged = merged.dropna(subset=["median_household_income", "safety_score"])

    if merged.empty:
        return _empty_chart("Insufficient data for scatter plot")

    data = merged.to_dict("records")

    return create_scatter_figure(
        data=data,
        x_column="median_household_income",
        y_column="safety_score",
        name_column="neighbourhood_name",
        size_column="population",
        title="Income vs Safety",
        x_title="Median Household Income ($)",
        y_title="Safety Score",
        trendline=True,
    )
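

# Sketch (hypothetical helper, not part of this PR): the safety-score
# normalisation from update_overview_scatter restated in plain Python so the
# arithmetic is easy to check. Each rate r becomes 100 - r / max_rate * 100:
# the highest-crime neighbourhood scores 0 and a crime-free one scores 100.
# If there is no positive maximum, every row gets the neutral default (50),
# mirroring the callback's else-branch. Missing rates (None) stay None, the
# way NaN propagates through the pandas expression.
def _safety_scores_from_rates(
    rates: list[float | None], default: float = 50.0
) -> list[float | None]:
    known = [r for r in rates if r is not None]
    max_rate = max(known) if known else None
    if not max_rate or max_rate <= 0:
        # No usable crime data: every row gets the default score
        return [default for _ in rates]
    return [None if r is None else 100 - r / max_rate * 100 for r in rates]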


@callback(
    Output("housing-trend-chart", "figure"),
    Input("toronto-year-select", "value"),
    Input("toronto-selected-neighbourhood", "data"),
)
def update_housing_trend(year: str, neighbourhood_id: int | None) -> go.Figure:
    """Update housing rent trend chart."""
    # For now, show city averages because multi-year data is not available;
    # this would become a real time series once historical data is loaded
    year_int = int(year) if year else 2021
    averages = get_city_averages(year_int)

    if not averages:
        return _empty_chart("No trend data available")

    # Placeholder trend derived from the current average (not historical data)
    base_rent = averages.get("avg_rent_2bed") or 2000
    data = [
        {"year": "2019", "avg_rent": base_rent * 0.85},
        {"year": "2020", "avg_rent": base_rent * 0.88},
        {"year": "2021", "avg_rent": base_rent * 0.92},
        {"year": "2022", "avg_rent": base_rent * 0.96},
        {"year": "2023", "avg_rent": base_rent},
    ]

    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=[d["year"] for d in data],
            y=[d["avg_rent"] for d in data],
            mode="lines+markers",
            line={"color": CHART_PALETTE[0], "width": 2},
            marker={"size": 8},
            name="City Average",
        )
    )
    fig.update_layout(
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"gridcolor": GRID_COLOR},
        yaxis={"gridcolor": GRID_COLOR, "title": "Avg Rent (2BR)"},
        showlegend=False,
        margin={"l": 40, "r": 10, "t": 10, "b": 30},
    )
    return fig


@callback(
    Output("housing-types-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_housing_types(year: str) -> go.Figure:
    """Update dwelling types breakdown chart."""
    year_int = int(year) if year else 2021
    df = get_housing_data(year_int)

    if df.empty:
        return _empty_chart("No data available")

    # Aggregate tenure types across city
    owner_pct = df["pct_owner_occupied"].mean()
    renter_pct = df["pct_renter_occupied"].mean()

    data = [
        {"type": "Owner Occupied", "percentage": owner_pct},
        {"type": "Renter Occupied", "percentage": renter_pct},
    ]

    return create_donut_chart(
        data=data,
        name_column="type",
        value_column="percentage",
        colors=[CHART_PALETTE[3], CHART_PALETTE[0]],  # Teal for owner, blue for renter
    )


@callback(
    Output("safety-trend-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_safety_trend(year: str) -> go.Figure:
    """Update crime trend chart."""
    # Placeholder trend: hard-coded values until historical data is available
    data = [
        {"year": "2019", "crime_rate": 4500},
        {"year": "2020", "crime_rate": 4200},
        {"year": "2021", "crime_rate": 4100},
        {"year": "2022", "crime_rate": 4300},
        {"year": "2023", "crime_rate": 4250},
    ]

    fig = go.Figure()
    fig.add_trace(
        go.Scatter(
            x=[d["year"] for d in data],
            y=[d["crime_rate"] for d in data],
            mode="lines+markers",
            line={"color": CHART_PALETTE[5], "width": 2},  # Vermillion
            marker={"size": 8},
            fill="tozeroy",
            fillcolor="rgba(213, 94, 0, 0.1)",  # Vermillion with opacity
        )
    )
    fig.update_layout(
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"gridcolor": GRID_COLOR},
        yaxis={"gridcolor": GRID_COLOR, "title": "Crime Rate per 100K"},
        showlegend=False,
        margin={"l": 40, "r": 10, "t": 10, "b": 30},
    )
    return fig


@callback(
    Output("safety-types-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_safety_types(year: str) -> go.Figure:
    """Update crime by category chart."""
    year_int = int(year) if year else 2021
    df = get_safety_data(year_int)

    if df.empty:
        return _empty_chart("No data available")

    # Aggregate crime types across city
    violent = df["violent_crimes"].sum() if "violent_crimes" in df.columns else 0
    property_crimes = (
        df["property_crimes"].sum() if "property_crimes" in df.columns else 0
    )
    theft = df["theft_crimes"].sum() if "theft_crimes" in df.columns else 0
    other = (
        df["total_crimes"].sum() - violent - property_crimes - theft
        if "total_crimes" in df.columns
        else 0
    )

    data = [
        {"category": "Violent", "count": int(violent)},
        {"category": "Property", "count": int(property_crimes)},
        {"category": "Theft", "count": int(theft)},
        {"category": "Other", "count": int(max(0, other))},
    ]

    return create_horizontal_bar(
        data=data,
        name_column="category",
        value_column="count",
        color=CHART_PALETTE[5],  # Vermillion for crime
    )
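

# Sketch (hypothetical helper, not part of this PR): the "Other" bucket in
# update_safety_types is whatever remains of the total after the three named
# categories, clamped at zero so overlapping category definitions can never
# yield a negative bar. Plain ints stand in for the summed DataFrame columns.
def _crime_breakdown(
    violent: int, property_crimes: int, theft: int, total: int
) -> list[dict[str, int | str]]:
    other = max(0, total - violent - property_crimes - theft)
    return [
        {"category": "Violent", "count": violent},
        {"category": "Property", "count": property_crimes},
        {"category": "Theft", "count": theft},
        {"category": "Other", "count": other},
    ]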


@callback(
    Output("demographics-age-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_demographics_age(year: str) -> go.Figure:
    """Update age distribution chart."""
    year_int = int(year) if year else 2021
    df = get_demographics_data(year_int)

    if df.empty:
        return _empty_chart("No data available")

    # Calculate average age distribution
    under_18 = df["pct_under_18"].mean() if "pct_under_18" in df.columns else 20
    age_18_64 = df["pct_18_to_64"].mean() if "pct_18_to_64" in df.columns else 65
    over_65 = df["pct_65_plus"].mean() if "pct_65_plus" in df.columns else 15

    data = [
        {"age_group": "Under 18", "percentage": under_18},
        {"age_group": "18-64", "percentage": age_18_64},
        {"age_group": "65+", "percentage": over_65},
    ]

    return create_donut_chart(
        data=data,
        name_column="age_group",
        value_column="percentage",
        colors=[
            CHART_PALETTE[2],
            CHART_PALETTE[0],
            CHART_PALETTE[4],
        ],  # Sky, Blue, Yellow
    )


@callback(
    Output("demographics-income-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_demographics_income(year: str) -> go.Figure:
    """Update income distribution chart."""
    year_int = int(year) if year else 2021
    df = get_demographics_data(year_int)

    if df.empty:
        return _empty_chart("No data available")

    # Create income quintile distribution
    if "income_quintile" in df.columns:
        quintile_counts = df["income_quintile"].value_counts().sort_index()
        data = [
            {"bracket": f"Q{q}", "count": int(count)}
            for q, count in quintile_counts.items()
        ]
    else:
        # Fallback to placeholder
        data = [
            {"bracket": "Q1 (Low)", "count": 32},
            {"bracket": "Q2", "count": 32},
            {"bracket": "Q3 (Mid)", "count": 32},
            {"bracket": "Q4", "count": 31},
            {"bracket": "Q5 (High)", "count": 31},
        ]

    return create_horizontal_bar(
        data=data,
        name_column="bracket",
        value_column="count",
        color=CHART_PALETTE[3],  # Teal
        sort=False,
    )


@callback(
    Output("amenities-breakdown-chart", "figure"),
    Input("toronto-year-select", "value"),
)
def update_amenities_breakdown(year: str) -> go.Figure:
    """Update amenity breakdown chart."""
    year_int = int(year) if year else 2021
    df = get_amenities_data(year_int)

    if df.empty:
        return _empty_chart("No data available")

    # Aggregate amenity counts
    parks = df["park_count"].sum() if "park_count" in df.columns else 0
    schools = df["school_count"].sum() if "school_count" in df.columns else 0
    childcare = df["childcare_count"].sum() if "childcare_count" in df.columns else 0

    data = [
        {"type": "Parks", "count": int(parks)},
        {"type": "Schools", "count": int(schools)},
        {"type": "Childcare", "count": int(childcare)},
    ]

    return create_horizontal_bar(
        data=data,
        name_column="type",
        value_column="count",
        color=CHART_PALETTE[3],  # Teal
    )


@callback(
    Output("amenities-radar-chart", "figure"),
    Input("toronto-year-select", "value"),
    Input("toronto-selected-neighbourhood", "data"),
)
def update_amenities_radar(year: str, neighbourhood_id: int | None) -> go.Figure:
    """Update amenity comparison radar chart."""
    year_int = int(year) if year else 2021

    # Get city averages
    averages = get_city_averages(year_int)
    amenity_score = averages.get("avg_amenity_score") or 50

    city_data = {
        "parks_per_1000": amenity_score / 100 * 10,
        "schools_per_1000": amenity_score / 100 * 5,
        "childcare_per_1000": amenity_score / 100 * 3,
        "transit_access": 70,
    }

    data = [city_data]

    # Add selected neighbourhood if available
    if neighbourhood_id:
        details = get_neighbourhood_details(neighbourhood_id, year_int)
        if details:
            # ``or 0`` also covers keys that are present but NULL (None),
            # which would otherwise raise TypeError on division
            selected_data = {
                "parks_per_1000": (details.get("park_count") or 0) / 10,
                "schools_per_1000": (details.get("school_count") or 0) / 5,
                "childcare_per_1000": 3,
                "transit_access": 70,
            }
            data.insert(0, selected_data)

    return create_radar_figure(
        data=data,
        metrics=[
            "parks_per_1000",
            "schools_per_1000",
            "childcare_per_1000",
            "transit_access",
        ],
        fill=True,
    )


def _empty_chart(message: str) -> go.Figure:
    """Create an empty chart with a message."""
    fig = go.Figure()
    fig.update_layout(
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"visible": False},
        yaxis={"visible": False},
    )
    fig.add_annotation(
        text=message,
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    return fig
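

# Sketch (hypothetical helper, not part of this PR): every callback in this
# module repeats ``int(year) if year else 2021``. Keeping the fallback in one
# helper would keep the default year in a single place. DEFAULT_YEAR and
# _parse_year are assumed names used only for illustration.
DEFAULT_YEAR = 2021


def _parse_year(year: str | None, default: int = DEFAULT_YEAR) -> int:
    """Convert the year-select value to an int, falling back to ``default``."""
    return int(year) if year else default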

@@ -0,0 +1,310 @@
"""Map callbacks for choropleth interactions."""
# mypy: disable-error-code="misc,no-untyped-def,arg-type,no-any-return"

import plotly.graph_objects as go
from dash import Input, Output, State, callback, no_update

from portfolio_app.design import (
    PAPER_BG,
    PLOT_BG,
    TEXT_PRIMARY,
    TEXT_SECONDARY,
)
from portfolio_app.figures.toronto import create_choropleth_figure, create_ranking_bar
from portfolio_app.toronto.services import (
    get_amenities_data,
    get_demographics_data,
    get_housing_data,
    get_neighbourhoods_geojson,
    get_overview_data,
    get_safety_data,
)


@callback(
    Output("overview-choropleth", "figure"),
    Input("overview-metric-select", "value"),
    Input("toronto-year-select", "value"),
)
def update_overview_choropleth(metric: str, year: str) -> go.Figure:
    """Update the overview tab choropleth map."""
    year_int = int(year) if year else 2021
    df = get_overview_data(year_int)
    geojson = get_neighbourhoods_geojson(year_int)

    if df.empty:
        return _empty_map("No data available")

    data = df.to_dict("records")

    # Color scales based on metric
    color_scale = {
        "livability_score": "Viridis",
        "safety_score": "Greens",
        "affordability_score": "Blues",
        "amenity_score": "Purples",
    }.get(metric, "Viridis")

    return create_choropleth_figure(
        geojson=geojson,
        data=data,
        location_key="neighbourhood_id",
        color_column=metric or "livability_score",
        hover_data=["neighbourhood_name", "population"],
        color_scale=color_scale,
    )


@callback(
    Output("housing-choropleth", "figure"),
    Input("housing-metric-select", "value"),
    Input("toronto-year-select", "value"),
)
def update_housing_choropleth(metric: str, year: str) -> go.Figure:
    """Update the housing tab choropleth map."""
    year_int = int(year) if year else 2021
    df = get_housing_data(year_int)
    geojson = get_neighbourhoods_geojson(year_int)

    if df.empty:
        return _empty_map("No housing data available")

    data = df.to_dict("records")

    color_scale = {
        "affordability_index": "RdYlGn_r",
        "avg_rent_2bed": "Oranges",
        "rent_to_income_pct": "Reds",
        "vacancy_rate": "Blues",
    }.get(metric, "Oranges")

    return create_choropleth_figure(
        geojson=geojson,
        data=data,
        location_key="neighbourhood_id",
        color_column=metric or "affordability_index",
        hover_data=["neighbourhood_name", "avg_rent_2bed", "vacancy_rate"],
        color_scale=color_scale,
    )


@callback(
    Output("safety-choropleth", "figure"),
    Input("safety-metric-select", "value"),
    Input("toronto-year-select", "value"),
)
def update_safety_choropleth(metric: str, year: str) -> go.Figure:
    """Update the safety tab choropleth map."""
    year_int = int(year) if year else 2021
    df = get_safety_data(year_int)
    geojson = get_neighbourhoods_geojson(year_int)

    if df.empty:
        return _empty_map("No safety data available")

    data = df.to_dict("records")

    return create_choropleth_figure(
        geojson=geojson,
        data=data,
        location_key="neighbourhood_id",
        color_column=metric or "total_crime_rate",
        hover_data=["neighbourhood_name", "total_crimes"],
        color_scale="Reds",
    )


@callback(
    Output("demographics-choropleth", "figure"),
    Input("demographics-metric-select", "value"),
    Input("toronto-year-select", "value"),
)
def update_demographics_choropleth(metric: str, year: str) -> go.Figure:
    """Update the demographics tab choropleth map."""
    year_int = int(year) if year else 2021
    df = get_demographics_data(year_int)
    geojson = get_neighbourhoods_geojson(year_int)

    if df.empty:
        return _empty_map("No demographics data available")

    data = df.to_dict("records")

    color_scale = {
        "population": "YlOrBr",
        "median_income": "Greens",
        "median_age": "Blues",
        "diversity_index": "Purples",
    }.get(metric, "YlOrBr")

    # Map frontend metric names to column names
    column_map = {
        "population": "population",
        "median_income": "median_household_income",
        "median_age": "median_age",
        "diversity_index": "diversity_index",
    }
    column = column_map.get(metric, "population")

    return create_choropleth_figure(
        geojson=geojson,
        data=data,
        location_key="neighbourhood_id",
        color_column=column,
        hover_data=["neighbourhood_name"],
        color_scale=color_scale,
    )


@callback(
    Output("amenities-choropleth", "figure"),
    Input("amenities-metric-select", "value"),
    Input("toronto-year-select", "value"),
)
def update_amenities_choropleth(metric: str, year: str) -> go.Figure:
    """Update the amenities tab choropleth map."""
    year_int = int(year) if year else 2021
    df = get_amenities_data(year_int)
    geojson = get_neighbourhoods_geojson(year_int)

    if df.empty:
        return _empty_map("No amenities data available")

    data = df.to_dict("records")

    # Map frontend metric names to column names
    column_map = {
        "amenity_score": "amenity_score",
        "parks_per_capita": "parks_per_1000",
        "schools_per_capita": "schools_per_1000",
        "transit_score": "total_amenities_per_1000",
    }
    column = column_map.get(metric, "amenity_score")

    return create_choropleth_figure(
        geojson=geojson,
        data=data,
        location_key="neighbourhood_id",
        color_column=column,
        hover_data=["neighbourhood_name", "park_count", "school_count"],
        color_scale="Greens",
    )


@callback(
    Output("toronto-selected-neighbourhood", "data"),
    Input("overview-choropleth", "clickData"),
    Input("housing-choropleth", "clickData"),
    Input("safety-choropleth", "clickData"),
    Input("demographics-choropleth", "clickData"),
    Input("amenities-choropleth", "clickData"),
    State("toronto-tabs", "value"),
    prevent_initial_call=True,
)
def handle_map_click(
    overview_click,
    housing_click,
    safety_click,
    demographics_click,
    amenities_click,
    active_tab: str,
) -> int | None:
    """Extract neighbourhood ID from map click."""
    # Get the click data for the active tab
    click_map = {
        "overview": overview_click,
        "housing": housing_click,
        "safety": safety_click,
        "demographics": demographics_click,
        "amenities": amenities_click,
    }

    click_data = click_map.get(active_tab)

    if not click_data:
        return no_update

    try:
        # Extract neighbourhood_id from click data
        point = click_data["points"][0]
        location = point.get("location") or point.get("customdata", [None])[0]
        if location:
            return int(location)
    except (KeyError, IndexError, TypeError):
        pass

    return no_update
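

# Sketch (hypothetical helper, not part of this PR): the clickData parsing
# from handle_map_click as a pure function that can be unit-tested without a
# running Dash app. It assumes the payload shape the callback already relies
# on: a dict with a "points" list whose first point carries the feature id in
# "location", or else as the first element of "customdata". Unlike the
# callback, it returns None rather than dash.no_update, and it also catches
# ValueError (an addition) for a non-numeric location.
def _neighbourhood_id_from_click(click_data: dict | None) -> int | None:
    if not click_data:
        return None
    try:
        point = click_data["points"][0]
        location = point.get("location") or point.get("customdata", [None])[0]
        return int(location) if location else None
    except (KeyError, IndexError, TypeError, ValueError):
        return None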


@callback(
    Output("overview-rankings-chart", "figure"),
    Input("overview-metric-select", "value"),
    Input("toronto-year-select", "value"),
)
def update_rankings_chart(metric: str, year: str) -> go.Figure:
    """Update the top/bottom rankings bar chart."""
    year_int = int(year) if year else 2021
    df = get_overview_data(year_int)

    if df.empty:
        return _empty_chart("No data available")

    # Use the selected metric for ranking
    metric = metric or "livability_score"
    data = df.to_dict("records")

    return create_ranking_bar(
        data=data,
        name_column="neighbourhood_name",
        value_column=metric,
        title=f"Top & Bottom 10 by {metric.replace('_', ' ').title()}",
        top_n=10,
        bottom_n=10,
    )


def _empty_map(message: str) -> go.Figure:
    """Create an empty map with a message."""
    fig = go.Figure()
    fig.update_layout(
        mapbox={
            "style": "carto-darkmatter",
            "center": {"lat": 43.7, "lon": -79.4},
            "zoom": 9.5,
        },
        margin={"l": 0, "r": 0, "t": 0, "b": 0},
        paper_bgcolor=PAPER_BG,
        font_color=TEXT_PRIMARY,
    )
    fig.add_annotation(
        text=message,
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    return fig


def _empty_chart(message: str) -> go.Figure:
    """Create an empty chart with a message."""
    fig = go.Figure()
    fig.update_layout(
        paper_bgcolor=PAPER_BG,
        plot_bgcolor=PLOT_BG,
        font_color=TEXT_PRIMARY,
        xaxis={"visible": False},
        yaxis={"visible": False},
    )
    fig.add_annotation(
        text=message,
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font={"size": 14, "color": TEXT_SECONDARY},
    )
    return fig

@@ -0,0 +1,309 @@
"""Selection callbacks for dropdowns and neighbourhood details."""
# mypy: disable-error-code="misc,no-untyped-def,type-arg"

import dash_mantine_components as dmc
from dash import Input, Output, callback

from portfolio_app.toronto.services import (
    get_city_averages,
    get_neighbourhood_details,
    get_neighbourhood_list,
)


@callback(
    Output("toronto-neighbourhood-select", "data"),
    Input("toronto-year-select", "value"),
)
def populate_neighbourhood_dropdown(year: str) -> list[dict]:
    """Populate the neighbourhood search dropdown."""
    year_int = int(year) if year else 2021
    neighbourhoods = get_neighbourhood_list(year_int)
    return [
        {"value": str(n["neighbourhood_id"]), "label": n["neighbourhood_name"]}
        for n in neighbourhoods
    ]


@callback(
    Output("toronto-selected-neighbourhood", "data", allow_duplicate=True),
    Input("toronto-neighbourhood-select", "value"),
    prevent_initial_call=True,
)
def select_from_dropdown(value: str | None) -> int | None:
    """Update selected neighbourhood from dropdown."""
    if value:
        return int(value)
    return None


@callback(
    Output("toronto-compare-btn", "disabled"),
    Input("toronto-selected-neighbourhood", "data"),
)
def toggle_compare_button(neighbourhood_id: int | None) -> bool:
    """Enable compare button when a neighbourhood is selected."""
    return neighbourhood_id is None


# Overview tab KPIs
@callback(
    Output("overview-city-avg", "children"),
    Input("toronto-year-select", "value"),
)
def update_overview_city_avg(year: str) -> str:
    """Update the city average livability score."""
    year_int = int(year) if year else 2021
    averages = get_city_averages(year_int)
    score = averages.get("avg_livability_score", 72)
    return f"{score:.0f}" if score else "—"


@callback(
    Output("overview-selected-name", "children"),
    Output("overview-selected-scores", "children"),
    Input("toronto-selected-neighbourhood", "data"),
    Input("toronto-year-select", "value"),
)
def update_overview_selected(neighbourhood_id: int | None, year: str):
    """Update the selected neighbourhood details in overview tab."""
    if not neighbourhood_id:
        return "Click map to select", [dmc.Text("—", c="dimmed")]

    year_int = int(year) if year else 2021
    details = get_neighbourhood_details(neighbourhood_id, year_int)

    if not details:
        return "Unknown", [dmc.Text("No data", c="dimmed")]

    name = details.get("neighbourhood_name", "Unknown")
    # ``or 0`` also covers keys that are present but NULL (None); a bare
    # ``.get(key, 0)`` would pass None to the ``:.0f`` format and raise
    scores = [
        dmc.Group(
            [
                dmc.Text("Livability:", size="sm"),
                dmc.Text(
                    f"{(details.get('livability_score') or 0):.0f}",
                    size="sm",
                    fw=700,
                ),
            ],
            justify="space-between",
        ),
        dmc.Group(
            [
                dmc.Text("Safety:", size="sm"),
                dmc.Text(
                    f"{(details.get('safety_score') or 0):.0f}",
                    size="sm",
                    fw=700,
                ),
            ],
            justify="space-between",
        ),
        dmc.Group(
            [
                dmc.Text("Affordability:", size="sm"),
                dmc.Text(
                    f"{(details.get('affordability_score') or 0):.0f}",
                    size="sm",
                    fw=700,
                ),
            ],
            justify="space-between",
        ),
    ]

    return name, scores


# Housing tab KPIs
@callback(
    Output("housing-city-rent", "children"),
    Output("housing-rent-change", "children"),
    Input("toronto-year-select", "value"),
)
def update_housing_kpis(year: str):
    """Update housing tab KPI cards."""
    year_int = int(year) if year else 2021
    averages = get_city_averages(year_int)
    rent = averages.get("avg_rent_2bed", 2450)
    rent_str = f"${rent:,.0f}" if rent else "—"

    # Placeholder change: would come from historical data
    change = "+4.2% YoY"

    return rent_str, change


@callback(
    Output("housing-selected-name", "children"),
    Output("housing-selected-details", "children"),
    Input("toronto-selected-neighbourhood", "data"),
    Input("toronto-year-select", "value"),
)
def update_housing_selected(neighbourhood_id: int | None, year: str):
    """Update selected neighbourhood details in housing tab."""
    if not neighbourhood_id:
        return "Click map to select", [dmc.Text("—", c="dimmed")]

    year_int = int(year) if year else 2021
    details = get_neighbourhood_details(neighbourhood_id, year_int)

    if not details:
        return "Unknown", [dmc.Text("No data", c="dimmed")]

    name = details.get("neighbourhood_name", "Unknown")
    rent = details.get("avg_rent_2bed")
    vacancy = details.get("vacancy_rate")

    # Check ``is not None`` so a genuine 0.0% vacancy is shown, not hidden
    info = [
        dmc.Text(
            f"2BR Rent: ${rent:,.0f}" if rent is not None else "2BR Rent: —",
            size="sm",
        ),
        dmc.Text(
            f"Vacancy: {vacancy:.1f}%" if vacancy is not None else "Vacancy: —",
            size="sm",
        ),
    ]

    return name, info
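

# Sketch (hypothetical helper, not part of this PR): the detail panels in this
# module repeat the same "Label: value" / "Label: —" pattern. Testing
# ``is not None`` rather than truthiness keeps legitimate zeros (for example a
# 0.0% vacancy rate) visible. ``spec`` is a standard format spec such as
# ",.0f" or ".1f"; prefix/suffix carry units like "$" or "%".
def _format_detail(
    label: str,
    value: float | int | None,
    spec: str = "",
    prefix: str = "",
    suffix: str = "",
) -> str:
    if value is None:
        return f"{label}: —"
    return f"{label}: {prefix}{value:{spec}}{suffix}"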


# Safety tab KPIs
@callback(
    Output("safety-city-rate", "children"),
    Output("safety-rate-change", "children"),
    Input("toronto-year-select", "value"),
)
def update_safety_kpis(year: str):
    """Update safety tab KPI cards."""
    year_int = int(year) if year else 2021
    averages = get_city_averages(year_int)
    rate = averages.get("avg_crime_rate", 4250)
    rate_str = f"{rate:,.0f}" if rate else "—"

    # Placeholder change
    change = "-2.1% YoY"

    return rate_str, change


@callback(
    Output("safety-selected-name", "children"),
    Output("safety-selected-details", "children"),
    Input("toronto-selected-neighbourhood", "data"),
    Input("toronto-year-select", "value"),
)
def update_safety_selected(neighbourhood_id: int | None, year: str):
    """Update selected neighbourhood details in safety tab."""
    if not neighbourhood_id:
        return "Click map to select", [dmc.Text("—", c="dimmed")]

    year_int = int(year) if year else 2021
    details = get_neighbourhood_details(neighbourhood_id, year_int)

    if not details:
        return "Unknown", [dmc.Text("No data", c="dimmed")]

    name = details.get("neighbourhood_name", "Unknown")
    crime_rate = details.get("crime_rate_per_100k")

    info = [
        dmc.Text(
            f"Crime Rate: {crime_rate:,.0f}/100K" if crime_rate else "Crime Rate: —",
            size="sm",
        ),
    ]

    return name, info


# Demographics tab KPIs
@callback(
    Output("demographics-city-pop", "children"),
    Output("demographics-pop-change", "children"),
    Input("toronto-year-select", "value"),
)
def update_demographics_kpis(year: str):
    """Update demographics tab KPI cards."""
    year_int = int(year) if year else 2021
    averages = get_city_averages(year_int)
    pop = averages.get("total_population", 2790000)

    if pop and pop >= 1000000:
        pop_str = f"{pop / 1000000:.2f}M"
    elif pop:
        pop_str = f"{pop:,.0f}"
    else:
        pop_str = "—"

    change = "+2.3% since 2016"

    return pop_str, change
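

# Sketch (hypothetical helper, not part of this PR): the population KPI
# formatting from update_demographics_kpis as a standalone function. Values of
# one million or more are shown in millions with two decimals; smaller values
# get thousands separators; a missing or zero value yields the same dash
# placeholder the callback uses.
def _format_population(pop: float | int | None) -> str:
    if not pop:
        return "—"
    if pop >= 1_000_000:
        return f"{pop / 1_000_000:.2f}M"
    return f"{pop:,.0f}"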


@callback(
    Output("demographics-selected-name", "children"),
    Output("demographics-selected-details", "children"),
    Input("toronto-selected-neighbourhood", "data"),
    Input("toronto-year-select", "value"),
)
def update_demographics_selected(neighbourhood_id: int | None, year: str):
    """Update selected neighbourhood details in demographics tab."""
    if not neighbourhood_id:
        return "Click map to select", [dmc.Text("—", c="dimmed")]

    year_int = int(year) if year else 2021
    details = get_neighbourhood_details(neighbourhood_id, year_int)

    if not details:
        return "Unknown", [dmc.Text("No data", c="dimmed")]

    name = details.get("neighbourhood_name", "Unknown")
    pop = details.get("population")
    income = details.get("median_household_income")

    info = [
        dmc.Text(f"Population: {pop:,}" if pop else "Population: —", size="sm"),
        dmc.Text(
            f"Median Income: ${income:,.0f}" if income else "Median Income: —",
            size="sm",
        ),
    ]

    return name, info


# Amenities tab KPIs
@callback(
    Output("amenities-city-score", "children"),
    Input("toronto-year-select", "value"),
)
def update_amenities_kpis(year: str) -> str:
    """Update amenities tab KPI cards."""
    year_int = int(year) if year else 2021
    averages = get_city_averages(year_int)
    score = averages.get("avg_amenity_score", 68)
    return f"{score:.0f}" if score else "—"


@callback(
    Output("amenities-selected-name", "children"),
    Output("amenities-selected-details", "children"),
    Input("toronto-selected-neighbourhood", "data"),
    Input("toronto-year-select", "value"),
)
def update_amenities_selected(neighbourhood_id: int | None, year: str):
    """Update selected neighbourhood details in amenities tab."""
    if not neighbourhood_id:
        return "Click map to select", [dmc.Text("—", c="dimmed")]

    year_int = int(year) if year else 2021
    details = get_neighbourhood_details(neighbourhood_id, year_int)

    if not details:
        return "Unknown", [dmc.Text("No data", c="dimmed")]

    name = details.get("neighbourhood_name", "Unknown")
    parks = details.get("park_count")
    schools = details.get("school_count")

    info = [
        dmc.Text(f"Parks: {parks}" if parks is not None else "Parks: —", size="sm"),
        dmc.Text(
            f"Schools: {schools}" if schools is not None else "Schools: —", size="sm"
        ),
    ]

    return name, info

Some files were not shown because too many files have changed in this diff.