26 Commits

Author SHA1 Message Date
d64f90b3d3 Merge branch 'feature/7-nav-theme-modernization' into development 2026-01-15 11:53:22 -05:00
b3fb94c7cb feat: Add floating sidebar navigation and dark theme support
- Add floating pill-shaped sidebar with navigation icons
- Implement dark/light theme toggle with localStorage persistence
- Update all figure factories for transparent backgrounds
- Use carto-darkmatter map style for choropleths
- Add methodology link button to Toronto dashboard header
- Add back to dashboard button on methodology page
- Remove social links from home page (now in sidebar)
- Update CLAUDE.md to Sprint 7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 11:53:13 -05:00
1e0ea9cca2 Merge pull request 'feat: Add GeoJSON parsers and choropleth map visualization' (#26) from feature/geo-parsers-choropleth into development 2026-01-14 23:02:21 +00:00
9dfa24fb76 feat: add GeoJSON parsers and choropleth map visualization
- Add geo.py parser module with CMHCZoneParser, TRREBDistrictParser,
  and NeighbourhoodParser for loading geographic boundaries
- Add coordinate reprojection support (EPSG:3857 to WGS84)
- Organize geo data in data/toronto/raw/geo/ directory
- Add CMHC zones GeoJSON (31 zones) for rental market choropleth
- Add Toronto neighbourhoods GeoJSON (158 neighbourhoods) as purchase market proxy
- Update callbacks with real CMHC 2024 rental data
- Add sample purchase data for all 158 neighbourhoods
- Update pre-commit config to exclude geo data files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 17:58:13 -05:00
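The EPSG:3857 → WGS84 reprojection mentioned in this commit reduces to the spherical-Mercator inverse. A minimal pure-Python sketch for illustration (this is not the project's geo.py code, which presumably uses a geo library):

```python
import math

EARTH_RADIUS_M = 6378137.0  # spherical radius used by EPSG:3857 (Web Mercator)

def mercator_to_wgs84(x: float, y: float) -> tuple[float, float]:
    """Convert EPSG:3857 metres to WGS84 (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(math.atan(math.sinh(y / EARTH_RADIUS_M)))
    return lon, lat

# The projection origin maps to (0.0, 0.0); Toronto is roughly (-79.4, 43.65)
print(mercator_to_wgs84(0.0, 0.0))
```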
8701a12b41 Merge pull request 'Upload files to "/"' (#24) from lmiranda-cmhc-zones into development
Reviewed-on: lmiranda/personal-portfolio#24
2026-01-14 21:04:24 +00:00
6ef5460ad0 Upload files to "/" 2026-01-14 21:04:06 +00:00
19ffc04573 Merge pull request 'fix: Toronto page registration for Dash Pages' (#23) from fix/toronto-page-registration into development 2026-01-12 03:19:49 +00:00
08aa61f85e fix: rename Toronto page __init__.py to dashboard.py for Dash Pages
Dash Pages does not auto-discover __init__.py files as page modules.
Renamed to dashboard.py so the page registers correctly at /toronto.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 22:08:49 -05:00
2a6db2a252 Merge pull request 'feat: Sprint 6 - Polish and deployment preparation' (#22) from feature/sprint6-polish-deploy into development 2026-01-12 02:51:14 +00:00
140d3085bf feat: Sprint 6 polish - methodology, demo data, deployment prep
- Add policy event markers to time series charts
- Create methodology page (/toronto/methodology) with data sources
- Add demo data module for testing without full pipeline
- Update README with project documentation
- Add health check endpoint (/health)
- Add database initialization script
- Export new figure factory functions

Closes #21

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 21:50:45 -05:00
ad6ee3d37f Merge pull request 'feat: Sprint 5 - Visualization' (#19) from feature/sprint5-visualization into development 2026-01-11 21:22:59 +00:00
077e426d34 feat: add Sprint 5 visualization components and Toronto dashboard
- Add figure factories: choropleth, time_series, summary_cards
- Add shared components: map_controls, time_slider, metric_card
- Create Toronto dashboard page with KPI cards, choropleth maps, and time series
- Add dashboard callbacks for interactivity
- Placeholder data for demonstration until QGIS boundaries are complete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 16:20:01 -05:00
b7907e68e4 Merge pull request 'feat: Sprint 4 - Loaders and dbt models' (#17) from feature/sprint4-loaders-dbt into development 2026-01-11 21:08:01 +00:00
457bb49395 feat: add loaders and dbt models for Toronto housing data
Sprint 4 implementation:

Loaders:
- base.py: Session management, bulk insert, upsert utilities
- dimensions.py: Load time, district, zone, neighbourhood, policy dimensions
- trreb.py: Load TRREB purchase data to fact_purchases
- cmhc.py: Load CMHC rental data to fact_rentals

dbt Project:
- Project configuration (dbt_project.yml, packages.yml)
- Staging models for all fact and dimension tables
- Intermediate models with dimension enrichment
- Marts: purchase analysis, rental analysis, market summary

Closes #16

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 16:07:30 -05:00
88e23674a8 Merge pull request 'data: add TRREB and CMHC raw data files' (#15) from data/raw-data-files into development 2026-01-11 21:01:15 +00:00
1c42533834 data: add TRREB and CMHC raw data files
- TRREB Market Watch PDFs (2024-2025, 24 files)
- CMHC Rental Market Survey Excel files (2021-2025, 5 files)
- Update pre-commit to exclude data/raw/ from large file check

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 15:58:32 -05:00
802efab8b8 Merge pull request 'feat: Sprint 3 - Pydantic schemas, SQLAlchemy models, and parser structure' (#14) from feature/sprint3-schemas-models into development 2026-01-11 20:00:20 +00:00
ead6d91a28 feat: add Pydantic schemas, SQLAlchemy models, and parser structure
Sprint 3 implementation:
- Pydantic schemas for TRREB, CMHC, and dimension data validation
- SQLAlchemy models with PostGIS geometry for fact and dimension tables
- Parser structure (stubs) for TRREB PDF and CMHC CSV processing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 14:58:31 -05:00
549e1fcbaf Merge pull request 'feat: implement bio landing page with dash-mantine-components' (#12) from feature/sprint2-bio-page into development 2026-01-11 19:44:23 +00:00
3ee4c20f5e feat: implement bio landing page with dash-mantine-components
- Full bio page with hero, summary, tech stack, projects, social links
- MantineProvider theme integration in app.py
- Responsive layout using DMC SimpleGrid
- Added dash-iconify for social link icons
- Updated mypy overrides for DMC/iconify modules

Closes #11

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 14:43:50 -05:00
68cc5bbe66 Merge pull request 'Upload files to "docs"' (#10) from lmiranda-patch-bio-doc-added into development
Reviewed-on: lmiranda/personal-portfolio#10
2026-01-11 19:38:17 +00:00
58f2c692e3 Upload files to "docs" 2026-01-11 19:38:03 +00:00
8200bbaa99 Merge pull request 'fix: update all dependencies to current versions' (#9) from fix/update-dependencies into development 2026-01-11 19:31:20 +00:00
15da8a97ce fix: update all dependencies to current versions
Updated to January 2026 versions:
- dash: 3.3+
- plotly: 6.5+
- dash-mantine-components: 2.4+
- pandas: 2.3+
- geopandas: 1.1+
- sqlalchemy: 2.0.45+
- pydantic: 2.10+
- pytest: 8.3+
- ruff: 0.8+
- mypy: 1.14+
- dbt-postgres: 1.9+

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 14:29:15 -05:00
eb01ad1101 Merge pull request 'feat: add app foundation (config.py, app.py, home page)' (#8) from feature/sprint1-app-foundation into development 2026-01-11 19:09:43 +00:00
8453f78e31 feat: add app foundation (config.py, app.py, home page)
- config.py: Pydantic BaseSettings for env loading
- app.py: Dash factory with Pages routing
- pages/home.py: Placeholder landing page

Closes #7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 14:09:28 -05:00
94 changed files with 7002 additions and 24 deletions

@@ -7,6 +7,7 @@ repos:
       - id: check-yaml
       - id: check-added-large-files
         args: ['--maxkb=1000']
+        exclude: ^data/(raw/|toronto/raw/geo/)
       - id: check-merge-conflict
   - repo: https://github.com/astral-sh/ruff-pre-commit
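The added `exclude` pattern can be sanity-checked with Python's `re` module (the example paths below are hypothetical, chosen to illustrate what the large-file hook now skips):

```python
import re

# The pattern added to check-added-large-files in the hunk above
exclude = re.compile(r"^data/(raw/|toronto/raw/geo/)")

paths = [
    "data/raw/trreb/mw2401.pdf",                # excluded from the size check
    "data/toronto/raw/geo/zones.geojson",       # excluded (hypothetical name)
    "data/processed/summary.csv",               # still checked
]
for p in paths:
    print(p, bool(exclude.match(p)))
```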

@@ -6,7 +6,7 @@ Working context for Claude Code on the Analytics Portfolio project.
 ## Project Status
-**Current Sprint**: 1 (Project Bootstrap)
+**Current Sprint**: 7 (Navigation & Theme Modernization)
 **Phase**: 1 - Toronto Housing Dashboard
 **Branch**: `development` (feature branches merge here)
@@ -254,4 +254,4 @@ All scripts in `scripts/`:
 ---
-*Last Updated: Sprint 1*
+*Last Updated: Sprint 7*

README.md (120 lines)

@@ -1,2 +1,120 @@
-# personal-portfolio
+# Analytics Portfolio
A data analytics portfolio showcasing end-to-end data engineering, visualization, and analysis capabilities.
## Projects
### Toronto Housing Dashboard
An interactive choropleth dashboard analyzing Toronto's housing market using multi-source data integration.
**Features:**
- Purchase market analysis from TRREB monthly reports
- Rental market analysis from CMHC annual surveys
- Interactive choropleth maps by district/zone
- Time series visualization with policy event annotations
- Purchase/Rental mode toggle
**Data Sources:**
- [TRREB Market Watch](https://trreb.ca/market-data/market-watch/) - Monthly purchase statistics
- [CMHC Rental Market Survey](https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market) - Annual rental data
**Tech Stack:**
- Python 3.11+ / Dash / Plotly
- PostgreSQL + PostGIS
- dbt for data transformation
- Pydantic for validation
- SQLAlchemy 2.0
## Quick Start
```bash
# Clone and setup
git clone https://github.com/lmiranda/personal-portfolio.git
cd personal-portfolio
# Install dependencies and configure environment
make setup
# Start database
make docker-up
# Initialize database schema
make db-init
# Run development server
make run
```
Visit `http://localhost:8050` to view the portfolio.
## Project Structure
```
portfolio_app/
├── app.py # Dash app factory
├── config.py # Pydantic settings
├── pages/
│ ├── home.py # Bio landing page (/)
│ └── toronto/ # Toronto dashboard (/toronto)
├── components/ # Shared UI components
├── figures/ # Plotly figure factories
└── toronto/ # Toronto data logic
├── parsers/ # PDF/CSV extraction
├── loaders/ # Database operations
├── schemas/ # Pydantic models
└── models/ # SQLAlchemy ORM
dbt/
├── models/
│ ├── staging/ # 1:1 source tables
│ ├── intermediate/ # Business logic
│ └── marts/ # Analytical tables
```
## Development
```bash
make test # Run tests
make lint # Run linter
make format # Format code
make ci # Run all checks
```
## Data Pipeline
```
Raw Files (PDF/Excel)
Parsers (pdfplumber, pandas)
Pydantic Validation
SQLAlchemy Loaders
PostgreSQL + PostGIS
dbt Transformations
Dash Visualization
```
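The parser → validation handoff in the pipeline above can be sketched with a dataclass standing in for the real Pydantic schema (field names and checks here are illustrative, not the project's actual models):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PurchaseRecord:
    """Stand-in for a Pydantic schema; fields are hypothetical."""
    district_code: str
    sales_count: int
    avg_price: float

def parse_row(raw: dict) -> PurchaseRecord:
    """Coerce one raw parsed row, then sanity-check before loading."""
    rec = PurchaseRecord(
        district_code=str(raw["district_code"]),
        sales_count=int(raw["sales_count"]),
        avg_price=float(raw["avg_price"]),
    )
    if rec.sales_count < 0 or rec.avg_price <= 0:
        raise ValueError(f"implausible row: {rec}")
    return rec

row = parse_row({"district_code": "W01", "sales_count": "42", "avg_price": "1150000"})
print(row.sales_count)  # 42
```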
## Environment Variables
Copy `.env.example` to `.env` and configure:
```bash
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
```
## License
MIT
## Author
Leo Miranda - [GitHub](https://github.com/lmiranda) | [LinkedIn](https://linkedin.com/in/yourprofile)

Binary files not shown (5 files).

New binary files (contents not shown):
data/raw/trreb/mw2401.pdf
data/raw/trreb/mw2402.pdf
data/raw/trreb/mw2403.pdf
data/raw/trreb/mw2404.pdf
data/raw/trreb/mw2405.pdf
data/raw/trreb/mw2406.pdf
data/raw/trreb/mw2407.pdf
data/raw/trreb/mw2408.pdf
data/raw/trreb/mw2409.pdf
data/raw/trreb/mw2410.pdf
data/raw/trreb/mw2411.pdf
data/raw/trreb/mw2412.pdf
data/raw/trreb/mw2501.pdf
data/raw/trreb/mw2502.pdf
data/raw/trreb/mw2503.pdf
data/raw/trreb/mw2504.pdf
data/raw/trreb/mw2505.pdf
data/raw/trreb/mw2506.pdf
data/raw/trreb/mw2507.pdf
data/raw/trreb/mw2508.pdf
data/raw/trreb/mw2509.pdf
data/raw/trreb/mw2510.pdf
data/raw/trreb/mw2511.pdf
data/raw/trreb/mw2512.pdf

File diff suppressed because one or more lines are too long (2 files)

dbt/dbt_project.yml (new file, 28 lines)

@@ -0,0 +1,28 @@
name: 'toronto_housing'
version: '1.0.0'
config-version: 2

profile: 'toronto_housing'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:
  - "target"
  - "dbt_packages"

models:
  toronto_housing:
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: view
      +schema: intermediate
    marts:
      +materialized: table
      +schema: marts

@@ -0,0 +1,24 @@
version: 2

models:
  - name: int_purchases__monthly
    description: "Purchase data enriched with time and district dimensions"
    columns:
      - name: purchase_id
        tests:
          - unique
          - not_null
      - name: district_code
        tests:
          - not_null
  - name: int_rentals__annual
    description: "Rental data enriched with time and zone dimensions"
    columns:
      - name: rental_id
        tests:
          - unique
          - not_null
      - name: zone_code
        tests:
          - not_null


@@ -0,0 +1,62 @@
-- Intermediate: Monthly purchase data enriched with dimensions
-- Joins purchases with time and district dimensions for analysis
with purchases as (
select * from {{ ref('stg_trreb__purchases') }}
),
time_dim as (
select * from {{ ref('stg_dimensions__time') }}
),
district_dim as (
select * from {{ ref('stg_dimensions__trreb_districts') }}
),
enriched as (
select
p.purchase_id,
-- Time attributes
t.date_key,
t.full_date,
t.year,
t.month,
t.quarter,
t.month_name,
-- District attributes
d.district_key,
d.district_code,
d.district_name,
d.area_type,
-- Metrics
p.sales_count,
p.dollar_volume,
p.avg_price,
p.median_price,
p.new_listings,
p.active_listings,
p.days_on_market,
p.sale_to_list_ratio,
-- Calculated metrics
case
when p.active_listings > 0
then round(p.sales_count::numeric / p.active_listings, 3)
else null
end as absorption_rate,
case
when p.sales_count > 0
then round(p.active_listings::numeric / p.sales_count, 1)
else null
end as months_of_inventory
from purchases p
inner join time_dim t on p.date_key = t.date_key
inner join district_dim d on p.district_key = d.district_key
)
select * from enriched


@@ -0,0 +1,57 @@
-- Intermediate: Annual rental data enriched with dimensions
-- Joins rentals with time and zone dimensions for analysis
with rentals as (
select * from {{ ref('stg_cmhc__rentals') }}
),
time_dim as (
select * from {{ ref('stg_dimensions__time') }}
),
zone_dim as (
select * from {{ ref('stg_dimensions__cmhc_zones') }}
),
enriched as (
select
r.rental_id,
-- Time attributes
t.date_key,
t.full_date,
t.year,
t.month,
t.quarter,
-- Zone attributes
z.zone_key,
z.zone_code,
z.zone_name,
-- Bedroom type
r.bedroom_type,
-- Metrics
r.rental_universe,
r.avg_rent,
r.median_rent,
r.vacancy_rate,
r.availability_rate,
r.turnover_rate,
r.year_over_year_rent_change,
r.reliability_code,
-- Calculated metrics
case
when r.rental_universe > 0 and r.vacancy_rate is not null
then round(r.rental_universe * (r.vacancy_rate / 100), 0)
else null
end as vacant_units_estimate
from rentals r
inner join time_dim t on r.date_key = t.date_key
inner join zone_dim z on r.zone_key = z.zone_key
)
select * from enriched

@@ -0,0 +1,23 @@
version: 2

models:
  - name: mart_toronto_purchases
    description: "Final mart for Toronto purchase/sales analysis by district and time"
    columns:
      - name: purchase_id
        description: "Unique purchase record identifier"
        tests:
          - unique
          - not_null
  - name: mart_toronto_rentals
    description: "Final mart for Toronto rental market analysis by zone and time"
    columns:
      - name: rental_id
        description: "Unique rental record identifier"
        tests:
          - unique
          - not_null
  - name: mart_toronto_market_summary
    description: "Combined market summary aggregating purchases and rentals at Toronto level"


@@ -0,0 +1,81 @@
-- Mart: Toronto Market Summary
-- Aggregated view combining purchase and rental market indicators
-- Grain: One row per year-month
with purchases_agg as (
select
year,
month,
month_name,
quarter,
-- Aggregate purchase metrics across all districts
sum(sales_count) as total_sales,
sum(dollar_volume) as total_dollar_volume,
round(avg(avg_price), 0) as avg_price_all_districts,
round(avg(median_price), 0) as median_price_all_districts,
sum(new_listings) as total_new_listings,
sum(active_listings) as total_active_listings,
round(avg(days_on_market), 0) as avg_days_on_market,
round(avg(sale_to_list_ratio), 2) as avg_sale_to_list_ratio,
round(avg(absorption_rate), 3) as avg_absorption_rate,
round(avg(months_of_inventory), 1) as avg_months_of_inventory,
round(avg(avg_price_yoy_pct), 2) as avg_price_yoy_pct
from {{ ref('mart_toronto_purchases') }}
group by year, month, month_name, quarter
),
rentals_agg as (
select
year,
-- Aggregate rental metrics across all zones (all bedroom types)
round(avg(avg_rent), 0) as avg_rent_all_zones,
round(avg(vacancy_rate), 2) as avg_vacancy_rate,
round(avg(rent_change_pct), 2) as avg_rent_change_pct,
sum(rental_universe) as total_rental_universe
from {{ ref('mart_toronto_rentals') }}
group by year
),
final as (
select
p.year,
p.month,
p.month_name,
p.quarter,
-- Purchase market indicators
p.total_sales,
p.total_dollar_volume,
p.avg_price_all_districts,
p.median_price_all_districts,
p.total_new_listings,
p.total_active_listings,
p.avg_days_on_market,
p.avg_sale_to_list_ratio,
p.avg_absorption_rate,
p.avg_months_of_inventory,
p.avg_price_yoy_pct,
-- Rental market indicators (annual, so join on year)
r.avg_rent_all_zones,
r.avg_vacancy_rate,
r.avg_rent_change_pct,
r.total_rental_universe,
-- Affordability indicator (price to rent ratio)
case
when r.avg_rent_all_zones > 0
then round(p.avg_price_all_districts / (r.avg_rent_all_zones * 12), 1)
else null
end as price_to_annual_rent_ratio
from purchases_agg p
left join rentals_agg r on p.year = r.year
)
select * from final
order by year desc, month desc


@@ -0,0 +1,79 @@
-- Mart: Toronto Purchase Market Analysis
-- Final analytical table for purchase/sales data visualization
-- Grain: One row per district per month
with purchases as (
select * from {{ ref('int_purchases__monthly') }}
),
-- Add year-over-year calculations
with_yoy as (
select
p.*,
-- Previous year same month values
lag(p.avg_price, 12) over (
partition by p.district_code
order by p.date_key
) as avg_price_prev_year,
lag(p.sales_count, 12) over (
partition by p.district_code
order by p.date_key
) as sales_count_prev_year,
lag(p.median_price, 12) over (
partition by p.district_code
order by p.date_key
) as median_price_prev_year
from purchases p
),
final as (
select
purchase_id,
date_key,
full_date,
year,
month,
quarter,
month_name,
district_key,
district_code,
district_name,
area_type,
sales_count,
dollar_volume,
avg_price,
median_price,
new_listings,
active_listings,
days_on_market,
sale_to_list_ratio,
absorption_rate,
months_of_inventory,
-- Year-over-year changes
case
when avg_price_prev_year > 0
then round(((avg_price - avg_price_prev_year) / avg_price_prev_year) * 100, 2)
else null
end as avg_price_yoy_pct,
case
when sales_count_prev_year > 0
then round(((sales_count - sales_count_prev_year)::numeric / sales_count_prev_year) * 100, 2)
else null
end as sales_count_yoy_pct,
case
when median_price_prev_year > 0
then round(((median_price - median_price_prev_year) / median_price_prev_year) * 100, 2)
else null
end as median_price_yoy_pct
from with_yoy
)
select * from final


@@ -0,0 +1,64 @@
-- Mart: Toronto Rental Market Analysis
-- Final analytical table for rental market visualization
-- Grain: One row per zone per bedroom type per survey year
with rentals as (
select * from {{ ref('int_rentals__annual') }}
),
-- Add year-over-year calculations
with_yoy as (
select
r.*,
-- Previous year values
lag(r.avg_rent, 1) over (
partition by r.zone_code, r.bedroom_type
order by r.year
) as avg_rent_prev_year,
lag(r.vacancy_rate, 1) over (
partition by r.zone_code, r.bedroom_type
order by r.year
) as vacancy_rate_prev_year
from rentals r
),
final as (
select
rental_id,
date_key,
full_date,
year,
quarter,
zone_key,
zone_code,
zone_name,
bedroom_type,
rental_universe,
avg_rent,
median_rent,
vacancy_rate,
availability_rate,
turnover_rate,
year_over_year_rent_change,
reliability_code,
vacant_units_estimate,
-- Calculated year-over-year (if not provided)
coalesce(
year_over_year_rent_change,
case
when avg_rent_prev_year > 0
then round(((avg_rent - avg_rent_prev_year) / avg_rent_prev_year) * 100, 2)
else null
end
) as rent_change_pct,
vacancy_rate - vacancy_rate_prev_year as vacancy_rate_change
from with_yoy
)
select * from final

@@ -0,0 +1,61 @@
version: 2

sources:
  - name: toronto_housing
    description: "Toronto housing data loaded from TRREB and CMHC sources"
    database: portfolio
    schema: public
    tables:
      - name: fact_purchases
        description: "TRREB monthly purchase/sales statistics by district"
        columns:
          - name: id
            description: "Primary key"
          - name: date_key
            description: "Foreign key to dim_time"
          - name: district_key
            description: "Foreign key to dim_trreb_district"
      - name: fact_rentals
        description: "CMHC annual rental survey data by zone and bedroom type"
        columns:
          - name: id
            description: "Primary key"
          - name: date_key
            description: "Foreign key to dim_time"
          - name: zone_key
            description: "Foreign key to dim_cmhc_zone"
      - name: dim_time
        description: "Time dimension (monthly grain)"
        columns:
          - name: date_key
            description: "Primary key (YYYYMMDD format)"
      - name: dim_trreb_district
        description: "TRREB district dimension with geometry"
        columns:
          - name: district_key
            description: "Primary key"
          - name: district_code
            description: "TRREB district code"
      - name: dim_cmhc_zone
        description: "CMHC zone dimension with geometry"
        columns:
          - name: zone_key
            description: "Primary key"
          - name: zone_code
            description: "CMHC zone code"
      - name: dim_neighbourhood
        description: "City of Toronto neighbourhoods (reference only)"
        columns:
          - name: neighbourhood_id
            description: "Primary key"
      - name: dim_policy_event
        description: "Housing policy events for annotation"
        columns:
          - name: event_id
            description: "Primary key"
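The `date_key` columns above use a YYYYMMDD integer convention, so encoding and decoding are simple arithmetic (helper names here are illustrative, not the project's actual utilities):

```python
from datetime import date

def to_date_key(d: date) -> int:
    """Encode a date as the YYYYMMDD integer key used by dim_time."""
    return d.year * 10000 + d.month * 100 + d.day

def from_date_key(key: int) -> date:
    """Decode a YYYYMMDD integer key back into a date."""
    return date(key // 10000, (key % 10000) // 100, key % 100)

print(to_date_key(date(2024, 10, 1)))  # 20241001
print(from_date_key(20241001))         # 2024-10-01
```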

@@ -0,0 +1,73 @@
version: 2

models:
  - name: stg_trreb__purchases
    description: "Staged TRREB purchase/sales data from fact_purchases"
    columns:
      - name: purchase_id
        description: "Unique identifier for purchase record"
        tests:
          - unique
          - not_null
      - name: date_key
        description: "Date dimension key (YYYYMMDD)"
        tests:
          - not_null
      - name: district_key
        description: "TRREB district dimension key"
        tests:
          - not_null
  - name: stg_cmhc__rentals
    description: "Staged CMHC rental market data from fact_rentals"
    columns:
      - name: rental_id
        description: "Unique identifier for rental record"
        tests:
          - unique
          - not_null
      - name: date_key
        description: "Date dimension key (YYYYMMDD)"
        tests:
          - not_null
      - name: zone_key
        description: "CMHC zone dimension key"
        tests:
          - not_null
  - name: stg_dimensions__time
    description: "Staged time dimension"
    columns:
      - name: date_key
        description: "Date dimension key (YYYYMMDD)"
        tests:
          - unique
          - not_null
  - name: stg_dimensions__trreb_districts
    description: "Staged TRREB district dimension"
    columns:
      - name: district_key
        description: "District dimension key"
        tests:
          - unique
          - not_null
      - name: district_code
        description: "TRREB district code (e.g., W01, C01)"
        tests:
          - unique
          - not_null
  - name: stg_dimensions__cmhc_zones
    description: "Staged CMHC zone dimension"
    columns:
      - name: zone_key
        description: "Zone dimension key"
        tests:
          - unique
          - not_null
      - name: zone_code
        description: "CMHC zone code"
        tests:
          - unique
          - not_null


@@ -0,0 +1,26 @@
-- Staged CMHC rental market survey data
-- Source: fact_rentals table loaded from CMHC CSV exports
-- Grain: One row per zone per bedroom type per survey year
with source as (
select * from {{ source('toronto_housing', 'fact_rentals') }}
),
staged as (
select
id as rental_id,
date_key,
zone_key,
bedroom_type,
universe as rental_universe,
avg_rent,
median_rent,
vacancy_rate,
availability_rate,
turnover_rate,
rent_change_pct as year_over_year_rent_change,
reliability_code
from source
)
select * from staged


@@ -0,0 +1,18 @@
-- Staged CMHC zone dimension
-- Source: dim_cmhc_zone table
-- Grain: One row per zone
with source as (
select * from {{ source('toronto_housing', 'dim_cmhc_zone') }}
),
staged as (
select
zone_key,
zone_code,
zone_name,
geometry
from source
)
select * from staged


@@ -0,0 +1,21 @@
-- Staged time dimension
-- Source: dim_time table
-- Grain: One row per month
with source as (
select * from {{ source('toronto_housing', 'dim_time') }}
),
staged as (
select
date_key,
full_date,
year,
month,
quarter,
month_name,
is_month_start
from source
)
select * from staged


@@ -0,0 +1,19 @@
-- Staged TRREB district dimension
-- Source: dim_trreb_district table
-- Grain: One row per district
with source as (
select * from {{ source('toronto_housing', 'dim_trreb_district') }}
),
staged as (
select
district_key,
district_code,
district_name,
area_type,
geometry
from source
)
select * from staged


@@ -0,0 +1,25 @@
-- Staged TRREB purchase/sales data
-- Source: fact_purchases table loaded from TRREB Market Watch PDFs
-- Grain: One row per district per month
with source as (
select * from {{ source('toronto_housing', 'fact_purchases') }}
),
staged as (
select
id as purchase_id,
date_key,
district_key,
sales_count,
dollar_volume,
avg_price,
median_price,
new_listings,
active_listings,
avg_dom as days_on_market,
avg_sp_lp as sale_to_list_ratio
from source
)
select * from staged

dbt/packages.yml (new file, 5 lines)

@@ -0,0 +1,5 @@
packages:
  - package: dbt-labs/dbt_utils
    version: ">=1.0.0"
  - package: calogica/dbt_expectations
    version: ">=0.10.0"

dbt/profiles.yml.example (new file, 21 lines)

@@ -0,0 +1,21 @@
toronto_housing:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: portfolio
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      port: 5432
      dbname: portfolio
      schema: public
      threads: 4
    prod:
      type: postgres
      host: "{{ env_var('POSTGRES_HOST') }}"
      user: "{{ env_var('POSTGRES_USER') }}"
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      port: 5432
      dbname: portfolio
      schema: public
      threads: 4

docs/bio_content_v2.md (new file, 134 lines)

@@ -0,0 +1,134 @@
# Portfolio Bio Content
**Version**: 2.0
**Last Updated**: January 2026
**Purpose**: Content source for `portfolio_app/pages/home.py`
---
## Document Context
| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Bio content and social links for landing page |
| **Consumed By** | `portfolio_app/pages/home.py` |
---
## Headline
**Primary**: Leo | Data Engineer & Analytics Developer
**Tagline**: I build data infrastructure that actually gets used.
---
## Professional Summary
Over the past 5 years, I've designed and evolved an enterprise analytics platform from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call abandon rates, and dashboards that executives actually open.
My approach: dimensional modeling (star schema), layered transformations (staging → intermediate → marts), and automation that eliminates manual work. I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states. Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the Project Management Institute.
---
## Tech Stack
| Category | Technologies |
|----------|--------------|
| **Languages** | Python, SQL |
| **Data Processing** | Pandas, SQLAlchemy, FastAPI |
| **Databases** | PostgreSQL, MSSQL |
| **Visualization** | Power BI, Plotly, Dash |
| **Patterns** | dbt, dimensional modeling, star schema |
| **Other** | Genesys Cloud |
**Display Format** (for landing page):
```
Python (Pandas, SQLAlchemy, FastAPI) • SQL (MSSQL, PostgreSQL) • Power BI • Plotly/Dash • Genesys Cloud • dbt patterns
```
---
## Side Project
**Bandit Labs** — Building automation and AI tooling for small businesses.
*Note: Keep this brief on portfolio; link only if separate landing page exists.*
---
## Social Links
| Platform | URL | Icon |
|----------|-----|------|
| **LinkedIn** | `https://linkedin.com/in/[USERNAME]` | `lucide-react: Linkedin` |
| **GitHub** | `https://github.com/[USERNAME]` | `lucide-react: Github` |
> **TODO**: Replace `[USERNAME]` placeholders with actual URLs before bio page launch.
---
## Availability Statement
Open to **Senior Data Analyst**, **Analytics Engineer**, and **BI Developer** opportunities in Toronto or remote.
---
## Portfolio Projects Section
*Dynamically populated based on deployed projects.*
| Project | Status | Link |
|---------|--------|------|
| Toronto Housing Dashboard | In Development | `/toronto` |
| Energy Pricing Analysis | Planned | `/energy` |
**Display Logic**:
- Show only projects with `status = deployed`
- "In Development" projects can show as coming soon or be hidden (user preference)
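That display logic can be sketched as a small filter (the record structure and status strings here are hypothetical, mirroring the table above):

```python
# Hypothetical project records mirroring the Portfolio Projects table
projects = [
    {"name": "Toronto Housing Dashboard", "status": "in_development", "link": "/toronto"},
    {"name": "Energy Pricing Analysis", "status": "planned", "link": "/energy"},
]

def visible_projects(records: list[dict], show_wip: bool = False) -> list[dict]:
    """Deployed projects always show; in-development ones only when opted in."""
    allowed = {"deployed"} | ({"in_development"} if show_wip else set())
    return [p for p in records if p["status"] in allowed]

print([p["name"] for p in visible_projects(projects)])                 # []
print([p["name"] for p in visible_projects(projects, show_wip=True)])  # ['Toronto Housing Dashboard']
```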
---
## Implementation Notes
### Content Hierarchy for `home.py`
```
1. Name + Tagline (hero section)
2. Professional Summary (2-3 paragraphs)
3. Tech Stack (horizontal chips or inline list)
4. Portfolio Projects (cards linking to dashboards)
5. Social Links (icon buttons)
6. Availability statement (subtle, bottom)
```
### Styling Recommendations
- Clean, minimal — let the projects speak
- Dark/light mode support via dash-mantine-components theme
- No headshot required (optional)
- Mobile-responsive layout
### Content Updates
When updating bio content:
1. Edit this document
2. Update `home.py` to reflect changes
3. Redeploy
---
## Related Documents
| Document | Relationship |
|----------|--------------|
| `portfolio_project_plan_v5.md` | Parent — references this for bio content |
| `portfolio_app/pages/home.py` | Consumer — implements this content |
---
*Document Version: 2.0*
*Updated: January 2026*

portfolio_app/app.py (new file, 58 lines)

@@ -0,0 +1,58 @@
"""Dash application factory with Pages routing."""

import dash
import dash_mantine_components as dmc
from dash import dcc, html

from .components import create_sidebar
from .config import get_settings


def create_app() -> dash.Dash:
    """Create and configure the Dash application."""
    app = dash.Dash(
        __name__,
        use_pages=True,
        suppress_callback_exceptions=True,
        title="Analytics Portfolio",
        external_stylesheets=dmc.styles.ALL,
    )

    app.layout = dmc.MantineProvider(
        id="mantine-provider",
        children=[
            dcc.Location(id="url", refresh=False),
            dcc.Store(id="theme-store", storage_type="local", data="dark"),
            dcc.Store(id="theme-init-dummy"),  # Dummy store for theme init callback
            html.Div(
                [
                    create_sidebar(),
                    html.Div(
                        dash.page_container,
                        className="page-content-wrapper",
                    ),
                ],
            ),
        ],
        theme={
            "primaryColor": "blue",
            "fontFamily": "'Inter', sans-serif",
        },
        defaultColorScheme="dark",
    )

    # Import callbacks to register them
    from . import callbacks  # noqa: F401

    return app


def main() -> None:
    """Run the development server."""
    settings = get_settings()
    app = create_app()
    app.run(debug=settings.dash_debug, host="0.0.0.0", port=8050)


if __name__ == "__main__":
    main()


@@ -0,0 +1,139 @@
/* Floating sidebar navigation styles */
/* Sidebar container */
.floating-sidebar {
position: fixed;
left: 16px;
top: 50%;
transform: translateY(-50%);
width: 60px;
padding: 16px 8px;
border-radius: 32px;
z-index: 1000;
display: flex;
flex-direction: column;
align-items: center;
gap: 8px;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
transition: background-color 0.2s ease;
}
/* Page content offset to prevent sidebar overlap */
.page-content-wrapper {
margin-left: 92px; /* sidebar width (60px) + left margin (16px) + gap (16px) */
min-height: 100vh;
}
/* Dark theme (default) */
[data-mantine-color-scheme="dark"] .floating-sidebar {
background-color: #141414;
}
[data-mantine-color-scheme="dark"] body {
background-color: #000000;
}
/* Light theme */
[data-mantine-color-scheme="light"] .floating-sidebar {
background-color: #f0f0f0;
}
[data-mantine-color-scheme="light"] body {
background-color: #ffffff;
}
/* Brand initials styling */
.sidebar-brand {
width: 40px;
height: 40px;
display: flex;
align-items: center;
justify-content: center;
border-radius: 50%;
background-color: var(--mantine-color-blue-filled);
margin-bottom: 4px;
transition: transform 0.2s ease;
}
.sidebar-brand:hover {
transform: scale(1.05);
}
.sidebar-brand-link {
font-weight: 700;
font-size: 16px;
color: white;
text-decoration: none;
line-height: 1;
}
/* Divider between sections */
.sidebar-divider {
width: 32px;
height: 1px;
background-color: var(--mantine-color-dimmed);
margin: 4px 0;
opacity: 0.3;
}
/* Active nav icon indicator */
.nav-icon-active {
background-color: var(--mantine-color-blue-filled) !important;
}
/* Navigation icon hover effects */
.floating-sidebar .mantine-ActionIcon-root {
transition: transform 0.15s ease, background-color 0.15s ease;
}
.floating-sidebar .mantine-ActionIcon-root:hover {
transform: scale(1.1);
}
/* Ensure links don't have underlines */
.floating-sidebar a {
text-decoration: none;
}
/* Theme toggle specific styling */
#theme-toggle {
transition: transform 0.3s ease;
}
#theme-toggle:hover {
transform: rotate(15deg) scale(1.1);
}
/* Responsive adjustments for smaller screens */
@media (max-width: 768px) {
.floating-sidebar {
left: 8px;
width: 50px;
padding: 12px 6px;
border-radius: 25px;
}
.page-content-wrapper {
margin-left: 70px;
}
.sidebar-brand {
width: 34px;
height: 34px;
}
.sidebar-brand-link {
font-size: 14px;
}
}
/* Very small screens - hide sidebar, show minimal navigation */
@media (max-width: 480px) {
.floating-sidebar {
display: none;
}
.page-content-wrapper {
margin-left: 0;
}
}


@@ -0,0 +1,5 @@
"""Application-level callbacks for the portfolio app."""
from . import theme
__all__ = ["theme"]


@@ -0,0 +1,38 @@
"""Theme toggle callbacks using clientside JavaScript."""
from dash import Input, Output, State, clientside_callback
# Toggle theme on button click
# Stores new theme value and updates the DOM attribute
clientside_callback(
"""
function(n_clicks, currentTheme) {
if (n_clicks === undefined || n_clicks === null) {
return window.dash_clientside.no_update;
}
const newTheme = currentTheme === 'dark' ? 'light' : 'dark';
document.documentElement.setAttribute('data-mantine-color-scheme', newTheme);
return newTheme;
}
""",
Output("theme-store", "data"),
Input("theme-toggle", "n_clicks"),
State("theme-store", "data"),
prevent_initial_call=True,
)
# Initialize theme from localStorage on page load
# Uses a dummy output since we only need the side effect of setting the DOM attribute
clientside_callback(
"""
function(theme) {
if (theme) {
document.documentElement.setAttribute('data-mantine-color-scheme', theme);
}
return theme;
}
""",
Output("theme-init-dummy", "data"),
Input("theme-store", "data"),
prevent_initial_call=False,
)


@@ -0,0 +1,16 @@
"""Shared Dash components for the portfolio application."""
from .map_controls import create_map_controls, create_metric_selector
from .metric_card import MetricCard, create_metric_cards_row
from .sidebar import create_sidebar
from .time_slider import create_time_slider, create_year_selector
__all__ = [
"create_map_controls",
"create_metric_selector",
"create_sidebar",
"create_time_slider",
"create_year_selector",
"MetricCard",
"create_metric_cards_row",
]


@@ -0,0 +1,79 @@
"""Map control components for choropleth visualizations."""
from typing import Any
import dash_mantine_components as dmc
from dash import html
def create_metric_selector(
id_prefix: str,
options: list[dict[str, str]],
default_value: str | None = None,
label: str = "Select Metric",
) -> dmc.Select:
"""Create a metric selector dropdown.
Args:
id_prefix: Prefix for component IDs.
options: List of options with 'label' and 'value' keys.
default_value: Initial selected value.
label: Label text for the selector.
Returns:
Mantine Select component.
"""
return dmc.Select(
id=f"{id_prefix}-metric-selector",
label=label,
data=options,
value=default_value or (options[0]["value"] if options else None),
style={"width": "200px"},
)
def create_map_controls(
id_prefix: str,
metric_options: list[dict[str, str]],
default_metric: str | None = None,
show_layer_toggle: bool = True,
) -> dmc.Paper:
"""Create a control panel for map visualizations.
Args:
id_prefix: Prefix for component IDs.
metric_options: Options for metric selector.
default_metric: Default selected metric.
show_layer_toggle: Whether to show layer visibility toggle.
Returns:
Mantine Paper component containing controls.
"""
controls: list[Any] = [
create_metric_selector(
id_prefix=id_prefix,
options=metric_options,
default_value=default_metric,
label="Display Metric",
),
]
if show_layer_toggle:
controls.append(
dmc.Switch(
id=f"{id_prefix}-layer-toggle",
label="Show Boundaries",
checked=True,
style={"marginTop": "10px"},
)
)
return dmc.Paper(
children=[
dmc.Text("Map Controls", fw=500, size="sm", mb="xs"),
html.Div(controls),
],
p="md",
radius="sm",
withBorder=True,
)


@@ -0,0 +1,115 @@
"""Metric card components for KPI display."""
from typing import Any
import dash_mantine_components as dmc
from dash import dcc
from portfolio_app.figures.summary_cards import create_metric_card_figure
class MetricCard:
"""A reusable metric card component."""
def __init__(
self,
id_prefix: str,
title: str,
value: float | int | str = 0,
delta: float | None = None,
prefix: str = "",
suffix: str = "",
format_spec: str = ",.0f",
positive_is_good: bool = True,
):
"""Initialize a metric card.
Args:
id_prefix: Prefix for component IDs.
title: Card title.
value: Main metric value.
delta: Change value for delta indicator.
prefix: Value prefix (e.g., '$').
suffix: Value suffix.
format_spec: Python format specification.
positive_is_good: Whether positive delta is good.
"""
self.id_prefix = id_prefix
self.title = title
self.value = value
self.delta = delta
self.prefix = prefix
self.suffix = suffix
self.format_spec = format_spec
self.positive_is_good = positive_is_good
def render(self) -> dmc.Paper:
"""Render the metric card component.
Returns:
Mantine Paper component with embedded graph.
"""
fig = create_metric_card_figure(
value=self.value,
title=self.title,
delta=self.delta,
prefix=self.prefix,
suffix=self.suffix,
format_spec=self.format_spec,
positive_is_good=self.positive_is_good,
)
return dmc.Paper(
children=[
dcc.Graph(
id=f"{self.id_prefix}-graph",
figure=fig,
config={"displayModeBar": False},
style={"height": "120px"},
)
],
p="xs",
radius="sm",
withBorder=True,
)
def create_metric_cards_row(
metrics: list[dict[str, Any]],
id_prefix: str = "metric",
) -> dmc.SimpleGrid:
"""Create a row of metric cards.
Args:
metrics: List of metric configurations with keys:
- title: Card title
- value: Metric value
- delta: Optional change value
- prefix: Optional value prefix
- suffix: Optional value suffix
- format_spec: Optional format specification
- positive_is_good: Optional delta color logic
id_prefix: Prefix for component IDs.
Returns:
Mantine SimpleGrid component with metric cards.
"""
cards = []
for i, metric in enumerate(metrics):
card = MetricCard(
id_prefix=f"{id_prefix}-{i}",
title=metric.get("title", ""),
value=metric.get("value", 0),
delta=metric.get("delta"),
prefix=metric.get("prefix", ""),
suffix=metric.get("suffix", ""),
format_spec=metric.get("format_spec", ",.0f"),
positive_is_good=metric.get("positive_is_good", True),
)
cards.append(card.render())
return dmc.SimpleGrid(
cols={"base": 1, "sm": 2, "md": len(cards)},
spacing="md",
children=cards,
)


@@ -0,0 +1,179 @@
"""Floating sidebar navigation component."""
import dash_mantine_components as dmc
from dash import dcc, html
from dash_iconify import DashIconify
# Navigation items configuration
NAV_ITEMS = [
{"path": "/", "icon": "tabler:home", "label": "Home"},
{"path": "/toronto", "icon": "tabler:map-2", "label": "Toronto Housing"},
]
# External links configuration
EXTERNAL_LINKS = [
{
"url": "https://github.com/leomiranda",
"icon": "tabler:brand-github",
"label": "GitHub",
},
{
"url": "https://linkedin.com/in/leobmiranda",
"icon": "tabler:brand-linkedin",
"label": "LinkedIn",
},
]
def create_brand_logo() -> html.Div:
"""Create the brand initials logo."""
return html.Div(
dcc.Link(
"LM",
href="/",
className="sidebar-brand-link",
),
className="sidebar-brand",
)
def create_nav_icon(
icon: str,
label: str,
path: str,
current_path: str,
) -> dmc.Tooltip:
"""Create a navigation icon with tooltip.
Args:
icon: Iconify icon string.
label: Tooltip label.
path: Navigation path.
current_path: Current page path for active state.
Returns:
Tooltip-wrapped navigation icon.
"""
is_active = current_path == path or (path != "/" and current_path.startswith(path))
return dmc.Tooltip(
dcc.Link(
dmc.ActionIcon(
DashIconify(icon=icon, width=20),
variant="subtle" if not is_active else "filled",
size="lg",
radius="xl",
color="blue" if is_active else "gray",
className="nav-icon-active" if is_active else "",
),
href=path,
),
label=label,
position="right",
withArrow=True,
)
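The active-state rule above can be exercised in isolation (a standalone sketch; `is_nav_active` is an illustrative name, not a function in this module):

```python
def is_nav_active(path: str, current_path: str) -> bool:
    # Home ("/") matches only exactly; any other path also matches its sub-routes
    return current_path == path or (path != "/" and current_path.startswith(path))

print(is_nav_active("/", "/toronto"))              # False
print(is_nav_active("/toronto", "/toronto/zones")) # True
```

This keeps "/" from lighting up on every page while still highlighting "/toronto" on nested routes.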
def create_theme_toggle(current_theme: str = "dark") -> dmc.Tooltip:
"""Create the theme toggle button.
Args:
current_theme: Current theme ('dark' or 'light').
Returns:
Tooltip-wrapped theme toggle icon.
"""
icon = "tabler:sun" if current_theme == "dark" else "tabler:moon"
label = "Switch to light mode" if current_theme == "dark" else "Switch to dark mode"
return dmc.Tooltip(
dmc.ActionIcon(
DashIconify(icon=icon, width=20, id="theme-toggle-icon"),
id="theme-toggle",
variant="subtle",
size="lg",
radius="xl",
color="gray",
),
label=label,
position="right",
withArrow=True,
)
def create_external_link(url: str, icon: str, label: str) -> dmc.Tooltip:
"""Create an external link icon with tooltip.
Args:
url: External URL.
icon: Iconify icon string.
label: Tooltip label.
Returns:
Tooltip-wrapped external link icon.
"""
return dmc.Tooltip(
dmc.Anchor(
dmc.ActionIcon(
DashIconify(icon=icon, width=20),
variant="subtle",
size="lg",
radius="xl",
color="gray",
),
href=url,
target="_blank",
),
label=label,
position="right",
withArrow=True,
)
def create_sidebar_divider() -> html.Div:
"""Create a horizontal divider for the sidebar."""
return html.Div(className="sidebar-divider")
def create_sidebar(current_path: str = "/", current_theme: str = "dark") -> html.Div:
"""Create the floating sidebar navigation.
Args:
current_path: Current page path for active state highlighting.
current_theme: Current theme for toggle icon state.
Returns:
Complete sidebar component.
"""
return html.Div(
[
# Brand logo
create_brand_logo(),
create_sidebar_divider(),
# Navigation icons
*[
create_nav_icon(
icon=item["icon"],
label=item["label"],
path=item["path"],
current_path=current_path,
)
for item in NAV_ITEMS
],
create_sidebar_divider(),
# Theme toggle
create_theme_toggle(current_theme),
create_sidebar_divider(),
# External links
*[
create_external_link(
url=link["url"],
icon=link["icon"],
label=link["label"],
)
for link in EXTERNAL_LINKS
],
],
className="floating-sidebar",
id="floating-sidebar",
)


@@ -0,0 +1,135 @@
"""Time selection components for temporal data filtering."""
from datetime import date
import dash_mantine_components as dmc
def create_year_selector(
id_prefix: str,
min_year: int = 2020,
max_year: int | None = None,
default_year: int | None = None,
label: str = "Select Year",
) -> dmc.Select:
"""Create a year selector dropdown.
Args:
id_prefix: Prefix for component IDs.
min_year: Minimum year option.
max_year: Maximum year option (defaults to current year).
default_year: Initial selected year.
label: Label text for the selector.
Returns:
Mantine Select component.
"""
if max_year is None:
max_year = date.today().year
if default_year is None:
default_year = max_year
years = list(range(max_year, min_year - 1, -1))
options = [{"label": str(year), "value": str(year)} for year in years]
return dmc.Select(
id=f"{id_prefix}-year-selector",
label=label,
data=options,
value=str(default_year),
style={"width": "120px"},
)
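The descending year list built above behaves like this minimal sketch:

```python
min_year, max_year = 2020, 2023
# Newest year first, inclusive of min_year
years = list(range(max_year, min_year - 1, -1))
options = [{"label": str(y), "value": str(y)} for y in years]
print(years)  # [2023, 2022, 2021, 2020]
```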
def create_time_slider(
id_prefix: str,
min_year: int = 2020,
max_year: int | None = None,
default_range: tuple[int, int] | None = None,
label: str = "Time Range",
) -> dmc.Paper:
"""Create a time range slider component.
Args:
id_prefix: Prefix for component IDs.
min_year: Minimum year for the slider.
max_year: Maximum year for the slider.
default_range: Default (start, end) year range.
label: Label text for the slider.
Returns:
Mantine Paper component containing the slider.
"""
if max_year is None:
max_year = date.today().year
if default_range is None:
default_range = (min_year, max_year)
# Create marks for every year
marks = [
{"value": year, "label": str(year)} for year in range(min_year, max_year + 1)
]
return dmc.Paper(
children=[
dmc.Text(label, fw=500, size="sm", mb="xs"),
dmc.RangeSlider(
id=f"{id_prefix}-time-slider",
min=min_year,
max=max_year,
value=list(default_range),
marks=marks,
step=1,
minRange=1,
style={"marginTop": "20px", "marginBottom": "10px"},
),
],
p="md",
radius="sm",
withBorder=True,
)
def create_month_selector(
id_prefix: str,
default_month: int | None = None,
label: str = "Select Month",
) -> dmc.Select:
"""Create a month selector dropdown.
Args:
id_prefix: Prefix for component IDs.
default_month: Initial selected month (1-12).
label: Label text for the selector.
Returns:
Mantine Select component.
"""
months = [
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]
options = [{"label": month, "value": str(i + 1)} for i, month in enumerate(months)]
if default_month is None:
default_month = date.today().month
return dmc.Select(
id=f"{id_prefix}-month-selector",
label=label,
data=options,
value=str(default_month),
style={"width": "140px"},
)

portfolio_app/config.py Normal file

@@ -0,0 +1,34 @@
"""Application configuration using Pydantic BaseSettings."""
from functools import lru_cache
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings): # type: ignore[misc]
"""Application settings loaded from environment variables."""
model_config = SettingsConfigDict(
env_file=".env",
env_file_encoding="utf-8",
extra="ignore",
)
# Database
database_url: str = "postgresql://portfolio:portfolio_dev@localhost:5432/portfolio"
postgres_user: str = "portfolio"
postgres_password: str = "portfolio_dev"
postgres_db: str = "portfolio"
# Application
dash_debug: bool = True
secret_key: str = "change-me-in-production"
# Logging
log_level: str = "INFO"
@lru_cache
def get_settings() -> Settings:
"""Get cached settings instance."""
return Settings()
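The `@lru_cache` on `get_settings` makes it a cheap process-wide singleton: every call returns the same `Settings` instance, so the `.env` file is parsed once. A self-contained sketch of the pattern with a stand-in class (the real one requires `pydantic-settings`):

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DummySettings:
    dash_debug: bool = True


@lru_cache
def get_cached() -> DummySettings:
    # Body runs only on the first call; later calls hit the cache
    return DummySettings()


print(get_cached() is get_cached())  # True: same cached instance
```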


@@ -0,0 +1,31 @@
"""Plotly figure factories for data visualization."""
from .choropleth import (
create_choropleth_figure,
create_district_map,
create_zone_map,
)
from .summary_cards import create_metric_card_figure, create_summary_metrics
from .time_series import (
add_policy_markers,
create_market_comparison_chart,
create_price_time_series,
create_time_series_with_events,
create_volume_time_series,
)
__all__ = [
# Choropleth
"create_choropleth_figure",
"create_district_map",
"create_zone_map",
# Time series
"create_price_time_series",
"create_volume_time_series",
"create_market_comparison_chart",
"create_time_series_with_events",
"add_policy_markers",
# Summary
"create_metric_card_figure",
"create_summary_metrics",
]


@@ -0,0 +1,171 @@
"""Choropleth map figure factory for Toronto housing data."""
from typing import Any
import plotly.express as px
import plotly.graph_objects as go
def create_choropleth_figure(
geojson: dict[str, Any] | None,
data: list[dict[str, Any]],
location_key: str,
color_column: str,
hover_data: list[str] | None = None,
color_scale: str = "Blues",
title: str | None = None,
map_style: str = "carto-positron",
center: dict[str, float] | None = None,
zoom: float = 9.5,
) -> go.Figure:
"""Create a choropleth map figure.
Args:
geojson: GeoJSON FeatureCollection for boundaries.
data: List of data records with location keys and values.
location_key: Column name for location identifier.
color_column: Column name for color values.
hover_data: Additional columns to show on hover.
color_scale: Plotly color scale name.
title: Optional chart title.
map_style: Mapbox style (carto-positron, open-street-map, etc.).
center: Map center coordinates {"lat": float, "lon": float}.
zoom: Initial zoom level.
Returns:
Plotly Figure object.
"""
# Default center to Toronto
if center is None:
center = {"lat": 43.7, "lon": -79.4}
# Use dark-mode friendly map style by default
if map_style == "carto-positron":
map_style = "carto-darkmatter"
# If no geojson provided, create a placeholder map
if geojson is None or not data:
fig = go.Figure(go.Scattermapbox())
fig.update_layout(
mapbox={
"style": map_style,
"center": center,
"zoom": zoom,
},
margin={"l": 0, "r": 0, "t": 40, "b": 0},
title=title or "Toronto Housing Map",
height=500,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
)
fig.add_annotation(
text="No geometry data available. Complete QGIS digitization to enable map.",
xref="paper",
yref="paper",
x=0.5,
y=0.5,
showarrow=False,
font={"size": 14, "color": "#888888"},
)
return fig
# Create choropleth with data
import pandas as pd
df = pd.DataFrame(data)
fig = px.choropleth_mapbox(
df,
geojson=geojson,
locations=location_key,
featureidkey=f"properties.{location_key}",
color=color_column,
color_continuous_scale=color_scale,
hover_data=hover_data,
mapbox_style=map_style,  # already normalized to carto-darkmatter above
center=center,
zoom=zoom,
opacity=0.7,
)
fig.update_layout(
margin={"l": 0, "r": 0, "t": 40, "b": 0},
title=title,
height=500,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
coloraxis_colorbar={
"title": {
"text": color_column.replace("_", " ").title(),
"font": {"color": "#c9c9c9"},
},
"thickness": 15,
"len": 0.7,
"tickfont": {"color": "#c9c9c9"},
},
)
return fig
def create_district_map(
districts_geojson: dict[str, Any] | None,
purchase_data: list[dict[str, Any]],
metric: str = "avg_price",
) -> go.Figure:
"""Create choropleth map for TRREB districts.
Args:
districts_geojson: GeoJSON for TRREB district boundaries.
purchase_data: Purchase statistics by district.
metric: Metric to display (avg_price, sales_count, etc.).
Returns:
Plotly Figure object.
"""
hover_columns = ["district_name", "sales_count", "avg_price", "median_price"]
return create_choropleth_figure(
geojson=districts_geojson,
data=purchase_data,
location_key="district_code",
color_column=metric,
hover_data=[c for c in hover_columns if c != metric],
color_scale="Blues" if "price" in metric else "Greens",
title="Toronto Purchase Market by District",
)
def create_zone_map(
zones_geojson: dict[str, Any] | None,
rental_data: list[dict[str, Any]],
metric: str = "avg_rent",
) -> go.Figure:
"""Create choropleth map for CMHC zones.
Args:
zones_geojson: GeoJSON for CMHC zone boundaries.
rental_data: Rental statistics by zone.
metric: Metric to display (avg_rent, vacancy_rate, etc.).
Returns:
Plotly Figure object.
"""
hover_columns = ["zone_name", "avg_rent", "vacancy_rate", "rental_universe"]
return create_choropleth_figure(
geojson=zones_geojson,
data=rental_data,
location_key="zone_code",
color_column=metric,
hover_data=[c for c in hover_columns if c != metric],
color_scale="Oranges" if "rent" in metric else "Purples",
title="Toronto Rental Market by Zone",
)


@@ -0,0 +1,107 @@
"""Summary card figure factories for KPI display."""
from typing import Any
import plotly.graph_objects as go
def create_metric_card_figure(
value: float | int | str,
title: str,
delta: float | None = None,
delta_suffix: str = "%",
prefix: str = "",
suffix: str = "",
format_spec: str = ",.0f",
positive_is_good: bool = True,
) -> go.Figure:
"""Create a KPI indicator figure.
Args:
value: The main metric value.
title: Card title.
delta: Optional change value (for delta indicator).
delta_suffix: Suffix for delta value (e.g., '%').
prefix: Prefix for main value (e.g., '$').
suffix: Suffix for main value.
format_spec: Python format specification for the value.
positive_is_good: Whether positive delta is good (green) or bad (red).
Returns:
Plotly Figure object.
"""
# Determine numeric value for indicator
if isinstance(value, int | float):
number_value: float | None = float(value)
else:
number_value = None
fig = go.Figure()
# Add indicator trace
indicator_config: dict[str, Any] = {
"mode": "number",
"value": number_value if number_value is not None else 0,
"title": {"text": title, "font": {"size": 14}},
"number": {
"font": {"size": 32},
"prefix": prefix,
"suffix": suffix,
"valueformat": format_spec,
},
}
# Add delta if provided
if delta is not None:
indicator_config["mode"] = "number+delta"
indicator_config["delta"] = {
"reference": number_value - delta if number_value is not None else 0,
"relative": False,
"valueformat": ".1f",
"suffix": delta_suffix,
"increasing": {"color": "green" if positive_is_good else "red"},
"decreasing": {"color": "red" if positive_is_good else "green"},
}
fig.add_trace(go.Indicator(**indicator_config))
fig.update_layout(
height=120,
margin={"l": 20, "r": 20, "t": 40, "b": 20},
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font={"family": "Inter, sans-serif", "color": "#c9c9c9"},
)
return fig
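Plotly's `Indicator` displays the delta as `value - reference`, which is why the config above sets the reference to `value - delta` rather than passing the delta directly. The arithmetic, as a quick check:

```python
value, delta = 2450.0, 3.2
reference = value - delta  # what the indicator compares against
shown = round(value - reference, 1)  # the delta the indicator renders
print(shown)  # 3.2
```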
def create_summary_metrics(
metrics: dict[str, dict[str, Any]],
) -> list[go.Figure]:
"""Create multiple metric card figures.
Args:
metrics: Dictionary of metric configurations.
Key: metric name
Value: dict with 'value', 'title', 'delta' (optional), etc.
Returns:
List of Plotly Figure objects.
"""
figures = []
for metric_config in metrics.values():
fig = create_metric_card_figure(
value=metric_config.get("value", 0),
title=metric_config.get("title", ""),
delta=metric_config.get("delta"),
delta_suffix=metric_config.get("delta_suffix", "%"),
prefix=metric_config.get("prefix", ""),
suffix=metric_config.get("suffix", ""),
format_spec=metric_config.get("format_spec", ",.0f"),
positive_is_good=metric_config.get("positive_is_good", True),
)
figures.append(fig)
return figures
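The `format_spec` values used throughout (e.g. `",.0f"`) are passed to the indicator's `valueformat`, which uses d3-format; for simple comma/precision specs like these the syntax coincides with Python's `format`, so their effect can be previewed locally:

```python
print(format(1234567.89, ",.0f"))  # 1,234,568 (thousands separator, no decimals)
print(format(0.04, ".1%"))         # 4.0% (fraction rendered as a percentage)
```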


@@ -0,0 +1,386 @@
"""Time series figure factories for Toronto housing data."""
from typing import Any
import plotly.express as px
import plotly.graph_objects as go
def create_price_time_series(
data: list[dict[str, Any]],
date_column: str = "full_date",
price_column: str = "avg_price",
group_column: str | None = None,
title: str = "Average Price Over Time",
show_yoy: bool = True,
) -> go.Figure:
"""Create a time series chart for price data.
Args:
data: List of records with date and price columns.
date_column: Column name for dates.
price_column: Column name for price values.
group_column: Optional column for grouping (e.g., district_code).
title: Chart title.
show_yoy: Whether to show year-over-year change annotations (accepted but not yet implemented).
Returns:
Plotly Figure object.
"""
import pandas as pd
if not data:
fig = go.Figure()
fig.add_annotation(
text="No data available",
xref="paper",
yref="paper",
x=0.5,
y=0.5,
showarrow=False,
font={"color": "#888888"},
)
fig.update_layout(
title=title,
height=350,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
)
return fig
df = pd.DataFrame(data)
df[date_column] = pd.to_datetime(df[date_column])
if group_column and group_column in df.columns:
fig = px.line(
df,
x=date_column,
y=price_column,
color=group_column,
title=title,
)
else:
fig = px.line(
df,
x=date_column,
y=price_column,
title=title,
)
fig.update_layout(
height=350,
margin={"l": 40, "r": 20, "t": 50, "b": 40},
xaxis_title="Date",
yaxis_title=price_column.replace("_", " ").title(),
yaxis_tickprefix="$",
yaxis_tickformat=",",
hovermode="x unified",
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
)
return fig
def create_volume_time_series(
data: list[dict[str, Any]],
date_column: str = "full_date",
volume_column: str = "sales_count",
group_column: str | None = None,
title: str = "Sales Volume Over Time",
chart_type: str = "bar",
) -> go.Figure:
"""Create a time series chart for volume/count data.
Args:
data: List of records with date and volume columns.
date_column: Column name for dates.
volume_column: Column name for volume values.
group_column: Optional column for grouping.
title: Chart title.
chart_type: 'bar' or 'line'.
Returns:
Plotly Figure object.
"""
import pandas as pd
if not data:
fig = go.Figure()
fig.add_annotation(
text="No data available",
xref="paper",
yref="paper",
x=0.5,
y=0.5,
showarrow=False,
font={"color": "#888888"},
)
fig.update_layout(
title=title,
height=350,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
)
return fig
df = pd.DataFrame(data)
df[date_column] = pd.to_datetime(df[date_column])
if chart_type == "bar":
if group_column and group_column in df.columns:
fig = px.bar(
df,
x=date_column,
y=volume_column,
color=group_column,
title=title,
)
else:
fig = px.bar(
df,
x=date_column,
y=volume_column,
title=title,
)
else:
if group_column and group_column in df.columns:
fig = px.line(
df,
x=date_column,
y=volume_column,
color=group_column,
title=title,
)
else:
fig = px.line(
df,
x=date_column,
y=volume_column,
title=title,
)
fig.update_layout(
height=350,
margin={"l": 40, "r": 20, "t": 50, "b": 40},
xaxis_title="Date",
yaxis_title=volume_column.replace("_", " ").title(),
yaxis_tickformat=",",
hovermode="x unified",
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
)
return fig
def create_market_comparison_chart(
data: list[dict[str, Any]],
date_column: str = "full_date",
metrics: list[str] | None = None,
title: str = "Market Indicators",
) -> go.Figure:
"""Create a multi-metric comparison chart.
Args:
data: List of records with date and metric columns.
date_column: Column name for dates.
metrics: List of metric columns to display.
title: Chart title.
Returns:
Plotly Figure object with secondary y-axis.
"""
import pandas as pd
from plotly.subplots import make_subplots
if not data:
fig = go.Figure()
fig.add_annotation(
text="No data available",
xref="paper",
yref="paper",
x=0.5,
y=0.5,
showarrow=False,
font={"color": "#888888"},
)
fig.update_layout(
title=title,
height=400,
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
)
return fig
if metrics is None:
metrics = ["avg_price", "sales_count"]
df = pd.DataFrame(data)
df[date_column] = pd.to_datetime(df[date_column])
fig = make_subplots(specs=[[{"secondary_y": True}]])
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
for i, metric in enumerate(metrics[:4]):
if metric not in df.columns:
continue
secondary = i > 0
fig.add_trace(
go.Scatter(
x=df[date_column],
y=df[metric],
name=metric.replace("_", " ").title(),
line={"color": colors[i % len(colors)]},
),
secondary_y=secondary,
)
fig.update_layout(
title=title,
height=400,
margin={"l": 40, "r": 40, "t": 50, "b": 40},
hovermode="x unified",
paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
font_color="#c9c9c9",
xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
legend={
"orientation": "h",
"yanchor": "bottom",
"y": 1.02,
"xanchor": "right",
"x": 1,
"font": {"color": "#c9c9c9"},
},
)
return fig
def add_policy_markers(
fig: go.Figure,
policy_events: list[dict[str, Any]],
date_column: str = "event_date",
y_position: float | None = None,
) -> go.Figure:
"""Add policy event markers to an existing time series figure.
Args:
fig: Existing Plotly figure to add markers to.
policy_events: List of policy event dicts with date and metadata.
date_column: Column name for event dates.
y_position: Y position for markers. If None, uses top of chart.
Returns:
Updated Plotly Figure object with policy markers.
"""
if not policy_events:
return fig
# Color mapping for policy categories
category_colors = {
"monetary": "#1f77b4", # Blue
"tax": "#2ca02c", # Green
"regulatory": "#ff7f0e", # Orange
"supply": "#9467bd", # Purple
"economic": "#d62728", # Red
}
# Symbol mapping for expected direction
direction_symbols = {
"bullish": "triangle-up",
"bearish": "triangle-down",
"neutral": "circle",
}
for event in policy_events:
event_date = event.get(date_column)
category = event.get("category", "economic")
direction = event.get("expected_direction", "neutral")
title = event.get("title", "Policy Event")
level = event.get("level", "federal")
color = category_colors.get(category, "#666666")
symbol = direction_symbols.get(direction, "circle")
# Add vertical line for the event
fig.add_vline(
x=event_date,
line_dash="dot",
line_color=color,
opacity=0.5,
annotation_text="",
)
# Add marker with hover info
fig.add_trace(
go.Scatter(
x=[event_date],
y=[y_position] if y_position is not None else [None],  # type: ignore[list-item]
mode="markers",
marker={
"symbol": symbol,
"size": 12,
"color": color,
"line": {"width": 1, "color": "white"},
},
name=title,
hovertemplate=(
f"<b>{title}</b><br>"
f"Date: %{{x}}<br>"
f"Level: {level.title()}<br>"
f"Category: {category.title()}<br>"
f"<extra></extra>"
),
showlegend=False,
)
)
return fig
def create_time_series_with_events(
data: list[dict[str, Any]],
policy_events: list[dict[str, Any]],
date_column: str = "full_date",
value_column: str = "avg_price",
title: str = "Price Trend with Policy Events",
) -> go.Figure:
"""Create a time series chart with policy event markers.
Args:
data: Time series data.
policy_events: Policy events to overlay.
date_column: Column name for dates.
value_column: Column name for values.
title: Chart title.
Returns:
Plotly Figure with time series and policy markers.
"""
# Create base time series
fig = create_price_time_series(
data=data,
date_column=date_column,
price_column=value_column,
title=title,
)
# Add policy markers at the top of the chart
if policy_events:
fig = add_policy_markers(fig, policy_events)
return fig


@@ -0,0 +1,20 @@
"""Health check endpoint for deployment monitoring."""
import dash
from dash import html
dash.register_page(
__name__,
path="/health",
title="Health Check",
)
def layout() -> html.Div:
"""Return simple health check response."""
return html.Div(
[
html.Pre("status: ok"),
],
id="health-check",
)

portfolio_app/pages/home.py Normal file

@@ -0,0 +1,169 @@
"""Bio landing page."""
import dash
import dash_mantine_components as dmc
dash.register_page(__name__, path="/", name="Home")
# Content from bio_content_v2.md
HEADLINE = "Leo | Data Engineer & Analytics Developer"
TAGLINE = "I build data infrastructure that actually gets used."
SUMMARY = """Over the past 5 years, I've designed and evolved an enterprise analytics platform
from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and
dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call
abandon rates, and dashboards that executives actually open.
My approach: dimensional modeling (star schema), layered transformations
(staging → intermediate → marts), and automation that eliminates manual work.
I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states.
Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the
Project Management Institute."""
TECH_STACK = [
"Python",
"Pandas",
"SQLAlchemy",
"FastAPI",
"SQL",
"PostgreSQL",
"MSSQL",
"Power BI",
"Plotly/Dash",
"dbt patterns",
"Genesys Cloud",
]
PROJECTS = [
{
"title": "Toronto Housing Dashboard",
"description": "Choropleth visualization of GTA real estate trends with TRREB and CMHC data.",
"status": "In Development",
"link": "/toronto",
},
{
"title": "Energy Pricing Analysis",
"description": "Time series analysis and ML prediction for utility market pricing.",
"status": "Planned",
"link": "/energy",
},
]
AVAILABILITY = "Open to Senior Data Analyst, Analytics Engineer, and BI Developer opportunities in Toronto or remote."
def create_hero_section() -> dmc.Stack:
"""Create the hero section with name and tagline."""
return dmc.Stack(
[
dmc.Title(HEADLINE, order=1, ta="center"),
dmc.Text(TAGLINE, size="xl", c="dimmed", ta="center"),
],
gap="xs",
py="xl",
)
def create_summary_section() -> dmc.Paper:
"""Create the professional summary section."""
paragraphs = SUMMARY.strip().split("\n\n")
return dmc.Paper(
dmc.Stack(
[
dmc.Title("About", order=2, size="h3"),
*[dmc.Text(p.replace("\n", " "), size="md") for p in paragraphs],
],
gap="md",
),
p="xl",
radius="md",
withBorder=True,
)
def create_tech_stack_section() -> dmc.Paper:
"""Create the tech stack section with badges."""
return dmc.Paper(
dmc.Stack(
[
dmc.Title("Tech Stack", order=2, size="h3"),
dmc.Group(
[
dmc.Badge(tech, size="lg", variant="light", radius="sm")
for tech in TECH_STACK
],
gap="sm",
),
],
gap="md",
),
p="xl",
radius="md",
withBorder=True,
)
def create_project_card(project: dict[str, str]) -> dmc.Card:
"""Create a project card."""
status_color = "blue" if project["status"] == "In Development" else "gray"
return dmc.Card(
[
dmc.Group(
[
dmc.Text(project["title"], fw=500, size="lg"),
dmc.Badge(project["status"], color=status_color, variant="light"),
],
justify="space-between",
align="center",
),
dmc.Text(project["description"], size="sm", c="dimmed", mt="sm"),
],
withBorder=True,
radius="md",
p="lg",
)
def create_projects_section() -> dmc.Paper:
"""Create the portfolio projects section."""
return dmc.Paper(
dmc.Stack(
[
dmc.Title("Portfolio Projects", order=2, size="h3"),
dmc.SimpleGrid(
[create_project_card(p) for p in PROJECTS],
cols={"base": 1, "sm": 2},
spacing="lg",
),
],
gap="md",
),
p="xl",
radius="md",
withBorder=True,
)
def create_availability_section() -> dmc.Text:
"""Create the availability statement."""
return dmc.Text(AVAILABILITY, size="sm", c="dimmed", ta="center", fs="italic")
layout = dmc.Container(
dmc.Stack(
[
create_hero_section(),
create_summary_section(),
create_tech_stack_section(),
create_projects_section(),
dmc.Divider(my="lg"),
create_availability_section(),
dmc.Space(h=40),
],
gap="xl",
),
size="md",
py="xl",
)


@@ -1 +1 @@
"""Toronto Housing Dashboard page.""" """Toronto Housing Dashboard pages."""

File diff suppressed because it is too large.


@@ -0,0 +1,294 @@
"""Toronto Housing Dashboard page."""
import dash
import dash_mantine_components as dmc
from dash import dcc, html
from dash_iconify import DashIconify
from portfolio_app.components import (
create_map_controls,
create_metric_cards_row,
create_time_slider,
create_year_selector,
)
dash.register_page(__name__, path="/toronto", name="Toronto Housing")
# Metric options for the purchase market
PURCHASE_METRIC_OPTIONS = [
{"label": "Average Price", "value": "avg_price"},
{"label": "Median Price", "value": "median_price"},
{"label": "Sales Volume", "value": "sales_count"},
{"label": "Days on Market", "value": "avg_dom"},
]
# Metric options for the rental market
RENTAL_METRIC_OPTIONS = [
{"label": "Average Rent", "value": "avg_rent"},
{"label": "Vacancy Rate", "value": "vacancy_rate"},
{"label": "Rental Universe", "value": "rental_universe"},
]
# Sample metrics for KPI cards (will be populated by callbacks)
SAMPLE_METRICS = [
{
"title": "Avg. Price",
"value": 1125000,
"delta": 2.3,
"prefix": "$",
"format_spec": ",.0f",
},
{
"title": "Sales Volume",
"value": 4850,
"delta": -5.1,
"format_spec": ",",
},
{
"title": "Avg. DOM",
"value": 18,
"delta": 3,
"suffix": " days",
"positive_is_good": False,
},
{
"title": "Avg. Rent",
"value": 2450,
"delta": 4.2,
"prefix": "$",
"format_spec": ",.0f",
},
]
def create_header() -> dmc.Group:
"""Create the dashboard header with title and controls."""
return dmc.Group(
[
dmc.Stack(
[
dmc.Title("Toronto Housing Dashboard", order=1),
dmc.Text(
"Real estate market analysis for the Greater Toronto Area",
c="dimmed",
),
],
gap="xs",
),
dmc.Group(
[
dcc.Link(
dmc.Button(
"Methodology",
leftSection=DashIconify(
icon="tabler:info-circle", width=18
),
variant="subtle",
color="gray",
),
href="/toronto/methodology",
),
create_year_selector(
id_prefix="toronto",
min_year=2020,
default_year=2024,
label="Year",
),
],
gap="md",
),
],
justify="space-between",
align="flex-start",
)
def create_kpi_section() -> dmc.Box:
"""Create the KPI metrics row."""
return dmc.Box(
children=[
dmc.Title("Key Metrics", order=3, size="h4", mb="sm"),
html.Div(
id="toronto-kpi-cards",
children=[
create_metric_cards_row(SAMPLE_METRICS, id_prefix="toronto-kpi")
],
),
],
)
def create_purchase_map_section() -> dmc.Grid:
"""Create the purchase market choropleth section."""
return dmc.Grid(
[
dmc.GridCol(
create_map_controls(
id_prefix="purchase-map",
metric_options=PURCHASE_METRIC_OPTIONS,
default_metric="avg_price",
),
span={"base": 12, "md": 3},
),
dmc.GridCol(
dmc.Paper(
children=[
dcc.Graph(
id="purchase-choropleth",
config={"scrollZoom": True},
style={"height": "500px"},
),
],
p="xs",
radius="sm",
withBorder=True,
),
span={"base": 12, "md": 9},
),
],
gutter="md",
)
def create_rental_map_section() -> dmc.Grid:
"""Create the rental market choropleth section."""
return dmc.Grid(
[
dmc.GridCol(
create_map_controls(
id_prefix="rental-map",
metric_options=RENTAL_METRIC_OPTIONS,
default_metric="avg_rent",
),
span={"base": 12, "md": 3},
),
dmc.GridCol(
dmc.Paper(
children=[
dcc.Graph(
id="rental-choropleth",
config={"scrollZoom": True},
style={"height": "500px"},
),
],
p="xs",
radius="sm",
withBorder=True,
),
span={"base": 12, "md": 9},
),
],
gutter="md",
)
def create_time_series_section() -> dmc.Grid:
"""Create the time series charts section."""
return dmc.Grid(
[
dmc.GridCol(
dmc.Paper(
children=[
dmc.Title("Price Trends", order=4, size="h5", mb="sm"),
dcc.Graph(
id="price-time-series",
config={"displayModeBar": False},
style={"height": "350px"},
),
],
p="md",
radius="sm",
withBorder=True,
),
span={"base": 12, "md": 6},
),
dmc.GridCol(
dmc.Paper(
children=[
dmc.Title("Sales Volume", order=4, size="h5", mb="sm"),
dcc.Graph(
id="volume-time-series",
config={"displayModeBar": False},
style={"height": "350px"},
),
],
p="md",
radius="sm",
withBorder=True,
),
span={"base": 12, "md": 6},
),
],
gutter="md",
)
def create_market_comparison_section() -> dmc.Paper:
"""Create the market comparison chart section."""
return dmc.Paper(
children=[
dmc.Group(
[
dmc.Title("Market Indicators", order=4, size="h5"),
create_time_slider(
id_prefix="market-comparison",
min_year=2020,
label="",
),
],
justify="space-between",
align="center",
mb="md",
),
dcc.Graph(
id="market-comparison-chart",
config={"displayModeBar": False},
style={"height": "400px"},
),
],
p="md",
radius="sm",
withBorder=True,
)
def create_data_notice() -> dmc.Alert:
"""Create a notice about data availability."""
return dmc.Alert(
children=[
dmc.Text(
"This dashboard uses TRREB and CMHC data. "
"Geographic boundaries require QGIS digitization to enable choropleth maps. "
"Sample data is shown below.",
size="sm",
),
],
title="Data Notice",
color="blue",
variant="light",
)
# Register callbacks
from portfolio_app.pages.toronto import callbacks # noqa: E402, F401
layout = dmc.Container(
dmc.Stack(
[
create_header(),
create_data_notice(),
create_kpi_section(),
dmc.Divider(my="md", label="Purchase Market", labelPosition="center"),
create_purchase_map_section(),
dmc.Divider(my="md", label="Rental Market", labelPosition="center"),
create_rental_map_section(),
dmc.Divider(my="md", label="Trends", labelPosition="center"),
create_time_series_section(),
create_market_comparison_section(),
dmc.Space(h=40),
],
gap="lg",
),
size="xl",
py="xl",
)


@@ -0,0 +1,274 @@
"""Methodology page for Toronto Housing Dashboard."""
import dash
import dash_mantine_components as dmc
from dash import dcc, html
from dash_iconify import DashIconify
dash.register_page(
__name__,
path="/toronto/methodology",
title="Methodology | Toronto Housing Dashboard",
description="Data sources, methodology, and limitations for the Toronto Housing Dashboard",
)
def layout() -> dmc.Container:
"""Render the methodology page layout."""
return dmc.Container(
size="md",
py="xl",
children=[
# Back to Dashboard button
dcc.Link(
dmc.Button(
"Back to Dashboard",
leftSection=DashIconify(icon="tabler:arrow-left", width=18),
variant="subtle",
color="gray",
),
href="/toronto",
),
# Header
dmc.Title("Methodology", order=1, mb="lg", mt="md"),
dmc.Text(
"This page documents the data sources, processing methodology, "
"and known limitations of the Toronto Housing Dashboard.",
size="lg",
c="dimmed",
mb="xl",
),
# Data Sources Section
dmc.Paper(
p="lg",
radius="md",
withBorder=True,
mb="lg",
children=[
dmc.Title("Data Sources", order=2, mb="md"),
# TRREB
dmc.Title("Purchase Data: TRREB", order=3, size="h4", mb="sm"),
dmc.Text(
[
"The Toronto Regional Real Estate Board (TRREB) publishes monthly ",
html.Strong("Market Watch"),
" reports containing aggregate statistics for residential real estate "
"transactions across the Greater Toronto Area.",
],
mb="sm",
),
dmc.List(
[
dmc.ListItem("Source: TRREB Market Watch Reports (PDF)"),
dmc.ListItem("Geographic granularity: ~35 TRREB Districts"),
dmc.ListItem("Temporal granularity: Monthly"),
dmc.ListItem("Coverage: 2021-present"),
dmc.ListItem(
[
"Metrics: Sales count, average/median price, new listings, ",
"active listings, days on market, sale-to-list ratio",
]
),
],
mb="md",
),
dmc.Anchor(
"TRREB Market Watch Archive",
href="https://trreb.ca/market-data/market-watch/market-watch-archive/",
target="_blank",
mb="lg",
),
# CMHC
dmc.Title(
"Rental Data: CMHC", order=3, size="h4", mb="sm", mt="md"
),
dmc.Text(
[
"Canada Mortgage and Housing Corporation (CMHC) conducts the annual ",
html.Strong("Rental Market Survey"),
" providing rental market statistics for major urban centres.",
],
mb="sm",
),
dmc.List(
[
dmc.ListItem("Source: CMHC Rental Market Survey (Excel)"),
dmc.ListItem(
"Geographic granularity: ~20 CMHC Zones (Census Tract aligned)"
),
dmc.ListItem(
"Temporal granularity: Annual (October survey)"
),
dmc.ListItem("Coverage: 2021-present"),
dmc.ListItem(
[
"Metrics: Average/median rent, vacancy rate, universe count, ",
"turnover rate, year-over-year rent change",
]
),
],
mb="md",
),
dmc.Anchor(
"CMHC Housing Market Information Portal",
href="https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market",
target="_blank",
),
],
),
# Geographic Considerations
dmc.Paper(
p="lg",
radius="md",
withBorder=True,
mb="lg",
children=[
dmc.Title("Geographic Considerations", order=2, mb="md"),
dmc.Alert(
title="Important: Non-Aligned Geographies",
color="yellow",
mb="md",
children=[
"TRREB Districts and CMHC Zones do ",
html.Strong("not"),
" align geographically. They are displayed as separate layers and "
"should not be directly compared at the sub-regional level.",
],
),
dmc.Text(
"The dashboard presents three geographic layers:",
mb="sm",
),
dmc.List(
[
dmc.ListItem(
[
html.Strong("TRREB Districts (~35): "),
"Used for purchase/sales data visualization. "
"Districts are defined by TRREB and labeled with codes like W01, C01, E01.",
]
),
dmc.ListItem(
[
html.Strong("CMHC Zones (~20): "),
"Used for rental data visualization. "
"Zones are aligned with Census Tract boundaries.",
]
),
dmc.ListItem(
[
html.Strong("City Neighbourhoods (158): "),
"Reference overlay only. "
"These are official City of Toronto neighbourhood boundaries.",
]
),
],
),
],
),
# Policy Events
dmc.Paper(
p="lg",
radius="md",
withBorder=True,
mb="lg",
children=[
dmc.Title("Policy Event Annotations", order=2, mb="md"),
dmc.Text(
"The time series charts include markers for significant policy events "
"that may have influenced housing market conditions. These annotations are "
"for contextual reference only.",
mb="md",
),
dmc.Alert(
title="No Causation Claims",
color="blue",
children=[
"The presence of a policy marker near a market trend change does ",
html.Strong("not"),
" imply causation. Housing markets are influenced by numerous factors "
"beyond policy interventions.",
],
),
],
),
# Limitations
dmc.Paper(
p="lg",
radius="md",
withBorder=True,
mb="lg",
children=[
dmc.Title("Limitations", order=2, mb="md"),
dmc.List(
[
dmc.ListItem(
[
html.Strong("Aggregate Data: "),
"All statistics are aggregates. Individual property characteristics, "
"condition, and micro-location are not reflected.",
]
),
dmc.ListItem(
[
html.Strong("Reporting Lag: "),
"TRREB data reflects closed transactions, which may lag market "
"conditions by 1-3 months. CMHC data is annual.",
]
),
dmc.ListItem(
[
html.Strong("Geographic Boundaries: "),
"TRREB district boundaries were manually digitized from reference maps "
"and may contain minor inaccuracies.",
]
),
dmc.ListItem(
[
html.Strong("Data Suppression: "),
"Some cells may be suppressed for confidentiality when transaction "
"counts are below thresholds.",
]
),
],
),
],
),
# Technical Implementation
dmc.Paper(
p="lg",
radius="md",
withBorder=True,
children=[
dmc.Title("Technical Implementation", order=2, mb="md"),
dmc.Text("This dashboard is built with:", mb="sm"),
dmc.List(
[
dmc.ListItem("Python 3.11+ with Dash and Plotly"),
dmc.ListItem("PostgreSQL with PostGIS for geospatial data"),
dmc.ListItem("dbt for data transformation"),
dmc.ListItem("Pydantic for data validation"),
dmc.ListItem("SQLAlchemy 2.0 for database operations"),
],
mb="md",
),
dmc.Anchor(
"View source code on GitHub",
href="https://github.com/lmiranda/personal-portfolio",
target="_blank",
),
],
),
# Back link
dmc.Group(
mt="xl",
children=[
dmc.Anchor(
"← Back to Dashboard",
href="/toronto",
size="lg",
),
],
),
],
)


@@ -0,0 +1,257 @@
"""Demo/sample data for testing the Toronto Housing Dashboard without full pipeline.
This module provides synthetic data for development and demonstration purposes.
Replace with real data from the database in production.
"""
from datetime import date
from typing import Any
def get_demo_districts() -> list[dict[str, Any]]:
"""Return sample TRREB district data."""
return [
{"district_code": "W01", "district_name": "Long Branch", "area_type": "West"},
{"district_code": "W02", "district_name": "Mimico", "area_type": "West"},
{
"district_code": "W03",
"district_name": "Kingsway South",
"area_type": "West",
},
{"district_code": "W04", "district_name": "Edenbridge", "area_type": "West"},
{"district_code": "W05", "district_name": "Islington", "area_type": "West"},
{"district_code": "W06", "district_name": "Rexdale", "area_type": "West"},
{"district_code": "W07", "district_name": "Willowdale", "area_type": "West"},
{"district_code": "W08", "district_name": "York", "area_type": "West"},
{
"district_code": "C01",
"district_name": "Downtown Core",
"area_type": "Central",
},
{"district_code": "C02", "district_name": "Annex", "area_type": "Central"},
{
"district_code": "C03",
"district_name": "Forest Hill",
"area_type": "Central",
},
{
"district_code": "C04",
"district_name": "Lawrence Park",
"area_type": "Central",
},
{
"district_code": "C06",
"district_name": "Willowdale East",
"area_type": "Central",
},
{"district_code": "C07", "district_name": "Thornhill", "area_type": "Central"},
{"district_code": "C08", "district_name": "Waterfront", "area_type": "Central"},
{"district_code": "E01", "district_name": "Leslieville", "area_type": "East"},
{"district_code": "E02", "district_name": "The Beaches", "area_type": "East"},
{"district_code": "E03", "district_name": "Danforth", "area_type": "East"},
{"district_code": "E04", "district_name": "Birch Cliff", "area_type": "East"},
{"district_code": "E05", "district_name": "Scarborough", "area_type": "East"},
]
def get_demo_purchase_data() -> list[dict[str, Any]]:
"""Return sample purchase data for time series visualization."""
import random
random.seed(42)
data = []
base_prices = {
"W01": 850000,
"C01": 1200000,
"E01": 950000,
}
for year in [2024, 2025]:
for month in range(1, 13):
for district, base_price in base_prices.items():
# Add some randomness and trend
trend = (year - 2024) * 12 + month
price_variation = random.uniform(-0.05, 0.05)
trend_factor = 1 + (trend * 0.002) # Slight upward trend
avg_price = int(base_price * trend_factor * (1 + price_variation))
sales = random.randint(50, 200)
data.append(
{
"district_code": district,
"full_date": date(year, month, 1),
"year": year,
"month": month,
"avg_price": avg_price,
"median_price": int(avg_price * 0.95),
"sales_count": sales,
"new_listings": int(sales * random.uniform(1.2, 1.8)),
"active_listings": int(sales * random.uniform(2.0, 3.5)),
"days_on_market": random.randint(15, 45),
"sale_to_list_ratio": round(random.uniform(0.95, 1.05), 2),
}
)
return data
def get_demo_rental_data() -> list[dict[str, Any]]:
"""Return sample rental data for visualization."""
data = []
zones = [
("Zone01", "Downtown"),
("Zone02", "Midtown"),
("Zone03", "North York"),
("Zone04", "Scarborough"),
("Zone05", "Etobicoke"),
]
bedroom_types = ["bachelor", "1_bedroom", "2_bedroom", "3_bedroom"]
base_rents = {
"bachelor": 1800,
"1_bedroom": 2200,
"2_bedroom": 2800,
"3_bedroom": 3400,
}
for year in [2021, 2022, 2023, 2024, 2025]:
for zone_code, zone_name in zones:
for bedroom in bedroom_types:
# Rental trend: ~5% increase per year
year_factor = 1 + ((year - 2021) * 0.05)
base_rent = base_rents[bedroom]
data.append(
{
"zone_code": zone_code,
"zone_name": zone_name,
"survey_year": year,
"full_date": date(year, 10, 1),
"bedroom_type": bedroom,
"average_rent": int(base_rent * year_factor),
"median_rent": int(base_rent * year_factor * 0.98),
"vacancy_rate": round(
2.5 - (year - 2021) * 0.3, 1
), # Decreasing vacancy
"universe": 5000 + (year - 2021) * 200,
}
)
return data
def get_demo_policy_events() -> list[dict[str, Any]]:
"""Return sample policy events for annotation."""
return [
{
"event_date": date(2024, 6, 5),
"effective_date": date(2024, 6, 5),
"level": "federal",
"category": "monetary",
"title": "BoC Rate Cut (25bp)",
"description": "Bank of Canada cuts overnight rate by 25 basis points to 4.75%",
"expected_direction": "bullish",
},
{
"event_date": date(2024, 7, 24),
"effective_date": date(2024, 7, 24),
"level": "federal",
"category": "monetary",
"title": "BoC Rate Cut (25bp)",
"description": "Bank of Canada cuts overnight rate by 25 basis points to 4.50%",
"expected_direction": "bullish",
},
{
"event_date": date(2024, 9, 4),
"effective_date": date(2024, 9, 4),
"level": "federal",
"category": "monetary",
"title": "BoC Rate Cut (25bp)",
"description": "Bank of Canada cuts overnight rate by 25 basis points to 4.25%",
"expected_direction": "bullish",
},
{
"event_date": date(2024, 10, 23),
"effective_date": date(2024, 10, 23),
"level": "federal",
"category": "monetary",
"title": "BoC Rate Cut (50bp)",
"description": "Bank of Canada cuts overnight rate by 50 basis points to 3.75%",
"expected_direction": "bullish",
},
{
"event_date": date(2024, 12, 11),
"effective_date": date(2024, 12, 11),
"level": "federal",
"category": "monetary",
"title": "BoC Rate Cut (50bp)",
"description": "Bank of Canada cuts overnight rate by 50 basis points to 3.25%",
"expected_direction": "bullish",
},
{
"event_date": date(2024, 9, 16),
"effective_date": date(2024, 12, 15),
"level": "federal",
"category": "regulatory",
"title": "CMHC 30-Year Amortization",
"description": "30-year amortization extended to all first-time buyers and new builds",
"expected_direction": "bullish",
},
{
"event_date": date(2024, 9, 16),
"effective_date": date(2024, 12, 15),
"level": "federal",
"category": "regulatory",
"title": "Insured Mortgage Cap $1.5M",
"description": "Insured mortgage cap raised from $1M to $1.5M",
"expected_direction": "bullish",
},
]
def get_demo_summary_metrics() -> dict[str, dict[str, Any]]:
"""Return summary metrics for KPI cards."""
return {
"avg_price": {
"value": 1067968,
"title": "Avg. Price (2025)",
"delta": -4.7,
"delta_suffix": "%",
"prefix": "$",
"format_spec": ",.0f",
"positive_is_good": True,
},
"total_sales": {
"value": 67610,
"title": "Total Sales (2024)",
"delta": 2.6,
"delta_suffix": "%",
"format_spec": ",.0f",
"positive_is_good": True,
},
"avg_rent": {
"value": 2450,
"title": "Avg. Rent (2025)",
"delta": 3.2,
"delta_suffix": "%",
"prefix": "$",
"format_spec": ",.0f",
"positive_is_good": False,
},
"vacancy_rate": {
"value": 1.8,
"title": "Vacancy Rate",
"delta": -0.4,
"delta_suffix": "pp",
"suffix": "%",
"format_spec": ".1f",
"positive_is_good": False,
},
}
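The synthetic purchase series above applies a small deterministic drift on top of random noise. The drift arithmetic can be checked in isolation — this sketch mirrors the `trend_factor` formula from `get_demo_purchase_data`, with an illustrative base price:

```python
BASE_PRICE = 1_200_000  # hypothetical base for a central district, as in the demo data

def trended_price(year: int, month: int, base_price: int = BASE_PRICE) -> int:
    """Apply the demo module's drift: 0.2% per month elapsed since Jan 2024."""
    trend = (year - 2024) * 12 + month
    return int(base_price * (1 + trend * 0.002))

jan_2024 = trended_price(2024, 1)  # 1 month of drift
jan_2025 = trended_price(2025, 1)  # 13 months of drift
```

Thirteen months of 0.2% linear drift puts January 2025 about 2.6% above the base, which is what gives the demo charts their gentle upward slope once the ±5% noise averages out.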


@@ -1 +1,32 @@
"""Database loaders for Toronto housing data.""" """Database loaders for Toronto housing data."""
from .base import bulk_insert, get_session, upsert_by_key
from .cmhc import load_cmhc_record, load_cmhc_rentals
from .dimensions import (
generate_date_key,
load_cmhc_zones,
load_neighbourhoods,
load_policy_events,
load_time_dimension,
load_trreb_districts,
)
from .trreb import load_trreb_purchases, load_trreb_record
__all__ = [
# Base utilities
"get_session",
"bulk_insert",
"upsert_by_key",
# Dimension loaders
"generate_date_key",
"load_time_dimension",
"load_trreb_districts",
"load_cmhc_zones",
"load_neighbourhoods",
"load_policy_events",
# Fact loaders
"load_trreb_purchases",
"load_trreb_record",
"load_cmhc_rentals",
"load_cmhc_record",
]


@@ -0,0 +1,85 @@
"""Base loader utilities for database operations."""
from collections.abc import Generator
from contextlib import contextmanager
from typing import Any, TypeVar
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import get_session_factory
T = TypeVar("T")
@contextmanager
def get_session() -> Generator[Session, None, None]:
"""Get a database session with automatic cleanup.
Yields:
SQLAlchemy session that auto-commits on success, rollbacks on error.
"""
session_factory = get_session_factory()
session = session_factory()
try:
yield session
session.commit()
except Exception:
session.rollback()
raise
finally:
session.close()
def bulk_insert(session: Session, objects: list[T]) -> int:
"""Bulk insert objects into the database.
Args:
session: Active SQLAlchemy session.
objects: List of ORM model instances to insert.
Returns:
Number of objects inserted.
"""
session.add_all(objects)
session.flush()
return len(objects)
def upsert_by_key(
session: Session,
model_class: Any,
objects: list[T],
key_columns: list[str],
) -> tuple[int, int]:
"""Upsert objects based on unique key columns.
Args:
session: Active SQLAlchemy session.
model_class: The ORM model class.
objects: List of ORM model instances to upsert.
key_columns: Column names that form the unique key.
Returns:
Tuple of (inserted_count, updated_count).
"""
inserted = 0
updated = 0
for obj in objects:
# Build filter for existing record
filters = {col: getattr(obj, col) for col in key_columns}
existing = session.query(model_class).filter_by(**filters).first()
if existing:
# Update existing record
for column in model_class.__table__.columns:
if column.name not in key_columns and column.name != "id":
setattr(existing, column.name, getattr(obj, column.name))
updated += 1
else:
# Insert new record
session.add(obj)
inserted += 1
session.flush()
return inserted, updated
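`upsert_by_key` is a query-then-update-or-insert loop expressed through the ORM. The same pattern, stripped down to stdlib `sqlite3` (the `rentals` table and its columns are invented for this sketch):

```python
import sqlite3

def upsert_by_key(conn, rows, key_columns):
    """Insert each row, or update its non-key columns if the key already exists."""
    inserted = updated = 0
    for row in rows:
        where = " AND ".join(f"{k} = ?" for k in key_columns)
        key_vals = [row[k] for k in key_columns]
        if conn.execute(f"SELECT 1 FROM rentals WHERE {where}", key_vals).fetchone():
            non_key = [c for c in row if c not in key_columns]
            assign = ", ".join(f"{c} = ?" for c in non_key)
            conn.execute(
                f"UPDATE rentals SET {assign} WHERE {where}",
                [row[c] for c in non_key] + key_vals,
            )
            updated += 1
        else:
            cols = ", ".join(row)
            marks = ", ".join("?" for _ in row)
            conn.execute(f"INSERT INTO rentals ({cols}) VALUES ({marks})", list(row.values()))
            inserted += 1
    return inserted, updated

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rentals (date_key INT, zone_key INT, avg_rent INT)")
row = {"date_key": 20241001, "zone_key": 1, "avg_rent": 2400}
first = upsert_by_key(conn, [row], ["date_key", "zone_key"])
second = upsert_by_key(conn, [{**row, "avg_rent": 2500}], ["date_key", "zone_key"])
```

The first call inserts, the second updates in place — the same `(inserted, updated)` contract the ORM version returns. Note the per-row `SELECT` is simple but O(n) round-trips; for large batches a database-native upsert (`ON CONFLICT DO UPDATE`) would be the usual optimization.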


@@ -0,0 +1,137 @@
"""Loader for CMHC rental data into fact_rentals."""
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import DimCMHCZone, DimTime, FactRentals
from portfolio_app.toronto.schemas import CMHCAnnualSurvey, CMHCRentalRecord
from .base import get_session, upsert_by_key
from .dimensions import generate_date_key
def load_cmhc_rentals(
survey: CMHCAnnualSurvey,
session: Session | None = None,
) -> int:
"""Load CMHC annual survey data into fact_rentals.
Args:
survey: Validated CMHC annual survey containing records.
session: Optional existing session.
Returns:
Number of records loaded.
"""
from datetime import date
def _load(sess: Session) -> int:
# Get zone key mapping
zones = sess.query(DimCMHCZone).all()
zone_map = {z.zone_code: z.zone_key for z in zones}
# CMHC surveys are annual - use October 1st as reference date
survey_date = date(survey.survey_year, 10, 1)
date_key = generate_date_key(survey_date)
# Verify time dimension exists
time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
if not time_dim:
raise ValueError(
f"Time dimension not found for date_key {date_key}. "
"Load time dimension first."
)
records = []
for record in survey.records:
zone_key = zone_map.get(record.zone_code)
if not zone_key:
# Skip records for unknown zones
continue
fact = FactRentals(
date_key=date_key,
zone_key=zone_key,
bedroom_type=record.bedroom_type.value,
universe=record.universe,
avg_rent=record.average_rent,
median_rent=record.median_rent,
vacancy_rate=record.vacancy_rate,
availability_rate=record.availability_rate,
turnover_rate=record.turnover_rate,
rent_change_pct=record.rent_change_pct,
reliability_code=record.average_rent_reliability.value
if record.average_rent_reliability
else None,
)
records.append(fact)
inserted, updated = upsert_by_key(
sess, FactRentals, records, ["date_key", "zone_key", "bedroom_type"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_cmhc_record(
record: CMHCRentalRecord,
survey_year: int,
session: Session | None = None,
) -> int:
"""Load a single CMHC record into fact_rentals.
Args:
record: Single validated CMHC rental record.
survey_year: Year of the survey.
session: Optional existing session.
Returns:
Number of records loaded (0 or 1).
"""
from datetime import date
def _load(sess: Session) -> int:
# Get zone key
zone = sess.query(DimCMHCZone).filter_by(zone_code=record.zone_code).first()
if not zone:
return 0
survey_date = date(survey_year, 10, 1)
date_key = generate_date_key(survey_date)
# Verify time dimension exists
time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
if not time_dim:
raise ValueError(
f"Time dimension not found for date_key {date_key}. "
"Load time dimension first."
)
fact = FactRentals(
date_key=date_key,
zone_key=zone.zone_key,
bedroom_type=record.bedroom_type.value,
universe=record.universe,
avg_rent=record.average_rent,
median_rent=record.median_rent,
vacancy_rate=record.vacancy_rate,
availability_rate=record.availability_rate,
turnover_rate=record.turnover_rate,
rent_change_pct=record.rent_change_pct,
reliability_code=record.average_rent_reliability.value
if record.average_rent_reliability
else None,
)
inserted, updated = upsert_by_key(
sess, FactRentals, [fact], ["date_key", "zone_key", "bedroom_type"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)


@@ -0,0 +1,251 @@
"""Loaders for dimension tables."""
from datetime import date
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import (
DimCMHCZone,
DimNeighbourhood,
DimPolicyEvent,
DimTime,
DimTRREBDistrict,
)
from portfolio_app.toronto.schemas import (
CMHCZone,
Neighbourhood,
PolicyEvent,
TRREBDistrict,
)
from .base import get_session, upsert_by_key
def generate_date_key(d: date) -> int:
"""Generate integer date key from date (YYYYMMDD format).
Args:
d: Date to convert.
Returns:
Integer in YYYYMMDD format.
"""
return d.year * 10000 + d.month * 100 + d.day
def load_time_dimension(
start_date: date,
end_date: date,
session: Session | None = None,
) -> int:
"""Load time dimension with date range.
Args:
start_date: Start of date range.
end_date: End of date range (inclusive).
session: Optional existing session.
Returns:
Number of records loaded.
"""
month_names = [
"",
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
]
def _load(sess: Session) -> int:
records = []
current = start_date.replace(day=1) # Start at month beginning
while current <= end_date:
quarter = (current.month - 1) // 3 + 1
dim = DimTime(
date_key=generate_date_key(current),
full_date=current,
year=current.year,
month=current.month,
quarter=quarter,
month_name=month_names[current.month],
is_month_start=True,
)
records.append(dim)
# Move to next month
if current.month == 12:
current = current.replace(year=current.year + 1, month=1)
else:
current = current.replace(month=current.month + 1)
inserted, updated = upsert_by_key(sess, DimTime, records, ["date_key"])
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_trreb_districts(
districts: list[TRREBDistrict],
session: Session | None = None,
) -> int:
"""Load TRREB district dimension.
Args:
districts: List of validated district schemas.
session: Optional existing session.
Returns:
Number of records loaded.
"""
def _load(sess: Session) -> int:
records = []
for d in districts:
dim = DimTRREBDistrict(
district_code=d.district_code,
district_name=d.district_name,
area_type=d.area_type.value,
geometry=d.geometry_wkt,
)
records.append(dim)
inserted, updated = upsert_by_key(
sess, DimTRREBDistrict, records, ["district_code"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_cmhc_zones(
zones: list[CMHCZone],
session: Session | None = None,
) -> int:
"""Load CMHC zone dimension.
Args:
zones: List of validated zone schemas.
session: Optional existing session.
Returns:
Number of records loaded.
"""
def _load(sess: Session) -> int:
records = []
for z in zones:
dim = DimCMHCZone(
zone_code=z.zone_code,
zone_name=z.zone_name,
geometry=z.geometry_wkt,
)
records.append(dim)
inserted, updated = upsert_by_key(sess, DimCMHCZone, records, ["zone_code"])
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_neighbourhoods(
neighbourhoods: list[Neighbourhood],
session: Session | None = None,
) -> int:
"""Load neighbourhood dimension.
Args:
neighbourhoods: List of validated neighbourhood schemas.
session: Optional existing session.
Returns:
Number of records loaded.
"""
def _load(sess: Session) -> int:
records = []
for n in neighbourhoods:
dim = DimNeighbourhood(
neighbourhood_id=n.neighbourhood_id,
name=n.name,
geometry=n.geometry_wkt,
population=n.population,
land_area_sqkm=n.land_area_sqkm,
pop_density_per_sqkm=n.pop_density_per_sqkm,
pct_bachelors_or_higher=n.pct_bachelors_or_higher,
median_household_income=n.median_household_income,
pct_owner_occupied=n.pct_owner_occupied,
pct_renter_occupied=n.pct_renter_occupied,
census_year=n.census_year,
)
records.append(dim)
inserted, updated = upsert_by_key(
sess, DimNeighbourhood, records, ["neighbourhood_id"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_policy_events(
events: list[PolicyEvent],
session: Session | None = None,
) -> int:
"""Load policy event dimension.
Args:
events: List of validated policy event schemas.
session: Optional existing session.
Returns:
Number of records loaded.
"""
def _load(sess: Session) -> int:
records = []
for e in events:
dim = DimPolicyEvent(
event_date=e.event_date,
effective_date=e.effective_date,
level=e.level.value,
category=e.category.value,
title=e.title,
description=e.description,
expected_direction=e.expected_direction.value,
source_url=e.source_url,
confidence=e.confidence.value,
)
records.append(dim)
# For policy events, use event_date + title as unique key
inserted, updated = upsert_by_key(
sess, DimPolicyEvent, records, ["event_date", "title"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
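All three dimension loaders above funnel into the project's `upsert_by_key` helper. As a minimal sketch of the upsert-by-natural-key pattern they rely on (plain dicts stand in for ORM rows; the real helper works against a SQLAlchemy session and its signature may differ):

```python
# Sketch of the upsert-by-key contract the loaders assume:
# rows matching the key fields are updated in place, new rows are
# inserted, and (inserted, updated) counts are returned.
def upsert_by_key(table: list[dict], records: list[dict], key_fields: list[str]) -> tuple[int, int]:
    index = {tuple(r[k] for k in key_fields): i for i, r in enumerate(table)}
    inserted = updated = 0
    for rec in records:
        key = tuple(rec[k] for k in key_fields)
        if key in index:
            table[index[key]].update(rec)  # existing row: update attributes
            updated += 1
        else:
            index[key] = len(table)
            table.append(dict(rec))  # new row: insert
            inserted += 1
    return inserted, updated
```

The `inserted + updated` totals returned by each loader follow directly from this contract.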


@@ -0,0 +1,129 @@
"""Loader for TRREB purchase data into fact_purchases."""
from sqlalchemy.orm import Session
from portfolio_app.toronto.models import DimTime, DimTRREBDistrict, FactPurchases
from portfolio_app.toronto.schemas import TRREBMonthlyRecord, TRREBMonthlyReport
from .base import get_session, upsert_by_key
from .dimensions import generate_date_key
def load_trreb_purchases(
report: TRREBMonthlyReport,
session: Session | None = None,
) -> int:
"""Load TRREB monthly report data into fact_purchases.
Args:
report: Validated TRREB monthly report containing records.
session: Optional existing session.
Returns:
Number of records loaded.
"""
def _load(sess: Session) -> int:
# Get district key mapping
districts = sess.query(DimTRREBDistrict).all()
district_map = {d.district_code: d.district_key for d in districts}
# Build date key from report date
date_key = generate_date_key(report.report_date)
# Verify time dimension exists
time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
if not time_dim:
raise ValueError(
f"Time dimension not found for date_key {date_key}. "
"Load time dimension first."
)
records = []
for record in report.records:
district_key = district_map.get(record.area_code)
if not district_key:
# Skip records for unknown districts (e.g., aggregate rows)
continue
fact = FactPurchases(
date_key=date_key,
district_key=district_key,
sales_count=record.sales,
dollar_volume=record.dollar_volume,
avg_price=record.avg_price,
median_price=record.median_price,
new_listings=record.new_listings,
active_listings=record.active_listings,
avg_dom=record.avg_dom,
avg_sp_lp=record.avg_sp_lp,
)
records.append(fact)
inserted, updated = upsert_by_key(
sess, FactPurchases, records, ["date_key", "district_key"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
def load_trreb_record(
record: TRREBMonthlyRecord,
session: Session | None = None,
) -> int:
"""Load a single TRREB record into fact_purchases.
Args:
record: Single validated TRREB monthly record.
session: Optional existing session.
Returns:
Number of records loaded (0 or 1).
"""
def _load(sess: Session) -> int:
# Get district key
district = (
sess.query(DimTRREBDistrict)
.filter_by(district_code=record.area_code)
.first()
)
if not district:
return 0
date_key = generate_date_key(record.report_date)
# Verify time dimension exists
time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
if not time_dim:
raise ValueError(
f"Time dimension not found for date_key {date_key}. "
"Load time dimension first."
)
fact = FactPurchases(
date_key=date_key,
district_key=district.district_key,
sales_count=record.sales,
dollar_volume=record.dollar_volume,
avg_price=record.avg_price,
median_price=record.median_price,
new_listings=record.new_listings,
active_listings=record.active_listings,
avg_dom=record.avg_dom,
avg_sp_lp=record.avg_sp_lp,
)
inserted, updated = upsert_by_key(
sess, FactPurchases, [fact], ["date_key", "district_key"]
)
return inserted + updated
if session:
return _load(session)
with get_session() as sess:
return _load(sess)
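Both loaders above build `date_key` via `generate_date_key` (imported from `.dimensions`) before checking it against `dim_time`. A hedged sketch of that encoding, based on the `dim_time` schema's documented YYYYMMDD integer key (the project's actual implementation may differ):

```python
from datetime import date

# Presumed YYYYMMDD encoding for dim_time.date_key, e.g.
# date(2024, 10, 1) -> 20241001.
def generate_date_key(d: date) -> int:
    return d.year * 10000 + d.month * 100 + d.day
```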


@@ -1 +1,28 @@
"""SQLAlchemy models for Toronto housing data.""" """SQLAlchemy models for Toronto housing data."""
from .base import Base, create_tables, get_engine, get_session_factory
from .dimensions import (
DimCMHCZone,
DimNeighbourhood,
DimPolicyEvent,
DimTime,
DimTRREBDistrict,
)
from .facts import FactPurchases, FactRentals
__all__ = [
# Base
"Base",
"get_engine",
"get_session_factory",
"create_tables",
# Dimensions
"DimTime",
"DimTRREBDistrict",
"DimCMHCZone",
"DimNeighbourhood",
"DimPolicyEvent",
# Facts
"FactPurchases",
"FactRentals",
]


@@ -0,0 +1,30 @@
"""SQLAlchemy base configuration and engine setup."""
from sqlalchemy import Engine, create_engine
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker
from portfolio_app.config import get_settings
class Base(DeclarativeBase): # type: ignore[misc]
"""Base class for all SQLAlchemy models."""
pass
def get_engine() -> Engine:
"""Create database engine from settings."""
settings = get_settings()
return create_engine(settings.database_url, echo=False)
def get_session_factory() -> sessionmaker[Session]:
"""Create session factory."""
engine = get_engine()
return sessionmaker(bind=engine)
def create_tables() -> None:
"""Create all tables in database."""
engine = get_engine()
Base.metadata.create_all(engine)
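The loaders call `get_session()` as a context manager; that helper is not shown in this diff. A sketch of the behaviour they appear to assume (commit on success, rollback on error, always close), with a dummy factory in place of the real `sessionmaker` — names here are illustrative, not the project's exact code:

```python
from contextlib import contextmanager

# Assumed transaction scope behind the `with get_session() as sess:`
# pattern used throughout the loaders.
@contextmanager
def session_scope(factory):
    session = factory()
    try:
        yield session
        session.commit()    # flush the load on success
    except Exception:
        session.rollback()  # undo a partial load on error
        raise
    finally:
        session.close()
```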


@@ -0,0 +1,104 @@
"""SQLAlchemy models for dimension tables."""
from datetime import date
from geoalchemy2 import Geometry
from sqlalchemy import Boolean, Date, Integer, Numeric, String, Text
from sqlalchemy.orm import Mapped, mapped_column
from .base import Base
class DimTime(Base):
"""Time dimension table."""
__tablename__ = "dim_time"
date_key: Mapped[int] = mapped_column(Integer, primary_key=True)
full_date: Mapped[date] = mapped_column(Date, nullable=False, unique=True)
year: Mapped[int] = mapped_column(Integer, nullable=False)
month: Mapped[int] = mapped_column(Integer, nullable=False)
quarter: Mapped[int] = mapped_column(Integer, nullable=False)
month_name: Mapped[str] = mapped_column(String(20), nullable=False)
is_month_start: Mapped[bool] = mapped_column(Boolean, default=True)
class DimTRREBDistrict(Base):
"""TRREB district dimension table with PostGIS geometry."""
__tablename__ = "dim_trreb_district"
district_key: Mapped[int] = mapped_column(
Integer, primary_key=True, autoincrement=True
)
district_code: Mapped[str] = mapped_column(String(3), nullable=False, unique=True)
district_name: Mapped[str] = mapped_column(String(100), nullable=False)
area_type: Mapped[str] = mapped_column(String(10), nullable=False)
geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)
class DimCMHCZone(Base):
"""CMHC zone dimension table with PostGIS geometry."""
__tablename__ = "dim_cmhc_zone"
zone_key: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
zone_code: Mapped[str] = mapped_column(String(10), nullable=False, unique=True)
zone_name: Mapped[str] = mapped_column(String(100), nullable=False)
geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)
class DimNeighbourhood(Base):
"""City of Toronto neighbourhood dimension.
Note: No FK to fact tables in V1 - reference overlay only.
"""
__tablename__ = "dim_neighbourhood"
neighbourhood_id: Mapped[int] = mapped_column(Integer, primary_key=True)
name: Mapped[str] = mapped_column(String(100), nullable=False)
geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)
population: Mapped[int | None] = mapped_column(Integer, nullable=True)
land_area_sqkm: Mapped[float | None] = mapped_column(Numeric(10, 4), nullable=True)
pop_density_per_sqkm: Mapped[float | None] = mapped_column(
Numeric(10, 2), nullable=True
)
pct_bachelors_or_higher: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
median_household_income: Mapped[float | None] = mapped_column(
Numeric(12, 2), nullable=True
)
pct_owner_occupied: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
pct_renter_occupied: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
census_year: Mapped[int] = mapped_column(Integer, default=2021)
class DimPolicyEvent(Base):
"""Policy event dimension for time-series annotation."""
__tablename__ = "dim_policy_event"
event_id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
event_date: Mapped[date] = mapped_column(Date, nullable=False)
effective_date: Mapped[date | None] = mapped_column(Date, nullable=True)
level: Mapped[str] = mapped_column(
String(20), nullable=False
) # federal/provincial/municipal
category: Mapped[str] = mapped_column(
String(20), nullable=False
) # monetary/tax/regulatory/supply/economic
title: Mapped[str] = mapped_column(String(200), nullable=False)
description: Mapped[str | None] = mapped_column(Text, nullable=True)
expected_direction: Mapped[str] = mapped_column(
String(10), nullable=False
) # bearish/bullish/neutral
source_url: Mapped[str | None] = mapped_column(String(500), nullable=True)
confidence: Mapped[str] = mapped_column(
String(10), default="medium"
) # high/medium/low
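For one month-start row, the `dim_time` attributes above can be derived directly from the date (the YYYYMMDD key format is documented in the `TimeDimension` schema; the derivation shown is illustrative):

```python
import calendar
from datetime import date

# One dim_time row for October 2024, derived from the date itself.
d = date(2024, 10, 1)
row = {
    "date_key": d.year * 10000 + d.month * 100 + d.day,  # YYYYMMDD
    "full_date": d,
    "year": d.year,
    "month": d.month,
    "quarter": (d.month - 1) // 3 + 1,
    "month_name": calendar.month_name[d.month],
    "is_month_start": d.day == 1,
}
```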


@@ -0,0 +1,69 @@
"""SQLAlchemy models for fact tables."""
from sqlalchemy import ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import Mapped, mapped_column, relationship
from .base import Base
class FactPurchases(Base):
"""Fact table for TRREB purchase/sales data.
Grain: One row per district per month.
"""
__tablename__ = "fact_purchases"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
date_key: Mapped[int] = mapped_column(
Integer, ForeignKey("dim_time.date_key"), nullable=False
)
district_key: Mapped[int] = mapped_column(
Integer, ForeignKey("dim_trreb_district.district_key"), nullable=False
)
sales_count: Mapped[int] = mapped_column(Integer, nullable=False)
dollar_volume: Mapped[float] = mapped_column(Numeric(15, 2), nullable=False)
avg_price: Mapped[float] = mapped_column(Numeric(12, 2), nullable=False)
median_price: Mapped[float] = mapped_column(Numeric(12, 2), nullable=False)
new_listings: Mapped[int] = mapped_column(Integer, nullable=False)
active_listings: Mapped[int] = mapped_column(Integer, nullable=False)
avg_dom: Mapped[int] = mapped_column(Integer, nullable=False) # Days on market
avg_sp_lp: Mapped[float] = mapped_column(
Numeric(5, 2), nullable=False
) # Sale/List ratio
# Relationships
time = relationship("DimTime", backref="purchases")
district = relationship("DimTRREBDistrict", backref="purchases")
class FactRentals(Base):
"""Fact table for CMHC rental market data.
Grain: One row per zone per bedroom type per survey year.
"""
__tablename__ = "fact_rentals"
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
date_key: Mapped[int] = mapped_column(
Integer, ForeignKey("dim_time.date_key"), nullable=False
)
zone_key: Mapped[int] = mapped_column(
Integer, ForeignKey("dim_cmhc_zone.zone_key"), nullable=False
)
bedroom_type: Mapped[str] = mapped_column(String(20), nullable=False)
universe: Mapped[int | None] = mapped_column(Integer, nullable=True)
avg_rent: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
median_rent: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
vacancy_rate: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
availability_rate: Mapped[float | None] = mapped_column(
Numeric(5, 2), nullable=True
)
turnover_rate: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
rent_change_pct: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
reliability_code: Mapped[str | None] = mapped_column(String(2), nullable=True)
# Relationships
time = relationship("DimTime", backref="rentals")
zone = relationship("DimCMHCZone", backref="rentals")


@@ -1 +1,20 @@
"""Data parsers for Toronto housing data sources.""" """Parsers for Toronto housing data sources."""
from .cmhc import CMHCParser
from .geo import (
CMHCZoneParser,
NeighbourhoodParser,
TRREBDistrictParser,
load_geojson,
)
from .trreb import TRREBParser
__all__ = [
"TRREBParser",
"CMHCParser",
# GeoJSON parsers
"CMHCZoneParser",
"TRREBDistrictParser",
"NeighbourhoodParser",
"load_geojson",
]


@@ -0,0 +1,147 @@
"""CMHC CSV processor for rental market survey data.
This module provides the structure for processing CMHC (Canada Mortgage and Housing
Corporation) rental market survey data from CSV exports.
"""
from pathlib import Path
from typing import Any, cast
import pandas as pd
from portfolio_app.toronto.schemas import CMHCAnnualSurvey, CMHCRentalRecord
class CMHCParser:
"""Parser for CMHC Rental Market Survey CSV data.
CMHC conducts annual rental market surveys and publishes data including:
- Average and median rents by zone and bedroom type
- Vacancy rates
- Universe (total rental units)
- Year-over-year rent changes
Data is available via the Housing Market Information Portal as CSV exports.
"""
# Expected columns in CMHC CSV exports
REQUIRED_COLUMNS = {
"zone_code",
"zone_name",
"bedroom_type",
"survey_year",
}
# Column name mappings from CMHC export format
COLUMN_MAPPINGS = {
"Zone Code": "zone_code",
"Zone Name": "zone_name",
"Bedroom Type": "bedroom_type",
"Survey Year": "survey_year",
"Universe": "universe",
"Average Rent ($)": "avg_rent",
"Median Rent ($)": "median_rent",
"Vacancy Rate (%)": "vacancy_rate",
"Availability Rate (%)": "availability_rate",
"Turnover Rate (%)": "turnover_rate",
"% Change in Rent": "rent_change_pct",
"Reliability Code": "reliability_code",
}
def __init__(self, csv_path: Path) -> None:
"""Initialize parser with path to CSV file.
Args:
csv_path: Path to the CMHC CSV export file.
"""
self.csv_path = csv_path
self._validate_path()
def _validate_path(self) -> None:
"""Validate that the CSV path exists and is readable."""
if not self.csv_path.exists():
raise FileNotFoundError(f"CSV not found: {self.csv_path}")
if self.csv_path.suffix.lower() != ".csv":
raise ValueError(f"Expected CSV file, got: {self.csv_path.suffix}")
def parse(self) -> CMHCAnnualSurvey:
"""Parse the CSV and return structured data.
Returns:
CMHCAnnualSurvey containing all extracted records.
Raises:
ValueError: If required columns are missing.
"""
df = self._load_csv()
df = self._normalize_columns(df)
self._validate_columns(df)
records = self._convert_to_records(df)
survey_year = self._infer_survey_year(df)
return CMHCAnnualSurvey(survey_year=survey_year, records=records)
def _load_csv(self) -> pd.DataFrame:
"""Load CSV file into DataFrame.
Returns:
Raw DataFrame from CSV.
"""
return pd.read_csv(self.csv_path)
def _normalize_columns(self, df: pd.DataFrame) -> pd.DataFrame:
"""Normalize column names to standard format.
Args:
df: DataFrame with original column names.
Returns:
DataFrame with normalized column names.
"""
rename_map = {k: v for k, v in self.COLUMN_MAPPINGS.items() if k in df.columns}
return df.rename(columns=rename_map)
def _validate_columns(self, df: pd.DataFrame) -> None:
"""Validate that all required columns are present.
Args:
df: DataFrame to validate.
Raises:
ValueError: If required columns are missing.
"""
missing = self.REQUIRED_COLUMNS - set(df.columns)
if missing:
raise ValueError(f"Missing required columns: {missing}")
def _convert_to_records(self, df: pd.DataFrame) -> list[CMHCRentalRecord]:
"""Convert DataFrame rows to validated schema records.
Args:
df: Normalized DataFrame.
Returns:
List of validated CMHCRentalRecord objects.
"""
records = []
for _, row in df.iterrows():
record_data = row.to_dict()
# Handle NaN values
record_data = {
k: (None if pd.isna(v) else v) for k, v in record_data.items()
}
records.append(CMHCRentalRecord(**cast(dict[str, Any], record_data)))
return records
def _infer_survey_year(self, df: pd.DataFrame) -> int:
"""Infer survey year from data.
Args:
df: DataFrame with survey_year column.
Returns:
Survey year as integer.
"""
if "survey_year" in df.columns:
return int(df["survey_year"].iloc[0])
raise ValueError("Cannot infer survey year from data.")


@@ -0,0 +1,463 @@
"""GeoJSON parser for geographic boundary files.
This module provides parsers for loading geographic boundary files
(GeoJSON format) and converting them to Pydantic schemas for database
loading or direct use in Plotly choropleth maps.
"""
import json
from pathlib import Path
from typing import Any
from pyproj import Transformer
from shapely.geometry import mapping, shape
from shapely.ops import transform
from portfolio_app.toronto.schemas import CMHCZone, Neighbourhood, TRREBDistrict
from portfolio_app.toronto.schemas.dimensions import AreaType
# Transformer for reprojecting from Web Mercator to WGS84
_TRANSFORMER_3857_TO_4326 = Transformer.from_crs(
"EPSG:3857", "EPSG:4326", always_xy=True
)
def load_geojson(path: Path) -> dict[str, Any]:
"""Load a GeoJSON file and return as dictionary.
Args:
path: Path to the GeoJSON file.
Returns:
GeoJSON as dictionary (FeatureCollection).
Raises:
FileNotFoundError: If file does not exist.
ValueError: If file is not valid GeoJSON.
"""
if not path.exists():
raise FileNotFoundError(f"GeoJSON file not found: {path}")
if path.suffix.lower() not in (".geojson", ".json"):
raise ValueError(f"Expected GeoJSON file, got: {path.suffix}")
with open(path, encoding="utf-8") as f:
data = json.load(f)
if data.get("type") != "FeatureCollection":
raise ValueError("GeoJSON must be a FeatureCollection")
return dict(data)
def geometry_to_wkt(geometry: dict[str, Any]) -> str:
"""Convert GeoJSON geometry to WKT string.
Args:
geometry: GeoJSON geometry dictionary.
Returns:
WKT representation of the geometry.
"""
return str(shape(geometry).wkt)
def reproject_geometry(
geometry: dict[str, Any], source_crs: str = "EPSG:3857"
) -> dict[str, Any]:
"""Reproject a GeoJSON geometry to WGS84 (EPSG:4326).
Args:
geometry: GeoJSON geometry dictionary.
source_crs: Source CRS (default EPSG:3857 Web Mercator).
Returns:
GeoJSON geometry in WGS84 coordinates.
"""
if source_crs == "EPSG:3857":
transformer = _TRANSFORMER_3857_TO_4326
else:
transformer = Transformer.from_crs(source_crs, "EPSG:4326", always_xy=True)
geom = shape(geometry)
reprojected = transform(transformer.transform, geom)
return dict(mapping(reprojected))
class CMHCZoneParser:
"""Parser for CMHC zone boundary GeoJSON files.
CMHC zone boundaries are extracted from the R `cmhc` package using
`get_cmhc_geography(geography_type="ZONE", cma="Toronto")`.
Expected GeoJSON properties:
- zone_code or Zone_Code: Zone identifier
- zone_name or Zone_Name: Zone name
"""
# Property name mappings for different GeoJSON formats
CODE_PROPERTIES = ["zone_code", "Zone_Code", "ZONE_CODE", "zonecode", "code"]
NAME_PROPERTIES = [
"zone_name",
"Zone_Name",
"ZONE_NAME",
"ZONE_NAME_EN",
"NAME_EN",
"zonename",
"name",
"NAME",
]
def __init__(self, geojson_path: Path) -> None:
"""Initialize parser with path to GeoJSON file.
Args:
geojson_path: Path to the CMHC zones GeoJSON file.
"""
self.geojson_path = geojson_path
self._geojson: dict[str, Any] | None = None
@property
def geojson(self) -> dict[str, Any]:
"""Lazy-load and return raw GeoJSON data."""
if self._geojson is None:
self._geojson = load_geojson(self.geojson_path)
return self._geojson
def _find_property(
self, properties: dict[str, Any], candidates: list[str]
) -> str | None:
"""Find a property value by checking multiple candidate names."""
for name in candidates:
if name in properties and properties[name] is not None:
return str(properties[name])
return None
def parse(self) -> list[CMHCZone]:
"""Parse GeoJSON and return list of CMHCZone schemas.
Returns:
List of validated CMHCZone objects.
Raises:
ValueError: If required properties are missing.
"""
zones = []
for feature in self.geojson.get("features", []):
props = feature.get("properties", {})
geom = feature.get("geometry")
zone_code = self._find_property(props, self.CODE_PROPERTIES)
zone_name = self._find_property(props, self.NAME_PROPERTIES)
if not zone_code:
raise ValueError(
f"Zone code not found in properties: {list(props.keys())}"
)
if not zone_name:
zone_name = zone_code # Fallback to code if name missing
geometry_wkt = geometry_to_wkt(geom) if geom else None
zones.append(
CMHCZone(
zone_code=zone_code,
zone_name=zone_name,
geometry_wkt=geometry_wkt,
)
)
return zones
def _needs_reprojection(self) -> bool:
"""Check if GeoJSON needs reprojection to WGS84."""
crs = self.geojson.get("crs", {})
crs_name = crs.get("properties", {}).get("name", "")
# EPSG:3857 or Web Mercator needs reprojection
return "3857" in crs_name or "900913" in crs_name
def get_geojson_for_choropleth(
self, key_property: str = "zone_code"
) -> dict[str, Any]:
"""Get GeoJSON formatted for Plotly choropleth maps.
Ensures the feature properties include a standardized key for
joining with data. Automatically reprojects from EPSG:3857 to
WGS84 if needed.
Args:
key_property: Property name to use as feature identifier.
Returns:
GeoJSON FeatureCollection with standardized properties in WGS84.
"""
needs_reproject = self._needs_reprojection()
features = []
for feature in self.geojson.get("features", []):
props = feature.get("properties", {})
new_props = dict(props)
# Ensure standardized property names exist
zone_code = self._find_property(props, self.CODE_PROPERTIES)
zone_name = self._find_property(props, self.NAME_PROPERTIES)
new_props["zone_code"] = zone_code
new_props["zone_name"] = zone_name or zone_code
# Reproject geometry if needed
geometry = feature.get("geometry")
if needs_reproject and geometry:
geometry = reproject_geometry(geometry)
features.append(
{
"type": "Feature",
"properties": new_props,
"geometry": geometry,
}
)
return {"type": "FeatureCollection", "features": features}
class TRREBDistrictParser:
"""Parser for TRREB district boundary GeoJSON files.
TRREB district boundaries are manually digitized from the TRREB PDF map
using QGIS.
Expected GeoJSON properties:
- district_code: District code (W01, C01, E01, etc.)
- district_name: District name
- area_type: West, Central, East, or North
"""
CODE_PROPERTIES = [
"district_code",
"District_Code",
"DISTRICT_CODE",
"districtcode",
"code",
]
NAME_PROPERTIES = [
"district_name",
"District_Name",
"DISTRICT_NAME",
"districtname",
"name",
"NAME",
]
AREA_PROPERTIES = [
"area_type",
"Area_Type",
"AREA_TYPE",
"areatype",
"area",
"type",
]
def __init__(self, geojson_path: Path) -> None:
"""Initialize parser with path to GeoJSON file."""
self.geojson_path = geojson_path
self._geojson: dict[str, Any] | None = None
@property
def geojson(self) -> dict[str, Any]:
"""Lazy-load and return raw GeoJSON data."""
if self._geojson is None:
self._geojson = load_geojson(self.geojson_path)
return self._geojson
def _find_property(
self, properties: dict[str, Any], candidates: list[str]
) -> str | None:
"""Find a property value by checking multiple candidate names."""
for name in candidates:
if name in properties and properties[name] is not None:
return str(properties[name])
return None
def _infer_area_type(self, district_code: str) -> AreaType:
"""Infer area type from district code prefix."""
prefix = district_code[0].upper()
mapping = {"W": AreaType.WEST, "C": AreaType.CENTRAL, "E": AreaType.EAST}
return mapping.get(prefix, AreaType.NORTH)
def parse(self) -> list[TRREBDistrict]:
"""Parse GeoJSON and return list of TRREBDistrict schemas."""
districts = []
for feature in self.geojson.get("features", []):
props = feature.get("properties", {})
geom = feature.get("geometry")
district_code = self._find_property(props, self.CODE_PROPERTIES)
district_name = self._find_property(props, self.NAME_PROPERTIES)
area_type_str = self._find_property(props, self.AREA_PROPERTIES)
if not district_code:
raise ValueError(
f"District code not found in properties: {list(props.keys())}"
)
if not district_name:
district_name = district_code
# Infer or parse area type
if area_type_str:
try:
area_type = AreaType(area_type_str)
except ValueError:
area_type = self._infer_area_type(district_code)
else:
area_type = self._infer_area_type(district_code)
geometry_wkt = geometry_to_wkt(geom) if geom else None
districts.append(
TRREBDistrict(
district_code=district_code,
district_name=district_name,
area_type=area_type,
geometry_wkt=geometry_wkt,
)
)
return districts
def get_geojson_for_choropleth(
self, key_property: str = "district_code"
) -> dict[str, Any]:
"""Get GeoJSON formatted for Plotly choropleth maps."""
features = []
for feature in self.geojson.get("features", []):
props = feature.get("properties", {})
new_props = dict(props)
district_code = self._find_property(props, self.CODE_PROPERTIES)
district_name = self._find_property(props, self.NAME_PROPERTIES)
new_props["district_code"] = district_code
new_props["district_name"] = district_name or district_code
features.append(
{
"type": "Feature",
"properties": new_props,
"geometry": feature.get("geometry"),
}
)
return {"type": "FeatureCollection", "features": features}
class NeighbourhoodParser:
"""Parser for City of Toronto neighbourhood boundary GeoJSON files.
Neighbourhood boundaries are from the City of Toronto Open Data portal.
Expected GeoJSON properties:
- neighbourhood_id or AREA_ID: Neighbourhood ID (1-158)
- name or AREA_NAME: Neighbourhood name
"""
ID_PROPERTIES = [
"neighbourhood_id",
"AREA_SHORT_CODE", # City of Toronto 158 neighbourhoods
"AREA_LONG_CODE",
"AREA_ID",
"area_id",
"id",
"ID",
"HOOD_ID",
]
NAME_PROPERTIES = [
"AREA_NAME", # City of Toronto 158 neighbourhoods
"name",
"NAME",
"area_name",
"neighbourhood_name",
]
def __init__(self, geojson_path: Path) -> None:
"""Initialize parser with path to GeoJSON file."""
self.geojson_path = geojson_path
self._geojson: dict[str, Any] | None = None
@property
def geojson(self) -> dict[str, Any]:
"""Lazy-load and return raw GeoJSON data."""
if self._geojson is None:
self._geojson = load_geojson(self.geojson_path)
return self._geojson
def _find_property(
self, properties: dict[str, Any], candidates: list[str]
) -> str | None:
"""Find a property value by checking multiple candidate names."""
for name in candidates:
if name in properties and properties[name] is not None:
return str(properties[name])
return None
def parse(self) -> list[Neighbourhood]:
"""Parse GeoJSON and return list of Neighbourhood schemas.
Note: This parser only extracts ID, name, and geometry.
Census enrichment data (population, income, etc.) should be
loaded separately and merged.
"""
neighbourhoods = []
for feature in self.geojson.get("features", []):
props = feature.get("properties", {})
geom = feature.get("geometry")
neighbourhood_id_str = self._find_property(props, self.ID_PROPERTIES)
name = self._find_property(props, self.NAME_PROPERTIES)
if not neighbourhood_id_str:
raise ValueError(
f"Neighbourhood ID not found in properties: {list(props.keys())}"
)
neighbourhood_id = int(neighbourhood_id_str)
if not name:
name = f"Neighbourhood {neighbourhood_id}"
geometry_wkt = geometry_to_wkt(geom) if geom else None
neighbourhoods.append(
Neighbourhood(
neighbourhood_id=neighbourhood_id,
name=name,
geometry_wkt=geometry_wkt,
)
)
return neighbourhoods
def get_geojson_for_choropleth(
self, key_property: str = "neighbourhood_id"
) -> dict[str, Any]:
"""Get GeoJSON formatted for Plotly choropleth maps."""
features = []
for feature in self.geojson.get("features", []):
props = feature.get("properties", {})
new_props = dict(props)
neighbourhood_id = self._find_property(props, self.ID_PROPERTIES)
name = self._find_property(props, self.NAME_PROPERTIES)
new_props["neighbourhood_id"] = (
int(neighbourhood_id) if neighbourhood_id else None
)
new_props["name"] = name
features.append(
{
"type": "Feature",
"properties": new_props,
"geometry": feature.get("geometry"),
}
)
return {"type": "FeatureCollection", "features": features}


@@ -0,0 +1,82 @@
"""TRREB PDF parser for monthly market watch reports.
This module provides the structure for parsing TRREB (Toronto Regional Real Estate Board)
monthly Market Watch PDF reports into structured data.
"""
from pathlib import Path
from typing import Any
from portfolio_app.toronto.schemas import TRREBMonthlyRecord, TRREBMonthlyReport
class TRREBParser:
"""Parser for TRREB Market Watch PDF reports.
TRREB publishes monthly Market Watch reports as PDFs containing:
- Summary statistics by area (416, 905, Total)
- District-level breakdowns
- Year-over-year comparisons
The parser extracts tabular data from these PDFs and validates
against the TRREBMonthlyRecord schema.
"""
def __init__(self, pdf_path: Path) -> None:
"""Initialize parser with path to PDF file.
Args:
pdf_path: Path to the TRREB Market Watch PDF file.
"""
self.pdf_path = pdf_path
self._validate_path()
def _validate_path(self) -> None:
"""Validate that the PDF path exists and is readable."""
if not self.pdf_path.exists():
raise FileNotFoundError(f"PDF not found: {self.pdf_path}")
if self.pdf_path.suffix.lower() != ".pdf":
raise ValueError(f"Expected PDF file, got: {self.pdf_path.suffix}")
def parse(self) -> TRREBMonthlyReport:
"""Parse the PDF and return structured data.
Returns:
TRREBMonthlyReport containing all extracted records.
Raises:
NotImplementedError: PDF parsing not yet implemented.
"""
raise NotImplementedError(
"PDF parsing requires pdfplumber/tabula-py. "
"Implementation pending Sprint 4 data ingestion."
)
def _extract_tables(self) -> list[dict[str, Any]]:
"""Extract raw tables from PDF pages.
Returns:
List of dictionaries representing table data.
"""
raise NotImplementedError("Table extraction not yet implemented.")
def _parse_district_table(
self, table_data: list[dict[str, Any]]
) -> list[TRREBMonthlyRecord]:
"""Parse district-level statistics table.
Args:
table_data: Raw table data extracted from PDF.
Returns:
List of validated TRREBMonthlyRecord objects.
"""
raise NotImplementedError("District table parsing not yet implemented.")
def _infer_report_date(self) -> tuple[int, int]:
"""Infer report year and month from PDF filename or content.
Returns:
Tuple of (year, month).
"""
raise NotImplementedError("Date inference not yet implemented.")


@@ -1 +1,39 @@
"""Pydantic schemas for Toronto housing data validation.""" """Pydantic schemas for Toronto housing data validation."""
from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
from .dimensions import (
AreaType,
CMHCZone,
Confidence,
ExpectedDirection,
Neighbourhood,
PolicyCategory,
PolicyEvent,
PolicyLevel,
TimeDimension,
TRREBDistrict,
)
from .trreb import TRREBMonthlyRecord, TRREBMonthlyReport
__all__ = [
# TRREB
"TRREBMonthlyRecord",
"TRREBMonthlyReport",
# CMHC
"CMHCRentalRecord",
"CMHCAnnualSurvey",
"BedroomType",
"ReliabilityCode",
# Dimensions
"TimeDimension",
"TRREBDistrict",
"CMHCZone",
"Neighbourhood",
"PolicyEvent",
# Enums
"AreaType",
"PolicyLevel",
"PolicyCategory",
"ExpectedDirection",
"Confidence",
]


@@ -0,0 +1,81 @@
"""Pydantic schemas for CMHC rental market data."""
from decimal import Decimal
from enum import Enum
from pydantic import BaseModel, Field
class BedroomType(str, Enum):
"""CMHC bedroom type categories."""
BACHELOR = "Bachelor"
ONE_BED = "1 Bedroom"
TWO_BED = "2 Bedroom"
THREE_BED_PLUS = "3 Bedroom+"
TOTAL = "Total"
class ReliabilityCode(str, Enum):
"""CMHC data reliability codes.
Based on coefficient of variation (CV).
"""
EXCELLENT = "a" # CV <= 2.5%
GOOD = "b" # 2.5% < CV <= 5%
FAIR = "c" # 5% < CV <= 10%
POOR = "d" # CV > 10%
SUPPRESSED = "**" # Sample too small
class CMHCRentalRecord(BaseModel):
"""Schema for a single CMHC rental survey record.
Represents rental data for one zone and bedroom type in one survey year.
"""
survey_year: int = Field(ge=1990, description="Survey year (October snapshot)")
zone_code: str = Field(max_length=10, description="CMHC zone identifier")
zone_name: str = Field(max_length=100, description="Zone name")
bedroom_type: BedroomType = Field(description="Bedroom category")
universe: int | None = Field(
default=None, ge=0, description="Total rental units in zone"
)
vacancy_rate: Decimal | None = Field(
default=None, ge=0, le=100, description="Vacancy rate (%)"
)
vacancy_rate_reliability: ReliabilityCode | None = Field(default=None)
availability_rate: Decimal | None = Field(
default=None, ge=0, le=100, description="Availability rate (%)"
)
average_rent: Decimal | None = Field(
default=None, ge=0, description="Average monthly rent ($)"
)
average_rent_reliability: ReliabilityCode | None = Field(default=None)
median_rent: Decimal | None = Field(
default=None, ge=0, description="Median monthly rent ($)"
)
rent_change_pct: Decimal | None = Field(
default=None, description="YoY rent change (%)"
)
turnover_rate: Decimal | None = Field(
default=None, ge=0, le=100, description="Unit turnover rate (%)"
)
model_config = {"str_strip_whitespace": True}
class CMHCAnnualSurvey(BaseModel):
"""Schema for a complete CMHC annual survey for Toronto.
Contains all zone and bedroom type combinations for one survey year.
"""
survey_year: int
records: list[CMHCRentalRecord]
@property
def zone_count(self) -> int:
"""Number of unique zones in survey."""
return len({r.zone_code for r in self.records})


@@ -0,0 +1,121 @@
"""Pydantic schemas for dimension tables."""
from datetime import date
from decimal import Decimal
from enum import Enum
from pydantic import BaseModel, Field, HttpUrl
class PolicyLevel(str, Enum):
"""Government level for policy events."""
FEDERAL = "federal"
PROVINCIAL = "provincial"
MUNICIPAL = "municipal"
class PolicyCategory(str, Enum):
"""Policy event category."""
MONETARY = "monetary"
TAX = "tax"
REGULATORY = "regulatory"
SUPPLY = "supply"
ECONOMIC = "economic"
class ExpectedDirection(str, Enum):
"""Expected price impact direction."""
BULLISH = "bullish" # Expected to increase prices
BEARISH = "bearish" # Expected to decrease prices
NEUTRAL = "neutral" # Uncertain or mixed impact
class Confidence(str, Enum):
"""Confidence level in policy event data."""
HIGH = "high"
MEDIUM = "medium"
LOW = "low"
class AreaType(str, Enum):
"""TRREB area type."""
WEST = "West"
CENTRAL = "Central"
EAST = "East"
NORTH = "North"
class TimeDimension(BaseModel):
"""Schema for time dimension record."""
date_key: int = Field(description="Date key in YYYYMMDD format")
full_date: date
year: int = Field(ge=2000, le=2100)
month: int = Field(ge=1, le=12)
quarter: int = Field(ge=1, le=4)
month_name: str = Field(max_length=20)
is_month_start: bool = True
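Every field of `TimeDimension` is derivable from `full_date`, so a loader would typically construct records from a single date. A sketch of that derivation (`make_time_dimension` is a hypothetical helper, not part of the codebase; the model here is a trimmed copy):

```python
from datetime import date

from pydantic import BaseModel, Field


class TimeDimension(BaseModel):
    """Trimmed copy of the dimension above, for illustration only."""

    date_key: int = Field(description="Date key in YYYYMMDD format")
    full_date: date
    year: int = Field(ge=2000, le=2100)
    month: int = Field(ge=1, le=12)
    quarter: int = Field(ge=1, le=4)
    month_name: str = Field(max_length=20)
    is_month_start: bool = True


def make_time_dimension(d: date) -> TimeDimension:
    """Derive all dimension fields from a single date (hypothetical helper)."""
    return TimeDimension(
        date_key=d.year * 10000 + d.month * 100 + d.day,  # YYYYMMDD surrogate key
        full_date=d,
        year=d.year,
        month=d.month,
        quarter=(d.month - 1) // 3 + 1,
        month_name=d.strftime("%B"),
        is_month_start=d.day == 1,
    )


dim = make_time_dimension(date(2024, 10, 1))
print(dim.date_key, dim.quarter)  # 20241001 4
```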
class TRREBDistrict(BaseModel):
"""Schema for TRREB district dimension."""
district_code: str = Field(max_length=3, description="W01, C01, E01, etc.")
district_name: str = Field(max_length=100)
area_type: AreaType
geometry_wkt: str | None = Field(default=None, description="WKT geometry string")
class CMHCZone(BaseModel):
"""Schema for CMHC zone dimension."""
zone_code: str = Field(max_length=10)
zone_name: str = Field(max_length=100)
geometry_wkt: str | None = Field(default=None, description="WKT geometry string")
class Neighbourhood(BaseModel):
"""Schema for City of Toronto neighbourhood dimension.
Note: No FK to fact tables in V1 - reference overlay only.
"""
neighbourhood_id: int = Field(ge=1, le=200)
name: str = Field(max_length=100)
geometry_wkt: str | None = Field(default=None)
population: int | None = Field(default=None, ge=0)
land_area_sqkm: Decimal | None = Field(default=None, ge=0)
pop_density_per_sqkm: Decimal | None = Field(default=None, ge=0)
pct_bachelors_or_higher: Decimal | None = Field(default=None, ge=0, le=100)
median_household_income: Decimal | None = Field(default=None, ge=0)
pct_owner_occupied: Decimal | None = Field(default=None, ge=0, le=100)
pct_renter_occupied: Decimal | None = Field(default=None, ge=0, le=100)
census_year: int = Field(default=2021, description="Census year for SCD tracking")
class PolicyEvent(BaseModel):
"""Schema for policy event dimension.
Used for time-series annotation. No causation claims.
"""
event_date: date = Field(description="Date event was announced/occurred")
effective_date: date | None = Field(
default=None, description="Date policy took effect"
)
level: PolicyLevel
category: PolicyCategory
title: str = Field(max_length=200, description="Short event title for display")
description: str | None = Field(
default=None, description="Longer description for tooltip"
)
expected_direction: ExpectedDirection
source_url: HttpUrl | None = Field(default=None)
confidence: Confidence = Field(default=Confidence.MEDIUM)
model_config = {"str_strip_whitespace": True}


@@ -0,0 +1,52 @@
"""Pydantic schemas for TRREB monthly market data."""
from datetime import date
from decimal import Decimal
from pydantic import BaseModel, Field
class TRREBMonthlyRecord(BaseModel):
"""Schema for a single TRREB monthly summary record.
Represents aggregated sales data for one district in one month.
"""
report_date: date = Field(description="First of month (YYYY-MM-01)")
area_code: str = Field(
max_length=3, description="District code (W01, C01, E01, etc.)"
)
area_name: str = Field(max_length=100, description="District name")
area_type: str = Field(max_length=10, description="West / Central / East / North")
sales: int = Field(ge=0, description="Number of transactions")
dollar_volume: Decimal = Field(ge=0, description="Total sales volume ($)")
avg_price: Decimal = Field(ge=0, description="Average sale price ($)")
median_price: Decimal = Field(ge=0, description="Median sale price ($)")
new_listings: int = Field(ge=0, description="New listings count")
active_listings: int = Field(ge=0, description="Active listings at month end")
avg_sp_lp: Decimal = Field(
ge=0, le=200, description="Avg sale price / list price ratio (%)"
)
avg_dom: int = Field(ge=0, description="Average days on market")
model_config = {"str_strip_whitespace": True}
class TRREBMonthlyReport(BaseModel):
"""Schema for a complete TRREB monthly report.
Contains all district records for a single month.
"""
report_date: date
records: list[TRREBMonthlyRecord]
@property
def total_sales(self) -> int:
"""Total sales across all districts."""
return sum(r.sales for r in self.records)
@property
def district_count(self) -> int:
"""Number of districts in report."""
return len(self.records)
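A minimal sketch of the report/record relationship (trimmed to a few fields; district codes, counts, and prices are made up): the container validates every district record on construction, and the aggregate properties then work over clean data.

```python
from datetime import date
from decimal import Decimal

from pydantic import BaseModel, Field


class Record(BaseModel):
    """Trimmed copy of TRREBMonthlyRecord, for illustration only."""

    area_code: str = Field(max_length=3)
    sales: int = Field(ge=0)
    avg_price: Decimal = Field(ge=0)


class Report(BaseModel):
    """Trimmed copy of TRREBMonthlyReport, for illustration only."""

    report_date: date
    records: list[Record]

    @property
    def total_sales(self) -> int:
        """Total sales across all districts."""
        return sum(r.sales for r in self.records)


report = Report(
    report_date=date(2024, 10, 1),
    records=[
        Record(area_code="W01", sales=42, avg_price="1250000"),
        Record(area_code="C01", sales=118, avg_price="985000"),
    ],
)
print(report.total_sales)  # 160
```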


@@ -22,53 +22,54 @@ classifiers = [
 ]
 dependencies = [
     # Database
-    "sqlalchemy>=2.0",
+    "sqlalchemy>=2.0.45",
     "psycopg2-binary>=2.9",
-    "geoalchemy2>=0.14",
+    "geoalchemy2>=0.15",
     # Validation
-    "pydantic>=2.0",
+    "pydantic>=2.10",
-    "pydantic-settings>=2.0",
+    "pydantic-settings>=2.6",
     # Data Processing
-    "pandas>=2.1",
+    "pandas>=2.3",
-    "geopandas>=0.14",
+    "geopandas>=1.1",
     "shapely>=2.0",
     # Visualization
-    "dash>=2.14",
+    "dash>=3.3",
-    "plotly>=5.18",
+    "plotly>=6.5",
-    "dash-mantine-components>=0.14",
+    "dash-mantine-components>=2.4",
+    "dash-iconify>=0.1",
     # PDF Parsing
-    "pdfplumber>=0.10",
+    "pdfplumber>=0.11",
     "tabula-py>=2.9",
     # Utilities
     "python-dotenv>=1.0",
-    "httpx>=0.25",
+    "httpx>=0.28",
 ]

 [project.optional-dependencies]
 dev = [
     # Testing
-    "pytest>=7.0",
+    "pytest>=8.3",
-    "pytest-cov>=4.0",
+    "pytest-cov>=6.0",
-    "pytest-asyncio>=0.21",
+    "pytest-asyncio>=0.24",
     # Linting & Formatting
-    "ruff>=0.1",
+    "ruff>=0.8",
-    "mypy>=1.7",
+    "mypy>=1.14",
     # Pre-commit
-    "pre-commit>=3.5",
+    "pre-commit>=4.0",
     # Type stubs
     "pandas-stubs",
     "types-requests",
 ]
 dbt = [
-    "dbt-postgres>=1.7",
+    "dbt-postgres>=1.9",
 ]

 [project.scripts]
@@ -132,17 +133,20 @@ skip-magic-trailing-comma = false
 python_version = "3.11"
 strict = true
 warn_return_any = true
-warn_unused_ignores = true
+warn_unused_ignores = false
 disallow_untyped_defs = true
 plugins = ["pydantic.mypy"]

 [[tool.mypy.overrides]]
 module = [
     "dash.*",
+    "dash_mantine_components.*",
+    "dash_iconify.*",
     "plotly.*",
     "geopandas.*",
     "shapely.*",
     "pdfplumber.*",
     "tabula.*",
+    "pydantic_settings.*",
 ]
 ignore_missing_imports = true

scripts/db/init_schema.py (new file, 52 lines)

@@ -0,0 +1,52 @@
#!/usr/bin/env python3
"""Initialize database schema.
Usage:
python scripts/db/init_schema.py
This script creates all SQLAlchemy tables in the database.
Run this after docker-compose up to initialize the schema.
"""
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from portfolio_app.toronto.models import create_tables, get_engine # noqa: E402
def main() -> int:
"""Initialize the database schema."""
print("Initializing database schema...")
try:
engine = get_engine()
        # Test connection (SQLAlchemy 2.0 requires text() for raw SQL strings)
        from sqlalchemy import text

        with engine.connect() as conn:
            result = conn.execute(text("SELECT 1"))
result.fetchone()
print("Database connection successful")
# Create all tables
create_tables()
print("Schema created successfully")
# List created tables
from sqlalchemy import inspect
inspector = inspect(engine)
tables = inspector.get_table_names()
print(f"Created tables: {', '.join(tables)}")
return 0
except Exception as e:
print(f"Error: {e}", file=sys.stderr)
return 1
if __name__ == "__main__":
sys.exit(main())


@@ -0,0 +1,6 @@
"""Placeholder test to ensure pytest collection succeeds."""
def test_placeholder() -> None:
"""Remove this once real tests are added."""
assert True