Compare commits
22 Commits
8200bbaa99
...
sprint-7-c
| Author | SHA1 | Date | |
|---|---|---|---|
| d64f90b3d3 | |||
| b3fb94c7cb | |||
| 1e0ea9cca2 | |||
| 9dfa24fb76 | |||
| 8701a12b41 | |||
| 6ef5460ad0 | |||
| 19ffc04573 | |||
| 08aa61f85e | |||
| 2a6db2a252 | |||
| 140d3085bf | |||
| ad6ee3d37f | |||
| 077e426d34 | |||
| b7907e68e4 | |||
| 457bb49395 | |||
| 88e23674a8 | |||
| 1c42533834 | |||
| 802efab8b8 | |||
| ead6d91a28 | |||
| 549e1fcbaf | |||
| 3ee4c20f5e | |||
| 68cc5bbe66 | |||
| 58f2c692e3 |
@@ -7,6 +7,7 @@ repos:
|
||||
- id: check-yaml
|
||||
- id: check-added-large-files
|
||||
args: ['--maxkb=1000']
|
||||
exclude: ^data/(raw/|toronto/raw/geo/)
|
||||
- id: check-merge-conflict
|
||||
|
||||
- repo: https://github.com/astral-sh/ruff-pre-commit
|
||||
|
||||
@@ -6,7 +6,7 @@ Working context for Claude Code on the Analytics Portfolio project.
|
||||
|
||||
## Project Status
|
||||
|
||||
**Current Sprint**: 1 (Project Bootstrap)
|
||||
**Current Sprint**: 7 (Navigation & Theme Modernization)
|
||||
**Phase**: 1 - Toronto Housing Dashboard
|
||||
**Branch**: `development` (feature branches merge here)
|
||||
|
||||
@@ -254,4 +254,4 @@ All scripts in `scripts/`:
|
||||
|
||||
---
|
||||
|
||||
*Last Updated: Sprint 1*
|
||||
*Last Updated: Sprint 7*
|
||||
|
||||
120
README.md
120
README.md
@@ -1,2 +1,120 @@
|
||||
# personal-portfolio
|
||||
# Analytics Portfolio
|
||||
|
||||
A data analytics portfolio showcasing end-to-end data engineering, visualization, and analysis capabilities.
|
||||
|
||||
## Projects
|
||||
|
||||
### Toronto Housing Dashboard
|
||||
|
||||
An interactive choropleth dashboard analyzing Toronto's housing market using multi-source data integration.
|
||||
|
||||
**Features:**
|
||||
- Purchase market analysis from TRREB monthly reports
|
||||
- Rental market analysis from CMHC annual surveys
|
||||
- Interactive choropleth maps by district/zone
|
||||
- Time series visualization with policy event annotations
|
||||
- Purchase/Rental mode toggle
|
||||
|
||||
**Data Sources:**
|
||||
- [TRREB Market Watch](https://trreb.ca/market-data/market-watch/) - Monthly purchase statistics
|
||||
- [CMHC Rental Market Survey](https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market) - Annual rental data
|
||||
|
||||
**Tech Stack:**
|
||||
- Python 3.11+ / Dash / Plotly
|
||||
- PostgreSQL + PostGIS
|
||||
- dbt for data transformation
|
||||
- Pydantic for validation
|
||||
- SQLAlchemy 2.0
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Clone and setup
|
||||
git clone https://github.com/lmiranda/personal-portfolio.git
|
||||
cd personal-portfolio
|
||||
|
||||
# Install dependencies and configure environment
|
||||
make setup
|
||||
|
||||
# Start database
|
||||
make docker-up
|
||||
|
||||
# Initialize database schema
|
||||
make db-init
|
||||
|
||||
# Run development server
|
||||
make run
|
||||
```
|
||||
|
||||
Visit `http://localhost:8050` to view the portfolio.
|
||||
|
||||
## Project Structure
|
||||
|
||||
```
|
||||
portfolio_app/
|
||||
├── app.py # Dash app factory
|
||||
├── config.py # Pydantic settings
|
||||
├── pages/
|
||||
│ ├── home.py # Bio landing page (/)
|
||||
│ └── toronto/ # Toronto dashboard (/toronto)
|
||||
├── components/ # Shared UI components
|
||||
├── figures/ # Plotly figure factories
|
||||
└── toronto/ # Toronto data logic
|
||||
├── parsers/ # PDF/CSV extraction
|
||||
├── loaders/ # Database operations
|
||||
├── schemas/ # Pydantic models
|
||||
└── models/ # SQLAlchemy ORM
|
||||
|
||||
dbt/
|
||||
├── models/
|
||||
│ ├── staging/ # 1:1 source tables
|
||||
│ ├── intermediate/ # Business logic
|
||||
│ └── marts/ # Analytical tables
|
||||
```
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
make test # Run tests
|
||||
make lint # Run linter
|
||||
make format # Format code
|
||||
make ci # Run all checks
|
||||
```
|
||||
|
||||
## Data Pipeline
|
||||
|
||||
```
|
||||
Raw Files (PDF/Excel)
|
||||
↓
|
||||
Parsers (pdfplumber, pandas)
|
||||
↓
|
||||
Pydantic Validation
|
||||
↓
|
||||
SQLAlchemy Loaders
|
||||
↓
|
||||
PostgreSQL + PostGIS
|
||||
↓
|
||||
dbt Transformations
|
||||
↓
|
||||
Dash Visualization
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Copy `.env.example` to `.env` and configure:
|
||||
|
||||
```bash
|
||||
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
|
||||
POSTGRES_USER=portfolio
|
||||
POSTGRES_PASSWORD=<secure>
|
||||
POSTGRES_DB=portfolio
|
||||
DASH_DEBUG=true
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
||||
|
||||
## Author
|
||||
|
||||
Leo Miranda - [GitHub](https://github.com/lmiranda) | [LinkedIn](https://linkedin.com/in/yourprofile)
|
||||
|
||||
BIN
data/raw/cmhc/rmr-toronto-2021-en.xlsx
Normal file
BIN
data/raw/cmhc/rmr-toronto-2021-en.xlsx
Normal file
Binary file not shown.
BIN
data/raw/cmhc/rmr-toronto-2022-en.xlsx
Normal file
BIN
data/raw/cmhc/rmr-toronto-2022-en.xlsx
Normal file
Binary file not shown.
BIN
data/raw/cmhc/rmr-toronto-2023-en.xlsx
Normal file
BIN
data/raw/cmhc/rmr-toronto-2023-en.xlsx
Normal file
Binary file not shown.
BIN
data/raw/cmhc/rmr-toronto-2024-en.xlsx
Normal file
BIN
data/raw/cmhc/rmr-toronto-2024-en.xlsx
Normal file
Binary file not shown.
BIN
data/raw/cmhc/rmr-toronto-2025-en.xlsx
Normal file
BIN
data/raw/cmhc/rmr-toronto-2025-en.xlsx
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2401.pdf
Normal file
BIN
data/raw/trreb/mw2401.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2402.pdf
Normal file
BIN
data/raw/trreb/mw2402.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2403.pdf
Normal file
BIN
data/raw/trreb/mw2403.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2404.pdf
Normal file
BIN
data/raw/trreb/mw2404.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2405.pdf
Normal file
BIN
data/raw/trreb/mw2405.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2406.pdf
Normal file
BIN
data/raw/trreb/mw2406.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2407.pdf
Normal file
BIN
data/raw/trreb/mw2407.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2408.pdf
Normal file
BIN
data/raw/trreb/mw2408.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2409.pdf
Normal file
BIN
data/raw/trreb/mw2409.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2410.pdf
Normal file
BIN
data/raw/trreb/mw2410.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2411.pdf
Normal file
BIN
data/raw/trreb/mw2411.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2412.pdf
Normal file
BIN
data/raw/trreb/mw2412.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2501.pdf
Normal file
BIN
data/raw/trreb/mw2501.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2502.pdf
Normal file
BIN
data/raw/trreb/mw2502.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2503.pdf
Normal file
BIN
data/raw/trreb/mw2503.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2504.pdf
Normal file
BIN
data/raw/trreb/mw2504.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2505.pdf
Normal file
BIN
data/raw/trreb/mw2505.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2506.pdf
Normal file
BIN
data/raw/trreb/mw2506.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2507.pdf
Normal file
BIN
data/raw/trreb/mw2507.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2508.pdf
Normal file
BIN
data/raw/trreb/mw2508.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2509.pdf
Normal file
BIN
data/raw/trreb/mw2509.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2510.pdf
Normal file
BIN
data/raw/trreb/mw2510.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2511.pdf
Normal file
BIN
data/raw/trreb/mw2511.pdf
Normal file
Binary file not shown.
BIN
data/raw/trreb/mw2512.pdf
Normal file
BIN
data/raw/trreb/mw2512.pdf
Normal file
Binary file not shown.
0
data/toronto/raw/geo/.gitkeep
Normal file
0
data/toronto/raw/geo/.gitkeep
Normal file
38
data/toronto/raw/geo/cmhc_zones.geojson
Normal file
38
data/toronto/raw/geo/cmhc_zones.geojson
Normal file
File diff suppressed because one or more lines are too long
1
data/toronto/raw/geo/toronto_neighbourhoods.geojson
Normal file
1
data/toronto/raw/geo/toronto_neighbourhoods.geojson
Normal file
File diff suppressed because one or more lines are too long
28
dbt/dbt_project.yml
Normal file
28
dbt/dbt_project.yml
Normal file
@@ -0,0 +1,28 @@
|
||||
name: 'toronto_housing'
|
||||
version: '1.0.0'
|
||||
config-version: 2
|
||||
|
||||
profile: 'toronto_housing'
|
||||
|
||||
model-paths: ["models"]
|
||||
analysis-paths: ["analyses"]
|
||||
test-paths: ["tests"]
|
||||
seed-paths: ["seeds"]
|
||||
macro-paths: ["macros"]
|
||||
snapshot-paths: ["snapshots"]
|
||||
|
||||
clean-targets:
|
||||
- "target"
|
||||
- "dbt_packages"
|
||||
|
||||
models:
|
||||
toronto_housing:
|
||||
staging:
|
||||
+materialized: view
|
||||
+schema: staging
|
||||
intermediate:
|
||||
+materialized: view
|
||||
+schema: intermediate
|
||||
marts:
|
||||
+materialized: table
|
||||
+schema: marts
|
||||
24
dbt/models/intermediate/_intermediate.yml
Normal file
24
dbt/models/intermediate/_intermediate.yml
Normal file
@@ -0,0 +1,24 @@
|
||||
version: 2
|
||||
|
||||
models:
|
||||
- name: int_purchases__monthly
|
||||
description: "Purchase data enriched with time and district dimensions"
|
||||
columns:
|
||||
- name: purchase_id
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
- name: district_code
|
||||
tests:
|
||||
- not_null
|
||||
|
||||
- name: int_rentals__annual
|
||||
description: "Rental data enriched with time and zone dimensions"
|
||||
columns:
|
||||
- name: rental_id
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
- name: zone_code
|
||||
tests:
|
||||
- not_null
|
||||
62
dbt/models/intermediate/int_purchases__monthly.sql
Normal file
62
dbt/models/intermediate/int_purchases__monthly.sql
Normal file
@@ -0,0 +1,62 @@
|
||||
-- Intermediate: Monthly purchase data enriched with dimensions
|
||||
-- Joins purchases with time and district dimensions for analysis
|
||||
|
||||
with purchases as (
|
||||
select * from {{ ref('stg_trreb__purchases') }}
|
||||
),
|
||||
|
||||
time_dim as (
|
||||
select * from {{ ref('stg_dimensions__time') }}
|
||||
),
|
||||
|
||||
district_dim as (
|
||||
select * from {{ ref('stg_dimensions__trreb_districts') }}
|
||||
),
|
||||
|
||||
enriched as (
|
||||
select
|
||||
p.purchase_id,
|
||||
|
||||
-- Time attributes
|
||||
t.date_key,
|
||||
t.full_date,
|
||||
t.year,
|
||||
t.month,
|
||||
t.quarter,
|
||||
t.month_name,
|
||||
|
||||
-- District attributes
|
||||
d.district_key,
|
||||
d.district_code,
|
||||
d.district_name,
|
||||
d.area_type,
|
||||
|
||||
-- Metrics
|
||||
p.sales_count,
|
||||
p.dollar_volume,
|
||||
p.avg_price,
|
||||
p.median_price,
|
||||
p.new_listings,
|
||||
p.active_listings,
|
||||
p.days_on_market,
|
||||
p.sale_to_list_ratio,
|
||||
|
||||
-- Calculated metrics
|
||||
case
|
||||
when p.active_listings > 0
|
||||
then round(p.sales_count::numeric / p.active_listings, 3)
|
||||
else null
|
||||
end as absorption_rate,
|
||||
|
||||
case
|
||||
when p.sales_count > 0
|
||||
then round(p.active_listings::numeric / p.sales_count, 1)
|
||||
else null
|
||||
end as months_of_inventory
|
||||
|
||||
from purchases p
|
||||
inner join time_dim t on p.date_key = t.date_key
|
||||
inner join district_dim d on p.district_key = d.district_key
|
||||
)
|
||||
|
||||
select * from enriched
|
||||
57
dbt/models/intermediate/int_rentals__annual.sql
Normal file
57
dbt/models/intermediate/int_rentals__annual.sql
Normal file
@@ -0,0 +1,57 @@
|
||||
-- Intermediate: Annual rental data enriched with dimensions
|
||||
-- Joins rentals with time and zone dimensions for analysis
|
||||
|
||||
with rentals as (
|
||||
select * from {{ ref('stg_cmhc__rentals') }}
|
||||
),
|
||||
|
||||
time_dim as (
|
||||
select * from {{ ref('stg_dimensions__time') }}
|
||||
),
|
||||
|
||||
zone_dim as (
|
||||
select * from {{ ref('stg_dimensions__cmhc_zones') }}
|
||||
),
|
||||
|
||||
enriched as (
|
||||
select
|
||||
r.rental_id,
|
||||
|
||||
-- Time attributes
|
||||
t.date_key,
|
||||
t.full_date,
|
||||
t.year,
|
||||
t.month,
|
||||
t.quarter,
|
||||
|
||||
-- Zone attributes
|
||||
z.zone_key,
|
||||
z.zone_code,
|
||||
z.zone_name,
|
||||
|
||||
-- Bedroom type
|
||||
r.bedroom_type,
|
||||
|
||||
-- Metrics
|
||||
r.rental_universe,
|
||||
r.avg_rent,
|
||||
r.median_rent,
|
||||
r.vacancy_rate,
|
||||
r.availability_rate,
|
||||
r.turnover_rate,
|
||||
r.year_over_year_rent_change,
|
||||
r.reliability_code,
|
||||
|
||||
-- Calculated metrics
|
||||
case
|
||||
when r.rental_universe > 0 and r.vacancy_rate is not null
|
||||
then round(r.rental_universe * (r.vacancy_rate / 100), 0)
|
||||
else null
|
||||
end as vacant_units_estimate
|
||||
|
||||
from rentals r
|
||||
inner join time_dim t on r.date_key = t.date_key
|
||||
inner join zone_dim z on r.zone_key = z.zone_key
|
||||
)
|
||||
|
||||
select * from enriched
|
||||
23
dbt/models/marts/_marts.yml
Normal file
23
dbt/models/marts/_marts.yml
Normal file
@@ -0,0 +1,23 @@
|
||||
version: 2
|
||||
|
||||
models:
|
||||
- name: mart_toronto_purchases
|
||||
description: "Final mart for Toronto purchase/sales analysis by district and time"
|
||||
columns:
|
||||
- name: purchase_id
|
||||
description: "Unique purchase record identifier"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
|
||||
- name: mart_toronto_rentals
|
||||
description: "Final mart for Toronto rental market analysis by zone and time"
|
||||
columns:
|
||||
- name: rental_id
|
||||
description: "Unique rental record identifier"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
|
||||
- name: mart_toronto_market_summary
|
||||
description: "Combined market summary aggregating purchases and rentals at Toronto level"
|
||||
81
dbt/models/marts/mart_toronto_market_summary.sql
Normal file
81
dbt/models/marts/mart_toronto_market_summary.sql
Normal file
@@ -0,0 +1,81 @@
|
||||
-- Mart: Toronto Market Summary
|
||||
-- Aggregated view combining purchase and rental market indicators
|
||||
-- Grain: One row per year-month
|
||||
|
||||
with purchases_agg as (
|
||||
select
|
||||
year,
|
||||
month,
|
||||
month_name,
|
||||
quarter,
|
||||
|
||||
-- Aggregate purchase metrics across all districts
|
||||
sum(sales_count) as total_sales,
|
||||
sum(dollar_volume) as total_dollar_volume,
|
||||
round(avg(avg_price), 0) as avg_price_all_districts,
|
||||
round(avg(median_price), 0) as median_price_all_districts,
|
||||
sum(new_listings) as total_new_listings,
|
||||
sum(active_listings) as total_active_listings,
|
||||
round(avg(days_on_market), 0) as avg_days_on_market,
|
||||
round(avg(sale_to_list_ratio), 2) as avg_sale_to_list_ratio,
|
||||
round(avg(absorption_rate), 3) as avg_absorption_rate,
|
||||
round(avg(months_of_inventory), 1) as avg_months_of_inventory,
|
||||
round(avg(avg_price_yoy_pct), 2) as avg_price_yoy_pct
|
||||
|
||||
from {{ ref('mart_toronto_purchases') }}
|
||||
group by year, month, month_name, quarter
|
||||
),
|
||||
|
||||
rentals_agg as (
|
||||
select
|
||||
year,
|
||||
|
||||
-- Aggregate rental metrics across all zones (all bedroom types)
|
||||
round(avg(avg_rent), 0) as avg_rent_all_zones,
|
||||
round(avg(vacancy_rate), 2) as avg_vacancy_rate,
|
||||
round(avg(rent_change_pct), 2) as avg_rent_change_pct,
|
||||
sum(rental_universe) as total_rental_universe
|
||||
|
||||
from {{ ref('mart_toronto_rentals') }}
|
||||
group by year
|
||||
),
|
||||
|
||||
final as (
|
||||
select
|
||||
p.year,
|
||||
p.month,
|
||||
p.month_name,
|
||||
p.quarter,
|
||||
|
||||
-- Purchase market indicators
|
||||
p.total_sales,
|
||||
p.total_dollar_volume,
|
||||
p.avg_price_all_districts,
|
||||
p.median_price_all_districts,
|
||||
p.total_new_listings,
|
||||
p.total_active_listings,
|
||||
p.avg_days_on_market,
|
||||
p.avg_sale_to_list_ratio,
|
||||
p.avg_absorption_rate,
|
||||
p.avg_months_of_inventory,
|
||||
p.avg_price_yoy_pct,
|
||||
|
||||
-- Rental market indicators (annual, so join on year)
|
||||
r.avg_rent_all_zones,
|
||||
r.avg_vacancy_rate,
|
||||
r.avg_rent_change_pct,
|
||||
r.total_rental_universe,
|
||||
|
||||
-- Affordability indicator (price to rent ratio)
|
||||
case
|
||||
when r.avg_rent_all_zones > 0
|
||||
then round(p.avg_price_all_districts / (r.avg_rent_all_zones * 12), 1)
|
||||
else null
|
||||
end as price_to_annual_rent_ratio
|
||||
|
||||
from purchases_agg p
|
||||
left join rentals_agg r on p.year = r.year
|
||||
)
|
||||
|
||||
select * from final
|
||||
order by year desc, month desc
|
||||
79
dbt/models/marts/mart_toronto_purchases.sql
Normal file
79
dbt/models/marts/mart_toronto_purchases.sql
Normal file
@@ -0,0 +1,79 @@
|
||||
-- Mart: Toronto Purchase Market Analysis
|
||||
-- Final analytical table for purchase/sales data visualization
|
||||
-- Grain: One row per district per month
|
||||
|
||||
with purchases as (
|
||||
select * from {{ ref('int_purchases__monthly') }}
|
||||
),
|
||||
|
||||
-- Add year-over-year calculations
|
||||
with_yoy as (
|
||||
select
|
||||
p.*,
|
||||
|
||||
-- Previous year same month values
|
||||
lag(p.avg_price, 12) over (
|
||||
partition by p.district_code
|
||||
order by p.date_key
|
||||
) as avg_price_prev_year,
|
||||
|
||||
lag(p.sales_count, 12) over (
|
||||
partition by p.district_code
|
||||
order by p.date_key
|
||||
) as sales_count_prev_year,
|
||||
|
||||
lag(p.median_price, 12) over (
|
||||
partition by p.district_code
|
||||
order by p.date_key
|
||||
) as median_price_prev_year
|
||||
|
||||
from purchases p
|
||||
),
|
||||
|
||||
final as (
|
||||
select
|
||||
purchase_id,
|
||||
date_key,
|
||||
full_date,
|
||||
year,
|
||||
month,
|
||||
quarter,
|
||||
month_name,
|
||||
district_key,
|
||||
district_code,
|
||||
district_name,
|
||||
area_type,
|
||||
sales_count,
|
||||
dollar_volume,
|
||||
avg_price,
|
||||
median_price,
|
||||
new_listings,
|
||||
active_listings,
|
||||
days_on_market,
|
||||
sale_to_list_ratio,
|
||||
absorption_rate,
|
||||
months_of_inventory,
|
||||
|
||||
-- Year-over-year changes
|
||||
case
|
||||
when avg_price_prev_year > 0
|
||||
then round(((avg_price - avg_price_prev_year) / avg_price_prev_year) * 100, 2)
|
||||
else null
|
||||
end as avg_price_yoy_pct,
|
||||
|
||||
case
|
||||
when sales_count_prev_year > 0
|
||||
then round(((sales_count - sales_count_prev_year)::numeric / sales_count_prev_year) * 100, 2)
|
||||
else null
|
||||
end as sales_count_yoy_pct,
|
||||
|
||||
case
|
||||
when median_price_prev_year > 0
|
||||
then round(((median_price - median_price_prev_year) / median_price_prev_year) * 100, 2)
|
||||
else null
|
||||
end as median_price_yoy_pct
|
||||
|
||||
from with_yoy
|
||||
)
|
||||
|
||||
select * from final
|
||||
64
dbt/models/marts/mart_toronto_rentals.sql
Normal file
64
dbt/models/marts/mart_toronto_rentals.sql
Normal file
@@ -0,0 +1,64 @@
|
||||
-- Mart: Toronto Rental Market Analysis
|
||||
-- Final analytical table for rental market visualization
|
||||
-- Grain: One row per zone per bedroom type per survey year
|
||||
|
||||
with rentals as (
|
||||
select * from {{ ref('int_rentals__annual') }}
|
||||
),
|
||||
|
||||
-- Add year-over-year calculations
|
||||
with_yoy as (
|
||||
select
|
||||
r.*,
|
||||
|
||||
-- Previous year values
|
||||
lag(r.avg_rent, 1) over (
|
||||
partition by r.zone_code, r.bedroom_type
|
||||
order by r.year
|
||||
) as avg_rent_prev_year,
|
||||
|
||||
lag(r.vacancy_rate, 1) over (
|
||||
partition by r.zone_code, r.bedroom_type
|
||||
order by r.year
|
||||
) as vacancy_rate_prev_year
|
||||
|
||||
from rentals r
|
||||
),
|
||||
|
||||
final as (
|
||||
select
|
||||
rental_id,
|
||||
date_key,
|
||||
full_date,
|
||||
year,
|
||||
quarter,
|
||||
zone_key,
|
||||
zone_code,
|
||||
zone_name,
|
||||
bedroom_type,
|
||||
rental_universe,
|
||||
avg_rent,
|
||||
median_rent,
|
||||
vacancy_rate,
|
||||
availability_rate,
|
||||
turnover_rate,
|
||||
year_over_year_rent_change,
|
||||
reliability_code,
|
||||
vacant_units_estimate,
|
||||
|
||||
-- Calculated year-over-year (if not provided)
|
||||
coalesce(
|
||||
year_over_year_rent_change,
|
||||
case
|
||||
when avg_rent_prev_year > 0
|
||||
then round(((avg_rent - avg_rent_prev_year) / avg_rent_prev_year) * 100, 2)
|
||||
else null
|
||||
end
|
||||
) as rent_change_pct,
|
||||
|
||||
vacancy_rate - vacancy_rate_prev_year as vacancy_rate_change
|
||||
|
||||
from with_yoy
|
||||
)
|
||||
|
||||
select * from final
|
||||
61
dbt/models/staging/_sources.yml
Normal file
61
dbt/models/staging/_sources.yml
Normal file
@@ -0,0 +1,61 @@
|
||||
version: 2
|
||||
|
||||
sources:
|
||||
- name: toronto_housing
|
||||
description: "Toronto housing data loaded from TRREB and CMHC sources"
|
||||
database: portfolio
|
||||
schema: public
|
||||
tables:
|
||||
- name: fact_purchases
|
||||
description: "TRREB monthly purchase/sales statistics by district"
|
||||
columns:
|
||||
- name: id
|
||||
description: "Primary key"
|
||||
- name: date_key
|
||||
description: "Foreign key to dim_time"
|
||||
- name: district_key
|
||||
description: "Foreign key to dim_trreb_district"
|
||||
|
||||
- name: fact_rentals
|
||||
description: "CMHC annual rental survey data by zone and bedroom type"
|
||||
columns:
|
||||
- name: id
|
||||
description: "Primary key"
|
||||
- name: date_key
|
||||
description: "Foreign key to dim_time"
|
||||
- name: zone_key
|
||||
description: "Foreign key to dim_cmhc_zone"
|
||||
|
||||
- name: dim_time
|
||||
description: "Time dimension (monthly grain)"
|
||||
columns:
|
||||
- name: date_key
|
||||
description: "Primary key (YYYYMMDD format)"
|
||||
|
||||
- name: dim_trreb_district
|
||||
description: "TRREB district dimension with geometry"
|
||||
columns:
|
||||
- name: district_key
|
||||
description: "Primary key"
|
||||
- name: district_code
|
||||
description: "TRREB district code"
|
||||
|
||||
- name: dim_cmhc_zone
|
||||
description: "CMHC zone dimension with geometry"
|
||||
columns:
|
||||
- name: zone_key
|
||||
description: "Primary key"
|
||||
- name: zone_code
|
||||
description: "CMHC zone code"
|
||||
|
||||
- name: dim_neighbourhood
|
||||
description: "City of Toronto neighbourhoods (reference only)"
|
||||
columns:
|
||||
- name: neighbourhood_id
|
||||
description: "Primary key"
|
||||
|
||||
- name: dim_policy_event
|
||||
description: "Housing policy events for annotation"
|
||||
columns:
|
||||
- name: event_id
|
||||
description: "Primary key"
|
||||
73
dbt/models/staging/_staging.yml
Normal file
73
dbt/models/staging/_staging.yml
Normal file
@@ -0,0 +1,73 @@
|
||||
version: 2
|
||||
|
||||
models:
|
||||
- name: stg_trreb__purchases
|
||||
description: "Staged TRREB purchase/sales data from fact_purchases"
|
||||
columns:
|
||||
- name: purchase_id
|
||||
description: "Unique identifier for purchase record"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
- name: date_key
|
||||
description: "Date dimension key (YYYYMMDD)"
|
||||
tests:
|
||||
- not_null
|
||||
- name: district_key
|
||||
description: "TRREB district dimension key"
|
||||
tests:
|
||||
- not_null
|
||||
|
||||
- name: stg_cmhc__rentals
|
||||
description: "Staged CMHC rental market data from fact_rentals"
|
||||
columns:
|
||||
- name: rental_id
|
||||
description: "Unique identifier for rental record"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
- name: date_key
|
||||
description: "Date dimension key (YYYYMMDD)"
|
||||
tests:
|
||||
- not_null
|
||||
- name: zone_key
|
||||
description: "CMHC zone dimension key"
|
||||
tests:
|
||||
- not_null
|
||||
|
||||
- name: stg_dimensions__time
|
||||
description: "Staged time dimension"
|
||||
columns:
|
||||
- name: date_key
|
||||
description: "Date dimension key (YYYYMMDD)"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
|
||||
- name: stg_dimensions__trreb_districts
|
||||
description: "Staged TRREB district dimension"
|
||||
columns:
|
||||
- name: district_key
|
||||
description: "District dimension key"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
- name: district_code
|
||||
description: "TRREB district code (e.g., W01, C01)"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
|
||||
- name: stg_dimensions__cmhc_zones
|
||||
description: "Staged CMHC zone dimension"
|
||||
columns:
|
||||
- name: zone_key
|
||||
description: "Zone dimension key"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
- name: zone_code
|
||||
description: "CMHC zone code"
|
||||
tests:
|
||||
- unique
|
||||
- not_null
|
||||
26
dbt/models/staging/stg_cmhc__rentals.sql
Normal file
26
dbt/models/staging/stg_cmhc__rentals.sql
Normal file
@@ -0,0 +1,26 @@
|
||||
-- Staged CMHC rental market survey data
|
||||
-- Source: fact_rentals table loaded from CMHC CSV exports
|
||||
-- Grain: One row per zone per bedroom type per survey year
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'fact_rentals') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
id as rental_id,
|
||||
date_key,
|
||||
zone_key,
|
||||
bedroom_type,
|
||||
universe as rental_universe,
|
||||
avg_rent,
|
||||
median_rent,
|
||||
vacancy_rate,
|
||||
availability_rate,
|
||||
turnover_rate,
|
||||
rent_change_pct as year_over_year_rent_change,
|
||||
reliability_code
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
18
dbt/models/staging/stg_dimensions__cmhc_zones.sql
Normal file
18
dbt/models/staging/stg_dimensions__cmhc_zones.sql
Normal file
@@ -0,0 +1,18 @@
|
||||
-- Staged CMHC zone dimension
|
||||
-- Source: dim_cmhc_zone table
|
||||
-- Grain: One row per zone
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'dim_cmhc_zone') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
zone_key,
|
||||
zone_code,
|
||||
zone_name,
|
||||
geometry
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
21
dbt/models/staging/stg_dimensions__time.sql
Normal file
21
dbt/models/staging/stg_dimensions__time.sql
Normal file
@@ -0,0 +1,21 @@
|
||||
-- Staged time dimension
|
||||
-- Source: dim_time table
|
||||
-- Grain: One row per month
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'dim_time') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
date_key,
|
||||
full_date,
|
||||
year,
|
||||
month,
|
||||
quarter,
|
||||
month_name,
|
||||
is_month_start
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
19
dbt/models/staging/stg_dimensions__trreb_districts.sql
Normal file
19
dbt/models/staging/stg_dimensions__trreb_districts.sql
Normal file
@@ -0,0 +1,19 @@
|
||||
-- Staged TRREB district dimension
|
||||
-- Source: dim_trreb_district table
|
||||
-- Grain: One row per district
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'dim_trreb_district') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
district_key,
|
||||
district_code,
|
||||
district_name,
|
||||
area_type,
|
||||
geometry
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
25
dbt/models/staging/stg_trreb__purchases.sql
Normal file
25
dbt/models/staging/stg_trreb__purchases.sql
Normal file
@@ -0,0 +1,25 @@
|
||||
-- Staged TRREB purchase/sales data
|
||||
-- Source: fact_purchases table loaded from TRREB Market Watch PDFs
|
||||
-- Grain: One row per district per month
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'fact_purchases') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
id as purchase_id,
|
||||
date_key,
|
||||
district_key,
|
||||
sales_count,
|
||||
dollar_volume,
|
||||
avg_price,
|
||||
median_price,
|
||||
new_listings,
|
||||
active_listings,
|
||||
avg_dom as days_on_market,
|
||||
avg_sp_lp as sale_to_list_ratio
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
5
dbt/packages.yml
Normal file
5
dbt/packages.yml
Normal file
@@ -0,0 +1,5 @@
|
||||
packages:
|
||||
- package: dbt-labs/dbt_utils
|
||||
version: ">=1.0.0"
|
||||
- package: calogica/dbt_expectations
|
||||
version: ">=0.10.0"
|
||||
21
dbt/profiles.yml.example
Normal file
21
dbt/profiles.yml.example
Normal file
@@ -0,0 +1,21 @@
|
||||
toronto_housing:
|
||||
target: dev
|
||||
outputs:
|
||||
dev:
|
||||
type: postgres
|
||||
host: localhost
|
||||
user: portfolio
|
||||
password: "{{ env_var('POSTGRES_PASSWORD') }}"
|
||||
port: 5432
|
||||
dbname: portfolio
|
||||
schema: public
|
||||
threads: 4
|
||||
prod:
|
||||
type: postgres
|
||||
host: "{{ env_var('POSTGRES_HOST') }}"
|
||||
user: "{{ env_var('POSTGRES_USER') }}"
|
||||
password: "{{ env_var('POSTGRES_PASSWORD') }}"
|
||||
port: 5432
|
||||
dbname: portfolio
|
||||
schema: public
|
||||
threads: 4
|
||||
134
docs/bio_content_v2.md
Normal file
134
docs/bio_content_v2.md
Normal file
@@ -0,0 +1,134 @@
|
||||
# Portfolio Bio Content
|
||||
|
||||
**Version**: 2.0
|
||||
**Last Updated**: January 2026
|
||||
**Purpose**: Content source for `portfolio_app/pages/home.py`
|
||||
|
||||
---
|
||||
|
||||
## Document Context
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Parent Document** | `portfolio_project_plan_v5.md` |
|
||||
| **Role** | Bio content and social links for landing page |
|
||||
| **Consumed By** | `portfolio_app/pages/home.py` |
|
||||
|
||||
---
|
||||
|
||||
## Headline
|
||||
|
||||
**Primary**: Leo | Data Engineer & Analytics Developer
|
||||
|
||||
**Tagline**: I build data infrastructure that actually gets used.
|
||||
|
||||
---
|
||||
|
||||
## Professional Summary
|
||||
|
||||
Over the past 5 years, I've designed and evolved an enterprise analytics platform from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call abandon rates, and dashboards that executives actually open.
|
||||
|
||||
My approach: dimensional modeling (star schema), layered transformations (staging → intermediate → marts), and automation that eliminates manual work. I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
|
||||
|
||||
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states. Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the Project Management Institute.
|
||||
|
||||
---
|
||||
|
||||
## Tech Stack
|
||||
|
||||
| Category | Technologies |
|
||||
|----------|--------------|
|
||||
| **Languages** | Python, SQL |
|
||||
| **Data Processing** | Pandas, SQLAlchemy, FastAPI |
|
||||
| **Databases** | PostgreSQL, MSSQL |
|
||||
| **Visualization** | Power BI, Plotly, Dash |
|
||||
| **Patterns** | dbt, dimensional modeling, star schema |
|
||||
| **Other** | Genesys Cloud |
|
||||
|
||||
**Display Format** (for landing page):
|
||||
```
|
||||
Python (Pandas, SQLAlchemy, FastAPI) • SQL (MSSQL, PostgreSQL) • Power BI • Plotly/Dash • Genesys Cloud • dbt patterns
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Side Project
|
||||
|
||||
**Bandit Labs** — Building automation and AI tooling for small businesses.
|
||||
|
||||
*Note: Keep this brief on portfolio; link only if separate landing page exists.*
|
||||
|
||||
---
|
||||
|
||||
## Social Links
|
||||
|
||||
| Platform | URL | Icon |
|
||||
|----------|-----|------|
|
||||
| **LinkedIn** | `https://linkedin.com/in/[USERNAME]` | `lucide-react: Linkedin` |
|
||||
| **GitHub** | `https://github.com/[USERNAME]` | `lucide-react: Github` |
|
||||
|
||||
> **TODO**: Replace `[USERNAME]` placeholders with actual URLs before bio page launch.
|
||||
|
||||
---
|
||||
|
||||
## Availability Statement
|
||||
|
||||
Open to **Senior Data Analyst**, **Analytics Engineer**, and **BI Developer** opportunities in Toronto or remote.
|
||||
|
||||
---
|
||||
|
||||
## Portfolio Projects Section
|
||||
|
||||
*Dynamically populated based on deployed projects.*
|
||||
|
||||
| Project | Status | Link |
|
||||
|---------|--------|------|
|
||||
| Toronto Housing Dashboard | In Development | `/toronto` |
|
||||
| Energy Pricing Analysis | Planned | `/energy` |
|
||||
|
||||
**Display Logic**:
|
||||
- Show only projects with `status = deployed`
|
||||
- "In Development" projects can show as coming soon or be hidden (user preference)
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Content Hierarchy for `home.py`
|
||||
|
||||
```
|
||||
1. Name + Tagline (hero section)
|
||||
2. Professional Summary (2-3 paragraphs)
|
||||
3. Tech Stack (horizontal chips or inline list)
|
||||
4. Portfolio Projects (cards linking to dashboards)
|
||||
5. Social Links (icon buttons)
|
||||
6. Availability statement (subtle, bottom)
|
||||
```
|
||||
|
||||
### Styling Recommendations
|
||||
|
||||
- Clean, minimal — let the projects speak
|
||||
- Dark/light mode support via dash-mantine-components theme
|
||||
- No headshot required (optional)
|
||||
- Mobile-responsive layout
|
||||
|
||||
### Content Updates
|
||||
|
||||
When updating bio content:
|
||||
1. Edit this document
|
||||
2. Update `home.py` to reflect changes
|
||||
3. Redeploy
|
||||
|
||||
---
|
||||
|
||||
## Related Documents
|
||||
|
||||
| Document | Relationship |
|
||||
|----------|--------------|
|
||||
| `portfolio_project_plan_v5.md` | Parent — references this for bio content |
|
||||
| `portfolio_app/pages/home.py` | Consumer — implements this content |
|
||||
|
||||
---
|
||||
|
||||
*Document Version: 2.0*
|
||||
*Updated: January 2026*
|
||||
@@ -1,28 +1,49 @@
|
||||
"""Dash application factory with Pages routing."""
|
||||
|
||||
import dash
|
||||
from dash import html
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc, html
|
||||
|
||||
from .components import create_sidebar
|
||||
from .config import get_settings
|
||||
|
||||
|
||||
def create_app() -> dash.Dash:
|
||||
"""Create and configure the Dash application."""
|
||||
settings = get_settings()
|
||||
|
||||
app = dash.Dash(
|
||||
__name__,
|
||||
use_pages=True,
|
||||
suppress_callback_exceptions=True,
|
||||
title="Analytics Portfolio",
|
||||
external_stylesheets=dmc.styles.ALL,
|
||||
)
|
||||
|
||||
app.layout = html.Div(
|
||||
[
|
||||
dash.page_container,
|
||||
]
|
||||
app.layout = dmc.MantineProvider(
|
||||
id="mantine-provider",
|
||||
children=[
|
||||
dcc.Location(id="url", refresh=False),
|
||||
dcc.Store(id="theme-store", storage_type="local", data="dark"),
|
||||
dcc.Store(id="theme-init-dummy"), # Dummy store for theme init callback
|
||||
html.Div(
|
||||
[
|
||||
create_sidebar(),
|
||||
html.Div(
|
||||
dash.page_container,
|
||||
className="page-content-wrapper",
|
||||
),
|
||||
],
|
||||
),
|
||||
],
|
||||
theme={
|
||||
"primaryColor": "blue",
|
||||
"fontFamily": "'Inter', sans-serif",
|
||||
},
|
||||
defaultColorScheme="dark",
|
||||
)
|
||||
|
||||
# Import callbacks to register them
|
||||
from . import callbacks # noqa: F401
|
||||
|
||||
return app
|
||||
|
||||
|
||||
|
||||
139
portfolio_app/assets/sidebar.css
Normal file
139
portfolio_app/assets/sidebar.css
Normal file
@@ -0,0 +1,139 @@
|
||||
/* Floating sidebar navigation styles */
|
||||
|
||||
/* Sidebar container */
|
||||
.floating-sidebar {
|
||||
position: fixed;
|
||||
left: 16px;
|
||||
top: 50%;
|
||||
transform: translateY(-50%);
|
||||
width: 60px;
|
||||
padding: 16px 8px;
|
||||
border-radius: 32px;
|
||||
z-index: 1000;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
align-items: center;
|
||||
gap: 8px;
|
||||
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
|
||||
transition: background-color 0.2s ease;
|
||||
}
|
||||
|
||||
/* Page content offset to prevent sidebar overlap */
|
||||
.page-content-wrapper {
|
||||
margin-left: 92px; /* sidebar width (60px) + left margin (16px) + gap (16px) */
|
||||
min-height: 100vh;
|
||||
}
|
||||
|
||||
/* Dark theme (default) */
|
||||
[data-mantine-color-scheme="dark"] .floating-sidebar {
|
||||
background-color: #141414;
|
||||
}
|
||||
|
||||
[data-mantine-color-scheme="dark"] body {
|
||||
background-color: #000000;
|
||||
}
|
||||
|
||||
/* Light theme */
|
||||
[data-mantine-color-scheme="light"] .floating-sidebar {
|
||||
background-color: #f0f0f0;
|
||||
}
|
||||
|
||||
[data-mantine-color-scheme="light"] body {
|
||||
background-color: #ffffff;
|
||||
}
|
||||
|
||||
/* Brand initials styling */
|
||||
.sidebar-brand {
|
||||
width: 40px;
|
||||
height: 40px;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
border-radius: 50%;
|
||||
background-color: var(--mantine-color-blue-filled);
|
||||
margin-bottom: 4px;
|
||||
transition: transform 0.2s ease;
|
||||
}
|
||||
|
||||
.sidebar-brand:hover {
|
||||
transform: scale(1.05);
|
||||
}
|
||||
|
||||
.sidebar-brand-link {
|
||||
font-weight: 700;
|
||||
font-size: 16px;
|
||||
color: white;
|
||||
text-decoration: none;
|
||||
line-height: 1;
|
||||
}
|
||||
|
||||
/* Divider between sections */
|
||||
.sidebar-divider {
|
||||
width: 32px;
|
||||
height: 1px;
|
||||
background-color: var(--mantine-color-dimmed);
|
||||
margin: 4px 0;
|
||||
opacity: 0.3;
|
||||
}
|
||||
|
||||
/* Active nav icon indicator */
|
||||
.nav-icon-active {
|
||||
background-color: var(--mantine-color-blue-filled) !important;
|
||||
}
|
||||
|
||||
/* Navigation icon hover effects */
|
||||
.floating-sidebar .mantine-ActionIcon-root {
|
||||
transition: transform 0.15s ease, background-color 0.15s ease;
|
||||
}
|
||||
|
||||
.floating-sidebar .mantine-ActionIcon-root:hover {
|
||||
transform: scale(1.1);
|
||||
}
|
||||
|
||||
/* Ensure links don't have underlines */
|
||||
.floating-sidebar a {
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
/* Theme toggle specific styling */
|
||||
#theme-toggle {
|
||||
transition: transform 0.3s ease;
|
||||
}
|
||||
|
||||
#theme-toggle:hover {
|
||||
transform: rotate(15deg) scale(1.1);
|
||||
}
|
||||
|
||||
/* Responsive adjustments for smaller screens */
|
||||
@media (max-width: 768px) {
|
||||
.floating-sidebar {
|
||||
left: 8px;
|
||||
width: 50px;
|
||||
padding: 12px 6px;
|
||||
border-radius: 25px;
|
||||
}
|
||||
|
||||
.page-content-wrapper {
|
||||
margin-left: 70px;
|
||||
}
|
||||
|
||||
.sidebar-brand {
|
||||
width: 34px;
|
||||
height: 34px;
|
||||
}
|
||||
|
||||
.sidebar-brand-link {
|
||||
font-size: 14px;
|
||||
}
|
||||
}
|
||||
|
||||
/* Very small screens - hide sidebar, show minimal navigation */
|
||||
@media (max-width: 480px) {
|
||||
.floating-sidebar {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.page-content-wrapper {
|
||||
margin-left: 0;
|
||||
}
|
||||
}
|
||||
5
portfolio_app/callbacks/__init__.py
Normal file
5
portfolio_app/callbacks/__init__.py
Normal file
@@ -0,0 +1,5 @@
|
||||
"""Application-level callbacks for the portfolio app."""
|
||||
|
||||
from . import theme
|
||||
|
||||
__all__ = ["theme"]
|
||||
38
portfolio_app/callbacks/theme.py
Normal file
38
portfolio_app/callbacks/theme.py
Normal file
@@ -0,0 +1,38 @@
|
||||
"""Theme toggle callbacks using clientside JavaScript."""
|
||||
|
||||
from dash import Input, Output, State, clientside_callback
|
||||
|
||||
# Toggle theme on button click
|
||||
# Stores new theme value and updates the DOM attribute
|
||||
clientside_callback(
|
||||
"""
|
||||
function(n_clicks, currentTheme) {
|
||||
if (n_clicks === undefined || n_clicks === null) {
|
||||
return window.dash_clientside.no_update;
|
||||
}
|
||||
const newTheme = currentTheme === 'dark' ? 'light' : 'dark';
|
||||
document.documentElement.setAttribute('data-mantine-color-scheme', newTheme);
|
||||
return newTheme;
|
||||
}
|
||||
""",
|
||||
Output("theme-store", "data"),
|
||||
Input("theme-toggle", "n_clicks"),
|
||||
State("theme-store", "data"),
|
||||
prevent_initial_call=True,
|
||||
)
|
||||
|
||||
# Initialize theme from localStorage on page load
|
||||
# Uses a dummy output since we only need the side effect of setting the DOM attribute
|
||||
clientside_callback(
|
||||
"""
|
||||
function(theme) {
|
||||
if (theme) {
|
||||
document.documentElement.setAttribute('data-mantine-color-scheme', theme);
|
||||
}
|
||||
return theme;
|
||||
}
|
||||
""",
|
||||
Output("theme-init-dummy", "data"),
|
||||
Input("theme-store", "data"),
|
||||
prevent_initial_call=False,
|
||||
)
|
||||
16
portfolio_app/components/__init__.py
Normal file
16
portfolio_app/components/__init__.py
Normal file
@@ -0,0 +1,16 @@
|
||||
"""Shared Dash components for the portfolio application."""
|
||||
|
||||
from .map_controls import create_map_controls, create_metric_selector
|
||||
from .metric_card import MetricCard, create_metric_cards_row
|
||||
from .sidebar import create_sidebar
|
||||
from .time_slider import create_time_slider, create_year_selector
|
||||
|
||||
__all__ = [
|
||||
"create_map_controls",
|
||||
"create_metric_selector",
|
||||
"create_sidebar",
|
||||
"create_time_slider",
|
||||
"create_year_selector",
|
||||
"MetricCard",
|
||||
"create_metric_cards_row",
|
||||
]
|
||||
79
portfolio_app/components/map_controls.py
Normal file
79
portfolio_app/components/map_controls.py
Normal file
@@ -0,0 +1,79 @@
|
||||
"""Map control components for choropleth visualizations."""
|
||||
|
||||
from typing import Any
|
||||
|
||||
import dash_mantine_components as dmc
|
||||
from dash import html
|
||||
|
||||
|
||||
def create_metric_selector(
|
||||
id_prefix: str,
|
||||
options: list[dict[str, str]],
|
||||
default_value: str | None = None,
|
||||
label: str = "Select Metric",
|
||||
) -> dmc.Select:
|
||||
"""Create a metric selector dropdown.
|
||||
|
||||
Args:
|
||||
id_prefix: Prefix for component IDs.
|
||||
options: List of options with 'label' and 'value' keys.
|
||||
default_value: Initial selected value.
|
||||
label: Label text for the selector.
|
||||
|
||||
Returns:
|
||||
Mantine Select component.
|
||||
"""
|
||||
return dmc.Select(
|
||||
id=f"{id_prefix}-metric-selector",
|
||||
label=label,
|
||||
data=options,
|
||||
value=default_value or (options[0]["value"] if options else None),
|
||||
style={"width": "200px"},
|
||||
)
|
||||
|
||||
|
||||
def create_map_controls(
|
||||
id_prefix: str,
|
||||
metric_options: list[dict[str, str]],
|
||||
default_metric: str | None = None,
|
||||
show_layer_toggle: bool = True,
|
||||
) -> dmc.Paper:
|
||||
"""Create a control panel for map visualizations.
|
||||
|
||||
Args:
|
||||
id_prefix: Prefix for component IDs.
|
||||
metric_options: Options for metric selector.
|
||||
default_metric: Default selected metric.
|
||||
show_layer_toggle: Whether to show layer visibility toggle.
|
||||
|
||||
Returns:
|
||||
Mantine Paper component containing controls.
|
||||
"""
|
||||
controls: list[Any] = [
|
||||
create_metric_selector(
|
||||
id_prefix=id_prefix,
|
||||
options=metric_options,
|
||||
default_value=default_metric,
|
||||
label="Display Metric",
|
||||
),
|
||||
]
|
||||
|
||||
if show_layer_toggle:
|
||||
controls.append(
|
||||
dmc.Switch(
|
||||
id=f"{id_prefix}-layer-toggle",
|
||||
label="Show Boundaries",
|
||||
checked=True,
|
||||
style={"marginTop": "10px"},
|
||||
)
|
||||
)
|
||||
|
||||
return dmc.Paper(
|
||||
children=[
|
||||
dmc.Text("Map Controls", fw=500, size="sm", mb="xs"),
|
||||
html.Div(controls),
|
||||
],
|
||||
p="md",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
)
|
||||
115
portfolio_app/components/metric_card.py
Normal file
115
portfolio_app/components/metric_card.py
Normal file
@@ -0,0 +1,115 @@
|
||||
"""Metric card components for KPI display."""
|
||||
|
||||
from typing import Any
|
||||
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc
|
||||
|
||||
from portfolio_app.figures.summary_cards import create_metric_card_figure
|
||||
|
||||
|
||||
class MetricCard:
|
||||
"""A reusable metric card component."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
id_prefix: str,
|
||||
title: str,
|
||||
value: float | int | str = 0,
|
||||
delta: float | None = None,
|
||||
prefix: str = "",
|
||||
suffix: str = "",
|
||||
format_spec: str = ",.0f",
|
||||
positive_is_good: bool = True,
|
||||
):
|
||||
"""Initialize a metric card.
|
||||
|
||||
Args:
|
||||
id_prefix: Prefix for component IDs.
|
||||
title: Card title.
|
||||
value: Main metric value.
|
||||
delta: Change value for delta indicator.
|
||||
prefix: Value prefix (e.g., '$').
|
||||
suffix: Value suffix.
|
||||
format_spec: Python format specification.
|
||||
positive_is_good: Whether positive delta is good.
|
||||
"""
|
||||
self.id_prefix = id_prefix
|
||||
self.title = title
|
||||
self.value = value
|
||||
self.delta = delta
|
||||
self.prefix = prefix
|
||||
self.suffix = suffix
|
||||
self.format_spec = format_spec
|
||||
self.positive_is_good = positive_is_good
|
||||
|
||||
def render(self) -> dmc.Paper:
|
||||
"""Render the metric card component.
|
||||
|
||||
Returns:
|
||||
Mantine Paper component with embedded graph.
|
||||
"""
|
||||
fig = create_metric_card_figure(
|
||||
value=self.value,
|
||||
title=self.title,
|
||||
delta=self.delta,
|
||||
prefix=self.prefix,
|
||||
suffix=self.suffix,
|
||||
format_spec=self.format_spec,
|
||||
positive_is_good=self.positive_is_good,
|
||||
)
|
||||
|
||||
return dmc.Paper(
|
||||
children=[
|
||||
dcc.Graph(
|
||||
id=f"{self.id_prefix}-graph",
|
||||
figure=fig,
|
||||
config={"displayModeBar": False},
|
||||
style={"height": "120px"},
|
||||
)
|
||||
],
|
||||
p="xs",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_metric_cards_row(
|
||||
metrics: list[dict[str, Any]],
|
||||
id_prefix: str = "metric",
|
||||
) -> dmc.SimpleGrid:
|
||||
"""Create a row of metric cards.
|
||||
|
||||
Args:
|
||||
metrics: List of metric configurations with keys:
|
||||
- title: Card title
|
||||
- value: Metric value
|
||||
- delta: Optional change value
|
||||
- prefix: Optional value prefix
|
||||
- suffix: Optional value suffix
|
||||
- format_spec: Optional format specification
|
||||
- positive_is_good: Optional delta color logic
|
||||
id_prefix: Prefix for component IDs.
|
||||
|
||||
Returns:
|
||||
Mantine SimpleGrid component with metric cards.
|
||||
"""
|
||||
cards = []
|
||||
for i, metric in enumerate(metrics):
|
||||
card = MetricCard(
|
||||
id_prefix=f"{id_prefix}-{i}",
|
||||
title=metric.get("title", ""),
|
||||
value=metric.get("value", 0),
|
||||
delta=metric.get("delta"),
|
||||
prefix=metric.get("prefix", ""),
|
||||
suffix=metric.get("suffix", ""),
|
||||
format_spec=metric.get("format_spec", ",.0f"),
|
||||
positive_is_good=metric.get("positive_is_good", True),
|
||||
)
|
||||
cards.append(card.render())
|
||||
|
||||
return dmc.SimpleGrid(
|
||||
cols={"base": 1, "sm": 2, "md": len(cards)},
|
||||
spacing="md",
|
||||
children=cards,
|
||||
)
|
||||
179
portfolio_app/components/sidebar.py
Normal file
179
portfolio_app/components/sidebar.py
Normal file
@@ -0,0 +1,179 @@
|
||||
"""Floating sidebar navigation component."""
|
||||
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc, html
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
# Navigation items configuration
|
||||
NAV_ITEMS = [
|
||||
{"path": "/", "icon": "tabler:home", "label": "Home"},
|
||||
{"path": "/toronto", "icon": "tabler:map-2", "label": "Toronto Housing"},
|
||||
]
|
||||
|
||||
# External links configuration
|
||||
EXTERNAL_LINKS = [
|
||||
{
|
||||
"url": "https://github.com/leomiranda",
|
||||
"icon": "tabler:brand-github",
|
||||
"label": "GitHub",
|
||||
},
|
||||
{
|
||||
"url": "https://linkedin.com/in/leobmiranda",
|
||||
"icon": "tabler:brand-linkedin",
|
||||
"label": "LinkedIn",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
def create_brand_logo() -> html.Div:
|
||||
"""Create the brand initials logo."""
|
||||
return html.Div(
|
||||
dcc.Link(
|
||||
"LM",
|
||||
href="/",
|
||||
className="sidebar-brand-link",
|
||||
),
|
||||
className="sidebar-brand",
|
||||
)
|
||||
|
||||
|
||||
def create_nav_icon(
|
||||
icon: str,
|
||||
label: str,
|
||||
path: str,
|
||||
current_path: str,
|
||||
) -> dmc.Tooltip:
|
||||
"""Create a navigation icon with tooltip.
|
||||
|
||||
Args:
|
||||
icon: Iconify icon string.
|
||||
label: Tooltip label.
|
||||
path: Navigation path.
|
||||
current_path: Current page path for active state.
|
||||
|
||||
Returns:
|
||||
Tooltip-wrapped navigation icon.
|
||||
"""
|
||||
is_active = current_path == path or (path != "/" and current_path.startswith(path))
|
||||
|
||||
return dmc.Tooltip(
|
||||
dcc.Link(
|
||||
dmc.ActionIcon(
|
||||
DashIconify(icon=icon, width=20),
|
||||
variant="subtle" if not is_active else "filled",
|
||||
size="lg",
|
||||
radius="xl",
|
||||
color="blue" if is_active else "gray",
|
||||
className="nav-icon-active" if is_active else "",
|
||||
),
|
||||
href=path,
|
||||
),
|
||||
label=label,
|
||||
position="right",
|
||||
withArrow=True,
|
||||
)
|
||||
|
||||
|
||||
def create_theme_toggle(current_theme: str = "dark") -> dmc.Tooltip:
|
||||
"""Create the theme toggle button.
|
||||
|
||||
Args:
|
||||
current_theme: Current theme ('dark' or 'light').
|
||||
|
||||
Returns:
|
||||
Tooltip-wrapped theme toggle icon.
|
||||
"""
|
||||
icon = "tabler:sun" if current_theme == "dark" else "tabler:moon"
|
||||
label = "Switch to light mode" if current_theme == "dark" else "Switch to dark mode"
|
||||
|
||||
return dmc.Tooltip(
|
||||
dmc.ActionIcon(
|
||||
DashIconify(icon=icon, width=20, id="theme-toggle-icon"),
|
||||
id="theme-toggle",
|
||||
variant="subtle",
|
||||
size="lg",
|
||||
radius="xl",
|
||||
color="gray",
|
||||
),
|
||||
label=label,
|
||||
position="right",
|
||||
withArrow=True,
|
||||
)
|
||||
|
||||
|
||||
def create_external_link(url: str, icon: str, label: str) -> dmc.Tooltip:
|
||||
"""Create an external link icon with tooltip.
|
||||
|
||||
Args:
|
||||
url: External URL.
|
||||
icon: Iconify icon string.
|
||||
label: Tooltip label.
|
||||
|
||||
Returns:
|
||||
Tooltip-wrapped external link icon.
|
||||
"""
|
||||
return dmc.Tooltip(
|
||||
dmc.Anchor(
|
||||
dmc.ActionIcon(
|
||||
DashIconify(icon=icon, width=20),
|
||||
variant="subtle",
|
||||
size="lg",
|
||||
radius="xl",
|
||||
color="gray",
|
||||
),
|
||||
href=url,
|
||||
target="_blank",
|
||||
),
|
||||
label=label,
|
||||
position="right",
|
||||
withArrow=True,
|
||||
)
|
||||
|
||||
|
||||
def create_sidebar_divider() -> html.Div:
|
||||
"""Create a horizontal divider for the sidebar."""
|
||||
return html.Div(className="sidebar-divider")
|
||||
|
||||
|
||||
def create_sidebar(current_path: str = "/", current_theme: str = "dark") -> html.Div:
|
||||
"""Create the floating sidebar navigation.
|
||||
|
||||
Args:
|
||||
current_path: Current page path for active state highlighting.
|
||||
current_theme: Current theme for toggle icon state.
|
||||
|
||||
Returns:
|
||||
Complete sidebar component.
|
||||
"""
|
||||
return html.Div(
|
||||
[
|
||||
# Brand logo
|
||||
create_brand_logo(),
|
||||
create_sidebar_divider(),
|
||||
# Navigation icons
|
||||
*[
|
||||
create_nav_icon(
|
||||
icon=item["icon"],
|
||||
label=item["label"],
|
||||
path=item["path"],
|
||||
current_path=current_path,
|
||||
)
|
||||
for item in NAV_ITEMS
|
||||
],
|
||||
create_sidebar_divider(),
|
||||
# Theme toggle
|
||||
create_theme_toggle(current_theme),
|
||||
create_sidebar_divider(),
|
||||
# External links
|
||||
*[
|
||||
create_external_link(
|
||||
url=link["url"],
|
||||
icon=link["icon"],
|
||||
label=link["label"],
|
||||
)
|
||||
for link in EXTERNAL_LINKS
|
||||
],
|
||||
],
|
||||
className="floating-sidebar",
|
||||
id="floating-sidebar",
|
||||
)
|
||||
135
portfolio_app/components/time_slider.py
Normal file
135
portfolio_app/components/time_slider.py
Normal file
@@ -0,0 +1,135 @@
|
||||
"""Time selection components for temporal data filtering."""
|
||||
|
||||
from datetime import date
|
||||
|
||||
import dash_mantine_components as dmc
|
||||
|
||||
|
||||
def create_year_selector(
|
||||
id_prefix: str,
|
||||
min_year: int = 2020,
|
||||
max_year: int | None = None,
|
||||
default_year: int | None = None,
|
||||
label: str = "Select Year",
|
||||
) -> dmc.Select:
|
||||
"""Create a year selector dropdown.
|
||||
|
||||
Args:
|
||||
id_prefix: Prefix for component IDs.
|
||||
min_year: Minimum year option.
|
||||
max_year: Maximum year option (defaults to current year).
|
||||
default_year: Initial selected year.
|
||||
label: Label text for the selector.
|
||||
|
||||
Returns:
|
||||
Mantine Select component.
|
||||
"""
|
||||
if max_year is None:
|
||||
max_year = date.today().year
|
||||
|
||||
if default_year is None:
|
||||
default_year = max_year
|
||||
|
||||
years = list(range(max_year, min_year - 1, -1))
|
||||
options = [{"label": str(year), "value": str(year)} for year in years]
|
||||
|
||||
return dmc.Select(
|
||||
id=f"{id_prefix}-year-selector",
|
||||
label=label,
|
||||
data=options,
|
||||
value=str(default_year),
|
||||
style={"width": "120px"},
|
||||
)
|
||||
|
||||
|
||||
def create_time_slider(
|
||||
id_prefix: str,
|
||||
min_year: int = 2020,
|
||||
max_year: int | None = None,
|
||||
default_range: tuple[int, int] | None = None,
|
||||
label: str = "Time Range",
|
||||
) -> dmc.Paper:
|
||||
"""Create a time range slider component.
|
||||
|
||||
Args:
|
||||
id_prefix: Prefix for component IDs.
|
||||
min_year: Minimum year for the slider.
|
||||
max_year: Maximum year for the slider.
|
||||
default_range: Default (start, end) year range.
|
||||
label: Label text for the slider.
|
||||
|
||||
Returns:
|
||||
Mantine Paper component containing the slider.
|
||||
"""
|
||||
if max_year is None:
|
||||
max_year = date.today().year
|
||||
|
||||
if default_range is None:
|
||||
default_range = (min_year, max_year)
|
||||
|
||||
# Create marks for every year
|
||||
marks = [
|
||||
{"value": year, "label": str(year)} for year in range(min_year, max_year + 1)
|
||||
]
|
||||
|
||||
return dmc.Paper(
|
||||
children=[
|
||||
dmc.Text(label, fw=500, size="sm", mb="xs"),
|
||||
dmc.RangeSlider(
|
||||
id=f"{id_prefix}-time-slider",
|
||||
min=min_year,
|
||||
max=max_year,
|
||||
value=list(default_range),
|
||||
marks=marks,
|
||||
step=1,
|
||||
minRange=1,
|
||||
style={"marginTop": "20px", "marginBottom": "10px"},
|
||||
),
|
||||
],
|
||||
p="md",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_month_selector(
|
||||
id_prefix: str,
|
||||
default_month: int | None = None,
|
||||
label: str = "Select Month",
|
||||
) -> dmc.Select:
|
||||
"""Create a month selector dropdown.
|
||||
|
||||
Args:
|
||||
id_prefix: Prefix for component IDs.
|
||||
default_month: Initial selected month (1-12).
|
||||
label: Label text for the selector.
|
||||
|
||||
Returns:
|
||||
Mantine Select component.
|
||||
"""
|
||||
months = [
|
||||
"January",
|
||||
"February",
|
||||
"March",
|
||||
"April",
|
||||
"May",
|
||||
"June",
|
||||
"July",
|
||||
"August",
|
||||
"September",
|
||||
"October",
|
||||
"November",
|
||||
"December",
|
||||
]
|
||||
options = [{"label": month, "value": str(i + 1)} for i, month in enumerate(months)]
|
||||
|
||||
if default_month is None:
|
||||
default_month = date.today().month
|
||||
|
||||
return dmc.Select(
|
||||
id=f"{id_prefix}-month-selector",
|
||||
label=label,
|
||||
data=options,
|
||||
value=str(default_month),
|
||||
style={"width": "140px"},
|
||||
)
|
||||
@@ -5,7 +5,7 @@ from functools import lru_cache
|
||||
from pydantic_settings import BaseSettings, SettingsConfigDict
|
||||
|
||||
|
||||
class Settings(BaseSettings):
|
||||
class Settings(BaseSettings): # type: ignore[misc]
|
||||
"""Application settings loaded from environment variables."""
|
||||
|
||||
model_config = SettingsConfigDict(
|
||||
|
||||
31
portfolio_app/figures/__init__.py
Normal file
31
portfolio_app/figures/__init__.py
Normal file
@@ -0,0 +1,31 @@
|
||||
"""Plotly figure factories for data visualization."""
|
||||
|
||||
from .choropleth import (
|
||||
create_choropleth_figure,
|
||||
create_district_map,
|
||||
create_zone_map,
|
||||
)
|
||||
from .summary_cards import create_metric_card_figure, create_summary_metrics
|
||||
from .time_series import (
|
||||
add_policy_markers,
|
||||
create_market_comparison_chart,
|
||||
create_price_time_series,
|
||||
create_time_series_with_events,
|
||||
create_volume_time_series,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
# Choropleth
|
||||
"create_choropleth_figure",
|
||||
"create_district_map",
|
||||
"create_zone_map",
|
||||
# Time series
|
||||
"create_price_time_series",
|
||||
"create_volume_time_series",
|
||||
"create_market_comparison_chart",
|
||||
"create_time_series_with_events",
|
||||
"add_policy_markers",
|
||||
# Summary
|
||||
"create_metric_card_figure",
|
||||
"create_summary_metrics",
|
||||
]
|
||||
171
portfolio_app/figures/choropleth.py
Normal file
@@ -0,0 +1,171 @@
|
||||
"""Choropleth map figure factory for Toronto housing data."""
|
||||
|
||||
from typing import Any
|
||||
|
||||
import plotly.express as px
|
||||
import plotly.graph_objects as go
|
||||
|
||||
|
||||
def create_choropleth_figure(
|
||||
geojson: dict[str, Any] | None,
|
||||
data: list[dict[str, Any]],
|
||||
location_key: str,
|
||||
color_column: str,
|
||||
hover_data: list[str] | None = None,
|
||||
color_scale: str = "Blues",
|
||||
title: str | None = None,
|
||||
map_style: str = "carto-positron",
|
||||
center: dict[str, float] | None = None,
|
||||
zoom: float = 9.5,
|
||||
) -> go.Figure:
|
||||
"""Create a choropleth map figure.
|
||||
|
||||
Args:
|
||||
geojson: GeoJSON FeatureCollection for boundaries.
|
||||
data: List of data records with location keys and values.
|
||||
location_key: Column name for location identifier.
|
||||
color_column: Column name for color values.
|
||||
hover_data: Additional columns to show on hover.
|
||||
color_scale: Plotly color scale name.
|
||||
title: Optional chart title.
|
||||
map_style: Mapbox style (carto-positron, open-street-map, etc.).
|
||||
center: Map center coordinates {"lat": float, "lon": float}.
|
||||
zoom: Initial zoom level.
|
||||
|
||||
Returns:
|
||||
Plotly Figure object.
|
||||
"""
|
||||
# Default center to Toronto
|
||||
if center is None:
|
||||
center = {"lat": 43.7, "lon": -79.4}
|
||||
|
||||
# Use dark-mode friendly map style by default
|
||||
if map_style == "carto-positron":
|
||||
map_style = "carto-darkmatter"
|
||||
|
||||
# If no geojson provided, create a placeholder map
|
||||
if geojson is None or not data:
|
||||
fig = go.Figure(go.Scattermapbox())
|
||||
fig.update_layout(
|
||||
mapbox={
|
||||
"style": map_style,
|
||||
"center": center,
|
||||
"zoom": zoom,
|
||||
},
|
||||
margin={"l": 0, "r": 0, "t": 40, "b": 0},
|
||||
title=title or "Toronto Housing Map",
|
||||
height=500,
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
)
|
||||
fig.add_annotation(
|
||||
text="No geometry data available. Complete QGIS digitization to enable map.",
|
||||
xref="paper",
|
||||
yref="paper",
|
||||
x=0.5,
|
||||
y=0.5,
|
||||
showarrow=False,
|
||||
font={"size": 14, "color": "#888888"},
|
||||
)
|
||||
return fig
|
||||
|
||||
# Create choropleth with data
|
||||
import pandas as pd
|
||||
|
||||
df = pd.DataFrame(data)
|
||||
|
||||
# Use dark-mode friendly map style
|
||||
effective_map_style = (
|
||||
"carto-darkmatter" if map_style == "carto-positron" else map_style
|
||||
)
|
||||
|
||||
fig = px.choropleth_mapbox(
|
||||
df,
|
||||
geojson=geojson,
|
||||
locations=location_key,
|
||||
featureidkey=f"properties.{location_key}",
|
||||
color=color_column,
|
||||
color_continuous_scale=color_scale,
|
||||
hover_data=hover_data,
|
||||
mapbox_style=effective_map_style,
|
||||
center=center,
|
||||
zoom=zoom,
|
||||
opacity=0.7,
|
||||
)
|
||||
|
||||
fig.update_layout(
|
||||
margin={"l": 0, "r": 0, "t": 40, "b": 0},
|
||||
title=title,
|
||||
height=500,
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
coloraxis_colorbar={
|
||||
"title": {
|
||||
"text": color_column.replace("_", " ").title(),
|
||||
"font": {"color": "#c9c9c9"},
|
||||
},
|
||||
"thickness": 15,
|
||||
"len": 0.7,
|
||||
"tickfont": {"color": "#c9c9c9"},
|
||||
},
|
||||
)
|
||||
|
||||
return fig
|
||||
|
||||
|
||||
def create_district_map(
|
||||
districts_geojson: dict[str, Any] | None,
|
||||
purchase_data: list[dict[str, Any]],
|
||||
metric: str = "avg_price",
|
||||
) -> go.Figure:
|
||||
"""Create choropleth map for TRREB districts.
|
||||
|
||||
Args:
|
||||
districts_geojson: GeoJSON for TRREB district boundaries.
|
||||
purchase_data: Purchase statistics by district.
|
||||
metric: Metric to display (avg_price, sales_count, etc.).
|
||||
|
||||
Returns:
|
||||
Plotly Figure object.
|
||||
"""
|
||||
hover_columns = ["district_name", "sales_count", "avg_price", "median_price"]
|
||||
|
||||
return create_choropleth_figure(
|
||||
geojson=districts_geojson,
|
||||
data=purchase_data,
|
||||
location_key="district_code",
|
||||
color_column=metric,
|
||||
hover_data=[c for c in hover_columns if c != metric],
|
||||
color_scale="Blues" if "price" in metric else "Greens",
|
||||
title="Toronto Purchase Market by District",
|
||||
)
|
||||
|
||||
|
||||
def create_zone_map(
|
||||
zones_geojson: dict[str, Any] | None,
|
||||
rental_data: list[dict[str, Any]],
|
||||
metric: str = "avg_rent",
|
||||
) -> go.Figure:
|
||||
"""Create choropleth map for CMHC zones.
|
||||
|
||||
Args:
|
||||
zones_geojson: GeoJSON for CMHC zone boundaries.
|
||||
rental_data: Rental statistics by zone.
|
||||
metric: Metric to display (avg_rent, vacancy_rate, etc.).
|
||||
|
||||
Returns:
|
||||
Plotly Figure object.
|
||||
"""
|
||||
hover_columns = ["zone_name", "avg_rent", "vacancy_rate", "rental_universe"]
|
||||
|
||||
return create_choropleth_figure(
|
||||
geojson=zones_geojson,
|
||||
data=rental_data,
|
||||
location_key="zone_code",
|
||||
color_column=metric,
|
||||
hover_data=[c for c in hover_columns if c != metric],
|
||||
color_scale="Oranges" if "rent" in metric else "Purples",
|
||||
title="Toronto Rental Market by Zone",
|
||||
)
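
A hedged usage sketch for the district map factory; with no boundary GeoJSON available it falls back to the placeholder map described above:

```python
# Illustrative only: records carry district_code plus the metric/hover columns
# listed above. Passing geojson=None renders the placeholder map until the
# QGIS-digitized boundaries exist.
from portfolio_app.figures import create_district_map

records = [
    {"district_code": "W01", "district_name": "Long Branch",
     "sales_count": 42, "avg_price": 865000, "median_price": 820000},
    {"district_code": "C01", "district_name": "Downtown Core",
     "sales_count": 118, "avg_price": 1210000, "median_price": 1150000},
]

fig = create_district_map(None, records, metric="avg_price")
fig.show()
```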
|
||||
107
portfolio_app/figures/summary_cards.py
Normal file
@@ -0,0 +1,107 @@
|
||||
"""Summary card figure factories for KPI display."""
|
||||
|
||||
from typing import Any
|
||||
|
||||
import plotly.graph_objects as go
|
||||
|
||||
|
||||
def create_metric_card_figure(
|
||||
value: float | int | str,
|
||||
title: str,
|
||||
delta: float | None = None,
|
||||
delta_suffix: str = "%",
|
||||
prefix: str = "",
|
||||
suffix: str = "",
|
||||
format_spec: str = ",.0f",
|
||||
positive_is_good: bool = True,
|
||||
) -> go.Figure:
|
||||
"""Create a KPI indicator figure.
|
||||
|
||||
Args:
|
||||
value: The main metric value.
|
||||
title: Card title.
|
||||
delta: Optional change value (for delta indicator).
|
||||
delta_suffix: Suffix for delta value (e.g., '%').
|
||||
prefix: Prefix for main value (e.g., '$').
|
||||
suffix: Suffix for main value.
|
||||
format_spec: Python format specification for the value.
|
||||
positive_is_good: Whether positive delta is good (green) or bad (red).
|
||||
|
||||
Returns:
|
||||
Plotly Figure object.
|
||||
"""
|
||||
# Determine numeric value for indicator
|
||||
if isinstance(value, int | float):
|
||||
number_value: float | None = float(value)
|
||||
else:
|
||||
number_value = None
|
||||
|
||||
fig = go.Figure()
|
||||
|
||||
# Add indicator trace
|
||||
indicator_config: dict[str, Any] = {
|
||||
"mode": "number",
|
||||
"value": number_value if number_value is not None else 0,
|
||||
"title": {"text": title, "font": {"size": 14}},
|
||||
"number": {
|
||||
"font": {"size": 32},
|
||||
"prefix": prefix,
|
||||
"suffix": suffix,
|
||||
"valueformat": format_spec,
|
||||
},
|
||||
}
|
||||
|
||||
# Add delta if provided
|
||||
if delta is not None:
|
||||
indicator_config["mode"] = "number+delta"
|
||||
indicator_config["delta"] = {
|
||||
"reference": number_value - delta if number_value else 0,
|
||||
"relative": False,
|
||||
"valueformat": ".1f",
|
||||
"suffix": delta_suffix,
|
||||
"increasing": {"color": "green" if positive_is_good else "red"},
|
||||
"decreasing": {"color": "red" if positive_is_good else "green"},
|
||||
}
|
||||
|
||||
fig.add_trace(go.Indicator(**indicator_config))
|
||||
|
||||
fig.update_layout(
|
||||
height=120,
|
||||
margin={"l": 20, "r": 20, "t": 40, "b": 20},
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font={"family": "Inter, sans-serif", "color": "#c9c9c9"},
|
||||
)
|
||||
|
||||
return fig
|
||||
|
||||
|
||||
def create_summary_metrics(
|
||||
metrics: dict[str, dict[str, Any]],
|
||||
) -> list[go.Figure]:
|
||||
"""Create multiple metric card figures.
|
||||
|
||||
Args:
|
||||
metrics: Dictionary of metric configurations.
|
||||
Key: metric name
|
||||
Value: dict with 'value', 'title', 'delta' (optional), etc.
|
||||
|
||||
Returns:
|
||||
List of Plotly Figure objects.
|
||||
"""
|
||||
figures = []
|
||||
|
||||
for metric_config in metrics.values():
|
||||
fig = create_metric_card_figure(
|
||||
value=metric_config.get("value", 0),
|
||||
title=metric_config.get("title", ""),
|
||||
delta=metric_config.get("delta"),
|
||||
delta_suffix=metric_config.get("delta_suffix", "%"),
|
||||
prefix=metric_config.get("prefix", ""),
|
||||
suffix=metric_config.get("suffix", ""),
|
||||
format_spec=metric_config.get("format_spec", ",.0f"),
|
||||
positive_is_good=metric_config.get("positive_is_good", True),
|
||||
)
|
||||
figures.append(fig)
|
||||
|
||||
return figures
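
A short sketch of how the card factory is driven from a metrics dictionary; the key names mirror the `.get()` calls above:

```python
# Sketch only: each value dict uses the keys read by create_metric_card_figure.
from portfolio_app.figures import create_summary_metrics

cards = create_summary_metrics(
    {
        "avg_price": {"value": 1125000, "title": "Avg. Price", "delta": 2.3, "prefix": "$"},
        "vacancy_rate": {
            "value": 1.8,
            "title": "Vacancy Rate",
            "suffix": "%",
            "format_spec": ".1f",
            "positive_is_good": False,
        },
    }
)
```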
|
||||
386
portfolio_app/figures/time_series.py
Normal file
@@ -0,0 +1,386 @@
|
||||
"""Time series figure factories for Toronto housing data."""
|
||||
|
||||
from typing import Any
|
||||
|
||||
import plotly.express as px
|
||||
import plotly.graph_objects as go
|
||||
|
||||
|
||||
def create_price_time_series(
|
||||
data: list[dict[str, Any]],
|
||||
date_column: str = "full_date",
|
||||
price_column: str = "avg_price",
|
||||
group_column: str | None = None,
|
||||
title: str = "Average Price Over Time",
|
||||
show_yoy: bool = True,
|
||||
) -> go.Figure:
|
||||
"""Create a time series chart for price data.
|
||||
|
||||
Args:
|
||||
data: List of records with date and price columns.
|
||||
date_column: Column name for dates.
|
||||
price_column: Column name for price values.
|
||||
group_column: Optional column for grouping (e.g., district_code).
|
||||
title: Chart title.
|
||||
show_yoy: Whether to show year-over-year change annotations.
|
||||
|
||||
Returns:
|
||||
Plotly Figure object.
|
||||
"""
|
||||
import pandas as pd
|
||||
|
||||
if not data:
|
||||
fig = go.Figure()
|
||||
fig.add_annotation(
|
||||
text="No data available",
|
||||
xref="paper",
|
||||
yref="paper",
|
||||
x=0.5,
|
||||
y=0.5,
|
||||
showarrow=False,
|
||||
font={"color": "#888888"},
|
||||
)
|
||||
fig.update_layout(
|
||||
title=title,
|
||||
height=350,
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
)
|
||||
return fig
|
||||
|
||||
df = pd.DataFrame(data)
|
||||
df[date_column] = pd.to_datetime(df[date_column])
|
||||
|
||||
if group_column and group_column in df.columns:
|
||||
fig = px.line(
|
||||
df,
|
||||
x=date_column,
|
||||
y=price_column,
|
||||
color=group_column,
|
||||
title=title,
|
||||
)
|
||||
else:
|
||||
fig = px.line(
|
||||
df,
|
||||
x=date_column,
|
||||
y=price_column,
|
||||
title=title,
|
||||
)
|
||||
|
||||
fig.update_layout(
|
||||
height=350,
|
||||
margin={"l": 40, "r": 20, "t": 50, "b": 40},
|
||||
xaxis_title="Date",
|
||||
yaxis_title=price_column.replace("_", " ").title(),
|
||||
yaxis_tickprefix="$",
|
||||
yaxis_tickformat=",",
|
||||
hovermode="x unified",
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
|
||||
yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
|
||||
)
|
||||
|
||||
return fig
|
||||
|
||||
|
||||
def create_volume_time_series(
|
||||
data: list[dict[str, Any]],
|
||||
date_column: str = "full_date",
|
||||
volume_column: str = "sales_count",
|
||||
group_column: str | None = None,
|
||||
title: str = "Sales Volume Over Time",
|
||||
chart_type: str = "bar",
|
||||
) -> go.Figure:
|
||||
"""Create a time series chart for volume/count data.
|
||||
|
||||
Args:
|
||||
data: List of records with date and volume columns.
|
||||
date_column: Column name for dates.
|
||||
volume_column: Column name for volume values.
|
||||
group_column: Optional column for grouping.
|
||||
title: Chart title.
|
||||
chart_type: 'bar' or 'line'.
|
||||
|
||||
Returns:
|
||||
Plotly Figure object.
|
||||
"""
|
||||
import pandas as pd
|
||||
|
||||
if not data:
|
||||
fig = go.Figure()
|
||||
fig.add_annotation(
|
||||
text="No data available",
|
||||
xref="paper",
|
||||
yref="paper",
|
||||
x=0.5,
|
||||
y=0.5,
|
||||
showarrow=False,
|
||||
font={"color": "#888888"},
|
||||
)
|
||||
fig.update_layout(
|
||||
title=title,
|
||||
height=350,
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
)
|
||||
return fig
|
||||
|
||||
df = pd.DataFrame(data)
|
||||
df[date_column] = pd.to_datetime(df[date_column])
|
||||
|
||||
if chart_type == "bar":
|
||||
if group_column and group_column in df.columns:
|
||||
fig = px.bar(
|
||||
df,
|
||||
x=date_column,
|
||||
y=volume_column,
|
||||
color=group_column,
|
||||
title=title,
|
||||
)
|
||||
else:
|
||||
fig = px.bar(
|
||||
df,
|
||||
x=date_column,
|
||||
y=volume_column,
|
||||
title=title,
|
||||
)
|
||||
else:
|
||||
if group_column and group_column in df.columns:
|
||||
fig = px.line(
|
||||
df,
|
||||
x=date_column,
|
||||
y=volume_column,
|
||||
color=group_column,
|
||||
title=title,
|
||||
)
|
||||
else:
|
||||
fig = px.line(
|
||||
df,
|
||||
x=date_column,
|
||||
y=volume_column,
|
||||
title=title,
|
||||
)
|
||||
|
||||
fig.update_layout(
|
||||
height=350,
|
||||
margin={"l": 40, "r": 20, "t": 50, "b": 40},
|
||||
xaxis_title="Date",
|
||||
yaxis_title=volume_column.replace("_", " ").title(),
|
||||
yaxis_tickformat=",",
|
||||
hovermode="x unified",
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
|
||||
yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
|
||||
)
|
||||
|
||||
return fig
|
||||
|
||||
|
||||
def create_market_comparison_chart(
|
||||
data: list[dict[str, Any]],
|
||||
date_column: str = "full_date",
|
||||
metrics: list[str] | None = None,
|
||||
title: str = "Market Indicators",
|
||||
) -> go.Figure:
|
||||
"""Create a multi-metric comparison chart.
|
||||
|
||||
Args:
|
||||
data: List of records with date and metric columns.
|
||||
date_column: Column name for dates.
|
||||
metrics: List of metric columns to display.
|
||||
title: Chart title.
|
||||
|
||||
Returns:
|
||||
Plotly Figure object with secondary y-axis.
|
||||
"""
|
||||
import pandas as pd
|
||||
from plotly.subplots import make_subplots
|
||||
|
||||
if not data:
|
||||
fig = go.Figure()
|
||||
fig.add_annotation(
|
||||
text="No data available",
|
||||
xref="paper",
|
||||
yref="paper",
|
||||
x=0.5,
|
||||
y=0.5,
|
||||
showarrow=False,
|
||||
font={"color": "#888888"},
|
||||
)
|
||||
fig.update_layout(
|
||||
title=title,
|
||||
height=400,
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
)
|
||||
return fig
|
||||
|
||||
if metrics is None:
|
||||
metrics = ["avg_price", "sales_count"]
|
||||
|
||||
df = pd.DataFrame(data)
|
||||
df[date_column] = pd.to_datetime(df[date_column])
|
||||
|
||||
fig = make_subplots(specs=[[{"secondary_y": True}]])
|
||||
|
||||
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
|
||||
|
||||
for i, metric in enumerate(metrics[:4]):
|
||||
if metric not in df.columns:
|
||||
continue
|
||||
|
||||
secondary = i > 0
|
||||
fig.add_trace(
|
||||
go.Scatter(
|
||||
x=df[date_column],
|
||||
y=df[metric],
|
||||
name=metric.replace("_", " ").title(),
|
||||
line={"color": colors[i % len(colors)]},
|
||||
),
|
||||
secondary_y=secondary,
|
||||
)
|
||||
|
||||
fig.update_layout(
|
||||
title=title,
|
||||
height=400,
|
||||
margin={"l": 40, "r": 40, "t": 50, "b": 40},
|
||||
hovermode="x unified",
|
||||
paper_bgcolor="rgba(0,0,0,0)",
|
||||
plot_bgcolor="rgba(0,0,0,0)",
|
||||
font_color="#c9c9c9",
|
||||
xaxis={"gridcolor": "#333333", "linecolor": "#444444"},
|
||||
yaxis={"gridcolor": "#333333", "linecolor": "#444444"},
|
||||
legend={
|
||||
"orientation": "h",
|
||||
"yanchor": "bottom",
|
||||
"y": 1.02,
|
||||
"xanchor": "right",
|
||||
"x": 1,
|
||||
"font": {"color": "#c9c9c9"},
|
||||
},
|
||||
)
|
||||
|
||||
return fig
|
||||
|
||||
|
||||
def add_policy_markers(
|
||||
fig: go.Figure,
|
||||
policy_events: list[dict[str, Any]],
|
||||
date_column: str = "event_date",
|
||||
y_position: float | None = None,
|
||||
) -> go.Figure:
|
||||
"""Add policy event markers to an existing time series figure.
|
||||
|
||||
Args:
|
||||
fig: Existing Plotly figure to add markers to.
|
||||
policy_events: List of policy event dicts with date and metadata.
|
||||
date_column: Column name for event dates.
|
||||
y_position: Y position for markers. If None, uses top of chart.
|
||||
|
||||
Returns:
|
||||
Updated Plotly Figure object with policy markers.
|
||||
"""
|
||||
if not policy_events:
|
||||
return fig
|
||||
|
||||
# Color mapping for policy categories
|
||||
category_colors = {
|
||||
"monetary": "#1f77b4", # Blue
|
||||
"tax": "#2ca02c", # Green
|
||||
"regulatory": "#ff7f0e", # Orange
|
||||
"supply": "#9467bd", # Purple
|
||||
"economic": "#d62728", # Red
|
||||
}
|
||||
|
||||
# Symbol mapping for expected direction
|
||||
direction_symbols = {
|
||||
"bullish": "triangle-up",
|
||||
"bearish": "triangle-down",
|
||||
"neutral": "circle",
|
||||
}
|
||||
|
||||
for event in policy_events:
|
||||
event_date = event.get(date_column)
|
||||
category = event.get("category", "economic")
|
||||
direction = event.get("expected_direction", "neutral")
|
||||
title = event.get("title", "Policy Event")
|
||||
level = event.get("level", "federal")
|
||||
|
||||
color = category_colors.get(category, "#666666")
|
||||
symbol = direction_symbols.get(direction, "circle")
|
||||
|
||||
# Add vertical line for the event
|
||||
fig.add_vline(
|
||||
x=event_date,
|
||||
line_dash="dot",
|
||||
line_color=color,
|
||||
opacity=0.5,
|
||||
annotation_text="",
|
||||
)
|
||||
|
||||
# Add marker with hover info
|
||||
fig.add_trace(
|
||||
go.Scatter(
|
||||
x=[event_date],
|
||||
y=[y_position] if y_position else [None], # type: ignore[list-item]
|
||||
mode="markers",
|
||||
marker={
|
||||
"symbol": symbol,
|
||||
"size": 12,
|
||||
"color": color,
|
||||
"line": {"width": 1, "color": "white"},
|
||||
},
|
||||
name=title,
|
||||
hovertemplate=(
|
||||
f"<b>{title}</b><br>"
|
||||
f"Date: %{{x}}<br>"
|
||||
f"Level: {level.title()}<br>"
|
||||
f"Category: {category.title()}<br>"
|
||||
f"<extra></extra>"
|
||||
),
|
||||
showlegend=False,
|
||||
)
|
||||
)
|
||||
|
||||
return fig
|
||||
|
||||
|
||||
def create_time_series_with_events(
|
||||
data: list[dict[str, Any]],
|
||||
policy_events: list[dict[str, Any]],
|
||||
date_column: str = "full_date",
|
||||
value_column: str = "avg_price",
|
||||
title: str = "Price Trend with Policy Events",
|
||||
) -> go.Figure:
|
||||
"""Create a time series chart with policy event markers.
|
||||
|
||||
Args:
|
||||
data: Time series data.
|
||||
policy_events: Policy events to overlay.
|
||||
date_column: Column name for dates.
|
||||
value_column: Column name for values.
|
||||
title: Chart title.
|
||||
|
||||
Returns:
|
||||
Plotly Figure with time series and policy markers.
|
||||
"""
|
||||
# Create base time series
|
||||
fig = create_price_time_series(
|
||||
data=data,
|
||||
date_column=date_column,
|
||||
price_column=value_column,
|
||||
title=title,
|
||||
)
|
||||
|
||||
# Add policy markers at the top of the chart
|
||||
if policy_events:
|
||||
fig = add_policy_markers(fig, policy_events)
|
||||
|
||||
return fig
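
A usage sketch combining this factory with the demo data module added later in this changeset:

```python
# Sketch: the demo helpers produce records keyed by full_date/avg_price, which
# match this factory's defaults.
from portfolio_app.figures import create_time_series_with_events
from portfolio_app.toronto.demo_data import get_demo_policy_events, get_demo_purchase_data

fig = create_time_series_with_events(
    data=get_demo_purchase_data(),
    policy_events=get_demo_policy_events(),
    title="GTA Average Price with Policy Events",
)
fig.show()
```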
|
||||
20
portfolio_app/pages/health.py
Normal file
@@ -0,0 +1,20 @@
|
||||
"""Health check endpoint for deployment monitoring."""
|
||||
|
||||
import dash
|
||||
from dash import html
|
||||
|
||||
dash.register_page(
|
||||
__name__,
|
||||
path="/health",
|
||||
title="Health Check",
|
||||
)
|
||||
|
||||
|
||||
def layout() -> html.Div:
|
||||
"""Return simple health check response."""
|
||||
return html.Div(
|
||||
[
|
||||
html.Pre("status: ok"),
|
||||
],
|
||||
id="health-check",
|
||||
)
|
||||
@@ -1,14 +1,169 @@
|
||||
"""Bio landing page."""
|
||||
|
||||
import dash
|
||||
from dash import html
|
||||
import dash_mantine_components as dmc
|
||||
|
||||
dash.register_page(__name__, path="/", name="Home")
|
||||
|
||||
layout = html.Div(
|
||||
[
|
||||
html.H1("Analytics Portfolio"),
|
||||
html.P("Welcome to Leo's analytics portfolio."),
|
||||
html.P("Dashboard coming soon."),
|
||||
]
|
||||
# Content from bio_content_v2.md
|
||||
HEADLINE = "Leo | Data Engineer & Analytics Developer"
|
||||
TAGLINE = "I build data infrastructure that actually gets used."
|
||||
|
||||
SUMMARY = """Over the past 5 years, I've designed and evolved an enterprise analytics platform
|
||||
from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and
|
||||
dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call
|
||||
abandon rates, and dashboards that executives actually open.
|
||||
|
||||
My approach: dimensional modeling (star schema), layered transformations
|
||||
(staging → intermediate → marts), and automation that eliminates manual work.
|
||||
I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
|
||||
|
||||
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states.
|
||||
Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the
|
||||
Project Management Institute."""
|
||||
|
||||
TECH_STACK = [
|
||||
"Python",
|
||||
"Pandas",
|
||||
"SQLAlchemy",
|
||||
"FastAPI",
|
||||
"SQL",
|
||||
"PostgreSQL",
|
||||
"MSSQL",
|
||||
"Power BI",
|
||||
"Plotly/Dash",
|
||||
"dbt patterns",
|
||||
"Genesys Cloud",
|
||||
]
|
||||
|
||||
PROJECTS = [
|
||||
{
|
||||
"title": "Toronto Housing Dashboard",
|
||||
"description": "Choropleth visualization of GTA real estate trends with TRREB and CMHC data.",
|
||||
"status": "In Development",
|
||||
"link": "/toronto",
|
||||
},
|
||||
{
|
||||
"title": "Energy Pricing Analysis",
|
||||
"description": "Time series analysis and ML prediction for utility market pricing.",
|
||||
"status": "Planned",
|
||||
"link": "/energy",
|
||||
},
|
||||
]
|
||||
|
||||
AVAILABILITY = "Open to Senior Data Analyst, Analytics Engineer, and BI Developer opportunities in Toronto or remote."
|
||||
|
||||
|
||||
def create_hero_section() -> dmc.Stack:
|
||||
"""Create the hero section with name and tagline."""
|
||||
return dmc.Stack(
|
||||
[
|
||||
dmc.Title(HEADLINE, order=1, ta="center"),
|
||||
dmc.Text(TAGLINE, size="xl", c="dimmed", ta="center"),
|
||||
],
|
||||
gap="xs",
|
||||
py="xl",
|
||||
)
|
||||
|
||||
|
||||
def create_summary_section() -> dmc.Paper:
|
||||
"""Create the professional summary section."""
|
||||
paragraphs = SUMMARY.strip().split("\n\n")
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("About", order=2, size="h3"),
|
||||
*[dmc.Text(p.replace("\n", " "), size="md") for p in paragraphs],
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_tech_stack_section() -> dmc.Paper:
|
||||
"""Create the tech stack section with badges."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Tech Stack", order=2, size="h3"),
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Badge(tech, size="lg", variant="light", radius="sm")
|
||||
for tech in TECH_STACK
|
||||
],
|
||||
gap="sm",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_project_card(project: dict[str, str]) -> dmc.Card:
|
||||
"""Create a project card."""
|
||||
status_color = "blue" if project["status"] == "In Development" else "gray"
|
||||
return dmc.Card(
|
||||
[
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Text(project["title"], fw=500, size="lg"),
|
||||
dmc.Badge(project["status"], color=status_color, variant="light"),
|
||||
],
|
||||
justify="space-between",
|
||||
align="center",
|
||||
),
|
||||
dmc.Text(project["description"], size="sm", c="dimmed", mt="sm"),
|
||||
],
|
||||
withBorder=True,
|
||||
radius="md",
|
||||
p="lg",
|
||||
)
|
||||
|
||||
|
||||
def create_projects_section() -> dmc.Paper:
|
||||
"""Create the portfolio projects section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Portfolio Projects", order=2, size="h3"),
|
||||
dmc.SimpleGrid(
|
||||
[create_project_card(p) for p in PROJECTS],
|
||||
cols={"base": 1, "sm": 2},
|
||||
spacing="lg",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_availability_section() -> dmc.Text:
|
||||
"""Create the availability statement."""
|
||||
return dmc.Text(AVAILABILITY, size="sm", c="dimmed", ta="center", fs="italic")
|
||||
|
||||
|
||||
layout = dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_hero_section(),
|
||||
create_summary_section(),
|
||||
create_tech_stack_section(),
|
||||
create_projects_section(),
|
||||
dmc.Divider(my="lg"),
|
||||
create_availability_section(),
|
||||
dmc.Space(h=40),
|
||||
],
|
||||
gap="xl",
|
||||
),
|
||||
size="md",
|
||||
py="xl",
|
||||
)
|
||||
|
||||
@@ -1 +1 @@
|
||||
"""Toronto Housing Dashboard page."""
|
||||
"""Toronto Housing Dashboard pages."""
|
||||
|
||||
File diff suppressed because it is too large
294
portfolio_app/pages/toronto/dashboard.py
Normal file
@@ -0,0 +1,294 @@
|
||||
"""Toronto Housing Dashboard page."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc, html
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
from portfolio_app.components import (
|
||||
create_map_controls,
|
||||
create_metric_cards_row,
|
||||
create_time_slider,
|
||||
create_year_selector,
|
||||
)
|
||||
|
||||
dash.register_page(__name__, path="/toronto", name="Toronto Housing")
|
||||
|
||||
# Metric options for the purchase market
|
||||
PURCHASE_METRIC_OPTIONS = [
|
||||
{"label": "Average Price", "value": "avg_price"},
|
||||
{"label": "Median Price", "value": "median_price"},
|
||||
{"label": "Sales Volume", "value": "sales_count"},
|
||||
{"label": "Days on Market", "value": "avg_dom"},
|
||||
]
|
||||
|
||||
# Metric options for the rental market
|
||||
RENTAL_METRIC_OPTIONS = [
|
||||
{"label": "Average Rent", "value": "avg_rent"},
|
||||
{"label": "Vacancy Rate", "value": "vacancy_rate"},
|
||||
{"label": "Rental Universe", "value": "rental_universe"},
|
||||
]
|
||||
|
||||
# Sample metrics for KPI cards (will be populated by callbacks)
|
||||
SAMPLE_METRICS = [
|
||||
{
|
||||
"title": "Avg. Price",
|
||||
"value": 1125000,
|
||||
"delta": 2.3,
|
||||
"prefix": "$",
|
||||
"format_spec": ",.0f",
|
||||
},
|
||||
{
|
||||
"title": "Sales Volume",
|
||||
"value": 4850,
|
||||
"delta": -5.1,
|
||||
"format_spec": ",",
|
||||
},
|
||||
{
|
||||
"title": "Avg. DOM",
|
||||
"value": 18,
|
||||
"delta": 3,
|
||||
"suffix": " days",
|
||||
"positive_is_good": False,
|
||||
},
|
||||
{
|
||||
"title": "Avg. Rent",
|
||||
"value": 2450,
|
||||
"delta": 4.2,
|
||||
"prefix": "$",
|
||||
"format_spec": ",.0f",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
def create_header() -> dmc.Group:
|
||||
"""Create the dashboard header with title and controls."""
|
||||
return dmc.Group(
|
||||
[
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Toronto Housing Dashboard", order=1),
|
||||
dmc.Text(
|
||||
"Real estate market analysis for the Greater Toronto Area",
|
||||
c="dimmed",
|
||||
),
|
||||
],
|
||||
gap="xs",
|
||||
),
|
||||
dmc.Group(
|
||||
[
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"Methodology",
|
||||
leftSection=DashIconify(
|
||||
icon="tabler:info-circle", width=18
|
||||
),
|
||||
variant="subtle",
|
||||
color="gray",
|
||||
),
|
||||
href="/toronto/methodology",
|
||||
),
|
||||
create_year_selector(
|
||||
id_prefix="toronto",
|
||||
min_year=2020,
|
||||
default_year=2024,
|
||||
label="Year",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
],
|
||||
justify="space-between",
|
||||
align="flex-start",
|
||||
)
|
||||
|
||||
|
||||
def create_kpi_section() -> dmc.Box:
|
||||
"""Create the KPI metrics row."""
|
||||
return dmc.Box(
|
||||
children=[
|
||||
dmc.Title("Key Metrics", order=3, size="h4", mb="sm"),
|
||||
html.Div(
|
||||
id="toronto-kpi-cards",
|
||||
children=[
|
||||
create_metric_cards_row(SAMPLE_METRICS, id_prefix="toronto-kpi")
|
||||
],
|
||||
),
|
||||
],
|
||||
)
|
||||
|
||||
|
||||
def create_purchase_map_section() -> dmc.Grid:
|
||||
"""Create the purchase market choropleth section."""
|
||||
return dmc.Grid(
|
||||
[
|
||||
dmc.GridCol(
|
||||
create_map_controls(
|
||||
id_prefix="purchase-map",
|
||||
metric_options=PURCHASE_METRIC_OPTIONS,
|
||||
default_metric="avg_price",
|
||||
),
|
||||
span={"base": 12, "md": 3},
|
||||
),
|
||||
dmc.GridCol(
|
||||
dmc.Paper(
|
||||
children=[
|
||||
dcc.Graph(
|
||||
id="purchase-choropleth",
|
||||
config={"scrollZoom": True},
|
||||
style={"height": "500px"},
|
||||
),
|
||||
],
|
||||
p="xs",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
),
|
||||
span={"base": 12, "md": 9},
|
||||
),
|
||||
],
|
||||
gutter="md",
|
||||
)
|
||||
|
||||
|
||||
def create_rental_map_section() -> dmc.Grid:
|
||||
"""Create the rental market choropleth section."""
|
||||
return dmc.Grid(
|
||||
[
|
||||
dmc.GridCol(
|
||||
create_map_controls(
|
||||
id_prefix="rental-map",
|
||||
metric_options=RENTAL_METRIC_OPTIONS,
|
||||
default_metric="avg_rent",
|
||||
),
|
||||
span={"base": 12, "md": 3},
|
||||
),
|
||||
dmc.GridCol(
|
||||
dmc.Paper(
|
||||
children=[
|
||||
dcc.Graph(
|
||||
id="rental-choropleth",
|
||||
config={"scrollZoom": True},
|
||||
style={"height": "500px"},
|
||||
),
|
||||
],
|
||||
p="xs",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
),
|
||||
span={"base": 12, "md": 9},
|
||||
),
|
||||
],
|
||||
gutter="md",
|
||||
)
|
||||
|
||||
|
||||
def create_time_series_section() -> dmc.Grid:
|
||||
"""Create the time series charts section."""
|
||||
return dmc.Grid(
|
||||
[
|
||||
dmc.GridCol(
|
||||
dmc.Paper(
|
||||
children=[
|
||||
dmc.Title("Price Trends", order=4, size="h5", mb="sm"),
|
||||
dcc.Graph(
|
||||
id="price-time-series",
|
||||
config={"displayModeBar": False},
|
||||
style={"height": "350px"},
|
||||
),
|
||||
],
|
||||
p="md",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
),
|
||||
span={"base": 12, "md": 6},
|
||||
),
|
||||
dmc.GridCol(
|
||||
dmc.Paper(
|
||||
children=[
|
||||
dmc.Title("Sales Volume", order=4, size="h5", mb="sm"),
|
||||
dcc.Graph(
|
||||
id="volume-time-series",
|
||||
config={"displayModeBar": False},
|
||||
style={"height": "350px"},
|
||||
),
|
||||
],
|
||||
p="md",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
),
|
||||
span={"base": 12, "md": 6},
|
||||
),
|
||||
],
|
||||
gutter="md",
|
||||
)
|
||||
|
||||
|
||||
def create_market_comparison_section() -> dmc.Paper:
|
||||
"""Create the market comparison chart section."""
|
||||
return dmc.Paper(
|
||||
children=[
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Title("Market Indicators", order=4, size="h5"),
|
||||
create_time_slider(
|
||||
id_prefix="market-comparison",
|
||||
min_year=2020,
|
||||
label="",
|
||||
),
|
||||
],
|
||||
justify="space-between",
|
||||
align="center",
|
||||
mb="md",
|
||||
),
|
||||
dcc.Graph(
|
||||
id="market-comparison-chart",
|
||||
config={"displayModeBar": False},
|
||||
style={"height": "400px"},
|
||||
),
|
||||
],
|
||||
p="md",
|
||||
radius="sm",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_data_notice() -> dmc.Alert:
|
||||
"""Create a notice about data availability."""
|
||||
return dmc.Alert(
|
||||
children=[
|
||||
dmc.Text(
|
||||
"This dashboard uses TRREB and CMHC data. "
|
||||
"Geographic boundaries require QGIS digitization to enable choropleth maps. "
|
||||
"Sample data is shown below.",
|
||||
size="sm",
|
||||
),
|
||||
],
|
||||
title="Data Notice",
|
||||
color="blue",
|
||||
variant="light",
|
||||
)
|
||||
|
||||
|
||||
# Register callbacks
|
||||
from portfolio_app.pages.toronto import callbacks # noqa: E402, F401
|
||||
|
||||
layout = dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_header(),
|
||||
create_data_notice(),
|
||||
create_kpi_section(),
|
||||
dmc.Divider(my="md", label="Purchase Market", labelPosition="center"),
|
||||
create_purchase_map_section(),
|
||||
dmc.Divider(my="md", label="Rental Market", labelPosition="center"),
|
||||
create_rental_map_section(),
|
||||
dmc.Divider(my="md", label="Trends", labelPosition="center"),
|
||||
create_time_series_section(),
|
||||
create_market_comparison_section(),
|
||||
dmc.Space(h=40),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
size="xl",
|
||||
py="xl",
|
||||
)
|
||||
274
portfolio_app/pages/toronto/methodology.py
Normal file
@@ -0,0 +1,274 @@
|
||||
"""Methodology page for Toronto Housing Dashboard."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc, html
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
dash.register_page(
|
||||
__name__,
|
||||
path="/toronto/methodology",
|
||||
title="Methodology | Toronto Housing Dashboard",
|
||||
description="Data sources, methodology, and limitations for the Toronto Housing Dashboard",
|
||||
)
|
||||
|
||||
|
||||
def layout() -> dmc.Container:
|
||||
"""Render the methodology page layout."""
|
||||
return dmc.Container(
|
||||
size="md",
|
||||
py="xl",
|
||||
children=[
|
||||
# Back to Dashboard button
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"Back to Dashboard",
|
||||
leftSection=DashIconify(icon="tabler:arrow-left", width=18),
|
||||
variant="subtle",
|
||||
color="gray",
|
||||
),
|
||||
href="/toronto",
|
||||
),
|
||||
# Header
|
||||
dmc.Title("Methodology", order=1, mb="lg", mt="md"),
|
||||
dmc.Text(
|
||||
"This page documents the data sources, processing methodology, "
|
||||
"and known limitations of the Toronto Housing Dashboard.",
|
||||
size="lg",
|
||||
c="dimmed",
|
||||
mb="xl",
|
||||
),
|
||||
# Data Sources Section
|
||||
dmc.Paper(
|
||||
p="lg",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
mb="lg",
|
||||
children=[
|
||||
dmc.Title("Data Sources", order=2, mb="md"),
|
||||
# TRREB
|
||||
dmc.Title("Purchase Data: TRREB", order=3, size="h4", mb="sm"),
|
||||
dmc.Text(
|
||||
[
|
||||
"The Toronto Regional Real Estate Board (TRREB) publishes monthly ",
|
||||
html.Strong("Market Watch"),
|
||||
" reports containing aggregate statistics for residential real estate "
|
||||
"transactions across the Greater Toronto Area.",
|
||||
],
|
||||
mb="sm",
|
||||
),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem("Source: TRREB Market Watch Reports (PDF)"),
|
||||
dmc.ListItem("Geographic granularity: ~35 TRREB Districts"),
|
||||
dmc.ListItem("Temporal granularity: Monthly"),
|
||||
dmc.ListItem("Coverage: 2021-present"),
|
||||
dmc.ListItem(
|
||||
[
|
||||
"Metrics: Sales count, average/median price, new listings, ",
|
||||
"active listings, days on market, sale-to-list ratio",
|
||||
]
|
||||
),
|
||||
],
|
||||
mb="md",
|
||||
),
|
||||
dmc.Anchor(
|
||||
"TRREB Market Watch Archive",
|
||||
href="https://trreb.ca/market-data/market-watch/market-watch-archive/",
|
||||
target="_blank",
|
||||
mb="lg",
|
||||
),
|
||||
# CMHC
|
||||
dmc.Title(
|
||||
"Rental Data: CMHC", order=3, size="h4", mb="sm", mt="md"
|
||||
),
|
||||
dmc.Text(
|
||||
[
|
||||
"Canada Mortgage and Housing Corporation (CMHC) conducts the annual ",
|
||||
html.Strong("Rental Market Survey"),
|
||||
" providing rental market statistics for major urban centres.",
|
||||
],
|
||||
mb="sm",
|
||||
),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem("Source: CMHC Rental Market Survey (Excel)"),
|
||||
dmc.ListItem(
|
||||
"Geographic granularity: ~20 CMHC Zones (Census Tract aligned)"
|
||||
),
|
||||
dmc.ListItem(
|
||||
"Temporal granularity: Annual (October survey)"
|
||||
),
|
||||
dmc.ListItem("Coverage: 2021-present"),
|
||||
dmc.ListItem(
|
||||
[
|
||||
"Metrics: Average/median rent, vacancy rate, universe count, ",
|
||||
"turnover rate, year-over-year rent change",
|
||||
]
|
||||
),
|
||||
],
|
||||
mb="md",
|
||||
),
|
||||
dmc.Anchor(
|
||||
"CMHC Housing Market Information Portal",
|
||||
href="https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market",
|
||||
target="_blank",
|
||||
),
|
||||
],
|
||||
),
|
||||
# Geographic Considerations
|
||||
dmc.Paper(
|
||||
p="lg",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
mb="lg",
|
||||
children=[
|
||||
dmc.Title("Geographic Considerations", order=2, mb="md"),
|
||||
dmc.Alert(
|
||||
title="Important: Non-Aligned Geographies",
|
||||
color="yellow",
|
||||
mb="md",
|
||||
children=[
|
||||
"TRREB Districts and CMHC Zones do ",
|
||||
html.Strong("not"),
|
||||
" align geographically. They are displayed as separate layers and "
|
||||
"should not be directly compared at the sub-regional level.",
|
||||
],
|
||||
),
|
||||
dmc.Text(
|
||||
"The dashboard presents three geographic layers:",
|
||||
mb="sm",
|
||||
),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("TRREB Districts (~35): "),
|
||||
"Used for purchase/sales data visualization. "
|
||||
"Districts are defined by TRREB and labeled with codes like W01, C01, E01.",
|
||||
]
|
||||
),
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("CMHC Zones (~20): "),
|
||||
"Used for rental data visualization. "
|
||||
"Zones are aligned with Census Tract boundaries.",
|
||||
]
|
||||
),
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("City Neighbourhoods (158): "),
|
||||
"Reference overlay only. "
|
||||
"These are official City of Toronto neighbourhood boundaries.",
|
||||
]
|
||||
),
|
||||
],
|
||||
),
|
||||
],
|
||||
),
|
||||
# Policy Events
|
||||
dmc.Paper(
|
||||
p="lg",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
mb="lg",
|
||||
children=[
|
||||
dmc.Title("Policy Event Annotations", order=2, mb="md"),
|
||||
dmc.Text(
|
||||
"The time series charts include markers for significant policy events "
|
||||
"that may have influenced housing market conditions. These annotations are "
|
||||
"for contextual reference only.",
|
||||
mb="md",
|
||||
),
|
||||
dmc.Alert(
|
||||
title="No Causation Claims",
|
||||
color="blue",
|
||||
children=[
|
||||
"The presence of a policy marker near a market trend change does ",
|
||||
html.Strong("not"),
|
||||
" imply causation. Housing markets are influenced by numerous factors "
|
||||
"beyond policy interventions.",
|
||||
],
|
||||
),
|
||||
],
|
||||
),
|
||||
# Limitations
|
||||
dmc.Paper(
|
||||
p="lg",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
mb="lg",
|
||||
children=[
|
||||
dmc.Title("Limitations", order=2, mb="md"),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("Aggregate Data: "),
|
||||
"All statistics are aggregates. Individual property characteristics, "
|
||||
"condition, and micro-location are not reflected.",
|
||||
]
|
||||
),
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("Reporting Lag: "),
|
||||
"TRREB data reflects closed transactions, which may lag market "
|
||||
"conditions by 1-3 months. CMHC data is annual.",
|
||||
]
|
||||
),
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("Geographic Boundaries: "),
|
||||
"TRREB district boundaries were manually digitized from reference maps "
|
||||
"and may contain minor inaccuracies.",
|
||||
]
|
||||
),
|
||||
dmc.ListItem(
|
||||
[
|
||||
html.Strong("Data Suppression: "),
|
||||
"Some cells may be suppressed for confidentiality when transaction "
|
||||
"counts are below thresholds.",
|
||||
]
|
||||
),
|
||||
],
|
||||
),
|
||||
],
|
||||
),
|
||||
# Technical Implementation
|
||||
dmc.Paper(
|
||||
p="lg",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
children=[
|
||||
dmc.Title("Technical Implementation", order=2, mb="md"),
|
||||
dmc.Text("This dashboard is built with:", mb="sm"),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem("Python 3.11+ with Dash and Plotly"),
|
||||
dmc.ListItem("PostgreSQL with PostGIS for geospatial data"),
|
||||
dmc.ListItem("dbt for data transformation"),
|
||||
dmc.ListItem("Pydantic for data validation"),
|
||||
dmc.ListItem("SQLAlchemy 2.0 for database operations"),
|
||||
],
|
||||
mb="md",
|
||||
),
|
||||
dmc.Anchor(
|
||||
"View source code on GitHub",
|
||||
href="https://github.com/lmiranda/personal-portfolio",
|
||||
target="_blank",
|
||||
),
|
||||
],
|
||||
),
|
||||
# Back link
|
||||
dmc.Group(
|
||||
mt="xl",
|
||||
children=[
|
||||
dmc.Anchor(
|
||||
"← Back to Dashboard",
|
||||
href="/toronto",
|
||||
size="lg",
|
||||
),
|
||||
],
|
||||
),
|
||||
],
|
||||
)
|
||||
257
portfolio_app/toronto/demo_data.py
Normal file
@@ -0,0 +1,257 @@
|
||||
"""Demo/sample data for testing the Toronto Housing Dashboard without full pipeline.
|
||||
|
||||
This module provides synthetic data for development and demonstration purposes.
|
||||
Replace with real data from the database in production.
|
||||
"""
|
||||
|
||||
from datetime import date
|
||||
from typing import Any
|
||||
|
||||
|
||||
def get_demo_districts() -> list[dict[str, Any]]:
|
||||
"""Return sample TRREB district data."""
|
||||
return [
|
||||
{"district_code": "W01", "district_name": "Long Branch", "area_type": "West"},
|
||||
{"district_code": "W02", "district_name": "Mimico", "area_type": "West"},
|
||||
{
|
||||
"district_code": "W03",
|
||||
"district_name": "Kingsway South",
|
||||
"area_type": "West",
|
||||
},
|
||||
{"district_code": "W04", "district_name": "Edenbridge", "area_type": "West"},
|
||||
{"district_code": "W05", "district_name": "Islington", "area_type": "West"},
|
||||
{"district_code": "W06", "district_name": "Rexdale", "area_type": "West"},
|
||||
{"district_code": "W07", "district_name": "Willowdale", "area_type": "West"},
|
||||
{"district_code": "W08", "district_name": "York", "area_type": "West"},
|
||||
{
|
||||
"district_code": "C01",
|
||||
"district_name": "Downtown Core",
|
||||
"area_type": "Central",
|
||||
},
|
||||
{"district_code": "C02", "district_name": "Annex", "area_type": "Central"},
|
||||
{
|
||||
"district_code": "C03",
|
||||
"district_name": "Forest Hill",
|
||||
"area_type": "Central",
|
||||
},
|
||||
{
|
||||
"district_code": "C04",
|
||||
"district_name": "Lawrence Park",
|
||||
"area_type": "Central",
|
||||
},
|
||||
{
|
||||
"district_code": "C06",
|
||||
"district_name": "Willowdale East",
|
||||
"area_type": "Central",
|
||||
},
|
||||
{"district_code": "C07", "district_name": "Thornhill", "area_type": "Central"},
|
||||
{"district_code": "C08", "district_name": "Waterfront", "area_type": "Central"},
|
||||
{"district_code": "E01", "district_name": "Leslieville", "area_type": "East"},
|
||||
{"district_code": "E02", "district_name": "The Beaches", "area_type": "East"},
|
||||
{"district_code": "E03", "district_name": "Danforth", "area_type": "East"},
|
||||
{"district_code": "E04", "district_name": "Birch Cliff", "area_type": "East"},
|
||||
{"district_code": "E05", "district_name": "Scarborough", "area_type": "East"},
|
||||
]
|
||||
|
||||
|
||||
def get_demo_purchase_data() -> list[dict[str, Any]]:
|
||||
"""Return sample purchase data for time series visualization."""
|
||||
import random
|
||||
|
||||
random.seed(42)
|
||||
data = []
|
||||
|
||||
base_prices = {
|
||||
"W01": 850000,
|
||||
"C01": 1200000,
|
||||
"E01": 950000,
|
||||
}
|
||||
|
||||
for year in [2024, 2025]:
|
||||
for month in range(1, 13):
|
||||
if year == 2025 and month > 12:
|
||||
break
|
||||
|
||||
for district, base_price in base_prices.items():
|
||||
# Add some randomness and trend
|
||||
trend = (year - 2024) * 12 + month
|
||||
price_variation = random.uniform(-0.05, 0.05)
|
||||
trend_factor = 1 + (trend * 0.002) # Slight upward trend
|
||||
|
||||
avg_price = int(base_price * trend_factor * (1 + price_variation))
|
||||
sales = random.randint(50, 200)
|
||||
|
||||
data.append(
|
||||
{
|
||||
"district_code": district,
|
||||
"full_date": date(year, month, 1),
|
||||
"year": year,
|
||||
"month": month,
|
||||
"avg_price": avg_price,
|
||||
"median_price": int(avg_price * 0.95),
|
||||
"sales_count": sales,
|
||||
"new_listings": int(sales * random.uniform(1.2, 1.8)),
|
||||
"active_listings": int(sales * random.uniform(2.0, 3.5)),
|
||||
"days_on_market": random.randint(15, 45),
|
||||
"sale_to_list_ratio": round(random.uniform(0.95, 1.05), 2),
|
||||
}
|
||||
)
|
||||
|
||||
return data
|
||||
|
||||
|
||||
def get_demo_rental_data() -> list[dict[str, Any]]:
|
||||
"""Return sample rental data for visualization."""
|
||||
data = []
|
||||
|
||||
zones = [
|
||||
("Zone01", "Downtown"),
|
||||
("Zone02", "Midtown"),
|
||||
("Zone03", "North York"),
|
||||
("Zone04", "Scarborough"),
|
||||
("Zone05", "Etobicoke"),
|
||||
]
|
||||
|
||||
bedroom_types = ["bachelor", "1_bedroom", "2_bedroom", "3_bedroom"]
|
||||
|
||||
base_rents = {
|
||||
"bachelor": 1800,
|
||||
"1_bedroom": 2200,
|
||||
"2_bedroom": 2800,
|
||||
"3_bedroom": 3400,
|
||||
}
|
||||
|
||||
for year in [2021, 2022, 2023, 2024, 2025]:
|
||||
for zone_code, zone_name in zones:
|
||||
for bedroom in bedroom_types:
|
||||
# Rental trend: ~5% increase per year
|
||||
year_factor = 1 + ((year - 2021) * 0.05)
|
||||
base_rent = base_rents[bedroom]
|
||||
|
||||
data.append(
|
||||
{
|
||||
"zone_code": zone_code,
|
||||
"zone_name": zone_name,
|
||||
"survey_year": year,
|
||||
"full_date": date(year, 10, 1),
|
||||
"bedroom_type": bedroom,
|
||||
"average_rent": int(base_rent * year_factor),
|
||||
"median_rent": int(base_rent * year_factor * 0.98),
|
||||
"vacancy_rate": round(
|
||||
2.5 - (year - 2021) * 0.3, 1
|
||||
), # Decreasing vacancy
|
||||
"universe": 5000 + (year - 2021) * 200,
|
||||
}
|
||||
)
|
||||
|
||||
return data
|
||||
|
||||
|
||||
def get_demo_policy_events() -> list[dict[str, Any]]:
|
||||
"""Return sample policy events for annotation."""
|
||||
return [
|
||||
{
|
||||
"event_date": date(2024, 6, 5),
|
||||
"effective_date": date(2024, 6, 5),
|
||||
"level": "federal",
|
||||
"category": "monetary",
|
||||
"title": "BoC Rate Cut (25bp)",
|
||||
"description": "Bank of Canada cuts overnight rate by 25 basis points to 4.75%",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
{
|
||||
"event_date": date(2024, 7, 24),
|
||||
"effective_date": date(2024, 7, 24),
|
||||
"level": "federal",
|
||||
"category": "monetary",
|
||||
"title": "BoC Rate Cut (25bp)",
|
||||
"description": "Bank of Canada cuts overnight rate by 25 basis points to 4.50%",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
{
|
||||
"event_date": date(2024, 9, 4),
|
||||
"effective_date": date(2024, 9, 4),
|
||||
"level": "federal",
|
||||
"category": "monetary",
|
||||
"title": "BoC Rate Cut (25bp)",
|
||||
"description": "Bank of Canada cuts overnight rate by 25 basis points to 4.25%",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
{
|
||||
"event_date": date(2024, 10, 23),
|
||||
"effective_date": date(2024, 10, 23),
|
||||
"level": "federal",
|
||||
"category": "monetary",
|
||||
"title": "BoC Rate Cut (50bp)",
|
||||
"description": "Bank of Canada cuts overnight rate by 50 basis points to 3.75%",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
{
|
||||
"event_date": date(2024, 12, 11),
|
||||
"effective_date": date(2024, 12, 11),
|
||||
"level": "federal",
|
||||
"category": "monetary",
|
||||
"title": "BoC Rate Cut (50bp)",
|
||||
"description": "Bank of Canada cuts overnight rate by 50 basis points to 3.25%",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
{
|
||||
"event_date": date(2024, 9, 16),
|
||||
"effective_date": date(2024, 12, 15),
|
||||
"level": "federal",
|
||||
"category": "regulatory",
|
||||
"title": "CMHC 30-Year Amortization",
|
||||
"description": "30-year amortization extended to all first-time buyers and new builds",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
{
|
||||
"event_date": date(2024, 9, 16),
|
||||
"effective_date": date(2024, 12, 15),
|
||||
"level": "federal",
|
||||
"category": "regulatory",
|
||||
"title": "Insured Mortgage Cap $1.5M",
|
||||
"description": "Insured mortgage cap raised from $1M to $1.5M",
|
||||
"expected_direction": "bullish",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
def get_demo_summary_metrics() -> dict[str, dict[str, Any]]:
|
||||
"""Return summary metrics for KPI cards."""
|
||||
return {
|
||||
"avg_price": {
|
||||
"value": 1067968,
|
||||
"title": "Avg. Price (2025)",
|
||||
"delta": -4.7,
|
||||
"delta_suffix": "%",
|
||||
"prefix": "$",
|
||||
"format_spec": ",.0f",
|
||||
"positive_is_good": True,
|
||||
},
|
||||
"total_sales": {
|
||||
"value": 67610,
|
||||
"title": "Total Sales (2024)",
|
||||
"delta": 2.6,
|
||||
"delta_suffix": "%",
|
||||
"format_spec": ",.0f",
|
||||
"positive_is_good": True,
|
||||
},
|
||||
"avg_rent": {
|
||||
"value": 2450,
|
||||
"title": "Avg. Rent (2025)",
|
||||
"delta": 3.2,
|
||||
"delta_suffix": "%",
|
||||
"prefix": "$",
|
||||
"format_spec": ",.0f",
|
||||
"positive_is_good": False,
|
||||
},
|
||||
"vacancy_rate": {
|
||||
"value": 1.8,
|
||||
"title": "Vacancy Rate",
|
||||
"delta": -0.4,
|
||||
"delta_suffix": "pp",
|
||||
"suffix": "%",
|
||||
"format_spec": ".1f",
|
||||
"positive_is_good": False,
|
||||
},
|
||||
}
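
For local development these helpers can feed the figure factories directly; a minimal sketch:

```python
# Sketch only: wire the demo KPI metrics into the summary-card factory
# (both modules are added in this changeset).
from portfolio_app.figures import create_summary_metrics
from portfolio_app.toronto.demo_data import get_demo_summary_metrics

kpi_figures = create_summary_metrics(get_demo_summary_metrics())
```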
|
||||
@@ -1 +1,32 @@
|
||||
"""Database loaders for Toronto housing data."""
|
||||
|
||||
from .base import bulk_insert, get_session, upsert_by_key
|
||||
from .cmhc import load_cmhc_record, load_cmhc_rentals
|
||||
from .dimensions import (
|
||||
generate_date_key,
|
||||
load_cmhc_zones,
|
||||
load_neighbourhoods,
|
||||
load_policy_events,
|
||||
load_time_dimension,
|
||||
load_trreb_districts,
|
||||
)
|
||||
from .trreb import load_trreb_purchases, load_trreb_record
|
||||
|
||||
__all__ = [
|
||||
# Base utilities
|
||||
"get_session",
|
||||
"bulk_insert",
|
||||
"upsert_by_key",
|
||||
# Dimension loaders
|
||||
"generate_date_key",
|
||||
"load_time_dimension",
|
||||
"load_trreb_districts",
|
||||
"load_cmhc_zones",
|
||||
"load_neighbourhoods",
|
||||
"load_policy_events",
|
||||
# Fact loaders
|
||||
"load_trreb_purchases",
|
||||
"load_trreb_record",
|
||||
"load_cmhc_rentals",
|
||||
"load_cmhc_record",
|
||||
]
|
||||
|
||||
85
portfolio_app/toronto/loaders/base.py
Normal file
@@ -0,0 +1,85 @@
|
||||
"""Base loader utilities for database operations."""
|
||||
|
||||
from collections.abc import Generator
|
||||
from contextlib import contextmanager
|
||||
from typing import Any, TypeVar
|
||||
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from portfolio_app.toronto.models import get_session_factory
|
||||
|
||||
T = TypeVar("T")
|
||||
|
||||
|
||||
@contextmanager
|
||||
def get_session() -> Generator[Session, None, None]:
|
||||
"""Get a database session with automatic cleanup.
|
||||
|
||||
Yields:
|
||||
SQLAlchemy session that auto-commits on success, rollbacks on error.
|
||||
"""
|
||||
session_factory = get_session_factory()
|
||||
session = session_factory()
|
||||
try:
|
||||
yield session
|
||||
session.commit()
|
||||
except Exception:
|
||||
session.rollback()
|
||||
raise
|
||||
finally:
|
||||
session.close()
|
||||
|
||||
|
||||
def bulk_insert(session: Session, objects: list[T]) -> int:
|
||||
"""Bulk insert objects into the database.
|
||||
|
||||
Args:
|
||||
session: Active SQLAlchemy session.
|
||||
objects: List of ORM model instances to insert.
|
||||
|
||||
Returns:
|
||||
Number of objects inserted.
|
||||
"""
|
||||
session.add_all(objects)
|
||||
session.flush()
|
||||
return len(objects)
|
||||
|
||||
|
||||
def upsert_by_key(
|
||||
session: Session,
|
||||
model_class: Any,
|
||||
objects: list[T],
|
||||
key_columns: list[str],
|
||||
) -> tuple[int, int]:
|
||||
"""Upsert objects based on unique key columns.
|
||||
|
||||
Args:
|
||||
session: Active SQLAlchemy session.
|
||||
model_class: The ORM model class.
|
||||
objects: List of ORM model instances to upsert.
|
||||
key_columns: Column names that form the unique key.
|
||||
|
||||
Returns:
|
||||
Tuple of (inserted_count, updated_count).
|
||||
"""
|
||||
inserted = 0
|
||||
updated = 0
|
||||
|
||||
for obj in objects:
|
||||
# Build filter for existing record
|
||||
filters = {col: getattr(obj, col) for col in key_columns}
|
||||
existing = session.query(model_class).filter_by(**filters).first()
|
||||
|
||||
if existing:
|
||||
# Update existing record
|
||||
for column in model_class.__table__.columns:
|
||||
if column.name not in key_columns and column.name != "id":
|
||||
setattr(existing, column.name, getattr(obj, column.name))
|
||||
updated += 1
|
||||
else:
|
||||
# Insert new record
|
||||
session.add(obj)
|
||||
inserted += 1
|
||||
|
||||
session.flush()
|
||||
return inserted, updated
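
A hedged sketch of the session and upsert helpers; it assumes a reachable database configured through the app settings and uses the DimTime columns populated by the dimension loader later in this diff:

```python
# Sketch: upsert a single DimTime row by its natural key (date_key).
from datetime import date

from portfolio_app.toronto.loaders import get_session, upsert_by_key
from portfolio_app.toronto.models import DimTime

row = DimTime(
    date_key=20250101,
    full_date=date(2025, 1, 1),
    year=2025,
    month=1,
    quarter=1,
    month_name="January",
    is_month_start=True,
)

with get_session() as session:
    inserted, updated = upsert_by_key(session, DimTime, [row], ["date_key"])
```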
|
||||
137
portfolio_app/toronto/loaders/cmhc.py
Normal file
@@ -0,0 +1,137 @@
|
||||
"""Loader for CMHC rental data into fact_rentals."""
|
||||
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from portfolio_app.toronto.models import DimCMHCZone, DimTime, FactRentals
|
||||
from portfolio_app.toronto.schemas import CMHCAnnualSurvey, CMHCRentalRecord
|
||||
|
||||
from .base import get_session, upsert_by_key
|
||||
from .dimensions import generate_date_key
|
||||
|
||||
|
||||
def load_cmhc_rentals(
|
||||
survey: CMHCAnnualSurvey,
|
||||
session: Session | None = None,
|
||||
) -> int:
|
||||
"""Load CMHC annual survey data into fact_rentals.
|
||||
|
||||
Args:
|
||||
survey: Validated CMHC annual survey containing records.
|
||||
session: Optional existing session.
|
||||
|
||||
Returns:
|
||||
Number of records loaded.
|
||||
"""
|
||||
from datetime import date
|
||||
|
||||
def _load(sess: Session) -> int:
|
||||
# Get zone key mapping
|
||||
zones = sess.query(DimCMHCZone).all()
|
||||
zone_map = {z.zone_code: z.zone_key for z in zones}
|
||||
|
||||
# CMHC surveys are annual - use October 1st as reference date
|
||||
survey_date = date(survey.survey_year, 10, 1)
|
||||
date_key = generate_date_key(survey_date)
|
||||
|
||||
# Verify time dimension exists
|
||||
time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
|
||||
if not time_dim:
|
||||
raise ValueError(
|
||||
f"Time dimension not found for date_key {date_key}. "
|
||||
"Load time dimension first."
|
||||
)
|
||||
|
||||
records = []
|
||||
for record in survey.records:
|
||||
zone_key = zone_map.get(record.zone_code)
|
||||
if not zone_key:
|
||||
# Skip records for unknown zones
|
||||
continue
|
||||
|
||||
fact = FactRentals(
|
||||
date_key=date_key,
|
||||
zone_key=zone_key,
|
||||
bedroom_type=record.bedroom_type.value,
|
||||
universe=record.universe,
|
||||
avg_rent=record.average_rent,
|
||||
median_rent=record.median_rent,
|
||||
vacancy_rate=record.vacancy_rate,
|
||||
availability_rate=record.availability_rate,
|
||||
turnover_rate=record.turnover_rate,
|
||||
rent_change_pct=record.rent_change_pct,
|
||||
reliability_code=record.average_rent_reliability.value
|
||||
if record.average_rent_reliability
|
||||
else None,
|
||||
)
|
||||
records.append(fact)
|
||||
|
||||
inserted, updated = upsert_by_key(
|
||||
sess, FactRentals, records, ["date_key", "zone_key", "bedroom_type"]
|
||||
)
|
||||
return inserted + updated
|
||||
|
||||
if session:
|
||||
return _load(session)
|
||||
with get_session() as sess:
|
||||
return _load(sess)
|
||||
|
||||
|
||||
def load_cmhc_record(
|
||||
record: CMHCRentalRecord,
|
||||
survey_year: int,
|
||||
session: Session | None = None,
|
||||
) -> int:
|
||||
"""Load a single CMHC record into fact_rentals.
|
||||
|
||||
Args:
|
||||
record: Single validated CMHC rental record.
|
||||
survey_year: Year of the survey.
|
||||
session: Optional existing session.
|
||||
|
||||
Returns:
|
||||
Number of records loaded (0 or 1).
|
||||
"""
|
||||
from datetime import date
|
||||
|
||||
def _load(sess: Session) -> int:
|
||||
# Get zone key
|
||||
zone = sess.query(DimCMHCZone).filter_by(zone_code=record.zone_code).first()
|
||||
if not zone:
|
||||
return 0
|
||||
|
||||
survey_date = date(survey_year, 10, 1)
|
||||
date_key = generate_date_key(survey_date)
|
||||
|
||||
# Verify time dimension exists
|
||||
time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
|
||||
if not time_dim:
|
||||
raise ValueError(
|
||||
f"Time dimension not found for date_key {date_key}. "
|
||||
"Load time dimension first."
|
||||
)
|
||||
|
||||
fact = FactRentals(
|
||||
date_key=date_key,
|
||||
zone_key=zone.zone_key,
|
||||
bedroom_type=record.bedroom_type.value,
|
||||
universe=record.universe,
|
||||
avg_rent=record.average_rent,
|
||||
median_rent=record.median_rent,
|
||||
vacancy_rate=record.vacancy_rate,
|
||||
availability_rate=record.availability_rate,
|
||||
turnover_rate=record.turnover_rate,
|
||||
rent_change_pct=record.rent_change_pct,
|
||||
reliability_code=record.average_rent_reliability.value
|
||||
if record.average_rent_reliability
|
||||
else None,
|
||||
)
|
||||
|
||||
inserted, updated = upsert_by_key(
|
||||
sess, FactRentals, [fact], ["date_key", "zone_key", "bedroom_type"]
|
||||
)
|
||||
return inserted + updated
|
||||
|
||||
if session:
|
||||
return _load(session)
|
||||
with get_session() as sess:
|
||||
return _load(sess)
|
||||
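For reference, a minimal sketch of driving this loader with a hand-built survey (the zone code, zone name, and dollar values are illustrative); in practice the survey would come from the CMHC CSV parser further down in this diff, and `dim_cmhc_zone` plus `dim_time` must already be populated:

```python
from portfolio_app.toronto.loaders.cmhc import load_cmhc_rentals
from portfolio_app.toronto.schemas import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord

survey = CMHCAnnualSurvey(
    survey_year=2023,
    records=[
        CMHCRentalRecord(
            survey_year=2023,
            zone_code="Z01",           # illustrative zone code
            zone_name="West Toronto",  # illustrative zone name
            bedroom_type=BedroomType.TWO_BED,
            average_rent=1850,
            vacancy_rate=1.7,
        )
    ],
)

# Upserts into fact_rentals keyed on (date_key, zone_key, bedroom_type)
loaded = load_cmhc_rentals(survey)
print(f"Loaded {loaded} rental records for {survey.survey_year}")
```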
251
portfolio_app/toronto/loaders/dimensions.py
Normal file
@@ -0,0 +1,251 @@
"""Loaders for dimension tables."""

from datetime import date

from sqlalchemy.orm import Session

from portfolio_app.toronto.models import (
    DimCMHCZone,
    DimNeighbourhood,
    DimPolicyEvent,
    DimTime,
    DimTRREBDistrict,
)
from portfolio_app.toronto.schemas import (
    CMHCZone,
    Neighbourhood,
    PolicyEvent,
    TRREBDistrict,
)

from .base import get_session, upsert_by_key


def generate_date_key(d: date) -> int:
    """Generate integer date key from date (YYYYMMDD format).

    Args:
        d: Date to convert.

    Returns:
        Integer in YYYYMMDD format.
    """
    return d.year * 10000 + d.month * 100 + d.day


def load_time_dimension(
    start_date: date,
    end_date: date,
    session: Session | None = None,
) -> int:
    """Load time dimension with date range.

    Args:
        start_date: Start of date range.
        end_date: End of date range (inclusive).
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    month_names = [
        "",
        "January",
        "February",
        "March",
        "April",
        "May",
        "June",
        "July",
        "August",
        "September",
        "October",
        "November",
        "December",
    ]

    def _load(sess: Session) -> int:
        records = []
        current = start_date.replace(day=1)  # Start at month beginning

        while current <= end_date:
            quarter = (current.month - 1) // 3 + 1
            dim = DimTime(
                date_key=generate_date_key(current),
                full_date=current,
                year=current.year,
                month=current.month,
                quarter=quarter,
                month_name=month_names[current.month],
                is_month_start=True,
            )
            records.append(dim)

            # Move to next month
            if current.month == 12:
                current = current.replace(year=current.year + 1, month=1)
            else:
                current = current.replace(month=current.month + 1)

        inserted, updated = upsert_by_key(sess, DimTime, records, ["date_key"])
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_trreb_districts(
    districts: list[TRREBDistrict],
    session: Session | None = None,
) -> int:
    """Load TRREB district dimension.

    Args:
        districts: List of validated district schemas.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        records = []
        for d in districts:
            dim = DimTRREBDistrict(
                district_code=d.district_code,
                district_name=d.district_name,
                area_type=d.area_type.value,
                geometry=d.geometry_wkt,
            )
            records.append(dim)

        inserted, updated = upsert_by_key(
            sess, DimTRREBDistrict, records, ["district_code"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_cmhc_zones(
    zones: list[CMHCZone],
    session: Session | None = None,
) -> int:
    """Load CMHC zone dimension.

    Args:
        zones: List of validated zone schemas.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        records = []
        for z in zones:
            dim = DimCMHCZone(
                zone_code=z.zone_code,
                zone_name=z.zone_name,
                geometry=z.geometry_wkt,
            )
            records.append(dim)

        inserted, updated = upsert_by_key(sess, DimCMHCZone, records, ["zone_code"])
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_neighbourhoods(
    neighbourhoods: list[Neighbourhood],
    session: Session | None = None,
) -> int:
    """Load neighbourhood dimension.

    Args:
        neighbourhoods: List of validated neighbourhood schemas.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        records = []
        for n in neighbourhoods:
            dim = DimNeighbourhood(
                neighbourhood_id=n.neighbourhood_id,
                name=n.name,
                geometry=n.geometry_wkt,
                population=n.population,
                land_area_sqkm=n.land_area_sqkm,
                pop_density_per_sqkm=n.pop_density_per_sqkm,
                pct_bachelors_or_higher=n.pct_bachelors_or_higher,
                median_household_income=n.median_household_income,
                pct_owner_occupied=n.pct_owner_occupied,
                pct_renter_occupied=n.pct_renter_occupied,
                census_year=n.census_year,
            )
            records.append(dim)

        inserted, updated = upsert_by_key(
            sess, DimNeighbourhood, records, ["neighbourhood_id"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_policy_events(
    events: list[PolicyEvent],
    session: Session | None = None,
) -> int:
    """Load policy event dimension.

    Args:
        events: List of validated policy event schemas.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        records = []
        for e in events:
            dim = DimPolicyEvent(
                event_date=e.event_date,
                effective_date=e.effective_date,
                level=e.level.value,
                category=e.category.value,
                title=e.title,
                description=e.description,
                expected_direction=e.expected_direction.value,
                source_url=e.source_url,
                confidence=e.confidence.value,
            )
            records.append(dim)

        # For policy events, use event_date + title as unique key
        inserted, updated = upsert_by_key(
            sess, DimPolicyEvent, records, ["event_date", "title"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)
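As a quick check of the key convention, `generate_date_key` maps October 1, 2023 to `20231001` (2023 × 10000 + 10 × 100 + 1), which is why the time dimension must cover every report or survey date before any fact load. A minimal sketch, with an assumed analysis window:

```python
from datetime import date

from portfolio_app.toronto.loaders.dimensions import (
    generate_date_key,
    load_time_dimension,
)

assert generate_date_key(date(2023, 10, 1)) == 20231001

# Populate dim_time with one row per month over an assumed window
count = load_time_dimension(start_date=date(2015, 1, 1), end_date=date(2025, 12, 1))
print(f"Time dimension rows upserted: {count}")
```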
129
portfolio_app/toronto/loaders/trreb.py
Normal file
@@ -0,0 +1,129 @@
"""Loader for TRREB purchase data into fact_purchases."""

from sqlalchemy.orm import Session

from portfolio_app.toronto.models import DimTime, DimTRREBDistrict, FactPurchases
from portfolio_app.toronto.schemas import TRREBMonthlyRecord, TRREBMonthlyReport

from .base import get_session, upsert_by_key
from .dimensions import generate_date_key


def load_trreb_purchases(
    report: TRREBMonthlyReport,
    session: Session | None = None,
) -> int:
    """Load TRREB monthly report data into fact_purchases.

    Args:
        report: Validated TRREB monthly report containing records.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        # Get district key mapping
        districts = sess.query(DimTRREBDistrict).all()
        district_map = {d.district_code: d.district_key for d in districts}

        # Build date key from report date
        date_key = generate_date_key(report.report_date)

        # Verify time dimension exists
        time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
        if not time_dim:
            raise ValueError(
                f"Time dimension not found for date_key {date_key}. "
                "Load time dimension first."
            )

        records = []
        for record in report.records:
            district_key = district_map.get(record.area_code)
            if not district_key:
                # Skip records for unknown districts (e.g., aggregate rows)
                continue

            fact = FactPurchases(
                date_key=date_key,
                district_key=district_key,
                sales_count=record.sales,
                dollar_volume=record.dollar_volume,
                avg_price=record.avg_price,
                median_price=record.median_price,
                new_listings=record.new_listings,
                active_listings=record.active_listings,
                avg_dom=record.avg_dom,
                avg_sp_lp=record.avg_sp_lp,
            )
            records.append(fact)

        inserted, updated = upsert_by_key(
            sess, FactPurchases, records, ["date_key", "district_key"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_trreb_record(
    record: TRREBMonthlyRecord,
    session: Session | None = None,
) -> int:
    """Load a single TRREB record into fact_purchases.

    Args:
        record: Single validated TRREB monthly record.
        session: Optional existing session.

    Returns:
        Number of records loaded (0 or 1).
    """

    def _load(sess: Session) -> int:
        # Get district key
        district = (
            sess.query(DimTRREBDistrict)
            .filter_by(district_code=record.area_code)
            .first()
        )
        if not district:
            return 0

        date_key = generate_date_key(record.report_date)

        # Verify time dimension exists
        time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
        if not time_dim:
            raise ValueError(
                f"Time dimension not found for date_key {date_key}. "
                "Load time dimension first."
            )

        fact = FactPurchases(
            date_key=date_key,
            district_key=district.district_key,
            sales_count=record.sales,
            dollar_volume=record.dollar_volume,
            avg_price=record.avg_price,
            median_price=record.median_price,
            new_listings=record.new_listings,
            active_listings=record.active_listings,
            avg_dom=record.avg_dom,
            avg_sp_lp=record.avg_sp_lp,
        )

        inserted, updated = upsert_by_key(
            sess, FactPurchases, [fact], ["date_key", "district_key"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)
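The purchase side follows the same dependency order as the rental side: districts and the time dimension first, then the facts. A hedged sketch of that order (the GeoJSON path is hypothetical, and the final two calls are commented out because `TRREBParser.parse()` is still a stub in this diff):

```python
from datetime import date
from pathlib import Path

from portfolio_app.toronto.loaders.dimensions import (
    load_time_dimension,
    load_trreb_districts,
)
from portfolio_app.toronto.parsers import TRREBDistrictParser

# 1. District dimension from a manually digitized boundary file (path assumed)
districts = TRREBDistrictParser(
    Path("data/toronto/raw/geo/trreb_districts.geojson")
).parse()
load_trreb_districts(districts)

# 2. Time dimension covering the reporting window
load_time_dimension(start_date=date(2015, 1, 1), end_date=date(2025, 12, 1))

# 3. Facts, once PDF parsing lands in Sprint 4:
# report = TRREBParser(Path("data/toronto/raw/trreb_2024_01.pdf")).parse()
# load_trreb_purchases(report)
```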
@@ -1 +1,28 @@
"""SQLAlchemy models for Toronto housing data."""

from .base import Base, create_tables, get_engine, get_session_factory
from .dimensions import (
    DimCMHCZone,
    DimNeighbourhood,
    DimPolicyEvent,
    DimTime,
    DimTRREBDistrict,
)
from .facts import FactPurchases, FactRentals

__all__ = [
    # Base
    "Base",
    "get_engine",
    "get_session_factory",
    "create_tables",
    # Dimensions
    "DimTime",
    "DimTRREBDistrict",
    "DimCMHCZone",
    "DimNeighbourhood",
    "DimPolicyEvent",
    # Facts
    "FactPurchases",
    "FactRentals",
]
30
portfolio_app/toronto/models/base.py
Normal file
@@ -0,0 +1,30 @@
"""SQLAlchemy base configuration and engine setup."""

from sqlalchemy import Engine, create_engine
from sqlalchemy.orm import DeclarativeBase, Session, sessionmaker

from portfolio_app.config import get_settings


class Base(DeclarativeBase):  # type: ignore[misc]
    """Base class for all SQLAlchemy models."""

    pass


def get_engine() -> Engine:
    """Create database engine from settings."""
    settings = get_settings()
    return create_engine(settings.database_url, echo=False)


def get_session_factory() -> sessionmaker[Session]:
    """Create session factory."""
    engine = get_engine()
    return sessionmaker(bind=engine)


def create_tables() -> None:
    """Create all tables in database."""
    engine = get_engine()
    Base.metadata.create_all(engine)
104
portfolio_app/toronto/models/dimensions.py
Normal file
@@ -0,0 +1,104 @@
"""SQLAlchemy models for dimension tables."""

from datetime import date

from geoalchemy2 import Geometry
from sqlalchemy import Boolean, Date, Integer, Numeric, String, Text
from sqlalchemy.orm import Mapped, mapped_column

from .base import Base


class DimTime(Base):
    """Time dimension table."""

    __tablename__ = "dim_time"

    date_key: Mapped[int] = mapped_column(Integer, primary_key=True)
    full_date: Mapped[date] = mapped_column(Date, nullable=False, unique=True)
    year: Mapped[int] = mapped_column(Integer, nullable=False)
    month: Mapped[int] = mapped_column(Integer, nullable=False)
    quarter: Mapped[int] = mapped_column(Integer, nullable=False)
    month_name: Mapped[str] = mapped_column(String(20), nullable=False)
    is_month_start: Mapped[bool] = mapped_column(Boolean, default=True)


class DimTRREBDistrict(Base):
    """TRREB district dimension table with PostGIS geometry."""

    __tablename__ = "dim_trreb_district"

    district_key: Mapped[int] = mapped_column(
        Integer, primary_key=True, autoincrement=True
    )
    district_code: Mapped[str] = mapped_column(String(3), nullable=False, unique=True)
    district_name: Mapped[str] = mapped_column(String(100), nullable=False)
    area_type: Mapped[str] = mapped_column(String(10), nullable=False)
    geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)


class DimCMHCZone(Base):
    """CMHC zone dimension table with PostGIS geometry."""

    __tablename__ = "dim_cmhc_zone"

    zone_key: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    zone_code: Mapped[str] = mapped_column(String(10), nullable=False, unique=True)
    zone_name: Mapped[str] = mapped_column(String(100), nullable=False)
    geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)


class DimNeighbourhood(Base):
    """City of Toronto neighbourhood dimension.

    Note: No FK to fact tables in V1 - reference overlay only.
    """

    __tablename__ = "dim_neighbourhood"

    neighbourhood_id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String(100), nullable=False)
    geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)
    population: Mapped[int | None] = mapped_column(Integer, nullable=True)
    land_area_sqkm: Mapped[float | None] = mapped_column(Numeric(10, 4), nullable=True)
    pop_density_per_sqkm: Mapped[float | None] = mapped_column(
        Numeric(10, 2), nullable=True
    )
    pct_bachelors_or_higher: Mapped[float | None] = mapped_column(
        Numeric(5, 2), nullable=True
    )
    median_household_income: Mapped[float | None] = mapped_column(
        Numeric(12, 2), nullable=True
    )
    pct_owner_occupied: Mapped[float | None] = mapped_column(
        Numeric(5, 2), nullable=True
    )
    pct_renter_occupied: Mapped[float | None] = mapped_column(
        Numeric(5, 2), nullable=True
    )
    census_year: Mapped[int] = mapped_column(Integer, default=2021)


class DimPolicyEvent(Base):
    """Policy event dimension for time-series annotation."""

    __tablename__ = "dim_policy_event"

    event_id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    event_date: Mapped[date] = mapped_column(Date, nullable=False)
    effective_date: Mapped[date | None] = mapped_column(Date, nullable=True)
    level: Mapped[str] = mapped_column(
        String(20), nullable=False
    )  # federal/provincial/municipal
    category: Mapped[str] = mapped_column(
        String(20), nullable=False
    )  # monetary/tax/regulatory/supply/economic
    title: Mapped[str] = mapped_column(String(200), nullable=False)
    description: Mapped[str | None] = mapped_column(Text, nullable=True)
    expected_direction: Mapped[str] = mapped_column(
        String(10), nullable=False
    )  # bearish/bullish/neutral
    source_url: Mapped[str | None] = mapped_column(String(500), nullable=True)
    confidence: Mapped[str] = mapped_column(
        String(10), default="medium"
    )  # high/medium/low
69
portfolio_app/toronto/models/facts.py
Normal file
@@ -0,0 +1,69 @@
"""SQLAlchemy models for fact tables."""

from sqlalchemy import ForeignKey, Integer, Numeric, String
from sqlalchemy.orm import Mapped, mapped_column, relationship

from .base import Base


class FactPurchases(Base):
    """Fact table for TRREB purchase/sales data.

    Grain: One row per district per month.
    """

    __tablename__ = "fact_purchases"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    date_key: Mapped[int] = mapped_column(
        Integer, ForeignKey("dim_time.date_key"), nullable=False
    )
    district_key: Mapped[int] = mapped_column(
        Integer, ForeignKey("dim_trreb_district.district_key"), nullable=False
    )
    sales_count: Mapped[int] = mapped_column(Integer, nullable=False)
    dollar_volume: Mapped[float] = mapped_column(Numeric(15, 2), nullable=False)
    avg_price: Mapped[float] = mapped_column(Numeric(12, 2), nullable=False)
    median_price: Mapped[float] = mapped_column(Numeric(12, 2), nullable=False)
    new_listings: Mapped[int] = mapped_column(Integer, nullable=False)
    active_listings: Mapped[int] = mapped_column(Integer, nullable=False)
    avg_dom: Mapped[int] = mapped_column(Integer, nullable=False)  # Days on market
    avg_sp_lp: Mapped[float] = mapped_column(
        Numeric(5, 2), nullable=False
    )  # Sale/List ratio

    # Relationships
    time = relationship("DimTime", backref="purchases")
    district = relationship("DimTRREBDistrict", backref="purchases")


class FactRentals(Base):
    """Fact table for CMHC rental market data.

    Grain: One row per zone per bedroom type per survey year.
    """

    __tablename__ = "fact_rentals"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    date_key: Mapped[int] = mapped_column(
        Integer, ForeignKey("dim_time.date_key"), nullable=False
    )
    zone_key: Mapped[int] = mapped_column(
        Integer, ForeignKey("dim_cmhc_zone.zone_key"), nullable=False
    )
    bedroom_type: Mapped[str] = mapped_column(String(20), nullable=False)
    universe: Mapped[int | None] = mapped_column(Integer, nullable=True)
    avg_rent: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
    median_rent: Mapped[float | None] = mapped_column(Numeric(10, 2), nullable=True)
    vacancy_rate: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
    availability_rate: Mapped[float | None] = mapped_column(
        Numeric(5, 2), nullable=True
    )
    turnover_rate: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
    rent_change_pct: Mapped[float | None] = mapped_column(Numeric(5, 2), nullable=True)
    reliability_code: Mapped[str | None] = mapped_column(String(2), nullable=True)

    # Relationships
    time = relationship("DimTime", backref="rentals")
    zone = relationship("DimCMHCZone", backref="rentals")
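A short sketch of how the star schema is meant to be queried once loaded, joining `fact_rentals` to its zone dimension with the SQLAlchemy 2.0 `select()` API (the bedroom-type filter value is just an example):

```python
from sqlalchemy import select

from portfolio_app.toronto.models import DimCMHCZone, FactRentals, get_session_factory

SessionLocal = get_session_factory()

with SessionLocal() as session:
    stmt = (
        select(DimCMHCZone.zone_name, FactRentals.avg_rent, FactRentals.vacancy_rate)
        .join(DimCMHCZone, FactRentals.zone_key == DimCMHCZone.zone_key)
        .where(FactRentals.bedroom_type == "2 Bedroom")
        .order_by(FactRentals.avg_rent.desc())
    )
    for zone_name, avg_rent, vacancy_rate in session.execute(stmt):
        print(zone_name, avg_rent, vacancy_rate)
```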
@@ -1 +1,20 @@
"""Data parsers for Toronto housing data sources."""
"""Parsers for Toronto housing data sources."""

from .cmhc import CMHCParser
from .geo import (
    CMHCZoneParser,
    NeighbourhoodParser,
    TRREBDistrictParser,
    load_geojson,
)
from .trreb import TRREBParser

__all__ = [
    "TRREBParser",
    "CMHCParser",
    # GeoJSON parsers
    "CMHCZoneParser",
    "TRREBDistrictParser",
    "NeighbourhoodParser",
    "load_geojson",
]
147
portfolio_app/toronto/parsers/cmhc.py
Normal file
@@ -0,0 +1,147 @@
"""CMHC CSV processor for rental market survey data.

This module provides the structure for processing CMHC (Canada Mortgage and Housing
Corporation) rental market survey data from CSV exports.
"""

from pathlib import Path
from typing import Any, cast

import pandas as pd

from portfolio_app.toronto.schemas import CMHCAnnualSurvey, CMHCRentalRecord


class CMHCParser:
    """Parser for CMHC Rental Market Survey CSV data.

    CMHC conducts annual rental market surveys and publishes data including:
    - Average and median rents by zone and bedroom type
    - Vacancy rates
    - Universe (total rental units)
    - Year-over-year rent changes

    Data is available via the Housing Market Information Portal as CSV exports.
    """

    # Expected columns in CMHC CSV exports
    REQUIRED_COLUMNS = {
        "zone_code",
        "zone_name",
        "bedroom_type",
        "survey_year",
    }

    # Column name mappings from CMHC export format
    COLUMN_MAPPINGS = {
        "Zone Code": "zone_code",
        "Zone Name": "zone_name",
        "Bedroom Type": "bedroom_type",
        "Survey Year": "survey_year",
        "Universe": "universe",
        "Average Rent ($)": "avg_rent",
        "Median Rent ($)": "median_rent",
        "Vacancy Rate (%)": "vacancy_rate",
        "Availability Rate (%)": "availability_rate",
        "Turnover Rate (%)": "turnover_rate",
        "% Change in Rent": "rent_change_pct",
        "Reliability Code": "reliability_code",
    }

    def __init__(self, csv_path: Path) -> None:
        """Initialize parser with path to CSV file.

        Args:
            csv_path: Path to the CMHC CSV export file.
        """
        self.csv_path = csv_path
        self._validate_path()

    def _validate_path(self) -> None:
        """Validate that the CSV path exists and is readable."""
        if not self.csv_path.exists():
            raise FileNotFoundError(f"CSV not found: {self.csv_path}")
        if not self.csv_path.suffix.lower() == ".csv":
            raise ValueError(f"Expected CSV file, got: {self.csv_path.suffix}")

    def parse(self) -> CMHCAnnualSurvey:
        """Parse the CSV and return structured data.

        Returns:
            CMHCAnnualSurvey containing all extracted records.

        Raises:
            ValueError: If required columns are missing.
        """
        df = self._load_csv()
        df = self._normalize_columns(df)
        self._validate_columns(df)
        records = self._convert_to_records(df)
        survey_year = self._infer_survey_year(df)

        return CMHCAnnualSurvey(survey_year=survey_year, records=records)

    def _load_csv(self) -> pd.DataFrame:
        """Load CSV file into DataFrame.

        Returns:
            Raw DataFrame from CSV.
        """
        return pd.read_csv(self.csv_path)

    def _normalize_columns(self, df: pd.DataFrame) -> pd.DataFrame:
        """Normalize column names to standard format.

        Args:
            df: DataFrame with original column names.

        Returns:
            DataFrame with normalized column names.
        """
        rename_map = {k: v for k, v in self.COLUMN_MAPPINGS.items() if k in df.columns}
        return df.rename(columns=rename_map)

    def _validate_columns(self, df: pd.DataFrame) -> None:
        """Validate that all required columns are present.

        Args:
            df: DataFrame to validate.

        Raises:
            ValueError: If required columns are missing.
        """
        missing = self.REQUIRED_COLUMNS - set(df.columns)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")

    def _convert_to_records(self, df: pd.DataFrame) -> list[CMHCRentalRecord]:
        """Convert DataFrame rows to validated schema records.

        Args:
            df: Normalized DataFrame.

        Returns:
            List of validated CMHCRentalRecord objects.
        """
        records = []
        for _, row in df.iterrows():
            record_data = row.to_dict()
            # Handle NaN values
            record_data = {
                k: (None if pd.isna(v) else v) for k, v in record_data.items()
            }
            records.append(CMHCRentalRecord(**cast(dict[str, Any], record_data)))
        return records

    def _infer_survey_year(self, df: pd.DataFrame) -> int:
        """Infer survey year from data.

        Args:
            df: DataFrame with survey_year column.

        Returns:
            Survey year as integer.
        """
        if "survey_year" in df.columns:
            return int(df["survey_year"].iloc[0])
        raise ValueError("Cannot infer survey year from data.")
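A minimal sketch of the parser in use; the CSV path is a hypothetical export location, and `zone_count` comes from the `CMHCAnnualSurvey` schema shown later in this diff:

```python
from pathlib import Path

from portfolio_app.toronto.parsers import CMHCParser

parser = CMHCParser(Path("data/toronto/raw/cmhc_rentals_2023.csv"))  # hypothetical path
survey = parser.parse()
print(survey.survey_year, survey.zone_count, len(survey.records))
```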
463
portfolio_app/toronto/parsers/geo.py
Normal file
@@ -0,0 +1,463 @@
"""GeoJSON parser for geographic boundary files.

This module provides parsers for loading geographic boundary files
(GeoJSON format) and converting them to Pydantic schemas for database
loading or direct use in Plotly choropleth maps.
"""

import json
from pathlib import Path
from typing import Any

from pyproj import Transformer
from shapely.geometry import mapping, shape
from shapely.ops import transform

from portfolio_app.toronto.schemas import CMHCZone, Neighbourhood, TRREBDistrict
from portfolio_app.toronto.schemas.dimensions import AreaType

# Transformer for reprojecting from Web Mercator to WGS84
_TRANSFORMER_3857_TO_4326 = Transformer.from_crs(
    "EPSG:3857", "EPSG:4326", always_xy=True
)


def load_geojson(path: Path) -> dict[str, Any]:
    """Load a GeoJSON file and return as dictionary.

    Args:
        path: Path to the GeoJSON file.

    Returns:
        GeoJSON as dictionary (FeatureCollection).

    Raises:
        FileNotFoundError: If file does not exist.
        ValueError: If file is not valid GeoJSON.
    """
    if not path.exists():
        raise FileNotFoundError(f"GeoJSON file not found: {path}")

    if path.suffix.lower() not in (".geojson", ".json"):
        raise ValueError(f"Expected GeoJSON file, got: {path.suffix}")

    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    if data.get("type") != "FeatureCollection":
        raise ValueError("GeoJSON must be a FeatureCollection")

    return dict(data)


def geometry_to_wkt(geometry: dict[str, Any]) -> str:
    """Convert GeoJSON geometry to WKT string.

    Args:
        geometry: GeoJSON geometry dictionary.

    Returns:
        WKT representation of the geometry.
    """
    return str(shape(geometry).wkt)


def reproject_geometry(
    geometry: dict[str, Any], source_crs: str = "EPSG:3857"
) -> dict[str, Any]:
    """Reproject a GeoJSON geometry to WGS84 (EPSG:4326).

    Args:
        geometry: GeoJSON geometry dictionary.
        source_crs: Source CRS (default EPSG:3857 Web Mercator).

    Returns:
        GeoJSON geometry in WGS84 coordinates.
    """
    if source_crs == "EPSG:3857":
        transformer = _TRANSFORMER_3857_TO_4326
    else:
        transformer = Transformer.from_crs(source_crs, "EPSG:4326", always_xy=True)

    geom = shape(geometry)
    reprojected = transform(transformer.transform, geom)
    return dict(mapping(reprojected))


class CMHCZoneParser:
    """Parser for CMHC zone boundary GeoJSON files.

    CMHC zone boundaries are extracted from the R `cmhc` package using
    `get_cmhc_geography(geography_type="ZONE", cma="Toronto")`.

    Expected GeoJSON properties:
    - zone_code or Zone_Code: Zone identifier
    - zone_name or Zone_Name: Zone name
    """

    # Property name mappings for different GeoJSON formats
    CODE_PROPERTIES = ["zone_code", "Zone_Code", "ZONE_CODE", "zonecode", "code"]
    NAME_PROPERTIES = [
        "zone_name",
        "Zone_Name",
        "ZONE_NAME",
        "ZONE_NAME_EN",
        "NAME_EN",
        "zonename",
        "name",
        "NAME",
    ]

    def __init__(self, geojson_path: Path) -> None:
        """Initialize parser with path to GeoJSON file.

        Args:
            geojson_path: Path to the CMHC zones GeoJSON file.
        """
        self.geojson_path = geojson_path
        self._geojson: dict[str, Any] | None = None

    @property
    def geojson(self) -> dict[str, Any]:
        """Lazy-load and return raw GeoJSON data."""
        if self._geojson is None:
            self._geojson = load_geojson(self.geojson_path)
        return self._geojson

    def _find_property(
        self, properties: dict[str, Any], candidates: list[str]
    ) -> str | None:
        """Find a property value by checking multiple candidate names."""
        for name in candidates:
            if name in properties and properties[name] is not None:
                return str(properties[name])
        return None

    def parse(self) -> list[CMHCZone]:
        """Parse GeoJSON and return list of CMHCZone schemas.

        Returns:
            List of validated CMHCZone objects.

        Raises:
            ValueError: If required properties are missing.
        """
        zones = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            geom = feature.get("geometry")

            zone_code = self._find_property(props, self.CODE_PROPERTIES)
            zone_name = self._find_property(props, self.NAME_PROPERTIES)

            if not zone_code:
                raise ValueError(
                    f"Zone code not found in properties: {list(props.keys())}"
                )
            if not zone_name:
                zone_name = zone_code  # Fallback to code if name missing

            geometry_wkt = geometry_to_wkt(geom) if geom else None

            zones.append(
                CMHCZone(
                    zone_code=zone_code,
                    zone_name=zone_name,
                    geometry_wkt=geometry_wkt,
                )
            )

        return zones

    def _needs_reprojection(self) -> bool:
        """Check if GeoJSON needs reprojection to WGS84."""
        crs = self.geojson.get("crs", {})
        crs_name = crs.get("properties", {}).get("name", "")
        # EPSG:3857 or Web Mercator needs reprojection
        return "3857" in crs_name or "900913" in crs_name

    def get_geojson_for_choropleth(
        self, key_property: str = "zone_code"
    ) -> dict[str, Any]:
        """Get GeoJSON formatted for Plotly choropleth maps.

        Ensures the feature properties include a standardized key for
        joining with data. Automatically reprojects from EPSG:3857 to
        WGS84 if needed.

        Args:
            key_property: Property name to use as feature identifier.

        Returns:
            GeoJSON FeatureCollection with standardized properties in WGS84.
        """
        needs_reproject = self._needs_reprojection()
        features = []

        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            new_props = dict(props)

            # Ensure standardized property names exist
            zone_code = self._find_property(props, self.CODE_PROPERTIES)
            zone_name = self._find_property(props, self.NAME_PROPERTIES)

            new_props["zone_code"] = zone_code
            new_props["zone_name"] = zone_name or zone_code

            # Reproject geometry if needed
            geometry = feature.get("geometry")
            if needs_reproject and geometry:
                geometry = reproject_geometry(geometry)

            features.append(
                {
                    "type": "Feature",
                    "properties": new_props,
                    "geometry": geometry,
                }
            )

        return {"type": "FeatureCollection", "features": features}


class TRREBDistrictParser:
    """Parser for TRREB district boundary GeoJSON files.

    TRREB district boundaries are manually digitized from the TRREB PDF map
    using QGIS.

    Expected GeoJSON properties:
    - district_code: District code (W01, C01, E01, etc.)
    - district_name: District name
    - area_type: West, Central, East, or North
    """

    CODE_PROPERTIES = [
        "district_code",
        "District_Code",
        "DISTRICT_CODE",
        "districtcode",
        "code",
    ]
    NAME_PROPERTIES = [
        "district_name",
        "District_Name",
        "DISTRICT_NAME",
        "districtname",
        "name",
        "NAME",
    ]
    AREA_PROPERTIES = [
        "area_type",
        "Area_Type",
        "AREA_TYPE",
        "areatype",
        "area",
        "type",
    ]

    def __init__(self, geojson_path: Path) -> None:
        """Initialize parser with path to GeoJSON file."""
        self.geojson_path = geojson_path
        self._geojson: dict[str, Any] | None = None

    @property
    def geojson(self) -> dict[str, Any]:
        """Lazy-load and return raw GeoJSON data."""
        if self._geojson is None:
            self._geojson = load_geojson(self.geojson_path)
        return self._geojson

    def _find_property(
        self, properties: dict[str, Any], candidates: list[str]
    ) -> str | None:
        """Find a property value by checking multiple candidate names."""
        for name in candidates:
            if name in properties and properties[name] is not None:
                return str(properties[name])
        return None

    def _infer_area_type(self, district_code: str) -> AreaType:
        """Infer area type from district code prefix."""
        prefix = district_code[0].upper()
        mapping = {"W": AreaType.WEST, "C": AreaType.CENTRAL, "E": AreaType.EAST}
        return mapping.get(prefix, AreaType.NORTH)

    def parse(self) -> list[TRREBDistrict]:
        """Parse GeoJSON and return list of TRREBDistrict schemas."""
        districts = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            geom = feature.get("geometry")

            district_code = self._find_property(props, self.CODE_PROPERTIES)
            district_name = self._find_property(props, self.NAME_PROPERTIES)
            area_type_str = self._find_property(props, self.AREA_PROPERTIES)

            if not district_code:
                raise ValueError(
                    f"District code not found in properties: {list(props.keys())}"
                )
            if not district_name:
                district_name = district_code

            # Infer or parse area type
            if area_type_str:
                try:
                    area_type = AreaType(area_type_str)
                except ValueError:
                    area_type = self._infer_area_type(district_code)
            else:
                area_type = self._infer_area_type(district_code)

            geometry_wkt = geometry_to_wkt(geom) if geom else None

            districts.append(
                TRREBDistrict(
                    district_code=district_code,
                    district_name=district_name,
                    area_type=area_type,
                    geometry_wkt=geometry_wkt,
                )
            )

        return districts

    def get_geojson_for_choropleth(
        self, key_property: str = "district_code"
    ) -> dict[str, Any]:
        """Get GeoJSON formatted for Plotly choropleth maps."""
        features = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            new_props = dict(props)

            district_code = self._find_property(props, self.CODE_PROPERTIES)
            district_name = self._find_property(props, self.NAME_PROPERTIES)

            new_props["district_code"] = district_code
            new_props["district_name"] = district_name or district_code

            features.append(
                {
                    "type": "Feature",
                    "properties": new_props,
                    "geometry": feature.get("geometry"),
                }
            )

        return {"type": "FeatureCollection", "features": features}


class NeighbourhoodParser:
    """Parser for City of Toronto neighbourhood boundary GeoJSON files.

    Neighbourhood boundaries are from the City of Toronto Open Data portal.

    Expected GeoJSON properties:
    - neighbourhood_id or AREA_ID: Neighbourhood ID (1-158)
    - name or AREA_NAME: Neighbourhood name
    """

    ID_PROPERTIES = [
        "neighbourhood_id",
        "AREA_SHORT_CODE",  # City of Toronto 158 neighbourhoods
        "AREA_LONG_CODE",
        "AREA_ID",
        "area_id",
        "id",
        "ID",
        "HOOD_ID",
    ]
    NAME_PROPERTIES = [
        "AREA_NAME",  # City of Toronto 158 neighbourhoods
        "name",
        "NAME",
        "area_name",
        "neighbourhood_name",
    ]

    def __init__(self, geojson_path: Path) -> None:
        """Initialize parser with path to GeoJSON file."""
        self.geojson_path = geojson_path
        self._geojson: dict[str, Any] | None = None

    @property
    def geojson(self) -> dict[str, Any]:
        """Lazy-load and return raw GeoJSON data."""
        if self._geojson is None:
            self._geojson = load_geojson(self.geojson_path)
        return self._geojson

    def _find_property(
        self, properties: dict[str, Any], candidates: list[str]
    ) -> str | None:
        """Find a property value by checking multiple candidate names."""
        for name in candidates:
            if name in properties and properties[name] is not None:
                return str(properties[name])
        return None

    def parse(self) -> list[Neighbourhood]:
        """Parse GeoJSON and return list of Neighbourhood schemas.

        Note: This parser only extracts ID, name, and geometry.
        Census enrichment data (population, income, etc.) should be
        loaded separately and merged.
        """
        neighbourhoods = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            geom = feature.get("geometry")

            neighbourhood_id_str = self._find_property(props, self.ID_PROPERTIES)
            name = self._find_property(props, self.NAME_PROPERTIES)

            if not neighbourhood_id_str:
                raise ValueError(
                    f"Neighbourhood ID not found in properties: {list(props.keys())}"
                )

            neighbourhood_id = int(neighbourhood_id_str)
            if not name:
                name = f"Neighbourhood {neighbourhood_id}"

            geometry_wkt = geometry_to_wkt(geom) if geom else None

            neighbourhoods.append(
                Neighbourhood(
                    neighbourhood_id=neighbourhood_id,
                    name=name,
                    geometry_wkt=geometry_wkt,
                )
            )

        return neighbourhoods

    def get_geojson_for_choropleth(
        self, key_property: str = "neighbourhood_id"
    ) -> dict[str, Any]:
        """Get GeoJSON formatted for Plotly choropleth maps."""
        features = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            new_props = dict(props)

            neighbourhood_id = self._find_property(props, self.ID_PROPERTIES)
            name = self._find_property(props, self.NAME_PROPERTIES)

            new_props["neighbourhood_id"] = (
                int(neighbourhood_id) if neighbourhood_id else None
            )
            new_props["name"] = name

            features.append(
                {
                    "type": "Feature",
                    "properties": new_props,
                    "geometry": feature.get("geometry"),
                }
            )

        return {"type": "FeatureCollection", "features": features}
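A sketch of how `get_geojson_for_choropleth` is intended to feed a Plotly Express choropleth; the GeoJSON path, zone codes, and DataFrame values here are assumptions, and the join key is the standardized `zone_code` property:

```python
from pathlib import Path

import pandas as pd
import plotly.express as px

from portfolio_app.toronto.parsers import CMHCZoneParser

parser = CMHCZoneParser(Path("data/toronto/raw/geo/cmhc_zones.geojson"))  # hypothetical path
geojson = parser.get_geojson_for_choropleth()

# Assumed frame with one value per zone, e.g. average 2-bedroom rent
df = pd.DataFrame({"zone_code": ["Z01", "Z02"], "avg_rent": [1850, 2100]})

fig = px.choropleth(
    df,
    geojson=geojson,
    locations="zone_code",
    featureidkey="properties.zone_code",
    color="avg_rent",
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
```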
82
portfolio_app/toronto/parsers/trreb.py
Normal file
@@ -0,0 +1,82 @@
"""TRREB PDF parser for monthly market watch reports.

This module provides the structure for parsing TRREB (Toronto Regional Real Estate Board)
monthly Market Watch PDF reports into structured data.
"""

from pathlib import Path
from typing import Any

from portfolio_app.toronto.schemas import TRREBMonthlyRecord, TRREBMonthlyReport


class TRREBParser:
    """Parser for TRREB Market Watch PDF reports.

    TRREB publishes monthly Market Watch reports as PDFs containing:
    - Summary statistics by area (416, 905, Total)
    - District-level breakdowns
    - Year-over-year comparisons

    The parser extracts tabular data from these PDFs and validates
    against the TRREBMonthlyRecord schema.
    """

    def __init__(self, pdf_path: Path) -> None:
        """Initialize parser with path to PDF file.

        Args:
            pdf_path: Path to the TRREB Market Watch PDF file.
        """
        self.pdf_path = pdf_path
        self._validate_path()

    def _validate_path(self) -> None:
        """Validate that the PDF path exists and is readable."""
        if not self.pdf_path.exists():
            raise FileNotFoundError(f"PDF not found: {self.pdf_path}")
        if not self.pdf_path.suffix.lower() == ".pdf":
            raise ValueError(f"Expected PDF file, got: {self.pdf_path.suffix}")

    def parse(self) -> TRREBMonthlyReport:
        """Parse the PDF and return structured data.

        Returns:
            TRREBMonthlyReport containing all extracted records.

        Raises:
            NotImplementedError: PDF parsing not yet implemented.
        """
        raise NotImplementedError(
            "PDF parsing requires pdfplumber/tabula-py. "
            "Implementation pending Sprint 4 data ingestion."
        )

    def _extract_tables(self) -> list[dict[str, Any]]:
        """Extract raw tables from PDF pages.

        Returns:
            List of dictionaries representing table data.
        """
        raise NotImplementedError("Table extraction not yet implemented.")

    def _parse_district_table(
        self, table_data: list[dict[str, Any]]
    ) -> list[TRREBMonthlyRecord]:
        """Parse district-level statistics table.

        Args:
            table_data: Raw table data extracted from PDF.

        Returns:
            List of validated TRREBMonthlyRecord objects.
        """
        raise NotImplementedError("District table parsing not yet implemented.")

    def _infer_report_date(self) -> tuple[int, int]:
        """Infer report year and month from PDF filename or content.

        Returns:
            Tuple of (year, month).
        """
        raise NotImplementedError("Date inference not yet implemented.")
@@ -1 +1,39 @@
"""Pydantic schemas for Toronto housing data validation."""

from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
from .dimensions import (
    AreaType,
    CMHCZone,
    Confidence,
    ExpectedDirection,
    Neighbourhood,
    PolicyCategory,
    PolicyEvent,
    PolicyLevel,
    TimeDimension,
    TRREBDistrict,
)
from .trreb import TRREBMonthlyRecord, TRREBMonthlyReport

__all__ = [
    # TRREB
    "TRREBMonthlyRecord",
    "TRREBMonthlyReport",
    # CMHC
    "CMHCRentalRecord",
    "CMHCAnnualSurvey",
    "BedroomType",
    "ReliabilityCode",
    # Dimensions
    "TimeDimension",
    "TRREBDistrict",
    "CMHCZone",
    "Neighbourhood",
    "PolicyEvent",
    # Enums
    "AreaType",
    "PolicyLevel",
    "PolicyCategory",
    "ExpectedDirection",
    "Confidence",
]
81
portfolio_app/toronto/schemas/cmhc.py
Normal file
@@ -0,0 +1,81 @@
"""Pydantic schemas for CMHC rental market data."""

from decimal import Decimal
from enum import Enum

from pydantic import BaseModel, Field


class BedroomType(str, Enum):
    """CMHC bedroom type categories."""

    BACHELOR = "Bachelor"
    ONE_BED = "1 Bedroom"
    TWO_BED = "2 Bedroom"
    THREE_BED_PLUS = "3 Bedroom+"
    TOTAL = "Total"


class ReliabilityCode(str, Enum):
    """CMHC data reliability codes.

    Based on coefficient of variation (CV).
    """

    EXCELLENT = "a"  # CV <= 2.5%
    GOOD = "b"  # 2.5% < CV <= 5%
    FAIR = "c"  # 5% < CV <= 10%
    POOR = "d"  # CV > 10%
    SUPPRESSED = "**"  # Sample too small


class CMHCRentalRecord(BaseModel):
    """Schema for a single CMHC rental survey record.

    Represents rental data for one zone and bedroom type in one survey year.
    """

    survey_year: int = Field(ge=1990, description="Survey year (October snapshot)")
    zone_code: str = Field(max_length=10, description="CMHC zone identifier")
    zone_name: str = Field(max_length=100, description="Zone name")
    bedroom_type: BedroomType = Field(description="Bedroom category")
    universe: int | None = Field(
        default=None, ge=0, description="Total rental units in zone"
    )
    vacancy_rate: Decimal | None = Field(
        default=None, ge=0, le=100, description="Vacancy rate (%)"
    )
    vacancy_rate_reliability: ReliabilityCode | None = Field(default=None)
    availability_rate: Decimal | None = Field(
        default=None, ge=0, le=100, description="Availability rate (%)"
    )
    average_rent: Decimal | None = Field(
        default=None, ge=0, description="Average monthly rent ($)"
    )
    average_rent_reliability: ReliabilityCode | None = Field(default=None)
    median_rent: Decimal | None = Field(
        default=None, ge=0, description="Median monthly rent ($)"
    )
    rent_change_pct: Decimal | None = Field(
        default=None, description="YoY rent change (%)"
    )
    turnover_rate: Decimal | None = Field(
        default=None, ge=0, le=100, description="Unit turnover rate (%)"
    )

    model_config = {"str_strip_whitespace": True}


class CMHCAnnualSurvey(BaseModel):
    """Schema for a complete CMHC annual survey for Toronto.

    Contains all zone and bedroom type combinations for one survey year.
    """

    survey_year: int
    records: list[CMHCRentalRecord]

    @property
    def zone_count(self) -> int:
        """Number of unique zones in survey."""
        return len({r.zone_code for r in self.records})
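A small sketch of the validation behaviour these bounds give (values illustrative): a vacancy rate above 100 is rejected, while suppressed reliability codes pass through as the `**` enum member.

```python
from pydantic import ValidationError

from portfolio_app.toronto.schemas import BedroomType, CMHCRentalRecord, ReliabilityCode

record = CMHCRentalRecord(
    survey_year=2023,
    zone_code="Z01",
    zone_name="West Toronto",
    bedroom_type=BedroomType.BACHELOR,
    average_rent_reliability=ReliabilityCode.SUPPRESSED,
)
print(record.average_rent_reliability)  # ReliabilityCode.SUPPRESSED

try:
    CMHCRentalRecord(
        survey_year=2023,
        zone_code="Z01",
        zone_name="West Toronto",
        bedroom_type=BedroomType.BACHELOR,
        vacancy_rate=150,  # > 100, violates le=100
    )
except ValidationError as exc:
    print(exc.error_count(), "validation error(s)")
```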
121
portfolio_app/toronto/schemas/dimensions.py
Normal file
@@ -0,0 +1,121 @@
"""Pydantic schemas for dimension tables."""

from datetime import date
from decimal import Decimal
from enum import Enum

from pydantic import BaseModel, Field, HttpUrl


class PolicyLevel(str, Enum):
    """Government level for policy events."""

    FEDERAL = "federal"
    PROVINCIAL = "provincial"
    MUNICIPAL = "municipal"


class PolicyCategory(str, Enum):
    """Policy event category."""

    MONETARY = "monetary"
    TAX = "tax"
    REGULATORY = "regulatory"
    SUPPLY = "supply"
    ECONOMIC = "economic"


class ExpectedDirection(str, Enum):
    """Expected price impact direction."""

    BULLISH = "bullish"  # Expected to increase prices
    BEARISH = "bearish"  # Expected to decrease prices
    NEUTRAL = "neutral"  # Uncertain or mixed impact


class Confidence(str, Enum):
    """Confidence level in policy event data."""

    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class AreaType(str, Enum):
    """TRREB area type."""

    WEST = "West"
    CENTRAL = "Central"
    EAST = "East"
    NORTH = "North"


class TimeDimension(BaseModel):
    """Schema for time dimension record."""

    date_key: int = Field(description="Date key in YYYYMMDD format")
    full_date: date
    year: int = Field(ge=2000, le=2100)
    month: int = Field(ge=1, le=12)
    quarter: int = Field(ge=1, le=4)
    month_name: str = Field(max_length=20)
    is_month_start: bool = True


class TRREBDistrict(BaseModel):
    """Schema for TRREB district dimension."""

    district_code: str = Field(max_length=3, description="W01, C01, E01, etc.")
    district_name: str = Field(max_length=100)
    area_type: AreaType
    geometry_wkt: str | None = Field(default=None, description="WKT geometry string")


class CMHCZone(BaseModel):
    """Schema for CMHC zone dimension."""

    zone_code: str = Field(max_length=10)
    zone_name: str = Field(max_length=100)
    geometry_wkt: str | None = Field(default=None, description="WKT geometry string")


class Neighbourhood(BaseModel):
    """Schema for City of Toronto neighbourhood dimension.

    Note: No FK to fact tables in V1 - reference overlay only.
    """

    neighbourhood_id: int = Field(ge=1, le=200)
    name: str = Field(max_length=100)
    geometry_wkt: str | None = Field(default=None)
    population: int | None = Field(default=None, ge=0)
    land_area_sqkm: Decimal | None = Field(default=None, ge=0)
    pop_density_per_sqkm: Decimal | None = Field(default=None, ge=0)
    pct_bachelors_or_higher: Decimal | None = Field(default=None, ge=0, le=100)
    median_household_income: Decimal | None = Field(default=None, ge=0)
    pct_owner_occupied: Decimal | None = Field(default=None, ge=0, le=100)
    pct_renter_occupied: Decimal | None = Field(default=None, ge=0, le=100)
    census_year: int = Field(default=2021, description="Census year for SCD tracking")


class PolicyEvent(BaseModel):
    """Schema for policy event dimension.

    Used for time-series annotation. No causation claims.
    """

    event_date: date = Field(description="Date event was announced/occurred")
    effective_date: date | None = Field(
        default=None, description="Date policy took effect"
    )
    level: PolicyLevel
    category: PolicyCategory
    title: str = Field(max_length=200, description="Short event title for display")
    description: str | None = Field(
        default=None, description="Longer description for tooltip"
    )
    expected_direction: ExpectedDirection
    source_url: HttpUrl | None = Field(default=None)
    confidence: Confidence = Field(default=Confidence.MEDIUM)

    model_config = {"str_strip_whitespace": True}
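An illustrative `PolicyEvent` matching the annotation use case; the event, dates, and description below are example values, not part of the diff's seed data:

```python
from datetime import date

from portfolio_app.toronto.schemas import (
    Confidence,
    ExpectedDirection,
    PolicyCategory,
    PolicyEvent,
    PolicyLevel,
)

event = PolicyEvent(
    event_date=date(2022, 3, 2),        # illustrative announcement date
    effective_date=date(2022, 3, 2),
    level=PolicyLevel.FEDERAL,
    category=PolicyCategory.MONETARY,
    title="Bank of Canada raises policy rate",
    description="First hike of the 2022 tightening cycle.",
    expected_direction=ExpectedDirection.BEARISH,
    confidence=Confidence.HIGH,
)
print(event.model_dump()["title"])
```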
52
portfolio_app/toronto/schemas/trreb.py
Normal file
@@ -0,0 +1,52 @@
"""Pydantic schemas for TRREB monthly market data."""

from datetime import date
from decimal import Decimal

from pydantic import BaseModel, Field


class TRREBMonthlyRecord(BaseModel):
    """Schema for a single TRREB monthly summary record.

    Represents aggregated sales data for one district in one month.
    """

    report_date: date = Field(description="First of month (YYYY-MM-01)")
    area_code: str = Field(
        max_length=3, description="District code (W01, C01, E01, etc.)"
    )
    area_name: str = Field(max_length=100, description="District name")
    area_type: str = Field(max_length=10, description="West / Central / East / North")
    sales: int = Field(ge=0, description="Number of transactions")
    dollar_volume: Decimal = Field(ge=0, description="Total sales volume ($)")
    avg_price: Decimal = Field(ge=0, description="Average sale price ($)")
    median_price: Decimal = Field(ge=0, description="Median sale price ($)")
    new_listings: int = Field(ge=0, description="New listings count")
    active_listings: int = Field(ge=0, description="Active listings at month end")
    avg_sp_lp: Decimal = Field(
        ge=0, le=200, description="Avg sale price / list price ratio (%)"
    )
    avg_dom: int = Field(ge=0, description="Average days on market")

    model_config = {"str_strip_whitespace": True}


class TRREBMonthlyReport(BaseModel):
    """Schema for a complete TRREB monthly report.

    Contains all district records for a single month.
    """

    report_date: date
    records: list[TRREBMonthlyRecord]

    @property
    def total_sales(self) -> int:
        """Total sales across all districts."""
        return sum(r.sales for r in self.records)

    @property
    def district_count(self) -> int:
        """Number of districts in report."""
        return len(self.records)
@@ -39,6 +39,7 @@ dependencies = [
    "dash>=3.3",
    "plotly>=6.5",
    "dash-mantine-components>=2.4",
    "dash-iconify>=0.1",

    # PDF Parsing
    "pdfplumber>=0.11",
@@ -132,17 +133,20 @@ skip-magic-trailing-comma = false
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_ignores = true
warn_unused_ignores = false
disallow_untyped_defs = true
plugins = ["pydantic.mypy"]

[[tool.mypy.overrides]]
module = [
    "dash.*",
    "dash_mantine_components.*",
    "dash_iconify.*",
    "plotly.*",
    "geopandas.*",
    "shapely.*",
    "pdfplumber.*",
    "tabula.*",
    "pydantic_settings.*",
]
ignore_missing_imports = true
52
scripts/db/init_schema.py
Normal file
@@ -0,0 +1,52 @@
#!/usr/bin/env python3
"""Initialize database schema.

Usage:
    python scripts/db/init_schema.py

This script creates all SQLAlchemy tables in the database.
Run this after docker-compose up to initialize the schema.
"""

import sys
from pathlib import Path

# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from portfolio_app.toronto.models import create_tables, get_engine  # noqa: E402


def main() -> int:
    """Initialize the database schema."""
    print("Initializing database schema...")

    try:
        engine = get_engine()

        # Test connection (SQLAlchemy 2.0 requires text() for raw SQL strings)
        from sqlalchemy import text

        with engine.connect() as conn:
            result = conn.execute(text("SELECT 1"))
            result.fetchone()
            print("Database connection successful")

        # Create all tables
        create_tables()
        print("Schema created successfully")

        # List created tables
        from sqlalchemy import inspect

        inspector = inspect(engine)
        tables = inspector.get_table_names()
        print(f"Created tables: {', '.join(tables)}")

        return 0

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1


if __name__ == "__main__":
    sys.exit(main())
6
tests/test_placeholder.py
Normal file
@@ -0,0 +1,6 @@
"""Placeholder test to ensure pytest collection succeeds."""


def test_placeholder():
    """Remove this once real tests are added."""
    assert True