Compare commits

15 commits: `sprint-7-c`...`f69d0c15a7`

| SHA |
|-----|
| f69d0c15a7 |
| 81993b23a7 |
| 457efec77f |
| f5f2bf3706 |
| fcaefabce8 |
| cb877df9e1 |
| 48b4eeeb62 |
| d3ca4ad4eb |
| e7bc545f25 |
| c8f4cc6241 |
| 3cd2eada7c |
| 138e6fe497 |
| cd7b5ce154 |
| e1135a77a8 |
| 39656ca836 |
CLAUDE.md (103 changes)
@@ -6,8 +6,8 @@ Working context for Claude Code on the Analytics Portfolio project.

 ## Project Status

-**Current Sprint**: 7 (Navigation & Theme Modernization)
-**Phase**: 1 - Toronto Housing Dashboard
+**Current Sprint**: 9 (Neighbourhood Dashboard Transition)
+**Phase**: Toronto Neighbourhood Dashboard
 **Branch**: `development` (feature branches merge here)

 ---
@@ -33,7 +33,10 @@ make ci # Run all checks

 1. Create feature branch FROM `development`: `git checkout -b feature/{sprint}-{description}`
 2. Work and commit on feature branch
 3. Merge INTO `development` when complete
-4. `development` -> `staging` -> `main` for releases
+4. Delete the feature branch after merge (keep branches clean)
+5. `development` -> `staging` -> `main` for releases
+
+**CRITICAL: NEVER DELETE the `development` branch. It is the main integration branch.**

 ---

@@ -43,8 +46,8 @@ make ci # Run all checks

 | Context | Style | Example |
 |---------|-------|---------|
-| Same directory | Single dot | `from .trreb import TRREBParser` |
-| Sibling directory | Double dot | `from ..schemas.trreb import TRREBRecord` |
+| Same directory | Single dot | `from .neighbourhood import NeighbourhoodRecord` |
+| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
 | External packages | Absolute | `import pandas as pd` |

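The import rules in the table above can be exercised end to end with a throwaway package. Everything below (`pkg`, `CensusRecord`, `parser_uses`) is an illustrative stand-in for the real `portfolio_app` layout, not the project's actual code:

```python
import sys
import tempfile
from pathlib import Path

# Build a disposable package mirroring the table's layout:
#   pkg/schemas/neighbourhood.py  (defines CensusRecord)
#   pkg/parsers/neighbourhood.py  (double-dot relative import of it)
root = Path(tempfile.mkdtemp())
for sub in ("", "schemas", "parsers"):
    (root / "pkg" / sub).mkdir(parents=True, exist_ok=True)
    (root / "pkg" / sub / "__init__.py").write_text("")

(root / "pkg" / "schemas" / "neighbourhood.py").write_text(
    "class CensusRecord:\n    pass\n"
)
(root / "pkg" / "parsers" / "neighbourhood.py").write_text(
    # Sibling directory -> double dot, exactly as in the table.
    "from ..schemas.neighbourhood import CensusRecord\n"
    "parser_uses = CensusRecord\n"
)

sys.path.insert(0, str(root))
from pkg.parsers.neighbourhood import parser_uses

print(parser_uses.__name__)  # prints "CensusRecord"
```

The double-dot form only works because `parsers/` and `schemas/` share the same parent package; external packages always use absolute imports.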
 ### Module Responsibilities

@@ -53,7 +56,7 @@ make ci # Run all checks
 |-----------|----------|---------|
 | `schemas/` | Pydantic models | Data validation |
 | `models/` | SQLAlchemy ORM | Database persistence |
-| `parsers/` | PDF/CSV extraction | Raw data ingestion |
+| `parsers/` | API/CSV extraction | Raw data ingestion |
 | `loaders/` | Database operations | Data loading |
 | `figures/` | Chart factories | Plotly figure generation |
 | `callbacks/` | Dash callbacks | In `pages/{dashboard}/callbacks/` |
@@ -101,18 +104,43 @@ portfolio_app/
 ├── app.py                 # Dash app factory with Pages routing
 ├── config.py              # Pydantic BaseSettings
 ├── assets/                # CSS, images (auto-served)
 │   └── sidebar.css        # Navigation styling
 ├── callbacks/             # Global callbacks
 │   ├── sidebar.py         # Sidebar toggle
 │   └── theme.py           # Dark/light theme
 ├── pages/
 │   ├── home.py            # Bio landing page -> /
+│   ├── about.py           # About page -> /about
+│   ├── contact.py         # Contact form -> /contact
+│   ├── health.py          # Health endpoint -> /health
+│   ├── projects.py        # Project showcase -> /projects
+│   ├── resume.py          # Resume/CV -> /resume
+│   ├── blog/
+│   │   ├── index.py       # Blog listing -> /blog
+│   │   └── article.py     # Blog article -> /blog/{slug}
 │   └── toronto/
-│       ├── dashboard.py   # Layout only -> /toronto
-│       └── callbacks/     # Interaction logic
-├── components/            # Shared UI (navbar, footer, cards)
+│       ├── dashboard.py   # Dashboard -> /toronto
+│       ├── methodology.py # Methodology -> /toronto/methodology
+│       └── callbacks/     # Dashboard interactions
+├── components/            # Shared UI (sidebar, cards, controls)
+│   ├── metric_card.py     # KPI card component
+│   ├── map_controls.py    # Map control panel
+│   ├── sidebar.py         # Navigation sidebar
+│   └── time_slider.py     # Time range selector
+├── figures/               # Shared chart factories
+│   ├── choropleth.py      # Map visualizations
+│   ├── summary_cards.py   # KPI figures
+│   └── time_series.py     # Trend charts
 ├── content/               # Markdown content
 │   └── blog/              # Blog articles
 ├── toronto/               # Toronto data logic
 │   ├── parsers/
 │   ├── loaders/
 │   ├── schemas/           # Pydantic
-│   └── models/            # SQLAlchemy
+│   ├── models/            # SQLAlchemy
+│   └── demo_data.py       # Sample data
 ├── utils/                 # Utilities
 │   └── markdown_loader.py # Markdown processing
 └── errors/
 ```

@@ -121,7 +149,15 @@ portfolio_app/

 | URL | Page | Sprint |
 |-----|------|--------|
 | `/` | Bio landing page | 2 |
-| `/toronto` | Toronto Housing Dashboard | 6 |
+| `/about` | About page | 8 |
+| `/contact` | Contact form | 8 |
+| `/health` | Health endpoint | 8 |
+| `/projects` | Project showcase | 8 |
+| `/resume` | Resume/CV | 8 |
+| `/blog` | Blog listing | 8 |
+| `/blog/{slug}` | Blog article | 8 |
+| `/toronto` | Toronto Dashboard | 6 |
+| `/toronto/methodology` | Dashboard methodology | 6 |

 ---

@@ -152,27 +188,20 @@ portfolio_app/

 ### Geographic Reality (Toronto Housing)

 ```
-TRREB Districts (~35)     - Purchase data (W01, C01, E01...)
+City Neighbourhoods (158) - Primary geographic unit for analysis
 CMHC Zones (~20)          - Rental data (Census Tract aligned)
-City Neighbourhoods (158) - Enrichment/overlay only
 ```

 **Critical**: These geographies do NOT align. Display as separate layers—do not force crosswalks.

 ### Star Schema

 | Table | Type | Keys |
 |-------|------|------|
 | `fact_purchases` | Fact | -> dim_time, dim_trreb_district |
 | `fact_rentals` | Fact | -> dim_time, dim_cmhc_zone |
 | `dim_time` | Dimension | date_key (PK) |
 | `dim_trreb_district` | Dimension | district_key (PK), geometry |
 | `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
 | `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
 | `dim_policy_event` | Dimension | event_id (PK) |

 **V1 Rule**: `dim_neighbourhood` has NO FK to fact tables—reference overlay only.

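The V1 rule can be made concrete with a minimal sketch. Plain dataclasses stand in for the project's SQLAlchemy models here, and the field names are assumptions, not the project's actual definitions:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class DimNeighbourhood:
    """Reference overlay only: nothing points at it in V1."""
    neighbourhood_id: int  # PK
    name: str
    geometry_wkt: str      # geometry kept as WKT for the sketch

@dataclass(frozen=True)
class FactRental:
    """Fact rows key into time and CMHC zone, never neighbourhood."""
    date_key: int   # -> dim_time
    zone_key: int   # -> dim_cmhc_zone
    avg_rent: float
    vacancy_rate: float

# The V1 rule stated as a check: no fact column references the overlay.
fact_columns = {f.name for f in fields(FactRental)}
assert "neighbourhood_id" not in fact_columns
```

Keeping the overlay key out of the fact tables is what prevents accidental joins across misaligned geographies.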
 ### dbt Layers

 | Layer | Naming | Purpose |
@@ -183,37 +212,15 @@ City Neighbourhoods (158) - Enrichment/overlay only

 ---

-## DO NOT BUILD (Phase 1)
+## Deferred Features

 **Stop and flag if a task seems to require these**:

 | Feature | Reason |
 |---------|--------|
 | `bridge_district_neighbourhood` table | Area-weighted aggregation is Phase 4 |
 | Crime data integration | Deferred to Phase 4 |
 | Historical boundary reconciliation (140->158) | 2021+ data only for V1 |
-| ML prediction models | Energy project scope (Phase 3) |
-| Multi-project shared infrastructure | Build first, abstract second (Phase 2) |
-
----
-
-## Sprint 1 Deliverables
-
-| Category | Tasks |
-|----------|-------|
-| **Bootstrap** | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
-| **Infrastructure** | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
-| **App Foundation** | portfolio_app/ structure, config.py, error handling |
-| **Tests** | tests/ directory, conftest.py, pytest config |
-| **Data Acquisition** | Download TRREB PDFs, START boundary digitization (HUMAN task) |
-
-### Human Tasks (Cannot Automate)
-
-| Task | Tool | Effort |
-|------|------|--------|
-| Digitize TRREB district boundaries | QGIS | 3-4 hours |
-| Research policy events (10-20) | Manual | 2-3 hours |
-| Replace social link placeholders | Manual | 5 minutes |
+| ML prediction models | Energy project scope (future phase) |
+| Multi-project shared infrastructure | Build first, abstract second |

 ---

@@ -248,10 +255,10 @@ All scripts in `scripts/`:

 | Document | Location | Use When |
 |----------|----------|----------|
-| Full specification | `docs/PROJECT_REFERENCE.md` | Architecture decisions |
-| Data schemas | `docs/toronto_housing_dashboard_spec_v5.md` | Parser/model tasks |
-| WBS details | `docs/wbs_sprint_plan_v4.md` | Sprint planning |
+| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions |
+| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
+| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |

 ---

-*Last Updated: Sprint 7*
+*Last Updated: Sprint 9*

@@ -1,17 +1,6 @@
 version: 2

 models:
-  - name: int_purchases__monthly
-    description: "Purchase data enriched with time and district dimensions"
-    columns:
-      - name: purchase_id
-        tests:
-          - unique
-          - not_null
-      - name: district_code
-        tests:
-          - not_null
-
   - name: int_rentals__annual
     description: "Rental data enriched with time and zone dimensions"
     columns:

@@ -1,62 +0,0 @@
--- Intermediate: Monthly purchase data enriched with dimensions
--- Joins purchases with time and district dimensions for analysis
-
-with purchases as (
-    select * from {{ ref('stg_trreb__purchases') }}
-),
-
-time_dim as (
-    select * from {{ ref('stg_dimensions__time') }}
-),
-
-district_dim as (
-    select * from {{ ref('stg_dimensions__trreb_districts') }}
-),
-
-enriched as (
-    select
-        p.purchase_id,
-
-        -- Time attributes
-        t.date_key,
-        t.full_date,
-        t.year,
-        t.month,
-        t.quarter,
-        t.month_name,
-
-        -- District attributes
-        d.district_key,
-        d.district_code,
-        d.district_name,
-        d.area_type,
-
-        -- Metrics
-        p.sales_count,
-        p.dollar_volume,
-        p.avg_price,
-        p.median_price,
-        p.new_listings,
-        p.active_listings,
-        p.days_on_market,
-        p.sale_to_list_ratio,
-
-        -- Calculated metrics
-        case
-            when p.active_listings > 0
-            then round(p.sales_count::numeric / p.active_listings, 3)
-            else null
-        end as absorption_rate,
-
-        case
-            when p.sales_count > 0
-            then round(p.active_listings::numeric / p.sales_count, 1)
-            else null
-        end as months_of_inventory
-
-    from purchases p
-    inner join time_dim t on p.date_key = t.date_key
-    inner join district_dim d on p.district_key = d.district_key
-)
-
-select * from enriched

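For reference, the two calculated metrics in this deleted model are simple guarded ratios; the same logic in Python (the example values are illustrative):

```python
def absorption_rate(sales_count: int, active_listings: int):
    """Share of active inventory sold in the month (null-safe)."""
    if active_listings > 0:
        return round(sales_count / active_listings, 3)
    return None

def months_of_inventory(sales_count: int, active_listings: int):
    """How many months current inventory would last at this sales pace."""
    if sales_count > 0:
        return round(active_listings / sales_count, 1)
    return None

print(absorption_rate(150, 400))      # 0.375
print(months_of_inventory(150, 400))  # 2.7
```

Both guards mirror the SQL `case` expressions: a zero denominator yields null rather than a division error.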
@@ -1,15 +1,6 @@
 version: 2

 models:
-  - name: mart_toronto_purchases
-    description: "Final mart for Toronto purchase/sales analysis by district and time"
-    columns:
-      - name: purchase_id
-        description: "Unique purchase record identifier"
-        tests:
-          - unique
-          - not_null
-
   - name: mart_toronto_rentals
     description: "Final mart for Toronto rental market analysis by zone and time"
     columns:
@@ -18,6 +9,3 @@ models:
         tests:
           - unique
           - not_null
-
-  - name: mart_toronto_market_summary
-    description: "Combined market summary aggregating purchases and rentals at Toronto level"

@@ -1,81 +0,0 @@
--- Mart: Toronto Market Summary
--- Aggregated view combining purchase and rental market indicators
--- Grain: One row per year-month
-
-with purchases_agg as (
-    select
-        year,
-        month,
-        month_name,
-        quarter,
-
-        -- Aggregate purchase metrics across all districts
-        sum(sales_count) as total_sales,
-        sum(dollar_volume) as total_dollar_volume,
-        round(avg(avg_price), 0) as avg_price_all_districts,
-        round(avg(median_price), 0) as median_price_all_districts,
-        sum(new_listings) as total_new_listings,
-        sum(active_listings) as total_active_listings,
-        round(avg(days_on_market), 0) as avg_days_on_market,
-        round(avg(sale_to_list_ratio), 2) as avg_sale_to_list_ratio,
-        round(avg(absorption_rate), 3) as avg_absorption_rate,
-        round(avg(months_of_inventory), 1) as avg_months_of_inventory,
-        round(avg(avg_price_yoy_pct), 2) as avg_price_yoy_pct
-
-    from {{ ref('mart_toronto_purchases') }}
-    group by year, month, month_name, quarter
-),
-
-rentals_agg as (
-    select
-        year,
-
-        -- Aggregate rental metrics across all zones (all bedroom types)
-        round(avg(avg_rent), 0) as avg_rent_all_zones,
-        round(avg(vacancy_rate), 2) as avg_vacancy_rate,
-        round(avg(rent_change_pct), 2) as avg_rent_change_pct,
-        sum(rental_universe) as total_rental_universe
-
-    from {{ ref('mart_toronto_rentals') }}
-    group by year
-),
-
-final as (
-    select
-        p.year,
-        p.month,
-        p.month_name,
-        p.quarter,
-
-        -- Purchase market indicators
-        p.total_sales,
-        p.total_dollar_volume,
-        p.avg_price_all_districts,
-        p.median_price_all_districts,
-        p.total_new_listings,
-        p.total_active_listings,
-        p.avg_days_on_market,
-        p.avg_sale_to_list_ratio,
-        p.avg_absorption_rate,
-        p.avg_months_of_inventory,
-        p.avg_price_yoy_pct,
-
-        -- Rental market indicators (annual, so join on year)
-        r.avg_rent_all_zones,
-        r.avg_vacancy_rate,
-        r.avg_rent_change_pct,
-        r.total_rental_universe,
-
-        -- Affordability indicator (price to rent ratio)
-        case
-            when r.avg_rent_all_zones > 0
-            then round(p.avg_price_all_districts / (r.avg_rent_all_zones * 12), 1)
-            else null
-        end as price_to_annual_rent_ratio
-
-    from purchases_agg p
-    left join rentals_agg r on p.year = r.year
-)
-
-select * from final
-order by year desc, month desc

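The affordability indicator at the end of this deleted mart divides average price by annualized rent; the same guard logic in Python, with made-up numbers:

```python
def price_to_annual_rent_ratio(avg_price: float, avg_monthly_rent: float):
    """Years of gross rent needed to match the average purchase price."""
    if avg_monthly_rent and avg_monthly_rent > 0:
        return round(avg_price / (avg_monthly_rent * 12), 1)
    return None

print(price_to_annual_rent_ratio(1_080_000, 2_500))  # 36.0
print(price_to_annual_rent_ratio(1_080_000, 0))      # None
```

Because rentals are annual and purchases monthly, the mart joins rent onto every month of the matching year; the ratio therefore moves monthly with price but only yearly with rent.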
@@ -1,79 +0,0 @@
--- Mart: Toronto Purchase Market Analysis
--- Final analytical table for purchase/sales data visualization
--- Grain: One row per district per month
-
-with purchases as (
-    select * from {{ ref('int_purchases__monthly') }}
-),
-
--- Add year-over-year calculations
-with_yoy as (
-    select
-        p.*,
-
-        -- Previous year same month values
-        lag(p.avg_price, 12) over (
-            partition by p.district_code
-            order by p.date_key
-        ) as avg_price_prev_year,
-
-        lag(p.sales_count, 12) over (
-            partition by p.district_code
-            order by p.date_key
-        ) as sales_count_prev_year,
-
-        lag(p.median_price, 12) over (
-            partition by p.district_code
-            order by p.date_key
-        ) as median_price_prev_year
-
-    from purchases p
-),
-
-final as (
-    select
-        purchase_id,
-        date_key,
-        full_date,
-        year,
-        month,
-        quarter,
-        month_name,
-        district_key,
-        district_code,
-        district_name,
-        area_type,
-        sales_count,
-        dollar_volume,
-        avg_price,
-        median_price,
-        new_listings,
-        active_listings,
-        days_on_market,
-        sale_to_list_ratio,
-        absorption_rate,
-        months_of_inventory,
-
-        -- Year-over-year changes
-        case
-            when avg_price_prev_year > 0
-            then round(((avg_price - avg_price_prev_year) / avg_price_prev_year) * 100, 2)
-            else null
-        end as avg_price_yoy_pct,
-
-        case
-            when sales_count_prev_year > 0
-            then round(((sales_count - sales_count_prev_year)::numeric / sales_count_prev_year) * 100, 2)
-            else null
-        end as sales_count_yoy_pct,
-
-        case
-            when median_price_prev_year > 0
-            then round(((median_price - median_price_prev_year) / median_price_prev_year) * 100, 2)
-            else null
-        end as median_price_yoy_pct
-
-    from with_yoy
-)
-
-select * from final

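The year-over-year pattern in this deleted mart (a 12-row `lag` per district, then a guarded percent change) is worth keeping in mind for the rebuilt neighbourhood-grain mart. A Python sketch of the same computation for one district's ordered monthly series:

```python
def yoy_pct(series):
    """Percent change vs the value 12 rows earlier (same month last year),
    mirroring lag(x, 12) over (partition by district order by date_key).
    Assumes one row per month, already sorted; gaps would break the lag."""
    out = []
    for i, cur in enumerate(series):
        prev = series[i - 12] if i >= 12 else None
        if prev and prev > 0:
            out.append(round((cur - prev) / prev * 100, 2))
        else:
            out.append(None)  # first year has no comparison
    return out

avg_price = [1_000_000] * 12 + [1_100_000]
print(yoy_pct(avg_price)[-1])  # 10.0
```

The "assumes no gaps" caveat matters: SQL's `lag(x, 12)` counts rows, not months, so a missing month silently compares the wrong pair.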
@@ -2,20 +2,10 @@ version: 2

 sources:
   - name: toronto_housing
-    description: "Toronto housing data loaded from TRREB and CMHC sources"
+    description: "Toronto housing data loaded from CMHC and City of Toronto sources"
     database: portfolio
     schema: public
     tables:
-      - name: fact_purchases
-        description: "TRREB monthly purchase/sales statistics by district"
-        columns:
-          - name: id
-            description: "Primary key"
-          - name: date_key
-            description: "Foreign key to dim_time"
-          - name: district_key
-            description: "Foreign key to dim_trreb_district"
-
       - name: fact_rentals
         description: "CMHC annual rental survey data by zone and bedroom type"
         columns:
@@ -32,14 +22,6 @@ sources:
           - name: date_key
             description: "Primary key (YYYYMMDD format)"
-
-      - name: dim_trreb_district
-        description: "TRREB district dimension with geometry"
-        columns:
-          - name: district_key
-            description: "Primary key"
-          - name: district_code
-            description: "TRREB district code"

       - name: dim_cmhc_zone
         description: "CMHC zone dimension with geometry"
         columns:
@@ -49,7 +31,7 @@ sources:
             description: "CMHC zone code"

       - name: dim_neighbourhood
-        description: "City of Toronto neighbourhoods (reference only)"
+        description: "City of Toronto neighbourhoods (158 official boundaries)"
         columns:
           - name: neighbourhood_id
             description: "Primary key"

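The `date_key` "Primary key (YYYYMMDD format)" convention used throughout these sources is easy to get subtly wrong when loading; one unambiguous encoding, as a sketch rather than the loaders' actual implementation:

```python
from datetime import date

def to_date_key(d: date) -> int:
    """Encode a date as the YYYYMMDD integer used by dim_time.date_key."""
    return d.year * 10_000 + d.month * 100 + d.day

def from_date_key(key: int) -> date:
    """Decode a YYYYMMDD key; raises ValueError on impossible dates."""
    return date(key // 10_000, key // 100 % 100, key % 100)

assert to_date_key(date(2024, 3, 7)) == 20240307
assert from_date_key(20240307) == date(2024, 3, 7)
```

Round-tripping through `datetime.date` (rather than string slicing) gets validation for free: a malformed key like `20241301` fails at decode time.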
@@ -1,23 +1,6 @@
 version: 2

 models:
-  - name: stg_trreb__purchases
-    description: "Staged TRREB purchase/sales data from fact_purchases"
-    columns:
-      - name: purchase_id
-        description: "Unique identifier for purchase record"
-        tests:
-          - unique
-          - not_null
-      - name: date_key
-        description: "Date dimension key (YYYYMMDD)"
-        tests:
-          - not_null
-      - name: district_key
-        description: "TRREB district dimension key"
-        tests:
-          - not_null
-
   - name: stg_cmhc__rentals
     description: "Staged CMHC rental market data from fact_rentals"
     columns:
@@ -44,20 +27,6 @@ models:
           - unique
           - not_null
-
-  - name: stg_dimensions__trreb_districts
-    description: "Staged TRREB district dimension"
-    columns:
-      - name: district_key
-        description: "District dimension key"
-        tests:
-          - unique
-          - not_null
-      - name: district_code
-        description: "TRREB district code (e.g., W01, C01)"
-        tests:
-          - unique
-          - not_null

   - name: stg_dimensions__cmhc_zones
     description: "Staged CMHC zone dimension"
     columns:

@@ -1,19 +0,0 @@
|
||||
-- Staged TRREB district dimension
|
||||
-- Source: dim_trreb_district table
|
||||
-- Grain: One row per district
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'dim_trreb_district') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
district_key,
|
||||
district_code,
|
||||
district_name,
|
||||
area_type,
|
||||
geometry
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
@@ -1,25 +0,0 @@
|
||||
-- Staged TRREB purchase/sales data
|
||||
-- Source: fact_purchases table loaded from TRREB Market Watch PDFs
|
||||
-- Grain: One row per district per month
|
||||
|
||||
with source as (
|
||||
select * from {{ source('toronto_housing', 'fact_purchases') }}
|
||||
),
|
||||
|
||||
staged as (
|
||||
select
|
||||
id as purchase_id,
|
||||
date_key,
|
||||
district_key,
|
||||
sales_count,
|
||||
dollar_volume,
|
||||
avg_price,
|
||||
median_price,
|
||||
new_listings,
|
||||
active_listings,
|
||||
avg_dom as days_on_market,
|
||||
avg_sp_lp as sale_to_list_ratio
|
||||
from source
|
||||
)
|
||||
|
||||
select * from staged
|
||||
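This deleted staging model's only transformation is column renaming (`id` to `purchase_id`, `avg_dom` to `days_on_market`, `avg_sp_lp` to `sale_to_list_ratio`), a pattern the new neighbourhood staging models will likely repeat. The same mapping as a hypothetical Python helper:

```python
# Rename map mirroring the staging select's aliases; columns not
# listed pass through unchanged, like bare columns in the select.
RENAMES = {
    "id": "purchase_id",
    "avg_dom": "days_on_market",
    "avg_sp_lp": "sale_to_list_ratio",
}

def stage_row(raw: dict) -> dict:
    """Return the row with staging-layer column names."""
    return {RENAMES.get(col, col): val for col, val in raw.items()}

row = stage_row({"id": 7, "avg_dom": 19, "avg_sp_lp": 1.02, "avg_price": 950_000})
print(row["purchase_id"], row["days_on_market"])  # 7 19
```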
@@ -65,8 +65,8 @@ Two-project analytics portfolio demonstrating end-to-end data engineering, visualization

 | Context | Style | Example |
 |---------|-------|---------|
-| Same directory | Single dot | `from .trreb import TRREBParser` |
-| Sibling directory | Double dot | `from ..schemas.trreb import TRREBRecord` |
+| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` |
+| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
 | External packages | Absolute | `import pandas as pd` |

 ### Module Separation

@@ -75,7 +75,7 @@ Two-project analytics portfolio demonstrating end-to-end data engineering, visualization
 |-----------|----------|---------|
 | `schemas/` | Pydantic models | Data validation |
 | `models/` | SQLAlchemy ORM | Database persistence |
-| `parsers/` | PDF/CSV extraction | Raw data ingestion |
+| `parsers/` | API/CSV extraction | Raw data ingestion |
 | `loaders/` | Database operations | Data loading |
 | `figures/` | Chart factories | Plotly figure generation |
 | `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
@@ -145,45 +145,36 @@ portfolio_app/

 ---

-## Phase 1: Toronto Housing Dashboard
+## Phase 1: Toronto Neighbourhood Dashboard

 ### Data Sources

 | Track | Source | Format | Geography | Frequency |
 |-------|--------|--------|-----------|-----------|
-| Purchases | TRREB Monthly Reports | PDF | ~35 Districts | Monthly |
-| Rentals | CMHC Rental Market Survey | CSV | ~20 Zones | Annual |
-| Enrichment | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
+| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
+| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
 | Policy Events | Curated list | CSV | N/A | Event-based |

 ### Geographic Reality

 ```
 ┌─────────────────────────────────────────────────────────────────┐
-│ City of Toronto Neighbourhoods (158)                            │ ← Enrichment only
-├─────────────────────────────────────────────────────────────────┤
-│ TRREB Districts (~35) — W01, C01, E01, etc.                     │ ← Purchase data
+│ City of Toronto Neighbourhoods (158)                            │ ← Primary analysis unit
 ├─────────────────────────────────────────────────────────────────┤
 │ CMHC Zones (~20) — Census Tract aligned                         │ ← Rental data
 └─────────────────────────────────────────────────────────────────┘
 ```

 **Critical**: These geographies do NOT align. Display as separate layers with toggle—do not force crosswalks.

 ### Data Model (Star Schema)

 | Table | Type | Keys |
 |-------|------|------|
 | `fact_purchases` | Fact | → dim_time, dim_trreb_district |
 | `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
 | `dim_time` | Dimension | date_key (PK) |
 | `dim_trreb_district` | Dimension | district_key (PK), geometry |
 | `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
 | `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
 | `dim_policy_event` | Dimension | event_id (PK) |

 **V1 Rule**: `dim_neighbourhood` has NO FK to fact tables—reference overlay only.

 ### dbt Layer Structure

 | Layer | Naming | Purpose |
@@ -198,31 +189,11 @@ portfolio_app/

 | Sprint | Focus | Milestone |
 |--------|-------|-----------|
-| 1 | Project bootstrap, start TRREB digitization | — |
-| 2 | Bio page, data acquisition | **Launch 1: Bio Live** |
-| 3 | Parsers, schemas, models | — |
-| 4 | Loaders, dbt | — |
-| 5 | Visualization | — |
-| 6 | Polish, deploy dashboard | **Launch 2: Dashboard Live** |
-| 7 | Buffer | — |
-
-### Sprint 1 Deliverables
-
-| Category | Tasks |
-|----------|-------|
-| **Bootstrap** | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
-| **Infrastructure** | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
-| **App Foundation** | portfolio_app/ structure, config.py, error handling |
-| **Tests** | tests/ directory, conftest.py, pytest config |
-| **Data Acquisition** | Download TRREB PDFs, START boundary digitization (HUMAN task) |
-
-### Human Tasks (Cannot Automate)
-
-| Task | Tool | Effort |
-|------|------|--------|
-| Digitize TRREB district boundaries | QGIS | 3-4 hours |
-| Research policy events (10-20) | Manual research | 2-3 hours |
-| Replace social link placeholders | Manual | 5 minutes |
+| 1-6 | Foundation and initial dashboard | **Launch 1: Bio Live** |
+| 7 | Navigation & theme modernization | — |
+| 8 | Portfolio website expansion | **Launch 2: Website Live** |
+| 9 | Neighbourhood dashboard transition | Cleanup complete |
+| 10+ | Dashboard implementation | **Launch 3: Dashboard Live** |

 ---

@@ -230,27 +201,24 @@ portfolio_app/

 ### Phase 1 — Build These

-- Bio landing page with content from bio_content_v2.md
-- TRREB PDF parser
-- CMHC CSV processor
+- Bio landing page and portfolio website
+- CMHC rental data processor
+- Toronto neighbourhood data integration
 - PostgreSQL + PostGIS database layer
 - Star schema (facts + dimensions)
 - dbt models with tests
 - Choropleth visualization (Dash)
 - Policy event annotation layer
 - Neighbourhood overlay (toggle-able)

-### Phase 1 — Do NOT Build
+### Deferred Features

 | Feature | Reason | When |
 |---------|--------|------|
 | `bridge_district_neighbourhood` table | Area-weighted aggregation is Phase 4 | After Energy project |
 | Crime data integration | Deferred scope | Phase 4 |
-| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Phase 4 |
+| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
 | ML prediction models | Energy project scope | Phase 3 |
-| Multi-project shared infrastructure | Build first, abstract second | Phase 2 |
+| Multi-project shared infrastructure | Build first, abstract second | Future |

-If a task seems to require Phase 3/4 features, **stop and flag it**.
+If a task seems to require deferred features, **stop and flag it**.

 ---

@@ -362,19 +330,24 @@ LOG_LEVEL=INFO

 ## Success Criteria

-### Launch 1 (Sprint 2)
-- [ ] Bio page accessible via HTTPS
-- [ ] All bio content rendered (from bio_content_v2.md)
-- [ ] No placeholder text visible
-- [ ] Mobile responsive
-- [ ] Social links functional
+### Launch 1 (Bio Live)
+- [x] Bio page accessible via HTTPS
+- [x] All bio content rendered
+- [x] No placeholder text visible
+- [x] Mobile responsive
+- [x] Social links functional

-### Launch 2 (Sprint 6)
-- [ ] Choropleth renders TRREB districts and CMHC zones
-- [ ] Purchase/rental mode toggle works
+### Launch 2 (Website Live)
+- [x] Full portfolio website with navigation
+- [x] About, Contact, Projects, Resume, Blog pages
+- [x] Dark mode theme support
+- [x] Sidebar navigation
+
+### Launch 3 (Dashboard Live)
+- [ ] Choropleth renders neighbourhoods and CMHC zones
+- [ ] Rental data visualization works
 - [ ] Time navigation works
 - [ ] Policy event markers visible
 - [ ] Neighbourhood overlay toggleable
 - [ ] Methodology documentation published
 - [ ] Data sources cited

@@ -386,11 +359,10 @@ For detailed specifications, see:

 | Document | Location | Use When |
 |----------|----------|----------|
-| Data schemas | `docs/toronto_housing_spec.md` | Parser/model tasks |
-| WBS details | `docs/wbs.md` | Sprint planning |
-| Bio content | `docs/bio_content.md` | Building home.py |
+| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
+| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |

 ---

-*Reference Version: 1.0*
-*Created: January 2026*
+*Reference Version: 2.0*
+*Updated: Sprint 9*

docs/changes/Change-Toronto-Analysis-Reviewed.md (new file, 276 lines)
@@ -0,0 +1,276 @@
|
||||
# Toronto Neighbourhood Dashboard — Implementation Plan
|
||||
|
||||
**Document Type:** Execution Guide
|
||||
**Target:** Transition from TRREB-based to Neighbourhood-based Dashboard
|
||||
**Version:** 2.0 | January 2026
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Transition from TRREB district-based housing dashboard to a comprehensive Toronto Neighbourhood Dashboard built around the city's 158 official neighbourhoods.
|
||||
|
||||
**Key Changes:**
|
||||
- Geographic foundation: TRREB districts (~35) → City Neighbourhoods (158)
|
||||
- Data sources: PDF parsing → Open APIs (Toronto Open Data, Toronto Police, CMHC)
|
||||
- Scope: Housing-only → 5 thematic tabs (Overview, Housing, Safety, Demographics, Amenities)
|
||||
|
||||
---
|
||||
|
||||
## Phase 1: Repository Cleanup
|
||||
|
||||
### Files to DELETE
|
||||
|
||||
| File | Reason |
|
||||
|------|--------|
|
||||
| `portfolio_app/toronto/schemas/trreb.py` | TRREB schema obsolete |
|
||||
| `portfolio_app/toronto/parsers/trreb.py` | PDF parsing no longer needed |
|
||||
| `portfolio_app/toronto/loaders/trreb.py` | TRREB loading logic obsolete |
|
||||
| `dbt/models/staging/stg_trreb__purchases.sql` | TRREB staging obsolete |
|
||||
| `dbt/models/intermediate/int_purchases__monthly.sql` | TRREB intermediate obsolete |
|
||||
| `dbt/models/marts/mart_toronto_purchases.sql` | Will rebuild for neighbourhood grain |
|
||||
|
||||
### Files to MODIFY (Remove TRREB References)
|
||||
|
||||
| File | Action |
|
||||
|------|--------|
|
||||
| `portfolio_app/toronto/schemas/__init__.py` | Remove TRREB imports |
|
||||
| `portfolio_app/toronto/parsers/__init__.py` | Remove TRREB parser imports |
|
||||
| `portfolio_app/toronto/loaders/__init__.py` | Remove TRREB loader imports |
|
||||
| `portfolio_app/toronto/models/facts.py` | Remove `FactPurchases` model |
|
||||
| `portfolio_app/toronto/models/dimensions.py` | Remove `DimTRREBDistrict` model |
|
||||
| `portfolio_app/toronto/demo_data.py` | Remove TRREB demo data |
|
||||
| `dbt/models/sources.yml` | Remove TRREB source definitions |
|
||||
| `dbt/models/schema.yml` | Remove TRREB model documentation |
|
||||
|
||||
### Files to KEEP (Reusable)
|
||||
|
||||
| File | Why |
|
||||
|------|-----|
|
||||
| `portfolio_app/toronto/schemas/cmhc.py` | CMHC data still used |
|
||||
| `portfolio_app/toronto/parsers/cmhc.py` | Reusable with modifications |
|
||||
| `portfolio_app/toronto/loaders/base.py` | Generic database utilities |
|
||||
| `portfolio_app/toronto/loaders/dimensions.py` | Dimension loading patterns |
|
||||
| `portfolio_app/toronto/models/base.py` | SQLAlchemy base class |
|
||||
| `portfolio_app/figures/*.py` | All chart factories reusable |
|
||||
| `portfolio_app/components/*.py` | All UI components reusable |
|
||||
|
||||
---

## Phase 2: Documentation Updates

| Document | Action |
|----------|--------|
| `CLAUDE.md` | Update data model section, mark transition complete |
| `docs/PROJECT_REFERENCE.md` | Update architecture, data sources |
| `docs/toronto_housing_dashboard_spec_v5.md` | Archive or delete |
| `docs/wbs_sprint_plan_v4.md` | Archive or delete |
---

## Phase 3: New Data Model

### Star Schema (Neighbourhood-Centric)

| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Central Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension (keep existing) |
| `dim_cmhc_zone` | Bridge Dimension | 15 CMHC zones with neighbourhood mapping |
| `bridge_cmhc_neighbourhood` | Bridge | Zone-to-neighbourhood area weights |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone (keep existing) |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
### New Schema Files

| File | Contains |
|------|----------|
| `toronto/schemas/neighbourhood.py` | NeighbourhoodRecord, CensusRecord, CrimeRecord |
| `toronto/schemas/amenities.py` | AmenityType enum, AmenityRecord |

### New Parser Files

| File | Data Source | API |
|------|-------------|-----|
| `toronto/parsers/toronto_open_data.py` | Neighbourhoods, Census, Parks, Schools, Childcare | Toronto Open Data Portal |
| `toronto/parsers/toronto_police.py` | Crime Rates, MCI, Shootings | Toronto Police Portal |
### New Loader Files

| File | Purpose |
|------|---------|
| `toronto/loaders/neighbourhoods.py` | Load GeoJSON boundaries |
| `toronto/loaders/census.py` | Load neighbourhood profiles |
| `toronto/loaders/crime.py` | Load crime statistics |
| `toronto/loaders/amenities.py` | Load parks, schools, childcare |
| `toronto/loaders/cmhc_crosswalk.py` | Build CMHC-neighbourhood bridge |

---
## Phase 4: dbt Restructuring

### Staging Layer

| Model | Source |
|-------|--------|
| `stg_toronto__neighbourhoods` | dim_neighbourhood |
| `stg_toronto__census` | fact_census |
| `stg_toronto__crime` | fact_crime |
| `stg_toronto__amenities` | fact_amenities |
| `stg_cmhc__rentals` | fact_rentals (modify existing) |
| `stg_cmhc__zone_crosswalk` | bridge_cmhc_neighbourhood |
### Intermediate Layer

| Model | Purpose |
|-------|---------|
| `int_neighbourhood__demographics` | Combined census demographics |
| `int_neighbourhood__housing` | Housing indicators |
| `int_neighbourhood__crime_summary` | Aggregated crime by type |
| `int_neighbourhood__amenity_scores` | Normalized amenity metrics |
| `int_rentals__neighbourhood_allocated` | CMHC rentals allocated to neighbourhoods |
### Mart Layer (One per Tab)

| Model | Tab | Key Metrics |
|-------|-----|-------------|
| `mart_neighbourhood_overview` | Overview | Composite livability score |
| `mart_neighbourhood_housing` | Housing | Affordability index, rent-to-income |
| `mart_neighbourhood_safety` | Safety | Crime rates, YoY change |
| `mart_neighbourhood_demographics` | Demographics | Income, age, diversity |
| `mart_neighbourhood_amenities` | Amenities | Parks, schools, transit per capita |

---
## Phase 5: Dashboard Implementation

### Tab Structure

```
pages/toronto/
├── dashboard.py            # Main layout with tab navigation
├── tabs/
│   ├── overview.py         # Composite livability
│   ├── housing.py          # Affordability
│   ├── safety.py           # Crime
│   ├── demographics.py     # Population
│   └── amenities.py        # Services
└── callbacks/
    ├── map_callbacks.py
    ├── chart_callbacks.py
    └── selection_callbacks.py
```
### Layout Pattern (All Tabs)

Each tab follows the same structure:
1. **Choropleth Map** (left) — 158 neighbourhoods, click to select
2. **KPI Cards** (right) — 3-4 contextual metrics
3. **Supporting Charts** (bottom) — Trend + comparison visualizations
4. **Details Panel** (collapsible) — All metrics for selected neighbourhood
### Graphs by Tab

| Tab | Choropleth Metric | Chart 1 | Chart 2 |
|-----|-------------------|---------|---------|
| Overview | Livability score | Top/Bottom 10 bar | Income vs Crime scatter |
| Housing | Affordability index | Rent trend (5yr line) | Dwelling types (pie/bar) |
| Safety | Crime rate per 100K | Crime breakdown (stacked bar) | Crime trend (5yr line) |
| Demographics | Median income | Age pyramid | Top languages (bar) |
| Amenities | Park area per capita | Amenity radar | Transit accessibility (bar) |

---
## Phase 6: Jupyter Notebooks

### Purpose

One notebook per graph to document:
1. **Data Reference** — How the data was built (query, transformation steps, sample output)
2. **Data Visualization** — Import figure factory, render the graph

### Directory Structure

```
notebooks/
├── README.md
├── overview/
├── housing/
├── safety/
├── demographics/
└── amenities/
```
### Notebook Template

````markdown
# [Graph Name]

## 1. Data Reference

### Source Tables
- List tables/marts used
- Grain of each table

### Query
```sql
SELECT ... FROM ...
```

### Transformation Steps
1. Step description
2. Step description

### Sample Data
```python
df = pd.read_sql(query, engine)
df.head(10)
```

## 2. Data Visualization

```python
from portfolio_app.figures.choropleth import create_choropleth_figure
fig = create_choropleth_figure(...)
fig.show()
```
````

Create one notebook per graph as each is implemented (15 total across 5 tabs).
---

## Phase 7: Final Documentation Review

After implementation is complete, audit and update:
- [ ] `CLAUDE.md` — Project status, app structure, data model, URL routes
- [ ] `README.md` — Project description, installation, quick start
- [ ] `docs/PROJECT_REFERENCE.md` — Architecture matches implementation
- [ ] Remove or archive legacy spec documents
---

## Data Source Reference

| Source | Datasets | URL |
|--------|----------|-----|
| Toronto Open Data | Neighbourhoods, Census Profiles, Parks, Schools, Childcare, TTC | open.toronto.ca |
| Toronto Police | Crime Rates, MCI, Shootings | data.torontopolice.on.ca |
| CMHC | Rental Market Survey | cmhc-schl.gc.ca |
---

## CMHC Zone Mapping Note

CMHC uses 15 zones that don't align with the 158 neighbourhoods. Strategy:
- Create `bridge_cmhc_neighbourhood` with area weights
- Allocate rental metrics proportionally to overlapping neighbourhoods
- Document the methodology on the `/toronto/methodology` page
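The allocation step can be sketched in pandas. Everything below is illustrative: the table names match the model above, but the IDs, weights, and rents are made-up numbers, and `area_weight` stands in for whatever overlap share the crosswalk produces (dividing by each neighbourhood's total weight normalizes the blend).

```python
import pandas as pd

# Hypothetical bridge table: area-overlap weight between each CMHC zone
# and each neighbourhood (values are illustrative).
bridge = pd.DataFrame({
    "zone_id":          [1,   1,   2],
    "neighbourhood_id": [101, 102, 102],
    "area_weight":      [0.6, 0.4, 1.0],
})

# Zone-level metric from the CMHC survey (illustrative rents).
rentals = pd.DataFrame({"zone_id": [1, 2], "avg_rent_1br": [1800.0, 2100.0]})

# Join zone metrics onto the bridge, then take an area-weighted average per
# neighbourhood: a neighbourhood overlapping two zones blends their rents.
alloc = bridge.merge(rentals, on="zone_id")
alloc["weighted"] = alloc["avg_rent_1br"] * alloc["area_weight"]
sums = alloc.groupby("neighbourhood_id").agg(
    weighted=("weighted", "sum"), total_w=("area_weight", "sum")
)
sums["avg_rent_1br"] = sums["weighted"] / sums["total_w"]
```

Neighbourhood 101 falls entirely in zone 1 and keeps that zone's rent; neighbourhood 102 straddles both zones and gets a blend leaning toward zone 2.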
---

*Document Version: 2.0*
*Trimmed from v1.0 for execution clarity*
423 docs/changes/Change-Toronto-Analysis.md (new file)
# Toronto Neighbourhood Dashboard — Deliverables

**Project Type:** Interactive Data Visualization Dashboard
**Geographic Scope:** City of Toronto, 158 Official Neighbourhoods
**Author:** Leo Miranda
**Version:** 1.0 | January 2026

---
## Executive Summary

Multi-tab analytics dashboard built around Toronto's official neighbourhood boundaries. The core interaction is a choropleth map where users explore the city through different thematic lenses—housing affordability, safety, demographics, amenities—with supporting visualizations that tell a cohesive story per theme.

**Primary Goals:**
1. Demonstrate interactive data visualization skills (Plotly/Dash)
2. Showcase data engineering capabilities (multi-source ETL, dimensional modeling)
3. Create a portfolio piece with genuine analytical value

---
## Part 1: Geographic Foundation (Required First)

| Dataset | Source | Format | Last Updated | Download |
|---------|--------|--------|--------------|----------|
| **Neighbourhoods Boundaries** | Toronto Open Data | GeoJSON | 2024 | [Link](https://open.toronto.ca/dataset/neighbourhoods/) |
| **Neighbourhood Profiles** | Toronto Open Data | CSV | 2021 Census | [Link](https://open.toronto.ca/dataset/neighbourhood-profiles/) |

**Critical Notes:**
- Toronto uses 158 official neighbourhoods (updated in 2024 from the previous 140)
- GeoJSON includes `AREA_ID` for joining to tabular data
- Neighbourhood Profiles has 2,400+ indicators per neighbourhood from the Census
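The `AREA_ID` join can be sketched with plain pandas. The feature properties, names, and population figures below are illustrative stand-ins, not real values from the datasets.

```python
import pandas as pd

# Minimal stand-in for the Neighbourhoods GeoJSON: each feature carries an
# AREA_ID in its properties (names and figures are made up).
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"AREA_ID": 101, "AREA_NAME": "Annex"},
         "geometry": None},
        {"type": "Feature",
         "properties": {"AREA_ID": 102, "AREA_NAME": "Casa Loma"},
         "geometry": None},
    ],
}

# Tabular indicators keyed on the same AREA_ID.
profiles = pd.DataFrame({"AREA_ID": [101, 102], "population": [30000, 11000]})

# Flatten feature properties into a frame and join on AREA_ID.
props = pd.DataFrame([f["properties"] for f in geojson["features"]])
joined = props.merge(profiles, on="AREA_ID", how="left")
```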
---

## Part 2: Tier 1 — MVP Datasets

| Dataset | Source | Measures Available | Update Freq | Granularity |
|---------|--------|-------------------|-------------|-------------|
| **Neighbourhoods GeoJSON** | Toronto Open Data | Boundary polygons, area IDs | Static | Neighbourhood |
| **Neighbourhood Profiles (full)** | Toronto Open Data | 2,400+ Census indicators | Every 5 years | Neighbourhood |
| **Neighbourhood Crime Rates** | Toronto Police Portal | MCI rates per 100K by year | Annual | Neighbourhood |
| **CMHC Rental Market Survey** | CMHC Portal | Avg rent by bedroom, vacancy rate | Annual (Oct) | 15 CMHC Zones |
| **Parks** | Toronto Open Data | Park locations, area, type | Annual | Point/Polygon |

**Total API/Download Calls:** 5
**Data Volume:** ~50MB combined
### Tier 1 Measures to Extract

**From Neighbourhood Profiles:**
- Population, population density
- Median household income
- Age distribution (0-14, 15-24, 25-44, 45-64, 65+)
- % Immigrants, % Visible minorities
- Top languages spoken
- Unemployment rate
- Education attainment (% with post-secondary)
- Housing tenure (own vs rent %)
- Dwelling types distribution
- Average rent, housing costs as % of income

**From Crime Rates:**
- Total MCI rate per 100K population
- Year-over-year crime trend

**From CMHC:**
- Average monthly rent (1BR, 2BR, 3BR)
- Vacancy rates

**From Parks:**
- Park count per neighbourhood
- Park area per capita
---

## Part 3: Tier 2 — Expansion Datasets

| Dataset | Source | Measures Available | Update Freq | Granularity |
|---------|--------|-------------------|-------------|-------------|
| **Major Crime Indicators (MCI)** | Toronto Police Portal | Assault, B&E, auto theft, robbery, theft over | Quarterly | Neighbourhood |
| **Shootings & Firearm Discharges** | Toronto Police Portal | Shooting incidents, injuries, fatalities | Quarterly | Neighbourhood |
| **Building Permits** | Toronto Open Data | New construction, permits by type | Monthly | Address-level |
| **Schools** | Toronto Open Data | Public/Catholic, elementary/secondary | Annual | Point |
| **TTC Routes & Stops** | Toronto Open Data | Route geometry, stop locations | Static | Route/Stop |
| **Licensed Child Care Centres** | Toronto Open Data | Capacity, ages served, locations | Annual | Point |
### Tier 2 Measures to Extract

**From MCI Details:**
- Breakdown by crime type (assault, B&E, auto theft, robbery, theft over)

**From Shootings:**
- Shooting incidents count
- Injuries/fatalities

**From Building Permits:**
- New construction permits (trailing 12 months)
- Permit types distribution

**From Schools:**
- Schools per 1000 children
- School type breakdown

**From TTC:**
- Transit stops within neighbourhood
- Transit accessibility score

**From Child Care:**
- Child care spaces per capita
- Coverage by age group
---

## Part 4: Data Sources by Thematic Group

### GROUP A: Housing & Affordability

| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Profiles (Housing) | 1 | Avg rent, ownership %, dwelling types, housing costs as % of income | Every 5 years |
| CMHC Rental Market Survey | 1 | Avg rent by bedroom, vacancy rate, rental universe | Annual |
| Building Permits | 2 | New construction, permits by type | Monthly |

**Calculated Metrics:**
- Rent-to-Income Ratio (CMHC rent ÷ Census income)
- Affordability Index (% of income spent on housing)
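The Rent-to-Income Ratio and affordability banding can be sketched in pandas, using the <30% / 30-50% / >50% thresholds from Appendix B. All input figures below are illustrative, not real neighbourhood values.

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood_id":      [101, 102, 103],
    "avg_rent_monthly":      [1800.0, 2400.0, 3200.0],      # illustrative CMHC rents
    "median_income_annual":  [90000.0, 72000.0, 60000.0],   # illustrative Census income
})

# Rent-to-income ratio: monthly rent over monthly household income.
df["rent_to_income"] = df["avg_rent_monthly"] / (df["median_income_annual"] / 12)

# Band using the <30% / 30-50% / >50% thresholds from Appendix B.
df["affordability"] = pd.cut(
    df["rent_to_income"],
    bins=[0, 0.30, 0.50, float("inf")],
    labels=["Affordable", "Stretched", "Unaffordable"],
)
```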
---

### GROUP B: Safety & Crime

| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Crime Rates | 1 | MCI rates per 100K pop by year | Annual |
| Major Crime Indicators (MCI) | 2 | Assault, B&E, auto theft, robbery, theft over | Quarterly |
| Shootings & Firearm Discharges | 2 | Shooting incidents, injuries, fatalities | Quarterly |

**Calculated Metrics:**
- Year-over-year crime change %
- Crime type distribution
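A minimal pandas sketch of the year-over-year change calculation; the rates below are made-up, and the first year has no prior period so its change is NaN.

```python
import pandas as pd

crime = pd.DataFrame({
    "neighbourhood_id":   [101, 101, 101],
    "year":               [2021, 2022, 2023],
    "mci_rate_per_100k":  [800.0, 880.0, 836.0],  # illustrative rates
}).sort_values(["neighbourhood_id", "year"])

# Year-over-year percentage change computed within each neighbourhood.
crime["yoy_change_pct"] = (
    crime.groupby("neighbourhood_id")["mci_rate_per_100k"].pct_change() * 100
)
```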
---

### GROUP C: Demographics & Community

| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Profiles (Demographics) | 1 | Age distribution, household composition, income | Every 5 years |
| Neighbourhood Profiles (Immigration) | 1 | Immigration status, visible minorities, languages | Every 5 years |
| Neighbourhood Profiles (Education) | 1 | Education attainment, field of study | Every 5 years |
| Neighbourhood Profiles (Labour) | 1 | Employment rate, occupation, industry | Every 5 years |
---

### GROUP D: Transportation & Mobility

| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Commute Mode (Census) | 1 | % car, transit, walk, bike | Every 5 years |
| TTC Routes & Stops | 2 | Route geometry, stop locations | Static |

**Calculated Metrics:**
- Transit accessibility (stops within 500m of neighbourhood centroid)
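The transit-accessibility metric can be sketched with a plain haversine distance, no PostGIS required. The centroid and stop coordinates below are illustrative points, not real TTC data.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

def stops_within(centroid, stops, radius_m=500.0):
    """Count transit stops within radius_m of a neighbourhood centroid."""
    return sum(
        1 for s in stops
        if haversine_m(centroid[0], centroid[1], s[0], s[1]) <= radius_m
    )

# Illustrative centroid and stops: two are within 500m, one is ~900m away.
centroid = (43.6629, -79.3957)
stops = [(43.6630, -79.3958), (43.6700, -79.3900), (43.6640, -79.3950)]
```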
---

### GROUP E: Amenities & Services

| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Parks | 1 | Park locations, area, type | Annual |
| Schools | 2 | Public/Catholic, elementary/secondary | Annual |
| Licensed Child Care Centres | 2 | Capacity, ages served | Annual |

**Calculated Metrics:**
- Park area per capita
- Schools per 1000 children (ages 5-17)
- Child care spaces per 1000 children (ages 0-4)
---

## Part 5: Tab Structure

### Tab Architecture

```
┌────────────────────────────────────────────────────────────────┐
│  [Overview] [Housing] [Safety] [Demographics] [Amenities]      │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌─────────────────────────────────┐   ┌────────────────┐      │
│  │                                 │   │  KPI Card 1    │      │
│  │        CHOROPLETH MAP           │   ├────────────────┤      │
│  │     (158 Neighbourhoods)        │   │  KPI Card 2    │      │
│  │                                 │   ├────────────────┤      │
│  │        Click to select          │   │  KPI Card 3    │      │
│  │                                 │   └────────────────┘      │
│  └─────────────────────────────────┘                           │
│                                                                │
│  ┌─────────────────────┐   ┌─────────────────────┐             │
│  │  Supporting Chart 1 │   │  Supporting Chart 2 │             │
│  │  (Context/Trend)    │   │  (Comparison/Rank)  │             │
│  └─────────────────────┘   └─────────────────────┘             │
│                                                                │
│  [Neighbourhood: Selected Name] ────────────────────────       │
│  Details panel with all metrics for selected area              │
└────────────────────────────────────────────────────────────────┘
```
---

### Tab 1: Overview (Default Landing)

**Story:** "How do Toronto neighbourhoods compare across key livability metrics?"

| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Composite livability score | Calculated from weighted metrics |
| KPI Cards | Population, Median Income, Avg Crime Rate | Neighbourhood Profiles, Crime Rates |
| Chart 1 | Top 10 / Bottom 10 by livability score | Calculated |
| Chart 2 | Income vs Crime scatter plot | Neighbourhood Profiles, Crime Rates |

**Metric Selector:** Allow the user to recolour the map by any single metric.
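One possible sketch of the composite livability score: min-max normalize each metric, invert "lower is better" metrics, and take a weighted sum. The metric choices, weights, and input values here are assumptions for illustration, not the final formula.

```python
import pandas as pd

df = pd.DataFrame({
    "neighbourhood_id":      [101, 102, 103],
    "median_income":         [60000.0, 90000.0, 75000.0],
    "crime_rate":            [900.0, 500.0, 700.0],   # lower is better
    "park_area_per_capita":  [10.0, 30.0, 20.0],
})

def minmax(s):
    """Scale a series to the 0-1 range."""
    return (s - s.min()) / (s.max() - s.min())

# Normalize each metric; invert crime so that higher always means "better".
# Weights (0.4 / 0.3 / 0.3) are an arbitrary illustrative choice.
score = (
    0.4 * minmax(df["median_income"])
    + 0.3 * (1 - minmax(df["crime_rate"]))
    + 0.3 * minmax(df["park_area_per_capita"])
)
df["livability"] = (score * 100).round(1)
```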
---

### Tab 2: Housing & Affordability

**Story:** "Where can you afford to live, and what's being built?"

| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Rent-to-Income Ratio (Affordability Index) | CMHC + Census income |
| KPI Cards | Median Rent (1BR), Vacancy Rate, New Permits (12mo) | CMHC, Building Permits |
| Chart 1 | Rent trend (5-year line chart by bedroom) | CMHC historical |
| Chart 2 | Dwelling type breakdown (pie/bar) | Neighbourhood Profiles |

**Metric Selector:** Toggle between rent, ownership %, dwelling types.
---

### Tab 3: Safety

**Story:** "How safe is each neighbourhood, and what crimes are most common?"

| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Total MCI Rate per 100K | Crime Rates |
| KPI Cards | Total Crimes, YoY Change %, Shooting Incidents | Crime Rates, Shootings |
| Chart 1 | Crime type breakdown (stacked bar) | MCI Details |
| Chart 2 | 5-year crime trend (line chart) | Crime Rates historical |

**Metric Selector:** Toggle between total crime, specific crime types, shootings.
---

### Tab 4: Demographics

**Story:** "Who lives here? Age, income, diversity."

| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Median Household Income | Neighbourhood Profiles |
| KPI Cards | Population, % Immigrant, Unemployment Rate | Neighbourhood Profiles |
| Chart 1 | Age distribution (population pyramid or bar) | Neighbourhood Profiles |
| Chart 2 | Top languages spoken (horizontal bar) | Neighbourhood Profiles |

**Metric Selector:** Income, immigrant %, age groups, education.
---

### Tab 5: Amenities & Services

**Story:** "What's nearby? Parks, schools, child care, transit."

| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Park Area per Capita | Parks + Population |
| KPI Cards | Parks Count, Schools Count, Child Care Spaces | Multiple datasets |
| Chart 1 | Amenity density comparison (radar or bar) | Calculated |
| Chart 2 | Transit accessibility (stops within 500m) | TTC Stops |

**Metric Selector:** Parks, schools, child care, transit access.
---

## Part 6: Data Pipeline Architecture

### ETL Flow

```
┌─────────────────┐     ┌─────────────────┐     ┌──────────────────┐
│  DATA SOURCES   │     │  STAGING LAYER  │     │    MART LAYER    │
│                 │     │                 │     │                  │
│ Toronto Open    │────▶│ stg_geography   │────▶│ dim_neighbourhood│
│ Data Portal     │     │ stg_census      │     │ fact_crime       │
│                 │     │ stg_crime       │     │ fact_housing     │
│ CMHC Portal     │────▶│ stg_rental      │     │ fact_amenities   │
│                 │     │ stg_permits     │     │                  │
│ Toronto Police  │────▶│ stg_amenities   │     │ agg_dashboard    │
│ Portal          │     │ stg_childcare   │     │ (pre-computed)   │
└─────────────────┘     └─────────────────┘     └──────────────────┘
```
### Key Transformations

| Transformation | Description |
|----------------|-------------|
| **Geography Standardization** | Ensure all datasets use `neighbourhood_id` (AREA_ID from GeoJSON) |
| **Census Pivot** | Neighbourhood Profiles is wide format — pivot to metrics per neighbourhood |
| **CMHC Zone Mapping** | Create crosswalk from 15 CMHC zones to 158 neighbourhoods |
| **Amenity Aggregation** | Spatial join point data (schools, parks, child care) to neighbourhood polygons |
| **Rate Calculations** | Normalize counts to per-capita or per-100K |
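The Census Pivot step can be sketched in pandas. The real Profiles CSV ships one row per indicator and one column per neighbourhood; the tiny frame below mimics that shape with made-up names and values.

```python
import pandas as pd

# Illustrative stand-in for the wide Profiles layout:
# one row per indicator, one column per neighbourhood.
raw = pd.DataFrame({
    "indicator":  ["population", "median_income"],
    "Annex":      [30000, 90000],
    "Casa Loma":  [11000, 120000],
})

# Melt to long form, then pivot so each neighbourhood becomes one row
# with one column per indicator.
long = raw.melt(id_vars="indicator", var_name="neighbourhood", value_name="value")
tidy = long.pivot(index="neighbourhood", columns="indicator", values="value").reset_index()
```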
### Data Refresh Schedule

| Layer | Frequency | Trigger |
|-------|-----------|---------|
| Staging (API pulls) | Weekly | Scheduled job |
| Marts (transforms) | Weekly | Post-staging |
| Dashboard cache | On-demand | User refresh button |

---
## Part 7: Technical Stack

### Core Stack

| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend** | Plotly Dash | Production-ready, rapid iteration |
| **Mapping** | Plotly `choropleth_mapbox` | Native Dash integration |
| **Data Store** | PostgreSQL + PostGIS | Spatial queries, existing expertise |
| **ETL** | Python (Pandas, SQLAlchemy) | Existing stack |
| **Deployment** | Render / Railway | Free tier, easy Dash hosting |

### Alternative (Portfolio Stretch)

| Component | Technology | Why Consider |
|-----------|------------|--------------|
| **Frontend** | React + deck.gl | More "modern" for portfolio |
| **Data Store** | DuckDB | Serverless, embeddable |
| **ETL** | dbt | Aligns with skills roadmap |

---
## Appendix A: Data Source URLs

| Source | URL |
|--------|-----|
| Toronto Open Data — Neighbourhoods | https://open.toronto.ca/dataset/neighbourhoods/ |
| Toronto Open Data — Neighbourhood Profiles | https://open.toronto.ca/dataset/neighbourhood-profiles/ |
| Toronto Police — Neighbourhood Crime Rates | https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-open-data |
| Toronto Police — MCI | https://data.torontopolice.on.ca/datasets/major-crime-indicators-open-data |
| Toronto Police — Shootings | https://data.torontopolice.on.ca/datasets/shootings-firearm-discharges-open-data |
| CMHC Rental Market Survey | https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market |
| Toronto Open Data — Parks | https://open.toronto.ca/dataset/parks/ |
| Toronto Open Data — Schools | https://open.toronto.ca/dataset/school-locations-all-types/ |
| Toronto Open Data — Building Permits | https://open.toronto.ca/dataset/building-permits-cleared-permits/ |
| Toronto Open Data — Child Care | https://open.toronto.ca/dataset/licensed-child-care-centres/ |
| Toronto Open Data — TTC Routes | https://open.toronto.ca/dataset/ttc-routes-and-schedules/ |

---
## Appendix B: Colour Palettes

### Affordability (Diverging)
| Status | Hex | Usage |
|--------|-----|-------|
| Affordable (<30% income) | `#2ecc71` | Green |
| Stretched (30-50%) | `#f1c40f` | Yellow |
| Unaffordable (>50%) | `#e74c3c` | Red |

### Safety (Sequential)
| Status | Hex | Usage |
|--------|-----|-------|
| Safest (lowest crime) | `#27ae60` | Dark green |
| Moderate | `#f39c12` | Orange |
| Highest Crime | `#c0392b` | Dark red |

### Demographics — Income (Sequential)
| Level | Hex | Usage |
|-------|-----|-------|
| Highest Income | `#1a5276` | Dark blue |
| Mid Income | `#5dade2` | Light blue |
| Lowest Income | `#ecf0f1` | Light gray |

### General Recommendation
Use **Viridis** or **Plasma** colorscales for perceptually uniform gradients on continuous metrics.

---
## Appendix C: Glossary

| Term | Definition |
|------|------------|
| **MCI** | Major Crime Indicators — Assault, B&E, Auto Theft, Robbery, Theft Over |
| **CMHC Zone** | Canada Mortgage and Housing Corporation rental market survey zones (15 in Toronto) |
| **Rent-to-Income Ratio** | Monthly rent ÷ monthly household income; <30% is considered affordable |
| **PostGIS** | PostgreSQL extension for geographic data |
| **Choropleth** | Thematic map where areas are shaded based on a statistical variable |

---
## Appendix D: Interview Talking Points

When discussing this project in interviews, emphasize:

1. **Data Engineering:** "I built a multi-source ETL pipeline that standardizes geographic keys across Census data, police data, and CMHC rental surveys—three different granularities I had to reconcile."

2. **Dimensional Modeling:** "The data model follows star schema patterns with a central neighbourhood dimension table and fact tables for crime, housing, and amenities."

3. **dbt Patterns:** "The transformation layer uses staging → intermediate → mart patterns, which I've documented for maintainability."

4. **Business Value:** "The dashboard answers questions like 'Where can a young professional afford to live that's safe and has good transit?' — turning raw data into actionable insights."

5. **Technical Decisions:** "I chose Plotly Dash over a React frontend because it let me iterate faster while maintaining production-quality interactivity. For a portfolio piece, speed to working demo matters."

---

*Document Version: 1.0*
*Created: January 2026*
*Author: Leo Miranda / Claude*
520 docs/changes/Portfolio-Changes.txt (new file)
# Leo Miranda — Portfolio Website Blueprint

Structure, navigation, and complete page content

---

## Site Architecture

```
leodata.science
├── Home (Landing)
├── About
├── Projects (Overview + Status)
│   └── [Side Navbar]
│       ├── → Toronto Housing Market Dashboard (live)
│       ├── → US Retail Energy Price Predictor (coming soon)
│       └── → DataFlow Platform (Phase 3)
├── Lab (Bandit Labs / Experiments)
├── Blog
│   └── [Articles]
├── Resume (downloadable + inline)
└── Contact
```
---

## Navigation Structure

Primary Nav: Home | Projects | Lab | Blog | About | Resume

Footer: LinkedIn | GitHub | Email | “Built with Dash & too much coffee”

---
# PAGE CONTENT

---

## 1. HOME (Landing Page)

### Hero Section

Headline:

> I turn messy data into systems that actually work.

Subhead:

> Data Engineer & Analytics Specialist. 8 years building pipelines, dashboards, and the infrastructure nobody sees but everyone depends on. Based in Toronto.

CTA Buttons:

- View Projects → /projects
- Get In Touch → /contact
---

### Quick Impact Strip (Optional — 3-4 stats)

| 1B+ | 40% | 5 Years |
|-------------------------------------------------|------------------------------------|-----------------------------|
| Rows processed daily across enterprise platform | Efficiency gain through automation | Building DataFlow from zero |

---
### Featured Project Card

Toronto Housing Market Dashboard

> Real-time analytics on Toronto’s housing trends. dbt-powered ETL, Python scraping, Plotly visualization.
> \[View Dashboard\] \[View Repository\]

---

### Brief Intro (2-3 sentences)

I’m a data engineer who’s spent the last 8 years in the trenches—building the infrastructure that feeds dashboards, automates the boring stuff, and makes data actually usable. Most of my work has been in contact center operations and energy, where I’ve had to be scrappy: one-person data teams, legacy systems, stakeholders who need answers yesterday.

I like solving real problems, not theoretical ones.

---
## 2. ABOUT PAGE

### Opening

I didn’t start in data. I started in project management—CAPM certified, ITIL trained, the whole corporate playbook. Then I realized I liked building systems more than managing timelines, and I was better at automating reports than attending meetings about them.

That pivot led me to where I am now: 8 years deep in data engineering, analytics, and the messy reality of turning raw information into something people can actually use.

---
### What I Actually Do

The short version: I build data infrastructure. Pipelines, warehouses, dashboards, automation—the invisible machinery that makes businesses run on data instead of gut feelings.

The longer version: At Summitt Energy, I’ve been the sole data professional supporting 150+ employees across 9 markets (Canada and US). I inherited nothing—no data warehouse, no reporting infrastructure, no documentation. Over 5 years, I built DataFlow: an enterprise platform processing 1B+ rows, integrating contact center data, CRM systems, and legacy tools that definitely weren’t designed to talk to each other.

That meant learning to be a generalist. I’ve done ETL pipeline development (Python, SQLAlchemy), dimensional modeling, dashboard design (Power BI, Plotly-Dash), API integration, and more stakeholder management than I’d like to admit. When you’re the only data person, you learn to wear every hat.

---
### How I Think About Data

I’m not interested in data for data’s sake. The question I always start with: What decision does this help someone make?

Most of my work has been in operations-heavy environments—contact centers, energy retail, logistics. These aren’t glamorous domains, but they’re where data can have massive impact. A 30% improvement in abandon rate isn’t just a metric; it’s thousands of customers who didn’t hang up frustrated. A 40% reduction in reporting time means managers can actually manage instead of wrestling with spreadsheets.

I care about outcomes, not technology stacks.

---
### The Technical Stuff (For Those Who Want It)
|
||||
|
||||
Languages: Python (Pandas, SQLAlchemy, FastAPI), SQL (MSSQL, PostgreSQL), R, VBA
|
||||
|
||||
Data Engineering: ETL/ELT pipelines, dimensional modeling (star schema), dbt patterns, batch processing, API integration, web scraping (Selenium)
|
||||
|
||||
Visualization: Plotly/Dash, Power BI, Tableau
|
||||
|
||||
Platforms: Genesys Cloud, Five9, Zoho, Azure DevOps
|
||||
|
||||
Currently Learning: Cloud certification (Azure DP-203), Airflow, Snowflake
|
||||
|
||||
---

### Outside Work

I’m a Brazilian-Canadian based in Toronto. I speak Portuguese (native), English (fluent), and enough Spanish to survive.

When I’m not staring at SQL, I’m usually:

- Building automation tools for small businesses through Bandit Labs (my side project)
- Contributing to open source (MCP servers, Claude Code plugins)
- Trying to explain to my kid why Daddy’s job involves “making computers talk to each other”

---

### What I’m Looking For

I’m currently exploring Senior Data Analyst and Data Engineer roles in the Toronto area (or remote). I’m most interested in:

- Companies that treat data as infrastructure, not an afterthought
- Teams where I can contribute to architecture decisions, not just execute tickets
- Operations-focused industries (energy, logistics, financial services, contact center tech)

If that sounds like your team, let’s talk.

\[Download Resume\] \[Contact Me\]

---

## 3. PROJECTS PAGE

### Navigation Note

The Projects page serves as an overview and status hub for all projects. A side navbar provides direct links to live dashboards and repositories. Users land on the overview first, then navigate to specific projects via the sidebar.

### Intro Text

These are projects I’ve built—some professional (anonymized where needed), some personal. Each one taught me something. Use the sidebar to jump directly to live dashboards or explore the overviews below.

---

### Project Card: Toronto Housing Market Dashboard

Type: Personal Project | Status: Live

The Problem:
Toronto’s housing market moves fast, and most publicly available data is either outdated, behind paywalls, or scattered across dozens of sources. I wanted a single dashboard that tracked trends in real-time.

What I Built:

- Data Pipeline: Python scraper pulling listings data, automated on schedule
- Transformation Layer: dbt-based SQL architecture (staging → intermediate → marts)
- Visualization: Interactive Plotly-Dash dashboard with filters by neighborhood, price range, property type
- Infrastructure: PostgreSQL backend, version-controlled in Git

Tech Stack: Python, dbt, PostgreSQL, Plotly-Dash, GitHub Actions

What I Learned:
Real estate data is messy as hell. Listings get pulled, prices change, duplicates are everywhere. Building a reliable pipeline meant implementing serious data quality checks and learning to embrace “good enough” over “perfect.”
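
The kind of quality checks described above can be sketched in a few lines. This is a minimal stand-in, not the project's actual pipeline code: field names like `listing_id` and the price bounds are assumptions for illustration.

```python
from datetime import date

def dedupe_listings(listings):
    """Keep only the most recently scraped record per listing ID.
    (Illustrative schema: listing_id / price / scraped_at.)"""
    latest = {}
    for row in listings:
        key = row["listing_id"]
        if key not in latest or row["scraped_at"] > latest[key]["scraped_at"]:
            latest[key] = row
    return list(latest.values())

def sanity_check(row, min_price=50_000, max_price=50_000_000):
    """Drop obviously bad prices before they reach staging tables."""
    return min_price <= row["price"] <= max_price

raw = [
    {"listing_id": "A1", "price": 899_000, "scraped_at": date(2025, 1, 2)},
    {"listing_id": "A1", "price": 879_000, "scraped_at": date(2025, 1, 9)},  # price drop
    {"listing_id": "B2", "price": 1, "scraped_at": date(2025, 1, 5)},        # junk row
]

clean = [r for r in dedupe_listings(raw) if sanity_check(r)]
```

In practice checks like these run before the staging layer, so downstream marts only ever see one record per listing.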

\[View Live Dashboard\] \[View Repository (ETL + dbt)\]

---

### Project Card: US Retail Energy Price Predictor

Type: Personal Project | Status: Coming Soon (Phase 2)

The Problem:
Retail energy pricing in deregulated US markets is volatile and opaque. Consumers and analysts lack accessible tools to understand pricing trends and forecast where rates are headed.

What I’m Building:

- Data Pipeline: Automated ingestion of public pricing data across multiple US markets
- ML Model: Price prediction using time series forecasting (ARIMA, Prophet, or similar)
- Transformation Layer: dbt-based SQL architecture for feature engineering
- Visualization: Interactive dashboard showing historical trends + predictions by state/market

Tech Stack: Python, Scikit-learn, dbt, PostgreSQL, Plotly-Dash

Why This Project:
This showcases the ML side of my skillset—something the Toronto Housing dashboard doesn’t cover. It also leverages my domain expertise from 5+ years in retail energy operations.

\[Coming Soon\]

---

### Project Card: DataFlow Platform (Enterprise Case Study)

Type: Professional | Status: Deferred (Phase 3 — requires sanitized codebase)

The Context:
When I joined Summitt Energy, there was no data infrastructure. Reports were manual. Insights were guesswork. I was hired to fix that.

What I Built (Over 5 Years):

- v1 (2020): Basic ETL scripts pulling Genesys Cloud data into MSSQL
- v2 (2021): Dimensional model (star schema) with fact/dimension tables
- v3 (2022): Python refactor with SQLAlchemy ORM, batch processing, error handling
- v4 (2023-24): dbt-pattern SQL views (staging → intermediate → marts), FastAPI layer, CLI tools
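
The batch-processing-with-error-handling pattern from v3 can be sketched as follows. This is a simplified stand-in using stdlib `sqlite3` rather than the platform's SQLAlchemy/MSSQL stack; the table and column names are invented for illustration.

```python
import sqlite3

def load_batches(conn, rows, batch_size=500):
    """Insert rows in fixed-size batches. A failed batch is logged and
    skipped instead of aborting the whole run (one transaction per batch)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_interaction "
        "(interaction_id TEXT PRIMARY KEY, queue TEXT, handle_secs INTEGER)"
    )
    loaded = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            with conn:  # commits on success, rolls back the batch on error
                conn.executemany(
                    "INSERT OR REPLACE INTO fact_interaction VALUES (?, ?, ?)",
                    batch,
                )
            loaded += len(batch)
        except sqlite3.Error as exc:
            print(f"batch starting at {start} failed: {exc}")
    return loaded

conn = sqlite3.connect(":memory:")
n = load_batches(conn, [(f"id-{i}", "billing", 120 + i) for i in range(1200)])
```

The same shape (chunk, transact per chunk, log-and-continue) carries over directly to an ORM session.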

Current State:

- 21 tables, 1B+ rows
- 5,000+ daily transactions processed
- Integrates Genesys Cloud, Zoho CRM, legacy systems
- Feeds Power BI prototypes and production Dash dashboards
- Near-zero reporting errors

Impact:

- 40% improvement in reporting efficiency
- 30% reduction in call abandon rate (via KPI framework)
- 50% faster Average Speed to Answer
- 100% callback completion rate

What I Learned:
Building data infrastructure as a team of one forces brutal prioritization. I learned to ship imperfect solutions fast, iterate based on feedback, and never underestimate how long stakeholder buy-in takes.

Note: This is proprietary work. A sanitized case study with architecture patterns (no proprietary data) will be published in Phase 3.

---

### Project Card: AI-Assisted Automation (Bandit Labs)

Type: Consulting/Side Business | Status: Active

What It Is:
Bandit Labs is my consulting practice focused on automation for small businesses. Most clients don’t need enterprise data platforms—they need someone to eliminate the 4 hours/week they spend manually entering receipts.

Sample Work:

- Receipt Processing Automation: OCR pipeline (Tesseract, Google Vision) extracting purchase data from photos, pushing directly to QuickBooks. Eliminated 3-4 hours/week of manual entry for a restaurant client.
- Product Margin Tracker: Plotly-Dash dashboard with real-time profitability insights
- Claude Code Plugins: MCP servers for Gitea, Wiki.js, NetBox integration
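
The parsing step that sits between the OCR engine and QuickBooks can be sketched like this. It assumes the OCR text has already been produced by Tesseract or Google Vision; the sample receipt and field patterns are invented, not a real client's data.

```python
import re

# Raw text as it might come back from an OCR engine (invented sample).
ocr_text = """RIO ACAI RESTAURANT
2025-01-14
2x Acai Bowl      19.90
Delivery           4.50
TOTAL             24.40
"""

def parse_receipt(text):
    """Pull the fields a bookkeeping push needs out of noisy OCR output."""
    found_date = re.search(r"\d{4}-\d{2}-\d{2}", text)
    total = re.search(r"TOTAL\s+(\d+\.\d{2})", text, re.IGNORECASE)
    return {
        "vendor": text.strip().splitlines()[0].title(),
        "date": found_date.group(0) if found_date else None,
        "total": float(total.group(1)) if total else None,
    }

receipt = parse_receipt(ocr_text)
```

Real receipts need far looser patterns (and a manual-review queue for misses), but the vendor/date/total triple is the core of the integration.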

Why I Do This:
Small businesses are underserved by the data/automation industry. Everyone wants to sell them enterprise software they don’t need. I like solving problems at a scale where the impact is immediately visible.

\[Learn More About Bandit Labs\]

---

## 4. LAB PAGE (Bandit Labs / Experiments)

### Intro

This is where I experiment. Some of this becomes client work. Some of it teaches me something and gets abandoned. All of it is real code solving real (or at least real-adjacent) problems.

---

### Bandit Labs — Automation for Small Business

I started Bandit Labs because I kept meeting small business owners drowning in manual work that should have been automated years ago. Enterprise tools are overkill. Custom development is expensive. There’s a gap in the middle.

What I Offer:

- Receipt/invoice processing automation
- Dashboard development (Plotly-Dash)
- Data pipeline setup for non-technical teams
- AI integration for repetitive tasks

Recent Client Work:

- Rio Açaí (Restaurant, Gatineau): Receipt OCR → QuickBooks integration. Saved 3-4 hours/week.

\[Contact for Consulting\]

---

### Open Source / Experiments

MCP Servers (Model Context Protocol)
I’ve built production-ready MCP servers for:

- Gitea: Issue management, label operations
- Wiki.js: Documentation access via GraphQL
- NetBox: CMDB integration (DCIM, IPAM, Virtualization)

These let AI assistants (like Claude) interact with infrastructure tools through natural language. Still experimental, but surprisingly useful for my own workflows.

Claude Code Plugins

- projman: AI-guided sprint planning with Gitea/Wiki.js integration
- cmdb-assistant: Conversational infrastructure queries against NetBox
- project-hygiene: Post-task cleanup automation

\[View on GitHub\]

---

## 5. BLOG PAGE

### Intro

I write occasionally about data engineering, automation, and the reality of being a one-person data team. No hot takes, no growth hacking—just things I’ve learned the hard way.

---

### Suggested Initial Articles

Article 1: “Building a Data Platform as a Team of One”
*What I learned from 5 years as the sole data professional at a mid-size company*

Outline:

- The reality of “full stack data” when there’s no one else
- Prioritization frameworks (what to build first when everything is urgent)
- Technical debt vs. shipping something
- Building stakeholder trust without a team to back you up
- What I’d do differently

---

Article 2: “dbt Patterns Without dbt (And Why I Eventually Adopted Them)”
*How I accidentally implemented analytics engineering best practices before knowing the terminology*

Outline:

- The problem: SQL spaghetti in production dashboards
- My solution: staging → intermediate → marts view architecture
- Why separation of concerns matters for maintainability
- The day I discovered dbt and realized I’d been doing this manually
- Migration path for legacy SQL codebases

---

Article 3: “The Toronto Housing Market Dashboard: A Data Engineering Postmortem”
*Building a real-time analytics pipeline for messy, uncooperative data*

Outline:

- Why I built this (and why public housing data sucks)
- Data sourcing challenges and ethical scraping
- Pipeline architecture decisions
- dbt transformation layer design
- What broke and how I fixed it
- Dashboard design for non-technical users

---

Article 4: “Automating Small Business Operations with OCR and AI”
*A case study in practical automation for non-enterprise clients*

Outline:

- The client problem: 4 hours/week on receipt entry
- Why “just use \[enterprise tool\]” doesn’t work for small business
- Building an OCR pipeline with Tesseract and Google Vision
- QuickBooks integration gotchas
- ROI calculation for automation projects

---

Article 5: “What I Wish I Knew Before Building My First ETL Pipeline”
*Hard-won lessons for junior data engineers*

Outline:

- Error handling isn’t optional (it’s the whole job)
- Logging is your best friend at 2am
- Why idempotency matters
- The staging table pattern
- Testing data pipelines
- Documentation nobody will read (write it anyway)

---

Article 6: “Predicting US Retail Energy Prices: An ML Project Walkthrough”
*Building a forecasting model with domain knowledge from 5 years in energy retail*

Outline:

- Why retail energy pricing is hard to predict (deregulation, seasonality, policy)
- Data sourcing and pipeline architecture
- Feature engineering with dbt
- Model selection (ARIMA vs Prophet vs ensemble)
- Evaluation metrics that matter for price forecasting
- Lessons from applying domain expertise to ML

---

## 6. RESUME PAGE

### Inline Display

Show a clean, readable version of the resume directly on the page. Use your tailored Senior Data Analyst version as the base.

### Download Options

- \[Download PDF\]
- \[Download DOCX\]
- \[View on LinkedIn\]

### Optional: Interactive Timeline

Visual timeline of career progression with expandable sections for each role. More engaging than a wall of text, but only if you have time to build it.

---

## 7. CONTACT PAGE

### Intro

I’m currently open to Senior Data Analyst and Data Engineer roles in Toronto (or remote). If you’re working on something interesting and need someone who can build data infrastructure from scratch, I’d like to hear about it.

For consulting inquiries (automation, dashboards, small business data work), reach out about Bandit Labs.

---

### Contact Form Fields

- Name
- Email
- Subject (dropdown: Job Opportunity / Consulting Inquiry / Other)
- Message

---

### Direct Contact

- Email: leobrmi@hotmail.com
- Phone: (416) 859-7936
- LinkedIn: \[link\]
- GitHub: \[link\]

---

### Location

Toronto, ON, Canada
Canadian Citizen | Eligible to work in Canada and US

---

## TONE GUIDELINES

### Do:

- Be direct and specific
- Use first person naturally
- Include concrete metrics
- Acknowledge constraints and tradeoffs
- Show personality without being performative
- Write like you talk (minus the profanity)

### Don’t:

- Use buzzwords without substance (“leveraging synergies”)
- Oversell or inflate
- Write in third person
- Use passive voice excessively
- Sound like a LinkedIn influencer
- Pretend you’re a full team when you’re one person

---

## SEO / DISCOVERABILITY

### Target Keywords (Organic)

- Toronto data analyst
- Data engineer portfolio
- Python ETL developer
- dbt analytics engineer
- Contact center analytics

### Blog Strategy

Aim for 1-2 posts per month initially. Focus on:

- Technical tutorials (how I built X)
- Lessons learned (what went wrong and how I fixed it)
- Industry observations (data work in operations-heavy companies)

---

## IMPLEMENTATION PRIORITY

### Phase 1 (MVP — Get it live)

1. Home page (hero + brief intro + featured project)
2. About page (full content)
3. Projects page (overview + status cards with navbar links to dashboards)
4. Resume page (inline + download)
5. Contact page (form + direct info)
6. Blog (start with 2-3 articles)

### Phase 2 (Expand)

1. Lab page (Bandit Labs + experiments)
2. US Retail Energy Price Predictor (ML project — coming soon)
3. Add more projects as completed

### Phase 3 (Polish)

1. DataFlow Platform case study (requires sanitized fork of proprietary codebase)
2. Testimonials (if available from Summitt stakeholders)
3. Interactive elements (timeline, project filters)

---

Last updated: January 2025

# Toronto Housing Price Dashboard

## Portfolio Project — Data Specification & Architecture

**Version**: 5.1
**Last Updated**: January 2026
**Status**: Specification Complete

---

## Document Context

| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Detailed specification for Toronto Housing Dashboard |
| **Scope** | Data schemas, source URLs, geographic boundaries, V1/V2 decisions |

**Rule**: For overall project scope, phasing, tech stack, and deployment architecture, see `portfolio_project_plan_v5.md`. This document provides implementation-level detail for the Toronto Housing project specifically.

**Terminology Note**: This document uses **Stages 1–4** to describe Toronto Housing implementation steps. These are distinct from the **Phases 1–5** in `portfolio_project_plan_v5.md`, which describe the overall portfolio project lifecycle.

---

## Project Overview

A dashboard analyzing housing price variations across Toronto neighbourhoods over time, with dual analysis tracks:

| Track | Data Domain | Primary Source | Geographic Unit |
|-------|-------------|----------------|-----------------|
| **Purchases** | Sales transactions | TRREB Monthly Reports | ~35 Districts |
| **Rentals** | Rental market stats | CMHC Rental Market Survey | ~20 Zones |

**Core Visualization**: Interactive choropleth map of Toronto with toggle between rental/purchase analysis, time-series exploration by month/year.

**Enrichment Layer** (V1: overlay only): Neighbourhood-level demographic and socioeconomic context including population density, education attainment, and income. Crime data deferred to Phase 4 of the portfolio project (post-Energy project).

**Tech Stack & Deployment**: See `portfolio_project_plan_v5.md` → Tech Stack, Deployment Architecture

---

## Geographic Layers

### Layer Architecture

```
┌──────────────────────────────────────────────────┐
│ City of Toronto Official Neighbourhoods (158)    │ ← Reference overlay + Enrichment data
├──────────────────────────────────────────────────┤
│ TRREB Districts (~35) — W01, C01, E01, etc.      │ ← Purchase data
├──────────────────────────────────────────────────┤
│ CMHC Survey Zones (~20) — Census Tract aligned   │ ← Rental data
└──────────────────────────────────────────────────┘
```

### Boundary Files

| Layer | Zones | Format | Source | Status |
|-------|-------|--------|--------|--------|
| **City Neighbourhoods** | 158 | GeoJSON, Shapefile | [GitHub - jasonicarter/toronto-geojson](https://github.com/jasonicarter/toronto-geojson) | ✅ Ready to use |
| **TRREB Districts** | ~35 | PDF only | [TRREB Toronto Map PDF](https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf) | ⚠ Requires manual digitization |
| **CMHC Zones** | ~20 | R package | R `cmhc` package via `get_cmhc_geography()` | ✅ Available (see note) |

### Digitization Task: TRREB Districts

**Input**: TRREB Toronto PDF map
**Output**: GeoJSON with district codes (W01-W10, C01-C15, E01-E11)
**Tool**: QGIS

**Process**:
1. Import PDF as raster layer in QGIS
2. Create vector layer with polygon features
3. Trace district boundaries
4. Add attributes: `district_code`, `district_name`, `area_type` (West/Central/East)
5. Export as GeoJSON (WGS84 / EPSG:4326)

### CMHC Zone Boundaries

**Source**: The R `cmhc` package provides CMHC survey geography via the `get_cmhc_geography()` function.

**Extraction Process**:
```r
# In R
library(cmhc)
library(sf)

# Get Toronto CMA zones
toronto_zones <- get_cmhc_geography(
  geography_type = "ZONE",
  cma = "Toronto"
)

# Export to GeoJSON for Python/PostGIS
st_write(toronto_zones, "cmhc_zones.geojson", driver = "GeoJSON")
```

**Output**: `data/toronto/raw/geo/cmhc_zones.geojson`

**Why R?**: CMHC zone boundaries are not published as standalone files. The `cmhc` R package is the only reliable programmatic source. One-time extraction, then use the GeoJSON in the Python stack.
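
On the Python side, a quick sanity check of the exported file before loading it into PostGIS might look like this. The property names (`zone_code`, `zone_name`) are assumptions about what the export carries, not a documented schema, and the sample feature is a stand-in for the real file.

```python
import json

def zone_index(geojson):
    """Map each zone code to its geometry type, as a pre-load sanity check.
    Property names are assumed, not guaranteed by the cmhc export."""
    return {
        f["properties"]["zone_code"]: f["geometry"]["type"]
        for f in geojson["features"]
    }

# Tiny stand-in for data/toronto/raw/geo/cmhc_zones.geojson
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"zone_code": "Z01", "zone_name": "City of Toronto Core"},
            "geometry": {"type": "MultiPolygon", "coordinates": []},
        },
    ],
}
idx = zone_index(sample)
```

In the real pipeline this would read the file with `json.load(open(...))` and assert ~20 zones with polygonal geometry before any spatial work begins.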

### ⚠ Neighbourhood Boundary Change (140 → 158)

The City of Toronto updated from 140 to 158 social planning neighbourhoods in **April 2021**. This affects data alignment:

| Data Source | Pre-2021 | Post-2021 | Handling |
|-------------|----------|-----------|----------|
| Census (2016 and earlier) | 140 neighbourhoods | N/A | Use 140-model files |
| Census (2021+) | N/A | 158 neighbourhoods | Use 158-model files |

**V1 Strategy**: Use 2021 Census on 158 boundaries only. Defer historical trend analysis to portfolio Phase 4.

---

## Data Source #1: TRREB Monthly Market Reports

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | Toronto Regional Real Estate Board |
| **URL** | [TRREB Market Watch](https://trreb.ca/index.php/market-news/market-watch) |
| **Format** | PDF (monthly reports) |
| **Update Frequency** | Monthly |
| **Historical Availability** | 2007–Present |
| **Access** | Public (aggregate data in PDFs) |
| **Extraction Method** | PDF parsing (`pdfplumber` or `camelot-py`) |

### Available Tables

#### Table: `trreb_monthly_summary`
**Location in PDF**: Pages 3-4 (Summary by Area)

| Column | Data Type | Description |
|--------|-----------|-------------|
| `report_date` | DATE | First of month (YYYY-MM-01) |
| `area_code` | VARCHAR(3) | District code (W01, C01, E01, etc.) |
| `area_name` | VARCHAR(100) | District name |
| `area_type` | VARCHAR(10) | West / Central / East / North |
| `sales` | INTEGER | Number of transactions |
| `dollar_volume` | DECIMAL | Total sales volume ($) |
| `avg_price` | DECIMAL | Average sale price ($) |
| `median_price` | DECIMAL | Median sale price ($) |
| `new_listings` | INTEGER | New listings count |
| `active_listings` | INTEGER | Active listings at month end |
| `avg_sp_lp` | DECIMAL | Avg sale price / list price ratio (%) |
| `avg_dom` | INTEGER | Average days on market |
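
Whatever extracts the table (`pdfplumber` or `camelot-py`), the cells come back as strings with `$`, `,`, and `%` noise. A coercion step into the typed schema above might look like this; the raw row, its column order, and the values are illustrative, since the real PDF layout has to be inspected per year.

```python
def parse_summary_row(raw, report_date):
    """Coerce one raw table row (all strings, as a PDF parser returns it)
    into typed trreb_monthly_summary fields. Column order is assumed."""
    def num(s):
        # Strip currency, thousands, and percent formatting.
        return float(s.replace("$", "").replace(",", "").replace("%", ""))

    return {
        "report_date": report_date,
        "area_code": raw[0],
        "sales": int(num(raw[1])),
        "dollar_volume": num(raw[2]),
        "avg_price": num(raw[3]),
        "median_price": num(raw[4]),
        "new_listings": int(num(raw[5])),
        "avg_sp_lp": num(raw[6]),
        "avg_dom": int(num(raw[7])),
    }

row = parse_summary_row(
    ["W01", "58", "$74,371,000", "$1,282,259", "$1,150,000", "102", "101%", "14"],
    "2024-06-01",
)
```

Keeping the coercion in one function makes the "PDF structure may vary slightly across years" limitation below manageable: only the raw-to-typed mapping changes per report vintage.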

### Dimensions

| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Monthly | 2007-01 to present |
| **Geography** | District | ~35 TRREB districts |
| **Property Type** | Aggregate | All residential (no breakdown in summary) |

### Metrics Available

| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `avg_price` | Pre-calculated monthly avg | Primary price indicator |
| `median_price` | Pre-calculated monthly median | Robust price indicator |
| `sales` | Count | Market activity volume |
| `avg_dom` | Average | Market velocity |
| `avg_sp_lp` | Ratio | Buyer/seller market indicator |
| `new_listings` | Count | Supply indicator |
| `active_listings` | Snapshot | Inventory level |

### ⚠ Limitations

- No transaction-level data (aggregates only)
- Property type breakdown requires parsing additional tables
- PDF structure may vary slightly across years
- District boundaries haven't changed since 2011

---

## Data Source #2: CMHC Rental Market Survey

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | Canada Mortgage and Housing Corporation |
| **URL** | [CMHC Housing Market Information Portal](https://www03.cmhc-schl.gc.ca/hmip-pimh/) |
| **Format** | CSV export, API |
| **Update Frequency** | Annual (October survey) |
| **Historical Availability** | 1990–Present |
| **Access** | Public, free registration for bulk downloads |
| **Geographic Levels** | CMA → Zone → Neighbourhood → Census Tract |

### Available Tables

#### Table: `cmhc_rental_summary`
**Portal Path**: Toronto → Primary Rental Market → Summary Statistics

| Column | Data Type | Description |
|--------|-----------|-------------|
| `survey_year` | INTEGER | Survey year (October) |
| `zone_code` | VARCHAR(10) | CMHC zone identifier |
| `zone_name` | VARCHAR(100) | Zone name |
| `bedroom_type` | VARCHAR(20) | Bachelor / 1-Bed / 2-Bed / 3-Bed+ / Total |
| `universe` | INTEGER | Total rental units in zone |
| `vacancy_rate` | DECIMAL | Vacancy rate (%) |
| `vacancy_rate_reliability` | VARCHAR(1) | Reliability code (a/b/c/d) |
| `availability_rate` | DECIMAL | Availability rate (%) |
| `average_rent` | DECIMAL | Average monthly rent ($) |
| `average_rent_reliability` | VARCHAR(1) | Reliability code |
| `median_rent` | DECIMAL | Median monthly rent ($) |
| `rent_change_pct` | DECIMAL | YoY rent change (%) |
| `turnover_rate` | DECIMAL | Unit turnover rate (%) |

### Dimensions

| Dimension | Granularity | Values |
|-----------|-------------|--------|
| **Time** | Annual | 1990 to present (October snapshot) |
| **Geography** | Zone | ~20 CMHC zones in Toronto CMA |
| **Bedroom Type** | Category | Bachelor, 1-Bed, 2-Bed, 3-Bed+, Total |
| **Structure Type** | Category | Row, Apartment (available in detailed tables) |

### Metrics Available

| Metric | Aggregation | Use Case |
|--------|-------------|----------|
| `average_rent` | Pre-calculated avg | Primary rent indicator |
| `median_rent` | Pre-calculated median | Robust rent indicator |
| `vacancy_rate` | Percentage | Market tightness |
| `availability_rate` | Percentage | Supply accessibility |
| `turnover_rate` | Percentage | Tenant mobility |
| `rent_change_pct` | YoY % | Rent growth tracking |
| `universe` | Count | Market size |

### Reliability Codes

| Code | Meaning | Coefficient of Variation |
|------|---------|--------------------------|
| `a` | Excellent | CV ≤ 2.5% |
| `b` | Good | 2.5% < CV ≤ 5% |
| `c` | Fair | 5% < CV ≤ 10% |
| `d` | Poor (use with caution) | CV > 10% |
| `**` | Data suppressed | Sample too small |
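
A pipeline consuming this table can enforce the reliability codes mechanically, for example dropping `d` and suppressed (`**`) rent estimates by default. A minimal sketch, with invented sample rows:

```python
# Upper CV bound implied by each CMHC reliability code (from the table above).
CV_BOUND = {"a": 2.5, "b": 5.0, "c": 10.0}

def usable(record, worst_allowed="c"):
    """Keep rows whose rent estimate meets the reliability bar.
    'd' and suppressed ('**') values fall through to False."""
    code = record["average_rent_reliability"]
    return code in CV_BOUND and CV_BOUND[code] <= CV_BOUND[worst_allowed]

rows = [
    {"zone_code": "Z01", "average_rent": 2210, "average_rent_reliability": "a"},
    {"zone_code": "Z07", "average_rent": 1895, "average_rent_reliability": "d"},
    {"zone_code": "Z12", "average_rent": None, "average_rent_reliability": "**"},
]
kept = [r for r in rows if usable(r)]
```

The `worst_allowed` threshold is a policy choice; a dashboard could surface `d`-coded values greyed out instead of dropping them.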

### ⚠ Limitations

- Annual only (no monthly granularity)
- October snapshot (point-in-time)
- Zones are larger than TRREB districts
- Purpose-built rental only (excludes condo rentals in base survey)

---

## Data Source #3: City of Toronto Open Data

### Source Details

| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto |
| **URL** | [Toronto Open Data Portal](https://open.toronto.ca/) |
| **Format** | GeoJSON, Shapefile, CSV |
| **Use Case** | Reference layer, demographic enrichment |

### Relevant Datasets

#### Dataset: `neighbourhoods`

| Column | Data Type | Description |
|--------|-----------|-------------|
| `area_id` | INTEGER | Neighbourhood ID (1-158) |
| `area_name` | VARCHAR(100) | Official neighbourhood name |
| `geometry` | POLYGON | Boundary geometry |

#### Dataset: `neighbourhood_profiles` (Census-linked)

| Column | Data Type | Description |
|--------|-----------|-------------|
| `neighbourhood_id` | INTEGER | Links to neighbourhoods |
| `population` | INTEGER | Total population |
| `avg_household_income` | DECIMAL | Average household income |
| `dwelling_count` | INTEGER | Total dwellings |
| `owner_pct` | DECIMAL | % owner-occupied |
| `renter_pct` | DECIMAL | % renter-occupied |

### Enrichment Potential

Can overlay demographic context on housing data:
- Income brackets by neighbourhood
- Ownership vs rental ratios
- Population density
- Dwelling type distribution

---

## Data Source #4: Enrichment Data (Density, Education)

### Purpose

Provide socioeconomic context to housing price analysis. Enables questions like:
- Do neighbourhoods with higher education attainment have higher prices?
- How does population density correlate with price per square foot?

### Geographic Alignment Reality

**Critical constraint**: Enrichment data is available at the **158-neighbourhood** level, while core housing data sits at **TRREB districts (~35)** and **CMHC zones (~20)**. These do not align cleanly.

```
158 Neighbourhoods (fine)    → Enrichment data lives here
        (no clean crosswalk)
~35 TRREB Districts (coarse) → Purchase data lives here
~20 CMHC Zones (coarse)      → Rental data lives here
```
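
One standard way to bridge the mismatch is area-weighted apportionment: intersect the two boundary layers once in a GIS step, then weight each neighbourhood's value by its overlap area within each district. The sketch below assumes the `(neighbourhood_id, district_code, overlap_sqkm)` triples have already been precomputed; all IDs, areas, and values are made up for illustration.

```python
from collections import defaultdict

def areal_weighted(neigh_values, overlaps):
    """Approximate a 158-neighbourhood metric at TRREB-district level by
    area weighting. `overlaps` holds (neighbourhood_id, district_code,
    overlap_sqkm) triples from a one-time GIS intersection."""
    weight_sum = defaultdict(float)
    value_sum = defaultdict(float)
    for nbhd, district, area in overlaps:
        weight_sum[district] += area
        value_sum[district] += neigh_values[nbhd] * area
    return {d: value_sum[d] / weight_sum[d] for d in weight_sum}

# Two invented neighbourhoods split 3:1 across one district.
density = {101: 8000.0, 102: 4000.0}          # people per km²
overlaps = [(101, "W01", 3.0), (102, "W01", 1.0)]
result = areal_weighted(density, overlaps)
```

This is only an approximation (it assumes the metric is uniform within each neighbourhood), which is one reason V1 keeps enrichment as an overlay rather than joining it into the fact tables.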

### Available Enrichment Datasets

#### Dataset: Neighbourhood Profiles (Census)

| Attribute | Value |
|-----------|-------|
| **Provider** | City of Toronto (via Statistics Canada Census) |
| **URL** | [Toronto Open Data - Neighbourhood Profiles](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
| **Format** | CSV, JSON, XML, XLSX |
| **Update Frequency** | Every 5 years (Census cycle) |
| **Available Years** | 2001, 2006, 2011, 2016, 2021 |
| **Geographic Unit** | 158 neighbourhoods (140 pre-2021) |

**Key Variables**:

| Variable | Description | Use Case |
|----------|-------------|----------|
| `population` | Total population | Density calculation |
| `land_area_sqkm` | Area in square kilometers | Density calculation |
| `pop_density_per_sqkm` | Population per km² | Density metric |
| `pct_bachelors_or_higher` | % age 25-64 with bachelor's+ | Education proxy |
| `median_household_income` | Median total household income | Income metric |
| `avg_household_income` | Average total household income | Income metric |
| `pct_owner_occupied` | % owner-occupied dwellings | Tenure split |
| `pct_renter_occupied` | % renter-occupied dwellings | Tenure split |

**Download URL (2021, 158 neighbourhoods)**:
```
https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx
```
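
Two of the key variables above are derived rather than raw. A sketch of the derivation, assuming raw census counts are available per neighbourhood (the input row and the raw field names `pop_25_64` / `bachelors_or_higher_25_64` are invented for illustration):

```python
def derive_profile_metrics(row):
    """Compute the derived V1 enrichment fields from raw census counts.
    Output names mirror the Key Variables table above."""
    density = row["population"] / row["land_area_sqkm"]
    pct_degree = 100.0 * row["bachelors_or_higher_25_64"] / row["pop_25_64"]
    return {
        "pop_density_per_sqkm": round(density, 1),
        "pct_bachelors_or_higher": round(pct_degree, 1),
    }

metrics = derive_profile_metrics({
    "population": 33_312,
    "land_area_sqkm": 4.6,
    "pop_25_64": 19_800,
    "bachelors_or_higher_25_64": 11_286,
})
```

In the pipeline this would run once per neighbourhood after loading the XLSX, so the marts store the derived metrics rather than recomputing them per dashboard request.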
|
||||
|
||||
### Crime Data — Deferred to Portfolio Phase 4

Crime data (TPS Neighbourhood Crime Rates) is **not included in V1 scope**. It will be added in portfolio Phase 4, after the Energy Pricing project is complete.

**Rationale**:
- Crime data is socially/politically sensitive and requires careful methodology documentation
- V1 focuses on core housing metrics and policy events
- Deferral reduces scope creep risk

**Future Reference** (Portfolio Phase 4):
- Source: [TPS Public Safety Data Portal](https://data.torontopolice.on.ca/)
- Dataset: Neighbourhood Crime Rates (Major Crime Indicators)
- Geographic Unit: 158 neighbourhoods

### V1 Enrichment Data Summary

| Measure | Source | Geography | Frequency | Format | Status |
|---------|--------|-----------|-----------|--------|--------|
| **Population Density** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Education Attainment** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Median Income** | Neighbourhood Profiles | 158 neighbourhoods | Census (5-year) | CSV/JSON | ✅ Ready |
| **Crime Rates (MCI)** | TPS Data Portal | 158 neighbourhoods | Annual | GeoJSON/CSV | Deferred to Portfolio Phase 4 |

---
## Data Source #5: Policy Events

### Purpose

Provide temporal context for housing price movements. Display as annotation markers on time series charts. **No causation claims** — correlation/context only.

### Event Schema

#### Table: `dim_policy_event`

| Column | Data Type | Description |
|--------|-----------|-------------|
| `event_id` | INTEGER (PK) | Auto-increment primary key |
| `event_date` | DATE | Date event was announced/occurred |
| `effective_date` | DATE | Date policy took effect (if different) |
| `level` | VARCHAR(20) | `federal` / `provincial` / `municipal` |
| `category` | VARCHAR(20) | `monetary` / `tax` / `regulatory` / `supply` / `economic` |
| `title` | VARCHAR(200) | Short event title for display |
| `description` | TEXT | Longer description for tooltip |
| `expected_direction` | VARCHAR(10) | `bearish` / `bullish` / `neutral` |
| `source_url` | VARCHAR(500) | Link to official announcement/documentation |
| `confidence` | VARCHAR(10) | `high` / `medium` / `low` |
| `created_at` | TIMESTAMP | Record creation timestamp |
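One way the planned `schemas/policy_event.py` Pydantic model could mirror this table (a sketch, not the committed schema; defaults and field names beyond the columns above are assumptions):

```python
from datetime import date
from typing import Literal, Optional

from pydantic import BaseModel, HttpUrl


class PolicyEvent(BaseModel):
    """Sketch of one dim_policy_event row; enums follow the table above."""

    event_date: date
    effective_date: Optional[date] = None  # only if it differs from event_date
    level: Literal["federal", "provincial", "municipal"]
    category: Literal["monetary", "tax", "regulatory", "supply", "economic"]
    title: str
    description: str = ""
    expected_direction: Literal["bearish", "bullish", "neutral"] = "neutral"
    source_url: Optional[HttpUrl] = None
    confidence: Literal["high", "medium", "low"] = "medium"


# Example row from the Tier 1 sample list further down.
event = PolicyEvent(
    event_date=date(2017, 4, 20),
    level="provincial",
    category="tax",
    title="Ontario Fair Housing Plan",
    expected_direction="bearish",
)
```

Using `Literal` makes a typo like `level="Provincial"` fail validation at load time, which is the point of validating the curated CSV before it reaches the database.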
### Event Tiers

| Tier | Level | Category Examples | Inclusion Criteria |
|------|-------|-------------------|--------------------|
| **1** | Federal | BoC rate decisions, OSFI stress tests | Always include; objective, documented |
| **1** | Provincial | Fair Housing Plan, foreign buyer tax, rent control | Always include; legislative record |
| **2** | Municipal | Zoning reforms, development charges | Include if material impact expected |
| **2** | Economic | COVID measures, major employer closures | Include if Toronto-specific impact |
| **3** | Market | Major project announcements | Strict criteria; must be verifiable |

### Expected Direction Values

| Value | Meaning | Example |
|-------|---------|---------|
| `bullish` | Expected to increase prices | Rate cut, supply restriction |
| `bearish` | Expected to decrease prices | Rate hike, foreign buyer tax |
| `neutral` | Uncertain or mixed impact | Regulatory clarification |

### ⚠ Caveats

- **No causation claims**: Events are context, not explanation
- **Lag effects**: Policy impact may not be immediate
- **Confounding factors**: Multiple simultaneous influences
- **Display only**: No statistical analysis in V1

### Sample Events (Tier 1)

| Date | Level | Category | Title | Direction |
|------|-------|----------|-------|-----------|
| 2017-04-20 | provincial | tax | Ontario Fair Housing Plan | bearish |
| 2018-01-01 | federal | regulatory | OSFI B-20 Stress Test | bearish |
| 2020-03-27 | federal | monetary | BoC Emergency Rate Cut (0.25%) | bullish |
| 2022-03-02 | federal | monetary | BoC Rate Hike Cycle Begins | bearish |
| 2023-06-01 | federal | tax | Federal 2-Year Foreign Buyer Ban | bearish |

---
## Data Integration Strategy

### Temporal Alignment

| Source | Native Frequency | Alignment Strategy |
|--------|------------------|--------------------|
| TRREB | Monthly | Use as-is |
| CMHC | Annual (October) | Spread to monthly OR display annual overlay |
| Census/Enrichment | 5-year | Static snapshot; display as reference |
| Policy Events | Event-based | Display as vertical markers on time axis |

**Recommendation**: Keep separate time axes — TRREB monthly for purchases, CMHC annual for rentals. Don't force artificial monthly rental data.
### Geographic Alignment

```
┌─────────────────────────────────────────────────────────────────┐
│                     VISUALIZATION APPROACH                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Purchase Mode                 Rental Mode                     │
│   ─────────────────             ──────────────                  │
│   Map: TRREB Districts          Map: CMHC Zones                 │
│   Time: Monthly slider          Time: Annual selector           │
│   Metrics: Price, Sales         Metrics: Rent, Vacancy          │
│                                                                 │
│   ┌───────────────────────────────────────────────────────┐     │
│   │         City Neighbourhoods Overlay                   │     │
│   │         (158 boundaries as reference layer)           │     │
│   │         + Enrichment data (density, education, income)│     │
│   └───────────────────────────────────────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
### Enrichment Integration Strategy (Phased)

#### V1: Reference Overlay (Current Scope)

**Approach**: Display neighbourhood enrichment as a separate toggle-able layer. No joins to housing data.

**UX**:
- User hovers over TRREB district → tooltip shows "This district contains neighbourhoods: Annex, Casa Loma, Yorkville..."
- User toggles "Show Enrichment" → choropleth switches to neighbourhood-level density/education/income
- Enrichment and housing metrics displayed side-by-side, not merged

**Pros**:
- No imputation or dodgy aggregations
- Honest about geographic mismatch
- Ships faster

**Cons**:
- Can't do correlation analysis (price vs. enrichment) directly in dashboard

**Implementation**:
- `dim_neighbourhood` as standalone dimension (no FK to fact tables)
- Spatial lookup on hover (point-in-polygon)
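The hover lookup reduces to a point-in-polygon test. A minimal sketch with shapely, using toy rectangles in place of the real district GeoJSON (district codes and coordinates are illustrative):

```python
from typing import Optional

from shapely.geometry import Point, Polygon

# Toy rectangles standing in for TRREB district polygons (not real boundaries).
districts = {
    "C01": Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    "C02": Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]),
}


def district_for(lon: float, lat: float) -> Optional[str]:
    """Return the code of the district containing the point, if any."""
    pt = Point(lon, lat)
    return next(
        (code for code, poly in districts.items() if poly.contains(pt)), None
    )
```

In the dashboard, the same test would run against the digitized `trreb_districts.geojson` geometries, ideally with a spatial index rather than a linear scan over 35+ districts.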
#### V2/Portfolio Phase 4: Area-Weighted Aggregation (Future Scope)

**Approach**: Pre-compute area-weighted averages of neighbourhood metrics for each TRREB district and CMHC zone.

**Process**:
1. Spatial join: intersect neighbourhood polygons with TRREB/CMHC polygons
2. Compute overlap area for each neighbourhood-district pair
3. Weight neighbourhood metrics by overlap area proportion
4. User selects aggregation method in UI

**Aggregation Methods to Expose**:

| Method | Description | Best For |
|--------|-------------|----------|
| **Area-weighted mean** | Weight by % overlap area | Continuous metrics (density) |
| **Population-weighted mean** | Weight by population in overlap | Per-capita metrics (education) |
| **Majority assignment** | Assign neighbourhood to district with >50% overlap | Categorical data |
| **Max overlap** | Assign to single district with largest overlap | 1:1 mapping needs |

**Default**: Population-weighted (more defensible for per-capita metrics). Hide selector behind "Advanced" toggle.
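Once the spatial join has produced overlap areas (steps 1-2), the weighting in step 3 is simple arithmetic. A sketch with made-up overlap numbers (one district intersecting three neighbourhoods; the population-weighted variant is identical with population in place of area):

```python
# (neighbourhood_density, overlap_area_sqkm) pairs — illustrative values only.
overlaps = [
    (9500.0, 1.2),  # density of neighbourhood A, km² of A inside the district
    (4200.0, 0.6),
    (7100.0, 2.2),
]

total_area = sum(area for _, area in overlaps)
area_weighted_mean = sum(m * area for m, area in overlaps) / total_area
```

Pre-computing these weights into `bridge_district_neighbourhood` (see the data model below) keeps the dashboard queries to a single join plus a weighted average.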
### V1 Future-Proofing (Do Now)

| Action | Why |
|--------|-----|
| Store neighbourhood boundaries in same CRS as TRREB/CMHC (WGS84) | Avoids reprojection headaches |
| Keep `dim_neighbourhood` normalized (not denormalized into district tables) | Clean separation for V2 join |
| Document Census year for each metric | Ready for 2026 Census |
| Include `census_year` column in dim_neighbourhood | Enables SCD tracking |

### V1 Defer (Don't Do Yet)

| Action | Why Not |
|--------|---------|
| Pre-compute area-weighted crosswalk | Don't need for V1 |
| Build aggregation method selector UI | No backend to support it |
| Crime data integration | Deferred to Portfolio Phase 4 |
| Historical neighbourhood boundary reconciliation (140→158) | Use 2021+ data only for V1 |

---
## Proposed Data Model

### Star Schema

```
                         ┌────────────────────┐
                         │      dim_time      │
                         ├────────────────────┤
                         │ date_key (PK)      │
                         │ year               │
                         │ month              │
                         │ quarter            │
                         │ month_name         │
                         └────────────────────┘
                                   │
               ┌───────────────────┴───────────────────┐
               │                                       │
┌────────────────────┐                      ┌────────────────────┐
│ dim_trreb_district │                      │    dim_cmhc_zone   │
├────────────────────┤                      ├────────────────────┤
│ district_key (PK)  │                      │ zone_key (PK)      │
│ district_code      │                      │ zone_code          │
│ district_name      │                      │ zone_name          │
│ area_type          │                      │ geometry           │
│ geometry           │                      └────────────────────┘
└────────────────────┘                                │
          │                                           │
┌────────────────────┐                      ┌────────────────────┐
│   fact_purchases   │                      │    fact_rentals    │
├────────────────────┤                      ├────────────────────┤
│ date_key (FK)      │                      │ date_key (FK)      │
│ district_key (FK)  │                      │ zone_key (FK)      │
│ sales_count        │                      │ bedroom_type       │
│ avg_price          │                      │ avg_rent           │
│ median_price       │                      │ median_rent        │
│ new_listings       │                      │ vacancy_rate       │
│ active_listings    │                      │ universe           │
│ avg_dom            │                      │ turnover_rate      │
│ avg_sp_lp          │                      │ reliability_code   │
└────────────────────┘                      └────────────────────┘

┌───────────────────────────┐
│     dim_neighbourhood     │
├───────────────────────────┤
│ neighbourhood_id (PK)     │
│ name                      │
│ geometry                  │
│ population                │
│ land_area_sqkm            │
│ pop_density_per_sqkm      │
│ pct_bachelors_or_higher   │
│ median_household_income   │
│ pct_owner_occupied        │
│ pct_renter_occupied       │
│ census_year               │  ← For SCD tracking
└───────────────────────────┘

┌───────────────────────────┐
│     dim_policy_event      │
├───────────────────────────┤
│ event_id (PK)             │
│ event_date                │
│ effective_date            │
│ level                     │  ← federal/provincial/municipal
│ category                  │  ← monetary/tax/regulatory/supply/economic
│ title                     │
│ description               │
│ expected_direction        │  ← bearish/bullish/neutral
│ source_url                │
│ confidence                │  ← high/medium/low
│ created_at                │
└───────────────────────────┘

┌───────────────────────────────┐
│ bridge_district_neighbourhood │  ← Portfolio Phase 4 ONLY
├───────────────────────────────┤
│ district_key (FK)             │
│ neighbourhood_id (FK)         │
│ area_overlap_pct              │
│ population_overlap            │  ← For pop-weighted agg
└───────────────────────────────┘
```
**Notes**:
- `dim_neighbourhood` has no FK relationship to fact tables in V1
- `dim_policy_event` is standalone (no FK to facts); used for time-series annotation
- `bridge_district_neighbourhood` is Portfolio Phase 4 scope only
- A similar bridge table is needed for CMHC zones in Portfolio Phase 4
---

## File Structure

> **Note**: Toronto Housing data logic lives in `portfolio_app/toronto/`. See `portfolio_project_plan_v5.md` for full project structure.

### Data Directory Structure

```
data/
└── toronto/
    ├── raw/
    │   ├── trreb/
    │   │   └── market_watch_YYYY_MM.pdf
    │   ├── cmhc/
    │   │   └── rental_survey_YYYY.csv
    │   ├── enrichment/
    │   │   └── neighbourhood_profiles_2021.xlsx
    │   └── geo/
    │       ├── toronto_neighbourhoods.geojson
    │       ├── trreb_districts.geojson   ← (to be created via QGIS)
    │       └── cmhc_zones.geojson        ← (from R cmhc package)
    │
    ├── processed/                        ← gitignored
    │   ├── fact_purchases.parquet
    │   ├── fact_rentals.parquet
    │   ├── dim_time.parquet
    │   ├── dim_trreb_district.parquet
    │   ├── dim_cmhc_zone.parquet
    │   ├── dim_neighbourhood.parquet
    │   └── dim_policy_event.parquet
    │
    └── reference/
        ├── policy_events.csv             ← Curated event list
        └── neighbourhood_boundary_changelog.md  ← 140→158 notes
```
### Code Module Structure

```
portfolio_app/toronto/
├── __init__.py
├── parsers/
│   ├── __init__.py
│   ├── trreb.py          # PDF extraction
│   └── cmhc.py           # CSV processing
├── loaders/
│   ├── __init__.py
│   └── database.py       # DB operations
├── schemas/              # Pydantic models
│   ├── __init__.py
│   ├── trreb.py
│   ├── cmhc.py
│   ├── enrichment.py
│   └── policy_event.py
├── models/               # SQLAlchemy ORM
│   ├── __init__.py
│   ├── base.py           # DeclarativeBase, engine
│   ├── dimensions.py     # dim_time, dim_trreb_district, dim_policy_event, etc.
│   └── facts.py          # fact_purchases, fact_rentals
└── transforms/
    └── __init__.py
```

### Notebooks

```
notebooks/
├── 01_trreb_pdf_extraction.ipynb
├── 02_cmhc_data_prep.ipynb
├── 03_geo_layer_prep.ipynb
├── 04_enrichment_data_prep.ipynb
├── 05_policy_events_curation.ipynb
└── 06_spatial_crosswalk.ipynb   ← Portfolio Phase 4 only
```

---
## ✅ Implementation Checklist

> **Note**: These are **Stages** within the Toronto Housing project (Portfolio Phase 1). They are distinct from the overall portfolio **Phases** defined in `portfolio_project_plan_v5.md`.

### Stage 1: Data Acquisition
- [ ] Download TRREB monthly PDFs (2020-present as MVP)
- [ ] Register for CMHC portal and export Toronto rental data
- [ ] Extract CMHC zone boundaries via R `cmhc` package
- [ ] Download City of Toronto neighbourhood GeoJSON (158 boundaries)
- [ ] Digitize TRREB district boundaries in QGIS
- [ ] Download Neighbourhood Profiles (2021 Census, 158-model)

### Stage 2: Data Processing
- [ ] Build TRREB PDF parser (`portfolio_app/toronto/parsers/trreb.py`)
- [ ] Build Pydantic schemas (`portfolio_app/toronto/schemas/`)
- [ ] Build SQLAlchemy models (`portfolio_app/toronto/models/`)
- [ ] Extract and validate TRREB monthly summaries
- [ ] Clean and structure CMHC rental data
- [ ] Process Neighbourhood Profiles into `dim_neighbourhood`
- [ ] Curate and load policy events into `dim_policy_event`
- [ ] Create dimension tables
- [ ] Build fact tables
- [ ] Validate all geospatial layers use the same CRS (WGS84/EPSG:4326)

### Stage 3: Visualization (V1)
- [ ] Create dashboard page (`portfolio_app/pages/toronto/dashboard.py`)
- [ ] Build choropleth figures (`portfolio_app/figures/choropleth.py`)
- [ ] Build time series figures (`portfolio_app/figures/time_series.py`)
- [ ] Design dashboard layout (purchase/rental toggle)
- [ ] Implement choropleth map with layer switching
- [ ] Add time slider/selector
- [ ] Build neighbourhood overlay (toggle-able)
- [ ] Add enrichment layer toggle (density/education/income choropleth)
- [ ] Add policy event markers on time series
- [ ] Add tooltips with cross-reference info ("This district contains...")
- [ ] Add tooltips showing enrichment metrics on hover
### Stage 4: Polish (V1)
- [ ] Add data source citations
- [ ] Document methodology (especially geographic limitations)
- [ ] Write docs (`docs/methodology.md`, `docs/data_sources.md`)
- [ ] Deploy to portfolio

### Future Enhancements (Portfolio Phase 4 — Post-Energy Project)
- [ ] Add crime data to dim_neighbourhood
- [ ] Build spatial crosswalk (neighbourhood ↔ district/zone intersections)
- [ ] Compute area-weighted and population-weighted aggregations
- [ ] Add aggregation method selector to UI
- [ ] Enable correlation analysis (price vs. enrichment metrics)
- [ ] Add historical neighbourhood boundary support (140→158)

**Deployment & dbt Architecture**: See `portfolio_project_plan_v5.md` for:
- dbt layer structure and testing strategy
- Deployment architecture
- Data quality framework

---
## References & Links

### Core Housing Data

| Resource | URL |
|----------|-----|
| TRREB Market Watch | https://trreb.ca/index.php/market-news/market-watch |
| CMHC Housing Portal | https://www03.cmhc-schl.gc.ca/hmip-pimh/ |

### Geographic Boundaries

| Resource | URL |
|----------|-----|
| Toronto Neighbourhoods GeoJSON | https://github.com/jasonicarter/toronto-geojson |
| TRREB District Map (PDF) | https://webapp.proptx.ca/trrebdata/common/maps/Toronto.pdf |
| Statistics Canada Census Tracts | https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index-eng.cfm |
| R `cmhc` package (CRAN) | https://cran.r-project.org/package=cmhc |

### Enrichment Data

| Resource | URL |
|----------|-----|
| Toronto Open Data Portal | https://open.toronto.ca/ |
| Neighbourhood Profiles (CKAN) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/neighbourhood-profiles |
| Neighbourhood Profiles 2021 (Direct Download) | https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/6e19a90f-971c-46b3-852c-0c48c436d1fc/resource/19d4a806-7385-4889-acf2-256f1e079060/download/nbhd_2021_census_profile_full_158model.xlsx |

### Policy Events Research

| Resource | URL |
|----------|-----|
| Bank of Canada Interest Rates | https://www.bankofcanada.ca/rates/interest-rates/ |
| OSFI (Stress Test Rules) | https://www.osfi-bsif.gc.ca/ |
| Ontario Legislature (Bills) | https://www.ola.org/ |

### Reference Documentation

| Resource | URL |
|----------|-----|
| Statistics Canada 2021 Census Reference | https://www12.statcan.gc.ca/census-recensement/2021/ref/index-eng.cfm |
| City of Toronto Neighbourhood Profiles Overview | https://www.toronto.ca/city-government/data-research-maps/neighbourhoods-communities/neighbourhood-profiles/ |

---

## Related Documents

| Document | Relationship | Use For |
|----------|--------------|---------|
| `portfolio_project_plan_v5.md` | Parent document | Overall scope, phasing, tech stack, deployment, dbt architecture, data quality framework |

---

*Document Version: 5.1*
*Updated: January 2026*
*Project: Toronto Housing Price Dashboard — Portfolio Piece*
@@ -1,794 +0,0 @@

# Work Breakdown Structure & Sprint Plan

**Project**: Toronto Housing Dashboard (Portfolio Phase 1)
**Version**: 4.1
**Date**: January 2026
---

## Document Context

| Attribute | Value |
|-----------|-------|
| **Parent Documents** | `portfolio_project_plan_v5.md`, `toronto_housing_dashboard_spec_v5.md` |
| **Content Source** | `bio_content_v2.md` |
| **Role** | Executable sprint plan for Phase 1 delivery |

---

## Milestones

| Milestone | Deliverable | Target Sprint |
|-----------|-------------|---------------|
| **Launch 1** | Bio Landing Page | Sprint 2 |
| **Launch 2** | Toronto Housing Dashboard | Sprint 6 |

---

## WBS Structure
```
1.0 Launch 1: Bio Landing Page
├── 1.1 Project Bootstrap
├── 1.2 Infrastructure
├── 1.3 Application Foundation
├── 1.4 Bio Page
└── 1.5 Deployment

2.0 Launch 2: Toronto Housing Dashboard
├── 2.1 Data Acquisition
├── 2.2 Data Processing
├── 2.3 Database Layer
├── 2.4 dbt Transformation
├── 2.5 Visualization
├── 2.6 Documentation
└── 2.7 Operations
```

---
## Launch 1: Bio Landing Page

### 1.1 Project Bootstrap

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.1.1 | Git repository initialization | — | Low | Low |
| 1.1.2 | Create `.gitignore` | 1.1.1 | Low | Low |
| 1.1.3 | Create `pyproject.toml` | 1.1.1 | Low | Low |
| 1.1.4 | Create `.python-version` (3.11+) | 1.1.1 | Low | Low |
| 1.1.5 | Create `.env.example` | 1.1.1 | Low | Low |
| 1.1.6 | Create `README.md` (initial) | 1.1.1 | Low | Low |
| 1.1.7 | Create `CLAUDE.md` | 1.1.1 | Low | Low |
| 1.1.8 | Create `Makefile` with all targets | 1.1.3 | Low | Medium |

### 1.2 Infrastructure

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.2.1 | Python env setup (pyenv, venv, deps) | 1.1.3, 1.1.4 | Low | Low |
| 1.2.2 | Create `.pre-commit-config.yaml` | 1.2.1 | Low | Low |
| 1.2.3 | Install pre-commit hooks | 1.2.2 | Low | Low |
| 1.2.4 | Create `docker-compose.yml` (PostgreSQL + PostGIS) | 1.1.5 | Low | Low |
| 1.2.5 | Create `scripts/` directory structure | 1.1.1 | Low | Low |
| 1.2.6 | Create `scripts/docker/up.sh` | 1.2.5 | Low | Low |
| 1.2.7 | Create `scripts/docker/down.sh` | 1.2.5 | Low | Low |
| 1.2.8 | Create `scripts/docker/logs.sh` | 1.2.5 | Low | Low |
| 1.2.9 | Create `scripts/docker/rebuild.sh` | 1.2.5 | Low | Low |
| 1.2.10 | Create `scripts/db/init.sh` (PostGIS extension) | 1.2.5 | Low | Low |
| 1.2.11 | Create `scripts/dev/setup.sh` | 1.2.5 | Low | Low |
| 1.2.12 | Verify Docker + PostGIS working | 1.2.4, 1.2.10 | Low | Low |
### 1.3 Application Foundation

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.3.1 | Create `portfolio_app/` directory structure (full tree) | 1.2.1 | Low | Low |
| 1.3.2 | Create `portfolio_app/__init__.py` | 1.3.1 | Low | Low |
| 1.3.3 | Create `portfolio_app/config.py` (Pydantic BaseSettings) | 1.3.1 | Low | Medium |
| 1.3.4 | Create `portfolio_app/errors/__init__.py` | 1.3.1 | Low | Low |
| 1.3.5 | Create `portfolio_app/errors/exceptions.py` | 1.3.4 | Low | Low |
| 1.3.6 | Create `portfolio_app/errors/handlers.py` | 1.3.5 | Low | Medium |
| 1.3.7 | Create `portfolio_app/app.py` (Dash + Pages routing) | 1.3.3 | Low | Medium |
| 1.3.8 | Configure dash-mantine-components theme | 1.3.7 | Low | Low |
| 1.3.9 | Create `portfolio_app/assets/` directory | 1.3.1 | Low | Low |
| 1.3.10 | Create `portfolio_app/assets/styles.css` | 1.3.9 | Low | Medium |
| 1.3.11 | Create `portfolio_app/assets/variables.css` | 1.3.9 | Low | Low |
| 1.3.12 | Add `portfolio_app/assets/favicon.ico` | 1.3.9 | Low | Low |
| 1.3.13 | Create `portfolio_app/assets/images/` directory | 1.3.9 | Low | Low |
| 1.3.14 | Create `tests/` directory structure | 1.2.1 | Low | Low |
| 1.3.15 | Create `tests/__init__.py` | 1.3.14 | Low | Low |
| 1.3.16 | Create `tests/conftest.py` | 1.3.14 | Low | Medium |
| 1.3.17 | Configure pytest in `pyproject.toml` | 1.1.3, 1.3.14 | Low | Low |
### 1.4 Bio Page

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.4.1 | Create `portfolio_app/components/__init__.py` | 1.3.1 | Low | Low |
| 1.4.2 | Create `portfolio_app/components/navbar.py` | 1.4.1, 1.3.8 | Low | Low |
| 1.4.3 | Create `portfolio_app/components/footer.py` | 1.4.1, 1.3.8 | Low | Low |
| 1.4.4 | Create `portfolio_app/components/cards.py` | 1.4.1, 1.3.8 | Low | Low |
| 1.4.5 | Create `portfolio_app/pages/__init__.py` | 1.3.1 | Low | Low |
| 1.4.6 | Create `portfolio_app/pages/home.py` (layout) | 1.4.5, 1.4.2, 1.4.3 | Low | Low |
| 1.4.7 | Integrate bio content from `bio_content_v2.md` | 1.4.6 | Low | Low |
| 1.4.8 | Replace social link placeholders with real URLs | 1.4.7 | Low | Low |
| 1.4.9 | Implement project cards (deployed/in-dev logic) | 1.4.4, 1.4.6 | Low | Low |
| 1.4.10 | Test bio page renders locally | 1.4.9 | Low | Low |
### 1.5 Deployment

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 1.5.1 | Install PostgreSQL + PostGIS on VPS | — | Low | Low |
| 1.5.2 | Configure firewall (ufw: SSH, HTTP, HTTPS) | 1.5.1 | Low | Low |
| 1.5.3 | Create application database user | 1.5.1 | Low | Low |
| 1.5.4 | Create Gunicorn systemd service file | 1.4.10 | Low | Low |
| 1.5.5 | Configure Nginx reverse proxy | 1.5.4 | Low | Low |
| 1.5.6 | Configure SSL (certbot) | 1.5.5 | Low | Low |
| 1.5.7 | Create `scripts/deploy/deploy.sh` | 1.2.5 | Low | Low |
| 1.5.8 | Create `scripts/deploy/health-check.sh` | 1.2.5 | Low | Low |
| 1.5.9 | Deploy bio page | 1.5.6, 1.5.7 | Low | Low |
| 1.5.10 | Verify HTTPS access | 1.5.9 | Low | Low |

---
## Launch 2: Toronto Housing Dashboard

### 2.1 Data Acquisition

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.1.1 | Define TRREB year scope + download PDFs | — | Low | Low |
| 2.1.2 | **HUMAN**: Digitize TRREB district boundaries (QGIS) | 2.1.1 | High | High |
| 2.1.3 | Register for CMHC portal | — | Low | Low |
| 2.1.4 | Export CMHC Toronto rental CSVs | 2.1.3 | Low | Low |
| 2.1.5 | Extract CMHC zone boundaries (R cmhc package) | 2.1.3 | Low | Medium |
| 2.1.6 | Download neighbourhoods GeoJSON (158 boundaries) | — | Low | Low |
| 2.1.7 | Download Neighbourhood Profiles 2021 (xlsx) | — | Low | Low |
| 2.1.8 | Validate CRS alignment (all geo files WGS84) | 2.1.2, 2.1.5, 2.1.6 | Low | Medium |
| 2.1.9 | Research Tier 1 policy events (10-20 events) | — | Mid | Medium |
| 2.1.10 | Create `data/toronto/reference/policy_events.csv` | 2.1.9 | Low | Low |
| 2.1.11 | Create `data/` directory structure | 1.3.1 | Low | Low |
| 2.1.12 | Organize raw files into `data/toronto/raw/` | 2.1.11 | Low | Low |
| 2.1.13 | Test TRREB parser across year boundaries | 2.2.3 | Low | Medium |
### 2.2 Data Processing

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.2.1 | Create `portfolio_app/toronto/__init__.py` | 1.3.1 | Low | Low |
| 2.2.2 | Create `portfolio_app/toronto/parsers/__init__.py` | 2.2.1 | Low | Low |
| 2.2.3 | Build TRREB PDF parser (`parsers/trreb.py`) | 2.2.2, 2.1.1 | Mid | High |
| 2.2.4 | TRREB data cleaning/normalization | 2.2.3 | Low | Medium |
| 2.2.5 | TRREB parser unit tests | 2.2.4 | Low | Low |
| 2.2.6 | Build CMHC CSV processor (`parsers/cmhc.py`) | 2.2.2, 2.1.4 | Low | Low |
| 2.2.7 | CMHC reliability code handling | 2.2.6 | Low | Low |
| 2.2.8 | CMHC processor unit tests | 2.2.7 | Low | Low |
| 2.2.9 | Build Neighbourhood Profiles parser | 2.2.1, 2.1.7 | Low | Low |
| 2.2.10 | Policy events CSV loader | 2.2.1, 2.1.10 | Low | Low |
### 2.3 Database Layer

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.3.1 | Create `portfolio_app/toronto/schemas/__init__.py` | 2.2.1 | Low | Low |
| 2.3.2 | Create TRREB Pydantic schemas (`schemas/trreb.py`) | 2.3.1 | Low | Medium |
| 2.3.3 | Create CMHC Pydantic schemas (`schemas/cmhc.py`) | 2.3.1 | Low | Medium |
| 2.3.4 | Create enrichment Pydantic schemas (`schemas/enrichment.py`) | 2.3.1 | Low | Low |
| 2.3.5 | Create policy event Pydantic schema (`schemas/policy_event.py`) | 2.3.1 | Low | Low |
| 2.3.6 | Create `portfolio_app/toronto/models/__init__.py` | 2.2.1 | Low | Low |
| 2.3.7 | Create SQLAlchemy base (`models/base.py`) | 2.3.6, 1.3.3 | Low | Medium |
| 2.3.8 | Create dimension models (`models/dimensions.py`) | 2.3.7 | Low | Medium |
| 2.3.9 | Create fact models (`models/facts.py`) | 2.3.8 | Low | Medium |
| 2.3.10 | Create `portfolio_app/toronto/loaders/__init__.py` | 2.2.1 | Low | Low |
| 2.3.11 | Create dimension loaders (`loaders/database.py`) | 2.3.10, 2.3.8 | Low | Medium |
| 2.3.12 | Create fact loaders | 2.3.11, 2.3.9, 2.2.4, 2.2.7 | Mid | Medium |
| 2.3.13 | Loader integration tests | 2.3.12 | Low | Medium |
| 2.3.14 | Create SQL views for dashboard queries | 2.3.12 | Low | Medium |
### 2.4 dbt Transformation

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.4.1 | Create `dbt/` directory structure | 1.3.1 | Low | Low |
| 2.4.2 | Create `dbt/dbt_project.yml` | 2.4.1 | Low | Low |
| 2.4.3 | Create `dbt/profiles.yml` | 2.4.1, 1.3.3 | Low | Low |
| 2.4.4 | Create `scripts/dbt/run.sh` | 1.2.5 | Low | Low |
| 2.4.5 | Create `scripts/dbt/test.sh` | 1.2.5 | Low | Low |
| 2.4.6 | Create `scripts/dbt/docs.sh` | 1.2.5 | Low | Low |
| 2.4.7 | Create `scripts/dbt/fresh.sh` | 1.2.5 | Low | Low |
| 2.4.8 | Create staging models (`stg_trreb__monthly`, `stg_cmhc__rental`) | 2.4.3, 2.3.12 | Low | Medium |
| 2.4.9 | Create intermediate models | 2.4.8 | Low | Medium |
| 2.4.10 | Create mart models | 2.4.9 | Low | Medium |
| 2.4.11 | Create dbt schema tests (unique, not_null, relationships) | 2.4.10 | Low | Medium |
| 2.4.12 | Create custom dbt tests (anomaly detection) | 2.4.11 | Low | Medium |
| 2.4.13 | Create dbt documentation (schema.yml) | 2.4.10 | Low | Low |
### 2.5 Visualization

| ID | Task | Depends On | Effort | Complexity |
|----|------|------------|--------|------------|
| 2.5.1 | Create `portfolio_app/figures/__init__.py` | 1.3.1 | Low | Low |
| 2.5.2 | Build choropleth factory (`figures/choropleth.py`) | 2.5.1, 2.1.8 | Mid | Medium |
| 2.5.3 | Build time series factory (`figures/time_series.py`) | 2.5.1 | Low | Medium |
| 2.5.4 | Build YoY change chart factory (`figures/statistical.py`) | 2.5.1 | Low | Medium |
| 2.5.5 | Build seasonality decomposition chart | 2.5.4 | Low | Medium |
| 2.5.6 | Build district correlation matrix chart | 2.5.4 | Low | Medium |
| 2.5.7 | Create `portfolio_app/pages/toronto/__init__.py` | 1.4.5 | Low | Low |
| 2.5.8 | Create `portfolio_app/pages/toronto/dashboard.py` (layout only) | 2.5.7, 1.4.2, 1.4.3 | Mid | High |
| 2.5.9 | Implement purchase/rental mode toggle | 2.5.8 | Low | Low |
| 2.5.10 | Implement monthly time slider | 2.5.8 | Low | Medium |
| 2.5.11 | Implement annual time selector (CMHC) | 2.5.8 | Low | Low |
| 2.5.12 | Implement layer toggles (districts/zones/neighbourhoods) | 2.5.8 | Low | Medium |
| 2.5.13 | Create `portfolio_app/pages/toronto/callbacks/__init__.py` | 2.5.7 | Low | Low |
| 2.5.14 | Create `callbacks/map_callbacks.py` | 2.5.13, 2.5.2 | Mid | Medium |
| 2.5.15 | Create `callbacks/filter_callbacks.py` | 2.5.13 | Low | Medium |
| 2.5.16 | Create `callbacks/timeseries_callbacks.py` | 2.5.13, 2.5.3 | Low | Medium |
| 2.5.17 | Implement district/zone tooltips | 2.5.14 | Low | Low |
| 2.5.18 | Implement neighbourhood overlay | 2.5.14, 2.1.6 | Low | Medium |
| 2.5.19 | Implement enrichment layer toggle | 2.5.18 | Low | Medium |
| 2.5.20 | Implement policy event markers on time series | 2.5.16, 2.2.10 | Low | Medium |
| 2.5.21 | Implement "district contains neighbourhoods" tooltip | 2.5.17 | Low | Low |
| 2.5.22 | Test dashboard renders with sample data | 2.5.20 | Low | Medium |
### 2.6 Documentation
|
||||
|
||||
| ID | Task | Depends On | Effort | Complexity |
|
||||
|----|------|------------|--------|------------|
|
||||
| 2.6.1 | Create `docs/` directory | 1.3.1 | Low | Low |
|
||||
| 2.6.2 | Write `docs/methodology.md` (geographic limitations) | 2.5.22 | Low | Medium |
|
||||
| 2.6.3 | Write `docs/data_sources.md` (citations) | 2.5.22 | Low | Low |
|
||||
| 2.6.4 | Write `docs/user_guide.md` | 2.5.22 | Low | Low |
|
||||
| 2.6.5 | Update `README.md` (final) | 2.6.2, 2.6.3 | Low | Low |
|
||||
| 2.6.6 | Update `CLAUDE.md` (final) | 2.6.5 | Low | Low |
|
||||
|
||||
### 2.7 Operations
|
||||
|
||||
| ID | Task | Depends On | Effort | Complexity |
|
||||
|----|------|------------|--------|------------|
|
||||
| 2.7.1 | Create `scripts/db/backup.sh` | 1.2.5 | Low | Low |
|
||||
| 2.7.2 | Create `scripts/db/restore.sh` | 1.2.5 | Low | Low |
|
||||
| 2.7.3 | Create `scripts/db/reset.sh` (dev only) | 1.2.5 | Low | Low |
|
||||
| 2.7.4 | Create `scripts/deploy/rollback.sh` | 1.2.5 | Low | Medium |
|
||||
| 2.7.5 | Implement backup retention policy | 2.7.1 | Low | Low |
|
||||
| 2.7.6 | Add `/health` endpoint | 2.5.8 | Low | Low |
|
||||
| 2.7.7 | Configure uptime monitoring (external) | 2.7.6 | Low | Low |
|
||||
| 2.7.8 | Deploy Toronto dashboard | 1.5.9, 2.5.22 | Low | Low |
|
||||
| 2.7.9 | Verify production deployment | 2.7.8 | Low | Low |
|
||||
|
||||
---

## L3 Task Details

### 1.1 Project Bootstrap

#### 1.1.1 Git repository initialization

| Attribute | Value |
|-----------|-------|
| **What** | Initialize git repo with main branch |
| **How** | `git init`, initial commit |
| **Inputs** | — |
| **Outputs** | `.git/` directory |
| **Why** | Version control foundation |

#### 1.1.2 Create `.gitignore`

| Attribute | Value |
|-----------|-------|
| **What** | Git ignore rules per project plan |
| **How** | Create file with patterns for: `.env`, `data/*/processed/`, `reports/`, `backups/`, `notebooks/*.html`, `__pycache__/`, `.venv/` |
| **Inputs** | Project plan → Directory Rules |
| **Outputs** | `.gitignore` |

#### 1.1.3 Create `pyproject.toml`

| Attribute | Value |
|-----------|-------|
| **What** | Python packaging config |
| **How** | Define project metadata, dependencies, tool configs (ruff, mypy, pytest) |
| **Inputs** | Tech stack versions from project plan |
| **Outputs** | `pyproject.toml` |
| **Dependencies** | PostgreSQL 16.x, Pydantic ≥2.0, SQLAlchemy ≥2.0, dbt-postgres ≥1.7, Pandas ≥2.1, GeoPandas ≥0.14, Dash ≥2.14, dash-mantine-components (latest), pytest ≥7.0 |

#### 1.1.4 Create `.python-version`

| Attribute | Value |
|-----------|-------|
| **What** | pyenv version file |
| **How** | Single line: `3.11` or a specific patch version |
| **Outputs** | `.python-version` |

#### 1.1.5 Create `.env.example`

| Attribute | Value |
|-----------|-------|
| **What** | Environment variable template |
| **How** | Template with: DATABASE_URL, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB, DASH_DEBUG, SECRET_KEY, LOG_LEVEL |
| **Inputs** | Project plan → Environment Setup |
| **Outputs** | `.env.example` |

#### 1.1.6 Create `README.md` (initial)

| Attribute | Value |
|-----------|-------|
| **What** | Project overview stub |
| **How** | Title, brief description, "Setup coming soon" |
| **Outputs** | `README.md` |

#### 1.1.7 Create `CLAUDE.md`

| Attribute | Value |
|-----------|-------|
| **What** | AI assistant context file |
| **How** | Project context, architecture decisions, patterns, conventions |
| **Inputs** | Project plan → Code Architecture |
| **Outputs** | `CLAUDE.md` |
| **Why** | Claude Code effectiveness from day 1 |

#### 1.1.8 Create `Makefile`

| Attribute | Value |
|-----------|-------|
| **What** | All make targets from project plan |
| **How** | Implement targets: setup, venv, clean, docker-up/down/logs/rebuild, db-init/backup/restore/reset, run, run-prod, dbt-run/test/docs/fresh, test, test-cov, lint, format, typecheck, ci, deploy, rollback |
| **Inputs** | Project plan → Makefile Targets |
| **Outputs** | `Makefile` |

### 1.2 Infrastructure

#### 1.2.4 Create `docker-compose.yml`

| Attribute | Value |
|-----------|-------|
| **What** | Docker Compose V2 for PostgreSQL 16 + PostGIS |
| **How** | Service definition, volume mounts, port 5432, env vars from `.env` |
| **Inputs** | `.env.example` |
| **Outputs** | `docker-compose.yml` |
| **Note** | No `version` field (Docker Compose V2) |

#### 1.2.5 Create `scripts/` directory structure

| Attribute | Value |
|-----------|-------|
| **What** | Full scripts tree per project plan |
| **How** | `mkdir -p scripts/{db,docker,deploy,dbt,dev}` |
| **Outputs** | `scripts/db/`, `scripts/docker/`, `scripts/deploy/`, `scripts/dbt/`, `scripts/dev/` |

#### 1.2.10 Create `scripts/db/init.sh`

| Attribute | Value |
|-----------|-------|
| **What** | Database initialization with PostGIS |
| **How** | `CREATE DATABASE`, `CREATE EXTENSION postgis`, schema creation |
| **Standard** | `set -euo pipefail`, usage comment, idempotent |
| **Outputs** | `scripts/db/init.sh` |

### 1.3 Application Foundation

#### 1.3.1 Create `portfolio_app/` directory structure

| Attribute | Value |
|-----------|-------|
| **What** | Full application tree per project plan |
| **Directories** | `portfolio_app/`, `portfolio_app/assets/`, `portfolio_app/assets/images/`, `portfolio_app/pages/`, `portfolio_app/pages/toronto/`, `portfolio_app/pages/toronto/callbacks/`, `portfolio_app/components/`, `portfolio_app/figures/`, `portfolio_app/toronto/`, `portfolio_app/toronto/parsers/`, `portfolio_app/toronto/loaders/`, `portfolio_app/toronto/schemas/`, `portfolio_app/toronto/models/`, `portfolio_app/toronto/transforms/`, `portfolio_app/errors/` |
| **Pattern** | Callbacks in `pages/{dashboard}/callbacks/` per project plan |

#### 1.3.3 Create `config.py`

| Attribute | Value |
|-----------|-------|
| **What** | Pydantic BaseSettings for config |
| **How** | Settings class loading from `.env` |
| **Fields** | DATABASE_URL, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB, DASH_DEBUG, SECRET_KEY, LOG_LEVEL |
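The Fields row above maps one-to-one onto a settings object. The project plans a Pydantic `BaseSettings` class; as a rough stdlib-only sketch of the same idea (the `from_env` helper, its defaults, and the reduced field set are illustrative, not the project's actual code):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Environment-backed config; mirrors a subset of the fields listed above."""

    database_url: str
    dash_debug: bool
    log_level: str

    @classmethod
    def from_env(cls) -> "Settings":
        # Fall back to safe defaults so local tooling can run without a full .env
        return cls(
            database_url=os.environ.get(
                "DATABASE_URL", "postgresql://localhost/portfolio"
            ),
            dash_debug=os.environ.get("DASH_DEBUG", "false").lower() == "true",
            log_level=os.environ.get("LOG_LEVEL", "INFO"),
        )
```

Pydantic's `BaseSettings` adds `.env`-file loading and type coercion on top of this pattern for free.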
#### 1.3.5 Create `exceptions.py`

| Attribute | Value |
|-----------|-------|
| **What** | Exception hierarchy per project plan |
| **Classes** | `PortfolioError` (base), `ParseError`, `ValidationError`, `LoadError` |
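A minimal sketch of the hierarchy named above (structure from the Classes row; the docstrings are illustrative):

```python
class PortfolioError(Exception):
    """Base class for all application errors."""


class ParseError(PortfolioError):
    """Raised when a source file (PDF, CSV, GeoJSON) cannot be parsed."""


class ValidationError(PortfolioError):
    """Raised when parsed data fails schema validation."""


class LoadError(PortfolioError):
    """Raised when a database load fails."""
```

A single base class lets callers catch `PortfolioError` at module boundaries while inner code raises the specific subclass.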
#### 1.3.6 Create `handlers.py`

| Attribute | Value |
|-----------|-------|
| **What** | Error handling decorators |
| **How** | Decorators for: logging/re-raise, retry logic, transaction boundaries, timing |
| **Pattern** | Infrastructure concerns only; domain logic uses explicit handling |
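One plausible shape for the retry decorator mentioned in the How row (the `retry` name and its parameters are illustrative, not the project's actual API):

```python
import functools
import time
from typing import Any, Callable


def retry(attempts: int = 3, delay: float = 0.0) -> Callable:
    """Retry a flaky infrastructure call; re-raise after the last attempt."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            last_exc: Exception | None = None
            for _ in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # broad catch is OK at an infra boundary
                    last_exc = exc
                    time.sleep(delay)
            raise last_exc  # type: ignore[misc]

        return wrapper

    return decorator
```

The logging and timing decorators follow the same `functools.wraps` pattern, which is why keeping them in one `handlers.py` module pays off.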
#### 1.3.7 Create `app.py`

| Attribute | Value |
|-----------|-------|
| **What** | Dash app factory with Pages routing |
| **How** | `Dash(__name__, use_pages=True)`, MantineProvider wrapper |
| **Imports** | External: absolute; internal: relative (dot notation) |

#### 1.3.16 Create `conftest.py`

| Attribute | Value |
|-----------|-------|
| **What** | pytest fixtures |
| **How** | Test database fixture, sample data fixtures, app client fixture |

### 1.4 Bio Page

#### 1.4.7 Integrate bio content

| Attribute | Value |
|-----------|-------|
| **What** | Content from `bio_content_v2.md` |
| **Sections** | Headline, Professional Summary, Tech Stack, Side Project, Availability |
| **Layout** | Hero → Summary → Tech Stack → Project Cards → Social Links → Availability |

#### 1.4.8 Replace social link placeholders

| Attribute | Value |
|-----------|-------|
| **What** | Replace `[USERNAME]` in LinkedIn/GitHub URLs |
| **Source** | `bio_content_v2.md` → Social Links |
| **Acceptance** | No placeholder text in production |

#### 1.4.9 Implement project cards

| Attribute | Value |
|-----------|-------|
| **What** | Dynamic project card display |
| **Logic** | Show deployed projects with links; show "In Development" for in-progress; hide or grey out planned |
| **Source** | `bio_content_v2.md` → Portfolio Projects Section |

### 2.1 Data Acquisition

#### 2.1.1 Define TRREB year scope + download PDFs

| Attribute | Value |
|-----------|-------|
| **What** | Decide which years to parse for V1, download PDFs |
| **Decision** | 2020—present for V1 (manageable scope, consistent PDF format). Expand to 2007+ in future if needed. |
| **Output** | `data/toronto/raw/trreb/market_watch_YYYY_MM.pdf` |
| **Note** | PDF format may vary pre-2018; test before committing to older years |

#### 2.1.2 Digitize TRREB district boundaries

| Attribute | Value |
|-----------|-------|
| **What** | GeoJSON with ~35 district polygons |
| **Tool** | QGIS |
| **Process** | Import PDF as raster → create vector layer → trace polygons → add attributes (district_code, district_name, area_type) → export GeoJSON (WGS84/EPSG:4326) |
| **Input** | TRREB Toronto.pdf map |
| **Output** | `data/toronto/raw/geo/trreb_districts.geojson` |
| **Effort** | High |
| **Complexity** | High |
| **Note** | HUMAN TASK — not automatable |

#### 2.1.5 Extract CMHC zone boundaries

| Attribute | Value |
|-----------|-------|
| **What** | GeoJSON with ~20 zone polygons |
| **Tool** | R with cmhc and sf packages |
| **Process** | `get_cmhc_geography(geography_type="ZONE", cma="Toronto")` → `st_write()` to GeoJSON |
| **Output** | `data/toronto/raw/geo/cmhc_zones.geojson` |

#### 2.1.9 Research Tier 1 policy events

| Attribute | Value |
|-----------|-------|
| **What** | Federal/provincial policy events with dates, descriptions, expected direction |
| **Sources** | Bank of Canada, OSFI, Ontario Legislature |
| **Schema** | event_date, effective_date, level, category, title, description, expected_direction, source_url, confidence |
| **Acceptance** | Minimum 10 events, maximum 20 |
| **Examples** | BoC rate decisions, OSFI B-20, Ontario Fair Housing Plan, foreign buyer tax |
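The schema row above implies one CSV record per event (task 2.1.10 produces `policy_events.csv`). A stdlib sketch of reading and lightly validating that file (the `read_policy_events` helper and its checks are illustrative):

```python
import csv
import io
from datetime import date

# Column names from the schema row above
REQUIRED = [
    "event_date", "effective_date", "level", "category", "title",
    "description", "expected_direction", "source_url", "confidence",
]


def read_policy_events(text: str) -> list[dict]:
    """Parse policy-event CSV text, checking columns and date format."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        missing = [c for c in REQUIRED if c not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        date.fromisoformat(row["event_date"])  # raises on malformed dates
    return rows
```

Validating at read time keeps bad rows out of `dim_policy_event` before the loader (task 2.2.10) ever sees them.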
#### 2.1.13 Test TRREB parser across year boundaries

| Attribute | Value |
|-----------|-------|
| **What** | Verify parser handles PDFs from different years |
| **Test Cases** | 2020 Q1, 2022 Q1, 2024 Q1 (minimum) |
| **Check For** | Table structure changes, column naming variations, page number shifts |
| **Output** | Documented format variations, parser fallbacks if needed |

### 2.2 Data Processing

#### 2.2.3 Build TRREB PDF parser

| Attribute | Value |
|-----------|-------|
| **What** | Extract summary tables from TRREB PDFs |
| **Tool** | pdfplumber or camelot-py |
| **Location** | Pages 3-4 (Summary by Area) |
| **Fields** | report_date, area_code, area_name, area_type, sales, dollar_volume, avg_price, median_price, new_listings, active_listings, avg_sp_lp, avg_dom |
| **Output** | `portfolio_app/toronto/parsers/trreb.py` |
#### 2.2.7 CMHC reliability code handling

| Attribute | Value |
|-----------|-------|
| **What** | Parse reliability codes, handle suppression |
| **Codes** | a (excellent), b (good), c (fair), d (poor/caution), `**` (suppressed → NULL) |
| **Implementation** | Pydantic validators, enum type |
### 2.3 Database Layer

#### 2.3.8 Create dimension models

| Attribute | Value |
|-----------|-------|
| **What** | SQLAlchemy 2.0 models for dimensions |
| **Tables** | `dim_time`, `dim_trreb_district`, `dim_cmhc_zone`, `dim_neighbourhood`, `dim_policy_event` |
| **Geometry** | PostGIS geometry columns for districts, zones, neighbourhoods |
| **Note** | `dim_neighbourhood` has no FK to facts in V1 |

#### 2.3.9 Create fact models

| Attribute | Value |
|-----------|-------|
| **What** | SQLAlchemy 2.0 models for facts |
| **Tables** | `fact_purchases`, `fact_rentals` |
| **FKs** | fact_purchases → dim_time, dim_trreb_district; fact_rentals → dim_time, dim_cmhc_zone |

### 2.4 dbt Transformation

#### 2.4.8 Create staging models

| Attribute | Value |
|-----------|-------|
| **What** | 1:1 source mapping, cleaned and typed |
| **Models** | `stg_trreb__monthly`, `stg_cmhc__rental` |
| **Naming** | `stg_{source}__{entity}` |

#### 2.4.11 Create dbt schema tests

| Attribute | Value |
|-----------|-------|
| **What** | Data quality tests |
| **Tests** | `unique` (PKs), `not_null` (required), `accepted_values` (reliability codes, area_type), `relationships` (FK integrity) |

#### 2.4.12 Create custom dbt tests

| Attribute | Value |
|-----------|-------|
| **What** | Anomaly detection rules |
| **Rules** | Price MoM change >30% → flag; missing districts → fail; duplicate records → fail |
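The month-over-month rule above, expressed as plain Python for clarity (in the project it would live as a custom dbt test over the mart; the function here is illustrative, with the 30% threshold taken from the Rules row):

```python
def flag_mom_anomalies(prices: list[float], threshold: float = 0.30) -> list[int]:
    """Return indices where month-over-month price change exceeds the threshold."""
    flagged = []
    for i in range(1, len(prices)):
        prev, curr = prices[i - 1], prices[i]
        # Skip zero/None-like previous values to avoid division by zero
        if prev and abs(curr - prev) / prev > threshold:
            flagged.append(i)
    return flagged
```

Anomalies are flagged rather than failed because a >30% swing in a thin district can be legitimate; missing districts and duplicates, by contrast, always indicate a pipeline bug, hence "fail".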
### 2.5 Visualization

#### 2.5.2 Build choropleth factory

| Attribute | Value |
|-----------|-------|
| **What** | Reusable choropleth_mapbox figure generator |
| **Inputs** | GeoDataFrame, metric column, color config |
| **Output** | Plotly figure |
| **Location** | `portfolio_app/figures/choropleth.py` |

#### 2.5.4—2.5.6 Statistical chart factories

| Attribute | Value |
|-----------|-------|
| **What** | Statistical analysis visualizations |
| **Charts** | YoY change with variance bands, seasonality decomposition, district correlation matrix |
| **Location** | `portfolio_app/figures/statistical.py` |
| **Why** | Required skill demonstration per project plan |

#### 2.5.8 Create dashboard layout

| Attribute | Value |
|-----------|-------|
| **What** | Toronto dashboard page structure |
| **File** | `portfolio_app/pages/toronto/dashboard.py` |
| **Pattern** | Layout only — no callbacks in this file |
| **Components** | Navbar, choropleth map, time controls, layer toggles, time series panel, statistics panel, footer |

#### 2.5.13—2.5.16 Create callbacks

| Attribute | Value |
|-----------|-------|
| **What** | Dashboard interaction logic |
| **Location** | `portfolio_app/pages/toronto/callbacks/` |
| **Files** | `__init__.py`, `map_callbacks.py`, `filter_callbacks.py`, `timeseries_callbacks.py` |
| **Pattern** | Separate from layout per project plan callback separation pattern |
| **Registration** | Import callback modules in `callbacks/__init__.py`; import that package in `dashboard.py`. Dash registers callbacks as a side effect of importing the module. |
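The Registration row describes import-side-effect registration. Stripped of Dash specifics, the pattern looks like this (a toy registry standing in for Dash's global callback map; every name here is illustrative, not Dash's API):

```python
from typing import Callable

# Toy stand-in for Dash's callback registry
CALLBACKS: dict[str, Callable] = {}


def callback(output_id: str) -> Callable:
    """Decorator that registers a handler at import time (the side effect)."""

    def register(func: Callable) -> Callable:
        CALLBACKS[output_id] = func  # runs when the defining module is imported
        return func

    return register


# Conceptually what map_callbacks.py does with @dash.callback:
@callback("choropleth-map")
def update_map(selected_month: str) -> str:
    return f"figure for {selected_month}"
```

This is why `callbacks/__init__.py` must import each callback module, and `dashboard.py` must import the package: an un-imported module never executes its decorators, so its callbacks silently never register.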
#### 2.5.22 Test dashboard renders with sample data

| Attribute | Value |
|-----------|-------|
| **What** | Verify dashboard works end-to-end |
| **Sample Data** | Use output from task 2.3.12 (fact loaders). Run loaders with a subset of parsed data before this task. |
| **Verify** | Choropleth renders, time controls work, tooltips display, no console errors |

---

## Sprint Plan

### Sprint 1: Project Bootstrap + Start TRREB Digitization

**Goal**: Dev environment working, repo initialized, TRREB digitization started

| Task ID | Task | Effort |
|---------|------|--------|
| 1.1.1 | Git repo init | Low |
| 1.1.2 | .gitignore | Low |
| 1.1.3 | pyproject.toml | Low |
| 1.1.4 | .python-version | Low |
| 1.1.5 | .env.example | Low |
| 1.1.6 | README.md (initial) | Low |
| 1.1.7 | CLAUDE.md | Low |
| 1.1.8 | Makefile | Low |
| 1.2.1 | Python env setup | Low |
| 1.2.2 | .pre-commit-config.yaml | Low |
| 1.2.3 | Install pre-commit | Low |
| 1.2.4 | docker-compose.yml | Low |
| 1.2.5 | scripts/ directory structure | Low |
| 1.2.6—1.2.9 | Docker scripts | Low |
| 1.2.10 | scripts/db/init.sh | Low |
| 1.2.11 | scripts/dev/setup.sh | Low |
| 1.2.12 | Verify Docker + PostGIS | Low |
| 1.3.1 | portfolio_app/ directory structure | Low |
| 1.3.2—1.3.6 | App foundation files | Low |
| 1.3.14—1.3.17 | Test infrastructure | Low |
| 2.1.1 | Download TRREB PDFs | Low |
| 2.1.2 | **START** TRREB boundaries (HUMAN) | High |
| 2.1.9 | **START** Policy events research | Mid |

---

### Sprint 2: Bio Page + Data Acquisition

**Goal**: Bio live, all raw data downloaded

| Task ID | Task | Effort |
|---------|------|--------|
| 1.3.7 | app.py with Pages | Low |
| 1.3.8 | Theme config | Low |
| 1.3.9—1.3.13 | Assets directory + files | Low |
| 1.4.1—1.4.4 | Components | Low |
| 1.4.5—1.4.10 | Bio page | Low |
| 1.5.1—1.5.3 | VPS setup | Low |
| 1.5.4—1.5.6 | Gunicorn/Nginx/SSL | Low |
| 1.5.7—1.5.8 | Deploy scripts | Low |
| 1.5.9—1.5.10 | Deploy + verify | Low |
| 2.1.2 | **CONTINUE** TRREB boundaries | High |
| 2.1.3—2.1.4 | CMHC registration + export | Low |
| 2.1.5 | CMHC zone boundaries (R) | Low |
| 2.1.6 | Neighbourhoods GeoJSON | Low |
| 2.1.7 | Neighbourhood Profiles download | Low |
| 2.1.9 | **CONTINUE** Policy events research | Mid |
| 2.1.10 | policy_events.csv | Low |
| 2.1.11—2.1.12 | data/ directory + organize | Low |

**Milestone**: **Launch 1 — Bio Live**

---

### Sprint 3: Parsers + Schemas + Models

**Goal**: ETL pipeline working, database layer complete

| Task ID | Task | Effort |
|---------|------|--------|
| 2.1.2 | **COMPLETE** TRREB boundaries | High |
| 2.1.8 | CRS validation | Low |
| 2.2.1—2.2.2 | Toronto module init | Low |
| 2.2.3—2.2.5 | TRREB parser + tests | Mid |
| 2.2.6—2.2.8 | CMHC processor + tests | Low |
| 2.2.9 | Neighbourhood Profiles parser | Low |
| 2.2.10 | Policy events loader | Low |
| 2.3.1—2.3.5 | Pydantic schemas | Low |
| 2.3.6—2.3.9 | SQLAlchemy models | Low |

---

### Sprint 4: Loaders + dbt

**Goal**: Data loaded, transformation layer ready

| Task ID | Task | Effort |
|---------|------|--------|
| 2.3.10—2.3.13 | Loaders + tests | Mid |
| 2.3.14 | SQL views | Low |
| 2.4.1—2.4.7 | dbt setup + scripts | Low |
| 2.4.8—2.4.10 | dbt models | Low |
| 2.4.11—2.4.12 | dbt tests | Low |
| 2.4.13 | dbt documentation | Low |
| 2.7.1—2.7.3 | DB backup/restore scripts | Low |

---

### Sprint 5: Visualization

**Goal**: Dashboard functional

| Task ID | Task | Effort |
|---------|------|--------|
| 2.5.1—2.5.6 | Figure factories | Mid |
| 2.5.7—2.5.12 | Dashboard layout + controls | Mid |
| 2.5.13—2.5.16 | Callbacks | Mid |
| 2.5.17—2.5.21 | Tooltips + overlays + markers | Low |
| 2.5.22 | Test dashboard | Low |

---

### Sprint 6: Polish + Launch 2

**Goal**: Dashboard deployed

| Task ID | Task | Effort |
|---------|------|--------|
| 2.6.1—2.6.6 | Documentation | Low |
| 2.7.4—2.7.5 | Rollback script + retention | Low |
| 2.7.6—2.7.7 | Health endpoint + monitoring | Low |
| 2.7.8—2.7.9 | Deploy + verify | Low |

**Milestone**: **Launch 2 — Toronto Dashboard Live**

---

### Sprint 7: Buffer

**Goal**: Contingency for slippage, bug fixes

| Task ID | Task | Effort |
|---------|------|--------|
| — | Overflow from previous sprints | Varies |
| — | Bug fixes | Varies |
| — | UX polish | Low |

---

## Sprint Summary

| Sprint | Focus | Key Risk | Milestone |
|--------|-------|----------|-----------|
| 1 | Bootstrap + start boundaries | — | — |
| 2 | Bio + data acquisition | TRREB digitization | Launch 1 |
| 3 | Parsers + DB layer | PDF parser, boundaries | — |
| 4 | Loaders + dbt | — | — |
| 5 | Visualization | Choropleth complexity | — |
| 6 | Polish + deploy | — | Launch 2 |
| 7 | Buffer | — | — |

---

## Dependency Graph

### Launch 1 Critical Path

```
1.1.1 → 1.1.3 → 1.2.1 → 1.3.1 → 1.3.7 → 1.4.6 → 1.4.10 → 1.5.9 → 1.5.10
```

### Launch 2 Critical Path

```
2.1.2 (TRREB boundaries) ─┬→ 2.1.8 (CRS) → 2.5.2 (choropleth) → 2.5.8 (layout) → 2.5.22 (test) → 2.7.8 (deploy)
                          │
2.1.1 → 2.2.3 (parser) → 2.2.4 → 2.3.12 (loaders) → 2.4.8 (dbt) ─┘
```

### Parallel Tracks (can run simultaneously)

| Track | Tasks | Can Start |
|-------|-------|-----------|
| **A: TRREB Boundaries** | 2.1.1 → 2.1.2 | Sprint 1 |
| **B: TRREB Parser** | 2.2.3—2.2.5 | Sprint 2 (after PDFs) |
| **C: CMHC** | 2.1.3—2.1.5 → 2.2.6—2.2.8 | Sprint 2 |
| **D: Enrichment** | 2.1.6—2.1.7 → 2.2.9 | Sprint 2 |
| **E: Policy Events** | 2.1.9—2.1.10 → 2.2.10 | Sprint 1—2 |
| **F: Schemas/Models** | 2.3.1—2.3.9 | Sprint 3 (after parsers) |
| **G: dbt** | 2.4.* | Sprint 4 (after loaders) |
| **H: Ops Scripts** | 2.7.1—2.7.5 | Sprint 4 |

---

## Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| TRREB digitization slips | Medium | High | Start Sprint 1; timebox; accept lower precision initially |
| PDF parser breaks on older years | Medium | Medium | Test multiple years early; build fallbacks |
| PostGIS geometry issues | Low | Medium | Validate CRS before load (2.1.8) |
| Choropleth performance | Low | Medium | Pre-aggregate; simplify geometries |
| Policy events research takes too long | Medium | Low | Hold to the 10-event minimum; expand post-launch |

---

## Acceptance Criteria

### Launch 1

- [ ] Bio page accessible via HTTPS
- [ ] All content from `bio_content_v2.md` rendered
- [ ] No placeholder text (`[USERNAME]`) visible
- [ ] Mobile responsive
- [ ] Social links functional

### Launch 2

- [ ] Choropleth renders TRREB districts
- [ ] Choropleth renders CMHC zones
- [ ] Purchase/rental mode toggle works
- [ ] Time navigation works (monthly for TRREB, annual for CMHC)
- [ ] Policy event markers visible on time series
- [ ] Neighbourhood overlay toggleable
- [ ] Methodology documentation published
- [ ] Data sources cited
- [ ] Health endpoint responds

---

## Effort Legend

| Level | Meaning |
|-------|---------|
| **Low** | Straightforward; minimal iteration expected |
| **Mid** | Requires debugging or multi-step coordination |
| **High** | Complex logic, external tools, or human intervention required |

---

*Document Version: 4.1*
*Created: January 2026*
**`portfolio_app/callbacks/__init__.py`**

```diff
@@ -1,5 +1,5 @@
 """Application-level callbacks for the portfolio app."""
 
-from . import theme
+from . import sidebar, theme
 
-__all__ = ["theme"]
+__all__ = ["sidebar", "theme"]
```
**`portfolio_app/callbacks/sidebar.py`** (new file, 25 lines)

```python
"""Sidebar navigation callbacks for active state updates."""

from typing import Any

from dash import Input, Output, callback

from portfolio_app.components.sidebar import create_sidebar_content


@callback(  # type: ignore[misc]
    Output("floating-sidebar", "children"),
    Input("url", "pathname"),
    prevent_initial_call=False,
)
def update_sidebar_active_state(pathname: str) -> list[Any]:
    """Update sidebar to highlight the current page.

    Args:
        pathname: Current URL pathname from dcc.Location.

    Returns:
        Updated sidebar content with correct active state.
    """
    current_path = pathname or "/"
    return create_sidebar_content(current_path=current_path)
```
**`portfolio_app/components/sidebar.py`**

```diff
@@ -4,9 +4,18 @@ import dash_mantine_components as dmc
 from dash import dcc, html
 from dash_iconify import DashIconify
 
-# Navigation items configuration
-NAV_ITEMS = [
+# Navigation items configuration - main pages
+NAV_ITEMS_MAIN = [
     {"path": "/", "icon": "tabler:home", "label": "Home"},
     {"path": "/about", "icon": "tabler:user", "label": "About"},
     {"path": "/blog", "icon": "tabler:article", "label": "Blog"},
     {"path": "/resume", "icon": "tabler:file-text", "label": "Resume"},
     {"path": "/contact", "icon": "tabler:mail", "label": "Contact"},
 ]
 
+# Navigation items configuration - projects/dashboards (separated)
+NAV_ITEMS_PROJECTS = [
+    {"path": "/projects", "icon": "tabler:folder", "label": "Projects"},
+    {"path": "/toronto", "icon": "tabler:map-2", "label": "Toronto Housing"},
+]
+
@@ -135,22 +144,23 @@ def create_sidebar_divider() -> html.Div:
     return html.Div(className="sidebar-divider")
 
 
-def create_sidebar(current_path: str = "/", current_theme: str = "dark") -> html.Div:
-    """Create the floating sidebar navigation.
+def create_sidebar_content(
+    current_path: str = "/", current_theme: str = "dark"
+) -> list[dmc.Tooltip | html.Div]:
+    """Create the sidebar content list.
 
     Args:
         current_path: Current page path for active state highlighting.
         current_theme: Current theme for toggle icon state.
 
     Returns:
-        Complete sidebar component.
+        List of sidebar components.
     """
-    return html.Div(
-        [
+    return [
         # Brand logo
         create_brand_logo(),
         create_sidebar_divider(),
-        # Navigation icons
+        # Main navigation icons
         *[
             create_nav_icon(
                 icon=item["icon"],
@@ -158,7 +168,18 @@ def create_sidebar(current_path: str = "/", current_theme: str = "dark") -> html
                 path=item["path"],
                 current_path=current_path,
             )
-            for item in NAV_ITEMS
+            for item in NAV_ITEMS_MAIN
         ],
+        create_sidebar_divider(),
+        # Dashboard/Project links
+        *[
+            create_nav_icon(
+                icon=item["icon"],
+                label=item["label"],
+                path=item["path"],
+                current_path=current_path,
+            )
+            for item in NAV_ITEMS_PROJECTS
+        ],
         create_sidebar_divider(),
         # Theme toggle
@@ -173,7 +194,21 @@
             )
             for link in EXTERNAL_LINKS
         ],
-        ],
-        className="floating-sidebar",
+    ]
+
+
+def create_sidebar(current_path: str = "/", current_theme: str = "dark") -> html.Div:
+    """Create the floating sidebar navigation.
+
+    Args:
+        current_path: Current page path for active state highlighting.
+        current_theme: Current theme for toggle icon state.
+
+    Returns:
+        Complete sidebar component.
+    """
+    return html.Div(
+        id="floating-sidebar",
+        className="floating-sidebar",
+        children=create_sidebar_content(current_path, current_theme),
+    )
```
111
portfolio_app/content/blog/building-data-platform-team-of-one.md
Normal file
111
portfolio_app/content/blog/building-data-platform-team-of-one.md
Normal file
@@ -0,0 +1,111 @@
---
title: "Building a Data Platform as a Team of One"
date: "2025-01-15"
description: "What I learned from 5 years as the sole data professional at a mid-size company"
tags:
  - data-engineering
  - career
  - lessons-learned
status: published
---

When I joined Summitt Energy in 2019, there was no data infrastructure. No warehouse. No pipelines. No documentation. Just a collection of spreadsheets and a Genesys Cloud instance spitting out CSVs.

Five years later, I'd built DataFlow: an enterprise platform processing 1B+ rows across 21 tables, feeding dashboards that executives actually opened. Here's what I learned doing it alone.

## The Reality of "Full Stack Data"

When you're the only data person, "full stack" isn't a buzzword—it's survival. In a single week, I might:

- Debug a Python ETL script at 7am because overnight loads failed
- Present quarterly metrics to leadership at 10am
- Design a new dimensional model over lunch
- Write SQL transformations in the afternoon
- Handle ad-hoc "can you pull this data?" requests between meetings

There's no handoff. No "that's not my job." Everything is your job.

## Prioritization Frameworks

The hardest part isn't the technical work—it's deciding what to build first when everything feels urgent.

### The 80/20 Rule, Applied Ruthlessly

I asked myself: **What 20% of the data drives 80% of decisions?**

For a contact center, that turned out to be:

- Call volume by interval
- Abandon rate
- Average handle time
- Service level

Everything else was nice-to-have. I built those four metrics first, got them bulletproof, then expanded.

### The "Who's Screaming?" Test

When multiple stakeholders want different things:

1. Who has executive backing?
2. What's blocking revenue?
3. What's causing visible pain?

If nobody's screaming, it can probably wait.

## Technical Debt vs. Shipping

I rewrote DataFlow three times:

- **v1 (2020)**: Hacky Python scripts. Worked, barely.
- **v2 (2021)**: Proper dimensional model. Still messy code.
- **v3 (2022)**: SQLAlchemy ORM, proper error handling, logging.
- **v4 (2023)**: dbt-style transformations, FastAPI layer.

Was v1 embarrassing? Yes. Did it work? Also yes.

**The lesson**: Ship something that works, then iterate. Perfect is the enemy of done, especially when you're alone.

## Building Stakeholder Trust

The technical work is maybe 40% of the job. The rest is politics.

### Quick Wins First

Before asking for resources or patience, I delivered:

- Automated a weekly report that took someone 4 hours
- Fixed a dashboard that had been wrong for months
- Built a simple tool that answered a frequent question

Trust is earned in small deposits.

### Speak Their Language

Executives don't care about your star schema. They care about:

- "This will save 10 hours/week"
- "This will catch errors before they hit customers"
- "This will let you see X in real-time"

Translate technical work into business outcomes.

## What I'd Do Differently

1. **Document earlier**. I waited too long. When I finally wrote things down, onboarding became possible.

2. **Say no more**. Every "yes" to an ad-hoc request is a "no" to infrastructure work. Guard your time.

3. **Build monitoring first**. I spent too many mornings discovering failures manually. Alerting should be table stakes.

4. **Version control everything**. Even SQL. Even documentation. If it's not in Git, it doesn't exist.

## The Upside

Being a team of one forced me to learn things I'd have specialized away from on a bigger team:

- Data modeling
- Pipeline architecture
- Dashboard design
- Stakeholder management
- System administration

It's brutal, but it makes you dangerous. You understand the whole stack.

---

*This is part of a series on building data infrastructure at small companies. More posts coming on dimensional modeling, dbt patterns, and surviving legacy systems.*
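On point 3, "monitoring first" can be as small as a wrapper that turns silent failures into alerts. A minimal sketch (the `send_alert` body is a stand-in; wire it to whatever your team actually reads):

```python
import logging
import traceback
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def send_alert(subject: str, body: str) -> None:
    # Stand-in for email/Slack/PagerDuty delivery.
    log.error("ALERT: %s\n%s", subject, body)


def run_monitored(name: str, step: Callable[[], None]) -> bool:
    """Run one pipeline step; alert on failure instead of failing silently."""
    try:
        step()
        log.info("%s: ok", name)
        return True
    except Exception:
        send_alert(f"{name} failed", traceback.format_exc())
        return False


def nightly_load() -> None:
    raise RuntimeError("db timeout")  # simulate the 7am surprise


ok = run_monitored("nightly_load", nightly_load)  # alerted, not discovered at 7am
```

Even this much means the failure finds you, instead of you finding it.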
@@ -2,7 +2,6 @@
from .choropleth import (
    create_choropleth_figure,
    create_district_map,
    create_zone_map,
)
from .summary_cards import create_metric_card_figure, create_summary_metrics
@@ -17,7 +16,6 @@ from .time_series import (
__all__ = [
    # Choropleth
    "create_choropleth_figure",
    "create_district_map",
    "create_zone_map",
    # Time series
    "create_price_time_series",

@@ -115,34 +115,6 @@ def create_choropleth_figure(
    return fig


def create_district_map(
    districts_geojson: dict[str, Any] | None,
    purchase_data: list[dict[str, Any]],
    metric: str = "avg_price",
) -> go.Figure:
    """Create choropleth map for TRREB districts.

    Args:
        districts_geojson: GeoJSON for TRREB district boundaries.
        purchase_data: Purchase statistics by district.
        metric: Metric to display (avg_price, sales_count, etc.).

    Returns:
        Plotly Figure object.
    """
    hover_columns = ["district_name", "sales_count", "avg_price", "median_price"]

    return create_choropleth_figure(
        geojson=districts_geojson,
        data=purchase_data,
        location_key="district_code",
        color_column=metric,
        hover_data=[c for c in hover_columns if c != metric],
        color_scale="Blues" if "price" in metric else "Greens",
        title="Toronto Purchase Market by District",
    )


def create_zone_map(
    zones_geojson: dict[str, Any] | None,
    rental_data: list[dict[str, Any]],

248 portfolio_app/pages/about.py Normal file
@@ -0,0 +1,248 @@
"""About page - Professional narrative and background."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
dash.register_page(__name__, path="/about", name="About")
|
||||
|
||||
# Opening section
|
||||
OPENING = """I didn't start in data. I started in project management—CAPM certified, ITIL trained, \
|
||||
the whole corporate playbook. Then I realized I liked building systems more than managing timelines, \
|
||||
and I was better at automating reports than attending meetings about them.
|
||||
|
||||
That pivot led me to where I am now: 8 years deep in data engineering, analytics, and the messy \
|
||||
reality of turning raw information into something people can actually use."""
|
||||
|
||||
# What I Actually Do section
|
||||
WHAT_I_DO_SHORT = "The short version: I build data infrastructure. Pipelines, warehouses, \
|
||||
dashboards, automation—the invisible machinery that makes businesses run on data instead of gut feelings."
|
||||
|
||||
WHAT_I_DO_LONG = """The longer version: At Summitt Energy, I've been the sole data professional \
|
||||
supporting 150+ employees across 9 markets (Canada and US). I inherited nothing—no data warehouse, \
|
||||
no reporting infrastructure, no documentation. Over 5 years, I built DataFlow: an enterprise \
|
||||
platform processing 1B+ rows, integrating contact center data, CRM systems, and legacy tools \
|
||||
that definitely weren't designed to talk to each other.
|
||||
|
||||
That meant learning to be a generalist. I've done ETL pipeline development (Python, SQLAlchemy), \
|
||||
dimensional modeling, dashboard design (Power BI, Plotly-Dash), API integration, and more \
|
||||
stakeholder management than I'd like to admit. When you're the only data person, you learn to wear every hat."""
|
||||
|
||||
# How I Think About Data
|
||||
DATA_PHILOSOPHY_INTRO = "I'm not interested in data for data's sake. The question I always \
|
||||
start with: What decision does this help someone make?"
|
||||
|
||||
DATA_PHILOSOPHY_DETAIL = """Most of my work has been in operations-heavy environments—contact \
|
||||
centers, energy retail, logistics. These aren't glamorous domains, but they're where data can \
|
||||
have massive impact. A 30% improvement in abandon rate isn't just a metric; it's thousands of \
|
||||
customers who didn't hang up frustrated. A 40% reduction in reporting time means managers can \
|
||||
actually manage instead of wrestling with spreadsheets."""
|
||||
|
||||
DATA_PHILOSOPHY_CLOSE = "I care about outcomes, not technology stacks."
|
||||
|
||||
# Technical skills
|
||||
TECH_SKILLS = {
|
||||
"Languages": "Python (Pandas, SQLAlchemy, FastAPI), SQL (MSSQL, PostgreSQL), R, VBA",
|
||||
"Data Engineering": "ETL/ELT pipelines, dimensional modeling (star schema), dbt patterns, batch processing, API integration, web scraping (Selenium)",
|
||||
"Visualization": "Plotly/Dash, Power BI, Tableau",
|
||||
"Platforms": "Genesys Cloud, Five9, Zoho, Azure DevOps",
|
||||
"Currently Learning": "Cloud certification (Azure DP-203), Airflow, Snowflake",
|
||||
}
|
||||
|
||||
# Outside Work
|
||||
OUTSIDE_WORK_INTRO = "I'm a Brazilian-Canadian based in Toronto. I speak Portuguese (native), \
|
||||
English (fluent), and enough Spanish to survive."
|
||||
|
||||
OUTSIDE_WORK_ACTIVITIES = [
|
||||
"Building automation tools for small businesses through Bandit Labs (my side project)",
|
||||
"Contributing to open source (MCP servers, Claude Code plugins)",
|
||||
'Trying to explain to my kid why Daddy\'s job involves "making computers talk to each other"',
|
||||
]
|
||||
|
||||
# What I'm Looking For
|
||||
LOOKING_FOR_INTRO = "I'm currently exploring Senior Data Analyst and Data Engineer roles in \
|
||||
the Toronto area (or remote). I'm most interested in:"
|
||||
|
||||
LOOKING_FOR_ITEMS = [
|
||||
"Companies that treat data as infrastructure, not an afterthought",
|
||||
"Teams where I can contribute to architecture decisions, not just execute tickets",
|
||||
"Operations-focused industries (energy, logistics, financial services, contact center tech)",
|
||||
]
|
||||
|
||||
LOOKING_FOR_CLOSE = "If that sounds like your team, let's talk."
|
||||
|
||||
|
||||
def create_section_title(title: str) -> dmc.Title:
|
||||
"""Create a consistent section title."""
|
||||
return dmc.Title(title, order=2, size="h3", mb="sm")
|
||||
|
||||
|
||||
def create_opening_section() -> dmc.Paper:
|
||||
"""Create the opening/intro section."""
|
||||
paragraphs = OPENING.split("\n\n")
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[dmc.Text(p, size="md") for p in paragraphs],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_what_i_do_section() -> dmc.Paper:
|
||||
"""Create the What I Actually Do section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_section_title("What I Actually Do"),
|
||||
dmc.Text(WHAT_I_DO_SHORT, size="md", fw=500),
|
||||
dmc.Text(WHAT_I_DO_LONG, size="md"),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_philosophy_section() -> dmc.Paper:
|
||||
"""Create the How I Think About Data section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_section_title("How I Think About Data"),
|
||||
dmc.Text(DATA_PHILOSOPHY_INTRO, size="md", fw=500),
|
||||
dmc.Text(DATA_PHILOSOPHY_DETAIL, size="md"),
|
||||
dmc.Text(DATA_PHILOSOPHY_CLOSE, size="md", fw=500, fs="italic"),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_tech_section() -> dmc.Paper:
|
||||
"""Create the Technical Stuff section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_section_title("The Technical Stuff"),
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Text(category + ":", fw=600, size="sm", w=150),
|
||||
dmc.Text(skills, size="sm", c="dimmed"),
|
||||
],
|
||||
gap="sm",
|
||||
align="flex-start",
|
||||
wrap="nowrap",
|
||||
)
|
||||
for category, skills in TECH_SKILLS.items()
|
||||
],
|
||||
gap="xs",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_outside_work_section() -> dmc.Paper:
|
||||
"""Create the Outside Work section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_section_title("Outside Work"),
|
||||
dmc.Text(OUTSIDE_WORK_INTRO, size="md"),
|
||||
dmc.Text("When I'm not staring at SQL, I'm usually:", size="md"),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem(dmc.Text(item, size="md"))
|
||||
for item in OUTSIDE_WORK_ACTIVITIES
|
||||
],
|
||||
spacing="xs",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_looking_for_section() -> dmc.Paper:
|
||||
"""Create the What I'm Looking For section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_section_title("What I'm Looking For"),
|
||||
dmc.Text(LOOKING_FOR_INTRO, size="md"),
|
||||
dmc.List(
|
||||
[
|
||||
dmc.ListItem(dmc.Text(item, size="md"))
|
||||
for item in LOOKING_FOR_ITEMS
|
||||
],
|
||||
spacing="xs",
|
||||
),
|
||||
dmc.Text(LOOKING_FOR_CLOSE, size="md", fw=500),
|
||||
dmc.Group(
|
||||
[
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"Download Resume",
|
||||
variant="filled",
|
||||
leftSection=DashIconify(
|
||||
icon="tabler:download", width=18
|
||||
),
|
||||
),
|
||||
href="/resume",
|
||||
),
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"Contact Me",
|
||||
variant="outline",
|
||||
leftSection=DashIconify(icon="tabler:mail", width=18),
|
||||
),
|
||||
href="/contact",
|
||||
),
|
||||
],
|
||||
gap="sm",
|
||||
mt="md",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
layout = dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("About", order=1, ta="center", mb="lg"),
|
||||
create_opening_section(),
|
||||
create_what_i_do_section(),
|
||||
create_philosophy_section(),
|
||||
create_tech_section(),
|
||||
create_outside_work_section(),
|
||||
create_looking_for_section(),
|
||||
dmc.Space(h=40),
|
||||
],
|
||||
gap="xl",
|
||||
),
|
||||
size="md",
|
||||
py="xl",
|
||||
)
|
||||
1 portfolio_app/pages/blog/__init__.py Normal file
@@ -0,0 +1 @@
"""Blog pages package."""
147 portfolio_app/pages/blog/article.py Normal file
@@ -0,0 +1,147 @@
"""Blog article page - Dynamic routing for individual articles."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc, html
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
from portfolio_app.utils.markdown_loader import get_article
|
||||
|
||||
dash.register_page(
|
||||
__name__,
|
||||
path_template="/blog/<slug>",
|
||||
name="Article",
|
||||
)
|
||||
|
||||
|
||||
def create_not_found() -> dmc.Container:
|
||||
"""Create 404 state for missing articles."""
|
||||
return dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:file-unknown", width=48),
|
||||
size=80,
|
||||
radius="xl",
|
||||
variant="light",
|
||||
color="red",
|
||||
),
|
||||
dmc.Title("Article Not Found", order=2),
|
||||
dmc.Text(
|
||||
"The article you're looking for doesn't exist or has been moved.",
|
||||
size="md",
|
||||
c="dimmed",
|
||||
ta="center",
|
||||
),
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"Back to Blog",
|
||||
variant="light",
|
||||
leftSection=DashIconify(icon="tabler:arrow-left", width=18),
|
||||
),
|
||||
href="/blog",
|
||||
),
|
||||
],
|
||||
align="center",
|
||||
gap="md",
|
||||
py="xl",
|
||||
),
|
||||
size="md",
|
||||
py="xl",
|
||||
)
|
||||
|
||||
|
||||
def layout(slug: str = "") -> dmc.Container:
|
||||
"""Generate the article layout dynamically.
|
||||
|
||||
Args:
|
||||
slug: Article slug from URL path.
|
||||
"""
|
||||
if not slug:
|
||||
return create_not_found()
|
||||
|
||||
article = get_article(slug)
|
||||
if not article:
|
||||
return create_not_found()
|
||||
|
||||
meta = article["meta"]
|
||||
|
||||
return dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
# Back link
|
||||
dcc.Link(
|
||||
dmc.Group(
|
||||
[
|
||||
DashIconify(icon="tabler:arrow-left", width=16),
|
||||
dmc.Text("Back to Blog", size="sm"),
|
||||
],
|
||||
gap="xs",
|
||||
),
|
||||
href="/blog",
|
||||
style={"textDecoration": "none"},
|
||||
),
|
||||
# Article header
|
||||
dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title(meta["title"], order=1),
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Group(
|
||||
[
|
||||
DashIconify(
|
||||
icon="tabler:calendar", width=16
|
||||
),
|
||||
dmc.Text(
|
||||
meta["date"], size="sm", c="dimmed"
|
||||
),
|
||||
],
|
||||
gap="xs",
|
||||
),
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Badge(tag, variant="light", size="sm")
|
||||
for tag in meta.get("tags", [])
|
||||
],
|
||||
gap="xs",
|
||||
),
|
||||
],
|
||||
justify="space-between",
|
||||
wrap="wrap",
|
||||
),
|
||||
(
|
||||
dmc.Text(meta["description"], size="lg", c="dimmed")
|
||||
if meta.get("description")
|
||||
else None
|
||||
),
|
||||
],
|
||||
gap="sm",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
),
|
||||
# Article content
|
||||
dmc.Paper(
|
||||
html.Div(
|
||||
# Render HTML content from markdown
|
||||
# Using dangerously_allow_html via dcc.Markdown or html.Div
|
||||
dcc.Markdown(
|
||||
article["content"],
|
||||
className="article-content",
|
||||
dangerously_allow_html=True,
|
||||
),
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
className="article-body",
|
||||
),
|
||||
dmc.Space(h=40),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
size="md",
|
||||
py="xl",
|
||||
)
|
||||
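Note: `get_article` lives in `portfolio_app.utils.markdown_loader`, which is not part of this diff. A minimal sketch of the contract the page assumes (front matter split from the markdown body; this toy parser handles only flat `key: value` pairs, not the `tags` list):

```python
import re
from typing import Any


def parse_front_matter(text: str) -> tuple[dict[str, Any], str]:
    """Split a '---'-delimited front-matter block from the markdown body."""
    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not match:
        return {}, text  # no front matter: whole text is the body
    meta: dict[str, Any] = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta, match.group(2)


doc = '---\ntitle: "Team of One"\nstatus: published\n---\nBody text.'
meta, body = parse_front_matter(doc)
```

The real loader presumably returns `{"meta": meta, "content": body}` keyed by slug; a proper YAML parser would be needed for list-valued fields like `tags`.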
113 portfolio_app/pages/blog/index.py Normal file
@@ -0,0 +1,113 @@
"""Blog index page - Article listing."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
from portfolio_app.utils.markdown_loader import Article, get_all_articles
|
||||
|
||||
dash.register_page(__name__, path="/blog", name="Blog")
|
||||
|
||||
# Page intro
|
||||
INTRO_TEXT = (
|
||||
"I write occasionally about data engineering, automation, and the reality of being "
|
||||
"a one-person data team. No hot takes, no growth hacking—just things I've learned "
|
||||
"the hard way."
|
||||
)
|
||||
|
||||
|
||||
def create_article_card(article: Article) -> dmc.Paper:
|
||||
"""Create an article preview card."""
|
||||
meta = article["meta"]
|
||||
return dmc.Paper(
|
||||
dcc.Link(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Text(meta["title"], fw=600, size="lg"),
|
||||
dmc.Text(meta["date"], size="sm", c="dimmed"),
|
||||
],
|
||||
justify="space-between",
|
||||
align="flex-start",
|
||||
wrap="wrap",
|
||||
),
|
||||
dmc.Text(meta["description"], size="md", c="dimmed", lineClamp=2),
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.Badge(tag, variant="light", size="sm")
|
||||
for tag in meta.get("tags", [])[:3]
|
||||
],
|
||||
gap="xs",
|
||||
),
|
||||
],
|
||||
gap="sm",
|
||||
),
|
||||
href=f"/blog/{meta['slug']}",
|
||||
style={"textDecoration": "none", "color": "inherit"},
|
||||
),
|
||||
p="lg",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
className="article-card",
|
||||
)
|
||||
|
||||
|
||||
def create_empty_state() -> dmc.Paper:
|
||||
"""Create empty state when no articles exist."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:article-off", width=48),
|
||||
size=80,
|
||||
radius="xl",
|
||||
variant="light",
|
||||
color="gray",
|
||||
),
|
||||
dmc.Title("No Articles Yet", order=3),
|
||||
dmc.Text(
|
||||
"Articles are coming soon. Check back later!",
|
||||
size="md",
|
||||
c="dimmed",
|
||||
ta="center",
|
||||
),
|
||||
],
|
||||
align="center",
|
||||
gap="md",
|
||||
py="xl",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def layout() -> dmc.Container:
|
||||
"""Generate the blog index layout dynamically."""
|
||||
articles = get_all_articles(include_drafts=False)
|
||||
|
||||
return dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Blog", order=1, ta="center"),
|
||||
dmc.Text(
|
||||
INTRO_TEXT, size="md", c="dimmed", ta="center", maw=600, mx="auto"
|
||||
),
|
||||
dmc.Divider(my="lg"),
|
||||
(
|
||||
dmc.Stack(
|
||||
[create_article_card(article) for article in articles],
|
||||
gap="lg",
|
||||
)
|
||||
if articles
|
||||
else create_empty_state()
|
||||
),
|
||||
dmc.Space(h=40),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
size="md",
|
||||
py="xl",
|
||||
)
|
||||
287 portfolio_app/pages/contact.py Normal file
@@ -0,0 +1,287 @@
"""Contact page - Form UI and direct contact information."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
dash.register_page(__name__, path="/contact", name="Contact")
|
||||
|
||||
# Contact information
|
||||
CONTACT_INFO = {
|
||||
"email": "leobrmi@hotmail.com",
|
||||
"phone": "(416) 859-7936",
|
||||
"linkedin": "https://linkedin.com/in/leobmiranda",
|
||||
"github": "https://github.com/leomiranda",
|
||||
"location": "Toronto, ON, Canada",
|
||||
}
|
||||
|
||||
# Page intro text
|
||||
INTRO_TEXT = (
|
||||
"I'm currently open to Senior Data Analyst and Data Engineer roles in Toronto "
|
||||
"(or remote). If you're working on something interesting and need someone who can "
|
||||
"build data infrastructure from scratch, I'd like to hear about it."
|
||||
)
|
||||
|
||||
CONSULTING_TEXT = (
|
||||
"For consulting inquiries (automation, dashboards, small business data work), "
|
||||
"reach out about Bandit Labs."
|
||||
)
|
||||
|
||||
# Form subject options
|
||||
SUBJECT_OPTIONS = [
|
||||
{"value": "job", "label": "Job Opportunity"},
|
||||
{"value": "consulting", "label": "Consulting Inquiry"},
|
||||
{"value": "other", "label": "Other"},
|
||||
]
|
||||
|
||||
|
||||
def create_intro_section() -> dmc.Stack:
|
||||
"""Create the intro text section."""
|
||||
return dmc.Stack(
|
||||
[
|
||||
dmc.Title("Get In Touch", order=1, ta="center"),
|
||||
dmc.Text(INTRO_TEXT, size="md", ta="center", maw=600, mx="auto"),
|
||||
dmc.Text(
|
||||
CONSULTING_TEXT, size="md", ta="center", maw=600, mx="auto", c="dimmed"
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
mb="xl",
|
||||
)
|
||||
|
||||
|
||||
def create_contact_form() -> dmc.Paper:
|
||||
"""Create the contact form (disabled in Phase 1)."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Send a Message", order=2, size="h4"),
|
||||
dmc.Alert(
|
||||
"Contact form submission is coming soon. Please use the direct contact "
|
||||
"methods below for now.",
|
||||
title="Form Coming Soon",
|
||||
color="blue",
|
||||
variant="light",
|
||||
),
|
||||
dmc.TextInput(
|
||||
label="Name",
|
||||
placeholder="Your name",
|
||||
leftSection=DashIconify(icon="tabler:user", width=18),
|
||||
disabled=True,
|
||||
),
|
||||
dmc.TextInput(
|
||||
label="Email",
|
||||
placeholder="your.email@example.com",
|
||||
leftSection=DashIconify(icon="tabler:mail", width=18),
|
||||
disabled=True,
|
||||
),
|
||||
dmc.Select(
|
||||
label="Subject",
|
||||
placeholder="Select a subject",
|
||||
data=SUBJECT_OPTIONS,
|
||||
leftSection=DashIconify(icon="tabler:tag", width=18),
|
||||
disabled=True,
|
||||
),
|
||||
dmc.Textarea(
|
||||
label="Message",
|
||||
placeholder="Your message...",
|
||||
minRows=4,
|
||||
disabled=True,
|
||||
),
|
||||
dmc.Button(
|
||||
"Send Message",
|
||||
fullWidth=True,
|
||||
leftSection=DashIconify(icon="tabler:send", width=18),
|
||||
disabled=True,
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_direct_contact() -> dmc.Paper:
|
||||
"""Create the direct contact information section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Direct Contact", order=2, size="h4"),
|
||||
dmc.Stack(
|
||||
[
|
||||
# Email
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:mail", width=20),
|
||||
size="lg",
|
||||
radius="md",
|
||||
variant="light",
|
||||
),
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Text("Email", size="sm", c="dimmed"),
|
||||
dmc.Anchor(
|
||||
CONTACT_INFO["email"],
|
||||
href=f"mailto:{CONTACT_INFO['email']}",
|
||||
size="md",
|
||||
fw=500,
|
||||
),
|
||||
],
|
||||
gap=0,
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
# Phone
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:phone", width=20),
|
||||
size="lg",
|
||||
radius="md",
|
||||
variant="light",
|
||||
),
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Text("Phone", size="sm", c="dimmed"),
|
||||
dmc.Anchor(
|
||||
CONTACT_INFO["phone"],
|
||||
href=f"tel:{CONTACT_INFO['phone'].replace('(', '').replace(')', '').replace(' ', '').replace('-', '')}",
|
||||
size="md",
|
||||
fw=500,
|
||||
),
|
||||
],
|
||||
gap=0,
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
# LinkedIn
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:brand-linkedin", width=20),
|
||||
size="lg",
|
||||
radius="md",
|
||||
variant="light",
|
||||
color="blue",
|
||||
),
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Text("LinkedIn", size="sm", c="dimmed"),
|
||||
dmc.Anchor(
|
||||
"linkedin.com/in/leobmiranda",
|
||||
href=CONTACT_INFO["linkedin"],
|
||||
target="_blank",
|
||||
size="md",
|
||||
fw=500,
|
||||
),
|
||||
],
|
||||
gap=0,
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
# GitHub
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:brand-github", width=20),
|
||||
size="lg",
|
||||
radius="md",
|
||||
variant="light",
|
||||
),
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Text("GitHub", size="sm", c="dimmed"),
|
||||
dmc.Anchor(
|
||||
"github.com/leomiranda",
|
||||
href=CONTACT_INFO["github"],
|
||||
target="_blank",
|
||||
size="md",
|
||||
fw=500,
|
||||
),
|
||||
],
|
||||
gap=0,
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
def create_location_section() -> dmc.Paper:
|
||||
"""Create the location and work eligibility section."""
|
||||
return dmc.Paper(
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Title("Location", order=2, size="h4"),
|
||||
dmc.Group(
|
||||
[
|
||||
dmc.ThemeIcon(
|
||||
DashIconify(icon="tabler:map-pin", width=20),
|
||||
size="lg",
|
||||
radius="md",
|
||||
variant="light",
|
||||
color="red",
|
||||
),
|
||||
dmc.Stack(
|
||||
[
|
||||
dmc.Text(CONTACT_INFO["location"], size="md", fw=500),
|
||||
dmc.Text(
|
||||
"Canadian Citizen | Eligible to work in Canada and US",
|
||||
size="sm",
|
||||
c="dimmed",
|
||||
),
|
||||
],
|
||||
gap=0,
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
),
|
||||
p="xl",
|
||||
radius="md",
|
||||
withBorder=True,
|
||||
)
|
||||
|
||||
|
||||
layout = dmc.Container(
|
||||
dmc.Stack(
|
||||
[
|
||||
create_intro_section(),
|
||||
dmc.SimpleGrid(
|
||||
[
|
||||
create_contact_form(),
|
||||
dmc.Stack(
|
||||
[
|
||||
create_direct_contact(),
|
||||
create_location_section(),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
],
|
||||
cols={"base": 1, "md": 2},
|
||||
spacing="xl",
|
||||
),
|
||||
dmc.Space(h=40),
|
||||
],
|
||||
gap="lg",
|
||||
),
|
||||
size="lg",
|
||||
py="xl",
|
||||
)
|
||||
@@ -1,81 +1,118 @@
|
||||
"""Bio landing page."""
|
||||
"""Home landing page - Portfolio entry point."""
|
||||
|
||||
import dash
|
||||
import dash_mantine_components as dmc
|
||||
from dash import dcc
|
||||
from dash_iconify import DashIconify
|
||||
|
||||
dash.register_page(__name__, path="/", name="Home")
|
||||
|
||||
# Content from bio_content_v2.md
|
||||
HEADLINE = "Leo | Data Engineer & Analytics Developer"
|
||||
TAGLINE = "I build data infrastructure that actually gets used."
|
||||
# Hero content from blueprint
|
||||
HEADLINE = "I turn messy data into systems that actually work."
|
||||
SUBHEAD = (
|
||||
"Data Engineer & Analytics Specialist. 8 years building pipelines, dashboards, "
|
||||
"and the infrastructure nobody sees but everyone depends on. Based in Toronto."
|
||||
)
|
||||
|
||||
SUMMARY = """Over the past 5 years, I've designed and evolved an enterprise analytics platform
|
||||
from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and
|
||||
dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call
|
||||
abandon rates, and dashboards that executives actually open.
|
||||
|
||||
My approach: dimensional modeling (star schema), layered transformations
|
||||
(staging → intermediate → marts), and automation that eliminates manual work.
|
||||
I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
|
||||
|
||||
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states.
|
||||
Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the
|
||||
Project Management Institute."""
|
||||
|
||||
TECH_STACK = [
|
||||
"Python",
|
||||
"Pandas",
|
||||
"SQLAlchemy",
|
||||
"FastAPI",
|
||||
"SQL",
|
||||
"PostgreSQL",
|
||||
"MSSQL",
|
||||
"Power BI",
|
||||
"Plotly/Dash",
|
||||
"dbt patterns",
|
||||
"Genesys Cloud",
|
||||
# Impact metrics
|
||||
IMPACT_STATS = [
|
||||
{"value": "1B+", "label": "Rows processed daily across enterprise platform"},
|
||||
{"value": "40%", "label": "Efficiency gain through automation"},
|
||||
{"value": "5 Years", "label": "Building DataFlow from zero"},
|
||||
]
|
||||
|
||||
PROJECTS = [
|
||||
{
|
||||
"title": "Toronto Housing Dashboard",
|
||||
"description": "Choropleth visualization of GTA real estate trends with TRREB and CMHC data.",
|
||||
"status": "In Development",
|
||||
"link": "/toronto",
|
||||
},
|
||||
{
|
||||
"title": "Energy Pricing Analysis",
|
||||
"description": "Time series analysis and ML prediction for utility market pricing.",
|
||||
"status": "Planned",
|
||||
"link": "/energy",
|
||||
},
|
||||
]
|
||||
# Featured project
|
||||
FEATURED_PROJECT = {
|
||||
"title": "Toronto Housing Market Dashboard",
|
||||
"description": (
|
||||
"Real-time analytics on Toronto's housing trends. "
|
||||
"dbt-powered ETL, Python scraping, Plotly visualization."
|
||||
),
|
||||
"status": "Live",
|
||||
"dashboard_link": "/toronto",
|
||||
"repo_link": "https://github.com/leomiranda/personal-portfolio",
|
||||
}
|
||||
|
||||
AVAILABILITY = "Open to Senior Data Analyst, Analytics Engineer, and BI Developer opportunities in Toronto or remote."
|
||||
# Brief intro
|
||||
INTRO_TEXT = (
|
||||
"I'm a data engineer who's spent the last 8 years in the trenches—building the "
|
||||
"infrastructure that feeds dashboards, automates the boring stuff, and makes data "
|
||||
"actually usable. Most of my work has been in contact center operations and energy, "
|
||||
"where I've had to be scrappy: one-person data teams, legacy systems, stakeholders "
|
||||
"who need answers yesterday."
|
||||
)
|
||||
|
||||
INTRO_CLOSING = "I like solving real problems, not theoretical ones."
|
||||
|
||||
|
||||
def create_hero_section() -> dmc.Stack:
|
||||
"""Create the hero section with name and tagline."""
|
||||
"""Create the hero section with headline, subhead, and CTAs."""
|
||||
return dmc.Stack(
|
||||
[
|
||||
dmc.Title(HEADLINE, order=1, ta="center"),
|
||||
dmc.Text(TAGLINE, size="xl", c="dimmed", ta="center"),
|
||||
dmc.Title(
|
||||
HEADLINE,
|
||||
order=1,
|
||||
ta="center",
|
||||
size="2.5rem",
|
||||
),
|
||||
dmc.Text(
|
||||
SUBHEAD,
|
||||
size="lg",
|
||||
c="dimmed",
|
||||
ta="center",
|
||||
maw=700,
|
||||
mx="auto",
|
||||
),
|
||||
dmc.Group(
|
||||
[
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"View Projects",
|
||||
size="lg",
|
||||
variant="filled",
|
||||
leftSection=DashIconify(icon="tabler:folder", width=20),
|
||||
),
|
||||
href="/projects",
|
||||
),
|
||||
dcc.Link(
|
||||
dmc.Button(
|
||||
"Get In Touch",
|
||||
size="lg",
|
||||
variant="outline",
|
||||
leftSection=DashIconify(icon="tabler:mail", width=20),
|
||||
),
|
||||
href="/contact",
|
||||
),
|
||||
],
|
||||
gap="xs",
|
||||
justify="center",
|
||||
gap="md",
|
||||
mt="md",
|
||||
),
|
||||
],
|
||||
gap="md",
|
||||
py="xl",
|
||||
)
|
||||
|
||||
|
||||
def create_summary_section() -> dmc.Paper:
    """Create the professional summary section."""
    paragraphs = SUMMARY.strip().split("\n\n")
    return dmc.Paper(
        dmc.Stack(
def create_impact_stat(stat: dict[str, str]) -> dmc.Stack:
    """Create a single impact stat."""
    return dmc.Stack(
        [
            dmc.Title("About", order=2, size="h3"),
            *[dmc.Text(p.replace("\n", " "), size="md") for p in paragraphs],
            dmc.Text(stat["value"], fw=700, size="2rem", ta="center"),
            dmc.Text(stat["label"], size="sm", c="dimmed", ta="center"),
        ],
        gap="md",
        gap="xs",
        align="center",
    )


def create_impact_strip() -> dmc.Paper:
    """Create the impact statistics strip."""
    return dmc.Paper(
        dmc.SimpleGrid(
            [create_impact_stat(stat) for stat in IMPACT_STATS],
            cols={"base": 1, "sm": 3},
            spacing="xl",
        ),
        p="xl",
        radius="md",
@@ -83,16 +120,56 @@ def create_summary_section() -> dmc.Paper:
    )

def create_tech_stack_section() -> dmc.Paper:
    """Create the tech stack section with badges."""
def create_featured_project() -> dmc.Paper:
    """Create the featured project card."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title("Tech Stack", order=2, size="h3"),
                dmc.Group(
                    [
                        dmc.Badge(tech, size="lg", variant="light", radius="sm")
                        for tech in TECH_STACK
                        dmc.Title("Featured Project", order=2, size="h3"),
                        dmc.Badge(
                            FEATURED_PROJECT["status"],
                            color="green",
                            variant="light",
                            size="lg",
                        ),
                    ],
                    justify="space-between",
                ),
                dmc.Title(
                    FEATURED_PROJECT["title"],
                    order=3,
                    size="h4",
                ),
                dmc.Text(
                    FEATURED_PROJECT["description"],
                    size="md",
                    c="dimmed",
                ),
                dmc.Group(
                    [
                        dcc.Link(
                            dmc.Button(
                                "View Dashboard",
                                variant="light",
                                leftSection=DashIconify(
                                    icon="tabler:chart-bar", width=18
                                ),
                            ),
                            href=FEATURED_PROJECT["dashboard_link"],
                        ),
                        dmc.Anchor(
                            dmc.Button(
                                "View Repository",
                                variant="subtle",
                                leftSection=DashIconify(
                                    icon="tabler:brand-github", width=18
                                ),
                            ),
                            href=FEATURED_PROJECT["repo_link"],
                            target="_blank",
                        ),
                    ],
                    gap="sm",
                ),
@@ -105,38 +182,13 @@ def create_tech_stack_section() -> dmc.Paper:
    )

def create_project_card(project: dict[str, str]) -> dmc.Card:
    """Create a project card."""
    status_color = "blue" if project["status"] == "In Development" else "gray"
    return dmc.Card(
        [
            dmc.Group(
                [
                    dmc.Text(project["title"], fw=500, size="lg"),
                    dmc.Badge(project["status"], color=status_color, variant="light"),
                ],
                justify="space-between",
                align="center",
            ),
            dmc.Text(project["description"], size="sm", c="dimmed", mt="sm"),
        ],
        withBorder=True,
        radius="md",
        p="lg",
    )


def create_projects_section() -> dmc.Paper:
    """Create the portfolio projects section."""
def create_intro_section() -> dmc.Paper:
    """Create the brief intro section."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title("Portfolio Projects", order=2, size="h3"),
                dmc.SimpleGrid(
                    [create_project_card(p) for p in PROJECTS],
                    cols={"base": 1, "sm": 2},
                    spacing="lg",
                ),
                dmc.Text(INTRO_TEXT, size="md"),
                dmc.Text(INTRO_CLOSING, size="md", fw=500, fs="italic"),
            ],
            gap="md",
        ),
@@ -146,20 +198,13 @@ def create_projects_section() -> dmc.Paper:
    )

def create_availability_section() -> dmc.Text:
    """Create the availability statement."""
    return dmc.Text(AVAILABILITY, size="sm", c="dimmed", ta="center", fs="italic")


layout = dmc.Container(
    dmc.Stack(
        [
            create_hero_section(),
            create_summary_section(),
            create_tech_stack_section(),
            create_projects_section(),
            dmc.Divider(my="lg"),
            create_availability_section(),
            create_impact_strip(),
            create_featured_project(),
            create_intro_section(),
            dmc.Space(h=40),
        ],
        gap="xl",
304 portfolio_app/pages/projects.py Normal file
@@ -0,0 +1,304 @@
"""Projects overview page - Hub for all portfolio projects."""

from typing import Any

import dash
import dash_mantine_components as dmc
from dash import dcc
from dash_iconify import DashIconify

dash.register_page(__name__, path="/projects", name="Projects")

# Page intro
INTRO_TEXT = (
    "These are projects I've built—some professional (anonymized where needed), "
    "some personal. Each one taught me something. Use the sidebar to jump directly "
    "to live dashboards or explore the overviews below."
)

# Project definitions
PROJECTS: list[dict[str, Any]] = [
    {
        "title": "Toronto Housing Market Dashboard",
        "type": "Personal Project",
        "status": "Live",
        "status_color": "green",
        "problem": (
            "Toronto's housing market moves fast, and most publicly available data "
            "is either outdated, behind paywalls, or scattered across dozens of sources. "
            "I wanted a single dashboard that tracked trends in real-time."
        ),
        "built": [
            "Data Pipeline: Python scraper pulling listings data, automated on schedule",
            "Transformation Layer: dbt-based SQL architecture (staging -> intermediate -> marts)",
            "Visualization: Interactive Plotly-Dash dashboard with filters by neighborhood, price range, property type",
            "Infrastructure: PostgreSQL backend, version-controlled in Git",
        ],
        "tech_stack": "Python, dbt, PostgreSQL, Plotly-Dash, GitHub Actions",
        "learned": (
            "Real estate data is messy as hell. Listings get pulled, prices change, "
            "duplicates are everywhere. Building a reliable pipeline meant implementing "
            'serious data quality checks and learning to embrace "good enough" over "perfect."'
        ),
        "dashboard_link": "/toronto",
        "repo_link": "https://github.com/leomiranda/personal-portfolio",
    },
    {
        "title": "US Retail Energy Price Predictor",
        "type": "Personal Project",
        "status": "Coming Soon",
        "status_color": "yellow",
        "problem": (
            "Retail energy pricing in deregulated US markets is volatile and opaque. "
            "Consumers and analysts lack accessible tools to understand pricing trends "
            "and forecast where rates are headed."
        ),
        "built": [
            "Data Pipeline: Automated ingestion of public pricing data across multiple US markets",
            "ML Model: Price prediction using time series forecasting (ARIMA, Prophet, or similar)",
            "Transformation Layer: dbt-based SQL architecture for feature engineering",
            "Visualization: Interactive dashboard showing historical trends + predictions by state/market",
        ],
        "tech_stack": "Python, Scikit-learn, dbt, PostgreSQL, Plotly-Dash",
        "learned": (
            "This showcases the ML side of my skillset—something the Toronto Housing "
            "dashboard doesn't cover. It also leverages my domain expertise from 5+ years "
            "in retail energy operations."
        ),
        "dashboard_link": None,
        "repo_link": None,
    },
    {
        "title": "DataFlow Platform",
        "type": "Professional",
        "status": "Case Study Pending",
        "status_color": "gray",
        "problem": (
            "When I joined Summitt Energy, there was no data infrastructure. "
            "Reports were manual. Insights were guesswork. I was hired to fix that."
        ),
        "built": [
            "v1 (2020): Basic ETL scripts pulling Genesys Cloud data into MSSQL",
            "v2 (2021): Dimensional model (star schema) with fact/dimension tables",
            "v3 (2022): Python refactor with SQLAlchemy ORM, batch processing, error handling",
            "v4 (2023-24): dbt-pattern SQL views (staging -> intermediate -> marts), FastAPI layer, CLI tools",
        ],
        "tech_stack": "Python, SQLAlchemy, FastAPI, MSSQL, Power BI, Genesys Cloud API",
        "impact": [
            "21 tables, 1B+ rows",
            "5,000+ daily transactions processed",
            "40% improvement in reporting efficiency",
            "30% reduction in call abandon rate",
            "50% faster Average Speed to Answer",
        ],
        "learned": (
            "Building data infrastructure as a team of one forces brutal prioritization. "
            "I learned to ship imperfect solutions fast, iterate based on feedback, "
            "and never underestimate how long stakeholder buy-in takes."
        ),
        "note": "This is proprietary work. A sanitized case study with architecture patterns (no proprietary data) will be published in Phase 3.",
        "dashboard_link": None,
        "repo_link": None,
    },
    {
        "title": "AI-Assisted Automation (Bandit Labs)",
        "type": "Consulting/Side Business",
        "status": "Active",
        "status_color": "blue",
        "problem": (
            "Small businesses don't need enterprise data platforms—they need someone "
            "to eliminate the 4 hours/week they spend manually entering receipts."
        ),
        "built": [
            "Receipt Processing Automation: OCR pipeline (Tesseract, Google Vision) extracting purchase data from photos",
            "Product Margin Tracker: Plotly-Dash dashboard with real-time profitability insights",
            "Claude Code Plugins: MCP servers for Gitea, Wiki.js, NetBox integration",
        ],
        "tech_stack": "Python, Tesseract, Google Vision API, Plotly-Dash, QuickBooks API",
        "learned": (
            "Small businesses are underserved by the data/automation industry. "
            "Everyone wants to sell them enterprise software they don't need. "
            "I like solving problems at a scale where the impact is immediately visible."
        ),
        "dashboard_link": None,
        "repo_link": None,
        "external_link": "/lab",
        "external_label": "Learn More About Bandit Labs",
    },
]

def create_project_card(project: dict[str, Any]) -> dmc.Paper:
    """Create a detailed project card."""
    # Build the "What I Built" list
    built_items = project.get("built", [])
    built_section = (
        dmc.Stack(
            [
                dmc.Text("What I Built:", fw=600, size="sm"),
                dmc.List(
                    [dmc.ListItem(dmc.Text(item, size="sm")) for item in built_items],
                    spacing="xs",
                    size="sm",
                ),
            ],
            gap="xs",
        )
        if built_items
        else None
    )

    # Build impact section for DataFlow
    impact_items = project.get("impact", [])
    impact_section = (
        dmc.Stack(
            [
                dmc.Text("Impact:", fw=600, size="sm"),
                dmc.Group(
                    [
                        dmc.Badge(item, variant="light", size="sm")
                        for item in impact_items
                    ],
                    gap="xs",
                ),
            ],
            gap="xs",
        )
        if impact_items
        else None
    )

    # Build action buttons
    buttons = []
    if project.get("dashboard_link"):
        buttons.append(
            dcc.Link(
                dmc.Button(
                    "View Dashboard",
                    variant="light",
                    size="sm",
                    leftSection=DashIconify(icon="tabler:chart-bar", width=16),
                ),
                href=project["dashboard_link"],
            )
        )
    if project.get("repo_link"):
        buttons.append(
            dmc.Anchor(
                dmc.Button(
                    "View Repository",
                    variant="subtle",
                    size="sm",
                    leftSection=DashIconify(icon="tabler:brand-github", width=16),
                ),
                href=project["repo_link"],
                target="_blank",
            )
        )
    if project.get("external_link"):
        buttons.append(
            dcc.Link(
                dmc.Button(
                    project.get("external_label", "Learn More"),
                    variant="outline",
                    size="sm",
                    leftSection=DashIconify(icon="tabler:arrow-right", width=16),
                ),
                href=project["external_link"],
            )
        )

    # Handle "Coming Soon" state
    if project["status"] == "Coming Soon" and not buttons:
        buttons.append(
            dmc.Badge("Coming Soon", variant="light", color="yellow", size="lg")
        )

    return dmc.Paper(
        dmc.Stack(
            [
                # Header
                dmc.Group(
                    [
                        dmc.Stack(
                            [
                                dmc.Text(project["title"], fw=600, size="lg"),
                                dmc.Text(project["type"], size="sm", c="dimmed"),
                            ],
                            gap=0,
                        ),
                        dmc.Badge(
                            project["status"],
                            color=project["status_color"],
                            variant="light",
                            size="lg",
                        ),
                    ],
                    justify="space-between",
                    align="flex-start",
                ),
                # Problem
                dmc.Stack(
                    [
                        dmc.Text("The Problem:", fw=600, size="sm"),
                        dmc.Text(project["problem"], size="sm", c="dimmed"),
                    ],
                    gap="xs",
                ),
                # What I Built
                built_section,
                # Impact (if exists)
                impact_section,
                # Tech Stack
                dmc.Group(
                    [
                        dmc.Text("Tech Stack:", fw=600, size="sm"),
                        dmc.Text(project["tech_stack"], size="sm", c="dimmed"),
                    ],
                    gap="xs",
                ),
                # What I Learned
                dmc.Stack(
                    [
                        dmc.Text("What I Learned:", fw=600, size="sm"),
                        dmc.Text(project["learned"], size="sm", fs="italic"),
                    ],
                    gap="xs",
                ),
                # Note (if exists)
                (
                    dmc.Alert(
                        project["note"],
                        color="gray",
                        variant="light",
                    )
                    if project.get("note")
                    else None
                ),
                # Action buttons
                dmc.Group(buttons, gap="sm") if buttons else None,
            ],
            gap="md",
        ),
        p="xl",
        radius="md",
        withBorder=True,
    )

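The optional-section pattern above (build each section as a component or `None`, then include it in the children list) can be sketched framework-free; `compact` is a hypothetical helper for illustration, not part of the page's code:

```python
from typing import Any


def compact(children: list[Any]) -> list[Any]:
    """Drop skipped (None) sections before handing children to a layout."""
    return [c for c in children if c is not None]


# A card with no "impact" data simply omits that section.
sections = ["header", "problem", None, "tech_stack", None]
assert compact(sections) == ["header", "problem", "tech_stack"]
```

The page itself passes the `None` entries straight through, relying on the component library to ignore them; filtering explicitly, as sketched here, makes the same intent visible.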
layout = dmc.Container(
    dmc.Stack(
        [
            dmc.Title("Projects", order=1, ta="center"),
            dmc.Text(
                INTRO_TEXT, size="md", c="dimmed", ta="center", maw=700, mx="auto"
            ),
            dmc.Divider(my="lg"),
            *[create_project_card(project) for project in PROJECTS],
            dmc.Space(h=40),
        ],
        gap="xl",
    ),
    size="md",
    py="xl",
)
362 portfolio_app/pages/resume.py Normal file
@@ -0,0 +1,362 @@
"""Resume page - Inline display with download options."""

from typing import Any

import dash
import dash_mantine_components as dmc
from dash_iconify import DashIconify

dash.register_page(__name__, path="/resume", name="Resume")

# =============================================================================
# HUMAN TASK: Upload resume content via Gitea
# Replace the placeholder content below with actual resume data.
# You can upload PDF/DOCX files to portfolio_app/assets/resume/
# =============================================================================

# Resume sections - replace with actual content
RESUME_HEADER = {
    "name": "Leo Miranda",
    "title": "Data Engineer & Analytics Specialist",
    "location": "Toronto, ON, Canada",
    "email": "leobrmi@hotmail.com",
    "phone": "(416) 859-7936",
    "linkedin": "linkedin.com/in/leobmiranda",
    "github": "github.com/leomiranda",
}

RESUME_SUMMARY = (
    "Data Engineer with 8 years of experience building enterprise analytics platforms, "
    "ETL pipelines, and business intelligence solutions. Proven track record of delivering "
    "40% efficiency gains through automation and data infrastructure modernization. "
    "Expert in Python, SQL, and dimensional modeling with deep domain expertise in "
    "contact center operations and energy retail."
)

# Experience - placeholder structure
EXPERIENCE = [
    {
        "title": "Senior Data Analyst / Data Engineer",
        "company": "Summitt Energy",
        "location": "Toronto, ON",
        "period": "2019 - Present",
        "highlights": [
            "Built DataFlow platform from scratch: 21 tables, 1B+ rows, processing 5,000+ daily transactions",
            "Achieved 40% improvement in reporting efficiency through automated ETL pipelines",
            "Reduced call abandon rate by 30% via KPI framework and real-time dashboards",
            "Sole data professional supporting 150+ employees across 9 markets (Canada + US)",
        ],
    },
    {
        "title": "IT Project Coordinator",
        "company": "Petrobras",
        "location": "Rio de Janeiro, Brazil",
        "period": "2015 - 2018",
        "highlights": [
            "Coordinated IT infrastructure projects for Fortune 500 energy company",
            "Managed vendor relationships and project timelines",
            "Developed reporting automation reducing manual effort by 60%",
        ],
    },
    {
        "title": "Project Management Associate",
        "company": "Project Management Institute",
        "location": "Remote",
        "period": "2014 - 2015",
        "highlights": [
            "Supported global project management standards development",
            "CAPM and ITIL certified during this period",
        ],
    },
]

# Skills - organized by category
SKILLS = {
    "Languages": ["Python", "SQL", "R", "VBA"],
    "Data Engineering": [
        "ETL/ELT Pipelines",
        "Dimensional Modeling",
        "dbt",
        "SQLAlchemy",
        "FastAPI",
    ],
    "Databases": ["PostgreSQL", "MSSQL", "Redis"],
    "Visualization": ["Plotly/Dash", "Power BI", "Tableau"],
    "Platforms": ["Genesys Cloud", "Five9", "Zoho CRM", "Azure DevOps"],
    "Currently Learning": ["Azure DP-203", "Airflow", "Snowflake"],
}

# Education
EDUCATION = [
    {
        "degree": "Bachelor of Business Administration",
        "school": "Universidade Federal do Rio de Janeiro",
        "year": "2014",
    },
]

# Certifications
CERTIFICATIONS = [
    "CAPM (Certified Associate in Project Management)",
    "ITIL Foundation",
    "Azure DP-203 (In Progress)",
]

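The header above stores LinkedIn and GitHub as bare host paths, and the page later builds anchors with `f"https://{...}"`. That convention can be checked in isolation; `as_href` is a hypothetical helper named here for illustration only:

```python
RESUME_HEADER = {
    "linkedin": "linkedin.com/in/leobmiranda",
    "github": "github.com/leomiranda",
}


def as_href(bare: str) -> str:
    """Bare host paths need an explicit scheme before they become anchor hrefs."""
    return bare if bare.startswith("https://") else f"https://{bare}"


assert as_href(RESUME_HEADER["linkedin"]) == "https://linkedin.com/in/leobmiranda"
assert as_href("https://example.com") == "https://example.com"  # already absolute
```

Keeping the stored values scheme-less keeps them readable as display text, while the helper (or the inline f-string the page uses) supplies the scheme at render time.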
def create_header_section() -> dmc.Paper:
    """Create the resume header with contact info."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title(RESUME_HEADER["name"], order=1, ta="center"),
                dmc.Text(RESUME_HEADER["title"], size="xl", c="dimmed", ta="center"),
                dmc.Divider(my="sm"),
                dmc.Group(
                    [
                        dmc.Group(
                            [
                                DashIconify(icon="tabler:map-pin", width=16),
                                dmc.Text(RESUME_HEADER["location"], size="sm"),
                            ],
                            gap="xs",
                        ),
                        dmc.Group(
                            [
                                DashIconify(icon="tabler:mail", width=16),
                                dmc.Text(RESUME_HEADER["email"], size="sm"),
                            ],
                            gap="xs",
                        ),
                        dmc.Group(
                            [
                                DashIconify(icon="tabler:phone", width=16),
                                dmc.Text(RESUME_HEADER["phone"], size="sm"),
                            ],
                            gap="xs",
                        ),
                    ],
                    justify="center",
                    gap="lg",
                    wrap="wrap",
                ),
                dmc.Group(
                    [
                        dmc.Anchor(
                            dmc.Group(
                                [
                                    DashIconify(icon="tabler:brand-linkedin", width=16),
                                    dmc.Text("LinkedIn", size="sm"),
                                ],
                                gap="xs",
                            ),
                            href=f"https://{RESUME_HEADER['linkedin']}",
                            target="_blank",
                        ),
                        dmc.Anchor(
                            dmc.Group(
                                [
                                    DashIconify(icon="tabler:brand-github", width=16),
                                    dmc.Text("GitHub", size="sm"),
                                ],
                                gap="xs",
                            ),
                            href=f"https://{RESUME_HEADER['github']}",
                            target="_blank",
                        ),
                    ],
                    justify="center",
                    gap="lg",
                ),
            ],
            gap="sm",
        ),
        p="xl",
        radius="md",
        withBorder=True,
    )

def create_download_section() -> dmc.Group:
    """Create download buttons for resume files."""
    # Note: Buttons disabled until files are uploaded
    return dmc.Group(
        [
            dmc.Button(
                "Download PDF",
                variant="filled",
                leftSection=DashIconify(icon="tabler:file-type-pdf", width=18),
                disabled=True,  # Enable after uploading resume.pdf to assets
            ),
            dmc.Button(
                "Download DOCX",
                variant="outline",
                leftSection=DashIconify(icon="tabler:file-type-docx", width=18),
                disabled=True,  # Enable after uploading resume.docx to assets
            ),
            dmc.Anchor(
                dmc.Button(
                    "View on LinkedIn",
                    variant="subtle",
                    leftSection=DashIconify(icon="tabler:brand-linkedin", width=18),
                ),
                href=f"https://{RESUME_HEADER['linkedin']}",
                target="_blank",
            ),
        ],
        justify="center",
        gap="md",
    )

def create_summary_section() -> dmc.Paper:
    """Create the professional summary section."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title("Professional Summary", order=2, size="h4"),
                dmc.Text(RESUME_SUMMARY, size="md"),
            ],
            gap="sm",
        ),
        p="lg",
        radius="md",
        withBorder=True,
    )

def create_experience_item(exp: dict[str, Any]) -> dmc.Stack:
    """Create a single experience entry."""
    return dmc.Stack(
        [
            dmc.Group(
                [
                    dmc.Text(exp["title"], fw=600),
                    dmc.Text(exp["period"], size="sm", c="dimmed"),
                ],
                justify="space-between",
            ),
            dmc.Text(f"{exp['company']} | {exp['location']}", size="sm", c="dimmed"),
            dmc.List(
                [dmc.ListItem(dmc.Text(h, size="sm")) for h in exp["highlights"]],
                spacing="xs",
                size="sm",
            ),
        ],
        gap="xs",
    )

def create_experience_section() -> dmc.Paper:
    """Create the experience section."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title("Experience", order=2, size="h4"),
                *[create_experience_item(exp) for exp in EXPERIENCE],
            ],
            gap="lg",
        ),
        p="lg",
        radius="md",
        withBorder=True,
    )

def create_skills_section() -> dmc.Paper:
    """Create the skills section with badges."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title("Skills", order=2, size="h4"),
                dmc.SimpleGrid(
                    [
                        dmc.Stack(
                            [
                                dmc.Text(category, fw=600, size="sm"),
                                dmc.Group(
                                    [
                                        dmc.Badge(skill, variant="light", size="sm")
                                        for skill in skills
                                    ],
                                    gap="xs",
                                ),
                            ],
                            gap="xs",
                        )
                        for category, skills in SKILLS.items()
                    ],
                    cols={"base": 1, "sm": 2},
                    spacing="md",
                ),
            ],
            gap="md",
        ),
        p="lg",
        radius="md",
        withBorder=True,
    )

def create_education_section() -> dmc.Paper:
    """Create education and certifications section."""
    return dmc.Paper(
        dmc.Stack(
            [
                dmc.Title("Education & Certifications", order=2, size="h4"),
                dmc.Stack(
                    [
                        dmc.Stack(
                            [
                                dmc.Text(edu["degree"], fw=600),
                                dmc.Text(
                                    f"{edu['school']} | {edu['year']}",
                                    size="sm",
                                    c="dimmed",
                                ),
                            ],
                            gap=0,
                        )
                        for edu in EDUCATION
                    ],
                    gap="sm",
                ),
                dmc.Divider(my="sm"),
                dmc.Group(
                    [
                        dmc.Badge(cert, variant="outline", size="md")
                        for cert in CERTIFICATIONS
                    ],
                    gap="xs",
                ),
            ],
            gap="md",
        ),
        p="lg",
        radius="md",
        withBorder=True,
    )

layout = dmc.Container(
    dmc.Stack(
        [
            create_header_section(),
            create_download_section(),
            dmc.Alert(
                "Resume files (PDF/DOCX) will be available for download once uploaded. "
                "The inline content below is a preview.",
                title="Downloads Coming Soon",
                color="blue",
                variant="light",
            ),
            create_summary_section(),
            create_experience_section(),
            create_skills_section(),
            create_education_section(),
            dmc.Space(h=40),
        ],
        gap="lg",
    ),
    size="md",
    py="xl",
)
@@ -18,8 +18,7 @@ _CMHC_ZONES_PATH = Path("data/toronto/raw/geo/cmhc_zones.geojson")
_cmhc_parser = CMHCZoneParser(_CMHC_ZONES_PATH) if _CMHC_ZONES_PATH.exists() else None
CMHC_ZONES_GEOJSON = _cmhc_parser.get_geojson_for_choropleth() if _cmhc_parser else None

# Load Toronto neighbourhoods GeoJSON for purchase choropleth maps
# Note: This is a temporary proxy until TRREB district boundaries are digitized
# Load Toronto neighbourhoods GeoJSON for choropleth maps
_NEIGHBOURHOODS_PATH = Path("data/toronto/raw/geo/toronto_neighbourhoods.geojson")
_neighbourhood_parser = (
    NeighbourhoodParser(_NEIGHBOURHOODS_PATH) if _NEIGHBOURHOODS_PATH.exists() else None
@@ -30,9 +29,7 @@ NEIGHBOURHOODS_GEOJSON = (
    else None
)

# Sample purchase data for all 158 City of Toronto neighbourhoods
# Note: This is SAMPLE DATA until TRREB district boundaries are digitized (Issue #25)
# Once TRREB boundaries are available, this will be replaced with real TRREB data by district
# Sample data for all 158 City of Toronto neighbourhoods
SAMPLE_PURCHASE_DATA = [
    {
        "neighbourhood_id": 1,
@@ -1486,11 +1483,7 @@ SAMPLE_TIME_SERIES_DATA = [
    Input("toronto-year-selector", "value"),
)
def update_purchase_choropleth(metric: str, year: str) -> go.Figure:
    """Update the purchase market choropleth map.

    Note: Currently using City of Toronto neighbourhoods as a proxy.
    Will switch to TRREB districts when boundaries are digitized.
    """
    """Update the neighbourhood choropleth map."""
    return create_choropleth_figure(
        geojson=NEIGHBOURHOODS_GEOJSON,
        data=SAMPLE_PURCHASE_DATA,

@@ -257,9 +257,8 @@ def create_data_notice() -> dmc.Alert:
    return dmc.Alert(
        children=[
            dmc.Text(
                "This dashboard uses TRREB and CMHC data. "
                "Geographic boundaries require QGIS digitization to enable choropleth maps. "
                "Sample data is shown below.",
                "This dashboard displays Toronto neighbourhood and CMHC rental data. "
                "Sample data is shown for demonstration purposes.",
                size="sm",
            ),
        ],

@@ -46,42 +46,8 @@ def layout() -> dmc.Container:
    mb="lg",
    children=[
        dmc.Title("Data Sources", order=2, mb="md"),
        # TRREB
        dmc.Title("Purchase Data: TRREB", order=3, size="h4", mb="sm"),
        dmc.Text(
            [
                "The Toronto Regional Real Estate Board (TRREB) publishes monthly ",
                html.Strong("Market Watch"),
                " reports containing aggregate statistics for residential real estate "
                "transactions across the Greater Toronto Area.",
            ],
            mb="sm",
        ),
        dmc.List(
            [
                dmc.ListItem("Source: TRREB Market Watch Reports (PDF)"),
                dmc.ListItem("Geographic granularity: ~35 TRREB Districts"),
                dmc.ListItem("Temporal granularity: Monthly"),
                dmc.ListItem("Coverage: 2021-present"),
                dmc.ListItem(
                    [
                        "Metrics: Sales count, average/median price, new listings, ",
                        "active listings, days on market, sale-to-list ratio",
                    ]
                ),
            ],
            mb="md",
        ),
        dmc.Anchor(
            "TRREB Market Watch Archive",
            href="https://trreb.ca/market-data/market-watch/market-watch-archive/",
            target="_blank",
            mb="lg",
        ),
        # CMHC
        dmc.Title(
            "Rental Data: CMHC", order=3, size="h4", mb="sm", mt="md"
        ),
        dmc.Title("Rental Data: CMHC", order=3, size="h4", mb="sm"),
        dmc.Text(
            [
                "Canada Mortgage and Housing Corporation (CMHC) conducts the annual ",
@@ -124,28 +90,17 @@ def layout() -> dmc.Container:
    mb="lg",
    children=[
        dmc.Title("Geographic Considerations", order=2, mb="md"),
        dmc.Alert(
            title="Important: Non-Aligned Geographies",
            color="yellow",
            mb="md",
            children=[
                "TRREB Districts and CMHC Zones do ",
                html.Strong("not"),
                " align geographically. They are displayed as separate layers and "
                "should not be directly compared at the sub-regional level.",
            ],
        ),
        dmc.Text(
            "The dashboard presents three geographic layers:",
            "The dashboard presents two geographic layers:",
            mb="sm",
        ),
        dmc.List(
            [
                dmc.ListItem(
                    [
                        html.Strong("TRREB Districts (~35): "),
                        "Used for purchase/sales data visualization. "
                        "Districts are defined by TRREB and labeled with codes like W01, C01, E01.",
                        html.Strong("City Neighbourhoods (158): "),
                        "Official City of Toronto neighbourhood boundaries, "
                        "used for neighbourhood-level analysis.",
                    ]
                ),
                dmc.ListItem(
@@ -155,13 +110,6 @@ def layout() -> dmc.Container:
                        "Zones are aligned with Census Tract boundaries.",
                    ]
                ),
                dmc.ListItem(
                    [
                        html.Strong("City Neighbourhoods (158): "),
                        "Reference overlay only. "
                        "These are official City of Toronto neighbourhood boundaries.",
                    ]
                ),
            ],
        ),
    ],
@@ -212,22 +160,15 @@ def layout() -> dmc.Container:
                dmc.ListItem(
                    [
                        html.Strong("Reporting Lag: "),
                        "TRREB data reflects closed transactions, which may lag market "
                        "conditions by 1-3 months. CMHC data is annual.",
                    ]
                ),
                dmc.ListItem(
                    [
                        html.Strong("Geographic Boundaries: "),
                        "TRREB district boundaries were manually digitized from reference maps "
                        "and may contain minor inaccuracies.",
                        "CMHC rental data is annual (October survey). "
                        "Other data sources may have different update frequencies.",
                    ]
                ),
                dmc.ListItem(
                    [
                        html.Strong("Data Suppression: "),
                        "Some cells may be suppressed for confidentiality when transaction "
                        "counts are below thresholds.",
                        "Some cells may be suppressed for confidentiality when counts "
                        "are below thresholds.",
                    ]
                ),
            ],

@@ -8,98 +8,6 @@ from datetime import date
from typing import Any


def get_demo_districts() -> list[dict[str, Any]]:
    """Return sample TRREB district data."""
    return [
        {"district_code": "W01", "district_name": "Long Branch", "area_type": "West"},
        {"district_code": "W02", "district_name": "Mimico", "area_type": "West"},
        {
            "district_code": "W03",
            "district_name": "Kingsway South",
            "area_type": "West",
        },
        {"district_code": "W04", "district_name": "Edenbridge", "area_type": "West"},
        {"district_code": "W05", "district_name": "Islington", "area_type": "West"},
        {"district_code": "W06", "district_name": "Rexdale", "area_type": "West"},
        {"district_code": "W07", "district_name": "Willowdale", "area_type": "West"},
        {"district_code": "W08", "district_name": "York", "area_type": "West"},
        {
            "district_code": "C01",
            "district_name": "Downtown Core",
            "area_type": "Central",
        },
        {"district_code": "C02", "district_name": "Annex", "area_type": "Central"},
        {
            "district_code": "C03",
            "district_name": "Forest Hill",
            "area_type": "Central",
        },
        {
            "district_code": "C04",
            "district_name": "Lawrence Park",
            "area_type": "Central",
        },
        {
            "district_code": "C06",
            "district_name": "Willowdale East",
            "area_type": "Central",
        },
        {"district_code": "C07", "district_name": "Thornhill", "area_type": "Central"},
        {"district_code": "C08", "district_name": "Waterfront", "area_type": "Central"},
        {"district_code": "E01", "district_name": "Leslieville", "area_type": "East"},
        {"district_code": "E02", "district_name": "The Beaches", "area_type": "East"},
        {"district_code": "E03", "district_name": "Danforth", "area_type": "East"},
        {"district_code": "E04", "district_name": "Birch Cliff", "area_type": "East"},
        {"district_code": "E05", "district_name": "Scarborough", "area_type": "East"},
    ]


def get_demo_purchase_data() -> list[dict[str, Any]]:
    """Return sample purchase data for time series visualization."""
    import random

    random.seed(42)
    data = []

    base_prices = {
        "W01": 850000,
        "C01": 1200000,
        "E01": 950000,
    }

    for year in [2024, 2025]:
        for month in range(1, 13):
            if year == 2025 and month > 12:
                break

            for district, base_price in base_prices.items():
                # Add some randomness and trend
                trend = (year - 2024) * 12 + month
                price_variation = random.uniform(-0.05, 0.05)
                trend_factor = 1 + (trend * 0.002)  # Slight upward trend

                avg_price = int(base_price * trend_factor * (1 + price_variation))
                sales = random.randint(50, 200)

                data.append(
                    {
                        "district_code": district,
                        "full_date": date(year, month, 1),
                        "year": year,
                        "month": month,
                        "avg_price": avg_price,
                        "median_price": int(avg_price * 0.95),
                        "sales_count": sales,
                        "new_listings": int(sales * random.uniform(1.2, 1.8)),
                        "active_listings": int(sales * random.uniform(2.0, 3.5)),
                        "days_on_market": random.randint(15, 45),
                        "sale_to_list_ratio": round(random.uniform(0.95, 1.05), 2),
                    }
                )

    return data

def get_demo_rental_data() -> list[dict[str, Any]]:
|
||||
"""Return sample rental data for visualization."""
|
||||
data = []
|
||||
@@ -219,23 +127,6 @@ def get_demo_policy_events() -> list[dict[str, Any]]:
def get_demo_summary_metrics() -> dict[str, dict[str, Any]]:
    """Return summary metrics for KPI cards."""
    return {
        "avg_price": {
            "value": 1067968,
            "title": "Avg. Price (2025)",
            "delta": -4.7,
            "delta_suffix": "%",
            "prefix": "$",
            "format_spec": ",.0f",
            "positive_is_good": True,
        },
        "total_sales": {
            "value": 67610,
            "title": "Total Sales (2024)",
            "delta": 2.6,
            "delta_suffix": "%",
            "format_spec": ",.0f",
            "positive_is_good": True,
        },
        "avg_rent": {
            "value": 2450,
            "title": "Avg. Rent (2025)",
@@ -8,9 +8,7 @@ from .dimensions import (
    load_neighbourhoods,
    load_policy_events,
    load_time_dimension,
    load_trreb_districts,
)
from .trreb import load_trreb_purchases, load_trreb_record

__all__ = [
    # Base utilities
@@ -20,13 +18,10 @@ __all__ = [
    # Dimension loaders
    "generate_date_key",
    "load_time_dimension",
    "load_trreb_districts",
    "load_cmhc_zones",
    "load_neighbourhoods",
    "load_policy_events",
    # Fact loaders
    "load_trreb_purchases",
    "load_trreb_record",
    "load_cmhc_rentals",
    "load_cmhc_record",
]
@@ -9,13 +9,11 @@ from portfolio_app.toronto.models import (
    DimNeighbourhood,
    DimPolicyEvent,
    DimTime,
    DimTRREBDistrict,
)
from portfolio_app.toronto.schemas import (
    CMHCZone,
    Neighbourhood,
    PolicyEvent,
    TRREBDistrict,
)

from .base import get_session, upsert_by_key
@@ -97,42 +95,6 @@ def load_time_dimension(
        return _load(sess)


def load_trreb_districts(
    districts: list[TRREBDistrict],
    session: Session | None = None,
) -> int:
    """Load TRREB district dimension.

    Args:
        districts: List of validated district schemas.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        records = []
        for d in districts:
            dim = DimTRREBDistrict(
                district_code=d.district_code,
                district_name=d.district_name,
                area_type=d.area_type.value,
                geometry=d.geometry_wkt,
            )
            records.append(dim)

        inserted, updated = upsert_by_key(
            sess, DimTRREBDistrict, records, ["district_code"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_cmhc_zones(
    zones: list[CMHCZone],
    session: Session | None = None,
@@ -1,129 +0,0 @@
"""Loader for TRREB purchase data into fact_purchases."""

from sqlalchemy.orm import Session

from portfolio_app.toronto.models import DimTime, DimTRREBDistrict, FactPurchases
from portfolio_app.toronto.schemas import TRREBMonthlyRecord, TRREBMonthlyReport

from .base import get_session, upsert_by_key
from .dimensions import generate_date_key


def load_trreb_purchases(
    report: TRREBMonthlyReport,
    session: Session | None = None,
) -> int:
    """Load TRREB monthly report data into fact_purchases.

    Args:
        report: Validated TRREB monthly report containing records.
        session: Optional existing session.

    Returns:
        Number of records loaded.
    """

    def _load(sess: Session) -> int:
        # Get district key mapping
        districts = sess.query(DimTRREBDistrict).all()
        district_map = {d.district_code: d.district_key for d in districts}

        # Build date key from report date
        date_key = generate_date_key(report.report_date)

        # Verify time dimension exists
        time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
        if not time_dim:
            raise ValueError(
                f"Time dimension not found for date_key {date_key}. "
                "Load time dimension first."
            )

        records = []
        for record in report.records:
            district_key = district_map.get(record.area_code)
            if not district_key:
                # Skip records for unknown districts (e.g., aggregate rows)
                continue

            fact = FactPurchases(
                date_key=date_key,
                district_key=district_key,
                sales_count=record.sales,
                dollar_volume=record.dollar_volume,
                avg_price=record.avg_price,
                median_price=record.median_price,
                new_listings=record.new_listings,
                active_listings=record.active_listings,
                avg_dom=record.avg_dom,
                avg_sp_lp=record.avg_sp_lp,
            )
            records.append(fact)

        inserted, updated = upsert_by_key(
            sess, FactPurchases, records, ["date_key", "district_key"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


def load_trreb_record(
    record: TRREBMonthlyRecord,
    session: Session | None = None,
) -> int:
    """Load a single TRREB record into fact_purchases.

    Args:
        record: Single validated TRREB monthly record.
        session: Optional existing session.

    Returns:
        Number of records loaded (0 or 1).
    """

    def _load(sess: Session) -> int:
        # Get district key
        district = (
            sess.query(DimTRREBDistrict)
            .filter_by(district_code=record.area_code)
            .first()
        )
        if not district:
            return 0

        date_key = generate_date_key(record.report_date)

        # Verify time dimension exists
        time_dim = sess.query(DimTime).filter_by(date_key=date_key).first()
        if not time_dim:
            raise ValueError(
                f"Time dimension not found for date_key {date_key}. "
                "Load time dimension first."
            )

        fact = FactPurchases(
            date_key=date_key,
            district_key=district.district_key,
            sales_count=record.sales,
            dollar_volume=record.dollar_volume,
            avg_price=record.avg_price,
            median_price=record.median_price,
            new_listings=record.new_listings,
            active_listings=record.active_listings,
            avg_dom=record.avg_dom,
            avg_sp_lp=record.avg_sp_lp,
        )

        inserted, updated = upsert_by_key(
            sess, FactPurchases, [fact], ["date_key", "district_key"]
        )
        return inserted + updated

    if session:
        return _load(session)
    with get_session() as sess:
        return _load(sess)
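Both loaders above follow the same session-optional pattern: an inner `_load` does the work, and the wrapper either reuses a caller-supplied session or opens its own. A dependency-free sketch of that pattern (the `get_session` stand-in below is illustrative, not the project's SQLAlchemy factory):

```python
from contextlib import contextmanager


@contextmanager
def get_session():
    """Stand-in for a real session factory (commit/rollback omitted)."""
    yield {"rows": []}


def load_records(records, session=None):
    """Load records, reusing an existing session when one is provided."""

    def _load(sess):
        sess["rows"].extend(records)
        return len(records)

    if session is not None:
        return _load(session)
    with get_session() as sess:
        return _load(sess)


print(load_records(["a", "b"]))  # -> 2
```

Passing `session` lets callers batch several loaders into one transaction; omitting it keeps single calls self-contained.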
@@ -6,9 +6,8 @@ from .dimensions import (
    DimNeighbourhood,
    DimPolicyEvent,
    DimTime,
    DimTRREBDistrict,
)
from .facts import FactPurchases, FactRentals
from .facts import FactRentals

__all__ = [
    # Base
@@ -18,11 +17,9 @@ __all__ = [
    "create_tables",
    # Dimensions
    "DimTime",
    "DimTRREBDistrict",
    "DimCMHCZone",
    "DimNeighbourhood",
    "DimPolicyEvent",
    # Facts
    "FactPurchases",
    "FactRentals",
]
@@ -23,20 +23,6 @@ class DimTime(Base):
    is_month_start: Mapped[bool] = mapped_column(Boolean, default=True)


class DimTRREBDistrict(Base):
    """TRREB district dimension table with PostGIS geometry."""

    __tablename__ = "dim_trreb_district"

    district_key: Mapped[int] = mapped_column(
        Integer, primary_key=True, autoincrement=True
    )
    district_code: Mapped[str] = mapped_column(String(3), nullable=False, unique=True)
    district_name: Mapped[str] = mapped_column(String(100), nullable=False)
    area_type: Mapped[str] = mapped_column(String(10), nullable=False)
    geometry = mapped_column(Geometry("POLYGON", srid=4326), nullable=True)


class DimCMHCZone(Base):
    """CMHC zone dimension table with PostGIS geometry."""

@@ -6,37 +6,6 @@ from sqlalchemy.orm import Mapped, mapped_column, relationship
from .base import Base


class FactPurchases(Base):
    """Fact table for TRREB purchase/sales data.

    Grain: One row per district per month.
    """

    __tablename__ = "fact_purchases"

    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    date_key: Mapped[int] = mapped_column(
        Integer, ForeignKey("dim_time.date_key"), nullable=False
    )
    district_key: Mapped[int] = mapped_column(
        Integer, ForeignKey("dim_trreb_district.district_key"), nullable=False
    )
    sales_count: Mapped[int] = mapped_column(Integer, nullable=False)
    dollar_volume: Mapped[float] = mapped_column(Numeric(15, 2), nullable=False)
    avg_price: Mapped[float] = mapped_column(Numeric(12, 2), nullable=False)
    median_price: Mapped[float] = mapped_column(Numeric(12, 2), nullable=False)
    new_listings: Mapped[int] = mapped_column(Integer, nullable=False)
    active_listings: Mapped[int] = mapped_column(Integer, nullable=False)
    avg_dom: Mapped[int] = mapped_column(Integer, nullable=False)  # Days on market
    avg_sp_lp: Mapped[float] = mapped_column(
        Numeric(5, 2), nullable=False
    )  # Sale/List ratio

    # Relationships
    time = relationship("DimTime", backref="purchases")
    district = relationship("DimTRREBDistrict", backref="purchases")


class FactRentals(Base):
    """Fact table for CMHC rental market data.

@@ -4,17 +4,13 @@ from .cmhc import CMHCParser
from .geo import (
    CMHCZoneParser,
    NeighbourhoodParser,
    TRREBDistrictParser,
    load_geojson,
)
from .trreb import TRREBParser

__all__ = [
    "TRREBParser",
    "CMHCParser",
    # GeoJSON parsers
    "CMHCZoneParser",
    "TRREBDistrictParser",
    "NeighbourhoodParser",
    "load_geojson",
]
@@ -13,8 +13,7 @@ from pyproj import Transformer
from shapely.geometry import mapping, shape
from shapely.ops import transform

from portfolio_app.toronto.schemas import CMHCZone, Neighbourhood, TRREBDistrict
from portfolio_app.toronto.schemas.dimensions import AreaType
from portfolio_app.toronto.schemas import CMHCZone, Neighbourhood

# Transformer for reprojecting from Web Mercator to WGS84
_TRANSFORMER_3857_TO_4326 = Transformer.from_crs(
@@ -221,135 +220,6 @@ class CMHCZoneParser:
        return {"type": "FeatureCollection", "features": features}


class TRREBDistrictParser:
    """Parser for TRREB district boundary GeoJSON files.

    TRREB district boundaries are manually digitized from the TRREB PDF map
    using QGIS.

    Expected GeoJSON properties:
    - district_code: District code (W01, C01, E01, etc.)
    - district_name: District name
    - area_type: West, Central, East, or North
    """

    CODE_PROPERTIES = [
        "district_code",
        "District_Code",
        "DISTRICT_CODE",
        "districtcode",
        "code",
    ]
    NAME_PROPERTIES = [
        "district_name",
        "District_Name",
        "DISTRICT_NAME",
        "districtname",
        "name",
        "NAME",
    ]
    AREA_PROPERTIES = [
        "area_type",
        "Area_Type",
        "AREA_TYPE",
        "areatype",
        "area",
        "type",
    ]

    def __init__(self, geojson_path: Path) -> None:
        """Initialize parser with path to GeoJSON file."""
        self.geojson_path = geojson_path
        self._geojson: dict[str, Any] | None = None

    @property
    def geojson(self) -> dict[str, Any]:
        """Lazy-load and return raw GeoJSON data."""
        if self._geojson is None:
            self._geojson = load_geojson(self.geojson_path)
        return self._geojson

    def _find_property(
        self, properties: dict[str, Any], candidates: list[str]
    ) -> str | None:
        """Find a property value by checking multiple candidate names."""
        for name in candidates:
            if name in properties and properties[name] is not None:
                return str(properties[name])
        return None

    def _infer_area_type(self, district_code: str) -> AreaType:
        """Infer area type from district code prefix."""
        prefix = district_code[0].upper()
        mapping = {"W": AreaType.WEST, "C": AreaType.CENTRAL, "E": AreaType.EAST}
        return mapping.get(prefix, AreaType.NORTH)

    def parse(self) -> list[TRREBDistrict]:
        """Parse GeoJSON and return list of TRREBDistrict schemas."""
        districts = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            geom = feature.get("geometry")

            district_code = self._find_property(props, self.CODE_PROPERTIES)
            district_name = self._find_property(props, self.NAME_PROPERTIES)
            area_type_str = self._find_property(props, self.AREA_PROPERTIES)

            if not district_code:
                raise ValueError(
                    f"District code not found in properties: {list(props.keys())}"
                )
            if not district_name:
                district_name = district_code

            # Infer or parse area type
            if area_type_str:
                try:
                    area_type = AreaType(area_type_str)
                except ValueError:
                    area_type = self._infer_area_type(district_code)
            else:
                area_type = self._infer_area_type(district_code)

            geometry_wkt = geometry_to_wkt(geom) if geom else None

            districts.append(
                TRREBDistrict(
                    district_code=district_code,
                    district_name=district_name,
                    area_type=area_type,
                    geometry_wkt=geometry_wkt,
                )
            )

        return districts

    def get_geojson_for_choropleth(
        self, key_property: str = "district_code"
    ) -> dict[str, Any]:
        """Get GeoJSON formatted for Plotly choropleth maps."""
        features = []
        for feature in self.geojson.get("features", []):
            props = feature.get("properties", {})
            new_props = dict(props)

            district_code = self._find_property(props, self.CODE_PROPERTIES)
            district_name = self._find_property(props, self.NAME_PROPERTIES)

            new_props["district_code"] = district_code
            new_props["district_name"] = district_name or district_code

            features.append(
                {
                    "type": "Feature",
                    "properties": new_props,
                    "geometry": feature.get("geometry"),
                }
            )

        return {"type": "FeatureCollection", "features": features}


class NeighbourhoodParser:
    """Parser for City of Toronto neighbourhood boundary GeoJSON files.

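The parser's candidate-name lookup is worth calling out: real-world GeoJSON property names vary in casing, so `_find_property` scans a preference-ordered list. A minimal stand-alone sketch of the same lookup (illustrative, no GeoJSON needed):

```python
def find_property(properties, candidates):
    """Return the first candidate key with a non-null value, as a string."""
    for name in candidates:
        if name in properties and properties[name] is not None:
            return str(properties[name])
    return None


props = {"DISTRICT_CODE": "W01", "name": None}
code = find_property(props, ["district_code", "DISTRICT_CODE"])  # -> "W01"
name = find_property(props, ["district_name", "name"])  # -> None (null values skipped)
```

Keeping the canonical lowercase name first means well-formed files short-circuit on the first check.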
@@ -1,82 +0,0 @@
"""TRREB PDF parser for monthly market watch reports.

This module provides the structure for parsing TRREB (Toronto Regional Real Estate Board)
monthly Market Watch PDF reports into structured data.
"""

from pathlib import Path
from typing import Any

from portfolio_app.toronto.schemas import TRREBMonthlyRecord, TRREBMonthlyReport


class TRREBParser:
    """Parser for TRREB Market Watch PDF reports.

    TRREB publishes monthly Market Watch reports as PDFs containing:
    - Summary statistics by area (416, 905, Total)
    - District-level breakdowns
    - Year-over-year comparisons

    The parser extracts tabular data from these PDFs and validates
    against the TRREBMonthlyRecord schema.
    """

    def __init__(self, pdf_path: Path) -> None:
        """Initialize parser with path to PDF file.

        Args:
            pdf_path: Path to the TRREB Market Watch PDF file.
        """
        self.pdf_path = pdf_path
        self._validate_path()

    def _validate_path(self) -> None:
        """Validate that the PDF path exists and is readable."""
        if not self.pdf_path.exists():
            raise FileNotFoundError(f"PDF not found: {self.pdf_path}")
        if not self.pdf_path.suffix.lower() == ".pdf":
            raise ValueError(f"Expected PDF file, got: {self.pdf_path.suffix}")

    def parse(self) -> TRREBMonthlyReport:
        """Parse the PDF and return structured data.

        Returns:
            TRREBMonthlyReport containing all extracted records.

        Raises:
            NotImplementedError: PDF parsing not yet implemented.
        """
        raise NotImplementedError(
            "PDF parsing requires pdfplumber/tabula-py. "
            "Implementation pending Sprint 4 data ingestion."
        )

    def _extract_tables(self) -> list[dict[str, Any]]:
        """Extract raw tables from PDF pages.

        Returns:
            List of dictionaries representing table data.
        """
        raise NotImplementedError("Table extraction not yet implemented.")

    def _parse_district_table(
        self, table_data: list[dict[str, Any]]
    ) -> list[TRREBMonthlyRecord]:
        """Parse district-level statistics table.

        Args:
            table_data: Raw table data extracted from PDF.

        Returns:
            List of validated TRREBMonthlyRecord objects.
        """
        raise NotImplementedError("District table parsing not yet implemented.")

    def _infer_report_date(self) -> tuple[int, int]:
        """Infer report year and month from PDF filename or content.

        Returns:
            Tuple of (year, month).
        """
        raise NotImplementedError("Date inference not yet implemented.")
@@ -2,7 +2,6 @@

from .cmhc import BedroomType, CMHCAnnualSurvey, CMHCRentalRecord, ReliabilityCode
from .dimensions import (
    AreaType,
    CMHCZone,
    Confidence,
    ExpectedDirection,
@@ -11,14 +10,9 @@ from .dimensions import (
    PolicyEvent,
    PolicyLevel,
    TimeDimension,
    TRREBDistrict,
)
from .trreb import TRREBMonthlyRecord, TRREBMonthlyReport

__all__ = [
    # TRREB
    "TRREBMonthlyRecord",
    "TRREBMonthlyReport",
    # CMHC
    "CMHCRentalRecord",
    "CMHCAnnualSurvey",
@@ -26,12 +20,10 @@ __all__ = [
    "ReliabilityCode",
    # Dimensions
    "TimeDimension",
    "TRREBDistrict",
    "CMHCZone",
    "Neighbourhood",
    "PolicyEvent",
    # Enums
    "AreaType",
    "PolicyLevel",
    "PolicyCategory",
    "ExpectedDirection",
@@ -41,15 +41,6 @@ class Confidence(str, Enum):
    LOW = "low"


class AreaType(str, Enum):
    """TRREB area type."""

    WEST = "West"
    CENTRAL = "Central"
    EAST = "East"
    NORTH = "North"


class TimeDimension(BaseModel):
    """Schema for time dimension record."""

@@ -62,15 +53,6 @@ class TimeDimension(BaseModel):
    is_month_start: bool = True


class TRREBDistrict(BaseModel):
    """Schema for TRREB district dimension."""

    district_code: str = Field(max_length=3, description="W01, C01, E01, etc.")
    district_name: str = Field(max_length=100)
    area_type: AreaType
    geometry_wkt: str | None = Field(default=None, description="WKT geometry string")


class CMHCZone(BaseModel):
    """Schema for CMHC zone dimension."""

@@ -1,52 +0,0 @@
"""Pydantic schemas for TRREB monthly market data."""

from datetime import date
from decimal import Decimal

from pydantic import BaseModel, Field


class TRREBMonthlyRecord(BaseModel):
    """Schema for a single TRREB monthly summary record.

    Represents aggregated sales data for one district in one month.
    """

    report_date: date = Field(description="First of month (YYYY-MM-01)")
    area_code: str = Field(
        max_length=3, description="District code (W01, C01, E01, etc.)"
    )
    area_name: str = Field(max_length=100, description="District name")
    area_type: str = Field(max_length=10, description="West / Central / East / North")
    sales: int = Field(ge=0, description="Number of transactions")
    dollar_volume: Decimal = Field(ge=0, description="Total sales volume ($)")
    avg_price: Decimal = Field(ge=0, description="Average sale price ($)")
    median_price: Decimal = Field(ge=0, description="Median sale price ($)")
    new_listings: int = Field(ge=0, description="New listings count")
    active_listings: int = Field(ge=0, description="Active listings at month end")
    avg_sp_lp: Decimal = Field(
        ge=0, le=200, description="Avg sale price / list price ratio (%)"
    )
    avg_dom: int = Field(ge=0, description="Average days on market")

    model_config = {"str_strip_whitespace": True}


class TRREBMonthlyReport(BaseModel):
    """Schema for a complete TRREB monthly report.

    Contains all district records for a single month.
    """

    report_date: date
    records: list[TRREBMonthlyRecord]

    @property
    def total_sales(self) -> int:
        """Total sales across all districts."""
        return sum(r.sales for r in self.records)

    @property
    def district_count(self) -> int:
        """Number of districts in report."""
        return len(self.records)
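The report's aggregate properties are plain folds over the record list; a stdlib-only sketch of the same idea (field names mirror the schema but no pydantic is required, and the sample numbers are made up):

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    area_code: str
    sales: int


@dataclass
class Report:
    records: list[Record] = field(default_factory=list)

    @property
    def total_sales(self) -> int:
        # Sum sales across all district records.
        return sum(r.sales for r in self.records)

    @property
    def district_count(self) -> int:
        return len(self.records)


report = Report([Record("W01", 120), Record("C01", 340)])
print(report.total_sales, report.district_count)  # -> 460 2
```

Computing the totals as properties keeps the schema a pure container: nothing to keep in sync when records are added.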
9	portfolio_app/utils/__init__.py	Normal file
@@ -0,0 +1,9 @@
"""Utility modules for the portfolio app."""

from portfolio_app.utils.markdown_loader import (
    get_all_articles,
    get_article,
    render_markdown,
)

__all__ = ["get_all_articles", "get_article", "render_markdown"]
109	portfolio_app/utils/markdown_loader.py	Normal file
@@ -0,0 +1,109 @@
"""Markdown article loader with frontmatter support."""

from pathlib import Path
from typing import TypedDict

import frontmatter
import markdown
from markdown.extensions.codehilite import CodeHiliteExtension
from markdown.extensions.fenced_code import FencedCodeExtension
from markdown.extensions.tables import TableExtension
from markdown.extensions.toc import TocExtension

# Content directory (relative to this file's package)
CONTENT_DIR = Path(__file__).parent.parent / "content" / "blog"


class ArticleMeta(TypedDict):
    """Article metadata from frontmatter."""

    slug: str
    title: str
    date: str
    description: str
    tags: list[str]
    status: str  # "published" or "draft"


class Article(TypedDict):
    """Full article with metadata and content."""

    meta: ArticleMeta
    content: str
    html: str


def render_markdown(content: str) -> str:
    """Convert markdown to HTML with syntax highlighting.

    Args:
        content: Raw markdown string.

    Returns:
        HTML string with syntax-highlighted code blocks.
    """
    md = markdown.Markdown(
        extensions=[
            FencedCodeExtension(),
            CodeHiliteExtension(css_class="highlight", guess_lang=False),
            TableExtension(),
            TocExtension(permalink=True),
            "nl2br",
        ]
    )
    return str(md.convert(content))


def get_article(slug: str) -> Article | None:
    """Load a single article by slug.

    Args:
        slug: Article slug (filename without .md extension).

    Returns:
        Article dict or None if not found.
    """
    filepath = CONTENT_DIR / f"{slug}.md"
    if not filepath.exists():
        return None

    post = frontmatter.load(filepath)

    meta: ArticleMeta = {
        "slug": slug,
        "title": post.get("title", slug.replace("-", " ").title()),
        "date": str(post.get("date", "")),
        "description": post.get("description", ""),
        "tags": post.get("tags", []),
        "status": post.get("status", "published"),
    }

    return {
        "meta": meta,
        "content": post.content,
        "html": render_markdown(post.content),
    }


def get_all_articles(include_drafts: bool = False) -> list[Article]:
    """Load all articles from the content directory.

    Args:
        include_drafts: If True, include articles with status="draft".

    Returns:
        List of articles sorted by date (newest first).
    """
    if not CONTENT_DIR.exists():
        return []

    articles: list[Article] = []
    for filepath in CONTENT_DIR.glob("*.md"):
        slug = filepath.stem
        article = get_article(slug)
        if article and (include_drafts or article["meta"]["status"] == "published"):
            articles.append(article)

    # Sort by date descending
    articles.sort(key=lambda a: a["meta"]["date"], reverse=True)
    return articles
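`get_article` leans on python-frontmatter to split YAML metadata from the markdown body. A rough stdlib-only approximation of that split (purely illustrative; it handles only the simple `---`-delimited case with `key: value` pairs, none of the YAML edge cases the real library covers):

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a '---'-delimited frontmatter block from the markdown body."""
    if text.startswith("---\n"):
        # split("---\n", 2) yields: "" before the opener, the header, the body.
        _, header, body = text.split("---\n", 2)
        meta = dict(
            line.split(": ", 1)
            for line in header.strip().splitlines()
            if ": " in line
        )
        return meta, body
    return {}, text


meta, body = split_frontmatter("---\ntitle: Hello\nstatus: draft\n---\n# Hi\n")
# meta == {"title": "Hello", "status": "draft"}; body == "# Hi\n"
```

Documents without a leading `---` fall through unchanged with empty metadata, mirroring frontmatter's permissive behaviour.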
@@ -48,6 +48,11 @@ dependencies = [
    # Utilities
    "python-dotenv>=1.0",
    "httpx>=0.28",

    # Blog/Markdown
    "python-frontmatter>=1.1",
    "markdown>=3.5",
    "pygments>=2.17",
]

[project.optional-dependencies]
@@ -148,5 +153,7 @@ module = [
    "pdfplumber.*",
    "tabula.*",
    "pydantic_settings.*",
    "frontmatter.*",
    "markdown.*",
]
ignore_missing_imports = true