staging #96

Merged
lmiranda merged 90 commits from staging into main 2026-02-01 21:33:13 +00:00
7 changed files with 794 additions and 1190 deletions
Showing only changes of commit 4818c53fd2


@@ -6,8 +6,8 @@ Working context for Claude Code on the Analytics Portfolio project.
## Project Status
**Current Sprint**: 9 (Neighbourhood Dashboard Transition) - **COMPLETE**
**Phase**: Toronto Neighbourhood Dashboard - Phase 6 & 7 Done
**Last Completed Sprint**: 9 (Neighbourhood Dashboard Transition)
**Current State**: Ready for deployment sprint or new features
**Branch**: `development` (feature branches merge here)
---
@@ -121,6 +121,7 @@ portfolio_app/
│ └── toronto/
│ ├── dashboard.py # Dashboard -> /toronto
│ ├── methodology.py # Methodology -> /toronto/methodology
│ ├── tabs/ # 5 tab layouts (overview, housing, safety, demographics, amenities)
│ └── callbacks/ # Dashboard interactions
├── components/ # Shared UI (sidebar, cards, controls)
│ ├── metric_card.py # KPI card component
@@ -267,9 +268,9 @@ All scripts in `scripts/`:
| Document | Location | Use When |
|----------|----------|----------|
| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions |
| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |
| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions, completed work |
| Developer guide | `docs/CONTRIBUTING.md` | How to add pages, blog posts, tabs |
| Lessons learned | `docs/project-lessons-learned/INDEX.md` | Past issues and solutions |
---
@@ -340,4 +341,4 @@ Every Gitea issue should include:
---
*Last Updated: Sprint 9*
*Last Updated: January 2026 (Post-Sprint 9)*

README.md

@@ -1,36 +1,42 @@
# Analytics Portfolio
A data analytics portfolio showcasing end-to-end data engineering, visualization, and analysis capabilities.
A personal portfolio website showcasing data engineering and visualization capabilities, featuring an interactive Toronto Neighbourhood Dashboard.
## Projects
## Live Pages
### Toronto Housing Dashboard
| Route | Page | Description |
|-------|------|-------------|
| `/` | Home | Bio landing page |
| `/about` | About | Background and experience |
| `/projects` | Projects | Portfolio project showcase |
| `/resume` | Resume | Professional CV |
| `/contact` | Contact | Contact form |
| `/blog` | Blog | Technical articles |
| `/blog/{slug}` | Article | Individual blog posts |
| `/toronto` | Toronto Dashboard | Neighbourhood analysis (5 tabs) |
| `/toronto/methodology` | Methodology | Dashboard data sources and methods |
| `/health` | Health | API health check endpoint |
An interactive choropleth dashboard analyzing Toronto's housing market using multi-source data integration.
## Toronto Neighbourhood Dashboard
**Features:**
- Purchase market analysis from TRREB monthly reports
- Rental market analysis from CMHC annual surveys
- Interactive choropleth maps by district/zone
- Time series visualization with policy event annotations
- Purchase/Rental mode toggle
An interactive choropleth dashboard analyzing Toronto's 158 official neighbourhoods across five dimensions:
**Data Sources:**
- [TRREB Market Watch](https://trreb.ca/market-data/market-watch/) - Monthly purchase statistics
- [CMHC Rental Market Survey](https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market) - Annual rental data
- **Overview**: Composite livability scores, income vs safety scatter
- **Housing**: Affordability index, rent trends, dwelling types
- **Safety**: Crime rates, breakdowns by type, trend analysis
- **Demographics**: Income distribution, age pyramids, population density
- **Amenities**: Parks, schools, transit accessibility
**Tech Stack:**
- Python 3.11+ / Dash / Plotly
- PostgreSQL + PostGIS
- dbt for data transformation
- Pydantic for validation
- SQLAlchemy 2.0
**Data Sources**:
- City of Toronto Open Data Portal (neighbourhoods, census profiles, amenities)
- Toronto Police Service (crime statistics)
- CMHC Rental Market Survey (rental data by zone)
## Quick Start
```bash
# Clone and setup
git clone https://github.com/lmiranda/personal-portfolio.git
git clone https://gitea.hotserv.cloud/lmiranda/personal-portfolio.git
cd personal-portfolio
# Install dependencies and configure environment
@@ -55,48 +61,72 @@ portfolio_app/
├── app.py # Dash app factory
├── config.py # Pydantic settings
├── pages/
│ ├── home.py # Bio landing page (/)
│ └── toronto/ # Toronto dashboard (/toronto)
│ ├── home.py # Bio landing (/)
│ ├── about.py # About page
│ ├── contact.py # Contact form
│ ├── projects.py # Project showcase
│ ├── resume.py # Resume/CV
│ ├── blog/ # Blog system
│ │ ├── index.py # Article listing
│ │ └── article.py # Article renderer
│ └── toronto/ # Toronto dashboard
│ ├── dashboard.py # Main layout with tabs
│ ├── methodology.py # Data documentation
│ ├── tabs/ # Tab layouts (5)
│ └── callbacks/ # Interaction logic
├── components/ # Shared UI components
├── figures/ # Plotly figure factories
├── toronto/ # Toronto data logic
│ ├── parsers/ # PDF/CSV extraction
│ ├── loaders/ # Database operations
│ ├── schemas/ # Pydantic models
│ └── models/ # SQLAlchemy ORM
├── content/
│ └── blog/ # Markdown blog articles
├── toronto/ # Toronto data logic
│ ├── parsers/ # API data extraction
│ ├── loaders/ # Database operations
│ ├── schemas/ # Pydantic models
│ └── models/ # SQLAlchemy ORM
└── errors/ # Exception handling
dbt/
├── models/
│ ├── staging/ # 1:1 source tables
│ ├── intermediate/ # Business logic
│ └── marts/ # Analytical tables
│ ├── staging/ # 1:1 source tables
│ ├── intermediate/ # Business logic
│ └── marts/ # Analytical tables
notebooks/ # Data documentation (15 notebooks)
├── overview/ # Overview tab visualizations
├── housing/ # Housing tab visualizations
├── safety/ # Safety tab visualizations
├── demographics/ # Demographics tab visualizations
└── amenities/ # Amenities tab visualizations
docs/
├── PROJECT_REFERENCE.md # Architecture reference
├── CONTRIBUTING.md # Developer guide
└── project-lessons-learned/
```
## Tech Stack
| Layer | Technology |
|-------|------------|
| Database | PostgreSQL 16 + PostGIS |
| Validation | Pydantic 2.x |
| ORM | SQLAlchemy 2.x |
| Transformation | dbt-postgres |
| Data Processing | Pandas, GeoPandas |
| Visualization | Dash + Plotly |
| UI Components | dash-mantine-components |
| Testing | pytest |
| Python | 3.11+ |
## Development
```bash
make test # Run tests
make lint # Run linter
make test # Run pytest
make lint # Run ruff linter
make format # Format code
make ci # Run all checks
```
## Data Pipeline
```
Raw Files (PDF/Excel)
Parsers (pdfplumber, pandas)
Pydantic Validation
SQLAlchemy Loaders
PostgreSQL + PostGIS
dbt Transformations
Dash Visualization
make dbt-run # Run dbt models
make dbt-test # Run dbt tests
```
## Environment Variables
@@ -109,12 +139,19 @@ POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
```
## Documentation
- **For developers**: See `docs/CONTRIBUTING.md` for setup and contribution guidelines
- **For Claude Code**: See `CLAUDE.md` for AI assistant context
- **Architecture**: See `docs/PROJECT_REFERENCE.md` for technical details
## License
MIT
## Author
Leo Miranda - [GitHub](https://github.com/lmiranda) | [LinkedIn](https://linkedin.com/in/yourprofile)
Leo Miranda

docs/CONTRIBUTING.md Normal file

@@ -0,0 +1,480 @@
# Developer Guide
Instructions for contributing to the Analytics Portfolio project.
---
## Table of Contents
1. [Development Setup](#development-setup)
2. [Adding a Blog Post](#adding-a-blog-post)
3. [Adding a New Page](#adding-a-new-page)
4. [Adding a Dashboard Tab](#adding-a-dashboard-tab)
5. [Creating Figure Factories](#creating-figure-factories)
6. [Branch Workflow](#branch-workflow)
7. [Code Standards](#code-standards)
---
## Development Setup
### Prerequisites
- Python 3.11+ (via pyenv)
- Docker and Docker Compose
- Git
### Initial Setup
```bash
# Clone repository
git clone https://gitea.hotserv.cloud/lmiranda/personal-portfolio.git
cd personal-portfolio
# Run setup (creates venv, installs deps, copies .env.example)
make setup
# Start PostgreSQL + PostGIS
make docker-up
# Initialize database
make db-init
# Start development server
make run
```
The app runs at `http://localhost:8050`.
### Useful Commands
```bash
make test # Run tests
make lint # Check code style
make format # Auto-format code
make ci # Run all checks (lint + test)
make dbt-run # Run dbt transformations
make dbt-test # Run dbt tests
```
---
## Adding a Blog Post
Blog posts are Markdown files with YAML frontmatter, stored in `portfolio_app/content/blog/`.
### Step 1: Create the Markdown File
Create a new file in `portfolio_app/content/blog/`:
```bash
touch portfolio_app/content/blog/your-article-slug.md
```
The filename becomes the URL slug: `/blog/your-article-slug`
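The filename-to-slug rule can be sketched in a couple of lines. This is only an illustration of the mapping described above; `blog_url_for` is a hypothetical helper, not a function from the project.

```python
from pathlib import Path


def blog_url_for(markdown_path: str) -> str:
    """Return the /blog/<slug> URL implied by a post's filename."""
    # Path.stem drops the directory and the final ".md" extension.
    return f"/blog/{Path(markdown_path).stem}"


print(blog_url_for("portfolio_app/content/blog/your-article-slug.md"))
# -> /blog/your-article-slug
```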
### Step 2: Add Frontmatter
Every blog post requires YAML frontmatter at the top:
```markdown
---
title: "Your Article Title"
date: "2026-01-17"
description: "A brief description for the article card (1-2 sentences)"
tags:
- data-engineering
- python
- lessons-learned
status: published
---
Your article content starts here...
```
**Required fields:**
| Field | Description |
|-------|-------------|
| `title` | Article title (displayed on cards and page) |
| `date` | Publication date in `YYYY-MM-DD` format |
| `description` | Short summary for article listing cards |
| `tags` | List of tags (displayed as badges) |
| `status` | `published` or `draft` (drafts are hidden from listing) |
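To make the frontmatter contract concrete, here is a minimal, standard-library-only sketch that splits a post into metadata and body and checks the required fields. It is not the project's loader (which may well use a YAML library); it only understands flat `key: value` pairs and simple `- item` lists, and the names `split_frontmatter` and `validate_meta` are hypothetical.

```python
from datetime import datetime

REQUIRED_FIELDS = ("title", "date", "description", "tags", "status")


def split_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split a post into (metadata, body). Handles only flat keys and simple lists."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("post must start with a '---' line")
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("frontmatter is not closed by a '---' line")

    meta: dict[str, object] = {}
    current_list: list[str] | None = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item belonging to the most recent key that had no inline value.
            if current_list is None:
                raise ValueError(f"list item without a key: {stripped!r}")
            current_list.append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {stripped!r}")
        value = value.strip().strip('"')
        if value:
            meta[key.strip()] = value
            current_list = None
        else:
            current_list = []
            meta[key.strip()] = current_list
    return meta, "\n".join(lines[end + 1:])


def validate_meta(meta: dict[str, object]) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in meta]
    if "date" in meta:
        try:
            datetime.strptime(str(meta["date"]), "%Y-%m-%d")
        except ValueError:
            problems.append("date must use the YYYY-MM-DD format")
    if "status" in meta and meta["status"] not in ("published", "draft"):
        problems.append("status must be 'published' or 'draft'")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a post in a single pass.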
### Step 3: Write Content
Use standard Markdown:
```markdown
## Section Heading
Regular paragraph text.
### Subsection
- Bullet points
- Another point
```python
# Code blocks with syntax highlighting
def example():
    return "Hello"
```
**Bold text** and *italic text*.
> Blockquotes for callouts
```
### Step 4: Test Locally
```bash
make run
```
Visit `http://localhost:8050/blog` to see the article listing.
Visit `http://localhost:8050/blog/your-article-slug` for the full article.
### Example: Complete Blog Post
```markdown
---
title: "Building ETL Pipelines with Python"
date: "2026-01-17"
description: "Lessons from building production data pipelines at scale"
tags:
- python
- etl
- data-engineering
status: published
---
When I started building data pipelines, I made every mistake possible...
## The Problem
Most tutorials show toy examples. Real pipelines are different.
### Error Handling
```python
def safe_transform(df: pd.DataFrame) -> pd.DataFrame:
    try:
        return df.apply(transform_row, axis=1)
    except ValueError as e:
        logger.error(f"Transform failed: {e}")
        raise
```
## Conclusion
Ship something that works, then iterate.
```
---
## Adding a New Page
Pages use Dash's automatic routing based on file location in `portfolio_app/pages/`.
### Step 1: Create the Page File
```bash
touch portfolio_app/pages/your_page.py
```
### Step 2: Register the Page
Every page must call `dash.register_page()`:
```python
"""Your page description."""

import dash
import dash_mantine_components as dmc

dash.register_page(
    __name__,
    path="/your-page",  # URL path
    name="Your Page",  # Display name (for nav)
    title="Your Page Title",  # Browser tab title
)


def layout() -> dmc.Container:
    """Page layout function."""
    return dmc.Container(
        dmc.Stack(
            [
                dmc.Title("Your Page", order=1),
                dmc.Text("Page content here."),
            ],
            gap="lg",
        ),
        size="md",
        py="xl",
    )
```
### Step 3: Page with Dynamic Content
For pages with URL parameters:
```python
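# pages/blog/article.py
import dash
import dash_mantine_components as dmc

# Assumption: get_article is provided by the blog loader (utils/markdown_loader.py).
from portfolio_app.utils.markdown_loader import get_article

dash.register_page(
    __name__,
    path_template="/blog/<slug>",  # Dynamic parameter
    name="Article",
)


def layout(slug: str = "") -> dmc.Container:
    """Layout receives URL parameters as arguments."""
    article = get_article(slug)
    if not article:
        # Wrap the message so the return type matches the annotation.
        return dmc.Container(dmc.Text("Article not found"))
    return dmc.Container(
        dmc.Title(article["meta"]["title"]),
        # ...
    )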
```
### Step 4: Add Navigation (Optional)
To add the page to the sidebar, edit `portfolio_app/components/sidebar.py`:
```python
NAV_ITEMS = [
    {"label": "Home", "href": "/", "icon": "tabler:home"},
    {"label": "Your Page", "href": "/your-page", "icon": "tabler:star"},
    # ...
]
```
### URL Routing Summary
| File Location | URL |
|---------------|-----|
| `pages/home.py` | `/` (if `path="/"`) |
| `pages/about.py` | `/about` |
| `pages/blog/index.py` | `/blog` |
| `pages/blog/article.py` | `/blog/<slug>` |
| `pages/toronto/dashboard.py` | `/toronto` |
---
## Adding a Dashboard Tab
Dashboard tabs are in `portfolio_app/pages/toronto/tabs/`.
### Step 1: Create Tab Layout
```python
# pages/toronto/tabs/your_tab.py
"""Your tab description."""

import dash_mantine_components as dmc

from portfolio_app.figures.choropleth import create_choropleth
from portfolio_app.toronto.demo_data import get_demo_data

# create_kpi_cards and create_supporting_charts are placeholders for
# helpers you define or import for this tab.


def create_your_tab_layout() -> dmc.Stack:
    """Create the tab layout."""
    data = get_demo_data()
    return dmc.Stack(
        [
            dmc.Grid(
                [
                    dmc.GridCol(
                        # Map on left
                        create_choropleth(data, "your_metric"),
                        span=8,
                    ),
                    dmc.GridCol(
                        # KPI cards on right
                        create_kpi_cards(data),
                        span=4,
                    ),
                ],
            ),
            # Charts below
            create_supporting_charts(data),
        ],
        gap="lg",
    )
```
### Step 2: Register in Dashboard
Edit `pages/toronto/dashboard.py` to add the tab:
```python
from portfolio_app.pages.toronto.tabs.your_tab import create_your_tab_layout
# In the tabs list:
dmc.TabsTab("Your Tab", value="your-tab"),
# In the panels:
dmc.TabsPanel(create_your_tab_layout(), value="your-tab"),
```
---
## Creating Figure Factories
Figure factories are in `portfolio_app/figures/`. They create reusable Plotly figures.
### Pattern
```python
# figures/your_chart.py
"""Your chart type factory."""

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


def create_your_chart(
    df: pd.DataFrame,
    x_col: str,
    y_col: str,
    title: str = "",
) -> go.Figure:
    """Create a your_chart figure.

    Args:
        df: DataFrame with data.
        x_col: Column for x-axis.
        y_col: Column for y-axis.
        title: Optional chart title.

    Returns:
        Configured Plotly figure.
    """
    fig = px.bar(df, x=x_col, y=y_col, title=title)
    fig.update_layout(
        template="plotly_white",
        margin=dict(l=40, r=40, t=40, b=40),
    )
    return fig
```
### Export from `__init__.py`
```python
# figures/__init__.py
from .your_chart import create_your_chart

__all__ = [
    "create_your_chart",
    # ...
]
```
---
## Branch Workflow
```
main (production)
staging (pre-production)
development (integration)
feature/XX-description (your work)
```
### Creating a Feature Branch
```bash
# Start from development
git checkout development
git pull origin development
# Create feature branch
git checkout -b feature/10-add-new-page
# Work, commit, push
git add .
git commit -m "feat: Add new page"
git push -u origin feature/10-add-new-page
```
### Merging
```bash
# Merge into development
git checkout development
git merge feature/10-add-new-page
git push origin development
# Delete feature branch
git branch -d feature/10-add-new-page
git push origin --delete feature/10-add-new-page
```
**Rules:**
- Never commit directly to `main` or `staging`
- Never delete `development`
- Feature branches are temporary
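These rules are simple enough to check mechanically, for example in a pre-push hook. The sketch below is one hypothetical way to do that; it assumes the `XX` in `feature/XX-description` is a number and that the description is lowercase words joined by hyphens, neither of which is stated explicitly above.

```python
import re

# Branches that must only change through merges, per the rules above.
PROTECTED_BRANCHES = {"main", "staging"}

# Assumption: feature/<number>-<lowercase-hyphenated-description>
FEATURE_BRANCH = re.compile(r"^feature/\d+-[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_feature_branch(name: str) -> bool:
    """Return True when the name follows the feature/XX-description convention."""
    return FEATURE_BRANCH.fullmatch(name) is not None


def allows_direct_commits(name: str) -> bool:
    """Return False for branches that must never receive direct commits."""
    return name not in PROTECTED_BRANCHES
```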
---
## Code Standards
### Type Hints
Use Python 3.10+ style:
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```
### Imports
| Context | Style |
|---------|-------|
| Same directory | `from .module import X` |
| Sibling directory | `from ..schemas.model import Y` |
| External packages | `import pandas as pd` |
### Formatting
```bash
make format # Runs ruff formatter
make lint # Checks style
```
### Docstrings
Google style, only for non-obvious functions:
```python
def calculate_score(values: list[float], weights: list[float]) -> float:
    """Calculate weighted score.

    Args:
        values: Raw metric values.
        weights: Weight for each metric.

    Returns:
        Weighted average score.
    """
    ...
```
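The body of `calculate_score` is elided in the example. One plausible implementation of a weighted average is sketched here; the input checks are an assumption about desirable behaviour, not something the guide specifies.

```python
def calculate_score(values: list[float], weights: list[float]) -> float:
    """Calculate weighted score.

    Args:
        values: Raw metric values.
        weights: Weight for each metric.

    Returns:
        Weighted average score.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Sum of value * weight, normalised by the total weight.
    return sum(v * w for v, w in zip(values, weights)) / total_weight


print(calculate_score([80.0, 60.0], [3.0, 1.0]))  # -> 75.0
```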
---
## Questions?
Check `CLAUDE.md` for AI assistant context and architectural decisions.


@@ -1,21 +1,171 @@
# Portfolio Project Reference
**Project**: Analytics Portfolio
**Owner**: Leo
**Status**: Ready for Sprint 1
**Owner**: Leo Miranda
**Status**: Sprint 9 Complete (Dashboard Implementation Done)
**Last Updated**: January 2026
---
## Project Overview
Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.
Personal portfolio website with an interactive Toronto Neighbourhood Dashboard demonstrating data engineering, visualization, and analytics capabilities.
| Project | Domain | Key Skills | Phase |
|---------|--------|------------|-------|
| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |
| Component | Description | Status |
|-----------|-------------|--------|
| Portfolio Website | Bio, About, Projects, Resume, Contact, Blog | Complete |
| Toronto Dashboard | 5-tab neighbourhood analysis | Complete |
| Data Pipeline | dbt models, figure factories | Complete |
| Deployment | Production deployment | Pending |
**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).
---
## Completed Work
### Sprint 1-6: Foundation
- Repository setup, Docker, PostgreSQL + PostGIS
- Bio landing page implementation
- Initial data model design
### Sprint 7: Navigation & Theme
- Sidebar navigation
- Dark/light theme toggle
- dash-mantine-components integration
### Sprint 8: Portfolio Website
- About, Contact, Projects, Resume pages
- Blog system with Markdown/frontmatter
- Health endpoint
### Sprint 9: Neighbourhood Dashboard Transition
- Phase 1: Deleted legacy TRREB code
- Phase 2: Documentation cleanup
- Phase 3: New neighbourhood-centric data model
- Phase 4: dbt model restructuring
- Phase 5: 5-tab dashboard implementation
- Phase 6: 15 documentation notebooks
- Phase 7: Final documentation review
---
## Application Architecture
### URL Routes
| URL | Page | File |
|-----|------|------|
| `/` | Home | `pages/home.py` |
| `/about` | About | `pages/about.py` |
| `/contact` | Contact | `pages/contact.py` |
| `/projects` | Projects | `pages/projects.py` |
| `/resume` | Resume | `pages/resume.py` |
| `/blog` | Blog listing | `pages/blog/index.py` |
| `/blog/{slug}` | Article | `pages/blog/article.py` |
| `/toronto` | Dashboard | `pages/toronto/dashboard.py` |
| `/toronto/methodology` | Methodology | `pages/toronto/methodology.py` |
| `/health` | Health check | `pages/health.py` |
### Directory Structure
```
portfolio_app/
├── app.py # Dash app factory
├── config.py # Pydantic BaseSettings
├── assets/ # CSS, images
├── callbacks/ # Global callbacks (sidebar, theme)
├── components/ # Shared UI components
├── content/blog/ # Markdown blog articles
├── errors/ # Exception handling
├── figures/ # Plotly figure factories
├── pages/
│ ├── home.py
│ ├── about.py
│ ├── contact.py
│ ├── projects.py
│ ├── resume.py
│ ├── health.py
│ ├── blog/
│ │ ├── index.py
│ │ └── article.py
│ └── toronto/
│ ├── dashboard.py
│ ├── methodology.py
│ ├── tabs/ # 5 tab layouts
│ └── callbacks/ # Dashboard interactions
├── toronto/ # Data logic
│ ├── parsers/ # API extraction
│ ├── loaders/ # Database operations
│ ├── schemas/ # Pydantic models
│ ├── models/ # SQLAlchemy ORM
│ └── demo_data.py # Sample data
└── utils/
└── markdown_loader.py # Blog article loading
```
---
## Toronto Dashboard
### Data Sources
| Source | Data | Format |
|--------|------|--------|
| City of Toronto Open Data | Neighbourhoods (158), Census profiles, Parks, Schools, Childcare, TTC | GeoJSON, CSV, API |
| Toronto Police Service | Crime rates, MCI, Shootings | CSV, API |
| CMHC | Rental Market Survey | CSV |
### Geographic Model
```
City of Toronto Neighbourhoods (158) ← Primary analysis unit
CMHC Zones (~20) ← Rental data (Census Tract aligned)
```
### Dashboard Tabs
| Tab | Choropleth Metric | Supporting Charts |
|-----|-------------------|-------------------|
| Overview | Livability score | Top/Bottom 10 bar, Income vs Safety scatter |
| Housing | Affordability index | Rent trend line, Tenure breakdown bar |
| Safety | Crime rate per 100K | Crime breakdown bar, Crime trend line |
| Demographics | Median income | Age distribution, Population density bar |
| Amenities | Amenity index | Amenity radar, Transit accessibility bar |
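The Safety tab's metric is a rate per 100,000 residents. As a worked example of that normalisation (the function name and sample figures are illustrative, not taken from the project's data):

```python
def rate_per_100k(incident_count: int, population: int) -> float:
    """Scale an incident count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole-number inputs.
    return incident_count * 100_000 / population


# A neighbourhood with 150 incidents and 30,000 residents:
print(rate_per_100k(150, 30_000))  # -> 500.0
```

Normalising by population is what makes neighbourhoods of very different sizes comparable on the choropleth.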
### Star Schema
| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension |
| `dim_cmhc_zone` | Dimension | ~20 CMHC zones with geometry |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
### dbt Layers
| Layer | Naming | Example |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | `stg_toronto__neighbourhoods` |
| Intermediate | `int_{domain}__{transform}` | `int_neighbourhood__demographics` |
| Marts | `mart_{domain}` | `mart_neighbourhood_overview` |
---
## Tech Stack
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | 2.x |
| ORM | SQLAlchemy | 2.x |
| Transformation | dbt-postgres | 1.7+ |
| Data Processing | Pandas, GeoPandas | Latest |
| Visualization | Dash + Plotly | 2.14+ |
| UI Components | dash-mantine-components | Latest |
| Testing | pytest | 7.0+ |
| Python | 3.11+ | Via pyenv |
---
@@ -23,293 +173,51 @@ Two-project analytics portfolio demonstrating end-to-end data engineering, visua
| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases only | VPS (production) |
| `main` | Production releases | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |
**Rules**:
- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`
**Rules:**
- Feature branches from `development`: `feature/{sprint}-{description}`
- Merge into `development` when complete
- `development` → `staging` → `main` for releases
- Never delete `development`
---
## Tech Stack (Locked)
## Code Standards
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |
### Type Hints (Python 3.10+)
**Compatibility Notes**:
- SQLAlchemy 2.0 + Pydantic 2.0 integrate well—never mix 1.x APIs
- PostGIS extension required—enable during db init
- Docker Compose V2 (no `version` field in compose files)
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```
---
### Imports
## Code Conventions
### Import Style
| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` |
| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
| External packages | Absolute | `import pandas as pd` |
### Module Separation
| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | API/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |
### Code Standards
- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- **Functions**: Single responsibility, verb naming, early returns over nesting
- **Docstrings**: Google style, minimal—only for non-obvious behavior
- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config
| Context | Style |
|---------|-------|
| Same directory | `from .module import X` |
| Sibling directory | `from ..schemas.model import Y` |
| External | `import pandas as pd` |
### Error Handling
```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""
class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""
    """Data parsing failed."""
class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""
    """Validation failed."""
class LoadError(PortfolioError):
    """Database load operation failed."""
    """Database load failed."""
```
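A shared base class lets callers handle any pipeline failure in one place while still raising specific errors at the source. The sketch below redefines two of the exceptions so it runs on its own; `parse_population` and `run_step` are hypothetical functions used only to show the pattern.

```python
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """Data parsing failed."""


def parse_population(raw: str) -> int:
    """Convert a raw string such as '12,345' to an int, or raise ParseError."""
    try:
        return int(raw.replace(",", ""))
    except ValueError as exc:
        # Chain the original error so the traceback keeps the root cause.
        raise ParseError(f"Not a whole number: {raw!r}") from exc


def run_step(raw: str) -> str:
    """Catch the base class so every pipeline error is reported the same way."""
    try:
        value = parse_population(raw)
    except PortfolioError as exc:
        return f"failed: {type(exc).__name__}"
    return f"ok: {value}"


print(run_step("12,345"))  # -> ok: 12345
print(run_step("n/a"))     # -> failed: ParseError
```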
- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)
---
## Application Architecture
### Dash Pages Structure
```
portfolio_app/
├── app.py # Dash app factory with Pages routing
├── config.py # Pydantic BaseSettings
├── assets/ # CSS, images (auto-served by Dash)
├── pages/
│ ├── home.py # Bio landing page → /
│ ├── toronto/
│ │ ├── dashboard.py # Layout only → /toronto
│ │ └── callbacks/ # Interaction logic
│ └── energy/ # Phase 3
├── components/ # Shared UI (navbar, footer, cards)
├── figures/ # Shared chart factories
├── toronto/ # Toronto data logic
│ ├── parsers/
│ ├── loaders/
│ ├── schemas/ # Pydantic
│ └── models/ # SQLAlchemy
└── errors/
```
### URL Routing (Automatic)
| URL | Page | Status |
|-----|------|--------|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |
---
## Phase 1: Toronto Neighbourhood Dashboard
### Data Sources
| Track | Source | Format | Geography | Frequency |
|-------|--------|--------|-----------|-----------|
| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |
### Geographic Reality
```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158) │ ← Primary analysis unit
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20) — Census Tract aligned │ ← Rental data
└─────────────────────────────────────────────────────────────────┘
```
### Data Model (Star Schema)
| Table | Type | Keys |
|-------|------|------|
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |
### dbt Layer Structure
| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic, filtering |
| Marts | `mart_{domain}` | Final analytical tables |
---
## Sprint Overview
| Sprint | Focus | Milestone |
|--------|-------|-----------|
| 1-6 | Foundation and initial dashboard | **Launch 1: Bio Live** |
| 7 | Navigation & theme modernization | — |
| 8 | Portfolio website expansion | **Launch 2: Website Live** |
| 9 | Neighbourhood dashboard transition | Cleanup complete |
| 10+ | Dashboard implementation | **Launch 3: Dashboard Live** |
---
## Scope Boundaries
### Phase 1 — Build These
- Bio landing page and portfolio website
- CMHC rental data processor
- Toronto neighbourhood data integration
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer
### Deferred Features
| Feature | Reason | When |
|---------|--------|------|
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Future |
If a task seems to require deferred features, **stop and flag it**.
---
## File Structure
### Root-Level Files (Allowed)
| File | Purpose |
|------|---------|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |
### Directory Structure
```
portfolio/
├── portfolio_app/ # Monolithic Dash application
│ ├── app.py
│ ├── config.py
│ ├── assets/
│ ├── pages/
│ ├── components/
│ ├── figures/
│ ├── toronto/
│ └── errors/
├── tests/
├── dbt/
├── data/
│ └── toronto/
│ ├── raw/
│ ├── processed/ # gitignored
│ └── reference/
├── scripts/
│ ├── db/
│ ├── docker/
│ ├── deploy/
│ ├── dbt/
│ └── dev/
├── docs/
├── notebooks/
├── backups/ # gitignored
└── reports/ # gitignored
```
### Gitignored Directories
- `data/*/processed/`
- `reports/`
- `backups/`
- `notebooks/*.html`
- `.env`
- `__pycache__/`
- `.venv/`
---
## Makefile Targets
| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |
---
## Script Standards
All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
---
## Environment Variables
@@ -328,41 +236,52 @@ LOG_LEVEL=INFO
---
## Success Criteria
## Makefile Targets
### Launch 1 (Bio Live)
- [x] Bio page accessible via HTTPS
- [x] All bio content rendered
- [x] No placeholder text visible
- [x] Mobile responsive
- [x] Social links functional
### Launch 2 (Website Live)
- [x] Full portfolio website with navigation
- [x] About, Contact, Projects, Resume, Blog pages
- [x] Dark mode theme support
- [x] Sidebar navigation
### Launch 3 (Dashboard Live)
- [ ] Choropleth renders neighbourhoods and CMHC zones
- [ ] Rental data visualization works
- [ ] Time navigation works
- [ ] Policy event markers visible
- [ ] Methodology documentation published
- [ ] Data sources cited
| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
---
## Reference Documents
## Next Steps
For detailed specifications, see:
### Deployment (Sprint 10+)
- [ ] Production Docker configuration
- [ ] CI/CD pipeline
- [ ] HTTPS/SSL setup
- [ ] Domain configuration
| Document | Location | Use When |
|----------|----------|----------|
| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |
### Data Enhancement
- [ ] Connect to live APIs (currently using demo data)
- [ ] Data refresh automation
- [ ] Historical data loading
### Future Projects
- Energy Pricing Analysis dashboard (planned)
---
*Reference Version: 2.0*
*Updated: Sprint 9*
## Related Documents
| Document | Purpose |
|----------|---------|
| `README.md` | Quick start guide |
| `CLAUDE.md` | AI assistant context |
| `docs/CONTRIBUTING.md` | Developer guide |
| `notebooks/README.md` | Notebook documentation |
---
*Reference Version: 3.0*
*Updated: January 2026*


@@ -1,134 +0,0 @@
# Portfolio Bio Content
**Version**: 2.0
**Last Updated**: January 2026
**Purpose**: Content source for `portfolio_app/pages/home.py`
---
## Document Context
| Attribute | Value |
|-----------|-------|
| **Parent Document** | `portfolio_project_plan_v5.md` |
| **Role** | Bio content and social links for landing page |
| **Consumed By** | `portfolio_app/pages/home.py` |
---
## Headline
**Primary**: Leo | Data Engineer & Analytics Developer
**Tagline**: I build data infrastructure that actually gets used.
---
## Professional Summary
Over the past 5 years, I've designed and evolved an enterprise analytics platform from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call abandon rates, and dashboards that executives actually open.
My approach: dimensional modeling (star schema), layered transformations (staging → intermediate → marts), and automation that eliminates manual work. I've built everything from self-service analytics portals to OCR-powered receipt processing systems.
Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states. Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the Project Management Institute.
---
## Tech Stack
| Category | Technologies |
|----------|--------------|
| **Languages** | Python, SQL |
| **Data Processing** | Pandas, SQLAlchemy, FastAPI |
| **Databases** | PostgreSQL, MSSQL |
| **Visualization** | Power BI, Plotly, Dash |
| **Patterns** | dbt, dimensional modeling, star schema |
| **Other** | Genesys Cloud |
**Display Format** (for landing page):
```
Python (Pandas, SQLAlchemy, FastAPI) • SQL (MSSQL, PostgreSQL) • Power BI • Plotly/Dash • Genesys Cloud • dbt patterns
```
---
## Side Project
**Bandit Labs** — Building automation and AI tooling for small businesses.
*Note: Keep this brief on the portfolio; link only if a separate landing page exists.*
---
## Social Links
| Platform | URL | Icon |
|----------|-----|------|
| **LinkedIn** | `https://linkedin.com/in/[USERNAME]` | `lucide-react: Linkedin` |
| **GitHub** | `https://github.com/[USERNAME]` | `lucide-react: Github` |
> **TODO**: Replace `[USERNAME]` placeholders with actual URLs before bio page launch.
---
## Availability Statement
Open to **Senior Data Analyst**, **Analytics Engineer**, and **BI Developer** opportunities in Toronto or remote.
---
## Portfolio Projects Section
*Dynamically populated based on deployed projects.*
| Project | Status | Link |
|---------|--------|------|
| Toronto Housing Dashboard | In Development | `/toronto` |
| Energy Pricing Analysis | Planned | `/energy` |
**Display Logic**:
- Show only projects with `status = deployed`
- "In Development" projects can show as coming soon or be hidden (user preference)
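
A minimal sketch of this display logic, assuming projects are held as simple records. The `Project` class, the status labels, and the `show_coming_soon` flag are hypothetical names chosen for illustration, not the structures used in `home.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    status: str  # assumed labels: "deployed", "in_development", "planned"
    link: str


def visible_projects(projects, show_coming_soon=False):
    """Return deployed projects, optionally including in-development ones."""
    allowed = {"deployed"}
    if show_coming_soon:
        allowed.add("in_development")
    return [p for p in projects if p.status in allowed]


# The two rows from the table above, using the assumed status labels.
PROJECTS = [
    Project("Toronto Housing Dashboard", "in_development", "/toronto"),
    Project("Energy Pricing Analysis", "planned", "/energy"),
]
```

With the table as it stands, `visible_projects(PROJECTS)` returns an empty list, so the landing page needs an empty state until the first project is deployed.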
---
## Implementation Notes
### Content Hierarchy for `home.py`
```
1. Name + Tagline (hero section)
2. Professional Summary (2-3 paragraphs)
3. Tech Stack (horizontal chips or inline list)
4. Portfolio Projects (cards linking to dashboards)
5. Social Links (icon buttons)
6. Availability statement (subtle, bottom)
```
### Styling Recommendations
- Clean, minimal — let the projects speak
- Dark/light mode support via dash-mantine-components theme
- No headshot required (optional)
- Mobile-responsive layout
### Content Updates
When updating bio content:
1. Edit this document
2. Update `home.py` to reflect changes
3. Redeploy
---
## Related Documents
| Document | Relationship |
|----------|--------------|
| `portfolio_project_plan_v5.md` | Parent — references this for bio content |
| `portfolio_app/pages/home.py` | Consumer — implements this content |
---
*Document Version: 2.0*
*Updated: January 2026*


@@ -1,276 +0,0 @@
# Toronto Neighbourhood Dashboard — Implementation Plan
**Document Type:** Execution Guide
**Target:** Transition from TRREB-based to Neighbourhood-based Dashboard
**Version:** 2.0 | January 2026
---
## Overview
Transition from the TRREB district-based housing dashboard to a comprehensive Toronto Neighbourhood Dashboard built around the city's 158 official neighbourhoods.
**Key Changes:**
- Geographic foundation: TRREB districts (~35) → City Neighbourhoods (158)
- Data sources: PDF parsing → Open APIs (Toronto Open Data, Toronto Police, CMHC)
- Scope: Housing-only → 5 thematic tabs (Overview, Housing, Safety, Demographics, Amenities)
---
## Phase 1: Repository Cleanup
### Files to DELETE
| File | Reason |
|------|--------|
| `portfolio_app/toronto/schemas/trreb.py` | TRREB schema obsolete |
| `portfolio_app/toronto/parsers/trreb.py` | PDF parsing no longer needed |
| `portfolio_app/toronto/loaders/trreb.py` | TRREB loading logic obsolete |
| `dbt/models/staging/stg_trreb__purchases.sql` | TRREB staging obsolete |
| `dbt/models/intermediate/int_purchases__monthly.sql` | TRREB intermediate obsolete |
| `dbt/models/marts/mart_toronto_purchases.sql` | Will rebuild for neighbourhood grain |
### Files to MODIFY (Remove TRREB References)
| File | Action |
|------|--------|
| `portfolio_app/toronto/schemas/__init__.py` | Remove TRREB imports |
| `portfolio_app/toronto/parsers/__init__.py` | Remove TRREB parser imports |
| `portfolio_app/toronto/loaders/__init__.py` | Remove TRREB loader imports |
| `portfolio_app/toronto/models/facts.py` | Remove `FactPurchases` model |
| `portfolio_app/toronto/models/dimensions.py` | Remove `DimTRREBDistrict` model |
| `portfolio_app/toronto/demo_data.py` | Remove TRREB demo data |
| `dbt/models/sources.yml` | Remove TRREB source definitions |
| `dbt/models/schema.yml` | Remove TRREB model documentation |
### Files to KEEP (Reusable)
| File | Why |
|------|-----|
| `portfolio_app/toronto/schemas/cmhc.py` | CMHC data still used |
| `portfolio_app/toronto/parsers/cmhc.py` | Reusable with modifications |
| `portfolio_app/toronto/loaders/base.py` | Generic database utilities |
| `portfolio_app/toronto/loaders/dimensions.py` | Dimension loading patterns |
| `portfolio_app/toronto/models/base.py` | SQLAlchemy base class |
| `portfolio_app/figures/*.py` | All chart factories reusable |
| `portfolio_app/components/*.py` | All UI components reusable |
---
## Phase 2: Documentation Updates
| Document | Action |
|----------|--------|
| `CLAUDE.md` | Update data model section, mark transition complete |
| `docs/PROJECT_REFERENCE.md` | Update architecture, data sources |
| `docs/toronto_housing_dashboard_spec_v5.md` | Archive or delete |
| `docs/wbs_sprint_plan_v4.md` | Archive or delete |
---
## Phase 3: New Data Model
### Star Schema (Neighbourhood-Centric)
| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Central Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension (keep existing) |
| `dim_cmhc_zone` | Dimension | 15 CMHC zones (mapped to neighbourhoods via `bridge_cmhc_neighbourhood`) |
| `bridge_cmhc_neighbourhood` | Bridge | Zone-to-neighbourhood area weights |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone (keep existing) |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
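
To make the star-schema shape concrete, here is a small sketch of one fact table joined to the central dimension. It uses the standard-library `sqlite3` module only to stay self-contained; the project itself targets PostgreSQL, and the column names (`name`, `year`, `crime_type`, `incident_count`) are assumptions for illustration.

```python
import sqlite3


def build_demo_schema() -> sqlite3.Connection:
    """Create an in-memory dimension + fact pair (column names are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_neighbourhood (
            neighbourhood_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE fact_crime (
            neighbourhood_id INTEGER NOT NULL
                REFERENCES dim_neighbourhood (neighbourhood_id),
            year INTEGER NOT NULL,
            crime_type TEXT NOT NULL,
            incident_count INTEGER NOT NULL
        );
        """
    )
    return conn


def crime_totals(conn: sqlite3.Connection, year: int):
    """Total incidents per neighbourhood for one year, via the dimension key."""
    return conn.execute(
        """
        SELECT d.name, SUM(f.incident_count)
        FROM fact_crime AS f
        JOIN dim_neighbourhood AS d USING (neighbourhood_id)
        WHERE f.year = ?
        GROUP BY d.name
        ORDER BY d.name
        """,
        (year,),
    ).fetchall()
```

Every neighbourhood-grain fact table joins the same way, which is what lets each tab's mart share `dim_neighbourhood`.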
### New Schema Files
| File | Contains |
|------|----------|
| `toronto/schemas/neighbourhood.py` | NeighbourhoodRecord, CensusRecord, CrimeRecord |
| `toronto/schemas/amenities.py` | AmenityType enum, AmenityRecord |
### New Parser Files
| File | Data Source | API |
|------|-------------|-----|
| `toronto/parsers/toronto_open_data.py` | Neighbourhoods, Census, Parks, Schools, Childcare | Toronto Open Data Portal |
| `toronto/parsers/toronto_police.py` | Crime Rates, MCI, Shootings | Toronto Police Portal |
### New Loader Files
| File | Purpose |
|------|---------|
| `toronto/loaders/neighbourhoods.py` | Load GeoJSON boundaries |
| `toronto/loaders/census.py` | Load neighbourhood profiles |
| `toronto/loaders/crime.py` | Load crime statistics |
| `toronto/loaders/amenities.py` | Load parks, schools, childcare |
| `toronto/loaders/cmhc_crosswalk.py` | Build CMHC-neighbourhood bridge |
---
## Phase 4: dbt Restructuring
### Staging Layer
| Model | Source |
|-------|--------|
| `stg_toronto__neighbourhoods` | dim_neighbourhood |
| `stg_toronto__census` | fact_census |
| `stg_toronto__crime` | fact_crime |
| `stg_toronto__amenities` | fact_amenities |
| `stg_cmhc__rentals` | fact_rentals (modify existing) |
| `stg_cmhc__zone_crosswalk` | bridge_cmhc_neighbourhood |
### Intermediate Layer
| Model | Purpose |
|-------|---------|
| `int_neighbourhood__demographics` | Combined census demographics |
| `int_neighbourhood__housing` | Housing indicators |
| `int_neighbourhood__crime_summary` | Aggregated crime by type |
| `int_neighbourhood__amenity_scores` | Normalized amenity metrics |
| `int_rentals__neighbourhood_allocated` | CMHC rentals allocated to neighbourhoods |
### Mart Layer (One per Tab)
| Model | Tab | Key Metrics |
|-------|-----|-------------|
| `mart_neighbourhood_overview` | Overview | Composite livability score |
| `mart_neighbourhood_housing` | Housing | Affordability index, rent-to-income |
| `mart_neighbourhood_safety` | Safety | Crime rates, YoY change |
| `mart_neighbourhood_demographics` | Demographics | Income, age, diversity |
| `mart_neighbourhood_amenities` | Amenities | Parks, schools, transit per capita |
---
## Phase 5: Dashboard Implementation
### Tab Structure
```
pages/toronto/
├── dashboard.py # Main layout with tab navigation
├── tabs/
│ ├── overview.py # Composite livability
│ ├── housing.py # Affordability
│ ├── safety.py # Crime
│ ├── demographics.py # Population
│ └── amenities.py # Services
└── callbacks/
├── map_callbacks.py
├── chart_callbacks.py
└── selection_callbacks.py
```
### Layout Pattern (All Tabs)
Each tab follows the same structure:
1. **Choropleth Map** (left) — 158 neighbourhoods, click to select
2. **KPI Cards** (right) — 3-4 contextual metrics
3. **Supporting Charts** (bottom) — Trend + comparison visualizations
4. **Details Panel** (collapsible) — All metrics for selected neighbourhood
### Graphs by Tab
| Tab | Choropleth Metric | Chart 1 | Chart 2 |
|-----|-------------------|---------|---------|
| Overview | Livability score | Top/Bottom 10 bar | Income vs Crime scatter |
| Housing | Affordability index | Rent trend (5yr line) | Dwelling types (pie/bar) |
| Safety | Crime rate per 100K | Crime breakdown (stacked bar) | Crime trend (5yr line) |
| Demographics | Median income | Age pyramid | Top languages (bar) |
| Amenities | Park area per capita | Amenity radar | Transit accessibility (bar) |
---
## Phase 6: Jupyter Notebooks
### Purpose
One notebook per graph to document:
1. **Data Reference** — How the data was built (query, transformation steps, sample output)
2. **Data Visualization** — Import figure factory, render the graph
### Directory Structure
```
notebooks/
├── README.md
├── overview/
├── housing/
├── safety/
├── demographics/
└── amenities/
```
### Notebook Template
````markdown
# [Graph Name]
## 1. Data Reference
### Source Tables
- List tables/marts used
- Grain of each table
### Query
```sql
SELECT ... FROM ...
```
### Transformation Steps
1. Step description
2. Step description
### Sample Data
```python
df = pd.read_sql(query, engine)
df.head(10)
```
## 2. Data Visualization
```python
from portfolio_app.figures.choropleth import create_choropleth_figure
fig = create_choropleth_figure(...)
fig.show()
```
````
Create one notebook per graph as each is implemented (15 total across 5 tabs).
---
## Phase 7: Final Documentation Review
After implementation is complete, audit and update:
- [ ] `CLAUDE.md` — Project status, app structure, data model, URL routes
- [ ] `README.md` — Project description, installation, quick start
- [ ] `docs/PROJECT_REFERENCE.md` — Architecture matches implementation
- [ ] Remove or archive legacy spec documents
---
## Data Source Reference
| Source | Datasets | URL |
|--------|----------|-----|
| Toronto Open Data | Neighbourhoods, Census Profiles, Parks, Schools, Childcare, TTC | open.toronto.ca |
| Toronto Police | Crime Rates, MCI, Shootings | data.torontopolice.on.ca |
| CMHC | Rental Market Survey | cmhc-schl.gc.ca |
---
## CMHC Zone Mapping Note
CMHC reports rental data for 15 zones whose boundaries don't align with the 158 neighbourhoods. Strategy:
- Create `bridge_cmhc_neighbourhood` with area weights
- Allocate rental metrics proportionally to overlapping neighbourhoods
- Document methodology in `/toronto/methodology` page
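
The proportional allocation can be sketched as an area-weighted average. This is one possible reading of the strategy, not the project's implementation: it assumes each bridge row carries the share of a neighbourhood's area that falls inside a zone, and the function and parameter names are hypothetical.

```python
from collections import defaultdict


def allocate_zone_rent(zone_rent, bridge_rows):
    """Estimate neighbourhood rent as an area-weighted average of zone rents.

    zone_rent:   {zone_id: average_rent}
    bridge_rows: iterable of (zone_id, neighbourhood_id, weight), where weight
                 is the assumed share of the neighbourhood's area in the zone.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for zone_id, neighbourhood_id, weight in bridge_rows:
        if zone_id not in zone_rent or weight <= 0:
            continue  # skip zones without data and empty overlaps
        weighted_sum[neighbourhood_id] += zone_rent[zone_id] * weight
        weight_total[neighbourhood_id] += weight
    # Dividing by the weight total re-normalises when a zone has no data.
    return {n: weighted_sum[n] / weight_total[n] for n in weighted_sum}
```

A neighbourhood that sits 25% in a zone averaging 2,000 and 75% in a zone averaging 1,600 would be assigned 1,700. Averages suit rates such as rent or vacancy; counts such as the rental universe would need to be split by the zone's area share instead.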
---
*Document Version: 2.0*
*Trimmed from v1.0 for execution clarity*


@@ -1,423 +0,0 @@
# Toronto Neighbourhood Dashboard — Deliverables
**Project Type:** Interactive Data Visualization Dashboard
**Geographic Scope:** City of Toronto, 158 Official Neighbourhoods
**Author:** Leo Miranda
**Version:** 1.0 | January 2026
---
## Executive Summary
Multi-tab analytics dashboard built around Toronto's official neighbourhood boundaries. The core interaction is a choropleth map where users explore the city through different thematic lenses—housing affordability, safety, demographics, amenities—with supporting visualizations that tell a cohesive story per theme.
**Primary Goals:**
1. Demonstrate interactive data visualization skills (Plotly/Dash)
2. Showcase data engineering capabilities (multi-source ETL, dimensional modeling)
3. Create a portfolio piece with genuine analytical value
---
## Part 1: Geographic Foundation (Required First)
| Dataset | Source | Format | Last Updated | Download |
|---------|--------|--------|--------------|----------|
| **Neighbourhoods Boundaries** | Toronto Open Data | GeoJSON | 2024 | [Link](https://open.toronto.ca/dataset/neighbourhoods/) |
| **Neighbourhood Profiles** | Toronto Open Data | CSV | 2021 Census | [Link](https://open.toronto.ca/dataset/neighbourhood-profiles/) |
**Critical Notes:**
- Toronto uses 158 official neighbourhoods (the City moved from 140 to 158 in 2022)
- GeoJSON includes `AREA_ID` for joining to tabular data
- Neighbourhood Profiles has 2,400+ indicators per neighbourhood from Census
---
## Part 2: Tier 1 — MVP Datasets
| Dataset | Source | Measures Available | Update Freq | Granularity |
|---------|--------|-------------------|-------------|-------------|
| **Neighbourhoods GeoJSON** | Toronto Open Data | Boundary polygons, area IDs | Static | Neighbourhood |
| **Neighbourhood Profiles (full)** | Toronto Open Data | 2,400+ Census indicators | Every 5 years | Neighbourhood |
| **Neighbourhood Crime Rates** | Toronto Police Portal | MCI rates per 100K by year | Annual | Neighbourhood |
| **CMHC Rental Market Survey** | CMHC Portal | Avg rent by bedroom, vacancy rate | Annual (Oct) | 15 CMHC Zones |
| **Parks** | Toronto Open Data | Park locations, area, type | Annual | Point/Polygon |
**Total API/Download Calls:** 5
**Data Volume:** ~50MB combined
### Tier 1 Measures to Extract
**From Neighbourhood Profiles:**
- Population, population density
- Median household income
- Age distribution (0-14, 15-24, 25-44, 45-64, 65+)
- % Immigrants, % Visible minorities
- Top languages spoken
- Unemployment rate
- Education attainment (% with post-secondary)
- Housing tenure (own vs rent %)
- Dwelling types distribution
- Average rent, housing costs as % of income
**From Crime Rates:**
- Total MCI rate per 100K population
- Year-over-year crime trend
**From CMHC:**
- Average monthly rent (1BR, 2BR, 3BR)
- Vacancy rates
**From Parks:**
- Park count per neighbourhood
- Park area per capita
---
## Part 3: Tier 2 — Expansion Datasets
| Dataset | Source | Measures Available | Update Freq | Granularity |
|---------|--------|-------------------|-------------|-------------|
| **Major Crime Indicators (MCI)** | Toronto Police Portal | Assault, B&E, auto theft, robbery, theft over | Quarterly | Neighbourhood |
| **Shootings & Firearm Discharges** | Toronto Police Portal | Shooting incidents, injuries, fatalities | Quarterly | Neighbourhood |
| **Building Permits** | Toronto Open Data | New construction, permits by type | Monthly | Address-level |
| **Schools** | Toronto Open Data | Public/Catholic, elementary/secondary | Annual | Point |
| **TTC Routes & Stops** | Toronto Open Data | Route geometry, stop locations | Static | Route/Stop |
| **Licensed Child Care Centres** | Toronto Open Data | Capacity, ages served, locations | Annual | Point |
### Tier 2 Measures to Extract
**From MCI Details:**
- Breakdown by crime type (assault, B&E, auto theft, robbery, theft over)
**From Shootings:**
- Shooting incidents count
- Injuries/fatalities
**From Building Permits:**
- New construction permits (trailing 12 months)
- Permit types distribution
**From Schools:**
- Schools per 1000 children
- School type breakdown
**From TTC:**
- Transit stops within neighbourhood
- Transit accessibility score
**From Child Care:**
- Child care spaces per capita
- Coverage by age group
---
## Part 4: Data Sources by Thematic Group
### GROUP A: Housing & Affordability
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Profiles (Housing) | 1 | Avg rent, ownership %, dwelling types, housing costs as % of income | Every 5 years |
| CMHC Rental Market Survey | 1 | Avg rent by bedroom, vacancy rate, rental universe | Annual |
| Building Permits | 2 | New construction, permits by type | Monthly |
**Calculated Metrics:**
- Rent-to-Income Ratio (CMHC rent ÷ Census income)
- Affordability Index (% of income spent on housing)
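
As a worked example of the first metric, the sketch below assumes CMHC rent is monthly and Census income is annual, so income is converted to a monthly figure before dividing. The function name is hypothetical.

```python
def rent_to_income_ratio(monthly_rent, annual_household_income):
    """Monthly rent divided by monthly household income; None if income is missing."""
    if annual_household_income is None or annual_household_income <= 0:
        return None  # avoid dividing by zero for suppressed or missing income
    monthly_income = annual_household_income / 12
    return monthly_rent / monthly_income
```

For instance, a rent of 1,500 against an annual income of 60,000 gives a ratio of 0.30, right at the usual affordability threshold.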
---
### GROUP B: Safety & Crime
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Crime Rates | 1 | MCI rates per 100K pop by year | Annual |
| Major Crime Indicators (MCI) | 2 | Assault, B&E, auto theft, robbery, theft over | Quarterly |
| Shootings & Firearm Discharges | 2 | Shooting incidents, injuries, fatalities | Quarterly |
**Calculated Metrics:**
- Year-over-year crime change %
- Crime type distribution
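
The year-over-year change can be sketched as follows, assuming rates are keyed by calendar year. The function name is hypothetical, and gaps in the series are skipped rather than interpolated.

```python
def yoy_change_pct(rate_by_year):
    """Percent change versus the previous calendar year, keyed by year."""
    years = sorted(rate_by_year)
    changes = {}
    for previous, current in zip(years, years[1:]):
        base = rate_by_year[previous]
        if current - previous != 1 or base == 0:
            continue  # skip gaps in the series and undefined changes
        changes[current] = (rate_by_year[current] - base) / base * 100
    return changes
```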
---
### GROUP C: Demographics & Community
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Neighbourhood Profiles (Demographics) | 1 | Age distribution, household composition, income | Every 5 years |
| Neighbourhood Profiles (Immigration) | 1 | Immigration status, visible minorities, languages | Every 5 years |
| Neighbourhood Profiles (Education) | 1 | Education attainment, field of study | Every 5 years |
| Neighbourhood Profiles (Labour) | 1 | Employment rate, occupation, industry | Every 5 years |
---
### GROUP D: Transportation & Mobility
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Commute Mode (Census) | 1 | % car, transit, walk, bike | Every 5 years |
| TTC Routes & Stops | 2 | Route geometry, stop locations | Static |
**Calculated Metrics:**
- Transit accessibility (stops within 500m of neighbourhood centroid)
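
One way to compute this metric without a spatial database is a great-circle distance check. The sketch below is an assumption about the approach, not the project's method (PostGIS could do the same with a spatial query); the function names and the mean Earth radius constant are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_within(centroid, stops, radius_m=500.0):
    """Count (lat, lon) stops no farther than radius_m from the centroid."""
    lat0, lon0 = centroid
    return sum(
        1 for lat, lon in stops if haversine_m(lat0, lon0, lat, lon) <= radius_m
    )
```

A centroid-based radius undercounts stops in large neighbourhoods, so counting stops inside the polygon (the first TTC measure listed in Part 3) is a useful cross-check.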
---
### GROUP E: Amenities & Services
| Dataset | Tier | Measures | Update Freq |
|---------|------|----------|-------------|
| Parks | 1 | Park locations, area, type | Annual |
| Schools | 2 | Public/Catholic, elementary/secondary | Annual |
| Licensed Child Care Centres | 2 | Capacity, ages served | Annual |
**Calculated Metrics:**
- Park area per capita
- Schools per 1000 children (ages 5-17)
- Child care spaces per 1000 children (ages 0-4)
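
These three metrics share one normalisation step, sketched below. The helper name and its `per` parameter are hypothetical; the denominators follow the age groups noted above.

```python
def rate_per(count, population, per=1000):
    """Normalise a count to a rate per `per` people; None if population is unusable."""
    if population is None or population <= 0:
        return None
    return count / population * per
```

Schools per 1000 children would be `rate_per(schools, children_5_to_17)`, child care spaces would use `children_0_to_4`, and park area per capita would pass `per=1`.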
---
## Part 5: Tab Structure
### Tab Architecture
```
┌────────────────────────────────────────────────────────────────┐
│ [Overview] [Housing] [Safety] [Demographics] [Amenities] │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────┐ ┌────────────────┐ │
│ │ │ │ KPI Card 1 │ │
│ │ CHOROPLETH MAP │ ├────────────────┤ │
│ │ (158 Neighbourhoods) │ │ KPI Card 2 │ │
│ │ │ ├────────────────┤ │
│ │ Click to select │ │ KPI Card 3 │ │
│ │ │ └────────────────┘ │
│ └─────────────────────────────────┘ │
│ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Supporting Chart 1 │ │ Supporting Chart 2 │ │
│ │ (Context/Trend) │ │ (Comparison/Rank) │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │
│ [Neighbourhood: Selected Name] ──────────────────────── │
│ Details panel with all metrics for selected area │
└────────────────────────────────────────────────────────────────┘
```
---
### Tab 1: Overview (Default Landing)
**Story:** "How do Toronto neighbourhoods compare across key livability metrics?"
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Composite livability score | Calculated from weighted metrics |
| KPI Cards | Population, Median Income, Avg Crime Rate | Neighbourhood Profiles, Crime Rates |
| Chart 1 | Top 10 / Bottom 10 by livability score | Calculated |
| Chart 2 | Income vs Crime scatter plot | Neighbourhood Profiles, Crime Rates |
**Metric Selector:** Allow the user to colour the map by any single metric.
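
The composite score is described only as "calculated from weighted metrics", so the sketch below is one plausible construction rather than the project's formula: min-max scale each metric to 0-1, invert metrics where lower is better (such as crime), then take a weighted sum scaled to 0-100. All names and weights are assumptions.

```python
def min_max(values, higher_is_better=True):
    """Scale values to 0-1; invert when a lower raw value is better."""
    low, high = min(values), max(values)
    if high == low:
        return [0.5] * len(values)  # no spread: treat every area as average
    scaled = [(v - low) / (high - low) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def livability_scores(rows, weights, higher_is_better):
    """Weighted 0-100 score per neighbourhood.

    rows:             {neighbourhood: {metric: value}}
    weights:          {metric: weight}, normalised internally
    higher_is_better: {metric: bool}
    """
    names = list(rows)
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        scaled = min_max([rows[n][metric] for n in names], higher_is_better[metric])
        for name, value in zip(names, scaled):
            scores[name] += 100 * weight / total_weight * value
    return scores
```

Because min-max scaling is relative to the current set of neighbourhoods, scores are rankings within the city and are not comparable across years without fixing the bounds.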
---
### Tab 2: Housing & Affordability
**Story:** "Where can you afford to live, and what's being built?"
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Rent-to-Income Ratio (Affordability Index) | CMHC + Census income |
| KPI Cards | Median Rent (1BR), Vacancy Rate, New Permits (12mo) | CMHC, Building Permits |
| Chart 1 | Rent trend (5-year line chart by bedroom) | CMHC historical |
| Chart 2 | Dwelling type breakdown (pie/bar) | Neighbourhood Profiles |
**Metric Selector:** Toggle between rent, ownership %, dwelling types.
---
### Tab 3: Safety
**Story:** "How safe is each neighbourhood, and what crimes are most common?"
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Total MCI Rate per 100K | Crime Rates |
| KPI Cards | Total Crimes, YoY Change %, Shooting Incidents | Crime Rates, Shootings |
| Chart 1 | Crime type breakdown (stacked bar) | MCI Details |
| Chart 2 | 5-year crime trend (line chart) | Crime Rates historical |
**Metric Selector:** Toggle between total crime, specific crime types, shootings.
---
### Tab 4: Demographics
**Story:** "Who lives here? Age, income, diversity."
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Median Household Income | Neighbourhood Profiles |
| KPI Cards | Population, % Immigrant, Unemployment Rate | Neighbourhood Profiles |
| Chart 1 | Age distribution (population pyramid or bar) | Neighbourhood Profiles |
| Chart 2 | Top languages spoken (horizontal bar) | Neighbourhood Profiles |
**Metric Selector:** Income, immigrant %, age groups, education.
---
### Tab 5: Amenities & Services
**Story:** "What's nearby? Parks, schools, child care, transit."
| Element | Content | Data Source |
|---------|---------|-------------|
| Map Colour | Park Area per Capita | Parks + Population |
| KPI Cards | Parks Count, Schools Count, Child Care Spaces | Multiple datasets |
| Chart 1 | Amenity density comparison (radar or bar) | Calculated |
| Chart 2 | Transit accessibility (stops within 500m) | TTC Stops |
**Metric Selector:** Parks, schools, child care, transit access.
---
## Part 6: Data Pipeline Architecture
### ETL Flow
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ DATA SOURCES │ │ STAGING LAYER │ │ MART LAYER │
│ │ │ │ │ │
│ Toronto Open │────▶│ stg_geography │────▶│ dim_neighbourhood│
│ Data Portal │ │ stg_census │ │ fact_crime │
│ │ │ stg_crime │ │ fact_housing │
│ CMHC Portal │────▶│ stg_rental │ │ fact_amenities │
│ │ │ stg_permits │ │ │
│ Toronto Police │────▶│ stg_amenities │ │ agg_dashboard │
│ Portal │ │ stg_childcare │ │ (pre-computed) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
### Key Transformations
| Transformation | Description |
|----------------|-------------|
| **Geography Standardization** | Ensure all datasets use `neighbourhood_id` (AREA_ID from GeoJSON) |
| **Census Pivot** | Neighbourhood Profiles is wide format — pivot to metrics per neighbourhood |
| **CMHC Zone Mapping** | Create crosswalk from 15 CMHC zones to 158 neighbourhoods |
| **Amenity Aggregation** | Spatial join point data (schools, parks, child care) to neighbourhood polygons |
| **Rate Calculations** | Normalize counts to per-capita or per-100K |
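
The Census pivot is the least obvious of these steps. The sketch below assumes the wide file has one row per indicator and one column per neighbourhood, which should be checked against the actual download; the function name and the `indicator` column label are hypothetical.

```python
def pivot_profiles(wide_rows, indicator_col="indicator"):
    """Turn indicator-per-row records into {neighbourhood: {indicator: value}}."""
    by_neighbourhood = {}
    for row in wide_rows:
        indicator = row[indicator_col]
        for column, value in row.items():
            if column == indicator_col:
                continue  # every other column is assumed to be a neighbourhood
            by_neighbourhood.setdefault(column, {})[indicator] = value
    return by_neighbourhood
```

Rows read with `csv.DictReader` can be passed in directly; values arrive as strings in that case and still need type conversion before any rate calculation.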
### Data Refresh Schedule
| Layer | Frequency | Trigger |
|-------|-----------|---------|
| Staging (API pulls) | Weekly | Scheduled job |
| Marts (transforms) | Weekly | Post-staging |
| Dashboard cache | On-demand | User refresh button |
---
## Part 7: Technical Stack
### Core Stack
| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend** | Plotly Dash | Production-ready, rapid iteration |
| **Mapping** | Plotly `choropleth_mapbox` | Native Dash integration |
| **Data Store** | PostgreSQL + PostGIS | Spatial queries, existing expertise |
| **ETL** | Python (Pandas, SQLAlchemy) | Existing stack |
| **Deployment** | Render / Railway | Free tier, easy Dash hosting |
### Alternative (Portfolio Stretch)
| Component | Technology | Why Consider |
|-----------|------------|--------------|
| **Frontend** | React + deck.gl | More "modern" for portfolio |
| **Data Store** | DuckDB | Serverless, embeddable |
| **ETL** | dbt | Aligns with skills roadmap |
---
## Appendix A: Data Source URLs
| Source | URL |
|--------|-----|
| Toronto Open Data — Neighbourhoods | https://open.toronto.ca/dataset/neighbourhoods/ |
| Toronto Open Data — Neighbourhood Profiles | https://open.toronto.ca/dataset/neighbourhood-profiles/ |
| Toronto Police — Neighbourhood Crime Rates | https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-open-data |
| Toronto Police — MCI | https://data.torontopolice.on.ca/datasets/major-crime-indicators-open-data |
| Toronto Police — Shootings | https://data.torontopolice.on.ca/datasets/shootings-firearm-discharges-open-data |
| CMHC Rental Market Survey | https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market |
| Toronto Open Data — Parks | https://open.toronto.ca/dataset/parks/ |
| Toronto Open Data — Schools | https://open.toronto.ca/dataset/school-locations-all-types/ |
| Toronto Open Data — Building Permits | https://open.toronto.ca/dataset/building-permits-cleared-permits/ |
| Toronto Open Data — Child Care | https://open.toronto.ca/dataset/licensed-child-care-centres/ |
| Toronto Open Data — TTC Routes | https://open.toronto.ca/dataset/ttc-routes-and-schedules/ |
---
## Appendix B: Colour Palettes
### Affordability (Diverging)
| Status | Hex | Usage |
|--------|-----|-------|
| Affordable (<30% income) | `#2ecc71` | Green |
| Stretched (30-50%) | `#f1c40f` | Yellow |
| Unaffordable (>50%) | `#e74c3c` | Red |
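
The three bands map directly onto a small lookup. This sketch takes the thresholds and hex values from the table above; the function name is hypothetical, and treating exactly 30% and 50% as "Stretched" is an assumption about the boundaries.

```python
def affordability_band(ratio):
    """Map a rent-to-income ratio (0.30 means 30%) to a (label, hex colour) pair."""
    if ratio < 0.30:
        return ("Affordable", "#2ecc71")
    if ratio <= 0.50:
        return ("Stretched", "#f1c40f")
    return ("Unaffordable", "#e74c3c")
```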
### Safety (Sequential)
| Status | Hex | Usage |
|--------|-----|-------|
| Safest (lowest crime) | `#27ae60` | Dark green |
| Moderate | `#f39c12` | Orange |
| Highest Crime | `#c0392b` | Dark red |
### Demographics — Income (Sequential)
| Level | Hex | Usage |
|-------|-----|-------|
| Highest Income | `#1a5276` | Dark blue |
| Mid Income | `#5dade2` | Light blue |
| Lowest Income | `#ecf0f1` | Light gray |
### General Recommendation
Use **Viridis** or **Plasma** colorscales for perceptually uniform gradients on continuous metrics.
---
## Appendix C: Glossary
| Term | Definition |
|------|------------|
| **MCI** | Major Crime Indicators — Assault, B&E, Auto Theft, Robbery, Theft Over |
| **CMHC Zone** | Canada Mortgage and Housing Corporation rental market survey zones (15 in Toronto) |
| **Rent-to-Income Ratio** | Monthly rent ÷ monthly household income; <30% is considered affordable |
| **PostGIS** | PostgreSQL extension for geographic data |
| **Choropleth** | Thematic map where areas are shaded based on a statistical variable |
---
## Appendix D: Interview Talking Points
When discussing this project in interviews, emphasize:
1. **Data Engineering:** "I built a multi-source ETL pipeline that standardizes geographic keys across Census data, police data, and CMHC rental surveys—three different granularities I had to reconcile."
2. **Dimensional Modeling:** "The data model follows star schema patterns with a central neighbourhood dimension table and fact tables for crime, housing, and amenities."
3. **dbt Patterns:** "The transformation layer uses staging → intermediate → mart patterns, which I've documented for maintainability."
4. **Business Value:** "The dashboard answers questions like 'Where can a young professional afford to live that's safe and has good transit?' — turning raw data into actionable insights."
5. **Technical Decisions:** "I chose Plotly Dash over a React frontend because it let me iterate faster while maintaining production-quality interactivity. For a portfolio piece, speed to working demo matters."
---
*Document Version: 1.0*
*Created: January 2026*
*Author: Leo Miranda / Claude*