From 4818c53fd29a0ef2dba736a8d89d4e90aa251ff8 Mon Sep 17 00:00:00 2001 From: lmiranda Date: Sat, 17 Jan 2026 12:27:25 -0500 Subject: [PATCH] docs: Rewrite documentation with accurate project state - Delete obsolete change proposals and bio content source - Rewrite README.md with correct features, data sources, structure - Update PROJECT_REFERENCE.md with accurate status and completed work - Update CLAUDE.md references and sprint status - Add docs/CONTRIBUTING.md developer guide with: - How to add blog posts (frontmatter, markdown) - How to add new pages (Dash routing) - How to add dashboard tabs - How to create figure factories - Branch workflow and code standards Co-Authored-By: Claude Opus 4.5 --- CLAUDE.md | 13 +- README.md | 139 +++-- docs/CONTRIBUTING.md | 480 ++++++++++++++++ docs/PROJECT_REFERENCE.md | 519 ++++++++---------- docs/bio_content_v2.md | 134 ----- .../Change-Toronto-Analysis-Reviewed.md | 276 ---------- docs/changes/Change-Toronto-Analysis.md | 423 -------------- 7 files changed, 794 insertions(+), 1190 deletions(-) create mode 100644 docs/CONTRIBUTING.md delete mode 100644 docs/bio_content_v2.md delete mode 100644 docs/changes/Change-Toronto-Analysis-Reviewed.md delete mode 100644 docs/changes/Change-Toronto-Analysis.md diff --git a/CLAUDE.md b/CLAUDE.md index 8a51f15..642e9dd 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -6,8 +6,8 @@ Working context for Claude Code on the Analytics Portfolio project. 
## Project Status -**Current Sprint**: 9 (Neighbourhood Dashboard Transition) - **COMPLETE** -**Phase**: Toronto Neighbourhood Dashboard - Phase 6 & 7 Done +**Last Completed Sprint**: 9 (Neighbourhood Dashboard Transition) +**Current State**: Ready for deployment sprint or new features **Branch**: `development` (feature branches merge here) --- @@ -121,6 +121,7 @@ portfolio_app/ │ └── toronto/ │ ├── dashboard.py # Dashboard -> /toronto │ ├── methodology.py # Methodology -> /toronto/methodology +│ ├── tabs/ # 5 tab layouts (overview, housing, safety, demographics, amenities) │ └── callbacks/ # Dashboard interactions ├── components/ # Shared UI (sidebar, cards, controls) │ ├── metric_card.py # KPI card component @@ -267,9 +268,9 @@ All scripts in `scripts/`: | Document | Location | Use When | |----------|----------|----------| -| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions | -| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification | -| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning | +| Project reference | `docs/PROJECT_REFERENCE.md` | Architecture decisions, completed work | +| Developer guide | `docs/CONTRIBUTING.md` | How to add pages, blog posts, tabs | +| Lessons learned | `docs/project-lessons-learned/INDEX.md` | Past issues and solutions | --- @@ -340,4 +341,4 @@ Every Gitea issue should include: --- -*Last Updated: Sprint 9* +*Last Updated: January 2026 (Post-Sprint 9)* diff --git a/README.md b/README.md index c944d5d..2f5c1b7 100644 --- a/README.md +++ b/README.md @@ -1,36 +1,42 @@ # Analytics Portfolio -A data analytics portfolio showcasing end-to-end data engineering, visualization, and analysis capabilities. +A personal portfolio website showcasing data engineering and visualization capabilities, featuring an interactive Toronto Neighbourhood Dashboard. 
-## Projects +## Live Pages -### Toronto Housing Dashboard +| Route | Page | Description | +|-------|------|-------------| +| `/` | Home | Bio landing page | +| `/about` | About | Background and experience | +| `/projects` | Projects | Portfolio project showcase | +| `/resume` | Resume | Professional CV | +| `/contact` | Contact | Contact form | +| `/blog` | Blog | Technical articles | +| `/blog/{slug}` | Article | Individual blog posts | +| `/toronto` | Toronto Dashboard | Neighbourhood analysis (5 tabs) | +| `/toronto/methodology` | Methodology | Dashboard data sources and methods | +| `/health` | Health | API health check endpoint | -An interactive choropleth dashboard analyzing Toronto's housing market using multi-source data integration. +## Toronto Neighbourhood Dashboard -**Features:** -- Purchase market analysis from TRREB monthly reports -- Rental market analysis from CMHC annual surveys -- Interactive choropleth maps by district/zone -- Time series visualization with policy event annotations -- Purchase/Rental mode toggle +An interactive choropleth dashboard analyzing Toronto's 158 official neighbourhoods across five dimensions: -**Data Sources:** -- [TRREB Market Watch](https://trreb.ca/market-data/market-watch/) - Monthly purchase statistics -- [CMHC Rental Market Survey](https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market) - Annual rental data +- **Overview**: Composite livability scores, income vs safety scatter +- **Housing**: Affordability index, rent trends, dwelling types +- **Safety**: Crime rates, breakdowns by type, trend analysis +- **Demographics**: Income distribution, age pyramids, population density +- **Amenities**: Parks, schools, transit accessibility -**Tech Stack:** -- Python 3.11+ / Dash / Plotly -- PostgreSQL + PostGIS -- dbt for data transformation -- Pydantic for validation -- SQLAlchemy 2.0 +**Data Sources**: +- City of Toronto Open Data Portal (neighbourhoods, census 
profiles, amenities) +- Toronto Police Service (crime statistics) +- CMHC Rental Market Survey (rental data by zone) ## Quick Start ```bash # Clone and setup -git clone https://github.com/lmiranda/personal-portfolio.git +git clone https://gitea.hotserv.cloud/lmiranda/personal-portfolio.git cd personal-portfolio # Install dependencies and configure environment @@ -55,48 +61,72 @@ portfolio_app/ ├── app.py # Dash app factory ├── config.py # Pydantic settings ├── pages/ -│ ├── home.py # Bio landing page (/) -│ └── toronto/ # Toronto dashboard (/toronto) +│ ├── home.py # Bio landing (/) +│ ├── about.py # About page +│ ├── contact.py # Contact form +│ ├── projects.py # Project showcase +│ ├── resume.py # Resume/CV +│ ├── blog/ # Blog system +│ │ ├── index.py # Article listing +│ │ └── article.py # Article renderer +│ └── toronto/ # Toronto dashboard +│ ├── dashboard.py # Main layout with tabs +│ ├── methodology.py # Data documentation +│ ├── tabs/ # Tab layouts (5) +│ └── callbacks/ # Interaction logic ├── components/ # Shared UI components ├── figures/ # Plotly figure factories -└── toronto/ # Toronto data logic - ├── parsers/ # PDF/CSV extraction - ├── loaders/ # Database operations - ├── schemas/ # Pydantic models - └── models/ # SQLAlchemy ORM +├── content/ +│ └── blog/ # Markdown blog articles +├── toronto/ # Toronto data logic +│ ├── parsers/ # API data extraction +│ ├── loaders/ # Database operations +│ ├── schemas/ # Pydantic models +│ └── models/ # SQLAlchemy ORM +└── errors/ # Exception handling dbt/ ├── models/ -│ ├── staging/ # 1:1 source tables -│ ├── intermediate/ # Business logic -│ └── marts/ # Analytical tables +│ ├── staging/ # 1:1 source tables +│ ├── intermediate/ # Business logic +│ └── marts/ # Analytical tables + +notebooks/ # Data documentation (15 notebooks) +├── overview/ # Overview tab visualizations +├── housing/ # Housing tab visualizations +├── safety/ # Safety tab visualizations +├── demographics/ # Demographics tab visualizations +└── 
amenities/ # Amenities tab visualizations + +docs/ +├── PROJECT_REFERENCE.md # Architecture reference +├── CONTRIBUTING.md # Developer guide +└── project-lessons-learned/ ``` +## Tech Stack + +| Layer | Technology | +|-------|------------| +| Database | PostgreSQL 16 + PostGIS | +| Validation | Pydantic 2.x | +| ORM | SQLAlchemy 2.x | +| Transformation | dbt-postgres | +| Data Processing | Pandas, GeoPandas | +| Visualization | Dash + Plotly | +| UI Components | dash-mantine-components | +| Testing | pytest | +| Python | 3.11+ | + ## Development ```bash -make test # Run tests -make lint # Run linter +make test # Run pytest +make lint # Run ruff linter make format # Format code make ci # Run all checks -``` - -## Data Pipeline - -``` -Raw Files (PDF/Excel) - ↓ -Parsers (pdfplumber, pandas) - ↓ -Pydantic Validation - ↓ -SQLAlchemy Loaders - ↓ -PostgreSQL + PostGIS - ↓ -dbt Transformations - ↓ -Dash Visualization +make dbt-run # Run dbt models +make dbt-test # Run dbt tests ``` ## Environment Variables @@ -109,12 +139,19 @@ POSTGRES_USER=portfolio POSTGRES_PASSWORD= POSTGRES_DB=portfolio DASH_DEBUG=true +SECRET_KEY= ``` +## Documentation + +- **For developers**: See `docs/CONTRIBUTING.md` for setup and contribution guidelines +- **For Claude Code**: See `CLAUDE.md` for AI assistant context +- **Architecture**: See `docs/PROJECT_REFERENCE.md` for technical details + ## License MIT ## Author -Leo Miranda - [GitHub](https://github.com/lmiranda) | [LinkedIn](https://linkedin.com/in/yourprofile) +Leo Miranda diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md new file mode 100644 index 0000000..d4a0816 --- /dev/null +++ b/docs/CONTRIBUTING.md @@ -0,0 +1,480 @@ +# Developer Guide + +Instructions for contributing to the Analytics Portfolio project. + +--- + +## Table of Contents + +1. [Development Setup](#development-setup) +2. [Adding a Blog Post](#adding-a-blog-post) +3. [Adding a New Page](#adding-a-new-page) +4. [Adding a Dashboard Tab](#adding-a-dashboard-tab) +5. 
[Creating Figure Factories](#creating-figure-factories) +6. [Branch Workflow](#branch-workflow) +7. [Code Standards](#code-standards) + +--- + +## Development Setup + +### Prerequisites + +- Python 3.11+ (via pyenv) +- Docker and Docker Compose +- Git + +### Initial Setup + +```bash +# Clone repository +git clone https://gitea.hotserv.cloud/lmiranda/personal-portfolio.git +cd personal-portfolio + +# Run setup (creates venv, installs deps, copies .env.example) +make setup + +# Start PostgreSQL + PostGIS +make docker-up + +# Initialize database +make db-init + +# Start development server +make run +``` + +The app runs at `http://localhost:8050`. + +### Useful Commands + +```bash +make test # Run tests +make lint # Check code style +make format # Auto-format code +make ci # Run all checks (lint + test) +make dbt-run # Run dbt transformations +make dbt-test # Run dbt tests +``` + +--- + +## Adding a Blog Post + +Blog posts are Markdown files with YAML frontmatter, stored in `portfolio_app/content/blog/`. + +### Step 1: Create the Markdown File + +Create a new file in `portfolio_app/content/blog/`: + +```bash +touch portfolio_app/content/blog/your-article-slug.md +``` + +The filename becomes the URL slug: `/blog/your-article-slug` + +### Step 2: Add Frontmatter + +Every blog post requires YAML frontmatter at the top: + +```markdown +--- +title: "Your Article Title" +date: "2026-01-17" +description: "A brief description for the article card (1-2 sentences)" +tags: + - data-engineering + - python + - lessons-learned +status: published +--- + +Your article content starts here... 
+``` + +**Required fields:** + +| Field | Description | +|-------|-------------| +| `title` | Article title (displayed on cards and page) | +| `date` | Publication date in `YYYY-MM-DD` format | +| `description` | Short summary for article listing cards | +| `tags` | List of tags (displayed as badges) | +| `status` | `published` or `draft` (drafts are hidden from listing) | + +### Step 3: Write Content + +Use standard Markdown: + +```markdown +## Section Heading + +Regular paragraph text. + +### Subsection + +- Bullet points +- Another point + +```python +# Code blocks with syntax highlighting +def example(): + return "Hello" +``` + +**Bold text** and *italic text*. + +> Blockquotes for callouts +``` + +### Step 4: Test Locally + +```bash +make run +``` + +Visit `http://localhost:8050/blog` to see the article listing. +Visit `http://localhost:8050/blog/your-article-slug` for the full article. + +### Example: Complete Blog Post + +```markdown +--- +title: "Building ETL Pipelines with Python" +date: "2026-01-17" +description: "Lessons from building production data pipelines at scale" +tags: + - python + - etl + - data-engineering +status: published +--- + +When I started building data pipelines, I made every mistake possible... + +## The Problem + +Most tutorials show toy examples. Real pipelines are different. + +### Error Handling + +```python +def safe_transform(df: pd.DataFrame) -> pd.DataFrame: + try: + return df.apply(transform_row, axis=1) + except ValueError as e: + logger.error(f"Transform failed: {e}") + raise +``` + +## Conclusion + +Ship something that works, then iterate. +``` + +--- + +## Adding a New Page + +Pages use Dash's automatic routing based on file location in `portfolio_app/pages/`. 
+ +### Step 1: Create the Page File + +```bash +touch portfolio_app/pages/your_page.py +``` + +### Step 2: Register the Page + +Every page must call `dash.register_page()`: + +```python +"""Your page description.""" + +import dash +import dash_mantine_components as dmc + +dash.register_page( + __name__, + path="/your-page", # URL path + name="Your Page", # Display name (for nav) + title="Your Page Title" # Browser tab title +) + + +def layout() -> dmc.Container: + """Page layout function.""" + return dmc.Container( + dmc.Stack( + [ + dmc.Title("Your Page", order=1), + dmc.Text("Page content here."), + ], + gap="lg", + ), + size="md", + py="xl", + ) +``` + +### Step 3: Page with Dynamic Content + +For pages with URL parameters: + +```python +# pages/blog/article.py +dash.register_page( + __name__, + path_template="/blog/<slug>", # Dynamic parameter + name="Article", +) + + +def layout(slug: str = "") -> dmc.Container: + """Layout receives URL parameters as arguments.""" + article = get_article(slug) + if not article: + return dmc.Text("Article not found") + + return dmc.Container( + dmc.Title(article["meta"]["title"]), + # ... + ) +``` + +### Step 4: Add Navigation (Optional) + +To add the page to the sidebar, edit `portfolio_app/components/sidebar.py`: + +```python +NAV_ITEMS = [ + {"label": "Home", "href": "/", "icon": "tabler:home"}, + {"label": "Your Page", "href": "/your-page", "icon": "tabler:star"}, + # ... +] +``` + +### URL Routing Summary + +| File Location | URL | +|---------------|-----| +| `pages/home.py` | `/` (if `path="/"`) | +| `pages/about.py` | `/about` | +| `pages/blog/index.py` | `/blog` | +| `pages/blog/article.py` | `/blog/<slug>` | +| `pages/toronto/dashboard.py` | `/toronto` | + +--- + +## Adding a Dashboard Tab + +Dashboard tabs are in `portfolio_app/pages/toronto/tabs/`. 
+ +### Step 1: Create Tab Layout + +```python +# pages/toronto/tabs/your_tab.py +"""Your tab description.""" + +import dash_mantine_components as dmc + +from portfolio_app.figures.choropleth import create_choropleth +from portfolio_app.toronto.demo_data import get_demo_data + + +def create_your_tab_layout() -> dmc.Stack: + """Create the tab layout.""" + data = get_demo_data() + + return dmc.Stack( + [ + dmc.Grid( + [ + dmc.GridCol( + # Map on left + create_choropleth(data, "your_metric"), + span=8, + ), + dmc.GridCol( + # KPI cards on right + create_kpi_cards(data), + span=4, + ), + ], + ), + # Charts below + create_supporting_charts(data), + ], + gap="lg", + ) +``` + +### Step 2: Register in Dashboard + +Edit `pages/toronto/dashboard.py` to add the tab: + +```python +from portfolio_app.pages.toronto.tabs.your_tab import create_your_tab_layout + +# In the tabs list: +dmc.TabsTab("Your Tab", value="your-tab"), + +# In the panels: +dmc.TabsPanel(create_your_tab_layout(), value="your-tab"), +``` + +--- + +## Creating Figure Factories + +Figure factories are in `portfolio_app/figures/`. They create reusable Plotly figures. + +### Pattern + +```python +# figures/your_chart.py +"""Your chart type factory.""" + +import plotly.express as px +import plotly.graph_objects as go +import pandas as pd + + +def create_your_chart( + df: pd.DataFrame, + x_col: str, + y_col: str, + title: str = "", +) -> go.Figure: + """Create a your_chart figure. + + Args: + df: DataFrame with data. + x_col: Column for x-axis. + y_col: Column for y-axis. + title: Optional chart title. + + Returns: + Configured Plotly figure. + """ + fig = px.bar(df, x=x_col, y=y_col, title=title) + + fig.update_layout( + template="plotly_white", + margin=dict(l=40, r=40, t=40, b=40), + ) + + return fig +``` + +### Export from `__init__.py` + +```python +# figures/__init__.py +from .your_chart import create_your_chart + +__all__ = [ + "create_your_chart", + # ... 
+] +``` + +--- + +## Branch Workflow + +``` +main (production) + ↑ +staging (pre-production) + ↑ +development (integration) + ↑ +feature/XX-description (your work) +``` + +### Creating a Feature Branch + +```bash +# Start from development +git checkout development +git pull origin development + +# Create feature branch +git checkout -b feature/10-add-new-page + +# Work, commit, push +git add . +git commit -m "feat: Add new page" +git push -u origin feature/10-add-new-page +``` + +### Merging + +```bash +# Merge into development +git checkout development +git merge feature/10-add-new-page +git push origin development + +# Delete feature branch +git branch -d feature/10-add-new-page +git push origin --delete feature/10-add-new-page +``` + +**Rules:** +- Never commit directly to `main` or `staging` +- Never delete `development` +- Feature branches are temporary + +--- + +## Code Standards + +### Type Hints + +Use Python 3.10+ style: + +```python +def process(items: list[str], config: dict[str, int] | None = None) -> bool: + ... +``` + +### Imports + +| Context | Style | +|---------|-------| +| Same directory | `from .module import X` | +| Sibling directory | `from ..schemas.model import Y` | +| External packages | `import pandas as pd` | + +### Formatting + +```bash +make format # Runs ruff formatter +make lint # Checks style +``` + +### Docstrings + +Google style, only for non-obvious functions: + +```python +def calculate_score(values: list[float], weights: list[float]) -> float: + """Calculate weighted score. + + Args: + values: Raw metric values. + weights: Weight for each metric. + + Returns: + Weighted average score. + """ + ... +``` + +--- + +## Questions? + +Check `CLAUDE.md` for AI assistant context and architectural decisions. 
diff --git a/docs/PROJECT_REFERENCE.md b/docs/PROJECT_REFERENCE.md index 43d2735..92ca37f 100644 --- a/docs/PROJECT_REFERENCE.md +++ b/docs/PROJECT_REFERENCE.md @@ -1,21 +1,171 @@ # Portfolio Project Reference **Project**: Analytics Portfolio -**Owner**: Leo -**Status**: Ready for Sprint 1 +**Owner**: Leo Miranda +**Status**: Sprint 9 Complete (Dashboard Implementation Done) +**Last Updated**: January 2026 --- ## Project Overview -Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities. +Personal portfolio website with an interactive Toronto Neighbourhood Dashboard demonstrating data engineering, visualization, and analytics capabilities. -| Project | Domain | Key Skills | Phase | -|---------|--------|------------|-------| -| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) | -| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) | +| Component | Description | Status | +|-----------|-------------|--------| +| Portfolio Website | Bio, About, Projects, Resume, Contact, Blog | Complete | +| Toronto Dashboard | 5-tab neighbourhood analysis | Complete | +| Data Pipeline | dbt models, figure factories | Complete | +| Deployment | Production deployment | Pending | -**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards). 
+--- + +## Completed Work + +### Sprint 1-6: Foundation +- Repository setup, Docker, PostgreSQL + PostGIS +- Bio landing page implementation +- Initial data model design + +### Sprint 7: Navigation & Theme +- Sidebar navigation +- Dark/light theme toggle +- dash-mantine-components integration + +### Sprint 8: Portfolio Website +- About, Contact, Projects, Resume pages +- Blog system with Markdown/frontmatter +- Health endpoint + +### Sprint 9: Neighbourhood Dashboard Transition +- Phase 1: Deleted legacy TRREB code +- Phase 2: Documentation cleanup +- Phase 3: New neighbourhood-centric data model +- Phase 4: dbt model restructuring +- Phase 5: 5-tab dashboard implementation +- Phase 6: 15 documentation notebooks +- Phase 7: Final documentation review + +--- + +## Application Architecture + +### URL Routes + +| URL | Page | File | +|-----|------|------| +| `/` | Home | `pages/home.py` | +| `/about` | About | `pages/about.py` | +| `/contact` | Contact | `pages/contact.py` | +| `/projects` | Projects | `pages/projects.py` | +| `/resume` | Resume | `pages/resume.py` | +| `/blog` | Blog listing | `pages/blog/index.py` | +| `/blog/{slug}` | Article | `pages/blog/article.py` | +| `/toronto` | Dashboard | `pages/toronto/dashboard.py` | +| `/toronto/methodology` | Methodology | `pages/toronto/methodology.py` | +| `/health` | Health check | `pages/health.py` | + +### Directory Structure + +``` +portfolio_app/ +├── app.py # Dash app factory +├── config.py # Pydantic BaseSettings +├── assets/ # CSS, images +├── callbacks/ # Global callbacks (sidebar, theme) +├── components/ # Shared UI components +├── content/blog/ # Markdown blog articles +├── errors/ # Exception handling +├── figures/ # Plotly figure factories +├── pages/ +│ ├── home.py +│ ├── about.py +│ ├── contact.py +│ ├── projects.py +│ ├── resume.py +│ ├── health.py +│ ├── blog/ +│ │ ├── index.py +│ │ └── article.py +│ └── toronto/ +│ ├── dashboard.py +│ ├── methodology.py +│ ├── tabs/ # 5 tab layouts +│ └── callbacks/ 
# Dashboard interactions +├── toronto/ # Data logic +│ ├── parsers/ # API extraction +│ ├── loaders/ # Database operations +│ ├── schemas/ # Pydantic models +│ ├── models/ # SQLAlchemy ORM +│ └── demo_data.py # Sample data +└── utils/ + └── markdown_loader.py # Blog article loading +``` + +--- + +## Toronto Dashboard + +### Data Sources + +| Source | Data | Format | +|--------|------|--------| +| City of Toronto Open Data | Neighbourhoods (158), Census profiles, Parks, Schools, Childcare, TTC | GeoJSON, CSV, API | +| Toronto Police Service | Crime rates, MCI, Shootings | CSV, API | +| CMHC | Rental Market Survey | CSV | + +### Geographic Model + +``` +City of Toronto Neighbourhoods (158) ← Primary analysis unit +CMHC Zones (~20) ← Rental data (Census Tract aligned) +``` + +### Dashboard Tabs + +| Tab | Choropleth Metric | Supporting Charts | +|-----|-------------------|-------------------| +| Overview | Livability score | Top/Bottom 10 bar, Income vs Safety scatter | +| Housing | Affordability index | Rent trend line, Tenure breakdown bar | +| Safety | Crime rate per 100K | Crime breakdown bar, Crime trend line | +| Demographics | Median income | Age distribution, Population density bar | +| Amenities | Amenity index | Amenity radar, Transit accessibility bar | + +### Star Schema + +| Table | Type | Description | +|-------|------|-------------| +| `dim_neighbourhood` | Dimension | 158 neighbourhoods with geometry | +| `dim_time` | Dimension | Date dimension | +| `dim_cmhc_zone` | Dimension | ~20 CMHC zones with geometry | +| `fact_census` | Fact | Census indicators by neighbourhood | +| `fact_crime` | Fact | Crime stats by neighbourhood | +| `fact_rentals` | Fact | Rental data by CMHC zone | +| `fact_amenities` | Fact | Amenity counts by neighbourhood | + +### dbt Layers + +| Layer | Naming | Example | +|-------|--------|---------| +| Staging | `stg_{source}__{entity}` | `stg_toronto__neighbourhoods` | +| Intermediate | `int_{domain}__{transform}` | 
`int_neighbourhood__demographics` | +| Marts | `mart_{domain}` | `mart_neighbourhood_overview` | + +--- + +## Tech Stack + +| Layer | Technology | Version | +|-------|------------|---------| +| Database | PostgreSQL + PostGIS | 16.x | +| Validation | Pydantic | 2.x | +| ORM | SQLAlchemy | 2.x | +| Transformation | dbt-postgres | 1.7+ | +| Data Processing | Pandas, GeoPandas | Latest | +| Visualization | Dash + Plotly | 2.14+ | +| UI Components | dash-mantine-components | Latest | +| Testing | pytest | 7.0+ | +| Python | 3.11+ | Via pyenv | --- @@ -23,293 +173,51 @@ Two-project analytics portfolio demonstrating end-to-end data engineering, visua | Branch | Purpose | Deploys To | |--------|---------|------------| -| `main` | Production releases only | VPS (production) | +| `main` | Production releases | VPS (production) | | `staging` | Pre-production testing | VPS (staging) | | `development` | Active development | Local only | -**Rules**: -- All feature branches created FROM `development` -- All feature branches merge INTO `development` -- `development` → `staging` for testing -- `staging` → `main` for release -- Direct commits to `main` or `staging` are forbidden -- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}` +**Rules:** +- Feature branches from `development`: `feature/{sprint}-{description}` +- Merge into `development` when complete +- `development` → `staging` → `main` for releases +- Never delete `development` --- -## Tech Stack (Locked) +## Code Standards -| Layer | Technology | Version | -|-------|------------|---------| -| Database | PostgreSQL + PostGIS | 16.x | -| Validation | Pydantic | ≥2.0 | -| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) | -| Transformation | dbt-postgres | ≥1.7 | -| Data Processing | Pandas | ≥2.1 | -| Geospatial | GeoPandas + Shapely | ≥0.14 | -| Visualization | Dash + Plotly | ≥2.14 | -| UI Components | dash-mantine-components | Latest stable | -| Testing | pytest | ≥7.0 | -| Python | 3.11+ | Via pyenv | +### 
Type Hints (Python 3.10+) -**Compatibility Notes**: -- SQLAlchemy 2.0 + Pydantic 2.0 integrate well—never mix 1.x APIs -- PostGIS extension required—enable during db init -- Docker Compose V2 (no `version` field in compose files) +```python +def process(items: list[str], config: dict[str, int] | None = None) -> bool: + ... +``` ---- +### Imports -## Code Conventions - -### Import Style - -| Context | Style | Example | -|---------|-------|---------| -| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` | -| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` | -| External packages | Absolute | `import pandas as pd` | - -### Module Separation - -| Directory | Contains | Purpose | -|-----------|----------|---------| -| `schemas/` | Pydantic models | Data validation | -| `models/` | SQLAlchemy ORM | Database persistence | -| `parsers/` | API/CSV extraction | Raw data ingestion | -| `loaders/` | Database operations | Data loading | -| `figures/` | Chart factories | Plotly figure generation | -| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` | -| `errors/` | Exceptions + handlers | Error handling | - -### Code Standards - -- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`) -- **Functions**: Single responsibility, verb naming, early returns over nesting -- **Docstrings**: Google style, minimal—only for non-obvious behavior -- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config +| Context | Style | +|---------|-------| +| Same directory | `from .module import X` | +| Sibling directory | `from ..schemas.model import Y` | +| External | `import pandas as pd` | ### Error Handling ```python -# errors/exceptions.py class PortfolioError(Exception): """Base exception.""" class ParseError(PortfolioError): - """PDF/CSV parsing failed.""" + """Data parsing failed.""" class ValidationError(PortfolioError): - 
"""Pydantic or business rule validation failed.""" + """Validation failed.""" class LoadError(PortfolioError): - """Database load operation failed.""" + """Database load failed.""" ``` -- Decorators for infrastructure concerns (logging, retry, transactions) -- Explicit handling for domain logic (business rules, recovery strategies) - ---- - -## Application Architecture - -### Dash Pages Structure - -``` -portfolio_app/ -├── app.py # Dash app factory with Pages routing -├── config.py # Pydantic BaseSettings -├── assets/ # CSS, images (auto-served by Dash) -├── pages/ -│ ├── home.py # Bio landing page → / -│ ├── toronto/ -│ │ ├── dashboard.py # Layout only → /toronto -│ │ └── callbacks/ # Interaction logic -│ └── energy/ # Phase 3 -├── components/ # Shared UI (navbar, footer, cards) -├── figures/ # Shared chart factories -├── toronto/ # Toronto data logic -│ ├── parsers/ -│ ├── loaders/ -│ ├── schemas/ # Pydantic -│ └── models/ # SQLAlchemy -└── errors/ -``` - -### URL Routing (Automatic) - -| URL | Page | Status | -|-----|------|--------| -| `/` | Bio landing page | Sprint 2 | -| `/toronto` | Toronto Housing Dashboard | Sprint 6 | -| `/energy` | Energy Pricing Dashboard | Phase 3 | - ---- - -## Phase 1: Toronto Neighbourhood Dashboard - -### Data Sources - -| Track | Source | Format | Geography | Frequency | -|-------|--------|--------|-----------|-----------| -| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual | -| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census | -| Policy Events | Curated list | CSV | N/A | Event-based | - -### Geographic Reality - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ City of Toronto Neighbourhoods (158) │ ← Primary analysis unit -├─────────────────────────────────────────────────────────────────┤ -│ CMHC Zones (~20) — Census Tract aligned │ ← Rental data -└─────────────────────────────────────────────────────────────────┘ -``` - -### Data Model (Star 
Schema) - -| Table | Type | Keys | -|-------|------|------| -| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone | -| `dim_time` | Dimension | date_key (PK) | -| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry | -| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry | -| `dim_policy_event` | Dimension | event_id (PK) | - -### dbt Layer Structure - -| Layer | Naming | Purpose | -|-------|--------|---------| -| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed | -| Intermediate | `int_{domain}__{transform}` | Business logic, filtering | -| Marts | `mart_{domain}` | Final analytical tables | - ---- - -## Sprint Overview - -| Sprint | Focus | Milestone | -|--------|-------|-----------| -| 1-6 | Foundation and initial dashboard | **Launch 1: Bio Live** | -| 7 | Navigation & theme modernization | — | -| 8 | Portfolio website expansion | **Launch 2: Website Live** | -| 9 | Neighbourhood dashboard transition | Cleanup complete | -| 10+ | Dashboard implementation | **Launch 3: Dashboard Live** | - ---- - -## Scope Boundaries - -### Phase 1 — Build These - -- Bio landing page and portfolio website -- CMHC rental data processor -- Toronto neighbourhood data integration -- PostgreSQL + PostGIS database layer -- Star schema (facts + dimensions) -- dbt models with tests -- Choropleth visualization (Dash) -- Policy event annotation layer - -### Deferred Features - -| Feature | Reason | When | -|---------|--------|------| -| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase | -| ML prediction models | Energy project scope | Phase 3 | -| Multi-project shared infrastructure | Build first, abstract second | Future | - -If a task seems to require deferred features, **stop and flag it**. 
- ---- - -## File Structure - -### Root-Level Files (Allowed) - -| File | Purpose | -|------|---------| -| `README.md` | Project overview | -| `CLAUDE.md` | AI assistant context | -| `pyproject.toml` | Python packaging | -| `.gitignore` | Git ignore rules | -| `.env.example` | Environment template | -| `.python-version` | pyenv version | -| `.pre-commit-config.yaml` | Pre-commit hooks | -| `docker-compose.yml` | Container orchestration | -| `Makefile` | Task automation | - -### Directory Structure - -``` -portfolio/ -├── portfolio_app/ # Monolithic Dash application -│ ├── app.py -│ ├── config.py -│ ├── assets/ -│ ├── pages/ -│ ├── components/ -│ ├── figures/ -│ ├── toronto/ -│ └── errors/ -├── tests/ -├── dbt/ -├── data/ -│ └── toronto/ -│ ├── raw/ -│ ├── processed/ # gitignored -│ └── reference/ -├── scripts/ -│ ├── db/ -│ ├── docker/ -│ ├── deploy/ -│ ├── dbt/ -│ └── dev/ -├── docs/ -├── notebooks/ -├── backups/ # gitignored -└── reports/ # gitignored -``` - -### Gitignored Directories - -- `data/*/processed/` -- `reports/` -- `backups/` -- `notebooks/*.html` -- `.env` -- `__pycache__/` -- `.venv/` - ---- - -## Makefile Targets - -| Target | Purpose | -|--------|---------| -| `setup` | Install deps, create .env, init pre-commit | -| `docker-up` | Start PostgreSQL + PostGIS | -| `docker-down` | Stop containers | -| `db-init` | Initialize database schema | -| `run` | Start Dash dev server | -| `test` | Run pytest | -| `dbt-run` | Run dbt models | -| `dbt-test` | Run dbt tests | -| `lint` | Run ruff linter | -| `format` | Run ruff formatter | -| `ci` | Run all checks | -| `deploy` | Deploy to production | - ---- - -## Script Standards - -All scripts in `scripts/`: -- Include usage comments at top -- Idempotent where possible -- Exit codes: 0 = success, 1 = error -- Use `set -euo pipefail` for bash -- Log to stdout, errors to stderr - --- ## Environment Variables @@ -328,41 +236,52 @@ LOG_LEVEL=INFO --- -## Success Criteria +## Makefile Targets -### Launch 1 (Bio 
Live) -- [x] Bio page accessible via HTTPS -- [x] All bio content rendered -- [x] No placeholder text visible -- [x] Mobile responsive -- [x] Social links functional - -### Launch 2 (Website Live) -- [x] Full portfolio website with navigation -- [x] About, Contact, Projects, Resume, Blog pages -- [x] Dark mode theme support -- [x] Sidebar navigation - -### Launch 3 (Dashboard Live) -- [ ] Choropleth renders neighbourhoods and CMHC zones -- [ ] Rental data visualization works -- [ ] Time navigation works -- [ ] Policy event markers visible -- [ ] Methodology documentation published -- [ ] Data sources cited +| Target | Purpose | +|--------|---------| +| `setup` | Install deps, create .env, init pre-commit | +| `docker-up` | Start PostgreSQL + PostGIS | +| `docker-down` | Stop containers | +| `db-init` | Initialize database schema | +| `run` | Start Dash dev server | +| `test` | Run pytest | +| `dbt-run` | Run dbt models | +| `dbt-test` | Run dbt tests | +| `lint` | Run ruff linter | +| `format` | Run ruff formatter | +| `ci` | Run all checks | --- -## Reference Documents +## Next Steps -For detailed specifications, see: +### Deployment (Sprint 10+) +- [ ] Production Docker configuration +- [ ] CI/CD pipeline +- [ ] HTTPS/SSL setup +- [ ] Domain configuration -| Document | Location | Use When | -|----------|----------|----------| -| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification | -| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning | +### Data Enhancement +- [ ] Connect to live APIs (currently using demo data) +- [ ] Data refresh automation +- [ ] Historical data loading + +### Future Projects +- Energy Pricing Analysis dashboard (planned) --- -*Reference Version: 2.0* -*Updated: Sprint 9* +## Related Documents + +| Document | Purpose | +|----------|---------| +| `README.md` | Quick start guide | +| `CLAUDE.md` | AI assistant context | +| `docs/CONTRIBUTING.md` | Developer guide | +| 
`notebooks/README.md` | Notebook documentation | + +--- + +*Reference Version: 3.0* +*Updated: January 2026* diff --git a/docs/bio_content_v2.md b/docs/bio_content_v2.md deleted file mode 100644 index 1a21860..0000000 --- a/docs/bio_content_v2.md +++ /dev/null @@ -1,134 +0,0 @@ -# Portfolio Bio Content - -**Version**: 2.0 -**Last Updated**: January 2026 -**Purpose**: Content source for `portfolio_app/pages/home.py` - ---- - -## Document Context - -| Attribute | Value | -|-----------|-------| -| **Parent Document** | `portfolio_project_plan_v5.md` | -| **Role** | Bio content and social links for landing page | -| **Consumed By** | `portfolio_app/pages/home.py` | - ---- - -## Headline - -**Primary**: Leo | Data Engineer & Analytics Developer - -**Tagline**: I build data infrastructure that actually gets used. - ---- - -## Professional Summary - -Over the past 5 years, I've designed and evolved an enterprise analytics platform from scratch—now processing 1B+ rows across 21 tables with Python-based ETL pipelines and dbt-style SQL transformations. The result: 40% efficiency gains, 30% reduction in call abandon rates, and dashboards that executives actually open. - -My approach: dimensional modeling (star schema), layered transformations (staging → intermediate → marts), and automation that eliminates manual work. I've built everything from self-service analytics portals to OCR-powered receipt processing systems. - -Currently at Summitt Energy supporting multi-market operations across Canada and 8 US states. Previously cut my teeth on IT infrastructure projects at Petrobras (Fortune 500) and the Project Management Institute. 
- ---- - -## Tech Stack - -| Category | Technologies | -|----------|--------------| -| **Languages** | Python, SQL | -| **Data Processing** | Pandas, SQLAlchemy, FastAPI | -| **Databases** | PostgreSQL, MSSQL | -| **Visualization** | Power BI, Plotly, Dash | -| **Patterns** | dbt, dimensional modeling, star schema | -| **Other** | Genesys Cloud | - -**Display Format** (for landing page): -``` -Python (Pandas, SQLAlchemy, FastAPI) • SQL (MSSQL, PostgreSQL) • Power BI • Plotly/Dash • Genesys Cloud • dbt patterns -``` - ---- - -## Side Project - -**Bandit Labs** — Building automation and AI tooling for small businesses. - -*Note: Keep this brief on portfolio; link only if separate landing page exists.* - ---- - -## Social Links - -| Platform | URL | Icon | -|----------|-----|------| -| **LinkedIn** | `https://linkedin.com/in/[USERNAME]` | `lucide-react: Linkedin` | -| **GitHub** | `https://github.com/[USERNAME]` | `lucide-react: Github` | - -> **TODO**: Replace `[USERNAME]` placeholders with actual URLs before bio page launch. - ---- - -## Availability Statement - -Open to **Senior Data Analyst**, **Analytics Engineer**, and **BI Developer** opportunities in Toronto or remote. - ---- - -## Portfolio Projects Section - -*Dynamically populated based on deployed projects.* - -| Project | Status | Link | -|---------|--------|------| -| Toronto Housing Dashboard | In Development | `/toronto` | -| Energy Pricing Analysis | Planned | `/energy` | - -**Display Logic**: -- Show only projects with `status = deployed` -- "In Development" projects can show as coming soon or be hidden (user preference) - ---- - -## Implementation Notes - -### Content Hierarchy for `home.py` - -``` -1. Name + Tagline (hero section) -2. Professional Summary (2-3 paragraphs) -3. Tech Stack (horizontal chips or inline list) -4. Portfolio Projects (cards linking to dashboards) -5. Social Links (icon buttons) -6. 
Availability statement (subtle, bottom) -``` - -### Styling Recommendations - -- Clean, minimal — let the projects speak -- Dark/light mode support via dash-mantine-components theme -- No headshot required (optional) -- Mobile-responsive layout - -### Content Updates - -When updating bio content: -1. Edit this document -2. Update `home.py` to reflect changes -3. Redeploy - ---- - -## Related Documents - -| Document | Relationship | -|----------|--------------| -| `portfolio_project_plan_v5.md` | Parent — references this for bio content | -| `portfolio_app/pages/home.py` | Consumer — implements this content | - ---- - -*Document Version: 2.0* -*Updated: January 2026* diff --git a/docs/changes/Change-Toronto-Analysis-Reviewed.md b/docs/changes/Change-Toronto-Analysis-Reviewed.md deleted file mode 100644 index 4ace6b9..0000000 --- a/docs/changes/Change-Toronto-Analysis-Reviewed.md +++ /dev/null @@ -1,276 +0,0 @@ -# Toronto Neighbourhood Dashboard — Implementation Plan - -**Document Type:** Execution Guide -**Target:** Transition from TRREB-based to Neighbourhood-based Dashboard -**Version:** 2.0 | January 2026 - ---- - -## Overview - -Transition from TRREB district-based housing dashboard to a comprehensive Toronto Neighbourhood Dashboard built around the city's 158 official neighbourhoods. 
- -**Key Changes:** -- Geographic foundation: TRREB districts (~35) → City Neighbourhoods (158) -- Data sources: PDF parsing → Open APIs (Toronto Open Data, Toronto Police, CMHC) -- Scope: Housing-only → 5 thematic tabs (Overview, Housing, Safety, Demographics, Amenities) - ---- - -## Phase 1: Repository Cleanup - -### Files to DELETE - -| File | Reason | -|------|--------| -| `portfolio_app/toronto/schemas/trreb.py` | TRREB schema obsolete | -| `portfolio_app/toronto/parsers/trreb.py` | PDF parsing no longer needed | -| `portfolio_app/toronto/loaders/trreb.py` | TRREB loading logic obsolete | -| `dbt/models/staging/stg_trreb__purchases.sql` | TRREB staging obsolete | -| `dbt/models/intermediate/int_purchases__monthly.sql` | TRREB intermediate obsolete | -| `dbt/models/marts/mart_toronto_purchases.sql` | Will rebuild for neighbourhood grain | - -### Files to MODIFY (Remove TRREB References) - -| File | Action | -|------|--------| -| `portfolio_app/toronto/schemas/__init__.py` | Remove TRREB imports | -| `portfolio_app/toronto/parsers/__init__.py` | Remove TRREB parser imports | -| `portfolio_app/toronto/loaders/__init__.py` | Remove TRREB loader imports | -| `portfolio_app/toronto/models/facts.py` | Remove `FactPurchases` model | -| `portfolio_app/toronto/models/dimensions.py` | Remove `DimTRREBDistrict` model | -| `portfolio_app/toronto/demo_data.py` | Remove TRREB demo data | -| `dbt/models/sources.yml` | Remove TRREB source definitions | -| `dbt/models/schema.yml` | Remove TRREB model documentation | - -### Files to KEEP (Reusable) - -| File | Why | -|------|-----| -| `portfolio_app/toronto/schemas/cmhc.py` | CMHC data still used | -| `portfolio_app/toronto/parsers/cmhc.py` | Reusable with modifications | -| `portfolio_app/toronto/loaders/base.py` | Generic database utilities | -| `portfolio_app/toronto/loaders/dimensions.py` | Dimension loading patterns | -| `portfolio_app/toronto/models/base.py` | SQLAlchemy base class | -| `portfolio_app/figures/*.py` | All 
chart factories reusable | -| `portfolio_app/components/*.py` | All UI components reusable | - ---- - -## Phase 2: Documentation Updates - -| Document | Action | -|----------|--------| -| `CLAUDE.md` | Update data model section, mark transition complete | -| `docs/PROJECT_REFERENCE.md` | Update architecture, data sources | -| `docs/toronto_housing_dashboard_spec_v5.md` | Archive or delete | -| `docs/wbs_sprint_plan_v4.md` | Archive or delete | - ---- - -## Phase 3: New Data Model - -### Star Schema (Neighbourhood-Centric) - -| Table | Type | Description | -|-------|------|-------------| -| `dim_neighbourhood` | Central Dimension | 158 neighbourhoods with geometry | -| `dim_time` | Dimension | Date dimension (keep existing) | -| `dim_cmhc_zone` | Bridge Dimension | 15 CMHC zones with neighbourhood mapping | -| `bridge_cmhc_neighbourhood` | Bridge | Zone-to-neighbourhood area weights | -| `fact_census` | Fact | Census indicators by neighbourhood | -| `fact_crime` | Fact | Crime stats by neighbourhood | -| `fact_rentals` | Fact | Rental data by CMHC zone (keep existing) | -| `fact_amenities` | Fact | Amenity counts by neighbourhood | - -### New Schema Files - -| File | Contains | -|------|----------| -| `toronto/schemas/neighbourhood.py` | NeighbourhoodRecord, CensusRecord, CrimeRecord | -| `toronto/schemas/amenities.py` | AmenityType enum, AmenityRecord | - -### New Parser Files - -| File | Data Source | API | -|------|-------------|-----| -| `toronto/parsers/toronto_open_data.py` | Neighbourhoods, Census, Parks, Schools, Childcare | Toronto Open Data Portal | -| `toronto/parsers/toronto_police.py` | Crime Rates, MCI, Shootings | Toronto Police Portal | - -### New Loader Files - -| File | Purpose | -|------|---------| -| `toronto/loaders/neighbourhoods.py` | Load GeoJSON boundaries | -| `toronto/loaders/census.py` | Load neighbourhood profiles | -| `toronto/loaders/crime.py` | Load crime statistics | -| `toronto/loaders/amenities.py` | Load parks, schools, childcare 
| -| `toronto/loaders/cmhc_crosswalk.py` | Build CMHC-neighbourhood bridge | - ---- - -## Phase 4: dbt Restructuring - -### Staging Layer - -| Model | Source | -|-------|--------| -| `stg_toronto__neighbourhoods` | dim_neighbourhood | -| `stg_toronto__census` | fact_census | -| `stg_toronto__crime` | fact_crime | -| `stg_toronto__amenities` | fact_amenities | -| `stg_cmhc__rentals` | fact_rentals (modify existing) | -| `stg_cmhc__zone_crosswalk` | bridge_cmhc_neighbourhood | - -### Intermediate Layer - -| Model | Purpose | -|-------|---------| -| `int_neighbourhood__demographics` | Combined census demographics | -| `int_neighbourhood__housing` | Housing indicators | -| `int_neighbourhood__crime_summary` | Aggregated crime by type | -| `int_neighbourhood__amenity_scores` | Normalized amenity metrics | -| `int_rentals__neighbourhood_allocated` | CMHC rentals allocated to neighbourhoods | - -### Mart Layer (One per Tab) - -| Model | Tab | Key Metrics | -|-------|-----|-------------| -| `mart_neighbourhood_overview` | Overview | Composite livability score | -| `mart_neighbourhood_housing` | Housing | Affordability index, rent-to-income | -| `mart_neighbourhood_safety` | Safety | Crime rates, YoY change | -| `mart_neighbourhood_demographics` | Demographics | Income, age, diversity | -| `mart_neighbourhood_amenities` | Amenities | Parks, schools, transit per capita | - ---- - -## Phase 5: Dashboard Implementation - -### Tab Structure - -``` -pages/toronto/ -├── dashboard.py # Main layout with tab navigation -├── tabs/ -│ ├── overview.py # Composite livability -│ ├── housing.py # Affordability -│ ├── safety.py # Crime -│ ├── demographics.py # Population -│ └── amenities.py # Services -└── callbacks/ - ├── map_callbacks.py - ├── chart_callbacks.py - └── selection_callbacks.py -``` - -### Layout Pattern (All Tabs) - -Each tab follows the same structure: -1. **Choropleth Map** (left) — 158 neighbourhoods, click to select -2. **KPI Cards** (right) — 3-4 contextual metrics -3. 
**Supporting Charts** (bottom) — Trend + comparison visualizations -4. **Details Panel** (collapsible) — All metrics for selected neighbourhood - -### Graphs by Tab - -| Tab | Choropleth Metric | Chart 1 | Chart 2 | -|-----|-------------------|---------|---------| -| Overview | Livability score | Top/Bottom 10 bar | Income vs Crime scatter | -| Housing | Affordability index | Rent trend (5yr line) | Dwelling types (pie/bar) | -| Safety | Crime rate per 100K | Crime breakdown (stacked bar) | Crime trend (5yr line) | -| Demographics | Median income | Age pyramid | Top languages (bar) | -| Amenities | Park area per capita | Amenity radar | Transit accessibility (bar) | - ---- - -## Phase 6: Jupyter Notebooks - -### Purpose - -One notebook per graph to document: -1. **Data Reference** — How the data was built (query, transformation steps, sample output) -2. **Data Visualization** — Import figure factory, render the graph - -### Directory Structure - -``` -notebooks/ -├── README.md -├── overview/ -├── housing/ -├── safety/ -├── demographics/ -└── amenities/ -``` - -### Notebook Template - -```markdown -# [Graph Name] - -## 1. Data Reference - -### Source Tables -- List tables/marts used -- Grain of each table - -### Query -```sql -SELECT ... FROM ... -``` - -### Transformation Steps -1. Step description -2. Step description - -### Sample Data -```python -df = pd.read_sql(query, engine) -df.head(10) -``` - -## 2. Data Visualization - -```python -from portfolio_app.figures.choropleth import create_choropleth_figure -fig = create_choropleth_figure(...) -fig.show() -``` -``` - -Create one notebook per graph as each is implemented (15 total across 5 tabs). 
- ---- - -## Phase 7: Final Documentation Review - -After all implementation, audit and update: - -- [ ] `CLAUDE.md` — Project status, app structure, data model, URL routes -- [ ] `README.md` — Project description, installation, quick start -- [ ] `docs/PROJECT_REFERENCE.md` — Architecture matches implementation -- [ ] Remove or archive legacy spec documents - ---- - -## Data Source Reference - -| Source | Datasets | URL | -|--------|----------|-----| -| Toronto Open Data | Neighbourhoods, Census Profiles, Parks, Schools, Childcare, TTC | open.toronto.ca | -| Toronto Police | Crime Rates, MCI, Shootings | data.torontopolice.on.ca | -| CMHC | Rental Market Survey | cmhc-schl.gc.ca | - ---- - -## CMHC Zone Mapping Note - -CMHC uses 15 zones that don't align with 158 neighbourhoods. Strategy: -- Create `bridge_cmhc_neighbourhood` with area weights -- Allocate rental metrics proportionally to overlapping neighbourhoods -- Document methodology in `/toronto/methodology` page - ---- - -*Document Version: 2.0* -*Trimmed from v1.0 for execution clarity* diff --git a/docs/changes/Change-Toronto-Analysis.md b/docs/changes/Change-Toronto-Analysis.md deleted file mode 100644 index 3b25890..0000000 --- a/docs/changes/Change-Toronto-Analysis.md +++ /dev/null @@ -1,423 +0,0 @@ -# Toronto Neighbourhood Dashboard — Deliverables - -**Project Type:** Interactive Data Visualization Dashboard -**Geographic Scope:** City of Toronto, 158 Official Neighbourhoods -**Author:** Leo Miranda -**Version:** 1.0 | January 2026 - ---- - -## Executive Summary - -Multi-tab analytics dashboard built around Toronto's official neighbourhood boundaries. The core interaction is a choropleth map where users explore the city through different thematic lenses—housing affordability, safety, demographics, amenities—with supporting visualizations that tell a cohesive story per theme. - -**Primary Goals:** -1. Demonstrate interactive data visualization skills (Plotly/Dash) -2. 
Showcase data engineering capabilities (multi-source ETL, dimensional modeling) -3. Create a portfolio piece with genuine analytical value - ---- - -## Part 1: Geographic Foundation (Required First) - -| Dataset | Source | Format | Last Updated | Download | -|---------|--------|--------|--------------|----------| -| **Neighbourhoods Boundaries** | Toronto Open Data | GeoJSON | 2024 | [Link](https://open.toronto.ca/dataset/neighbourhoods/) | -| **Neighbourhood Profiles** | Toronto Open Data | CSV | 2021 Census | [Link](https://open.toronto.ca/dataset/neighbourhood-profiles/) | - -**Critical Notes:** -- Toronto uses 158 official neighbourhoods (updated 2024, was 140) -- GeoJSON includes `AREA_ID` for joining to tabular data -- Neighbourhood Profiles has 2,400+ indicators per neighbourhood from Census - ---- - -## Part 2: Tier 1 — MVP Datasets - -| Dataset | Source | Measures Available | Update Freq | Granularity | -|---------|--------|-------------------|-------------|-------------| -| **Neighbourhoods GeoJSON** | Toronto Open Data | Boundary polygons, area IDs | Static | Neighbourhood | -| **Neighbourhood Profiles (full)** | Toronto Open Data | 2,400+ Census indicators | Every 5 years | Neighbourhood | -| **Neighbourhood Crime Rates** | Toronto Police Portal | MCI rates per 100K by year | Annual | Neighbourhood | -| **CMHC Rental Market Survey** | CMHC Portal | Avg rent by bedroom, vacancy rate | Annual (Oct) | 15 CMHC Zones | -| **Parks** | Toronto Open Data | Park locations, area, type | Annual | Point/Polygon | - -**Total API/Download Calls:** 5 -**Data Volume:** ~50MB combined - -### Tier 1 Measures to Extract - -**From Neighbourhood Profiles:** -- Population, population density -- Median household income -- Age distribution (0-14, 15-24, 25-44, 45-64, 65+) -- % Immigrants, % Visible minorities -- Top languages spoken -- Unemployment rate -- Education attainment (% with post-secondary) -- Housing tenure (own vs rent %) -- Dwelling types distribution -- Average 
rent, housing costs as % of income - -**From Crime Rates:** -- Total MCI rate per 100K population -- Year-over-year crime trend - -**From CMHC:** -- Average monthly rent (1BR, 2BR, 3BR) -- Vacancy rates - -**From Parks:** -- Park count per neighbourhood -- Park area per capita - ---- - -## Part 3: Tier 2 — Expansion Datasets - -| Dataset | Source | Measures Available | Update Freq | Granularity | -|---------|--------|-------------------|-------------|-------------| -| **Major Crime Indicators (MCI)** | Toronto Police Portal | Assault, B&E, auto theft, robbery, theft over | Quarterly | Neighbourhood | -| **Shootings & Firearm Discharges** | Toronto Police Portal | Shooting incidents, injuries, fatalities | Quarterly | Neighbourhood | -| **Building Permits** | Toronto Open Data | New construction, permits by type | Monthly | Address-level | -| **Schools** | Toronto Open Data | Public/Catholic, elementary/secondary | Annual | Point | -| **TTC Routes & Stops** | Toronto Open Data | Route geometry, stop locations | Static | Route/Stop | -| **Licensed Child Care Centres** | Toronto Open Data | Capacity, ages served, locations | Annual | Point | - -### Tier 2 Measures to Extract - -**From MCI Details:** -- Breakdown by crime type (assault, B&E, auto theft, robbery, theft over) - -**From Shootings:** -- Shooting incidents count -- Injuries/fatalities - -**From Building Permits:** -- New construction permits (trailing 12 months) -- Permit types distribution - -**From Schools:** -- Schools per 1000 children -- School type breakdown - -**From TTC:** -- Transit stops within neighbourhood -- Transit accessibility score - -**From Child Care:** -- Child care spaces per capita -- Coverage by age group - ---- - -## Part 4: Data Sources by Thematic Group - -### GROUP A: Housing & Affordability - -| Dataset | Tier | Measures | Update Freq | -|---------|------|----------|-------------| -| Neighbourhood Profiles (Housing) | 1 | Avg rent, ownership %, dwelling types, housing costs as % 
of income | Every 5 years | -| CMHC Rental Market Survey | 1 | Avg rent by bedroom, vacancy rate, rental universe | Annual | -| Building Permits | 2 | New construction, permits by type | Monthly | - -**Calculated Metrics:** -- Rent-to-Income Ratio (CMHC rent ÷ Census income) -- Affordability Index (% of income spent on housing) - ---- - -### GROUP B: Safety & Crime - -| Dataset | Tier | Measures | Update Freq | -|---------|------|----------|-------------| -| Neighbourhood Crime Rates | 1 | MCI rates per 100K pop by year | Annual | -| Major Crime Indicators (MCI) | 2 | Assault, B&E, auto theft, robbery, theft over | Quarterly | -| Shootings & Firearm Discharges | 2 | Shooting incidents, injuries, fatalities | Quarterly | - -**Calculated Metrics:** -- Year-over-year crime change % -- Crime type distribution - ---- - -### GROUP C: Demographics & Community - -| Dataset | Tier | Measures | Update Freq | -|---------|------|----------|-------------| -| Neighbourhood Profiles (Demographics) | 1 | Age distribution, household composition, income | Every 5 years | -| Neighbourhood Profiles (Immigration) | 1 | Immigration status, visible minorities, languages | Every 5 years | -| Neighbourhood Profiles (Education) | 1 | Education attainment, field of study | Every 5 years | -| Neighbourhood Profiles (Labour) | 1 | Employment rate, occupation, industry | Every 5 years | - ---- - -### GROUP D: Transportation & Mobility - -| Dataset | Tier | Measures | Update Freq | -|---------|------|----------|-------------| -| Commute Mode (Census) | 1 | % car, transit, walk, bike | Every 5 years | -| TTC Routes & Stops | 2 | Route geometry, stop locations | Static | - -**Calculated Metrics:** -- Transit accessibility (stops within 500m of neighbourhood centroid) - ---- - -### GROUP E: Amenities & Services - -| Dataset | Tier | Measures | Update Freq | -|---------|------|----------|-------------| -| Parks | 1 | Park locations, area, type | Annual | -| Schools | 2 | Public/Catholic, 
elementary/secondary | Annual | -| Licensed Child Care Centres | 2 | Capacity, ages served | Annual | - -**Calculated Metrics:** -- Park area per capita -- Schools per 1000 children (ages 5-17) -- Child care spaces per 1000 children (ages 0-4) - ---- - -## Part 5: Tab Structure - -### Tab Architecture - -``` -┌────────────────────────────────────────────────────────────────┐ -│ [Overview] [Housing] [Safety] [Demographics] [Amenities] │ -├────────────────────────────────────────────────────────────────┤ -│ │ -│ ┌─────────────────────────────────┐ ┌────────────────┐ │ -│ │ │ │ KPI Card 1 │ │ -│ │ CHOROPLETH MAP │ ├────────────────┤ │ -│ │ (158 Neighbourhoods) │ │ KPI Card 2 │ │ -│ │ │ ├────────────────┤ │ -│ │ Click to select │ │ KPI Card 3 │ │ -│ │ │ └────────────────┘ │ -│ └─────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────┐ ┌─────────────────────┐ │ -│ │ Supporting Chart 1 │ │ Supporting Chart 2 │ │ -│ │ (Context/Trend) │ │ (Comparison/Rank) │ │ -│ └─────────────────────┘ └─────────────────────┘ │ -│ │ -│ [Neighbourhood: Selected Name] ──────────────────────── │ -│ Details panel with all metrics for selected area │ -└────────────────────────────────────────────────────────────────┘ -``` - ---- - -### Tab 1: Overview (Default Landing) - -**Story:** "How do Toronto neighbourhoods compare across key livability metrics?" - -| Element | Content | Data Source | -|---------|---------|-------------| -| Map Colour | Composite livability score | Calculated from weighted metrics | -| KPI Cards | Population, Median Income, Avg Crime Rate | Neighbourhood Profiles, Crime Rates | -| Chart 1 | Top 10 / Bottom 10 by livability score | Calculated | -| Chart 2 | Income vs Crime scatter plot | Neighbourhood Profiles, Crime Rates | - -**Metric Selector:** Allow user to change map colour by any single metric. - ---- - -### Tab 2: Housing & Affordability - -**Story:** "Where can you afford to live, and what's being built?" 
- -| Element | Content | Data Source | -|---------|---------|-------------| -| Map Colour | Rent-to-Income Ratio (Affordability Index) | CMHC + Census income | -| KPI Cards | Median Rent (1BR), Vacancy Rate, New Permits (12mo) | CMHC, Building Permits | -| Chart 1 | Rent trend (5-year line chart by bedroom) | CMHC historical | -| Chart 2 | Dwelling type breakdown (pie/bar) | Neighbourhood Profiles | - -**Metric Selector:** Toggle between rent, ownership %, dwelling types. - ---- - -### Tab 3: Safety - -**Story:** "How safe is each neighbourhood, and what crimes are most common?" - -| Element | Content | Data Source | -|---------|---------|-------------| -| Map Colour | Total MCI Rate per 100K | Crime Rates | -| KPI Cards | Total Crimes, YoY Change %, Shooting Incidents | Crime Rates, Shootings | -| Chart 1 | Crime type breakdown (stacked bar) | MCI Details | -| Chart 2 | 5-year crime trend (line chart) | Crime Rates historical | - -**Metric Selector:** Toggle between total crime, specific crime types, shootings. - ---- - -### Tab 4: Demographics - -**Story:** "Who lives here? Age, income, diversity." - -| Element | Content | Data Source | -|---------|---------|-------------| -| Map Colour | Median Household Income | Neighbourhood Profiles | -| KPI Cards | Population, % Immigrant, Unemployment Rate | Neighbourhood Profiles | -| Chart 1 | Age distribution (population pyramid or bar) | Neighbourhood Profiles | -| Chart 2 | Top languages spoken (horizontal bar) | Neighbourhood Profiles | - -**Metric Selector:** Income, immigrant %, age groups, education. - ---- - -### Tab 5: Amenities & Services - -**Story:** "What's nearby? Parks, schools, child care, transit." 
- -| Element | Content | Data Source | -|---------|---------|-------------| -| Map Colour | Park Area per Capita | Parks + Population | -| KPI Cards | Parks Count, Schools Count, Child Care Spaces | Multiple datasets | -| Chart 1 | Amenity density comparison (radar or bar) | Calculated | -| Chart 2 | Transit accessibility (stops within 500m) | TTC Stops | - -**Metric Selector:** Parks, schools, child care, transit access. - ---- - -## Part 6: Data Pipeline Architecture - -### ETL Flow - -``` -┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ -│ DATA SOURCES │ │ STAGING LAYER │ │ MART LAYER │ -│ │ │ │ │ │ -│ Toronto Open │────▶│ stg_geography │────▶│ dim_neighbourhood│ -│ Data Portal │ │ stg_census │ │ fact_crime │ -│ │ │ stg_crime │ │ fact_housing │ -│ CMHC Portal │────▶│ stg_rental │ │ fact_amenities │ -│ │ │ stg_permits │ │ │ -│ Toronto Police │────▶│ stg_amenities │ │ agg_dashboard │ -│ Portal │ │ stg_childcare │ │ (pre-computed) │ -└─────────────────┘ └─────────────────┘ └─────────────────┘ -``` - -### Key Transformations - -| Transformation | Description | -|----------------|-------------| -| **Geography Standardization** | Ensure all datasets use `neighbourhood_id` (AREA_ID from GeoJSON) | -| **Census Pivot** | Neighbourhood Profiles is wide format — pivot to metrics per neighbourhood | -| **CMHC Zone Mapping** | Create crosswalk from 15 CMHC zones to 158 neighbourhoods | -| **Amenity Aggregation** | Spatial join point data (schools, parks, child care) to neighbourhood polygons | -| **Rate Calculations** | Normalize counts to per-capita or per-100K | - -### Data Refresh Schedule - -| Layer | Frequency | Trigger | -|-------|-----------|---------| -| Staging (API pulls) | Weekly | Scheduled job | -| Marts (transforms) | Weekly | Post-staging | -| Dashboard cache | On-demand | User refresh button | - ---- - -## Part 7: Technical Stack - -### Core Stack - -| Component | Technology | Rationale | -|-----------|------------|-----------| -| **Frontend** | 
Plotly Dash | Production-ready, rapid iteration | -| **Mapping** | Plotly `choropleth_mapbox` | Native Dash integration | -| **Data Store** | PostgreSQL + PostGIS | Spatial queries, existing expertise | -| **ETL** | Python (Pandas, SQLAlchemy) | Existing stack | -| **Deployment** | Render / Railway | Free tier, easy Dash hosting | - -### Alternative (Portfolio Stretch) - -| Component | Technology | Why Consider | -|-----------|------------|--------------| -| **Frontend** | React + deck.gl | More "modern" for portfolio | -| **Data Store** | DuckDB | Serverless, embeddable | -| **ETL** | dbt | Aligns with skills roadmap | - ---- - -## Appendix A: Data Source URLs - -| Source | URL | -|--------|-----| -| Toronto Open Data — Neighbourhoods | https://open.toronto.ca/dataset/neighbourhoods/ | -| Toronto Open Data — Neighbourhood Profiles | https://open.toronto.ca/dataset/neighbourhood-profiles/ | -| Toronto Police — Neighbourhood Crime Rates | https://data.torontopolice.on.ca/datasets/neighbourhood-crime-rates-open-data | -| Toronto Police — MCI | https://data.torontopolice.on.ca/datasets/major-crime-indicators-open-data | -| Toronto Police — Shootings | https://data.torontopolice.on.ca/datasets/shootings-firearm-discharges-open-data | -| CMHC Rental Market Survey | https://www.cmhc-schl.gc.ca/professionals/housing-markets-data-and-research/housing-data/data-tables/rental-market | -| Toronto Open Data — Parks | https://open.toronto.ca/dataset/parks/ | -| Toronto Open Data — Schools | https://open.toronto.ca/dataset/school-locations-all-types/ | -| Toronto Open Data — Building Permits | https://open.toronto.ca/dataset/building-permits-cleared-permits/ | -| Toronto Open Data — Child Care | https://open.toronto.ca/dataset/licensed-child-care-centres/ | -| Toronto Open Data — TTC Routes | https://open.toronto.ca/dataset/ttc-routes-and-schedules/ | - ---- - -## Appendix B: Colour Palettes - -### Affordability (Diverging) -| Status | Hex | Usage | -|--------|-----|-------| -| 
Affordable (<30% income) | `#2ecc71` | Green | -| Stretched (30-50%) | `#f1c40f` | Yellow | -| Unaffordable (>50%) | `#e74c3c` | Red | - -### Safety (Sequential) -| Status | Hex | Usage | -|--------|-----|-------| -| Safest (lowest crime) | `#27ae60` | Dark green | -| Moderate | `#f39c12` | Orange | -| Highest Crime | `#c0392b` | Dark red | - -### Demographics — Income (Sequential) -| Level | Hex | Usage | -|-------|-----|-------| -| Highest Income | `#1a5276` | Dark blue | -| Mid Income | `#5dade2` | Light blue | -| Lowest Income | `#ecf0f1` | Light gray | - -### General Recommendation -Use **Viridis** or **Plasma** colorscales for perceptually uniform gradients on continuous metrics. - ---- - -## Appendix C: Glossary - -| Term | Definition | -|------|------------| -| **MCI** | Major Crime Indicators — Assault, B&E, Auto Theft, Robbery, Theft Over | -| **CMHC Zone** | Canada Mortgage and Housing Corporation rental market survey zones (15 in Toronto) | -| **Rent-to-Income Ratio** | Monthly rent ÷ monthly household income; <30% is considered affordable | -| **PostGIS** | PostgreSQL extension for geographic data | -| **Choropleth** | Thematic map where areas are shaded based on a statistical variable | - ---- - -## Appendix D: Interview Talking Points - -When discussing this project in interviews, emphasize: - -1. **Data Engineering:** "I built a multi-source ETL pipeline that standardizes geographic keys across Census data, police data, and CMHC rental surveys—three different granularities I had to reconcile." - -2. **Dimensional Modeling:** "The data model follows star schema patterns with a central neighbourhood dimension table and fact tables for crime, housing, and amenities." - -3. **dbt Patterns:** "The transformation layer uses staging → intermediate → mart patterns, which I've documented for maintainability." - -4. **Business Value:** "The dashboard answers questions like 'Where can a young professional afford to live that's safe and has good transit?' 
— turning raw data into actionable insights." - -5. **Technical Decisions:** "I chose Plotly Dash over a React frontend because it let me iterate faster while maintaining production-quality interactivity. For a portfolio piece, speed to working demo matters." - ---- - -*Document Version: 1.0* -*Created: January 2026* -*Author: Leo Miranda / Claude*
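The removed specs describe two calculations without showing them: allocating CMHC zone rents to neighbourhoods through the area weights in `bridge_cmhc_neighbourhood`, and the rent-to-income ratio, where under 30% counts as affordable. The following is a minimal sketch of both in pandas, not the project's actual implementation. All column names (`area_weight`, `avg_rent`, `median_household_income`), the sample values, and the reading of a weight as "share of the neighbourhood's area inside the zone" are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical bridge rows: share of each neighbourhood's area inside a CMHC zone.
bridge = pd.DataFrame(
    {
        "neighbourhood_id": [1, 1, 2],
        "zone_key": ["Z01", "Z02", "Z02"],
        "area_weight": [0.75, 0.25, 1.0],
    }
)

# Hypothetical zone-level average monthly rents.
zone_rents = pd.DataFrame({"zone_key": ["Z01", "Z02"], "avg_rent": [2000.0, 2400.0]})

# Hypothetical annual median household income per neighbourhood.
income = pd.DataFrame(
    {"neighbourhood_id": [1, 2], "median_household_income": [90000.0, 72000.0]}
)


def allocate_rent(bridge: pd.DataFrame, zone_rents: pd.DataFrame) -> pd.DataFrame:
    """Area-weighted average of zone rents for each neighbourhood."""
    merged = bridge.merge(zone_rents, on="zone_key", how="inner")
    merged["weighted_rent"] = merged["avg_rent"] * merged["area_weight"]
    totals = merged.groupby("neighbourhood_id")[["weighted_rent", "area_weight"]].sum()
    # Divide by the weight total so rows that do not sum to 1 are still normalised.
    rent = (totals["weighted_rent"] / totals["area_weight"]).rename("allocated_rent")
    return rent.reset_index()


def rent_to_income(rents: pd.DataFrame, income: pd.DataFrame) -> pd.DataFrame:
    """Monthly rent divided by monthly household income, flagged against 30%."""
    out = rents.merge(income, on="neighbourhood_id", how="inner")
    monthly_income = out["median_household_income"] / 12
    out["rent_to_income"] = out["allocated_rent"] / monthly_income
    out["affordable"] = out["rent_to_income"] < 0.30
    return out


result = rent_to_income(allocate_rent(bridge, zone_rents), income)
print(result)
```

Normalising by the summed weights is a design choice: it keeps the allocated rent on the same scale as the zone rents even when a neighbourhood is only partly covered by the bridge table.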