lmiranda d0f32edba7 fix: Repair data pipeline with StatCan CMHC rental data

- Add StatCan CMHC parser to fetch rental data from Statistics Canada API
- Create year spine (2014-2025) as time dimension driver instead of census
- Add CMA-level rental and income intermediate models
- Update mart_neighbourhood_overview to use rental years as base
- Fix neighbourhood_service queries to match dbt schema
- Add CMHC data loading to pipeline script

Data now flows correctly: 158 neighbourhoods × 12 years = 1,896 records
Rent data available 2019-2025, crime data 2014-2024

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-17 15:38:31 -05:00

CLAUDE.md

Working context for Claude Code on the Analytics Portfolio project.

Project Status

Last Completed Sprint: 9 (Neighbourhood Dashboard Transition)
Current State: Ready for deployment sprint or new features
Branch: development (feature branches merge here)

Quick Reference

Run Commands

make setup          # Install deps, create .env, init pre-commit
make docker-up      # Start PostgreSQL + PostGIS (auto-detects x86/ARM)
make docker-down    # Stop containers
make db-init        # Initialize database schema
make run            # Start Dash dev server
make test           # Run pytest
make lint           # Run ruff linter
make format         # Run ruff formatter
make ci             # Run all checks

Branch Workflow

Create feature branch FROM development: git checkout -b feature/{sprint}-{description}
Work and commit on feature branch
Merge INTO development when complete
Delete the feature branch after merge (keep branches clean)
development -> staging -> main for releases

CRITICAL: NEVER DELETE the development branch. It is the main integration branch.
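
The `feature/{sprint}-{description}` pattern can be generated rather than typed by hand. The following is a minimal sketch, not part of the repository: the helper name `build_branch_name` is hypothetical, and the slug rule (lower-case, non-alphanumeric runs collapsed to a hyphen) is an assumption, since this document does not specify how descriptions are normalized.

```python
import re

FEATURE_PREFIX = "feature"  # from the pattern feature/{sprint}-{description}


def build_branch_name(sprint: int, description: str) -> str:
    """Return a branch name of the form feature/{sprint}-{description}."""
    # Assumption: lower-case the text and collapse every run of characters
    # that are not a-z or 0-9 into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{FEATURE_PREFIX}/{sprint}-{slug}"
```

The result can be passed to `git checkout -b` while on the development branch.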

Code Conventions

Import Style

Context	Style	Example
Same directory	Single dot	`from .neighbourhood import NeighbourhoodRecord`
Sibling directory	Double dot	`from ..schemas.neighbourhood import CensusRecord`
External packages	Absolute	`import pandas as pd`

Module Responsibilities

Directory	Contains	Purpose
`schemas/`	Pydantic models	Data validation
`models/`	SQLAlchemy ORM	Database persistence
`parsers/`	API/CSV extraction	Raw data ingestion
`loaders/`	Database operations	Data loading
`figures/`	Chart factories	Plotly figure generation
`callbacks/`	Dash callbacks	In `pages/{dashboard}/callbacks/`
`errors/`	Exceptions + handlers	Error handling

Type Hints

Use Python 3.10+ style:

def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...

Error Handling

# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""

class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""

class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""

class LoadError(PortfolioError):
    """Database load operation failed."""

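A common way to use a hierarchy like this is to raise the specific subclass at the point of failure and catch the base class at the boundary. The sketch below repeats two of the classes so it runs on its own; `parse_rent_rows`, `load_or_report`, and the `avg_rent` column are hypothetical names chosen for illustration, not code from this project.

```python
import csv
import io


class PortfolioError(Exception):
    """Base exception (mirrors errors/exceptions.py)."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


def parse_rent_rows(raw_csv: str) -> list[dict[str, str]]:
    """Parse CSV text, raising ParseError when the input is unusable."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    if not rows:
        raise ParseError("CSV contained no data rows")
    if "avg_rent" not in rows[0]:  # hypothetical required column
        raise ParseError("missing required column: avg_rent")
    return rows


def load_or_report(raw_csv: str) -> str:
    """Catch the base class so every project error is handled in one place."""
    try:
        rows = parse_rent_rows(raw_csv)
    except PortfolioError as exc:
        return f"failed: {exc}"
    return f"parsed {len(rows)} rows"
```

Catching `PortfolioError` rather than `Exception` lets unexpected failures, such as a `TypeError` from a programming mistake, surface instead of being swallowed.
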
Code Standards

Single responsibility functions with verb naming
Early returns over deep nesting
Google-style docstrings only for non-obvious behavior
Module-level constants for magic values
Pydantic BaseSettings for runtime config

Application Structure

portfolio_app/
├── app.py                    # Dash app factory with Pages routing
├── config.py                 # Pydantic BaseSettings
├── assets/                   # CSS, images (auto-served)
│   └── sidebar.css          # Navigation styling
├── callbacks/               # Global callbacks
│   ├── sidebar.py           # Sidebar toggle
│   └── theme.py             # Dark/light theme
├── pages/
│   ├── home.py              # Bio landing page -> /
│   ├── about.py             # About page -> /about
│   ├── contact.py           # Contact form -> /contact
│   ├── health.py            # Health endpoint -> /health
│   ├── projects.py          # Project showcase -> /projects
│   ├── resume.py            # Resume/CV -> /resume
│   ├── blog/
│   │   ├── index.py         # Blog listing -> /blog
│   │   └── article.py       # Blog article -> /blog/{slug}
│   └── toronto/
│       ├── dashboard.py     # Dashboard -> /toronto
│       ├── methodology.py   # Methodology -> /toronto/methodology
│       ├── tabs/            # 5 tab layouts (overview, housing, safety, demographics, amenities)
│       └── callbacks/       # Dashboard interactions
├── components/              # Shared UI (sidebar, cards, controls)
│   ├── metric_card.py       # KPI card component
│   ├── map_controls.py      # Map control panel
│   ├── sidebar.py           # Navigation sidebar
│   └── time_slider.py       # Time range selector
├── figures/                 # Shared chart factories
│   ├── choropleth.py        # Map visualizations
│   ├── bar_charts.py        # Ranking, stacked, horizontal bars
│   ├── scatter.py           # Scatter and bubble plots
│   ├── radar.py             # Radar/spider charts
│   ├── demographics.py      # Age pyramids, donut charts
│   ├── time_series.py       # Trend lines
│   └── summary_cards.py     # KPI figures
├── content/                 # Markdown content
│   └── blog/                # Blog articles
├── toronto/                 # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/             # Pydantic
│   ├── models/              # SQLAlchemy
│   └── demo_data.py         # Sample data
├── utils/                   # Utilities
│   └── markdown_loader.py   # Markdown processing
└── errors/

notebooks/                   # Data documentation (Phase 6)
├── README.md                # Template and usage guide
├── overview/                # Overview tab notebooks (3)
├── housing/                 # Housing tab notebooks (3)
├── safety/                  # Safety tab notebooks (3)
├── demographics/            # Demographics tab notebooks (3)
└── amenities/               # Amenities tab notebooks (3)

URL Routing

URL	Page	Sprint
`/`	Bio landing page	2
`/about`	About page	8
`/contact`	Contact form	8
`/health`	Health endpoint	8
`/projects`	Project showcase	8
`/resume`	Resume/CV	8
`/blog`	Blog listing	8
`/blog/{slug}`	Blog article	8
`/toronto`	Toronto Dashboard	6
`/toronto/methodology`	Dashboard methodology	6

Tech Stack (Locked)

Layer	Technology	Version
Database	PostgreSQL + PostGIS	16.x
Validation	Pydantic	>=2.0
ORM	SQLAlchemy	>=2.0 (2.0-style API only)
Transformation	dbt-postgres	>=1.7
Data Processing	Pandas	>=2.1
Geospatial	GeoPandas + Shapely	>=0.14
Visualization	Dash + Plotly	>=2.14
UI Components	dash-mantine-components	Latest stable
Testing	pytest	>=7.0
Python	3.11+	Via pyenv

Notes:

SQLAlchemy 2.0 + Pydantic 2.0 only (never mix 1.x APIs)
PostGIS extension required in database
Docker Compose V2 format (no version field)
Multi-architecture support: make docker-up auto-detects CPU architecture and uses the appropriate PostGIS image (x86_64: postgis/postgis, ARM64: imresamu/postgis)

Data Model Overview

Geographic Reality (Toronto Housing)

City Neighbourhoods (158) - Primary geographic unit for analysis
CMHC Zones (~20)          - Rental data (Census Tract aligned)

Star Schema

Table	Type	Keys
`fact_rentals`	Fact	-> dim_time, dim_cmhc_zone
`dim_time`	Dimension	date_key (PK)
`dim_cmhc_zone`	Dimension	zone_key (PK), geometry
`dim_neighbourhood`	Dimension	neighbourhood_id (PK), geometry
`dim_policy_event`	Dimension	event_id (PK)

dbt Layers

Layer	Naming	Purpose
Staging	`stg_{source}__{entity}`	1:1 source, cleaned, typed
Intermediate	`int_{domain}__{transform}`	Business logic
Marts	`mart_{domain}`	Final analytical tables

Deferred Features

Stop and flag if a task seems to require these:

Feature	Reason
Historical boundary reconciliation (140->158)	2021+ data only for V1
ML prediction models	Energy project scope (future phase)
Multi-project shared infrastructure	Build first, abstract second

Environment Variables

Required in .env:

DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO

Script Standards

All scripts in scripts/:

Include usage comments at top
Idempotent where possible
Exit codes: 0 = success, 1 = error
Use set -euo pipefail for bash
Log to stdout, errors to stderr
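
For scripts written in Python rather than bash, the exit-code and logging rules above might look like the following sketch. It assumes a hypothetical `main` wrapper that receives the script's work as a callable; nothing here is taken from the existing `scripts/` directory.

```python
import sys
from collections.abc import Callable

EXIT_SUCCESS = 0
EXIT_ERROR = 1


def main(task: Callable[[], None]) -> int:
    """Run task and translate the outcome into an exit code."""
    try:
        task()
    except Exception as exc:  # broad on purpose: any failure maps to exit code 1
        print(f"error: {exc}", file=sys.stderr)  # errors go to stderr
        return EXIT_ERROR
    print("done")  # normal log output goes to stdout
    return EXIT_SUCCESS
```

A real script would end with `sys.exit(main(run_task))` under an `if __name__ == "__main__":` guard, where `run_task` stands for the script's own work.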

Reference Documents

Document	Location	Use When
Project reference	`docs/PROJECT_REFERENCE.md`	Architecture decisions, completed work
Developer guide	`docs/CONTRIBUTING.md`	How to add pages, blog posts, tabs
Lessons learned	`docs/project-lessons-learned/INDEX.md`	Past issues and solutions

Projman Plugin Workflow

CRITICAL: Always use the projman plugin for sprint and task management.

When to Use Projman Skills

Skill	Trigger	Purpose
`/projman:sprint-plan`	New sprint or phase implementation	Architecture analysis + Gitea issue creation
`/projman:sprint-start`	Beginning implementation work	Load lessons learned (Wiki.js or local), start execution
`/projman:sprint-status`	Check progress	Review blockers and completion status
`/projman:sprint-close`	Sprint completion	Capture lessons learned (Wiki.js or local backup)

Default Behavior

When user requests implementation work:

ALWAYS start with /projman:sprint-plan before writing code
Create Gitea issues with proper labels and acceptance criteria
Use /projman:sprint-start to begin execution with lessons learned
Track progress via Gitea issue comments
Close sprint with /projman:sprint-close to document lessons

Gitea Repository

Repo: lmiranda/personal-portfolio
Host: gitea.hotserv.cloud
Note: lmiranda is a user account (not org), so label lookup may require repo-level labels

MCP Tools Available

Gitea:

list_issues, get_issue, create_issue, update_issue, add_comment
get_labels, suggest_labels

Wiki.js:

search_lessons, create_lesson, search_pages, get_page

Lessons Learned (Backup Method)

When Wiki.js is unavailable, use the local backup in docs/project-lessons-learned/:

At Sprint Start:

Review docs/project-lessons-learned/INDEX.md for relevant past lessons
Search lesson files by tags/keywords before implementation
Apply prevention strategies from applicable lessons

At Sprint Close:

Try Wiki.js create_lesson first
If Wiki.js fails, create lesson in docs/project-lessons-learned/
Use naming convention: {phase-or-sprint}-{short-description}.md
Update INDEX.md with new entry
Follow the lesson template in INDEX.md

Migration: Once Wiki.js is configured, lessons will be migrated there for better searchability.
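
The sprint-close fallback (Wiki.js first, local file second, then an INDEX.md entry) can be sketched as below. This is an illustration under stated assumptions: `create_remote` is a hypothetical callable standing in for the Wiki.js `create_lesson` tool, `save_lesson` is not an existing function, and the INDEX.md entry format is invented because the real lesson template is not reproduced here.

```python
from collections.abc import Callable
from pathlib import Path


def save_lesson(
    name: str,
    body: str,
    create_remote: Callable[[str, str], None],
    local_dir: Path,
) -> str:
    """Try the remote store first; on failure write a local Markdown file.

    Returns "wiki" or "local" to show where the lesson ended up.
    """
    try:
        create_remote(name, body)
    except Exception:
        # Fallback: {phase-or-sprint}-{short-description}.md in the local folder.
        local_dir.mkdir(parents=True, exist_ok=True)
        (local_dir / f"{name}.md").write_text(body, encoding="utf-8")
        # Assumed index format: one Markdown link per lesson.
        with (local_dir / "INDEX.md").open("a", encoding="utf-8") as index:
            index.write(f"- [{name}]({name}.md)\n")
        return "local"
    return "wiki"
```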

Issue Structure

Every Gitea issue should include:

Overview: Brief description
Files to Create/Modify: Explicit paths
Acceptance Criteria: Checkboxes
Technical Notes: Implementation hints
Labels: Listed in body (workaround for label API issues)
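
An issue body with these five sections can be assembled from structured inputs. The sketch below assumes Markdown `##` headings and a comma-separated label line; neither format is prescribed by this document, and `render_issue_body` is a hypothetical helper.

```python
def render_issue_body(
    overview: str,
    files: list[str],
    criteria: list[str],
    notes: str,
    labels: list[str],
) -> str:
    """Build an issue body containing the five required sections."""
    lines = ["## Overview", overview, "", "## Files to Create/Modify"]
    lines += [f"- `{path}`" for path in files]
    lines += ["", "## Acceptance Criteria"]
    lines += [f"- [ ] {item}" for item in criteria]  # unchecked checkboxes
    lines += ["", "## Technical Notes", notes, ""]
    # Labels are listed in the body as a workaround for label API issues.
    lines += ["## Labels", ", ".join(labels)]
    return "\n".join(lines)
```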

Last Updated: January 2026 (Post-Sprint 9)
