# Portfolio Project Reference

**Project**: Analytics Portfolio
**Owner**: Leo
**Status**: Sprint 9 (neighbourhood dashboard transition)

---

## Project Overview

Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.

| Project | Domain | Key Skills | Phase |
|---------|--------|------------|-------|
| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |

**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).

---

## Branching Strategy

| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases only | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |

**Rules**:
- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`

---

## Tech Stack (Locked)

| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Runtime | Python (via pyenv) | 3.11+ |

**Compatibility Notes**:
- SQLAlchemy 2.0 and Pydantic 2.0 integrate well; never mix in 1.x APIs
- PostGIS extension is required; enable it during database initialization
- Docker Compose V2 (no `version` field in compose files)

---

## Code Conventions

### Import Style

| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` |
| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
| External packages | Absolute | `import pandas as pd` |

### Module Separation

| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | API/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |

### Code Standards

- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- **Functions**: Single responsibility, verb naming, early returns over nesting
- **Docstrings**: Google style, minimal; only for non-obvious behavior
- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config

### Error Handling

```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""


class LoadError(PortfolioError):
    """Database load operation failed."""
```

- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)

---

## Application Architecture

### Dash Pages Structure

```
portfolio_app/
├── app.py                  # Dash app factory with Pages routing
├── config.py               # Pydantic BaseSettings
├── assets/                 # CSS, images (auto-served by Dash)
├── pages/
│   ├── home.py             # Bio landing page → /
│   ├── toronto/
│   │   ├── dashboard.py    # Layout only → /toronto
│   │   └── callbacks/      # Interaction logic
│   └── energy/             # Phase 3
├── components/             # Shared UI (navbar, footer, cards)
├── figures/                # Shared chart factories
├── toronto/                # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/            # Pydantic
│   └── models/             # SQLAlchemy
└── errors/
```

### URL Routing (Automatic)

| URL | Page | Status |
|-----|------|--------|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |

---

## Phase 1: Toronto Neighbourhood Dashboard

### Data Sources

| Track | Source | Format | Geography | Frequency |
|-------|--------|--------|-----------|-----------|
| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |

### Geographic Reality

```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158)                            │ ← Primary analysis unit
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20), Census Tract aligned                          │ ← Rental data
└─────────────────────────────────────────────────────────────────┘
```

### Data Model (Star Schema)

| Table | Type | Keys |
|-------|------|------|
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |
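
To make the fact-to-dimension relationships concrete, the sketch below builds a cut-down version of the schema and joins `fact_rentals` to its two dimensions. It makes several assumptions that are not in this document: SQLite (from the standard library) stands in for PostgreSQL + PostGIS, the `geometry` columns are omitted, and the `year`, `zone_name`, and `avg_rent` columns and all sample rows are invented for illustration.

```python
import sqlite3

# Only the key columns come from the table above; everything else is hypothetical.
SCHEMA = """
CREATE TABLE dim_time (date_key INTEGER PRIMARY KEY, year INTEGER NOT NULL);
CREATE TABLE dim_cmhc_zone (zone_key INTEGER PRIMARY KEY, zone_name TEXT NOT NULL);
CREATE TABLE fact_rentals (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    zone_key INTEGER NOT NULL REFERENCES dim_cmhc_zone (zone_key),
    avg_rent REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(20210101, 2021), (20220101, 2022)])
    conn.executemany("INSERT INTO dim_cmhc_zone VALUES (?, ?)", [(1, "Zone A"), (2, "Zone B")])
    conn.executemany(
        "INSERT INTO fact_rentals VALUES (?, ?, ?)",
        [(20210101, 1, 1500.0), (20220101, 1, 1600.0), (20220101, 2, 1400.0)],
    )
    return conn


def list_rents_by_year_and_zone(conn: sqlite3.Connection) -> list[tuple[int, str, float]]:
    """Resolve each fact row's foreign keys against both dimensions."""
    query = """
        SELECT t.year, z.zone_name, f.avg_rent
        FROM fact_rentals AS f
        JOIN dim_time AS t ON t.date_key = f.date_key
        JOIN dim_cmhc_zone AS z ON z.zone_key = f.zone_key
        ORDER BY t.year, z.zone_name
    """
    return conn.execute(query).fetchall()
```

The fact table holds only keys and measures; descriptive attributes live in the dimensions and are reached through joins.
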

### dbt Layer Structure

| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic, filtering |
| Marts | `mart_{domain}` | Final analytical tables |

---

## Sprint Overview

| Sprint | Focus | Milestone |
|--------|-------|-----------|
| 1-6 | Foundation and initial dashboard | **Launch 1: Bio Live** |
| 7 | Navigation & theme modernization | - |
| 8 | Portfolio website expansion | **Launch 2: Website Live** |
| 9 | Neighbourhood dashboard transition | Cleanup complete |
| 10+ | Dashboard implementation | **Launch 3: Dashboard Live** |

---

## Scope Boundaries

### Phase 1: Build These

- Bio landing page and portfolio website
- CMHC rental data processor
- Toronto neighbourhood data integration
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer

### Deferred Features

| Feature | Reason | When |
|---------|--------|------|
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Future |

If a task seems to require deferred features, **stop and flag it**.

---

## File Structure

### Root-Level Files (Allowed)

| File | Purpose |
|------|---------|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |

### Directory Structure

```
portfolio/
├── portfolio_app/          # Monolithic Dash application
│   ├── app.py
│   ├── config.py
│   ├── assets/
│   ├── pages/
│   ├── components/
│   ├── figures/
│   ├── toronto/
│   └── errors/
├── tests/
├── dbt/
├── data/
│   └── toronto/
│       ├── raw/
│       ├── processed/      # gitignored
│       └── reference/
├── scripts/
│   ├── db/
│   ├── docker/
│   ├── deploy/
│   ├── dbt/
│   └── dev/
├── docs/
├── notebooks/
├── backups/                # gitignored
└── reports/                # gitignored
```

### Gitignored Directories

- `data/*/processed/`
- `reports/`
- `backups/`
- `notebooks/*.html`
- `.env`
- `__pycache__/`
- `.venv/`

---

## Makefile Targets

| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |

---

## Script Standards

All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
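
For Python scripts, the same standards might look like the sketch below. It is an assumed skeleton, not an existing project script: the script name, the logger name, and the directory-creation task are placeholders. A real script would end with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard; that line is left out here so the functions can be imported and exercised directly.

```python
# Usage: python scripts/dev/example.py <target-dir>
# Creates <target-dir> if it does not exist. Safe to run repeatedly.
import logging
import sys
from pathlib import Path


def configure_logging() -> logging.Logger:
    """Send records below ERROR to stdout and ERROR or above to stderr."""
    logger = logging.getLogger("scripts.example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False

    out_handler = logging.StreamHandler(sys.stdout)
    out_handler.addFilter(lambda record: record.levelno < logging.ERROR)
    err_handler = logging.StreamHandler(sys.stderr)
    err_handler.setLevel(logging.ERROR)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger


def ensure_directory(path: Path) -> None:
    """Idempotent: succeeds whether or not the directory already exists."""
    path.mkdir(parents=True, exist_ok=True)


def main(argv: list[str]) -> int:
    """Return the process exit code: 0 on success, 1 on error."""
    logger = configure_logging()
    if len(argv) != 1:
        logger.error("usage: example.py <target-dir>")
        return 1
    ensure_directory(Path(argv[0]))
    logger.info("directory ready: %s", argv[0])
    return 0
```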

---

## Environment Variables

Required in `.env`:

```bash
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO
```
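
One way to fail fast when a required variable is missing is sketched below. The project's stated approach for runtime config is Pydantic BaseSettings; this example deliberately swaps in a standard-library dataclass so that it runs without extra dependencies. The `Settings` fields and the `load_settings` helper are hypothetical, and the set of truthy strings accepted for `DASH_DEBUG` is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# The variable names listed in the .env block above.
REQUIRED_KEYS = (
    "DATABASE_URL",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "DASH_DEBUG",
    "SECRET_KEY",
    "LOG_LEVEL",
)


@dataclass(frozen=True)
class Settings:
    """Subset of the configuration that the application reads directly."""

    database_url: str
    dash_debug: bool
    secret_key: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate that every required variable is set, then build Settings."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        dash_debug=env["DASH_DEBUG"].strip().lower() in {"1", "true", "yes"},
        secret_key=env["SECRET_KEY"],
        log_level=env["LOG_LEVEL"].upper(),
    )
```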

---

## Success Criteria

### Launch 1 (Bio Live)
- [x] Bio page accessible via HTTPS
- [x] All bio content rendered
- [x] No placeholder text visible
- [x] Mobile responsive
- [x] Social links functional

### Launch 2 (Website Live)
- [x] Full portfolio website with navigation
- [x] About, Contact, Projects, Resume, Blog pages
- [x] Dark mode theme support
- [x] Sidebar navigation

### Launch 3 (Dashboard Live)
- [ ] Choropleth renders neighbourhoods and CMHC zones
- [ ] Rental data visualization works
- [ ] Time navigation works
- [ ] Policy event markers visible
- [ ] Methodology documentation published
- [ ] Data sources cited

---

## Reference Documents

For detailed specifications, see:

| Document | Location | Use When |
|----------|----------|----------|
| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |

---

*Reference Version: 2.0*
*Updated: Sprint 9*