docs: Rewrite documentation with accurate project state

- Delete obsolete change proposals and bio content source
- Rewrite README.md with correct features, data sources, structure
- Update PROJECT_REFERENCE.md with accurate status and completed work
- Update CLAUDE.md references and sprint status
- Add docs/CONTRIBUTING.md developer guide with:
  - How to add blog posts (frontmatter, markdown)
  - How to add new pages (Dash routing)
  - How to add dashboard tabs
  - How to create figure factories
  - Branch workflow and code standards

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
2026-01-17 12:27:25 -05:00
parent 1a878313f8
commit 4818c53fd2
7 changed files with 794 additions and 1190 deletions

View File

@@ -1,21 +1,171 @@
# Portfolio Project Reference
**Project**: Analytics Portfolio
**Owner**: Leo
**Status**: Ready for Sprint 1
**Owner**: Leo Miranda
**Status**: Sprint 9 Complete (Dashboard Implementation Done)
**Last Updated**: January 2026
---
## Project Overview
Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.
Personal portfolio website with an interactive Toronto Neighbourhood Dashboard demonstrating data engineering, visualization, and analytics capabilities.
| Project | Domain | Key Skills | Phase |
|---------|--------|------------|-------|
| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |
| Component | Description | Status |
|-----------|-------------|--------|
| Portfolio Website | Bio, About, Projects, Resume, Contact, Blog | Complete |
| Toronto Dashboard | 5-tab neighbourhood analysis | Complete |
| Data Pipeline | dbt models, figure factories | Complete |
| Deployment | Production deployment | Pending |
**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).
---
## Completed Work
### Sprint 1-6: Foundation
- Repository setup, Docker, PostgreSQL + PostGIS
- Bio landing page implementation
- Initial data model design
### Sprint 7: Navigation & Theme
- Sidebar navigation
- Dark/light theme toggle
- dash-mantine-components integration
### Sprint 8: Portfolio Website
- About, Contact, Projects, Resume pages
- Blog system with Markdown/frontmatter
- Health endpoint
### Sprint 9: Neighbourhood Dashboard Transition
- Phase 1: Deleted legacy TRREB code
- Phase 2: Documentation cleanup
- Phase 3: New neighbourhood-centric data model
- Phase 4: dbt model restructuring
- Phase 5: 5-tab dashboard implementation
- Phase 6: 15 documentation notebooks
- Phase 7: Final documentation review
---
## Application Architecture
### URL Routes
| URL | Page | File |
|-----|------|------|
| `/` | Home | `pages/home.py` |
| `/about` | About | `pages/about.py` |
| `/contact` | Contact | `pages/contact.py` |
| `/projects` | Projects | `pages/projects.py` |
| `/resume` | Resume | `pages/resume.py` |
| `/blog` | Blog listing | `pages/blog/index.py` |
| `/blog/{slug}` | Article | `pages/blog/article.py` |
| `/toronto` | Dashboard | `pages/toronto/dashboard.py` |
| `/toronto/methodology` | Methodology | `pages/toronto/methodology.py` |
| `/health` | Health check | `pages/health.py` |
### Directory Structure
```
portfolio_app/
├── app.py # Dash app factory
├── config.py # Pydantic BaseSettings
├── assets/ # CSS, images
├── callbacks/ # Global callbacks (sidebar, theme)
├── components/ # Shared UI components
├── content/blog/ # Markdown blog articles
├── errors/ # Exception handling
├── figures/ # Plotly figure factories
├── pages/
│ ├── home.py
│ ├── about.py
│ ├── contact.py
│ ├── projects.py
│ ├── resume.py
│ ├── health.py
│ ├── blog/
│ │ ├── index.py
│ │ └── article.py
│ └── toronto/
│ ├── dashboard.py
│ ├── methodology.py
│ ├── tabs/ # 5 tab layouts
│ └── callbacks/ # Dashboard interactions
├── toronto/ # Data logic
│ ├── parsers/ # API extraction
│ ├── loaders/ # Database operations
│ ├── schemas/ # Pydantic models
│ ├── models/ # SQLAlchemy ORM
│ └── demo_data.py # Sample data
└── utils/
└── markdown_loader.py # Blog article loading
```
---
## Toronto Dashboard
### Data Sources
| Source | Data | Format |
|--------|------|--------|
| City of Toronto Open Data | Neighbourhoods (158), Census profiles, Parks, Schools, Childcare, TTC | GeoJSON, CSV, API |
| Toronto Police Service | Crime rates, MCI, Shootings | CSV, API |
| CMHC | Rental Market Survey | CSV |
### Geographic Model
```
City of Toronto Neighbourhoods (158) ← Primary analysis unit
CMHC Zones (~20) ← Rental data (Census Tract aligned)
```
### Dashboard Tabs
| Tab | Choropleth Metric | Supporting Charts |
|-----|-------------------|-------------------|
| Overview | Livability score | Top/Bottom 10 bar, Income vs Safety scatter |
| Housing | Affordability index | Rent trend line, Tenure breakdown bar |
| Safety | Crime rate per 100K | Crime breakdown bar, Crime trend line |
| Demographics | Median income | Age distribution, Population density bar |
| Amenities | Amenity index | Amenity radar, Transit accessibility bar |
### Star Schema
| Table | Type | Description |
|-------|------|-------------|
| `dim_neighbourhood` | Dimension | 158 neighbourhoods with geometry |
| `dim_time` | Dimension | Date dimension |
| `dim_cmhc_zone` | Dimension | ~20 CMHC zones with geometry |
| `fact_census` | Fact | Census indicators by neighbourhood |
| `fact_crime` | Fact | Crime stats by neighbourhood |
| `fact_rentals` | Fact | Rental data by CMHC zone |
| `fact_amenities` | Fact | Amenity counts by neighbourhood |
### dbt Layers
| Layer | Naming | Example |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | `stg_toronto__neighbourhoods` |
| Intermediate | `int_{domain}__{transform}` | `int_neighbourhood__demographics` |
| Marts | `mart_{domain}` | `mart_neighbourhood_overview` |
---
## Tech Stack
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | 2.x |
| ORM | SQLAlchemy | 2.x |
| Transformation | dbt-postgres | 1.7+ |
| Data Processing | Pandas, GeoPandas | Latest |
| Visualization | Dash + Plotly | 2.14+ |
| UI Components | dash-mantine-components | Latest |
| Testing | pytest | 7.0+ |
| Python | 3.11+ | Via pyenv |
---
@@ -23,293 +173,51 @@ Two-project analytics portfolio demonstrating end-to-end data engineering, visua
| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases only | VPS (production) |
| `main` | Production releases | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |
**Rules**:
- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development``staging` for testing
- `staging``main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`
**Rules:**
- Feature branches from `development`: `feature/{sprint}-{description}`
- Merge into `development` when complete
- `development``staging` `main` for releases
- Never delete `development`
---
## Tech Stack (Locked)
## Code Standards
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | ≥0.14 |
| Visualization | Dash + Plotly | ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |
### Type Hints (Python 3.10+)
**Compatibility Notes**:
- SQLAlchemy 2.0 + Pydantic 2.0 integrate well—never mix 1.x APIs
- PostGIS extension required—enable during db init
- Docker Compose V2 (no `version` field in compose files)
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
...
```
---
### Imports
## Code Conventions
### Import Style
| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .neighbourhood import NeighbourhoodParser` |
| Sibling directory | Double dot | `from ..schemas.neighbourhood import CensusRecord` |
| External packages | Absolute | `import pandas as pd` |
### Module Separation
| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | API/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |
### Code Standards
- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- **Functions**: Single responsibility, verb naming, early returns over nesting
- **Docstrings**: Google style, minimal—only for non-obvious behavior
- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config
| Context | Style |
|---------|-------|
| Same directory | `from .module import X` |
| Sibling directory | `from ..schemas.model import Y` |
| External | `import pandas as pd` |
### Error Handling
```python
# errors/exceptions.py
class PortfolioError(Exception):
"""Base exception."""
class ParseError(PortfolioError):
"""PDF/CSV parsing failed."""
"""Data parsing failed."""
class ValidationError(PortfolioError):
"""Pydantic or business rule validation failed."""
"""Validation failed."""
class LoadError(PortfolioError):
"""Database load operation failed."""
"""Database load failed."""
```
- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)
---
## Application Architecture
### Dash Pages Structure
```
portfolio_app/
├── app.py # Dash app factory with Pages routing
├── config.py # Pydantic BaseSettings
├── assets/ # CSS, images (auto-served by Dash)
├── pages/
│ ├── home.py # Bio landing page → /
│ ├── toronto/
│ │ ├── dashboard.py # Layout only → /toronto
│ │ └── callbacks/ # Interaction logic
│ └── energy/ # Phase 3
├── components/ # Shared UI (navbar, footer, cards)
├── figures/ # Shared chart factories
├── toronto/ # Toronto data logic
│ ├── parsers/
│ ├── loaders/
│ ├── schemas/ # Pydantic
│ └── models/ # SQLAlchemy
└── errors/
```
### URL Routing (Automatic)
| URL | Page | Status |
|-----|------|--------|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |
---
## Phase 1: Toronto Neighbourhood Dashboard
### Data Sources
| Track | Source | Format | Geography | Frequency |
|-------|--------|--------|-----------|-----------|
| Rentals | CMHC Rental Market Survey | API/CSV | ~20 Zones | Annual |
| Neighbourhoods | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |
### Geographic Reality
```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158) │ ← Primary analysis unit
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20) — Census Tract aligned │ ← Rental data
└─────────────────────────────────────────────────────────────────┘
```
### Data Model (Star Schema)
| Table | Type | Keys |
|-------|------|------|
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |
### dbt Layer Structure
| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic, filtering |
| Marts | `mart_{domain}` | Final analytical tables |
---
## Sprint Overview
| Sprint | Focus | Milestone |
|--------|-------|-----------|
| 1-6 | Foundation and initial dashboard | **Launch 1: Bio Live** |
| 7 | Navigation & theme modernization | — |
| 8 | Portfolio website expansion | **Launch 2: Website Live** |
| 9 | Neighbourhood dashboard transition | Cleanup complete |
| 10+ | Dashboard implementation | **Launch 3: Dashboard Live** |
---
## Scope Boundaries
### Phase 1 — Build These
- Bio landing page and portfolio website
- CMHC rental data processor
- Toronto neighbourhood data integration
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer
### Deferred Features
| Feature | Reason | When |
|---------|--------|------|
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Future phase |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Future |
If a task seems to require deferred features, **stop and flag it**.
---
## File Structure
### Root-Level Files (Allowed)
| File | Purpose |
|------|---------|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |
### Directory Structure
```
portfolio/
├── portfolio_app/ # Monolithic Dash application
│ ├── app.py
│ ├── config.py
│ ├── assets/
│ ├── pages/
│ ├── components/
│ ├── figures/
│ ├── toronto/
│ └── errors/
├── tests/
├── dbt/
├── data/
│ └── toronto/
│ ├── raw/
│ ├── processed/ # gitignored
│ └── reference/
├── scripts/
│ ├── db/
│ ├── docker/
│ ├── deploy/
│ ├── dbt/
│ └── dev/
├── docs/
├── notebooks/
├── backups/ # gitignored
└── reports/ # gitignored
```
### Gitignored Directories
- `data/*/processed/`
- `reports/`
- `backups/`
- `notebooks/*.html`
- `.env`
- `__pycache__/`
- `.venv/`
---
## Makefile Targets
| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |
---
## Script Standards
All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
---
## Environment Variables
@@ -328,41 +236,52 @@ LOG_LEVEL=INFO
---
## Success Criteria
## Makefile Targets
### Launch 1 (Bio Live)
- [x] Bio page accessible via HTTPS
- [x] All bio content rendered
- [x] No placeholder text visible
- [x] Mobile responsive
- [x] Social links functional
### Launch 2 (Website Live)
- [x] Full portfolio website with navigation
- [x] About, Contact, Projects, Resume, Blog pages
- [x] Dark mode theme support
- [x] Sidebar navigation
### Launch 3 (Dashboard Live)
- [ ] Choropleth renders neighbourhoods and CMHC zones
- [ ] Rental data visualization works
- [ ] Time navigation works
- [ ] Policy event markers visible
- [ ] Methodology documentation published
- [ ] Data sources cited
| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
---
## Reference Documents
## Next Steps
For detailed specifications, see:
### Deployment (Sprint 10+)
- [ ] Production Docker configuration
- [ ] CI/CD pipeline
- [ ] HTTPS/SSL setup
- [ ] Domain configuration
| Document | Location | Use When |
|----------|----------|----------|
| Dashboard vision | `docs/changes/Change-Toronto-Analysis.md` | Dashboard specification |
| Implementation plan | `docs/changes/Change-Toronto-Analysis-Reviewed.md` | Sprint planning |
### Data Enhancement
- [ ] Connect to live APIs (currently using demo data)
- [ ] Data refresh automation
- [ ] Historical data loading
### Future Projects
- Energy Pricing Analysis dashboard (planned)
---
*Reference Version: 2.0*
*Updated: Sprint 9*
## Related Documents
| Document | Purpose |
|----------|---------|
| `README.md` | Quick start guide |
| `CLAUDE.md` | AI assistant context |
| `docs/CONTRIBUTING.md` | Developer guide |
| `notebooks/README.md` | Notebook documentation |
---
*Reference Version: 3.0*
*Updated: January 2026*