Merge pull request 'uploaded initial documentation.' (#2) from init-setup into development

Reviewed-on: lmiranda/personal-portfolio#2
This commit was merged in pull request #2.
2026-01-11 18:41:21 +00:00

docs/PROJECT_REFERENCE.md Normal file

@@ -0,0 +1,396 @@
# Portfolio Project Reference
**Project**: Analytics Portfolio
**Owner**: Leo
**Status**: Ready for Sprint 1
---
## Project Overview
Two-project analytics portfolio demonstrating end-to-end data engineering, visualization, and ML capabilities.
| Project | Domain | Key Skills | Phase |
|---------|--------|------------|-------|
| **Toronto Housing Dashboard** | Real estate | ETL, dimensional modeling, geospatial, choropleth | Phase 1 (Active) |
| **Energy Pricing Analysis** | Utility markets | Time series, ML prediction, API integration | Phase 3 (Future) |
**Platform**: Monolithic Dash application on self-hosted VPS (bio landing page + dashboards).
---
## Branching Strategy
| Branch | Purpose | Deploys To |
|--------|---------|------------|
| `main` | Production releases only | VPS (production) |
| `staging` | Pre-production testing | VPS (staging) |
| `development` | Active development | Local only |
**Rules**:
- All feature branches created FROM `development`
- All feature branches merge INTO `development`
- `development` → `staging` for testing
- `staging` → `main` for release
- Direct commits to `main` or `staging` are forbidden
- Branch naming: `feature/{sprint}-{description}` or `fix/{issue-id}`
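
The naming and promotion rules above lend themselves to a small automated check, for example in a pre-commit hook or CI step. The sketch below is one possible shape and not part of the project: the function names are hypothetical, and it assumes `{sprint}` is written as `sprint` plus a number, `{description}` is lowercase kebab-case, and `{issue-id}` is numeric, none of which this document specifies.

```python
import re

# Assumed shapes (not specified in this document):
#   {sprint}      -> "sprint" followed by a number, e.g. "sprint1"
#   {description} -> lowercase kebab-case words
#   {issue-id}    -> a numeric issue number
FEATURE_BRANCH = re.compile(r"feature/sprint\d+-[a-z0-9]+(?:-[a-z0-9]+)*")
FIX_BRANCH = re.compile(r"fix/\d+")
LONG_LIVED_BRANCHES = {"main", "staging", "development"}
PROMOTIONS = {("development", "staging"), ("staging", "main")}


def is_valid_branch_name(name: str) -> bool:
    """Return True for long-lived branches and correctly named work branches."""
    if name in LONG_LIVED_BRANCHES:
        return True
    return bool(FEATURE_BRANCH.fullmatch(name) or FIX_BRANCH.fullmatch(name))


def is_allowed_merge(source: str, target: str) -> bool:
    """Return True if merging source into target follows the branching rules."""
    if (source, target) in PROMOTIONS:
        return True
    # Work branches (feature/fix) may only merge into development.
    is_work_branch = source not in LONG_LIVED_BRANCHES and is_valid_branch_name(source)
    return is_work_branch and target == "development"
```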
---
## Tech Stack (Locked)
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | ≥2.0 |
| ORM | SQLAlchemy | ≥2.0 (2.0-style API only) |
| Transformation | dbt-postgres | ≥1.7 |
| Data Processing | Pandas | ≥2.1 |
| Geospatial | GeoPandas + Shapely | GeoPandas ≥0.14 |
| Visualization | Dash + Plotly | Dash ≥2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | ≥7.0 |
| Python | 3.11+ | Via pyenv |
**Compatibility Notes**:
- SQLAlchemy 2.0 and Pydantic 2.0 work well together; never mix in 1.x-style APIs from either library
- PostGIS extension required—enable during db init
- Docker Compose V2 (no `version` field in compose files)
---
## Code Conventions
### Import Style
| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .trreb import TRREBParser` |
| Sibling directory | Double dot | `from ..schemas.trreb import TRREBRecord` |
| External packages | Absolute | `import pandas as pd` |
### Module Separation
| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | PDF/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Per-dashboard, in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |
### Code Standards
- **Type hints**: Mandatory, Python 3.10+ style (`list[str]`, `dict[str, int]`, `X | None`)
- **Functions**: Single responsibility, verb naming, early returns over nesting
- **Docstrings**: Google style, minimal—only for non-obvious behavior
- **Constants**: Module-level for magic values, Pydantic BaseSettings for runtime config
### Error Handling
```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


# Shares its name with pydantic.ValidationError; import one of them under an alias.
class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""


class LoadError(PortfolioError):
    """Database load operation failed."""
```
```
- Decorators for infrastructure concerns (logging, retry, transactions)
- Explicit handling for domain logic (business rules, recovery strategies)
---
## Application Architecture
### Dash Pages Structure
```
portfolio_app/
├── app.py # Dash app factory with Pages routing
├── config.py # Pydantic BaseSettings
├── assets/ # CSS, images (auto-served by Dash)
├── pages/
│ ├── home.py # Bio landing page → /
│ ├── toronto/
│ │ ├── dashboard.py # Layout only → /toronto
│ │ └── callbacks/ # Interaction logic
│ └── energy/ # Phase 3
├── components/ # Shared UI (navbar, footer, cards)
├── figures/ # Shared chart factories
├── toronto/ # Toronto data logic
│ ├── parsers/
│ ├── loaders/
│ ├── schemas/ # Pydantic
│ └── models/ # SQLAlchemy
└── errors/
```
### URL Routing (Automatic)
| URL | Page | Status |
|-----|------|--------|
| `/` | Bio landing page | Sprint 2 |
| `/toronto` | Toronto Housing Dashboard | Sprint 6 |
| `/energy` | Energy Pricing Dashboard | Phase 3 |
---
## Phase 1: Toronto Housing Dashboard
### Data Sources
| Track | Source | Format | Geography | Frequency |
|-------|--------|--------|-----------|-----------|
| Purchases | TRREB Monthly Reports | PDF | ~35 Districts | Monthly |
| Rentals | CMHC Rental Market Survey | CSV | ~20 Zones | Annual |
| Enrichment | City of Toronto Open Data | GeoJSON/CSV | 158 Neighbourhoods | Census |
| Policy Events | Curated list | CSV | N/A | Event-based |
### Geographic Reality
```
┌─────────────────────────────────────────────────────────────────┐
│ City of Toronto Neighbourhoods (158) │ ← Enrichment only
├─────────────────────────────────────────────────────────────────┤
│ TRREB Districts (~35) — W01, C01, E01, etc. │ ← Purchase data
├─────────────────────────────────────────────────────────────────┤
│ CMHC Zones (~20) — Census Tract aligned │ ← Rental data
└─────────────────────────────────────────────────────────────────┘
```
**Critical**: These geographies do NOT align. Display as separate layers with toggle—do not force crosswalks.
### Data Model (Star Schema)
| Table | Type | Keys |
|-------|------|------|
| `fact_purchases` | Fact | → dim_time, dim_trreb_district |
| `fact_rentals` | Fact | → dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_trreb_district` | Dimension | district_key (PK), geometry |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |
**V1 Rule**: `dim_neighbourhood` has NO FK to fact tables—reference overlay only.
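
The relationships in the table can be sketched with an in-memory database. This example is a stand-in under stated assumptions: it uses SQLite from the standard library in place of PostgreSQL + PostGIS, stores geometry as a plain `TEXT` placeholder, and every non-key column name and demo row is made up for illustration. Only the purchase side is shown.

```python
import sqlite3

# SQLite stands in for PostgreSQL + PostGIS; geometry is a TEXT placeholder
# and all non-key column names are hypothetical.
DDL = """
CREATE TABLE dim_time (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_trreb_district (
    district_key INTEGER PRIMARY KEY, district_code TEXT, geometry TEXT
);
-- V1 rule: no fact table references dim_neighbourhood.
CREATE TABLE dim_neighbourhood (
    neighbourhood_id INTEGER PRIMARY KEY, name TEXT, geometry TEXT
);
CREATE TABLE fact_purchases (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    district_key INTEGER NOT NULL REFERENCES dim_trreb_district (district_key),
    avg_price REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and load one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(DDL)
    conn.execute("INSERT INTO dim_time VALUES (202101, 2021, 1)")
    conn.execute("INSERT INTO dim_trreb_district VALUES (1, 'W01', NULL)")
    conn.execute("INSERT INTO fact_purchases VALUES (202101, 1, 1000000.0)")
    conn.commit()
    return conn


def avg_price_by_district(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Join the fact table to its two dimensions and aggregate."""
    query = """
        SELECT d.district_code, t.year, AVG(f.avg_price)
        FROM fact_purchases AS f
        JOIN dim_time AS t ON t.date_key = f.date_key
        JOIN dim_trreb_district AS d ON d.district_key = f.district_key
        GROUP BY d.district_code, t.year
        ORDER BY d.district_code, t.year
    """
    return conn.execute(query).fetchall()
```

Inserting a fact row whose `district_key` has no matching dimension row raises `sqlite3.IntegrityError`, which is the behaviour the foreign keys are there to provide.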
### dbt Layer Structure
| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic, filtering |
| Marts | `mart_{domain}` | Final analytical tables |
---
## Sprint Overview
| Sprint | Focus | Milestone |
|--------|-------|-----------|
| 1 | Project bootstrap, start TRREB digitization | — |
| 2 | Bio page, data acquisition | **Launch 1: Bio Live** |
| 3 | Parsers, schemas, models | — |
| 4 | Loaders, dbt | — |
| 5 | Visualization | — |
| 6 | Polish, deploy dashboard | **Launch 2: Dashboard Live** |
| 7 | Buffer | — |
### Sprint 1 Deliverables
| Category | Tasks |
|----------|-------|
| **Bootstrap** | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
| **Infrastructure** | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
| **App Foundation** | portfolio_app/ structure, config.py, error handling |
| **Tests** | tests/ directory, conftest.py, pytest config |
| **Data Acquisition** | Download TRREB PDFs, START boundary digitization (HUMAN task) |
### Human Tasks (Cannot Automate)
| Task | Tool | Effort |
|------|------|--------|
| Digitize TRREB district boundaries | QGIS | 3-4 hours |
| Research policy events (10-20) | Manual research | 2-3 hours |
| Replace social link placeholders | Manual | 5 minutes |
---
## Scope Boundaries
### Phase 1 — Build These
- Bio landing page with content from bio_content_v2.md
- TRREB PDF parser
- CMHC CSV processor
- PostgreSQL + PostGIS database layer
- Star schema (facts + dimensions)
- dbt models with tests
- Choropleth visualization (Dash)
- Policy event annotation layer
- Neighbourhood overlay (toggle-able)
### Phase 1 — Do NOT Build
| Feature | Reason | When |
|---------|--------|------|
| `bridge_district_neighbourhood` table | Area-weighted aggregation is Phase 4 | After Energy project |
| Crime data integration | Deferred scope | Phase 4 |
| Historical boundary reconciliation (140→158) | 2021+ data only for V1 | Phase 4 |
| ML prediction models | Energy project scope | Phase 3 |
| Multi-project shared infrastructure | Build first, abstract second | Phase 2 |
If a task seems to require Phase 3/4 features, **stop and flag it**.
---
## File Structure
### Root-Level Files (Allowed)
| File | Purpose |
|------|---------|
| `README.md` | Project overview |
| `CLAUDE.md` | AI assistant context |
| `pyproject.toml` | Python packaging |
| `.gitignore` | Git ignore rules |
| `.env.example` | Environment template |
| `.python-version` | pyenv version |
| `.pre-commit-config.yaml` | Pre-commit hooks |
| `docker-compose.yml` | Container orchestration |
| `Makefile` | Task automation |
### Directory Structure
```
portfolio/
├── portfolio_app/ # Monolithic Dash application
│ ├── app.py
│ ├── config.py
│ ├── assets/
│ ├── pages/
│ ├── components/
│ ├── figures/
│ ├── toronto/
│ └── errors/
├── tests/
├── dbt/
├── data/
│ └── toronto/
│ ├── raw/
│ ├── processed/ # gitignored
│ └── reference/
├── scripts/
│ ├── db/
│ ├── docker/
│ ├── deploy/
│ ├── dbt/
│ └── dev/
├── docs/
├── notebooks/
├── backups/ # gitignored
└── reports/ # gitignored
```
### Gitignored Directories
- `data/*/processed/`
- `reports/`
- `backups/`
- `notebooks/*.html`
- `.env`
- `__pycache__/`
- `.venv/`
---
## Makefile Targets
| Target | Purpose |
|--------|---------|
| `setup` | Install deps, create .env, init pre-commit |
| `docker-up` | Start PostgreSQL + PostGIS |
| `docker-down` | Stop containers |
| `db-init` | Initialize database schema |
| `run` | Start Dash dev server |
| `test` | Run pytest |
| `dbt-run` | Run dbt models |
| `dbt-test` | Run dbt tests |
| `lint` | Run ruff linter |
| `format` | Run ruff formatter |
| `ci` | Run all checks |
| `deploy` | Deploy to production |
---
## Script Standards
All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
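
For scripts written in Python, these standards might look like the following sketch. The script name, its purpose, and the function names are hypothetical and chosen only to illustrate the list above; `set -euo pipefail` applies to bash scripts and has no direct equivalent here.

```python
"""Usage: python scripts/dev/ensure_dirs.py [BASE_DIR]

Create the gitignored working directories under BASE_DIR (default: current
directory). Hypothetical script, shown only to illustrate the standards.
"""
import sys
from pathlib import Path

WORK_DIRS = ("data/toronto/processed", "backups", "reports")


def ensure_dirs(base: Path) -> list[Path]:
    """Create missing directories and return the ones that were created."""
    created: list[Path] = []
    for rel in WORK_DIRS:
        target = base / rel
        if target.is_dir():
            continue  # idempotent: an existing directory is left alone
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


def main(argv: list[str]) -> int:
    """Return the process exit code: 0 on success, 1 on error."""
    base = Path(argv[1]) if len(argv) > 1 else Path.cwd()
    try:
        created = ensure_dirs(base)
    except OSError as exc:
        print(f"error: {exc}", file=sys.stderr)  # errors go to stderr
        return 1

    for path in created:
        print(f"created {path}")  # normal log output goes to stdout
    print(f"done: {len(created)} created")
    return 0


# A real script would end with an `if __name__ == "__main__":` guard that
# calls sys.exit(main(sys.argv)); it is left out so the sketch can be imported.
```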
---
## Environment Variables
Required in `.env`:
```bash
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO
```
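
A quick pre-flight check can catch a missing variable or a placeholder left in place. The sketch below uses only the standard library and is a stand-in, not the Pydantic `BaseSettings` class that `config.py` is meant to hold; the function names are hypothetical, and the parser handles only simple `KEY=value` lines.

```python
REQUIRED_KEYS = (
    "DATABASE_URL",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "DASH_DEBUG",
    "SECRET_KEY",
    "LOG_LEVEL",
)
# Placeholder values used in the template above.
PLACEHOLDERS = {"<secure>", "<random>"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def find_problems(values: dict[str, str]) -> list[str]:
    """Return one message per required key that is missing or still a placeholder."""
    problems: list[str] = []
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing")
        elif values[key] in PLACEHOLDERS:
            problems.append(f"{key} still has a placeholder value")
    return problems
```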
---
## Success Criteria
### Launch 1 (Sprint 2)
- [ ] Bio page accessible via HTTPS
- [ ] All bio content rendered (from bio_content_v2.md)
- [ ] No placeholder text visible
- [ ] Mobile responsive
- [ ] Social links functional
### Launch 2 (Sprint 6)
- [ ] Choropleth renders TRREB districts and CMHC zones
- [ ] Purchase/rental mode toggle works
- [ ] Time navigation works
- [ ] Policy event markers visible
- [ ] Neighbourhood overlay toggleable
- [ ] Methodology documentation published
- [ ] Data sources cited
---
## Reference Documents
For detailed specifications, see:
| Document | Location | Use When |
|----------|----------|----------|
| Data schemas | `docs/toronto_housing_spec.md` | Parser/model tasks |
| WBS details | `docs/wbs.md` | Sprint planning |
| Bio content | `docs/bio_content.md` | Building home.py |
---
*Reference Version: 1.0*
*Created: January 2026*