feat: project bootstrap and structure

Sprint 1 initialization:
- Project directory structure (portfolio_app/, tests/, dbt/, data/, scripts/)
- CLAUDE.md with AI assistant context
- pyproject.toml with all dependencies
- docker-compose.yml for PostgreSQL 16 + PostGIS
- Makefile with standard targets
- Pre-commit configuration (ruff, mypy)
- Environment template (.env.example)
- Error handling foundation (PortfolioError hierarchy)
- Test configuration (conftest.py, pytest config)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 13:49:28 -05:00
parent 01a0984333
commit c7e9b88adb
38 changed files with 709 additions and 1 deletion

# CLAUDE.md
Working context for Claude Code on the Analytics Portfolio project.

---
## Project Status
**Current Sprint**: 1 (Project Bootstrap)

**Phase**: 1 - Toronto Housing Dashboard

**Branch**: `development` (feature branches merge here)

---
## Quick Reference
### Run Commands
```bash
make setup # Install deps, create .env, init pre-commit
make docker-up # Start PostgreSQL + PostGIS
make docker-down # Stop containers
make db-init # Initialize database schema
make run # Start Dash dev server
make test # Run pytest
make lint # Run ruff linter
make format # Run ruff formatter
make ci # Run all checks
```
### Branch Workflow
1. Create feature branch FROM `development`: `git checkout -b feature/{sprint}-{description}`
2. Work and commit on feature branch
3. Merge INTO `development` when complete
4. `development` -> `staging` -> `main` for releases
---
## Code Conventions
### Import Style
| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .trreb import TRREBParser` |
| Sibling directory | Double dot | `from ..schemas.trreb import TRREBRecord` |
| External packages | Absolute | `import pandas as pd` |
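
The standard library's `importlib.util.resolve_name` shows where each relative form lands. This is a minimal sketch. It assumes the importing module lives in the package `portfolio_app.toronto.parsers` (the `toronto/parsers/` directory in the application structure). `trreb` and `schemas.trreb` are the module names from the table.

```python
from importlib.util import resolve_name

# Assumption: the module doing the import lives in portfolio_app/toronto/parsers/.
CURRENT_PACKAGE = "portfolio_app.toronto.parsers"

# Relative names resolve against the importing module's package.
same_dir = resolve_name(".trreb", CURRENT_PACKAGE)
sibling_dir = resolve_name("..schemas.trreb", CURRENT_PACKAGE)
# Absolute names are returned unchanged.
external = resolve_name("pandas", CURRENT_PACKAGE)

print(same_dir)     # portfolio_app.toronto.parsers.trreb
print(sibling_dir)  # portfolio_app.toronto.schemas.trreb
print(external)     # pandas
```

A double dot climbs one package level, so `..schemas` from `parsers/` reaches the sibling `schemas/` directory under `toronto/`.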
### Module Responsibilities
| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | PDF/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | Interaction logic; located in `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |
### Type Hints
Use built-in generics and `X | None` unions (Python 3.10+ syntax; the project targets 3.11+):
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```
### Error Handling
```python
# errors/exceptions.py


class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


# Same name as pydantic.ValidationError; alias one of them when both are imported.
class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""


class LoadError(PortfolioError):
    """Database load operation failed."""
```
### Code Standards
- Single-responsibility functions with verb-based names
- Early returns over deep nesting
- Google-style docstrings, written only where behavior is non-obvious
- Module-level constants for magic values
- Pydantic `BaseSettings` (shipped in the separate `pydantic-settings` package under Pydantic 2) for runtime config
---
## Application Structure
```
portfolio_app/
├── app.py                  # Dash app factory with Pages routing
├── config.py               # Pydantic BaseSettings
├── assets/                 # CSS, images (auto-served)
├── pages/
│   ├── home.py             # Bio landing page -> /
│   └── toronto/
│       ├── dashboard.py    # Layout only -> /toronto
│       └── callbacks/      # Interaction logic
├── components/             # Shared UI (navbar, footer, cards)
├── figures/                # Shared chart factories
├── toronto/                # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/            # Pydantic
│   └── models/             # SQLAlchemy
└── errors/
```
### URL Routing
| URL | Page | Sprint |
|-----|------|--------|
| `/` | Bio landing page | 2 |
| `/toronto` | Toronto Housing Dashboard | 6 |
---
## Tech Stack (Locked)
| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | >=2.0 |
| ORM | SQLAlchemy | >=2.0 (2.0-style API only) |
| Transformation | dbt-postgres | >=1.7 |
| Data Processing | Pandas | >=2.1 |
| Geospatial | GeoPandas + Shapely | >=0.14 (GeoPandas) |
| Visualization | Dash + Plotly | >=2.14 (Dash) |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | >=7.0 |
| Runtime | Python | 3.11+ (via pyenv) |

**Notes**:
- SQLAlchemy 2.0 + Pydantic 2.0 only (never mix 1.x APIs)
- PostGIS extension required in database
- Docker Compose V2 format (no `version` field)
---
## Data Model Overview
### Geographic Reality (Toronto Housing)
```
TRREB Districts (~35) - Purchase data (W01, C01, E01...)
CMHC Zones (~20) - Rental data (Census Tract aligned)
City Neighbourhoods (158) - Enrichment/overlay only
```
**Critical**: These geographies do NOT align. Display as separate layers—do not force crosswalks.
### Star Schema
| Table | Type | Keys |
|-------|------|------|
| `fact_purchases` | Fact | -> dim_time, dim_trreb_district |
| `fact_rentals` | Fact | -> dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_trreb_district` | Dimension | district_key (PK), geometry |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |

**V1 Rule**: `dim_neighbourhood` has NO FK to fact tables; it is a reference overlay only.
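
The next sketch models the star schema with the standard library's `sqlite3`, used as an in-memory stand-in for PostgreSQL. There is no PostGIS, so geometry columns are left out. Only the key columns come from the table. The following are made-up placeholders:

- the columns `year`, `month`, `district_code`, `name` and `median_price`
- the YYYYMM `date_key` format
- all row values

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL + PostGIS; geometry columns are omitted.
# Non-key columns (year, month, district_code, name, median_price) are hypothetical.
SCHEMA = """
CREATE TABLE dim_time (
    date_key INTEGER PRIMARY KEY,  -- assumed YYYYMM integer
    year INTEGER NOT NULL,
    month INTEGER NOT NULL
);
CREATE TABLE dim_trreb_district (
    district_key INTEGER PRIMARY KEY,
    district_code TEXT NOT NULL
);
-- Overlay only (V1 rule): no fact table holds a foreign key to this dimension.
CREATE TABLE dim_neighbourhood (
    neighbourhood_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE fact_purchases (
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    district_key INTEGER NOT NULL REFERENCES dim_trreb_district (district_key),
    median_price INTEGER NOT NULL
);
"""

# Typical star-schema query: join the fact to its dimensions, group by dimension attributes.
AVG_PRICE_BY_DISTRICT_YEAR = """
SELECT d.district_code, t.year, AVG(f.median_price) AS avg_median_price
FROM fact_purchases AS f
JOIN dim_time AS t ON t.date_key = f.date_key
JOIN dim_trreb_district AS d ON d.district_key = f.district_key
GROUP BY d.district_code, t.year
ORDER BY d.district_code, t.year
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema in memory and insert placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(202401, 2024, 1), (202402, 2024, 2)])
    conn.executemany("INSERT INTO dim_trreb_district VALUES (?, ?)", [(1, "W01"), (2, "C01")])
    conn.executemany(
        "INSERT INTO fact_purchases VALUES (?, ?, ?)",
        [(202401, 1, 100), (202402, 1, 200), (202401, 2, 300)],
    )
    return conn
```

Facts carry only keys and measures. Descriptive attributes live in the dimensions, which is what keeps `dim_neighbourhood` detachable as an overlay.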
### dbt Layers
| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic |
| Marts | `mart_{domain}` | Final analytical tables |
---
## DO NOT BUILD (Phase 1)
**Stop and flag if a task seems to require these**:
| Feature | Reason |
|---------|--------|
| `bridge_district_neighbourhood` table | Area-weighted aggregation is Phase 4 |
| Crime data integration | Deferred to Phase 4 |
| Historical boundary reconciliation (140 -> 158 neighbourhoods) | 2021+ data only for V1 |
| ML prediction models | Energy project scope (Phase 3) |
| Multi-project shared infrastructure | Build first, abstract second (Phase 2) |
---
## Sprint 1 Deliverables
| Category | Tasks |
|----------|-------|
| **Bootstrap** | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
| **Infrastructure** | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
| **App Foundation** | portfolio_app/ structure, config.py, error handling |
| **Tests** | tests/ directory, conftest.py, pytest config |
| **Data Acquisition** | Download TRREB PDFs, START boundary digitization (HUMAN task) |
### Human Tasks (Cannot Automate)
| Task | Tool | Effort |
|------|------|--------|
| Digitize TRREB district boundaries | QGIS | 3-4 hours |
| Research policy events (10-20) | Manual | 2-3 hours |
| Replace social link placeholders | Manual | 5 minutes |
---
## Environment Variables
Required in `.env`:
```bash
DATABASE_URL=postgresql://portfolio:<secure>@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO
```
---
## Script Standards
All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
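
The same standards carry over to Python scripts in `scripts/`. The sketch below uses a hypothetical `ensure_dir.py` that creates a directory idempotently. It returns 0 or 1 and logs progress to stdout and errors to stderr. As a real script, it would be invoked as `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard. That guard is omitted here so `main` can be called directly.

```python
import sys
from pathlib import Path

# Usage: python scripts/ensure_dir.py <directory>   (hypothetical script name)
USAGE = "usage: python scripts/ensure_dir.py <directory>"


def main(argv: list[str]) -> int:
    """Create the target directory if needed; return 0 on success, 1 on error."""
    if len(argv) != 1:
        print(USAGE, file=sys.stderr)
        return 1
    target = Path(argv[0])
    try:
        # parents/exist_ok make repeated runs a no-op, so the script is idempotent.
        target.mkdir(parents=True, exist_ok=True)
    except OSError as exc:  # e.g. permission denied, or a file already at that path
        print(f"error: cannot create {target}: {exc}", file=sys.stderr)
        return 1
    print(f"ready: {target}")  # normal progress goes to stdout
    return 0
```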
---
## Reference Documents
| Document | Location | Use When |
|----------|----------|----------|
| Full specification | `docs/PROJECT_REFERENCE.md` | Architecture decisions |
| Data schemas | `docs/toronto_housing_spec.md` | Parser/model tasks |
| WBS details | `docs/wbs.md` | Sprint planning |
| Bio content | `docs/bio_content.md` | Building home.py |
---
*Last Updated: Sprint 1*