feat: project bootstrap and structure
Sprint 1 initialization:
- Project directory structure (portfolio_app/, tests/, dbt/, data/, scripts/)
- CLAUDE.md with AI assistant context
- pyproject.toml with all dependencies
- docker-compose.yml for PostgreSQL 16 + PostGIS
- Makefile with standard targets
- Pre-commit configuration (ruff, mypy)
- Environment template (.env.example)
- Error handling foundation (PortfolioError hierarchy)
- Test configuration (conftest.py, pytest config)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
258
CLAUDE.md
Normal file
@@ -0,0 +1,258 @@
# CLAUDE.md

Working context for Claude Code on the Analytics Portfolio project.

---

## Project Status

**Current Sprint**: 1 (Project Bootstrap)
**Phase**: 1 - Toronto Housing Dashboard
**Branch**: `development` (feature branches merge here)

---

## Quick Reference

### Run Commands

```bash
make setup          # Install deps, create .env, init pre-commit
make docker-up      # Start PostgreSQL + PostGIS
make docker-down    # Stop containers
make db-init        # Initialize database schema
make run            # Start Dash dev server
make test           # Run pytest
make lint           # Run ruff linter
make format         # Run ruff formatter
make ci             # Run all checks
```

### Branch Workflow

1. Create feature branch FROM `development`: `git checkout -b feature/{sprint}-{description}`
2. Work and commit on feature branch
3. Merge INTO `development` when complete
4. `development` -> `staging` -> `main` for releases
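As a rough illustration of the `feature/{sprint}-{description}` pattern in step 1, the sketch below builds a branch name from a sprint number and a free-text description. The helper name `build_branch_name` and the slug rules (lowercase, hyphen-separated) are assumptions for illustration and are not defined anywhere in this project.

```python
import re


# Hypothetical helper: the name and the slug rules are assumptions.
def build_branch_name(sprint: int, description: str) -> str:
    """Return a branch name in the feature/{sprint}-{description} form."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"feature/{sprint}-{slug}"


branch = build_branch_name(1, "Project Bootstrap")
print(f"git checkout -b {branch}")
```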

---

## Code Conventions

### Import Style

| Context | Style | Example |
|---------|-------|---------|
| Same directory | Single dot | `from .trreb import TRREBParser` |
| Sibling directory | Double dot | `from ..schemas.trreb import TRREBRecord` |
| External packages | Absolute | `import pandas as pd` |

### Module Responsibilities

| Directory | Contains | Purpose |
|-----------|----------|---------|
| `schemas/` | Pydantic models | Data validation |
| `models/` | SQLAlchemy ORM | Database persistence |
| `parsers/` | PDF/CSV extraction | Raw data ingestion |
| `loaders/` | Database operations | Data loading |
| `figures/` | Chart factories | Plotly figure generation |
| `callbacks/` | Dash callbacks | In `pages/{dashboard}/callbacks/` |
| `errors/` | Exceptions + handlers | Error handling |

### Type Hints

Use Python 3.10+ style:
```python
def process(items: list[str], config: dict[str, int] | None = None) -> bool:
    ...
```

### Error Handling

```python
# errors/exceptions.py
class PortfolioError(Exception):
    """Base exception."""

class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""

class ValidationError(PortfolioError):
    """Pydantic or business rule validation failed."""

class LoadError(PortfolioError):
    """Database load operation failed."""
```
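One way the hierarchy above might be used is sketched here: a low-level step raises the specific subclass, and a caller catches the `PortfolioError` base to handle any project error in one place. The functions `parse_price` and `safe_parse` are hypothetical and exist only for this illustration. Two of the classes are repeated so the block runs on its own.

```python
from __future__ import annotations


class PortfolioError(Exception):
    """Base exception."""


class ParseError(PortfolioError):
    """PDF/CSV parsing failed."""


# Hypothetical parsing step, used only to show where ParseError is raised.
def parse_price(raw: str) -> int:
    cleaned = raw.replace("$", "").replace(",", "").strip()
    if not cleaned.isdigit():
        raise ParseError(f"cannot parse price from {raw!r}")
    return int(cleaned)


# Hypothetical caller: catching the base class covers every subclass.
def safe_parse(raw: str) -> int | None:
    try:
        return parse_price(raw)
    except PortfolioError as exc:
        print(f"skipped value: {exc}")
        return None
```

Catching the base class at the boundary keeps callers independent of which stage failed, while the subclass name still records the cause.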

### Code Standards

- Single responsibility functions with verb naming
- Early returns over deep nesting
- Google-style docstrings only for non-obvious behavior
- Module-level constants for magic values
- Pydantic BaseSettings for runtime config
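A minimal sketch of three of these standards together (verb naming, early returns, module-level constants) could look like the following. The constant names, the price limits, and the function `validate_price` are all assumptions made up for the example.

```python
from __future__ import annotations

# Module-level constants replace magic values.
# The names and limits here are hypothetical.
MIN_VALID_PRICE = 1
MAX_VALID_PRICE = 100_000_000


def validate_price(price: int | None) -> bool:
    """Return True when price falls inside the accepted range."""
    # Each guard returns early, so the accepted case stays unindented.
    if price is None:
        return False
    if price < MIN_VALID_PRICE:
        return False
    if price > MAX_VALID_PRICE:
        return False
    return True
```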

---

## Application Structure

```
portfolio_app/
├── app.py                  # Dash app factory with Pages routing
├── config.py               # Pydantic BaseSettings
├── assets/                 # CSS, images (auto-served)
├── pages/
│   ├── home.py             # Bio landing page -> /
│   └── toronto/
│       ├── dashboard.py    # Layout only -> /toronto
│       └── callbacks/      # Interaction logic
├── components/             # Shared UI (navbar, footer, cards)
├── figures/                # Shared chart factories
├── toronto/                # Toronto data logic
│   ├── parsers/
│   ├── loaders/
│   ├── schemas/            # Pydantic
│   └── models/             # SQLAlchemy
└── errors/
```

### URL Routing

| URL | Page | Sprint |
|-----|------|--------|
| `/` | Bio landing page | 2 |
| `/toronto` | Toronto Housing Dashboard | 6 |

---

## Tech Stack (Locked)

| Layer | Technology | Version |
|-------|------------|---------|
| Database | PostgreSQL + PostGIS | 16.x |
| Validation | Pydantic | >=2.0 |
| ORM | SQLAlchemy | >=2.0 (2.0-style API only) |
| Transformation | dbt-postgres | >=1.7 |
| Data Processing | Pandas | >=2.1 |
| Geospatial | GeoPandas + Shapely | >=0.14 |
| Visualization | Dash + Plotly | >=2.14 |
| UI Components | dash-mantine-components | Latest stable |
| Testing | pytest | >=7.0 |
| Runtime | Python | 3.11+ (via pyenv) |

**Notes**:
- SQLAlchemy 2.0 + Pydantic 2.0 only (never mix 1.x APIs)
- PostGIS extension required in database
- Docker Compose V2 format (no `version` field)

---

## Data Model Overview

### Geographic Reality (Toronto Housing)

```
TRREB Districts (~35)       - Purchase data (W01, C01, E01...)
CMHC Zones (~20)            - Rental data (Census Tract aligned)
City Neighbourhoods (158)   - Enrichment/overlay only
```

**Critical**: These geographies do NOT align. Display as separate layers; do not force crosswalks.

### Star Schema

| Table | Type | Keys |
|-------|------|------|
| `fact_purchases` | Fact | -> dim_time, dim_trreb_district |
| `fact_rentals` | Fact | -> dim_time, dim_cmhc_zone |
| `dim_time` | Dimension | date_key (PK) |
| `dim_trreb_district` | Dimension | district_key (PK), geometry |
| `dim_cmhc_zone` | Dimension | zone_key (PK), geometry |
| `dim_neighbourhood` | Dimension | neighbourhood_id (PK), geometry |
| `dim_policy_event` | Dimension | event_id (PK) |

**V1 Rule**: `dim_neighbourhood` has NO FK to fact tables; it is a reference overlay only.
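To make the key relationships concrete, here is a small sketch that creates part of the schema and then lists which tables `fact_purchases` points at. It rests on several assumptions: SQLite (from the standard library) stands in for PostgreSQL + PostGIS so the block runs anywhere, `geometry` is a plain text placeholder, and `purchase_id` plus all column types are hypothetical. Only the table names and key names come from the table above.

```python
from __future__ import annotations

import sqlite3

# Assumption: SQLite replaces PostgreSQL/PostGIS purely for illustration.
# Column types and purchase_id are hypothetical.
DDL = """
CREATE TABLE dim_time (date_key INTEGER PRIMARY KEY);
CREATE TABLE dim_trreb_district (
    district_key INTEGER PRIMARY KEY,
    geometry TEXT  -- placeholder for a PostGIS geometry column
);
CREATE TABLE dim_neighbourhood (
    neighbourhood_id INTEGER PRIMARY KEY,
    geometry TEXT  -- overlay only: no fact table references this
);
CREATE TABLE fact_purchases (
    purchase_id INTEGER PRIMARY KEY,
    date_key INTEGER NOT NULL REFERENCES dim_time (date_key),
    district_key INTEGER NOT NULL REFERENCES dim_trreb_district (district_key)
);
"""


def list_fk_targets(conn: sqlite3.Connection, table: str) -> set[str]:
    """Return the tables that `table` references through foreign keys."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {row[2] for row in rows}  # index 2 is the referenced table name


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
targets = list_fk_targets(conn, "fact_purchases")
print(sorted(targets))
```

A check like this could serve as a guard for the V1 rule: `dim_neighbourhood` should never appear among the foreign key targets of a fact table.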

### dbt Layers

| Layer | Naming | Purpose |
|-------|--------|---------|
| Staging | `stg_{source}__{entity}` | 1:1 source, cleaned, typed |
| Intermediate | `int_{domain}__{transform}` | Business logic |
| Marts | `mart_{domain}` | Final analytical tables |
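The naming column can be read as three patterns, and a sketch like the one below could check a model name against them. The function `classify_model`, the allowed character set (lowercase letters, digits, single underscores inside a part), and the model names in the usage lines are assumptions for illustration and are not project rules stated here.

```python
from __future__ import annotations

import re

# One name part: lowercase words joined by single underscores.
# This character set is an assumption.
NAME = r"[a-z0-9]+(?:_[a-z0-9]+)*"

LAYER_PATTERNS = {
    "staging": re.compile(rf"^stg_{NAME}__{NAME}$"),
    "intermediate": re.compile(rf"^int_{NAME}__{NAME}$"),
    "marts": re.compile(rf"^mart_{NAME}$"),
}


def classify_model(model_name: str) -> str | None:
    """Return the dbt layer a model name belongs to, or None if none match."""
    for layer, pattern in LAYER_PATTERNS.items():
        if pattern.match(model_name):
            return layer
    return None


# Hypothetical model names, used only to exercise the patterns.
for name in ("stg_trreb__purchases", "mart_toronto_rentals", "purchases_stg"):
    print(name, "->", classify_model(name))
```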

---

## DO NOT BUILD (Phase 1)

**Stop and flag if a task seems to require these**:

| Feature | Reason |
|---------|--------|
| `bridge_district_neighbourhood` table | Area-weighted aggregation is Phase 4 |
| Crime data integration | Deferred to Phase 4 |
| Historical boundary reconciliation (140->158) | 2021+ data only for V1 |
| ML prediction models | Energy project scope (Phase 3) |
| Multi-project shared infrastructure | Build first, abstract second (Phase 2) |

---

## Sprint 1 Deliverables

| Category | Tasks |
|----------|-------|
| **Bootstrap** | Git init, pyproject.toml, .env.example, Makefile, CLAUDE.md |
| **Infrastructure** | Docker Compose (PostgreSQL + PostGIS), scripts/ directory |
| **App Foundation** | portfolio_app/ structure, config.py, error handling |
| **Tests** | tests/ directory, conftest.py, pytest config |
| **Data Acquisition** | Download TRREB PDFs, START boundary digitization (HUMAN task) |

### Human Tasks (Cannot Automate)

| Task | Tool | Effort |
|------|------|--------|
| Digitize TRREB district boundaries | QGIS | 3-4 hours |
| Research policy events (10-20) | Manual | 2-3 hours |
| Replace social link placeholders | Manual | 5 minutes |

---

## Environment Variables

Required in `.env`:

```bash
DATABASE_URL=postgresql://user:pass@localhost:5432/portfolio
POSTGRES_USER=portfolio
POSTGRES_PASSWORD=<secure>
POSTGRES_DB=portfolio
DASH_DEBUG=true
SECRET_KEY=<random>
LOG_LEVEL=INFO
```
```
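The sketch below shows one way these values might be read and checked at startup. The code conventions above name Pydantic BaseSettings for runtime config. This version deliberately uses only the standard library as a stand-in so it runs without extra packages, and it is not the project's `config.py`. The `Settings` fields, the choice of which keys are mandatory, and the defaults are assumptions.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: these two keys are treated as mandatory for the example.
REQUIRED_KEYS = ("DATABASE_URL", "SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    dash_debug: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on gaps."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        # Only the literal string "true" (any case) enables debug mode.
        dash_debug=env.get("DASH_DEBUG", "false").lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```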

---

## Script Standards

All scripts in `scripts/`:
- Include usage comments at top
- Idempotent where possible
- Exit codes: 0 = success, 1 = error
- Use `set -euo pipefail` for bash
- Log to stdout, errors to stderr
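For a Python script, the same standards might translate into a skeleton like this one: a usage docstring at the top, normal output on stdout, errors on stderr, and an integer exit code. The script name `example_task.py`, its single argument, and the constant names are placeholders invented for the sketch.

```python
"""Usage: python scripts/example_task.py <name>

Hypothetical skeleton: the file name and the task are placeholders.
"""
from __future__ import annotations

import sys

EXIT_SUCCESS = 0
EXIT_ERROR = 1


def main(argv: list[str]) -> int:
    """Run the task and return the process exit code."""
    if len(argv) != 2:
        print(__doc__, file=sys.stderr)  # usage and errors go to stderr
        return EXIT_ERROR
    print(f"processing {argv[1]}")  # normal progress goes to stdout
    return EXIT_SUCCESS


# A real script would finish with: sys.exit(main(sys.argv))
exit_code = main(["example_task.py", "districts"])
print(f"exit code: {exit_code}")
```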

---

## Reference Documents

| Document | Location | Use When |
|----------|----------|----------|
| Full specification | `docs/PROJECT_REFERENCE.md` | Architecture decisions |
| Data schemas | `docs/toronto_housing_spec.md` | Parser/model tasks |
| WBS details | `docs/wbs.md` | Sprint planning |
| Bio content | `docs/bio_content.md` | Building home.py |

---

*Last Updated: Sprint 1*