Create data loading pipeline script #65

Closed
opened 2026-01-17 16:07:39 +00:00 by lmiranda · 0 comments
Summary

Create an orchestration script that fetches data from Toronto Open Data and Toronto Police APIs, loads it into PostgreSQL, and runs dbt to populate the mart tables.

Files to Create

| File | Purpose |
|------|---------|
| scripts/data/load_toronto_data.py | Main orchestration script |
| scripts/data/__init__.py | Package init |

Script Workflow

  1. Fetch from APIs:
    • Neighbourhoods (boundaries)
    • Census profiles (demographics)
    • Crime rates (safety)
    • Parks, Schools, Childcare (amenities)
    • CMHC rentals (housing)
  2. Load to Database:
    • Use existing loaders (census.py, crime.py, amenities.py, etc.)
    • Upsert to avoid duplicates
  3. Run dbt:
    • Execute dbt run to transform staging → intermediate → marts
    • Validate with dbt test
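
For orientation, here is a minimal sketch of how the three steps above might hang together. Everything is illustrative: the `DATASETS` list and the `fetch_and_load` stub are placeholders for the real parsers and loaders in `portfolio_app/toronto/`, and only the dbt invocations reflect commands named in this issue.

```python
"""Sketch of scripts/data/load_toronto_data.py (structure only).

The fetch/load helper below is a hypothetical stand-in; the real script
would delegate to the existing modules under portfolio_app/toronto/
(census.py, crime.py, amenities.py, ...).
"""
import logging
import subprocess

logger = logging.getLogger(__name__)

DATASETS = [
    "neighbourhoods",   # boundaries
    "census_profiles",  # demographics
    "crime_rates",      # safety
    "parks", "schools", "childcare",  # amenities
    "cmhc_rentals",     # housing
]


def fetch_and_load(dataset: str) -> None:
    """Fetch one dataset from its API and upsert it into PostgreSQL."""
    logger.info("Fetching and loading %s ...", dataset)
    # Placeholder: call the matching fetcher + loader from
    # portfolio_app/toronto/, upserting to avoid duplicate rows.


def run_dbt() -> None:
    """Transform staging -> intermediate -> marts, then validate."""
    subprocess.run(["dbt", "run"], check=True)   # build the models
    subprocess.run(["dbt", "test"], check=True)  # validate them


def main() -> None:
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    for dataset in DATASETS:
        fetch_and_load(dataset)
    run_dbt()


if __name__ == "__main__":
    main()
```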

Acceptance Criteria

  • Script can be run via python scripts/data/load_toronto_data.py
  • Supports --skip-fetch flag to only run dbt
  • Supports --skip-dbt flag to only load data
  • Logs progress and errors clearly
  • Handles API failures gracefully with retries
  • Makefile target added: make load-data
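
A possible shape for the CLI and retry handling listed above, assuming `argparse` and a generic retry wrapper; the flag names match the criteria, but the retry count, delay, and helper names are made up for illustration.

```python
import argparse
import logging
import time

logger = logging.getLogger(__name__)


def with_retries(func, attempts: int = 3, delay: float = 5.0):
    """Call func, retrying so transient API failures don't abort the whole run."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:  # narrow to the real API error types
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Load Toronto data and run dbt.")
    parser.add_argument("--skip-fetch", action="store_true",
                        help="Skip the API fetch/load step and only run dbt.")
    parser.add_argument("--skip-dbt", action="store_true",
                        help="Load data but do not run dbt.")
    return parser.parse_args()
```

The `make load-data` target would then simply invoke `python scripts/data/load_toronto_data.py`.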

Technical Notes

  • Use existing parsers and loaders from portfolio_app/toronto/
  • Consider adding progress bars for long-running operations
  • Cache API responses to avoid re-fetching during development
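
One lightweight way to cache API responses during development, assuming the responses are JSON-serializable; the cache directory and helper name are arbitrary choices for this sketch.

```python
import json
from pathlib import Path

# Arbitrary cache location for development; adjust or gate behind a flag.
CACHE_DIR = Path(".cache/toronto_api")


def cached_fetch(name: str, fetch):
    """Return the cached response for `name`, fetching and caching it if absent."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())
    data = fetch()  # the real API call, e.g. from portfolio_app/toronto/
    path.write_text(json.dumps(data))
    return data
```

Usage would look like `records = cached_fetch("crime_rates", fetch_crime_rates)`, where `fetch_crime_rates` is whatever fetcher the existing loaders already expose.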

Labels: type:feature, component:backend, priority:high, tech:python, tech:dbt
