Create scripts/etl/toronto.sh #85

Closed
opened 2026-01-17 21:53:46 +00:00 by lmiranda · 0 comments
Owner

Overview

Create an ETL script for the Toronto data pipeline with full and incremental loading options.

Acceptance Criteria

  • Create scripts/etl/toronto.sh
  • Support --full flag for complete data reload
  • Support --incremental flag for new data only
  • Default to incremental if no flag provided
  • Log output to .dev/logs/etl/ directory
  • Create log directory if it doesn't exist
  • Include timestamp in log filenames
  • Include usage comments at top
  • Use set -euo pipefail for bash safety
  • Make script executable
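
Several of the criteria above can be checked mechanically once the script exists. A minimal sketch, assuming a hypothetical helper named check_etl_script (not part of this issue) that verifies only the executable bit, the usage comment near the top, and the `set -euo pipefail` line:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the issue: checks a few acceptance criteria.
check_etl_script() {
    local script="$1"

    # Criterion: script must be executable.
    [ -x "$script" ] || { echo "not executable: $script" >&2; return 1; }

    # Criterion: usage comment within the first five lines.
    awk 'NR <= 5 && /^# Usage:/ { found = 1 } END { exit !found }' "$script" \
        || { echo "missing usage comment: $script" >&2; return 1; }

    # Criterion: bash safety flags are set.
    grep -q '^set -euo pipefail' "$script" \
        || { echo "missing 'set -euo pipefail': $script" >&2; return 1; }
}
```

After sourcing the function, calling `check_etl_script scripts/etl/toronto.sh` returns non-zero and prints the first failed check to stderr. The flag handling and logging criteria still need a manual or integration test.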

Technical Notes

Example implementation:

#!/usr/bin/env bash
# Usage: scripts/etl/toronto.sh [--full|--incremental]
# Run Toronto data pipeline.
# Options:
#   --full        Complete reload of all data
#   --incremental Load new data only (default)

set -euo pipefail

MODE="${1:---incremental}"
LOG_DIR=".dev/logs/etl"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${LOG_DIR}/toronto_${TIMESTAMP}.log"

mkdir -p "$LOG_DIR"

echo "Running Toronto ETL (${MODE})..." | tee "$LOG_FILE"

case "$MODE" in
    --full)
        echo "Full reload mode" | tee -a "$LOG_FILE"
        # Run full ETL pipeline
        python -m toronto.etl --full 2>&1 | tee -a "$LOG_FILE"
        ;;
    --incremental)
        echo "Incremental mode" | tee -a "$LOG_FILE"
        # Run incremental ETL
        python -m toronto.etl --incremental 2>&1 | tee -a "$LOG_FILE"
        ;;
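    *)
        # Report usage errors on stderr (and in the log) and exit non-zero
        echo "Unknown option: $MODE" | tee -a "$LOG_FILE" >&2
        echo "Usage: $0 [--full|--incremental]" | tee -a "$LOG_FILE" >&2
        exit 1
        ;;
esac

# Record completion in the log as well as on the console
echo "ETL complete. Log: $LOG_FILE" | tee -a "$LOG_FILE"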
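
The `${1:---incremental}` expansion in the example is easy to misread: it is the `${1:-DEFAULT}` form with `--incremental` as the default, so the mode falls back to incremental when the first argument is unset or empty. A minimal sketch that isolates this step, assuming a hypothetical function name resolve_mode that does not appear in the issue:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the issue: resolves the ETL mode flag.
resolve_mode() {
    # ${1:-DEFAULT}: use DEFAULT when $1 is unset or empty.
    local mode="${1:---incremental}"

    case "$mode" in
        --full|--incremental)
            printf '%s\n' "$mode"
            ;;
        *)
            # Anything else is a usage error.
            echo "Unknown option: $mode" >&2
            return 1
            ;;
    esac
}
```

Calling `resolve_mode` with no argument prints `--incremental`, `resolve_mode --full` prints `--full`, and any other value returns a non-zero status. Keeping the validation in one place like this would let the script reject a bad flag before it creates a log file, if that behavior is preferred.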

Labels

  • Type/Feature
  • Priority/Medium
  • Complexity/Medium
  • Component/Backend
  • Tech/Python

Phase: 3 - Operational Scripts
