personal-portfolio/notebooks/overview/income_safety_scatter.ipynb
lmiranda 1eba95d4d1 docs: Complete Phase 6 notebooks and Phase 7 documentation review
Phase 6 - Jupyter Notebooks (15 total):
- Overview tab: livability_choropleth, top_bottom_10_bar, income_safety_scatter
- Housing tab: affordability_choropleth, rent_trend_line, tenure_breakdown_bar
- Safety tab: crime_rate_choropleth, crime_breakdown_bar, crime_trend_line
- Demographics tab: income_choropleth, age_distribution, population_density_bar
- Amenities tab: amenity_index_choropleth, amenity_radar, transit_accessibility_bar

Phase 7 - Documentation:
- Updated CLAUDE.md with Sprint 9 completion status
- Added notebooks directory to application structure
- Expanded figures directory listing

Closes #71, #72, #73, #74, #75, #76, #77

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 12:10:46 -05:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Income vs Safety Scatter Plot\n",
"\n",
"Explores the correlation between median household income and safety score across Toronto neighbourhoods."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Reference\n",
"\n",
"### Source Tables\n",
"\n",
"| Table | Grain | Key Columns |\n",
"|-------|-------|-------------|\n",
"| `mart_neighbourhood_overview` | neighbourhood × year | neighbourhood_name, median_household_income, safety_score, population |\n",
"\n",
"### SQL Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sqlalchemy import create_engine\n",
"import os\n",
"\n",
"engine = create_engine(os.environ.get('DATABASE_URL', 'postgresql://portfolio:portfolio@localhost:5432/portfolio'))\n",
"\n",
"query = \"\"\"\n",
"SELECT\n",
" neighbourhood_name,\n",
" median_household_income,\n",
" safety_score,\n",
" population,\n",
" livability_score,\n",
" crime_rate_per_100k\n",
"FROM mart_neighbourhood_overview\n",
"WHERE year = (SELECT MAX(year) FROM mart_neighbourhood_overview)\n",
" AND median_household_income IS NOT NULL\n",
" AND safety_score IS NOT NULL\n",
"ORDER BY median_household_income DESC\n",
"\"\"\"\n",
"\n",
"df = pd.read_sql(query, engine)\n",
"print(f\"Loaded {len(df)} neighbourhoods with income and safety data\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformation Steps\n",
"\n",
"1. Filter out null values for income and safety\n",
"2. Optionally scale income to thousands for readability\n",
"3. Pass to scatter figure factory with optional trendline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Scale income to thousands for better axis readability\n",
"df['income_thousands'] = df['median_household_income'] / 1000\n",
"\n",
"# Prepare data for figure factory\n",
"data = df.to_dict('records')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample Output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df[['neighbourhood_name', 'median_household_income', 'safety_score', 'crime_rate_per_100k']].head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Data Visualization\n",
"\n",
"### Figure Factory\n",
"\n",
"Uses `create_scatter_figure` from `portfolio_app.figures.scatter`.\n",
"\n",
"**Key Parameters:**\n",
"- `x_column`: 'income_thousands' (median household income in $K)\n",
"- `y_column`: 'safety_score' (0-100 percentile rank)\n",
"- `name_column`: 'neighbourhood_name' (hover label)\n",
"- `size_column`: 'population' (optional, bubble size)\n",
"- `trendline`: True (adds OLS regression line)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '../..')\n",
"\n",
"from portfolio_app.figures.scatter import create_scatter_figure\n",
"\n",
"fig = create_scatter_figure(\n",
" data=data,\n",
" x_column='income_thousands',\n",
" y_column='safety_score',\n",
" name_column='neighbourhood_name',\n",
" size_column='population',\n",
" title='Income vs Safety by Neighbourhood',\n",
" x_title='Median Household Income ($K)',\n",
" y_title='Safety Score (0-100)',\n",
" trendline=True,\n",
")\n",
"\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interpretation\n",
"\n",
"This scatter plot reveals the relationship between income and safety:\n",
"\n",
"- **Positive correlation**: Higher income neighbourhoods tend to have higher safety scores\n",
"- **Bubble size**: Represents population (larger = more people)\n",
"- **Trendline**: Orange dashed line shows the overall trend\n",
"- **Outliers**: Neighbourhoods far from the trendline are interesting cases\n",
" - Above line: Safer than income would predict\n",
" - Below line: Less safe than income would predict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Calculate correlation coefficient\n",
"correlation = df['median_household_income'].corr(df['safety_score'])\n",
"print(f\"Correlation coefficient (Income vs Safety): {correlation:.3f}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
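
The reading of the trendline in the Interpretation cell (above the line means safer than income would predict, below means less safe) can be sketched outside the notebook as follows. This is a minimal, standalone illustration that assumes an ordinary least squares fit, as the notebook describes. It uses NumPy directly rather than `create_scatter_figure`, whose internals are not shown here, and the helper names and data values are hypothetical.

```python
import numpy as np


def fit_trendline(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed minus predicted value; positive means above the trendline."""
    return np.asarray(y) - (slope * np.asarray(x) + intercept)


# Hypothetical values for illustration only, not real neighbourhood data.
income_thousands = np.array([40.0, 60.0, 80.0, 100.0, 120.0])
safety_score = np.array([30.0, 55.0, 50.0, 80.0, 85.0])

slope, intercept = fit_trendline(income_thousands, safety_score)
resid = residuals(income_thousands, safety_score, slope, intercept)

# Pearson correlation, the same statistic pandas' Series.corr returns by default.
pearson_r = np.corrcoef(income_thousands, safety_score)[0, 1]

# True where a point sits above the line (safer than income alone would predict).
above_line = resid > 0

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={pearson_r:.3f}")
print("above trendline:", above_line.tolist())
```

Sorting neighbourhoods by the absolute value of the residual would give one simple way to rank the outliers mentioned above; whether that matches how the dashboard flags them is not stated in the notebook.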