Add "lessons/patterns/reset-pandas-index-after-filtering-to-prevent-column-pollution"

2026-01-26 22:28:13 +00:00
parent f37b5dc340
commit faebbe8e92
1 changed files with 48 additions and 0 deletions
--- a/lessons%2Fpatterns%2Freset-pandas-index-after-filtering-to-prevent-column-pollution.-.md
+++ b/lessons%2Fpatterns%2Freset-pandas-index-after-filtering-to-prevent-column-pollution.-.md
@@ -0,0 +1,48 @@
 ## Context
 When building MCP tools that manipulate pandas DataFrames, operations that subset or filter rows keep the original index labels, and those labels can later surface as an unexpected extra column in the result.
 **Issue:** #203 - filter tool adds unexpected `__index_level_0__` column
 ## Problem
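 The `filter` tool used `df.query(condition)` to filter rows. `query` keeps each surviving row's original index label, so the result's index is no longer the default 0..n-1 range. When the filtered result was later serialized or stored, that preserved index was written out as a column named `__index_level_0__` (the name pyarrow gives an unnamed index when a DataFrame is converted to Arrow or Parquet).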
 ```python
 # Before (problematic)
 filtered = df.query(condition)
 # The result keeps the original index labels, which become a column on storage
 ```
 **Symptom:** Users reported filtered DataFrames having 5 columns when the source had 4.
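The label preservation behind this symptom can be reproduced with pandas alone. The sketch below uses a made-up four-column frame; the column names and the condition are illustrative and not taken from the `filter` tool.

```python
import pandas as pd

# Hypothetical source data: 4 columns, default index labels 0..4.
df = pd.DataFrame(
    {
        "id": [10, 11, 12, 13, 14],
        "team": ["a", "b", "a", "b", "a"],
        "score": [40, 75, 30, 90, 60],
        "active": [True, True, False, True, True],
    }
)

filtered = df.query("score > 50")

# The surviving rows keep their original labels instead of 0, 1, 2.
print(filtered.index.tolist())  # [1, 3, 4]

# In memory there are still only 4 columns; the index is not a column yet.
print(len(filtered.columns))  # 4
```

The fifth column only appears once a serializer decides to keep that index. The note does not name the storage path, so Parquet output through pyarrow is an assumption here; it is the usual source of the `__index_level_0__` name.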
 ## Solution
 Always reset the index after filtering operations that subset rows:
 ```python
 # After (correct)
 filtered = df.query(condition).reset_index(drop=True)
 ```
 The `drop=True` parameter discards the old index entirely rather than converting it to a column.
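To make the difference concrete, here is a small sketch comparing `reset_index()` with and without `drop=True` on an illustrative frame (the data is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [40, 75, 30, 90]})
filtered = df.query("score > 50")  # rows labelled 1 and 3 survive

kept = filtered.reset_index()              # old labels become an "index" column
dropped = filtered.reset_index(drop=True)  # old labels are discarded

print(kept.columns.tolist())     # ['index', 'name', 'score']
print(dropped.columns.tolist())  # ['name', 'score']
print(dropped.index.tolist())    # [0, 1]
```

Without `drop=True` the call trades one unwanted column for another, so the fix depends on that argument, not just on calling `reset_index`.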
 ## Prevention
 When implementing pandas operations in MCP tools:
 1. **Filter/query operations** - Always add `.reset_index(drop=True)`
 2. **Groupby operations** - Use `.reset_index()` (already correct in our impl)
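 3. **Merge/join operations** - `pd.merge()` on columns returns a fresh 0..n-1 index; merges on the index (`left_index`/`right_index`) carry index labels through, so check those
 4. **Slicing operations** (`head`, `tail`) - These keep the original labels; consider whether an index reset is needed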
 **Rule of thumb:** If an operation changes which rows are in the DataFrame, reset the index.
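The four cases in the checklist can be checked side by side. This is a sketch on hypothetical data, not the code in `pandas_tools.py`; the frame and column names are invented for illustration.

```python
import pandas as pd

scores = pd.DataFrame(
    {"team": ["a", "b", "a", "b", "a"], "score": [40, 75, 30, 90, 60]}
)
teams = pd.DataFrame({"team": ["a", "b"], "city": ["Oslo", "Lima"]})

# 1. Filter: labels survive the query, so reset them explicitly.
filtered = scores.query("score > 50").reset_index(drop=True)

# 2. Groupby: the group key becomes the index; reset_index() turns it
#    back into a regular column.
totals = scores.groupby("team")["score"].sum().reset_index()

# 3. Merge on a column: the result gets a fresh 0..n-1 index even though
#    the left input carries labels 1, 3, 4.
merged = pd.merge(scores.query("score > 50"), teams, on="team")

# 4. Slicing: tail() keeps the original labels.
last_two = scores.tail(2)

print(filtered.index.tolist())  # [0, 1, 2]
print(totals.columns.tolist())  # ['team', 'score']
print(merged.index.tolist())    # [0, 1, 2]
print(last_two.index.tolist())  # [3, 4]
```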
 ## Related
 - File: `mcp-servers/data-platform/mcp_server/pandas_tools.py`
 - Fix commit: `4ed3ed7`
 ---
 **Tags:** pandas, data-platform, mcp-server, python, dataframe