Add "lessons/patterns/reset-pandas-index-after-filtering-to-prevent-column-pollution"
48
lessons%2Fpatterns%2Freset-pandas-index-after-filtering-to-prevent-column-pollution.-.md
Normal file
@@ -0,0 +1,48 @@
## Context

When building MCP tools that manipulate pandas DataFrames, operations that subset or filter data can unexpectedly add columns to the result.

**Issue:** #203 - filter tool adds unexpected `__index_level_0__` column

## Problem

The `filter` tool used `df.query(condition)` to filter rows. `query` keeps the original row labels, so the filtered result carries a non-contiguous index. When that result was later serialized or stored, the preserved index was written out as an extra column named `__index_level_0__`.

```python
# Before (problematic)
filtered = df.query(condition)
# Result keeps the original index, which becomes a column on storage
```

**Symptom:** Users reported filtered DataFrames having 5 columns when the source had 4.

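The index preservation described above can be seen without any storage layer. The following is a minimal sketch; the frame, its column names, and the condition are made up for illustration and are not taken from `pandas_tools.py`. It does not write Parquet (that would need an extra dependency), it only shows the state of the index that later causes the extra column.

```python
import pandas as pd

# Hypothetical 4-column frame standing in for the tool's input.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "region": ["north", "south", "north", "east"],
        "units": [10, 25, 7, 40],
        "price": [2.5, 1.0, 3.0, 0.5],
    }
)

filtered = df.query("units > 8")

# The surviving rows keep their original labels, so the index has a gap.
print(list(filtered.index))  # [0, 1, 3]

# In memory the column count is unchanged...
print(filtered.shape[1])  # 4

# ...but the index is no longer the default RangeIndex, which is the
# condition under which a serializer may store it as a real column.
print(isinstance(df.index, pd.RangeIndex))        # True
print(isinstance(filtered.index, pd.RangeIndex))  # False
```

Checking for a non-default index like this is one way to catch the problem in a unit test before the data reaches storage.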
## Solution

Always reset the index after filtering operations that subset rows:

```python
# After (correct)
filtered = df.query(condition).reset_index(drop=True)
```

The `drop=True` parameter discards the old index entirely rather than converting it to a column.

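To make the effect of `drop=True` concrete, here is a small comparison of the two forms of `reset_index`. The data and variable names are illustrative assumptions only.

```python
import pandas as pd

# Hypothetical input frame.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [55, 90, 72, 31]})
subset = df.query("score > 60")  # keeps rows labelled 1 and 2

kept = subset.reset_index()              # old labels become an "index" column
dropped = subset.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns))     # ['index', 'name', 'score']
print(list(dropped.columns))  # ['name', 'score']
print(list(dropped.index))    # [0, 1]
```

Without `drop=True`, the fix would simply move the unwanted column from storage time to filter time.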
## Prevention

When implementing pandas operations in MCP tools:

1. **Filter/query operations** - Always add `.reset_index(drop=True)`
2. **Groupby operations** - Use `.reset_index()` so the group keys become regular columns (already correct in our implementation)
3. **Merge/join operations** - `pd.merge()` on key columns returns a fresh default index; merges or joins on the index carry index values through, so check those cases
4. **Slicing operations** (`head`, `tail`) - Consider whether an index reset is needed

**Rule of thumb:** If an operation changes which rows are in the DataFrame, reset the index.

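The four cases in the checklist can be sketched side by side. This is an illustration under assumed data; `with_fresh_index` is a hypothetical helper name, not a function from the data-platform server.

```python
import pandas as pd


def with_fresh_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return df with a default 0..n-1 index."""
    return df.reset_index(drop=True)


orders = pd.DataFrame(
    {"customer": ["a", "b", "a", "c"], "amount": [10, 20, 30, 40]}
)
customers = pd.DataFrame(
    {"customer": ["a", "b", "c"], "tier": ["gold", "silver", "gold"]}
)

# 1. Filter: original labels survive, so reset them.
big = with_fresh_index(orders.query("amount >= 20"))

# 2. Groupby: reset_index() turns the group key back into a column.
totals = orders.groupby("customer")["amount"].sum().reset_index()

# 3. Merge on a column: the result gets a new default index even though
#    the left input starts at label 1.
merged = pd.merge(orders.iloc[1:], customers, on="customer")

# 4. Slicing: tail() keeps the original labels.
last_two = orders.tail(2)

print(list(big.index))       # [0, 1, 2]
print(list(totals.columns))  # ['customer', 'amount']
print(list(merged.index))    # [0, 1, 2]
print(list(last_two.index))  # [2, 3]
```

Routing every row-subsetting tool through one small helper like this keeps the rule of thumb in a single place instead of relying on each call site to remember it.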
## Related

- File: `mcp-servers/data-platform/mcp_server/pandas_tools.py`
- Fix commit: `4ed3ed7`

---

**Tags:** pandas, data-platform, mcp-server, python, dataframe