Add "lessons/patterns/reset-pandas-index-after-filtering-to-prevent-column-pollution"

## Context

When building MCP tools that manipulate pandas DataFrames, operations that subset or filter data can unexpectedly add columns to the result.

**Issue:** #203 - filter tool adds unexpected `__index_level_0__` column

## Problem

The `filter` tool used `df.query(condition)` to filter rows. `query` keeps the index labels of the rows that survive, so the result no longer has a default `RangeIndex`. When the filtered result was later serialized or stored, the preserved index was written out as an extra column named `__index_level_0__`, which is the placeholder name pyarrow-based Arrow/Parquet conversion uses for an unnamed index.

```python
# Before (problematic)
filtered = df.query(condition)
# Result has original index preserved, becomes column on storage
```

**Symptom:** Users reported filtered DataFrames having 5 columns when the source had 4.
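The index behaviour behind this symptom can be reproduced without any storage layer. The sketch below uses a made-up four-column DataFrame and an example condition string (neither comes from the actual `filter` tool) and only shows that `query` keeps the surviving rows' original labels:

```python
import pandas as pd

# Hypothetical sample data; the real tool receives arbitrary DataFrames.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "name": ["a", "b", "c", "d"],
        "score": [10, 55, 30, 80],
        "active": [True, False, True, True],
    }
)

condition = "score > 40"  # assumed example condition
filtered = df.query(condition)

# The surviving rows keep their original labels (1 and 3), so the index
# is no longer the default RangeIndex that starts at 0.
kept_labels = filtered.index.tolist()
has_default_index = isinstance(filtered.index, pd.RangeIndex)

print(kept_labels)        # [1, 3]
print(has_default_index)  # False
```

A non-default index like this is what a serializer then has to store somewhere, typically as an additional column.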

## Solution

Always reset the index after filtering operations that subset rows:

```python
# After (correct)
filtered = df.query(condition).reset_index(drop=True)
```

The `drop=True` parameter discards the old index entirely rather than converting it to a column.
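To make the effect of `drop=True` concrete, here is a minimal comparison on illustrative data (the column names and condition are assumptions, not taken from the tool):

```python
import pandas as pd

# Hypothetical two-column frame, filtered the same way as above.
df = pd.DataFrame({"id": [1, 2, 3, 4], "score": [10, 55, 30, 80]})
filtered = df.query("score > 40")

# Default reset_index() moves the old labels into a new "index" column.
with_old_index = filtered.reset_index()

# drop=True throws the old labels away instead.
clean = filtered.reset_index(drop=True)

print(with_old_index.columns.tolist())  # ['index', 'id', 'score']
print(clean.columns.tolist())           # ['id', 'score']
print(clean.index.tolist())             # [0, 1]
```

Without `drop=True`, the reset itself would add a column, which is the same kind of pollution the fix is meant to avoid.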

## Prevention

When implementing pandas operations in MCP tools:

1. **Filter/query operations** - Always add `.reset_index(drop=True)`
2. **Groupby operations** - Use `.reset_index()` so the group keys become regular columns (already correct in our impl)
3. **Merge/join operations** - `pd.merge()` on columns returns a fresh 0..n-1 index automatically; index-based joins (`left_index`/`right_index`, `DataFrame.join`) carry the index through
4. **Slicing operations** (head, tail) - Consider if index reset is needed; `tail()` in particular keeps the original labels

**Rule of thumb:** If an operation changes which rows are in the DataFrame, reset the index.
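The checklist above might look like this in practice. This is only a sketch: `with_clean_index` is a hypothetical helper, and the `orders`/`customers` frames are invented for illustration rather than taken from `pandas_tools.py`.

```python
import pandas as pd


def with_clean_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return df with a fresh 0..n-1 index."""
    return df.reset_index(drop=True)


# Invented sample data for the four cases in the checklist.
orders = pd.DataFrame(
    {"customer": ["a", "b", "a", "c"], "amount": [5, 20, 7, 30]}
)
customers = pd.DataFrame({"customer": ["a", "b", "c"], "tier": [1, 2, 1]})

# 1. Filter/query: labels survive, so reset explicitly.
big = with_clean_index(orders.query("amount > 6"))

# 2. Groupby: reset_index() turns the group key back into a column.
totals = orders.groupby("customer")["amount"].sum().reset_index()

# 3. Column-based merge: the result already gets a fresh 0..n-1 index.
merged = pd.merge(orders, customers, on="customer")

# 4. Slicing: tail() keeps the original labels, so reset if they matter.
last_two = with_clean_index(orders.tail(2))
```

Routing every row-changing result through one small helper keeps the reset in a single place, so a new tool is less likely to forget it.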

## Related

- File: `mcp-servers/data-platform/mcp_server/pandas_tools.py`
- Fix commit: `4ed3ed7`

---
**Tags:** pandas, data-platform, mcp-server, python, dataframe