Context
When building MCP tools that manipulate pandas DataFrames, operations that subset or filter rows can leave a non-default index on the result, which later surfaces as an unexpected extra column.
Issue: #203 - filter tool adds unexpected __index_level_0__ column
Problem
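The filter tool used df.query(condition) to filter rows. query() keeps each surviving row's original index label, so the result no longer has the default 0..n-1 index. When the filtered result was later serialized or stored, that preserved index was written out as an extra column named __index_level_0__, the name pyarrow gives an unnamed index (for example in Arrow or Parquet output).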
# Before (problematic)
filtered = df.query(condition)
# Result keeps the original row labels; that index becomes a column on storage
Symptom: Users reported filtered DataFrames having 5 columns when the source had 4.
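A minimal reproduction of the pandas side of the problem, assuming a small made-up DataFrame (the column names and the condition are illustrative, not taken from the tool or the issue). It only shows that the filtered frame keeps the source's row labels; the storage step that turns them into a column is not reproduced here.

```python
import pandas as pd

# Hypothetical sample data with 4 columns; not from the issue report.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "region": ["north", "south", "north", "east"],
        "units": [10, 25, 40, 5],
        "price": [2.5, 1.0, 3.0, 4.5],
    }
)

filtered = df.query("units > 20")

# The surviving rows keep their original labels instead of being renumbered.
print(filtered.index.tolist())  # [1, 2]

# Still 4 columns in memory; the fifth only appears once the index is
# written out alongside the data.
print(len(filtered.columns))  # 4
```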
Solution
Always reset the index after filtering operations that subset rows:
# After (correct)
filtered = df.query(condition).reset_index(drop=True)
The drop=True parameter discards the old index entirely rather than converting it to a column.
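To make the effect of drop=True concrete, here is a short sketch on made-up data (the frame and condition are assumptions for illustration). It compares reset_index(drop=True) with a plain reset_index(), which keeps the old labels as a column named index.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 80, 95]})
kept = df.query("score > 50")  # rows labelled 1 and 2 survive

dropped = kept.reset_index(drop=True)  # old labels discarded
converted = kept.reset_index()         # old labels kept as a column

print(dropped.index.tolist())      # [0, 1]
print(dropped.columns.tolist())    # ['name', 'score']
print(converted.columns.tolist())  # ['index', 'name', 'score']
```

Without drop=True the fix would trade one unexpected column for another, so the flag matters here.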
Prevention
When implementing pandas operations in MCP tools:
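- Filter/query operations - Always add .reset_index(drop=True)
- Groupby operations - Use .reset_index() so the group keys come back as columns (already correct in our impl)
- Merge/join operations - pd.merge() on columns returns a fresh default index automatically; merges on the index (left_index/right_index, DataFrame.join) pass the index through, so check those
- Slicing operations (head, tail) - Consider if index reset is needed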
Rule of thumb: If an operation changes which rows are in the DataFrame, reset the index.
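The checklist above can be spot-checked with a small sketch. The helper has_default_index and the sample frames are hypothetical, introduced only for this illustration; they are not part of pandas_tools.py.

```python
import pandas as pd


def has_default_index(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the row labels are exactly 0..n-1."""
    return frame.index.tolist() == list(range(len(frame)))


# Made-up data for illustration.
sales = pd.DataFrame(
    {"region": ["north", "south", "north", "east"], "units": [10, 25, 40, 5]}
)
managers = pd.DataFrame(
    {"region": ["north", "south", "east"], "manager": ["Ana", "Bo", "Cy"]}
)

# Filter: keeps labels 0, 1, 3, so a reset is needed.
filtered = sales.query("units < 30")
print(has_default_index(filtered))                         # False
print(has_default_index(filtered.reset_index(drop=True)))  # True

# Groupby: reset_index() (no drop) turns the group keys back into a column.
totals = sales.groupby("region")["units"].sum().reset_index()
print(totals.columns.tolist(), has_default_index(totals))

# Merge on a column: the result gets a fresh default index.
merged = pd.merge(filtered, managers, on="region")
print(has_default_index(merged))                           # True

# Slicing: tail() keeps the original labels (2 and 3 here).
last_two = sales.tail(2)
print(has_default_index(last_two))                         # False
```

A check like this could run as an assertion in a tool's tests before the result is stored, assuming the default 0..n-1 index is what the storage layer expects.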
Related
- File: mcp-servers/data-platform/mcp_server/pandas_tools.py
- Fix commit: 4ed3ed7
Tags: pandas, data-platform, mcp-server, python, dataframe