lessons/patterns/reset-pandas-index-after-filtering-to-prevent-column-pollution
Leo Miranda edited this page 2026-01-26 22:28:13 +00:00

Context

When building MCP tools that manipulate pandas DataFrames, operations that subset or filter rows can leave a non-default index behind. That index can later surface as an unexpected extra column when the result is stored or serialized.

Issue: #203 - filter tool adds unexpected __index_level_0__ column

Problem

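The filter tool used df.query(condition) to filter rows. query() keeps the original index labels of the surviving rows, so the result no longer has a default 0..n-1 index. When the filtered result was later serialized or stored, that leftover index was written out as an extra column named __index_level_0__, the placeholder name that Arrow/Parquet conversion gives to an unnamed index.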

# Before (problematic)
filtered = df.query(condition)
# Result has original index preserved, becomes column on storage

Symptom: Users reported filtered DataFrames having 5 columns when the source had 4.
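
The preserved index can be seen before any storage step happens. The following is a minimal sketch with hypothetical sample data (the column names and the condition are illustrative, not taken from the filter tool). It only shows the leftover labels; the extra column itself appears at serialization time, which this sketch does not reproduce.

```python
import pandas as pd

# Hypothetical 4-column source frame, for illustration only.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "region": ["north", "south", "north", "east"],
        "units": [10, 25, 7, 40],
        "price": [2.5, 1.0, 3.0, 0.5],
    }
)

filtered = df.query("units > 9")

# The surviving rows keep their original labels, so the index is no
# longer a plain 0..n-1 range even though the column count is unchanged.
original_labels = filtered.index.tolist()
is_range_index = isinstance(filtered.index, pd.RangeIndex)

print(original_labels)   # labels of the rows that passed the filter
print(is_range_index)    # whether the index is still a RangeIndex
print(filtered.shape)    # rows, columns
```

At this point the frame still has 4 columns. The fifth only shows up once a storage layer decides the non-default index is worth keeping.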

Solution

Always reset the index after filtering operations that subset rows:

# After (correct)
filtered = df.query(condition).reset_index(drop=True)

The drop=True parameter discards the old index entirely rather than converting it to a column.
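
To make the effect of drop=True concrete, here is a small comparison of the two forms of reset_index() on a filtered frame. The data is hypothetical and only meant to illustrate the difference.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 5, 9]})
filtered = df.query("score > 3")

# Default: the old labels are kept as a new column called "index".
kept = filtered.reset_index()

# drop=True: the old labels are discarded and a fresh 0..n-1 index is built.
dropped = filtered.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
print(dropped.index.tolist())
```

Calling reset_index() without drop=True would trade one unwanted column for another, which is why the fix spells the parameter out.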

Prevention

When implementing pandas operations in MCP tools:

  1. Filter/query operations - Always add .reset_index(drop=True)
  2. Groupby operations - Use .reset_index() (already correct in our impl)
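  3. Merge/join operations - pd.merge() on columns returns a fresh 0..n-1 index; merges on the index (left_index/right_index) and DataFrame.join() keep index labels, so reset those too
  4. Slicing operations (head, tail) - These keep the original labels (tail() usually will not start at 0); reset the index if the result is stored or serialized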
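
The four cases in the list can be sketched side by side. The frames and names below (sales, managers) are hypothetical and are not taken from pandas_tools.py; the sketch only shows what the index looks like after each kind of operation.

```python
import pandas as pd

# Hypothetical frames, for illustration only.
sales = pd.DataFrame(
    {"region": ["north", "south", "north", "east"], "units": [10, 25, 7, 40]}
)
managers = pd.DataFrame(
    {"region": ["north", "south", "east"], "manager": ["Ana", "Bo", "Cy"]}
)

# 1. Filter: original labels survive, so discard them.
filtered = sales.query("units > 9").reset_index(drop=True)

# 2. Groupby: the group keys become the index; reset_index() without
#    drop turns them back into a regular "region" column.
totals = sales.groupby("region")["units"].sum().reset_index()

# 3. Column-on-column merge: the input indexes are ignored and the
#    result gets a fresh 0..n-1 index.
merged = pd.merge(sales, managers, on="region")

# 4. tail(): keeps the original labels of the last rows.
last_two = sales.tail(2)
last_two_reset = last_two.reset_index(drop=True)

print(filtered.index.tolist())
print(totals.columns.tolist())
print(merged.index.tolist())
print(last_two.index.tolist(), last_two_reset.index.tolist())
```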

Rule of thumb: If an operation changes which rows are in the DataFrame, reset the index.
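
One way to enforce the rule of thumb is a small check that a tool can run before returning or storing a result. The helper below, has_default_index, is a hypothetical name and is not part of pandas_tools.py; it is a sketch that assumes "default" means an unnamed index whose values are exactly 0..n-1 in order.

```python
import pandas as pd


def has_default_index(df: pd.DataFrame) -> bool:
    """Return True if the index is unnamed and its values are 0..n-1 in order.

    Hypothetical helper for illustration; not part of the actual tool code.
    """
    return df.index.name is None and df.index.equals(pd.RangeIndex(len(df)))


# Hypothetical usage.
df = pd.DataFrame({"x": [1, 2, 3]})
subset = df.query("x > 1")

print(has_default_index(df))
print(has_default_index(subset))
print(has_default_index(subset.reset_index(drop=True)))
```

A check like this could back an assertion in tests, so that a new tool that forgets the reset fails early instead of producing an extra column downstream.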

  • File: mcp-servers/data-platform/mcp_server/pandas_tools.py
  • Fix commit: 4ed3ed7

Tags: pandas, data-platform, mcp-server, python, dataframe