Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Add fillna at the beginning of _where not to fill NA. #60729 #60772

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -676,6 +676,7 @@ Interval

Indexing
^^^^^^^^
- Bug in :func:`Series.mask` unexpectedly filling pd.NA (:issue:`60729`)
- Bug in :meth:`DataFrame.__getitem__` returning modified columns when called with ``slice`` in Python 3.12 (:issue:`57500`)
- Bug in :meth:`DataFrame.from_records` throwing a ``ValueError`` when passed an empty list in ``index`` (:issue:`58594`)
- Bug in :meth:`MultiIndex.insert` when a new value inserted to a datetime-like level gets cast to ``NaT`` and fails indexing (:issue:`60388`)
Expand Down
7 changes: 7 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -9674,6 +9674,13 @@ def _where(
if axis is not None:
axis = self._get_axis_number(axis)

# We should not be filling NA. See GH#60729
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this trying to fill missing values when NaN is the missing value indicator? I don't think that is right either - the missing values should propogate for all types. We may just be missing coverage for the NaN case (which should be added to the test)

Copy link
Author

@sanggon6107 sanggon6107 Jan 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, @WillAyd .
I thought we could make the values propagate by filling cond with True, since _where() would finally keep the values in self alive where its cond is True.
Even if I don't fill those values here, _where would call fillna() using inplace at the below code. That's also why the result varies depending on whether inpalce=True or not.

pandas/pandas/core/generic.py

Lines 9695 to 9698 in e3b2de8

# make sure we are boolean
fill_value = bool(inplace)
cond = cond.fillna(fill_value)
cond = cond.infer_objects()

Could you explain in more detail what you mean by propagate for all type? Do you mean we need to keep NA as it is even after this line?

if isinstance(cond, np.ndarray):
cond = np.array(cond)
cond[np.isnan(cond)] = True
elif isinstance(cond, NDFrame):
cond = cond.fillna(True)

# align the cond to same shape as myself
cond = common.apply_if_callable(cond, self)
if isinstance(cond, NDFrame):
Expand Down
14 changes: 13 additions & 1 deletion pandas/tests/series/indexing/test_mask.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
import numpy as np
import pytest

from pandas import Series
from pandas import (
Int64Dtype,
Series,
)
import pandas._testing as tm


Expand Down Expand Up @@ -67,3 +70,12 @@ def test_mask_inplace():
rs = s.copy()
rs.mask(cond, -s, inplace=True)
tm.assert_series_equal(rs, s.mask(cond, -s))


def test_mask_na():
# We should not be filling pd.NA. See GH#60729
series = Series([None, 1, 2, None, 3, 4, None], dtype=Int64Dtype())
result = series.mask(series <= 2, -99)
expected = Series([None, -99, -99, None, 3, 4, None], dtype=Int64Dtype())

tm.assert_series_equal(result, expected)