Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#59745 broke groupby().value_counts with all nan column #59989

Closed
phofl opened this issue Oct 7, 2024 · 2 comments · Fixed by #59999
Closed

#59745 broke groupby().value_counts with all nan column #59989

phofl opened this issue Oct 7, 2024 · 2 comments · Fixed by #59999
Assignees
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Groupby Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@phofl
Copy link
Member

phofl commented Oct 7, 2024

          This broke the following case:
df = pd.DataFrame({"a": [1, 2, 1, 2], "b": np.nan})
df.groupby("a").value_counts()

with

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2024.1/scratches/scratch_1.py", line 100, in <module>
    df.groupby("a").value_counts()
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/generic.py", line 2723, in value_counts
    return self._value_counts(subset, normalize, sort, ascending, dropna)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 2535, in _value_counts
    result_series = cast(Series, gb.size())
                                 ^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/groupby.py", line 2747, in size
    result = self._grouper.size()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 696, in size
    ids = self.ids
          ^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 750, in ids
    return self.result_index_and_ids[1]
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 769, in result_index_and_ids
    result_index, ids = self._ob_index_and_ids(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/patrick/mambaforge/envs/dask-expr/lib/python3.12/site-packages/pandas/core/groupby/ops.py", line 884, in _ob_index_and_ids
    ob_ids = np.where(ob_ids == -1, -1, index.take(ob_ids))
                                        ^^^^^^^^^^^^^^^^^^
IndexError: cannot do a non-empty take from an empty axes.

Originally posted by @phofl in #59745 (comment)

@phofl phofl added this to the 3.0 milestone Oct 7, 2024
@phofl phofl added Regression Functionality that used to work in a prior pandas version Groupby labels Oct 7, 2024
@phofl
Copy link
Member Author

phofl commented Oct 7, 2024

cc @rhshadrach

@rhshadrach
Copy link
Member

Thanks - will take this up.

@rhshadrach rhshadrach self-assigned this Oct 7, 2024
@rhshadrach rhshadrach added Bug Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Groupby Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants