BUG: ValueError in pandas.DataFrame.replace with regex on single-row DataFrame with None/NaN #60688

Open
3 tasks done
martinandrovich opened this issue Jan 9, 2025 · 3 comments · May be fixed by #60691
Labels: Bug, Missing-data, replace


@martinandrovich

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({"ticker": ["#1234#"], "name": [None]})

df.replace({col: {r"^#": "$"} for col in df.columns}, regex=True)  # raises
df.fillna("").replace({col: {r"^#": "$"} for col in df.columns}, regex=True)  # works
df.astype(str).replace({col: {r"^#": "$"} for col in df.columns}, regex=True)  # works
df.astype(pd.StringDtype()).replace({col: {r"^#": "$"} for col in df.columns}, regex=True)  # works

Issue Description

Using replace with a regex pattern on a single-row DataFrame containing None values raises the following error:

ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

Expected Behavior

The replace function should handle None values gracefully without requiring a manual fill or type conversion.

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.11.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.27766
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Denmark.1252

pandas : 2.2.3
numpy : 2.0.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.30.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : 1.1
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 18.1.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : 2.0.36
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

@martinandrovich added the Bug and Needs Triage labels on Jan 9, 2025
@rhshadrach
Member

Thanks for the report! Confirmed on main. It looks like pandas.core.array_algos.replace.compare_or_regex_search does not properly handle the case of all NA values. Further investigations and PRs to fix are welcome!
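Until a fix lands, one possible workaround that follows from this diagnosis is to leave all-NA columns out of the replacement mapping. The sketch below assumes the error is triggered only by columns whose values are all NA; `replace_regex_skipping_all_na` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def replace_regex_skipping_all_na(df, pattern_map):
    """Apply a regex replacement only to columns with at least one non-NA value.

    Hypothetical helper for illustration; it is not a pandas API.
    """
    # Columns that are entirely NA have nothing to replace, so skip them.
    target_columns = [col for col in df.columns if df[col].notna().any()]
    if not target_columns:
        return df.copy()
    return df.replace({col: pattern_map for col in target_columns}, regex=True)


df = pd.DataFrame({"ticker": ["#1234#"], "name": [None]})
result = replace_regex_skipping_all_na(df, {r"^#": "$"})
```

The all-NA `name` column is passed through untouched, so the regex code path never sees an array with no non-NA elements.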

@rhshadrach added the Missing-data and replace labels and removed the Needs Triage label on Jan 10, 2025
@snitish
Contributor

snitish commented Jan 10, 2025

take

@snitish
Contributor

snitish commented Jan 10, 2025

Upon further investigation, I found that this is indeed a bug in pandas.core.array_algos.replace.compare_or_regex_search, as @rhshadrach mentioned earlier. It uses np.vectorize to apply the comparison to the non-NA elements of the array, which in this case form an empty array. As the error message indicates, np.vectorize accepts size-0 input only if its otypes argument is set. Created PR #60691 to fix this.
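The NumPy behavior described here can be reproduced without pandas. This is a minimal sketch, not the actual pandas code: `matches` is a hypothetical stand-in for the regex comparison, and the mask simply mimics an all-NA column.

```python
import re

import numpy as np

pattern = re.compile(r"^#")


def matches(value):
    # Hypothetical stand-in for the per-element regex check.
    return isinstance(value, str) and bool(pattern.search(value))


values = np.array([None], dtype=object)
mask = np.array([False])   # every element is NA, so nothing is selected
non_na = values[mask]      # empty object array, shape (0,)

# Without otypes, np.vectorize cannot infer an output dtype from zero elements.
try:
    np.vectorize(matches)(non_na)
    error = None
except ValueError as exc:
    error = exc

# With otypes set, the same call returns an empty boolean array.
result = np.vectorize(matches, otypes=[bool])(non_na)
```

Setting `otypes` tells `np.vectorize` the output dtype up front, so it does not need to call the function on a first element to infer it.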


3 participants