BUG: pd.options.display.float_format did not follow left side or before decimal places format #59876

Closed
2 of 3 tasks
yasirroni opened this issue Sep 23, 2024 · 21 comments · Fixed by #59964
Labels
Bug good first issue Output-Formatting __repr__ of pandas objects, to_string

Comments

@yasirroni

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Set the global float format
pd.options.display.float_format = '{:6.3f}'.format

# Example DataFrame
df = pd.DataFrame({
    'A': [123.456, 789.1011],
    'B': [2.71828, 3.14159]
})

df

Issue Description

pd.options.display.float_format does not honor the width part of the format specification, i.e. the padding to the left of the decimal point.

Expected Behavior

The output should also follow the width part of the format, i.e. the padding to the left of the decimal point.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.10.11.final.0
python-bits : 64
OS : Darwin
OS-release : 23.6.0
Version : Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:21 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T8103
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.1.0
pip : 24.0
Cython : None
pytest : 8.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

@yasirroni yasirroni added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 23, 2024
@rhshadrach
Member

Thanks for the report. Can you include the output you currently get and the output you expect to get?

@rhshadrach rhshadrach added Output-Formatting __repr__ of pandas objects, to_string Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 24, 2024
@yasirroni
Author

Current output:

image

Expected output:

All columns should have the same width, based on the space allocated by the format.

image

Example code to generate my output:

import pandas as pd

# Set the global float format
fmt = '{:6.3f}'.format
pd.options.display.float_format = fmt

df = pd.DataFrame({
    'A': [123.456, 789.1011],
    'B': [2.71828, 3.14159]
})

for index, row in df.iterrows():
    formatted_row = [fmt(value) for value in row]  # Format each value in the row
    print(f"Col {index}: {formatted_row}")

@rhshadrach
Member

Thanks for the information. It appears to me you are using something akin to Jupyter's display, which is a different method than printing.

pd.options.display.float_format = '{:12.3f}'.format

# Example DataFrame
df = pd.DataFrame({
    'A': [123.456, 789.1011],
    'B': [2.71828, 3.14159]
})

print(df)
#              A            B
# 0      123.456        2.718
# 1      789.101        3.142

You can see the option is having the expected impact on printed DataFrames. It's not clear to me whether this is due to a limitation on the Jupyter (or other notebooks) side. Further investigations are welcome!
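As a side note on why the original '{:6.3f}' format shows so little effect even when printed: in Python's format mini-language the width is a minimum field width, so a value whose formatted text is already wider than 6 characters gets no padding at all. A minimal sketch using only plain string formatting (the names narrow and wide are illustrative, not part of pandas):

```python
# The width in a format spec is a minimum: wider values are never truncated.
narrow = "{:6.3f}".format
wide = "{:12.3f}".format

# 123.456 already needs 7 characters, so a width of 6 adds no padding.
print(repr(narrow(123.456)), len(narrow(123.456)))

# 2.718 needs only 5 characters, so it is left-padded to 6.
print(repr(narrow(2.71828)), len(narrow(2.71828)))

# With a width of 12, every value in the example is padded to 12 characters.
print(repr(wide(123.456)), len(wide(123.456)))
print(repr(wide(2.71828)), len(wide(2.71828)))
```

This is why a wider format such as '{:12.3f}' makes the padding visible in printed output.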

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action and removed Needs Info Clarification about behavior needed to assess issue labels Sep 25, 2024
@yasirroni
Author

Thank you. I'm using a VSCode Jupyter Notebook and I can confirm that print works as expected but display does not. JupyterLab behaves the same way.

import pandas as pd

from IPython.display import display

pd.options.display.float_format = '{:.3f}'.format

# Example DataFrame
df = pd.DataFrame({
    'A': [123.456, 789.1011],
    'B': [2.71828, 3.14159]
})
print(df)
display(df)

pd.options.display.float_format = '{:12.3f}'.format

# Example DataFrame
df = pd.DataFrame({
    'A': [123.456, 789.1011],
    'B': [2.71828, 3.14159]
})
print(df)
display(df)
image

@yasirroni
Author

So, I think we should close this and pass it to the Jupyter developers? Please let me know where the best place to raise this is (pandas or Jupyter).

@yasirroni
Author

yasirroni commented Sep 25, 2024

After some investigation, even string formatting is not respected by display.

pd.options.display.float_format = '{:.3f}'.format  # applied only to float values
df_formatted = df.map(lambda x: f'{x:12.3f}').astype('string')  # convert to strings so float_format is bypassed
display(df_formatted)  # the padding is still lost here
print(df_formatted)  # correctly shows the 12-wide padded strings

A workaround is to set the cell width directly through the Styler:

styled_df = df_formatted.style.set_table_styles(
    [{'selector': 'td', 'props': [('min-width', '80px')]}]
)

display(styled_df)

@rhshadrach rhshadrach reopened this Sep 25, 2024
@rhshadrach
Member

Thanks for the investigation - I think your investigation suggests this is an issue with HTML formatting. We still control the HTML that is produced by display(df), so I suspect we may be able to fix it. Even if that is the case, perhaps we should consider having some formatting options only for printed DataFrames.

Leaving this open for now. I plan to investigate it in the near future.
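To illustrate why this points at HTML rather than at the formatter itself: ordinary HTML escaping leaves spaces untouched, and by default HTML rendering collapses runs of whitespace. The sketch below uses only the standard library; collapse_like_browser is a hypothetical, rough approximation of that rendering behavior, not a pandas or browser API.

```python
import html
import re

# A value padded to width 12 by the float format.
padded = "{:12.3f}".format(2.71828)

# html.escape rewrites &, < and > (plus quotes by default); spaces pass through.
escaped = html.escape(padded)


def collapse_like_browser(text: str) -> str:
    """Rough stand-in (an assumption) for default HTML whitespace handling:
    runs of whitespace collapse to one space and cell edges are trimmed."""
    return re.sub(r"\s+", " ", text).strip()


rendered = collapse_like_browser(escaped)

# Non-breaking space entities contain no whitespace, so they survive.
with_nbsp = padded.replace(" ", "&nbsp;")

print(repr(escaped))
print(repr(rendered))
print(repr(collapse_like_browser(with_nbsp)))
```

Under these assumptions the padding is present in the generated markup but disappears at render time unless the spaces are emitted as non-breaking space entities.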

@rhshadrach
Member

Two things need to change in order to implement this. The first is passing get_option("display.float_format") to DataFrameFormatter in frame.DataFrame._repr_html_. The second is adding " ": "&nbsp;" to esc in io.formats.html.HTMLFormatter._write_cell.

The second change also fixes other issues with multiple spaces in strings, e.g.

df = pd.DataFrame({"A": ["foo      foo", "bar"]})
display(df)

fixed:

image

main:

image

I think neither of these is controversial, so I'm marking this as a good first issue for now. But cc @pandas-dev/pandas-core for any thoughts.

@rhshadrach rhshadrach added good first issue and removed Needs Discussion Requires discussion from core team before further action labels Sep 29, 2024
@saldanhad
Contributor

Are you expecting a pytest script to cover this change?

@rhshadrach
Member

Yes - I think something along the lines of test_info_repr_html would be sufficient.

@saldanhad
Contributor

take

@saldanhad
Contributor

For the second change, is it OK if the HTML output is <td>&nbsp;foo&nbsp;&nbsp;&nbsp;&nbsp;foo</td>, or does it have to be of the form <td>foo&nbsp;&nbsp;&nbsp;&nbsp;foo</td>?

@rhshadrach
Member

@saldanhad - offhand I'm not sure; it may be that another section of code is adding &nbsp; prior to foo outside of where I indicated in #59876 (comment). If that's occurring, I think it should remain.

@saldanhad
Contributor

Thanks for clarifying. The leading &nbsp; was not there before and only appears after adding the space entry to esc, so I am assuming it is not expected.

@rhshadrach
Member

Interesting - I'll take a deeper look.

@saldanhad
Contributor

Meanwhile, I was able to implement the float formatting, and the unit tests pass. If you don't mind, can I open a PR for review to complete this part?

@rhshadrach
Member

Sure, that sounds fine.

@rhshadrach
Member

rhshadrach commented Oct 1, 2024

The issue with the extra space is the use of strip here:

rs = pprint_thing(s, escape_chars=esc).strip()

By replacing spaces with &nbsp; inside the escaping step, that strip no longer does anything, because the padding is no longer whitespace by the time strip runs. This goes back to #4987.

I think the solution is to leave esc alone and use rs = rs.replace(" ", "&nbsp;") after the call to pprint_thing.
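The ordering problem can be reproduced without pandas. In the sketch below, escape_with is a hypothetical stand-in for the escaping done inside pprint_thing (it is not the real function), and the leading space in cell is assumed to stand in for alignment padding added earlier in the formatting pipeline.

```python
def escape_with(text: str, escape_chars: dict) -> str:
    """Hypothetical stand-in for an escaping step: apply each replacement in turn."""
    for char, replacement in escape_chars.items():
        text = text.replace(char, replacement)
    return text


# One leading space (assumed padding) and a run of four spaces inside the value.
cell = " foo" + " " * 4 + "foo"

# Variant 1: the space is part of the escape table, so strip() runs too late.
esc_with_space = {"&": "&amp;", "<": "&lt;", ">": "&gt;", " ": "&nbsp;"}
too_late = escape_with(cell, esc_with_space).strip()

# Variant 2: escape, strip, and only then turn the remaining spaces into entities.
esc = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
strip_first = escape_with(cell, esc).strip().replace(" ", "&nbsp;")

print(too_late)
print(strip_first)
```

Variant 1 keeps the unwanted leading &nbsp; reported earlier in the thread, while variant 2 removes the padding before any spaces are converted.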

@saldanhad
Contributor

Thanks for your help on this. Implementing it this way gives the desired outcome. I had previously placed the replace inside the if conditional, before the pprint_thing call, and it was not taking effect.

However, a few tests are still failing and need further investigation:

FAILED pandas/tests/io/formats/test_to_html.py::test_to_html_escaped[kwargs0-<type 'str'>-escaped] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_to_html_escaped[kwargs1-<b>bold</b>-escape_disabled] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[10-to_html-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[10-_repr_html_-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[20-to_html-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[20-_repr_html_-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[50-to_html-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[50-_repr_html_-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[100-to_html-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_ignore_display_max_colwidth[100-_repr_html_-<lambda>] - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_to_html_na_rep_non_scalar_data - AssertionError
FAILED pandas/tests/io/formats/test_to_html.py::test_to_html_tuple_col_with_colspace - AssertionError

@rhshadrach
Member

Perhaps it's better to use rs.replace("  ", "&nbsp;&nbsp;"), where the first argument is two spaces. In the case there is 1 space, this would leave it alone. In the case of an odd number of spaces greater than 1, it would give the slightly odd result of something like foo&nbsp;&nbsp;&nbsp;&nbsp; foo. However, this still produces correct HTML, with the added benefit of leaving single spaces alone. This should reduce (eliminate?) the number of test failures you're seeing.
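A small sketch of how this pairwise replacement behaves, assuming the pattern being replaced is two consecutive spaces (consistent with single spaces being left alone). The helper name protect_space_runs is illustrative only, and the two-space pattern is built with " " * 2 so it stays unambiguous.

```python
DOUBLE_SPACE = " " * 2  # exactly two spaces


def protect_space_runs(text: str) -> str:
    """Replace each non-overlapping pair of spaces with two &nbsp; entities."""
    return text.replace(DOUBLE_SPACE, "&nbsp;&nbsp;")


# Show the result for runs of one to five spaces between two words.
for n in range(1, 6):
    print(n, protect_space_runs("foo" + " " * n + "foo"))
```

A single space is left untouched, an even run is converted entirely, and an odd run keeps one trailing ordinary space after the entities, which matches the behavior described above.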

@saldanhad
Contributor

Thanks, this worked. I'm raising a PR to close this issue.


3 participants