Escape multiple spaces in HTML #19665

Closed
2 tasks done
rhshadrach opened this issue Nov 6, 2024 · 1 comment · Fixed by #19783
Labels
bug: Something isn't working
good first issue: Good for newcomers
needs triage: Awaiting prioritization by a maintainer
python: Related to Python Polars

Comments

@rhshadrach

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

print(pl.DataFrame({"a": ["1", "1 1", "1   1", "1     1"]}))

# In e.g. Jupyter notebooks
display(pl.DataFrame({"a": ["1", "1 1", "1   1", "1     1"]}))

Log output

shape: (4, 1)
┌─────────┐
│ a       │
│ ---     │
│ str     │
╞═════════╡
│ 1       │
│ 1 1     │
│ 1   1   │
│ 1     1 │
└─────────┘

Issue description

Screenshot from Jupyter notebook:

[image: screenshot of the DataFrame output in a Jupyter notebook]

When a string contains multiple spaces in a row, HTML rendering collapses them into a single space. This does not correctly reflect the underlying data; such spaces should be escaped with &nbsp; instead. This only needs to be done when there are two or more consecutive spaces.
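A minimal sketch of the escaping described above, for illustration only. The helper name `escape_consecutive_spaces` is hypothetical and is not part of Polars; it only shows one way to replace runs of two or more spaces while leaving single spaces alone.

```python
import re


def escape_consecutive_spaces(text: str) -> str:
    """Replace every space in a run of two or more spaces with &nbsp;.

    Hypothetical helper for illustration; not a Polars API.
    Single spaces are left untouched.
    """
    return re.sub(r" {2,}", lambda m: "&nbsp;" * len(m.group()), text)


for value in ["1", "1 1", "1   1", "1     1"]:
    print(repr(escape_consecutive_spaces(value)))
```

With the values from the reproducible example, "1" and "1 1" come back unchanged, while "1   1" becomes "1&nbsp;&nbsp;&nbsp;1".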

Somewhat related: #18102
pandas equivalent: pandas-dev/pandas#59876 (comment)

Expected behavior

HTML should reflect that there are multiple spaces when adjacent.

Installed versions

--------Version info---------
Polars:              1.12.0
Index type:          UInt32
Platform:            Linux-6.8.0-48-generic-x86_64-with-glibc2.39
Python:              3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  1.2.0
altair               <not installed>
cloudpickle          3.1.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
great_tables         0.13.0
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             3.1.5
pandas               3.0.0.dev0+1632.geacf0326ef
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           2.0.36
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0
@rhshadrach rhshadrach added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Nov 6, 2024
@coastalwhite coastalwhite added the good first issue Good for newcomers label Nov 6, 2024
@nimit

nimit commented Nov 11, 2024

Hi @coastalwhite,
I am interested in contributing to Polars by solving this issue.
As far as I know, I cannot edit the make_str_val function in polars-core/src/fmt.rs to replace " " with "&nbsp;", because that would change every string representation of the DataFrame, even when it isn't displayed in a notebook environment or rendered as HTML.
So, should I edit the write_body method of the HTMLFormatter class in "/py-polars/polars/dataframe/_html.py" instead?
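
To illustrate the concern above, here is a rough sketch of keeping the escaping on the HTML path only. The functions `render_cell_html` and `render_cell_text` are hypothetical stand-ins, not the actual HTMLFormatter code; the sketch assumes cell values are HTML-escaped first, so the inserted &nbsp; entities are not themselves re-escaped.

```python
import html
import re


def render_cell_html(value: str) -> str:
    """Hypothetical HTML-path renderer (not the real Polars code)."""
    # Escape HTML special characters first, so the '&' in the
    # entities added below is not escaped a second time.
    escaped = html.escape(value)
    # Then protect runs of two or more spaces from being collapsed.
    return re.sub(r" {2,}", lambda m: "&nbsp;" * len(m.group()), escaped)


def render_cell_text(value: str) -> str:
    """Hypothetical plain-text path: the value is left as-is."""
    return value


value = "a  &  b"
print(render_cell_text(value))
print(render_cell_html(value))
```

Here the plain-text output stays "a  &  b", while the HTML output becomes "a&nbsp;&nbsp;&amp;&nbsp;&nbsp;b".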
