BUG: DataFrame argument columns fails to type-check when passed a list of strings (pandas 2.2.0) #56995

Closed
2 of 3 tasks
eachimei opened this issue Jan 21, 2024 · 8 comments
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Typing type annotations, mypy/pyright type checking

@eachimei

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# content of example.py:
from pandas import DataFrame

DataFrame([[1,2,3],[4,5,6]], columns=["A", "B", "C"])  # fails type check

data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
DataFrame.from_dict(data, orient='index', columns=['A', 'B', 'C', 'D'])  # also fails type check


# then run `pyright` type checker:

$ pyright example.py
example.py
  example.py:3:38 - error: Argument of type "list[str]" cannot be assigned to parameter "columns" of type "Axes | None" in function "__init__"
    Type "list[str]" cannot be assigned to type "Axes | None"
      "list[str]" is incompatible with "ExtensionArray"
      "list[str]" is incompatible with "ndarray[Unknown, Unknown]"
      "list[str]" is incompatible with "Index"
      "list[str]" is incompatible with "Series"
      "list[str]" is incompatible with protocol "SequenceNotStr[Unknown]"
        "index" is an incompatible type
          Type "(__value: str, __start: SupportsIndex = 0, __stop: SupportsIndex = sys.maxsize, /) -> int" cannot be assigned to type "(value: Any, /, start: int = 0, stop: int = ...) -> int"
    ... (reportGeneralTypeIssues)
  c:\temp\example.py:6:51 - error: Argument of type "list[str]" cannot be assigned to parameter "columns" of type "Axes | None" in function "from_dict"
    Type "list[str]" cannot be assigned to type "Axes | None"
      "list[str]" is incompatible with "ExtensionArray"
      "list[str]" is incompatible with "ndarray[Unknown, Unknown]"
      "list[str]" is incompatible with "Index"
      "list[str]" is incompatible with "Series"
      "list[str]" is incompatible with protocol "SequenceNotStr[Unknown]"
        "index" is an incompatible type
          Type "(__value: str, __start: SupportsIndex = 0, __stop: SupportsIndex = sys.maxsize, /) -> int" cannot be assigned to type "(value: Any, /, start: int = 0, stop: int = ...) -> int"
    ... (reportGeneralTypeIssues)
2 errors, 0 warnings, 0 informations

Issue Description

Although the documentation shows that the `columns` argument accepts a list of strings, passing one fails pyright's type check against the declared annotation. This issue showed up in pandas 2.2.0.

Examples from the docs: [two screenshots of documentation examples, not reproduced here]

Expected Behavior

Passing a list of strings to the `columns` argument should pass the type check.

Installed Versions

INSTALLED VERSIONS

commit : f538741
python : 3.10.11.final.0
python-bits : 64
OS : Windows
OS-release : 10
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 2.2.0
numpy : 1.26.1
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

pyright version: 1.1.347

@eachimei eachimei added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 21, 2024
@eachimei
Author

It seems to boil down to the `columns: Axes | None` type annotation, where `Axes` is defined in `pandas._typing` as:

Axes = ListLike

And:
ListLike = Union[AnyArrayLike, SequenceNotStr, range]

A list of strings should have matched `SequenceNotStr`, which `pandas._typing` defines as:

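# imports added so the excerpt runs standalone
from typing import Any, Iterator, Protocol, Sequence, SupportsIndex, TypeVar, overload

_T_co = TypeVar("_T_co", covariant=True)

class SequenceNotStr(Protocol[_T_co]):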
    @overload
    def __getitem__(self, index: SupportsIndex, /) -> _T_co:
        ...

    @overload
    def __getitem__(self, index: slice, /) -> Sequence[_T_co]:
        ...

    def __contains__(self, value: object, /) -> bool:
        ...

    def __len__(self) -> int:
        ...

    def __iter__(self) -> Iterator[_T_co]:
        ...

    def index(self, value: Any, /, start: int = 0, stop: int = ...) -> int:
        ...

    def count(self, value: Any, /) -> int:
        ...

    def __reversed__(self) -> Iterator[_T_co]:
        ...
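As a quick sanity check (a sketch; the member names below are copied by hand from the protocol above, and `PROTOCOL_MEMBERS` is a hypothetical name), the built-in `list` defines every member that `SequenceNotStr` declares. So pyright's complaint has to be about a member's signature, not a missing method:

```python
# Members declared by the SequenceNotStr protocol quoted above
# (copied by hand; this tuple is an assumption of this sketch).
PROTOCOL_MEMBERS = (
    "__getitem__",
    "__contains__",
    "__len__",
    "__iter__",
    "index",
    "count",
    "__reversed__",
)

# Collect any protocol member that the built-in list does not define.
missing = [name for name in PROTOCOL_MEMBERS if not hasattr(list, name)]
print(missing)  # prints []
```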

Minimizing the failing case further:

# content of example2.py:
from pandas._typing import SequenceNotStr

l: SequenceNotStr = ["A", "B"]

Then:

$ pyright example2.py
example2.py
  example2.py:3:21 - error: Expression of type "list[str]" cannot be assigned to declared type "SequenceNotStr[Unknown]"
    "list[str]" is incompatible with protocol "SequenceNotStr[Unknown]"
      "index" is an incompatible type
        Type "(__value: str, __start: SupportsIndex = 0, __stop: SupportsIndex = sys.maxsize, /) -> int" cannot be assigned to type "(value: Any, /, start: int = 0, stop: int = ...) -> int"
          Position-only parameter mismatch; parameter "start" is not position-only
          Position-only parameter mismatch; parameter "stop" is not position-only (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations

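As shown, the built-in `list.index` is not compatible with the signature declared by `SequenceNotStr.index`.
Note the conflict regarding positional-only parameters: typeshed declares all parameters of `list.index` as positional-only, while `SequenceNotStr.index` allows `start` and `stop` to be passed by keyword.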
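The positional-only conflict can also be observed at runtime. This minimal sketch uses only the standard library: `inspect.signature` reports every parameter of `list.index` as positional-only, and calling it with `start=` as a keyword (a call the `SequenceNotStr.index` signature would permit) raises `TypeError`. The variable names are illustrative only.

```python
import inspect

# Runtime signature of list.index; CPython marks every parameter
# positional-only, which is what typeshed's stub mirrors.
sig = inspect.signature(list.index)
kinds = {name: param.kind for name, param in sig.parameters.items()}
print(kinds)

# The protocol allows `start` by keyword, e.g. seq.index("B", start=1).
# A real list rejects that call, which is the mismatch pyright reports.
letters = ["A", "B", "C"]
try:
    letters.index("B", start=1)
    keyword_call_ok = True
except TypeError:
    keyword_call_ok = False

print(keyword_call_ok)  # prints False
print(letters.index("B", 1))  # positional call works: prints 1
```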
References from typeshed:
list.index:
https://github.com/python/typeshed/blob/main/stdlib/builtins.pyi#L983
Sequence.index:
https://github.com/python/typeshed/blob/main/stdlib/typing.pyi#L524

@twoertwein
Member

I think this is already fixed on main:

def index(self, value: Any, start: int = ..., stop: int = ..., /) -> int:
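A minimal sketch of the protocol with the `index` signature from main, assuming every other member is unchanged from the 2.2.0 definition quoted earlier. `SequenceNotStrSketch` and `position_of` are hypothetical names used only for illustration. With `start` and `stop` positional-only, a type checker should accept `list[str]` for this protocol; the runtime call below works either way.

```python
from typing import Any, Iterator, Protocol, Sequence, SupportsIndex, TypeVar, overload

_T_co = TypeVar("_T_co", covariant=True)


class SequenceNotStrSketch(Protocol[_T_co]):
    """Hypothetical copy of the protocol with the index signature fixed on main."""

    @overload
    def __getitem__(self, index: SupportsIndex, /) -> _T_co: ...
    @overload
    def __getitem__(self, index: slice, /) -> Sequence[_T_co]: ...
    def __contains__(self, value: object, /) -> bool: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> Iterator[_T_co]: ...
    # All parameters positional-only, matching typeshed's list.index.
    def index(self, value: Any, start: int = ..., stop: int = ..., /) -> int: ...
    def count(self, value: Any, /) -> int: ...
    def __reversed__(self) -> Iterator[_T_co]: ...


def position_of(columns: SequenceNotStrSketch[str], label: str) -> int:
    # Only positional arguments are used, so any conforming sequence works.
    return columns.index(label, 0)


print(position_of(["A", "B", "C"], "C"))  # prints 2
```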

@twoertwein twoertwein added Typing type annotations, mypy/pyright type checking and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 21, 2024
@twoertwein
Member

If you don't want to wait until the next pandas release, you can either:

  • install pandas-stubs, or
  • install an older version of pyright, which bundles an older version of typeshed that was compatible with the protocol definition.

@twoertwein twoertwein added the Closing Candidate May be closeable, needs more eyeballs label Jan 21, 2024
@eachimei
Author

eachimei commented Jan 21, 2024

Thanks for the prompt response! Sounds good (tested with pandas-stubs and no error was raised).

@jscheel

jscheel commented Sep 11, 2024

This still seems to be an issue with 2.2.2.

@wyattscarpenter

wyattscarpenter commented Sep 19, 2024

I'm also getting this type error with 2.2.2, 2.2.1, and 2.2.0. The version before that, 2.1.4, does not raise this error. (I am also using the latest pyright, pyright 1.1.381, for all of these.)

@smith-garrett

I'm getting this issue with pandas 2.2.3 and pyright 1.1.383.

@twoertwein
Member

This issue is fixed on main but did not make it into the 2.2.* releases. You will have to wait for 2.3/3.0 or install pandas-stubs.
