ENH: generic save and from_file methods on DataFrame #60786

Open
1 of 3 tasks
zkurtz opened this issue Jan 25, 2025 · 0 comments
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments


zkurtz commented Jan 25, 2025

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, pandas has separate IO methods for each file format (to_csv, read_parquet, etc.). This requires users to:

  • Remember multiple method names
  • Change code when switching formats

Feature Description

A unified save/read API would simplify common IO operations while maintaining explicit control when needed:

  • The file type is inferred from the filepath extension, but a file_type argument can be passed to be explicit. An error is raised in some cases where the inferred file type disagrees with the passed file type.
  • Both methods accept **kwargs and pass them through to the underlying format-specific pandas IO methods.
  • Optionally, translate between differing argument names in the existing IO methods (e.g. "usecols" in read_csv vs. "columns" in read_parquet).

```python
# Simplest happy path:
df.save('data.csv')  # Uses to_csv
df = pd.read('data.parquet')  # Uses read_parquet

# Optionally, be explicit about expected file type
df.save('data.csv', file_type="csv")  # Uses to_csv
df = pd.read('data.parquet', file_type="parquet")  # Uses read_parquet

# Raises ValueError for conflicting file_type info:
df.save('data.csv', file_type='parquet')  # Conflicting types
df.save('data.txt', file_type='csv')  # .txt implies text format

# Reading allows overrides for misnamed files (or should we require users to rename their files properly first?)
df = pd.read('mislabeled.txt', file_type='parquet')

# Not sure if we should allow save when inferred file type is not a standard type:
df.save('data', file_type='csv')  # No extension, needs type
df.save('mydata.unknown', file_type='csv')  # Unclear extension
```
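
To make the inference and conflict rules above concrete, here is a minimal sketch of how the file type could be resolved. Everything in it is an assumption for illustration only: `resolve_file_type`, `EXTENSION_TO_TYPE`, and the `allow_override` flag are hypothetical names, not existing pandas API. It also picks one answer to the open question above by allowing an explicit file_type when the extension is missing or unknown.

```python
from pathlib import Path

# Hypothetical extension table, not part of pandas. ".txt" gets its own
# "text" type so that it conflicts with "csv", mirroring the example above.
EXTENSION_TO_TYPE = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".feather": "feather",
    ".txt": "text",
}


def resolve_file_type(path, file_type=None, allow_override=False):
    """Return the file type to use for `path`, or raise ValueError.

    `allow_override=True` models the proposed read behaviour, where an
    explicit file_type wins over a misleading extension.
    """
    inferred = EXTENSION_TO_TYPE.get(Path(path).suffix.lower())

    if file_type is None:
        if inferred is None:
            raise ValueError(
                f"cannot infer a file type from {str(path)!r}; pass file_type"
            )
        return inferred

    # Assumption: a missing or unknown extension never conflicts with an
    # explicit file_type.
    if inferred is not None and inferred != file_type and not allow_override:
        raise ValueError(
            f"extension of {str(path)!r} implies {inferred!r}, "
            f"but file_type={file_type!r} was passed"
        )
    return file_type
```

Under these assumptions, `resolve_file_type('data.csv', file_type='parquet')` and `resolve_file_type('data.txt', file_type='csv')` both raise ValueError, while `resolve_file_type('mislabeled.txt', file_type='parquet', allow_override=True)` returns `'parquet'`.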

Alternative Solutions

Existing functionality is OK, just not the simplest to use.
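
For comparison, something close to the proposed behaviour can be approximated today with a thin user-side wrapper over the existing methods. The sketch below is only one possible shape: `save`, `read`, `translate_kwargs`, and `READ_KWARG_ALIASES` are hypothetical helper names, and the alias table contains only the usecols/columns pair mentioned above. It relies on the existing naming pattern (`DataFrame.to_csv` / `pd.read_csv`, `DataFrame.to_parquet` / `pd.read_parquet`), so extensions that do not match a method name, such as `.xlsx`, are rejected.

```python
from pathlib import Path

# Hypothetical rename table, keyed by file type. Only the usecols/columns
# pair comes from the proposal; anything else would need to be added.
READ_KWARG_ALIASES = {
    "parquet": {"usecols": "columns"},
    "csv": {"columns": "usecols"},
}


def translate_kwargs(file_type, kwargs):
    """Rename keyword arguments to the spelling the target reader expects."""
    aliases = READ_KWARG_ALIASES.get(file_type, {})
    translated = {}
    for name, value in kwargs.items():
        target = aliases.get(name, name)
        if target in translated:
            raise TypeError(f"got multiple values for argument {target!r}")
        translated[target] = value
    return translated


def save(df, path, **kwargs):
    """Write `df` using the to_<extension> method matching `path`."""
    file_type = Path(path).suffix.lstrip(".").lower()
    writer = getattr(df, f"to_{file_type}", None)
    if writer is None:
        raise ValueError(f"no DataFrame writer for extension {file_type!r}")
    return writer(path, **kwargs)


def read(path, **kwargs):
    """Read `path` using the pandas read_<extension> function."""
    import pandas as pd  # imported lazily; only needed when reading

    file_type = Path(path).suffix.lstrip(".").lower()
    reader = getattr(pd, f"read_{file_type}", None)
    if reader is None:
        raise ValueError(f"no pandas reader for extension {file_type!r}")
    return reader(path, **translate_kwargs(file_type, kwargs))
```

With this in place, `save(df, 'data.csv', index=False)` forwards to `df.to_csv`, and `read('data.parquet', usecols=['a'])` forwards to `pd.read_parquet(..., columns=['a'])`. A wrapper like this has to be copied into every project, which is the friction the proposal aims to remove.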

Additional Context

No response

zkurtz added the Enhancement and Needs Triage labels on Jan 25, 2025