ENH: Add totality validation to merge method #58547
Comments
To add a common use case: here the goal is to add the biological class to the favorite animal of certain people:
import pandas as pd
# Create the first DataFrame with person names and favorite animals
df1_data = {
'Person': ['John', 'Emma', 'Alex','Darleen'],
'Animal': ['Dog', 'Spider', 'Snake','Cat']
}
df1 = pd.DataFrame(df1_data)
# Create the second DataFrame with mapping of animals to biological class
df2_data = {
'Animal': ['Dog', 'Snake', 'Cat'],
'Biological_Class': ['Mammal', 'Reptile', 'Mammal']
}
df2 = pd.DataFrame(df2_data)
# Merge the DataFrames on the 'Animal' column
merged_df = pd.merge(
df1,
df2,
on='Animal',
validate='m:1'
)
The |
To transfer some of the arguments from the (for now suspended) MR:
|
Thanks for the request and PR. I agree this is a natural thing to validate, and if the implementation that pandas could provide were significantly more efficient than what a user could do separately (e.g. using various things computed in the course of doing the merge), then I would be more supportive of its inclusion in pandas. However, the implementation in the current PR is no more efficient than what can be accomplished by the current public API. As such, it would add more maintenance burden without providing anything a user could not already do (albeit with more keystrokes). For that reason, I'm currently opposed to its inclusion.
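For readers wondering what the check looks like with the current public API, here is a minimal sketch. It reuses the animal example from the comment above and relies on the existing `indicator=True` option of `merge`. The variable names (`people`, `classes`, `checked`, `unmatched`, `left_total`) are made up for illustration and this is not the implementation from the PR.

```python
import pandas as pd

people = pd.DataFrame({
    "Person": ["John", "Emma", "Alex", "Darleen"],
    "Animal": ["Dog", "Spider", "Snake", "Cat"],
})
classes = pd.DataFrame({
    "Animal": ["Dog", "Snake", "Cat"],
    "Biological_Class": ["Mammal", "Reptile", "Mammal"],
})

# A left join keeps every row of `people`. indicator=True adds a `_merge`
# column that is "left_only" for rows without a partner in `classes`.
checked = people.merge(
    classes, on="Animal", how="left", validate="m:1", indicator=True
)

# Left totality holds only if no row is marked "left_only".
unmatched = checked.loc[checked["_merge"] == "left_only", "Animal"].tolist()
left_total = not unmatched
```

Because the check reads the `_merge` column rather than looking for `NaN` in the merged columns, it should not be confused by `NaN` values that were already present in the inputs.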
@rhshadrach I do not see how this extension is more trivial than the current validations. The 1:m check is basically just a check for |
I agree and would not support their inclusion today. In addition, if they were prohibitive to further enhancements to core functionality, I would support their deprecation and removal. As it is, I think keeping them so as to not disrupt their current users makes sense.
Thank you for your feedback. I guess this can be closed then.
Feature Type
Problem Description
The available validation methods lack checks for (left-/right-)totality. I frequently encounter cases where I need to manually check that, e.g., a one-to-one merge also finds a match in the right DF for every row in the left DF, or vice versa.
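A small sketch of the gap described here, using made-up frames (`left`, `right`) and column names that are purely illustrative: the existing `one_to_one` validation only checks key uniqueness, so it passes even though a left row finds no match.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "left_value": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2], "right_value": ["x", "y"]})

# The keys are unique on both sides, so the one-to-one validation passes.
merged = left.merge(right, on="key", validate="one_to_one")

# The default inner join nevertheless drops the left row with key 3,
# and nothing is raised to signal it.
dropped_keys = sorted(set(left["key"]) - set(merged["key"]))
```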
Feature Description
Add the following to one_to_one, one_to_many and many_to_one merge validations:
left_total ... Each row in the left DataFrame is matched to (at least) one row in the right DataFrame
right_total ... Each row in the right DataFrame is matched to (at least) one row in the left DataFrame
total ... Both left_total and right_total must hold
A combination of join relation and totality constraint should be possible by combining with a +: one_to_one+left_total
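To make the proposed `+` syntax concrete, here is a rough user-level sketch. pandas does not accept such strings in `validate`; the helper `merge_with_totality` and the set `TOTALITY` are hypothetical names invented for this illustration, and the totality check is done with an extra outer merge on the key columns, which is an assumption about how one might emulate the proposal rather than how pandas would implement it.

```python
import pandas as pd

# Hypothetical totality keywords taken from the proposal above.
TOTALITY = {"left_total", "right_total", "total"}


def merge_with_totality(left, right, on, validate, **kwargs):
    """Hypothetical wrapper: split e.g. 'one_to_one+left_total' on '+',
    check totality on the key columns, then delegate to DataFrame.merge."""
    parts = validate.split("+")
    relation = [p for p in parts if p not in TOTALITY]
    totality = [p for p in parts if p in TOTALITY]
    if len(relation) > 1 or len(totality) > 1:
        raise ValueError(f"cannot interpret validate={validate!r}")

    keys = [on] if isinstance(on, str) else list(on)

    # Outer-merge the distinct keys only; `_merge` then tells us which
    # side each key combination came from.
    probe = left[keys].drop_duplicates().merge(
        right[keys].drop_duplicates(), on=keys, how="outer", indicator=True
    )
    wanted = totality[0] if totality else None
    if wanted in ("left_total", "total") and (probe["_merge"] == "left_only").any():
        raise pd.errors.MergeError("left keys without a match in the right frame")
    if wanted in ("right_total", "total") and (probe["_merge"] == "right_only").any():
        raise pd.errors.MergeError("right keys without a match in the left frame")

    # The relation part (if any) is handled by the existing validation.
    return left.merge(
        right, on=keys, validate=relation[0] if relation else None, **kwargs
    )


left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [1, 2], "b": [10, 20]})

# Every right key has a partner on the left, so this succeeds.
ok = merge_with_totality(left, right, on="key", validate="one_to_one+right_total")

# Left key 3 has no partner on the right, so this raises.
try:
    merge_with_totality(left, right, on="key", validate="one_to_one+left_total")
    error_message = None
except pd.errors.MergeError as exc:
    error_message = str(exc)
```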
Alternative Solutions
Currently, doing an outer join and checking for NaN values in the "foreign" columns works to find unmerged rows. However, this will fail if there are already NaN values in the initial DataFrames.
Additional Context
No response