Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add residuals_ attribute to pg.anova output #260

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

lahdjirayhan
Copy link

@codecov
Copy link

codecov bot commented Apr 9, 2022

Codecov Report

Merging #260 (50a6bbe) into master (b1c334d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #260   +/-   ##
=======================================
  Coverage   98.99%   99.00%           
=======================================
  Files          19       19           
  Lines        3290     3318   +28     
  Branches      527      528    +1     
=======================================
+ Hits         3257     3285   +28     
  Misses         17       17           
  Partials       16       16           
Impacted Files Coverage Δ
pingouin/parametric.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b1c334d...50a6bbe. Read the comment docs.

@raphaelvallat raphaelvallat self-requested a review April 12, 2022 21:02
@raphaelvallat raphaelvallat added the feature request 🚧 New feature or request label Apr 12, 2022
@raphaelvallat
Copy link
Owner

Hi @lahdjirayhan,

This is great — thank you so much for the clean PR!

Two minor requests before I merge:

  1. Could you please add some unit tests in the test_parametric.py file. Specifically, we want to make sure that the obtained residuals are correct, that is, similar to what we get with another statistical program (e.g. statsmodels in Python, JAMOVI, JASP, etc).

  2. If you want, feel free to edit the changelog.rst file to update the release notes in the documentation.

Thanks again for your help,
Raphael

@raphaelvallat
Copy link
Owner

This looks great, thanks! I saw that you added the residuals for the anova, mixed_anova, welch_anova and ancova function. Would it be possible to add the residuals to the rm_anova function as well, such that all the ANOVA-related functions are covered?

Also, thanks for adding the unit testing! Can you please make sure to have at least 3 decimals in the expected (R) output?

Raphael

@lahdjirayhan
Copy link
Author

lahdjirayhan commented Apr 15, 2022

Some of the tests currently compare up to 2 decimal places only because of the dataset used. I'll use different dataset so it will produce 3 or more decimal places.

Regarding rm_anova, unfortunately I think I can't add residuals for it. I'm not confident enough even after researching for more.

@lahdjirayhan
Copy link
Author

Never mind, I think I found a way to add residuals to rm_anova. Expect more commits to come.

@lahdjirayhan
Copy link
Author

Unfortunately the rm_anova2 dataset (which is used for testing rm_anova2) produces residuals up to one decimal place. (The first five entry doesn't even have decimal places).

What do you recommend me to do? @raphaelvallat

Copy link
Owner

@raphaelvallat raphaelvallat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lahdjirayhan I have added a few more requests. Thanks again!

docs/changelog.rst Outdated Show resolved Hide resolved
@@ -532,7 +532,8 @@ def rm_anova(data=None, dv=None, within=None, subject=None, correction='auto',

# Calculate sums of squares
ss_with = ((grp_with.mean() - grandmean)**2 * grp_with.count()).sum()
ss_resall = grp_with.apply(lambda x: (x - x.mean())**2).sum()
resid = grp_with.apply(lambda x: (x - x.mean()))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any documentation for this definition of the residuals? If so, can you add as a comment

@@ -938,9 +956,9 @@ def anova(data=None, dv=None, between=None, ss_type=2, detailed=False,
ssbetween = ((grp.mean() - data[dv].mean())**2 * grp.count()).sum()
# Within effect (= error between)
# = (grp.var(ddof=0) * grp.count()).sum()
sserror = grp.apply(lambda x: (x - x.mean())**2).sum()
error = grp.apply(lambda x: x - x.mean())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, do you have any documentation / reference implementation for this method of calculating the residuals?

return _postprocess_dataframe(aov)
aov = _postprocess_dataframe(aov)

aov.residuals_ = 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could use pandas.DataFrame.attrs, e.g:

df.attrs = dict(residuals=resid)

although I have never tried it so I don't know if it works well

dv='Cholesterol', between=['Risk']
).residuals_
array_equal(
resid[0:5].round(3),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that you are always testing only the first 5 values of the residuals. We need to test all values, i.e. array_equal on all the residuals

@raphaelvallat
Copy link
Owner

Ping @lahdjirayhan. I'd love to release a new version of Pingouin in July. Do you think you'll have time to answer my comments before then? Thank you!

@AKJama
Copy link

AKJama commented Nov 24, 2024

Are we still waiting for an update on this for residuals?

@raphaelvallat
Copy link
Owner

I think there is some work required on this PR before we can merge it (see my comments).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request 🚧 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants