Add residuals_ attribute to pg.anova output #260

lahdjirayhan · 2022-04-09T09:14:05Z

codecov · 2022-04-09T09:16:23Z

Codecov Report

Merging #260 (50a6bbe) into master (b1c334d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #260   +/-   ##
=======================================
  Coverage   98.99%   99.00%           
=======================================
  Files          19       19           
  Lines        3290     3318   +28     
  Branches      527      528    +1     
=======================================
+ Hits         3257     3285   +28     
  Misses         17       17           
  Partials       16       16

Impacted Files	Coverage Δ
pingouin/parametric.py	`100.00% <100.00%> (ø)`

raphaelvallat · 2022-04-12T21:09:50Z

Hi @lahdjirayhan,

This is great — thank you so much for the clean PR!

Two minor requests before I merge:

1. Could you please add some unit tests in the test_parametric.py file? Specifically, we want to make sure that the obtained residuals are correct, that is, that they match what we get with another statistical program (e.g. statsmodels in Python, JAMOVI, JASP, etc).
2. If you want, feel free to edit the changelog.rst file to update the release notes in the documentation.

Thanks again for your help,
Raphael

raphaelvallat · 2022-04-14T23:15:34Z

This looks great, thanks! I saw that you added the residuals for the anova, mixed_anova, welch_anova and ancova functions. Would it be possible to add the residuals to the rm_anova function as well, so that all the ANOVA-related functions are covered?

Also, thanks for adding the unit testing! Can you please make sure to have at least 3 decimals in the expected (R) output?

Raphael

lahdjirayhan · 2022-04-15T14:07:56Z

Some of the tests currently compare only up to 2 decimal places because of the dataset used. I'll use a different dataset so that the residuals have 3 or more decimal places.

Regarding rm_anova, unfortunately I don't think I can add residuals for it. I'm not confident enough, even after researching it further.

lahdjirayhan · 2022-04-15T18:22:49Z

Never mind, I think I found a way to add residuals to rm_anova. Expect more commits to come.

lahdjirayhan · 2022-04-15T22:54:40Z

Unfortunately, the rm_anova2 dataset (which is used for testing rm_anova2) produces residuals with only one decimal place. (The first five entries don't even have decimal places.)

What do you recommend I do? @raphaelvallat

raphaelvallat

@lahdjirayhan I have added a few more requests. Thanks again!

docs/changelog.rst

raphaelvallat · 2022-05-02T22:34:51Z

pingouin/parametric.py

@@ -532,7 +532,8 @@ def rm_anova(data=None, dv=None, within=None, subject=None, correction='auto',

    # Calculate sums of squares
    ss_with = ((grp_with.mean() - grandmean)**2 * grp_with.count()).sum()
-    ss_resall = grp_with.apply(lambda x: (x - x.mean())**2).sum()
+    resid = grp_with.apply(lambda x: (x - x.mean()))


Do you have any documentation for this definition of the residuals? If so, can you add it as a comment?
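For reference, here is a minimal sketch of how these within-level deviations relate to the usual error term of a one-way repeated-measures ANOVA. It assumes a balanced design with no missing values. The array `x` and every name except `ss_resall` are illustrative and not taken from the PR. Under the additive model, the residual of each observation also removes the subject effect, and its sum of squares equals `ss_resall` minus the between-subject sum of squares.

```python
import numpy as np

# Hypothetical balanced data: rows are subjects, columns are within-factor levels.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
n_levels = x.shape[1]
grand = x.mean()

# Deviations from each level mean, analogous to grp_with.apply(lambda x: x - x.mean())
dev_within = x - x.mean(axis=0, keepdims=True)
ss_resall = (dev_within ** 2).sum()

# Between-subject sum of squares
ss_subj = n_levels * ((x.mean(axis=1) - grand) ** 2).sum()

# Additive-model residuals: observation - level mean - subject mean + grand mean
resid = x - x.mean(axis=0, keepdims=True) - x.mean(axis=1, keepdims=True) + grand
ss_error = (resid ** 2).sum()

# In a balanced design the two routes to the error sum of squares agree.
assert np.isclose(ss_error, ss_resall - ss_subj)
```

Whether `residuals_` should expose the level-centered deviations or the additive-model residuals is a design decision for the PR; the sketch only shows how the two quantities are related.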

raphaelvallat · 2022-05-02T22:36:21Z

pingouin/parametric.py

@@ -938,9 +956,9 @@ def anova(data=None, dv=None, between=None, ss_type=2, detailed=False,
    ssbetween = ((grp.mean() - data[dv].mean())**2 * grp.count()).sum()
    # Within effect (= error between)
    #  = (grp.var(ddof=0) * grp.count()).sum()
-    sserror = grp.apply(lambda x: (x - x.mean())**2).sum()
+    error = grp.apply(lambda x: x - x.mean())


Same here, do you have any documentation / reference implementation for this method of calculating the residuals?
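One way to document this definition: in a one-way between-subjects design, subtracting the group mean from each observation gives the ordinary least-squares residuals of the cell-means model. The sketch below checks that with NumPy alone. The data and all variable names are made up for illustration and are not part of Pingouin.

```python
import numpy as np

# Hypothetical one-way design with unequal group sizes.
y = np.array([4.1, 5.0, 6.2, 7.9, 8.4, 9.1, 2.0, 3.5])
groups = np.array(["a", "a", "a", "b", "b", "b", "c", "c"])

# Cell-means design matrix: one indicator column per group level.
levels = np.unique(groups)
X = (groups[:, None] == levels[None, :]).astype(float)

# OLS fit; the fitted coefficients are the group means.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_ols = y - X @ beta

# Group-mean centering, analogous to grp.apply(lambda x: x - x.mean())
group_means = np.array([y[groups == g].mean() for g in groups])
resid_centered = y - group_means

# Both definitions give the same residuals and the same error sum of squares.
assert np.allclose(resid_ols, resid_centered)
sserror = (resid_centered ** 2).sum()
```

The same comparison could be run against a reference implementation such as statsmodels or R in the unit tests.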

raphaelvallat · 2022-05-02T22:42:26Z

pingouin/parametric.py

-    return _postprocess_dataframe(aov)
+    aov = _postprocess_dataframe(aov)
+
+    aov.residuals_ = 0


Alternatively, we could use pandas.DataFrame.attrs, e.g.:

df.attrs = dict(residuals=resid)

although I have never tried it, so I don't know if it works well.
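A minimal sketch of the `attrs` approach, assuming a small stand-in ANOVA table. The column values and the key name `"residuals"` are illustrative only. Note that the pandas documentation flags `DataFrame.attrs` as experimental, so its behavior may change between versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the table returned by an ANOVA function.
aov = pd.DataFrame({"Source": ["Group", "Within"], "SS": [10.0, 4.0], "DF": [1, 4]})

# Hypothetical residuals, one value per observation in the original data.
resid = np.array([-1.0, 0.0, 1.0, -2.0, 0.0, 2.0])

# Store the residuals as metadata instead of as a custom attribute.
aov.attrs["residuals"] = resid

# Retrieve them later from the same object.
recovered = aov.attrs["residuals"]
```

Updating a single key as above keeps any metadata already stored in `attrs`, whereas `df.attrs = dict(...)` replaces the whole dictionary.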

raphaelvallat · 2022-05-02T22:48:59Z

pingouin/tests/test_parametric.py

+            dv='Cholesterol', between=['Risk']
+        ).residuals_
+        array_equal(
+            resid[0:5].round(3),


I noticed that you are always testing only the first 5 values of the residuals. We need to test all values, i.e. run array_equal on the full array of residuals.
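A small sketch of why checking only the first five values is not enough, using made-up data in which the expected residuals are known exactly. All names are illustrative. `np.testing.assert_allclose` with an absolute tolerance is shown as one possible alternative to rounding before `array_equal`; it is a suggestion, not what the PR currently does.

```python
import numpy as np

# Hypothetical data: two groups with means 2 and 6.
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
groups = np.array(["a", "a", "a", "b", "b", "b"])
group_means = np.array([y[groups == g].mean() for g in groups])
resid = y - group_means

# Reference values, e.g. as they would be copied from another program.
expected = np.array([-1.0, 0.0, 1.0, -2.0, 0.0, 2.0])

# Compare every residual, to 3 decimal places.
np.testing.assert_allclose(resid, expected, atol=1e-3)

# A bug that only affects the last value slips past a first-five check.
broken = resid.copy()
broken[-1] += 0.5
first_five_ok = np.array_equal(broken[:5].round(3), expected[:5].round(3))
all_ok = np.array_equal(broken.round(3), expected.round(3))
```

Here `first_five_ok` is true while `all_ok` is false, so only the full comparison catches the error.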

raphaelvallat · 2022-06-07T17:55:18Z

Ping @lahdjirayhan. I'd love to release a new version of Pingouin in July. Do you think you'll have time to answer my comments before then? Thank you!

AKJama · 2024-11-24T18:04:48Z

Are we still waiting for an update on this for residuals?

raphaelvallat · 2024-12-08T12:56:41Z

I think there is some work required on this PR before we can merge it (see my comments).

lahdjirayhan added 2 commits April 9, 2022 07:54

Add residuals_ attribute to ANOVA outputs

fdda464

Update docstring for pg.anova to reflect residuals_

cd566bb

raphaelvallat self-requested a review April 12, 2022 21:02

raphaelvallat added the feature request 🚧 New feature or request label Apr 12, 2022

lahdjirayhan added 2 commits April 14, 2022 00:54

Add tests to compare anova residuals_ to results from R

e21366b

Update changelog for anova residuals_

4c38d70

Update test for anova residuals to compare >3 decimal places against R

dbe190f

lahdjirayhan added 3 commits April 16, 2022 03:11

Add tests for residuals_ of mixed_anova, rm_anova, and rm_anova2

dccc1a9

Add residuals_ to rm_anova

7089127

Modify tests to round residuals_ first before asserting array_equal

7b71c7f

raphaelvallat requested changes May 2, 2022

raphaelvallat added 5 commits June 20, 2022 10:23

Remove change to changelog

99f6e47

Merge branch 'master' into anova-resid

6c1b086

Black formatting

d9ab352

Revert changes to unit tests

1d5a064

Typo in unit tests

50a6bbe
