
Creating test_utils.py #113

Open · Jaydon2005 wants to merge 13 commits into dev
Conversation

Jaydon2005 (Contributor):

Provides a DataFrame with random values for logistic regression testing. Ensures that a scatter plot has been added by checking the number of collections in the axis. Makes sure that the axis limits and labels are set correctly.
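
For reference, a minimal sketch of how such a fixture and the plot assertions could look (the column names, sample size, and Outcome column are assumptions, not necessarily the PR's actual code):

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so tests run headless

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pytest


@pytest.fixture
def sample_data():
    """DataFrame with random values for logistic regression testing."""
    rng = np.random.default_rng(42)  # fixed seed keeps the tests reproducible
    return pd.DataFrame(
        {
            "Predictor1": rng.random(100),
            "Predictor2": rng.random(100),
            "Outcome": rng.integers(0, 2, size=100),  # assumed binary target
        }
    )


def test_scatter_and_axes(sample_data):
    """One scatter call adds one collection; limits and labels are retained."""
    fig, ax = plt.subplots()
    ax.scatter(sample_data["Predictor1"], sample_data["Predictor2"])
    ax.set_xlim(0, 1)
    ax.set_xlabel("Predictor1")

    assert len(ax.collections) == 1  # the scatter plot was added
    assert ax.get_xlim() == (0, 1)   # axis limits set correctly
    assert ax.get_xlabel() == "Predictor1"
    plt.close(fig)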
@Jaydon2005 Jaydon2005 requested a review from erexer August 20, 2024 18:59
@Jaydon2005 Jaydon2005 marked this pull request as ready for review August 20, 2024 18:59
@Jaydon2005 Jaydon2005 marked this pull request as draft November 5, 2024 18:54

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 95.52239% with 3 lines in your changes missing coverage. Please review.

Project coverage is 73.63%. Comparing base (bc0710a) to head (3e265a4).
Report is 106 commits behind head on dev.

Files with missing lines        Patch %   Lines
msdbook/tests/test_utils.py     95.38%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             dev     #113       +/-   ##
==========================================
+ Coverage   2.07%   73.63%   +71.55%     
==========================================
  Files          9       12        +3     
  Lines        529      880      +351     
==========================================
+ Hits          11      648      +637     
+ Misses       518      232      -286     

☔ View full report in Codecov by Sentry.

@Jaydon2005 Jaydon2005 marked this pull request as ready for review January 6, 2025 18:16
erexer (Collaborator) left a comment:

Double check that you have pre-commit running; my machine said that two files were reformatted by the black hook.


def test_fit_logit(sample_data):
    """Test the fit_logit function."""
    predictors = ['Predictor1', 'Predictor2']
Collaborator:

You should define this in sample_data().

msdbook/tests/test_utils.py (resolved)

    # Call the plot function
    contourset = plot_contour_map(
        ax, result, sample_data, contour_cmap, dot_cmap, levels, xgrid, ygrid, 'Predictor1', 'Predictor2', base=0
Collaborator:

You should avoid hardcoding values as much as possible.

Collaborator:

I should have been clearer here: for tests you do want hardcoded values as inputs and expected outputs, so you can make sure the function is doing the right thing. But in general, if you're using the same value over and over, put it into a variable and use that variable. For example, you use 'Predictor1' and 'Predictor2' a lot, so those should go into variables.
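
A trivial illustration of the suggestion (the constant names match what a later commit introduces; everything else is assumed):

# Before: the same string literals are repeated throughout the tests.
predictors = ["Predictor1", "Predictor2"]

# After: the names are defined once at module scope and reused everywhere,
# so a rename touches a single line.
PREDICTOR_1 = "Predictor1"
PREDICTOR_2 = "Predictor2"

predictors = [PREDICTOR_1, PREDICTOR_2]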

Added pytest and pytest-mock under dependencies since they are both required.
The sample data is now established within the sample_data() fixture. test_plot_contour_map no longer hardcodes values, opting instead for dynamic generation of xgrid, ygrid, and levels.
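
A sketch of what dynamic grid generation might look like, deriving xgrid, ygrid, and levels from the data's own ranges rather than fixed literals (the function name, resolution, and level count are assumptions):

import numpy as np
import pandas as pd


def make_grids(df: pd.DataFrame, xcol: str, ycol: str, resolution: int = 50, n_levels: int = 10):
    """Build plotting grids from the data instead of hardcoding them."""
    xgrid = np.linspace(df[xcol].min(), df[xcol].max(), resolution)
    ygrid = np.linspace(df[ycol].min(), df[ycol].max(), resolution)
    levels = np.linspace(0.0, 1.0, n_levels)  # contour levels for probabilities
    return xgrid, ygrid, levels

The returned arrays would then feed plot_contour_map exactly as in the excerpt above.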
pyproject.toml (outdated, resolved)


    # Check that parameters (coefficients) are not empty
    assert result.params is not None
    assert result.pvalues is not None
Collaborator:

Is it possible to check that the parameters and values are correct?

Jaydon2005 (Contributor, Author):

No. Currently the tests only confirm that the parameters and p-values are not empty or None; they do not ensure that the values are correct in a meaningful way, such as being logically reasonable or statistically valid.
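
For what it's worth, value-level checks become possible once the data come from a fixed seed. A sketch, assuming the underlying fit behaves like a plain statsmodels Logit (the actual fit_logit internals may differ): simulate from a known model and check that the fitted coefficients recover it.

import numpy as np
import statsmodels.api as sm


def test_logit_recovers_known_coefficients():
    """Simulate from logit(p) = 0.5 + 2.0 * x, then check the fit finds it."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=2000)
    p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
    y = rng.binomial(1, p)

    result = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

    # Loose tolerance: 2000 samples only estimate the true coefficients.
    np.testing.assert_allclose(result.params, [0.5, 2.0], atol=0.3)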

msdbook/tests/test_utils.py (resolved)
Jaydon2005 and others added 3 commits January 13, 2025 09:48
Put pytest and pytest-mock in the dev section.
The column names Predictor1, Predictor2, and Interaction are now set as variables at the start of the file: PREDICTOR_1, PREDICTOR_2, and INTERACTION. The test_empty_data function was made clearer: it first checks whether the DataFrame is empty before plotting or fitting the model, and the empty_df.empty check prevents plotting when there is no data. The test also expects a ValueError when attempting to fit the model on an empty dataset. np.all(np.isfinite(result.pvalues)) checks that all p-values are valid numbers, and assert np.any(result.pvalues < 0.05) checks that at least one coefficient has a p-value below 0.05. This approach makes more sense because p-values can vary in regression analysis.
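
A sketch tying these pieces together, with a stand-in fit_logit so the example is self-contained (the real function's signature and error message are assumptions):

import numpy as np
import pandas as pd
import pytest

# Column names defined once at module level, as in the commit.
PREDICTOR_1 = "Predictor1"
PREDICTOR_2 = "Predictor2"
INTERACTION = "Interaction"


def fit_logit(df, predictors):
    """Stand-in for the PR's fit_logit; only the empty-input guard is shown."""
    if df.empty:
        raise ValueError("cannot fit a logit model on an empty DataFrame")
    raise NotImplementedError


def test_empty_data():
    """An empty DataFrame is rejected before any plotting or fitting happens."""
    empty_df = pd.DataFrame(columns=[PREDICTOR_1, PREDICTOR_2, INTERACTION])
    assert empty_df.empty  # the guard that prevents plotting with no data

    with pytest.raises(ValueError):
        fit_logit(empty_df, [PREDICTOR_1, PREDICTOR_2])


def check_pvalues(result):
    """The commit's p-value checks, given any fitted statsmodels-style result."""
    assert np.all(np.isfinite(result.pvalues))  # all p-values are valid numbers
    assert np.any(result.pvalues < 0.05)        # at least one coefficient is significant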