Mobt 2066 convert forecast to climatological anomaly #2072

Open
wants to merge 11 commits into base: master

Conversation

maxwhitemet
Contributor

Addresses #2066

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

The structure of the new CalculateClimateAnomalies class follows that agreed with @brhooper.

  • Initialisation
  • Verification of inputs
    • Matching standard names
    • Matching units
    • Matching grids
    • Matching time coordinates
  • Calculation of anomalies: both unstandardised and standardised
  • Creation of output cube with appropriate metadata


codecov bot commented Dec 31, 2024

Codecov Report

Attention: Patch coverage is 93.02326% with 6 lines in your changes missing coverage. Please review.

Project coverage is 98.37%. Comparing base (84a8944) to head (c89dbd4).
Report is 62 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| improver/utilities/mathematical_operations.py | 93.02% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2072      +/-   ##
==========================================
- Coverage   98.39%   98.37%   -0.03%     
==========================================
  Files         124      134      +10     
  Lines       12212    13325    +1113     
==========================================
+ Hits        12016    13108    +1092     
- Misses        196      217      +21     

@maxwhitemet requested a review from brhooper, January 3, 2025 17:09
Contributor

@brhooper left a comment

Thanks for this @maxwhitemet, it's a good start to fulfilling the requirement to have a plugin which calculates anomaly data for use in the SAMOS work.

I've added some comments to the plugin and what I think are some spurious changes to the checksums. Once these have been resolved, I'll review the unit tests more thoroughly. I've not done this in detail yet as I think that some of the changes I've suggested to the plugin will necessitate changes to the tests anyway.

Comment on lines +442 to +446
bd3e4c956a902258be2dcbd3f323cae7b372b138236cc4c7c3cb335258cbb038 ./freezing-rain/one_hour/kgo.nc
85d998556342350f65b991e76ed10ec1c1b272bbe5548fed841dcb7cd1dd6510 ./freezing-rain/one_hour/rain_acc.nc
51602571159257f173e355acbc06a1d5ddcf8ea534602152e308e8fb895adcc7 ./freezing-rain/one_hour/sleet_acc.nc
4873e9a078206d2774207a23c987c2a1f1a514128e50bd185532f55db1b32471 ./freezing-rain/one_hour/temperature_min.nc
5c60fc373a7e7fd8b0fe7143eae5455e2e1bda741ca543e4bc787094294f68b0 ./freezing-rain/three_hour/kgo.nc
e488d0417ccddf388375c137e3527fa8c2f78e1abee52d6db54e5469900702ea ./freezing-rain/three_hour/kgo.nc
Contributor

I think that some changes from another branch have snuck in here - these changes shouldn't be here.

Contributor Author

Done.

Comment on lines 492 to 501
def _verify_time_coords_match(self, cubes_to_check: List[Cube]) -> None:
    diagnostic_cube_time_range = self._get_cube_time_range(self.diagnostic_cube)
    for cube in cubes_to_check:
        cube_time_range = self._get_cube_time_range(cube)
        if cube_time_range != diagnostic_cube_time_range:
            raise ValueError(
                f"The diagnostic cube and {cube} must have the same time points. "
                f"The supplied diagnostic cube has time points: {diagnostic_cube_time_range} "
                f"and the supplied {cube} has time points: {cube_time_range}"
            )
Contributor

I expect that we need to check the following for time coordinates:

  • The mean and variance cubes should have matching coordinates, as we want to check that they have come from the same climatology calculation
  • The time of the diagnostic cube should be within the time bounds of the mean and variance cubes (rather than needing to be equal). This is because the climatology is valid for the period over which it was calculated, rather than for a specific date.
  • We may also want to be able to ignore the result of a time check between the diagnostic cube and mean and variance cube bounds. This is for the case where we (for example) calculate a climatology over 30 days, then want to calculate the climatological anomaly of day 31.

Contributor Author

Added each bullet point. I solved point 3 by adding another parameter to the CalculateClimateAnomalies class, 'ignore_temporal_mismatch', which defaults to False and, when set to True, skips this check.
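
For reference, a minimal sketch of how those three time checks might fit together, assuming iris cubes with a bounded scalar time coordinate that share the same time units and calendar; the attribute layout is illustrative rather than the exact plugin code:

import numpy as np

def _verify_time_coords_match(self) -> None:
    """Sketch: the mean and variance cubes must share time bounds, and the
    diagnostic cube's validity time must fall within those bounds unless
    the check is explicitly ignored."""
    mean_bounds = self.mean_cube.coord("time").bounds
    if self.variance_cube is not None:
        if not np.array_equal(mean_bounds, self.variance_cube.coord("time").bounds):
            raise ValueError(
                "The mean and variance cubes must come from the same "
                "climatology period (matching time bounds)."
            )
    if self.ignore_temporal_mismatch:
        return
    diagnostic_time = self.diagnostic_cube.coord("time").points[0]
    lower, upper = mean_bounds[0]
    if not lower <= diagnostic_time <= upper:
        raise ValueError(
            "The diagnostic cube's validity time must fall within the "
            "time bounds of the mean (and variance) cubes."
        )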


def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None:
    """Check that all cubes have the same standard name to prevent accidental
    use of cubes with referring to different physical phenomenon
Contributor

Suggested change
use of cubes with referring to different physical phenomenon
use of cubes containing data for different diagnostics

Contributor Author

Done.

Comment on lines 447 to 451
def _get_cubes_to_check(self) -> List[Cube]:
    cubes_to_check = [self.mean_cube]
    if self.variance_cube:
        cubes_to_check.append(self.variance_cube)
    return cubes_to_check
Contributor

This is probably a case of personal preference, but I would be inclined to include this functionality within verify_inputs rather than separating it out as a new function. It only gets used once and doesn't contain many lines of code, so I don't think the extra function is necessary.
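
For illustration, a sketch of what that inlining might look like; the verify_inputs method and the individual check calls here are assumptions based on the structure outlined in the PR description:

def verify_inputs(self) -> None:
    """Sketch: build the list of climatology cubes inline rather than via a
    separate helper, then run the individual verification checks."""
    cubes_to_check = [self.mean_cube]
    if self.variance_cube:
        cubes_to_check.append(self.variance_cube)
    self._verify_standard_names_match(cubes_to_check)
    self._verify_units_match(cubes_to_check)
    self._verify_grids_match(cubes_to_check)
    self._verify_time_coords_match(cubes_to_check)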

Contributor Author

Added.

Comment on lines 477 to 490
def _verify_grids_match(self, cubes_to_check: List[Cube]) -> None:
    """Check that all cubes have the same spatial coordinates (i.e. the same grid)"""

    diagnostic_cube_grid = self._get_cube_grid(self.diagnostic_cube)
    for cube in cubes_to_check:
        cube_grid = self._get_cube_grid(cube)
        if not np.array_equal(
            cube_grid[0], diagnostic_cube_grid[0]
        ) or not np.array_equal(cube_grid[1], diagnostic_cube_grid[1]):
            raise ValueError(
                f"The diagnostic cube and {cube} must have the same grid. "
                f"The supplied diagnostic cube has grid: {diagnostic_cube_grid} "
                f"and the supplied {cube} has grid: {cube_grid}"
            )
Contributor

You could look at using the spatial_coord_match function in improver/utilities/cube_checker.py here, as we cannot guarantee that the spatial dimensions will always be in the same position on each cube.

We also need to consider how this will work with site data, where the cubes don't have x and y axes.

Contributor Author

Used the spatial_coord_match function specified, and used the conditional statement below to account for the site data concern. I used the sum of the points on the 'spot_index' coordinate to hopefully make the comparison faster, as the between-cube comparisons are then made using only a single value per cube rather than element-wise across each array of points.

if any('spot_index' in get_dim_coord_names(cube) for cube in [*self.cubes_to_check, self.diagnostic_cube]):
    if any(cube.coord('spot_index').points.sum() != self.diagnostic_cube.coord('spot_index').points.sum() for cube in self.cubes_to_check):
        raise ValueError("Spot index coordinates do not match.")

Contributor

I don't think that it is safe to use the sum function for this test. Consider 2 cubes with spot indices as follows:

  1. Cube 1 points = [1, 4]
  2. Cube 2 points = [2, 3]

I think that your test will pass given these 2 cubes, which is not what we want to happen.
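
One safer alternative (a sketch, not the code under review) is to compare the spot_index points arrays element-wise, for example with np.array_equal, so that index sets such as [1, 4] and [2, 3] are correctly reported as mismatched:

import numpy as np

# Sketch: element-wise comparison of spot_index points instead of a sum.
for cube in self.cubes_to_check:
    if not np.array_equal(
        cube.coord("spot_index").points,
        self.diagnostic_cube.coord("spot_index").points,
    ):
        raise ValueError("Spot index coordinates do not match.")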

Comment on lines 414 to 416
diagnostic_cube: Cube,
mean_cube: Cube,
variance_cube: Optional[Cube] = None,
Contributor

By convention in IMPROVER data inputs (like cubes) are passed in to the process method, rather than __init__. More generally, inputs are separated as follows:

  • __init__ settings/configuration for how the method should be applied, for example in Combine the choice of what operation to apply is passed in here.
  • process takes the data inputs, for example in Combine this is the cubes to be combined.

The reason for this is another 'it made sense at the time'; StaGE development began before IMPROVER and they used this convention in StaGE, so it was carried over into IMPROVER.
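
Applied to this plugin, the split might look roughly as follows; this is a sketch only, assuming the class derives from IMPROVER's BasePlugin and keeps the ignore_temporal_mismatch option discussed above:

from typing import Optional

from iris.cube import Cube

from improver import BasePlugin

class CalculateClimateAnomalies(BasePlugin):
    def __init__(self, ignore_temporal_mismatch: bool = False) -> None:
        # __init__ holds configuration for how the plugin behaves.
        self.ignore_temporal_mismatch = ignore_temporal_mismatch

    def process(
        self,
        diagnostic_cube: Cube,
        mean_cube: Cube,
        variance_cube: Optional[Cube] = None,
    ) -> Cube:
        # process receives the data inputs (the cubes) and returns the result.
        ...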

Contributor Author

Done. Noted for future.

Comment on lines 453 to 464
def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None:
    """Check that all cubes have the same standard name to prevent accidental
    use of cubes with referring to different physical phenomenon
    (e.g. temperature and precipitation).
    """
    for cube in cubes_to_check:
        if cube.standard_name != self.diagnostic_cube.standard_name:
            raise ValueError(
                f"The diagnostic cube and {cube} must have the same standard name. "
                f"The supplied diagnostic cube has standard name: {self.diagnostic_cube.standard_name} "
                f"and the supplied {cube} has standard name: {cube.standard_name}"
            )
Contributor

It's preferable not to print out whole cubes in error messages like this. In this instance, you could try:

Suggested change
def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None:
    """Check that all cubes have the same standard name to prevent accidental
    use of cubes with referring to different physical phenomenon
    (e.g. temperature and precipitation).
    """
    for cube in cubes_to_check:
        if cube.standard_name != self.diagnostic_cube.standard_name:
            raise ValueError(
                f"The diagnostic cube and {cube} must have the same standard name. "
                f"The supplied diagnostic cube has standard name: {self.diagnostic_cube.standard_name} "
                f"and the supplied {cube} has standard name: {cube.standard_name}"
            )

@staticmethod
def _verify_standard_names_match(cubes_to_check: List[Cube]) -> None:
    """Check that all cubes have the same standard name to prevent accidental
    use of cubes with referring to different physical phenomenon
    (e.g. temperature and precipitation).
    """
    names = [c.standard_name for c in cubes_to_check]
    if len(set(names)) != 1:
        raise ValueError(
            f"All input cubes must have the same standard_name. The following names were found: {names}."
        )

Assuming that cubes_to_check was updated to include all input cubes. This might be easier due to my comment on the __init__ method.

Comment on lines 466 to 475
def _verify_units_match(self, cubes_to_check: List[Cube]) -> None:
    """Check that all cubes have the same units. E.g. to prevent accidental
    use of cubes with rate data with cubes with accumulation data"""
    for cube in cubes_to_check:
        if cube.units != self.diagnostic_cube.units:
            raise ValueError(
                f"The diagnostic cube and {cube} must have the same units. "
                f"The supplied diagnostic cube has units: {self.diagnostic_cube.units} "
                f"and the supplied {cube} has units: {cube.units}"
            )
Contributor

See my comment on _verify_standard_names_match, you could also use set() here to check that all of the list items are the same.

Contributor Author

Done.
I noticed I made a mistake in my previous commit here: the variance cube will have units squared. I have therefore not used set() here, but added a further check that the variance cube's units are appropriate, if that cube is provided as an input.
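
A sketch of how that extra units check might look, assuming the cubes carry cf_units units (which support squaring via **); the helper name and layout are illustrative:

def _verify_units_match(self) -> None:
    """Sketch: the mean cube must share the diagnostic cube's units; the
    variance cube, if supplied, must have those units squared."""
    if self.mean_cube.units != self.diagnostic_cube.units:
        raise ValueError(
            "The diagnostic and mean cubes must have the same units."
        )
    if self.variance_cube is not None:
        if self.variance_cube.units != self.diagnostic_cube.units ** 2:
            raise ValueError(
                "The variance cube's units must be the square of the "
                "diagnostic cube's units."
            )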

Comment on lines 509 to 513
def _get_cube_time_range(cube: Cube) -> Tuple[float, float]:
    """Get the time range of the input cube"""
    start_time = cube.coord("time").points[0]
    end_time = cube.coord("time").points[-1]
    return start_time, end_time
Contributor

You could use the .bounds() method here, as I expect that the mean and variance cubes will have single point time coordinates with bounds representing the period over which they were calculated.

Contributor Author

I have replaced all instances of _get_cube_time_range() with the .coord('time').bounds() method.

Comment on lines 523 to 551
def _generate_output_name_and_units(self) -> str:
    """Gets suitable output name and units from input cube metadata"""
    if self.variance_cube:
        new_name = f"{self.diagnostic_cube.name()}_standardised_anomalies"
        new_units = None
    else:
        new_name = f"{self.diagnostic_cube.name()}_anomalies"
        new_units = self.diagnostic_cube.units
    return new_name, new_units

def _create_output_cube(self, data: Union[List[float], ndarray]) -> Cube:
    """
    Populates a template cube with data from the anomalies calculation.

    Args:
        data:
            Anomalies data

    Returns:
        Cube with data from anomalies calculation
    """

    name, units = self._generate_output_name_and_units()
    attributes = generate_mandatory_attributes([self.diagnostic_cube])
    output_cube = create_new_diagnostic_cube(
        name, units, self.diagnostic_cube, attributes, data=np.array(data)
    )

    return output_cube
Contributor

I've messaged @bjwheltor and @maxwhitemet to discuss what metadata is required to properly represent the anomaly data that we are producing.

Resolved prior requests from Ben Hooper following review. Only the implementation of further metadata adjustments now remains.
… conventions for anomalies.

Metadata added for the output cube:
- Standard name suffixed with "anomaly" if a mean cube is provided, or "standard anomaly" if a variance cube is also provided
- Units set to None if a variance cube is used, and copied from the provided diagnostic cube otherwise
- "Reference epoch" scalar coordinate with (1) the time bounds from the mean cube, and (2) the validity time from the diagnostic cube
- Cell method of "reference epoch: anomaly"
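
A sketch of how that metadata might be attached to the output cube; the coordinate name, units handling, and construction here are assumptions and the actual implementation may differ:

from iris.coords import AuxCoord, CellMethod

# Sketch: scalar "reference epoch" coordinate built from the mean cube's
# time bounds and the diagnostic cube's validity time, plus a matching
# cell method. Assumes both cubes share the same time units and calendar.
mean_time = mean_cube.coord("time")
reference_epoch = AuxCoord(
    diagnostic_cube.coord("time").points,
    bounds=mean_time.bounds,
    long_name="reference_epoch",
    units=mean_time.units,
)
output_cube.add_aux_coord(reference_epoch)
output_cube.add_cell_method(CellMethod("anomaly", coords="reference_epoch"))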