New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Mobt 2066 convert forecast to climatological anomaly #2072

Open

maxwhitemet wants to merge 11 commits into metoppv:master from maxwhitemet:mobt_2066_convert_forecast_to_climatological_anomaly

Contributor

maxwhitemet commented Dec 31, 2024

Addresses #2066

Testing:

Ran tests and they passed OK
Added new tests for the new feature(s)

The structure of the new CalculateClimateAnomalies class follows that agreed with @brhooper

Initialisation
Verification of inputs
- Matching standard names
- Matching units
- Matching grids
- Matching time coordinates
Calculation of anomalies: both unstandardised and standardised
Creation of output cube with appropriate metadata

maxwhitemet added 2 commits

December 19, 2024 10:50


          Recreated checksums

365ff15


          Added the requested functionality to convert a forecast to climatolog…

103604e

…ical anomaly, and appropriate tests.

**Main Script**
The structure of the new CalculateClimateAnomalies class follows that agreed with @brhooper.
- Initialisation
- Verification of inputs
	- Matching standard names
	- Matching units
	- Matching grids
	- Matching time coordinates
- Calculation of anomalies: both unstandardised and standardised
- Creation of output cube with appropriate metadata

codecov bot commented Dec 31, 2024 •

edited

Loading

Codecov Report

Attention: Patch coverage is 93.02326% with 6 lines in your changes missing coverage. Please review.

Project coverage is 98.37%. Comparing base (84a8944) to head (c89dbd4).
Report is 62 commits behind head on master.

Files with missing lines	Patch %	Lines
improver/utilities/mathematical_operations.py	93.02%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2072      +/-   ##
==========================================
- Coverage   98.39%   98.37%   -0.03%     
==========================================
  Files         124      134      +10     
  Lines       12212    13325    +1113     
==========================================
+ Hits        12016    13108    +1092     
- Misses        196      217      +21

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

maxwhitemet added 4 commits

December 31, 2024 12:27


          Removed whitespace from files.

a25b7bb


          Resolved test coverage issues Codecov highlighted.

d8e1f3d


          Resolved remaining test coverage issues.

8c39946


          Resolved formatting issues.

7743d79

maxwhitemet requested a review from brhooper

January 3, 2025 17:09

brhooper requested changes

View reviewed changes

Contributor

brhooper left a comment

Thanks for this @maxwhitemet, it's a good start to fulfilling the requirements t have a plugin which calculates anomaly data for use in the SAMOS work.

I've added some comments to the plugin and what I think are some spurious changes to the checksums. Once these have been resolved, I'll review the unit tests more thoroughly. I've not done this in detail yet as I think that some of the changes I've suggested to the plugin will necessitate changes to the tests anyway.

improver_tests/acceptance/SHA256SUMS

Comment on lines +442 to +446

    
              bd3e4c956a902258be2dcbd3f323cae7b372b138236cc4c7c3cb335258cbb038  ./freezing-rain/one_hour/kgo.nc

              85d998556342350f65b991e76ed10ec1c1b272bbe5548fed841dcb7cd1dd6510  ./freezing-rain/one_hour/rain_acc.nc

              51602571159257f173e355acbc06a1d5ddcf8ea534602152e308e8fb895adcc7  ./freezing-rain/one_hour/sleet_acc.nc

              4873e9a078206d2774207a23c987c2a1f1a514128e50bd185532f55db1b32471  ./freezing-rain/one_hour/temperature_min.nc

              5c60fc373a7e7fd8b0fe7143eae5455e2e1bda741ca543e4bc787094294f68b0  ./freezing-rain/three_hour/kgo.nc

              e488d0417ccddf388375c137e3527fa8c2f78e1abee52d6db54e5469900702ea  ./freezing-rain/three_hour/kgo.nc

Contributor

brhooper Jan 9, 2025

I think that some changes from another branch have snuck in here - these changes shouldn't be here.

Contributor Author

maxwhitemet Jan 17, 2025

Done.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 492 to 501

+                  def _verify_time_coords_match(self, cubes_to_check: List[Cube]) -> None:
+                      diagnostic_cube_time_range = self._get_cube_time_range(self.diagnostic_cube)
+                      for cube in cubes_to_check:
+                          cube_time_range = self._get_cube_time_range(cube)
+                          if cube_time_range != diagnostic_cube_time_range:
+                              raise ValueError(
+                                  f"The diagnostic cube and {cube} must have the same time points. "
+                                  f"The supplied diagnostic cube has time points: {diagnostic_cube_time_range} "
+                                  f"and the supplied {cube} has time points: {cube_time_range}"
+                              )

Contributor

brhooper Jan 9, 2025

I expect that we need to check the following for time coordinates:

The mean and variance cubes should have matching coordinates, as we want to check that they have come from the same climatology calculation
The time of the diagnostic cube should be within the time bounds of the mean and variance cubes (rather than needing to be equal). This is because the climatology is valid for the period over which it was calculated, rather than for a specific date.
We may also want to be able to ignore the result of a time check between the diagnostic cube and mean and variance cube bounds. This is for the case where we (for example) calculate a climatology over 30 days, then want to calculate the climatological anomaly of day 31.

Contributor Author

maxwhitemet Jan 16, 2025

Added each bullet point. I solved point 3 by setting another parameter in the CalculateClimateAnomalies class called 'ignore_temporal_mismatch' with default value False that when set to True skips this check.

improver/utilities/mathematical_operations.py Outdated

+                  def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None:
+                      """Check that all cubes have the same standard name to prevent accidental
+                      use of cubes with referring to different physical phenomenon

Contributor

brhooper Jan 9, 2025

Suggested change

      
                    use of cubes with referring to different physical phenomenon
          
                    use of cubes containing data for different diagnostics

Contributor Author

maxwhitemet Jan 16, 2025

Done.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 447 to 451

+                  def _get_cubes_to_check(self) -> List[Cube]:
+                      cubes_to_check = [self.mean_cube]
+                      if self.variance_cube:
+                          cubes_to_check.append(self.variance_cube)
+                      return cubes_to_check

Contributor

brhooper Jan 9, 2025

This is probably a case of personal preference, but I would be inclined to include this functionality within verify_inputs rather than separating it out as a new function. It only gets used once and doesn't contain many lines of code, so I don't think the extra function is necessary.

Contributor Author

maxwhitemet Jan 16, 2025

Added.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 477 to 490

+                  def _verify_grids_match(self, cubes_to_check: List[Cube]) -> None:
+                      """Check that all cubes have the same spatial coordinates (i.e. the same grid)"""
+                      diagnostic_cube_grid = self._get_cube_grid(self.diagnostic_cube)
+                      for cube in cubes_to_check:
+                          cube_grid = self._get_cube_grid(cube)
+                          if not np.array_equal(
+                              cube_grid[0], diagnostic_cube_grid[0]
+                          ) or not np.array_equal(cube_grid[1], diagnostic_cube_grid[1]):
+                              raise ValueError(
+                                  f"The diagnostic cube and {cube} must have the same grid. "
+                                  f"The supplied diagnostic cube has grid: {diagnostic_cube_grid} "
+                                  f"and the supplied {cube} has grid: {cube_grid}"
+                              )

Contributor

brhooper Jan 10, 2025

You could look at using the spatial_coord_match function in improver/utilities/cube_checker.py here, as we cannot guarantee that the spatial dimensions will always be in the same position on each cube.

We also need to consider how this will work with site data, where the cubes don't have x and y axes.

Contributor Author

maxwhitemet Jan 17, 2025

Used the spatial_coord_match function specified and used the below conditional statement to account for the site data concern. I used the sum of the points on the 'spot_index' coordinate to make the comparison hopefully faster, as the between-cube comparisons are then made using only a single value per cube rather than element-wise for each array or points.

if any('spot_index' in get_dim_coord_names(cube) for cube in [self.cubes_to_check, self.diagnostic_cube]):
            if any(cube.coord('spot_index').points.sum() != self.diagnostic_cube.coord('spot_index').points.sum() for cube in self.cubes_to_check):
                raise ValueError(
                    f"Spot index coordinates do not match.")

Contributor

brhooper Jan 20, 2025

I don't think that it is safe to use the sum function for this test. Consider 2 cubes with spot indices as follows:

Cube 1 points = [1, 4]
Cube 2 points = [2, 3]

I think that your test will pass given these 2 cubes, which is not what we want to happen.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 414 to 416

+                      diagnostic_cube: Cube,
+                      mean_cube: Cube,
+                      variance_cube: Optional[Cube] = None,

Contributor

brhooper Jan 14, 2025

By convention in IMPROVER data inputs (like cubes) are passed in to the process method, rather than __init__. More generally, inputs are separated as follows:

__init__ settings/configuration for how the method should be applied, for example in Combine the choice of what operation to apply is passed in here.
process takes the data inputs, for example in Combine this is the cubes to be combined.

The reason for this is another 'it made sense at the time'; StaGE development began before IMPROVER and they used this convention in StaGE, so it was carried over into IMPROVER.

Contributor Author

maxwhitemet Jan 17, 2025

Done. Noted for future.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 453 to 464

+                  def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None:
+                      """Check that all cubes have the same standard name to prevent accidental
+                      use of cubes with referring to different physical phenomenon
+                      (e.g. temperature and precipitation).
+                      """
+                      for cube in cubes_to_check:
+                          if cube.standard_name != self.diagnostic_cube.standard_name:
+                              raise ValueError(
+                                  f"The diagnostic cube and {cube} must have the same standard name. "
+                                  f"The supplied diagnostic cube has standard name: {self.diagnostic_cube.standard_name} "
+                                  f"and the supplied {cube} has standard name: {cube.standard_name}"
+                              )

Contributor

brhooper Jan 14, 2025

It's preferable to not print out whole cubes in error messages like this. In this instance, you could try:

Suggested change

      
                def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None:
          
                    """Check that all cubes have the same standard name to prevent accidental
          
                    use of cubes with referring to different physical phenomenon
          
                    (e.g. temperature and precipitation).
          
                    """
          
                    for cube in cubes_to_check:
          
                        if cube.standard_name != self.diagnostic_cube.standard_name:
          
                            raise ValueError(
          
                                f"The diagnostic cube and {cube} must have the same standard name. "
          
                                f"The supplied diagnostic cube has standard name: {self.diagnostic_cube.standard_name} "
          
                                f"and the supplied {cube} has standard name: {cube.standard_name}"
          
                            )
          
                @staticmethod
          
                def _verify_standard_names_match(cubes_to_check: List[Cube]) -> None:
          
                    """Check that all cubes have the same standard name to prevent accidental
          
                    use of cubes with referring to different physical phenomenon
          
                    (e.g. temperature and precipitation).
          
                    """
          
                    names = [c.standard_name for c in cubes_to_check]
          
                    if len(set(names)) != 1:
          
                        raise ValueError(
          
                            f"All input cubes must have the same standard_name. The following names were found: {names}."
          
                        )

Assuming that cubes_to _check was updated to include all input cubes. This might be easier due to my comment on the __init__ method.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 466 to 475

+                  def _verify_units_match(self, cubes_to_check: List[Cube]) -> None:
+                      """Check that all cubes have the same units. E.g. to prevent accidental
+                      use of cubes with rate data with cubes with accumulation data"""
+                      for cube in cubes_to_check:
+                          if cube.units != self.diagnostic_cube.units:
+                              raise ValueError(
+                                  f"The diagnostic cube and {cube} must have the same units. "
+                                  f"The supplied diagnostic cube has units: {self.diagnostic_cube.units} "
+                                  f"and the supplied {cube} has units: {cube.units}"
+                              )

Contributor

brhooper Jan 14, 2025

See my comment on _verify_standard_names_match, you could also use set() here to check that all of the list items are the same.

Contributor Author

maxwhitemet Jan 17, 2025

Done.
I noticed I made a mistake in my previous commit here: the variance cube will have units squared. I have thus not used set() here, but added a further check on whether variance cube units are appropriate, if this cube is provided as an input that is.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 509 to 513

+                  def _get_cube_time_range(cube: Cube) -> Tuple[float, float]:
+                      """Get the time range of the input cube"""
+                      start_time = cube.coord("time").points[0]
+                      end_time = cube.coord("time").points[-1]
+                      return start_time, end_time

Contributor

brhooper Jan 14, 2025

You could use the .bounds() method here, as I expect that the mean and variance cubes will have single point time coordinates with bounds representing the period over which they were calculated.

Contributor Author

maxwhitemet Jan 16, 2025

I have replaced all instances of _get_cube_time_range() with the .coord('time').bounds() method.

improver/utilities/mathematical_operations.py Outdated

Comment on lines 523 to 551

+                  def _generate_output_name_and_units(self) -> str:
+                      """Gets suitable output name and units from input cube metadata"""
+                      if self.variance_cube:
+                          new_name = f"{self.diagnostic_cube.name()}_standardised_anomalies"
+                          new_units = None
+                      else:
+                          new_name = f"{self.diagnostic_cube.name()}_anomalies"
+                          new_units = self.diagnostic_cube.units
+                      return new_name, new_units
+                  def _create_output_cube(self, data: Union[List[float], ndarray]) -> Cube:
+                      """
+                      Populates a template cube with data from the anomalies calculation.
+                      Args:
+                          data:
+                              Anomalies data
+                      Returns:
+                          Cube with data from anomalies calculation
+                      """
+                      name, units = self._generate_output_name_and_units()
+                      attributes = generate_mandatory_attributes([self.diagnostic_cube])
+                      output_cube = create_new_diagnostic_cube(
+                          name, units, self.diagnostic_cube, attributes, data=np.array(data)
+                      )
+                      return output_cube

Contributor

brhooper Jan 14, 2025

I've messaged @bjwheltor and @maxwhitemet to discuss what metadata is required to properly represent the anomaly data that we are producing.

maxwhitemet added 5 commits

January 17, 2025 10:49


          Merge branch 'master' of github.com:metoppv/improver into mobt_2066_c…

61a8a52

…onvert_forecast_to_climatological_anomaly


          Partial commit. Switching branches to work on another ticket.

9c21fc1

Resolved prior requests from Ben Hooper following review. Only now require implementation of further metadata adjustments.


          Implemented points from previous review, including compliance with CF…

089d17c

… conventions for anomalies.

Metadata added for outputed cube:
- Standard names suffixed with "anomaly" if mean cube provided or "standard anomaly" if variance cube also provided
- Units set to None if variance cube used and copied from the provided diagnostic cube if no variance cube provided
- "Reference epoch" scalar coordinate with (1) timebound from mean cube for anomaly cube, and (2) validity time from diagnostic cube
- Cell method of "reference epoch: anomaly"


          Resolved failure of pre-commmit checks due to comparing to 'None' usi…

…ng '==' and not 'is'


          Resolved warning from pre-commit checks that 'expected_attributes' va…

c89dbd4

…riable assigned but never used

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet