-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mobt 2066 convert forecast to climatological anomaly #2072
base: master
Are you sure you want to change the base?
Mobt 2066 convert forecast to climatological anomaly #2072
Conversation
…ical anomaly, and appropriate tests. **Main Script** The structure of the new CalculateClimateAnomalies class follows that agreed with @brhooper. - Initialisation - Verification of inputs - Matching standard names - Matching units - Matching grids - Matching time coordinates - Calculation of anomalies: both unstandardised and standardised - Creation of output cube with appropriate metadata
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2072 +/- ##
==========================================
- Coverage 98.39% 98.37% -0.03%
==========================================
Files 124 134 +10
Lines 12212 13325 +1113
==========================================
+ Hits 12016 13108 +1092
- Misses 196 217 +21 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @maxwhitemet, it's a good start to fulfilling the requirements t have a plugin which calculates anomaly data for use in the SAMOS work.
I've added some comments to the plugin and what I think are some spurious changes to the checksums. Once these have been resolved, I'll review the unit tests more thoroughly. I've not done this in detail yet as I think that some of the changes I've suggested to the plugin will necessitate changes to the tests anyway.
bd3e4c956a902258be2dcbd3f323cae7b372b138236cc4c7c3cb335258cbb038 ./freezing-rain/one_hour/kgo.nc | ||
85d998556342350f65b991e76ed10ec1c1b272bbe5548fed841dcb7cd1dd6510 ./freezing-rain/one_hour/rain_acc.nc | ||
51602571159257f173e355acbc06a1d5ddcf8ea534602152e308e8fb895adcc7 ./freezing-rain/one_hour/sleet_acc.nc | ||
4873e9a078206d2774207a23c987c2a1f1a514128e50bd185532f55db1b32471 ./freezing-rain/one_hour/temperature_min.nc | ||
5c60fc373a7e7fd8b0fe7143eae5455e2e1bda741ca543e4bc787094294f68b0 ./freezing-rain/three_hour/kgo.nc | ||
e488d0417ccddf388375c137e3527fa8c2f78e1abee52d6db54e5469900702ea ./freezing-rain/three_hour/kgo.nc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that some changes from another branch have snuck in here - these changes shouldn't be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
def _verify_time_coords_match(self, cubes_to_check: List[Cube]) -> None: | ||
diagnostic_cube_time_range = self._get_cube_time_range(self.diagnostic_cube) | ||
for cube in cubes_to_check: | ||
cube_time_range = self._get_cube_time_range(cube) | ||
if cube_time_range != diagnostic_cube_time_range: | ||
raise ValueError( | ||
f"The diagnostic cube and {cube} must have the same time points. " | ||
f"The supplied diagnostic cube has time points: {diagnostic_cube_time_range} " | ||
f"and the supplied {cube} has time points: {cube_time_range}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that we need to check the following for time coordinates:
- The mean and variance cubes should have matching coordinates, as we want to check that they have come from the same climatology calculation
- The time of the diagnostic cube should be within the time bounds of the mean and variance cubes (rather than needing to be equal). This is because the climatology is valid for the period over which it was calculated, rather than for a specific date.
- We may also want to be able to ignore the result of a time check between the diagnostic cube and mean and variance cube bounds. This is for the case where we (for example) calculate a climatology over 30 days, then want to calculate the climatological anomaly of day 31.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added each bullet point. I solved point 3 by setting another parameter in the CalculateClimateAnomalies class called 'ignore_temporal_mismatch' with default value False that when set to True skips this check.
|
||
def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None: | ||
"""Check that all cubes have the same standard name to prevent accidental | ||
use of cubes with referring to different physical phenomenon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use of cubes with referring to different physical phenomenon | |
use of cubes containing data for different diagnostics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
def _get_cubes_to_check(self) -> List[Cube]: | ||
cubes_to_check = [self.mean_cube] | ||
if self.variance_cube: | ||
cubes_to_check.append(self.variance_cube) | ||
return cubes_to_check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a case of personal preference, but I would be inclined to include this functionality within verify_inputs
rather than separating it out as a new function. It only gets used once and doesn't contain many lines of code, so I don't think the extra function is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
def _verify_grids_match(self, cubes_to_check: List[Cube]) -> None: | ||
"""Check that all cubes have the same spatial coordinates (i.e. the same grid)""" | ||
|
||
diagnostic_cube_grid = self._get_cube_grid(self.diagnostic_cube) | ||
for cube in cubes_to_check: | ||
cube_grid = self._get_cube_grid(cube) | ||
if not np.array_equal( | ||
cube_grid[0], diagnostic_cube_grid[0] | ||
) or not np.array_equal(cube_grid[1], diagnostic_cube_grid[1]): | ||
raise ValueError( | ||
f"The diagnostic cube and {cube} must have the same grid. " | ||
f"The supplied diagnostic cube has grid: {diagnostic_cube_grid} " | ||
f"and the supplied {cube} has grid: {cube_grid}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could look at using the spatial_coord_match
function in improver/utilities/cube_checker.py
here, as we cannot guarantee that the spatial dimensions will always be in the same position on each cube.
We also need to consider how this will work with site data, where the cubes don't have x and y axes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the spatial_coord_match function specified and used the below conditional statement to account for the site data concern. I used the sum of the points on the 'spot_index' coordinate to make the comparison hopefully faster, as the between-cube comparisons are then made using only a single value per cube rather than element-wise for each array or points.
if any('spot_index' in get_dim_coord_names(cube) for cube in [self.cubes_to_check, self.diagnostic_cube]):
if any(cube.coord('spot_index').points.sum() != self.diagnostic_cube.coord('spot_index').points.sum() for cube in self.cubes_to_check):
raise ValueError(
f"Spot index coordinates do not match.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it is safe to use the sum function for this test. Consider 2 cubes with spot indices as follows:
- Cube 1 points = [1, 4]
- Cube 2 points = [2, 3]
I think that your test will pass given these 2 cubes, which is not what we want to happen.
diagnostic_cube: Cube, | ||
mean_cube: Cube, | ||
variance_cube: Optional[Cube] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention in IMPROVER data inputs (like cubes) are passed in to the process
method, rather than __init__
. More generally, inputs are separated as follows:
__init__
settings/configuration for how the method should be applied, for example in Combine the choice of what operation to apply is passed in here.process
takes the data inputs, for example in Combine this is the cubes to be combined.
The reason for this is another 'it made sense at the time'; StaGE development began before IMPROVER and they used this convention in StaGE, so it was carried over into IMPROVER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Noted for future.
def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None: | ||
"""Check that all cubes have the same standard name to prevent accidental | ||
use of cubes with referring to different physical phenomenon | ||
(e.g. temperature and precipitation). | ||
""" | ||
for cube in cubes_to_check: | ||
if cube.standard_name != self.diagnostic_cube.standard_name: | ||
raise ValueError( | ||
f"The diagnostic cube and {cube} must have the same standard name. " | ||
f"The supplied diagnostic cube has standard name: {self.diagnostic_cube.standard_name} " | ||
f"and the supplied {cube} has standard name: {cube.standard_name}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's preferable to not print out whole cubes in error messages like this. In this instance, you could try:
def _verify_standard_names_match(self, cubes_to_check: List[Cube]) -> None: | |
"""Check that all cubes have the same standard name to prevent accidental | |
use of cubes with referring to different physical phenomenon | |
(e.g. temperature and precipitation). | |
""" | |
for cube in cubes_to_check: | |
if cube.standard_name != self.diagnostic_cube.standard_name: | |
raise ValueError( | |
f"The diagnostic cube and {cube} must have the same standard name. " | |
f"The supplied diagnostic cube has standard name: {self.diagnostic_cube.standard_name} " | |
f"and the supplied {cube} has standard name: {cube.standard_name}" | |
) | |
@staticmethod | |
def _verify_standard_names_match(cubes_to_check: List[Cube]) -> None: | |
"""Check that all cubes have the same standard name to prevent accidental | |
use of cubes with referring to different physical phenomenon | |
(e.g. temperature and precipitation). | |
""" | |
names = [c.standard_name for c in cubes_to_check] | |
if len(set(names)) != 1: | |
raise ValueError( | |
f"All input cubes must have the same standard_name. The following names were found: {names}." | |
) |
Assuming that cubes_to _check
was updated to include all input cubes. This might be easier due to my comment on the __init__
method.
def _verify_units_match(self, cubes_to_check: List[Cube]) -> None: | ||
"""Check that all cubes have the same units. E.g. to prevent accidental | ||
use of cubes with rate data with cubes with accumulation data""" | ||
for cube in cubes_to_check: | ||
if cube.units != self.diagnostic_cube.units: | ||
raise ValueError( | ||
f"The diagnostic cube and {cube} must have the same units. " | ||
f"The supplied diagnostic cube has units: {self.diagnostic_cube.units} " | ||
f"and the supplied {cube} has units: {cube.units}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment on _verify_standard_names_match
, you could also use set()
here to check that all of the list items are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I noticed I made a mistake in my previous commit here: the variance cube will have units squared. I have thus not used set() here, but added a further check on whether variance cube units are appropriate, if this cube is provided as an input that is.
def _get_cube_time_range(cube: Cube) -> Tuple[float, float]: | ||
"""Get the time range of the input cube""" | ||
start_time = cube.coord("time").points[0] | ||
end_time = cube.coord("time").points[-1] | ||
return start_time, end_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the .bounds()
method here, as I expect that the mean and variance cubes will have single point time coordinates with bounds representing the period over which they were calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced all instances of _get_cube_time_range()
with the .coord('time').bounds()
method.
def _generate_output_name_and_units(self) -> str: | ||
"""Gets suitable output name and units from input cube metadata""" | ||
if self.variance_cube: | ||
new_name = f"{self.diagnostic_cube.name()}_standardised_anomalies" | ||
new_units = None | ||
else: | ||
new_name = f"{self.diagnostic_cube.name()}_anomalies" | ||
new_units = self.diagnostic_cube.units | ||
return new_name, new_units | ||
|
||
def _create_output_cube(self, data: Union[List[float], ndarray]) -> Cube: | ||
""" | ||
Populates a template cube with data from the anomalies calculation. | ||
|
||
Args: | ||
data: | ||
Anomalies data | ||
|
||
Returns: | ||
Cube with data from anomalies calculation | ||
""" | ||
|
||
name, units = self._generate_output_name_and_units() | ||
attributes = generate_mandatory_attributes([self.diagnostic_cube]) | ||
output_cube = create_new_diagnostic_cube( | ||
name, units, self.diagnostic_cube, attributes, data=np.array(data) | ||
) | ||
|
||
return output_cube |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've messaged @bjwheltor and @maxwhitemet to discuss what metadata is required to properly represent the anomaly data that we are producing.
…onvert_forecast_to_climatological_anomaly
Resolved prior requests from Ben Hooper following review. Only now require implementation of further metadata adjustments.
… conventions for anomalies. Metadata added for outputed cube: - Standard names suffixed with "anomaly" if mean cube provided or "standard anomaly" if variance cube also provided - Units set to None if variance cube used and copied from the provided diagnostic cube if no variance cube provided - "Reference epoch" scalar coordinate with (1) timebound from mean cube for anomaly cube, and (2) validity time from diagnostic cube - Cell method of "reference epoch: anomaly"
…ng '==' and not 'is'
…riable assigned but never used
Addresses #2066
Testing:
The structure of the new CalculateClimateAnomalies class follows that agreed with @brhooper