-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IMPRO-511 implement metadata cube saving #538
IMPRO-511 implement metadata cube saving #538
Conversation
… Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient
…uares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file
@@ -225,6 +225,11 @@ def test_ordering_for_realization_threshold_percentile_over_coordinate( | |||
self.assertArrayAlmostEqual(result.coord_dims("latitude")[0], 4) | |||
self.assertArrayAlmostEqual(result.coord_dims("longitude")[0], 5) | |||
|
|||
def test_attributes(self): | |||
"""Test that metadata attributes are successfully striped out.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "stripped"
@@ -52,6 +52,7 @@ def set_up_test_cube(): | |||
cube = set_up_cube(data, 'air_temperature', 'K', realizations=([0])) | |||
cube.attributes['Conventions'] = 'CF-1.5' | |||
cube.attributes['source_realizations'] = np.arange(12) | |||
cube.attributes['spp__form_of_difference'] = 'spv__forward_difference' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line.
"""Test that metadata attributes are successfully striped out.""" | ||
result = load_cube(self.filepath) | ||
self.assertNotIn('bald__isPrefixedBy', result.attributes.keys()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a new unit test to check that the "prefixes" cube is not loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! - unittest now added that checks there is no prefix cube after load. Also corrected typo and removed spp__form of difference line as per comments.
…m test_save.py as per reviewers comments
Does this play nicely with third party tools and iris load cube outside IMPROVER, or does it cause headaches? |
In IMPRO-511 I have attached some screenshots from the results using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! 😄 My only suggestion is that unit tests are needed for the new function that has been added as part of this PR. All tests pass with the updated known good output.
lib/improver/utilities/save.py
Outdated
@@ -33,6 +33,36 @@ | |||
import iris | |||
|
|||
|
|||
def append_metadata_cube(cubelist): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needs some sort of unit tests. You could check that the resulting prefix cube actually has the attributes that you're adding, as well as that the resulting cube has a bald__isPrefixedBy
attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New test class and unittests added to check prefix cube attributes and bald__isPrefixedBy attribute on other cubes in cube list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked through the kgo data that Mark has put together and I think there are issues with the metadata:
-
The metadata seems to have moved the global attributes into the variables which isn't right at all.
-
It then doesn't have grid_id, institution, source, title, um_version within the Global attributes.
I'm not sure why this is happening, but I think it needs investigating before I can carry on reviewing it.
Good catch ! It seems that that unless the attributes are on both the data and prefix cube, they will not be saved as 'global' attributes during the netCDF save, (according to the Iris documentation). Easiest solution might be to just copy the data cube attributes to the prefix cube, but it is not desirable to copy all of the attributes across. It also appears that not all the test data used has global attributes 'grid_id', 'institution' etc. which is why this not appearing. Also its been noted that the prefix cube has been assigned units of '1' by Iris. |
…bal attributes from the data cubes and add them to the prefix cube
lib/improver/utilities/save.py
Outdated
or k is 'grid_id' | ||
or k is 'history' | ||
or k is 'Conventions' | ||
or k is 'title'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid multiple definitions, could we pass in the list of global keys as an argument to append_metadata_cube
? This would then simplify this code block to be:
keys_for_global_attr = {k for k in cube.attributes.keys() if k in global_keys}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so I've just tried this out using a list of global keys in append_metadata_cube (which could or course be passed in as argument ultimately).
Your code seems to be producing a set rather than dictionary which caused problems when adding the keys to the prefix cube, so I added a line to produce a dictionary from the set which also set's the global key's value to ' '. I'm not sure if this is what you meant?
For the attributes to kept as global I'm not sure if the keys and the values of the attributes have to be the same or just the keys? (Its not clear to me from the Iris2 docs, but guess I'll find out when I generate some new data)
I'll push up this commit so you can see what you think.......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could have it as I put it (returning a set), but then further down (at line 65) you need the following:
for key in keys_for_global_attr:
prefix_cube.attributes[key] = cube.attributes[key]
That's slightly simpler, but probably doesn't matter if your code works.
However I do think it would be best if we didn't define the list of global keys twice. I'd recommend changing append_metadata_cube(cubelist)
to append_metadata_cube(cubelist, global_keys)
, and passing in the list of global keys as defined in save_netcdf
. An alternative would be to define GLOBAL_KEYS
at the top of the file as a global variable, rather than locally within the functions, but this is not the preferred convention (see https://google.github.io/styleguide/pyguide.html#Global_variables).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that would be the best way, passing in the global_keys list as an argument to append_metadata_cube.
lib/improver/utilities/save.py
Outdated
# global attributes to the prefix cube (otherwise they will be made | ||
# variables in the netCDF file). | ||
for key, value in keys_for_global_attr.items(): | ||
prefix_cube.attributes[key] = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah actually, I think what this will do (when you create a set above) is add attributes to the prefix cube which have no value (as value
is an empty string). Have you looked at an output cube to see what this does? I think you may need to assign using prefix_cube.attributes[key] = cube.attributes[key]
to ensure the same global attributes are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe a unit test to check the right global attributes are added?
…ot be left as empty string
… change made to 02_basic_mean.bats used for capturing output
k for k in keys_in_prefix_cube.keys() if k in self.global_keys_ref] | ||
|
||
self.assertListEqual( | ||
sorted(self.global_keys_ref), sorted(prefix_global_keys)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is comparing strictly the right thing. We want to make sure that all global keys that were on the input cube are also now on the prefix cube. Even though all of self.global_keys_ref
are currently on self.cube
, this is not by design and could lead to misinterpretation in the future. Suggest the following:
prefix_global_keys = [
k for k in keys_in_prefix_cube.keys() if k in self.global_keys_ref]
data_cube_global_keys = [
k for k in self.cube.attributes.keys() if k in self.global_keys_ref]
self.assertListEqual(
sorted(prefix_global_keys), sorted(data_cube_global_keys))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually what might be even better is:
prefix_global_attrs = [
attr for attr in keys_in_prefix_cube if attr.keys() in self.global_keys_ref]
data_global_attrs = [
attr for attr in self.cube.attributes if attr.keys() in self.global_keys_ref]
self.assertDictEqual(prefix_global_attrs, data_global_attrs)
As I believe this will check both the attributes' keys and their values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've established the above doesn't work. It would still be good to check values using eg the following:
for key in prefix_global_keys:
self.assertEqual(metadata_cubelist[-1].attributes[key], self.cube.attributes[keys])
so that Iris2 keeps these attributes global in any resultant | ||
netCDF file saved using these cubes""" | ||
|
||
cube_list = ([self.cube, self.cube]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this isn't the expected functionality (long term), suggest only one cube in this list (ie cube_list = [self.cube]
).
… prefix cube than rather than keys from global_keys_ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a very small change to this PR, but otherwise I'm happy with this. Thanks for your work @markysparks! 😄
* 'master' of github.com:metoppv/improver: Fix for IMPRO-564 where ECDF bounds were not defined for max daytime… (metoppv#550) Removed changes to forecast reference time coordinate in min / max spot extract (metoppv#552) Edit to force the gradient CLI to convert a tuple to a cubelist, as save_netcdf expects to be handling cubelists. (metoppv#557) IMPRO-511 implement metadata cube saving (metoppv#538) Removed use of deprecated flavor argument within pandas. (metoppv#553) Update PULL_REQUEST_TEMPLATE.md Created wind direction averaging plugin (metoppv#535)
* Initial incorporation of metadata cube saving based on IMPRO-466/464. Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient * Revert changes to spatial.py and test_DifferenceBetweenAdjacentGridSquares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file * Remove some print statements only used during testing * Remove superfluous comment * Correct comment typo and remove spp__form_of_difference attribute from test_save.py as per reviewers comments * Add unit test to check that metatdata prefixes cube is discarded during load * Add unit tests for append_metadata_cube functionality * inital fix to keep global attributes in netCDF file, get required global attributes from the data cubes and add them to the prefix cube * change iteritems() to items() for Python 3 compatability * Use a list of global keys as suggested by cgsandford * Pass in global_keys as an argument to append_metadata_cube so as to avoid redefining this list * Changes as per cgsandford suggestions, tests confirmed key value cannot be left as empty string * amend comments * add unit test to check global attributes added to prefix cube, revert change made to 02_basic_mean.bats used for capturing output * only use one cube in input list, compare global keys on input cube to prefix cube than rather than keys from global_keys_ref * shorten line 223 length, travis build complaining about line length of 80 characters * add check that key values are also the same in prefix and data cube * Removed redundant conversion of cubelist to be a variable of type list.
Reference issue #439
Testing:
Implement the prototype work undertaken in IMPRO-466 (GH issue #414) to insert attributes that provide a link to a metadata description of the IMPROVER project specific metadata.
Note that the changes from IMPRO-464 (GH issue #412) to
spatial.py
andtest_DifferenceBetweenAdjacentGridSquares.py
have not been included so as not to update the 'form_of_difference' attribute output from improver-gradient, only the metadata cube from IMPRO-466. Implementation of IMPRO-464 is linked to decisions made as a result of IMPRO-507.