IMPRO-511 implement metadata cube saving #538

markysparks · 2018-05-07T19:39:17Z

Reference issue #439

Testing:

Ran tests and they passed OK using new kgo.
Added new tests for the new feature(s)

Implement the prototype work undertaken in IMPRO-466 (GH issue #414) to insert attributes that provide a link to a metadata description of the IMPROVER project specific metadata.

Note that the changes from IMPRO-464 (GH issue #412) to spatial.py and test_DifferenceBetweenAdjacentGridSquares.py have not been included so as not to update the 'form_of_difference' attribute output from improver-gradient, only the metadata cube from IMPRO-466. Implementation of IMPRO-464 is linked to decisions made as a result of IMPRO-507.

… Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient

…uares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file

cgsandford · 2018-05-08T07:38:54Z

lib/improver/tests/utilities/test_load.py

@@ -225,6 +225,11 @@ def test_ordering_for_realization_threshold_percentile_over_coordinate(
        self.assertArrayAlmostEqual(result.coord_dims("latitude")[0], 4)
        self.assertArrayAlmostEqual(result.coord_dims("longitude")[0], 5)

+    def test_attributes(self):
+        """Test that metadata attributes are successfully striped out."""


Typo: "stripped"

cgsandford · 2018-05-08T07:39:11Z

lib/improver/tests/utilities/test_save.py

@@ -52,6 +52,7 @@ def set_up_test_cube():
    cube = set_up_cube(data, 'air_temperature', 'K', realizations=([0]))
    cube.attributes['Conventions'] = 'CF-1.5'
    cube.attributes['source_realizations'] = np.arange(12)
+    cube.attributes['spp__form_of_difference'] = 'spv__forward_difference'


Please remove this line.

cgsandford · 2018-05-08T07:43:35Z

lib/improver/tests/utilities/test_load.py

+        """Test that metadata attributes are successfully striped out."""
+        result = load_cube(self.filepath)
+        self.assertNotIn('bald__isPrefixedBy', result.attributes.keys())
+


Can we add a new unit test to check that the "prefixes" cube is not loaded.

Good point! - unittest now added that checks there is no prefix cube after load. Also corrected typo and removed spp__form of difference line as per comments.

…m test_save.py as per reviewers comments

benfitzpatrick · 2018-05-08T08:11:11Z

Does this play nicely with third party tools and iris load cube outside IMPROVER, or does it cause headaches?

markysparks · 2018-05-08T08:17:34Z

In IMPRO-511 I have attached some screenshots from the results using ncview, ncinfo, cfchecker and iris.load which seem to be working fine.

gavinevans

Good work! 😄 My only suggestion is that unit tests are needed for the new function that has been added as part of this PR. All tests pass with the updated known good output.

gavinevans · 2018-05-08T10:22:30Z

lib/improver/utilities/save.py

@@ -33,6 +33,36 @@
 import iris


+def append_metadata_cube(cubelist):


This function needs some sort of unit tests. You could check that the resulting prefix cube actually has the attributes that you're adding, as well as that the resulting cube has a bald__isPrefixedBy attribute.

New test class and unittests added to check prefix cube attributes and bald__isPrefixedBy attribute on other cubes in cube list.

…ng load

Katie-Howard

I have looked through the kgo data that Mark has put together and I think there are issues with the metadata:

The metadata seems to have moved the global attributes into the variables which isn't right at all.
It then doesn't have grid_id, institution, source, title, um_version within the Global attributes.

I'm not sure why this is happening, but I think it needs investigating before I can carry on reviewing it.

markysparks · 2018-05-10T19:17:39Z

Good catch ! It seems that that unless the attributes are on both the data and prefix cube, they will not be saved as 'global' attributes during the netCDF save, (according to the Iris documentation). Easiest solution might be to just copy the data cube attributes to the prefix cube, but it is not desirable to copy all of the attributes across.

It also appears that not all the test data used has global attributes 'grid_id', 'institution' etc. which is why this not appearing.

Also its been noted that the prefix cube has been assigned units of '1' by Iris.

…bal attributes from the data cubes and add them to the prefix cube

cgsandford · 2018-05-17T13:19:34Z

lib/improver/utilities/save.py

+                                or k is 'grid_id'
+                                or k is 'history'
+                                or k is 'Conventions'
+                                or k is 'title'}


To avoid multiple definitions, could we pass in the list of global keys as an argument to append_metadata_cube? This would then simplify this code block to be:
keys_for_global_attr = {k for k in cube.attributes.keys() if k in global_keys}

OK so I've just tried this out using a list of global keys in append_metadata_cube (which could or course be passed in as argument ultimately).

Your code seems to be producing a set rather than dictionary which caused problems when adding the keys to the prefix cube, so I added a line to produce a dictionary from the set which also set's the global key's value to ' '. I'm not sure if this is what you meant?

For the attributes to kept as global I'm not sure if the keys and the values of the attributes have to be the same or just the keys? (Its not clear to me from the Iris2 docs, but guess I'll find out when I generate some new data)

I'll push up this commit so you can see what you think.......

I think you could have it as I put it (returning a set), but then further down (at line 65) you need the following:

for key in keys_for_global_attr: prefix_cube.attributes[key] = cube.attributes[key]

That's slightly simpler, but probably doesn't matter if your code works.

However I do think it would be best if we didn't define the list of global keys twice. I'd recommend changing append_metadata_cube(cubelist) to append_metadata_cube(cubelist, global_keys), and passing in the list of global keys as defined in save_netcdf. An alternative would be to define GLOBAL_KEYS at the top of the file as a global variable, rather than locally within the functions, but this is not the preferred convention (see https://google.github.io/styleguide/pyguide.html#Global_variables).

Yes, I think that would be the best way, passing in the global_keys list as an argument to append_metadata_cube.

…void redefining this list

cgsandford · 2018-05-17T15:21:51Z

lib/improver/utilities/save.py

+    # global attributes to the prefix cube (otherwise they will be made
+    # variables in the netCDF file).
+    for key, value in keys_for_global_attr.items():
+        prefix_cube.attributes[key] = value


Ah actually, I think what this will do (when you create a set above) is add attributes to the prefix cube which have no value (as value is an empty string). Have you looked at an output cube to see what this does? I think you may need to assign using prefix_cube.attributes[key] = cube.attributes[key] to ensure the same global attributes are present.

And maybe a unit test to check the right global attributes are added?

…ot be left as empty string

… change made to 02_basic_mean.bats used for capturing output

cgsandford · 2018-05-21T08:48:00Z

lib/improver/tests/utilities/test_save.py

+            k for k in keys_in_prefix_cube.keys() if k in self.global_keys_ref]
+
+        self.assertListEqual(
+            sorted(self.global_keys_ref), sorted(prefix_global_keys))


Not sure this is comparing strictly the right thing. We want to make sure that all global keys that were on the input cube are also now on the prefix cube. Even though all of self.global_keys_ref are currently on self.cube, this is not by design and could lead to misinterpretation in the future. Suggest the following:

prefix_global_keys = [ k for k in keys_in_prefix_cube.keys() if k in self.global_keys_ref] data_cube_global_keys = [ k for k in self.cube.attributes.keys() if k in self.global_keys_ref] self.assertListEqual( sorted(prefix_global_keys), sorted(data_cube_global_keys))

Actually what might be even better is:

prefix_global_attrs = [ attr for attr in keys_in_prefix_cube if attr.keys() in self.global_keys_ref] data_global_attrs = [ attr for attr in self.cube.attributes if attr.keys() in self.global_keys_ref] self.assertDictEqual(prefix_global_attrs, data_global_attrs)

As I believe this will check both the attributes' keys and their values.

We've established the above doesn't work. It would still be good to check values using eg the following:

for key in prefix_global_keys: self.assertEqual(metadata_cubelist[-1].attributes[key], self.cube.attributes[keys])

cgsandford · 2018-05-21T08:50:26Z

lib/improver/tests/utilities/test_save.py

+        so that Iris2 keeps these attributes global in any resultant
+        netCDF file saved using these cubes"""
+
+        cube_list = ([self.cube, self.cube])


Since this isn't the expected functionality (long term), suggest only one cube in this list (ie cube_list = [self.cube]).

… prefix cube than rather than keys from global_keys_ref

…f 80 characters

gavinevans

I made a very small change to this PR, but otherwise I'm happy with this. Thanks for your work @markysparks! 😄

* 'master' of github.com:metoppv/improver: Fix for IMPRO-564 where ECDF bounds were not defined for max daytime… (metoppv#550) Removed changes to forecast reference time coordinate in min / max spot extract (metoppv#552) Edit to force the gradient CLI to convert a tuple to a cubelist, as save_netcdf expects to be handling cubelists. (metoppv#557) IMPRO-511 implement metadata cube saving (metoppv#538) Removed use of deprecated flavor argument within pandas. (metoppv#553) Update PULL_REQUEST_TEMPLATE.md Created wind direction averaging plugin (metoppv#535)

* Initial incorporation of metadata cube saving based on IMPRO-466/464. Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient * Revert changes to spatial.py and test_DifferenceBetweenAdjacentGridSquares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file * Remove some print statements only used during testing * Remove superfluous comment * Correct comment typo and remove spp__form_of_difference attribute from test_save.py as per reviewers comments * Add unit test to check that metatdata prefixes cube is discarded during load * Add unit tests for append_metadata_cube functionality * inital fix to keep global attributes in netCDF file, get required global attributes from the data cubes and add them to the prefix cube * change iteritems() to items() for Python 3 compatability * Use a list of global keys as suggested by cgsandford * Pass in global_keys as an argument to append_metadata_cube so as to avoid redefining this list * Changes as per cgsandford suggestions, tests confirmed key value cannot be left as empty string * amend comments * add unit test to check global attributes added to prefix cube, revert change made to 02_basic_mean.bats used for capturing output * only use one cube in input list, compare global keys on input cube to prefix cube than rather than keys from global_keys_ref * shorten line 223 length, travis build complaining about line length of 80 characters * add check that key values are also the same in prefix and data cube * Removed redundant conversion of cubelist to be a variable of type list.

markysparks added 4 commits May 7, 2018 20:09

Initial incorporation of metadata cube saving based on IMPRO-466/464.…

acf8cb4

… Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient

Revert changes to spatial.py and test_DifferenceBetweenAdjacentGridSq…

ceeb6be

…uares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file

Remove some print statements only used during testing

f016ea3

Remove superfluous comment

37d4383

markysparks requested a review from cgsandford May 7, 2018 19:39

cgsandford requested changes May 8, 2018

View reviewed changes

Correct comment typo and remove spp__form_of_difference attribute fro…

c837506

…m test_save.py as per reviewers comments

gavinevans requested changes May 8, 2018

View reviewed changes

Add unit test to check that metatdata prefixes cube is discarded duri…

2281ecc

…ng load

gavinevans requested a review from Katie-Howard May 9, 2018 07:32

Add unit tests for append_metadata_cube functionality

784f7f9

cgsandford approved these changes May 9, 2018

View reviewed changes

Katie-Howard requested changes May 10, 2018

View reviewed changes

metoppv deleted a comment May 14, 2018

metoppv deleted a comment May 16, 2018

markysparks added 2 commits May 17, 2018 14:12

inital fix to keep global attributes in netCDF file, get required glo…

e8959e1

…bal attributes from the data cubes and add them to the prefix cube

change iteritems() to items() for Python 3 compatability

17c1e07

cgsandford reviewed May 17, 2018

View reviewed changes

markysparks added 2 commits May 17, 2018 15:30

Use a list of global keys as suggested by cgsandford

752a220

Pass in global_keys as an argument to append_metadata_cube so as to a…

258fc3f

…void redefining this list

cgsandford reviewed May 17, 2018

View reviewed changes

Changes as per cgsandford suggestions, tests confirmed key value cann…

a6b743f

…ot be left as empty string

metoppv deleted a comment May 17, 2018

amend comments

b8daf7d

metoppv deleted a comment May 17, 2018

add unit test to check global attributes added to prefix cube, revert…

221a0b3

… change made to 02_basic_mean.bats used for capturing output

metoppv deleted a comment May 18, 2018

cgsandford requested changes May 21, 2018

View reviewed changes

markysparks added 3 commits May 21, 2018 10:33

only use one cube in input list, compare global keys on input cube to…

4e25fae

… prefix cube than rather than keys from global_keys_ref

shorten line 223 length, travis build complaining about line length o…

ab9f53d

…f 80 characters

add check that key values are also the same in prefix and data cube

b90d42d

cgsandford approved these changes May 21, 2018

View reviewed changes

metoppv deleted a comment May 21, 2018

Removed redundant conversion of cubelist to be a variable of type list.

688f102

gavinevans approved these changes May 22, 2018

View reviewed changes

metoppv deleted a comment May 22, 2018

gavinevans merged commit 93e01f0 into metoppv:master May 22, 2018

This was referenced May 22, 2018

Fix the gradient CLI to pass a CubeList to save_netcdf #557

Merged

Port IMPROVER to Python 3 #558

Merged

markysparks deleted the IMPRO-511-implement-metadata-cube-saving branch November 6, 2018 10:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IMPRO-511 implement metadata cube saving #538

IMPRO-511 implement metadata cube saving #538

markysparks commented May 7, 2018 •

edited

Loading

cgsandford May 8, 2018

cgsandford May 8, 2018

cgsandford May 8, 2018

markysparks May 9, 2018

benfitzpatrick commented May 8, 2018

markysparks commented May 8, 2018 •

edited

Loading

gavinevans left a comment

gavinevans May 8, 2018

markysparks May 9, 2018

Katie-Howard left a comment

markysparks commented May 10, 2018

cgsandford May 17, 2018 •

edited

Loading

markysparks May 17, 2018 •

edited

Loading

cgsandford May 17, 2018 •

edited

Loading

markysparks May 17, 2018

cgsandford May 17, 2018 •

edited

Loading

cgsandford May 17, 2018 •

edited

Loading

cgsandford May 21, 2018

cgsandford May 21, 2018

cgsandford May 21, 2018

cgsandford May 21, 2018

gavinevans left a comment

		@@ -33,6 +33,36 @@
		import iris


		def append_metadata_cube(cubelist):

IMPRO-511 implement metadata cube saving #538

IMPRO-511 implement metadata cube saving #538

Conversation

markysparks commented May 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benfitzpatrick commented May 8, 2018

markysparks commented May 8, 2018 • edited Loading

gavinevans left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Katie-Howard left a comment

Choose a reason for hiding this comment

markysparks commented May 10, 2018

cgsandford May 17, 2018 • edited Loading

Choose a reason for hiding this comment

markysparks May 17, 2018 • edited Loading

Choose a reason for hiding this comment

cgsandford May 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgsandford May 17, 2018 • edited Loading

Choose a reason for hiding this comment

cgsandford May 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gavinevans left a comment

Choose a reason for hiding this comment

markysparks commented May 7, 2018 •

edited

Loading

markysparks commented May 8, 2018 •

edited

Loading

cgsandford May 17, 2018 •

edited

Loading

markysparks May 17, 2018 •

edited

Loading

cgsandford May 17, 2018 •

edited

Loading

cgsandford May 17, 2018 •

edited

Loading

cgsandford May 17, 2018 •

edited

Loading