Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMPRO-511 implement metadata cube saving #538

Merged
merged 18 commits into from
May 22, 2018
Merged

IMPRO-511 implement metadata cube saving #538

merged 18 commits into from
May 22, 2018

Conversation

markysparks
Copy link
Contributor

@markysparks markysparks commented May 7, 2018

Reference issue #439

Testing:

  • Ran tests and they passed OK using new kgo.
  • Added new tests for the new feature(s)

Implement the prototype work undertaken in IMPRO-466 (GH issue #414) to insert attributes that provide a link to a metadata description of the IMPROVER project specific metadata.

Note that the changes from IMPRO-464 (GH issue #412) to spatial.py and test_DifferenceBetweenAdjacentGridSquares.py have not been included so as not to update the 'form_of_difference' attribute output from improver-gradient, only the metadata cube from IMPRO-466. Implementation of IMPRO-464 is linked to decisions made as a result of IMPRO-507.

… Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient
…uares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file
@markysparks markysparks requested a review from cgsandford May 7, 2018 19:39
@@ -225,6 +225,11 @@ def test_ordering_for_realization_threshold_percentile_over_coordinate(
self.assertArrayAlmostEqual(result.coord_dims("latitude")[0], 4)
self.assertArrayAlmostEqual(result.coord_dims("longitude")[0], 5)

def test_attributes(self):
"""Test that metadata attributes are successfully striped out."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "stripped"

@@ -52,6 +52,7 @@ def set_up_test_cube():
cube = set_up_cube(data, 'air_temperature', 'K', realizations=([0]))
cube.attributes['Conventions'] = 'CF-1.5'
cube.attributes['source_realizations'] = np.arange(12)
cube.attributes['spp__form_of_difference'] = 'spv__forward_difference'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line.

"""Test that metadata attributes are successfully striped out."""
result = load_cube(self.filepath)
self.assertNotIn('bald__isPrefixedBy', result.attributes.keys())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a new unit test to check that the "prefixes" cube is not loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! - unittest now added that checks there is no prefix cube after load. Also corrected typo and removed spp__form of difference line as per comments.

@benfitzpatrick
Copy link
Contributor

Does this play nicely with third party tools and iris load cube outside IMPROVER, or does it cause headaches?

@markysparks
Copy link
Contributor Author

markysparks commented May 8, 2018

In IMPRO-511 I have attached some screenshots from the results using ncview, ncinfo, cfchecker and iris.load which seem to be working fine.

Copy link
Contributor

@gavinevans gavinevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! 😄 My only suggestion is that unit tests are needed for the new function that has been added as part of this PR. All tests pass with the updated known good output.

@@ -33,6 +33,36 @@
import iris


def append_metadata_cube(cubelist):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function needs some sort of unit tests. You could check that the resulting prefix cube actually has the attributes that you're adding, as well as that the resulting cube has a bald__isPrefixedBy attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test class and unittests added to check prefix cube attributes and bald__isPrefixedBy attribute on other cubes in cube list.

@gavinevans gavinevans requested a review from Katie-Howard May 9, 2018 07:32
Copy link
Contributor

@Katie-Howard Katie-Howard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked through the kgo data that Mark has put together and I think there are issues with the metadata:

  • The metadata seems to have moved the global attributes into the variables which isn't right at all.

  • It then doesn't have grid_id, institution, source, title, um_version within the Global attributes.

I'm not sure why this is happening, but I think it needs investigating before I can carry on reviewing it.

@markysparks
Copy link
Contributor Author

Good catch ! It seems that that unless the attributes are on both the data and prefix cube, they will not be saved as 'global' attributes during the netCDF save, (according to the Iris documentation). Easiest solution might be to just copy the data cube attributes to the prefix cube, but it is not desirable to copy all of the attributes across.

It also appears that not all the test data used has global attributes 'grid_id', 'institution' etc. which is why this not appearing.

Also its been noted that the prefix cube has been assigned units of '1' by Iris.

@metoppv metoppv deleted a comment May 14, 2018
@metoppv metoppv deleted a comment May 16, 2018
or k is 'grid_id'
or k is 'history'
or k is 'Conventions'
or k is 'title'}
Copy link
Contributor

@cgsandford cgsandford May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid multiple definitions, could we pass in the list of global keys as an argument to append_metadata_cube? This would then simplify this code block to be:
keys_for_global_attr = {k for k in cube.attributes.keys() if k in global_keys}

Copy link
Contributor Author

@markysparks markysparks May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I've just tried this out using a list of global keys in append_metadata_cube (which could or course be passed in as argument ultimately).

Your code seems to be producing a set rather than dictionary which caused problems when adding the keys to the prefix cube, so I added a line to produce a dictionary from the set which also set's the global key's value to ' '. I'm not sure if this is what you meant?

For the attributes to kept as global I'm not sure if the keys and the values of the attributes have to be the same or just the keys? (Its not clear to me from the Iris2 docs, but guess I'll find out when I generate some new data)

I'll push up this commit so you can see what you think.......

Copy link
Contributor

@cgsandford cgsandford May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could have it as I put it (returning a set), but then further down (at line 65) you need the following:

for key in keys_for_global_attr:
    prefix_cube.attributes[key] = cube.attributes[key]

That's slightly simpler, but probably doesn't matter if your code works.

However I do think it would be best if we didn't define the list of global keys twice. I'd recommend changing append_metadata_cube(cubelist) to append_metadata_cube(cubelist, global_keys), and passing in the list of global keys as defined in save_netcdf. An alternative would be to define GLOBAL_KEYS at the top of the file as a global variable, rather than locally within the functions, but this is not the preferred convention (see https://google.github.io/styleguide/pyguide.html#Global_variables).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would be the best way, passing in the global_keys list as an argument to append_metadata_cube.

# global attributes to the prefix cube (otherwise they will be made
# variables in the netCDF file).
for key, value in keys_for_global_attr.items():
prefix_cube.attributes[key] = value
Copy link
Contributor

@cgsandford cgsandford May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah actually, I think what this will do (when you create a set above) is add attributes to the prefix cube which have no value (as value is an empty string). Have you looked at an output cube to see what this does? I think you may need to assign using prefix_cube.attributes[key] = cube.attributes[key] to ensure the same global attributes are present.

Copy link
Contributor

@cgsandford cgsandford May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe a unit test to check the right global attributes are added?

@metoppv metoppv deleted a comment May 17, 2018
@metoppv metoppv deleted a comment May 17, 2018
… change made to 02_basic_mean.bats used for capturing output
@metoppv metoppv deleted a comment May 18, 2018
k for k in keys_in_prefix_cube.keys() if k in self.global_keys_ref]

self.assertListEqual(
sorted(self.global_keys_ref), sorted(prefix_global_keys))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is comparing strictly the right thing. We want to make sure that all global keys that were on the input cube are also now on the prefix cube. Even though all of self.global_keys_ref are currently on self.cube, this is not by design and could lead to misinterpretation in the future. Suggest the following:

prefix_global_keys = [
    k for k in keys_in_prefix_cube.keys() if k in self.global_keys_ref]
data_cube_global_keys = [
    k for k in self.cube.attributes.keys() if k in self.global_keys_ref]

self.assertListEqual(
    sorted(prefix_global_keys), sorted(data_cube_global_keys))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually what might be even better is:

prefix_global_attrs = [
    attr for attr in keys_in_prefix_cube if attr.keys() in self.global_keys_ref]
data_global_attrs = [
    attr for attr in self.cube.attributes if attr.keys() in self.global_keys_ref]

self.assertDictEqual(prefix_global_attrs, data_global_attrs)

As I believe this will check both the attributes' keys and their values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've established the above doesn't work. It would still be good to check values using eg the following:

for key in prefix_global_keys:
    self.assertEqual(metadata_cubelist[-1].attributes[key], self.cube.attributes[keys])

so that Iris2 keeps these attributes global in any resultant
netCDF file saved using these cubes"""

cube_list = ([self.cube, self.cube])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't the expected functionality (long term), suggest only one cube in this list (ie cube_list = [self.cube]).

@metoppv metoppv deleted a comment May 21, 2018
Copy link
Contributor

@gavinevans gavinevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a very small change to this PR, but otherwise I'm happy with this. Thanks for your work @markysparks! 😄

@metoppv metoppv deleted a comment May 22, 2018
@gavinevans gavinevans merged commit 93e01f0 into metoppv:master May 22, 2018
fionaRust added a commit to fionaRust/improver that referenced this pull request May 23, 2018
* 'master' of github.com:metoppv/improver:
  Fix for IMPRO-564 where ECDF bounds were not defined for max daytime… (metoppv#550)
  Removed changes to forecast reference time coordinate in min / max spot extract (metoppv#552)
  Edit to force the gradient CLI to convert a tuple to a cubelist, as save_netcdf expects to be handling cubelists. (metoppv#557)
  IMPRO-511 implement metadata cube saving (metoppv#538)
  Removed use of deprecated flavor argument within pandas. (metoppv#553)
  Update PULL_REQUEST_TEMPLATE.md
  Created wind direction averaging plugin (metoppv#535)
@markysparks markysparks deleted the IMPRO-511-implement-metadata-cube-saving branch November 6, 2018 10:18
MoseleyS pushed a commit to MoseleyS/improver that referenced this pull request Aug 22, 2024
* Initial incorporation of metadata cube saving based on IMPRO-466/464. Some of these changes will have to be removed to avoid updating the form of difference attribute from improver-gradient

* Revert changes to spatial.py and test_DifferenceBetweenAdjacentGridSquares.py pending Impro-507 / updating form_of_difference attribute. Incremented length of cubes list in test_save due to Iris 2 now saving metadata as separate cube rather than attributes on other cubes in file

* Remove some print statements only used during testing

* Remove superfluous comment

* Correct comment typo and remove spp__form_of_difference attribute from test_save.py as per reviewers comments

* Add unit test to check that metatdata prefixes cube is discarded during load

* Add unit tests for append_metadata_cube functionality

* inital fix to keep global attributes in netCDF file, get required global attributes from the data cubes and add them to the prefix cube

* change iteritems() to items() for Python 3 compatability

* Use a list of global keys as suggested by cgsandford

* Pass in global_keys as an argument to append_metadata_cube so as to avoid redefining this list

* Changes as per cgsandford suggestions, tests confirmed key value cannot be left as empty string

* amend comments

* add unit test to check global attributes added to prefix cube, revert change made to 02_basic_mean.bats used for capturing output

* only use one cube in input list, compare global keys on input cube to prefix cube than rather than keys from global_keys_ref

* shorten line 223 length, travis build complaining about line length of 80 characters

* add check that key values are also the same in prefix and data cube

* Removed redundant conversion of cubelist to be a variable of type list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants