Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes for single-precision and switch to new FIELD_API #4

Merged
merged 11 commits into from
Nov 27, 2023

Conversation

awnawab
Copy link
Contributor

@awnawab awnawab commented Aug 28, 2023

This PR adds various fixes required to validate successfully in single-precision for Intel and nvhpc builds. Due to the need to run NEMO in double-precision even if ecWAM is single-precision, a mixed precision FIELD_API build was needed. Rather than adding this functionality to the old (private) FIELD_API on bitbucket, ecWAM has been switched over to the new FIELD_API which already supports mixed-precision builds. As this repo will soon be open sourced (see PR #1), I have made FIELD_API a hard dependency of ecWAM.

string(FIND ${parent_path} "../" result_var)

# If field_api is found but was not cloned by ecWAM, clone_field_api is set to FALSE
string(COMPARE EQUAL ${result_var} "-1" clone_field_api)
Copy link
Contributor Author

@awnawab awnawab Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we build ecWAM and run FetchContent on FIELD_API, the cache variable field_api_FOUND is set to TRUE. If we then run make clean and re-configure the build without deleting the CMake cache, field_api_FOUND will still be true but the FIELD_API builddir will be empty and the build will fail at compile-time when trying to use libfield_api.

If on the other hand field_api_FOUND is TRUE not because ecWAM cloned FIELD_API but because it is coming from a bundle or system installation, then the above mechanism ensures we don't run FetchContent on FIELD_API.

FetchContent_Declare(
field_api
GIT_REPOSITORY [email protected]:ecmwf-ifs/field_api.git
GIT_TAG naan-ecbuild
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be replaced by a tag once the new FIELD_API is open-sourced and PR #2 is approved. I will leave it as a draft-PR until then.

@awnawab awnawab requested a review from wdeconinck August 28, 2023 12:08
Copy link
Collaborator

@wdeconinck wdeconinck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as the CMake configuration is concerned I am happy with this.

The DEL1 change means a non-bit reproducible change. This probably invalidates existing hashes in tests.

- ecwam :
dir : $PWD
version : main
require : fiat

options :

- single-precision :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- single-precision :
- with-single-precision :

string(COMPARE EQUAL ${result_var} "-1" clone_field_api)
endif()

if( ${clone_field_api} )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if( ${clone_field_api} )
if( clone_field_api )

@wdeconinck wdeconinck requested a review from mlange05 August 29, 2023 07:47
@awnawab awnawab force-pushed the naan-single-precision-fix branch from 85f8315 to 961d8d9 Compare August 31, 2023 11:13
field_api
GIT_REPOSITORY [email protected]:ecmwf-ifs/field_api.git
GIT_TAG naan-ecbuild
OVERRIDE_FIND_PACKAGE
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires CMake version 3.24 i.e. cmake_minimum_required(VERSION 3.24). I guess this is fine, but I strongly recommend to adapt the CMake file. The error you get with 3.22 is extremely unclear.

Copy link
Contributor Author

@awnawab awnawab Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for pointing this out! FIELD_API itself requires CMake 3.25 for fixes to the cmake find_package(OpenACC), and given that this PR makes ecWAM hard reliant on FIELD_API, it makes sense to update the minimum CMake version in ecWAM to 3.25 too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the dependency on 3.25 is only due to the OpenACC fixes, I'd argue it would be better to simply add the following workaround and make that a less strict requirement:

https://github.com/ecmwf-ifs/ectrans/blob/e7a738749fa9f6e28cc1ecce35f1643700d2bb19/CMakeLists.txt#L38-L47

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OVERRIDE_FIND_PACKAGE already needs CMake 3.24 as Lukas pointed out. When reconfiguring after a make clean, OVERRIDE_FIND_PACKAGE makes the FetchContent mechanism a bit more robust. So the decision would be between 3.24 and 3.25. In that context, I think 3.25 isn't too restrictive. What are your thoughts on this @wdeconinck ?

Copy link
Collaborator

@wdeconinck wdeconinck Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 3.24 is my current preference, using @reuterbal 's suggestion to adapt FIELD_API, as Leonardo currently uses cmake 3.24 as latest version. Both LUMI and our own HPC have 3.25 however.

@awnawab awnawab force-pushed the naan-single-precision-fix branch from 9bfcd8a to ed50952 Compare November 9, 2023 10:55
@awnawab awnawab marked this pull request as ready for review November 9, 2023 15:52
@awnawab
Copy link
Contributor Author

awnawab commented Nov 10, 2023

Hi @wdeconinck. Now that FIELD_API has been open-sourced, this is ready for review. I've removed the +DEL1 to prevent divide by zero, so this PR won't affect bit-identicality. I get around the FPE by disabling divz checks for that file, and the run still validates correctly. I know it's a bit of a hack, but breaking bit-identicality just for NVHPC single-precision debug builds seems like overkill.

Copy link
Contributor

@mlange05 mlange05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FIELD API data structure changes look good to me and are consistent with the recent IFS update and the proposed release v0.2.1 (sync'd with Meteo-France recently). The cmake-fetch_content mechanism also looks sound to me, although wiser heads can probably judge this better. 😉

GTG from me, and very nicely done @awnawab 🎉 😄 👍 🙏

FetchContent_Declare(
field_api
GIT_REPOSITORY https://github.com/ecmwf-ifs/field_api.git
GIT_TAG v0.2.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[no action] Eventually we might want a mechanism to define this in the top-level CMakeLists.txt, but not needed for this PR, I think

Copy link
Collaborator

@wdeconinck wdeconinck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addressed all my previous concerns and recommendations.
Very nice work @awnawab . Thank you!
I have merged your other PR.
You can rebase and merge when you're ready.

@awnawab awnawab force-pushed the naan-single-precision-fix branch from 5ebd063 to 313d1db Compare November 17, 2023 14:56
@wdeconinck wdeconinck force-pushed the naan-single-precision-fix branch from 5efa295 to 6ee971c Compare November 23, 2023 10:30
@wdeconinck wdeconinck force-pushed the naan-single-precision-fix branch from 6ee971c to 86dd918 Compare November 27, 2023 09:48
@wdeconinck
Copy link
Collaborator

All green now :) You can merge when you're ready.

@awnawab awnawab merged commit 64f95e5 into main Nov 27, 2023
8 checks passed
@awnawab awnawab deleted the naan-single-precision-fix branch November 27, 2023 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants