Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

b4b-dev: Plumber2 Implementation #2406

Open
wants to merge 31 commits into
base: b4b-dev
Choose a base branch
from

Conversation

TeaganKing
Copy link
Contributor

@TeaganKing TeaganKing commented Mar 7, 2024

This PR will use the TowerSite class (parent to both NEON and PLUBMER sites) and create a Plumber2Site class as well as implement capabilities for running single point simulations at PLUMBER sites.

Contributors other than yourself, if any:
@ekluzek
@adrifoster

CTSM Issues Fixed (include github issue #):
Addresses part of #1487

Are answers expected to change (and if so in what way)?
No, this PR should be BFB. However, it should expand existing capabilities to allow users to run at PLUMBER sites.

Any User Interface Changes (namelist or namelist defaults changes)?
Additional flags will be implemented for run_neon

@TeaganKing
Copy link
Contributor Author

This PR will also address #2186

@TeaganKing
Copy link
Contributor Author

We'll also need to add the new CDEPS tag to this PR.

@TeaganKing
Copy link
Contributor Author

Per conversation with Erik, we'll move this to bfb once the other Plumber-related PRs are in bfb next week.

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Jun 25, 2024

To Do for this PR:

  • Update branch / resolve merge conflicts
  • Add new CDEPS tag to this PR once the CDEPS PR is merged.
  • Bring in CTSM PLUMBER5.2 PR
  • Check for any sort of equivalent of user_version for plumber
  • Include any plumber-specific user namelist lines
  • Update check_plumber_data
  • Determine valid sites based on tower_type?
  • Define available_plumber_list
  • Be able to run NEON & PLUMBER simultaneously (if both are same run type)
  • Provide option for None for both NEON and PLUMBER cases so they can be run indepdently
  • Ensure that we don't use a NEON base case for PLUMBER or vice versa
  • Update run_type defaults in arg parse
  • Resolve unknown case.run error for PLUMBER given need for setup/build
  • Test out various tower options/arguments
  • Update tests
  • Update documentation

@slevis-lmwg slevis-lmwg mentioned this pull request Jul 2, 2024
@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 6, 2024

We don't currently have restart files for PLUMBER2, which has brought up the question: to what extent do we want to provide support for PLUMBER tower sites? Two options that @wwieder and I discussed this morning are described below:

  1. Users spin up their own AD & postAD cases and then can run a transient case (no restart files needed). This is somewhat contrary to the purpose of run_neon because it requires users to run case setup, build, submit, and archive.
  2. My vote would be that we provide (and update upon changes to model versions) restart files and allow users to run a transient case out of the box (this is how NEON tower sites work, and consistency with run_tower would be nice as well as ease of use, but there is more overhead to maintain the feature). This would be a slightly easier implementation given that run_tower currently expects a transient case (and automatically submits case.run, which is not automatically created for the ad run but is automatically created for transient cases).

@danicalombardozzi @slevis-lmwg @olyson let me know if you have thoughts on this. We were planning to chat on Tuesday at 9am but it looks like there is a NCAR-NEON-Community partnership meeting that will be taking that time slot and I'd ideally like to make a decision before August 20th; also happy to tag up an additional time if that's easier than a GitHub discussion.

@olyson
Copy link
Contributor

olyson commented Aug 6, 2024

My guess is that most users will want to change the model and will likely have to do their own spinups anyway. So I might lean toward not providing initial files if it requires significant work to generate, keep track of, and update those files.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Aug 7, 2024 via email

@TeaganKing
Copy link
Contributor Author

Ok, thank you both for this feedback!

@wwieder
Copy link
Contributor

wwieder commented Aug 7, 2024

Thanks for weighing in here. Teagan since Dave and Keith came to the same conclusion we did yesterday lets move forward with the plan to support 'spinup' and 'transient' PLUMBER2 cases, without providing initial conditions.

It might be nice to differentiate how we do this for SP (fast spinup, ~20 or 40 years) vs. BGC (more involved, like NEON) cases. Would that be difficult to integrate into the run_towers workflow?

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 7, 2024

That sounds good. I have successfully run the AD & postAD case (with a bit of hard-coding), so this will require the following updates, which should be feasible:

  • finidat file name updates for each site in user_nl_clm (expecting <SITE>.ad.clm2.r.YYYY-MM-DD-00000.nc files but we have for instance *50400.nc files)
  • Update run_type defaults in run_case in both tower_site.py and plumber_site.py such that the AD case is the default (instead of the transient default that's set in neon_site.py).

@wwieder To clarify, regarding the option for SP vs BGC, are you envisioning a flag that allows users to pick between these options?

@wwieder
Copy link
Contributor

wwieder commented Aug 7, 2024

A few comments:

  • I don't know that we need to "automatically" run postAD and transient runs, as I think it's a good idea for users to have to check the status of their AD and postAD cases to check that they meet steady state criteria (e.g., Alaska NEON sites take significantly longer to spin up than CONUS NEON sites).
  • What I like about the current run_neon implementation is that running a postAD cases automatically creates a branch that looks for an similarly names AD case and then using the --run from postAD flag also creates a transient case uses the off the postAD ref case and initial conditions. Both of these options just take the last restart files that are available (regardless of simulation date).
  • Regarding SP vs. BGC cases, yes a flag that allows users to chose between these two options would be excellent. I'm not sure about the best way to implement this and don't want to introduce scope creep / complexity to this work but having the option to set up SP vs. BGC cases would be very useful for PLUMBER2 (and even NEON) cases.

Happy to have a quick that if it's helpful.

@TeaganKing
Copy link
Contributor Author

Ok, in that case, I'll just plan to change the default for PLUMBER to AD and add some documentation on this anticipated workflow for users. Thanks for clarifying!

Why don't I plan to get that first part implemented and then we can talk more about the BGC vs. SP options.

@danicalombardozzi
Copy link
Contributor

danicalombardozzi commented Aug 7, 2024 via email

@TeaganKing
Copy link
Contributor Author

TeaganKing commented Aug 20, 2024

This is about ready to go, but once #2485 is in, I'll run the unit and system tests once more and perform a few final tests to make sure the various run_tower flags work as expected.

@TeaganKing TeaganKing marked this pull request as ready for review August 23, 2024 19:49
@TeaganKing TeaganKing changed the title [WIP] Plumber2 Implementation Plumber2 Implementation Aug 23, 2024
Copy link
Contributor

@wwieder wwieder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks @TeaganKing. My only bigger comment is that by default PLUMBER is typically run in SP mode. I think it's out of scope in this PR to add this additional functionality, but it's something we may want to think about down the road to define compsets with an --SP or --BGC flag

python/ctsm/site_and_regional/plumber_site.py Outdated Show resolved Hide resolved
python/ctsm/site_and_regional/plumber_site.py Show resolved Hide resolved
python/ctsm/site_and_regional/plumber_site.py Outdated Show resolved Hide resolved
python/ctsm/site_and_regional/run_tower.py Outdated Show resolved Hide resolved
for site_name in valid_plumber_sites:

# start_year and end_year are set in shell commands, so these get overwritten
start_year = 2018
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? I'll let you investigate.

Copy link
Contributor Author

@TeaganKing TeaganKing Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed these start and and years to be very clearly dummy years. I think this is necessary because the PlumberSite class must be implemented with a start/end time. However, this is changed later in the shell commands and is not a known parameter at the time of the object instantiation.

That said, if someone has a better suggestion for implementation, I'm all ears!

python/ctsm/site_and_regional/tower_site.py Show resolved Hide resolved


if __name__ == "__main__":
unit_testing.setup_for_tests()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding more testing. Will defer to Erik on extent and timing of tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add some detail, this system test now takes 30 minutes (each case setup takes about ten minutes). We currently test:

  • setup of a NEON site (BART) with a specified experiment and output root
  • setup of a NEON site (ABBY) AD case and specified output root
  • setup of a PLUMBER site (AR-SLu) with a specified experiment and output root

If we were going to remove one of these for the sake of time, I would probably remove the ABBY case since it is a secondary NEON case and PLUMBER is also going to run AD as default behavior. This does however test that non-default run types are working, which is useful.

python/ctsm/test/test_unit_run_tower.py Show resolved Hide resolved
@samsrabin samsrabin added enhancement new capability or improved behavior of existing capability science Enhancement to or bug impacting science bfb bit-for-bit labels Aug 26, 2024
@TeaganKing
Copy link
Contributor Author

Hi @ekluzek ,

I have now addressed the comments from @wwieder 's review. Two outstanding issues that Will and I discussed are the following (more details in comments above):

  • The Plumber2Site class is called in a somewhat clunky way with dummy years that get overwritten. I'm not sure of a better way to do this, and it works, but let me know if you have alternative suggestions?
  • Running system tests takes a while. There are of course benefits to doing more testing (as I have done here), but the time the tests take may be on the excessive side. Your feedback on whether these tests are worth it would be helpful!

And a software-focused review of the code would be helpful now that Will reviewed some of the functionality pieces.

Before merging (and after @ekluzek 's review), I will be sure to run the following tests one last time (I ran them before Will's review but there are changes that we should test again before merging):

  • run ./run_tower --plumber-sites with various options as a final check
  • make all
  • run ctsm system tests with ./python/run_ctsm_py_tests -s

Lastly, note that there are some related outstanding issues that have been documented that are outside the scope of this PR.

@wwieder
Copy link
Contributor

wwieder commented Sep 3, 2024

It seems like this PR also fixes #2739

Copy link
Collaborator

@ekluzek ekluzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @TeaganKing thanks for all your work on this. So sorry it's been so long. I know your head isn't into this anymore. This is something that @wwieder wants me to work on, and I will.

I mostly suggest making some changes to make this Object Oriented. Those changes will be worth getting in as it will make it more flexible, extensible and maintainable. With the things needed for NEON isolated in NEON modules and things needed for PLUMBER2 isolated in it's own module as well. It'll also make it easier for these things to evolve separately as well as bring in some new class for something like AmeriFlux or a class to users handle their own list of generic tower sites. Or whatever the future needs are dreamed up...

I know you don't have time to do coding, but if possible and you are up for it, I would like to talk design with you. Would you be up for that?

Also @adrifoster this seems like a good thing to bring you in on as well. Could we maybe meet Thursday when I come in?

@@ -1,5 +1,5 @@
"""
This module contains the NeonSite class and class functions which are used in run_neon.py
This module contains the NeonSite class and class functions which are used in run_tower.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This module contains the NeonSite class and class functions which are used in run_tower.py
This module contains the NeonSite class and class functions which extend the tower_site class for
things that are specific just for NEON sites.

@@ -0,0 +1,115 @@
"""
This module contains the Plumber2Site class and class functions which are used in run_tower.py
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This module contains the Plumber2Site class and class functions which are used in run_tower.py
This module contains the Plumber2Site class and class functions that extend the tower_site class for
things that are specific just for PLUMBER2 sites.

run_type: str, opt
transient, post_ad, or ad case, default ad
(ad case is default because PLUMBER requires spinup)
prism: bool, opt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we remove the prism option for PLUMBER2 right? You might have to include it in the interface, but we then should check to make sure it's False.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to handle this would be to have PRISM only within the NEON class and only able to be set there.

base_case_root,
run_from_postad,
setup_only,
no_batch,
rerun,
user_version,
) = get_parser(sys.argv, description, valid_neon_sites)
) = get_parser(sys.argv, description, valid_neon_sites, valid_plumber_sites)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than returning a list for both valid plumber sites and neon sites maybe it should be more generic of valid sites?

@@ -1,12 +1,12 @@
#!/usr/bin/env python3
"""
This is a just top-level skeleton script that calls
run_neon.py.
The original code (run_neon.py) is located under
run_tower.py.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should either rename this to

run_plumber

OR remove run_neon and only have run_tower which can run both NEON or PLUMBER sites. And then can be extended to work with generic tower sites, or another set of data similar to NEON or PLUMBER.

available_plumber_list = setup_plumber_data(valid_plumber_sites)

# -- Looping over plumber sites
if plumber_site_list:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if we have a generic list we don't have to know if it's plumber or neon (or something else).

@@ -18,7 +18,7 @@
from CIME.utils import setup_standard_logging_options


def get_parser(args, description, valid_neon_sites):
def get_parser(args, description, valid_neon_sites, valid_plumber_sites):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again remove returning both site lists and just have a general list of sites.

@@ -185,31 +185,36 @@ def get_parser(args, description, valid_neon_sites):

args = parse_args_and_handle_standard_logging_options(args, parser)

if "all" in args.neon_sites:
neon_sites = valid_neon_sites
if args.neon_sites:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be lifted to the neon specific implementation.

elif args.run_type == "postad":
run_length = "100Y"
neon_sites = None
if args.plumber_sites:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise this would go to the PLUMBER2 specific implementation.

@@ -0,0 +1,121 @@
#!/usr/bin/env python3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having this do both NEON and PLUMBER2 I suggest there should be a NEON and PLUMBER2 specific versions that just do their own things rather than both.

Doing it this way is a more OO programming pattern and will help extensibility and keeping NEON and PLUMBER2 logic separated. As such will be easier to maintain.

@TeaganKing
Copy link
Contributor Author

Hi @ekluzek , happy to tag up to chat about this (and share some challenges that resulted in me implementing this a certain way). My calendar is up to date and lists the days that I'm on site-- feel free to just schedule something on there with me at some point when we're both in person! Or we can meet remotely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bfb bit-for-bit enhancement new capability or improved behavior of existing capability PR status: awaiting review Work on this PR is paused while waiting for review. science Enhancement to or bug impacting science
Projects
Status: Stalled (needs review, blocked etc.)
Status: In Progress
Development

Successfully merging this pull request may close these issues.

run_neon not working as expected for AD simulations
7 participants