New unintuitive behavior for RUN_TYPE=branch simulations #525

Open
ekluzek opened this issue Jan 17, 2025 · 3 comments · May be fixed by #528
Labels
bug Something isn't working

Comments

@ekluzek
Collaborator

ekluzek commented Jan 17, 2025

With the rpointer updates, a RUN_TYPE=branch simulation now requires setting DRV_RESTART_POINTER in addition to the other branch settings. The default of drv.rpointer will most likely be wrong when branching from a case whose rpointer files have timestamps in their names. Coupled with #524, this means you can set up a case and get no clear error message about what's wrong.

Here's a sample case that reproduces the problem, based on ctsm5.3.020, which has cmeps1.0.33, ccs_config_cesm1.0.16, and cime6.1.59. (I'm using an mpi-serial single-gridcell case to keep the test small and simple, and so that it can be run interactively without going into the queue.)

# for cshell (see the use of cshell set below)
cd cime/scripts
# First the control case to branch from:
./create_newcase --res 1x1_brazil --compset I2000Clm60SpRs --machine derecho --case teststartup --mpilib mpi-serial --run-unsupported
cd teststartup
# Turn DEBUG compiling on
./xmlchange DEBUG=TRUE
./case.setup
./case.build
./case.submit --no-batch
# Save the ARCHIVE directory
set DOUT_S_ROOT_BRANCHFROM=`./xmlquery --value DOUT_S_ROOT`
cd ..
# Now the branch case after the first one completes and saves the restart files to the archive directory
./create_clone --clone teststartup --case testbranch --keepexe
cd testbranch
set REFDATE=2000-01-06
set REFTOD=00000
./xmlchange RUN_REFCASE=teststartup,RUN_REFDATE=$REFDATE,RUN_TYPE=branch,RUN_STARTDATE=$REFDATE
./case.setup
# Copy the restart files over to the run directory
set RUNDIR=`./xmlquery --value RUNDIR`
cp $DOUT_S_ROOT_BRANCHFROM/rest/${REFDATE}-${REFTOD}/* $RUNDIR
./case.build
./case.submit --no-batch

It fails at runtime because it can't find the drv.rpointer file, but as noted above, the error messaging is insufficient. I'll add a list of ideas in the next comment.
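
As a possible workaround for the branch case above, the value to pass to DRV_RESTART_POINTER can be looked up from the files copied into $RUNDIR. The following is only a sketch: the helper name find_restart_pointer is hypothetical (it is not part of CIME or CMEPS), and it assumes the timestamped pointer file is named rpointer.cpl.<date>-<tod>, optionally with an instance suffix before the timestamp.

```python
from pathlib import Path


def find_restart_pointer(rundir, refdate, reftod="00000", prefix="rpointer.cpl"):
    """Return the name of the timestamped driver pointer file in rundir.

    Hypothetical helper, not a CIME/CMEPS function. Assumes the file is
    named <prefix>[<instance suffix>].<YYYY-MM-DD>-<SSSSS>.
    Returns None when no matching file exists.
    """
    stamp = f"{refdate}-{reftod}"
    # "*" also matches the empty string, so the plain single-instance
    # name (rpointer.cpl.<stamp>) is found as well.
    matches = sorted(p.name for p in Path(rundir).glob(f"{prefix}*.{stamp}"))
    return matches[0] if matches else None


def xmlchange_command(pointer_name):
    """Build the xmlchange call that sets DRV_RESTART_POINTER."""
    return f"./xmlchange DRV_RESTART_POINTER={pointer_name}"
```

If a file is found, running the command returned by xmlchange_command from the testbranch case directory before ./case.submit should point the driver at the copied pointer file. If nothing is found, the restart files were probably not copied or use a different naming scheme.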

Pinging maintainers. I know Jim will want to weigh in on this, but I figure he might still be traveling, so he might not see this for a while.

@briandobbins @billsacks @jedwards4b @fischer-ncar

@ekluzek ekluzek added the bug Something isn't working label Jan 17, 2025
@billsacks
Member

Thank you @ekluzek for laying this out clearly. I agree that we should fix this in some way.

@ekluzek
Collaborator Author

ekluzek commented Jan 18, 2025

A list of ideas I have:

  1. For RUN_TYPE==branch, check for the existence of the $DRV_RESTART_POINTER file in $RUNDIR at preview_namelists time (but after the data staging phase), and abort if it's not found
  2. For RUN_TYPE==branch, set the default of DRV_RESTART_POINTER to drv.rpointer.$RUN_STARTDATE-$RUN_STARTTOD
  3. Abort in preview_namelists if RUN_TYPE is branch and $RUN_REFDATE is set, but DRV_RESTART_POINTER isn't
  4. For branch cases, have the default of DRV_RESTART_POINTER be UNSET, and abort in preview_namelists if it is still UNSET
  5. Same as 4, but for all cases
  6. Maybe just always set the default to rpointer.cpl.$RUN_STARTDATE-$RUN_STARTTOD?

I think most of the above should be done all at once. I'd like to hear what others think about setting it to UNSET. I'm actually thinking that 6 may be a nice, simple solution that gets the naive simple case to work. But I'd also like some file existence checking in place when the file is going to be used, so also do 1.
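
To make the combination of ideas 1, 4, and 6 concrete, here is a rough sketch of what such a check might look like. It is not the CIME or CMEPS implementation: the function names, the way the values are passed in, and the error messages are all assumptions made for illustration.

```python
import os


def default_restart_pointer(startdate, starttod="00000"):
    """Idea 6: derive the default pointer name from the start date and time of day.

    Hypothetical helper; mirrors rpointer.cpl.$RUN_STARTDATE-$RUN_STARTTOD.
    """
    return f"rpointer.cpl.{startdate}-{starttod}"


def check_restart_pointer(run_type, rundir, pointer):
    """Ideas 1 and 4: fail early with a clear message for branch runs.

    Hypothetical check meant to run after data staging. Does nothing for
    run types other than "branch".
    """
    if run_type != "branch":
        return
    # Idea 4: an unset pointer is an error for branch cases.
    if not pointer or pointer == "UNSET":
        raise ValueError(
            "RUN_TYPE=branch requires DRV_RESTART_POINTER to be set"
        )
    # Idea 1: the pointer file must already be in the run directory.
    path = os.path.join(rundir, pointer)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"RUN_TYPE=branch: DRV_RESTART_POINTER file not found: {path}"
        )
```

With something like this in place, the sample case above would stop before submission with a message naming the missing file, rather than failing at runtime.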

@ekluzek
Collaborator Author

ekluzek commented Jan 18, 2025

I made an error in the setting of RUN_REFDATE in the original post, which I've just corrected.

ekluzek added a commit to ekluzek/CMEPS that referenced this issue Jan 18, 2025
… allow continue_run as well), add it as a file that will be put into cpl.input_data_list for existence checking, fixing ESCOMP#525
2 participants