Fix various pp issues related to running seaice_suite #721

Merged: 6 commits, Dec 18, 2024

@jtmims (Collaborator) commented Dec 16, 2024

Description
Several issues were found in the preprocessor while trying to run the seaice_suite on CM4.5 data. The issues resolved in this PR are as follows:

  1. The realm_regex would grab files from an adjacent ice pp dir. This was fixed by moving the setting of case_d.query into the class for each convention (i.e., CMIP, GFDL, and CESM).
  2. Add siconc to the GFDL data fieldlist.
  3. Create PercentTo01Function to handle conversion for the unit 0-1, as cfunits does not handle this.
  4. Add logic to conversion_factor to return 1 if the units have the same string value.
  5. Fix a file path issue in the POD.
  6. Fix issues caused by multiple coords per axis.
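
Items 3 and 4 can be illustrated with a minimal sketch. This is not the code added in the PR: the function names, the `SCALE` table, and the accepted unit spellings are assumptions made for illustration, and the sketch deliberately avoids cfunits so that it stays self-contained.

```python
# Hypothetical table: the size of one unit expressed as a fraction of the whole.
# The unit spellings here are assumptions, not the framework's fieldlist values.
SCALE = {"%": 0.01, "percent": 0.01, "0-1": 1.0}


def conversion_factor(source_unit: str, dest_unit: str) -> float:
    """Return the multiplier that maps values in source_unit onto dest_unit."""
    src, dst = source_unit.strip(), dest_unit.strip()
    if src == dst:
        # Identical unit strings: nothing to convert, so skip any lookup.
        return 1.0
    try:
        return SCALE[src] / SCALE[dst]
    except KeyError as exc:
        raise ValueError(f"no conversion from {src!r} to {dst!r}") from exc


def percent_to_fraction(values, source_unit="%", dest_unit="0-1"):
    """Rescale a sequence of values, e.g. sea ice concentration in percent."""
    factor = conversion_factor(source_unit, dest_unit)
    return [value * factor for value in values]


if __name__ == "__main__":
    print(percent_to_fraction([0.0, 50.0, 100.0]))
    print(conversion_factor("0-1", "0-1"))
```

The early return for identical strings means that a unit the lookup table does not know about still converts to itself without raising an error.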

How Has This Been Tested?
Please describe the tests that you ran to verify your changes in enough detail that
someone can reproduce them. Include any relevant details for your test configuration
such as the Python version, package versions, expected POD wallclock time, and the
operating system(s) you ran your tests on.

Checklist:

  • My branch is up-to-date with the NOAA-GFDL main branch, and all merge conflicts are resolved
  • The scripts are written in Python 3.12 or above (preferred; required if funded by a CPO grant), NCL, or R
  • All of my scripts are in the diagnostics/[POD short name] subdirectory, and include a main_driver script, template html, and settings.jsonc file
  • I have made corresponding changes to the documentation in the POD's doc/ subdirectory
  • I have requested that the framework developers add packages required by my POD to the python3, NCL, or R environment yaml file if necessary, and my environment builds with conda_env_setup.sh
  • I have added any necessary data to input_data/obs_data/[pod short name] and/or input_data/model/[pod short name]
  • My code is portable; it uses MDTF environment variables, and does not contain hard-coded file or directory paths
  • I have provided the code to generate digested data files from raw data files
  • Each digested data file generated by the script contains numerical data (no figures), and is 3 GB or less in size
  • I have included copies of the figures generated by the POD in the pull request
  • The repository contains no extra test scripts or data files

@jtmims added the bug (Something isn't working) and framework (Issue pertains to the framework code) labels Dec 16, 2024

    def set_query_base(self, var: varlist_util.VarlistEntry, path_regex: str):
        realm_regex = var.realm + '*'
        date_range = var.T.range

Code scanning / CodeQL warning: Variable defined multiple times
This assignment to 'date_range' is unnecessary as it is redefined before this value is used.

        standard_name = var.translation.standard_name
        if any(var.translation.alternate_standard_names):
            standard_name = [var.translation.standard_name] + var.translation.alternate_standard_names
        date_range = var.translation.T.range

Code scanning / CodeQL warning: Variable defined multiple times
This assignment to 'date_range' is unnecessary as it is redefined before this value is used.

            standard_name = [var.translation.standard_name] + var.translation.alternate_standard_names
        date_range = var.translation.T.range
        if var.is_static:
            date_range = None

Code scanning / CodeQL notice: Unused local variable
Variable date_range is not used.
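
The CodeQL findings above all concern date_range being assigned several times before it is read. One common way to address that kind of finding is to compute the value once in a small helper and return it. The sketch below shows the idea only; the page does not show how the alerts were actually resolved, and `select_date_range` and the `SimpleNamespace` stand-ins are hypothetical names, not part of the framework.

```python
from types import SimpleNamespace


def select_date_range(var):
    """Return the date range to query for, deciding once instead of reassigning.

    Assumed attributes (mirroring the excerpt above): var.is_static,
    var.T.range, and an optional var.translation with its own T.range.
    """
    if var.is_static:
        # Static (time-independent) variables have no date range.
        return None
    translation = getattr(var, "translation", None)
    if translation is not None:
        # Prefer the range carried by the translated variable.
        return translation.T.range
    return var.T.range


if __name__ == "__main__":
    # Stand-in objects; real VarlistEntry instances are not constructed here.
    native_axis = SimpleNamespace(range="1980-1999 (native)")
    translated_axis = SimpleNamespace(range="1980-1999 (translated)")
    var = SimpleNamespace(
        is_static=False,
        T=native_axis,
        translation=SimpleNamespace(T=translated_axis),
    )
    print(select_date_range(var))
```

Because each branch returns immediately, no assignment is overwritten before use, which is the pattern the "Variable defined multiple times" query flags.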
src/preprocessor.py: code scanning alert fixed
src/preprocessor.py: review thread outdated and resolved
src/data_sources.py: review thread outdated and resolved
@jtmims marked this pull request as ready for review December 17, 2024 21:33
@wrongkindofdoctor merged commit abc89d6 into NOAA-GFDL:main Dec 18, 2024
5 checks passed
@jtmims deleted the si branch December 18, 2024 19:35
@wrongkindofdoctor added a commit that referenced this pull request Dec 19, 2024
* Container (#678)

* Create Dockerfile

works with synthetic example_multicase POD

* Update Dockerfile

* Update Dockerfile

* Create docker-build-and-push.yml

* Update docker-build-and-push.yml

* Update docker-build-and-push.yml

* Update docker-build-and-push.yml

* Update docker-build-and-push.yml

* Container Documentation (#687)

* Create container_config_demo.jsonc

* Create container_cat.csv

* Create container_cat.json

* Update container_config_demo.jsonc

* docs

* Update ref_container.rst

* Update ref_container.rst

* Update ref_container.rst

* Update ref_container.rst

* Update ref_container.rst

* Update dev_start.rst

* Update ref_container.rst

* Update dev_start.rst

* Update ref_container.rst

* Update doc/sphinx/dev_start.rst

Co-authored-by: Jess

* Update doc/sphinx/ref_container.rst

Co-authored-by: Jess

* Update doc/sphinx/ref_container.rst

Co-authored-by: Jess

* Update doc/sphinx/ref_container.rst

Co-authored-by: Jess

* Update doc/sphinx/dev_start.rst

Co-authored-by: Jess

---------

Co-authored-by: Jess

* Fix ci bugs (#688)

* fix unresolved conda_root ref in pod_setup
comment out no_translation setting for matching POD and runtime conventions for testing

* fix coord_name def in translate_coord

* define var_id separately in pp query

* change new_coord definition to obtain ordered dict instead of generator object in translation.create_scalar_name so that deepcopy can pickle it

* change logic in pod_setup to set translation object to no_translation only if translate_data is false in runtime config file

* uncomment more set1 pods that pass initial testing in
house

* add checks for no_translation data source and assign query atts using the var object instead of the var.translation object if True to preprocessor

* remove old comment from preprocessor

* change value for hourly data search in datelabel get_timedelta_kwargs to return 1hr instead of hr so that the frequency for hourly data matches the required catalog specification

* comment out some set1 tests, since they are timing out on CI

* rename github actions test config files
split group 1 CI tests into 2 runs to avoid timeout issues

* update mdtf_tests.yml to reference new config file names and clean up deprecated calls

* update mdtf_tests.yml

* update matrix refs in mdtf_tests.yml

* revert changes to datelabel and move hr --> 1hr freq conversion to preprocessor

* delete old test files
just run 1 POD in set1 tests
try adding timeouts mdtf_tests.yml

* fix typo in timeout call in mdtf_tests

* fix GFDL entries in test catalogs

* fix varid entries for wvp in test catalogs

* change atmosphere_mass_content_of_water_vapor id from prw to wvp in gfdl field table

* comment out long_name check in translation.py

* define src_unit for coords if available in preprocessor.ConvertUnitsFunction
redefine dest_unit using var.units.units so that parm is a string instead of a Units.units object in call to units.convert_dataarray

* log warning instead of raising error if attr name doesn't match in xr_parser.compare_attr so that values can be converted later

* fix variable refs in xarray datasets in units.convert_dataarray
add check to convert mb to hPa to convert_dataarray

* fix frequency entries for static vars in test catalogs

* remove duplicate realm entries from stc_eddy_heat_fluxes settings file

* remove non alphanumeric chars from atts in xr_parser check_metadata

* comment out non-working PODs in set 3 tests

* Remove timeout lines and comment unused test tarballs in mdtf_tests.yml

* infer 'start_time' and 'end_time' from 'time_range' due to type issues (#691)

* infer 'start_time' and 'end_time' from 'time_range' due to type issues

* add warning

* fix ci issue

* move line setting date_range in query_catalog() (#693)

* move line setting date_range in query_catalog()

* cleanup print

* Remove modifier entry from areacello in trop_pac_sea_lev POD settings file

* Fix issues in pp query (#692)

* fix hr -> 1hr freq conversion in pp query
try using regex string contains standard_name in query

* add check for parameter type to xr_parser approximate_attribute_value

* remove regex from pp query standard_name

* add check that bounds is populated in cf.assessor, then check coord attrs and only run coord bounds check if bounds are not None in xr_parser

* add escape brackets to command-line commands (#694)

* Fix convective_transition_diag POD (#695)

* fix ctd file formatting and typos

* more formatting and typo fixes in ctd POD

* uncomment convective transition diag POD in 1a CI test config files

* try moving convective_transition_pod to ubuntu suite 2 tests

* add wkdir cleanup between each test run step and separate obs data fetching for set 1 tests in ci config file

* move convective_transition_diag POD to set 1b tests

* just run 1 POD in set 1a and 2 PODs in set 1b to avoid runner timeouts

* reorganize 1b tests

* add ua200-850 and va200-850 to gfdl-cmor-tables (#696)

* add ice/ocean precip entries to GFDL fieldlist (#697)

* Add alternate standard names entry to fieldlists and varlistEntry objects (#699)

* add alternate_standard_names entries to precipitation_flux vars in CMIP and GFDL fieldlists
add list of applicable realms to precipitation flux

* add alternate_standard_names attributes and property setters to DMDependentvariable class that is VarlistEntry parent class
define realm parm as string or list

* extend realm search in fieldlist lookup tables to use a realm list in the translation
add list to realm type hints in translation module

* extend standard_name query to list that includes alternate_standard_names if present in the translation object

* break up rainfall_flux and precipitation_flux entries in CMIP and GFDL field tables since translator can't parse realm list correctly

* revert realm type hints defined as string or list and casting realm strings to lists in translation module

* change assertion to log error if translation is None in varlist_util

* define new standard_name for pp xarray vars using the translation standard_name if the query standard name is a list with alternates instead of a string

* add function check_multichunk to fix issue with chunk_freqs (#701)

* add function check_multichunk to fix issue with chunk_freqs

* fix function comment

grammar grammar grammar

* move log warning

* add plots link to pod_error_snippet.html (#705)

* add plots link to pod_error_snippet.html

* remove empty line

* add variable table tool and put output into docs (#706)

* add variable table script to docs

* move file

* Delete tools/get_POD_varname/MDTF_Variable_Lists.html

* rework ref_vartable.rst to link directly to html file of the table (#707)

* rework ref_vartable.rst to link directly to html file of the table

* Delete doc/sphinx/MDTF_Variable_Lists.html

* Update MDTF_Variable_Lists.html

* remove example_pp_script.py from user_pp_scripts list in multirun_config_template.jsonc

* remove .nc files found in OUTPUT_DIR depending on config file (#710)

* fix formatting issues in output reference documentation (#711)

* fix forcing_feedback settings.jsonc formatting and remove extra freq entries

* Add check for user_pp_scripts attribute in config object to DaskMultifilePP init method

* add check for user_pp-scripts attr to execute_pp_functions

* update 'standard_name' for each var in write_pp_catalog (#713)

* Update docs about --env_dir flag (#715)

* Update README.md

* Update start_install.rst

* fix logic when defining log messages in pod_setup

* Fix dummy translation method in NoTranslationFieldlist (#717)

* define missing entries in dummy translation object returned by NoTranslationFieldlist.translate
add logic to determine alternate_standard_names attribute to NoTranslationFieldlist.translate

* set translate_data to false for testing

* edit logging message for no translation setting in pod_setup

* add todo to translation translate_coord and cleanup comments

* remove checks for no_translation from preprocessor

* define TranslatedVarlistEntry name attribute using data convention field table variable id

* revert debugging changes from test config file

* update docs for translate_data flag in the runtime config file

* fix variable_id and var_id refs in dummy translate method

* Reimplement crop date range capability (#718)

* add placeholder functions for date range cropping

* refine crop_date_range function. Need to figure out how to pass calendar from subset df

* continue reworking crop_date_range

* revert changes to check_group_daterange, and add check that input files overlap start and end times
add option aggregate=false to to_dataset_dict call
look into replacing check_time_bounds with crop date range call before the xarray merge

* reorder crop_date_range call
add calls to parse xr time coord and define start and end times for dataset

* finalize logic in crop_date_range

* remove start_time and end_time from, and add time_range column to catalog generated by define_pp_catalog_assets

* replace start_time and end_time entries with time_range entries populated from information in processed xarray dataset in write_pp_catalog

* remove unused dask import from preprocessor

* replace hard coded time dimension name with var.T.name in call to xarray concatenate

* add check_time_bounds call back to query and fix definitions for modified start and end points so that they use the dataset information

* fix hour, min, sec defs in crop_date_range for new start and end times

* strip non-numeric chars from strings passed to _coerce_to_datetime

* add logic to define start and end points for situation where desired date range is contained by xarray dataset to crop_date_range

* Create drop attributes func (#720)

* fix forcing_feedback settings formatting

* add check for user_pp_scripts attribute before looping through list to multifilepreprocessor add_user_pp_scripts method

* add snakeviz to env_dev.yml

* move drop_atts loop to a separate function that is called by crop_date_range and before merging xarray datasets in query_catalog in the preprocessor

* Update mdtf dev env file (#722)

* add snakeviz, gprof2dot, and intake-esgf packages to env_dev file

* add viztracer to dev environment file

* add kerchunk package to dev environment

* Fix various pp issues related to running seaice_suite (#721)

* fix pp issues for seaice_suite

* fix arg issue

* rename functions

* add default return for conversion function

---------

Co-authored-by: Aparna Radhakrishnan
Co-authored-by: Jess