From b1d5b6a34c3df4754e5faa09adf48290f51f32a6 Mon Sep 17 00:00:00 2001 From: bshifaw Date: Fri, 28 Apr 2023 11:44:43 -0400 Subject: [PATCH] Release edits for cromshell2 (#255) * Added short option for --cromwell-url * changed cromshell2 initiation from `cromshell-beta` to `cromshell`, minor edits to README.md * Added test badges * removed "Notify" command description for README.md * rebased cromshell sh script using master branch, moved cromshell sh script to legacy_cromshell folder * moved completions folder to legacy_cromshell folder --------- Co-authored-by: bshifaw --- LICENSE | 2 +- README.md | 68 +++-- legacy_cromshell/README.md | 130 +++++++++ .../completions}/_cromshell | 5 +- legacy_cromshell/completions/reload.sh | 8 + cromshell => legacy_cromshell/cromshell | 246 +++++++++++++++++- setup.py | 4 +- src/cromshell/__main__.py | 3 +- 8 files changed, 412 insertions(+), 54 deletions(-) create mode 100644 legacy_cromshell/README.md rename {completions => legacy_cromshell/completions}/_cromshell (93%) create mode 100644 legacy_cromshell/completions/reload.sh rename cromshell => legacy_cromshell/cromshell (86%) diff --git a/LICENSE b/LICENSE index 0388433c..dcdb29b0 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ BSD 3-Clause License -Copyright (c) 2018, Broad Institute +Copyright (c) 2023, Broad Institute All rights reserved. Redistribution and use in source and binary forms, with or without diff --git a/README.md b/README.md index c3b49a1f..30cbe72e 100644 --- a/README.md +++ b/README.md @@ -7,36 +7,32 @@ "" "" "" "" "" "" ``` -# cromshell - A CLI for submitting workflows to a cromwell server and monitoring / querying their results. +# Cromshell +[![GitHub version](https://badge.fury.io/gh/broadinstitute%2Fcromshell.svg)](https://badge.fury.io/gh/broadinstitute%2Fcromshell) +[![Integration Test Workflow](https://github.com/broadinstitute/cromshell/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/broadinstitute/cromshell/actions/workflows/integration_tests.yml/badge.svg) +[![Unit Test Workflow](https://github.com/broadinstitute/cromshell/actions/workflows/unit_tests.yml/badge.svg)](https://github.com/broadinstitute/cromshell/actions/workflows/unit_tests.yml/badge.svg) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -Current version: 2.0.0.beta - -Cromshell 2 is the next step in the evolution Cromshell. It offers many of the same -functions as Cromshell 1 but has been rebuilt in python with many added benefits such as: -- Automatically zip dependencies when submitting a workflow. -- Added integration and unit tests to insure reliability -- Multiple ways of installation (source, brew tap, and pypi). -- Modular architecture making adding new functionalities easier. -- Developer documentation showing how to add new features and commands to the tool. +Cromshell is a CLI for submitting workflows to a Cromwell server and monitoring/querying their results. ## Examples: ``` - cromshell-beta submit workflow.wdl inputs.json options.json dependencies.zip - cromshell-beta status - cromshell-beta -t 20 metadata - cromshell-beta logs -2 + cromshell submit workflow.wdl inputs.json options.json dependencies.zip + cromshell status + cromshell -t 20 metadata + cromshell logs -2 ``` ## Supported Options: * `--no_turtle` or `--I_hate_turtles` * Hide turtle logo * `--cromwell_url [TEXT]` - * Specify Cromwell URL used. + * Specifies Cromwell URL used. * `TEXT` Example: `http://65.61.654.8:8000` + * Note: Depending on your Cromwell server configuration, you may not need to specify the port. * `-t [TIMEOUT]` - * Specify the server connection timeout in seconds. + * Specifies the server connection timeout in seconds. * Default is 5 sec. * `TIMEOUT` must be a positive integer. * `--gcloud_token_email [TEXT]` @@ -51,8 +47,8 @@ functions as Cromshell 1 but has been rebuilt in python with many added benefits #### Start/Stop workflows * `submit [-w] [options_json] [included_wdl_zip_file]` - * Will automatically validate the WDL and JSON file. - * Submit a new workflow. + * Automatically validates the WDL and JSON file. + * Submit a new workflow to the Cromwell server. * *`-w`* [COMING SOON] Wait for workflow to transition from 'Submitted' to some other status before ${SCRIPTNAME} exits. * *`included_wdl_zip_file`* Zip file containing any WDL files included in the input WDL * `abort [workflow-id] [[workflow-id]...]` @@ -60,7 +56,7 @@ functions as Cromshell 1 but has been rebuilt in python with many added benefits #### Workflow information: * `alias ` * Label the given workflow ID with the given alias_name. Aliases can be used in place of workflow IDs to reference jobs. - * Remove alias by passing empty double quotes as `alias_name` (e.g. `alias ""`) + * Remove an alias by passing empty double quotes as `alias_name` (e.g. `alias ""`) #### Query workflow status: * `status [workflow-id] [[workflow-id]...]` * Check the status of a workflow. @@ -72,8 +68,7 @@ functions as Cromshell 1 but has been rebuilt in python with many added benefits * Get the summarized status of all jobs in the workflow. * `-j` prints a JSON instead of a pretty summary of the execution status (compresses subworkflows) * `-x` compress sub-workflows for less detailed summarization - * `timing` *`[workflow-id] [[workflow-id]...]`* - + * `timing` *`[workflow-id] [[workflow-id]...]`* * Open the timing diagram in a browser. #### Logs @@ -83,26 +78,22 @@ functions as Cromshell 1 but has been rebuilt in python with many added benefits * Download all logs produced by a workflow. #### Job Outputs - * [COMING SOON] `list-outputs [workflow-id] [[workflow-id]...]` + * `list-outputs [workflow-id] [[workflow-id]...]` * List all output files produced by a workflow. * [COMING SOON] `fetch-all [workflow-id] [[workflow-id]...]` * Download all output files produced by a workflow. - - #### Get email notification on job completion - * [COMING SOON] `notify [workflow-id] [daemon-server] email [cromwell-server]` - * *`daemon-server`* server to run the notification daemon on #### Display a list jobs submitted through cromshell * `list [-c] [-u]` - * `-c` Color the output by completion status - * `-u` Check completion status of all unfinished jobs + * `-c` Color the output by completion status. + * `-u` Check completion status of all unfinished jobs. #### Clean up local cached list * [COMING SOON] `cleanup [-s STATUS]` * Remove completed jobs from local list. - Will remove all jobs from the local list that are in a completed state, + This command removes all jobs from the local list that are in a completed state, where a completed state is one of: `Succeeded`, `Failed`, `Aborted` - * *`-s STATUS`* If provided, will only remove jobs with the given `STATUS` from the local list. + * *`-s [STATUS]`* If provided, will only remove jobs with the given `[STATUS]` from the local list. #### Update cromwell server * `update-server` @@ -127,9 +118,9 @@ functions as Cromshell 1 but has been rebuilt in python with many added benefits You may omit the job ID of the last job submitted when running commands, or use negative numbers to reference previous jobs, e.g. "-1" will track the last job, "-2" will track the one before that, and so on. * You can override the default cromwell server by setting the argument `--cromwell_url` to the appropriate URL. * You can override the default cromshell configuration folder by setting the environmental variable `CROMSHELL_CONFIG` to the appropriate directory. - * Most commands takes multiple workflow-ids, which you *can specify both in relative and absolute ID value* (i.e. `./cromshell status -- -1 -2 -3 c2db2989-2e09-4f2c-8a7f-c3733ae5ba7b`). - * Assign aliases to workflow ids using the alias command (i.e. `./cromshell alias -- -1 myAliasName`). - * Once the Alias command is used to attach an alias to a workflow id, the alias name can be used instead of the id (i.e. `./cromshell status myAliasName`). + * Most commands takes multiple workflow-ids, which you *can specify both in relative and absolute ID value* (i.e. `cromshell status -- -1 -2 -3 c2db2989-2e09-4f2c-8a7f-c3733ae5ba7b`). + * Assign aliases to workflow ids using the alias command (i.e. `cromshell alias -- -1 myAliasName`). + Once the Alias command is used to attach an alias to a workflow id, the alias name can be used instead of the id (i.e. `cromshell status myAliasName`). ## Installation From brew @@ -145,10 +136,9 @@ From source git clone git@github.com:broadinstitute/cromshell.git cd cromshell - git checkout cromshell_2.0 pip install . - cromshell-beta --help + cromshell --help ## Uninstallation From brew @@ -157,10 +147,14 @@ From brew From pypi/source - pip uninstall cromshell-beta + pip uninstall cromshell ## Development See the [Developer Docs](./developer_docs/) +## Legacy Cromshell +The original Cromshell shell script is still available in the legacy_cromshell folder and in the `cromshell1` branch of this repository. +It is no longer maintained, but is still available for use. The original Cromshell contains some commands not yet available in Cromshell2, +such as `fetch-logs`, `fetch-all`, `notify`, and `cleanup`. These commands will be added to Cromshell2 in the future. diff --git a/legacy_cromshell/README.md b/legacy_cromshell/README.md new file mode 100644 index 00000000..e819daea --- /dev/null +++ b/legacy_cromshell/README.md @@ -0,0 +1,130 @@ +# Legacy Cromshell + +The legacy cromshell folder contains the original cromshell script. It is no longer maintained and is only kept for historical purposes. + +--- + +``` + __ __ + .,-;-;-,. /'_\ +-----------------------------------------------+ /_'\.,-;-;-,. + _/_/_/_|_\_\) / | CROMSHELL : run Cromwell jobs from the shell | \ (/_/__|_\_\_ + '-<_><_><_><_>=/\ +-----------------------------------------------+ /\=<_><_><_><_>-' + `/_/====/_/-'\_\ /_/'-\_\====\_\' + "" "" "" "" "" "" +``` + +# Installation + +`cromshell` and its dependencies can be installed on OSX with `brew install broadinstitute/dsp/cromshell` + +or through [bioconda](https://bioconda.github.io/) with `conda install cromshell` + +Alternatively, download the script and put it somewhere... + +# cromshell + A script for submitting workflows to a cromwell server and monitoring / querying their results. + +requires `column`, `curl`, `mail`, and [jq](https://stedolan.github.io/jq/) + +### Examples: + +``` + cromshell submit workflow.wdl inputs.json options.json dependencies.zip + cromshell status + cromshell -t 20 metadata + cromshell logs -2 +``` + +### Supported Flags: + * `-t` `TIMEOUT` + * Set the curl connect timeout to `TIMEOUT` seconds. + * Also sets the curl max timeout to `2*TIMEOUT` seconds. + * `TIMEOUT` must be an integer. + +### Supported Subcommands: + + + #### Start/Stop workflows + * `submit` `[-w]` *``* *``* `[options_json]` `[included_wdl_zip_file]` + * Submit a new workflow. + * Will automatically validate the WDL and JSON file if `womtool` is in your path. + * To add `womtool` to your path, install `cromwell` with brew: + * `brew install cromwell` + * *`-w`* Wait for workflow to transition from 'Submitted' to some other status + before ${SCRIPTNAME} exits + * *`included_wdl_zip_file`* Zip file containing any WDL files included in the input WDL + * `abort` *`[workflow-id] [[workflow-id]...]`* + * Abort a running workflow. + #### Workflow information: + * `alias` *`` ``* + * Label the given workflow ID with the given alias_name. Aliases can be used in place of workflow IDs to reference jobs. + #### Query workflow status: + * `status` *`[workflow-id] [[workflow-id]...]`* + * Check the status of a workflow. + * `metadata` *`[workflow-id] [[workflow-id]...]`* + * Get the full metadata of a workflow. + * `slim-metadata` *`[workflow-id] [[workflow-id]...]`* + * Get a subset of the metadata from a workflow. + * `execution-status-count`, `counts` *`[-p] [-x] [workflow-id] [[workflow-id]...]`* + * Get the summarized status of all jobs in the workflow. + * `-p` prints a pretty summary of the execution status instead of JSON + * `-x` expands sub-workflows for more detailed summarization + * `timing` *`[workflow-id] [[workflow-id]...]`* + * Open the timing diagram in a browser. + + #### Logs + * `logs` *`[workflow-id] [[workflow-id]...]`* + * List the log files produced by a workflow. + * `fetch-logs` *`[workflow-id] [[workflow-id]...]`* + * Download all logs produced by a workflow. + + #### Job Outputs + * `list-outputs` *`[workflow-id] [[workflow-id]...]`* + * List all output files produced by a workflow. + * `fetch-all` *`[workflow-id] [[workflow-id]...]`* + * Download all output files produced by a workflow. + + #### Get email notification on job completion + * `notify` *`[workflow-id]` `[daemon-server]` `email` `[cromwell-server]`* + * *`daemon-server`* server to run the notification daemon on + + #### Display a list jobs submitted through cromshell + * `list` *`[-c]` `[-u]`* + * *`-c`* Color the output by completion status + * *`-u`* Check completion status of all unfinished jobs + + #### Clean up local cached list + * `cleanup` *`[-s STATUS]`* + * Remove completed jobs from local list. + Will remove all jobs from the local list that are in a completed state, + where a completed state is one of: `Succeeded`, `Failed`, `Aborted` + * *`-s STATUS`* If provided, will only remove jobs with the given `STATUS` from the local list. + + #### Update cromwell server + * `update-server` + * Change the cromwell server that new jobs will be submitted to. + + #### Get cost of a workflow + Costs are only available for workflows that completed more than 8 hours ago on a `GCS` backend. + Requires the `~/.cromshell/gcp_bq_cost_table.config` configuration file to exist and contain the name of the BigQuery cost table for your organization. + Also ensure that the default project has been set using `bq init`. + * `cost [workflow-id] [[workflow-id]...]` + * Get the cost for a workflow. + * `cost-detailed [workflow-id] [[workflow-id]...]` + * Get the cost for a workflow at the task level. + + + ### Features: + * Running `submit` will create a new folder in the `~/.cromshell/${CROMWELL_URL}/` directory named with the cromwell job id of the newly submitted job. + It will copy your wdl and json inputs into the folder for reproducibility. + * It keeps track of your most recently submitted jobs by storing their ids in `./cromshell/` + You may omit the job ID of the last job submitted when running commands, or use negative numbers to reference previous jobs, e.g. "-1" will track the last job, "-2" will track the one before that, and so on. + * You can override the default cromwell server by setting the environmental variable `CROMWELL_URL` to the appropriate URL. + * Most commands takes multiple workflow-ids, which you *can specify both in relative and absolute ID value* (i.e. `./cromwell status -1 -2 -3 c2db2989-2e09-4f2c-8a7f-c3733ae5ba7b`). + * You can supply additional headers to Cromshell REST calls by setting the environmental variable `CROMSHELL_HEADER`. This is useful if your Cromwell server is fronted by an auth server that authenticates access using bearer tokens before forwarding requests onto the Cromwell API. For example: `CROMSHELL_HEADER="Authorization: Bearer 3e2f34f2e..."` + + ### Code Conventions: + Please try to follow these conventions when editing cromshell. + * Use double brackets for tests ( `[[ ... ]]` instead of `[]`) + * Use `{}` when doing dereferencing variables (`${VALUE}`,`${1}` instead of `$VALUE`,`$1`) + * Define functions with the `function` keyword (`function doThing()` instead of `doThing()`) \ No newline at end of file diff --git a/completions/_cromshell b/legacy_cromshell/completions/_cromshell similarity index 93% rename from completions/_cromshell rename to legacy_cromshell/completions/_cromshell index fd7fdb1d..03c29920 100755 --- a/completions/_cromshell +++ b/legacy_cromshell/completions/_cromshell @@ -23,6 +23,9 @@ function _cromshell { notify\:'Get email notification on job completion' list\:'Display a list jobs submitted through cromshell' cleanup\:'Clean up local cached list' + cost\:'Display workflow cost' + cost-detailed\:'Display workflow cost broken down by task' + update-server\:'Change the cromshell server new jobs will be submited to' ))" \ "*::arg:->args" @@ -33,7 +36,7 @@ function _cromshell { alias) _alias ;; - abort|status|metadata|slim-metadata|timing|logs|fetch-logs|list-outputs|fetch-all) + abort|status|metadata|slim-metadata|timing|logs|fetch-logs|list-outputs|fetch-all|cost|cost-detailed) _id_list_function ;; list) diff --git a/legacy_cromshell/completions/reload.sh b/legacy_cromshell/completions/reload.sh new file mode 100644 index 00000000..1cc01bb9 --- /dev/null +++ b/legacy_cromshell/completions/reload.sh @@ -0,0 +1,8 @@ +# Reload the _cromshell completion file for testing +# Usage: +# source reload.sh + +fpath=($PWD $fpath) +unfunction _cromshell +autoload -U compinit +compinit diff --git a/cromshell b/legacy_cromshell/cromshell similarity index 86% rename from cromshell rename to legacy_cromshell/cromshell index 0c624821..3047699e 100755 --- a/cromshell +++ b/legacy_cromshell/cromshell @@ -44,6 +44,9 @@ mkdir -p ${CROMSHELL_CONFIG_DIR} CROMWELL_SUBMISSIONS_FILE="${CROMSHELL_CONFIG_DIR}/all.workflow.database.tsv" [[ ! -f ${CROMWELL_SUBMISSIONS_FILE} ]] && echo -e "DATE\tCROMWELL_SERVER\tRUN_ID\tWDL_NAME\tSTATUS\tALIAS" > ${CROMWELL_SUBMISSIONS_FILE} +# Set the GCP big query cost table file: +BQ_COST_TABLE_FILE=${CROMSHELL_CONFIG_DIR}/gcp_bq_cost_table.config + # Update cromshell submissions file if it needs updating: grep -q 'ALIAS$' ${CROMWELL_SUBMISSIONS_FILE} r=$? @@ -75,7 +78,12 @@ CROMWELL_SLIM_METADATA_PARAMETERS+="&includeKey=subWorkflowMetadata&includeKey=s CURL_CONNECT_TIMEOUT=5 let CURL_MAX_TIMEOUT=2*${CURL_CONNECT_TIMEOUT} -alias curl='curl --connect-timeout ${CURL_CONNECT_TIMEOUT} --max-time ${CURL_MAX_TIMEOUT}' +# CROMSHELL_HEADER can be set externally and can be use to pass things like oauth tokens to curl... +if [[ -z "${CROMSHELL_HEADER}" ]]; then + alias curl='curl --connect-timeout ${CURL_CONNECT_TIMEOUT} --max-time ${CURL_MAX_TIMEOUT}' +else + alias curl='curl -H "${CROMSHELL_HEADER}" --connect-timeout ${CURL_CONNECT_TIMEOUT} --max-time ${CURL_MAX_TIMEOUT}' +fi ################################################################################ @@ -151,6 +159,9 @@ function usage() echo -e " cromshell -t 50 status" echo -e " cromshell logs -2" echo -e "" + echo -e "Environment Variables:" + echo -e " CROMSHELL_HEADER Can be use to pass things like oauth tokens to the request header" + echo -e "" echo -e "Supported Flags:" echo -e " -t TIMEOUT Set the curl connect timeout to TIMEOUT seconds." echo -e " Also sets the curl max timeout to 2*TIMEOUT seconds." @@ -204,6 +215,22 @@ function usage() echo -e " Will remove all jobs from the local list that are in a completed state," echo -e " where a completed state is one of: $(echo ${TERMINAL_STATES} | tr ' ' ',' )" echo -e " -s STATUS If provided, will only remove jobs with the given STATUS from the local list." + echo -e " Update cromwell server:" + echo -e " update-server Change which cromwell server jobs will be submitted to." + echo -e "" + echo -e " Get cost for a workflow" + echo -e " cost [workflow-id] [[workflow-id]...] Get the cost for a workflow." + echo -e " Only works for workflows that completed" + echo -e " more than 8 hours ago on GCS." + echo -e " Requires the 'gcp_bq_cost_table.config'" + echo -e " configuration file to exist and contain" + echo -e " the big query cost table for your organization." + echo -e " cost-detailed [workflow-id] [[workflow-id]...] Get the cost for a workflow at the task level." + echo -e " Only works for workflows that completed" + echo -e " more than 8 hours ago on GCS." + echo -e " Requires the 'gcp_bq_cost_table.config'" + echo -e " configuration file to exist and contain" + echo -e " the big query cost table for your organization." echo -e "" echo -e "Return values:" echo -e " 0 SUCCESS" @@ -453,6 +480,8 @@ function assertCanCommunicateWithServer curl -s ${1}/api/workflows/v1/backends > ${f} grep -q 'supportedBackends' ${f} r=$? + # Do some cleanup here for daemon processes. + rm -f ${f} if [[ ${r} -ne 0 ]] ; then turtleDead error "Error: Cannot communicate with Cromwell server: ${serverName}" @@ -522,8 +551,9 @@ function alias_workflow() function populateWorkflowIdAndServerFromAlias() { local alias_name=${1} - WORKFLOW_ID=$(grep "\\t${alias_name}\$" ${CROMWELL_SUBMISSIONS_FILE} | head -n1 | awk '{print $3}') - WORKFLOW_SERVER_URL=$(grep "\\t${alias_name}\$" ${CROMWELL_SUBMISSIONS_FILE} | head -n1 | awk '{print $2}') + WORKFLOW_ID=$(awk "BEGIN{FS=\"\t\"}{if (\$6 == \"${alias_name}\") {print}}" ${CROMWELL_SUBMISSIONS_FILE} | head -n1 | awk '{print $3}') + WORKFLOW_SERVER_URL=$(awk "BEGIN{FS=\"\t\"}{if (\$6 == \"${alias_name}\") {print}}" ${CROMWELL_SUBMISSIONS_FILE} | head -n1 | awk '{print $2}') + if [[ -z ${WORKFLOW_ID} ]]; then error "Invalid workflow/alias: ${alias_name}" exit 5 @@ -549,6 +579,9 @@ function populateWorkflowIdAndServerUrl() grep ${WORKFLOW_ID} ${CROMWELL_SUBMISSIONS_FILE} > ${tmpFile} r=$? [[ ${r} -eq 0 ]] && WORKFLOW_SERVER_URL=$( awk '{print $2}' ${tmpFile} ) + + # Do some cleanup here for daemon processes. + rm -f ${tmpFile} else if [[ -n "$userSpecifiedId" ]]; then populateWorkflowIdAndServerFromAlias ${userSpecifiedId} @@ -785,7 +818,7 @@ function submit() assertCanCommunicateWithServer ${CROMWELL_URL} - echo "Submitting job to server: ${CROMWELL_URL}" + echo "Submitting job to server: ${CROMWELL_URL}" local response=$(curl -s -F workflowSource=@${wdl} ${2:+ -F workflowInputs=@${json}} ${3:+ -F workflowOptions=@${optionsJson}} ${4:+ -F workflowDependencies=@${dependenciesZip}} ${CROMWELL_URL}/api/workflows/v1) r=$? @@ -845,7 +878,7 @@ function status() assertCanCommunicateWithServer ${2} local f=$( makeTemp ) curl -s ${2}/api/workflows/v1/${1}/status > ${f} - [[ $? -ne 0 ]] && error "Could not connect to Cromwell server." && return 2 + [[ $? -ne 0 ]] && error "Could not connect to Cromwell server." && rm -f ${f} && return 2 grep -qE '"Failed"|"Aborted"|"fail"' ${f} r=$? @@ -869,7 +902,7 @@ function status() jq '.. | .calls? | values | map_values(group_by(.executionStatus) | map({(.[0].executionStatus): . | length}) | add)' ${tmpMetadata} > ${tmpExecutionStatusCount} # Check for failure states: - cat ${tmpExecutionStatusCount} | grep -q 'Failed' + cat ${tmpExecutionStatusCount} | grep -q '"Failed"' r=$? # Check for failures: @@ -897,6 +930,9 @@ function status() awk "BEGIN { FS=OFS=\"\t\" } { if ((\$3 == \"${1}\") && (\$2 == \"${2}\")) { \$5=\"${workflowStatus}\"; }; print }" ${CROMWELL_SUBMISSIONS_FILE}.bak > ${CROMWELL_SUBMISSIONS_FILE} #sed -i .bak -e "s#\\(.*${1}.*\\.wdl\\)\\t*.*#\\1$(printf '\t')${workflowStatus}#g" ${CROMWELL_SUBMISSIONS_FILE} + # Do some cleanup here for daemons: + rm -f ${f} ${tmpExecutionStatusCount} ${tmpMetadata} + return ${retVal} } @@ -1006,6 +1042,9 @@ function execution-status-count() jq '.. | .calls? | values | map_values(group_by(.executionStatus) | map({(.[0].executionStatus): . | length}) | add)' "${tempFile}" checkPipeStatus "Could not read tmp file JSON data." "Could not parse JSON output from cromwell server." fi + + # Do some cleanup here for daemon processes. + rm -f ${tempFile} done } @@ -1082,14 +1121,25 @@ function printTaskStatus() failedShards=$(tr '\n' ' ' < ${tmpFailedShardsFile} | sed -E 's/\[( )+//' | sed -E 's/( )+\]//') echo -e "${taskStatus}${indent}\tFailed shards: ${failedShards}${COLOR_NORM}" fi + + rm -f ${tmpFailedShardsFile} ${tmpStatusesFile} } # Bring up a browser window to view timing information on the job. function timing() { + local id=${1} + local server_url_for_browser=${2} turtle - echo "Opening timing information in a web browser for job ID: ${1}" - open ${2}/api/workflows/v1/${1}/timing + echo "Opening timing information in your default web browser for job ID: ${id}" + echo "${server_url_for_browser}" | grep -q "^http" + r=$? + if [ $r -ne 0 ]; then + server_url_for_browser="http://${server_url_for_browser}" + fi + server_url_for_browser=${server_url_for_browser}/api/workflows/v1/${id}/timing + open ${server_url_for_browser} + echo "URL is: ${server_url_for_browser}" return $? } @@ -1450,6 +1500,166 @@ function fetch-all() return 0 } +function _cost_helper() +{ + local id=$1 + local svr=$2 + + turtle + which bq &>/dev/null + local r=$? + [[ ${r} -ne 0 ]] && error "bq does not exist. Must install the big query command-line client." && exit 8 + + # Check for gdate: + if [[ "$(uname)" == "Darwin" ]] ; then + which gdate &> /dev/null + r=$? + [ $r -ne 0 ] && error "Must have coreutils installed for 'gdate'" && exit 13 + fi + + [ ! -e ${BQ_COST_TABLE_FILE} ] && error "Big Query cost table file does not exist. Must populate ${BQ_COST_TABLE_FILE} with big query cost table information." && exit 9 + + # Make sure the given ID is actually in our file: + grep -q "${id}" ${CROMWELL_SUBMISSIONS_FILE} + r=$? + [ $r -ne 0 ] && error "Given ID is not in your cromwell submissions file (${CROMWELL_SUBMISSIONS_FILE}): ${id}" && exit 10 + + COST_TABLE=$(head -n1 ${BQ_COST_TABLE_FILE}) + + # Get the time that the workflow finished: + error "Fetching workflow finish time..." + tmpMetadata=$( makeTemp ) + curl --compressed -s "${svr}/api/workflows/v1/${id}/metadata?includeKey=workflowProcessingEvents" > ${tmpMetadata} + + [ ! -s ${tmpMetadata} ] && error "Could not communicate with server. Perhaps try a longer timeout." && exit 15 + + grep -q '"description":"Finished",' ${tmpMetadata} + r=$? + [ $r -ne 0 ] && error "Workflow ${id} is not finished yet." && exit 11 + + STARTED_TIME=$( jq '.workflowProcessingEvents | map(select(.description == "PickedUp")) | .[].timestamp' ${tmpMetadata} | tr -d '"') + FINISHED_TIME=$( jq '.workflowProcessingEvents | map(select(.description == "Finished")) | .[].timestamp' ${tmpMetadata} | tr -d '"') + + # Make sure that at least 8h have passed since the workflow finished: + if [[ "$(uname)" == "Darwin" ]] ; then + local DATE_CMD=gdate + else + local DATE_CMD=date + fi + + local WAITING_PERIOD_s=$(echo "3600 * 24" | bc) + + local ts1=$( $DATE_CMD +%s -d "${FINISHED_TIME}" ) + local ts2=$( date +%s ) + local time_diff=$(echo "${ts2} - ${ts1}" | bc) + local can_check_cost=$( echo "${time_diff} >= ${WAITING_PERIOD_s}" | bc ) + + if [ $can_check_cost -ne 1 ] ; then + + local total_wait_time_s=$(echo "${WAITING_PERIOD_s} - ${time_diff}" | bc) + local wait_time_h=$(echo "scale=0;${total_wait_time_s} / 3600" | bc) + local wait_time_m=$(echo "scale=0;(${total_wait_time_s} % 3600) / 60" | bc) + local wait_time_s=$(echo "scale=0;(${total_wait_time_s} % 3600) % 60" | bc) + + # Format the time: + [[ ${#wait_time_h} -lt 2 ]] && wait_time_h="0${wait_time_h}" + [[ ${#wait_time_m} -lt 2 ]] && wait_time_m="0${wait_time_m}" + [[ ${#wait_time_s} -lt 2 ]] && wait_time_s="0${wait_time_s}" + + error "Workflow finished less than 24 hours ago. Cannot check cost. Please wait ${wait_time_h}h:${wait_time_m}m:${wait_time_s}s and try again." + exit 12 + fi + + # Generate the start and end dates for our query: + START_DATE=$($DATE_CMD +%Y-%m-%d -d "${STARTED_TIME} -1 day") + END_DATE=$($DATE_CMD +%Y-%m-%d -d "${FINISHED_TIME} +1 day") + + error "Using cost table: ${COST_TABLE}" + error "" +} + +# Get the cost for a workflow ID: +function cost() +{ + local id=$1 + local svr=$2 + _cost_helper $id $svr + + local tmp_cost_file=$( makeTemp ) + + # Get the cost from Big Query: + bq query --use_legacy_sql=false "SELECT sum(cost) FROM \`${COST_TABLE}\`, UNNEST(labels) WHERE value = \"cromwell-${id}\" AND partition_time BETWEEN \"${START_DATE}\" AND \"${END_DATE}\";" > ${tmp_cost_file} + r=$? + + # Display the cost: + total_cost=$( head -n4 ${tmp_cost_file} | tail -n1 | tr -d '| \t') + [[ "${total_cost}" == NULL ]] && error "Could not retrieve cost - no cost entries found." && exit 14 + echo -n '$' + echo "scale=2;${total_cost}/1" | bc + + error "" + error "Costs rounded to nearest cent (approximately)." + error "" + error "WARNING: Costs here DO NOT include any call cached tasks." + + return $r +} + +# Get the cost for a workflow ID: +function cost-detailed() +{ + local id=$1 + local svr=$2 + _cost_helper $id $svr + + local tmp_cost_file=$( makeTemp ) + + # Get the cost from Big Query: + bq query \ + --use_legacy_sql=false \ + "SELECT + wfid.value, service.description, task.value as task_name, sum(cost) as cost + FROM + \`${COST_TABLE}\` as billing, UNNEST(labels) as wfid, UNNEST(labels) as task + WHERE + cost > 0 + AND task.key LIKE \"wdl-task-name\" + AND wfid.key LIKE \"cromwell-workflow-id\" + AND wfid.value like \"%${id}\" + AND partition_time BETWEEN \"${START_DATE}\" AND \"${END_DATE}\" + GROUP BY 1,2,3 + ORDER BY 4 DESC + ;" | tail -n+4 | grep -v '^+' | tr -d '|' | awk 'BEGIN{OFS="\t"}{print $(NF-1), $NF}' | sort > ${tmp_cost_file} + + r=$? + + local total_cost=$(awk '{print $2}' ${tmp_cost_file} | tr '\n' '~' | sed -e 's#$#0#' -e 's@~@ + @g' -e 's@^@print(@' -e 's@$@)@' | python) + local total_cost=$(echo "scale=2;${total_cost}/1" | bc) + + local tmpf2=$(makeTemp) + echo -e "TASK\tCOST" > ${tmpf2} + while read line ; do + local task=$(echo $line | awk '{print $1}') + local task_cost=$(echo $line | awk '{print $2}') + task_cost=$(echo "scale=2;if ( ${task_cost} >= 0.01 ) { ${task_cost}/1; } else { 0.01 }" | bc) + printf "${task}\t$%02.2f\n" ${task_cost} + done < ${tmp_cost_file} >> ${tmpf2} + + column -t ${tmpf2} | head -n1 + local bar_width=$( column -t ${tmpf2} | head -n1 | wc -c ) + python -c "print('=' * ${bar_width})" + column -t ${tmpf2} | tail -n+2 + python -c "print('=' * ${bar_width})" + echo "Total Cost: \$${total_cost}" + + error "" + error "Costs rounded to nearest cent (approximately)." + error "" + error "WARNING: Costs here DO NOT include any call cached tasks." + + return $r +} + function assertValidEmail() { # Make sure the user gave us a good email address: @@ -1522,7 +1732,8 @@ function notify() # Send the script to the server: local tmpOut=$( makeTemp ) scp ${SCRIPTDIR}/${SCRIPTNAME} ${hostServer}:~/. &> ${tmpOut} - [[ $? -ne 0 ]] && error "ERROR: Could not copy cromshell to server ${hostServer}" && error "$(cat ${tmpOut})" && exit 7 + [[ $? -ne 0 ]] && error "ERROR: Could not copy cromshell to server ${hostServer}" && error "$(cat ${tmpOut})" && rm -f ${tmpOut} && exit 7 + rm -f ${tmpOut} # Spin off notification process on the server: results=$( ssh ${hostServer} "~/${SCRIPTNAME} _rawNotify ${WORKFLOW_ID} ${email} ${WORKFLOW_SERVER_URL}" ) @@ -1573,8 +1784,14 @@ function _notifyHelper() workflowFile=$( grep '$id' ${CROMWELL_SUBMISSIONS_FILE} | head -n1 | awk '{print $4}' ) metaData=$( metadata ${id} ${cromwellServer} ) echo -e "CROMWELL Task ${completionStatus}:\n\n${id}\n\non\n\n${cromwellServer}\n\n${separator}\n\nStatus:\n$(cat ${statusFile})\n\n${separator}\nMetadata:\n${metaData}\n\n${separator}\nSent by $( whoami )@$( hostname ) on $( date ) \n\n\n" | mail -n -s "Cromwell Task ${completionStatus} [${cromwellServer}] ${workflowFile}" ${email} + rm -f ${statusFile} break fi + # Clean the temporary file just in case. + # This might need to happen because of how this helper gets called. + # Not sure the trap is working properly (i.e. it doesn't seem to be calling at_exit). + rm -f ${statusFile} + # wait for 10 seconds: sleep 10 done @@ -1834,6 +2051,11 @@ if ${ISINTERACTIVESHELL} ; then # Get our sub-command: SUB_COMMAND=${1} shift + + # Use the update-server sub-command to modify CROMWELL_NEEDS_SETUP here. + if [[ "${SUB_COMMAND}" == "update-server" ]] ; then + CROMWELL_NEEDS_SETUP=true + fi # Check if we need to set this up. # Note: because of how `notify` works, we can't require the setup for the notify action. @@ -1852,7 +2074,7 @@ if ${ISINTERACTIVESHELL} ; then # Validate our sub-command: case ${SUB_COMMAND} in - cleanup|submit|status|logs|execution-status-count|counts|metadata|slim-metadata|timing|abort|notify|list|fetch-all|fetch-logs|list-outputs|alias) + cleanup|submit|status|cost|cost-detailed|logs|execution-status-count|counts|metadata|slim-metadata|timing|abort|notify|list|fetch-all|fetch-logs|list-outputs|alias) # This is a good sub-command, so we do not need to do anything. ;; _rawNotify) @@ -1887,8 +2109,8 @@ if ${ISINTERACTIVESHELL} ; then esac # Don't check JQ version if we're notifying. - # Why do this, you might ask? - # Because it's easier to hack our code than to get Broad IT to update software on our servers. + # Why do this, you might ask? + # Because it's easier to hack our code than to get Broad IT to update software on our servers. if [[ "${SUB_COMMAND_FOR_DISPLAY}" != "notify" ]] ; then checkJQVersion r=$? diff --git a/setup.py b/setup.py index 098bc37b..00974ba3 100644 --- a/setup.py +++ b/setup.py @@ -35,7 +35,7 @@ packages=find_packages("src"), package_dir={"": "src"}, classifiers=[ - "Development Status :: 4 - Beta", + "Development Status :: 5 - Production/Stable", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Natural Language :: English", @@ -46,6 +46,6 @@ "Programming Language :: Python :: 3.10", "Programming Language :: Python :: Implementation :: CPython", ], - entry_points={"console_scripts": ["cromshell-beta=cromshell.__main__:main_entry"]}, + entry_points={"console_scripts": ["cromshell=cromshell.__main__:main_entry"]}, include_package_data=True, ) diff --git a/src/cromshell/__main__.py b/src/cromshell/__main__.py index 30f98584..5313b272 100644 --- a/src/cromshell/__main__.py +++ b/src/cromshell/__main__.py @@ -62,9 +62,10 @@ help="Hide turtle logo", ) @click.option( + "-cu", "--cromwell_url", type=str, - help="Specify Cromwell URL used", + help="Specify Cromwell URL used. Example: 'http://65.61.654.8:8000'.", ) @click.option( "-t",