[Read_Screen] Comprehensively run QC checks regardless of failure status then output results #736

Draft
wants to merge 10 commits into
base: main

xonq
Member

@xonq xonq commented Jan 23, 2025

This PR closes #567

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

Updates task_screen (Read_screen) to run every screening check even when an earlier check fails. The task now outputs a TSV table of read screening results, which is propagated for both raw and clean read screening in Theia-pipelines.

⚡ Impacted Workflows/Tasks

tasks/quality_control/comparisons/task_screen.wdl and all downstream workflows

This PR may lead to different results in pre-existing outputs: No

This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

  • Run all read screening checks even when an earlier check misses its threshold
  • Output each result to a single sample-level TSV
  • Expose the output to downstream workflows
  • Update documentation

⚙️ Algorithm

  • Moved the conditional expression for the failure check to the end of the task
  • Create a TSV (read_screen.tsv) for read QC results
  • Expose the TSV to downstream WDL tasks/workflows for both raw and clean read screening
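
The deferred failure check described above can be sketched as follows. This is a minimal Python illustration, not the code in task_screen.wdl: the function names, the TSV columns, and the example values are hypothetical, and only simple minimum thresholds are modelled.

```python
import csv
import io

# Hypothetical column layout; the real read_screen.tsv may use different columns.
COLUMNS = ["check", "observed", "threshold", "status"]


def run_all_checks(observed, minimums):
    """Evaluate every check and record its result, never stopping early."""
    rows = []
    for metric, minimum in minimums.items():
        value = observed[metric]
        status = "PASS" if value >= minimum else "FAIL"
        rows.append(
            {
                "check": f"min_{metric}",
                "observed": value,
                "threshold": minimum,
                "status": status,
            }
        )
    return rows


def write_tsv(rows, handle):
    """Write one row per check to a tab-separated table."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def overall_status(rows):
    """Decide pass/fail only after all checks have been evaluated."""
    failed = [row["check"] for row in rows if row["status"] == "FAIL"]
    return "PASS" if not failed else "FAIL; " + ", ".join(failed)


if __name__ == "__main__":
    # Illustrative values only.
    observed = {"reads": 100, "basepairs": 5_000_000, "coverage": 3.0}
    minimums = {"reads": 1000, "basepairs": 1_000_000, "coverage": 10.0}
    rows = run_all_checks(observed, minimums)
    buffer = io.StringIO()
    write_tsv(rows, buffer)
    print(buffer.getvalue(), end="")
    print(overall_status(rows))
```

Because the pass/fail decision is computed from the collected rows, a sample that misses several thresholds reports all of them in the table rather than only the first.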

➡️ Inputs

n/a

⬅️ Outputs

Added the output TSV file path and exposed it to downstream workflows

File read_screen_tsv = "read_screen.tsv"
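
A downstream consumer could read the exposed TSV to list which checks failed. The sketch below is only an illustration under assumptions: the column layout of read_screen.tsv is not shown in this PR, so the `check` and `status` columns, the `FAIL` value, and the `failed_checks` helper are all hypothetical.

```python
import csv
import io


def failed_checks(tsv_text):
    """Return the names of checks whose status column reads FAIL.

    Assumes a header row with 'check' and 'status' columns (hypothetical).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["check"] for row in reader if row["status"] == "FAIL"]


# Illustrative table contents, not real pipeline output.
EXAMPLE_TSV = (
    "check\tstatus\n"
    "min_reads\tPASS\n"
    "min_basepairs\tPASS\n"
    "min_coverage\tFAIL\n"
)

if __name__ == "__main__":
    print(failed_checks(EXAMPLE_TSV))
```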

🧪 Testing

  • TheiaCoV SE wf compared to control PHB main
    • read_screen.tsv is available and formatted correctly in output
    • All other outputs in table are consistent with control
  • TheiaCoV PE wf compared to control PHB main
    • read_screen.tsv is available and formatted correctly in output
    • All other outputs in table are consistent with control
  • TheiaCoV ONT wf compared to control PHB main
    • read_screen.tsv is available and formatted correctly in raw and clean outputs
    • All other outputs in table are consistent with control
  • TheiaProk SE wf compared to control PHB main
    • read_screen.tsv is available and formatted correctly in output
    • All other outputs in table are consistent with control
  • TheiaProk PE wf compared to control PHB main
    • read_screen.tsv is available and formatted correctly in output
    • All other outputs in table are consistent with control
  • TheiaProk ONT wf compared to control PHB main
    • read_screen.tsv is available and formatted correctly in output
    • All other outputs in table are consistent with control
  • TheiaEuk PE wf compared to control PHB main
  • PE failure testing
    • min_reads threshold not met, screen_reads proceeded and populated failure log
    • min_basepairs threshold not met, screen_reads proceeded and populated failure log
    • genome_length threshold not met, screen_reads proceeded and populated failure log
    • min_coverage threshold not met, screen_reads proceeded and populated failure log
    • min_proportion threshold not met, screen_reads proceeded and populated failure log
  • SE/ONT failure testing
    • min_reads threshold not met, screen_reads proceeded and populated failure log
    • min_basepairs threshold not met, screen_reads proceeded and populated failure log
    • genome_length threshold not met, screen_reads proceeded and populated failure log
    • min_coverage threshold not met, screen_reads proceeded and populated failure log

Suggested Scenarios for Reviewer to Test

n/a

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable
    • You have updated the "Last Known Changes" field for any affected workflows in the respective workflow documentation page and for every entry in the three workflows_overview tables to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

@xonq xonq changed the title from "[Read_Screen] Comprehensively run QC checks regardless of failure status & output results" to "[Read_Screen] Comprehensively run QC checks regardless of failure status then output results" on Jan 24, 2025
Development

Successfully merging this pull request may close these issues.

[TheiaProk - read_screen] Output all metrics calculated to TSV file