Feature/refactor #85

Open
wants to merge 21 commits into
base: development

Conversation

lthp
Collaborator

@lthp lthp commented Oct 11, 2023

This PR contains:

  • All the documentation branches (pepQuery, documentation_merge)
  • Some refactoring of the cancer-specific mode

@lthp lthp requested a review from akahles October 11, 2023 13:55
Member

@akahles akahles left a comment

This looks mostly good to me. I suggested a few things for the documentation. Nothing here is essential to address before merging; these are nice-to-haves.
In general, it was hard to review all the "business logic" without context. I will go over the code in its entirety and might make suggestions via a separate PR.

required.add_argument("--kmer", type=int, help="length of the kmers for kmer output.", required=True, default=9)

submodes = parser.add_argument_group('Submodes parameters', 'Commands for conceptual information about the processing.')
submodes.add_argument("--libsize-extract",help="Set this parameter to True to generate library sizes and gene quantifications and skip neontigen generation. **Note:** If set to True, the program will only output files 3 and 7 of the :ref:`build output section <build_out>`.",action="store_true", required=False, default=False)
Member

The wording here is a bit misleading. One cannot really set this to true or false; one either passes this option or does not.
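
A minimal sketch of the point being made, using a reduced parser. The `build_parser` helper and the reworded help strings are illustrative assumptions, not the project's actual code. With `action="store_true"` the option takes no value, so the help text can describe what happens when the flag is given rather than asking the user to "set it to True".

```python
import argparse


def build_parser():
    """Hypothetical, reduced parser used only to illustrate flag-style help text."""
    parser = argparse.ArgumentParser(prog="example")
    submodes = parser.add_argument_group("Submodes parameters")
    # store_true options take no value: present -> True, absent -> False.
    submodes.add_argument(
        "--libsize-extract",
        action="store_true",
        help="If given, generate library sizes and gene quantifications "
             "and skip neoantigen generation.",
    )
    submodes.add_argument(
        "--all-read-frames",
        action="store_true",
        help="If given, translate all possible reading frames instead of "
             "only the annotated ones.",
    )
    return parser


# Flag absent: both options are False.
args_default = build_parser().parse_args([])
# Flag present: only the chosen option becomes True.
args_flagged = build_parser().parse_args(["--libsize-extract"])
```

Note that `required=False` and `default=False` are already implied by `action="store_true"`, so they can be dropped without changing behavior.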


submodes = parser.add_argument_group('Submodes parameters', 'Commands for conceptual information about the processing.')
submodes.add_argument("--libsize-extract",help="Set this parameter to True to generate library sizes and gene quantifications and skip neontigen generation. **Note:** If set to True, the program will only output files 3 and 7 of the :ref:`build output section <build_out>`.",action="store_true", required=False, default=False)
submodes.add_argument("--all-read-frames", help="Set this parameter to True to switch to exhaustive translation and study all possible reading frames instead of just the annotated ones in the annotation file.", action="store_true", required=False, default=False)
Member

See the earlier comment on options.

submodes.add_argument("--libsize-extract",help="Set this parameter to True to generate library sizes and gene quantifications and skip neontigen generation. **Note:** If set to True, the program will only output files 3 and 7 of the :ref:`build output section <build_out>`.",action="store_true", required=False, default=False)
submodes.add_argument("--all-read-frames", help="Set this parameter to True to switch to exhaustive translation and study all possible reading frames instead of just the annotated ones in the annotation file.", action="store_true", required=False, default=False)
submodes.add_argument("--count-path", help="Absolute path for the second output of `SplAdder <https://github.com/ratschlab/spladder>`_ containing the graph expression quantification. If provided, expression quantification of genes will take place. **Format:** hdf5.", required=False, default=None)
submodes.add_argument("--output-samples", nargs='+', help="List of sample names to output. **Note:** *Names should match the file name of the splice graphs. If not provided all samples are processed and program runs faster.*", required=False, default=[])
Member

Why is the program slower if fewer files are processed?

submodes.add_argument("--heter-code", type=int, help="It specifies the heterozygous allele.", default=0, choices = ['0', '2']) #TODO: Add more info about this parameter?

parameters = parser.add_argument_group('Technical parameters' , 'Commands for optimization of the software.')
parameters.add_argument("--compressed", help="Compress output files", action="store_true", default=True)
Member

Is there a good reason not to have compressed output as default?
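
As quoted in this hunk, `--compressed` combines `action="store_true"` with `default=True`, so the parsed value is `True` whether or not the flag is passed. The sketch below shows that behavior next to one possible alternative that keeps compression as the default but allows opting out. It assumes Python 3.9+ for `argparse.BooleanOptionalAction`; both parsers are illustrative and not the project's code.

```python
import argparse

# As quoted in the diff: store_true with default=True can never yield False.
quoted = argparse.ArgumentParser(prog="quoted")
quoted.add_argument("--compressed", help="Compress output files",
                    action="store_true", default=True)

# Possible alternative (assumption): registers --compressed and --no-compressed.
toggle = argparse.ArgumentParser(prog="toggle")
toggle.add_argument("--compressed", help="Compress output files",
                    action=argparse.BooleanOptionalAction, default=True)

quoted_absent = quoted.parse_args([]).compressed
quoted_present = quoted.parse_args(["--compressed"]).compressed
toggle_absent = toggle.parse_args([]).compressed
toggle_off = toggle.parse_args(["--no-compressed"]).compressed
```

On older Python versions, a separate `--no-compressed` option with `action="store_false"` and `dest="compressed"` would be a comparable choice.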


subset = parser.add_argument_group('Subset parameters', 'Commands to select a subset of the genes to be processed.')
subset.add_argument("--process-chr", nargs='+',help="List of chromosomes to be processed. If not provided all chromosomes are processed. The chromosomes names should be provided in the same format as in FASTA and annotation files. For annotations downloaded from GENCODE, this format is **chrX**, X being the chromosome number.", required=False, default=None)
subset.add_argument("--complexity-cap", type=int, help="Maximum edge complexity of the graph to be processed. If not provided all graphs are processed.", required=False, default=None)
Member

Without further description, a user would not know what a suitable threshold is here.

crf.add_argument("--n-samples-lim-cancer", type=int, help="This parameter corresponds to the number of samples threshold in cohort specific filtering. It indicated the minimum number of cancer samples in which one should see an expression higher than `--cohort-expr-support-cancer` in order to consider the kmer as a cancer candidate. Kmers with an expression higher than `--cohort-expr-support-cancer` in at least `--n-samples-lim-cancer` samples will be considered as cancer candidates. For each cancer sample of interest, provided under `--ids_cancer_samples`, the expression threshold `--cohort-expr-support-cancer` will be assessed in the rest of the cohort, excluding the sample of interest.", required=False, default=None)
crf.add_argument("--path-cancer-matrix-segm", nargs='+', help="Path to the cancer matrix containing segment expression from samples belonging to a cohort. The matrix will have the following dimensions: [kmers * samples]. When only the junction overlapping cancer kmers are of interest, the user should provide only `--path-cancer-matrix-edge`, and skip the inclusion of this file.If both matrices are provided, junction expression will be chosen in case there is expression information for the same kmer in both matrices. This will be the output 5 of :ref:`build output section <build_out>`", required=False, default=None)
crf.add_argument("--path-cancer-matrix-edge", nargs='+', help="Path to the cancer matrix containing junction expression from samples belonging to a cohort. The matrix will have the following dimensions: [kmers * samples]. When only the junction overlapping cancer kmers are of interest, the user should provide only this file, and skip the inclusion of `--path-cancer-matrix-segm`. If both matrices are provided, junction expression will be chosen in case there is expression information for the same kmer in both matrices. This will be the ouput 6 of :ref:`build output section <build_out>`.", required=False, default=None)
crf.add_argument("--cancer-support-union", help="Parameter to choose how the sample specific filtering and the cohort specific filtering are combined. By default, they are combined by choosing the common kmers to both filtering steps, i.e. performing an intersection. If this parameter is set to True, the union of both filtering steps will be performed, i.e. the kmers that pass either the sample specific filtering or the cohort specific filtering will be kept.", action="store_true", required=False, default=False)
Member

See the comment on options above: this cannot be set to True.
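
For readers unfamiliar with the option, the `--cancer-support-union` help text describes a choice between set intersection (the default) and set union of the kmers that pass the two filtering steps. A minimal sketch of that choice follows. The `combine_filters` function and the example kmers are hypothetical and only illustrate the described behavior, not the project's implementation.

```python
def combine_filters(sample_kmers, cohort_kmers, use_union=False):
    """Combine kmers passing the sample-specific and cohort-specific filters.

    Hypothetical helper: use_union mirrors whether --cancer-support-union
    was passed on the command line.
    """
    sample_set = set(sample_kmers)
    cohort_set = set(cohort_kmers)
    if use_union:
        # Keep kmers that pass either filtering step.
        return sample_set | cohort_set
    # Default: keep only kmers that pass both filtering steps.
    return sample_set & cohort_set


# Made-up kmers for illustration only.
passed_sample = ["AAAAAAAAA", "CCCCCCCCC"]
passed_cohort = ["CCCCCCCCC", "DDDDDDDDD"]

kept_default = combine_filters(passed_sample, passed_cohort)
kept_union = combine_filters(passed_sample, passed_cohort, use_union=True)
```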

development = parser.add_argument_group('Optional development parameters')
development.add_argument("--tot-batches", type=int, help="If selected, the filtering of the background and foreground will be based on hash functions. This parameter will set the total number of batches in which we will divide the foreground and background files to filter, and each of those batches will be assigned a hash value. If `--batch-id` is specified, `--tot-batches` should also be specified.", required=False, default=None)
development.add_argument("--batch-id", type=int, help="If selected, the filtering of the background and foreground will be based on hash functions. This parameter will set the batch id of the current batch that is being filtered. The batch id should be an integer between 0 and `--tot-batches`. It shows the specific batch that we want to process, out of the `--tot-batches`. If `--batch-id` is specified, `--tot-batches` should also be specified.", required=False, default=None)
development.add_argument("--on-the-fly", help="If set to true, all the filtering steps will be done on the fly, without the creation of intermediate files. The creation of intermediate files would speed up the computations in the case where full or partial reruns are planned. An example of a partial rerun would be when seveal normal and cancer filtering parameters are applied to the same cohort. Choosing not to save intermediate files will trade speed for disk space.", action="store_true", default=False)
Member

typo: seveal --> several
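
The `--tot-batches` and `--batch-id` help texts quoted in this hunk describe hash-based batching without showing what it means in practice. The sketch below is one way such a scheme could work, under stated assumptions: kmers are assigned to a batch by a stable hash modulo the number of batches, and valid batch ids run from 0 to `tot_batches - 1` (the help text says "between 0 and `--tot-batches`" without stating whether the upper bound is inclusive). All function names are hypothetical, and the project's actual hashing may differ.

```python
import hashlib


def batch_of(kmer, tot_batches):
    """Map a kmer to a batch index with a hash that is stable across runs."""
    digest = hashlib.sha256(kmer.encode("utf-8")).hexdigest()
    return int(digest, 16) % tot_batches


def check_batch_args(batch_id, tot_batches):
    """Enforce the rule stated in the help text: --batch-id needs --tot-batches."""
    if batch_id is None:
        return
    if tot_batches is None:
        raise ValueError("--batch-id requires --tot-batches")
    # Assumption: ids are zero-based and strictly below tot_batches.
    if not 0 <= batch_id < tot_batches:
        raise ValueError("--batch-id must be in [0, --tot-batches)")


def select_batch(kmers, batch_id, tot_batches):
    """Return the kmers that fall into the requested batch (all if no batch)."""
    check_batch_args(batch_id, tot_batches)
    if batch_id is None:
        return list(kmers)
    return [k for k in kmers if batch_of(k, tot_batches) == batch_id]


# Made-up kmers for illustration only.
example_kmers = ["ACDEFGHIK", "LMNPQRSTV", "WYACDEFGH", "IKLMNPQRS"]
example_batches = [select_batch(example_kmers, i, 3) for i in range(3)]
```

A stable hash is used here instead of Python's built-in `hash`, whose output for strings changes between interpreter runs unless `PYTHONHASHSEED` is fixed.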

@@ -168,8 +226,12 @@ def parse_arguments(argv):
sys.stdout.write("------------------------------ MHCBIND IMMUNOPEPPER USAGE ------------------------------ \n \n ")
parser_mhcbind.print_help()
sys.stdout.write("\n------------------------------ MHCTOOLS AVAILABLE COMMAND LINE OPTIONS ------------------------------ \n \n ")
parser_mhc = make_mhc_arg_parser(prog="mhctools",description=("Predict MHC ligands from protein sequences"))
from mhctools.mhctools.cli.args import make_mhc_arg_parser
parser_mhc = make_mhc_arg_parser(prog="mhctools",description=("Predict MHC ligands from protein sequences")) #TODO: uncmment this line
Member

delete comment?

handlers=handlers,
format="%(asctime)-15s %(levelname)-8s %(message)s")

#stdout_handler.setFormatter(logging.Formatter("%(asctime)-15s %(levelname)-8s %(message)s"))
Member

remove comments

logging.info("Command line"+str(arg))
if mode == 'build':
pass
Member

Why are these pass statements here and below?
