Feature/refactor #85

lthp · 2023-10-11T13:55:24Z

This contains:

- All the documentation branches (pepQuery, documentation_merge)
- Some refactoring of the cancer-specific mode

akahles

This looks mostly good to me. I suggested a few things for the documentation. Nothing here is essential to address before merging; these are nice-to-haves.
In general, it was hard to review all the "business logic" without context. I will go over the code in its entirety and might make suggestions via a separate PR.

akahles · 2024-01-29T10:50:38Z

immunopepper/ip.py

+    required.add_argument("--kmer", type=int, help="length of the kmers for kmer output.", required=True, default=9)
+
+    submodes = parser.add_argument_group('Submodes parameters', 'Commands for conceptual information about the processing.')
+    submodes.add_argument("--libsize-extract",help="Set this parameter to True to generate library sizes and gene quantifications and skip neontigen generation. **Note:** If set to True, the program will only output files 3 and 7 of the :ref:`build output section <build_out>`.",action="store_true", required=False, default=False)


The wording of this help text is a bit misleading. One cannot really set this option to true or false; one either passes the flag or omits it.
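
A minimal sketch of what this means in `argparse` terms, assuming a stand-alone parser (the `build_parser` helper and the reworded help string are illustrative, not the project's code). With `action="store_true"` the option takes no value, so a help text of the form "If given, ..." describes the behaviour more accurately:

```python
import argparse


def build_parser():
    # Hypothetical stand-alone parser, for illustration only.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--libsize-extract",
        action="store_true",  # the flag takes no value; it defaults to False
        help="If given, only generate library sizes and gene quantifications "
             "and skip neoantigen generation.",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).libsize_extract)                     # flag absent
print(parser.parse_args(["--libsize-extract"]).libsize_extract)  # flag present
```

Passing a value, as in `--libsize-extract True`, is rejected by `argparse` as an unrecognized argument.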

akahles · 2024-01-29T10:50:53Z

immunopepper/ip.py

+
+    submodes = parser.add_argument_group('Submodes parameters', 'Commands for conceptual information about the processing.')
+    submodes.add_argument("--libsize-extract",help="Set this parameter to True to generate library sizes and gene quantifications and skip neontigen generation. **Note:** If set to True, the program will only output files 3 and 7 of the :ref:`build output section <build_out>`.",action="store_true", required=False, default=False)
+    submodes.add_argument("--all-read-frames", help="Set this parameter to True to switch to exhaustive translation and study all possible reading frames instead of just the annotated ones in the annotation file.", action="store_true", required=False, default=False)


See the earlier comment on options.

akahles · 2024-01-29T10:52:45Z

immunopepper/ip.py

+    submodes.add_argument("--libsize-extract",help="Set this parameter to True to generate library sizes and gene quantifications and skip neontigen generation. **Note:** If set to True, the program will only output files 3 and 7 of the :ref:`build output section <build_out>`.",action="store_true", required=False, default=False)
+    submodes.add_argument("--all-read-frames", help="Set this parameter to True to switch to exhaustive translation and study all possible reading frames instead of just the annotated ones in the annotation file.", action="store_true", required=False, default=False)
+    submodes.add_argument("--count-path", help="Absolute path for the second output of `SplAdder <https://github.com/ratschlab/spladder>`_ containing the graph expression quantification. If provided, expression quantification of genes will take place. **Format:** hdf5.", required=False, default=None)
+    submodes.add_argument("--output-samples", nargs='+', help="List of sample names to output. **Note:** *Names should match the file name of the splice graphs. If not provided all samples are processed and program runs faster.*", required=False, default=[])


Why is the program slower if fewer files are processed?

akahles · 2024-01-29T10:53:25Z

immunopepper/ip.py

+    submodes.add_argument("--heter-code", type=int, help="It specifies the heterozygous allele.", default=0, choices = ['0', '2'])  #TODO: Add more info about this parameter?
+
+    parameters = parser.add_argument_group('Technical parameters' , 'Commands for optimization of the software.')
+    parameters.add_argument("--compressed", help="Compress output files", action="store_true", default=True)


Is there a good reason not to have compressed output as default?
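
One thing worth noting about the line as shown: combining `action="store_true"` with `default=True` yields `True` whether or not the flag is passed, so compression could not be switched off from the command line. A hedged sketch of one possible alternative, assuming Python 3.9+ where `argparse.BooleanOptionalAction` is available; the stand-alone parsers here are illustrative only:

```python
import argparse

# As in the diff: store_true plus default=True gives True in both cases.
current = argparse.ArgumentParser(prog="example")
current.add_argument("--compressed", action="store_true", default=True)

# Alternative (Python 3.9+): generates both --compressed and --no-compressed.
proposed = argparse.ArgumentParser(prog="example")
proposed.add_argument("--compressed", action=argparse.BooleanOptionalAction,
                      default=True, help="Compress output files")

print(current.parse_args([]).compressed)                    # True
print(current.parse_args(["--compressed"]).compressed)      # True
print(proposed.parse_args([]).compressed)                   # True
print(proposed.parse_args(["--no-compressed"]).compressed)  # False
```

This keeps compressed output as the default while still giving users a way to opt out.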

akahles · 2024-01-29T10:56:38Z

immunopepper/ip.py

+
+    subset = parser.add_argument_group('Subset parameters', 'Commands to select a subset of the genes to be processed.')
+    subset.add_argument("--process-chr", nargs='+',help="List of chromosomes to be processed. If not provided all chromosomes are processed. The chromosomes names should be provided in the same format as in FASTA and annotation files. For annotations downloaded from GENCODE, this format is **chrX**, X being the chromosome number.", required=False, default=None)
+    subset.add_argument("--complexity-cap", type=int, help="Maximum edge complexity of the graph to be processed. If not provided all graphs are processed.", required=False, default=None)


Without further description, a user would not know what a suitable threshold is here.

akahles · 2024-01-29T14:05:19Z

immunopepper/ip.py

+    crf.add_argument("--n-samples-lim-cancer", type=int, help="This parameter corresponds to the number of samples threshold in cohort specific filtering. It indicated the minimum number of cancer samples in which one should see an expression higher than `--cohort-expr-support-cancer` in order to consider the kmer as a cancer candidate. Kmers with an expression higher than `--cohort-expr-support-cancer` in at least `--n-samples-lim-cancer` samples will be considered as cancer candidates. For each cancer sample of interest, provided under `--ids_cancer_samples`, the expression threshold `--cohort-expr-support-cancer` will be assessed in the rest of the cohort, excluding the sample of interest.", required=False, default=None)
+    crf.add_argument("--path-cancer-matrix-segm", nargs='+', help="Path to the cancer matrix containing segment expression from samples belonging to a cohort. The matrix will have the following dimensions: [kmers * samples]. When only the junction overlapping cancer kmers are of interest, the user should provide only `--path-cancer-matrix-edge`, and skip the inclusion of this file.If both matrices are provided, junction expression will be chosen in case there is expression information for the same kmer in both matrices. This will be the output 5 of :ref:`build output section <build_out>`", required=False, default=None)
+    crf.add_argument("--path-cancer-matrix-edge", nargs='+', help="Path to the cancer matrix containing junction expression from samples belonging to a cohort. The matrix will have the following dimensions: [kmers * samples]. When only the junction overlapping cancer kmers are of interest, the user should provide only this file, and skip the inclusion of `--path-cancer-matrix-segm`. If both matrices are provided, junction expression will be chosen in case there is expression information for the same kmer in both matrices. This will be the ouput 6 of :ref:`build output section <build_out>`.", required=False, default=None)
+    crf.add_argument("--cancer-support-union", help="Parameter to choose how the sample specific filtering and the cohort specific filtering are combined. By default, they are combined by choosing the common kmers to both filtering steps, i.e. performing an intersection. If this parameter is set to True, the union of both filtering steps will be performed, i.e. the kmers that pass either the sample specific filtering or the cohort specific filtering will be kept.", action="store_true", required=False, default=False)


See the comment on options above: this flag cannot be "set to True", it is either passed or omitted.
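
For readers of the help text, the two combination modes it describes map onto plain set operations. A minimal sketch, assuming each filtering step yields a collection of kmers; `combine_filters` and the example kmers are hypothetical and not taken from the code base:

```python
def combine_filters(sample_kmers, cohort_kmers, union=False):
    """Combine the kmers that pass each filtering step (hypothetical helper)."""
    sample_kmers, cohort_kmers = set(sample_kmers), set(cohort_kmers)
    if union:
        # Flag passed: keep kmers that pass either filter.
        return sample_kmers | cohort_kmers
    # Flag omitted (default): keep only kmers that pass both filters.
    return sample_kmers & cohort_kmers


# Made-up kmers, for illustration only.
sample_pass = {"AAAAAAAAA", "CCCCCCCCC"}
cohort_pass = {"CCCCCCCCC", "GGGGGGGGG"}

print(sorted(combine_filters(sample_pass, cohort_pass)))
print(sorted(combine_filters(sample_pass, cohort_pass, union=True)))
```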

akahles · 2024-01-29T14:12:36Z

immunopepper/ip.py

+    development = parser.add_argument_group('Optional development parameters')
+    development.add_argument("--tot-batches", type=int, help="If selected, the filtering of the background and foreground will be based on hash functions. This parameter will set the total number of batches in which we will divide the foreground and background files to filter, and each of those batches will be assigned a hash value. If `--batch-id` is specified, `--tot-batches` should also be specified.", required=False, default=None)
+    development.add_argument("--batch-id", type=int, help="If selected, the filtering of the background and foreground will be based on hash functions. This parameter will set the batch id of the current batch that is being filtered. The batch id should be an integer between 0 and `--tot-batches`. It shows the specific batch that we want to process, out of the `--tot-batches`. If `--batch-id` is specified, `--tot-batches` should also be specified.", required=False, default=None)
+    development.add_argument("--on-the-fly", help="If set to true, all the filtering steps will be done on the fly, without the creation of intermediate files. The creation of intermediate files would speed up the computations in the case where full or partial reruns are planned. An example of a partial rerun would be when seveal normal and cancer filtering parameters are applied to the same cohort. Choosing not to save intermediate files will trade speed for disk space.", action="store_true", default=False)


typo: seveal --> several
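
Separately, the help texts above state that `--tot-batches` must be given whenever `--batch-id` is. A minimal sketch of enforcing that at parse time, assuming a stand-alone parser; `parse_batch_args` is a hypothetical helper, and the half-open range check is an assumption, since "between 0 and `--tot-batches`" does not say whether the upper bound is included:

```python
import argparse


def parse_batch_args(argv):
    # Hypothetical stand-alone parser; only the two option names come from the diff.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--tot-batches", type=int, default=None)
    parser.add_argument("--batch-id", type=int, default=None)
    args = parser.parse_args(argv)

    # The help text requires --tot-batches whenever --batch-id is given.
    if args.batch_id is not None and args.tot_batches is None:
        parser.error("--batch-id requires --tot-batches")
    # Assumption: valid ids are 0 .. tot_batches - 1 (half-open range).
    if args.batch_id is not None and not 0 <= args.batch_id < args.tot_batches:
        parser.error("--batch-id must satisfy 0 <= id < --tot-batches")
    return args


args = parse_batch_args(["--tot-batches", "4", "--batch-id", "3"])
print(args.tot_batches, args.batch_id)
```

`parser.error` prints the usage message and exits with status 2, so an inconsistent combination fails early instead of during filtering.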

akahles · 2024-01-29T14:14:38Z

immunopepper/ip.py

@@ -168,8 +226,12 @@ def parse_arguments(argv):
            sys.stdout.write("------------------------------ MHCBIND IMMUNOPEPPER USAGE ------------------------------ \n \n ")
            parser_mhcbind.print_help()
            sys.stdout.write("\n------------------------------ MHCTOOLS AVAILABLE COMMAND LINE OPTIONS ------------------------------ \n \n ")
-            parser_mhc = make_mhc_arg_parser(prog="mhctools",description=("Predict MHC ligands from protein sequences"))
+            from mhctools.mhctools.cli.args import make_mhc_arg_parser
+            parser_mhc = make_mhc_arg_parser(prog="mhctools",description=("Predict MHC ligands from protein sequences")) #TODO: uncmment this line


delete comment?

akahles · 2024-01-29T14:15:09Z

immunopepper/ip.py

+                         handlers=handlers,
+                         format="%(asctime)-15s %(levelname)-8s %(message)s")
+
+    #stdout_handler.setFormatter(logging.Formatter("%(asctime)-15s %(levelname)-8s %(message)s"))


remove comments

akahles · 2024-01-29T14:15:44Z

immunopepper/ip.py

    logging.info("Command line"+str(arg))
    if mode == 'build':
+        pass


Why are these `pass` statements here and below?

lthp added 15 commits August 30, 2023 14:01

added the filter cancer usecase for development

771a280

refactor cancerspecif

2497a4a

created a preprocess generic function for preprocess

3f44041

create input to mode function

8abfd51

output path naming function

167ea52

feature: optional removal of normals

d20f027

cleaning

09d1b6d

feature: cancer matrix with all samples pre-processed first

fb357e5

feature: allow the processing of all cancer samples without filtering…

a9b5947

… individual sample. Large scale changes

feature: make restrict the background to the kmers present in the for…

d3ce9df

…eground. #TODO kmers solely in annotation

feature: make restrict the background to the kmers present in the for…

a3958a5

…eground. fix

output path naming function. fix

12913d1

feature: allow the processing of all cancer samples without filtering…

e595397

… individual sample. fixing variable name

feature: allow the processing of all cancer samples without filtering…

a6c9b21

… individual sample. fixing mutation mode

minor mutation mode change

2bc73b9

lthp requested a review from akahles October 11, 2023 13:55

lthp added 6 commits October 26, 2023 11:21

fix indentation

d381d8d

parsers: remove deprecated argument

4af8eb5

replace main parser with readthedoc compatible version

e14a1d4

update readthedocs configuration

dd373d4

correct typo

3b2786e

pepquery wrapper: correct issue in paths declaration

aba7196

akahles approved these changes Jan 29, 2024
