diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000..1291169a --- /dev/null +++ b/LICENSE @@ -0,0 +1,355 @@ +(C) University of the Witwatersrand, Johannesburg, 2016-2018 on behalf of the H3ABioNet Consortium + +This software is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence. + + +You are free to: + + Share — copy and redistribute the material in any medium or format + Adapt — remix, transform, and build upon the material + for any purpose, even commercially. + +This licence is acceptable for Free Cultural Works. + +The licensor cannot revoke these freedoms as long as you follow the license +terms. + + Attribution — You must give appropriate credit, provide a link to the + license, and indicate if changes were made. You may do so in any reasonable + manner, but not in any way that suggests the licensor endorses you or your + use. + + No additional restrictions — You may not apply legal terms or technological + measures that legally restrict others from doing anything the license + permits. + +Notices: + + You do not have to comply with the license for elements of the material in + the public domain or where your use is permitted by an applicable exception + or limitation. No warranties are given. The license may not give you all of + the permissions necessary for your intended use. For example, other rights + such as publicity, privacy, or moral rights may limit how you use the + material. + + +In detail + +Creative Commons Attribution 4.0 International Public License + +By exercising the Licensed Rights (defined below), You accept and agree to be +bound by the terms and conditions of this Creative Commons Attribution 4.0 +International Public Licence ("Public Licence"). 
To the extent this Public +Licence may be interpreted as a contract, You are granted the Licensed Rights in +consideration of Your acceptance of these terms and conditions, and the Licensor +grants You such rights in consideration of benefits the Licensor receives from +making the Licensed Material available under these terms and conditions. + +Section 1 – Definitions. + + Adapted Material means material subject to Copyright and Similar Rights that + is derived from or based upon the Licensed Material and in which the + Licensed Material is translated, altered, arranged, transformed, or + otherwise modified in a manner requiring permission under the Copyright and + Similar Rights held by the Licensor. For purposes of this Public License, + where the Licensed Material is a musical work, performance, or sound + recording, Adapted Material is always produced where the Licensed Material + is synched in timed relation with a moving image. + + + Adapter's License means the license You apply to Your Copyright and Similar + Rights in Your contributions to Adapted Material in accordance with the + terms and conditions of this Public License. + + Copyright and Similar Rights means copyright and/or similar rights closely + related to copyright including, without limitation, performance, broadcast, + sound recording, and Sui Generis Database Rights, without regard to how the + rights are labeled or categorized. For purposes of this Public License, the + rights specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + Effective Technological Measures means those measures that, in the absence + of proper authority, may not be circumvented under laws fulfilling + obligations under Article 11 of the WIPO Copyright Treaty adopted on + December 20, 1996, and/or similar international agreements. 
+ + Exceptions and Limitations means fair use, fair dealing, and/or any other + exception or limitation to Copyright and Similar Rights that applies to Your + use of the Licensed Material. + + Licensed Material means the artistic or literary work, database, or other + material to which the Licensor applied this Public License. + + Licensed Rights means the rights granted to You subject to the terms and + conditions of this Public License, which are limited to all Copyright and + Similar Rights that apply to Your use of the Licensed Material and that the + Licensor has authority to license. + + Licensor means the individual(s) or entity(ies) granting rights under this + Public License. + + Share means to provide material to the public by any means or process that + requires permission under the Licensed Rights, such as reproduction, public + display, public performance, distribution, dissemination, communication, or + importation, and to make material available to the public including in ways + that members of the public may access the material from a place and at a + time individually chosen by them. + + Sui Generis Database Rights means rights other than copyright resulting from + Directive 96/9/EC of the European Parliament and of the Council of 11 March + 1996 on the legal protection of databases, as amended and/or succeeded, as + well as other essentially equivalent rights anywhere in the world. + + "You" means the individual or entity exercising the Licensed Rights under + this Public License. Your has a corresponding meaning. + +Section 2 – Scope. + + License grant. + + Subject to the terms and conditions of this Public License, the Licensor + hereby grants You a worldwide, royalty-free, non-sublicensable, + non-exclusive, irrevocable license to exercise the Licensed Rights in + the Licensed Material to: + + reproduce and Share the Licensed Material, in whole or in part; and + produce, reproduce, and Share Adapted Material. + + + Exceptions and Limitations. 
For the avoidance of doubt, where Exceptions and + Limitations apply to Your use, this Public License does not apply, and You + do not need to comply with its terms and conditions. + + Term. The term of this Public License is specified in Section 6(a). Media and + formats; technical modifications allowed. The Licensor authorizes You to + exercise the Licensed Rights in all media and formats whether now known + or hereafter created, and to make technical modifications necessary to + do so. The Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications necessary to + exercise the Licensed Rights, including technical modifications + necessary to circumvent Effective Technological Measures. For purposes + of this Public License, simply making modifications authorized by this + Section 2(a)(4) never produces Adapted Material. + + Downstream recipients. + + Offer from the Licensor – Licensed Material. Every recipient of the + Licensed Material automatically receives an offer from the Licensor + to exercise the Licensed Rights under the terms and conditions of + this Public License. + + No downstream restrictions. You may not offer or impose any + additional or different terms or conditions on, or apply any + Effective Technological Measures to, the Licensed Material if doing + so restricts exercise of the Licensed Rights by any recipient of the + Licensed Material. + + No endorsement. Nothing in this Public License constitutes or may be + construed as permission to assert or imply that You are, or that Your + use of the Licensed Material is, connected with, or sponsored, endorsed, + or granted official status by, the Licensor or others designated to + receive attribution as provided in Section 3(a)(1)(A)(i). + + + + Other rights. 
+ + Moral rights, such as the right of integrity, are not licensed under + this Public License, nor are publicity, privacy, and/or other similar + personality rights; however, to the extent possible, the Licensor waives + and/or agrees not to assert any such rights held by the Licensor to the + limited extent necessary to allow You to exercise the Licensed Rights, + but not otherwise. + + Patent and trademark rights are not licensed under this Public License. + + To the extent possible, the Licensor waives any right to collect + royalties from You for the exercise of the Licensed Rights, whether + directly or through a collecting society under any voluntary or waivable + statutory or compulsory licensing scheme. In all other cases the + Licensor expressly reserves any right to collect such royalties. + + + +Section 3 – License Conditions. + + + +Your exercise of the Licensed Rights is expressly made subject to the following +conditions. + + + + Attribution. + + + + If You Share the Licensed Material (including in modified form), You + must: + + retain the following if it is supplied by the Licensor with the + Licensed Material: + + identification of the creator(s) of the Licensed Material and + any others designated to receive attribution, in any reasonable + manner requested by the Licensor (including by pseudonym if + designated); + + a copyright notice; + + a notice that refers to this Public License; + + a notice that refers to the disclaimer of warranties; + + a URI or hyperlink to the Licensed Material to the extent + reasonably practicable; + + indicate if You modified the Licensed Material and retain an + indication of any previous modifications; and + + indicate the Licensed Material is licensed under this Public + License, and include the text of, or the URI or hyperlink to, this + Public License. 
+ + You may satisfy the conditions in Section 3(a)(1) in any reasonable + manner based on the medium, means, and context in which You Share the + Licensed Material. For example, it may be reasonable to satisfy the + conditions by providing a URI or hyperlink to a resource that includes + the required information. + + If requested by the Licensor, You must remove any of the information + required by Section 3(a)(1)(A) to the extent reasonably practicable. + + If You Share Adapted Material You produce, the Adapter's License You + apply must not prevent recipients of the Adapted Material from complying + with this Public License. + + + +Section 4 – Sui Generis Database Rights. + + + +Where the Licensed Rights include Sui Generis Database Rights that apply to Your +use of the Licensed Material: + + + + for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, + reuse, reproduce, and Share all or a substantial portion of the contents of + the database; + + if You include all or a substantial portion of the database contents in a + database in which You have Sui Generis Database Rights, then the database in + which You have Sui Generis Database Rights (but not its individual contents) + is Adapted Material; and + + You must comply with the conditions in Section 3(a) if You Share all or a + substantial portion of the contents of the database. + + + +For the avoidance of doubt, this Section 4 supplements and does not replace Your +obligations under this Public License where the Licensed Rights include other +Copyright and Similar Rights. + + + +Section 5 – Disclaimer of Warranties and Limitation of Liability. + + + + Unless otherwise separately undertaken by the Licensor, to the extent + possible, the Licensor offers the Licensed Material as-is and as-available, + and makes no representations or warranties of any kind concerning the + Licensed Material, whether express, implied, statutory, or other. 
This + includes, without limitation, warranties of title, merchantability, fitness + for a particular purpose, non-infringement, absence of latent or other + defects, accuracy, or the presence or absence of errors, whether or not + known or discoverable. Where disclaimers of warranties are not allowed in + full or in part, this disclaimer may not apply to You. + + To the extent possible, in no event will the Licensor be liable to You on + any legal theory (including, without limitation, negligence) or otherwise + for any direct, special, indirect, incidental, consequential, punitive, + exemplary, or other losses, costs, expenses, or damages arising out of this + Public License or use of the Licensed Material, even if the Licensor has + been advised of the possibility of such losses, costs, expenses, or + damages. Where a limitation of liability is not allowed in full or in part, + this limitation may not apply to You. + + + + The disclaimer of warranties and limitation of liability provided above + shall be interpreted in a manner that, to the extent possible, most closely + approximates an absolute disclaimer and waiver of all liability. + + + +Section 6 – Term and Termination. + + + + This Public License applies for the term of the Copyright and Similar Rights + licensed here. However, if You fail to comply with this Public License, then + Your rights under this Public License terminate automatically. + + + + Where Your right to use the Licensed Material has terminated under Section + 6(a), it reinstates: + + automatically as of the date the violation is cured, provided it is + cured within 30 days of Your discovery of the violation; or + + upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any right the + Licensor may have to seek remedies for Your violations of this Public + License. 
+ + For the avoidance of doubt, the Licensor may also offer the Licensed + Material under separate terms or conditions or stop distributing the + Licensed Material at any time; however, doing so will not terminate this + Public License. + + Sections 1, 5, 6, 7, and 8 survive termination of this Public License. + + + +Section 7 – Other Terms and Conditions. + + + + The Licensor shall not be bound by any additional or different terms or + conditions communicated by You unless expressly agreed. + + Any arrangements, understandings, or agreements regarding the Licensed + Material not stated herein are separate from and independent of the terms + and conditions of this Public License. + + + +Section 8 – Interpretation. + + + + For the avoidance of doubt, this Public License does not, and shall not be + interpreted to, reduce, limit, restrict, or impose conditions on any use of + the Licensed Material that could lawfully be made without permission under + this Public License. + + To the extent possible, if any provision of this Public License is deemed + unenforceable, it shall be automatically reformed to the minimum extent + necessary to make it enforceable. If the provision cannot be reformed, it + shall be severed from this Public License without affecting the + enforceability of the remaining terms and conditions. + + No term or condition of this Public License will be waived and no failure to + comply consented to unless expressly agreed to by the Licensor. + + Nothing in this Public License constitutes or may be interpreted as a + limitation upon, or waiver of, any privileges and immunities that apply to + the Licensor or You, including from the legal processes of any jurisdiction + or authority. 
+ diff --git a/README.md b/README.md index da1050f7..11ea7392 100644 --- a/README.md +++ b/README.md @@ -22,21 +22,19 @@ _Please ignore the Wiki in this version which refers to version 1_ ## Brief introduction -A short video overview of the pipeline can be found at http://www.bioinf.wits.ac.za/h3a/h3agwas.mp4 - +In addition to this README we have the following material available +* A short video overview of the pipeline can be found at http://www.bioinf.wits.ac.za/gwas/h3agwas.mp4 +* A handout from a lecture can be found at http://www.bioinf.wits.ac.za/gwas/gwas-comp-handout.pdf ### Restrictions This version has been run on real data sets and works. However, not all cases have been thoroughly tested. In particular * it is not robust when X chromosome data is not available * the reporting assumes you want to do batch/site analysis. If you don't the code works but the report may look a bit odd with some figures repeated. -* we haven't tested fully with Singularity + The previous version 1 stable branch was commit bfd8c5a (https://github.com/h3abionet/h3agwas/commit/bfd8c5a51ef85481e5590b8dfb3d46b5dd0cc77a) -There is one feature of the original workflow that has been omitted. Version 1 supported parallel GWAS anaysis of different data files in one Nextflow run. This has been removed. Although, not unuseful, this feature complicated the implementation and made expansion more difficulkt and also this capacity can be simulated easily at the operating system level. - -The previous version has dependancies on Perl and R, which have been removed. ## Outline of documentation @@ -47,8 +45,9 @@ The previous version has dependancies on Perl and R, which have been removed. 5. The QC pipeline: `plink-qc.nf` 6. A simple association testing pipeline: `plink-assoc.nf` 7. Converting Illumina genotyping reports to PLINK: `topbottom.nf` -8. Advanced options: Docker, PBS, Amazon EC2 -9. Auxiliary Programs +8. Advanced options: Docker, PBS, Singularity, Amazon EC2 +9. 
Dealing with errors +10. Auxiliary Programs # 1. Features @@ -261,6 +260,10 @@ Then fill in the details in the config that are required for your run. These are ## 4.3 Using the Excel spreadsheet template +**We plan on removing this -- it doesn't look like many people use this feature and it is very hard to keep in sync with the development of the workflow. If you think we are wrong and this is a useful feature please let us know by registering this as an issue.** + +_Use of this is deprecated_ + For many users it may be convenient to use the Excel spreadsheet (config.xlsx and a read-only template file config.xlsx.template). This can be used just as an _aide-memoire_, but we also have an auxiliary program that converts the Excel spreadsheet into a config file. The program _config-gen/dist/config-gen.jar_ takes the spreadsheet and produces a config file. The spreadsheet has the following columns @@ -500,7 +503,10 @@ e.g. `params.output = "cvd-rawcalls"` * `chipdescription`: this is a standard file produced by Illumina for your chip which contains not only the chromosome/coordinate of each SNP but also the genomic position (measured in centimorgans). If you don't have this -- give the manifest file. All will work except your bim files will not contain genonomic positoin -* `samplesheet`: This is Excel spreadsheet that Illumina provides which details each perfson in the study for whom you have genotyping results. If you don't have it, you can set this variable to 0 or the empty string, in which case the output PLINK fam file will have unknown values for sex and phenotype. Alternatively, ifyou don't have it, you can make your own. There are three columns that are important: "Institute Sample Label", "Manifest Gender" and "Batch Comment". These must be there. The _label_ is the ID of the person. In the current workflow this ID is used for both the FID and IID. If you have a family study you may need to manually change the fam file. 
+* `samplesheet`: This is the Excel spreadsheet or CSV (comma-separated only) that Illumina or a genotyping centre provides which details each person in the study for whom you have genotyping results. If you don't have it, you can set this variable to 0 or the empty string, in which case the output PLINK fam file will have unknown values for sex and phenotype. Alternatively, if you don't have it, you can make your own. + +There are three columns that are important: "Institute Sample Label", "Manifest Sex" and "Batch Comment". These must be there. The _label_ is the ID of the person. In the current workflow this ID is used for both the FID and IID. If you have a family study you may need to manually change the fam file. + Please note that *we expect all entries in the sample IDs etc to be alphanumeric 0-9, Latin letters (NO accents!), underscore, space, hyphen*. The code may break otherwise. @@ -509,6 +515,19 @@ the Illumina IDs in the sample ID are typically a long string some of the compo For example, suppose the ID as found in the Illumina input data is `WG0680781-DNA_A02_ABCDE`, if you use ".*_(.+)" as the idpat, then the FID IID used would be ABCDE ABCDE. If you used "(\\w+)_DNA_(\\w+)_" then the FID IIS used would be "WG0680781 A02". Note how we need to escape the backslash twice. + +Unfortunately, in our experience genotyping centres use different formats, and even the same centre may change the column labels of its report. Using the `sheet_columns` parameter you can make adjustments. + +* `params.sheet_columns`: this should be a file name. The file should explain what the column labels in your sample sheet are. The format is shown in the example below, where the default values are given (if you are happy with all of them you don't need the `sheet_columns` parameter -- if you are happy with some of them only put the ones you want to change). 
Here we are saying that the _sex_ as provided by the manifest is found in a column called "Manifest Sex", the sample is found in a column "Institute Sample Label" and so on. The first four are required by the workflow. If you don't have batch information, you can define `batch` as 0. + +``` +sex=Manifest Sex +sample_label=Institute Sample Label +plate=Sample Plate +well=Well +batch=Batch Comment +``` + * `output_align`. This can be one of three values: _dbsnp_, _ref_, and _db2ref_. dnsnp and ref assume that the input is in TOP/BOT format. If dbsnp, the output will be aligned to the dbSNP report, if "ref", the output will be aligned to a given reference strand. Many of the SNPs will be flipped (e.g. an A/C SNP will become G/T; and A/T SNP will become T/A). _db2ref_ assumes the input is in FORWARD format and aligns to to the given reference genome. * `strandreport`: This is an Illumina-style strand report. It is not needed if you choose "ref" above, but it is needed for the others. @@ -521,10 +540,17 @@ A reference file suitable for the H3A chip can be found here http://www.bioinf.w * `samplesize`: This was included mainly for development purposes but _perhaps_ might be helpful to some users. This allows you sample only the first _n_ people in each genotype report. This allows you to extract out a small subset of the data for testing purposes. The default is 0, which means that *all* individuals will be generated. + +### Advanced features for sample handling + * `mask`: This is a file of sample IDs that you want excluded from your data. These should be IDs given in the _Institute Sample Label_ of the sample sheet. The file should contain at least one column, possibly with other columns white-space delimited. Only the first column is used the other columns are ignored. * `replicates`: This is a file of sample IDs that are biological replicates. 
You will often include biological replicates for genotyping -- the label as given in the _Institute Sample Label_ column will of course be different, but once you have extracted out the sample ID using the _idpat_ field above, all the replicates for the same individual will then have the same sample id. For samples that have replicates you should choose one of the samples to be the canonical one and then identify the others as being the replicates with the labels +* `newpat`: This is experimental and should only be used with care. Suppose, completely hypothetically, there's a sample mix-up. The person you called "X3RTY" is actually "UYT0AV" who is actually "R2D2" and so on. You can fix the sample-sheet but the genotyping calls still have the same (wrong) values. If you have chosen your _Institute Sample Label_ so that it contains both the ID and the plate and well then our scripts can help you. If not, good luck to you. + * set _idpat_ to a regular expression that gives the plate ID as the FID and the well as the IID. This will give you the id uniquely determined by the plate and the well. + * fix your sample sheet -- just fix the _Institute Sample Label_ field. Choose your _newpat_ as a regular expression that extracts out the correct ID from this. Our workflow will use the plate and well in your sample sheet to match the plate/well from the genotype calling phase to the correct ID. (If you look in the working directory of the fixFam process the .command.out file will show you all the matches.) + ## Output @@ -618,64 +644,21 @@ nextflow run plink-qc.nf -profile dockerSwarm ## 8.5 Singularity -The workflows run on Singularity thought this is currently experimental and we haven't tried and tested all options. We don't have first class support for Singularity yet, so you will have do some PT but it's not too bad. - -### Get the Singularity images - -You need to make get the Singularity images for the workflow you want. 
There are two options - -* running `docker2singularity` (https://github.com/singularityware/docker2singularity) - -This is a good option for people who have a computer with docker running (e.g., their desktop) and will then move the Singularity container to another computer which doesn't run cluster. - -An example run would be to create a directory _singularity_ somewhere and then run the _docker2singularity_ workflow. The singularity image will be put in the specified directory. For example, I did: - -``` - docker run -v /var/run/docker.sock:/var/run/docker.sock \ - -v /Users/scott/singularity:/output --privileged -t --rm \ - singularityware/docker2singularity quay.io/h3abionet_org/py3plink - docker run -v /var/run/docker.sock:/var/run/docker.sock \ - -v /Users/scott/singularity:/output --privileged -t --rm \ - singularityware/docker2singularity quay.io/h3abionet_org/h3agwas-texlive -``` - -* Using `singularity pull` - -This is easier, but the images are bigger - -``` -singularity pull --size 1880 docker://quay.io/h3abionet_org/py3plink -singularity pull --size 1880 docker://quay.io/h3abionet_org/h3agwas-texlive - -``` - -### Copy the images - -Move the images to the system you want to run the workflow on. If you're on a cluster, then this must be on a system-wide file system - -### Edit the nextflow.config file - -You need to edit this part of the _singularity_ stanza - -``` - sg_py3Image = "/home/scott/py3plink.img" - sg_latexImage = "/home/scott/h3agwas-texlive.img" - process.executor = 'pbs' - process.queue = 'batch' -``` +Our workflows should now run easily with Singularity. -The two image variables should be set to where you have put your singularity images. The `process.executor` variabel should be set to `local` if you want to run the workflow on the local computer, and to `pbs` if on a cluster using PBS. In the latter case, you should also set the queue variable appropriatelly. 
+`nextflow run plink-qc.nf -profile singularity` +or -## Run the workflow +`nextflow run plink-qc.nf -profile singularityPBS` -`nextflow run plink-qc.nf --profile singularity` +By default the user's ${HOME}/.singularity will be used as the cache for Singularity images. If you want to use something else, change the `singularity.cacheDir` parameter in the config file. ## 8.5 Other container services -We hope to support Singularity soon. We are unlikely to support udocker unless Nextflow does. See this link for a discussion https://www.nextflow.io/blog/2016/more-fun-containers-hpc.html +We are unlikely to support udocker unless Nextflow does. See this link for a discussion https://www.nextflow.io/blog/2016/more-fun-containers-hpc.html ## 8.6 Running on Amazon EC2 @@ -801,9 +784,74 @@ Note there are two uses of `-c`. The positions of these arguments are crucial. T The _scott.aws_ file is not shared or put under git control. The _nextflow.config_ and _run10.config_ files can be archived, put under git control and so on because you _want_ to share and archive this information with o thers. +# 9. Dealing with errors + +One problem with our current workflow is that error messages can be obscure. Errors can be caused by +* bugs in our code +* you doing something odd +There are two related problems. One is that when a Nextflow script fails for some reason, Nextflow prints out in _great_ detail what went wrong. The other is that we don't always gracefully catch mistakes that the user makes. -# 9. Auxiliary Programs +First, don't panic. Take a breath and read through the error message to see if you can find a sensible error message there. 
+ +A typical error message looks something like this + +``` +Command exit status: + 1 + +Command output: + (empty) + +Command error: + Traceback (most recent call last): + File ".command.sh", line 577, in <module> + bfrm, btext = getBatchAnalysis() + File ".command.sh", line 550, in getBatchAnalysis + result = miss_vals(ifrm,bfrm,args.batch_col,args.sexcheck_report) + File ".command.sh", line 188, in miss_vals + g = pd.merge(pfrm,ifrm,left_index=True,right_index=True,how='inner').groupby(pheno_col) + File "/usr/local/python36/lib/python3.6/site-packages/pandas/core/generic.py", line 5162, in groupby + **kwargs) + File "/usr/local/python36/lib/python3.6/site-packages/pandas/core/groupby.py", line 1848, in groupby + return klass(obj, by, **kwds) + File "/usr/local/python36/lib/python3.6/site-packages/pandas/core/groupby.py", line 516, in __init__ + mutated=self.mutated) + File "/usr/local/python36/lib/python3.6/site-packages/pandas/core/groupby.py", line 2934, in _get_grouper + raise KeyError(gpr) + +Column 'batches' unknown + +Work dir: + /project/h3abionet/h3agwas/test/work/cf/335b6d21ad75841e1e806178933d3d + +Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume` + + -- Check '.nextflow.log' file for details +WARN: Killing pending tasks (1) + +``` + +Buried in this is an error message that might help (did you say there was a column _batches_ in the manifest?) If you're comfortable, you can change directory to the specified directory and explore. There you'll find +* Any input files for the process that failed +* Any output files that might have been created +* The script that was executed can be found in `.command.sh` +* Output and error can be found as `.command.out` and `.command.err` + +If you spot the error, you can re-run the workflow (from the original directory), appending `-resume`. Nextflow will re-run your workflow as needed -- any steps that finished successfully will not need to be re-run. 
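The files listed above can also be collected with a small helper script. The following is only a minimal sketch and is not part of the workflow: the function name `summarise_work_dir` and its `tail` parameter are hypothetical, and the only thing it assumes about Nextflow is the `.command.sh`, `.command.out` and `.command.err` file names mentioned above.

```python
from pathlib import Path

# File names as described in the README text; the helper itself is hypothetical.
DIAGNOSTIC_FILES = (".command.sh", ".command.out", ".command.err")


def summarise_work_dir(work_dir, tail=5):
    """Return the last `tail` lines of each diagnostic file present in work_dir.

    Files that do not exist are simply left out of the result.
    """
    summary = {}
    for name in DIAGNOSTIC_FILES:
        path = Path(work_dir) / name
        if path.is_file():
            # errors="replace" keeps the helper usable on oddly encoded logs
            lines = path.read_text(errors="replace").splitlines()
            summary[name] = lines[-tail:]
    return summary
```

Calling `summarise_work_dir(...)` with the directory printed under `Work dir:` and looking at the `.command.err` entry first is usually the quickest way to find the line that matters, such as the `KeyError` in the example above.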
+ +If you are still stuck you can ask for help at two places + + +* H3ABioNet Help desk --- https://www.h3abionet.org/support + + +* On GitHub -- you need a GitHub account to register an issue + + https://github.com/h3abionet/h3agwas/issues + + +# 10. Auxiliary Programs These are in the aux directory @@ -841,7 +889,9 @@ Scott Hazelhurst, Lerato E. Magosi, Shaun Aron, Rob Clucas, Eugene de Beste, Abo We thank Harry Noyes from the University of Liverpool and Ayton Meintjes from UCT who both spent significant effort being testers of the pipleine. ### License -h3agwas offered under the MIT license. See LICENSE.txt. +h3agwas is offered under the +Creative Commons Attribution 4.0 International (CC BY 4.0) licence. + ### Download diff --git a/aux/sheetDups.py b/aux/sheetDups.py new file mode 100644 index 00000000..aab31046 --- /dev/null +++ b/aux/sheetDups.py @@ -0,0 +1,207 @@ +#!/usr/bin/env python3 + +import argparse +import openpyxl +import sys +import re +import shutil +import os +import pandas as pd +# Expects four arguments +# -- xlsx file with the data +# -- the original fam file +# -- the output name (just the base -- e.g. chip, not chip.fam) +# -- abbrev? 
If "0" then the IDs are used as provided in the samplesheet +# If "1", we remove everything up to the last underscore in the name + + + +def parseArguments(): + parser=argparse.ArgumentParser() + parser.add_argument('--batch-col', dest="batch_col", type=str, metavar='batch',\ + help="batch column"), + parser.add_argument('samplesheet', type=str, metavar='samplesheet',nargs='+'), + parser.add_argument('--bad-batch', dest="bad_batch", default=False) + parser.add_argument('--phe', dest="phe", type=str, metavar='phe',\ + help="phe"), + parser.add_argument('--output', dest="output", type=str, metavar='fname',\ + help="output base"), + parser.add_argument('--idpat', dest="idpat", type=str, metavar='idpat',\ + help="abbreviate IDs"), + parser.add_argument('--out_idpat', dest="out_idpat", type=str, metavar='out_idpat',\ + help="abbreviate IDs"), + args = parser.parse_args() + return args + + +# If called with no parameters, we assume using Nextflow's template mechanism and the parameters +# are substituted in with fixed names +if len(sys.argv)<=1: + sys.argv = ["sheet2fam.py","$samplesheet","$fam", "$batch_col", "$output","$idpat"] + + +# we avoid backslashes +TAB=chr(9) +EOL=chr(10) + +def getHeading(allrows): + heading = list(map(lambda x: x.value, allrows.__next__())) + col_id=heading.index("Institute Sample Label") + call = heading.index("Call_Rate") + if "Institute Plate Label" in heading: + label = "Institute Plate Label" + else: + label = "Sample Plate" + plate = heading.index(label) + well = heading.index("Well") + # Now we get the sex -- used to be labelled "Gender" but we asked Illumina to change but old sheets will + # have this + if "Manifest Gender" in heading: + sex_colname = "Manifest Gender" + sex_geno = "Genotype Gender" + elif "Manifest Sex" in heading: + sex_colname = "Manifest Sex" + sex_geno = "Genotyped Sex" + else: + sys.exit("Can't find manifest sex column in sample sheet <%s>"%args.samplesheet) + col_sex=heading.index(sex_colname) + sex_geno_col = 
heading.index(sex_geno) + if args.batch_col in ["0",0,False,"false",None,""]: + col_batch=-1 + else: + col_batch=heading.index(args.batch_col) + return (col_id, col_sex, sex_geno_col, col_batch,call,plate,well) + +def sex_code(x): + if x == "Male": + return "1" + elif x == "Female": + return "2" + else: + return "0" + + +def parseSheet(allrows,indivs,sofar,problems): + [col_id, col_sex, col_geno_sex, col_batch,call,plate,well] = getHeading(allrows) + batch="-9" + for row in allrows: + sample_id = raw_id = row[col_id].value + if sample_id in exclusions : continue + if col_batch>=0: + batch = row[col_batch].value + m=re.search("Batch (.+)",batch) + if m: + batch = m.group(1) + m = re.search(args.out_idpat,sample_id) + if m: + outid = m.group(1) + else: + sys.exit("Out ID PAT does not match on "+sample_id) + if args.idpat not in [0, "0", False, ""]: + m = re.search(args.idpat,sample_id) + if m: + sample_id=m.groups() + if len(sample_id)==1: sample_id=sample_id+sample_id + else: + print("Sample ID <%s> cannot be abbrev"%sample_id) + else: + sample_id=(sample_id,sample_id) + data = [sample_id,raw_id, row[col_sex].value, row[col_geno_sex].value,\ + batch, row[call].value, row[plate].value, row[well].value] + if outid in sofar: + if outid in problems: + problems[outid] = problems[outid]+[data] + else: + problems[outid] = [sofar[outid],data] + sofar[outid]=data + indivs[outid]=1 + + + + + + +def createReplicates(fname,fd,problems): + g = open("%s.rep"%fname,"w") + h = open("%s.err"%fname,"w") + m = open("%s.miss"%fname,"w") + allf = open("%s.all"%fname,"w") + for k in sorted(problems.keys()): + try: + fam_sex = str(fd.loc[k]['sex']) + except KeyError: + fam_sex = "0" + best=-1 + rate=0 + for i, v in enumerate(problems[k]): + sex_ok = (fam_sex==sex_code(v[3])) or (sex_code(v[3])==sex_code(v[2])) and ("0"==fam_sex) + if (v[5]>rate) and sex_ok : + rate=v[5] + best=i + dup=edup=1 + ok=False + errs=[] + allf.write('%s '%k) + allf.write(' '.join(map(str,sofar[k][1:]))+EOL) + for i, v 
in enumerate(problems[k]): + if i==best: + ok=True + continue + sex_ok = (fam_sex==sex_code(v[3])) or (sex_code(v[3])==sex_code(v[2])) and ("0"==fam_sex) + if sex_ok: + g.write("%s%s%s%s%s%s%d%s"%(v[0][0],TAB,v[0][1],TAB,v[1],TAB,dup,EOL)) + allf.write("%s_replicate_%d "%(k,dup)+" ".join(map(str,v[1:]))+EOL) + dup=dup+1 + else: + errs.append("%s%s%s%s%s%s%d%s"%(v[0][0],TAB,v[0][1],TAB,v[1],TAB,edup,EOL)) + edup=edup+1 + if not ok: + m.write(errs[0]) + del errs[0] + h.writelines(errs) + for x in exclusions: + h.write("%s%s%s%s"%(x,TAB,x,EOL)) + g.close() + h.close() + m.close() + allf.close() + + + +args = parseArguments() + + +indivs = {} +problems = {} +sofar = {} + +if args.bad_batch: + f = open(args.bad_batch) + exclusions = set(map(lambda x:x.strip(), f.readlines())) +else: + exclusions=[] + +for name in args.samplesheet: + xlsxf = openpyxl.load_workbook(name) + sheet_names = xlsxf.sheetnames + allrows = xlsxf[sheet_names[0]].rows + parseSheet(allrows,indivs,sofar,problems) +xlsxf.close() + +if args.phe: + fd = pd.read_csv(args.phe,delim_whitespace=True,index_col="FID") +else: + fd = pd.DataFrame.from_dict(indivs,orient='index',columns=["sex"]) + + + +if len(problems)>0: + print("ID,Sample Label,Manifest Sex,Genotype Sex,Batch,Call Rate,Plate,Well") + for k in sorted(problems.keys()): + #print(problems[k]) + pass + if args.output: + createReplicates(args.output,fd,problems) + + + diff --git a/bin/checkRef.py b/bin/checkRef.py index 98d6f648..10eddc53 100755 --- a/bin/checkRef.py +++ b/bin/checkRef.py @@ -1,4 +1,8 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. +# See the "LICENSE" file for details # we allow the reference file to be in multiple format # 1. 
Simple: diff --git a/bin/extractPheno.py b/bin/extractPheno.py index 82072f91..500d6ad0 100755 --- a/bin/extractPheno.py +++ b/bin/extractPheno.py @@ -1,4 +1,8 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. +# See the "LICENSE" file for details import sys import pandas as pd diff --git a/bin/fill_in_bim.py b/bin/fill_in_bim.py index e649d71b..a967dbc1 100755 --- a/bin/fill_in_bim.py +++ b/bin/fill_in_bim.py @@ -1,5 +1,9 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. +# See the "LICENSE" file for details import pandas as pd import argparse diff --git a/bin/plinkDraw.py b/bin/plinkDraw.py index 0e66fb52..970400ad 100755 --- a/bin/plinkDraw.py +++ b/bin/plinkDraw.py @@ -1,5 +1,10 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. +# See the "LICENSE" file for details + import pandas as pd import sys import numpy as np diff --git a/bin/qc1logextract.py b/bin/qc1logextract.py index ee0ca3ca..76d117fa 100755 --- a/bin/qc1logextract.py +++ b/bin/qc1logextract.py @@ -1,4 +1,8 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. 
+# See the "LICENSE" file for details from __future__ import print_function diff --git a/bin/sheet2fam.py b/bin/sheet2fam.py index d37c60f9..ce9b547c 100755 --- a/bin/sheet2fam.py +++ b/bin/sheet2fam.py @@ -1,4 +1,8 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. +# See the "LICENSE" file for details import argparse import openpyxl @@ -42,9 +46,9 @@ def parseArguments(): TAB=chr(9) EOL=chr(10) +legal_columns = ['Institute Sample Label','Sample Plate','Well','Manifest Sex',"Batch Comment"] +column_index = ['sample_label','plate','well','sex','batch'] -legal_columns = ['sex_column_name','Institute Sample Label','Sample Plate','Well',0] -column_index = ['sex','sample_label','plate','well','batch'] def getSheetColumnMaps(fname): column = dict(zip(column_index,legal_columns)) @@ -67,7 +71,7 @@ def getSheetColumnMaps(fname): def extractCol(heading,col): try: - if ".xls" in args.samplesheet == "excel": + if ".xls" in args.samplesheet: result = heading.index(column[col]) else: result = column[col] @@ -77,7 +81,7 @@ def extractCol(heading,col): def getHeading(allrows): # refactor to use pandas instead of openpyxl - if ".xls" in args.samplesheet == "excel": + if ".xls" in args.samplesheet: heading = list(map(lambda x: x.value, allrows.__next__())) else: heading = allrows @@ -131,18 +135,20 @@ def parseSheet(allrows): problems = {} indivs = {} batch="-9" - if ".xls" in args.samplesheet: - def getVal(row,col): - return row[col].value.replace(" ","") - else: - def getVal(row,col): - return row[col].replace(" ","") + def getVal(row,col): + try: + if ".xls" in args.samplesheet: + return row[col].value.replace(" ","") + else: + return row[col].replace(" ","") + except KeyError as e: + print(EOL+"<%s> is not a column of the sample sheet"%col+EOL) for row in allrows: raw_id = getVal(row,col_id) (fid,iid)= 
getID(args.idpat, raw_id) if (fid,iid) in masks: continue - if args.newpat not in null_values and (fid != getVal(row[col_plate]) or iid != getVal(row[col_well])): - sys.exit("Unhappy about this row ",raw_id,fid,iid,getVal(row[col_plate]),getVal(row[col_well])) + if args.newpat not in null_values and (fid != getVal(row,col_plate) or iid != getVal(row,col_well)): + sys.exit("Unhappy about this row %s %s %s %s %s"%(raw_id,fid,iid,getVal(row,col_plate),getVal(row,col_well))) (real_fid, real_iid) = getID(args.newpat, raw_id) if (fid,iid) in replicates: real_fid = real_fid + "_replicate_" + replicates[(fid,iid)] @@ -152,12 +158,14 @@ def getVal(row,col): else: problems[real_fid] = sofar[real_fid]+","+raw_id sofar[real_fid]=raw_id - sample_sex = getVal(row[col_sex]) - if col_batch>0: - batch = getVal(row[col_batch]) + sample_sex = getVal(row,col_sex) + if col_batch not in null_values: + batch = getVal(row,col_batch) m=re.search("Batch (.+)",batch) if m: batch = m.group(1) + else: + batch=batch.replace(" ","") indivs[(fid,iid)] = [real_fid,real_iid,sample_sex,batch] if len(problems)>0: print(EOL+EOL+"==============================================="+EOL+EOL) @@ -176,7 +184,10 @@ def produceFam(indivs,problems,origfam): for sample_id in origfam: try: (fid,iid) = getID(args.idpat, sample_id) + ofid,oiid = fid,iid [fid,real_id,sample_sex,batch] = indivs[(fid,iid)] + if (ofid,oiid) != (fid,real_id): + print("<%s,%s> ---> <%s,%s>"%(ofid,oiid,fid,real_id)) if real_id in problems: print("The ID <%s> with fid, iid, real_id <%s> <%s> <%s> is a duplicate"%(sample_id,fid,iid,real_id)) sys.exit(124) diff --git a/bin/topbot2plink.py b/bin/topbot2plink.py index d2bc3fba..b1ad7784 100755 --- a/bin/topbot2plink.py +++ b/bin/topbot2plink.py @@ -8,6 +8,10 @@ # - the base name of the PLINK output files # The output l +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International 
Licence. +# See the "LICENSE" file for details from __future__ import print_function diff --git a/nextflow.config b/nextflow.config index 44d2963a..51e30a5e 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,8 +1,8 @@ -py3Image = "h3abionet_org/py3plink" -gemmaImage="h3abionet_org/h3agwas-gemma" -latexImage="h3abionet_org/h3agwas-texlive" +py3Image = "quay.io/h3abionet_org/py3plink" +gemmaImage="quay.io/h3abionet_org/h3agwas-gemma" +latexImage="quay.io/h3abionet_org/h3agwas-texlive" swarmPort = '2376' queue = 'batch' @@ -120,7 +120,7 @@ profiles { pbsDocker { process.executor = 'pbs' - container = py3Image + process.container = py3Image process.executor = 'local' process.$produceReports.container =latexImage process.$computeTest.container = "$gemmaImage" @@ -136,14 +136,13 @@ profiles { // Execute pipeline with Docker locally docker { process.executor = 'local' - process.container = py3Image - process.$produceReports.container =latexImage - process.$computeTest.container = "$gemmaImage" - process.$doReport.container =latexImage + //process.$produceReports.container =latexImage + //process.$computeTest.container = "$gemmaImage" + //process.$doReport.container =latexImage docker.remove = true docker.runOptions = '--rm' - docker.registry = 'quay.io' + //docker.registry = 'quay.io' docker.enabled = true docker.temp = 'auto' docker.fixOwnership= true @@ -185,24 +184,41 @@ profiles { docker.engineOptions = "-H :$swarmPort" } + singularity.cacheDir = "${HOME}/.singularity" singularity { + singularity.autoMounts = true + singularity.enabled = true + process.executor = 'local' + process.queue = queue - sg_py3Image = "/home/scott/py3plink.img" - sg_latexImage = "/home/scott/h3agwas-texlive.img" - + } - enabled = true - process.executor = 'pbs' + singularityPBS { + singularity.autoMounts = true + singularity.enabled = true + process.executor = 'local' process.queue = queue - container = sg_py3Image - process.$produceReports.container = sg_latexImage - } } + + +process { 
+ container = py3Image + + withLabel:latex { + container = latexImage + } + + withLabel: gemma { + container = gemmaImage + } + +} + timeline { enabled=true file = "nextflow_reports/timeline.html" diff --git a/plink-assoc.nf b/plink-assoc.nf index 4bf07b75..555a4a00 100755 --- a/plink-assoc.nf +++ b/plink-assoc.nf @@ -5,6 +5,7 @@ * * * Scott Hazelhurst + * Jean-Tristan Brandenburg * Shaun Aron * Rob Clucas * Eugene de Beste @@ -13,6 +14,8 @@ * On behalf of the H3ABionet Consortium * 2015-2018 * + *(C) University of the Witwatersrand, Johannesburg, 2016-2018 on behalf of the H3ABioNet Consortium + *This is licensed under the Creative Commons Attribution 4.0 International Licence. See the "LICENSE" file for details * * Description : Nextflow pipeline for Wits GWAS. * @@ -528,6 +531,7 @@ else report_ch = report_ch.mix(report_pca_ch) process doReport { + label 'latex' input: file(reports) from report_ch.toList() publishDir params.output_dir diff --git a/plink-qc.nf b/plink-qc.nf index 77e86be0..d41051b4 100755 --- a/plink-qc.nf +++ b/plink-qc.nf @@ -18,6 +18,8 @@ * * Description : Nextflow pipeline for Wits GWAS. * + *(C) University of the Witwatersrand, Johannesburg, 2016-2018 on behalf of the H3ABioNet Consortium + *This is licensed under the Creative Commons Attribution 4.0 International Licence. 
See the "LICENSE" file for details */ //---- General definitions --------------------------------------------------// @@ -88,13 +90,13 @@ f_lo_male = params.f_lo_male f_hi_female = params.f_hi_female remove_on_bp = params.remove_on_bp -allowed_params= ["AMI","accessKey","batch","batch_col","bootStorageSize","case_control","case_control_col", "chipdescription", "cut_het_high","cut_get_low","cut_maf","cut_mind","cut_geno","cut_hwe","f_hi_female","f_lo_male","cut_diff_miss","cut_het_low", "help","input_dir","input_pat","instanceType","manifest", "maxInstances", "max_plink_cores","high_ld_regions_fname","other_mem_req","output", "output_align", "output_dir","phenotype","pheno_col","pi_hat", "plink_mem_req","region","reference","samplesheet", "scripts","secretKey","sexinfo_available", "sharedStorageMount","strandreport","work_dir"] +allowed_params= ["AMI","accessKey","batch","batch_col","bootStorageSize","case_control","case_control_col", "chipdescription", "cut_het_high","cut_get_low","cut_maf","cut_mind","cut_geno","cut_hwe","f_hi_female","f_lo_male","cut_diff_miss","cut_het_low", "help","input_dir","input_pat","instanceType","manifest", "maxInstances", "max_plink_cores","high_ld_regions_fname","other_mem_req","output", "output_align", "output_dir","phenotype","pheno_col","pi_hat", "plink_mem_req","region","reference","samplesheet", "scripts","secretKey","sexinfo_available", "sharedStorageMount","strandreport","work_dir","max_forks","big_time","super_pi_hat","samplesize","idpat","newpat","access-key","secret-key","instance-type","boot-storage-size","max-instances","shared-storage-mount","gemma_num_cores","remove_on_bp","queue"] params.each { parm -> if (! 
allowed_params.contains(parm.key)) { - // println "Check $parm"; - } + println "Check $parm ************** is it a valid parameter? Are you using one rather than two - signs, or vice versa?"; + } } if (params.help) { @@ -864,6 +866,7 @@ repnames = ["dups","cleaned","misshet","mafpdf","snpmiss","indmisspdf","failedse process produceReports { + label 'latex' input: set file(orig), file (dupf) from report["dups"] set file(cbed), file(cbim), file(cfam), file(ilog) from report["cleaned"] diff --git a/templates/batchReport.py b/templates/batchReport.py index bac06b42..120ad739 100644 --- a/templates/batchReport.py +++ b/templates/batchReport.py @@ -500,7 +500,7 @@ def dumpMissingSexTable(fname, ifrm,sxAnalysis,pfrm,bfrm): g=open(fname,"w") g.write(TAB.join(["FID","IID",args.batch_col,'F_MISS',args.pheno_col])+TAB+TAB.join(map(str,sxAnalysis.columns))+EOL) for i, row in ifrm.iterrows(): - output = TAB.join(map(str, [*i,bfrm.loc[i][args.batch_col],"%5.3f"%row['F_MISS'],pfrm.loc[i]]))+\ + output = TAB.join(map(str, [*i,bfrm.loc[i][args.batch_col].values[0],"%5.3f"%row['F_MISS'],'C',pfrm.loc[i]]))+\ TAB+TAB.join(map(xstr,sxAnalysis.loc[i]))+EOL g.write(output) g.close() diff --git a/templates/make_assoc_report.py b/templates/make_assoc_report.py index f275ef97..a8bcfab2 100755 --- a/templates/make_assoc_report.py +++ b/templates/make_assoc_report.py @@ -1,5 +1,10 @@ #!/usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. 
+# See the "LICENSE" file for details + import glob import sys import os diff --git a/templates/missHetPlot.py b/templates/missHetPlot.py index efc57870..8173f53b 100644 --- a/templates/missHetPlot.py +++ b/templates/missHetPlot.py @@ -1,5 +1,9 @@ #!//usr/bin/env python3 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. +# See the "LICENSE" file for details import matplotlib matplotlib.use('Agg') diff --git a/templates/qcreport.py b/templates/qcreport.py index 58fac7bc..8ecec9e6 100644 --- a/templates/qcreport.py +++ b/templates/qcreport.py @@ -2,12 +2,12 @@ # Scott Hazelhurst, 2016 # Creates a PDF report for QC - -# Tested under both Python 2.7 and 3.5.2 # -# Scott Hazelhurst on behalf of the H3ABioinformatics Network Consortium -# December 2016 -# (c) Released under GPL v.2 +# (c) University of the Witwatersrand, Johannesburg on behalf of the H3ABioinformatics Network Consortium +# 2016-2018 +# Licensed under the Creative Commons Attribution 4.0 International Licence. 
+# See the "LICENSE" file for details +# from __future__ import print_function @@ -383,7 +383,11 @@ def getImages(images): images =images.replace("[","").replace("]","").replace(",","").split() result = "Table "+chr(92)+"ref{table:docker}"+chr(10)+chr(92)+"begin{table}"+chr(92)+"begin{tabular}{ll}"+chr(92)+"textbf{Nextflow process} &" + chr(92)+"textbf{Docker Image}"+chr(92)+chr(92)+chr(92)+"hline"+chr(10) for img in images: - (proc,dimg)=img.split(":") + dets = img.split(":",1) + if len(dets)==1: + (proc,dimg)=("default",dets[0]) + else: + (proc,dimg)=dets result = result + \ proc + "&" + chr(92) + "url{%s}"%dimg+\ chr(92)+chr(92) diff --git a/topbottom.nf b/topbottom.nf index 636b7748..af251147 100644 --- a/topbottom.nf +++ b/topbottom.nf @@ -1,5 +1,8 @@ -/* (C) H3ABionet - GPL +/* + +(C) University of the Witwatersrand, Johannesburg, 2016-2018 on behalf of the H3ABioNet Consortium + + This is licensed under the Creative Commons Attribution 4.0 International Licence. See the "LICENSE" file for details Scott Hazelhust, 2017-2018 */ @@ -55,7 +58,7 @@ strand_src_ch = condChannel(params.strandreport,"strn") manifest_ch = Channel.create() manifest1_ch = Channel.create() -manifest_src_ch.separate(manifest_ch,manifest1_ch) +manifest_src_ch.separate(manifest_ch,manifest1_ch) { a-> [a,a] } strand_ch = Channel.create() strand1_ch = Channel.create() @@ -135,7 +138,7 @@ def gChrom= { x -> file(fam) from fam_ch.toList() output: set file("raw.bed"), file("raw.fam"), file("raw.log") into plink_src - set file("rawraw.bim") into fill_in_bim_ch + file("rawraw.bim") into fill_in_bim_ch script: """ ls *.bed | sort > beds @@ -212,8 +215,8 @@ def gChrom= { x -> publishDir params.output_dir, pattern: "*.{bed,bim,log,badsnps}", \ overwrite:true, mode:'copy' output: - set file("*.{bed,bim,log,badsnps}") into aligned_ch - file ("aligned.fam") into fix_fam_ch + file("*.{bed,bim,log,badsnps}") into aligned_ch + file ("aligned.fam") into fix_fam_ch script: base = bed.baseName 
refBase = ref.baseName @@ -242,13 +245,13 @@ def gChrom= { x -> if (null_values.contains(params.samplesheet)) { process fixFam{ input: - file(fam) from fix_fam_ch - publishDir params.output_dir, pattern: "${output}.fam", \ + file(fam) from fix_fam_ch + publishDir params.output_dir, pattern: "${output}.fam", \ overwrite:true, mode:'copy' - output: - set file("${output}.fam") into fixedfam_ch - script: - "cp $fam ${output}.fam" + output: + file("${output}.fam") into fixedfam_ch + script: + "cp $fam ${output}.fam" } } else { @@ -274,7 +277,7 @@ if (null_values.contains(params.samplesheet)) { publishDir params.output_dir, pattern: "${output}.fam", \ overwrite:true, mode:'copy' output: - set file("${output}.fam") into fixedfam_ch + file("${output}.fam") into fixedfam_ch script: idpat = params.idpat if (null_values.contains(params.replicates))