Preprocessing reads
If multiple fastq files exist for a single sample, they must first be merged using the --merge option.
The read names in the resulting fastq file are then trimmed after the first whitespace, for compatibility with all downstream tools.
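To illustrate what trimming read names after the first whitespace means, here is a minimal Python sketch. It is not the pipeline's own implementation: the function names `trim_read_name` and `trim_fastq_headers` are hypothetical, and the sketch assumes standard four-line FASTQ records.

```python
def trim_read_name(header: str) -> str:
    """Keep only the part of a FASTQ header before the first whitespace."""
    parts = header.split(maxsplit=1)
    return parts[0] if parts else header


def trim_fastq_headers(lines):
    """Yield FASTQ lines with each header line trimmed.

    Assumes four-line records, so the header is every 4th line
    starting from the first one. Trailing newlines are stripped.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield trim_read_name(line) if i % 4 == 0 else line
```

For example, a header such as `@read1 runid=abc ch=12` would become `@read1`, while the sequence, separator and quality lines are left unchanged.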
Reads can also be optionally trimmed of adapters and/or quality filtered:
- Search for the presence of adapters in sequence reads using Porechop ABI by specifying the --adapter_trimming parameter. Porechop ABI parameters can be specified using --porechop_options '{options} ', making sure you leave a space at the end before the closing quote. Please refer to the Porechop manual.
  To limit the search to known adapters listed in adapter.py, just specify the --adapter_trimming option.
  To search ab initio for adapters on top of known adapters, specify --adapter_trimming --porechop_options '-abi '.
  To limit the search to custom adapters, specify --adapter_trimming --porechop_custom_primers --porechop_options '-ddb ' and list the custom adapters in the text file located under bin/adapters.txt, following the format:
    line 1: Adapter name
    line 2: Start adapter sequence
    line 3: End adapter sequence
    --- repeat for each adapter pair ---
- Perform a quality filtering step using Chopper by specifying the --qual_filt parameter. Chopper parameters can be specified using --chopper_options '{options}'. Please refer to the Chopper manual.
  For instance, to filter out reads shorter than 1000 bp or longer than 20000 bp, and reads with an average Phred quality score below 10, you would specify: --qual_filt --chopper_options '-q 10 -l 1000 --maxlength 20000'.
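The filtering criteria in the Chopper example above (-q 10 -l 1000 --maxlength 20000) can be sketched in Python as follows. This is only an illustration of the logic, not Chopper's code: the function names are hypothetical, and two details are assumptions that may differ from Chopper's actual behaviour, namely that the average quality is computed from per-base error probabilities and that the length bounds are inclusive.

```python
import math


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string.

    Assumption: the average is taken over error probabilities and
    converted back to a Phred score, rather than averaging raw scores.
    """
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(seq: str, quality: str,
                  min_qual: float = 10,
                  min_len: int = 1000,
                  max_len: int = 20000) -> bool:
    """Return True if a read would be kept under the example thresholds.

    Assumption: both length bounds are treated as inclusive.
    """
    if not (min_len <= len(seq) <= max_len):
        return False
    return mean_phred(quality) >= min_qual
```

Under these assumptions, a 1500 bp read with a uniform quality of Q20 would be kept, while a 500 bp read, a 25000 bp read, or a read averaging Q2 would be discarded.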
A zipped copy of the resulting preprocessed and/or quality filtered fastq file will be saved in the preprocessing folder.
If you trim adapters from the raw reads and/or quality filter them, an additional quality control step will be performed and a QC report will be generated, summarising the read counts recovered before and after preprocessing for all samples listed in the index.csv file.