How to extract the UMI info in Illumina read's name into a separate tag #533

TendoLiu · 2019-11-11T15:30:12Z

Hi,
I have been working on UMI collapsing of Illumina DNA-seq data. The FASTQ header looks like the one below. Is there a way to transfer each UMI, like "TATGTNC+NNGAGCA", to a separate tag that could be used by duplicate markers?

@NS500211:808:HW27KAFXY:1:11101:12228:1057:TATGTNC+NNGAGCA 1:N:0:TCCGGAGA

Thanks.
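For reference, a minimal Python sketch of how a header like the one above could be split into the read name, the UMI(s) and the Casava comment. It assumes the UMI, when present, is always the 8th ":"-separated field of the read name and that multiple UMIs are joined with "+", as in this example. ParsedHeader and parse_header are hypothetical names for illustration, not part of ReadTools.

```python
from typing import List, NamedTuple, Optional


class ParsedHeader(NamedTuple):
    read_name: str          # read name without the UMI field
    umis: List[str]         # one entry per UMI segment ('+'-separated)
    comment: Optional[str]  # Casava comment, e.g. '1:N:0:TCCGGAGA'


def parse_header(header: str) -> ParsedHeader:
    """Split '@<7 Illumina fields>:<UMI> <comment>' into its parts.

    Assumption: the UMI, if present, is the 8th ':'-separated field of
    the read name, and multiple UMIs are joined with '+'.
    """
    line = header.rstrip("\n")
    if line.startswith("@"):
        line = line[1:]
    # Everything after the first space is the Casava comment.
    name, _, comment = line.partition(" ")
    fields = name.split(":")
    if len(fields) != 8:
        # No UMI field: keep the name as-is.
        return ParsedHeader(name, [], comment or None)
    return ParsedHeader(":".join(fields[:7]), fields[7].split("+"), comment or None)
```

For the header in the question, this yields the read name NS500211:808:HW27KAFXY:1:11101:12228:1057, the UMIs TATGTNC and NNGAGCA, and the comment 1:N:0:TCCGGAGA.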

magicDGS · 2019-11-14T05:56:30Z

Hello @TendoLiu - the name of your read looks a bit weird to me, as it contains a Casava barcode (1:N:0:TCCGGAGA) and the UMI appended to the read name (TATGTNC+NNGAGCA). Is this a FASTQ or a BAM file?

ReadTools is a bit "picky" with read names, as it only understands two common formats:

Casava: e.g. @NS500211:808:HW27KAFXY:1:11101:12228:1057 1:N:0:TCCGGAGA, where the identified barcode will be TCCGGAGA
Illumina: e.g., @NS500211:808:HW27KAFXY:1:11101:12228:1057#TATGTNC+NNGAGCA, where the identified barcode will be TATGTNC+NNGAGCA. Note that, unlike in your case, the barcode is separated from the read name by # instead of :, and that only one barcode is detected because + is used for concatenation instead of the separator defined in the specs, which is -.
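To make the two formats concrete, here is a hypothetical Python sketch of how a read name could be classified and its barcodes split. It is not ReadTools' implementation; detect_barcodes and its delimiter parameter (defaulting to "-", the separator from the specs) are illustrative assumptions, and the handling of a trailing "/1" mate suffix in the Illumina format is also an assumption.

```python
from typing import List, Optional, Tuple


def detect_barcodes(read_name: str, delimiter: str = "-") -> Tuple[Optional[str], List[str]]:
    """Return (format, barcodes) for a read name.

    Hypothetical sketch: only the Casava and Illumina formats are
    recognised; anything else yields (None, []).
    """
    name = read_name[1:] if read_name.startswith("@") else read_name
    if " " in name:
        # Casava: '<name> <read>:<filtered>:<control>:<barcode>'
        parts = name.split(" ", 1)[1].split(":")
        if len(parts) == 4:
            return "Casava", parts[3].split(delimiter)
    elif "#" in name:
        # Illumina: '<name>#<barcode>', ignoring an optional '/1' or '/2'
        barcode = name.split("#", 1)[1].split("/", 1)[0]
        return "Illumina", barcode.split(delimiter)
    return None, []
```

With the default "-" delimiter, the Illumina example yields a single barcode TATGTNC+NNGAGCA, while delimiter="+" yields TATGTNC and NNGAGCA. For the header in the question, the sketch only reports the Casava barcode TCCGGAGA: the UMI appended with : matches neither format.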

ReadTools can handle only one of the problems you are facing: the barcode separator can be overridden with the Java property barcode_index_delimiter (by providing -Dbarcode_index_delimiter=+ in your case), although the new separator will then also be used in all the output files. Nevertheless, I am not sure whether your use case matches AssignReadGroupByBarcode, as it is designed for sample barcodes (like the one after the space) and not for UMIs (I am not familiar with them, but maybe appending them to the read name with : as a separator is a standard there...).
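As a possible alternative outside ReadTools, the UMI could be moved from the read name into the RX tag, which the SAM optional-fields specification reserves for UMI bases (recommending "-" between multiple UMIs). The following is only a sketch under stated assumptions: it works on plain-text SAM records, expects the UMI as the 8th ":"-separated field of QNAME with "+" between UMIs, and assumes no RX tag is present yet. move_umi_to_rx and the script name move_umi.py are hypothetical, and whether a given duplicate marker reads RX should be checked in its own documentation.

```python
import sys


def move_umi_to_rx(sam_line: str) -> str:
    """Move a UMI appended to QNAME as an 8th ':' field into an RX:Z tag.

    Assumptions: plain-text SAM record, QNAME like
    '<7 Illumina fields>:<UMI>', multiple UMIs joined with '+', and no
    existing RX tag. Header lines and records without such a field are
    returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line  # SAM header line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    qname_parts = fields[0].split(":")
    if len(qname_parts) != 8:
        return sam_line  # no UMI field in the read name
    fields[0] = ":".join(qname_parts[:7])  # read name without the UMI
    # Re-join multiple UMIs with '-' as recommended for RX.
    umi = qname_parts[7].replace("+", "-")
    fields.append("RX:Z:" + umi)
    return "\t".join(fields) + newline


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python move_umi.py <input.sam>   (writes SAM to stdout)
    with open(sys.argv[1]) as sam_in:
        for line in sam_in:
            sys.stdout.write(move_umi_to_rx(line))
```

On Linux it could be wired into a pipeline such as samtools view -h in.bam | python move_umi.py /dev/stdin | samtools view -b -o out.bam -. Both mates of a pair get the same stripped name, so pairing by QNAME is preserved.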

Could you please clarify your use case with this information in mind? Thanks!

magicDGS added the labels Priority: Medium, Status: Pending (In discussion to include in the project backlog) and Type: Question (User/Developer question to be answered) on Nov 14, 2019
