Output files
All output files shown below are in the /final/ folder.
The exact file name will vary based on the value of the --name parameter, but the extension will be the same.
This file provides summary metrics of the Tn5 insertions across barcode pairs. The first two columns give the two barcodes, followed by the number of Tn5 insertions shared by both barcodes (N_both) and the number of insertions observed for each barcode (N_barc1, N_barc2). The modified Jaccard statistic that bap uses for merging comes next (jaccard_frag), and the final column reports whether the two barcodes were merged based on thresholding of that statistic.
barc1,barc2,N_both,N_barc1,N_barc2,jaccard_frag,merged
tgtttagtccgctcacagctt,actcaataccatgcggttagt,15836,26920,32452,0.36374,TRUE
cgattacccaagctaattggt,ccttaggaacgtaacttgtcc,15397,29754,29596,0.35031,TRUE
ctattcggttgatgtcaagac,ctattcgcatacgctcaagac,13254,28616,23048,0.34507,TRUE
tgtttagtccgctcacagctt,cgattacccaagctaattggt,43,26920,29754,0.00076,FALSE
ctattcggttgatgtcaagac,cgattacccaagctaattggt,36,28616,29754,0.00062,FALSE
tgtttagtccgctcacagctt,taacgccaccggctaactctt,32,26920,29784,0.00056,FALSE
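The jaccard_frag values in the example rows above are consistent with an ordinary Jaccard index computed from the three count columns, N_both / (N_barc1 + N_barc2 - N_both). The sketch below recomputes the column for a few pasted rows. Treating this as the formula is an assumption inferred from the example values only; it is not a statement of how bap computes its modified statistic internally, and the helper name `jaccard_from_counts` is hypothetical.

```python
import csv
import io

# Three rows copied from the example table above.
EXAMPLE = """barc1,barc2,N_both,N_barc1,N_barc2,jaccard_frag,merged
tgtttagtccgctcacagctt,actcaataccatgcggttagt,15836,26920,32452,0.36374,TRUE
ctattcggttgatgtcaagac,ctattcgcatacgctcaagac,13254,28616,23048,0.34507,TRUE
tgtttagtccgctcacagctt,cgattacccaagctaattggt,43,26920,29754,0.00076,FALSE
"""


def jaccard_from_counts(n_both, n_1, n_2):
    """Shared insertions divided by the union of insertions (assumed formula)."""
    union = n_1 + n_2 - n_both
    return n_both / union if union > 0 else 0.0


rows = list(csv.DictReader(io.StringIO(EXAMPLE)))
recomputed = [
    jaccard_from_counts(int(r["N_both"]), int(r["N_barc1"]), int(r["N_barc2"]))
    for r in rows
]

# Print the reported and recomputed statistic side by side.
for r, j in zip(rows, recomputed):
    print(r["barc1"], r["barc2"], r["jaccard_frag"], round(j, 5), r["merged"])
```

To use this on a real run, replace `io.StringIO(EXAMPLE)` with an open handle on the corresponding file in /final/.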
This is a simple two-column file that lists the old barcode (left) and the new barcode (right) assigned by the bap run. In the example below, the first two barcodes have the same "after" barcode, meaning they were barcode multiplets that were merged. The new barcodes have three components. First, the run part specifies the name (often the prefix of the bam file). Second, the BCxxx part is a unique barcode identifier. Third, the _Nxx part specifies how many of the original bead barcodes were merged when determining the new droplet-level barcode.
cgcaatcctcatttttctgca run_BC1_N02
gagctaaggtcgtatctgaac run_BC1_N02
actcaataccatgcggttagt run_BC2_N01
taacgccaccggctaactctt run_BC3_N01
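As a minimal sketch of how this file can be used, the snippet below groups bead barcodes by their droplet-level barcode and reads the merged count out of the _Nxx suffix. The function names `parse_translate` and `merged_count` are hypothetical, and splitting on any whitespace is an assumption, since the delimiter is not stated here.

```python
import re
from collections import defaultdict

# Rows copied from the example above.
EXAMPLE = """cgcaatcctcatttttctgca run_BC1_N02
gagctaaggtcgtatctgaac run_BC1_N02
actcaataccatgcggttagt run_BC2_N01
taacgccaccggctaactctt run_BC3_N01
"""


def parse_translate(text):
    """Map each droplet-level barcode to the list of bead barcodes merged into it."""
    droplets = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()  # tolerate tabs or spaces (assumed delimiter)
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        bead, droplet = fields
        droplets[droplet].append(bead)
    return dict(droplets)


def merged_count(droplet_barcode):
    """Return the integer in the trailing _Nxx suffix, or None if it is absent."""
    match = re.search(r"_N(\d+)$", droplet_barcode)
    return int(match.group(1)) if match else None


droplets = parse_translate(EXAMPLE)
for droplet, beads in droplets.items():
    print(droplet, merged_count(droplet), beads)
```

In the example rows, the number of bead barcodes listed for each droplet barcode agrees with its _Nxx suffix, which makes for a quick sanity check on a real file.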
A simple file that reports the universe of barcodes considered for analysis. This may be a copy of a user-specified file if --barcode-whitelist was used.
actcaataccatgcggttagt
ctattcggttgatgtcaagac
cgcaatcctcatttttctgca
tgtttagtccgctcacagctt
This file summarizes fragments that are 1. merged by the barcode detection algorithm; 2. deduplicated after barcode merging; 3. filtered for proper pairs, Q30 alignment, and < 2kb insert size (all of these are customizable through the run parameters); 4. adjusted for the Tn5 offset. As an example:
chr1 713778 713972 run_BC4_N02
chr1 713980 714170 run_BC1_N02
chr1 714125 714217 run_BC4_N02
chr1 762734 762916 run_BC5_N02
chr1 790520 790888 run_BC1_N02
Unlike the 10X standard, we don't retain the number of PCR duplicates observed, but otherwise, the file should be plug-and-play ready if one wants to account for barcode multiplets.
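A short sketch of reading this fragments format: it tallies fragments per droplet barcode and computes fragment lengths from the start and end columns. The function name `fragments_per_barcode` is hypothetical, and whitespace splitting is an assumption about the delimiter.

```python
from collections import Counter

# Rows copied from the example above: chromosome, start, end, droplet barcode.
EXAMPLE = """chr1 713778 713972 run_BC4_N02
chr1 713980 714170 run_BC1_N02
chr1 714125 714217 run_BC4_N02
chr1 762734 762916 run_BC5_N02
chr1 790520 790888 run_BC1_N02
"""


def fragments_per_barcode(text):
    """Return (fragment count per barcode, list of fragment lengths)."""
    counts = Counter()
    lengths = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _chrom, start, end, barcode = fields[:4]
        counts[barcode] += 1
        lengths.append(int(end) - int(start))
    return counts, lengths


counts, lengths = fragments_per_barcode(EXAMPLE)
print(counts.most_common())
print(lengths)
```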
The .bap.bam file contains the original reads after the following processing: 1. duplicated reads are acknowledged by adding the "droplet-level" SAM tag; 2. reads are deduplicated after barcode merging (i.e. 1 read per barcode multiplet); 3. reads are filtered for proper pairs, Q30 alignment, and < 2kb insert size (also parametrizable). This bam is ready for downstream analyses.
A fairly lengthy file that contains various QC stats. These should all be comparable to (though not exactly the same as) the metrics reported in the 10X ATAC quality control report.
DropBarcode,totalNuclearFrags,uniqueNuclearFrags,totalMitoFrags,uniqueMitoFrags,duplicateProportion,librarySize,meanInsertSize,medianInsertSize,tssProportion,FRIP
run_BC1_N02,107739,40070,390,111,0.628,43818,191.9,184,0.322,0
run_BC2_N01,53042,21211,881,272,0.6,23760,153.4,122,0.536,0
run_BC3_N01,45179,19202,954,305,0.575,22040,179.1,162,0.4653,0
Of note, FRIP will return 0 if the user did not specify any input peaks (--peak-file in the original command line execution).
The librarySize column is the estimated "total" number of unique nuclear molecules present in the library, computed using the Lander-Waterman equation.
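One common form of the Lander-Waterman relation is C = X(1 - exp(-N/X)), where N is the total number of fragments, C is the number of unique fragments, and X is the library size to solve for. The sketch below solves it by bisection. Using totalNuclearFrags as N and uniqueNuclearFrags as C is an assumption; with those inputs the result lands close to the librarySize values in the example rows above, but this is not taken from bap's source. The function name `estimate_library_size` is hypothetical.

```python
import math


def estimate_library_size(total, unique):
    """Solve unique = X * (1 - exp(-total / X)) for X by bisection."""
    if unique <= 0 or unique >= total:
        raise ValueError("need 0 < unique < total to estimate a library size")

    def expected_unique(x):
        # Expected number of distinct molecules seen after `total` draws
        # from a library of size x.
        return x * (1.0 - math.exp(-total / x))

    # expected_unique(x) increases with x and stays below x, so x = unique
    # is a valid lower bound; double the upper bound until it brackets.
    lo = float(unique)
    hi = float(unique)
    while expected_unique(hi) < unique:
        hi *= 2.0

    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_unique(mid) < unique:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Inputs taken from the first example row: totalNuclearFrags, uniqueNuclearFrags.
estimate = estimate_library_size(107739, 40070)
print(round(estimate))
```

Bisection is used here only because it needs no external dependencies; any one-dimensional root finder would serve equally well.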
When using a species-mixed genome, bap will automatically append additional columns that report the species-specific quantifications for mouse and human.
Finally, when the user executes with the --one-to-one flag, the original barcode will also be appended to this QC report.
Additional files in logs and knee can be useful for assessing quality control summaries of the run. These should be intuitive and direct.
Please raise an issue here