
Output files

Caleb Lareau edited this page Jul 26, 2019 · 3 revisions

Output from running bap2

All output files shown below are in the /final/ folder.

The exact file name will vary based on the value of the --name parameter, but the extension will be the same.

run.implicatedBarcodes.csv.gz

This file provides summary metrics of the Tn5 insertions for each pair of barcodes considered. The two barcodes are shown in the first two columns, followed by the number of Tn5 insertions shared by both (N_both) and the total number observed for each barcode (N_barc1, N_barc2). The modified Jaccard statistic used by bap for merging is shown in jaccard_frag, and the final column (merged) indicates whether the two barcodes were merged based on thresholding of this statistic.

barc1,barc2,N_both,N_barc1,N_barc2,jaccard_frag,merged
tgtttagtccgctcacagctt,actcaataccatgcggttagt,15836,26920,32452,0.36374,TRUE
cgattacccaagctaattggt,ccttaggaacgtaacttgtcc,15397,29754,29596,0.35031,TRUE
ctattcggttgatgtcaagac,ctattcgcatacgctcaagac,13254,28616,23048,0.34507,TRUE
tgtttagtccgctcacagctt,cgattacccaagctaattggt,43,26920,29754,0.00076,FALSE
ctattcggttgatgtcaagac,cgattacccaagctaattggt,36,28616,29754,0.00062,FALSE
tgtttagtccgctcacagctt,taacgccaccggctaactctt,32,26920,29784,0.00056,FALSE
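As a sanity check, the example rows above are consistent with jaccard_frag = N_both / (N_barc1 + N_barc2 - N_both), i.e. N_barc1 and N_barc2 being per-barcode totals. The Python sketch below recomputes the statistic for each row and compares it with the merged flag. It is a minimal sketch under assumptions: the helper names are hypothetical (not part of bap), and the threshold of 0.01 is an arbitrary illustrative value that happens to separate the example rows; the cutoff bap actually applied is not stored in this file.

```python
import csv
import gzip
import io


def jaccard_frag(n_both: int, n_barc1: int, n_barc2: int) -> float:
    """Shared insertions divided by the union of both barcodes' insertions.

    N_barc1 and N_barc2 are treated as per-barcode totals that already
    include the shared insertions, so the union is N_barc1 + N_barc2 - N_both.
    """
    return n_both / (n_barc1 + n_barc2 - n_both)


def check_pairs(handle, threshold: float) -> list:
    """Recompute jaccard_frag for every row of an implicatedBarcodes table.

    `threshold` is a hypothetical cutoff; the value bap used is not in the file.
    """
    results = []
    for row in csv.DictReader(handle):
        j = jaccard_frag(int(row["N_both"]), int(row["N_barc1"]), int(row["N_barc2"]))
        results.append({
            "barcodes": (row["barc1"], row["barc2"]),
            "recomputed": j,
            "reported": float(row["jaccard_frag"]),
            "merged_reported": row["merged"] == "TRUE",
            "merged_by_threshold": j >= threshold,
        })
    return results


def open_pairs(path: str):
    """Open the gzipped CSV in text mode (e.g. run.implicatedBarcodes.csv.gz)."""
    return gzip.open(path, "rt", newline="")


EXAMPLE = """barc1,barc2,N_both,N_barc1,N_barc2,jaccard_frag,merged
tgtttagtccgctcacagctt,actcaataccatgcggttagt,15836,26920,32452,0.36374,TRUE
cgattacccaagctaattggt,ccttaggaacgtaacttgtcc,15397,29754,29596,0.35031,TRUE
ctattcggttgatgtcaagac,ctattcgcatacgctcaagac,13254,28616,23048,0.34507,TRUE
tgtttagtccgctcacagctt,cgattacccaagctaattggt,43,26920,29754,0.00076,FALSE
ctattcggttgatgtcaagac,cgattacccaagctaattggt,36,28616,29754,0.00062,FALSE
tgtttagtccgctcacagctt,taacgccaccggctaactctt,32,26920,29784,0.00056,FALSE
"""

if __name__ == "__main__":
    # With a real run: with open_pairs("run.implicatedBarcodes.csv.gz") as fh: check_pairs(fh, ...)
    for r in check_pairs(io.StringIO(EXAMPLE), threshold=0.01):
        print(r["barcodes"], round(r["recomputed"], 5),
              r["merged_reported"], r["merged_by_threshold"])
```

The recomputed values agree with the reported jaccard_frag column to the five decimal places shown.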

run.barcodeTranslate.tsv

This is a simple two-column, tab-separated file with the original bead barcode (left) and the new barcode (right) assigned by bap. Here, the top two barcodes share the same new barcode, meaning they were a barcode multiplet and were merged. The new barcodes have three components. First, the run part specifies the run name (often the prefix of the BAM file). Second, the BCxxx part is a unique barcode identifier. Third, the _Nxx part specifies how many of the original bead barcodes were merged to form the new droplet-level barcode.

cgcaatcctcatttttctgca	run_BC1_N02
gagctaaggtcgtatctgaac	run_BC1_N02
actcaataccatgcggttagt	run_BC2_N01
taacgccaccggctaactctt	run_BC3_N01
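A minimal Python sketch for working with this file: it groups bead barcodes by their new droplet barcode and splits a droplet barcode into its three components. The function names are hypothetical helpers, and the parsing assumes the run_BCxxx_Nxx pattern shown above, splitting from the right so that underscores inside the run name are preserved.

```python
import csv
import io
from collections import defaultdict


def parse_drop_barcode(drop: str) -> tuple:
    """Split a droplet barcode such as 'run_BC1_N02' into ('run', 1, 2).

    rsplit from the right keeps any underscores inside the run name intact.
    """
    name, bc, n = drop.rsplit("_", 2)
    if not (bc.startswith("BC") and n.startswith("N")):
        raise ValueError(f"unexpected droplet barcode: {drop!r}")
    return name, int(bc[2:]), int(n[1:])


def group_beads(handle) -> dict:
    """Map each droplet barcode to the bead barcodes that were merged into it."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        bead, drop = row[0], row[1]
        groups[drop].append(bead)
    return dict(groups)


EXAMPLE = (
    "cgcaatcctcatttttctgca\trun_BC1_N02\n"
    "gagctaaggtcgtatctgaac\trun_BC1_N02\n"
    "actcaataccatgcggttagt\trun_BC2_N01\n"
    "taacgccaccggctaactctt\trun_BC3_N01\n"
)

if __name__ == "__main__":
    # With a real run: with open("run.barcodeTranslate.tsv", newline="") as fh: group_beads(fh)
    groups = group_beads(io.StringIO(EXAMPLE))
    for drop, beads in groups.items():
        name, bc_id, n_merged = parse_drop_barcode(drop)
        # n_merged (the _Nxx part) should match the number of beads listed
        print(drop, name, bc_id, n_merged, beads)
```

Comparing the number of beads in each group with the _Nxx count is a quick consistency check; in the excerpt above, only run_BC1_N02 is a multiplet.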

run.HQbeads.tsv

A simple file listing the universe of bead barcodes considered for analysis, one per line. This may be a copy of a user-specified file if --barcode-whitelist was used.

actcaataccatgcggttagt
ctattcggttgatgtcaagac
cgcaatcctcatttttctgca
tgtttagtccgctcacagctt

run.fragments.tsv.gz

This file summarizes fragments that have been 1. assigned to merged barcodes by the barcode detection algorithm; 2. deduplicated after barcode merging; 3. filtered for proper pairs, Q30 alignment, and < 2kb insert size (all of these are customizable through the parameters); 4. adjusted for the Tn5 offset. For example:

chr1	713778	713972	run_BC4_N02
chr1	713980	714170	run_BC1_N02
chr1	714125	714217	run_BC4_N02
chr1	762734	762916	run_BC5_N02
chr1	790520	790888	run_BC1_N02

Unlike the 10X format, this file does not record the number of PCR duplicates observed for each fragment (the fifth column of a 10X fragments file); otherwise, the file should be plug-and-play ready for anyone who wants to account for barcode multiplets.
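Because the file has only four tab-separated columns, it can be read with the Python standard library alone. The sketch below counts fragments per droplet barcode and reports the median fragment length. It assumes BED-style half-open coordinates, so that length is end - start; the function names are illustrative and not part of bap.

```python
import gzip
import io
import statistics
from collections import Counter


def parse_fragments(lines):
    """Yield (chrom, start, end, barcode) tuples from 4-column fragment lines."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        yield chrom, int(start), int(end), barcode


def read_fragments(path: str):
    """Stream fragments from a gzipped file such as run.fragments.tsv.gz."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fragments(handle)


def summarize(fragments):
    """Count fragments per droplet barcode and take the median fragment length.

    Length is computed as end - start, assuming BED-style half-open coordinates.
    """
    counts = Counter()
    lengths = []
    for _chrom, start, end, barcode in fragments:
        counts[barcode] += 1
        lengths.append(end - start)
    return counts, statistics.median(lengths)


EXAMPLE = (
    "chr1\t713778\t713972\trun_BC4_N02\n"
    "chr1\t713980\t714170\trun_BC1_N02\n"
    "chr1\t714125\t714217\trun_BC4_N02\n"
    "chr1\t762734\t762916\trun_BC5_N02\n"
    "chr1\t790520\t790888\trun_BC1_N02\n"
)

if __name__ == "__main__":
    # With a real run: summarize(read_fragments("run.fragments.tsv.gz"))
    counts, median_len = summarize(parse_fragments(io.StringIO(EXAMPLE)))
    print(dict(counts), median_len)
```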

run.bap.bam

The .bap.bam file contains the original reads, processed as follows: 1. each read is annotated with its "droplet-level" barcode in a SAM tag, so that reads from merged bead barcodes can be recognized; 2. reads are deduplicated after barcode merging (i.e., duplicates are collapsed at the level of the barcode multiplet rather than each individual bead); 3. reads are filtered for proper pairs, Q30 alignment, and < 2kb insert size (also parameterizable). This BAM file is ready for downstream analyses.

run.QCstats.csv

A fairly lengthy file containing various QC statistics. These should all be comparable (though not identical) to the metrics reported in the 10X ATAC quality control report.

DropBarcode,totalNuclearFrags,uniqueNuclearFrags,totalMitoFrags,uniqueMitoFrags,duplicateProportion,librarySize,meanInsertSize,medianInsertSize,tssProportion,FRIP
run_BC1_N02,107739,40070,390,111,0.628,43818,191.9,184,0.322,0
run_BC2_N01,53042,21211,881,272,0.6,23760,153.4,122,0.536,0
run_BC3_N01,45179,19202,954,305,0.575,22040,179.1,162,0.4653,0

Of note, the FRIP column will be 0 if the user did not specify any input peaks (--peak-file in the original command line execution).

The librarySize column is the estimated total number of unique nuclear molecules in the library, computed from the nuclear fragment counts with the Lander-Waterman equation.
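The librarySize and duplicateProportion columns in the example table can be reproduced from totalNuclearFrags (N) and uniqueNuclearFrags (C): duplicateProportion matches 1 - C/N, and librarySize matches the X that solves the Lander-Waterman relation C/X = 1 - exp(-N/X). The Python sketch below solves that equation by bisection. It is a reconstruction that agrees with the example rows to within one molecule, not bap's own code, and the function names are hypothetical.

```python
import math


def duplicate_proportion(total: int, unique: int) -> float:
    """Fraction of nuclear fragments that were duplicates: 1 - unique / total."""
    return 1.0 - unique / total


def lander_waterman_library_size(total: int, unique: int) -> float:
    """Solve unique / X = 1 - exp(-total / X) for the library size X.

    f(X) = unique / X - 1 + exp(-total / X) is positive at X = unique and
    crosses zero exactly once above it, so bisection on a bracket finds the root.
    """
    if not 0 < unique < total:
        raise ValueError("expected 0 < unique < total")

    def f(x: float) -> float:
        return unique / x - 1.0 + math.exp(-total / x)

    lo, hi = float(unique), 2.0 * unique
    while f(hi) > 0:  # widen the bracket until the sign changes
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Rows from the run.QCstats.csv example: (DropBarcode, totalNuclearFrags, uniqueNuclearFrags)
    for drop, total, unique in [
        ("run_BC1_N02", 107739, 40070),
        ("run_BC2_N01", 53042, 21211),
        ("run_BC3_N01", 45179, 19202),
    ]:
        print(drop, round(duplicate_proportion(total, unique), 3),
              round(lander_waterman_library_size(total, unique)))
```

Only nuclear fragments enter both formulas; including the mitochondrial counts would not reproduce the 0.6 reported for run_BC2_N01.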

When using a species-mixed genome, bap will automatically append additional columns with species-specific quantifications for mouse and human.

Finally, when the user executes with the --one-to-one flag, the original barcode will also be appended to this QC report.

Additional outputs

Additional files in the logs and knee folders can be useful for assessing quality control summaries of the run. These should be intuitive and self-explanatory.