10X scATAC
Here, we provide two vignettes for running bap2 on scATAC-seq data generated on the droplet-based 10X scATAC-seq platform.
Assuming that CellRanger was run with default parameters and with a specified output name of CR_out, we can execute bap2 directly on the output folder with one simple command:
bap2 bam -i CR_out/outs/possorted_bam.bam -o local_CR_bap -r hg19 -c 12 -bt CB -w CR_out/outs/filtered_peak_bc_matrix/barcodes.tsv
All raw files can be found here: https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_pbmc_5k
Raw .bam files (with index) are an essential input:
wget http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-atac/1.1.0/atac_v1_pbmc_5k/atac_v1_pbmc_5k_possorted_bam.bam
samtools index atac_v1_pbmc_5k_possorted_bam.bam
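Because the index is easy to forget, a small pre-flight check can help before starting a long run. The following is a minimal sketch, not part of bap2: the function name check_bam_with_index is hypothetical, and it assumes the index sits next to the BAM under one of the two usual names.

```shell
# Hypothetical helper (not part of bap2): confirm a BAM file and its index exist.
check_bam_with_index() {
    bam=$1
    # The BAM itself must exist and be non-empty.
    if [ ! -s "$bam" ]; then
        echo "missing or empty BAM: $bam" >&2
        return 1
    fi
    # samtools index writes <name>.bam.bai by default; <name>.bai is also common.
    if [ -f "$bam.bai" ] || [ -f "${bam%.bam}.bai" ]; then
        return 0
    fi
    echo "no index found for $bam; run: samtools index $bam" >&2
    return 1
}
```

For the public dataset above, this would be called as check_bam_with_index atac_v1_pbmc_5k_possorted_bam.bam; a non-zero exit status means the BAM or its index is missing.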
Next, pull the barcodes called as "cells" by CellRanger. There are several different ways of pulling these barcodes, including the one shown below:
wget http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_5k/atac_v1_pbmc_5k_filtered_peak_bc_matrix.tar.gz
tar -xzvf atac_v1_pbmc_5k_filtered_peak_bc_matrix.tar.gz
ls filtered_peak_bc_matrix/barcodes.tsv
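Before passing the barcode file to bap2, it can be worth a quick sanity check. The sketch below is illustrative only: the helper names are hypothetical, and the pattern assumes barcodes of the form nucleotide sequence, hyphen, integer suffix (for example, AAACGAAAGACACTTC-1), which is the usual CellRanger convention but should be verified against your own file.

```shell
# Hypothetical helpers (not part of bap2 or CellRanger).

# Count non-blank lines, i.e. the number of barcodes in the file.
count_barcodes() {
    awk 'NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Count non-blank lines whose first field does not look like <sequence>-<number>.
# The expected pattern is an assumption about the barcode format.
count_malformed_barcodes() {
    awk 'NF > 0 && $1 !~ /^[ACGTN]+-[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Only report when the file from the step above is actually present.
if [ -f filtered_peak_bc_matrix/barcodes.tsv ]; then
    echo "barcodes:  $(count_barcodes filtered_peak_bc_matrix/barcodes.tsv)"
    echo "malformed: $(count_malformed_barcodes filtered_peak_bc_matrix/barcodes.tsv)"
fi
```

A non-zero malformed count suggests the file is not the expected one-barcode-per-line list and is worth inspecting before the run.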
Finally, with these inputs in place, bap2 can be executed:
bap2 bam -i atac_v1_pbmc_5k_possorted_bam.bam -o public_pbmcs5k_bap -r hg19 -c 12 -bt CB -w filtered_peak_bc_matrix/barcodes.tsv
Here, -c 12 specifies 12 cores, and -bt CB indicates that the cell barcode tag is "CB" (consistent with the 10X standard). The universe of barcodes considered for merging is restricted to those called "cells" by the original CellRanger knee call (the barcodes.tsv file passed with -w). Additionally, -r specifies the reference genome; update it according to your use case (both analyses here used hg19, which is the default). Use bap2 --help to see all possible command line configurations.
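When running several samples, it may help to keep these settings in one place. The following sketch only prints a command line and does not run bap2; the function print_bap2_cmd is hypothetical, its defaults simply mirror the values used in the examples above, and it uses no flags beyond those already shown.

```shell
# Hypothetical helper: print (not run) a bap2 command built from named settings.
# Arguments: BAM, output directory, barcode file, then optional genome, cores, tag.
print_bap2_cmd() {
    bam=$1
    out=$2
    whitelist=$3
    genome=${4:-hg19}   # reference genome passed to -r
    cores=${5:-12}      # number of cores passed to -c
    tag=${6:-CB}        # cell barcode tag passed to -bt
    printf 'bap2 bam -i %s -o %s -r %s -c %s -bt %s -w %s\n' \
        "$bam" "$out" "$genome" "$cores" "$tag" "$whitelist"
}

# Reproduces the command for the public dataset above.
print_bap2_cmd atac_v1_pbmc_5k_possorted_bam.bam public_pbmcs5k_bap \
    filtered_peak_bc_matrix/barcodes.tsv
```

Printing the command first makes it easy to review or log the exact invocation before committing compute time to it.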
While all output files will hopefully be of some value (see here), one particularly useful file that should drop into other existing workflows for 10X scATAC-seq data is the *.fragments.tsv.gz file, which is compressed with bgzip and indexed with tabix, akin to what comes out of CellRanger. However, these fragments have been merged between the constituent barcodes that were predicted to be barcode multiplets.
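As a quick look at the merged result, fragments can be tallied per barcode. This is a sketch under assumptions: the helper fragments_per_barcode is hypothetical, and it assumes the CellRanger-style tab-separated layout with the barcode in the fourth column (chromosome, start, end, barcode, count). Confirm the layout on your own output first. Since bgzip output is gzip-compatible, plain gzip can read it.

```shell
# Hypothetical helper: tally fragment records per barcode in a fragments file.
# Assumes tab-separated columns with the barcode in column 4.
fragments_per_barcode() {
    gzip -dc "$1" |
        awk -F '\t' '{ c[$4]++ } END { for (b in c) print b "\t" c[b] }' |
        sort -k2,2nr -k1,1
}

# Usage (file name is a placeholder; substitute your own *.fragments.tsv.gz):
# fragments_per_barcode sample.fragments.tsv.gz | head
```

The output has one line per barcode, sorted from most to fewest fragment records.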
Please raise an issue here