10X scATAC

Caleb Lareau edited this page Oct 2, 2019 · 3 revisions

Running bap2 on 10X single-cell ATAC data

Here, we provide two vignettes for running bap2 on scATAC-seq data generated on the droplet-based 10X scATAC-seq platform.

Use with a new CellRanger output

Assuming that CellRanger was run with default parameters and with a specified output name of CR_out, we can execute bap2 directly on the output folder with one simple command:

bap2 bam -i CR_out/outs/possorted_bam.bam -o local_CR_bap -r hg19 -c 12 -bt CB -w CR_out/outs/filtered_peak_bc_matrix/barcodes.tsv 
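Before launching the run, it can help to confirm that the files named in the command are actually present. The sketch below is not part of bap2: check_cr_inputs is a hypothetical helper, and the index file name possorted_bam.bam.bai is an assumption about the CellRanger output layout, so adjust it if your folder differs.

```shell
# Hypothetical pre-flight helper (not part of bap2 or CellRanger).
# Reports whether each expected input exists and is non-empty.
check_cr_inputs() {
  outs="$1/outs"
  status=0
  # Assumed file names; the .bai name in particular is an assumption.
  for f in possorted_bam.bam possorted_bam.bam.bai filtered_peak_bc_matrix/barcodes.tsv; do
    if [ -s "$outs/$f" ]; then
      echo "ok       $outs/$f"
    else
      echo "missing  $outs/$f"
      status=1
    fi
  done
  # Non-zero return value if anything was missing.
  return "$status"
}

# Usage: check_cr_inputs CR_out
```

The function returns a non-zero status when any file is missing, so it can gate the bap2 call in a script.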

Look at existing public data

All raw files can be found here: https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_v1_pbmc_5k

Download and setup data

The raw .bam file (with its index) is an essential input:

wget http://s3-us-west-2.amazonaws.com/10x.files/samples/cell-atac/1.1.0/atac_v1_pbmc_5k/atac_v1_pbmc_5k_possorted_bam.bam
samtools index atac_v1_pbmc_5k_possorted_bam.bam

Next, pull the barcodes called as "cells" by CellRanger. There are several different ways of pulling these barcodes, including the one shown below:

wget http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_5k/atac_v1_pbmc_5k_filtered_peak_bc_matrix.tar.gz
tar -xzvf atac_v1_pbmc_5k_filtered_peak_bc_matrix.tar.gz
ls filtered_peak_bc_matrix/barcodes.tsv 
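A quick sanity check on the barcode file can catch a truncated download or an unexpected format before bap2 is run. This is a minimal sketch: check_barcodes is a hypothetical helper, and the pattern (16 nucleotides followed by a "-<digit>" suffix) is an assumption about how CellRanger writes barcodes in this dataset.

```shell
# Hypothetical helper: summarize a barcode whitelist (one barcode per line).
check_barcodes() {
  file="$1"
  # Total number of lines in the file.
  total=$(wc -l < "$file" | tr -d ' ')
  # Lines that do NOT match the assumed 10X pattern, e.g. AAACGAAAGACACTTC-1.
  # grep exits non-zero when the count is 0, hence the "|| true".
  malformed=$(grep -Evc '^[ACGT]{16}-[0-9]+$' "$file" || true)
  echo "total=${total} malformed=${malformed}"
}

# Usage: check_barcodes filtered_peak_bc_matrix/barcodes.tsv
```

A non-zero malformed count does not necessarily mean the file is unusable; it only signals that the assumed pattern does not describe every line.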

Execute bap2

Finally, with these inputs in place, bap2 can be executed:

bap2 bam -i atac_v1_pbmc_5k_possorted_bam.bam -o public_pbmcs5k_bap -r hg19 -c 12 -bt CB -w filtered_peak_bc_matrix/barcodes.tsv 

Here, -c 12 specifies 12 cores, and -bt CB indicates that the cell barcode tag is "CB" (consistent with the 10X standard). The file passed with -w restricts the universe of barcodes considered for merging to those called "cells" by the original CellRanger knee call. Additionally, -r specifies the reference genome; update it according to your use case (both analyses here used hg19, which is the default). Use bap2 --help to see all possible command line configurations.
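Since -bt CB relies on the reads carrying a CB tag, one way to verify this is to count tagged records in a sample of the alignments. The sketch below assumes samtools is installed for the usage line; count_tagged is a hypothetical helper that only reads SAM text on standard input and checks for string-typed (":Z:") tags in the optional fields.

```shell
# Hypothetical helper: read SAM records on stdin and report how many carry
# a given string-typed tag (e.g. CB) among the optional fields (column 12+).
count_tagged() {
  awk -F '\t' -v tag="$1" '
    BEGIN { prefix = tag ":Z:" }
    /^@/  { next }                      # skip SAM header lines
    {
      n++
      for (i = 12; i <= NF; i++) {
        if (index($i, prefix) == 1) { hit++; break }
      }
    }
    END { printf "records=%d tagged=%d\n", n, hit }
  '
}

# Usage (requires samtools; inspects only the first 10000 alignments):
#   samtools view atac_v1_pbmc_5k_possorted_bam.bam | head -n 10000 | count_tagged CB
```

Some reads may legitimately lack the tag, so a tagged count below the record count is not by itself an error; a count of zero suggests that a different tag should be passed to -bt.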

Examine output files

While all output files will hopefully be of some value (see here), one particularly useful file is the *.fragments.tsv.gz file, which should work as a drop-in replacement in existing workflows for 10X scATAC-seq data. It is compressed with bgzip and indexed with tabix, akin to the fragments file that comes out of CellRanger. However, these fragments have been merged across the constituent barcodes that were predicted to be barcode multiplets.
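To get a first look at the merged fragments, the file can be streamed and tallied per barcode. The sketch below is an illustration only: top_barcodes is a hypothetical helper, it assumes a tab-delimited layout with the barcode in the fourth column (as in CellRanger fragments files), and the exact location of the file inside the bap2 output folder is not assumed.

```shell
# Hypothetical helper: read fragment records on stdin and print the N barcodes
# with the most fragments as "barcode<TAB>count" (assumes barcode in column 4).
top_barcodes() {
  n="${1:-5}"
  grep -v '^#' |                 # drop comment lines, if any
    cut -f 4 |                   # keep the barcode column
    sort | uniq -c |             # count fragments per barcode
    sort -k1,1nr -k2,2 |         # most fragments first, ties by barcode
    head -n "$n" |
    awk '{ print $2 "\t" $1 }'
}

# Usage (locate the file first, then stream it):
#   find public_pbmcs5k_bap -name '*.fragments.tsv.gz'
#   gzip -dc path/to/sample.fragments.tsv.gz | top_barcodes 10
```

Because the file is tabix-indexed, a region query such as tabix path/to/sample.fragments.tsv.gz chr1:1-1000000 can be piped into the same helper to restrict the tally to one locus.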