
Reference Genome

Caleb Lareau edited this page Nov 6, 2019 · 2 revisions

Integrated reference genomes

To see if your reference genome is currently supported, execute the following command:

bap2 support

You can select any of these built-in reference genomes with the --reference-genome or -r flag.

Custom reference genomes

If your reference genome is not among those listed, you can manually supply the four inputs required for bap processing (three files and one string) with the following flags:

1) Bedtools genome file

This is a file with two columns: 1) the contig name and 2) the size of the contig. Specify this file using the -bg or --bedtools-genome flag. Note: this file should be equivalent to what is supplied when running many standard bedtools commands with the -g flag.

The contigs listed in this file will be parsed to compute overlaps in order to identify multiplets.
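One common way to obtain a two-column genome file is to trim a FASTA index (.fai), whose first two columns are the contig name and length. The sketch below assumes such an index already exists (for example from `samtools faidx`); the toy index and the file names `toy.fa.fai` and `toy.sizes` are hypothetical and stand in for your own genome.

```shell
# Toy FASTA index standing in for the output of `samtools faidx genome.fa`.
# .fai columns: name, length, offset, bases per line, bytes per line.
workdir=$(mktemp -d)
printf 'chr1\t1000\t6\t60\t61\n'   >  "$workdir/toy.fa.fai"
printf 'chrM\t200\t1030\t60\t61\n' >> "$workdir/toy.fa.fai"

# Keep only the contig name and contig size columns.
cut -f1,2 "$workdir/toy.fa.fai" > "$workdir/toy.sizes"
cat "$workdir/toy.sizes"
```

The resulting file has the layout described above and could then be passed with -bg or --bedtools-genome.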

2) Blacklist bed file

This is a standard three-column bed file containing 1) the contig name; 2) the start position; and 3) the end position per blacklisted region. These files are generally available via the ENCODE project. Specify this file path with -bl or --blacklist-file.

The regions listed in this file will be used to remove fragments in low-complexity regions.
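To illustrate what blacklist filtering means, the sketch below drops any fragment that overlaps a blacklisted interval. It is a plain awk illustration on hypothetical toy files, not bap's internal implementation, and it assumes half-open BED coordinates (two intervals overlap when each starts before the other ends).

```shell
workdir=$(mktemp -d)
# Hypothetical toy inputs: one blacklisted region and three fragments.
printf 'chr1\t100\t200\n' > "$workdir/blacklist.bed"
printf 'chr1\t50\t90\nchr1\t150\t250\nchr1\t300\t400\n' > "$workdir/fragments.bed"

kept=$(awk -F'\t' '
  # First file: store every blacklisted interval.
  NR == FNR { n++; bc[n] = $1; bs[n] = $2 + 0; be[n] = $3 + 0; next }
  # Second file: print a fragment only if it overlaps no stored interval.
  {
    drop = 0
    for (i = 1; i <= n; i++)
      if ($1 == bc[i] && ($2 + 0) < be[i] && ($3 + 0) > bs[i]) { drop = 1; break }
    if (!drop) print
  }' "$workdir/blacklist.bed" "$workdir/fragments.bed")
echo "$kept"
```

Here only the fragment spanning 150 to 250 is removed, because it is the only one that intersects the blacklisted region from 100 to 200.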

3) TSS bed file

This is a standard three-column bed file containing 1) the contig name; 2) the start position; and 3) the end position per annotated transcription start site. Specify this file path with -ts or --tss-file.

The regions listed in this file are used to compute TSS enrichment statistics per single cell in the final QC steps.

4) Mitochondrial chromosome (string)

A simple string that should match one of the contig names in the bedtools genome file. Specify this with -mc or --mito-chromosome. This contig is treated differently both for QC purposes and when computing the fragment overlap score.
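Because the mitochondrial string and the bed files all refer to contig names from the bedtools genome file, a quick consistency check before running bap can catch naming mismatches (for example `chrM` versus `MT`). The following is a minimal sketch on hypothetical toy files; it is not part of bap itself, and the variable names are illustrative.

```shell
workdir=$(mktemp -d)
genome="$workdir/toy.sizes"
tss="$workdir/toy.tss.bed"
mito="chrM"   # the string you would pass with -mc or --mito-chromosome

# Hypothetical toy inputs; note that chr2 is absent from the genome file.
printf 'chr1\t1000\nchrM\t200\n' > "$genome"
printf 'chr1\t100\t101\nchr2\t50\t51\n' > "$tss"

# The mitochondrial string must exactly match a contig name in column 1.
if awk -F'\t' -v m="$mito" '$1 == m { found = 1 } END { exit !found }' "$genome"; then
  mito_ok=yes
else
  mito_ok=no
fi
echo "mito contig present: $mito_ok"

# List contigs used in the bed file that are missing from the genome file.
missing=$(awk -F'\t' 'NR == FNR { known[$1] = 1; next } !($1 in known) { print $1 }' \
  "$genome" "$tss" | sort -u)
echo "contigs missing from genome file: ${missing:-none}"
```

With these toy inputs the check reports that `chrM` is present and that `chr2` appears in the bed file but not in the genome file. The same second check can be pointed at a blacklist bed file.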

Examples of each of these files can be found here