chromCRISPR

This repository demonstrates a series of analyses to assess the impact of DNA accessibility on CRISPR-Cas9 cleavage efficiency. This analysis takes multiple open access genome-scale datasets including GUIDE-Seq, CIRCLE-Seq, DNase-Seq and RNA-Seq to systematically characterize crucial determinants for CRISPR-induced editing efficiency.

Highlighting results:

The condensed chromatin conformation has the potential to abrogate the correlation between gRNA:target similarity and CRISPR-induced cleavage frequency.
CRISPR-induced sequence editing is possible even in regions where the vast majority of endogenous genes are silent.

Quick review

Full analysis and figure generation could be found in this python notebook

List of files required to run full analysis:

processed_data/20181216_GUIDE_sup_data_RPM_clean.csv
processed_data/20181216_CIRCLE_sup_data_RPM_clean.csv
raw_data/transcriptome/HK_genes.txt
raw_data/transcriptome/HEK/paired/SRR3997505/abundance.tsv
raw_data/transcriptome/U2OS/ERR191523_trimmed/ERR191523/abundance.tsv
raw_data/transcriptome/U2OS_990_TSS_1000up_200down_DNaseSeq.csv
raw_data/transcriptome/HEK_TSS_1000up_200down_DNaseSeq.csv

Detailed pipeline

The pipelines that generates all required files to run full analysis descrbied above take publically available dataset from multiple resources:

Preprocessing of DNase-Seq datasets:

Prepare the sorted bam file for the calculation of Read count Per Million mappable reads per basepair (RPM).

Dataset	Assay	Link
HEK293T DNA accessibility	DNase-Seq	ENCFF774HUB.bam
U2OS DNA accessibility	DNase-Seq	SRR4413990.fastq

Preprocessing of DNase-Seq on HEK293T:

python code/preprocessing.py -i raw_data/HEK/ENCFF774HUB.bam -p 'bam' -r raw_data/HG19.fasta

Preprocessing of DNase-Seq on U2OS:

python code/preprocessing.py -i raw_data/U2OS/SRR4413990.fastq.gz -p 'fastq' -r raw_data/HG19.fasta

Note:

HG19 reference genome could be downloaded from: GRCh37/hg19

and indxed by:

bwa index -a bwtsw raw_data/HG19.fasta

Calculate DNase-Seq RPM to CRISPR-induced cleavage sites/Gene promoter:

Dataset	Assay/Platform	Link
GUIDE-Seq identified	GUIDE-Seq	GUIDEseq_allgRNAs_identified or Supplementary Table 2
CIRCLE-Seq identified	CIRCLE-Seq	CIRCLEseq_allgRNAs_identified or Supplementary Table 2
HG19 gene coordinates	NCBI RefSeq	UCSC Table Browser
HEK293T transcriptome	NextSeq 500	SRR3997505
U2OS transcriptome	HighSeq 2000	ERR191523

For pre-defined promoter regions, use this file: hg19_allTSS_1000up_200down.bed.gz

For more detail of gene region file, please see: rnaseq.ipynb

Add DNase-Seq RPM column to CRISPR-induced cleavage sites/Gene promoter:

e.g.

DNase-Seq RPM on HEK293T gene promoter:

python code/production.py -L raw_data/transcriptome/hg19_allTSS_1000up_200down.bed.gz -c "HEK293T" -b raw_data/HEK/HEK.se50.DNaseSeq.sorted.bam -o processed_data/HEK_TSS_1000up_200down_DNaseSeq.csv

DNase-Seq RPM on U2OS gene promoter:

python code/production.py -L raw_data/transcriptome/hg19_allTSS_1000up_200down.bed.gz -c "U2OS" -b raw_data/U2OS/SRR4413990_trimmed.sorted.bam -o processed_data/U2OS_TSS_1000up_200down_DNaseSeq.csv

DNase-Seq RPM on HEK293T GUIDE-Seq identified cleavage sites:

python code/production.py -L raw_data/GUIDEseq_allgRNAs_identified.csv -c "HEK293T" -b raw_data/HEK/HEK.se50.DNaseSeq.sorted.bam -w 100 -o processed_data/HEK_GUIDESeq_DNaseSeq.csv

DNase-Seq RPM on U2OS GUIDE-Seq identified cleavage sites:

python code/production.py -L raw_data/GUIDEseq_allgRNAs_identified.csv -c "U2OS" -b raw_data/U2OS/SRR4413990_trimmed.sorted.bam -w 100 -o processed_data/U2OS_GUIDESeq_DNaseSeq.csv

DNase-Seq RPM on HEK293T CIRCLE-Seq identified cleavage sites:

python code/production.py -L raw_data/CIRCLEseq_allgRNAs_identified_matched.csv -c "HEK293T" -b raw_data/HEK/HEK.se50.DNaseSeq.sorted.bam -w 100 -o processed_data/HEK_CIRCLESeq_DNaseSeq.csv

DNase-Seq RPM on U2OS CIRCLE-Seq identified cleavage sites:

python code/production.py -L raw_data/CIRCLEseq_allgRNAs_identified_matched.csv -c "U2OS" -b raw_data/U2OS/SRR4413990_trimmed.sorted.bam -w 100 -o processed_data/U2OS_CIRCLESeq_DNaseSeq.csv

Ready for figure generation:

Open analysis.ipynb and displace the correct file names in corresponding input DataFrames.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
code		code
paper_figures		paper_figures
processed_data		processed_data
raw_data/transcriptome		raw_data/transcriptome
results		results
.gitignore		.gitignore
HISTORY.md		HISTORY.md
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

chromCRISPR

Quick review

Detailed pipeline

Preprocessing of DNase-Seq datasets:

Calculate DNase-Seq RPM to CRISPR-induced cleavage sites/Gene promoter:

Ready for figure generation:

About

Releases

Packages

Languages

License

DamLabResources/chroCRISPR

Folders and files

Latest commit

History

Repository files navigation

chromCRISPR

Quick review

Detailed pipeline

Preprocessing of DNase-Seq datasets:

Calculate DNase-Seq RPM to CRISPR-induced cleavage sites/Gene promoter:

Ready for figure generation:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages