Mapping and analyzing 4C-seq experiments using 4Cseqpipe
Please refer to the README to prepare the reference data.
Just copy the template directory to your work directory.
dir_work="data/4Cseqpipe"
cp -R reference/pipeline_4Cseqpipe/template $dir_work
Two files must be checked and configured:
rawdata/index.txt: One sample per line.
- species_name (Column 6): Change to the real species, Mus_musculus or Homo_sapiens.
- primer_seq (Column 5): Change to the real primer.
- first/sec_cutter_name/seq (Columns 7-10): Change to the real data.
- bait_chromo/coord (Columns 13, 14): Change to the real bait chromosome and coordinate.
- seq_len (Column 15): Change to the real read length.
- raw_fname (Column 16): Set the file name, which will be saved in the rawdata folder.
- Other columns have little effect:
  - id (Column 1) will be used as the experiment ID.
  - run, lane_no, exp (Columns 2-4) are just for describing your own experiment.
  - linearization_name/seq (Columns 11, 12) can be kept as "NA".
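Before running the pipeline, it can help to print the columns described above for each sample and confirm that they hold the intended values. The following is a minimal sketch, not part of 4Cseqpipe: the function name `show_index_fields` is hypothetical, and it assumes that index.txt is tab-delimited with a header row on the first line.

```shell
#!/usr/bin/env bash
# Sketch: print the fields of index.txt that usually need editing.
# Assumptions (not stated in the text above): the file is tab-delimited
# and its first line is a header row.

show_index_fields() {
    local index_file="$1"
    awk -F'\t' '
        NR == 1 { next }                  # skip the assumed header row
        NF != 16 {                        # columns 1 to 16 are described above
            printf "line %d: expected 16 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        {
            printf "id=%s primer_seq=%s species_name=%s bait=%s:%s seq_len=%s raw_fname=%s\n",
                   $1, $5, $6, $13, $14, $15, $16
        }
        END { exit (bad ? 1 : 0) }        # non-zero exit if any line was malformed
    ' "$index_file"
}

# Usage: show_index_fields rawdata/index.txt
```

The function returns a non-zero status when a line does not have 16 columns, so it can also be used as a guard in a larger script.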
4cseqpipe.conf
- index=rawdata/index.txt (First line): Check the index.txt file path and change it if necessary.
- trackdb_root=mm9_trackdb (Second line): Check the trackdb_root path and change it if necessary.
- trackset_name=4c (Third line): Change the name to your sample. Note that a folder with the same name will be created in the trackdb_root/tracks directory.
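The three settings above can be read back and checked from the work directory. This is only a sketch under assumptions: the helper names `conf_value` and `check_conf` are hypothetical, each setting is assumed to be a plain `key=value` line as shown above, and relative paths are assumed to be relative to the current directory.

```shell
#!/usr/bin/env bash
# Sketch: read the three settings discussed above from 4cseqpipe.conf.
# Assumption: each setting is a plain "key=value" line.

conf_value() {
    # Print the value of key $2 from config file $1 (first match only).
    local conf_file="$1" key="$2"
    sed -n "s/^${key}=//p" "$conf_file" | head -n 1
}

check_conf() {
    local conf_file="$1" index trackdb
    index=$(conf_value "$conf_file" index)
    trackdb=$(conf_value "$conf_file" trackdb_root)
    # Paths are resolved relative to the current directory (an assumption).
    [ -f "$index" ]   || { echo "index file not found: $index" >&2; return 1; }
    [ -d "$trackdb" ] || { echo "trackdb_root not found: $trackdb" >&2; return 1; }
    echo "trackset_name=$(conf_value "$conf_file" trackset_name)"
}

# Usage (from the work directory): check_conf 4cseqpipe.conf
```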
NGS reads from the 4C experiment in FASTQ format. Paired-end (PE) reads can be concatenated into a single file or used separately.
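Concatenating the two files of a PE run needs nothing more than `cat`. A minimal sketch follows; the function name `concat_fastq` and all file names are hypothetical placeholders, and the files are assumed to be uncompressed.

```shell
#!/usr/bin/env bash
# Sketch: concatenate the two files of a paired-end run into one FASTQ file.
# Assumption: both inputs are uncompressed FASTQ files.

concat_fastq() {
    local read1="$1" read2="$2" merged="$3"
    cat "$read1" "$read2" > "$merged"
    # A FASTQ record has 4 lines, so the read count is the line count divided by 4.
    echo "$(( $(wc -l < "$merged") / 4 )) reads written to $merged"
}

# Usage: concat_fastq sample_R1.fastq sample_R2.fastq sample_merged.fastq
```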
cd $dir_work
# 1. Convert FASTQ files to "raw" format
## $id: Corresponding to the "index.txt" file
## $fastq: Path to the FASTQ file
perl 4cseqpipe.pl -fastq2raw -ids $id -fastq_fn $fastq
# 2. Mapping valid 4C-Seq products to restriction fragments in the genome
perl 4cseqpipe.pl -map -ids $id
# 3. Normalizing and generating near-cis domainograms
## $start, $end: genomic range for computing normalized trend
## $res: size of window in basepairs (5000, 2000 or 1000 is used most)
## $fig: Output figure filename, which will be stored in `figures` folder.
perl 4cseqpipe.pl -nearcis -ids $id -calc_from $start -calc_to $end -stat_type median -trend_resolution $res -figure_fn $fig
# three steps in one: fastq2raw + map + nearcis
# perl 4cseqpipe.pl -dopipe [all parameters above]
The detailed explanation and the full list of parameters can be found on the website or in the manual.
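When several samples are processed, the three steps can be wrapped in one function that stops at the first failing step. This is a sketch, not part of 4Cseqpipe: `run_4c` and the `PIPE_CMD` variable are hypothetical names introduced here, and only the flags shown in the commands above are used.

```shell
#!/usr/bin/env bash
# Sketch: run the three steps for one sample and stop at the first failure.
# PIPE_CMD is a hypothetical hook so the command can be overridden (e.g. for a dry run).
PIPE_CMD=${PIPE_CMD:-"perl 4cseqpipe.pl"}

run_4c() {
    local id="$1" fastq="$2" start="$3" end="$4" res="$5" fig="$6"
    # $PIPE_CMD is left unquoted on purpose so that "perl 4cseqpipe.pl" splits into two words.
    $PIPE_CMD -fastq2raw -ids "$id" -fastq_fn "$fastq" || return 1
    $PIPE_CMD -map -ids "$id" || return 1
    $PIPE_CMD -nearcis -ids "$id" -calc_from "$start" -calc_to "$end" \
        -stat_type median -trend_resolution "$res" -figure_fn "$fig" || return 1
}

# Usage: run_4c "$id" "$fastq" "$start" "$end" "$res" "$fig"
```

Setting `PIPE_CMD=echo` before calling `run_4c` prints the commands instead of running them, which is a cheap way to check the arguments.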
The main output file is a figure in PNG format, which is stored in the figures folder.
- For PE reads, the two files can be concatenated into one file. Alternatively, you can run the pipeline on each file separately and choose the best result.
- If the pipeline terminates because there is not enough data, you can extend the genomic range (controlled by -calc_from and -calc_to) and try again.
- The parameter -convert_qual may need to be set to 1 if the pipeline fails. Here, "fails" means that the following error occurs:
Mapper will quit now. Please reduce the value for "min_precision_for_weight" setting and run again the program.
Also beware that you might experience slow run-times with low quality reads.
To improve run-times consider lowering down the precision by reducing the value of "max_mismatches".
the first step should be:
# 1. Convert FASTQ files to "raw" format
## $id: Corresponding to the "index.txt" file
## $fastq: Path to the FASTQ file
perl 4cseqpipe.pl -fastq2raw -ids $id -fastq_fn $fastq -convert_qual 1
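The retry described above can be automated by scanning the output of the mapping step for the quoted error. This is only a sketch: `map_with_retry` and `PIPE_CMD` are hypothetical names, and it assumes that the error text appears on the standard output or standard error of the -map step.

```shell
#!/usr/bin/env bash
# Sketch: map one sample; if the mapper prints the error quoted above,
# redo step 1 with -convert_qual 1 and map again.
# Assumption: the error text is written to stdout or stderr of the -map step.
PIPE_CMD=${PIPE_CMD:-"perl 4cseqpipe.pl"}   # hypothetical override hook

map_with_retry() {
    local id="$1" fastq="$2" log status
    log=$(mktemp)
    $PIPE_CMD -map -ids "$id" > "$log" 2>&1
    status=$?
    cat "$log"                               # show the output of the first attempt
    if grep -q 'min_precision_for_weight' "$log"; then
        echo "Mapper error found; redoing step 1 with -convert_qual 1" >&2
        $PIPE_CMD -fastq2raw -ids "$id" -fastq_fn "$fastq" -convert_qual 1 \
            && $PIPE_CMD -map -ids "$id"
        status=$?
    fi
    rm -f "$log"
    return "$status"
}

# Usage: map_with_retry "$id" "$fastq"
```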
Before running the pipeline, please contact the experimentalist to collect the information it requires:
- Primer sequence: Usually, the reverse primer is what you need.
- Enzymes (and their cut sites)
- Bait point coordinate: Check the coordinate system and convert it if necessary.
All input and output files are stored in test_data/pipeline_4Cseqpipe.
FASTQ files for two sample experiments performed using the alpha1 globin gene promoter as the viewpoint in mouse fetal liver and fetal brain (Van de Werken, Landan, et al., 2012):
- alpha_FL.fastq: experiment in fetal liver cells using the highly active alpha1 globin promoter as viewpoint
- alpha_FB.fastq: same viewpoint in a different tissue (fetal brain, in which the alpha1 globin gene is not active)
# Prepare the pipeline
dir_work="test_data/pipeline_4Cseqpipe/4Cseqpipe_for_test"
cp -R reference/pipeline_4Cseqpipe/template $dir_work
cd $dir_work
# Configure the pipeline
# 1. Change the `trackdb_root` in `4cseqpipe.conf` file as ``.
# Run the pipeline
## Case
perl 4cseqpipe.pl -fastq2raw -ids 1 -fastq_fn ../alpha_FL.fastq
perl 4cseqpipe.pl -map -ids 1
perl 4cseqpipe.pl -nearcis -calc_from 32000000 -calc_to 32300000 -stat_type median -trend_resolution 5000 -ids 1 -figure_fn alpha_FL.png
cp figures/alpha_FL.png ../output/
## Control
perl 4cseqpipe.pl -fastq2raw -ids 2 -fastq_fn ../alpha_FB.fastq
perl 4cseqpipe.pl -map -ids 2
perl 4cseqpipe.pl -nearcis -calc_from 32000000 -calc_to 32300000 -stat_type median -trend_resolution 5000 -ids 2 -figure_fn alpha_FB.png
cp figures/alpha_FB.png ../output/
- alpha_FL.png: The case output
- alpha_FB.png: The control output
- Other folders, such as tables and stats, may need to be checked if necessary. Please refer to the website or manual for more information.
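After a run, a quick check that the expected figures exist and are not empty can catch a silently failed step. A minimal sketch, with `check_figures` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: verify that each expected figure exists and is non-empty.

check_figures() {
    local fig missing=0
    for fig in "$@"; do
        if [ -s "$fig" ]; then
            echo "ok: $fig"
        else
            echo "missing or empty: $fig" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage: check_figures ../output/alpha_FL.png ../output/alpha_FB.png
```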
Yi Xianfu (yixfbio AT gmail DOT com)
GPL v3 or later