SMRT Pipe file structure

fripp edited this page Feb 13, 2013 · 5 revisions

Note: A SMRT Pipe analysis produces more files than are described here; interested users should explore the job directory. The major files are described below.

<jobID>/job.sh
  • Contains the SMRT Pipe command line call for the job.
<jobID>/settings.xml
  • Contains the modules (and their associated parameters) to be run as part of the SMRT Pipe run.
<jobID>/metadata.rdf
  • Contains all important metadata associated with the job. This includes metadata propagated from primary results, links to all reports and data files exposed to users, and high-level summary metrics computed during the job. The file serves as the entry point to the job for tools such as SMRT Portal and SMRT View. metadata.rdf is formatted as an RDF-XML file using OWL ontologies. See http://www.w3.org/standards/semanticweb/ for an introduction to Semantic Web technologies.
<jobID>/input.fofn
  • This file (“file of file names”) is generated early during a job and contains the file names of the raw input files used for the analysis.
<jobID>/input.xml
  • Specifies the input files to be analyzed in a job; it is passed to SMRT Pipe on the command line.
<jobID>/vis.jnlp
  • Deprecated - no longer generated in v1.4.0. To visualize data, install SMRT View and choose File > Open Data from Server.
log/smrtpipe.log
  • Contains debugging output from SMRT Pipe modules. This is typically shown by way of the View Log button in SMRT Portal.
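
As an illustration of working with the files listed above, the following Python sketch reads input.fofn and checks which of the top-level job files are present. It assumes that input.fofn holds one file name per line, which is the usual convention for a "file of file names" but is not stated on this page. The function names read_fofn and missing_job_files are hypothetical helpers, not part of SMRT Pipe.

```python
from pathlib import Path

# Top-level files described in the list above (vis.jnlp is omitted because
# it is deprecated and no longer generated).
EXPECTED_JOB_FILES = (
    "job.sh",
    "settings.xml",
    "metadata.rdf",
    "input.fofn",
    "input.xml",
)


def read_fofn(fofn_path):
    """Return the entries of a file-of-file-names as a list of strings.

    Assumption: one file name per line; blank lines are ignored.
    """
    entries = []
    for line in Path(fofn_path).read_text().splitlines():
        entry = line.strip()
        if entry:
            entries.append(entry)
    return entries


def missing_job_files(job_dir, expected=EXPECTED_JOB_FILES):
    """Return the expected top-level files that are absent from job_dir."""
    job_dir = Path(job_dir)
    return [name for name in expected if not (job_dir / name).is_file()]
```

Calling missing_job_files on a job directory returns an empty list when all five files exist, which can serve as a quick sanity check before further processing.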

Data Files

The Data directory is where most raw files generated by the pipeline are stored. (Note: The following are example output files - for more details about specific files, see the sections dealing with individual modules.)

aligned_reads.cmp.h5, aligned_reads.sam, aligned_reads.bam
  • Mapping and consensus data from secondary analysis.
alignment_summary.gff
  • Alignment data summarized on sequence regions.
variants.gff.gz
  • All sequence variants called from consensus sequence.
toc.xml
  • Deprecated - The master index information for the job outputs is now included in the metadata.rdf file.
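
The GFF outputs listed above can be inspected with a few lines of Python. The sketch below assumes the files follow the standard nine-column, tab-separated GFF layout with comment lines starting with "#"; it does not interpret the attributes column, whose keys depend on the module that wrote the file. The function name iter_gff_records is a hypothetical helper, not part of SMRT Pipe.

```python
import gzip


def iter_gff_records(path):
    """Yield one dict per feature line of a GFF file (plain or gzipped).

    Assumption: standard nine-column, tab-separated GFF. Comment lines,
    blank lines, and lines without nine columns are skipped.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) != 9:
                continue
            seqid, source, ftype, start, end, score, strand, phase, attrs = fields
            yield {
                "seqid": seqid,
                "source": source,
                "type": ftype,
                "start": int(start),
                "end": int(end),
                "score": score,
                "strand": strand,
                "phase": phase,
                "attributes": attrs,  # left as the raw string
            }
```

For example, sum(1 for _ in iter_gff_records("variants.gff.gz")) counts the feature lines in the variants file without loading it into memory.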

Results/Reports Files

Modules with "Reports" in their name produce HTML reports with static PNG images using XML+XSLT. These reports are located in the results subdirectory. The underlying XML document for each report is preserved there as well; these XML files can be useful for data-mining the outputs of SMRT Pipe.
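
A first step in data-mining the report XML might look like the Python sketch below. Because the report schema is not described on this page, the sketch makes no assumption about element names: it simply counts the element tags found in each XML file so that the structure can be explored. The function name summarize_report_xml is hypothetical, and the assumption that the XML files sit directly in the results directory with a .xml extension should be checked against an actual job.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def summarize_report_xml(results_dir):
    """Map each *.xml file name in results_dir to a Counter of element tags.

    Files that are not well-formed XML are skipped.
    """
    summary = {}
    for xml_path in sorted(Path(results_dir).glob("*.xml")):
        try:
            root = ET.parse(str(xml_path)).getroot()
        except ET.ParseError:
            continue
        # root.iter() walks the root element and all of its descendants.
        summary[xml_path.name] = Counter(elem.tag for elem in root.iter())
    return summary
```

The resulting tag counts show which elements each report contains, which is usually enough to decide what to extract in a follow-up script.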
