Step 6: Set up Distributed Computing
SMRT Analysis provides support for distributed computation using an existing job management system. Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), LSF and PBS.
Note: Celera Assembler 7.0 will only work correctly with the SGE job management system. If you are not using SGE, you will need to deactivate the Celera Assembler protocols so that they do not display in SMRT Portal. To do so, rename the following files, located in common/protocols:
- RS_CeleraAssembler.1.xml to RS_CeleraAssembler.1.bak
- filtering/CeleraAssemblerSFilter.1.xml to CeleraAssemblerSFilter.1.bak
- assembly/CeleraAssembler.1.xml to CeleraAssembler.1.bak
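If you prefer to script the renames, the following is a minimal Python 3 sketch. It assumes you pass it the full path of the common/protocols directory in your own installation (that location is not spelled out here), and that each .bak file stays in the same subdirectory as its .xml file. The function name is hypothetical.

```python
from pathlib import Path

# Celera Assembler protocol files, relative to common/protocols (from the list above).
CELERA_PROTOCOLS = [
    "RS_CeleraAssembler.1.xml",
    "filtering/CeleraAssemblerSFilter.1.xml",
    "assembly/CeleraAssembler.1.xml",
]


def disable_celera_protocols(protocols_dir):
    """Rename each Celera Assembler protocol from .xml to .bak in place.

    Files that no longer exist (for example, renamed on an earlier run)
    are skipped. Returns the paths of the newly created .bak files.
    """
    base = Path(protocols_dir)
    renamed = []
    for rel in CELERA_PROTOCOLS:
        src = base / rel
        if src.exists():
            # with_suffix swaps only the final ".xml", so "X.1.xml" becomes "X.1.bak".
            dst = src.with_suffix(".bak")
            src.rename(dst)
            renamed.append(dst)
    return renamed
```

Because missing files are skipped, running it twice is harmless; to re-enable the protocols, rename the .bak files back to .xml.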
This section describes setup for SGE and gives guidance for extensions to other Job Management Systems.
Following are the options in the $SEYMOUR_HOME/analysis/etc/smrtpipe.rc file that you can set to execute distributed SMRT Pipe runs.
IMAGE of Table here, or a link to the SMRT Pipe section when ready
The central components for setting up distributed computing in SMRT Analysis are the Job Management Templates (JMTs). JMTs provide a flexible format for specifying how SMRT Analysis communicates with the resident job management system (JMS). You must modify two templates for your system:
- start.tmpl is the legacy template used for assembly algorithms.
- interactive.tmpl is the new template used for resequencing algorithms.
The difference between the two is that interactive.tmpl additionally requires a sync option, which makes the submission command wait until the job completes. (kill.tmpl is not used.)
Note: We are in the process of converting all protocols to use only interactive.tmpl.
To customize a JMS for a particular environment, edit or create start.tmpl and interactive.tmpl. For example, the installation includes the following sample start.tmpl and interactive.tmpl (respectively) for SGE:
qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}
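The templates use ${NAME} placeholders. Python's string.Template uses the same ${NAME} syntax, so the following sketch can preview what a template line expands to before you submit real jobs. This is an assumption-based preview only: SMRT Pipe's own substitution code is not described here, and the job ID, file names and command in the example are hypothetical.

```python
from string import Template

# The sample SGE interactive.tmpl line shipped with the installation (shown above).
INTERACTIVE_TMPL = (
    "qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} "
    "-o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}"
)


def render(template_text, **values):
    """Expand ${NAME} placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(**values)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(render(
        INTERACTIVE_TMPL,
        JOB_ID="job_0001",
        STDOUT_FILE="/tmp/job_0001.out",
        STDERR_FILE="/tmp/job_0001.err",
        NPROC=8,
        CMD="/tmp/job_0001.sh",
    ))
```

Using substitute rather than safe_substitute is deliberate: a misspelled placeholder in an edited template raises an error instead of silently reaching qsub unexpanded.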
To add templates for a new JMS:
- Create a new directory named NEW_NAME under etc/cluster/.
- In smrtpipe.rc, change the CLUSTER_MANAGER variable to NEW_NAME, as described in “Smrtpipe.rc Configuration”.
- Once you have specified the new JMS directory, edit the interactive.tmpl and start.tmpl files for your particular setup.
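The first two steps can be scripted. The sketch below is a Python 3 illustration under stated assumptions: it starts the new directory as a copy of whichever sample template directory is closest to your system, and it assumes smrtpipe.rc stores the setting on a single "CLUSTER_MANAGER = value" line. Check your own smrtpipe.rc before relying on that format. The function names are hypothetical.

```python
import re
import shutil
from pathlib import Path


def create_jms_dir(cluster_dir, new_name, sample_dir):
    """Copy a sample template directory to <cluster_dir>/<new_name>.

    shutil.copytree fails if the target already exists, so existing
    templates are never overwritten.
    """
    target = Path(cluster_dir) / new_name
    shutil.copytree(sample_dir, target)
    return target


# ASSUMPTION: the setting is one "CLUSTER_MANAGER = value" line in smrtpipe.rc.
_CLUSTER_MANAGER = re.compile(r"^([ \t]*CLUSTER_MANAGER[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_cluster_manager(rc_text, new_name):
    """Return smrtpipe.rc text with CLUSTER_MANAGER pointing at new_name."""
    # A function replacement keeps backslashes in new_name from being treated as escapes.
    new_text, count = _CLUSTER_MANAGER.subn(lambda m: m.group(1) + new_name, rc_text)
    if count == 0:
        raise ValueError("no CLUSTER_MANAGER line found in smrtpipe.rc")
    return new_text
```

For example, you might call create_jms_dir with $SEYMOUR_HOME/analysis/etc/cluster as cluster_dir and the SGE sample directory as sample_dir (its exact name in your installation may differ), then pass the contents of $SEYMOUR_HOME/analysis/etc/smrtpipe.rc through set_cluster_manager and write the result back.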
Sample SGE, LSF and PBS templates are included with the installation in $SEYMOUR_HOME/analysis/etc/cluster.
For SGE in this version (v1.4.0), you must still edit both interactive.tmpl and start.tmpl as follows:
- Change secondary to the queue name on your system. (This is the -q option.)
- Change smp to the parallel environment on your system. (This is the -pe option.)
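Both edits are plain text substitutions on each template, so they can be applied with a short Python 3 sketch like the one below. The queue name "bio.q" and parallel environment "threads" in the example are hypothetical; substitute the names that exist on your cluster.

```python
import re


def customize_sge_template(text, queue, pe):
    """Replace the sample queue (-q secondary) and parallel environment
    (-pe smp) in an SGE template with site-specific names."""
    # Function replacements avoid backslash-escape surprises in the new names.
    text, nq = re.subn(r"(-q[ \t]+)secondary\b", lambda m: m.group(1) + queue, text)
    text, npe = re.subn(r"(-pe[ \t]+)smp\b", lambda m: m.group(1) + pe, text)
    if nq == 0 or npe == 0:
        raise ValueError("template does not contain '-q secondary' and '-pe smp'")
    return text


if __name__ == "__main__":
    # The sample start.tmpl line; "bio.q" and "threads" are hypothetical names.
    start = ("qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} "
             "-o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}")
    print(customize_sge_template(start, "bio.q", "threads"))
```

Apply it to both interactive.tmpl and start.tmpl by reading each file, passing its text through the function, and writing the result back. The ValueError guards against running it on a template that has already been customized.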
PBS does not have a -sync option, so the PBS interactive.tmpl file runs a script named qsw.py to simulate that functionality. You must edit both interactive.tmpl and start.tmpl:
- Change the queue name to one that exists on your system. (This is the -q option.)
- Change the parallel environment to one that exists on your system. (This is the -pe option.)
- Make sure that interactive.tmpl passes the -PBS option to qsw.py.
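Simulating -sync generally means submitting the job and then polling the scheduler until it no longer reports the job as running. The Python 3 sketch below shows only that general polling pattern. It is not the qsw.py implementation: is_running is a hypothetical hook that you would implement with your scheduler's status command (the qsw.py excerpt later in this section configures a qstat command and a "no job found" exit code for this purpose), and the sleep function is injectable so the loop can be tested without a cluster.

```python
import time


def wait_for_job(job_id, is_running, poll_seconds=30.0, timeout=None, sleep=time.sleep):
    """Block until is_running(job_id) returns False, emulating qsub -sync y.

    is_running: hypothetical callable that queries the scheduler and returns
    True while the job is still queued or running.
    timeout: optional limit in seconds; TimeoutError is raised when reached.
    Returns the total number of seconds spent waiting.
    """
    waited = 0.0
    while is_running(job_id):
        if timeout is not None and waited >= timeout:
            raise TimeoutError(f"job {job_id} still running after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
    return waited
```

The poll interval is a trade-off: shorter intervals notice completion sooner but issue more status queries against the scheduler.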
For LSF, create an interactive.tmpl file by copying the start.tmpl file and adding the -K option (which makes bsub wait for the job to complete) to the bsub call. Alternatively, edit the sample LSF templates.
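The copy-and-add-K step can be expressed as a small Python 3 sketch. It assumes the LSF start.tmpl is a single command line that begins with bsub; the example template line is hypothetical, not the sample shipped with the installation.

```python
def make_interactive_lsf(start_line):
    """Derive an LSF interactive.tmpl command from a start.tmpl command.

    Inserts bsub's -K option (submit the job and wait for it to complete)
    directly after the bsub command.
    """
    tokens = start_line.split()
    if not tokens or tokens[0] != "bsub":
        raise ValueError("expected a template line starting with 'bsub'")
    if "-K" in tokens:
        return " ".join(tokens)  # already waits for completion
    return " ".join([tokens[0], "-K"] + tokens[1:])


if __name__ == "__main__":
    # Hypothetical LSF start.tmpl line, for illustration only.
    start = ("bsub -q secondary -n ${NPROC} -J ${JOB_ID} "
             "-o ${STDOUT_FILE} -e ${STDERR_FILE} ${CMD}")
    print(make_interactive_lsf(start))
```

Splitting on whitespace normalizes spacing in the template, which is harmless for a shell command line but means the output is not a byte-for-byte copy of start.tmpl.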
We have not tested the -sync functionality on other systems. For another JMS, find the equivalent of the -sync option and create an interactive.tmpl file. If no such option is available, you may need to edit the qsw.py script in $SEYMOUR_HOME/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/qsw.py to add options for wrapping jobs on your system.
In qsw.py, the code that selects between PBS and SGE looks like the following:
if '-PBS' in args:
    args.remove('-PBS')
    self.jobIdDecoder = PBS_JOB_ID_DECODER
    self.noJobFoundCode = PBS_NO_JOB_FOUND_CODE
    self.successCode = PBS_SUCCESS_CODE
    self.qstatCmd = "qstat"
else:
    self.jobIdDecoder = SGE_JOB_ID_DECODER
    self.noJobFoundCode = SGE_NO_JOB_FOUND_CODE
    self.successCode = SGE_SUCCESS_CODE
    self.qstatCmd = "qstat -j"
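The if/else above hard-codes two schedulers. If you need to add a third, one option is a lookup table keyed by the command-line flag, sketched below in Python 3. This is an illustration only, not a drop-in patch for qsw.py (which runs under Python 2.7): the SchedulerProfile class, the hypothetical "-MYJMS" flag and "myjms-status" command, and all exit codes and job-ID patterns are placeholders. Copy the real values from your qsw.py and your scheduler's documentation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerProfile:
    """Per-JMS settings, mirroring the attributes set in the qsw.py excerpt."""
    job_id_decoder: re.Pattern  # extracts the job ID from the submit command's output
    no_job_found_code: int      # status-command exit code meaning "job no longer exists"
    success_code: int
    qstat_cmd: str


# PLACEHOLDER values for illustration; replace them with the ones from your qsw.py.
PROFILES = {
    "-PBS": SchedulerProfile(re.compile(r"^(\d+)"), 153, 0, "qstat"),
    "-MYJMS": SchedulerProfile(re.compile(r"Job <(\d+)>"), 255, 0, "myjms-status"),
}
# SGE-style fallback, matching the else branch above.
DEFAULT = SchedulerProfile(re.compile(r"Your job (\d+)"), 1, 0, "qstat -j")


def select_profile(args):
    """Remove the first recognised scheduler flag from args (as qsw.py does
    with '-PBS') and return its profile; otherwise return the SGE default."""
    for flag, profile in PROFILES.items():
        if flag in args:
            args.remove(flag)
            return profile
    return DEFAULT
```

With this layout, supporting another JMS means adding one table entry and passing its flag from that system's interactive.tmpl, rather than growing the if/else chain.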
Running jobs in distributed mode is disabled by default in SMRT Portal.
To enable distributed processing, set the jobsAreDistributed value in $SEYMOUR_HOME/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml to true:
<context-param>
<param-name>jobsAreDistributed</param-name>
<param-value>true</param-value>
</context-param>
You will need to restart Tomcat.
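To make this change from a script, the Python 3 sketch below edits web.xml as text with a regular expression instead of re-serializing the XML, so comments and formatting in the file are preserved. It assumes the param-name/param-value layout shown above; the function names are hypothetical. Restart Tomcat after running it.

```python
import re
from pathlib import Path

# Matches the jobsAreDistributed context-param value, keeping the surrounding markup.
_PARAM = re.compile(
    r"(<param-name>\s*jobsAreDistributed\s*</param-name>\s*<param-value>)"
    r"\s*(?:true|false)\s*"
    r"(</param-value>)",
    re.IGNORECASE,
)


def set_jobs_distributed(web_xml_text, enabled=True):
    """Return web.xml text with jobsAreDistributed set to true or false."""
    value = "true" if enabled else "false"
    new_text, n = _PARAM.subn(lambda m: m.group(1) + value + m.group(2), web_xml_text)
    if n != 1:
        raise ValueError(f"expected one jobsAreDistributed entry, found {n}")
    return new_text


def enable_in_file(path):
    """Rewrite the web.xml file at path with distributed jobs enabled."""
    p = Path(path)
    p.write_text(set_jobs_distributed(p.read_text()))
```

Keeping a backup copy of web.xml before calling enable_in_file is prudent, since the file is overwritten in place.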