Installing software and making custom modules

bastiaanstar edited this page Mar 27, 2019 · 16 revisions

Introduction to using modules

Programs that are used by a large number of users at CEES are generally maintained by our IT support, who also create 'modules' for each program installed in /cluster/software. On Abel and the cod nodes, the standard way to use such software is to load it as a module, e.g.

module load samtools 

The module system allows for different versions of the software. For example, writing

module avail samtools

will list the available versions:

samtools/0.1.18(default)
samtools/0.1.19

You can then choose to override the default version like this:

module load samtools/0.1.19

A good explanation on how to use modules can be found here.

Installing software

Occasionally, however, you might need software that is not already available on Abel and that is only used by you or by a small group of researchers. In that case, you may not want to bother IT support with the installation and maintenance of that software; instead, you can install it yourself in the dedicated directory, /projects/cees/bin/. If a package depends on some software or library that is installed on Abel, load the corresponding module with module load before installing the package, and again before using it later.

Different Types of Installs

Software can be installed in many different ways. It depends on what the developers have decided and the procedure should be found in the documentation.

Some common types of installs:

  • GNU Autoconf based
  • Cmake based
  • Manually written Makefile
  • Binary install in Tar or Zip file

Note: Ubuntu/Debian binary packages cannot be installed, since Abel and Colossus run RedHat (CentOS).

It might be possible to install with package management system RPM (Red Hat Package Manager), but it is tricky.

GNU Autoconf Based Install

$ module purge
$ module load intel/2016.0
$ module list
$ wget ftp://heasarc.gsfc.nasa.gov/software/fitsio/c/cfitsio3370.tar.gz
$ tar xf cfitsio3370.tar.gz
$ cd cfitsio/
$ less README
$ ./configure --prefix=$HOME/cfitsio
$ make
$ make install

Some packages provide a make check target to test the build. Some packages can be run from the build area without using make install. See ./configure --help for more options.

Cmake or Manual Makefile

  • The documentation (README, INSTALL, etc.) should give you the information needed.
  • With cmake, one can choose installation directory with cmake -DCMAKE_INSTALL_PREFIX=/the/path.
  • Well-written Makefiles use a variable to control the installation directory. Then one can use make VARIABLE=/the/path install to build and install.
  • If the Makefile uses an environment variable, one can set it with export VARIABLE=/the/path before running make.
  • If the installation directory is hard-coded in the Makefile, one must edit the file.

Binary Installs

  • Usually Tar or Zip files.
  • Often, one only needs to unpack the files. Optionally, one can rename the directory, or copy the extracted files to the desired directory.
  • Tip: First check whether the file unpacks into a subdirectory or into the current directory (tar tvf file.tar.gz or unzip -t file.zip).
  • There is a risk that the programs need libraries that are missing, too old, or too new on the cluster. Then one needs to install these as well, or switch to installing from source, if possible.
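
The tip above about checking how an archive unpacks can be sketched as follows. This is a minimal example under stated assumptions, not a prescribed procedure: the program name "mytool", its version, and the scratch directory are made up so that the script runs anywhere, and the strip step assumes that a single top-level entry is a directory.

```shell
#!/bin/sh
set -eu

# Scratch area standing in for a download directory (hypothetical).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Build a small example archive so the sketch is self-contained.
# "mytool" and version "1.0" are made-up names.
mkdir -p "$work/src/mytool-1.0"
printf '#!/bin/sh\necho "mytool 1.0"\n' > "$work/src/mytool-1.0/mytool"
chmod +x "$work/src/mytool-1.0/mytool"
tar -czf "$work/mytool-1.0.tar.gz" -C "$work/src" mytool-1.0

# Count the distinct top-level entries in the archive.
# One entry usually means everything unpacks into a single subdirectory.
top_entries=$(tar -tzf "$work/mytool-1.0.tar.gz" | cut -d/ -f1 | sort -u | wc -l | tr -d ' ')

# Versioned target directory, mirroring a <program>/<version> layout.
dest="$work/bin/mytool/1.0"
mkdir -p "$dest"

if [ "$top_entries" -eq 1 ]; then
    # Drop the single leading directory so files land directly in $dest.
    tar -xzf "$work/mytool-1.0.tar.gz" -C "$dest" --strip-components=1
else
    # Archive unpacks into the current directory; extract as-is.
    tar -xzf "$work/mytool-1.0.tar.gz" -C "$dest"
fi

ls -l "$dest"
```

For a real package, replace the generated archive with the downloaded file and point dest at the intended installation directory.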

FIXME add correct link, old one: https://wiki.uio.no/mn/bio/cees-bioinf/index.php/User_manual_cod_nodes#Software.

Writing module files

In order to keep /projects/cees/bin/ as organized as possible, it is recommended that programs are installed in sub-directories according to program name and version number, e.g. in /projects/cees/bin/freebayes/0.9.14/.

If software is installed this way in /projects/cees/bin/, it is still possible (and recommended) to use modules, as these will facilitate the use of the software in SLURM scripts and increase reproducibility. In order to use modules, module files must be written and saved in directory /projects/cees/bin/modules/, again using the program name as the name of a sub-directory, as in /projects/cees/bin/modules/freebayes/. Modules are set up using 'module files'. These should be named according to the version number of the program, without file extension. Thus, the file /projects/cees/bin/modules/freebayes/0.9.14 is a module file for the use of version 0.9.14 of the program FreeBayes. The content of the module file links this module with the installed software and is written in a language called Tcl. However, commands in the module file are simple to understand, and existing module files can easily be copied and adapted for new module files. For this purpose, there is also a template module file in /projects/cees/bin/modules/ that you may copy.

For example, the content of module file /projects/cees/bin/modules/freebayes/0.9.14 is as follows:

#%Module

## URL of application homepage.
set appurl     https://github.com/ekg/freebayes/

## Short description of module.
module-whatis "
Name:          FreeBayes
Description:   Bayesian haplotype-based polymorphism discovery and genotyping
Website:       https://github.com/ekg/freebayes/
Installed by:  Michael Matschiner"

## Commands.
set               root                 /projects/cees/bin/freebayes/0.9.14/
prepend-path      PATH                 $root

Here, the first line

#%Module

is required to make this file interpretable as a module file. All lines starting with ## are comments, and specification of the program URL in

set appurl     https://github.com/ekg/freebayes/

only serves as information for the user. The text for module-whatis allows the user to find out more about this module by specifying

module whatis freebayes

on the command line. In order to provide consistent information for each module, we recommend structuring the description as in the above example, with name, description, URL, and the name of the person who installed the module (you). This will allow users of the module to contact you when questions concerning the module arise. The line

set               root                 /projects/cees/bin/freebayes/0.9.14/

defines a variable 'root' holding the directory of the actual software installation, /projects/cees/bin/freebayes/0.9.14/, and with the following line

prepend-path      PATH                 $root

the directory name stored in this variable is added to the PATH variable, which is simply a list of directories, in which the system searches for executables whenever a program is called.
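
As a rough illustration of what prepend-path does to PATH, the same effect can be reproduced by hand in a shell. This is only a sketch: the temporary directory stands in for an installation directory such as /projects/cees/bin/freebayes/0.9.14/, and "mytool" is a made-up program name.

```shell
#!/bin/sh
set -eu

# Temporary directory standing in for an installation directory
# (hypothetical stand-in, so the sketch runs anywhere).
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# A dummy executable; "mytool" is a made-up program name.
printf '#!/bin/sh\necho "hello from mytool"\n' > "$root/mytool"
chmod +x "$root/mytool"

# Shell equivalent of the module file line:  prepend-path PATH $root
PATH="$root:$PATH"
export PATH

# The shell now finds the program by name, searching PATH left to right,
# so the directory that was prepended wins over later entries.
resolved=$(command -v mytool)
echo "$resolved"
```

Unlike this manual change, a module can also be unloaded again, which removes the directory from PATH.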

Thus, after loading this module with

module load freebayes/0.9.14

you can simply type

freebayes

to start FreeBayes. In this case, you will actually be running /projects/cees/bin/freebayes/0.9.14/freebayes.
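
If you write module files often, the structure described above can be generated with a small helper. The following is a sketch only: write_modulefile is a hypothetical function, not an existing tool, it writes into a scratch directory instead of /projects/cees/bin/modules/, and it fills in only part of the recommended description.

```shell
#!/bin/sh
set -eu

# write_modulefile NAME VERSION INSTALL_BASE MODULES_BASE  (hypothetical helper)
# Writes MODULES_BASE/NAME/VERSION pointing at INSTALL_BASE/NAME/VERSION/.
write_modulefile() {
    mkdir -p "$4/$1"
    # Unquoted EOF: shell variables are expanded, \$root stays literal
    # so that it is interpreted later as a Tcl variable.
    cat > "$4/$1/$2" <<EOF
#%Module

## Short description of module.
module-whatis "
Name:          $1
Installed by:  ${USER:-unknown}"

## Commands.
set               root                 $3/$1/$2/
prepend-path      PATH                 \$root
EOF
}

# Demonstrate in a scratch directory rather than /projects/cees/bin/.
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
write_modulefile freebayes 0.9.14 "$base" "$base/modules"
modfile="$base/modules/freebayes/0.9.14"
cat "$modfile"
```

Remember to complete the description and website fields by hand before sharing the module file with others.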

Note: see FIXME link these instructions on how to make sure the module system can find our locally installed modules.

Program version defaults

When multiple versions of the same program are installed, a default version can be specified. For example, there are currently two versions of FreeBayes installed in /projects/cees/bin/freebayes/ (versions 9.9.2 and 0.9.14), and correspondingly, the modules directory /projects/cees/bin/modules/freebayes contains two module files with these version names. In this case, you may want to specify which version is loaded by default when the following command is used without a version number:

module load freebayes

This can be achieved with a file named .version in the same directory as the module files (here: /projects/cees/bin/modules/freebayes/.version). Note that the file name starts with a period, which makes it a hidden file that is not shown when directory contents are listed with the plain command

ls

To see hidden files in the current directory, you will have to instead use

ls -a

The content of the '.version' file is very simple:

#%Module
 
set ModulesVersion "0.9.14"

where, again, the first line allows interpretation of this file as a module file, and the second line specifies the version number that is to be used as the default program version.
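
Creating the '.version' file can also be scripted. The sketch below is an illustration under assumptions: set_default_version is a hypothetical helper, and a scratch directory with an empty placeholder file stands in for /projects/cees/bin/modules/freebayes and its module files.

```shell
#!/bin/sh
set -eu

# set_default_version MODULE_DIR VERSION  (hypothetical helper)
set_default_version() {
    # Refuse to point the default at a module file that does not exist.
    if [ ! -f "$1/$2" ]; then
        echo "no module file for version $2 in $1" >&2
        return 1
    fi
    # %% prints a literal percent sign, giving the "#%Module" header line.
    printf '#%%Module\n\nset ModulesVersion "%s"\n' "$2" > "$1/.version"
}

# Scratch directory standing in for the real modules sub-directory.
module_dir=$(mktemp -d)
trap 'rm -rf "$module_dir"' EXIT
touch "$module_dir/0.9.14"       # empty placeholder for a module file
set_default_version "$module_dir" 0.9.14
cat "$module_dir/.version"
```

The existence check is a design choice of this sketch; it avoids setting a default that module load could not resolve.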

Note: While using the default version may be a handy short-cut at times, it is recommended to explicitly specify program version numbers in all scripts. As the default may change over time, scripts using the default might produce different results if you have to rerun them at later stages.

FIXME explain the need to add module use --append /projects/cees/bin/modules to .bash_login file to enable use of local modules
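
One possible way to add the module use line without duplicating it on repeated runs is sketched below. add_module_use is a hypothetical helper, and a scratch file stands in for ~/.bash_login so that the example does not touch your real login file.

```shell
#!/bin/sh
set -eu

# add_module_use LOGIN_FILE  (hypothetical helper)
# Appends the "module use" line unless it is already present.
add_module_use() {
    line='module use --append /projects/cees/bin/modules'
    touch "$1"
    # -x matches whole lines only, -F treats the text as a fixed string.
    if ! grep -qxF -- "$line" "$1"; then
        printf '%s\n' "$line" >> "$1"
    fi
}

# Scratch file standing in for ~/.bash_login.
login_file=$(mktemp)
trap 'rm -f "$login_file"' EXIT
add_module_use "$login_file"
add_module_use "$login_file"     # second call changes nothing
count=$(grep -cxF -- 'module use --append /projects/cees/bin/modules' "$login_file")
echo "$count"
```

To apply it for real, call the helper with the path of your own login file and start a new login shell afterwards.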

Additional dependencies

It can occur that an application requires additional or specific module versions in order to function correctly. Such dependencies can be easily accounted for by loading the required modules within the module script file of the program itself.
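
As a sketch of what such a module file could look like, the script below writes one that loads another module first. The dependency on samtools/0.1.19 is purely illustrative and not a documented requirement of FreeBayes, and the file is written to a scratch directory rather than to /projects/cees/bin/modules/.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for /projects/cees/bin/modules/.
modules_base=$(mktemp -d)
trap 'rm -rf "$modules_base"' EXIT
mkdir -p "$modules_base/freebayes"
modfile="$modules_base/freebayes/0.9.14"

# Quoted 'EOF' keeps $root literal so that Tcl, not the shell, expands it.
cat > "$modfile" <<'EOF'
#%Module

## Dependencies: loaded before this module's own paths are set.
## samtools/0.1.19 is an illustrative choice, not a documented
## requirement of FreeBayes.
module load samtools/0.1.19

## Commands.
set               root                 /projects/cees/bin/freebayes/0.9.14/
prepend-path      PATH                 $root
EOF

cat "$modfile"
```

Naming the dependency with an explicit version keeps the behaviour stable even if the default version of that module changes later.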