Exploring Results

The tutorials show how to retrieve some of the more important results for each case: comparing relationships among genomes, finding clades, and evaluating genomes for completeness and contamination. To retrieve more detailed information, it helps to understand MiGA's directory structure. Each project directory contains the following sub-directories:

.
├── daemon
│   └── daemon.json
├── data
│   ├── 01.raw_reads
│   ├── 02.trimmed_reads
│   ├── 03.read_quality
│   ├── 04.trimmed_fasta
│   ├── 05.assembly
│   ├── 06.cds
│   ├── 07.annotation
│   │   ├── 01.function
│   │   │   ├── 01.essential
│   │   │   └── 02.ssu
│   │   ├── 02.taxonomy
│   │   │   └── 01.mytaxa
│   │   └── 03.qa
│   │       ├── 01.checkm
│   │       └── 02.mytaxa_scan
│   ├── 08.mapping
│   │   ├── 01.read-ctg
│   │   └── 02.read-gene
│   ├── 09.distances
│   │   ├── 01.haai
│   │   ├── 02.aai
│   │   ├── 03.ani
│   │   ├── 04.ssu
│   │   └── 05.taxonomy
│   ├── 10.clades
│   │   ├── 01.find
│   │   ├── 02.ani
│   │   ├── 03.ogs
│   │   ├── 04.phylogeny
│   │   │   ├── 01.essential
│   │   │   └── 02.core
│   │   └── 05.metadata
│   └── 90.stats
├── metadata
└── miga.project.json

The sub-directories under data that begin with 01 through 10 correspond to MiGA's sequential data processing steps, starting with raw paired fastq files loaded into 01.raw_reads. Not all steps need to be run. In most of the tutorials, we submitted assembled genomes in fasta format; they were loaded into 05.assembly and processing began there. In the genome assembly exercise, we began by loading already trimmed and quality-filtered data into 04.trimmed_fasta. The genome projects ended with step 09.distances. Step 10.clades was performed only for the clade project. Each of these sub-directories contains the results for its associated processing step.
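Because not every step is run for every project, it can be useful to check which step directories actually hold results. The following is a minimal sketch, assuming only the directory layout shown above; the function name `completed_steps` is hypothetical and is not part of MiGA.

```python
from pathlib import Path


def completed_steps(project_dir):
    """Return the names of data/ step directories that contain at least one file.

    Assumes the layout shown above: <project>/data/<NN.step>/...
    """
    data_dir = Path(project_dir) / "data"
    steps = []
    for step in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        # A step counts as populated if any regular file exists beneath it,
        # including files in nested sub-directories such as 09.distances/03.ani.
        if any(f.is_file() for f in step.rglob("*")):
            steps.append(step.name)
    return steps
```

Calling `completed_steps("/path/to/miga/project")` returns a sorted list such as the names of the populated step directories; empty directories are skipped.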

Some of the results are in text format (sometimes compressed), e.g. tables, sequence files, gff files, nwk files. Others are in pdf format or Rdata format. The information in each file is explained in the MiGA workflow section of the MiGA manual. If you are using MiGA on a cluster, you can download the files individually using FileZilla or a similar program.
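Since some text results are compressed and others are not, a small helper that handles both can save a decompression step when you only want a quick look. This is a sketch under the assumption that compressed files use gzip; the helper names `open_text` and `head` are hypothetical.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    with path.open("rb") as handle:
        # gzip files start with the two magic bytes 0x1f 0x8b.
        is_gzip = handle.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt")
    return path.open("rt")


def head(path, n=5):
    """Return the first n lines of a text result file, without newlines."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for _, line in zip(range(n), handle)]
```

The file type is detected from its content rather than its extension, so the same call works whether or not a result was compressed.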

A Taxonomy Summary Script

For projects that include classification, Fang Yuan has written a script to summarize the results. For each classified genome, it determines the closest reference genome and the distances to it, and writes the results to the file summary.csv. To use this script, move into the project/data/09.distances/05.taxonomy directory and run the following commands:

wget https://github.com/jfq3/Miscellaneous-scripts/raw/master/miga_sumdb.sh
chmod 750 miga_sumdb.sh
./miga_sumdb.sh

For the miscellaneous genomes project in these tutorials, this produces:

name closest haai aai ani
Acidobacterium_capsulatum Silvibacterium_bohemicum_GCA_001006305 99.9508952380952 60.0831465538606
Acinetobacter_baumanii Pseudomonas_pharmafabricae_GCA_002835605 98.7412788461538 48.5834215277602
Bacillus_anthracis Bacillus_flexus_NBRC_15715_GCA_001591565 98.9199428571429 59.5616830477613
Bacillus_cereus Bacillus_flexus_NBRC_15715_GCA_001591565 98.9702788461539 59.9991400823551
Bifidobacterium_bifidum Bifidobacterium_scardovii_JCM_12489___DSM_13734_GCA_000770985 99.7717980769231 68.9492871800872
Campylobacter_jejuni Campylobacter_helveticus_GCA_900176295 99.1678666666667 66.2667775679713
Cytophaga_hutchinsonii Nafulsella_turpanensis_ZLM_10_GCA_000346615 99.9811470588235 50.1443577165562
Gemmatimonas_aurantiaca Gemmatimonas_phototrophica_NZ_CP011454 99.9633431372549 68.505494616812
Lacunisphaera_limnophila Oleiharenicola_lentus_GCA_004118375 99.9875247524753 68.0251176025193

A genome completeness and quality script

John Quensen has written a Python script that summarizes the completeness, contamination, and quality of each genome in a project and writes the summary to a tab-delimited file, which may be viewed as is or loaded into a spreadsheet program such as Excel. It takes two arguments: the path to the MiGA project and the name of the output file. Download and run the script by entering the following commands:

wget https://github.com/jfq3/Miscellaneous-scripts/raw/master/miga_completeness.py
chmod u+x miga_completeness.py
python ./miga_completeness.py /path/to/miga/project completeness_summary.txt

For the miscellaneous genomes project in these tutorials, this produces:

Completeness    Contamination   Quality Genome
94.6            0.9             90.1    P_putida
94.6            0.9             90.1    P_stutzeri
95.5            0.9             91.0    P_alcaligenes
94.6            0.9             90.1    P_syringae
93.7            0.9             89.2    P_fluorescens
94.6            0.9             90.1    P_mendocina

Browse the results with MiGA-Web

MiGA was originally provided as a web-based program, and the results were delivered via a web browser. You can view MiGA results generated on a cluster in the same manner if you first install the Docker version of MiGA on your computer. Steps are as follows:

  1. If you have not already done so, install the version of Docker appropriate to your system (Windows, macOS, or Linux).

  2. If you have not already done so, install MiGA-Web:

    • Open a terminal and enter the following, one line at a time:

      docker pull miga/miga:1.0.2.0
      docker run -p 9090:3000 -it -v C:/miga:/root/miga-data -v db_volume:/miga/db --name miga miga/miga:1.0.2.0 /bin/bash
      cd miga/
      export SECRET_KEY_BASE=`bundle exec rake secret`
      bundle exec rails server -e production -b 0.0.0.0 -p 3000 Puma
      

      The docker run line creates the MiGA project directory as C:/miga on your computer. You may use a different directory name if you wish.

    • Press Ctrl+C to stop the server, then close the terminal.

    • Open a new terminal and enter:

      docker stop miga
      
    • Close the terminal.

  3. Compress the results on the cluster:

    • Using tar:

      cd project_directory
      tar czf project_name.tar.gz .
      
    • Using miga archive. With MiGA running on the cluster:

      cd project_directory
      miga archive -o project_name.tar.gz -P .
      

      The miga archive option has the advantage that unnecessary files are not included in the archive. Thus the archive file is smaller and the MiGA results take up less drive space on your computer.

  4. Download the compressed file to the project directory on your computer and decompress it. The docker run command above created the project directory as C:/miga.

    tar xzf project_name.tar.gz
    
  5. Start MiGA on your computer.

    • Start Docker desktop.

    • Open a terminal and enter, one line at a time:

      docker start miga
      docker exec -it miga /bin/bash
      cd miga/
      export SECRET_KEY_BASE=`bundle exec rake secret`
      bundle exec rails server -e production -b 0.0.0.0 -p 3000 Puma
      
    • Leave the terminal running until you are finished browsing the results (see below).

    • Open a web browser, enter "localhost:9090" as the URL, and log in to MiGA.

  6. Link the project to MiGA.

    • Go to the Admin console page.
    • Click on "Link existing projects"
    • Under the name of the project, click either "Link publicly" or "Link privately." The choice only affects where the project will be listed.
  7. Open the project and browse the results. Each item comes with an explanation, and the "Learn more" links take you directly to the appropriate section of the MiGA manual. The download links copy the item to your downloads directory.

  8. When you have finished, stop MiGA the same way you did after installing it:

    • Press Ctrl+C in the terminal and close it.

    • Open a new terminal and enter:

      docker stop miga
      
    • Close the terminal.
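Before linking a project in step 6, it can help to confirm that the archive from step 3 really contains a MiGA project. The following sketch looks for miga.project.json inside a gzip-compressed tar archive without extracting it. It assumes the archive was produced as in step 3; the internal layout written by miga archive is not verified here, so the function reports every location of the file rather than assuming the top level. The function name `find_project_manifests` is hypothetical.

```python
import tarfile
from pathlib import PurePosixPath


def find_project_manifests(archive_path):
    """Return normalized archive paths of every miga.project.json in a .tar.gz file."""
    found = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            member = PurePosixPath(name)  # drops a leading "./" from tar entries
            if member.name == "miga.project.json":
                found.append(str(member))
    return found
```

For an archive created with `tar czf project_name.tar.gz .` from inside the project directory, the expected result is a single entry at the top level of the archive.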