This repository contains the scripts published in the paper "Parsimony analysis of phylogenomic datasets (I): scripts and guidelines for using TNT" by Ambrosio Torres, Pablo A. Goloboff and Santiago A. Catalano (Cladistics, 2022) [https://onlinelibrary.wiley.com/doi/epdf/10.1111/cla.12477?af=R]. Please cite this paper if you use any of the tools included in this repository. The scripts are fully described here, and most of them can be executed both in the TNT Windows-menu version (Fig. 1) and in the command-line versions of TNT for Linux, Mac and Windows. Full documentation is also included in the script files themselves. The same information can be displayed by typing the following in TNT (please make sure the script files are in your working folder):
run NameOfTheScript.run help;
Multimedia tutorials on the use of the scripts: https://www.youtube.com/channel/UCpIgK8sVH-yK0Bo3fK62IxA/featured
Figure 1. Opening input files in the TNT Windows-menu version.
We provide sample datasets at the link below for testing the scripts presented here. However, we encourage you to use your own datasets as well, to broaden the range of datasets on which the scripts have been tested. With your help, we will more easily detect any bugs or errors in the scripts: https://drive.google.com/file/d/1lTyVPjFP0HSw_-Kn13nfpXb3lakHx3WS/view?usp=sharing
- easy.run (a script for becoming familiar with the TNT scripting structure and the basic TNT commands. Replace the name "Prosopis.tnt" in the script with the name of your own dataset!)
- CompileGenes.run
- GeneOccupancy.run
- PhylogenomicSearch.run
- PhylogenomicSupport.run
- SearchGeneTrees.run
- GeneSupFrequency.run
- TreeDistMatrix.run
CompileGenes.run concatenates any number of data files into a single matrix in TNT format. When the script is run from the TNT Windows-menu version, a dialog box opens to define the settings (Fig. 2). In the Linux, Mac and Windows command-line versions of TNT, the script should be run with arguments (see the help of the script).
Figure 2. Dialog box for CompileGenes.run.
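As an illustration of what concatenation involves, the following Python sketch joins per-gene alignments into one supermatrix and records where each gene block starts and ends. It is not the code of CompileGenes.run: the function name `concatenate`, the in-memory input format and the use of `?` for taxa missing from a gene are assumptions made for this example, and the sketch does not write TNT format.

```python
def concatenate(genes, missing="?"):
    """Concatenate alignments given as {gene_name: {taxon: sequence}}.

    Returns (matrix, blocks): matrix maps each taxon to its concatenated
    sequence; blocks lists (gene_name, start, end) with end exclusive.
    Taxa absent from a gene are padded with `missing` (an assumption).
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    blocks = []
    start = 0
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{name}: sequences are not aligned to one length")
        n = lengths.pop()
        for taxon in taxa:
            # Pad with the missing-data symbol when the taxon lacks this gene.
            matrix[taxon] += aln.get(taxon, missing * n)
        blocks.append((name, start, start + n))
        start += n
    return matrix, blocks


genes = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
    "geneB": {"sp1": "TT", "sp3": "TA"},
}
matrix, blocks = concatenate(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(blocks)
```

Keeping the block boundaries alongside the matrix mirrors the idea of one block/partition per gene, which is the input that the gene-tree script further down expects.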
GeneOccupancy.run generates submatrices by sampling the taxa/genes that reach a level of occupancy defined by the user. Two different conditions can be set: (i) the percentage of species in which a gene must be present for it to be sampled, and (ii) the percentage of genes that a species must have for it to be sampled. The script produces a submatrix in TNT format and an SVG file with information about taxon/gene occupancy (Fig. 3). In the Linux, Mac and Windows command-line versions of TNT, this script should be executed with arguments (for more information see the help of the script).
Figure 3. Example of graphical output from GeneOccupancy.run.
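The two occupancy conditions can be illustrated with a small Python sketch. This is not the implementation in GeneOccupancy.run: the function name `occupancy_filter`, the input format, the use of "at least" as the threshold test, and the order of the two steps (genes first, then species measured against the retained genes) are all assumptions made for this example.

```python
def occupancy_filter(presence, min_species_pct, min_genes_pct):
    """Filter genes and taxa by occupancy.

    presence maps each gene name to the set of taxa that have data for it.
    Returns (kept_genes, kept_taxa) as sorted lists.
    """
    all_taxa = set().union(*presence.values())

    # (i) keep genes present in at least min_species_pct of all taxa
    kept_genes = {
        gene: taxa
        for gene, taxa in presence.items()
        if 100.0 * len(taxa) / len(all_taxa) >= min_species_pct
    }

    # (ii) keep taxa that have at least min_genes_pct of the retained genes
    kept_taxa = set()
    for taxon in all_taxa:
        n_present = sum(taxon in taxa for taxa in kept_genes.values())
        if kept_genes and 100.0 * n_present / len(kept_genes) >= min_genes_pct:
            kept_taxa.add(taxon)

    return sorted(kept_genes), sorted(kept_taxa)


presence = {
    "geneA": {"sp1", "sp2", "sp3", "sp4"},
    "geneB": {"sp1", "sp2", "sp3"},
    "geneC": {"sp1"},
}
print(occupancy_filter(presence, min_species_pct=50, min_genes_pct=100))
```

In this toy dataset geneC occurs in only one of four species (25%) and is dropped at a 50% threshold; sp4 then has only one of the two retained genes and is dropped when 100% gene occupancy is required.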
PhylogenomicSearch.run performs phylogenetic inference for phylogenomic datasets. It lets users define multiple settings in a simple way (e.g. the treatment of gaps in the alignments, the use of different implied weighting schemes, the format of the output files, etc.). When the script is run from the TNT Windows-menu version, a dialog box opens to define the settings (Fig. 4).
Figure 4. Dialog box for PhylogenomicSearch.run.
PhylogenomicSupport.run performs six different types of support analyses. In addition, users have access to a comprehensive combination of possible settings. When the script is run from the TNT Windows-menu version, a dialog box opens to define the settings (Fig. 5). Note: FAR = Fast Approximate Resampling.
Figure 5. Dialog box for PhylogenomicSupport.run.
SearchGeneTrees.run calculates the gene trees from a TNT dataset in which each gene is included as a separate block/partition (e.g. the output files from CompileGenes.run). When the script is run from the TNT Windows-menu version, a dialog box opens to define the settings (Fig. 6). In the Linux, Mac and Windows command-line versions of TNT, the script should be run with arguments (see the help of the script).
Figure 6. Dialog box for SearchGeneTrees.run.
GeneSupFrequency.run calculates the percentage of individual gene trees supporting the nodes of a reference tree (e.g. a concatenation-based tree). Some authors refer to this measure as Gene Support Frequency (GSF; Gadagkar et al., 2005 doi: 10.1002/jez.b.21026; Salichos et al., 2014 doi: 10.1093/molbev/msu061). The script presents the GSF values in three different ways (Fig. 7).
Figure 7. Example of graphical output from GeneSupFrequency.run. (a) Graphical matrix of the Gene Support Frequency for the species tree (concatenation-based tree). Green box: groups present in the gene tree with the exact taxon composition. Yellow box: groups present in the gene tree after taxa missing from the gene tree are pruned from the reference tree. Red box: groups not present in the gene tree. The frequencies for each gene/node are presented as stacked bar plots; (b) Species tree with values of Gene Support Frequency. Labels above branches = node number (Nd). Labels below branches = frequency of gene trees that present the node after taxa missing from the gene tree are pruned from the reference tree (yellow + green in the graphical matrix) | frequency of gene trees that present the node with the exact composition (green in the graphical matrix; Gene Support Frequency).
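The green/yellow/red classification in the Figure 7 caption can be sketched in Python as follows. This is an illustration, not the code of GeneSupFrequency.run: the names `classify_node` and `gene_support`, the representation of each gene tree as a pair (taxon set, set of clades), the use of all gene trees as the denominator, and the rule that a pruned group must keep at least two taxa and be smaller than the whole gene tree are assumptions made for this example.

```python
def classify_node(ref_clade, gene_taxa, gene_clades):
    """Classify one reference-tree group against one gene tree."""
    if ref_clade in gene_clades:
        return "green"  # group present with the exact taxon composition
    # Prune from the reference group the taxa that the gene tree lacks.
    pruned = ref_clade & gene_taxa
    # Assumption: groups reduced to <2 taxa or to all taxa are uninformative.
    if len(pruned) >= 2 and pruned < gene_taxa and pruned in gene_clades:
        return "yellow"  # group present only after pruning missing taxa
    return "red"  # group not present


def gene_support(ref_clade, gene_trees):
    """Return (frequency after pruning, frequency with exact composition)."""
    if not gene_trees:
        raise ValueError("at least one gene tree is required")
    colours = [classify_node(ref_clade, taxa, clades) for taxa, clades in gene_trees]
    n = len(colours)
    exact = colours.count("green") / n
    pruned_or_exact = (colours.count("green") + colours.count("yellow")) / n
    return pruned_or_exact, exact


fs = frozenset
ref_clade = fs("ABC")  # a group of the reference tree: taxa A, B and C
gene_trees = [
    (fs("ABCDE"), {fs("AB"), fs("ABC"), fs("DE")}),   # green
    (fs("ABDE"), {fs("AB"), fs("DE")}),               # yellow: C is missing
    (fs("ABCDE"), {fs("AD"), fs("ABD"), fs("CE")}),   # red
    (fs("ABCD"), {fs("CD"), fs("BCD")}),              # red
]
print(gene_support(ref_clade, gene_trees))
```

The two returned values follow the order of the labels below the branches in Figure 7(b): yellow plus green first, then green alone.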
TreeDistMatrix.run calculates a distance matrix for a set of trees. Users can choose among four different metrics (SPR moves, SPR difference, normalized Robinson-Foulds and the distortion coefficient). Two outputs are produced: the distance matrix in CSV format and a heat map in SVG format (Fig. 8).
Figure 8. Example of graphical output from TreeDistMatrix.run.
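To make one of the four metrics concrete, the sketch below computes a normalized Robinson-Foulds distance matrix in Python. It is an illustration rather than the TNT implementation: trees are written as nested tuples (a made-up input format), groups are compared as rooted clades, and the distance is normalized by the total number of non-trivial groups in the two trees, which is one common convention and may differ from the one TNT uses. The function names are hypothetical.

```python
def clades(tree):
    """Return (all taxa, set of non-trivial clades) of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf is a taxon name
            return frozenset([node])
        taxa = frozenset().union(*(walk(child) for child in node))
        found.add(taxa)
        return taxa

    all_taxa = walk(tree)
    found.discard(all_taxa)  # the group containing every taxon is trivial
    return all_taxa, found


def normalized_rf(tree1, tree2):
    """Groups found in only one tree, divided by the groups in both trees."""
    taxa1, groups1 = clades(tree1)
    taxa2, groups2 = clades(tree2)
    if taxa1 != taxa2:
        raise ValueError("trees must have the same taxon set")
    total = len(groups1) + len(groups2)
    return len(groups1 ^ groups2) / total if total else 0.0


def distance_matrix(trees):
    """Pairwise distances as a list of rows."""
    return [[normalized_rf(a, b) for b in trees] for a in trees]


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
for row in distance_matrix([t1, t2]):
    print(",".join(f"{d:.3f}" for d in row))
```

Here the two trees differ only in the grouping inside (A, B, C), so two of the six non-trivial groups are unshared and the off-diagonal distance is 2/6.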