Additional datasets for Swift et al. "A Genomic Catalog of Stress Response Genes in Anaerobic Fungi for Applications in Bioproduction"
Phylogenetic trees in Newick format, constructed with FastTree (1) or RAxML (2), for selected transmembrane kinases from Neocallimastigomycetes and apicomplexans together with their top homologs. The folder “hits” contains, for each query sequence, the top-scoring MMseqs2 (3) hits against the NCBI non-redundant (nr) protein database, MycoCosm (4), and MMETSP (5). For the four queries from Neocallimastigomycetes, hits from within that class are excluded.

Sequence identifiers for each filename:
anasp = MycoCosm protein ID 13827 (Anaeromyces robustus)
caecom = MycoCosm protein ID 543476 (Caecomyces churrovis)
neosp = MycoCosm protein ID 503500 (Neocallimastix californiae)
pirfi = MycoCosm protein ID 367815 (Piromyces finnis)
plasma2 = NCBI accession XP_024328881.1 (Plasmodium falciparum)
thei2 = NCBI accession XP_953607.1 (Theileria annulata)
toxo1 = NCBI accession AAS48463.1 (Toxoplasma gondii)
toxo2 = NCBI accession ACA62938.1 (Toxoplasma gondii)
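For readers processing these files programmatically, the following is a minimal Python sketch, not part of the original dataset. It assumes that each file name begins with one of the identifiers listed above and that the trees are plain Newick with unquoted tip labels. The helper names source_for and newick_tip_labels, and the example file names, are hypothetical. For real analyses, a full Newick parser such as Biopython's Bio.Phylo is more robust.

```python
import re
from pathlib import Path

# Filename identifier -> (source identifier, organism), transcribed from this README.
SEQUENCE_SOURCES = {
    "anasp": ("MycoCosm protein ID 13827", "Anaeromyces robustus"),
    "caecom": ("MycoCosm protein ID 543476", "Caecomyces churrovis"),
    "neosp": ("MycoCosm protein ID 503500", "Neocallimastix californiae"),
    "pirfi": ("MycoCosm protein ID 367815", "Piromyces finnis"),
    "plasma2": ("NCBI XP_024328881.1", "Plasmodium falciparum"),
    "thei2": ("NCBI XP_953607.1", "Theileria annulata"),
    "toxo1": ("NCBI AAS48463.1", "Toxoplasma gondii"),
    "toxo2": ("NCBI ACA62938.1", "Toxoplasma gondii"),
}


def source_for(filename: str) -> tuple[str, str]:
    """Return (source identifier, organism) for a file.

    Assumption: the file name starts with one of the identifiers above.
    """
    name = Path(filename).name  # ignore any directory such as "hits/"
    # Check longer identifiers first so a shorter one can never shadow a longer one.
    for prefix in sorted(SEQUENCE_SOURCES, key=len, reverse=True):
        if name.startswith(prefix):
            return SEQUENCE_SOURCES[prefix]
    raise KeyError(f"no known sequence identifier in {filename!r}")


def newick_tip_labels(newick: str) -> list[str]:
    """Return leaf labels from a Newick string.

    Assumption: labels are unquoted. A leaf label directly follows '(' or ','
    and ends at ':', ',', ')', ';' or whitespace. Internal-node labels (which
    follow ')') and branch lengths (which follow ':') are therefore skipped.
    """
    return [m.group(1) for m in re.finditer(r"[(,]\s*([^():,;\s]+)", newick)]


if __name__ == "__main__":
    print(source_for("toxo2.tree"))  # hypothetical file name
    print(newick_tip_labels("((A:0.1,B:0.2)0.95:0.3,C:0.4);"))
```

Run as a script, the example prints the source identifier and organism for the hypothetical file toxo2.tree, followed by the tip labels A, B and C of the toy tree. The support value 0.95 is omitted because it labels an internal node.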
Citations:
(1) Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
(2) Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
(3) Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
(4) Grigoriev, I. V. et al. MycoCosm portal: Gearing up for 1000 fungal genomes. Nucleic Acids Res. 42, D699–D704 (2014).
(5) Keeling, P. J. et al. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): Illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).