Merge development into master
apetkau committed Jul 19, 2018
2 parents 6f42299 + 4db3a80 commit 7b54e91
Showing 14 changed files with 269 additions and 111 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -5,7 +5,7 @@ python:

env:
  matrix:
-    - "DATABASE_COMMITS='--resfinder-commit dc33e2f9ec2c420f99f77c5c33ae3faa79c999f2 --pointfinder-commit ba65c4d175decdc841a0bef9f9be1c1589c0070a'"
+    - "DATABASE_COMMITS='--resfinder-commit e8f1eb2585cd9610c4034a54ce7fc4f93aa95535 --pointfinder-commit 8706a6363bb29e47e0e398c53043b037c24b99a7'"

install:
  - sudo apt-get update -qq
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,12 @@
-# Version 0.2.0 (in development)
+# Version 0.2.1 (in development)

* Minor
* Updating default ResFinder/PointFinder databases to version from July 2018.
* Fix regex extracting gene/variant/accession values from ResFinder/PointFinder databases.
* Fixing a few entries in table mapping genes to phenotypes.
* Print stderr for errors with `makeblastdb`

# Version 0.2.0

* Major
* Inclusion of predicted resistances to antimicrobial drugs thanks to gene/drug mappings from the NARMS/CIPARS Molecular Working Group. Resistance predictions are microbiological resistances and not clinical resistances (issues #4, #6).
123 changes: 118 additions & 5 deletions README.md
@@ -6,7 +6,7 @@

`staramr` (*AMR) scans bacterial genome contigs against both the [ResFinder][resfinder-db] and [PointFinder][pointfinder-db] databases (used by the [ResFinder webservice][resfinder-web]) and compiles a summary report of detected antimicrobial resistance genes.

-**Note: The predicted phenotypes/drug resistances are for microbiological resistance and *not* clinical resistance. This is is provided with support from the NARMS/CIPARS Molecular Working Group and is continually being improved. A small comparison between phenotype/drug resistance predictions produced by `staramr` and those available from NCBI can be found in the [tutorial][tutorial]. We welcome any feedback or suggestions.**
+**Note: The predicted phenotypes/drug resistances are for microbiological resistance and *not* clinical resistance. This is provided with support from the NARMS/CIPARS Molecular Working Group and is continually being improved. A small comparison between phenotype/drug resistance predictions produced by `staramr` and those available from NCBI can be found in the [tutorial][tutorial]. We welcome any feedback or suggestions.**

For example:

@@ -54,6 +54,11 @@ staramr search -o out --pointfinder-organism salmonella *.fasta
* [Latest Code](#latest-code)
* [Dependencies](#dependencies)
- [Output](#output)
* [summary.tsv](#summarytsv)
* [resfinder.tsv](#resfindertsv)
* [pointfinder.tsv](#pointfindertsv)
* [settings.txt](#settingstxt)
* [hits/](#hits)
- [Tutorial](#tutorial)
- [Usage](#usage)
* [Main Command](#main-command)
@@ -177,8 +182,8 @@ source .venv/bin/activate
# Install staramr. Use '-e' to update the install on code changes.
pip install -e .

-# Now run `starmr`
-starmr
+# Now run `staramr`
+staramr
```

Due to the way I package the ResFinder/PointFinder databases, the development code will not come with a default database. You must first build the database before usage. E.g.
@@ -203,8 +208,116 @@ There are 5 different output files produced by `staramr`:
4. `settings.txt`: The command-line, database versions, and other settings used to run `staramr`.
5. `results.xlsx`: An Excel spreadsheet containing the previous 4 files as separate worksheets.

-In addition, the directory `hits/` stores fasta files of the specific blast hits.
+In addition, the directory `hits/` stores fasta files of the specific blast hits.

## summary.tsv

The **summary.tsv** output file generated by `staramr` contains the following columns:

* __Isolate ID__: The id of the isolate/genome file(s) passed to `staramr`.
* __Genotype__: The AMR genotype of the isolate.
* __Predicted Phenotype__: The predicted AMR phenotype (drug resistances) for the isolate.

### Example

| Isolate ID | Genotype | Predicted Phenotype |
|------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| SRR1952908 | aadA1, aadA2, blaTEM-57, cmlA1, gyrA (S83Y), sul3, tet(A) | streptomycin, ampicillin, chloramphenicol, ciprofloxacin I/R, nalidixic acid, sulfisoxazole, tetracycline |
| SRR1952926 | blaTEM-57, gyrA (S83Y), tet(A) | ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline |
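
Because **summary.tsv** is plain tab-separated text, it can be read with standard tools. As a rough sketch (not part of `staramr`), the Python below splits the **Genotype** and **Predicted Phenotype** cells into lists. It assumes the file is tab-delimited with the column names shown above and that values inside a cell are separated by `, `; `read_summary` is a hypothetical helper name.

```python
import csv
import io


def read_summary(text):
    """Map each Isolate ID to (genotype list, predicted phenotype list).

    Assumes tab-separated columns named as in summary.tsv and ", "-separated
    values inside the Genotype and Predicted Phenotype cells.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    summary = {}
    for row in reader:
        genes = [g.strip() for g in row["Genotype"].split(",") if g.strip()]
        drugs = [d.strip() for d in row["Predicted Phenotype"].split(",") if d.strip()]
        summary[row["Isolate ID"]] = (genes, drugs)
    return summary


# Second row of the example table above, as it would appear in the file.
example = (
    "Isolate ID\tGenotype\tPredicted Phenotype\n"
    "SRR1952926\tblaTEM-57, gyrA (S83Y), tet(A)\t"
    "ampicillin, ciprofloxacin I/R, nalidixic acid, tetracycline\n"
)
genes, drugs = read_summary(example)["SRR1952926"]
print(genes)  # ['blaTEM-57', 'gyrA (S83Y)', 'tet(A)']
```

For a real run, the same function can be given the contents of `out/summary.tsv` (for output written with `-o out`).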

## resfinder.tsv

The **resfinder.tsv** output file generated by `staramr` contains the following columns:

* __Isolate ID__: The id of the isolate/genome file(s) passed to `staramr`.
* __Gene__: The particular AMR gene detected.
* __Predicted Phenotype__: The predicted AMR phenotype (drug resistances) for this gene.
* __%Identity__: The % identity of the top BLAST HSP to the AMR gene.
* __%Overlap__: The % overlap of the top BLAST HSP to the AMR gene (calculated as __hsp length/total length * 100__).
* __HSP Length/Total Length__: The top BLAST HSP length over the AMR gene total length (nucleotides).
* __Contig__: The contig id containing this AMR gene.
* __Start__: The start of the AMR gene (will be greater than __End__ if on minus strand).
* __End__: The end of the AMR gene.
* __Accession__: The accession of the AMR gene in the ResFinder database.

### Example

| Isolate ID | Gene | Predicted Phenotype | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End | Accession |
|------------|------------|----------------------|------------|-----------|--------------------------|--------------|--------|-------|-----------|
| SRR1952908 | sul3 | sulfisoxazole | 100.00 | 100.00 | 792/792 | contig00030 | 2091 | 2882 | AJ459418 |
| SRR1952908 | tet(A) | tetracycline | 99.92 | 100.00 | 1200/1200 | contig00032 | 1551 | 2750 | AJ517790 |
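
The **%Overlap** formula and the minus-strand convention for **Start**/**End** can be checked against the example rows. The Python sketch below uses hypothetical helper names (not `staramr` code) to recompute both, plus the number of contig nucleotides a hit covers.

```python
def overlap_percent(hsp_over_total):
    """%Overlap = hsp length / total length * 100, from a cell such as '792/792'."""
    hsp_length, total_length = (int(n) for n in hsp_over_total.split("/"))
    return round(hsp_length / total_length * 100, 2)


def strand(start, end):
    """Start greater than End means the hit lies on the minus strand of the contig."""
    return "-" if start > end else "+"


def contig_span(start, end):
    """Contig nucleotides covered by the hit, counting both ends.

    In the example rows this equals the HSP length; gapped alignments may differ.
    """
    return abs(end - start) + 1


print(overlap_percent("792/792"), strand(2091, 2882), contig_span(2091, 2882))
# 100.0 + 792
```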

## pointfinder.tsv

The **pointfinder.tsv** output file generated by `staramr` contains the following columns:

* __Isolate ID__: The id of the isolate/genome file(s) passed to `staramr`.
* __Gene__: The particular AMR gene detected, with the point mutation in parentheses.
* __Predicted Phenotype__: The predicted AMR phenotype (drug resistances) for this gene.
* __Type__: The type of this mutation from PointFinder (either **codon** or **nucleotide**).
* __Position__: The position of the mutation. For **codon** type, this is the codon number in the gene; for **nucleotide** type, it is the nucleotide number.
* __Mutation__: The particular mutation. For **codon** type, this lists the codon change and the resulting amino acid change; for **nucleotide** type, it lists the single nucleotide change.
* __%Identity__: The % identity of the top BLAST HSP to the AMR gene.
* __%Overlap__: The % overlap of the top BLAST HSP to the AMR gene (calculated as __hsp length/total length * 100__).
* __HSP Length/Total Length__: The top BLAST HSP length over the AMR gene total length (nucleotides).
* __Contig__: The contig id containing this AMR gene.
* __Start__: The start of the AMR gene (will be greater than __End__ if on minus strand).
* __End__: The end of the AMR gene.

### Example

| Isolate ID | Gene | Predicted Phenotype | Type | Position | Mutation | %Identity | %Overlap | HSP Length/Total Length | Contig | Start | End |
|-------------|--------------|------------------------------------|--------|-----------|----------------------|------------|-----------|--------------------------|--------------|---------|--------|
| SRR1952908 | gyrA (S83Y) | ciprofloxacin I/R, nalidixic acid | codon | 83 | TCC -> TAC (S -> Y) | 99.96 | 100.00 | 2637/2637 | contig00008 | 22801 | 20165 |
| SRR1952926 | gyrA (S83Y) | ciprofloxacin I/R, nalidixic acid | codon | 83 | TCC -> TAC (S -> Y) | 99.96 | 100.00 | 2637/2637 | contig00011 | 157768 | 160404 |
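
For **codon** type entries, the **Gene** label (e.g. `gyrA (S83Y)`) repeats information from the **Position** and **Mutation** columns. The Python sketch below splits such a row into its parts and checks that the columns agree. Its regular expressions are inferred only from the example rows (an assumption, not a documented format), `parse_codon_hit` is a hypothetical name, and **nucleotide** type entries are not handled.

```python
import re

# Patterns inferred from the example rows above (assumption, not a specification).
GENE_LABEL = re.compile(r"^(?P<gene>\S+) \((?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])\)$")
CODON_CHANGE = re.compile(
    r"^(?P<ref_codon>[ACGT]{3}) -> (?P<alt_codon>[ACGT]{3}) "
    r"\((?P<ref>[A-Z]) -> (?P<alt>[A-Z])\)$"
)


def parse_codon_hit(gene_cell, position, mutation):
    """Split a codon-type PointFinder row into its parts and cross-check the columns."""
    label = GENE_LABEL.match(gene_cell)
    change = CODON_CHANGE.match(mutation)
    if label is None or change is None:
        raise ValueError("not a codon-type entry in the expected format")
    # The codon number in the Gene label should match the Position column ...
    if int(label["pos"]) != int(position):
        raise ValueError("position mismatch between Gene and Position")
    # ... and its amino acids should match those in the Mutation column.
    if (label["ref"], label["alt"]) != (change["ref"], change["alt"]):
        raise ValueError("amino acid mismatch between Gene and Mutation")
    return {
        "gene": label["gene"],
        "codon": int(position),
        "ref_codon": change["ref_codon"],
        "alt_codon": change["alt_codon"],
        "ref_aa": change["ref"],
        "alt_aa": change["alt"],
    }


hit = parse_codon_hit("gyrA (S83Y)", "83", "TCC -> TAC (S -> Y)")
print(hit["gene"], hit["codon"], hit["alt_codon"])  # gyrA 83 TAC
```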

## settings.txt

The **settings.txt** file contains the particular settings used to run `staramr`.

* __command_line__: The command line used to run `staramr`.
* __version__: The version of `staramr`.
* __start_time__, __end_time__, __total_minutes__: The start time, end time, and duration (in minutes) of the `staramr` run.
* __resfinder_db_dir__, __pointfinder_db_dir__: The directories containing the ResFinder and PointFinder databases.
* __resfinder_db_url__, __pointfinder_db_url__: The URLs of the git repositories for the ResFinder and PointFinder databases.
* __resfinder_db_commit__, __pointfinder_db_commit__: The git commit ids for the ResFinder and PointFinder databases.
* __resfinder_db_date__, __pointfinder_db_date__: The date of the git commits of the ResFinder and PointFinder databases.
* __pointfinder_gene_drug_version__, __resfinder_gene_drug_version__: A version identifier for the gene/drug mapping table used by `staramr`.

### Example

```
command_line = staramr search -o out --pointfinder-organism salmonella SRR1952908.fasta SRR1952926.fasta
version = 0.2.0
start_time = 2018-06-08 10:28:47
end_time = 2018-06-08 10:28:59
total_minutes = 0.20
resfinder_db_dir = staramr/databases/data/dist/resfinder
resfinder_db_url = https://bitbucket.org/genomicepidemiology/resfinder_db.git
resfinder_db_commit = dc33e2f9ec2c420f99f77c5c33ae3faa79c999f2
resfinder_db_date = Tue, 20 Mar 2018 16:49
pointfinder_db_dir = staramr/databases/data/dist/pointfinder
pointfinder_db_url = https://bitbucket.org/genomicepidemiology/pointfinder_db.git
pointfinder_db_commit = ba65c4d175decdc841a0bef9f9be1c1589c0070a
pointfinder_db_date = Fri, 06 Apr 2018 09:02
pointfinder_gene_drug_version = 050218
resfinder_gene_drug_version = 050218
```
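
**settings.txt** holds one `key = value` pair per line, so it can be read back, for example to record which database commits produced a set of results. The Python sketch below assumes exactly the layout shown above; `parse_settings` and `minutes_between` are hypothetical helpers, not part of `staramr`. It also recomputes the run duration from **start_time** and **end_time**.

```python
from datetime import datetime


def parse_settings(text):
    """Read 'key = value' lines into a dict, splitting at the first '=' only."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def minutes_between(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Elapsed minutes between two timestamps in the start_time/end_time format."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / 60


settings = parse_settings(
    "version = 0.2.0\n"
    "start_time = 2018-06-08 10:28:47\n"
    "end_time = 2018-06-08 10:28:59\n"
    "resfinder_db_commit = dc33e2f9ec2c420f99f77c5c33ae3faa79c999f2\n"
)
print(round(minutes_between(settings["start_time"], settings["end_time"]), 2))  # 0.2
```

The 12 seconds between the example timestamps give 0.2 minutes, consistent with `total_minutes = 0.20`. Splitting at the first `=` keeps any `=` inside the **command_line** value intact.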

## hits/

The **hits/** directory contains the BLAST HSP nucleotides for the entries listed in the **resfinder.tsv** and **pointfinder.tsv** files. There are up to two files per input genome, one for ResFinder and one for PointFinder.

For example, with an input genome named **SRR1952908.fasta** there would be two files, `hits/resfinder_SRR1952908.fasta` and `hits/pointfinder_SRR1952908.fasta`. These files contain mostly the same information as the **resfinder.tsv** and **pointfinder.tsv** files. The additional fields **resistance_gene_start** and **resistance_gene_end** give the start/end of the BLAST HSP on the resistance gene from the ResFinder/PointFinder databases.

### Example

```
>aadA1_3_JQ414041 isolate: SRR1952908, contig: contig00030, contig_start: 5355, contig_end: 4564, resistance_gene_start: 1, resistance_gene_end: 792, hsp/length: 792/792, pid: 100.00%, plength: 100.00%
ATGAGGGAAGCGGTGATCGCCGAAGTATCGACTCAACTATCAGAGGTAGTTGGCGTCATC
GAGCGCCATCTCGAACCGACGTTGCTGGCCGTACATTTGTACGGCTCCGCAGTGGATGGC
...
```
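
The FASTA headers in **hits/** carry these fields as `key: value` pairs. Assuming the layout in the example (a sequence id, a space, then `, `-separated `key: value` pairs whose values contain no commas), a small Python sketch can split a header apart; `parse_hit_header` is a hypothetical name, not part of `staramr`.

```python
def parse_hit_header(header):
    """Return (sequence id, {field: value}) for one hits/ FASTA header line."""
    seq_id, _, rest = header.lstrip(">").partition(" ")
    fields = {}
    # Assumes fields are separated by ", " and each field is "key: value".
    for item in rest.split(", "):
        key, _, value = item.partition(": ")
        fields[key] = value
    return seq_id, fields


header = (
    ">aadA1_3_JQ414041 isolate: SRR1952908, contig: contig00030, contig_start: 5355, "
    "contig_end: 4564, resistance_gene_start: 1, resistance_gene_end: 792, "
    "hsp/length: 792/792, pid: 100.00%, plength: 100.00%"
)
seq_id, fields = parse_hit_header(header)
print(seq_id, fields["resistance_gene_start"], fields["resistance_gene_end"])
# aadA1_3_JQ414041 1 792
```

Values come back as strings; convert fields such as **contig_start** with `int()` where needed.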

# Tutorial

@@ -445,4 +558,4 @@ specific language governing permissions and limitations under the License.
[pypi-staramr]: https://pypi.org/project/staramr/
[bioconda]: https://bioconda.github.io/
[card-web]: https://card.mcmaster.ca/
-[tutorial]: doc/tutorial/staramr-tutorial.ipynb
+[tutorial]: doc/tutorial/staramr-tutorial.ipynb