Implement HMMER's handling of X's (for protein) and N's (for DNA) #117

ihh · 2020-06-14T23:56:57Z

HMMER weights IUPAC degenerate emissions using the reciprocal of the perplexity of the underlying match state's emission distribution (see the esl_abc_FExpectScore function in the Easel library bundled with the HMMER3 source).

This has the effect that the "score" for those emissions is the expected score you would get if you replaced each X with a residue drawn at random from the underlying emission distribution. This is much to the chagrin of Roger Sewell, who argued that X's should be treated as missing data. Sean's counterargument is that treating them as missing data would reward their alignment to the model. This is an old argument.
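To make the weighting concrete, here is a minimal Python sketch of the idea as described above. It is not Easel's code: the function names `expected_log_emission` and `perplexity`, the `members` argument, and the toy emission vector are all hypothetical, and it assumes the expectation is taken under the match state's own emission distribution, which should be checked against the Easel source. Under that assumption, the weight for a fully degenerate symbol comes out as exactly the reciprocal of the perplexity.

```python
import math

def expected_log_emission(emit, members=None):
    """Expected log-probability of a degenerate symbol (hypothetical helper).

    emit    -- emission probabilities of one match state over the canonical alphabet
    members -- indices of the canonical residues the degenerate symbol stands for;
               None means all of them (X for protein, N for DNA)
    Assumes the selected residues have nonzero total probability.
    """
    idx = range(len(emit)) if members is None else members
    total = sum(emit[i] for i in idx)
    # Average log-probability, weighting each residue by its own emission probability.
    return sum(emit[i] * math.log(emit[i]) for i in idx if emit[i] > 0) / total

def perplexity(emit):
    """exp(Shannon entropy in nats) of an emission distribution."""
    entropy = -sum(p * math.log(p) for p in emit if p > 0)
    return math.exp(entropy)

# Toy DNA match state over A, C, G, T (made-up numbers).
emit = [0.7, 0.1, 0.1, 0.1]
weight_n = math.exp(expected_log_emission(emit))          # weight for N
weight_r = math.exp(expected_log_emission(emit, [0, 2]))  # weight for R = {A, G}
print(weight_n, 1.0 / perplexity(emit))
```

Treating N as missing data would instead assign it a weight of 1 in every match state, which is the alternative at issue in the argument above.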

In practice (as noted by @jordisr) this affects <1% of sequences, but for full HMMER compatibility we ought to include it.

ihh added the enhancement label Jun 14, 2020
