/similarity/frequency incorrectly returns zero for "owl:Thing" and corpus taxa #380

Closed
hlapp opened this issue Jan 30, 2021 · 7 comments

Comments

@hlapp
Member

hlapp commented Jan 30, 2021

"http://www.w3.org/2002/07/owl#Thing" is one the subsumers returned in the subsumer matrix, however, when used as input for /similarity/frequency, the count returned is zero rather than the corpus size.

This needs to be fixed on the server end. On the client side I shouldn't guess whether zero is somehow the magic number meaning the size of the corpus, or that the occurrence count really is zero.

@hlapp
Member Author

hlapp commented Jan 30, 2021

This currently leads to a test failure in Rphenoscape (phenoscape/rphenoscape#153).

@hlapp
Member Author

hlapp commented Jan 30, 2021

@balhoff what's a reasonable ETA for getting this fixed?

@balhoff
Member

balhoff commented Jan 30, 2021

The special semantics of owl:Thing can make it tricky to always handle it correctly (and doing so can also be unnecessarily expensive). Would you be okay with the services filtering it out of results? This is likely already done for some services and if so should be made consistent.

@hlapp
Member Author

hlapp commented Jan 30, 2021

Yes, I think we can safely filter it out of any semantic similarity-related results, including from the subsumer matrix. It is far from the only upper-level term that is commonly in the union for subgraph similarity. (However, this means you should not include it in your own Jaccard calculation either, or our scores will start to diverge.)
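To illustrate why both sides need to agree on this, here is a minimal sketch of a Jaccard calculation over subsumer sets with owl:Thing excluded. It assumes the subsumers of each term are available as plain sets of IRIs; the function name `jaccard_without_top` and the `ex:` IRIs are hypothetical and are not taken from the KB services or Rphenoscape.

```python
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

def jaccard_without_top(subsumers_a, subsumers_b, excluded=frozenset({OWL_THING})):
    """Jaccard index of two subsumer sets after dropping excluded top-level IRIs.

    Returns None when nothing is left to compare, rather than guessing a score.
    """
    a = set(subsumers_a) - excluded
    b = set(subsumers_b) - excluded
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)

# Hypothetical subsumer sets for two terms.
a = {OWL_THING, "ex:anatomical_entity", "ex:fin"}
b = {OWL_THING, "ex:anatomical_entity", "ex:limb"}

with_top = len(a & b) / len(a | b)       # owl:Thing counted on both sides: 2/4
without_top = jaccard_without_top(a, b)  # owl:Thing dropped: 1/3
```

With these example sets, the score is 0.5 if owl:Thing is counted and 1/3 if it is dropped, so a client and a server that disagree on the filter would report different similarities for the same pair.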

If someone does pass it into /similarity/frequency, it would be better to return "NA" or nothing for the count, rather than zero. That is, if an IRI gets passed for which you don't have a count, return nothing as the count, not zero.
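A minimal sketch of that distinction, assuming the counts live in a simple in-memory mapping from IRI to occurrence count. The function name `term_frequencies`, the `ex:` IRIs, and the response shape are hypothetical; the actual /similarity/frequency response format may differ.

```python
import json
from typing import Dict, Iterable, Optional

OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

def term_frequencies(iris: Iterable[str], counts: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Look up a count for each IRI; an IRI without a count maps to None, not 0."""
    return {iri: counts.get(iri) for iri in iris}

# Hypothetical counts: one term that occurs, one that is known but never occurs.
known_counts = {"ex:fin": 12, "ex:unused_term": 0}

result = term_frequencies(["ex:fin", "ex:unused_term", OWL_THING], known_counts)

# In JSON the missing count serializes as null, which a client can map to NA.
payload = json.dumps(result)
```

Here a genuine zero ("ex:unused_term") stays distinguishable from an IRI for which no count exists (owl:Thing), so the client never has to guess what zero means.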

@hlapp
Member Author

hlapp commented Jan 30, 2021

> if an IRI gets passed for which you don't have a count, return nothing as the count, not zero.

Should I post this as a separate (generic) issue?

@balhoff
Member

balhoff commented Feb 1, 2021

> Should I post this as a separate (generic) issue?

That sounds good. I will plan to fix at least this one this week, and see what else needs changing.

@balhoff
Member

balhoff commented Feb 24, 2021

We are filtering out owl:Thing from return values: see #383. That should prevent this from being an issue in the future.

@balhoff balhoff closed this as completed Feb 24, 2021