Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial DOIs #101

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Partial DOIs #101

wants to merge 7 commits into from

Conversation

eseiver
Copy link
Collaborator

@eseiver eseiver commented Mar 22, 2018

This PR adds a number of methods to the Article and Corpus class to allow for Article() creation from a partial DOI, including from Corpus, transformations between partial and full DOIs, as well as regex to validate partial DOIs. I added a new pytest file as well to cover these changes.

eseiver added 7 commits March 28, 2018 17:02
to make internal PLOS server processes easier for checking articles likely to have updated XML, adds `from_partial_doi` class method & `find_valid_partial_dois` regex.
The same as it can initialize an Article object from DOI, it can do that from a partial DOI as well.
return 'pone.0040259'


def test_partial_doi_regex(test_partial_doi):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you test the regexes directly as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this test is confusingly named… its not testing the doi regexes it's testing the validate_partial_doi function.

@@ -22,6 +22,10 @@
full_doi_regex_match = re.compile(regex_match_prefix+regex_body_match)
full_doi_regex_search = re.compile(r"10\.1371/journal\.p[a-zA-Z]{3}\.[\d]{7}"
"|10\.1371/annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}")
partial_doi_regex_search = re.compile(r"p[a-zA-Z]{3}\.[\d]{7}"
"|annotation/[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}")
partial_doi_regex_match = re.compile(r"^p[a-zA-Z]{3}\.[\d]{7}$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is just the same string with ^+your_str+$, am I correct? and the last part of it is almost the same as the last part of the full_doi_regex_search?

Why not make this based on a single string that you then combine as needed to make the search and match?

I'm not sure you need to define search vs. match separately anyway (in this case at least). I ask below… but it'd be a lot easier to know whether that is the case if you tested the regexes directly.

Example: 'pbio.2000777' is True, but '10.1371/journal.pbio.2000777' is False
:return: True if string is in valid PLOS partial DOI format; False if not
"""
return bool(partial_doi_regex_match.search(partial_doi))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the match regex used to search?

@property
def partial_doi(self):
"""Convert a DOI to a partial DOI."""
return self.doi.lstrip('10.1371/').replace('journal.', '')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't you using doi_to_partial?

if isinstance(key, int):
return Article(self.dois[key], directory=self.directory)
elif isinstance(key, slice):
return (Article(doi, directory=self.directory)
for doi in self.dois[key])
elif key not in self.dois:
if partial_to_doi(key) in self.dois:
return Article.from_partial_doi(key, directory=self.directory)
elif validate_partial_doi(key):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be safer to validate first?

@@ -1336,3 +1342,15 @@ def from_filename(cls, filename):
else:
directory = None
return cls(filename_to_doi(filename), directory=directory)

@classmethod
def from_partial_doi(cls, partial_doi, directory=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you validate the partial doi first?



@pytest.fixture
def corpus():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider the corpus a module/session level scoped fixture since this is basically the same as in some of the other tests. You don't need to do it for this PR but it's worth considering



@pytest.fixture
def test_article():
Copy link
Collaborator

@mpacer mpacer Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally fixtures don't have test prefixed to them (see corpus above)


@pytest.fixture
def test_article():
return Article('10.1371/journal.pone.0040259', directory=starterdir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the corpus & doi fixtures directly here, since fixtures can be used in other fixtures:

@pytest.fixture(scope='module')
def article(corpus, doi):
    return corpus[doi]



@pytest.fixture
def test_doi():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with test_article shouldn't be prefixed with test, better is to just use def doi():



@pytest.fixture
def test_partial_doi():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called partial_doi


def test_partial_doi_regex(test_partial_doi):
assert validate_partial_doi(test_partial_doi)
assert not validate_partial_doi(' pone.0040259')
Copy link
Collaborator

@mpacer mpacer Apr 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to have a test like this it's better to make explicit what the manipulation is " " + partial_doi would do that.

def test_partial_doi_regex(test_partial_doi):
assert validate_partial_doi(test_partial_doi)
assert not validate_partial_doi(' pone.0040259')
assert not validate_partial_doi('pone.0040259 ')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above… reuse partial_doipartial_doi + " "

assert doi == test_doi


def test_partial_doi_method_article(test_partial_doi, test_article):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically I would prefer this to be in its own test_article.py for article functionality but I don't think that exists.

assert article == test_article


def test_partial_doi_method_corpus(corpus, test_article, test_partial_doi):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the test_corpus.py testing module.

@mpacer
Copy link
Collaborator

mpacer commented Apr 10, 2018

I was thinking about this for a bit… why do you ever need to validate a partial doi?

Why not validate only on dois since dois are the only things that actually have canonical representations?

That way the canonical way to do everything is always to transform the partial doi into the doi first and then validate the doi.

This seems like it introduces a lot of complexity and it's not clear to me what the wins are.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants