Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add yapremisrw2, yet another PREMIS reader/writer plugin #34

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jrwdunham
Copy link
Contributor

@jrwdunham jrwdunham commented Nov 9, 2017

Added yapremisrw, yet another PREMIS reader/writer plugin. This was based on previous work by mcantelon. It has been rebased against current master and modified minimally to make it a dependency (plugin) injectable into metsrw.fsentry.FSEntry.

This should supersede #20. The CR/comments from that PR can, in large part, be applied to this one.

@jrwdunham jrwdunham self-assigned this Nov 9, 2017
@jrwdunham jrwdunham force-pushed the dev/issue-11581-premis-parsing branch from 02d14c9 to cd8113c Compare November 10, 2017 03:13
@jrwdunham jrwdunham force-pushed the dev/issue-11581-premis-parsing branch from cd8113c to d5a1645 Compare November 10, 2017 23:19
@jrwdunham jrwdunham changed the title Add premisrw2, a second PREMIS plugin Add yapremisrw2, yet another PREMIS reader/writer plugin Nov 10, 2017
@jrwdunham
Copy link
Contributor Author

After this is merged, a 0.2.1 release should be created so that artefactual/archivematica-storage-service#262 can require it.

Added yapremisrw, yet another PREMIS reader/writer plugin. This was based on
previous work by mcantelon. It has been rebased against current master and
modified minimally to make it a dependency (plugin) injectable into
`metsrw.fsentry.FSEntry`.
@jrwdunham jrwdunham force-pushed the dev/issue-11581-premis-parsing branch from f1a1bea to ee177cf Compare November 11, 2017 01:05
@sevein sevein self-requested a review November 11, 2017 16:19
Copy link
Member

@sevein sevein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I'm happy to see how your plugin solution is working. Do you think we should try to pick @mcantelon's brain to review it? It's been almost two years since his original work but hey I'm sure he remembers. @mcantelon, are you watching this?

setup.py Outdated

here = path.abspath(path.dirname(__file__))

# Get the long description from the relevant file
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
with open(path.join(HERE, 'README.md'), encoding='utf-8') as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't renamed the global yet right? Is still reads here.

setup.py Outdated
long_description = f.read()


def get_version():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also done in archivematica-fpr-admin. It looks a bit different - which method you prefer?

Copy link
Contributor Author

@jrwdunham jrwdunham Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think raising a RuntimeError if no version retrievable is probably a good idea, as is allowing for UTF-8-encoded README and init.py files. Also, the regular expression approach is shorter. I'll copy that code to here.

import tests.helpers as h


class TestDublinCoreXmlData(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests live in tests/plugins/dc. Shouldn't it be tests/plugins/dcrw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, yes it should.

@@ -0,0 +1,33 @@
# -*- coding: utf8 -*-
Copy link
Member

@sevein sevein Nov 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With you coverage only gets better! 📈

I realize coveralls can be annoying when it doesn't pass the build just because the coverage decreased 0.01% 😆 - but if this gets on your way again you could also change the threshold settings. I'm not sure what the defaults are but they're sure very small!

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know about the threshold settings. The test coverage actually decreased by about 0.5%, but yes it was annoying to have to add more tests to get the CI checks to pass.

def mockisfile(path):
if path == bad_path:
return False
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is this the Pythonic way of return not path == bad_path? I like explicit, just wondering!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you've written is equally Pythonic. Maybe I'll add some parens so nobody has to think about operator precedence: return not (path == bad_path).

Also moved tests/plugins/dc to tests/plugins/dcrw to be consistent with
metsrw/plugins.
@jrwdunham jrwdunham requested a review from mcantelon November 14, 2017 00:07
Copy link
Contributor Author

@jrwdunham jrwdunham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have copied relevant comments from #20 to this PR and closed PR #20.

"""
DC_ELEMENTS = ['title', 'creator', 'subject', 'description', 'publisher', 'contributor', 'date', 'format', 'identifier', 'source', 'relation', 'language', 'coverage', 'rights']

def __init__(self, title=None, creator=None, subject=None, description=None, publisher=None, contributor=None, date=None, format=None, identifier=None, source=None, relation=None, language=None, coverage=None, rights=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment by @Hwesta: This is a great use case for kwargs instead of accessing locals() directly. We could replace the list of parameters with **kwargs, which would allow us to accept an arbitrary number of inputs. kwargs is a dictionary. Then the loop could be setattr(self, element, kwargs[element])

parser = etree.XMLParser(remove_blank_text=True)
if isinstance(document, six.string_types):
self.document = etree.fromstring(document, parser=parser)
elif isinstance(document, (etree._Element, list)):
self.document = document
self.mdtype = mdtype
self.othermdtype = othermdtype
self.data = data
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment by @Hwesta: Rather than creating a new attribute, this could use document to store the child document - whether that's a string, ElementTree, or plugin class of the appropriate type.

if event_type is None:
raise ConstructError('event_type argument is required.')

if event_datetime is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment by @Hwesta: This feels unpythonic - these parameters are required by the function, and if they're None, it will raise an error later.


event_type_el = etree.Element(utils.lxmlns('premis') + 'eventType')
event_type_el.text = self.event_type
root.append(event_type_el)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment by @Hwesta: Tip: These three lines can be etree.SubElement(root, utils.lxmlns('premis') + 'eventType').text = self.event_type

Before this, calling `get_subsections_of_type` could raise `IndexError`
if the `FSEntry` instance had no amdSecs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants