-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Use StorageManager
in execute_DAG
#258
base: main
Are you sure you want to change the base?
Conversation
…o shared-object-v2
Makes much more sense as _safe_to_delete_holding
right now, we seem to not be getting anything; see if this is because someone has added a handler on the root logger or because we can't caplog from a parent logger (may need specific __name__)
…o shared-object-v2
also update for changes in other PR
This will make it easier to test that we create the same old directories
from ..storage.storagemanager import StorageManager | ||
from ..storage.externalresource.filestorage import FileStorage | ||
from ..storage.externalresource.base import ExternalStorage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix the namespace of storage
so these are all available at the same level? from ..storage import StorageManager, FileStorage, etc
) | ||
return storage_manager | ||
|
||
|
||
def execute_DAG(protocoldag: ProtocolDAG, *, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you keeping this for comparison for the PR? I don't think we need to keep the old behaviour around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we told Joe that the API was stable enough that he could use it in workflows. So yeah, I spent a lot of time ensuring that there would be a code path that doesn't break the current API.
|
||
def new_execute_DAG( # TODO: this is a terrible name | ||
protocoldag: ProtocolDAG, | ||
dag_label: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is a DAG label now required?
and why can't the label just be taken off the input DAG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaks current behavior, where the equivalent of the DAG label is given by the user as the -o
from quickrun.
Make a function that returns a pathlib.Path be the main thing, instead of the property that returns a string. Almost all usage of `.fspath` was to then wrap it in a pathlib.Path. This is more convenient for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting on a couple of breaking changes these PRs introduce that people should be aware of.
with open(ctx.shared / f'unit_{my_id}_shared.txt', 'w') as out: | ||
out.write(f'unit {my_id} existed!\n') | ||
with open(os.path.join(ctx.scratch, f'unit_{my_id}_scratch.txt'), 'w') as out: | ||
with open(ctx.scratch / f'unit_{my_id}_scratch.txt', 'w') as out: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking change: os.path
functionality should not be used. Switch to the pathlib-like semantics of using /
.
I don't see a way to have this not be breaking, since the PathLike created here needs to be one of ours. Options that involve keeping os.path
would be (1) require something other than the system open
, or (2) require manual conversion of any user-created path to one of our objects. We do support option 2 for cases where legacy code can't be converted, but the pathlib-like syntax is more readable and allows us to silently support any protocol that already uses that syntax.
assert len(list(shared.iterdir())) == 0 | ||
# assert len(list(shared.iterdir())) == 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Extremely minor) breaking change: Previously, this left behind 5 empty shared directories. In the current implementation, empty directories are, by default, automatically deleted, so we have no directories within the shared
space. Had these directories contained any files, you would still find all 5.
There's a parameter to keep empty directories, but the effect will also be to keep empty subdirectories, which I didn't think was the desired default behavior. My assumption is that it will be relatively rare to keep shared
anyway, because there should be something in settings to determine whether any given file is scratch
/shared
/permanent
, and anything intended for long-term storage will be put in permanent
.
Adds a user story test where a type supported by a custom JSON codec is added to the result dict, and that codec isn't registered as of serialization object creation.
This is to avoid likely footguns related to using os.path.join. Includes test of error on os.path.join.
gufe/protocols/protocoldag.py
Outdated
with dag_ctx.running_unit( | ||
dag_label, unit.key, attempt=attempt | ||
) as context: | ||
_logger.info("Starting unit {label}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing an f-string thingy here
gufe/protocols/protocolunit.py
Outdated
scratch: PathLike | ||
shared: PathLike | ||
shared: StagingRegistry | ||
permanent: StagingRegistry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are meant to be stagingpaths not registries?
This is a subset of the work in #234, split off to facilitate review. This PR will merge into #186 (making it easier to see the additions specific to the PR, but meaning that this should be merged before that). It also includes the work in #257 (one of the old tests fails without that).
This PR integrates
StorageManager
intoexecute_DAG
. Note that this still isn't a complete replacement of the storage system: a follow-up PR will handle serialization ofStagingPath
s.TODO:
ReproduceOldBehaviorStorageManager
; I think some paths aren't quite right.ExecutionStorageDemoTest
subclassesdag_label
is given (also needed in #186)