-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add slidescore url and slidescore study id to download manifest. #42
base: main
Are you sure you want to change the base?
add slidescore url and slidescore study id to download manifest. #42
Conversation
- save manifest as json - when starting the download, add slidescore_url and slidescore_study_id - add layer of slidescore_url and slidescore_study_id keys, might a user download multiple studies into a single directory, so as to not overwrite the initial values of `slidescore_study_id` if a second study is downloaded to the same directory.
- add option to save mappings without downloading WSIs - add option to save mappings as tsv instead of json - save image name instead of file name, since this is not helpful when we download .mrxs files as zips - flip image name -> image id mapping to image id -> image name mapping, since image name is not necessarily unique
Now also fixes #43 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. Can you also fix mypy?
slidescore_api/cli.py
Outdated
Will end up with something like | ||
{ | ||
'slidescore_url': str, | ||
'slidescore_study_id': int, | ||
'slide_filename_to_id_mapping': { | ||
str: int | ||
... | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not filename anymore is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not the structure indeed.
changed
{
"url": {
"study_id": {
"slidescore_url": "url",
"slidescore_study_id": study_id,
"slide_filename_to_study_image_id_mapping": {
"image_id": "image_name",
...
...
}
}
}
}
slidescore_api/cli.py
Outdated
def append_to_tsv_mapping(save_dir: pathlib.Path, items: List[str]) -> None: | ||
""" | ||
Create a manifest mapping image id to the filename. | ||
Create a manifest mapping image id to image name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that if the file does not exist yet, you write the first two lines?
# slidescore_url: <URL>
# study_id: <id>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do this manually in cli.download_wsis
.
Otherwise I'd need to always pass the url
and id
to the function.
Also, if this would be the functionality, then downloading a second slidescory study into the same directory would NOT write the slidescore_url and study_id of the second study.
elif mapping_format == "tsv":
append_to_tsv_mapping(save_dir=save_dir, items=[f"# {slidescore_url}"])
append_to_tsv_mapping(save_dir=save_dir, items=[f"# {study_id}"])
slidescore_api/cli.py
Outdated
@@ -399,23 +493,56 @@ def download_wsis( | |||
# Collect image metadata | |||
images = client.get_images(study_id) | |||
|
|||
# # Add study details to mapping manifest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single #
disable_download: bool = False, | ||
mapping_format: str = "json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no other way to get the mapping only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not that i can think of without duplicating a lot of code or entirely refactoring the function into less coherent parts.
elif mapping_format == "tsv": | ||
append_to_tsv_mapping( | ||
save_dir=save_dir, | ||
items=[str(image_id), image_name], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not one of these, it's better to raise an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Thought this was done by the argparser, but raising error here makes the function more generic
else:
raise ValueError(f"mapping_format should be either 'tsv' or 'json', but is {mapping_format}")
- update docstring of file structure of slidescore_mapping.json - remove double # - raise valueerror when mapping format is not properly given
slidescore_study_id
if a second study is downloaded to the same directory.Fixes #40