Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stardist 0.8.5 recipe to containers #551

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

FloWuenne
Copy link
Contributor

Submitting a Container

(If you're requesting for a new container, please check the procedure described here.

Check BioContainers' Dockerfile specifications

Checklist

  1. Misc
  • My tool doesn't exist in BioConda
  • The image can be built
  1. Metadata
  • LABEL base_image
  • LABEL version
  • LABEL software.version
  • LABEL about.summary
  • LABEL about.home
  • LABEL about.license
  • MAINTAINER
  1. Extra (optionals)
  • I have written tests in test-cmds.txt
  • LABEL extra.identifier
  • LABEL about.documentation
  • LABEL about.license_file
  • LABEL about.tags

Check BioContainers' Dockerfile metadata

@biocontainers-bot
Copy link
Collaborator

No biotools label defined, please check if tool is not already defined in biotools (https://bio.tools) and add extra.identifiers.biotools label if it exists. If it is not defined, you can ignore this comment.

@biocontainers-bot
Copy link
Collaborator

No test-cmds.txt (test file) present, skipping tests


ARG DEBIAN_FRONTEND="noninteractive"
ARG STARDIST_VERSION="0.8.5"
ARG NVIDIA_DRIVER_VERSION=470
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this will be an issue? According to the stardist github, the NVIDIA_DRIVER_VERSION should be adjusted to the runtime environnement (depending on the GPU you have, I guess).

No sure how the container will behave if the runtime env does not match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @mboudet !

What would you suggest how to handle this issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No quite sure, to be honest. It would be possible to install the correct lib at runtime I guess, but it would be quite dirty, and would probably not work with singularity.

I'm not sure how does it work with the conda package? Would it be possible to directly install the package in the dockerfile (like here), or does it require external librairies ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what you mean whether it would be possible to directly install the package in the dockerfile? Which package are you referring to? I don't see anything related to nvidia drivers in the recipe you linked unforunately? 🤷🏼

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you use the 'NVIDIA_DRIVER_VERSION' variable to install a specific version of the libnvidia-compute in the next step. (Line 24). According the the stardist github, the libnvidia-compute version should match the GPU driver version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as is custom these days, I asked ChatGPT for help. The suggestion from it was to specify --build-arg when building the docker container at runtime to adjust the ENV variable NVIDIA_DRIVER_VERSION to the one available:
docker build --build-arg NVIDIA_DRIVER_VERSION=<desired_version> -t <your_image_name> .

I am not sure there is a good way to automatically detect this version number. I also don't really know whether we could build a container for cpu completely without NVIDIA drivers...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would work, but that mean that the user would need to build the image themselve (so, not much point in having the image on biocontainers). Also, building an image usually require root/sudo privileges, which are not usually available in computing clusters.

My question regarding the conda package stands though. I'm not sure if a mamba/conda install in the container would work. This might need input from the software's developpers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will ask in their repo if they can advise regarding the mamba / conda install. What would be the advantage over mamba/conda install over the pip install that we are currently doing?

Totally agree on the container build comment. One of the main goals here is to be able to build a singularity container from the docker container and if we can't do that, you are right, pretty useless to put the docker container on biocontainers 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked them in stardist/stardist#259

@uschmidt83
Copy link

Hi, StarDist author here. I'm not really familiar with Bioconda+BioContainers, hence I'm wondering what the StarDist container is supposed to be used for in the Bioconda+BioContainers ecosystem.

The context for the Dockerfile in our repo (which the Dockerfile here is based on) was to offer an easy way to get a GPU-accelerated version of StarDist running, without the hassle of installing CUDA etc. Using StarDist is meant to be primarily done via interactive Jupyter notebooks (hence the TensorFlow jupyter-variant base image).

StarDist itself doesn't require GPU-acceleration, but can take advantage of it for improved performance (by enabling GPU-support for TensorFlow via CUDA, and also by installing gputools to take advantage of OpenCL).

@FloWuenne
Copy link
Contributor Author

Thanks for that insight @uschmidt83 !

The main purpose of having a stardist container in biocontainers (or any hosted hub for docker for that matter) from our side is that we want to implement it in nf-core based nextflow pipelines. These pipelines require Docker containers for reproducible execution. Since stardist is part of conda-forge, a bioconda package (which automatically builds biocontainers) obviously can't be created and so a biocontainers recipe was the next logical step from my side. The alternative would be a CI logic on your repo that pushes to a publicly available dockerhub.

For starting I used your github repo Dockerfile and adjusted it for the biocontainer requirements. To be fair, I am not that faimiliar with GPU specifications in container settings and didn't realize, that the NVIDIA driver specifications could be issues for cross-platform and singularity compatibility until @mboudet mentioned it.

Would you be able and willing to make suggestions to this recipe to remove GPU acceleration and requirements? Again, the main drive of this is to have a docker container hosted in the public domain to use in nf-core pipelines. If you prefer to have a CI on your repo push to your dockerhub and we can use that, then we can also close this PR, whatever option you like better 😊!

Thanks for your help and insights!

@uschmidt83
Copy link

The main purpose of having a stardist container in biocontainers (or any hosted hub for docker for that matter) from our side is that we want to implement it in nf-core based nextflow pipelines. These pipelines require Docker containers for reproducible execution.

If I understand this correctly, these Nextflow pipelines are not meant to run interactively. Instead each "step" of the pipeline consists of software that can either be installed via conda or as run as a container.

If that assumption is true, wouldn't the StarDist container then not need a defined entrypoint to run the the software with defined inputs and outputs?

Since stardist is part of conda-forge, a bioconda package (which automatically builds biocontainers) obviously can't be created and so a biocontainers recipe was the next logical step from my side.

I don't understand this part, and also don't know the relationship between conda-forge and bioconda.

For starting I used your github repo Dockerfile and adjusted it for the biocontainer requirements. To be fair, I am not that faimiliar with GPU specifications in container settings and didn't realize, that the NVIDIA driver specifications could be issues for cross-platform and singularity compatibility until @mboudet mentioned it.

Would you be able and willing to make suggestions to this recipe to remove GPU acceleration and requirements?

Removing the GPU-related things from the Dockerfile is actually easy. Again, I'm more concerned with how the Docker container is supposed to be used in the context of Nextflow.

Again, the main drive of this is to have a docker container hosted in the public domain to use in nf-core pipelines. If you prefer to have a CI on your repo push to your dockerhub and we can use that, then we can also close this PR, whatever option you like better 😊!

I don't have enough information to know which option would be better for us. It looks like the container definition needs to be updated each time we make a new StarDist release. How does that process work in BioContainers?

Best,
Uwe

@FloWuenne
Copy link
Contributor Author

Thank you for the details again @uschmidt83 ! Sorry if my answer was more confusing then anything.
To address your questions:

  1. Your first assumption about nextflow is absolutely right. Nextflow aims to provide reproducible pipelines by running specific tasks from within conda environments or docker/singularity (or other container environments). Usually, tools are run via CLI interfaces from within the containers and thus the container itself doesn't need an entrypoint. Within nextflow pipelines, these CLI interfaces can be given by scripts provided with the nextflow code, rather than necessarily be part of the tool / container. That being said, having a CLI interface for inference using stardist wouldn't be a bad idea in my opinion. migueLib already has some code for doing exactly this. Could maybe be an option to open a PR and add a CLI interface...

  2. The whole part about bioconda vs conda-forge was basically just me saying, that bioconda builds docker containers automatically on quay.io / biocontainers (and conda-forge doesnt). But since stardist is already on conda-forge, one can't create a bioconda recipe (because namespace is shared between the two channels).

  3. Regarding updating the container, the easiest from a maintenance standpoint I guess would be CI on the stardist repo, that automatically builds the container on each new release. I believe biocontainers would automatically make a PR to bump the version (via dependabot) everytime there would be a new pypi release. Ultimately again, I would leave this up to you and could help with the CI setup if required.

Please let me know what you think of these ideas and whether any details are still unclear!

@mboudet
Copy link
Contributor

mboudet commented Dec 5, 2023

I believe biocontainers would automatically make a PR to bump the version (via dependabot) everytime there would be a new pypi release.

I don't believe there is a dependabot setup on the biocontainer repository. Seems like all PR are manual.
(There is a bioconda-bot setup on the bioconda repository, which makes PR based on release on github).

So it might be easier to setup a CI on stardist side to push on docker/github container hub (and update the conda package) after a release (if that's needed)

@FloWuenne
Copy link
Contributor Author

Thanks @mboudet ! I just saw a PR with dependabot and thought that would also check version bumps of main packages within containers...

In that case, a CI directly on the stardist repo that builds a non-GPU container and pushes to dockerhub might be the easiest @uschmidt83.

@uschmidt83
Copy link

Thanks for the clarification, @FloWuenne and @mboudet.

That being said, having a CLI interface for inference using stardist wouldn't be a bad idea in my opinion. migueLib already has some code for doing exactly this. Could maybe be an option to open a PR and add a CLI interface...

We do have CLI for 2D and 3D model inference, but not for model training. Please take a look and let me know if you think this is sufficient.

So it might be easier to setup a CI on stardist side to push on docker/github container hub (and update the conda package) after a release (if that's needed)

Not sure about this, I have to look into this and understand the pros and cons.

@FloWuenne
Copy link
Contributor Author

Thanks for the links for the inference CLIs @uschmidt83 !

The inference scripts look exactly what would be needed to run inference from within a container. So I think the easiest would be if we would update the Dockerfile in your main stardist repository to be one that doesn't use GPU (if that's something you want to support. The existing dockerfile could be saved as a DOCKERFILE_GPU or similar).

Once the main DOCKERFILE in the repo is updated, here is an example of a Github CI that would build the container and push it to Dockerhub on release and every night (you would have to set the Dockerhub name and secret in the repository). Obviously for stardist one could set the CI to only trigger on release or merge on main (depending on your development structure!)

Let us know what your preference is! We can also still go ahead and change this recipe here to work without GPU and have this alongside your main repository. That way, you don't have to change the Dockerfile and deal with setting up the CI... Ultimately I would leave the decision to you as the package creator!

@FloWuenne
Copy link
Contributor Author

@uschmidt83 Just a quick ping here to ask whether you wanted to go through with biocontainer or rather go with Docker CIs in your repo? If you want to go for the CI action then I would close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants