
Setup #191

Merged — 7 commits merged into master from setup on Jan 7, 2021

Conversation

@EEngl52 (Collaborator) commented Dec 22, 2020:

No description provided.

@EEngl52 EEngl52 requested a review from kba December 22, 2020 13:22
site/en/setup.md Outdated

</details>
<details>
<summary>- min. 100 GB free disk space for local installation</summary>
Member:

Maybe this would also be a good moment to reduce the HDD storage requirements. The idea was to prevent users from provisioning too little storage space, but 100 GB is quite a daunting number. IIRC that was also an issue raised by @cneud.

Member:

IMHO it could be made clearer that a minimum of ~7–10 GB does indeed suffice for installing ocrd_all (models pending), and that more disk space may be very useful but is really required only depending on actual usage.

Also, these other values seem quite arbitrary and will depend entirely on the user:

  • "10 GB for evaluation data"?
  • "20 GB for data to process"?

Finally, when would I benefit from isolated venvs? Is this mandatory or optional, and in which cases? As an end user, I cannot tell.

Collaborator:

I concur we should be more realistic about the actual minimum required disk space, and omit figures we cannot really estimate.

> Finally, when would I benefit from isolated venvs? Is this mandatory or optional, and in which cases? As an end user, I cannot tell.

There's no alternative to isolated venvs (besides isolated Docker containers) if you want to install multiple OCR-D modules, because of the Python dependency hell (and pip's shortcomings). So for ocrd_all it's a mandatory solution; for a native installation you're on your own (but will suffer the effects). The toll of this scheme on disk space is considerable, because Tensorflow and its dependencies require GBs each time. The setup guide does not even mention this (implementation detail), but the linked ocrd_all README does discuss it.
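The per-module isolation that ocrd_all's makefile automates can be sketched in plain shell. This is only an illustration of the scheme, not ocrd_all's actual recipe; the module names are examples, and the real install steps (which need network access) are left commented out:

```shell
# One virtualenv per module, so that conflicting pinned dependencies
# (e.g. different Tensorflow versions) never share a site-packages.
python3 -m venv venvs/ocrd_tesserocr
python3 -m venv venvs/ocrd_calamari

# Each venv carries its own interpreter and pip:
venvs/ocrd_tesserocr/bin/python -c "import sys; print(sys.prefix)"

# Real installs would go into each venv separately (network required):
# venvs/ocrd_tesserocr/bin/pip install ocrd_tesserocr
# venvs/ocrd_calamari/bin/pip install ocrd_calamari
```

Since every venv ends up with its own copy of heavyweight dependencies, the disk-space cost mentioned above scales with the number of isolated modules.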

Member:

Thanks for elaborating, that's what I assumed. So I take it this is mandatory then, and perhaps the corresponding sentence may even be omitted from this guide, as it currently presents this as a user choice.

Does the ~7 GB minimum already include this?

Collaborator:

> Thanks for elaborating, that's what I assumed. So I take it this is mandatory then, and perhaps the corresponding sentence may even be omitted from this guide, as it currently presents this as a user choice.

Indeed, that's what it sounds like. Also, I don't quite understand what backups are required in this context (since everything is already repo-controlled or built by make), so perhaps that line should be omitted entirely?

> Does the ~7 GB minimum already include this?

With the standard Docker installation (ocrd/all:maximum-git, i.e. everything except DISABLED_MODULES ?= cor-asv-fst opencv-python ocrd_kraken clstm ocrd_ocropy), we are already at 15.4 GB. (I have no idea how Docker Hub counts...)

BTW I am surprised to see that the non-git Docker images are still smaller than the git ones, as they are merely aliases. Does anyone have a clue what's going on?

site/en/setup.md (outdated, resolved)

We recommend using the prebuilt Docker images since this does not require any changes to
the host system besides [installing Docker](https://hub.docker.com/r/ocrd/all).

We do not recommend installing modules individually, because it can be difficult to catch all dependencies.
For developers it might be useful to [install the modules individually](#individual-installation), either via Docker or natively.

Member:

Should we even mention this here? We could move that section to the developer's guide (which is woefully out-of-date as well 😨)

@EEngl52 (Collaborator, Author) commented Dec 22, 2020:

I don't really mind. It would make the setup guide even clearer, though, if we could just remove it completely.

Collaborator:

IMHO we should keep mentioning this option in the setup guide (both as a subsection and as a short mention above). There are still (to this day) users who want maximal control and don't trust (or don't want to learn about) ocrd_all's makefile. From their perspective, it's the lowest threshold for entering OCR-D.

Member:

I think it is okay to reference this here as long as we restrict any details to another guide/doc.

@@ -122,7 +131,36 @@ docker pull ocrd/all:maximum

Replace `maximum` accordingly if you want the `minimum` or `medium` variant.

(Also, if you want to keep the modules' git repos inside the Docker images – so you can keep making fast updates, without waiting for a new pre-built image but also without building an image yourself –, then add the suffix `-git` to the variant, e.g. `maximum-git`. This will behave like the native installation, only inside the container. Yes, you can also [commit changes](https://rollout.io/blog/using-docker-commit-to-create-and-change-an-image/) made in containers back to your local Docker image.)
Member:

The explanation of why it is useful to have the git repos inside the Docker container is still worth keeping. However, this has been the default for a while now: maximum is just an alias for maximum-git, kept for backwards compatibility.

site/en/setup.md (four further outdated, resolved review threads)
@EEngl52 EEngl52 requested a review from cneud January 6, 2021 13:43
@EEngl52 EEngl52 merged commit 5bd916d into master Jan 7, 2021
@EEngl52 EEngl52 deleted the setup branch January 7, 2021 09:18
4 participants