Setup #191
Conversation
site/en/setup.md (Outdated)

</details>
<details>
<summary>- min. 100 GB free disk space for local installation</summary>
Maybe this would also be a good moment to reduce the HDD storage requirements. The idea was to prevent users from provisioning too little storage space, but 100 GB is quite a daunting number. IIRC that was also an issue raised by @cneud.
IMHO it could be made clearer that a minimum of ~7–10 GB does indeed suffice for an installation of ocrd_all (pending models), and that more disk space may be very useful but is really only required depending on actual usage.
Also, these other values seem quite arbitrary and will depend entirely on the user:
- "10 GB for evaluation data"?
- "20 GB for data to process"?
Finally, when would I benefit from isolated venvs? Is this mandatory, optional, in which cases? As end user I cannot tell.
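Since the disk-space figures are rough estimates anyway, a minimal sketch of how a user could simply check the free space on the target filesystem before installing (standard `df` from GNU coreutils; nothing OCR-D-specific is assumed):

```shell
# Quick sanity check before installing ocrd_all: how much space is free here?
# The guide's figures are rough estimates; actual needs depend on usage.
df -h .
```

This prints the available space for the current directory's filesystem in human-readable units, which is all a user needs to judge the "min. X GB" requirements against.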
I concur we should be more realistic about the actual minimum required disk space, and omit figures we cannot really estimate.
> Finally, when would I benefit from isolated venvs? Is this mandatory, optional, in which cases? As end user I cannot tell.
There's no alternative to isolated venvs (besides isolated Docker containers) if you want to install multiple OCR-D modules, because of the Python dependency hell (and pip shortcomings). So for ocrd_all it's a mandatory solution; for a native installation you're on your own (but will suffer the effects). The toll of this scheme on disk-space requirements is considerable, because Tensorflow and its dependencies require GBs each time. The setup guide does not even mention this (implementation detail), but the linked ocrd_all README does discuss it.
Thanks for elaborating, that's what I assumed. So I take it this is mandatory then, and perhaps the corresponding sentence could even be omitted from this guide, as it currently presents this as a user choice.
Does the ~7 GB minimum already include this?
> Thanks for elaborating, that's what I assumed. So I take it this is mandatory then, and perhaps the corresponding sentence could even be omitted from this guide, as it currently presents this as a user choice.
Indeed, that's what it sounds like. Also I don't quite understand what backups are required in this context (since everything is repo-controlled already or built by make), so perhaps that line should be omitted entirely?
> Does the ~7 GB minimum already include this?
With the standard Docker installation (`ocrd/all:maximum-git`, i.e. everything except `DISABLED_MODULES ?= cor-asv-fst opencv-python ocrd_kraken clstm ocrd_ocropy`) we are already at 15.4 GB. (I have no idea how Dockerhub counts...)
BTW I am surprised to see that the non-git Docker images are still smaller than the git ones, as they are merely aliases. Does anyone have a clue what's going on?
We recommend using the prebuilt Docker images since this does not require any changes to
the host system besides [installing Docker](https://hub.docker.com/r/ocrd/all).
We do not recommend installing modules individually, because it can be difficult to catch all dependencies.
For developers it might be useful to [install the modules individually](#individual-installation), either via Docker or natively.
Should we even mention this here? We could move that section to the developer's guide (which is woefully out-of-date as well 😨)
I don't really mind. Would make the setup guide even clearer though if we could just remove it completely.
IMHO we should keep mentioning this option in the setup guide (both as a subsection and a short mention of it above). There are still (to this day) users who want maximal control and don't trust (or want to learn about) ocrd_all's makefile. From their perspective, it's the lowest threshold for entering OCR-D.
I think it is okay to reference this here as long as we restrict any details to another guide/doc.
@@ -122,7 +131,36 @@ docker pull ocrd/all:maximum

Replace `maximum` accordingly if you want the `minimum` or `medium` variant.

(Also, if you want to keep the modules' git repos inside the Docker images – so you can keep making fast updates, without waiting for a new pre-built image but also without building an image yourself –, then add the suffix `-git` to the variant, e.g. `maximum-git`. This will behave like the native installation, only inside the container. Yes, you can also [commit changes](https://rollout.io/blog/using-docker-commit-to-create-and-change-an-image/) made in containers back to your local Docker image.)
(Also, if you want to keep the modules' git repos inside the Docker images – so you can keep making
The explanation of why it is useful to have the git repo inside the Docker container is still worth keeping. However, this has been the default for a while now: `maximum` is just an alias for `maximum-git`, kept for backwards compatibility.
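To make the variant/tag discussion concrete, a minimal sketch of pulling a `-git` variant and listing the local images (this assumes Docker is installed and skips gracefully otherwise; note that the sizes `docker image ls` reports are uncompressed on-disk sizes, whereas Docker Hub lists compressed layer sizes, which may explain part of the size confusion above):

```shell
# Hypothetical sketch: pull a variant with the module git repos included,
# then compare local image sizes. Assumes Docker is installed; skips otherwise.
if command -v docker >/dev/null 2>&1; then
  docker pull ocrd/all:medium-git   # variant with git repos baked in
  docker image ls ocrd/all          # local sizes are uncompressed, unlike Docker Hub's
else
  echo "docker not available; skipping"
fi
```

Since `maximum` is nowadays just an alias for `maximum-git`, pulling either tag should yield the same image ID.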