Optimize the Docker file - fixes #1 #153

Open
wants to merge 1 commit into
base: main

darknetehf

Hello,

This is an attempt to optimize the Dockerfile.
I noticed that the base image webrecorder/browsertrix-crawler weighs 2.44 GB, which is already a lot, and the resulting auto-archiver image is nearly twice that size (4.29 GB on my end). There is clearly some bloat.

I think we could improve this with multi-stage builds and remove some unwanted layers.

The proposed version should be functionally equivalent. The image size is 3.33 GB on my end, roughly 1 GB smaller than before.
One stage builds the virtual environment, which is then copied into the next stage. This means we can ditch pipenv and run Python directly. See the adapted entry point.

Another benefit of the multi-stage build is that if changes are made to the Python code, there is no need to rebuild the earlier layers (the ones that install dependencies) unless the requirements list has changed. Thanks to caching, generating a new Docker image will be quicker.
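
To illustrate the idea, here is a minimal sketch of the multi-stage layout described above. It is not the Dockerfile from this PR: the `requirements.txt` file name, the `/opt/venv` and `/app` paths, the `src/` directory, and the `src.auto_archiver` module name are all assumptions made for illustration, and it assumes `python3` with the `venv` module is available in the base image.

```dockerfile
# Sketch only: file names, paths, and the module name below are assumptions.

# Stage 1: build the virtual environment.
FROM webrecorder/browsertrix-crawler:latest AS build

# Assumes python3 and its venv module exist in the base image.
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /build
# Copy only the dependency list first, so the install layer below
# stays cached until requirements.txt (hypothetical name) changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime image, built from the same base so the venv's
# interpreter path stays valid.
FROM webrecorder/browsertrix-crawler:latest AS runtime

# Bring over the prebuilt virtual environment; pipenv is not needed here.
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
# Application code is copied last: editing it does not invalidate
# any of the layers above.
COPY ./src/ ./src/

# Run Python directly from the venv (module name is hypothetical).
ENTRYPOINT ["python3", "-m", "src.auto_archiver"]
```

Using the same base image for both stages is a deliberate simplification: a virtual environment hard-codes the path of the interpreter that created it, so building it on a different base could leave it unusable in the final stage.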

However, I need to stress that this has not been thoroughly tested and should be treated as a proof of concept (PoC) to be validated before going to production.

@msramalho
Contributor

Thanks for the submission! These are relevant changes.
As we're moving to Poetry, which will already have a significant impact on the Docker image size, let's keep this on hold until then and reassess.

2 participants