Reduce size of Docker image #167

Open
erinhmclark opened this issue Jan 12, 2025 · 2 comments

@erinhmclark
Collaborator

Implement multi-stage builds, the equivalent of this for Poetry.

@erinhmclark erinhmclark self-assigned this Jan 12, 2025
@erinhmclark
Collaborator Author

There's a new Dockerfile for the Poetry changes in #164 which separates the system installation from the runtime. However, because Poetry installs the package itself as well as the environment, it requires the src files.
We can separate the Poetry build from the runtime stage as shown below, but that involves copying the src files before installation. The other option is to install only the environment in the Poetry layer (adding --no-root) and then install the package itself later. I didn't manage to get that working without reinstalling Poetry in the runtime stage, which negates the optimisation (maybe I'm missing something, though?).

# Stage 1: Base image with system dependencies
FROM webrecorder/browsertrix-crawler:1.0.4 AS base

ENV RUNNING_IN_DOCKER=1 \
    LANG=C.UTF-8 \
    LC_ALL=C.UTF-8 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONFAULTHANDLER=1 \
    PATH="/root/.local/bin:$PATH"

RUN add-apt-repository ppa:mozillateam/ppa && \
    apt-get update && \
    apt-get install -y --no-install-recommends gcc ffmpeg fonts-noto exiftool && \
    apt-get install -y --no-install-recommends firefox-esr && \
    ln -s /usr/bin/firefox-esr /usr/bin/firefox && \
    wget https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz && \
    tar -xvzf geckodriver* -C /usr/local/bin && \
    chmod +x /usr/local/bin/geckodriver && \
    rm geckodriver-v* && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Stage 2: Poetry setup
FROM base AS poetry

ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1

RUN pip install --upgrade pip && \
    pip install "poetry>=2.0.0"

WORKDIR /app

COPY pyproject.toml poetry.lock README.md ./
# This is needed for poetry to install the package itself
COPY ./src/ .
RUN poetry install --only main --no-cache


# Stage 3: Runtime setup
FROM base AS runtime

WORKDIR /app

ENV VIRTUAL_ENV=/app/.venv \
    PATH="/app/.venv/bin:$PATH"

COPY --from=poetry /app /app

ENTRYPOINT ["python3", "-m", "auto_archiver"]
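
For comparison, the --no-root option described above could be sketched roughly as follows. This is an untested sketch, not the Dockerfile from #164. It assumes that src/ contains the auto_archiver package directory (as the COPY ./src/ . line and the entrypoint above suggest), and that running the module from the /app working directory is enough, without installing the project into the virtual environment. It reuses the base stage defined above and would replace stages 2 and 3.

```dockerfile
# Stage 2 (sketch): install dependencies only, without the project source
FROM base AS poetry

ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1

RUN pip install --upgrade pip && \
    pip install "poetry>=2.0.0"

WORKDIR /app

# Only the dependency manifests are copied here, so this layer is rebuilt
# only when pyproject.toml, poetry.lock or README.md change.
COPY pyproject.toml poetry.lock README.md ./
# --no-root installs the dependencies but skips the project package itself.
RUN poetry install --only main --no-root --no-cache


# Stage 3 (sketch): runtime without Poetry
FROM base AS runtime

WORKDIR /app

ENV VIRTUAL_ENV=/app/.venv \
    PATH="/app/.venv/bin:$PATH"

# Bring over only the virtual environment built in the poetry stage.
COPY --from=poetry /app/.venv /app/.venv
# Assumption: src/ holds the auto_archiver package directory, so it lands at
# /app/auto_archiver and "python3 -m" can import it from the working directory.
COPY ./src/ .

ENTRYPOINT ["python3", "-m", "auto_archiver"]
```

With this layout, source changes would no longer invalidate the dependency layer, and Poetry never reaches the runtime image. The trade-off is that the package is not installed into .venv, so anything relying on installed package metadata or console-script entry points would not work under this approach.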

@pjrobertson
Collaborator

Just some things to flag:

Some of the GH actions from docker could be updated in the docker-publish.yaml file.

        uses: docker/setup-buildx-action@v1
        uses: docker/setup-qemu-action@v1
        uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        uses: docker/build-push-action@v2

E.g. setup-buildx-action is up to @v3 and setup-qemu-action is also up to @v3. I don't know if that'll help with the Docker image size, but it's maybe worth doing anyway.
