google-cloud-cli has a huge install size #1732

Open
jonsneyers opened this issue Aug 2, 2023 · 32 comments

@jonsneyers

The install size is something like 1 gigabyte when I follow the instructions at https://cloud.google.com/storage/docs/gsutil_install
(and then I'm not even counting the dependencies, I'm only looking at /usr/lib/google-cloud-sdk)

This seems very excessive for what is supposed to be just a command line interface to a cloud platform.

Please consider being a bit more size-aware when packaging these things. For example, I see a huge unstripped binary in /usr/lib/google-cloud-sdk/bin/anthoscli (174 MB) and a big uncompressed JSON file in /usr/lib/google-cloud-sdk/data/cli/gcloud.json (83 MB). Also it looks like a bunch of test code is being shipped as part of the package:

$ ls -R /usr/lib/google-cloud-sdk |grep test |wc -l
1609

which makes me think this is not actually a packaged tool but more like a dump of a development repository.

It feels like there is a lot of low hanging fruit to reduce the install size of this tool.
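To see where the space goes on your own machine, a small shell sketch like the following ranks the top-level entries of the SDK directory by size and repeats the rough "test" count from above. It is a minimal sketch, assuming a POSIX shell with GNU or BSD `find` (for `-mindepth`/`-maxdepth`), `du`, `sort` and `head`; `largest_subdirs` and `count_test_paths` are hypothetical helper names, not part of gcloud.

```shell
# largest_subdirs DIR [N]: print "<size in KB><TAB><path>" for the N largest
# entries directly under DIR (default 10), biggest first.
largest_subdirs() {
    dir=$1
    n=${2:-10}
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -rn | head -n "$n"
}

# count_test_paths DIR: count files and directories below DIR whose name
# contains "test" (a rough analogue of `ls -R DIR | grep test | wc -l`).
count_test_paths() {
    find "$1" -mindepth 1 -name '*test*' | wc -l | tr -d ' '
}
```

For example, `largest_subdirs /usr/lib/google-cloud-sdk 5` lists the five largest entries; running it again on a subdirectory such as `platform` or `lib/third_party` drills down further.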

@wiardvanrij

root@9c0e8edf5bdd:/opt# du -sh *
1.7G	google-cloud-sdk

We only need one add-on. I would go further and say the size is ridiculous. We really want minimal image sizes.

@Mcfloy

Mcfloy commented Oct 13, 2023

On top of that, I don't understand why Python has to be installed for a CLI that is already offered as separate downloads per CPU architecture. If I download the CLI for Linux x86_64, why isn't it just a static binary? In Docker images it bloats the size with dependencies that could be avoided.

@tmancill

+1 on taking action to make this more efficient. It doesn't seem like it scales very well.

This package gets installed automatically on some GCP instance types (reference) and takes a long time (1-2 minutes) to update on smaller instances. Having nearly 40000 files, over a third of which are manpages, probably doesn't help...

$ dpkg -L google-cloud-cli | wc -l
38775

$ dpkg -L google-cloud-cli | grep /usr/share/man/man1/gcloud  | wc -l
13271

For a clean install on a Debian image:

The following NEW packages will be installed:
  google-cloud-cli
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 154 MB of archives.
After this operation, 760 MB of additional disk space will be used.

It also releases on a strict weekly cadence, so if you're keeping up with security updates, etc., you're going to end up (re)installing it continually.


@binaryronin

+1 on this. We have some very small endpoint instances in GCP, and this single package takes over 10 minutes to unpack and update, during which our alerting goes crazy!


@mihaiav

mihaiav commented Jan 27, 2024

One would expect more from Google engineers. Instead of an efficient command-line interface, we are served GBs of bloat. I'm using the CLI to deploy a Cloud Run Docker image, and the Google Cloud CLI rivals Docker itself in size...

@ViliusS

ViliusS commented Feb 6, 2024

I can confirm this is not only a disk-footprint problem but also a problem during installation and updates on very small GCP instances (e2-micro, etc.). Every time DNF tries to unpack the Cloud SDK RPM, the VM instance either goes down or segfaults with out-of-memory errors.

This happens on all instances with < 1GB of memory -> https://issuetracker.google.com/issues/239207289

@tonymet

tonymet commented Feb 6, 2024

I entered this ticket and later found this convo. Please upvote this bug to help get it some attention.

https://issuetracker.google.com/issues/324114897

I agree the install size is abhorrent.

@tonymet

tonymet commented Feb 6, 2024

@jonsneyers
Here's a review of the installation and some of the wasted space. Fun fact: there is an entire Python installation within google-cloud-sdk:

/google-cloud-sdk/...
* /platform/bundledpythonunix 270.1MB -- this contains a complete python installation
* /generatedclients 165.7MB
* /lib/third_party/botocore 94.8MB
* /lib/third_party/kubernetes 25.1MB 

@framegrace

Installing it manually instead of using the distro package can save you precious space.
This method reduced the Docker image size by 400MB compared with installing via apt. It also removes bq, which is small but which I don't need, and applies some config and post-update cleanup. Adapt it to suit your needs:

ENV PATH=/google-cloud-sdk/bin:$PATH
# Extract into / so the SDK lands at /google-cloud-sdk, matching PATH above.
# On ARM hosts `arch` prints aarch64, but the ARM tarball is named ...-linux-arm.tar.gz.
RUN ARCH=`arch` && if [ "$ARCH" = "aarch64" ]; then ARCH=arm; fi && \
    CLOUD_SDK_VERSION="470.0.0" && \
    curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-${CLOUD_SDK_VERSION}-linux-${ARCH}.tar.gz && \
    tar xzf google-cloud-cli-${CLOUD_SDK_VERSION}-linux-${ARCH}.tar.gz -C / && \
    rm google-cloud-cli-${CLOUD_SDK_VERSION}-linux-${ARCH}.tar.gz && \
    gcloud config set core/disable_usage_reporting true && \
    gcloud config set component_manager/disable_update_check true && \
    gcloud config set metrics/environment github_docker_image && \
    gcloud components remove -q bq && \
    gcloud components update -q && \
    rm -rf $(find /google-cloud-sdk/ -regex ".*/__pycache__") && \
    rm -rf /google-cloud-sdk/.install/.backup && \
    gcloud --version
RUN git config --system credential.'https://source.developers.google.com'.helper gcloud.sh
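The two cleanup steps at the end of that `RUN` (dropping `__pycache__` directories and `.install/.backup`) can be factored into a reusable shell function. This is a sketch, assuming GNU or BSD `find`; `prune_sdk_caches` is a hypothetical name. Using `-prune` with `-exec rm -rf {} +` avoids word-splitting the output of `$(find ...)` and stops `find` from descending into directories it is about to delete.

```shell
# prune_sdk_caches SDK_DIR: delete Python bytecode caches and the component
# manager's backup directory from an extracted SDK tree.
prune_sdk_caches() {
    sdk=$1
    # -prune keeps find from walking into each __pycache__ it hands to rm.
    find "$sdk" -type d -name __pycache__ -prune -exec rm -rf {} +
    # Same directory the Dockerfile above removes after `components update`.
    rm -rf "$sdk/.install/.backup"
}
```

For example, `prune_sdk_caches /google-cloud-sdk`; in a Dockerfile the same two commands can simply be inlined into the `RUN` step.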

@framegrace

framegrace commented Apr 15, 2024

For Ubuntu (jammy) Docker images, I have a Dockerfile which completely removes the included Python and forces the use of the OS version.

It reduces the image size to around 800MB uncompressed (200MB compressed). That's 2GB less than the official cloud-sdk image.

FROM ubuntu:jammy AS base

...
# Add your base stuff  if needed
...

# Minimalized Google cloud sdk
FROM base AS gcloud-installer

# Download python3 module dependencies (will also install python-minimal which is only around 25Mb).
# ubuntu:jammy ships with empty apt lists and without curl, hence the update and curl/ca-certificates.
RUN apt-get update && \
    apt-get install -y \
        ca-certificates \
        curl \
        python3-crcmod \
        python3-openssl
ARG CLOUD_SDK_VERSION=452.0.1
ENV CLOUD_SDK_VERSION=$CLOUD_SDK_VERSION
ENV PATH=/google-cloud-sdk/bin:$PATH
ENV CLOUDSDK_PYTHON=/usr/bin/python3
# Download and install cloud sdk. Review the components I install, you may not need them.
# --installation stores the properties inside /google-cloud-sdk, so they survive the COPY
# into the final stage (a plain `config set` writes to ~/.config/gcloud, which is not copied).
RUN ARCH=x86_64 && \
    curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-${CLOUD_SDK_VERSION}-linux-${ARCH}.tar.gz && \
    tar xzf google-cloud-cli-${CLOUD_SDK_VERSION}-linux-${ARCH}.tar.gz -C / && \
    rm google-cloud-cli-${CLOUD_SDK_VERSION}-linux-${ARCH}.tar.gz && \
    rm -rf /google-cloud-sdk/platform/bundledpythonunix && \
    gcloud config set --installation core/disable_usage_reporting true && \
    gcloud config set --installation component_manager/disable_update_check true && \
    gcloud config set --installation metrics/environment github_docker_image && \
    gcloud components remove -q bq && \
    gcloud components install -q beta && \
    gcloud components install -q gke-gcloud-auth-plugin && \
    rm -rf $(find /google-cloud-sdk/ -regex ".*/__pycache__") && \
    rm -rf /google-cloud-sdk/.install/.backup && \
    rm -rf /google-cloud-sdk/bin/anthoscli && \
    gcloud --version

#...
#<Add more stages if you need>
#...

# On your final stage, (here simply from base, for example)
FROM base AS final

# Assumes base already provides python3, python3-crcmod, python3-openssl and git:
# only the SDK files are copied from the installer stage.
# Add to the path
ENV PATH=/google-cloud-sdk/bin:$PATH
# Ask gcloud to use local python3
ENV CLOUDSDK_PYTHON=/usr/bin/python3
# Copy just the installed files
COPY --from=gcloud-installer /google-cloud-sdk /google-cloud-sdk
# This is to be able to update gcloud packages
RUN git config --system credential.'https://source.developers.google.com'.helper gcloud.sh

I really don't understand how little care Google engineers have put into this. 2GB of Python is a lot of Python, and 400MB of documentation is also a waste.
So, well, there's nothing really special there; I think this is easily ported to Alpine, etc.
I may do it when I have some time.

Edit: Cleaned up some stuff and added comments to make it more readable. This was directly extracted from an internal dockerfile.
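Because the final stage relies on `CLOUDSDK_PYTHON` rather than the bundled interpreter, a quick sanity check can catch an image where the bundled Python was not removed or the system Python path is wrong. This is a minimal sketch; `check_external_python` is a hypothetical helper, and it only checks that the path is executable, not that the Python version or the `crcmod`/`OpenSSL` modules are what gcloud expects.

```shell
# check_external_python SDK_DIR [PYTHON]: succeed only if the bundled Python
# is gone from SDK_DIR and PYTHON (default: $CLOUDSDK_PYTHON) is executable.
check_external_python() {
    sdk=$1
    py=${2:-${CLOUDSDK_PYTHON:-}}
    if [ -e "$sdk/platform/bundledpythonunix" ]; then
        echo "bundled Python still present under $sdk/platform" >&2
        return 1
    fi
    if [ -z "$py" ] || [ ! -x "$py" ]; then
        echo "CLOUDSDK_PYTHON is unset or not executable: '$py'" >&2
        return 1
    fi
    echo "ok: $sdk will use $py"
}
```

Inside the final image, `check_external_python /google-cloud-sdk` picks up the `CLOUDSDK_PYTHON` set by the `ENV` line.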

@tonymet

tonymet commented Apr 16, 2024

> For Ubuntu (jammy) Docker images, I have a Dockerfile which completely removes the included Python and forces the use of the OS version.

Here is an improved version:

  • image size 389 MB
  • supports CLI arguments e.g. docker run -v$HOME:/root tonymet/gcloud-lite compute instances list
  • based on Alpine
docker pull tonymet/gcloud-lite

@tonymet

tonymet commented Apr 16, 2024

I'm working on the CI to automate these artifacts for general use. Here are the two categories we'll publish:

  • "gcloud-lite" docker images like the above about 400MB
  • a "gcloud-lite" tgz distribution based on google's official images at < 100MB. Published to github releases page

Give this a thumbs up 👍 if you would find it useful, and I'll ping you when it's ready to be tested.

@tonymet

tonymet commented Apr 16, 2024

Releases 471 and 472 are ready to test. You can find instructions on how to obtain the Docker images and .tgz artifacts here:

https://github.com/tonymet/gcloud-lite

@ad-m-ss

ad-m-ss commented Apr 16, 2024

Please be understanding: unfortunately, this is the industry standard (Azure/azure-cli#22955, Azure/azure-cli#7387).

@tonymet

tonymet commented Apr 17, 2024 via email

@tonymet

tonymet commented Apr 18, 2024

please upvote this GCP ticket https://issuetracker.google.com/issues/324114897?pli=1 -- maintainers also raised an internal ticket that's connected.

@ViliusS

ViliusS commented Apr 19, 2024

> please upvote this GCP ticket https://issuetracker.google.com/issues/324114897?pli=1 -- maintainers also raised an internal ticket that's connected.

Voted. Please also upvote https://issuetracker.google.com/issues/239207289 to fix this in RPM packages too.

@tonymet

tonymet commented Apr 25, 2024

Automation is now working with a 24-hour latency. We have lite releases 471, 472 and 473 as runnable Docker images (86% faster) and as .tgz releases: https://github.com/tonymet/gcloud-lite

Give it a run.

@max0x7ba

max0x7ba commented Apr 30, 2024

> I really don't understand how little care Google engineers have put into this. 2GB of Python is a lot of Python, and 400MB of documentation is also a waste.

Quick recap:

  1. Google creates Go language which produces compact executables with no dependencies and uses that for its own applications.
  2. Google creates Compute Engine and sells CPU time and storage space.
  3. Google creates gcloud command line application for Compute Engine users which takes 2GB+ of storage and takes longer to install than Windows OS.

The above makes perfect business sense for Google, does it not?

@tonymet

tonymet commented Apr 30, 2024

> Quick recap:
>
>   1. Google creates Go language which produces compact executables with no dependencies and uses that for its own applications.

Let's be charitable to the team and make sure we are being reasonable. A Go rewrite is out of scope but a stripped gcloud CLI distribution is feasible. I don't think the package size is intentional, it's likely that the teams haven't assessed customer impact. I think our best strategy here is to make sure GCP is aware of the customer impact and give them resources to help prepare a stripped distribution.

@max0x7ba

>> Quick recap:
>>
>>   1. Google creates Go language which produces compact executables with no dependencies and uses that for its own applications.
>
> Let's be charitable to the team and make sure we are being reasonable. A Go rewrite is out of scope but a stripped gcloud CLI distribution is feasible. I don't think the package size is intentional, it's likely that the teams haven't assessed customer impact. I think our best strategy here is to make sure GCP is aware of the customer impact and give them resources to help prepare a stripped distribution.

I am a paying customer of Google Compute Engine. And I am nice and kind. Charitable donations or gifts are not involved here; hence, the word "charitable" is not applicable and is totally out of place here.

I expect world-class service for my money from a company which takes pride in its technical excellence. Currently, gcloud is the opposite of technical excellence, is it not?

Is it too much to ask of a corporation to fulfill its promises https://about.google/intl/en-GB/philosophy/ ?

  1. Focus on the user and all else will follow.
  2. It’s best to do one thing really, really well.
  3. Fast is better than slow.
    ...
  10. Great just isn’t good enough.

@ViliusS

ViliusS commented Apr 30, 2024

Exactly. This is not free software, nor is this some small business that requires our nurturing and forgiving attention. This is a multi-billion-dollar company, and we are paying customers. If their margins and profits cannot pay for management and engineering capable of delivering a stellar experience to their clients, we might as well turn off the lights and go home.

@max0x7ba

Just to add context,

I had no issues invoking gcloud from 2021 until sometime in April or May 2024, when invoking gcloud started to freeze the instance indefinitely and prevent it from accepting new SSH connections, so I couldn't diagnose the issue. A day wasted on debugging revealed that snap auto-updating gcloud was the root cause of the problem.

In other words, gcloud automatic update broke my business processes and made me waste a day debugging the issue, all the while paying Google for using its resources. A mini-disaster caused solely by gcloud developers.

Next, 6+ months since the issue was reported and not resolved, cute dude @tonymet tells me to be charitable while cutting one point of mine out of context. That is a textbook example of trolling and virtue signalling.

@tonymet

tonymet commented May 14, 2024

> Next, 6+ months since the issue was reported and not resolved, cute dude @tonymet tells me to be charitable while cutting one point of mine out of context. That is a textbook example of trolling and virtue signalling.

I'm sorry, I meant no offense. I meant that we will catch more flies with honey than vinegar.

And I'm certainly not virtue signaling. Look at all the effort I invested to create an open source distro and raise awareness with Google. We both want the same thing here.

https://github.com/tonymet/gcloud-lite

@max0x7ba

>> Next, 6+ months since the issue was reported and not resolved, cute dude @tonymet tells me to be charitable while cutting one point of mine out of context. That is a textbook example of trolling and virtue signalling.
>
> I'm sorry, I meant no offense. I meant that we will catch more flies with honey than vinegar.
>
> And I'm certainly not virtue signaling. Look at all the effort I invested to create an open source distro and raise awareness with Google. We both want the same thing here.

You are right. I am sorry for any offence or irritation caused. Sometimes, I cannot hold in the fire, and that's a limitation of mine.

@tonymet

tonymet commented May 14, 2024

Good to hear, and thanks. Let's hope for the best.

@pleandre

pleandre commented Jun 21, 2024

A Go rewrite would be nice. The package is too large. I ended up on this thread because I also found that the Google Cloud CLI was taking a huge amount of space in my Docker container, to the point that I had to do a Google search about it.

@max0x7ba

The sombre business reality is that, for Google, gcloud development is a cost, while gcloud's large size and long install time are profit. There are no monetary incentives for Google to do anything about this issue, as its history demonstrates. Neither honey nor vinegar is of any value to corporations.

@max0x7ba

Ideally, Google should provide gcloud along with all its dependencies on a shared read-only disk for its users to mount and read free of charge. That would eliminate the root cause of this problem and make it non-existent for gcloud users.

@glarrain-cdd

> Ideally, Google should provide gcloud along with all its dependencies on a shared read-only disk for its users to mount and read

@max0x7ba how is that ideal? IMHO requiring users to connect to and mount a given disk shared by all its customers is not incredibly attractive, and I wonder if it even is feasible technically.

I'd rather download and execute a single pre-built and signed binary, and be done with it.

@tonymet

tonymet commented Jun 25, 2024

>> Ideally, Google should provide gcloud along with all its dependencies on a shared read-only disk for its users to mount and read
>
> @max0x7ba how is that ideal? IMHO requiring users to connect to and mount a given disk shared by all its customers is not incredibly attractive, and I wonder if it even is feasible technically.
>
> I'd rather download and execute a single pre-built and signed binary, and be done with it.

I think @max0x7ba is suggesting this as a short-term workaround to reduce CPU and IO costs until the SDK is stripped. I would like to see the SDK included in the base VM images, with allowances for the wasted storage. That would speed up launch time.

@tonymet

tonymet commented Oct 28, 2024

To help combat the bloated gcloud CLI, I created gcloud-go, a 10MB standalone Go CLI with no dependencies. .tgz and Docker images are available.
It supports concurrent connections, Firebase Hosting deployment and GCS (Google Cloud Storage). Please file issues for features you would like, or share PRs.

gcloud-go
