[Bug]: beam_Publish_Beam_SDK_Snapshots and beam_PostCommit_Python_Arm are extremely flaky due to failing to build wheels #33425

Open
chamikaramj opened this issue Dec 19, 2024 · 6 comments

chamikaramj commented Dec 19, 2024

What happened?

It seems that all runs over the past two days failed, and the workflow was very flaky before that. Note that this prevents Beam container snapshots from being published, which blocks or breaks other tests.

https://github.com/apache/beam/actions/workflows/beam_Publish_Beam_SDK_Snapshots.yml

For example,

https://github.com/apache/beam/actions/runs/12403586484/job/34627334523

#16 422.9   Building wheel for google-cloud-profiler (setup.py): finished with status 'error'
#16 423.0   error: subprocess-exited-with-error
#16 423.0   
#16 423.0   × python setup.py bdist_wheel did not run successfully.
#16 423.0   │ exit code: 1
#16 423.0   ╰─> [41 lines of output]
#16 423.0       /usr/local/lib/python3.9/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
#16 423.0       !!
#16 423.0       
#16 423.0               ********************************************************************************
#16 423.0               Requirements should be satisfied by a PEP 517 installer.
#16 423.0               If you are using pip, you can try `pip install --use-pep517`.
#16 423.0               ********************************************************************************
#16 423.0       
#16 423.0       !!
#16 423.0         dist.fetch_build_eggs(dist.setup_requires)
#16 423.0       running bdist_wheel
#16 423.0       running build
#16 423.0       running build_py
#16 423.0       creating build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/client.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/backoff.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/cpu_profiler.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/__version__.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/profile_pb2.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/__init__.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/pythonprofiler.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       copying googlecloudprofiler/builder.py -> build/lib.linux-aarch64-cpython-39/googlecloudprofiler
#16 423.0       running build_ext
#16 423.0       building 'googlecloudprofiler._profiler' extension
#16 423.0       creating build/temp.linux-aarch64-cpython-39/googlecloudprofiler/src
#16 423.0       g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Igooglecloudprofiler/src -I/usr/local/include/python3.9 -c googlecloudprofiler/src/_profiler.cc -o build/temp.linux-aarch64-cpython-39/googlecloudprofiler/src/_profiler.o -std=c++11
#16 423.0       In file included from googlecloudprofiler/src/profiler.h:26,
#16 423.0                        from googlecloudprofiler/src/_profiler.cc:18:
#16 423.0       googlecloudprofiler/src/stacktraces.h: In member function ‘void AsyncSafeTraceMultiset::Reset()’:
#16 423.0       googlecloudprofiler/src/stacktraces.h:72:24: warning: ‘void* memset(void*, int, size_t)’ clearing an object of type ‘struct AsyncSafeTraceMultiset::TraceData’ with no trivial copy-assignment; use value-initialization instead [-Wclass-memaccess]
#16 423.0          72 |   void Reset() { memset(traces_, 0, sizeof(traces_)); }
#16 423.0             |                  ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#16 423.0       googlecloudprofiler/src/stacktraces.h:89:10: note: ‘struct AsyncSafeTraceMultiset::TraceData’ declared here
#16 423.0          89 |   struct TraceData {
#16 423.0             |          ^~~~~~~~~
#16 423.0       g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Igooglecloudprofiler/src -I/usr/local/include/python3.9 -c googlecloudprofiler/src/clock.cc -o build/temp.linux-aarch64-cpython-39/googlecloudprofiler/src/clock.o -std=c++11
#16 423.0       g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -Igooglecloudprofiler/src -I/usr/local/include/python3.9 -c googlecloudprofiler/src/log.cc -o build/temp.linux-aarch64-cpython-39/googlecloudprofiler/src/log.o -std=c++11
#16 423.0       g++: internal compiler error: Segmentation fault signal terminated program cc1plus
#16 423.0       Please submit a full bug report, with preprocessed source (by using -freport-bug).
#16 423.0       See <file:///usr/share/doc/gcc-12/README.Bugs> for instructions.
#16 423.0       error: command '/usr/bin/g++' failed with exit code 4
#16 423.0       [end of output]
#16 423.0   
#16 423.0   note: This error originates from a subprocess, and is likely not a problem with pip.
#16 423.0   ERROR: Failed building wheel for google-cloud-profiler
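
When triaging many failed runs, it can help to pull just the decisive lines out of a log like the one above: which wheel failed, and whether the compiler itself crashed (as opposed to a compile error in the package source). The following is a minimal sketch, not part of the Beam tooling; the function name `summarize_wheel_failures` is hypothetical, and the two patterns are simply copied from the pip and g++ messages shown in this log.

```python
import re

# Patterns copied from the pip / g++ messages in the log above.
# summarize_wheel_failures is a hypothetical helper name, used for illustration only.
FAILED_WHEEL = re.compile(r"ERROR: Failed building wheel for (\S+)")
COMPILER_CRASH = re.compile(r"internal compiler error: (.+)")


def summarize_wheel_failures(log_text):
    """Return (failed_packages, compiler_crash_messages) found in a build log."""
    failed, crashes = [], []
    for line in log_text.splitlines():
        match = FAILED_WHEEL.search(line)
        if match:
            failed.append(match.group(1))
        match = COMPILER_CRASH.search(line)
        if match:
            crashes.append(match.group(1).strip())
    return failed, crashes


# Three lines taken verbatim from the log in this issue.
sample = """\
#16 423.0       g++: internal compiler error: Segmentation fault signal terminated program cc1plus
#16 423.0       error: command '/usr/bin/g++' failed with exit code 4
#16 423.0   ERROR: Failed building wheel for google-cloud-profiler
"""

failed, crashes = summarize_wheel_failures(sample)
print(failed)
print(crashes)
```

A non-empty second list means the failure came from g++ crashing rather than from a diagnostic about the package's code, which is the case in the run linked above.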

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@chamikaramj
Contributor Author

cc: @tvalentyn @liferoad @damccorm

@chamikaramj
Contributor Author

Oldest failure was from last week. The workflow was pretty stable before that.

https://github.com/apache/beam/actions/runs/12267026140

chamikaramj commented Dec 23, 2024

chamikaramj changed the title from "[Bug]: beam_Publish_Beam_SDK_Snapshots is extremely flaky due to failing to build wheels" to "[Bug]: beam_Publish_Beam_SDK_Snapshots and beam_PostCommit_Python_Arm are extremely flaky due to failing to build wheels" on Dec 23, 2024

damccorm commented Jan 6, 2025

I fixed building Python wheels with #33506. Moving the other workflows over to GitHub-hosted runners is a little more involved, since we need to authenticate and it looks like the secrets we've used for auth have expired.

Next steps to get unblocked here are:

  1. Create a new secret representing a service account key that we can use to authenticate and do scoped resource creation in apache-beam-testing, and get it added to our account.
  2. Follow https://github.com/google-github-actions/setup-gcloud?tab=readme-ov-file#setup-gcloud-github-action to set up gcloud and get authenticated in the workflows we need to fix. Also switch those workflows to use GitHub-hosted runners.

cc/ @kennknowles: this will probably break things in the release. build_release_candidate, at least, should not need any real tweaks to run the Docker part on self-hosted runners, so all the pieces needed for the release are probably mostly there.

damccorm commented Jan 6, 2025

I'll plan on following up on this tomorrow.

damccorm commented Jan 7, 2025

I kicked off the process of getting the secrets we use updated. I also put up #33507 to migrate the first workflow back to GitHub-hosted runners. Once that's in and verified, I'll move over the other ones that need it.
