Deadlock when relocating images #139

Open
EugenVeracity opened this issue Jan 8, 2025 · 0 comments

EugenVeracity commented Jan 8, 2025

Describe the bug

There appears to be a deadlock when relocating certain images. This was first noticed in Porter (getporter/porter#3311) and also observed when running cnab-to-oci directly.

The scheduler's todoList fills to its maximum capacity (50 items), after which every further push onto it blocks indefinitely.

Stack trace at the point of deadlock:

github.com/cnabio/cnab-to-oci/remotes.(*errgroupScheduler).schedule(0xc000342840, 0xc000588e10)
	c:/Projects/research/porter/cnab-to-oci/remotes/promises.go:205 +0x238
github.com/cnabio/cnab-to-oci/remotes.scheduleAndUnwrap({0xdee038, 0xc000342840}, 0xc0000b1a70)
	c:/Projects/research/porter/cnab-to-oci/remotes/promises.go:126 +0x179
github.com/cnabio/cnab-to-oci/remotes.(*manifestWalker).walk(0xc0003f9d70, {0xdf2648, 0xc00049aa00}, {{0xc00002fad0, 0x2b}, {0xc000388910, 0x47}, 0x1047af, {0x0, 0x0, ...}, ...}, ...)
	c:/Projects/research/porter/cnab-to-oci/remotes/mount.go:235 +0x605
github.com/cnabio/cnab-to-oci/remotes.(*manifestWalker).walk.func1({0xdf2648, 0xc00049aa00})
	c:/Projects/research/porter/cnab-to-oci/remotes/mount.go:242 +0x31b
github.com/cnabio/cnab-to-oci/remotes.scheduleAndUnwrap.func1({0xdf2648, 0xc00049aa00})
	c:/Projects/research/porter/cnab-to-oci/remotes/promises.go:127 +0x68
github.com/cnabio/cnab-to-oci/remotes.newErrgroupScheduler.func1()
	c:/Projects/research/porter/cnab-to-oci/remotes/promises.go:178 +0x163
golang.org/x/sync/errgroup.(*Group).Go.func1()
	C:/Users/EugenVeracity/go/pkg/mod/golang.org/x/sync@…/errgroup/errgroup.go:78 +0xa9
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
	C:/Users/EugenVeracity/go/pkg/mod/golang.org/x/sync@…/errgroup/errgroup.go:75 +0xe8

From my very limited understanding of the code, this is what I think is happening:

  • (*manifestWalker).walk pushes a task onto the scheduler
  • the task is processed by a scheduler worker thread and executed
  • when executed, the task iterates over the manifest children and calls walk(), which pushes a new task onto the scheduler for each child
  • if there is no space in the scheduler to push the child tasks, the parent task becomes blocked, which blocks that worker thread
  • if all 4 worker threads become blocked by tasks which are waiting for space in the scheduler, there is a deadlock

This is just a guess; I'm not a Go programmer and am not familiar with this project's code, but hopefully it will point someone in the right direction.

I was able to work around this issue and successfully push my bundle by increasing defaultJobsBufferLength in remotes/fixupoptions.go to an arbitrarily chosen higher value (500).
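If the hypothesis above is right, raising defaultJobsBufferLength only moves the threshold: a manifest tree with enough children could still fill a 500-item buffer. One general way to rule out this class of deadlock is to make scheduling a task never block, for example by backing the queue with a growable slice instead of a fixed-capacity channel. The program below is a hypothetical sketch of that technique (`runUnbounded` and `tree` are illustrative names, not part of cnab-to-oci) and not a proposed patch; it trades the memory bound for guaranteed progress.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a unit of work that may schedule children via push (hypothetical).
type task func(push func(task))

// tree returns a task that pushes `fanout` children until `depth` is reached.
func tree(level, fanout, depth int) task {
	return func(push func(task)) {
		if level < depth {
			for i := 0; i < fanout; i++ {
				push(tree(level+1, fanout, depth))
			}
		}
	}
}

// runUnbounded runs root and all tasks it schedules on `workers` goroutines.
// push appends to a slice under a mutex, so it never waits for capacity.
// It returns the number of tasks executed.
func runUnbounded(workers int, root task) int {
	var (
		mu       sync.Mutex
		cond     = sync.NewCond(&mu)
		queue    []task
		pending  int // tasks queued or currently running
		executed int
	)

	push := func(t task) {
		mu.Lock()
		queue = append(queue, t)
		pending++
		mu.Unlock()
		cond.Signal() // wake one idle worker, if any
	}

	// Enqueue the root before starting workers so they don't see pending == 0 and exit.
	push(root)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				mu.Lock()
				for len(queue) == 0 && pending > 0 {
					cond.Wait() // other tasks are still running and may push more
				}
				if pending == 0 {
					mu.Unlock()
					return // everything has finished
				}
				t := queue[0]
				queue = queue[1:]
				mu.Unlock()

				t(push) // may push children; never blocks on capacity

				mu.Lock()
				pending--
				executed++
				finished := pending == 0
				mu.Unlock()
				if finished {
					cond.Broadcast() // let idle workers observe completion and exit
				}
			}
		}()
	}

	wg.Wait()
	return executed
}

func main() {
	fmt.Println(runUnbounded(1, tree(0, 3, 1)))  // prints 4
	fmt.Println(runUnbounded(4, tree(0, 10, 2))) // prints 111
}
```

The same single-worker tree that stalls a 2-item bounded channel completes here, because a worker that schedules children always returns to draining the queue.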

To Reproduce

Steps to reproduce the behavior:

  1. Create a new Porter bundle: porter create
  2. Add an images section containing postgres to porter.yaml:
images:
  postgres:
    repository: postgres
    tag: 17.2 # or digest: sha256:888402a8cd6075c5dc83a31f58287f13306c318eaad016661ed12e076f3e6341
  3. Build the bundle: porter build
  4. Start a local registry: docker run -d -p 5000:5000 --name registry registry:2
  5. Push the bundle: cnab-to-oci push .cnab/bundle.json --target localhost:5000/porter-hello:v0.1.0 --auto-update-bundle
  6. The push hangs at "Starting to copy image postgres@sha256:888402..."

Expected Behavior

The bundle should be pushed successfully.

Command and Output

$ cnab-to-oci push .cnab/bundle.json --target localhost:5000/porter-hello:v0.1.0 --auto-update-bundle
Starting to copy image localhost:5000/porter-hello:porter-37da5464f8517662657529ad34851db9...
The push refers to repository [localhost:5000/porter-hello]
7a3c7ac90876: Pushed
d90fe0dc0219: Pushed
8133469efab5: Pushed
4f4fb700ef54: Pushed
01e382c7c42f: Pushed
5bfc8f96be69: Pushed
be67e3c6dd67: Pushed
33fddb31ec5d: Pushed
16043bbe354c: Pushed
8e511068f5cd: Pushed
da00d385239d: Pushed
v0.1.0: digest: sha256:dd46f7f59c0354962d9949631aa4a13257f18eb731f396420107c41d184405a4 size: 856
Completed image localhost:5000/porter-hello:porter-37da5464f8517662657529ad34851db9 copy
Starting to copy image postgres@sha256:888402a8cd6075c5dc83a31f58287f13306c318eaad016661ed12e076f3e6341...

Version

Current master branch (e51b130)
