Remote workers don't receive submissions #1455
Comments
This is strange. We haven't changed anything that will break something. Can you please confirm the following:
|
hi,
Thanks for the quick response.
1. Yes, it works on the default queue, but fails due to the 20-minute time limit.
2. The broker URL worked until 24 May 2024, and I never changed it. The workers (I have several) are communicating with each other. It had been working perfectly for more than a month.
3. They are CPU clusters
4. Yes, I have tried to resubmit and rerun submissions
|
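One way to sanity-check point 2 (the broker URL) from the worker machine is to confirm that the broker host and port are reachable at all. The following is a minimal sketch, not part of Codabench: the function names `parse_broker_url` and `broker_reachable` and the example URL are hypothetical, and it assumes the broker URL follows the usual `scheme://user:password@host:port/vhost` layout. It only tests TCP reachability, so a positive result does not prove that the queue is actually delivering submissions.

```python
import socket
from urllib.parse import urlparse


def parse_broker_url(broker_url):
    """Split a broker URL into (host, port, vhost)."""
    parts = urlparse(broker_url)
    host = parts.hostname
    port = parts.port or 5672  # 5672 is the standard AMQP port
    vhost = parts.path.lstrip("/")
    return host, port, vhost


def broker_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the broker URL of your own queue.
    url = "pyamqp://user:password@broker.example.invalid:5672/my-vhost"
    host, port, vhost = parse_broker_url(url)
    print(f"host={host} port={port} vhost={vhost}")
    print("reachable:", broker_reachable(host, port))
```

If the port is reachable but the worker still receives nothing, the cause is more likely on the queue or platform side than in the worker's network setup.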
Where are your workers hosted? Are you using Google Cloud or another service? |
I am using Amazon Web Services (AWS) to run my remote workers on t3.xlarge instances (https://aws.amazon.com/ec2/instance-types/) |
Can you do the following to see if this helps:
|
If this does not work, create a compute worker on Google Cloud and see if that works. |
I have tried to follow your steps; unfortunately, it didn't change anything. I assume Codabench is run on Google Cloud. What are you running? |
Have you tried a Google Cloud worker? |
So far I've just been working with AWS and their cloud workers |
Please try Google Cloud; that should work. I am not sure what the problem with AWS is. You could also contact AWS support; they may be able to help. |
Just to check if there is anything unusual. These are the output logs when starting my remote worker:
|
Also, have you made a guide on how to set it up using Google Cloud Compute? |
The logs look good. For Google Cloud the guidelines are the same. Go to the Google Cloud console, select VM instances, and click Create Instance (select the storage, memory, and location of the VM, and allow HTTP/HTTPS traffic). You can access the VM through a shell provided in the console. The rest of the setup is the same. |
I've set up a remote worker using Google Cloud; unfortunately, this hasn't changed anything. |
Please repeat the Google Cloud setup with a new queue. |
With a new queue I get the following:
Do you have any suggestions? I'm not sure whether the problem is on the Codabench side or the Google Cloud side. |
I'm not sure what is happening there. Can you please list the steps you are following to set up a worker and link it to your queue? @Didayolo, do you have any idea? By the way, I was recently using Google Cloud workers for a competition and did not face any issue like this. |
I follow the guide you provided step by step, on both AWS and Google Cloud.
Everything had been working perfectly until Friday. |
You mean that you were able to process submissions on your workers in the past, and it stopped working? |
Yes, that's what has happened at our end. Taking over from @johanneskruse.
Adding a few details: I've set up a new, smaller version of the competition. Submissions for this dummy competition run on the default queue without any issues, but when I attach a new queue served by a GCP worker, the submissions are stuck on "Submitting", the worker gets no traffic, and its logs remain unchanged. The setup of the GCP worker is the same as the one @ihsaan-ullah described above.
We are generally following the steps given here for CPU workers - https://github.com/codalab/codabench/wiki/Compute-Worker-Management---Setup Thanks |
I'm running into this issue too, which can be reproduced by creating a competition using the example in https://github.com/codalab/competition-examples/tree/master/codabench/wheat_seeds.
This creates a benchmark, with the private queue configured as indicated by the GUI. Following the steps for running a compute worker (https://github.com/codalab/codabench/wiki/Compute-Worker-Management---Setup), create a
And spin a worker locally:
This container runs with the following logs:
Then, submitting the default sample It hangs from here. |
After reading #1457 and running the steps in my previous post without setting up a custom queue, I can confirm that even the default queue does not process jobs anymore. Are all competitions currently halted? |
Indeed, we have had some issues with the default queue. Right now it is working again, but we are actively investigating the problem to avoid it happening again. You can follow this second problem here: #1446 |
On my side I am not able to reproduce the problem. I tried several custom queues and several workers, and they are receiving and computing the submissions without problem. Can you retry? Maybe it was linked to other problems of the platform (queue congestion, ...) |
My workers are now receiving submissions again! I haven't changed or done anything on my end. Thank you for the swift action; I hope it stays stable for now. I will follow #1446 closely. |
Everything is probably linked. I'm closing this issue and keeping the other one open. |
Hi @Didayolo - it is happening again. None of my workers are receiving any submissions; they are all in limbo. No error logs. Is there an explanation? |
Indeed, I can see that the submissions are stuck in "Submitted" on your queue: https://www.codabench.org/server_status. I don't know the reason. I'll investigate this. |
Hi @Didayolo, thank you for getting back! Is there anything we can do in the meantime, or any way we can help debug this issue? |
I have opened a new issue, #1473, with more error logs. |
I'm now able to access the |
I tried to remove |
Dear Codabench team,
My remote workers have stopped receiving any traffic. Is there an explanation, such as a recent update? From one day to the next, submissions stopped being processed. Everything was fine on May 23, 2024, but the workers stopped receiving submissions on May 24, 2024.
I have multiple remote workers, and I can see them connect and disconnect when I turn them on and off:
When using the default CPU queue, I can run submissions; however, due to the 20-minute limit, I have to use my own remote workers.
Link to Ekstra Bladet News Recommendation Competition
Best,
Johannes