I encountered an issue while attempting to segment multiple images in a loop. The error I am receiving is:

"Backend worker did not respond in given time"

org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_MODEL_LOADED

It appears that GPU memory is not being released properly, and the available memory diminishes with each image processed.

Interestingly, the code was functioning correctly before the latest commits made on December 12, 2024, and December 15, 2024. These recent changes might have introduced the problem.
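For reference, the per-image growth can be confirmed by logging allocated GPU memory after each image. A minimal sketch (the `predict` call is just a placeholder for the actual segmentation handler):

```python
import torch

def check_memory_growth(images, predict):
    # `predict` is a stand-in for the real segmentation call in the handler.
    for i, image in enumerate(images):
        _ = predict(image)
        allocated_mib = torch.cuda.memory_allocated() / 1024**2  # bytes held by live tensors
        reserved_mib = torch.cuda.memory_reserved() / 1024**2    # bytes held by the caching allocator
        # If "allocated" keeps climbing from image to image, references to old
        # outputs (or intermediate activations) are still alive somewhere.
        print(f"image {i}: allocated={allocated_mib:.1f} MiB, reserved={reserved_mib:.1f} MiB")
```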
Relevant CloudWatch logs (stream AllTraffic/i-098f59f854f0cab5b, duplicate entries removed):

2025-01-15T18:16:06,223 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
2025-01-15T18:16:06,223 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
    at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:247) [model-server.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:840) [?:?]
I have already tried the following and am still getting the error after processing a few large images (see the sketch after this list):

- Explicitly clear GPU memory: call torch.cuda.empty_cache() after processing each image.
- Delete variables: del any variables holding large tensors once they are no longer needed.
- Use torch.no_grad(): wrap the inference code in torch.no_grad() so PyTorch does not keep intermediate values for backpropagation, which saves memory.
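Roughly, this is how my loop applies those three steps (a simplified sketch; `model`, the image preprocessing, and the single-tensor output are stand-ins for the actual handler code):

```python
import torch

def segment_images(model, images, device="cuda"):
    model.eval()
    results = []
    for image in images:
        # 1) torch.no_grad(): don't keep intermediate activations for backprop.
        with torch.no_grad():
            output = model(image.to(device))
        # Keep only a CPU copy of the result so the GPU-side tensor can be freed.
        results.append(output.detach().cpu())
        # 2) del: drop the remaining GPU reference explicitly.
        del output
        # 3) empty_cache(): release cached blocks back to the driver. This only
        #    frees memory that has no live references left.
        torch.cuda.empty_cache()
    return results
```

Even with this in place, allocated memory keeps climbing, and the worker eventually stops responding on large images.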