-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CTRL+C in new asyncio daemon while running a process does not kill it #4649
Comments
I am not sure I understand it correctly, it confused me a lot. I try to explain what I understand. @muhrin could you have a look and give some comments? It is not a real 'hang up'. In fact, in this case, the program is waiting for the end of the current state(
BTW I think the code touched in this issue also relevant to #4298, could be taken care of at the same time. |
Hi @unkcpz I totally see your point. But from an user experience point of view, I think what I really want is that also the time.sleep(60) is killed, quickly (and that the state of the calcfunction node is correctly updated: goes in kill state, is sealed etc.) I don't know why the behaviour was different before (pinging also @sphuber) but I find it more desirable (if I press CTRL+C, I want to stop as soon as possible, say within 3-5 seconds, but also setting up correctly the state of the calcjob, something that does not happen with |
I'm sorry, I'm not quite sure how all this works. I know Sebastiaan did some signal handler magic to try and clean things up a little more cleanly but I don't remember many of the details. |
Just want to link this to some potentially related issues and comments: #4603 (comment) and #3776 (comment) In particular, this certainly looks to relate to the fact that, even after a calcjob (run not submitted) has finished in an IPython shell (i.e. the cell finishes execution), if you In [1]: from aiida_sleep.cli import run_calc
In [2]: run_calc(payload=int(1e6), output=int(1e6))
In [5]: import time; time.sleep(60)
^C01/26/2021 03:42:51 PM <243> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 5 I don't think the calcjobs are ever actually finalising properly |
Also, I don't know if this is an issue with my calcjob, I start getting a bunch of error messages when I try to directly In [6]: run_calc(time=60)
^C01/26/2021 04:54:35 PM <426> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 67
01/26/2021 04:54:38 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:54:38 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 1 of do_upload excepted, retrying after 20 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:54:38 PM <426> aiida.scheduler.slurm: [ERROR] Error in _parse_kill_output: retval=1; stdout=; stderr=scancel: error: Invalid job id None
01/26/2021 04:54:38 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
01/26/2021 04:54:38 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 1 of do_kill excepted, retrying after 20 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
01/26/2021 04:54:58 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:54:58 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 2 of do_upload excepted, retrying after 40 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:55:04 PM <426> aiida.scheduler.slurm: [ERROR] Error in _parse_kill_output: retval=1; stdout=; stderr=scancel: error: Invalid job id None
01/26/2021 04:55:04 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
01/26/2021 04:55:04 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 2 of do_kill excepted, retrying after 40 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
^C01/26/2021 04:55:06 PM <426> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 67
^C01/26/2021 04:55:07 PM <426> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 67
01/26/2021 04:55:38 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:55:38 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 3 of do_upload excepted, retrying after 80 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:55:49 PM <426> aiida.scheduler.slurm: [ERROR] Error in _parse_kill_output: retval=1; stdout=; stderr=scancel: error: Invalid job id None
01/26/2021 04:55:49 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
01/26/2021 04:55:49 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 3 of do_kill excepted, retrying after 80 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
01/26/2021 04:56:58 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:56:58 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 4 of do_upload excepted, retrying after 160 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 376, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
01/26/2021 04:57:14 PM <426> aiida.scheduler.slurm: [ERROR] Error in _parse_kill_output: retval=1; stdout=; stderr=scancel: error: Invalid job id None
01/26/2021 04:57:14 PM <426> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found
01/26/2021 04:57:14 PM <426> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [ERROR] iteration 4 of do_kill excepted, retrying after 160 seconds
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/utils.py", line 188, in exponential_backoff_retry
result = await coro()
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 317, in do_kill
return execmanager.kill_calculation(node, transport)
File "/root/aiida-core/aiida/engine/daemon/execmanager.py", line 426, in kill_calculation
running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
File "/root/aiida-core/aiida/schedulers/scheduler.py", line 341, in get_jobs
retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
File "/root/aiida-core/aiida/schedulers/plugins/slurm.py", line 235, in _get_joblist_command
command.append(f"--jobs={','.join(joblist)}")
TypeError: sequence item 0: expected str instance, NoneType found This happens if I do In [3]: run_calc(time=60)
^C01/26/2021 05:01:51 PM <477> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 70
01/26/2021 05:01:51 PM <477> aiida.engine.processes.calcjobs.tasks: [WARNING] CalcJob<70> killed, it was in the CalcJobState.SUBMITTING state
01/26/2021 05:01:51 PM <477> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 105, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 137, in do_submit
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 384, in execute
await self._launch_task(task_submit_job, node, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 434, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Process was killed because the runner received an interrupt
---------------------------------------------------------------------------
KilledError Traceback (most recent call last)
<ipython-input-3-f7a7fd3871de> in <module>
----> 1 run_calc(time=60)
/opt/venv/lib/python3.8/site-packages/aiida_sleep/cli.py in run_calc(code, time, payload, output, fail, submit)
61 node = submit_func(builder)
62 else:
---> 63 node = run_get_node(builder).node
64
65 return node
~/aiida-core/aiida/engine/launch.py in run_get_node(process, *args, **inputs)
56 runner = manager.get_manager().get_runner()
57
---> 58 return runner.run_get_node(process, *args, **inputs)
59
60
~/aiida-core/aiida/engine/runners.py in run_get_node(self, process, *args, **inputs)
266 :return: tuple of the outputs of the process and the calculation node
267 """
--> 268 result, node = self._run(process, *args, **inputs)
269 return ResultAndNode(result, node)
270
~/aiida-core/aiida/engine/runners.py in _run(self, process, *args, **inputs)
242 signal.signal(signal.SIGTERM, kill_process)
243
--> 244 process_inited.execute()
245 return process_inited.outputs, process_inited.node
246
/opt/venv/lib/python3.8/site-packages/plumpy/processes.py in func_wrapper(self, *args, **kwargs)
77 if self._closed:
78 raise exceptions.ClosedError('Process is closed')
---> 79 return func(self, *args, **kwargs)
80
81 return func_wrapper
/opt/venv/lib/python3.8/site-packages/plumpy/processes.py in execute(self)
1148 self.loop.run_until_complete(self.step_until_terminated())
1149
-> 1150 return self.future().result()
1151
1152 @ensure_not_closed
KilledError: Process was killed because the runner received an interrupt |
To note, now that #4682 has been merged, the Firtly, it might be good to add to: aiida-core/aiida/engine/runners.py Lines 236 to 238 in 97cecd2
something like: def kill_process(_num, _frame):
"""Send the kill signal to the process in the current scope."""
if process_inited._killing:
LOGGER.critical('process %s is already being killed', process_inited.pid)
else:
LOGGER.critical('runner received interrupt, killing process %s', process_inited.pid) so it does not seem like the interrupt did not work. In terms of the time it takes to kill, it only seems to take a few seconds for me, e.g. if I run:
which is a calculation that calls sleep for 300 seconds, then CTRL-C after about 10 seconds, it only takes a few seconds to kill. @giovannipizzi, if you only do |
One thing that does seem to be an additional issue, is that the exponentional backoff does not appear to respect the
|
I think the latter may simply be a problem only with the |
sounds like a plan 👍 |
The last additional issue (which I alluded to in #4649 (comment)), is that it seems you can kill the process before a Job ID is assigned, and so here: aiida-core/aiida/engine/daemon/execmanager.py Line 414 in 97cecd2
it returns ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/aiida-core/aiida/engine/processes/calcjobs/tasks.py in task_kill_job(node, transport_queue, cancellable)
320 logger.info(f'scheduled request to kill CalcJob<{node.pk}>')
--> 321 result = await exponential_backoff_retry(do_kill, initial_interval, max_attempts, logger=node.logger)
322 except plumpy.process_states.Interruption:
~/aiida-core/aiida/engine/utils.py in exponential_backoff_retry(fct, initial_interval, max_attempts, logger, ignore_exceptions)
187 try:
--> 188 result = await coro()
189 break # Finished successfully
~/aiida-core/aiida/engine/processes/calcjobs/tasks.py in do_kill()
316 transport = await cancellable.with_interrupt(request)
--> 317 return execmanager.kill_calculation(node, transport)
318
~/aiida-core/aiida/engine/daemon/execmanager.py in kill_calculation(calculation, transport)
425 # Failed to kill because the job might have already been completed
--> 426 running_jobs = scheduler.get_jobs(jobs=[job_id], as_dict=True)
427 job = running_jobs.get(job_id, None)
~/aiida-core/aiida/schedulers/scheduler.py in get_jobs(self, jobs, user, as_dict)
340 with self.transport:
--> 341 retval, stdout, stderr = self.transport.exec_command_wait(self._get_joblist_command(jobs=jobs, user=user))
342
~/aiida-core/aiida/schedulers/plugins/slurm.py in _get_joblist_command(self, jobs, user)
234
--> 235 command.append(f"--jobs={','.join(joblist)}")
236
TypeError: sequence item 0: expected str instance, NoneType found |
hi @chrisjsewell I think what you did in #4682 is similar to 3771945 for aiida-core/aiida/engine/processes/functions.py Lines 148 to 149 in 97cecd2
However, since ProcessFunction already reset the signal handler to the original one, ctrl+C not stop the running ProcessFunction and give control back to user in IPython.
As I mentioned #4649 (comment)
Calling ctrl+C in old tornado aiida_core give the control back immediately while asyncio version not. Moreover, what I understand is in asyncio the recommend way of signal handling is by adding signal handler to the loop. The catch in the runner case is the process actually doing synchronous task (like the Or am I missing something? |
I'm not quite sure what you are asking @unkcpz 😬 The key point for me is that the signal handlers are generally working as expected (you press (I also just want to link a comment by @sphuber here, with some additional discussion: #4603 (comment)) |
You mean the code version of the original issue at the top, or in the most recent develop after the fixes? Let me know if you need me to do more testing (and on which version) of this info is enough. |
@chrisjsewell Sorry for not clearly express my question with necessary context. My question comes with two aspects:
|
Ok so firstly:
I not sure if this is necessary yet, one issue with it appears that you cannot retrieve the current signal handler and so reinstate them at the end of the function, as we do now (see https://docs.python.org/3/library/asyncio-eventloop.html#unix-signals) Secondly, to do this right it is actually a bit more complicated. $ aiida-sleep workchain -nc 2
<CTRL-C>
...
$ verdi process list -ap1
10616 26s ago SleepWorkChain ☠ Killed Process was killed because the runner received an interrupt
10617 25s ago SleepCalculation ⏵ Waiting Waiting for transport task: upload
10618 25s ago SleepCalculation ⏵ Waiting Waiting for transport task: upload Here is a proof of concept, whereby on the first def _run(self, process: TYPE_RUN_PROCESS, *args: Any, **inputs: Any) -> Tuple[Dict[str, Any], ProcessNode]:
"""
Run the process with the supplied inputs in this runner that will block until the process is completed.
The return value will be the results of the completed process
:param process: the process class or process function to run
:param inputs: the inputs to be passed to the process
:return: tuple of the outputs of the process and the calculation node
"""
assert not self._closed
if utils.is_process_function(process):
result, node = process.run_get_node(*args, **inputs) # type: ignore[union-attr]
return result, node
with utils.loop_scope(self.loop):
process_inited = self.instantiate_process(process, *args, **inputs)
run_future = self.loop.create_future()
def _callback(_future):
if not run_future.done():
run_future.set_result(True)
process_inited.future().add_done_callback(_callback)
class _sig_count:
value = 0
def kill_process(_num, _frame):
"""Send the kill signal to the process in the current scope."""
_sig_count.value += 1
if _sig_count.value > 1:
LOGGER.critical('runner received second interrupt, terminating')
run_future.set_exception(RuntimeError('kill process terminated'))
return
LOGGER.critical('runner received interrupt, killing process %s', process_inited.pid)
process_inited.future().remove_done_callback(_callback)
process_inited.kill(msg='Process was killed because the runner received an interrupt')
import gc
futures = [asyncio.wrap_future(o.future()) for o in gc.get_objects() if hasattr(o, '__class__') and isinstance(o, Process)]
futures = asyncio.gather(*futures, return_exceptions=True)
futures.add_done_callback(_callback)
original_handler_int = signal.getsignal(signal.SIGINT)
original_handler_term = signal.getsignal(signal.SIGTERM)
async def _run():
await run_future
try:
signal.signal(signal.SIGINT, kill_process)
signal.signal(signal.SIGTERM, kill_process)
process_inited.loop.create_task(process_inited.step_until_terminated())
self.loop.run_until_complete(_run())
finally:
signal.signal(signal.SIGINT, original_handler_int)
signal.signal(signal.SIGTERM, original_handler_term)
return process_inited.outputs, process_inited.node thoughts? |
note at present there is an issue that the exceptions are not caught and the processes try to go into an excepted state:
but it does correctly set all processes as killed: $ verdi process list -ap1
10622 10s ago SleepWorkChain ☠ Killed Process was killed because the runner received an interrupt
10623 10s ago SleepCalculation ☠ Killed Killed by parent<10622>
10624 10s ago SleepCalculation ☠ Killed Killed by parent<10622> |
ok if we catch the exceptions, this works "properly" for with utils.loop_scope(self.loop):
self.loop.set_exception_handler(lambda loop, context: None) it also works for In [7]: run_workchain(2, time=10)
^C02/23/2021 03:28:37 PM <1502764> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 10658
02/23/2021 03:28:37 PM <1502764> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 110, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 367, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 425, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Killed by parent<10658>
02/23/2021 03:28:37 PM <1502764> aiida.engine.transports: [ERROR] Exception whilst using transport:
Traceback (most recent call last):
File "/root/aiida-core/aiida/engine/transports.py", line 110, in request_transport
yield transport_request.future
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 78, in do_upload
transport = await cancellable.with_interrupt(request)
File "/root/aiida-core/aiida/engine/utils.py", line 95, in with_interrupt
result = await next(wait_iter)
File "/usr/lib/python3.8/asyncio/tasks.py", line 608, in _wait_for_one
return f.result() # May raise f.exception().
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 367, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 425, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
plumpy.process_states.KillInterruption: Killed by parent<10658>
^C02/23/2021 03:28:38 PM <1502764> aiida.engine.runners: [CRITICAL] runner received second interrupt, terminating
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-5d65dc24cb6b> in <module>
----> 1 run_workchain(2, time=10)
~/venv/lib/python3.8/site-packages/aiida_sleep/cli.py in run_workchain(number, code, time, payload, output_dict, output_array, fail, submit)
215 node = submit_func(builder)
216 else:
--> 217 node = run_get_node(builder).node
218
219 return node
~/aiida-core/aiida/engine/launch.py in run_get_node(process, *args, **inputs)
56 runner = manager.get_manager().get_runner()
57
---> 58 return runner.run_get_node(process, *args, **inputs)
59
60
~/aiida-core/aiida/engine/runners.py in run_get_node(self, process, *args, **inputs)
304 :return: tuple of the outputs of the process and the calculation node
305 """
--> 306 result, node = self._run(process, *args, **inputs)
307 return ResultAndNode(result, node)
308
~/aiida-core/aiida/engine/runners.py in _run(self, process, *args, **inputs)
276 signal.signal(signal.SIGTERM, kill_process)
277 process_inited.loop.create_task(process_inited.step_until_terminated())
--> 278 self.loop.run_until_complete(_run())
279 finally:
280 signal.signal(signal.SIGINT, original_handler_int)
~/venv/lib/python3.8/site-packages/nest_asyncio.py in run_until_complete(self, future)
96 raise RuntimeError(
97 'Event loop stopped before Future completed.')
---> 98 return f.result()
99 finally:
100 self._thread_id = old_thread_id
/usr/lib/python3.8/asyncio/futures.py in result(self)
176 self.__log_traceback = False
177 if self._exception is not None:
--> 178 raise self._exception
179 return self._result
180
/usr/lib/python3.8/asyncio/tasks.py in __step(***failed resolving arguments***)
280 result = coro.send(None)
281 else:
--> 282 result = coro.throw(exc)
283 except StopIteration as exc:
284 if self._must_cancel:
~/aiida-core/aiida/engine/runners.py in _run()
270
271 async def _run():
--> 272 await run_future
273
274 try:
/usr/lib/python3.8/asyncio/futures.py in __await__(self)
258 if not self.done():
259 self._asyncio_future_blocking = True
--> 260 yield self # This tells Task to wait for completion.
261 if not self.done():
262 raise RuntimeError("await wasn't used with future")
/usr/lib/python3.8/asyncio/tasks.py in __wakeup(self, future)
347 def __wakeup(self, future):
348 try:
--> 349 future.result()
350 except BaseException as exc:
351 # This may also be a cancellation.
/usr/lib/python3.8/asyncio/futures.py in result(self)
176 self.__log_traceback = False
177 if self._exception is not None:
--> 178 raise self._exception
179 return self._result
180
RuntimeError: kill process terminated
Unhandled exception in event loop:
File "/usr/lib/python3.8/asyncio/tasks.py", line 280, in __step
result = coro.send(None)
File "/root/venv/lib/python3.8/site-packages/plumpy/events.py", line 106, in run
self._process.callback_excepted(self._callback, exc_info[1], exc_info[2])
File "/root/venv/lib/python3.8/site-packages/plumpy/processes.py", line 513, in callback_excepted
self.fail(exception, trace)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 97, in transition
result = wrapped(self, *a, **kw)
File "/root/venv/lib/python3.8/site-packages/plumpy/processes.py", line 1080, in fail
self.transition_to(process_states.ProcessState.EXCEPTED, exception, trace_back)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 335, in transition_to
self.transition_failed(initial_state_label, label, *sys.exc_info()[1:])
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 351, in transition_failed
raise exception.with_traceback(trace)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 317, in transition_to
self._exit_current_state(new_state)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 376, in _exit_current_state
raise RuntimeError('Cannot transition from {} to {}'.format(self._state.LABEL, next_state.label))
Exception Cannot transition from ProcessState.KILLED to ProcessState.EXCEPTED
Press ENTER to continue...
Unhandled exception in event loop:
File "/usr/lib/python3.8/asyncio/tasks.py", line 280, in __step
result = coro.send(None)
File "/root/venv/lib/python3.8/site-packages/plumpy/events.py", line 106, in run
self._process.callback_excepted(self._callback, exc_info[1], exc_info[2])
File "/root/venv/lib/python3.8/site-packages/plumpy/processes.py", line 513, in callback_excepted
self.fail(exception, trace)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 97, in transition
result = wrapped(self, *a, **kw)
File "/root/venv/lib/python3.8/site-packages/plumpy/processes.py", line 1080, in fail
self.transition_to(process_states.ProcessState.EXCEPTED, exception, trace_back)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 335, in transition_to
self.transition_failed(initial_state_label, label, *sys.exc_info()[1:])
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 351, in transition_failed
raise exception.with_traceback(trace)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 317, in transition_to
self._exit_current_state(new_state)
File "/root/venv/lib/python3.8/site-packages/plumpy/base/state_machine.py", line 376, in _exit_current_state
raise RuntimeError('Cannot transition from {} to {}'.format(self._state.LABEL, next_state.label))
Exception Cannot transition from ProcessState.KILLED to ProcessState.EXCEPTED
Press ENTER to continue...
Unhandled exception in event loop:
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 367, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 425, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
Exception Killed by parent<10658>
Press ENTER to continue...
Unhandled exception in event loop:
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 367, in execute
skip_submit = await self._launch_task(task_upload_job, self.process, transport_queue)
File "/root/aiida-core/aiida/engine/processes/calcjobs/tasks.py", line 425, in _launch_task
result = await self._task
File "/usr/lib/python3.8/asyncio/tasks.py", line 349, in __wakeup
future.result()
Exception Killed by parent<10658>
Press ENTER to continue... |
actually the error above is the same as #3888 |
so I've opened a draft PR #4784 for a more complete way to deal with killing Processes +children, |
I tried this (using So for now I am going to move both to the v1.6.1 milestone, as not being a "critical issue". |
Tried in Quantum Mobile v20.11.2a, with Python version 3.7.9, and was not able to reproduce the hanging issue: $ aiida-common-workflows launch relax -S Al -- quantum_espresso
Running a QuantumEspressoRelaxWorkChain...
03/10/2021 06:42:52 AM <3226715> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [5132|PwRelaxWorkChain|setup]: No change in volume possible for the provided base input parameters. Meta convergence is turned off.
03/10/2021 06:42:53 AM <3226715> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [5132|PwRelaxWorkChain|run_relax]: launching PwBaseWorkChain<5135>
/home/max/codes/aiida-quantumespresso/aiida_quantumespresso/utils/pseudopotential.py:35: AiidaDeprecationWarning: this function is deprecated and will be removed in `v4.0.0`.
warnings.warn('this function is deprecated and will be removed in `v4.0.0`.', AiidaDeprecationWarning)
03/10/2021 06:42:53 AM <3226715> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [5135|PwBaseWorkChain|run_process]: launching PwCalculation<5140> iteration #1
^C03/10/2021 06:42:56 AM <3226715> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 5131
Traceback (most recent call last):
File "/home/max/.virtualenvs/aiida/bin/aiida-common-workflows", line 11, in <module>
load_entry_point('aiida-common-workflows', 'console_scripts', 'aiida-common-workflows')()
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/max/codes/aiida-common-workflows/aiida_common_workflows/cli/launch.py", line 120, in cmd_relax
utils.launch_process(builder, daemon)
File "/home/max/codes/aiida-common-workflows/aiida_common_workflows/cli/utils.py", line 59, in launch_process
_, node = launch.run_get_node(process, **inputs)
File "/home/max/codes/aiida-core/aiida/engine/launch.py", line 58, in run_get_node
return runner.run_get_node(process, *args, **inputs)
File "/home/max/codes/aiida-core/aiida/engine/runners.py", line 278, in run_get_node
result, node = self._run(process, *args, **inputs)
File "/home/max/codes/aiida-core/aiida/engine/runners.py", line 250, in _run
process_inited.execute()
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/plumpy/processes.py", line 81, in func_wrapper
return func(self, *args, **kwargs)
File "/home/max/.virtualenvs/aiida/lib/python3.7/site-packages/plumpy/processes.py", line 1166, in execute
return self.future().result()
plumpy.exceptions.KilledError: Process was killed because the runner received an interrupt |
Indeed: $ verdi process list -ap1
PK Created Process label Process State Process status
---- --------- ----------------------------- --------------- -----------------------------------------------------------
5131 58m ago QuantumEspressoRelaxWorkChain ☠ Killed Process was killed because the runner received an interrupt
5132 58m ago PwRelaxWorkChain ⏵ Waiting Waiting for child processes: 5135
5135 58m ago PwBaseWorkChain ⏵ Waiting Waiting for child processes: 5140
5136 58m ago create_kpoints_from_distance ⏹ Finished [0]
5140 58m ago PwCalculation ⏵ Waiting Monitoring scheduler: job state QUEUED
Total results: 5
Info: last time an entry changed state: 58m ago (at 06:42:56 on 2021-03-10) |
This is for instance an output I get, when running a CalcJob in ipython and then pressing CTRL+C multiple times:
To kill it I had to do a
kill -9
instead.In the previous implementation (v1.5.x and previous) this would kill the process and I could get back control e.g. if I was in ipython.
@sphuber @muhrin @unkcpz @chrisjsewell
The text was updated successfully, but these errors were encountered: