-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduler test failing #1888
Comments
Probably Python 3.10 related, like your other issue. We didn't test anything with 3.10. |
Just checked - same failure in environments with 3.8 and 3.9 |
Strange - it's passing on the server e.g. https://nixbld.m-labs.hk/build/120442/log |
Could this just be a timing/event ordering issue? I haven't had a look at this particular test, but some of them do rely on implicitly defining order via |
Certainly seems to be, but I don't know how that order could be defined more rigidly. The "issue" seems to be that experiment 2 is taking longer to prepare on my machine than the test server, so experiment 0 is reaching the run stage before 2 is done preparing; but the prepare stage is empty so I don't see how it can be made faster.
Agreed. I repro'd this on a second W10 machine with python=3.9, package list generated was identical except for |
I haven't looked into what this particular test would require, but for testing def _make_status_events(self, rid, previous_notify=None):
status_events = {
key: asyncio.Event()
for key in ("prepare_done", "running", "paused", "run_done",
"deleting", "idle")
}
def notify(mod):
if (mod["path"] == [rid] and mod["action"] == "setitem"
and mod["key"] == "status"):
event = status_events.get(mod["value"], None)
if event:
event.set()
if previous_notify:
previous_notify(mod)
return status_events, notify
def _make_scheduler_for_idle_test(self):
class Flags:
def __init__(self):
self.be_idle = False
self.take_long_in_idle = False
self.termination_ok = False
flags = Flags()
def get_dataset(name):
return getattr(flags, name)
def check_termination(mod):
self.assertEqual(
mod, {
"action": "setitem",
"key": "termination_ok",
"value": (False, True),
"path": []
})
flags.termination_ok = True
handlers = {
"get_dataset": get_dataset,
"update_dataset": check_termination
}
scheduler = Scheduler(_RIDCounter(0), handlers, None, None)
status_events, notify = self._make_status_events(0)
scheduler.notifier.publish = notify
scheduler.start()
expid_idle = _get_expid("IdleExperiment")
# Suppress the SystemExit backtrace when worker process is killed.
expid_idle["log_level"] = logging.CRITICAL
scheduler.submit("main", expid_idle, 0, None, False)
return scheduler, status_events, flags
def test_close_with_idle_running(self):
"""Check scheduler exits with experiment still idle"""
loop = self.loop
scheduler, status_events, flags = self._make_scheduler_for_idle_test()
loop.run_until_complete(status_events["running"].wait())
flags.be_idle = True
loop.run_until_complete(status_events["idle"].wait())
loop.run_until_complete(scheduler.stop())
def test_close_with_prepared_bg(self):
"""Check scheduler exits when there is still a prepare_done experiment"""
loop = self.loop
scheduler, status_events, _ = self._make_scheduler_for_idle_test()
loop.run_until_complete(status_events["running"].wait())
# Submit lower-priority experiment that is still waiting to be run when
# the scheduler is terminated.
expid = _get_expid("EmptyExperiment")
scheduler.submit("main", expid, -1, None, False)
loop.run_until_complete(scheduler.stop())
def test_resume_idle(self):
"""Check scheduler resumes previously idle experiments"""
loop = self.loop
scheduler, status_events, flags = self._make_scheduler_for_idle_test()
loop.run_until_complete(status_events["running"].wait())
# Make sure we can un-idle by returning False from the idle callback.
flags.be_idle = True
loop.run_until_complete(status_events["idle"].wait())
status_events["running"].clear()
flags.be_idle = False
loop.run_until_complete(status_events["running"].wait())
# Make sure we can un-idle by requesting termination.
flags.be_idle = True
loop.run_until_complete(status_events["idle"].wait())
self.assertFalse(flags.termination_ok)
scheduler.request_termination(0)
loop.run_until_complete(status_events["deleting"].wait())
self.assertTrue(flags.termination_ok)
loop.run_until_complete(scheduler.stop())
def test_idle_bg(self):
"""Check scheduler runs lower-priority experiments while idle"""
loop = self.loop
scheduler, idle_status_events, flags = self._make_scheduler_for_idle_test()
bg_status_events, notify = self._make_status_events(
1, scheduler.notifier.publish)
scheduler.notifier.publish = notify
loop.run_until_complete(idle_status_events["running"].wait())
# Submit experiment with lower priority.
expid = _get_expid("BackgroundExperiment")
scheduler.submit("main", expid, -1, None, False)
check_pause_1 = lambda: loop.run_until_complete(scheduler.check_pause(1))
loop.run_until_complete(bg_status_events["prepare_done"].wait())
flags.be_idle = True
loop.run_until_complete(bg_status_events["running"].wait())
self.assertFalse(check_pause_1())
idle_status_events["running"].clear()
flags.be_idle = False
loop.run_until_complete(idle_status_events["running"].wait())
bg_status_events["running"].clear()
flags.be_idle = True
self.assertFalse(check_pause_1())
loop.run_until_complete(bg_status_events["running"].wait())
idle_status_events["running"].clear()
flags.be_idle = False
self.assertTrue(check_pause_1())
loop.run_until_complete(idle_status_events["running"].wait())
scheduler.request_termination(0)
scheduler.request_termination(1)
loop.run_until_complete(bg_status_events["deleting"].wait())
loop.run_until_complete(scheduler.stop())
def test_idle_timeout(self):
"""Check that blocking is_idle_callback times out"""
loop = self.loop
scheduler, status_events, flags = self._make_scheduler_for_idle_test()
# Submit empty experiment with lower priority.
empty_status_events, notify = self._make_status_events(
1, scheduler.notifier.publish)
scheduler.notifier.publish = notify
expid = _get_expid("EmptyExperiment")
scheduler.submit("main", expid, -1, None, False)
# Make the is_idle_callback block; silencing the expected error
# message in the log output.
logging.disable(logging.ERROR)
flags.be_idle = True
flags.take_long_in_idle = True
loop.run_until_complete(status_events["deleting"].wait())
logging.disable(logging.NOTSET)
# Make sure EmptyExperiment completes normally now.
loop.run_until_complete(empty_status_events["run_done"].wait())
loop.run_until_complete(scheduler.stop()) |
Oh, I see. In this case it is probably simpler to have the test check relevant properties of the schedule instead of comparing it exactly to an expected schedule. Synchronizing the experiments to enforce a fully deterministic schedule would need complicated IPC. |
Which ones? Only moninj tests do that (to ensure a scan and event report by the firmware) IIRC. |
Oh, just checked the Git history, and apparently I did write the offending test myself two years ago – how embarassing (in fairness, I was probably just copying The above is an example on using
It might be that the 0.2 second intervals in which pause/check_pause are called by the test experiments don't actually matter, only some scheduler internals – it's been a while. |
Those are just to prevent 100% CPU utilization while it waits for |
Agree on checking properties; given the current overlap between this test and both |
I had this test fail, when my laptop cpu was forced to lowest frequency and other processes were running. With proper frequency restored this test passed. |
Bug Report
One-Line Summary
python -m unittest -v artiq.test.test_scheduler.SchedulerCase.test_pending_priority
fails.Issue Details
Disabling the assertion and printing out the order in which events come in shows that they are only in a slightly different order; I'm not sure if behaviour has changed recently for this to be expected. The order I see is:
These two items are reversed compared to the output expected by the test case:
Steps to Reproduce
python -m unittest -v artiq.test.test_scheduler.SchedulerCase.test_pending_priority
Expected Behavior
Test passes.
Actual (undesired) Behavior
Test fails.
Your System (omit irrelevant parts)
artiq_client --version
):ARTIQ v7.8044.c440f9f.beta
artiq_coremgmt -D .... log
) (not relevant)conda list
(please submit as a file attachment, as this tends to be long)conda-list.txt
The text was updated successfully, but these errors were encountered: