Qsync times out a lot when tests are running sequentially #6965

Gerold103 · 2022-03-28T22:30:13Z

When I run python test-run.py qsync -j -1, I get qsync timeout errors. When I run just python test-run.py qsync, everything passes. That is quite strange and deserves investigation. The failing tests are:

replication/gh-5163-qsync-restart-crash

Test failed! Result content mismatch:
--- replication/gh-5163-qsync-restart-crash.result	Fri Feb 11 00:24:42 2022
+++ var/rejects/replication/gh-5163-qsync-restart-crash.reject	Tue Mar 29 00:24:49 2022
@@ -22,13 +22,13 @@
 
 box.space.sync:replace{1}
  | ---
- | - [1]
+ | - error: Quorum collection for a synchronous transaction is timed out
  | ...
 test_run:cmd('restart server default')
  | 
 box.space.sync:select{}
  | ---
- | - - [1]
+ | - []
  | ...
 box.space.sync:drop()

replication/gh-5288-qsync-recovery

Test failed! Result content mismatch:
--- replication/gh-5288-qsync-recovery.result	Fri Feb 11 00:24:42 2022
+++ var/rejects/replication/gh-5288-qsync-recovery.reject	Tue Mar 29 00:25:01 2022
@@ -17,7 +17,7 @@
  | ...
 s:insert{1}
  | ---
- | - [1]
+ | - error: Quorum collection for a synchronous transaction is timed out
  | ...
 box.snapshot()

replication/gh-5298-qsync-recovery-snap

Test failed! Result content mismatch:
--- replication/gh-5298-qsync-recovery-snap.result	Fri Feb 11 00:24:42 2022
+++ var/rejects/replication/gh-5298-qsync-recovery-snap.reject	Tue Mar 29 00:25:17 2022
@@ -22,6 +22,7 @@
  | ...
 for i = 1, 10 do box.space.sync:replace{i} end
  | ---
+ | - error: Quorum collection for a synchronous transaction is timed out
  | ...
 
 -- Local rows could affect this by increasing the signature.
@@ -46,7 +47,7 @@
 -- Could hang if the limbo would incorrectly handle the snapshot end.
 box.space.sync:replace{11}
  | ---
- | - [11]
+ | - error: Quorum collection for a synchronous transaction is timed out
  | ...
 
 old_synchro_quorum = box.cfg.replication_synchro_quorum
@@ -79,7 +80,6 @@
  | ...
 box.space.sync:get({11})
  | ---
- | - [11]
  | ...
 box.space.sync:get({12})

replication/gh-5446-qsync-eval-quorum (almost no output)

Test failed! Result content mismatch:
--- replication/gh-5446-qsync-eval-quorum.result	Fri Feb 11 00:24:42 2022
+++ var/rejects/replication/gh-5446-qsync-eval-quorum.reject	Tue Mar 29 00:27:19 2022
@@ -94,258 +94,3 @@
 
 -- Only one master node -> 1/2 + 1 = 1
 s:insert{1} -- should pass
- | ---
- | - [1]
- | ...

and nothing below this line: the rest of the expected output is missing, so the diff is all red.

Then I gave up, because the next tests hang.

Gerold103 · 2022-03-29T23:25:39Z

I think I tracked down the problem: the _cluster space is not cleared between tests. Here is a minimal reproducer: python test-run.py -j -1 gh-5140 gh-5163 --conf memtx. The test gh-5163 runs second and sees 2 rows in _cluster, which affects the automatic synchro quorum calculation. It should see only one row, because this test doesn't start new instances.
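To make the arithmetic concrete, here is a minimal Python sketch of why a stale _cluster row turns into a timeout. It assumes the automatic quorum follows the N / 2 + 1 rule quoted in the gh-5446 test comment above, with N being the number of rows in _cluster. The function names are made up for illustration and are not part of Tarantool or test-run.

```python
def auto_synchro_quorum(registered: int) -> int:
    """Quorum under the assumed N / 2 + 1 rule (integer division)."""
    return registered // 2 + 1


def can_collect_quorum(registered: int, alive: int) -> bool:
    """A synchronous transaction needs at least `quorum` live instances to ack it."""
    return alive >= auto_synchro_quorum(registered)


if __name__ == "__main__":
    # Only the default server is running in both cases (alive = 1).
    for registered in (1, 2):
        quorum = auto_synchro_quorum(registered)
        ok = can_collect_quorum(registered, alive=1)
        print(f"rows in _cluster={registered} quorum={quorum} commit possible={ok}")
```

With one row in _cluster the single running instance satisfies the quorum by itself. With a leftover second row the quorum becomes 2, no second instance exists to acknowledge the transaction, and the write can only end with the quorum collection timeout.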

Apparently, it is a leftover from gh-5140, after which test-run for some reason couldn't clear _cluster. I see some relevant code in tarantool/test-run/lib/tarantool-python/test/suites/lib/tarantool_python_ci.lua, but I don't know if it is called.

The solution is either to get the auto-clear fixed, or to find all the tests that have the problem and stop using the automatic synchro quorum in them by setting it explicitly to 1, 2, etc.

Totktonada · 2022-04-08T15:47:23Z

test-run restarts the default server before running each test. All non-default servers should be stopped at the end of the test (even if the test itself doesn't do that). Data files (xlog, snap, vylog) should be cleaned up between test runs.

In other words, if one test really can leave something in a space and another test sees it, it is a bug in test-run.
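The data-file part of that contract can be illustrated with a small Python sketch. This is not test-run's actual code: the function name and the idea of passing a single working directory are assumptions made for illustration. It only shows what "cleaned up between test runs" would mean for the three file types named above.

```python
from pathlib import Path

# File types named in the comment above: xlog, snap, vylog.
DATA_FILE_SUFFIXES = {".xlog", ".snap", ".vylog"}


def clean_data_files(workdir):
    """Delete leftover data files under workdir and return their names."""
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(Path(workdir).rglob("*")):
        if path.is_file() and path.suffix in DATA_FILE_SUFFIXES:
            path.unlink()
            removed.append(path.name)
    return removed
```

If a step like this, together with the restart of the default server, is skipped between two tests, the second test recovers from the first test's snapshot and xlogs, which is consistent with the extra _cluster row reported above.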

Gerold103 added the bug, replication, and qsync labels Mar 28, 2022

DifferentialOrange mentioned this issue Mar 31, 2022

Introduce GitHub CI tarantool/tarantool-python#213

Merged

kyukhin added the teamS label Apr 8, 2022

TarantoolBot removed the teamS label Jun 7, 2023

sergepetrenko assigned Astronomax Sep 6, 2024

Astronomax added a commit to Astronomax/test-run that referenced this issue Oct 25, 2024

Add default server restart before each test in consistent mode

d481316

Needed for tarantool/tarantool#6965 Closes tarantool/tarantool#6965

Astronomax linked a pull request Oct 25, 2024 that will close this issue

Add default server restart before each test in consistent mode tarantool/test-run#449

Open
