More tests #176

jmid · 2021-09-11T18:14:51Z

This PR adds tuple and bind tests (positive, negative, and statistics).
I also added a manual shrinker and test for the IntTree test.
I grouped test names per sub-module, as the long list at the end was getting hard to maintain.

Edit: I've now also

added shrink-logging tests to compare the total number of shrinking attempts.
split the tests into a separate "test library" and runner, so that we have one runner for the expect tests in the CI, while other runners can reuse (some of) the same tests, e.g., for local shrinker benchmarking.

Here's the diff -y of the new test outputs. Note how QCheck2's int shrinking strategy generally spends
fewer successful shrinking steps (again), as has been discussed in, e.g., PR #153 and #173:

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have different components failed (0 shrink steps):		Test pairs have different components failed (0 shrink steps):

(4, 4)									(4, 4)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have same components failed (125 shrink steps):	     |	Test pairs have same components failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have a zero component failed (124 shrink steps):	     |	Test pairs have a zero component failed (122 shrink steps):

(-1, 1)								     |	(1, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are (0,0) failed (125 shrink steps):			     |	Test pairs are (0,0) failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are ordered failed (125 shrink steps):		     |	Test pairs are ordered failed (2 shrink steps):

(0, -1)									(0, -1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are ordered reversely failed (125 shrink steps):	     |	Test pairs are ordered reversely failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs sum to less than 128 failed (121 shrink steps):	     |	Test pairs sum to less than 128 failed (59 shrink steps):

(0, 128)								(0, 128)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples have pair-wise different components failed (7 shrink st |	Test triples have pair-wise different components failed (3 shrink st

(0, 7, 7)							     |	(0, 0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples have same components failed (188 shrink steps):	     |	Test triples have same components failed (64 shrink steps):

(0, -1, 0)							     |	(0, 1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples are ordered failed (188 shrink steps):		     |	Test triples are ordered failed (3 shrink steps):

(0, -1, 0)								(0, -1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples are ordered reversely failed (188 shrink steps):	     |	Test triples are ordered reversely failed (64 shrink steps):

(0, 0, 1)								(0, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples have pair-wise different components failed (23 shrin |	Test quadruples have pair-wise different components failed (4 shrink

(0, 0, 0, 0)								(0, 0, 0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples have same components failed (250 shrink steps):	     |	Test quadruples have same components failed (126 shrink steps):

(0, 1, 0, 1)								(0, 1, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples are ordered failed (251 shrink steps):		     |	Test quadruples are ordered failed (5 shrink steps):

(0, 0, -1, 0)								(0, 0, -1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples are ordered reversely failed (251 shrink steps):     |	Test quadruples are ordered reversely failed (66 shrink steps):

(0, 0, 0, 1)								(0, 0, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test bind ordered pairs failed (123 shrink steps):		     |	Test bind ordered pairs failed (1 shrink steps):

(0, 0)									(0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test bind list_size constant failed (261 shrink steps):		     |	Test bind list_size constant failed (15 shrink steps):

(4, [0; 0; 0; 0])							(4, [0; 0; 0; 0])
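
As a rough illustration of why the order of shrink candidates matters for step counts like these, here is a toy model in plain OCaml. It is not the shrinker of QCheck or of QCheck2: `shrink_steps`, `halving`, `zero_first`, and `irrelevant` are hypothetical names, and the greedy loop only counts how many candidates get accepted.

```ocaml
(* Toy model only: neither QCheck's nor QCheck2's actual int shrinker. *)

(* Greedily accept the first candidate that still fails the property,
   counting each accepted candidate as one successful shrink step. *)
let rec shrink_steps candidates fails n steps =
  match List.find_opt fails (candidates n) with
  | None -> (n, steps)
  | Some smaller -> shrink_steps candidates fails smaller (steps + 1)

(* Hypothetical strategy A: only ever propose half of the current value. *)
let halving n = if n = 0 then [] else [ n / 2 ]

(* Hypothetical strategy B: propose 0 first, then fall back to halving. *)
let zero_first n = if n = 0 then [] else [ 0; n / 2 ]

(* A component that is irrelevant to the failure: every value still fails. *)
let irrelevant _ = true

let () =
  let start = 1 lsl 20 in
  let _, a = shrink_steps halving irrelevant start 0 in
  let _, b = shrink_steps zero_first irrelevant start 0 in
  Printf.printf "halving: %d steps, zero first: %d steps\n" a b
```

In this model an irrelevant component costs 21 accepted steps under halving but only 1 when 0 is tried first, and in a tuple that cost is paid once per irrelevant component.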

As a bonus, there are also pretty (and completely identical) histograms to be found:

+++ Stats for quad dist ++++++++++++++++++++++++++++++++++++++++++++	+++ Stats for quad dist ++++++++++++++++++++++++++++++++++++++++++++

stats quad sum:								stats quad sum:
  num: 500000, avg: 200.13, stddev: 58.33, median 200, min 5, max 39	  num: 500000, avg: 200.13, stddev: 58.33, median 200, min 5, max 39
    5.. 24:                                                         	    5.. 24:                                                         
   25.. 44:                                                         	   25.. 44:                                                         
   45.. 64: ##                                                      	   45.. 64: ##                                                      
   65.. 84: ######                                                  	   65.. 84: ######                                                  
   85..104: ############                                            	   85..104: ############                                            
  105..124: #####################                                   	  105..124: #####################                                   
  125..144: ###############################                         	  125..144: ###############################                         
  145..164: ##########################################              	  145..164: ##########################################              
  165..184: ##################################################      	  165..184: ##################################################      
  185..204: ####################################################### 	  185..204: ####################################################### 
  205..224: #####################################################   	  205..224: #####################################################   
  225..244: ###############################################         	  225..244: ###############################################         
  245..264: ######################################                  	  245..264: ######################################                  
  265..284: ##########################                              	  265..284: ##########################                              
  285..304: ################                                        	  285..304: ################                                        
  305..324: #########                                               	  305..324: #########                                               
  325..344: ####                                                    	  325..344: ####                                                    
  345..364: #                                                       	  345..364: #                                                       
  365..384:                                                         	  365..384:                                                         
  385..404:                                                         	  385..404:                                                         

+++ Stats for bind dist ++++++++++++++++++++++++++++++++++++++++++++	+++ Stats for bind dist ++++++++++++++++++++++++++++++++++++++++++++

stats ordered pair difference:						stats ordered pair difference:
  num: 1000000, avg: 25.02, stddev: 22.36, median 19, min 0, max 100	  num: 1000000, avg: 25.02, stddev: 22.36, median 19, min 0, max 100
    0..  4: ####################################################### 	    0..  4: ####################################################### 
    5..  9: #####################################                   	    5..  9: #####################################                   
   10.. 14: #############################                           	   10.. 14: #############################                           
   15.. 19: ########################                                	   15.. 19: ########################                                
   20.. 24: #####################                                   	   20.. 24: #####################                                   
   25.. 29: ##################                                      	   25.. 29: ##################                                      
   30.. 34: ################                                        	   30.. 34: ################                                        
   35.. 39: #############                                           	   35.. 39: #############                                           
   40.. 44: ############                                            	   40.. 44: ############                                            
   45.. 49: ##########                                              	   45.. 49: ##########                                              
   50.. 54: #########                                               	   50.. 54: #########                                               
   55.. 59: ########                                                	   55.. 59: ########                                                
   60.. 64: ######                                                  	   60.. 64: ######                                                  
   65.. 69: #####                                                   	   65.. 69: #####                                                   
   70.. 74: ####                                                    	   70.. 74: ####                                                    
   75.. 79: ###                                                     	   75.. 79: ###                                                     
   80.. 84: ##                                                      	   80.. 84: ##                                                      
   85.. 89: ##                                                      	   85.. 89: ##                                                      
   90.. 94: #                                                       	   90.. 94: #                                                       
   95.. 99:                                                         	   95.. 99:                                                         
  100..104:                                                         	  100..104:                                                         

stats ordered pair sum:							stats ordered pair sum:
  num: 1000000, avg: 75.12, stddev: 46.93, median 72, min 0, max 200	  num: 1000000, avg: 75.12, stddev: 46.93, median 72, min 0, max 200
    0..  9: ####################################################### 	    0..  9: ####################################################### 
   10.. 19: #####################################################   	   10.. 19: #####################################################   
   20.. 29: #####################################################   	   20.. 29: #####################################################   
   30.. 39: #####################################################   	   30.. 39: #####################################################   
   40.. 49: #####################################################   	   40.. 49: #####################################################   
   50.. 59: #####################################################   	   50.. 59: #####################################################   
   60.. 69: #####################################################   	   60.. 69: #####################################################   
   70.. 79: #####################################################   	   70.. 79: #####################################################   
   80.. 89: #####################################################   	   80.. 89: #####################################################   
   90.. 99: #####################################################   	   90.. 99: #####################################################   
  100..109: ##################################################      	  100..109: ##################################################      
  110..119: ###########################################             	  110..119: ###########################################             
  120..129: #####################################                   	  120..129: #####################################                   
  130..139: ###############################                         	  130..139: ###############################                         
  140..149: #########################                               	  140..149: #########################                               
  150..159: ####################                                    	  150..159: ####################                                    
  160..169: ###############                                         	  160..169: ###############                                         
  170..179: ###########                                             	  170..179: ###########                                             
  180..189: ######                                                  	  180..189: ######                                                  
  190..199: ##                                                      	  190..199: ##                                                      
  200..209:                                                         	  200..209:

Edit: This again builds on top of #172 and #174 (merge! merge! 😄)

jmid · 2021-09-12T15:53:47Z

A few observations:

I found myself wanting a uniform positive int generator. With QCheck I can just write (pair pos_int pos_int), e.g., in the test pair_ordered, but to achieve the same in QCheck2 I have to write Gen.(pair (pint ~origin:0) (pint ~origin:0)). The opaque Gen.t makes the optional parameter mandatory, which is just clunky (also pointed out in issue QCheck2.Gen design considerations #162).
QCheck2's int-shrinker relies on bind, as it first generates a bool to decide the integer's sign. This has a side-effect for shrinking: true (meaning "generate a negative int") is reduced to false, thus reducing negative integers to positive ones, which can seem simpler to an end-user. Since we don't use a splittable RNG, the Random.State has moved on since the original state, and therefore the resulting int shrinking strategy will reduce an arbitrary negative integer -1975781842156211842 to an arbitrary positive integer 2696939011544317271 (not necessarily with a smaller absolute value):
```
$ head shrink_algo_logs/triple_same_components_qcheck2.expected 
fails (-1975781842156211842, -4571327332697646483, -3285013039971199785) 
fails (2696939011544317271, -4571327332697646483, -3285013039971199785)
fails (0, -4571327332697646483, -3285013039971199785)
fails (0, 3095744455229849699, -3285013039971199785)
holds (0, 0, -3285013039971199785)
...
```
This is also how -4571327332697646483 is reduced to the seemingly unrelated 3095744455229849699 above.
The lack of a splittable RNG also makes for an unpredictable strategy when list and pair generators are combined in QCheck2. Thus in shrink_algo_logs/pair_lists_rev_concat_qcheck2.expected we find, e.g.:
```
...
fails ([3762171117042495591; 4588024816217148396], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
holds ([], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
fails ([2152069955941623198], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
holds ([], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
fails ([0], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
holds ([0], [])
fails ([0], [1599388225294475516; 3378876932193098527])
...
```
Here a 2-element list in the first component is reduced to a seemingly unrelated 1-element list, and later a 4-element list in the second component is reduced to a seemingly unrelated 2-element list.
This strategy of "starting from Random.State scratch" affects the result when the generator has been lucky enough to find a Random.State producing a counterexample that requires some relation between the components of a tuple. For example, for Test pairs lists no overlap, QCheck returns ([0], [0]) after 22 successful shrink steps, whereas QCheck2 returns ([0], [0; 0; 0; 0]) after 27 successful shrink steps.
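
The sign-bit observation can be mimicked with the standard library alone. The sketch below is a simplified model and not QCheck2's implementation: `gen_signed` and `demo` are hypothetical names, and the bound of 1_000_000 is arbitrary. It contrasts re-running the magnitude generator on a state that has already moved on with replaying it from a saved copy of the state.

```ocaml
(* Simplified model only: NOT QCheck2's actual generator or shrinker. *)

(* Generate a magnitude and apply the sign that was decided beforehand. *)
let gen_signed neg st =
  let n = Random.State.int st 1_000_000 in
  if neg then -n else n

let demo seed =
  let st = Random.State.make [| seed |] in
  (* Draw the sign bit; the model below forces the "negative" branch. *)
  let _sign = Random.State.bool st in
  (* Remember the state right after the sign bit was drawn. *)
  let snapshot = Random.State.copy st in
  let original = gen_signed true st in
  (* "Shrinking" the sign to false on the shared state: the state has
     moved on, so the magnitude is drawn afresh. *)
  let drifted = gen_signed false st in
  (* Replaying from the snapshot keeps the magnitude and flips the sign. *)
  let replayed = gen_signed false (Random.State.copy snapshot) in
  (original, drifted, replayed)

let () =
  let original, drifted, replayed = demo 42 in
  Printf.printf "original %d, drifted %d, replayed %d\n"
    original drifted replayed
```

Only the replayed value is guaranteed to be the original with its sign flipped; the drifted value is whatever the advanced state happens to produce next.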

Overall:

the design discussion in QCheck2.Gen design considerations #162 is still relevant (@sir4ur0n, @vch9, @c-cube, ...)
we should consider adopting the "negative-to-positive reduction strategy" in our QCheck int-shrinkers
we should consider adopting the "reduce-to-0-first strategy" in our QCheck int-shrinkers as the shrinking steps of irrelevant ints add up in composite generators (this is an ongoing discussion with @Gbury in QCheck: improve int shrinker fast path #173)
we should add a splittable RNG (Add a splittable random number generator #86) for more predictable QCheck2 shrinking. As I've mentioned previously, I have a poor man's version here: https://github.com/jmid/sm2-tes21/blob/5c1d9406b8457560999cf031c0c808011590bf70/lec10/intqc.ml#L66-L69 which I plan to try out once these observable tests are in place, to better assess the impact.
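
To make the splittable-RNG point concrete, here is a minimal sketch of what splitting could buy. It is an assumption-laden illustration, not the code behind the link above and not a proposal for the final design: `split` and `small_int` are hypothetical helpers built from `Random.State.make`, `bits`, and `copy`, and nothing is claimed about the statistical quality of the derived states.

```ocaml
(* Hypothetical helper: derive a child state from a parent state by
   seeding a fresh state with bits drawn from the parent. *)
let split (parent : Random.State.t) : Random.State.t =
  let a = Random.State.bits parent in
  let b = Random.State.bits parent in
  Random.State.make [| a; b |]

(* Hypothetical component generator used for the demonstration. *)
let small_int st = Random.State.int st 1000

let () =
  let root = Random.State.make [| 2021 |] in
  (* Each component of a pair gets its own child state. *)
  let st_a = split root in
  let st_b = split root in
  let _a = small_int (Random.State.copy st_a) in
  let b1 = small_int (Random.State.copy st_b) in
  (* Re-running the first component, as a shrinker might ... *)
  let _a' = small_int (Random.State.copy st_a) in
  (* ... leaves the second component's replay untouched. *)
  let b2 = small_int (Random.State.copy st_b) in
  assert (b1 = b2)
```

With per-component states like these, re-running one component during shrinking would not change what the other components produce.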

jmid · 2022-04-16T23:22:12Z

This PR is superseded by #234 and #237

jmid added 14 commits September 7, 2021 17:33

add shrink logging tests

4332f4f

remove shrink logging from existing tests

5acee95

simplify dune logic

6cf99fd

rename test.ml to a more descriptive name

80543c3

add unit tests for check_exn

a2ec83f

rename

3e8a2f3

fix exception documentation

111eeae

add some QCheck unit tests

072fe0c

add tuple and bind tests - pos/neg/stats

a078249

update expected outputs

917a282

add manual tree shrinker and test for QCheck

316154c

origin of mod 3 test

d92058d

collect test names in each module, in order

59be9b0

rm commented code

3b0404c

jmid requested a review from c-cube September 11, 2021 18:14

jmid added 5 commits September 11, 2021 20:21

update expected output after moving IntTree test

13f03d2

adjust tests gens, add a few more tests

d71b178

shrink_algo_logs/dune.inc

2786eb5

updated dune.inc

d6b69c4

factor tests into reusable module

e887466

jmid added 6 commits September 12, 2021 23:29

fix bind_pair_ordered + add variant w/gen exception

48a52ae

mv gen-failure test to Overall

4920755

add string test

c17fdb3

add shrink-failure test + adj. test name

895d8f5

forgot to update expected output, again

fb0a005

update shrink-logs to use same seed as expect-tests ... except fun_fi…

654abe1

…rst_foldleftright which will then have a 100MB+ log

jmid mentioned this pull request Nov 3, 2021

fix Gen.{nat,pos}_split{2,} #183

Merged

This was referenced Apr 2, 2022

Fix exception documentation #233

Merged

Add unit and expect tests #234

Merged

jmid mentioned this pull request Apr 3, 2022

Add regression tests for qualified type #201

Merged

jmid mentioned this pull request Apr 16, 2022

More expect tests and decoupling expect test source and runner #237

Merged

jmid closed this Apr 16, 2022
