More tests #176

Closed
wants to merge 25 commits into from

@jmid jmid commented Sep 11, 2021

This PR

  • adds tuple and bind tests (positive, negative, and statistics).
  • I also added a manual shrinker and test for the IntTree test.
  • I grouped test-names per sub-module as the long list at the end was getting hard to maintain.

Edit: I've now also

  • added shrink-logging tests to compare the total number of shrinking attempts.
  • split the tests into a separate "test library" and a runner.
    That way we have one runner for the expect tests in the CI, while other runners can reuse (some of) the same tests,
    e.g., for local shrinker benchmarking.
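
A minimal sketch of what shrink logging can look like, using only the OCaml standard library. The names logged, show_pair, shrink_log, pair_ordered, and attempts are hypothetical and are not taken from this PR; the idea is just to wrap a property so that every evaluation is recorded as a "holds" or "fails" line, which makes the total number of attempts countable.

```ocaml
(* Hypothetical helper (not from the PR): wrap a property so that every
   evaluation is logged as "holds" or "fails", one line per attempt. *)
let logged (show : 'a -> string) (log : Buffer.t) (prop : 'a -> bool) : 'a -> bool =
  fun x ->
    let ok = prop x in
    Buffer.add_string log
      (Printf.sprintf "%s %s\n" (if ok then "holds" else "fails") (show x));
    ok

let show_pair (a, b) = Printf.sprintf "(%d, %d)" a b

(* Example property: the pair is ordered. *)
let shrink_log = Buffer.create 64
let pair_ordered = logged show_pair shrink_log (fun (a, b) -> a <= b)

(* Number of attempts = number of logged lines. *)
let attempts (log : Buffer.t) : int =
  List.length (String.split_on_char '\n' (Buffer.contents log)) - 1

let () =
  ignore (pair_ordered (3, 1));   (* logged as "fails (3, 1)" *)
  ignore (pair_ordered (0, 1));   (* logged as "holds (0, 1)" *)
  print_string (Buffer.contents shrink_log)
```

Passing such a wrapped property to a test runner would log generated inputs as well as shrink candidates, so a real setup would presumably need to separate the two.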

Here's the diff -y of the new test outputs. Note how QCheck2's int shrinking strategy generally takes
fewer successful shrinking steps (again), as has been discussed in, e.g., PR #153 and #173:

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have different components failed (0 shrink steps):		Test pairs have different components failed (0 shrink steps):

(4, 4)									(4, 4)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have same components failed (125 shrink steps):	     |	Test pairs have same components failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have a zero component failed (124 shrink steps):	     |	Test pairs have a zero component failed (122 shrink steps):

(-1, 1)								     |	(1, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are (0,0) failed (125 shrink steps):			     |	Test pairs are (0,0) failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are ordered failed (125 shrink steps):		     |	Test pairs are ordered failed (2 shrink steps):

(0, -1)									(0, -1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are ordered reversely failed (125 shrink steps):	     |	Test pairs are ordered reversely failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs sum to less than 128 failed (121 shrink steps):	     |	Test pairs sum to less than 128 failed (59 shrink steps):

(0, 128)								(0, 128)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples have pair-wise different components failed (7 shrink st |	Test triples have pair-wise different components failed (3 shrink st

(0, 7, 7)							     |	(0, 0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples have same components failed (188 shrink steps):	     |	Test triples have same components failed (64 shrink steps):

(0, -1, 0)							     |	(0, 1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples are ordered failed (188 shrink steps):		     |	Test triples are ordered failed (3 shrink steps):

(0, -1, 0)								(0, -1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples are ordered reversely failed (188 shrink steps):	     |	Test triples are ordered reversely failed (64 shrink steps):

(0, 0, 1)								(0, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples have pair-wise different components failed (23 shrin |	Test quadruples have pair-wise different components failed (4 shrink

(0, 0, 0, 0)								(0, 0, 0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples have same components failed (250 shrink steps):	     |	Test quadruples have same components failed (126 shrink steps):

(0, 1, 0, 1)								(0, 1, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples are ordered failed (251 shrink steps):		     |	Test quadruples are ordered failed (5 shrink steps):

(0, 0, -1, 0)								(0, 0, -1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples are ordered reversely failed (251 shrink steps):     |	Test quadruples are ordered reversely failed (66 shrink steps):

(0, 0, 0, 1)								(0, 0, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test bind ordered pairs failed (123 shrink steps):		     |	Test bind ordered pairs failed (1 shrink steps):

(0, 0)									(0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test bind list_size constant failed (261 shrink steps):		     |	Test bind list_size constant failed (15 shrink steps):

(4, [0; 0; 0; 0])							(4, [0; 0; 0; 0])
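
To make "successful shrink steps" concrete, here is a toy greedy shrink loop in plain OCaml. It is only an illustration under stated assumptions: shrink_loop, halving, decrement, and below_128 are hypothetical names, and neither candidate strategy is claimed to be the one QCheck or QCheck2 uses. A step is counted each time a smaller candidate still falsifies the property.

```ocaml
(* Toy greedy shrinker (NOT QCheck's implementation): repeatedly move to the
   first candidate that still falsifies the property, counting each move. *)
let rec shrink_loop (candidates : 'a -> 'a list) (prop : 'a -> bool)
    (x : 'a) (steps : int) : 'a * int =
  match List.find_opt (fun y -> not (prop y)) (candidates x) with
  | Some y -> shrink_loop candidates prop y (steps + 1)
  | None -> (x, steps)

(* Two hypothetical candidate strategies for non-negative ints. *)
let halving x = if x <= 0 then [] else [0; x / 2; x - 1]
let decrement x = if x <= 0 then [] else [x - 1]

(* Property in the spirit of "sum to less than 128", on a single int. *)
let below_128 x = x < 128

let () =
  let (a, steps_a) = shrink_loop halving below_128 1000 0 in
  let (b, steps_b) = shrink_loop decrement below_128 1000 0 in
  Printf.printf "halving:   %d after %d steps\n" a steps_a;
  Printf.printf "decrement: %d after %d steps\n" b steps_b
```

Both strategies end at the same minimal counterexample, 128, but the halving strategy needs 124 successful steps where the decrementing one needs 872. The diff above shows the same kind of difference: identical final counterexamples in most cases, with different step counts.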

As a bonus, there are also pretty (and completely identical) histograms to be found:

+++ Stats for quad dist ++++++++++++++++++++++++++++++++++++++++++++	+++ Stats for quad dist ++++++++++++++++++++++++++++++++++++++++++++

stats quad sum:								stats quad sum:
  num: 500000, avg: 200.13, stddev: 58.33, median 200, min 5, max 39	  num: 500000, avg: 200.13, stddev: 58.33, median 200, min 5, max 39
    5.. 24:                                                         	    5.. 24:                                                         
   25.. 44:                                                         	   25.. 44:                                                         
   45.. 64: ##                                                      	   45.. 64: ##                                                      
   65.. 84: ######                                                  	   65.. 84: ######                                                  
   85..104: ############                                            	   85..104: ############                                            
  105..124: #####################                                   	  105..124: #####################                                   
  125..144: ###############################                         	  125..144: ###############################                         
  145..164: ##########################################              	  145..164: ##########################################              
  165..184: ##################################################      	  165..184: ##################################################      
  185..204: ####################################################### 	  185..204: ####################################################### 
  205..224: #####################################################   	  205..224: #####################################################   
  225..244: ###############################################         	  225..244: ###############################################         
  245..264: ######################################                  	  245..264: ######################################                  
  265..284: ##########################                              	  265..284: ##########################                              
  285..304: ################                                        	  285..304: ################                                        
  305..324: #########                                               	  305..324: #########                                               
  325..344: ####                                                    	  325..344: ####                                                    
  345..364: #                                                       	  345..364: #                                                       
  365..384:                                                         	  365..384:                                                         
  385..404:                                                         	  385..404:                                                         

+++ Stats for bind dist ++++++++++++++++++++++++++++++++++++++++++++	+++ Stats for bind dist ++++++++++++++++++++++++++++++++++++++++++++

stats ordered pair difference:						stats ordered pair difference:
  num: 1000000, avg: 25.02, stddev: 22.36, median 19, min 0, max 100	  num: 1000000, avg: 25.02, stddev: 22.36, median 19, min 0, max 100
    0..  4: ####################################################### 	    0..  4: ####################################################### 
    5..  9: #####################################                   	    5..  9: #####################################                   
   10.. 14: #############################                           	   10.. 14: #############################                           
   15.. 19: ########################                                	   15.. 19: ########################                                
   20.. 24: #####################                                   	   20.. 24: #####################                                   
   25.. 29: ##################                                      	   25.. 29: ##################                                      
   30.. 34: ################                                        	   30.. 34: ################                                        
   35.. 39: #############                                           	   35.. 39: #############                                           
   40.. 44: ############                                            	   40.. 44: ############                                            
   45.. 49: ##########                                              	   45.. 49: ##########                                              
   50.. 54: #########                                               	   50.. 54: #########                                               
   55.. 59: ########                                                	   55.. 59: ########                                                
   60.. 64: ######                                                  	   60.. 64: ######                                                  
   65.. 69: #####                                                   	   65.. 69: #####                                                   
   70.. 74: ####                                                    	   70.. 74: ####                                                    
   75.. 79: ###                                                     	   75.. 79: ###                                                     
   80.. 84: ##                                                      	   80.. 84: ##                                                      
   85.. 89: ##                                                      	   85.. 89: ##                                                      
   90.. 94: #                                                       	   90.. 94: #                                                       
   95.. 99:                                                         	   95.. 99:                                                         
  100..104:                                                         	  100..104:                                                         

stats ordered pair sum:							stats ordered pair sum:
  num: 1000000, avg: 75.12, stddev: 46.93, median 72, min 0, max 200	  num: 1000000, avg: 75.12, stddev: 46.93, median 72, min 0, max 200
    0..  9: ####################################################### 	    0..  9: ####################################################### 
   10.. 19: #####################################################   	   10.. 19: #####################################################   
   20.. 29: #####################################################   	   20.. 29: #####################################################   
   30.. 39: #####################################################   	   30.. 39: #####################################################   
   40.. 49: #####################################################   	   40.. 49: #####################################################   
   50.. 59: #####################################################   	   50.. 59: #####################################################   
   60.. 69: #####################################################   	   60.. 69: #####################################################   
   70.. 79: #####################################################   	   70.. 79: #####################################################   
   80.. 89: #####################################################   	   80.. 89: #####################################################   
   90.. 99: #####################################################   	   90.. 99: #####################################################   
  100..109: ##################################################      	  100..109: ##################################################      
  110..119: ###########################################             	  110..119: ###########################################             
  120..129: #####################################                   	  120..129: #####################################                   
  130..139: ###############################                         	  130..139: ###############################                         
  140..149: #########################                               	  140..149: #########################                               
  150..159: ####################                                    	  150..159: ####################                                    
  160..169: ###############                                         	  160..169: ###############                                         
  170..179: ###########                                             	  170..179: ###########                                             
  180..189: ######                                                  	  180..189: ######                                                  
  190..199: ##                                                      	  190..199: ##                                                      
  200..209:                                                         	  200..209:                                                         

Edit: This again builds on top of #172 and #174 (merge! merge! 😄)

@jmid jmid requested a review from c-cube September 11, 2021 18:14

jmid commented Sep 12, 2021

A few observations:

  1. I found myself wanting a uniform positive int generator. With QCheck I can just write (pair pos_int pos_int), e.g., in the test pair_ordered, but to achieve the same with QCheck2 I have to write Gen.(pair (pint ~origin:0) (pint ~origin:0)). The opaque Gen.t makes the optional parameter mandatory, which is just clunky (also pointed out in issue #162, "QCheck2.Gen design considerations").

  2. QCheck2's int-shrinker relies on bind, as it first generates a bool to decide the integer's sign. This has a side-effect for shrinking: true (meaning "generate a negative int") is reduced to false, thus reducing negative integers to positive ones, which can seem simpler to an end-user. Since we don't use a splittable RNG, the Random.State has moved on since the original state, and therefore the resulting int shrinking strategy will reduce an arbitrary negative integer, e.g., -1975781842156211842, to an arbitrary positive integer, e.g., 2696939011544317271 (not necessarily one with a smaller absolute value):

    $ head shrink_algo_logs/triple_same_components_qcheck2.expected 
    fails (-1975781842156211842, -4571327332697646483, -3285013039971199785) 
    fails (2696939011544317271, -4571327332697646483, -3285013039971199785)
    fails (0, -4571327332697646483, -3285013039971199785)
    fails (0, 3095744455229849699, -3285013039971199785)
    holds (0, 0, -3285013039971199785)
    ...
    

    This is also how -4571327332697646483 is reduced to the seemingly unrelated 3095744455229849699 above.

  3. The lack of a splittable RNG also makes for an unpredictable strategy when list and pair generators are combined in QCheck2. Thus in shrink_algo_logs/pair_lists_rev_concat_qcheck2.expected we find, e.g.:

    ...
    fails ([3762171117042495591; 4588024816217148396], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    holds ([], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    fails ([2152069955941623198], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    holds ([], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    fails ([0], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    holds ([0], [])
    fails ([0], [1599388225294475516; 3378876932193098527])
    ...
    

    Here a 2-element list in the first component is reduced to a seemingly unrelated 1-element list, and later a 4-element list in the second component is reduced to a seemingly unrelated 2-element list.
    This strategy of "starting from Random.State scratch" affects the result when the generator has been lucky enough to find a Random.State producing a counterexample that requires some relation between the components of a tuple. For example, for
    Test pairs lists no overlap, QCheck returns ([0], [0]) after 22 successful shrink steps, whereas QCheck2 returns ([0], [0; 0; 0; 0]) after 27 successful shrink steps.
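
The effect described in observations 2 and 3 can be modelled with the standard library alone. The sketch below is a toy model, not QCheck2's code: continuation is a hypothetical stand-in for "the rest of the generator after the sign bool has been chosen", and all names are illustrative. It shows that re-running the continuation on a Random.State that has already advanced yields a fresh magnitude, whereas replaying from a saved copy of the state would keep the magnitude and only flip the sign.

```ocaml
(* Toy model, NOT QCheck2's implementation: draw a magnitude from the shared
   RNG state, then apply the sign that was decided earlier. *)
let continuation (st : Random.State.t) (negative : bool) : int =
  let magnitude = Random.State.bits st in   (* 30 random bits, always >= 0 *)
  if negative then -magnitude else magnitude

let st = Random.State.make [| 42 |]
let saved = Random.State.copy st            (* snapshot before any draw *)

(* Original run: the sign bool is true, so the result is non-positive. *)
let original = continuation st true

(* Shrinking true to false, but on the advanced state: a fresh magnitude. *)
let shrunk_fresh = continuation st false

(* Replaying from the snapshot instead: same magnitude, only the sign flips. *)
let shrunk_replayed = continuation (Random.State.copy saved) false

let () =
  Printf.printf "original:        %d\n" original;
  Printf.printf "shrunk (fresh):  %d\n" shrunk_fresh;
  Printf.printf "shrunk (replay): %d\n" shrunk_replayed
```

Replaying from a snapshot is only a rough approximation of what a splittable RNG would provide; it is used here because it needs nothing beyond Random.State.copy.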

Overall:

@jmid jmid mentioned this pull request Nov 3, 2021
This was referenced Apr 2, 2022

jmid commented Apr 16, 2022

This PR is superseded by #234 and #237

@jmid jmid closed this Apr 16, 2022