
docs/utils: add decoy selection implementation guides and tools #9024

Open: wants to merge 4 commits into master

jeffro256 (Contributor)

Documentation for understanding and implementing non-fingerprinting decoy selection, plus reference code for empirically testing correctness. If anyone wants the C++ source for an executable that empirically checks the correctness of the Python script, I might create a new branch on GitHub and release that code.


Finally, when we are doing decoy selection to find the other members of a ring, our result is a list of global output
indexes, which represent a set of transaction outputs with the same amount as the transaction output we are trying to
spend. We sample these global output indicies according to a certain distribution, with this distribution hopefully

Suggested change:
Before: spend. We sample these global output indicies according to a certain distribution, with this distribution hopefully
After: spend. We sample these global output indices according to a certain distribution, with this distribution hopefully

statistically matching the distribution of the ages of "true spends", so that the ring member we truely wish to spend is

Suggested change:
Before: statistically matching the distribution of the ages of "true spends", so that the ring member we truely wish to spend is
After: statistically matching the distribution of the ages of "true spends", so that the ring member we truly wish to spend is

statistical dependence for picks within rings more than necessary. When you are trying to build up a set of X unique
decoy picks, if the first pick has 100 choices, then the next pick has 99 choices, then 98 choices, etc, etc. Since
these picks are not statistically independent, then the distribution of the picks gets more and more skewed for the later
picks. You can combat this effect by simply commiting to the order in which you pick the outputs, and try adding them

Suggested change:
Before: picks. You can combat this effect by simply commiting to the order in which you pick the outputs, and try adding them
After: picks. You can combat this effect by simply committing to the order in which you pick the outputs, and try adding them
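The remedy in the excerpt above is cut off mid-sentence, so the following is only one possible reading, not necessarily the guide's method: draw every candidate from the same decoy distribution, consider candidates strictly in the order they were drawn, and discard duplicates instead of shrinking or re-weighting the candidate pool. `build_ring_members`, `sample_fn` and `max_attempts` are hypothetical names, not part of the PR.

```python
import random
from typing import Callable, List


def build_ring_members(sample_fn: Callable[[], int], ring_size: int,
                       max_attempts: int = 10_000) -> List[int]:
    """Collect `ring_size` unique global output indexes from `sample_fn`.

    `sample_fn` is a hypothetical callable that draws one global output index
    from the (unchanged) decoy distribution on every call.
    """
    members: List[int] = []
    seen = set()
    for _ in range(max_attempts):
        if len(members) == ring_size:
            break
        pick = sample_fn()
        if pick in seen:
            # Duplicate: discard it and draw again from the same distribution,
            # rather than re-weighting the remaining candidates.
            continue
        seen.add(pick)
        members.append(pick)  # members keep the order in which they were drawn
    if len(members) < ring_size:
        raise RuntimeError("could not find enough unique decoys")
    return members


if __name__ == "__main__":
    rng = random.Random(0)
    print(build_ring_members(lambda: rng.randrange(1_000), ring_size=15))
```

Every accepted member is the first not-yet-seen draw from an unchanged distribution, so no per-pick re-weighting step is needed; the `max_attempts` cap only guards against an endless loop when too few distinct outputs exist.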

### First, Some Numeric Constants

* `GAMMA_SHAPE = 19.28` [source](https://github.com/monero-project/monero/blob/67d190ce7c33602b6a3b804f633ee1ddb7fbb4a1/src/wallet/wallet2.cpp#L141-L142)
* Shape paramater for a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)

Suggested change:
Before: * Shape paramater for a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)
After: * Shape parameter for a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)

* `GAMMA_RATE = 1.61` [source](https://github.com/monero-project/monero/blob/67d190ce7c33602b6a3b804f633ee1ddb7fbb4a1/src/wallet/wallet2.cpp#L141-L142)
* Rate paramater for a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)

Suggested change:
Before: * Rate paramater for a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)
After: * Rate parameter for a [gamma distribution](https://en.wikipedia.org/wiki/Gamma_distribution)
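The two constants parameterize the gamma distribution by shape and *rate*, while Python's built-in `random.gammavariate(alpha, beta)` takes shape and *scale*, so the rate has to be inverted before sampling. The sketch below further assumes, as in wallet2's gamma picker, that the gamma variate models the natural log of an output's age in seconds; `sample_output_age_seconds` is a hypothetical helper name.

```python
import math
import random

GAMMA_SHAPE = 19.28
GAMMA_RATE = 1.61
GAMMA_SCALE = 1.0 / GAMMA_RATE  # random.gammavariate() expects a scale, not a rate


def sample_output_age_seconds(rng: random.Random) -> float:
    """Draw one output age in seconds (assumed model: log(age) ~ Gamma(shape, rate))."""
    log_age = rng.gammavariate(GAMMA_SHAPE, GAMMA_SCALE)
    return math.exp(log_age)


if __name__ == "__main__":
    rng = random.Random(42)
    ages = [sample_output_age_seconds(rng) for _ in range(5)]
    print([round(age / 86400, 2) for age in ages])  # ages in days
```

Under this model the mean of the log-age is shape / rate ≈ 11.98, so typical sampled ages are on the order of e^11.98 ≈ 1.6 × 10^5 seconds, a little under two days.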

psuedo_global_output_index = num_usable_rct_outputs - 1 - target_num_outputs_post_unlock

# 7
picked_block_index = bisect.bisect_left(crod, psuedo_global_output_index)

Suggested change:
Before: picked_block_index = bisect.bisect_left(crod, psuedo_global_output_index)
After: picked_block_index = bisect.bisect_left(crod, pseudo_global_output_index)
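Steps 6 and 7 from the excerpt above can be sketched as a single function. Assumptions not visible in the excerpt: `crod[i]` holds the cumulative number of RingCT outputs up to and including block `i`, an out-of-range pick is re-drawn by the caller (the `continue` in a later excerpt suggests this), and the final output is chosen uniformly among the outputs of the picked block. `pick_output` is a hypothetical name.

```python
import bisect
import random
from typing import List, Optional


def pick_output(target_num_outputs_post_unlock: int, crod: List[int],
                num_usable_rct_outputs: int, rng: random.Random) -> Optional[int]:
    """Map a target output count to a global output index, or None to re-pick."""
    if target_num_outputs_post_unlock >= num_usable_rct_outputs:
        return None  # further back than the oldest usable output: re-pick

    # Step 6: count backwards from the newest usable output
    pseudo_global_output_index = num_usable_rct_outputs - 1 - target_num_outputs_post_unlock

    # Step 7: locate the block with a binary search over the CROD
    picked_block_index = bisect.bisect_left(crod, pseudo_global_output_index)
    if picked_block_index >= len(crod):
        return None

    # Assumed follow-up: choose uniformly among the outputs created in that block
    first_output = crod[picked_block_index - 1] if picked_block_index > 0 else 0
    num_outputs_in_block = crod[picked_block_index] - first_output
    if num_outputs_in_block == 0:
        return None  # block has no outputs: re-pick
    return first_output + rng.randrange(num_outputs_in_block)
```

For example, with `crod = [2, 2, 5, 9]` and 9 usable outputs, a target of 0 lands in the newest block and returns one of the global indexes 5 through 8. The sketch keeps `bisect_left` exactly as in the excerpt, since switching to `bisect_right` would change which block a boundary index maps to, and therefore the output distribution.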

distribution, we can use a two-sample [Kolmogorov–Smirnov Test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)
to statistically test if a given implementation statistically matches the reference implementation. Running the provided
Python decoy selection reference script (utils/python-rpc/decoy_selection.py) will generate a TXT file containing
decoy selection picks (you can specify how many) seperated by newlines. This data can be imported and used to perform

Suggested change:
Before: decoy selection picks (you can specify how many) seperated by newlines. This data can be imported and used to perform
After: decoy selection picks (you can specify how many) separated by newlines. This data can be imported and used to perform
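A dependency-free sketch of the comparison described above: it reads two pick files (one global output index per line), computes the two-sample Kolmogorov–Smirnov statistic D, and compares it with the standard large-sample critical value at the 5% level, c(0.05) · sqrt((n + m) / (n · m)) with c(0.05) ≈ 1.358. If SciPy is available, `scipy.stats.ks_2samp` returns the statistic together with a p-value and is the more convenient choice. The helper names below are hypothetical.

```python
import bisect
import math
from typing import List, Sequence


def load_picks(path: str) -> List[int]:
    """Read one global output index per line, skipping blank lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def ks_2samp_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so the supremum is reached at one of those values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_rejects_at_5_percent(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if the samples differ significantly (asymptotic critical value)."""
    n, m = len(a), len(b)
    critical_value = 1.358 * math.sqrt((n + m) / (n * m))
    return ks_2samp_statistic(a, b) > critical_value
```

Usage would look like `ks_rejects_at_5_percent(load_picks('python_decoy_selections.txt'), load_picks('my_implementation_picks.txt'))`, where the second file name is a placeholder. A result of `False` only means the two samples are not distinguishable at that level; it does not prove the implementations are identical, and the asymptotic critical value is only reliable for reasonably large samples.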

continue

# 6
psuedo_global_output_index = num_usable_rct_outputs - 1 - target_num_outputs_post_unlock

Suggested change:
Before: psuedo_global_output_index = num_usable_rct_outputs - 1 - target_num_outputs_post_unlock
After: pseudo_global_output_index = num_usable_rct_outputs - 1 - target_num_outputs_post_unlock

def main():
    # Handle CLI arguments
    arg_parser = argparse.ArgumentParser(prog='Decoy Selection Python Reference',
        description='We provide an easy-to-read non-fingerprinting reference for Monero decoy selecton',

Suggested change:
Before: description='We provide an easy-to-read non-fingerprinting reference for Monero decoy selecton',
After: description='We provide an easy-to-read non-fingerprinting reference for Monero decoy selection',

hinto-janai (Contributor) left a comment

Examples, links, and Python scripts all work.

    $ time ./decoy_selection.py -p 18082 -n 1024

    Fetching the CROD up to height <top> from daemon at '127.0.0.1:18082'...
    The start height of the CROD is 1220516, and the top height is 3039030.
    Performing 1024 picks and writing output to 'python_decoy_selections.txt'...
    Progress: 100.0%

    real	0m0.868s
    user	0m0.397s
    sys	0m0.072s
python_decoy_selections.txt (1024 lines)
85470898
85248409
85519514
85520126
85516241
84477709
15101425
85521767
84299429
85114221
85253916
85519544
84628725
83984075
85437269
79104544
85512373
85389658
79094371
85433092
85511527
83290741
85524350
85524862
85429074
85523097
85523393
85518865
62732230
84941314
84298552
85250053
82855006
84754992
85460675
85495839
85339121
85280305
85521883
85397651
85335686
85392018
85504860
85521407
85488759
85519401
85518240
81552108
85240081
85455131
85434561
85525014
85222683
85404699
81736861
85513902
85342996
85494823
85271346
74398874
85071002
85523842
80268512
85462095
85471298
85503461
85092325
85473801
85446229
85496034
24430958
85436591
85451631
85522513
85472006
84819606
85509315
85521745
73088217
80069759
74473736
85522013
85523995
84416227
85508815
85504674
78093043
85242293
81661073
85458160
85520331
85509801
84882626
85518308
84891248
77251253
85065683
85363846
84915055
85207921
85500348
85521234
85359035
85524725
85510382
85023068
85016017
85454912
85365961
85493668
85263045
85394145
85519689
76884416
85508129
85522050
85518644
85515938
85514315
84194574
85379430
85392783
84702021
84898201
85512278
85524908
85522992
85518526
85523981
84776195
80004894
85483872
85524525
84951106
81361373
85376150
85462461
85513906
85475611
85502408
85328620
85226015
85524969
85464847
85369589
85231809
82957713
64576798
84512994
85487713
85394991
85516368
85507047
85524619
85518063
84345287
85408557
85521084
85469264
84357677
85523451
85496536
81146330
81957824
85380654
85511496
85223385
85102681
84088522
85509973
85496000
85425563
80304159
84636742
85275483
85456874
85152760
85495183
82630021
85398541
85518298
85519418
85297544
85496973
85518868
80988353
85392434
85395780
85525005
85437695
84455222
85360989
70269934
85485155
85524513
84900511
85393382
85502209
85072877
85482944
85522932
85218420
85510031
85198159
84065836
85417495
85494925
85518046
73062210
82867841
83187480
85465594
83281359
85502486
60706018
85520354
85358776
85353796
84751957
85480285
85506720
85074389
85522316
85397002
82277729
85522416
79358266
84970274
85480955
85118491
85524305
85490443
79387301
85512743
43240411
85404422
85522931
85520078
85508884
83648181
85524326
85280755
81310683
85513062
83848864
83685109
85521429
85519725
85432115
85486837
85275187
84611666
85518451
85471103
85461913
83776819
84077404
85485312
85411502
85360996
85516958
85496047
79369601
85331255
85517741
85522662
85415417
85425747
85390843
82571881
72818378
85495605
85334338
85522510
84614595
84357301
85519561
84896065
85491338
84464105
85049765
85490989
55460017
85469066
85524336
85522786
85514293
85447786
84350715
85412417
85476844
83093821
85520768
81389794
85518374
85431296
84660966
85523015
85457048
85304240
84352130
85516921
85489886
85513270
84872162
85419507
85463781
85263001
85445267
85521719
85511555
85512755
85484688
85184867
85524435
85518373
85501329
84606637
85424945
85510226
85520637
85108959
85144003
85495042
85519232
77619401
85434696
85365648
85386591
84957916
79139303
85186220
85507111
85523810
85483078
85497658
85470560
84131861
85413378
85517886
85517576
82026837
85085541
82801391
85498815
85402485
85523275
85433450
85475506
85522161
85500773
85520996
85338581
85522197
85372110
83635632
85524991
85513978
85502244
84066630
84761582
85439991
85514914
85488879
85524308
84975136
83913506
85393152
85337472
85492850
84950346
85486959
84911451
85236050
85507905
85490293
85524644
85464185
85515879
64247052
85521278
85432016
85466963
85256358
85524024
85436036
85381589
81884892
85517252
85444495
85471428
84997168
85441500
85460560
85518224
85486111
85520817
85507699
80118930
85480946
77267472
85523283
85499580
85524800
75338830
85521824
85523485
82697454
85437254
85505172
85465692
85511511
85511286
83575614
84879051
85322261
85486029
85524324
85416727
85519343
85450242
85506561
85482401
85519675
85524620
85447360
85512433
85501307
85522589
85515786
85511657
85497930
85116655
85172094
85401155
58875630
85439975
85380564
85521748
85521038
85030426
85523095
85048526
84418579
83984547
85489823
83979348
85059734
85499754
63819941
85518103
73989453
85403888
85523250
85041959
85386419
85504867
84689550
83580844
85454065
85478208
60712291
85495222
85511581
85450773
85381127
85457007
85272525
82416848
85491549
82842532
84484559
85477666
72381381
85129792
85424581
85320258
85523157
84540366
85524668
85036075
85524182
85521932
84169124
84673100
85524368
60050593
85466459
84553033
85333298
85342107
85505877
79427807
85382425
82406671
84846137
85443016
84091939
85409624
85071167
84515632
85454989
85499662
85472337
79509756
85523721
85423103
79574816
83857284
85518939
85524114
85524104
85396906
85470547
84404269
85489612
85482756
85457694
85522774
85522938
85501093
85251886
85524751
85515168
85445941
85063658
85517942
85519168
85433884
85317633
85060107
85516749
85078093
85418816
85467013
85459475
85517034
85451519
85064498
85502954
84538677
85521367
85510277
85446535
84954090
85462926
85522320
85524505
85518742
85497531
85518999
85421300
85508000
85011758
85521572
83421960
85524338
85504039
84852884
85524127
85517626
81811589
85525006
85503529
85491706
85525020
85409291
84359776
84903553
85188272
85198456
82628047
85508704
83520351
85524430
85322724
85512878
85375886
84335848
85486730
85191646
85325944
85380786
85422142
84640139
37879301
85497619
83401503
85363359
85051732
85211575
85524539
85440588
85506742
84998186
85504682
75167037
85522040
41124858
85512679
85502298
85501744
85501287
84581284
85467895
85311066
84987804
85523876
85492780
85008320
85417796
85518755
85400963
84718169
85491522
83735097
85516560
85524674
85454859
85418598
85507708
84945953
85423542
85519671
85495202
85361389
85326117
84960362
85508815
84837054
85508498
85006228
83982486
85378443
84250756
85521758
85493932
84801338
85494898
73086519
83700205
84552601
85519436
85521839
85492599
85363020
85132289
84065741
85524184
85521844
85487640
85504835
85503262
85520010
79499257
85524892
85468110
85218963
85519421
85522409
85522446
85452633
85355087
85514460
85503755
85509995
85417273
85493410
85301769
85013629
85521886
85228762
85324164
85387067
85445811
85523612
85442220
85273006
85501228
71177122
85522366
85356135
85199490
85523920
85240387
85050784
85516173
85511702
84235498
85520122
85496251
85524251
85454408
85333797
85524488
84796712
85513298
84718356
84313794
85513423
85464828
85490967
85524086
85454629
85477194
85335686
85483584
85524864
84311987
85484053
84408153
85511030
85031339
85466456
85519210
85399583
85486733
84856319
85493668
85340151
85170981
85486310
85523013
85522786
84864546
85480606
85521367
84969363
85174216
82325684
84919091
84901661
85476932
85517534
85070259
85506103
85524306
85409282
85522054
85495864
85513528
85447466
84752652
85371498
85499341
85443334
85311761
85518050
85474220
85517739
83955833
85370993
85524479
85512841
78677674
85263350
59923444
85230553
85499897
85516888
85524292
85510876
84993890
84837627
85465273
85522526
85520520
85496188
68113842
85443371
85513397
85395413
85520510
85520676
85503874
65294373
79725512
85207283
85504712
80701452
85524474
85506216
85513879
85495359
85253746
85337189
61627791
85249199
85524350
85512860
85491773
85422250
75607419
79135613
85521155
84792829
84669220
85365417
84023889
85478051
85496108
85412553
85519995
82839338
85521798
85522345
85509888
85524001
85499012
85344758
85518674
81250602
85523967
84386304
84743172
59603291
79714740
85521682
84037079
85312127
85508299
85209498
85515651
85505188
84164067
85436585
85452157
83528535
85383318
85524230
85338957
85496951
85243031
85513832
85514135
85520873
85490357
85424220
85088438
85523793
80480072
85512018
85464324
85439120
85512532
85073949
85495516
85480529
85506964
84215025
85523576
84296828
84031899
80937464
85295223
84682315
85365073
85491062
71660441
81534281
85486190
64242199
85177376
85411510
85506940
85073482
84492686
85444210
85458591
84813556
85464516
77854827
82328142
84611788
85501555
85457684
85516659
85106007
85512409
85510267
84448600
85523152
57730212
85010380
85518003
85523964
84953442
85357642
85250995
85519211
84993087
85519296
85504034
85088073
85039508
83154306
47413821
85256202
85501261
85496943
85062702
85524798
85495432
85350724
85521442
85478137
85376471
85415440
85487822
85509071
85486658
71846287
85301213
84963684
79741626
85446180
84446265
85519091
85156899
85518806
85506220
84691538
81643454
85511677
85519066
85512699
84430878
84704072
85515926
85411700
42331445
84974951
85050608
85495136
85100306
30391785
85511743
85403401
85248288
85522538
82693039
85218998
85476493
85157845
81612232
85194855
85508587
85069406
85495687
85494704
84075198
85498563
85287883
85290702
85480674
76447759
79549323
85519046
85492967
85507939
76505560
85465607
85496428
84543958
85274033
85061538
83988800
85314319
82842925
85104062
34569098
80513537
85498583
85524977
85426534
85508565
85474867
85421255
85491832
85338333
85400459
83481590
85447337
85518092
85522386
85524592
81840913
84777976
85435685
85519827
85515275
85451769
85516686
85513242
85271427
85521132
85519591
85462578
85516232
85486911
81346091
85515822
85523403
85524159
85445303
85468439
85213835
85522615
85302281
85482841
85308104
85506322
85524724
85126819
78680451
85193621
85502966
84272067
85380521
85481130
85325216
85206987
84988369
85522139

import argparse
import bisect
try:
    import numpy as np


Why require numpy? It's an incredibly large dependency for a very small set of features which the built-in random module also provides.

j-berman (Collaborator) left a comment

Appreciate this :) This is useful documentation that I'm already putting to good use.

One thing: the reference impl doesn't check that selected decoys are unlocked, so coinbase outputs less than 60 blocks old, or timelocked outputs, can be selected. This skews the distribution slightly compared to a correct implementation.

Suggestion: use the get_outs RPC for selected outputs and check if they're unlocked. Re-pick for any locked outputs.

(I'm currently testing this reference impl against monero-serai)

j-berman (Collaborator) commented Dec 19, 2023

I implemented the above suggestion (made sure this reference implementation only selects unlocked outputs), compared the result to the decoy selection algorithm (DSA) I implemented in monero-serai, and found that it passed the Kolmogorov-Smirnov test. It did not pass the KS test before implementing the suggestion (also thank you to @Rucknium for helping me validate the KS test).

I think it's worth making sure the reference implementation does not select locked outputs as well. Here was my code:

    # Do gamma picking and write output
    print("Performing {} picks and writing output to '{}'...".format(args.num_picks, args.output_file))
    print_period = args.num_picks // 1000 if args.num_picks >= 1000 else 1
    batch_size = 10_000

    with open(args.output_file, 'w', newline='') as outf:
        batch_of_picks = []
        for i in range(args.num_picks):
            if (i+1) % print_period == 0:
                progress = (i+1) / args.num_picks * 100
                print("Progress: {:.1f}%".format(progress), end='\r')

            batch_of_picks.append(gamma_pick(crod, average_output_delay, num_usable_rct_outputs))

            if (i+1) % batch_size == 0 or i == (args.num_picks - 1):
                # Query the locked/unlocked status of the whole batch in one RPC call
                res = daemon.get_outs([{'amount': 0, 'index': pick} for pick in batch_of_picks], get_txid = False)
                assert len(res.outs) == len(batch_of_picks)
                for pick, out in zip(batch_of_picks, res.outs):
                    # Re-pick (one output per RPC call) until we find one that is unlocked
                    unlocked = out.unlocked
                    while not unlocked:
                        pick = gamma_pick(crod, average_output_delay, num_usable_rct_outputs)
                        backup_res = daemon.get_outs([{'amount': 0, 'index': pick}], get_txid = False)
                        assert len(backup_res.outs) == 1
                        unlocked = backup_res.outs[0].unlocked

                    print(pick, file=outf)

                batch_of_picks = []

With the above implemented, I also think the reference implementation should drop support for the to_height argument, since the unlocked status that the get_outs RPC reports for an output depends on the current chain height. I also think the documentation should clarify that the chain height must be the same for both sets of picks when comparing distributions. Perhaps the .txt filename could have the daemon's height appended to it.
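The filename suggestion could look like the sketch below. The `<name>_<height>.txt` scheme and the `output_path_with_height` helper are hypothetical; the height itself would come from the daemon, for example the `height` field returned by `get_info` as in the snippet that follows.

```python
import os


def output_path_with_height(path: str, height: int) -> str:
    """Append the daemon height to a filename, keeping its extension."""
    root, ext = os.path.splitext(path)
    return f"{root}_{height}{ext}"


if __name__ == "__main__":
    # prints python_decoy_selections_3039030.txt
    print(output_path_with_height("python_decoy_selections.txt", 3039030))
```

Encoding the height in the filename makes it obvious when two pick files were generated against different chain states and therefore should not be compared.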

In the same vein, I also think it's worth checking that the top block hash at the start of the script is the same at the end, like this:

    # Get daemon height and block hash
    res = daemon.get_info()
    height = res.height
    top_block_hash = res.top_block_hash

    ...

    # Make sure height is the same as when we started so that we can be certain
    # output locked status stays consistent
    res = daemon.get_info()
    if res.height != height or res.top_block_hash != top_block_hash:
        print("Error: the chain height advanced while the script was running. This can harm the analysis.")
        exit(1)

When I'm testing, I disconnect my daemon from peers to make sure the height remains constant.

It would be nice if users could run the script above without pointing it at a disconnected daemon, but I don't believe that would be correct without some changes to the get_outs RPC (so that it can report whether timelocked outputs are unlocked at a specific height).


Final point: I think it would be helpful if the documentation explained how to run the KS test. It's super simple: just call scipy.stats.kstest(reference_global_output_indexes, implementation_global_output_indexes) and check whether the p-value is greater than 0.05. If it is, we cannot reject the null hypothesis (that both samples are drawn from the same distribution).

EDIT: removed extra height param in my sample code above spotted by @jeffro256

jeffro256 (Contributor Author) commented Dec 26, 2023 via email

iamamyth commented:

https://docs.python.org/3/library/random.html#random.gammavariate

jeffro256 (Contributor Author) commented:

Okay, the reference code no longer selects locked outputs and no longer uses numpy. It also throws an error if the top block hash changes. I'll run some KS tests soon to check whether it matches the behavior of wallet2.

jeffro256 (Contributor Author) commented:

@j-berman Is this PR still worth it post-FCMP++?


5 participants