<?xml version="1.0" encoding="utf-8" ?>
<!--
# \\ SPIKE: Secure your secrets with SPIFFE.
# \\\\\ Copyright 2024-present SPIKE contributors.
# \\\\\\\ SPDX-License-Identifier: Apache-2.0
-->
<stuff>
<purpose>
<target>Our goal is to have a minimally delightful product.</target>
<target>Strive not to add features just for the sake of adding features.</target>
<target>Half-assed features shall be completed before adding more features.</target>
</purpose>
<immediate>
<issue>
* Change the version number in the code and update release artifacts.
* Also, release artifacts shall have the version number in their file names.
</issue>
<issue>
add this to website too:
SPIKE Contributor Sync — Last Friday of Every Month at 8:15am (Pacific time)
https://us06web.zoom.us/j/84996375494?pwd=rmXv0fV2Ej0KVLkJosQlleYaIMrnub.1
Meeting ID: 849 9637 5494
Passcode: 965019
</issue>
<issue>
Convert ALL async persist operations to sync operations
and also create an ADR about it.
^ The ADR is not done yet. Create it too: why we had async persistence, why we changed it, etc.
</issue>
<issue>
Add godoc comments to undocumented public functions.
</issue>
<issue>
better isolate these as a function
rootKeyMu.Lock()
rootKey = binaryRec
rootKeyMu.Unlock()
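A minimal sketch of the isolation, assuming the existing rootKey / rootKeyMu
package variables (the function name and the []byte type are assumptions):
func setRootKey(binaryRec []byte) {
    rootKeyMu.Lock()
    defer rootKeyMu.Unlock()
    rootKey = binaryRec
}
Callers would then replace the three lines above with setRootKey(binaryRec).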
</issue>
<issue>
// TODO: if at least one Keeper returns a shard, then the system is
// bootstrapped; do not proceed with a re-bootstrap as it will cause
// data loss. Instead stay in `recoverUsingKeeperShards` loop.
// add it here as an additional check.
// Below, SPIKE Nexus is assumed to not have bootstrapped.
// Let's bootstrap it.
// Sync. Wait for this before starting the services.
bootstrapBackingStoreWithNewRootKey(source)
}
</issue>
<issue>
// ensure that all os.Getenv'ed env vars are documented in the readme.
// also ensure that there are no unnecessary/unused env vars.
</issue>
<issue>
check fName usage everywhere.
</issue>
<issue>
add to docs: backing stores are considered untrusted as per the
security model of SPIKE; so even if you store them in a public place, you
don't lose much; but still, it's important to limit access to them.
</issue>
<issue>
// TODO: if you stop nexus, delete the tombstone file, and restart nexus,
// (and no keeper returns a shard; they return 404 instead)
// it will reset its root key and update the keepers to store the new
// root key. This is not an attack vector, because an adversary who can
// delete the tombstone file, can also delete the backing store.
// Plus no sensitive data is exposed; it's just that all data is inaccessible
// now because the root key is lost for good. In either
// case, for production systems, the backing store needs to be backed up
// and the root key needs to be backed up in a secure place too.
// ^ add these to the documentation.
</issue>
<issue>
If the db is in-memory, it should not do all the fancy bootstrapping initialization, as it won't need to talk to keepers.
</issue>
<issue>
sanitize keeper id and shard
request := net.HandleRequest[
reqres.ShardContributionRequest, reqres.ShardContributionResponse](
requestBody, w,
reqres.ShardContributionResponse{Err: data.ErrBadInput},
)
if request == nil {
return errors.ErrParseFailure
}
shard := request.Shard
id := request.KeeperId
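A hedged sketch of what the sanitization could look like, reusing the response
helpers quoted elsewhere in this file (the keeper-id pattern is an assumption;
requires the regexp import):
// Reject malformed keeper ids and empty shards before touching any state.
validId := regexp.MustCompile(`^[0-9A-Za-z-]+$`)
if id == "" || !validId.MatchString(id) || len(shard) == 0 {
    responseBody := net.MarshalBody(
        reqres.ShardContributionResponse{Err: data.ErrBadInput}, w)
    net.Respond(http.StatusBadRequest, responseBody, w)
    return errors.ErrParseFailure
}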
</issue>
<issue>
ensure that in-memory store
still functions as it did before.
try it without launching keepers.
</issue>
<issue>
validate spiffe id and other parameters
for this and also other keeper endpoints
func RouteShard(
w http.ResponseWriter, r *http.Request, audit *log.AuditEntry,
) error {
const fName = "routeContribute"
log.AuditRequest(fName, r, audit, log.AuditCreate)
requestBody := net.ReadRequestBody(w, r)
if requestBody == nil {
return errors.ErrReadFailure
}
here is an example that does that:
func RoutePutPolicy(
w http.ResponseWriter, r *http.Request, audit *log.AuditEntry,
) error {
const fName = "routePutPolicy"
log.AuditRequest(fName, r, audit, log.AuditCreate)
requestBody := net.ReadRequestBody(w, r)
if requestBody == nil {
return errors.ErrParseFailure
}
request := net.HandleRequest[
reqres.PolicyCreateRequest, reqres.PolicyCreateResponse](
requestBody, w,
reqres.PolicyCreateResponse{Err: data.ErrBadInput},
)
if request == nil {
return errors.ErrReadFailure
}
err := guardPutPolicyRequest(*request, w, r)
if err != nil {
return err
}
</issue>
<issue>
verify nexus crash recovery (i.e. it asks keepers for shards)
</issue>
<issue>
documentation:
explain which log levels mean what, and how to enable
log verbosity for development.
as in:
setting SPIKE_SYSTEM_LOG_LEVEL=debug will show all logs.
</issue>
<issue>
For in-memory store, bypass initialization, shard creation etc.
</issue>
<issue waitingFor="doomsday-dr-implementation">
Cut a new release once all DR scenarios have been implemented.
The only remaining DR use case is the "doomsday" scenario right now.
</issue>
<issue waitingFor="doomsday-dr-implementation">
// If the keepers have crashed too, then a human operator will have to
// manually update the Keeper instances using the "break-the-glass"
// emergency recovery procedure as outlined in https://spike.ist/
^ we don't have that procedure yet; create an issue for it.
</issue>
</immediate>
<next>
<issue>
implement doomsday recovery
i.e. operator saves shards in a secure enclave.
</issue>
<issue>
admin should not be able to create two policies with the same name.
good first issue: https://github.com/spiffe/spike/issues/79
</issue>
<issue>
exponentially back off here
log.Log().Info("tick", "msg", "Waiting for keepers to initialize")
time.Sleep(5 * time.Second)
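One possible shape for the backoff, assuming this sits inside the existing
wait loop (the cap, multiplier, and readiness check are illustrative):
wait := 5 * time.Second
const maxWait = 2 * time.Minute
for !keepersInitialized() { // hypothetical readiness check
    log.Log().Info("tick", "msg", "Waiting for keepers to initialize")
    time.Sleep(wait)
    wait *= 2
    if wait > maxWait {
        wait = maxWait
    }
}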
</issue>
<issue>
consider db backend as untrusted
i.e. encrypt everything you store there; including policies.
(that might already be the case actually) -- if so, document it
in the website.
</issue>
<issue>
// TODO: this check will change once we make #keepers configurable.
if len(keepers) < 3 {
log.FatalLn("Tick: not enough keepers")
}
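A sketch of the configurable variant (env.KeeperCount is a hypothetical
accessor, not an existing SPIKE API):
expected := env.KeeperCount() // e.g. defaults to 3, overridable via an env var
if len(keepers) < expected {
    log.FatalLn("Tick: not enough keepers")
}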
</issue>
<issue>
remove symbols when packaging binaries for release.
</issue>
<issue>
we don't use admin token and admin recovery metadata anymore.
clean them up from the db and also code.
</issue>
<issue>
SPIKE defaults to sqlite backing store
</issue>
<issue>
nexus tracks its keeper-initialization state.
(time shards created, number of shards etc)
</issue>
<issue>
switch default backing store to SQLite
"in memory" mode could be enabled with a `--dev` flag,
or an environment variable.
</issue>
<issue>
1. Nexus maintaining its initialization state
(i.e. if it successfully initialized the keepers, it should recompute the
root key from the keepers instead of auto-generating one itself.
For that, it will set a tombstone indicating it has initialized the keepers.
The tombstone will live in SQLite, since shards do not make sense for the
in-memory backing store.)
2. Keepers advertising their status to Nexus regularly
IF Nexus has initialized keepers already Nexus will recompute and
provide the shard to the keeper.
(Nexus will keep the root key in its memory. The threat model of SPIKE
does not protect Nexus against memory-based attacks and it's up to the
user to harden and ensure that nexus runs with non-root privileges
(this threat model is the same for Vault and other secret stores too))
3. Nexus crashes; it figures out it already initialized keepers; asks for
shards and rekeys itself.
4. Nexus crashes, but a quorum of keepers that know their shards cannot
be reached.
Nexus transitions to "locked" state and a manual unlock will be required
(this will be a separate user story)
</issue>
<issue>
// 3. spike policy list gives `null` for no policies instead of a message
// also the response is json rather than a more human readable output.
// also `createdBy` is empty.
// we can create "good first issue"s for these.
</issue>
<issue>
func computeShares(finalKey []byte) (group.Scalar, []secretsharing.Share) {
// Initialize parameters
g := group.P256
// TODO: these will be configurable
t := uint(1) // Need t+1 shares to reconstruct
n := uint(3) // Total number of shares
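A sketch of sourcing t and n from the environment instead (the env var names
SPIKE_SHAMIR_THRESHOLD and SPIKE_SHAMIR_SHARES are assumptions; requires the
os and strconv imports):
t := uint(1) // Need t+1 shares to reconstruct
n := uint(3) // Total number of shares
if v, err := strconv.ParseUint(os.Getenv("SPIKE_SHAMIR_THRESHOLD"), 10, 32); err == nil && v > 0 {
    t = uint(v)
}
if v, err := strconv.ParseUint(os.Getenv("SPIKE_SHAMIR_SHARES"), 10, 32); err == nil && uint(v) > t {
    n = uint(v)
}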
</issue>
<issue>
dr: keeper crash
waiting-for: shard generation inversion.
</issue>
<issue>
Check the entire codebase and implement the `TODO:` items.
</issue>
<issue>
Policy creation feature does not work due to validation error.
</issue>
<issue>
Deploy Keycloak and make sure you can initialize it.
</issue>
<issue>
Create a video about this new Shamir secret sharing workflow.
</issue>
<issue>
DR: devise a DR scenario when a keeper crashes.
(depends on the new inverted sharding workflow)
</issue>
<issue>
<task>
Outline a disaster recovery scenario when Nexus and all Keepers are down.
</task>
<task>The system should work with three keepers by default.</task>
</issue>
</next>
<later>
<issue kind="good-first-issue" ref="https://github.com/spiffe/spike/issues/80">
validations:
along with the error code, also return some explanatory message
instead of this for example
err = validation.ValidateSpiffeIdPattern(spiffeIdPattern)
if err != nil {
responseBody := net.MarshalBody(reqres.PolicyCreateResponse{
Err: data.ErrBadInput,
}, w)
net.Respond(http.StatusBadRequest, responseBody, w)
return err
}
do this
err = validation.ValidateSpiffeIdPattern(spiffeIdPattern)
if err != nil {
responseBody := net.MarshalBody(reqres.PolicyCreateResponse{
Err: data.ErrBadInput,
Reason: "Invalid spiffe id pattern. Matcher should be a regex that can match a spiffe id"
}, w)
net.Respond(http.StatusBadRequest, responseBody, w)
return err
}
</issue>
<issue>
control these with flags.
i.e. the starter script can optionally NOT automatically
start nexus or keepers.
#echo ""
#echo "Waiting before SPIKE Keeper 1..."
#sleep 5
#run_background "./hack/start-keeper-1.sh"
#echo ""
#echo "Waiting before SPIKE Keeper 2..."
#sleep 5
#run_background "./hack/start-keeper-2.sh"
#echo ""
#echo "Waiting before SPIKE Keeper 3..."
#sleep 5
#run_background "./hack/start-keeper-3.sh"
#echo ""
#echo "Waiting before SPIKE Nexus..."
#sleep 5
#run_background "./hack/start-nexus.sh"
</issue>
<issue>
we don't need state.ReadAppState()
and other state enums for keepers anymore;
keepers are just dummy stateless keepers.
</issue>
<issue>
this is for policy creation:
allowed := state.CheckAccess(
spiffeid.String(), "*",
[]data.PolicyPermission{data.PermissionSuper},
)
instead of a wildcard, maybe have a predefined path
for access check like "/spike/system/acl"
also disallow people creating secrets etc under
/spike/system
</issue>
</later>
<low-hanging-fruits>
<issue>
something similar for SPIKE too:
Dev mode
The Helm chart may run an OpenBao server in development. This installs a
single OpenBao server with a memory storage backend.
For dev mode:
- no keepers
- no backing store (everything is in memory)
</issue>
<issue>
Consider using Google KMS, Azure Key Vault, and other providers
(including an external SPIKE deployment) for root key recovery.
The first question to consider is whether it's really needed.
The second question is what to link the KMS to (keepers or Nexus?).
Keepers would be better, because only then would we back up the shards.
Or Google KMS can be used as an alternative to keepers
(i.e., store an encrypted DEK, along with the encrypted root key, on Nexus;
only the KMS can decrypt it -- but, to me, that does not provide any
additional advantage since if you are on the machine, you can talk to
Google KMS anyway).
</issue>
<issue>
enable SQLite by default
and test it (i.e. crash Nexus and ensure both secrets and policies can be recovered)
</issue>
<issue>
Multiple Keeper instances will be required for fan-in fan-out of the
shards.
configure the current system to work with multiple keepers.
the demo setup should initialize 3 keepers by default.
the demo setup should use sqlite as the backing store by default.
</issue>
<issue>
Install Keycloak locally and experiment with it.
This is required for "named admin" feature.
</issue>
<issue>
this should be configurable:
ticker := time.NewTicker(5 * time.Minute)
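A sketch of reading the interval from the environment with a fallback (the
env var name is an assumption):
interval := 5 * time.Minute
if v, err := time.ParseDuration(os.Getenv("SPIKE_NEXUS_KEEPER_SYNC_INTERVAL")); err == nil && v > 0 {
    interval = v
}
ticker := time.NewTicker(interval)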
</issue>
<issue>
write an ADR about why those async operations are sync from now on:
// TODO: we don't have any retry for policies or for recovery info.
// they are equally important.
// TODO: these async operations can cause race conditions
//
// 1. process a writes secret
// 2. process b marks secret as deleted
// 3. in memory we write then delete
// 4. but to the backing store it goes as delete then write.
// 5. memory: secret deleted; backing store: secret exists.
//
// to solve it; have a queue of operations (as a go channel)
// and do not consume the next operation until the current
// one is complete.
//
// have one channel for each resource:
// - secrets
// - policies
// - key recovery info.
//
// Or as an alternative, make these async operations sync
// and wait for them to complete before reporting success.
// this will make the architecture way simpler without needing
// to rely on channels.
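A minimal sketch of the queue-per-resource alternative described above (all
names are hypothetical; one such worker would exist per resource type):
// Operations are consumed strictly in order, so the backing store
// sees the same sequence of writes and deletes as memory does.
type persistOp func(ctx context.Context) error

var secretOps = make(chan persistOp, 128)

func secretWorker(ctx context.Context) {
    for op := range secretOps {
        if err := op(ctx); err != nil {
            log.Log().Warn("secretWorker", "msg", "persist failed", "err", err.Error())
        }
    }
}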
</issue>
<issue>
This reconstruction assumes there are two shards;
it should not have that assumption.
var shares []secretsharing.Share
shares = append(shares, firstShare)
shares = append(shares, secondShare)
reconstructed, err := secretsharing.Recover(1, shares)
if err != nil {
log.FatalLn("Failed to recover: " + err.Error())
}
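A sketch without the two-shard assumption, assuming the threshold t is
available from configuration:
// secretsharing.Recover needs at least t+1 of the collected shares.
if uint(len(shares)) < t+1 {
    log.FatalLn("Not enough shares to reconstruct the root key")
}
reconstructed, err := secretsharing.Recover(t, shares)
if err != nil {
    log.FatalLn("Failed to recover: " + err.Error())
}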
</issue>
<issue>
One-way token flow:
keeper provides the root key to nexus;
nexus init pushes the root key to keeper.
That's it.
</issue>
<issue>
If SPIKE is not initialized; `spike` or `spike --help` should display
a reminder to initialize SPIKE first and exit.
</issue>
<issue>
ensure 32 bytes
binaryRec, err := reconstructed.MarshalBinary()
if err != nil {
log.FatalLn("Failed to marshal: " + err.Error())
return []byte{}
}
// TODO: check that the size is 32 bytes.
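A sketch of the missing check, to go right after the MarshalBinary call above
(the hard-coded 32 mirrors the AES-256 key size; requires the fmt import):
if len(binaryRec) != 32 {
    log.FatalLn(fmt.Sprintf(
        "Reconstructed root key has unexpected size: %d bytes (want 32)",
        len(binaryRec)))
    return []byte{}
}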
</issue>
<issue>
add docs to all packages (or most of them at least, where it makes sense)
like, add a docs.go with package description.
</issue>
<issue>
func sanityCheck(secret group.Scalar, shares []shamir.Share) {
t := uint(1) // Need t+1 shares to reconstruct
// TODO: 1 and 3 shall come from configs.
// TODO: keeper count shall be validated against 3
reconstructed, err := shamir.Recover(t, shares[:2])
if err != nil {
log.FatalLn("sanityCheck: Failed to recover: " + err.Error())
}
if !secret.IsEqual(reconstructed) {
log.FatalLn("sanityCheck: Recovered secret does not match original")
}
}
</issue>
<issue>
simulate a nexus crash and see if you can still list keys.
func ListKeys() []string {
kvMu.Lock()
defer kvMu.Unlock()
return kv.List()
}
</issue>
<issue>
ErrPolicyExists = errors.New("policy already exists")
^ this error is never used; check why.
</issue>
<issue>
For all policy, secret, metadata, etc. methods (such as func GetSecret(path string, version int) (map[string]string, error))...
if Nexus is configured to use the SQLite db and the db is not initialized, exit with a warning instead of storing things
in memory.
</issue>
<issue>
switch storeType {
case env.Memory:
// TODO: maybe initializememorybackingstore() too.
be = memory.NoopStore{}
case env.Sqlite:
be = InitializeSqliteBackend(rootKey)
default:
be = memory.NoopStore{}
}
</issue>
<issue>
// TODO: document public methods including this.
// But before that, make all async persist methods sync.
</issue>
<issue>
move this to sdk and use it in the code too.
type retryHandler[T any] func() (T, error)
func doRetry[T any](ctx context.Context, handler retryHandler[T]) (T, error) {
return retry.NewTypedRetrier[T](
retry.NewExponentialRetrier(),
).RetryWithBackoff(ctx, handler)
}
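A hypothetical usage sketch once this lives in the SDK, mirroring the retry.Do
call quoted later in this file:
cachedPolicy, err := doRetry(ctx, func() (*data.Policy, error) {
    return be.LoadPolicy(ctx, id)
})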
</issue>
<issue>
// TODO: RetryWithBackoff retries indefinitely; we might want to limit the total duration
// of db retry attempts based on a sane default, configurable via environment variables.
</issue>
<issue>
// TODO: check all database operations (secrets, policies, metadata) and
// ensure that they are retried with exponential backoff.
</issue>
<issue>
nil check wherever Backend() is called.
var be backend.Backend
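A sketch of the guard, assuming Backend() can return nil before initialization
(the error message is illustrative):
if be := Backend(); be == nil {
    return fmt.Errorf("backing store is not initialized")
}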
</issue>
<issue>
The paths that we set in get, put, ...etc should look like a Unix path.
That will require sanitization!
Check how other secret stores manage those paths.
</issue>
<issue>
read policies from a yaml or a json file and create them.
</issue>
<issue>
have sqlite as the default backing store.
(until we implement the S3 backing store)
</issue>
<issue>
GitHub now has arm64 runners. We can use them for cross-compilation/automation.
https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/
Here's an example:
https://github.com/kfox1111/cid2pid/blob/main/.github/workflows/release.yaml
</issue>
</low-hanging-fruits>
<later>
<issue>
// TODO: Yes, memory is the source of truth; but at least
// attempt some exponential retries before giving up.
if err := be.StoreSecret(ctx, path, *secret); err != nil {
// Log error but continue - memory is the source of truth
log.Log().Warn(fName,
"msg", "Failed to cache secret",
"path", path,
"err", err.Error(),
)
}
</issue>
<issue>
sanitize perms
err := api.CreatePolicy(name, spiffeIddPattern, pathPattern, perms)
if err != nil {
fmt.Printf("Error: %v\n", err)
return
}
</issue>
<issue>
return cobra.Command{
Use: "delete policy-id",
Short: "Delete a policy",
Args: cobra.ExactArgs(1),
Run: func(cmd *cobra.Command, args []string) {
api := spike.NewWithSource(source)
// TODO: sanitize policy id.
// also validate other command line arguments too if it makes sense.
// better to stop bad data at the client (but still not trust the
// client fully)
Go through all cobra commands and validate/sanitize what needs to be.
</issue>
<issue>
func trustRoot() string {
tr := os.Getenv("SPIKE_TRUST_ROOT") // TODO: this should be documented and should come from the common env module.
if tr == "" {
return "spike.ist"
}
return tr
}
</issue>
<issue>
spikeDir := filepath.Join(homeDir, ".spike") // TODO: const.
</issue>
<issue>
test that the timeout results in an error.
ctx, cancel := context.WithTimeout(
context.Background(), env.DatabaseOperationTimeout(),
)
defer cancel()
cachedPolicy, err := retry.Do(ctx, func() (*data.Policy, error) {
return be.LoadPolicy(ctx, id)
})
</issue>
<issue>
var ErrNotFound = errors.New("not found")
var ErrUnauthorized = errors.New("unauthorized")
// TODO: use these errors from the data package of spike-sdk-go instead.
func body(r *http.Response) (bod []byte, err error) {
body, err := io.ReadAll(r.Body)
if err != nil {
return nil, err
}
return body, err
}
</issue>
<issue>
// TODO: document that these need to be updated during a release cut.
const NexusVersion = "0.2.0"
const PilotVersion = "0.2.0"
const KeeperVersion = "0.2.0"
const SpikeNexusTombstoneFile = "spike.nexus.bootstrap.tombstone"
// SpikeNexusDataFolder returns the path to the directory where Nexus stores
// its encrypted backup for its secrets and other data.
func SpikeNexusDataFolder() string {
homeDir, err := os.UserHomeDir()
</issue>
<issue>
# TODO: we should not need workspace; it's better to check for spire-server and spire-agent binaries in $PATH instead of mandating a WORKSPACE variable.
# TODO: ensure that spire.spike.ist resolves before starting these scripts.
# Running spire-agent as super user to read meta information of other users'
# processes. If you are using the current user to use SPIKE only, then you
# can run this command without sudo.
if [ "$1" == "--use-sudo" ]; then
sudo "$WORKSPACE"/spire/bin/spire-agent run \
-config ./config/spire/agent/agent.conf \
-joinToken "$JOIN_TOKEN"
else
"$WORKSPACE"/spire/bin/spire-agent run \
-config ./config/spire/agent/agent.conf \
-joinToken "$JOIN_TOKEN"
fi
</issue>
<issue>
ability to lock nexus programmatically.
when locked, nexus will deny almost all operations
locking is done by executing nexus binary with a certain command line flag.
(i.e. there is no API access, you'll need to physically exec the ./nexus
binary -- regular svid verifications are still required)
only a superadmin can lock or unlock nexus.
</issue>
<issue>
consider using NATS for cross-trust-boundary (or not) secret federation
</issue>
<issue>
wrt: securely erasing shards and the root key >>
It would be interesting to try and chat with some of the folks under the cncf
(That's a good idea indeed; I'm noting it down.)
</issue>
<issue>
over the break, I dusted off https://github.com/spiffe/helm-charts-hardened/pull/166 and started playing with the new k8s built in cel based mutation functionality.
the k8s cel support is a little rough, but I was able to do a whole lot in it, and think I can probably get it to work for everything. once 1.33 hits, I think it will be even easier.
I mention this, as I think spike may want similar functionality?
csi driver, specify secrets to fetch to volume automatically, keep it up to date, and maybe poke the process once refreshed
</issue>
<issue>
set SQLite on by default and make sure everything works.
</issue>
<issue>
volkan@spike:~/Desktop/WORKSPACE/spike$ spike secret get /db
Error reading secret: post: Problem connecting to peer
^ I get an error instead of a "secret not found" message.
</issue>
<issue>
this is from SecretReadResponse, so maybe its entity should be somewhere
common too.
return &data.Secret{Data: res.Data}, nil
</issue>
<issue>
these may come from the environment:
DataDir: ".data",
DatabaseFile: "spike.db",
JournalMode: "WAL",
BusyTimeoutMs: 5000,
MaxOpenConns: 10,
MaxIdleConns: 5,
ConnMaxLifetime: time.Hour,
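A sketch of overriding one of these from the environment with a fallback (the
env var name is an assumption):
journalMode := "WAL"
if v := os.Getenv("SPIKE_NEXUS_DB_JOURNAL_MODE"); v != "" {
    journalMode = v
}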
</issue>
</later>
<reserved>
<issue waitingFor="shamir-to-be-implemented-first">
use case: shamir
1. `spike init` verifies that there are 3 healthy keeper instances.
It creates 3 Shamir shards (2 of which will be enough to
reassemble the root key) and sends one share to each keeper.
2. SPIKE nexus regularly polls all keepers and if it can assemble a secret
all good.
3. `spike init` will also save the 2 shards (out of 3) in
`~/.spike/recovery/*`
The admin will be "highly encouraged" do delete those from the machine and
securely back up the keys and distribute them to separate people etc.
[2 and 3 are configurable]
</issue>
<issue waitingFor="shamir-to-be-implemented">
<workflow>
1. `spike init` initializes keeper(s). From that point on, SPIKE Nexus
pulls the root key whenever it needs it.
2. nexus and keeper can use e2e encryption with one time key pairs
to have forward secrecy and defend the transport in the VERY unlikely
case of a SPIFFE mTLS breach.
3. ability for nexus to talk to multiple keepers
4. ability for a keeper to talk to nexus to recover its root key if it
loses it.
5. ability for nexus to talk to and initialize multiple keepers.
(phase 1: all keepers share the same key)
6. `spike init` saves its shards (2 out of 3 or similar) to
`~/.spike/recovery/*`
The admin will be "highly encouraged" to delete those from the machine
and securely back up the keys and distribute them to separate people etc.
`spike init` will also save the primary key used in Shamir's secret sharing
to `~/.spike/recovery/*` (this is not as sensitive as the root key, but
still should be kept safe)
- it is important to note that, without the recovery material, your only
option to restore the root key relies on the possibility that more than N
keepers remain operational at all times. -- that's a good enough possibility
anyway (say 5 keepers in 3 AZs, and you need only 2 to recover the root key;
then it will be extremely unlikely for all of them to go down at the same
time) so in an ideal scenario you save your recovery material in a secure
encrypted enclave and never ever use it.
7. `spike recover` will reset a keeper cluster by using the recovery
material. `spike recover` will also recover the root key.
To use `spike recover` you will need a special SVID (even a super admin
could not use it without prior authorization).
The SVID that can execute `spike recover` will not be able to execute
anything else.
8. At phase zero, `spike recover` will just save the root key to disk,
also mentioning that this is not secure and that the key should be backed
up safely and then wiped from the disk.
9. maybe double-encrypt keeper-nexus communication with one-time key pairs
because the root key is very sensitive and we would want to make sure it's
secure even if the SPIFFE mTLS is compromised.
</workflow>
<details>
say user sets up 5 keeper instances.
in nexus, we have a config
keepers:
- nodes: [n1, n2, n3, n4, n5]
nexus can reach out with its own spiffe id to each node in the list.
it can call the assembly lib with whatever secrets it gets back, as it
gets them back, and so long as it gets enough, "it just works".
recovery could even be: users have a copy of some of the keeper's secrets.
they rebuild a secret server and load that piece back in. nexus then can
recover.
that api could also allow for backup configurations.
</details>
<docs>
WAITINGFOR: shamir to be implemented
To documentation (Disaster Recovery)
Is it like
Keepers have 3 shares.
I get one share
you get one share.
We keep our shares secure.
none of us alone can assemble a keeper cluster.
But the two of us can join forces and do an awesome DR at 3am if needed?
Or if you're not that paranoid, you can keep both shares on one thumbdrive,
or 2 shares on two different thumbdrives in two different safes, and rebuild.
It gives a lot of options on just how secure you want to try to make things
vs how painful it is to recover.
</docs>
</issue>
<issue waitingFor="shamir-to-be-implemented">
func RouteInit(
w http.ResponseWriter, r *http.Request, audit *log.AuditEntry,
) error {
// This flow will change after implementing Shamir Secrets Sharing
// `init` will ensure there are enough keepers connected, and then
// initialize the keeper instances.
//
// We will NOT need the encrypted root key; instead, an admin user will
// fetch enough shards to back up. Admin will need to provide some sort
// of key or password to get the data in encrypted form.
</issue>
</reserved>
<immediate-backlog>
</immediate-backlog>
<runner-up>
<issue>
double-encryption of nexus-keeper comms (in case mTLS gets compromised, or
SPIRE is configured to use an upstream authority that is compromised, this
will provide end-to-end encryption and an additional layer of security
over the existing PKI)
</issue>
<issue>
Minimally Delightful Product Requirements:
- A containerized SPIKE deployment
- A Kubernetes SPIKE deployment
- Minimal policy enforcement
- Minimal integration tests
- A demo workload that uses SPIKE to test things out as a consumer.
- A golang SDK (we can start at github/zerotohero-dev/spike-sdk-go
and then move it under spiffe once it matures)
</issue>
<issue>
Kubernetification
</issue>
<issue>
v.1.0.0 Requirements:
- Having S3 as a backing store
</issue>
<issue>
Consider a health check / heartbeat between Nexus and Keeper.
This can be more frequent than the root key sync interval.
</issue>
<issue>
Unit tests and coverage reports.
Create a solid integration test before.
</issue>
<issue>
Test automation.
</issue>
<issue>
Assigning secrets to SPIFFE IDs or SPIFFE ID prefixes.
</issue>
<issue>
SPIKE CSI Driver
the CSI Secrets Store driver enables users to create
`SecretProviderClass` objects. These objects define which secret provider
to use and what secrets to retrieve. When pods that request CSI volumes are
created, the CSI Secrets Store driver sends the request to the OpenBao CSI
provider if the provider is `vault`. The CSI provider then uses the
specified `SecretProviderClass` and the pod’s service account to retrieve
the secrets from OpenBao and mount them into the pod’s CSI volume. Note
that the secret is retrieved from SPIKE Nexus and populated to the CSI
secrets store volume during the `ContainerCreation` phase. Therefore, pods
are blocked from starting until the secrets are read from SPIKE and
written to the volume.
</issue>
<issue>
shall we implement rate limiting, or should that be out of scope
(i.e. to be implemented by the user)?
</issue>
<issue>
to docs: the backing store is considered untrusted and it stores
encrypted information
todo: if it's "really" untrusted then maybe it's better to encrypt everything
(including metadata) -- check how other secrets managers do this.
</issue>
<issue>
more fine grained policy management
1. an explicit deny will override allows
2. have allowed/disallowed/required parameters
3. etc.
# This section grants all access on "secret/*". further restrictions can be
# applied to this broad policy, as shown below.
path "secret/*" {
capabilities = ["create", "read", "update", "patch", "delete", "list", "scan"]
}
# Even though we allowed secret/*, this line explicitly denies
# secret/super-secret. this takes precedence.
path "secret/super-secret" {
capabilities = ["deny"]
}
# Policies can also specify allowed, disallowed, and required parameters. here
# the key "secret/restricted" can only contain "foo" (any value) and "bar" (one
# of "zip" or "zap").
path "secret/restricted" {
capabilities = ["create"]
allowed_parameters = {
"foo" = []
"bar" = ["zip", "zap"]
}
but also, instead of going deep down into the policy rabbit hole, maybe
it's better to rely on well-established policy engines like OPA.
A rego-based evaluation will give allow/deny decisions, which SPIKE Nexus
can then honor.
Think about pros/cons of each approach. -- SPIKE can have a good-enough
default policy engine, and for more sophisticated functionality we can
leverage OPA.
</issue>
<issue>
If Nexus has not started, SPIKE Pilot should give a more informative
error message (i.e. Nexus is not ready, or not initialized, or
unreachable, please check yadda yadda yadda)
</issue>
<issue>
key rotation
NIST rotation guidance
Periodic rotation of the encryption keys is recommended, even in the
absence of compromise. Due to the nature of the AES-256-GCM encryption
used, keys should be rotated before approximately 2^32
encryptions have been performed, following the guidelines of NIST
publication 800-38D.
SPIKE will automatically rotate the backend encryption key prior to reaching
2^32 encryption operations by default.
also support manual key rotation
</issue>
<issue>
Do an internal security analysis / threat model for spike.
</issue>
<issue>
TODO: in-memory "dev mode" for SPIKE #spike (i.e. in-memory mode will not be the default)
nexus --dev or something similar (maybe an env var)
</issue>
<issue>
Use SPIKE in lieu of encryption as a service (similar to transit secrets)
</issue>
<issue>
dynamic secrets
</issue>
<issue>
document how to do checksum verification to ensure that the binaries
you download are authentic.
</issue>
<issue>
docs:
Since the storage backend resides outside the barrier, it’s considered
untrusted so SPIKE will encrypt the data before it sends them to the
storage backend. This mechanism ensures that if a malicious attacker
attempts to gain access to the storage backend, the data cannot be
compromised since it remains encrypted until SPIKE Nexus decrypts the data.
The storage backend provides a durable data persistent layer where data
is secured and available across server restarts.
</issue>
<issue>
use case:
one time access to an extremely limited subset of secrets
(maybe using a one time, or time-bound token)
but also consider whether SPIKE needs tokens at all; I think we can piggyback
most of the authentication on SPIFFE and/or JWT -- having to convert
various kinds of tokens into internal secrets store tokens is not really needed.
</issue>
<issue>
- TODO Telemetry
- core system metrics
- audit log metrics
- authentication metrics