Add the ability to join networks to "seed mode" so that we can run a seed cluster #182
This change adds support for Prometheus metrics export via the P2P protocol. Metrics are exported with `TextEncoder` and compressed with `zstd` at the default compression level (`3` at the time of implementation).
* add program id Tx dep missing detection
* add tx hash to save db error logs
* sync Tx dep verification and tx save to be sure parent are saved before children
* test run tx with all its program id
---------
Co-authored-by: Tuomas Mäkinen <[email protected]>
I think there are some modifications that came from my branch, which I merged yesterday. Perhaps you'll need to rebase from main?
I had already re-based but it mangled the formatting. I just needed to re-run
Perhaps you can test the watchdog modification to see?
// Start a basic healthcheck so kubernetes has something to wait on.
let listener = TcpListener::bind(config.http_healthcheck_listen_addr).await?;
tracing::info!("Healthcheck listening on: {}", listener.local_addr()?);

loop {
    sleep(Duration::from_secs(1));
    let (stream, _) = listener.accept().await?;

    let io = TokioIo::new(stream);

    tokio::task::spawn(async move {
        if let Err(err) = http1::Builder::new()
            .serve_connection(io, service_fn(ok))
            .await
        {
            tracing::error!("Error serving connection: {:?}", err);
        }
    });
}
To avoid starting a new HTTP server, you can move lines 243 and 244:
let scheduler_watchdog_sender =
    watchdog::start_healthcheck(config.http_healthcheck_listen_addr).await?;
to just before the `if`, for example at line 239, and remove these.
It will activate the health check watchdog for all types of node. The scheduler verification will fail, but the / entry point will be ok.
I was thinking it will log an error. If it's too annoying, keep the HTTP server you added, and I'll remove it when I change the watchdog to activate per service.
I wasn't sure if it was safe to re-use that so I just whipped up something quick to get me past the issue I had (without the readiness check kubernetes won't wait to start the next statefulset pod). I'm ok with ignoring any errors it logs for now so fix it when you get around to it.
Ok, as discussed on slack, this is actually an issue as it generates thousands of errors a second. I'll merge this then we can update it when you've changed the watchdog.
I'm fine with the PR in general, but maybe the Nix files could be extracted into a separate PR and checked for dependencies?
p.google-cloud-sdk-gce
(p.google-cloud-sdk.withExtraComponents [p.google-cloud-sdk.components.gke-gcloud-auth-plugin])
p.k9s
p.kube-capacity
These shouldn't be needed with the node directly, right?
Oh shoot, I missed this message before merging. Let's catch up on slack about how to handle this.
To be able to reasonably run this in kubernetes we need to be able to run multiple seed pods.
This change has nodes run in `p2p-beacon` mode to try to join an existing network. This allows us two things. At start the seed will attempt to contact the nodes listed in the discovery list. After enough attempts, instead of failing, it will fall back to bootstrapping a brand new network.
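For readers skimming the description, the join-or-bootstrap behaviour could be sketched roughly as follows. This is only an illustration using the standard library, not the code in this PR: `StartMode`, `join_or_bootstrap`, `max_attempts`, `retry_delay`, and the `try_join` callback are all hypothetical names, and the real handshake with a peer is replaced by a closure that returns `true` on success.

```rust
use std::time::Duration;

/// How the node ended up starting (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum StartMode {
    /// Joined an existing network through the given peer address.
    Joined(String),
    /// No peer answered, so a brand new network was bootstrapped.
    Bootstrapped,
}

/// Try every peer in `discovery_list`, for up to `max_attempts` rounds.
/// `try_join` stands in for the real P2P handshake (an assumption of this
/// sketch) and returns true when the peer accepted us.
fn join_or_bootstrap<F>(
    discovery_list: &[String],
    max_attempts: u32,
    retry_delay: Duration,
    mut try_join: F,
) -> StartMode
where
    F: FnMut(&str) -> bool,
{
    for attempt in 1..=max_attempts {
        for peer in discovery_list {
            if try_join(peer) {
                return StartMode::Joined(peer.clone());
            }
        }
        // Wait before the next round, but not after the final one.
        if attempt < max_attempts {
            std::thread::sleep(retry_delay);
        }
    }
    // Every attempt failed: fall back instead of returning an error.
    StartMode::Bootstrapped
}
```

With this shape, the first seed pod in a fresh cluster finds nobody to talk to and ends up in `StartMode::Bootstrapped`, while later pods that can reach it return `StartMode::Joined` with the address of the peer that answered.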