This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

simulation bank must be frozen error in sendTransaction #34027

Closed
buffalu opened this issue Nov 12, 2023 · 6 comments
Labels
community Community contribution

Comments

@buffalu
Contributor

buffalu commented Nov 12, 2023

Problem

Our RPC nodes are crashing periodically. We're seeing this error very frequently, especially over the last few days.

Note that we're running v1.16.17-jito, but it has no changes to replay, consensus, or anything else, leading me to believe this is an issue in the Solana Labs validator client.

I've attached the logs below. Some more context...

Image of our public-facing RPCs restarting.
Screenshot 2023-11-11 at 7 56 36 PM

Crash logs:
crash.txt

JSON-RPC HTTP calls surrounding the crash on ny-mainnet-rpc-2 (note simulateBundle)
Screenshot 2023-11-11 at 8 00 36 PM

Proposed Solution

Debug why this is happening and fix it.

@buffalu buffalu added the community Community contribution label Nov 12, 2023
@buffalu
Contributor Author

buffalu commented Nov 12, 2023

crash3.txt

@buffalu
Contributor Author

buffalu commented Nov 12, 2023

crash3.txt

@buffalu
Contributor Author

buffalu commented Nov 12, 2023

crash4.txt

@steviez
Contributor

steviez commented Nov 12, 2023

On Discord you mentioned dropped votes. So, I cracked open the logs you posted and I see that your node is deviating from consensus:

[2023-11-12T16:49:06.539204572Z INFO  solana_runtime::bank]
bank frozen: 229602048
hash: 5AWpgLDLnzFRNtJ1ggDhFoUU1QzasfPN6FQ7Q2kYLSV9
accounts_delta: 71SxzwFEVcRCxSzm3FtX9EvFGvYuQYEDbUYTh51zzbsm
signature_count: 3871 last_blockhash: D5te3tvWsPgRaxqCPQpKDed462NtEcyMwS4ahdFuXKov capitalization: 562291927697549064, stats: BankHashStats { num_updated_accounts: 9338, num_removed_accounts: 6, num_lamports_stored: 10006087385531782, total_data_len: 30377175, num_executable_accounts: 0 }

[2023-11-12T16:49:07.710131234Z WARN  solana_core::cluster_slot_state_verifier]
Cluster duplicate confirmed slot 229602048 with
hash C15Bi8Sy7jEeyTkUM2Z7AXvGuDuiMJD3hhkrhevui18N, but our version has
hash 5AWpgLDLnzFRNtJ1ggDhFoUU1QzasfPN6FQ7Q2kYLSV9

[2023-11-12T16:49:08.847500485Z INFO  solana_runtime::bank]
bank frozen: 229602048
hash: C15Bi8Sy7jEeyTkUM2Z7AXvGuDuiMJD3hhkrhevui18N
accounts_delta: f2e5Ba1hPjVFwpzGcmeMPe2ikGAKFLvo94b1yVV5PXj
signature_count: 3871 last_blockhash: D5te3tvWsPgRaxqCPQpKDed462NtEcyMwS4ahdFuXKov capitalization: 562291927697549064, stats: BankHashStats { num_updated_accounts: 9338, num_removed_accounts: 6, num_lamports_stored: 10006085385383478, total_data_len: 30377175, num_executable_accounts: 0 }
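
The three log entries above boil down to a comparison between the hash the node froze locally and the hash the cluster confirmed. A minimal sketch of that comparison follows, assuming hashes are handled as plain strings; `SlotVerdict` and `check_frozen_hash` are hypothetical names for illustration and are not the validator's actual `cluster_slot_state_verifier` code.

```rust
// Hypothetical illustration only: not the validator's real types or logic.
#[derive(Debug, PartialEq)]
enum SlotVerdict {
    /// Our frozen bank hash agrees with the cluster-confirmed hash.
    Matches,
    /// Our frozen bank hash differs, so the slot would need to be purged and replayed.
    Mismatch { ours: String, cluster: String },
}

/// Compare the locally frozen hash for a slot with the cluster-confirmed hash.
fn check_frozen_hash(our_hash: &str, cluster_hash: &str) -> SlotVerdict {
    if our_hash == cluster_hash {
        SlotVerdict::Matches
    } else {
        SlotVerdict::Mismatch {
            ours: our_hash.to_string(),
            cluster: cluster_hash.to_string(),
        }
    }
}

fn main() {
    let cluster = "C15Bi8Sy7jEeyTkUM2Z7AXvGuDuiMJD3hhkrhevui18N";

    // First freeze of slot 229602048: the local hash deviates from the cluster.
    let first = check_frozen_hash("5AWpgLDLnzFRNtJ1ggDhFoUU1QzasfPN6FQ7Q2kYLSV9", cluster);
    println!("first freeze:  {:?}", first);

    // Second freeze after the purge and replay: the hashes agree.
    let second = check_frozen_hash("C15Bi8Sy7jEeyTkUM2Z7AXvGuDuiMJD3hhkrhevui18N", cluster);
    println!("second freeze: {:?}", second);
}
```

The hash strings are copied from the log lines; everything else is an assumption made for the sketch.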

Interestingly, your node was able to recover and compute the correct hash. So, I think there are two issues at play here:

  1. Why is your node deviating in the first place?
  2. If a node deviates, it explicitly purges that bank; I haven't looked at the code, but my guess is that we don't have logic in place to remove a purged slot from BlockCommitmentCache.
[2023-11-12T16:49:07.727736715Z WARN  solana_core::replay_stage] purging slot 229602048
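
The second point is only a guess in this thread, so the following is a toy model of the suspected failure mode rather than the validator's real logic. It assumes a cache that remembers a slot after it was frozen, and a purge that resets the bank for replay without touching that cache. `ToyNode`, `ToyBank`, and all method names are hypothetical.

```rust
use std::collections::HashMap;

/// Toy stand-in for a bank: only tracks whether it has been frozen.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ToyBank {
    frozen: bool,
}

/// Toy node: a map of banks plus a single cached slot that stands in
/// (hypothetically) for what BlockCommitmentCache remembers.
#[derive(Default)]
struct ToyNode {
    banks: HashMap<u64, ToyBank>,
    cached_slot: Option<u64>,
}

impl ToyNode {
    /// Freeze a slot and record it in the cache.
    fn freeze(&mut self, slot: u64) {
        self.banks.insert(slot, ToyBank { frozen: true });
        self.cached_slot = Some(slot);
    }

    /// Purge a deviating slot and start replaying it as a fresh, unfrozen bank.
    /// `clear_cache` models whether the purge also evicts the slot from the cache.
    fn purge_and_restart_replay(&mut self, slot: u64, clear_cache: bool) {
        self.banks.insert(slot, ToyBank { frozen: false });
        if clear_cache && self.cached_slot == Some(slot) {
            self.cached_slot = None;
        }
    }

    /// Pick the bank used for simulation from the cached slot.
    fn simulation_bank(&self) -> Result<u64, &'static str> {
        let slot = self.cached_slot.ok_or("no cached slot")?;
        match self.banks.get(&slot) {
            Some(bank) if bank.frozen => Ok(slot),
            Some(_) => Err("simulation bank must be frozen"),
            None => Err("bank missing"),
        }
    }
}

fn main() {
    let mut node = ToyNode::default();
    node.freeze(229602048);
    // Purge without evicting the cache entry: the cache now points at an unfrozen bank.
    node.purge_and_restart_replay(229602048, false);
    println!("{:?}", node.simulation_bank());
}
```

In this model, evicting the cache entry during the purge (`clear_cache = true`) makes the lookup fail with "no cached slot" instead of handing out an unfrozen bank. Whether the real cache behaves this way is exactly what the comment above leaves open.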

@steviez
Contributor

steviez commented Nov 13, 2023

@buffalu - From your Discord message + linked PR, it seems you found the issue to be a bug specific to Jito? If so, can you close this issue out?

I'll look into my earlier point about BlockCommitmentCache, and I think we can write that up as a separate GH issue.

@buffalu
Contributor Author

buffalu commented Nov 13, 2023

yeah, thanks for the reminder

@buffalu buffalu closed this as completed Nov 13, 2023

2 participants