test(resharding): Adjust State mapping check for single shard tracking #12706

Open · wants to merge 6 commits into master from stafik/resharding/check-mapping

Conversation

@staffik (Contributor) commented Jan 8, 2025

Unblocks #12691

Changes

  • Adjust check_state_shard_uid_mapping_after_resharding so that it can be run for a client that does not track all shards (see the sketch after this list).
  • Run check_state_shard_uid_mapping_after_resharding for each client.
  • Slightly refactor (simplify) the resharding test loop.
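
As a rough illustration of the first change, here is a minimal sketch of a tracking-aware check; check_mapping_for_tracked_shards, its signature, and the plain child-to-parent HashMap are hypothetical stand-ins for the actual test helper and store state, not nearcore's real API:

```rust
use std::collections::HashMap;

use near_primitives::shard_layout::ShardUId;

/// Hypothetical sketch: verify the resharding State mapping for one client,
/// skipping children of the parent shard that this client never tracked.
/// A node tracking a single unrelated shard may never have split the parent,
/// so no mapping entries are expected for its children there.
fn check_mapping_for_tracked_shards(
    tracked_after_resharding: &[ShardUId],
    shard_uid_mapping: &HashMap<ShardUId, ShardUId>, // child -> parent
    parent_shard_uid: ShardUId,
    children_shard_uids: &[ShardUId],
) {
    for child in children_shard_uids {
        if !tracked_after_resharding.contains(child) {
            // Untracked child: this client never had to split the parent
            // for it, so no mapping entry is required.
            continue;
        }
        // Every tracked child must map back to the parent shard.
        assert_eq!(shard_uid_mapping.get(child), Some(&parent_shard_uid));
    }
}
```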

@staffik requested a review from a team as a code owner January 8, 2025 23:05

@marcelo-gonzalez (Contributor) left a comment:

Going to just leave comments instead of approving because I'm not super familiar with the state mapping code, so I would need to check it some more before understanding why the change here is allowed. I'll come back to it tomorrow if it hasn't been approved yet.

.get_prev_epoch_id_from_prev_block(&tip.prev_block_hash)
.unwrap();
let epoch_config = client.epoch_manager.get_epoch_config(&prev_epoch_id).unwrap();
let epoch_id =

Contributor:
Could also just use epoch_id() on the block_header variable stored above. Also, do we need this change? I guess it's all the same anyway, but we could just keep the old one since it's shorter.

Contributor Author:
changed in fa8e29e

client: &Client,
prev_block_hash: &CryptoHash,
) -> Vec<ShardUId> {
let account_id =

Contributor:
nit: could avoid cloning:

diff --git a/integration-tests/src/test_loop/utils/sharding.rs b/integration-tests/src/test_loop/utils/sharding.rs
index 9db148659..5dab61245 100644
--- a/integration-tests/src/test_loop/utils/sharding.rs
+++ b/integration-tests/src/test_loop/utils/sharding.rs
@@ -123,14 +123,14 @@ pub fn get_tracked_shards_from_prev_block(
     client: &Client,
     prev_block_hash: &CryptoHash,
 ) -> Vec<ShardUId> {
-    let account_id =
-        client.validator_signer.get().map(|validator| validator.validator_id().clone());
+    let signer = client.validator_signer.get();
+    let account_id = signer.as_ref().map(|s| s.validator_id());
     let mut tracked_shards = vec![];
     for shard_uid in
         client.epoch_manager.get_shard_layout_from_prev_block(prev_block_hash).unwrap().shard_uids()
     {
         if client.shard_tracker.care_about_shard(
-            account_id.as_ref(),
+            account_id,
             prev_block_hash,
             shard_uid.shard_id(),
             true,
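
The point of the suggestion: holding the signer in a local binding keeps it alive for the rest of the function, so Option::as_ref can hand out a borrowed &AccountId and care_about_shard receives the reference directly, without any clone.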

Contributor Author:
fixed in fa8e29e

.iter()
.any(|child_shard_uid| shards_tracked_after_resharding.contains(child_shard_uid))
{
assert_eq!(shard_uid_mapping.len(), 2);

Contributor:
Hmm, this makes me think of another test case to add: a chunk producer tracks the parent and a child after resharding, then gets assigned to an unrelated shard, then in the future state syncs the child again. Then the state mapping will still exist, but the state application in state sync will just write the child shard uid directly in the DB. Wonder what happens in that case.

Contributor:
When the child shard is no longer tracked, doesn't the mapping get removed?

Contributor Author:
State cleanup will deal with removing the resharding mapping. For now, once the mapping is set, it will be used forever for the child.
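
To illustrate the "used forever for the child" behavior, a minimal sketch assuming a simple child-to-parent map; resolve_shard_uid and the HashMap are hypothetical, and the real near-store column and helper names may differ:

```rust
use std::collections::HashMap;

use near_primitives::shard_layout::ShardUId;

/// Hypothetical sketch: a child shard with a recorded mapping keeps
/// resolving to the parent's UId for State accesses until state cleanup
/// deletes the entry; a shard without a mapping uses its own UId.
fn resolve_shard_uid(
    shard_uid_mapping: &HashMap<ShardUId, ShardUId>, // child -> parent
    shard_uid: ShardUId,
) -> ShardUId {
    *shard_uid_mapping.get(&shard_uid).unwrap_or(&shard_uid)
}
```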

Contributor Author:
> Hmm, this makes me think of another test case to add: a chunk producer tracks the parent and a child after resharding, then gets assigned to an unrelated shard, then in the future state syncs the child again. Then the state mapping will still exist, but the state application in state sync will just write the child shard uid directly in the DB. Wonder what happens in that case.

I am pretty sure that was covered by the shard shuffling test, but I added test_resharding_v3_stop_track_child_for_2_epochs, which will be helpful for testing state cleanup.

@@ -485,56 +483,47 @@ fn test_resharding_v3_base(params: TestReshardingParameters) {

let client = clients[client_index];
let block_header = client.chain.get_block_header(&tip.last_block_hash).unwrap();
let shard_layout = client.epoch_manager.get_shard_layout(&tip.epoch_id).unwrap();

Contributor:
No objection from me on removing this and the print below, but I wonder if anybody was using these print statements while debugging?

Contributor:
+1 I think those are good to have.

Contributor:
Haven't used them myself. We can add them back locally to debug if needed 👍

Contributor Author:
Will leave one of them, because they duplicate each other.

Contributor Author:
fixed in fa8e29e

epoch_config.shard_layout.num_shards() as usize
);
}
// If any child shard was tracked after resharding, it means the node had to split the parent shard.

Contributor:
Wondering what happens if the node tracks the parent but not any child.

We do resharding for nothing? 🤔

Contributor Author:
Yes, added a test in fa8e29e

@wacban (Contributor) left a comment:
LGTM

for shard_uid in
client.epoch_manager.get_shard_layout_from_prev_block(prev_block_hash).unwrap().shard_uids()
{
if client.shard_tracker.care_about_shard(

Contributor:
There is a bit of a mismatch between the function name and the implementation here.

care_about_shard only checks if the client is a producer for the given shard in this epoch. The client might also track a shard because it will be the producer in the next epoch, or because it is configured to track all shards.

Contributor Author:
care_about_shard also checks tracked shards if is_me is true:

match self.tracked_config {
    TrackedConfig::AllShards => {
        // Avoid looking up EpochId as a performance optimization.
        true
    }
    _ => self.tracks_shard(shard_id, parent_hash).unwrap_or(false),
}
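
In other words, when the last argument (is_me) is true, the TrackedConfig::AllShards arm short-circuits to true and the fallback arm consults tracks_shard, so shards tracked by configuration are counted as well, not only the shards the client is currently assigned to.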

@@ -350,17 +355,35 @@ pub fn check_state_shard_uid_mapping_after_resharding(
epoch_config.shard_layout.get_children_shards_uids(parent_shard_uid.shard_id()).unwrap();

Contributor:
nit, though not your code: it's more typical to get the shard layout directly from the epoch manager instead of going through the epoch config.

Contributor Author:
fixed in fa8e29e

Comment on lines 346 to 347
prev_block_hash: &CryptoHash,
resharding_block_hash: &CryptoHash,

Contributor:
It feels strange to pass prev_block_hash and resharding_block_hash, yet get the tip from the client on the first line of this method. Mixing a "pure"-ish function approach with a "stateful" approach is asking for trouble.

Contributor Author:
fixed in fa8e29e

codecov bot commented Jan 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.65%. Comparing base (0dc410a) to head (c617d07).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12706      +/-   ##
==========================================
+ Coverage   70.48%   70.65%   +0.16%     
==========================================
  Files         848      848              
  Lines      173673   173788     +115     
  Branches   173673   173788     +115     
==========================================
+ Hits       122418   122792     +374     
+ Misses      46143    45866     -277     
- Partials     5112     5130      +18     

| Flag | Coverage Δ |
| --- | --- |
| backward-compatibility | 0.16% <ø> (-0.01%) ⬇️ |
| db-migration | 0.16% <ø> (?) |
| genesis-check | 1.36% <ø> (?) |
| linux | 69.19% <0.00%> (+0.05%) ⬆️ |
| linux-nightly | 70.26% <100.00%> (+<0.01%) ⬆️ |
| pytests | 1.66% <ø> (+1.49%) ⬆️ |
| sanity-checks | 1.47% <ø> (?) |
| unittests | 70.48% <100.00%> (-0.01%) ⬇️ |
| upgradability | 0.20% <ø> (?) |

Flags with carried forward coverage won't be shown.


@Trisfald (Contributor) left a comment:
🚀

@staffik enabled auto-merge January 10, 2025 10:19
@staffik force-pushed the stafik/resharding/check-mapping branch from 7f6d7bb to e9a8b26 on January 10, 2025 13:14
@staffik force-pushed the stafik/resharding/check-mapping branch from e9a8b26 to 73d28fd on January 10, 2025 13:43
}

#[test]
fn test_resharding_v3_stop_track_child_for_2_epochs() {

Contributor:
Nice!

@@ -692,6 +747,8 @@ fn test_resharding_v3_double_sign_resharding_block() {
}

#[test]
// TODO(resharding): fix nearcore and un-ignore this test
#[ignore]

Contributor:
What happened with test_resharding_v3_shard_shuffling and test_resharding_v3_shard_shuffling_intense? Which change made them fail?

Contributor Author:
Increasing the test duration. It looks like there is some bug; I will describe it soon.

@staffik disabled auto-merge January 10, 2025 14:31
@Trisfald self-requested a review January 10, 2025 15:01

@Trisfald (Contributor) commented:
At this stage of the PR we have new failing tests, and different options for how to approach this, for instance:

  1. keep the tests ignored and fix them later
  2. somehow use a shorter test duration for some tests to avoid failures, and investigate further in another task
  3. revert to using a shorter test duration everywhere (if it makes sense)
  4. fix all issues in this PR

Personally, the only one I don't like much is 1.
