
[WIP] malus-collator: implement initial version of malicious collator submitting same collation to all backing groups #6924

Draft
wants to merge 17 commits into base: master

Conversation

@sw10pa (Member) commented Dec 17, 2024

Issues

Description

Modified the undying collator to include a malus mode, in which it submits the same collation to all assigned backing groups.

TODO

…or submitting same collation to all backing groups
@sw10pa sw10pa added the T10-tests This PR/Issue is related to tests. label Dec 17, 2024
@sw10pa sw10pa requested review from alindima and sandreim December 17, 2024 12:17
@sw10pa sw10pa self-assigned this Dec 17, 2024
@alindima (Contributor) left a comment:

Looking good overall!

Comment on lines +396 to +400
```rust
let scheduled_cores: Vec<CoreIndex> = claim_queue
    .iter()
    .filter_map(move |(core_index, paras)| {
        Some((*core_index, *paras.get(claim_queue_offset.0 as usize)?))
    })
```
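The snippet under review keeps, for each core, the claim at the given offset into that core's claim queue. As a self-contained sketch of the same logic (using plain `u32` stand-ins for polkadot's `CoreIndex`/`ParaId` primitives, which are assumptions here, not the real types):

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for polkadot's CoreIndex / ParaId newtypes.
type CoreIndex = u32;
type ParaId = u32;

/// Return the (core, para) claims found at `offset` into each core's
/// claim queue, mirroring the filter_map in the quoted snippet: cores
/// whose queue is shorter than `offset + 1` are silently skipped.
fn claims_at_offset(
    claim_queue: &BTreeMap<CoreIndex, Vec<ParaId>>,
    offset: usize,
) -> Vec<(CoreIndex, ParaId)> {
    claim_queue
        .iter()
        .filter_map(|(core_index, paras)| Some((*core_index, *paras.get(offset)?)))
        .collect()
}

fn main() {
    let mut cq = BTreeMap::new();
    cq.insert(0, vec![2000, 2000]);
    cq.insert(1, vec![2000]); // queue too short for offset 1
    // At offset 1 only core 0 still has a claim.
    println!("{:?}", claims_at_offset(&cq, 1)); // [(0, 2000)]
}
```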
Contributor:

Have a look at ClaimQueueSnapshot to reuse some of the code there with iter_claims_at_depth.

Member Author:

I wanted to use iter_claims_at_depth, but by default there is no dependency on that crate and I can't use ClaimQueueSnapshot, so I either have to add a dependency or copy the code (which is what I did in this case).

```
charlie: js-script ./assign-core.js with "2,2000,57600" return is 0 within 600 seconds

# Ensure parachains made progress.
alice: parachain 2000 block height is at least 10 within 300 seconds
```
Contributor:

We need to add some better assertions, including log checks that validators indeed received invalid candidates.
Also, if you want this test to run in CI, it needs to be added to the GitLab config in polkadot.yml.
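As a rough sketch of what such a log check might look like in the same DSL (the node name and log pattern here are placeholders, not the actual lines the validators emit; the real pattern would need to match the validators' invalid-candidate log output):

```
# Hypothetical assertion: a validator reported rejecting an invalid candidate.
alice: log line contains "Invalid candidate" within 300 seconds
```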

```
@@ -0,0 +1,54 @@
[settings]
```
Contributor:

The DSL is deprecated; we have a goal to port tests to the Zombienet SDK.

```
@@ -86,6 +90,8 @@ pub struct GraveyardState {
    pub zombies: u64,
    // Grave seal.
    pub seal: [u8; 32],
    // Increasing sequence number for core selector.
    pub core_selector_number: u8,
```
Contributor:

I was expecting we would use the block number (LSB); is there anything preventing us from doing so?

Member Author:

Since we use upward messages to place UMP signals and specify the core index, we needed this core_selector_number for the core selector. Do you mean doing something else?

Contributor:

Andrei means that we typically use the block number for the core selector in real parachains.
The undying collator's state does not store the block number, so this core_selector_number is effectively the block number.
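The block-number-LSB convention mentioned above can be sketched as follows (a minimal stand-alone example, not the collator's actual code):

```rust
/// Derive a core selector from a block number by taking its least
/// significant byte, as real parachains typically do.
fn core_selector_from_block_number(block_number: u64) -> u8 {
    (block_number & 0xff) as u8
}

fn main() {
    // The selector wraps every 256 blocks.
    assert_eq!(core_selector_from_block_number(0), 0);
    assert_eq!(core_selector_from_block_number(258), 2);
    println!("ok");
}
```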

```rust
}

// Wait before repeating the process.
sleep(Duration::from_secs(6 as u64));
```
Contributor:

Why do we need to wait 6s? If the relay chain skips a slot, we build another candidate, which is different from what the honest collator does. We should be building on each relay parent we import.

Member Author:

Do you mean no waiting at all, or should the waiting time somehow depend on the imports?

Contributor:

The current code very roughly mimics what the slot-based collator does (every slot, pick the best relay chain block). What Andrei is suggesting is to simulate what the lookahead collator does and subscribe to relay chain block import notifications instead. That would indeed be closer to what the honest collator is doing, but I think for a test collator either is fine.

@paritytech-workflow-stopper:
All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12946769122
Failed job name: test-linux-stable-no-try-runtime

```rust
    target: LOG_TARGET,
    "Persisted validation data is None."
);
Self::wait_for_next_slot();
```
Contributor:

Instead of duplicating this on every continue, structure the code so that you can move the sleep to the top of the loop.
Alternatively, to avoid these sleeps entirely, we can do as Andrei suggested: subscribe to block import notifications and build in response to them (which is what the non-malicious undying collator does).

Labels: T10-tests This PR/Issue is related to tests.
Projects: Status: Backlog
3 participants