
Fix issue 679 #684

Merged: volovyks merged 15 commits into develop from fix-679 on Jul 23, 2024

Conversation

@ailisp (Member) commented Jul 16, 2024

Fix #679.

There is a bug in resharing where only the last node can be voted to be kicked. Removing any other node causes resharing to fail and the nodes to get stuck. The root cause is that cait-sith requires each node to keep its original ParticipantId (u32) during resharing, but in our contract->node flow, if a non-last node is removed, the remaining participants are reassigned participant ids 0, 1, ..., k-1. The original ids should be preserved instead.

Playing with cait-sith's unit tests, I confirmed that ParticipantIds don't need to be consecutive: as long as the ParticipantIds are unique and each remaining ParticipantId maps to its original keyshare, the reshare will succeed.

Therefore, the fix is to keep track of each account id and map it to a participant id, along with the next available id. If a new participant joins, it gets a new id. If a participant leaves, its id becomes vacant and the remaining participants keep their ids unchanged. If a previous participant rejoins, it gets its original id back. A sketch of this scheme is shown below.
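
To make the scheme concrete, here is a minimal Rust sketch of the id-assignment logic described above (the type and method names are illustrative assumptions, not the contract's actual API):

```rust
use std::collections::BTreeMap;

type ParticipantId = u32;
type AccountId = String;

/// Hypothetical sketch of the id-assignment scheme described in this PR;
/// the real contract's `Participants` type may differ in names and layout.
#[derive(Default)]
struct Participants {
    /// Current participants: account id -> stable participant id.
    participants: BTreeMap<AccountId, ParticipantId>,
    /// Every id ever assigned, so a rejoining account gets its original id back.
    account_to_participant_id: BTreeMap<AccountId, ParticipantId>,
    /// Next id to hand out to a brand-new participant.
    next_id: ParticipantId,
}

impl Participants {
    fn insert(&mut self, account: AccountId) -> ParticipantId {
        // A returning participant keeps its original id; a new one gets `next_id`.
        let id = match self.account_to_participant_id.get(&account) {
            Some(&id) => id,
            None => {
                let id = self.next_id;
                self.next_id += 1;
                self.account_to_participant_id.insert(account.clone(), id);
                id
            }
        };
        self.participants.insert(account, id);
        id
    }

    fn remove(&mut self, account: &AccountId) {
        // Removing a participant leaves its id vacant; the others keep their ids.
        self.participants.remove(account);
    }
}

fn main() {
    let mut p = Participants::default();
    for node in ["node0", "node1", "node2"] {
        p.insert(node.to_string());
    }
    // Kicking a non-last node ("node1") leaves the remaining ids (0 and 2) untouched.
    p.remove(&"node1".to_string());
    // A new joiner gets a fresh id (3) instead of reusing the vacant 1 ...
    assert_eq!(p.insert("node3".to_string()), 3);
    // ... and if "node1" rejoins later, it gets its original id back.
    assert_eq!(p.insert("node1".to_string()), 1);
}
```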

In order to debug this issue, I added a lot of logging. I'm leaving the most important pieces in place: state transitions are logged at debug level, and some less useful debug logging was demoted to trace.

@ailisp ailisp marked this pull request as ready for review July 17, 2024 04:45
@ailisp ailisp changed the title from "wip fix issue 679" to "Fix issue 679" on Jul 17, 2024
@ailisp (Member, Author) commented Jul 17, 2024

Some of the logging added in this PR is generally helpful, but most of it is only useful for debugging this specific bug. It's also still hard to read logs from integration tests when you can't easily tell which line comes from which node. I'll remove most of the logs and open a separate PR with some logging enhancements.

@ailisp ailisp marked this pull request as draft July 17, 2024 08:03
@@ -407,7 +407,7 @@ impl PresignatureManager {
};
match action {
Action::Wait => {
-tracing::debug!("waiting");
+tracing::trace!("waiting");

ailisp (Member, Author):

These are too noisy in the debug log.

@ailisp ailisp marked this pull request as ready for review July 18, 2024 08:22
@@ -320,3 +320,31 @@ async fn test_multichain_reshare_with_lake_congestion() -> anyhow::Result<()> {
})
.await
}
+
+#[test(tokio::test)]
+async fn test_multichain_reshare_with_first_node_leave() -> anyhow::Result<()> {

Collaborator:

Should we extend the existing test with one more resharing? That way we don't make our test runs longer.

ailisp (Member, Author):

Sounds good

ailisp (Member, Author):

It's not possible to expand the remove-last-participant test because of #699. With 3 nodes, removing one and adding a new one fails due to #699, so we can only remove one participant per test. With 5 nodes and threshold 3, the remove-two-nodes test passed locally and should work on nightly, but it takes too much RAM to run on regular CI. I'll merge these tests into one after figuring out #699.

@@ -646,7 +646,7 @@ impl VersionedMpcContract {
Self::V0(MpcContract {
protocol_state: ProtocolContractState::Running(RunningContractState {
epoch,
-participants: Participants { participants },
+participants: Participants::from_init_participants(participants),

Collaborator:

Would the participant Ids stay the same when we move from one contract to another?

ailisp (Member, Author):

Yes, this assigns each participant an id starting from 0, in order, which is the same as the participant ids before this change.

Member:

So this probably won't work for migrations from one contract to another with an already-running MPC network that has had multiple reshares. Participants::from_init_participants will create a different participant id mapping than the ones the MPC nodes have, because it just enumerates over each item and uses its position.

We should add a new parameter so the account_id_to_participant value can be passed into init_running, so we end up with the same mapping.

ailisp (Member, Author):

Ah got it! Good point

ailisp (Member, Author):

I dropped Participants::from_init_participants; now init_running just takes a Participants struct, which a node admin can get by viewing VersionedMpcContract::state().
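
For context, here is a small hedged sketch of the mapping concern above (the helper and names are illustrative, not the contract's real API): building the map by enumeration assigns ids by position, which can disagree with the ids a running network holds after a reshare, whereas passing the existing Participants mapping preserves them.

```rust
use std::collections::BTreeMap;

type ParticipantId = u32;
type AccountId = String;

// Illustrative only: building the mapping by enumeration assigns ids 0, 1, 2, ...
// by position, losing any gaps introduced by earlier reshares.
fn from_init_participants(accounts: Vec<AccountId>) -> BTreeMap<AccountId, ParticipantId> {
    accounts
        .into_iter()
        .enumerate()
        .map(|(i, account)| (account, i as ParticipantId))
        .collect()
}

fn main() {
    // Suppose a running network holds ids {alice: 0, carol: 2} after the node
    // with id 1 was voted out in an earlier reshare.
    let running: BTreeMap<AccountId, ParticipantId> =
        BTreeMap::from([("alice".to_string(), 0), ("carol".to_string(), 2)]);

    // Re-enumerating on migration would give carol id 1, disagreeing with her node.
    let reenumerated = from_init_participants(vec!["alice".to_string(), "carol".to_string()]);
    assert_ne!(running, reenumerated);

    // Hence the change above: init_running takes the existing Participants mapping
    // (viewable via VersionedMpcContract::state()) as-is instead of rebuilding it.
}
```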

@volovyks volovyks requested review from ChaoticTempest and ppca July 19, 2024 10:54
@ailisp (Member, Author) commented Jul 22, 2024

@volovyks ptal. There is a new audit issue that will be addressed separately.

ppca previously approved these changes Jul 22, 2024

@ppca (Contributor) left a comment:

LGTM!

ChaoticTempest previously approved these changes Jul 22, 2024
@ailisp ailisp dismissed stale reviews from ChaoticTempest and ppca via 357c6a3 July 23, 2024 01:08
@volovyks volovyks merged commit ce157c4 into develop Jul 23, 2024
4 checks passed
@volovyks volovyks deleted the fix-679 branch July 23, 2024 12:36

Terraform Feature Environment Destroy (dev-684)

Terraform Initialization ⚙️ success

Terraform Destroy success

Destroy Plan:


No changes. No objects need to be destroyed.

Either you have not created any objects yet or the existing objects were
already deleted outside of Terraform.

Destroy complete! Resources: 0 destroyed.

Pusher: @volovyks, Action: pull_request, Working Directory: ``, Workflow: Terraform Feature Env (Destroy)

Successfully merging this pull request may close these issues:

Vote to kick the non last node will stuck in Resharing (#679)