Fix issue 679 #684
Conversation
Some of the logging added in this PR is helpful, but most of it is only useful for debugging this bug. It is also still hard to read logs from the integration tests when you cannot easily tell which line comes from which node. I'll remove most of the logs and make a separate PR for some logging enhancements.
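A hedged sketch of one way that logging enhancement could work: tracing spans can tag every log line with the node that emitted it. The run_node function and field names below are hypothetical, not code from this repo.

```rust
use tracing::{info, instrument};

// Hypothetical per-node task: #[instrument] opens a span named "node" that
// records the node_id argument, so every log line emitted inside it is
// prefixed with the span and its node_id by the default fmt subscriber.
#[instrument(name = "node")]
async fn run_node(node_id: u32) {
    info!("starting protocol loop");
    // ... node logic would go here ...
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt().init();
    tokio::join!(run_node(0), run_node(1), run_node(2));
}
```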
```diff
@@ -407,7 +407,7 @@ impl PresignatureManager {
         };
         match action {
             Action::Wait => {
-                tracing::debug!("waiting");
+                tracing::trace!("waiting");
```
These are too noisy in the debug log.
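For context, a minimal standalone sketch of why demoting a line from debug! to trace! quiets the output, assuming the standard tracing / tracing_subscriber setup rather than this repo's exact logging config:

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Reads the RUST_LOG environment variable (requires tracing_subscriber's
    // "env-filter" feature).
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    tracing::debug!("shown when RUST_LOG=debug");
    tracing::trace!("waiting"); // shown only when RUST_LOG=trace
}
```

Running with RUST_LOG=debug prints only the first line; the "waiting" line appears only when trace level is enabled for its target.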
```diff
@@ -320,3 +320,31 @@ async fn test_multichain_reshare_with_lake_congestion() -> anyhow::Result<()> {
     })
     .await
 }
+
+#[test(tokio::test)]
+async fn test_multichain_reshare_with_first_node_leave() -> anyhow::Result<()> {
```
Should we extend the existing test with one more resharing instead, so as not to make our test runs longer?
Sounds good
It's not possible to expand the existing remove-last-participant test due to #699. With 3 nodes, removing a node and adding a new one fails because of #699, so a test can only remove one participant. With 5 nodes and threshold 3, a test that removes two nodes passes locally and should work on nightly, but it takes too much RAM to run on regular CI. I'll merge these tests into one after figuring out #699.
chain-signatures/contract/src/lib.rs (outdated)
```diff
@@ -646,7 +646,7 @@ impl VersionedMpcContract {
         Self::V0(MpcContract {
             protocol_state: ProtocolContractState::Running(RunningContractState {
                 epoch,
-                participants: Participants { participants },
+                participants: Participants::from_init_participants(participants),
```
Would the participant IDs stay the same when we move from one contract to another?
Yes, this assigns each participant an id starting from 0, in order, the same as the participant ids before this change.
So this probably won't work for migrations from one contract to another with an already running MPC network that has had multiple reshares. Participants::from_init_participants will create a different participant id mapping than the one the MPC nodes have, because it just enumerates the items and uses each one's position. We should add a new parameter so that the account_id_to_participant value can be passed into init_running, letting us get the same mapping.
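To make the concern concrete, here is a small standalone sketch, using simplified stand-in types rather than the contract's real definitions, of how re-enumerating participants produces a different id mapping than the one a running network already holds:

```rust
use std::collections::BTreeMap;

type AccountId = String;
type ParticipantId = u32;

fn main() {
    // Mapping a running network might hold after a reshare in which the
    // participant with id 0 already left: ids 1 and 2 were preserved.
    let existing: BTreeMap<AccountId, ParticipantId> =
        BTreeMap::from([("bob.near".into(), 1), ("carol.near".into(), 2)]);

    // An enumerate-based constructor reassigns ids by position instead,
    // giving bob.near=0 and carol.near=1, which is not what the nodes expect.
    let reassigned: BTreeMap<AccountId, ParticipantId> = existing
        .keys()
        .cloned()
        .enumerate()
        .map(|(i, account)| (account, i as ParticipantId))
        .collect();

    assert_ne!(existing, reassigned);
}
```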
Ah got it! Good point
I dropped Participants::from_init_participants; now init_running just takes a Participants struct, which a node admin can get by viewing VersionedMpcContract::state(). @volovyks ptal. There is a new audit issue that will be addressed separately.
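A rough, self-contained sketch of the resulting migration flow; every type and function body below is a simplified stand-in based on this thread, not the contract's verbatim API:

```rust
use std::collections::BTreeMap;

// Simplified stand-in for the contract's Participants struct: the account id
// to participant id mapping plus the next free id.
#[derive(Clone, Debug, PartialEq)]
struct Participants {
    account_to_participant_id: BTreeMap<String, u32>,
    next_id: u32,
}

// Stand-in for reading VersionedMpcContract::state() from the old contract.
fn state_of_old_contract() -> Participants {
    Participants {
        account_to_participant_id: BTreeMap::from([
            ("bob.near".into(), 1),
            ("carol.near".into(), 2),
        ]),
        next_id: 3,
    }
}

// Stand-in for the new contract's init_running: it takes the Participants
// struct as-is, so every node keeps the ParticipantId it already has.
fn init_running(participants: Participants) -> Participants {
    participants
}

fn main() {
    let old = state_of_old_contract();
    let migrated = init_running(old.clone());
    assert_eq!(old, migrated); // ids survive the migration unchanged
}
```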
LGTM!
Fix #679.
There is a bug in resharing where only the last node can be voted out. Removing any other node causes resharing to fail and the nodes to get stuck. The root cause is that cait-sith requires each node to keep its original ParticipantId (u32) across resharing, but in our contract->node flow, when a non-last node is removed the remaining participants were reassigned participant ids 0, 1, ..., k-1 instead of keeping their original ids. Playing with cait-sith's unit tests, I confirmed that ParticipantIds do not need to be consecutive: as long as they are unique and each remaining ParticipantId maps to its original keyshare, resharing succeeds.

Therefore, the fix is to keep track of each account id and map it to a participant id, and also keep track of the next available id. If a new participant joins, it gets a new id. If a participant leaves, its id becomes vacant and the remaining participants keep their ids unchanged. If a previous participant rejoins, it gets its original id back.
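A rough, self-contained sketch of that bookkeeping; the field and method names below are illustrative, not the contract's exact code:

```rust
use std::collections::{BTreeMap, BTreeSet};

type AccountId = String;
type ParticipantId = u32;

/// Illustrative participant bookkeeping: ids are assigned once and never
/// reassigned, even across leaves and rejoins.
#[derive(Default)]
struct Participants {
    account_to_participant_id: BTreeMap<AccountId, ParticipantId>,
    active: BTreeSet<AccountId>,
    next_id: ParticipantId,
}

impl Participants {
    /// A joining account gets its previous id back if it ever had one;
    /// otherwise it gets the next available id.
    fn join(&mut self, account: AccountId) -> ParticipantId {
        let id = match self.account_to_participant_id.get(&account) {
            Some(&id) => id,
            None => {
                let id = self.next_id;
                self.next_id += 1;
                self.account_to_participant_id.insert(account.clone(), id);
                id
            }
        };
        self.active.insert(account);
        id
    }

    /// A leaving account only becomes inactive: its id stays reserved (vacant)
    /// and the remaining participants keep their ids unchanged.
    fn leave(&mut self, account: &AccountId) {
        self.active.remove(account);
    }
}

fn main() {
    let mut p = Participants::default();
    assert_eq!(p.join("alice.near".into()), 0);
    assert_eq!(p.join("bob.near".into()), 1);
    assert_eq!(p.join("carol.near".into()), 2);

    p.leave(&"alice.near".to_string()); // bob and carol keep ids 1 and 2
    assert_eq!(p.join("dave.near".into()), 3); // new participant gets a fresh id
    assert_eq!(p.join("alice.near".into()), 0); // rejoining gets its original id
}
```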
In order to debug this issue, I added a lot of logging. I'm keeping only the most important pieces: state transitions are now logged at debug level, and some less useful debug logging was moved down to trace.