Fault tolerant proposer election and sign subset creation #228
Comments
First of all, what kind of fault tolerance are we aiming for? Just solve the problem of nodes being offline? Or full fault tolerance? If you just want fault tolerance against the simple case of nodes being offline sometimes, yeah we can add a patchwork solution to fix it. Your suggestion to increase the But this would still not be fault tolerant against other problems.
To resolve such problems for good, Byzantine fault tolerance would be the gold standard. There are different ways to achieve it. I suggest we select redundant participants and ensure the protocol can finish with only a subset of them responding. This would make things more robust (fault tolerant) and possibly improve performance, since we wouldn't have to wait for the slowest participant. But for Byzantine fault tolerance to work as I intend it, we would need to check whether the underlying cryptographic protocol is suitable for such a setup. If not, maybe we can brute-force our way to redundancy in other ways, such as starting multiple protocol invocations in parallel with different subsets. Obviously, that brute-force approach would come at the cost of signature processing bandwidth.
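To make the "redundant participants, finish with a subset" idea concrete, here is a minimal networking-level sketch. It only shows the waiting logic: ask more participants than needed and continue as soon as `threshold` of them have answered. The name `collect_first` and the shape of `responders` are hypothetical, and whether the underlying cryptographic protocol tolerates ignoring the slow or failed participants is exactly the open question raised above.

```python
import asyncio

async def collect_first(responders, threshold):
    """Return the first `threshold` successful responses as {participant_id: response}.

    `responders` maps a participant id to a zero-argument coroutine function
    (a hypothetical stand-in for "send the request and await the reply").
    """
    tasks = {asyncio.create_task(fn()): pid for pid, fn in responders.items()}
    pending = set(tasks)
    done = {}
    try:
        while pending and len(done) < threshold:
            finished, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in finished:
                # Failed participants are skipped, not retried.
                if task.exception() is None and len(done) < threshold:
                    done[tasks[task]] = task.result()
    finally:
        # Stop waiting for slow or offline participants.
        for task in pending:
            task.cancel()
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)
    if len(done) < threshold:
        raise RuntimeError("not enough participants responded")
    return done
```

With four candidates and a threshold of two, one offline node and one failing node do not block the round. The brute-force alternative (several full protocol invocations in parallel over different subsets) could reuse the same waiting pattern, with each responder being a whole invocation.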
Using all participants would lead to a higher failure rate, wouldn't it? I think only including participants we know to be responsive at the moment makes sense and we should keep it. Unless I'm misunderstanding or missing something. I suggest solving this by letting the proposer decide on the list of participants and letting everyone know about it in the initial message. That's a very simple concept that makes it obviously true that everyone uses the same list of participants at all times.
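A rough sketch of the "proposer announces the list" idea, assuming a hypothetical `InitMessage` type and hypothetical `propose`/`accept` helpers (none of these names come from the codebase). The proposer picks from the peers it currently sees as responsive; everyone else adopts the list from the message instead of computing their own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InitMessage:
    # Hypothetical wire format for the first message of a signing round.
    sign_id: str
    proposer: str
    participants: tuple

def propose(sign_id, me, responsive, threshold):
    """Proposer side: choose `threshold` participants, always including itself."""
    others = sorted(p for p in responsive if p != me)
    if len(others) + 1 < threshold:
        raise ValueError("not enough responsive participants")
    return InitMessage(sign_id, me, (me, *others[: threshold - 1]))

def accept(msg, me, threshold):
    """Receiver side: adopt the proposer's list, or return None if it is unusable."""
    unique = len(set(msg.participants)) == len(msg.participants)
    valid = (
        unique
        and len(msg.participants) >= threshold
        and msg.proposer in msg.participants
        and me in msg.participants
    )
    return list(msg.participants) if valid else None
```

Because the list travels in the message, nodes no longer need to agree on their local view of who is responsive. This sketch does not cover how the proposer itself is chosen or what happens if the proposer is faulty.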
Pretty much this for now for our networking side of things.
I agree, will take a look to see how suitable BFT is for our system
Yeah, probably sticking to |
Currently, in `SignQueue`, we first create a `subset` of up to the threshold number of participants from all `stable` participants, and then elect a `proposer` from this `subset`.

This is not currently fault tolerant: if one of those nodes goes offline, that particular signature will not be completed unless the node comes back online. This is because when we retry the signature generation (for whatever reason, such as a timeout), the same `subset` of participants is used.

Additionally, we use `stable` to create this subset/proposer, which is not ideal because all nodes would have to agree on the `stable` set of participants. We should probably use all participants in this case to be as deterministic as possible.

We should come up with a reasonable way to elect a new `proposer` and `subset` in such cases. All nodes have access to `entropy`, and we can potentially use it along with `retry_count` to form the seed to elect the proposer/subset.

@jakmeier let me know your thoughts on this
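One possible shape for the seeded election, as a minimal sketch rather than a proposal for the actual implementation: hash `entropy` together with `retry_count` into a seed, rank every participant by a hash of the seed and its id, then take the first `threshold` entries. The function name `elect`, the use of SHA-256, and the byte encoding are all assumptions made for illustration.

```python
import hashlib

def elect(participants, threshold, entropy, retry_count):
    """Deterministically pick (subset, proposer) from the full participant list.

    Any node calling this with the same inputs gets the same result,
    regardless of the order in which it stores `participants`.
    """
    if threshold > len(participants):
        raise ValueError("threshold exceeds number of participants")

    # Seed = H(entropy || retry_count): a new retry gives a new ranking.
    seed = hashlib.sha256(entropy + retry_count.to_bytes(8, "big")).digest()

    def rank(participant_id):
        # Rank by H(seed || id); the id itself breaks any tie.
        digest = hashlib.sha256(seed + str(participant_id).encode()).digest()
        return digest, participant_id

    ranked = sorted(participants, key=rank)
    subset = sorted(ranked[:threshold])
    proposer = ranked[0]  # top-ranked participant, always inside the subset
    return subset, proposer
```

Note that a new `retry_count` reshuffles the ranking but does not guarantee that the offline node is dropped. A retry may pick it again, so several retries may be needed before a fully online subset is chosen.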