
Add experimental-max-learners flag #13377

Merged
merged 1 commit into from
Nov 15, 2021

Conversation

hexfusion
Contributor

@hexfusion hexfusion commented Sep 30, 2021

This PR adds support for adjusting maxLearners (currently hardcoded as 1) via the configuration flag --experimental-max-learners. Because the value is a runtime configuration, care was taken to ensure proper validation and reduce unexpected situations where the value is not set equally among all members. While it is technically possible to bootstrap a cluster with different values, this is no different from other important runtime configurations such as the heartbeat interval. In general, I don't see a direct need for dynamic reconfiguration during runtime. While I understand a general desire to limit learner counts from a performance standpoint, I can't see a reason to change this value often enough to justify persisting it to disk and exposing it via the API.

key points:

  • the default is unchanged: maxLearners=1
  • the flag is experimental

possible scenarios and expectations

  • An existing cluster has N learners (--experimental-max-learners=N) and the operator would like to reduce the configuration to N-1. In this case a learner must be promoted or removed, bringing the learner count down, before etcd will start with the new configuration; otherwise startup fails with ErrTooManyLearners.

  • An existing cluster has N learners (--experimental-max-learners=N) and a new member has just been added. The runtime configuration is then set to --experimental-max-learners=N-1. etcd will fail to start with ErrTooManyLearners until enough learners are promoted that the current learner count satisfies the configuration.

  • An existing cluster has N learners (--experimental-max-learners=N) and the operator would like to add another learner (N+1). This will result in the client receiving ErrTooManyLearners.

use cases:

  • faster and safer cluster bootstrap: parallel rather than serial addition of members during scale-up, with no quorum loss when scaling from 1 -> 2
  • horizontal and vertical scaling
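The parallel-bootstrap use case might look like the following. This is an illustrative sketch only: the --experimental-max-learners flag and etcdctl's `member add --learner` are real, but the names, URLs, and ports are example values, not a tested recipe.

```shell
# Start the first voting member with a raised learner limit (example URLs).
etcd --name infra0 \
     --experimental-max-learners=2 \
     --listen-client-urls http://127.0.0.1:2379 \
     --advertise-client-urls http://127.0.0.1:2379 &

# With max-learners=2, both learners can be registered without waiting for
# the first to be promoted, i.e. parallel rather than serial scale-up.
etcdctl member add infra1 --learner --peer-urls=http://127.0.0.1:12380 &
etcdctl member add infra2 --learner --peer-urls=http://127.0.0.1:22380 &
wait
```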

@hexfusion hexfusion marked this pull request as draft September 30, 2021 17:06
@hexfusion hexfusion added the WIP label Sep 30, 2021
@hexfusion
Contributor Author

cc @ptabor @gyuho @serathius this is still very early on, but I wanted to get your input on the approach before I went any further. tl;dr: an admin should have the ability to define the number of learners allowed in cluster membership.

@hexfusion hexfusion force-pushed the add-learner-limit-flag branch 8 times, most recently from f32a1d2 to 4bb0b51 Compare November 3, 2021 21:24
@hexfusion hexfusion removed the WIP label Nov 3, 2021
@hexfusion hexfusion marked this pull request as ready for review November 3, 2021 21:24
@hexfusion hexfusion changed the title Add max-learner flag Add experimental-max-learner flag Nov 3, 2021
Contributor

@hasbro17 hasbro17 left a comment


@hexfusion hexfusion changed the title Add experimental-max-learner flag Add experimental-max-learners flag Nov 8, 2021
@hexfusion
Contributor Author

@serathius @ptabor @chaochn47 PTAL

Member

@spzala spzala left a comment


Nice work @hexfusion, I have a couple of comments but lgtm otherwise. Thanks!

@hexfusion hexfusion force-pushed the add-learner-limit-flag branch from 4bb0b51 to 8a160dc Compare November 9, 2021 13:57
@hexfusion hexfusion force-pushed the add-learner-limit-flag branch from 8a160dc to 63a1cc3 Compare November 9, 2021 14:52
@hexfusion
Contributor Author

@spzala updated based on your comments.

Member

@spzala spzala left a comment


lgtm
Thanks for quickly addressing my comments @hexfusion

@hexfusion
Contributor Author

hexfusion commented Nov 11, 2021

cc @ptabor @serathius any thoughts here?

@serathius
Member

serathius commented Nov 15, 2021

Looks great. One thought about configuration that is provided as a local flag: it could be problematic when misconfigured (for example, a misconfigured max-learners could cause hard-to-debug behavior on leader change). Do we have a way for users to detect such cases? For example, I imagine we could expose a metric with a hash of the subset of configuration that is expected to match across the cluster. This way users can create an alert to detect misconfiguration.

This would also help with supportability if we ask users to verify their cluster configuration in Issue template.
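The metric idea could be sketched roughly as follows. Everything here is hypothetical: this PR does not implement the metric, and the flag subset, hash choice, and function names are illustrative (in practice the hash would be exposed as a Prometheus gauge).

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// configHash hashes the subset of flags that are expected to be identical
// on every member. max-learners and heartbeat-interval are used here purely
// as examples of cluster-wide settings.
func configHash(maxLearners int, heartbeatMs int) uint32 {
	s := fmt.Sprintf("max-learners=%d;heartbeat-interval=%d", maxLearners, heartbeatMs)
	return crc32.ChecksumIEEE([]byte(s))
}

func main() {
	// Two members with identical flags produce the same hash...
	fmt.Println(configHash(2, 100) == configHash(2, 100))
	// ...while a member with a different max-learners stands out, so an
	// alert can fire when the per-member hashes disagree.
	fmt.Println(configHash(2, 100) == configHash(1, 100))
}
```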

@hexfusion
Contributor Author

hexfusion commented Nov 15, 2021

> Looks great. One thought about configuration that is provided as a local flag: it could be problematic when misconfigured (for example, a misconfigured max-learners could cause hard-to-debug behavior on leader change). Do we have a way for users to detect such cases? For example, I imagine we could expose a metric with a hash of the subset of configuration that is expected to match across the cluster. This way users can create an alert to detect misconfiguration.
>
> This would also help with supportability if we ask users to verify their cluster configuration in the issue template.

Appreciate the input, I think that is a great idea. If you don't mind I would like that to be a follow-up PR; I will start work on it this week.

@serathius
Member

> Appreciate the input, I think that is a great idea. If you don't mind I would like that to be a follow-up PR; I will start work on it this week.

Sure, I was treating this as a separate feature.

@hexfusion hexfusion merged commit 29c3b0f into etcd-io:main Nov 15, 2021