-
Notifications
You must be signed in to change notification settings - Fork 449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add shard plugin #877
Add shard plugin #877
Conversation
This adds a shard_seed that people can then shard on ala: ``` def in_shard(node, shard, shard_size) node['shard_seed'] % shard_size <= shard end ``` Which then enables: ``` if in_shard(node, 4, 100) # some new stuff else # some old stuff end ``` Which is roughly how we shard things. Other code to utilize this to follow.
OK now that lint passes... CC @chef/client-core |
This seems weirdly opinionated to have in Ohai vs. in a helper library. |
when :uuid | ||
get_dmi_thing(dmi, :uuid) | ||
end | ||
shard_seed Digest::MD5.hexdigest(data)[0...7].to_i(16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You aren't require
ing either of the digest libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why trim to 8 characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I'll add the requires.
Um, I don't remember why we did the 8 characters. Something something crytpo widths. Let me see if I can track that down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So best we can tell that 2^15 buckets seemed sufficient and I think we had overflowed a variable or something somewhere without that....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone know if this will be an issue with FIPS? ISTR that the md5 functions get removed if you have OpenSSL in FIPS mode (groan). Alternatively, just using SHA-2 probably won't hurt anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Digest::MD5
still works in FIPS mode, that's Ruby's internal implementation as opposed to OpenSSL's. It's still in C so probably just as fast. We could also use the Adler CRC implementation in the Zlib module if we didn't want either MD5 or SHA for some reason.
# limitations under the License. | ||
# | ||
|
||
def get_dmi_thing(dmi, thing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't declare a global helper method in a plugin file, either make is a true global helper or put it in the plugin class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm a bit inclined towards this being something that should be private... i'm definitely a little worried that once we put it out there people will expect it to work and that they'll nitpick on problems it has... i'm feeling very "no is temporary, yes is forever" on this one so inclined more towards no...
base = Ohai.config[:plugin][:shard_seed][:base] || :hostname | ||
data = case base | ||
when :hostname | ||
fqdn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fqdns can, and often are, nil because DNS. machinename would possibly be better, but you get into a religious warfare there with the short-hostname people who may have collisions in short hostnames across their infrastructure.
when :hostname | ||
fqdn | ||
when :serial | ||
get_dmi_thing(dmi, :serial_number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seen so many 0123456789
DMI serial numbers. mostly on silicon mechanics boxes, but also HPaq motherboard replacements.
when :serial | ||
get_dmi_thing(dmi, :serial_number) | ||
when :uuid | ||
get_dmi_thing(dmi, :uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd bet money this isn't guaranteed to be unique either -- of course i've also seen NIC cards with the same Mac address as well...
If we wanted to build this in at the Chef level it feels like it would be better to just generate a random node ID in the Chef Server and have it available on the node object. That removes all the issues with poor entropy sources for consistent hashing. |
Wow, so many comments:
|
@jaymzh I'm not saying we should require a server (solo4lyfe), I'm saying if we think this API is something we want to offer, it might be good to leave room in the design to support a random ID value instead of consistent hashing as it can be a good approach for a lot of people. In terms of code architecture that might imply this is better suited to being a method in That said, I still think this doesn't belong in Chef in general. This specific set of options makes sense for Facebook, but it doesn't seem like we can easily generalize this given every approach has a ton of caveats. Your example already showed this being wrapped by a helper method, is there a reason the logic for the hashing can't live in that helper method too? |
You're kidding right? This is the single most common request I get at chefconf and community summit. I've unofficially shown someone how to do this a hundred times! People really want to be able to rollout to percentages of fleets, or even percentages of environments. There's many companies now using the grocery-delivery model (which is one of the two officially-endorsed models by Chef) and in that model, this is very necessary and there's nothing facebook specific about that at all. Please stop assuming we're the only ones who want all our shit in code, or the only people who need to scale, it's simply not true. |
Nowhere did I say this isn't something people want to do, that's a strawman. I said this code doesn't seem generic enough to not be a footgun outside of Facebook's specific environment. Like if we wanted to make this more general we should probably have it set up that instead of using a single source of entropy, use them all by default and the config is which values to not put in the hash. That would at least mean that is any one of the values has enough entropy, things will work. I would still say it should live in |
|
Huh? I disagree, different people want their seed to be different things. If you want an |
Basically if @thommay or @adamhjk or someone with project-level guidance perms are on board with this as being a feature Chef would offer then I'm on board and happy to put my code where my mouth is and make an alternate impl with all the tweaks I suggested, but I would want someone to weigh in on if they are cool with this feature in general before I spend the time. |
I think you're making an awfully big deal over a tiny little ohai plugin that a ton of people regularly request... I made it configurable for a variety of environment and am happy to add additional options.... |
This is an obviously useful feature. We should add it - the only question should be what shape it should take, not whether it belongs. I would advocate for making it simple and seeing what things people actually need, rather than over-designing it to be generic up front. Leave yourself the option to extend it, but resist the urge to make it more complex in the name of future (undefined) genericness. |
Okay, so a quick counter-suggestion. class Chef::Node
def shard_seed
shard_seed_exclude = ArrayChef::Config[:shard_seed_exclude]).map {|c| c.split(/\s+|,|;|:/) }.flatten.map(&:to_s)
hash = Digest::MD5.new
(%w{name fqdn hostname serial uuid} - shard_seed_exclude).each do |src|
hash << case src
when "name"
name
when "fqdn"
self["fqdn"]
when :hostname
self["hostname"]
when "serial"
get_dmi_thing(self["dmi"], "serial_number")
when "uuid"
get_dmi_thing(self["dmi"], "uuid")
end
end
hash.hexdigest.to_i(16)
end
end Overall pretty similar but I think this will allow more growth over time :) |
So we're about to release a node function in a cookbook that uses the shard-seed from Ohai and provides the functions a reference in the original description. Having to specify excludes feels like an anti-pattern for a bunch of reasons... namely if you change the list, every shard I have changes and you roll back half my machines and forward my other machines. People should pick a seed and go. And :all should definitely not be the default. |
@jaymzh Hmm, that add-new-ones thing is a good point I didn't consider. My concern with picking any one of those sources as a default is we know it won't work well for at least some large number of users. We don't expose DMI data on Windows (that I can find) so that's out as a default. Hostname and FQDN we know a lot of people have issues with. The node name is better for chef-client users but is often unset for solo so same issues with hostname/fqdn. Maybe some kind of algorithmic choice? Another option would be to just do it purely randomly, write out a file with a UUID or similar big number to the |
How about if the config is an array of ones you want to use. FB can use |
I'm concerned that because DMI data isn't available on Windows, that reduces to just hostname of FQDN for Windows users. Is there something similar to DMI data we can plug in on Windows? I would bet there is a machine UUID we could pull out of WMI somewhere that would be similar. |
Sure. We can also make the default platform-specific easily. |
@coderanger - does this all look sane to you now? Did you see my question? |
Now with tests! |
I find it useful to have shard detection capability built into ohai. But the current implementation is very specific, and the interfaces are not generic enough. But looking at the code amount, we can address that later, it wont be large changes, and the exposed bits (provides) won't be altered. Minor nit: may be call the method get_dmi_thing to just dmi_property or somthing like that 👍 |
Fixing style again. |
@ranjib I'm wondering what else you'd want to see. You can choose what sources you want, it's easy to add more, and you can mix-n-match as necessary. |
BTW, I'm still interested in seeing if there's a better way to factor this so there's less repetition between the mac/linux code.... |
On the interface side, current implementation won't fail if the sharding scheme does not match to one of the defined one (switch case block has no default case). If we want custom source/scheme (e.g. cloud provider based sharding) we can't inject corresponding sharding logic . If the source was a map with keys being scheme name and lambda's being sharding logic, we could do that probably... just thinking out loud. |
I'm wary of having people pass in arbitrary lambdas from client.rb - but an |
OK, added failure for unrecognized sources and a test for that. |
OK, thanks to @coderanger I was able to factor the code out to be more DRY. I think this is about everything except for windows support... @coderanger are you willing to add that in a follow up? |
👍 I got the windows stuff and writing a DOC_CHANGES now :) Will be in a follow up PR and we've got a while until the next release soooo probably fine :) |
Sorry to post a question here. We noticed that this plugin is using MD5 and it would fail if the FIPS mode is enable(Have fips true in client.rb) and it failed the whole chef client run. Has anyone faced this issue? Have we considered to update it to support FIPS mode? |
@sonic0002 Please open a new issue for further discussions. |
This adds a shard_seed that people can then shard on ala:
Which then enables:
Which is roughly how we shard things. Other code to utilize this to follow.