-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design for BMC eventing API #167
Conversation
ac0e64e
to
d79c0a8
Compare
- The BMCEventSubscription must have a way to inject headers, to allow for tokens, | ||
basic auth, etc. | ||
|
||
#### Open Questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to decide when to register the notifications, the Ironic design allows them at any time, but I think we probably want to only configure them when a host is active
? And remove them when it's not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should tie the event subscription to the provisioning state. The goal downstream is to support monitoring hardware events. It doesn't matter whether the hosts being monitored are in use yet or not.
Does relying on Ironic to proxy the subscription calls mean that we need to wait for a host to be successfully registered? So the ironic node is in the "manageable" state? Can anything go wrong in that process that would affect provisioning but not affect the event subscription? Or does Ironic basically check the credentials and that's it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subscription it's an entity that could have its own state (active/not active), and I also agree should be independent by the current host provisioning state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should tie the event subscription to the provisioning state. The goal downstream is to support monitoring hardware events. It doesn't matter whether the hosts being monitored are in use yet or not.
Does relying on Ironic to proxy the subscription calls mean that we need to wait for a host to be successfully registered? So the ironic node is in the "manageable" state? Can anything go wrong in that process that would affect provisioning but not affect the event subscription? Or does Ironic basically check the credentials and that's it?
I believe it'd have to be in manageable or later, but not enroll or an error state. Is the right behavior to keep trying to create the event in Ironic until it succeeds once the BareMetalHost is registered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we know we have to not be in the enroll state, we could handle that cleanly with a retry w/o error. Other states should keep retrying and report an error if one is produced. Maybe we can reuse the backoff logic that @andfasano added so we don't thrash ironic with requests.
- The BMCEventSubscription must have a way to inject headers, to allow for tokens, | ||
basic auth, etc. | ||
|
||
#### Open Questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should tie the event subscription to the provisioning state. The goal downstream is to support monitoring hardware events. It doesn't matter whether the hosts being monitored are in use yet or not.
Does relying on Ironic to proxy the subscription calls mean that we need to wait for a host to be successfully registered? So the ironic node is in the "manageable" state? Can anything go wrong in that process that would affect provisioning but not affect the event subscription? Or does Ironic basically check the credentials and that's it?
headers: | ||
X-Event-Source: "metal3.io" | ||
headerRef: webhookBridgeAuth | ||
context: “SomeUserContext” |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about adding an enabled
field to turn on/off the subscription?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API is fairly simple, is there a reason to not just delete the CR if one isn't interested in an event anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to something more like a "pause", but I'd guess that CR deletion could be good enough to start
metadata: | ||
name: worker-1-events | ||
spec: | ||
hostRef: ostest-worker-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess here there's the implicit assumption that the event resource will be in the same namespace of the referenced host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about namespaces now that we can watch multiple in some cases. We probably wouldn't want someone setting up an event subscription to a host in another namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
spec: | ||
hostRef: ostest-worker-1 | ||
targetURI: https://events.apps.corp.example.com/webhook | ||
filters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the filter values common across vendors? Are they part of the Redfish standard or defined by Ironic or just free-form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at the values reported by Redfish and the ones reported by the HW vendor, what the HW vendor reports is a subset of what Redfish would accept.
http://redfish.dmtf.org/schemas/v1/Event.json#/definitions/EventType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EventType
is deprecated as of Redfish v1.5 and replaced by RegistryPrefix
and ResourceType
so we can't assume filters are values of the same type.
I expect the filters to take a generic JSON string like this:
'{EventType: [...], RegistryPrefix:[...], ResourceType:[...], EventFormatType: "x", OriginResources: [...]'
originResources: [...]
can be a separate field outside of the filters
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a link to the schema? This is more relevant to the Ironic design than here, we're an abstraction that sits on top so this is designed around their original proposal.
cc: @iurygregory @honza
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jzding I've checked a Dell R640 with FW 4.40.10.00
using RedfishVersion:1.9.0
, the information from EventService
(using v1.5.0 doesn't provide the RegistryPrefix
/RegistryPrefixes
so it's not supported) http://paste.openstack.org/show/806812/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filters won't be necessary, since HW vendors may not have the support for RegistryPrefix/RegistryPrefixes
we are not adding the EventTypes
to ironic API, by default we will be using Alert
option till HW vendors have the support for it.
For more details see the discussion: https://meetings.opendev.org/irclogs/%23openstack-ironic/%23openstack-ironic.2021-06-25.log.html#t2021-06-25T13:41:57
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed all references to filters.
context: “SomeUserContext” | ||
status: | ||
errorMessage: "" | ||
errorCount: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we've stayed away from Condition fields in the past because the advice was they were deprecated. I don't see any evidence of that in https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties though, and I do see a comment about "phases" being deprecated so I wonder if we got the wrong message at the time.
Would it make sense to use a Condition here to indicate whether the hook is Registered, to be more consistent with other k8s APIs? That doesn't give us the progressive back-off behavior, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@honza this comment seems to be outstanding, could you take look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current status implementation is slightly different
|
||
### Upgrade / Downgrade Strategy | ||
|
||
Not required, this is a new API being introduced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During an upgrade with this feature in use we will lose the Ironic database. How will we address that? We don't want to have duplicate registrations of web hooks on BMCs, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a field in the "Status" specifying if the event registration at the BMC was successful might solve this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real state is outside of the cluster (in the BMC), though, so if something else modifies the BMC configuration the status field would no longer reflect the real configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, @dtantsur @iurygregory How does the Ironic side handle this if we adopt a host? Will you be able to import the subscriptions into Ironic so we could check the API for existing subscriptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I do remember in the ironic API we will offer create/delete/list subscriptions, I will try to check if the BMC's would accept duplicate registrations, Ironic can also check if the node has a subscription that matches the one you are trying to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experimentation I found that the BMC would not report all of the details of subscriptions that were registered. I presume that was to protect values that might need to be kept private/secret. I had to encode a signature of the subscription in the context field so I could find the match again and avoid re-registering it. See https://github.com/dhellmann/redfish-event-controller/blob/main/controllers/redfish/eventsubscription_controller.go#L148 for details
- A BMCEventSubscription resource represents a subscription to the events generated | ||
by a specific BMC. | ||
- Ironic will manage configuring the subscription, using a new API for managing them. | ||
- The BMCEventSubscription with maintain a reference to a BareMetalHost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo with
-> will
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/uncc |
Let's have another look at this. Recent updates:
I like Doug's idea for using subscription signatures for the scenario when the Ironic database goes away. I wonder if using the context field makes sense though. Are we using it for two different purposes? Do we have to come up with some concrete encoding mechanism? Are we happy with the decision about provisioning states? The latest seems to be that we keep retrying until we succeed; perhaps with a jittery back-off to avoid overwhelming the system. |
|
||
### Dependencies | ||
|
||
[Ironic Eventing API](https://storyboard.openstack.org/#!/story/2008366) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Ironic Eventing API/ Ironic Vendor Passthru for Subscriptions
the link is https://storyboard.openstack.org/#!/story/2009061
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
## References | ||
|
||
- [Ironic Eventing API](https://storyboard.openstack.org/#!/story/2008366) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to update this to https://storyboard.openstack.org/#!/story/2009061
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
status: | ||
errorMessage: "" | ||
errorCount: 0 | ||
subscriptionID: aa618a32-9335-42bc-a04b-20ddeed13ade |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the ID that the BMC will give when we create a sbuscription? If yes, this is not required to be an UUID (Dell uses UUID, HP uses numbers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an arbitrary string. Is that sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@honza oh ok! I was wondering if could have some sort of validation to make sure it was an UUID, if is just a string it's ok. Thanks!
spec: | ||
hostRef: ostest-worker-1 | ||
targetURI: https://events.apps.corp.example.com/webhook | ||
headerRef: webhookBridgeAuth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Atm the vendor passthru will only accept Destination (required) and Context (default to empty string), EventTypes (default to ["Alert"]), Protocol (by default is redfish)
payload = {
'Destination': kwargs.get('Destination'),
'Protocol': kwargs.get('Protocol', "Redfish"),
'Context': kwargs.get('Context', ""),
'EventTypes': kwargs.get('EventTypes', ["Alert"])
}
This is how the payload looks like in Ironic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- The BMCEventSubscription will maintain a reference to a BareMetalHost. | ||
- The BMCEventSubscription will reside in the same namespace as the referenced | ||
BareMetalHost. | ||
- The BMCEventSubscription will maintain a reference to the Ironic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't be true, since Ironic won't have any information about the subscription in each node. I think we should probably maintain a reference to the node UUID + subscription ID (this will be different based on the BMC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; clarified the source of the subscription ID. We can use the host ref to look up the node UUID.
BareMetalHost. | ||
- The BMCEventSubscription will maintain a reference to the Ironic | ||
subscription ID. | ||
- The BMCEventSubscription will allow injection of headers using a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this since we don't have the support for headers, I'm wondering if this is a requirement for our downstream case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
- A BMCEventSubscription resource represents a subscription to the events generated | ||
by a specific BMC. | ||
- Ironic will manage configuring the subscription using a new API for managing them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should say Ironic will be using vendor passthru methods instead of a new API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
URI. This design document describes a Metal3 API for configuring a | ||
subscription. While Redfish is the primary target for this design, the | ||
Ironic API is vendor-neutral and seeks to provide a unified interface | ||
for configuring events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the unified interface was delayed, we've concentrated on redfish only for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
|
||
### Dependencies | ||
|
||
[Ironic Vendor Passthru for Subscriptions](https://storyboard.openstack.org/#!/story/2009061) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already done, but needs gophercloud support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ we can remove this, we don't have any Dependencies anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
URI. This design document describes a Metal3 API for configuring a | ||
subscription. While Redfish is the primary target for this design, the | ||
Ironic API is vendor-neutral and seeks to provide a unified interface | ||
for configuring events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
context: “SomeUserContext” | ||
eventTypes: | ||
- Alert | ||
protocol: Redfish |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add a field for http_headers, it's a list of headers that ironic will send in the request if they are set.
example: [{"key1":"value1"}, {"key2":"value2"}] or [{"key1":"value1", "key2":"value2"}]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@honza I've pushed the gophercloud PR to add the support for HttpHeaders
gophercloud/gophercloud#2224 if you have some time this week can you update this proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also this one
obtained from the BMC. | ||
- The baremetal-operator binary will be expanded to include 2 | ||
reconcilers with dedicated controller/reconcile loops for | ||
BareMetalHost and BMCEventSubscriptions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the driver does not support the vendor passthru? E.g. something not redfish-based is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtantsur in this case Ironic would return 400
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@honza are we adding something that will prevent creating a subscription in case they are not using redfish in the BMO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at the moment; we can easily add this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should at least somehow indicate the failure instead of retrying infinitely
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
/retitle Design for BMC eventing API |
/test markdownlint |
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
- I'd like to configure my BMC to send events to a destination URI. | ||
- I'd like to provide context to a particular event subscription. | ||
- I'd like to configure an event type | ||
- I'd like to specify a BMC protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@honza probably this one could be removed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
BareMetalHost. | ||
- The BMCEventSubscription will maintain a reference to the subscription ID | ||
obtained from the BMC. | ||
- The baremetal-operator binary will be expanded to include 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently there are more controllers (ie firmware settings), so I'd suggest to change it to "an additional controller for BMCEventSubscriptions resources"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
This patch implements the fast eventing API: metal3-io/metal3-docs#167 We introduce a new CRD, and a new reconciler. We use the Ironic API to communicate with the BMC to manage node subscriptions.
- The baremetal-operator binary will be expanded to include an additional | ||
reconciler with a dedicated controller/reconcile loop for | ||
BMCEventSubscriptions. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to expand this section to define more precisely the expected behavior in case of host deletion? In case the subscription resource remains alive, what will happen when/if the host will come back? What will be status shown to the user meanwhile?
/hold |
metadata: | ||
name: worker-1-events | ||
spec: | ||
hostRef: ostest-worker-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hostRef was changed to hostName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dtantsur The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
I believe this is completed, unholding |
Maybe @honza can follow up on this thread to see if there's anything missing and/or pending points? |
I think this is good to go. |
@honza how do you feel about changing the status from |
This design document contains an initial draft of an API for configuring BMC event subscriptions, using a new [Ironic API](https://storyboard.openstack.org/#!/story/2008366). Co-Authored-By: Honza Pokorny <[email protected]>
@furkatgofurov7 Done, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
This design document contains an initial draft of an API for configuring
BMC event subscriptions, using a new Ironic API.