FRU Concurrent Maintenance
Post by puranik on Sept 30, 2020 14:42:36 GMT
On IBM systems managed by OpenBMC, we have a requirement to perform Field Replaceable Unit (FRU) Concurrent Maintenance (CM).
CM refers to the ability to add, replace, or remove a FRU without bringing down the workload or even powering off the system.
This is an operation that can be triggered out-of-band and involves the following (high-level) steps:
- The user selects the FRU on which to perform CM. The operation can be add, replace, or remove.
- For remove/replace operations, the BMC needs to perform the necessary background handshakes with the hypervisor or other host firmware so that it stops using that FRU. Once that is done, the user physically removes the FRU.
- If the operation is remove, we are done. If the operation is add/replace, the user plugs in the new FRU and the BMC has to work with the host firmware to (re)integrate the FRU into the system.
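To make the ordering of those steps explicit for each operation type, here is a minimal Python sketch. The names (CMOperation, cm_steps) and step strings are hypothetical illustrations, not part of OpenBMC or Redfish; the only point is that replace is remove followed by add, with the host handshake preceding physical removal.

```python
from enum import Enum
from typing import List


class CMOperation(Enum):
    ADD = "add"
    REPLACE = "replace"
    REMOVE = "remove"


def cm_steps(op: CMOperation) -> List[str]:
    """Return the high-level CM steps for an operation, in order (hypothetical)."""
    steps = ["user selects FRU"]
    if op in (CMOperation.REMOVE, CMOperation.REPLACE):
        # The host must stop using the FRU before it can be physically pulled.
        steps.append("BMC handshakes with host firmware to release FRU")
        steps.append("user physically removes FRU")
    if op in (CMOperation.ADD, CMOperation.REPLACE):
        # Only after the new FRU is seated can the host (re)integrate it.
        steps.append("user plugs in new FRU")
        steps.append("BMC works with host firmware to integrate FRU")
    return steps
```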
Is there an API that Redfish recommends for an operation such as this?
We took a cursory look at the schema and were thinking of using a combination of the ReadyToRemove (https://redfish.dmtf.org/schemas/v1/Drive.v1_11_0.json) and State (http://redfish.dmtf.org/schemas/v1/Resource.json#/definitions/State) properties to trigger and monitor the state of this operation. So the flow would be something like:
- The user sets ReadyToRemove to true to initiate the CM operation. The BMC can react to this and do what is needed to "remove" the FRU from the system. The State property can be used to monitor the progress of this operation.
- When the State goes to "StandbyOffline" (for example), the remove operation can be deemed complete.
- When the new FRU is plugged in, setting ReadyToRemove to false should trigger the BMC to perform an "add" and change the State to something meaningful to indicate completion.
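From a client's point of view, the proposed flow amounts to a PATCH of ReadyToRemove followed by polling Status.State. The Python sketch below assumes this proposal is adopted; it takes the HTTP PATCH/GET calls as injected functions (in practice they would wrap authenticated requests to the BMC), and the FRU URI, the wait_for_swap hook, and the use of "Enabled" as the "add complete" state are all hypothetical choices, since the proposal leaves that final state open.

```python
import time
from typing import Any, Callable, Dict

Json = Dict[str, Any]
# Hypothetical transport hooks: wrap HTTP PATCH/GET against the Redfish service.
PatchFn = Callable[[str, Json], None]
GetFn = Callable[[str], Json]


def set_ready_to_remove(patch: PatchFn, fru_uri: str, value: bool) -> None:
    """True starts the "remove" half of CM; False asks the BMC to "add" the FRU."""
    patch(fru_uri, {"ReadyToRemove": value})


def wait_for_state(
    get: GetFn,
    fru_uri: str,
    target: str,
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll Status.State until it equals target; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get(fru_uri).get("Status", {}).get("State")
        if state == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


def replace_fru(patch: PatchFn, get: GetFn, fru_uri: str,
                wait_for_swap: Callable[[], None]) -> bool:
    """Hypothetical end-to-end replace using the proposed properties."""
    set_ready_to_remove(patch, fru_uri, True)
    if not wait_for_state(get, fru_uri, "StandbyOffline"):
        return False
    wait_for_swap()  # hypothetical: block until the technician has swapped the FRU
    set_ready_to_remove(patch, fru_uri, False)
    # Assumption: "Enabled" signals that the new FRU has been integrated.
    return wait_for_state(get, fru_uri, "Enabled")
```

Injecting the transport keeps the sketch independent of any particular HTTP client and makes the flow easy to exercise against a fake BMC.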
Does this sound like a good way to accommodate the CM use case? If so, would it make sense to add the ReadyToRemove property to the Assembly schema (as part of the AssemblyData property) and to the PCIeDevice schema? Our current CM use case only covers FRUs that we plan to represent within these schemas.
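On the BMC side, the proposal implies that a PATCH to ReadyToRemove on an AssemblyData entry or PCIeDevice drives a state change. The following Python sketch is a hypothetical, in-memory model of that handler, not bmcweb or OpenBMC code: release and integrate stand in for the host firmware handshakes, and "Enabled" as the post-add state is an assumption. A real implementation would run the handshakes asynchronously so that intermediate State values are observable while the operation is in progress.

```python
from typing import Any, Callable, Dict


class CmFru:
    """Hypothetical BMC-side model of a FRU resource (an AssemblyData entry
    or a PCIeDevice) carrying the proposed ReadyToRemove property."""

    def __init__(self, release: Callable[[], None],
                 integrate: Callable[[], None]) -> None:
        self.ready_to_remove = False
        self.state = "Enabled"
        self._release = release      # hypothetical: host stops using the FRU
        self._integrate = integrate  # hypothetical: host (re)integrates the FRU

    def to_json(self) -> Dict[str, Any]:
        return {"ReadyToRemove": self.ready_to_remove,
                "Status": {"State": self.state}}

    def patch(self, body: Dict[str, Any]) -> None:
        if "ReadyToRemove" not in body:
            return
        value = body["ReadyToRemove"]
        if not isinstance(value, bool):
            raise ValueError("ReadyToRemove must be a boolean")
        if value == self.ready_to_remove:
            return  # no transition requested; treat the PATCH as a no-op
        # Run the host handshake first so a failure leaves the resource unchanged.
        if value:
            self._release()
            self.state = "StandbyOffline"
        else:
            self._integrate()
            self.state = "Enabled"
        self.ready_to_remove = value
```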