IBM has a requirement that, while a system is being manufactured and tested, the firmware must stop the boot of the system (or quiesce the host if it is already running) whenever an error is logged that points to hardware as the source of the failure.
This is to ensure hardware issues are identified quickly in manufacturing.
Once the system is shipped, firmware will log the errors but allow the boot to continue (if possible) and the host to continue running.
To fulfill this requirement, we give Manufacturing a software setting in the BMC which they can toggle. BMC firmware then looks at this setting when an error is created to execute the appropriate behavior.
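To make the intended behavior concrete, here is a minimal sketch of the decision the BMC firmware makes when an error is created. It is only an illustration of the logic described above, not the actual firmware code; the names `Action`, `ErrorEvent`, `action_for_error`, and `stop_on_hw_error` are all made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"          # log the error and carry on (shipped-system default)
    HALT_BOOT = "halt_boot"        # stop a boot that is still in progress
    QUIESCE_HOST = "quiesce_host"  # quiesce a host that is already running


@dataclass
class ErrorEvent:
    # Hypothetical fields: the real error log format is not described in this thread.
    hardware_callout: bool  # True if the error points to hardware as the source
    host_running: bool      # True if the host has already finished booting


def action_for_error(stop_on_hw_error: bool, event: ErrorEvent) -> Action:
    """Pick the behavior for a newly created error, given the manufacturing setting."""
    # With the setting off, or for errors that do not implicate hardware,
    # the error is only logged and the system keeps going.
    if not stop_on_hw_error or not event.hardware_callout:
        return Action.CONTINUE
    # With the setting on and a hardware callout, stop whatever the host is doing.
    return Action.QUIESCE_HOST if event.host_running else Action.HALT_BOOT
```

For example, `action_for_error(True, ErrorEvent(hardware_callout=True, host_running=False))` selects `Action.HALT_BOOT`, while the same error with the setting turned off selects `Action.CONTINUE`.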
This setting is occasionally used once a system is at a customer site when a system is being serviced. It can be used to ensure the new hardware being replaced is working as expected or to help debug an issue.
Is there any interest within DMTF in making this setting a part of the Redfish interface?
Is this something that needs to be managed through Redfish directly? To me, it sounds like something that might make more sense to standardize in the UEFI space, as a boot behavior for effectively putting a system into a "service mode" (or some other term). I also think we wouldn't want this to be managed by an external user, which is another reason it seems more aligned with UEFI behavior.
Our systems don't use UEFI, but I wonder if what you're implying is that it makes more sense to make this an Attribute under the Bios schema? The BMC firmware looks at this setting as well, so it's a little weird to make it a BIOS attribute, but it also kind of fits there because the BIOS firmware will behave differently if this setting is enabled.
The main use case is definitely manufacturing. The setting is occasionally used at a customer site, but usually with a product engineer from our company there with the customer. Our goal is for our manufacturing team to use Redfish just like any other person managing a server.
If the request is to stop the BMC from booting, then this would require talking to U-Boot on the BMC (not a UEFI issue).
If you desire to have the system controlled by the BMC not boot to the OS, then the EFI boot variable "Timeout", on that platform, can be set to 0xFFFF. According to the UEFI spec, this means that the system should not boot to the OS but should wait for user intervention. I believe that Redfish already has the capability to set EFI variables and I expect the BMC should have the same.
Sorry, I let this one languish a bit, but we definitely still have a need for this over in OpenBMC. There's lots of UEFI/EFI discussion above, but I'd really like this to be independent of the BIOS implementation. My main focus is on our POWER-based systems, which have no UEFI/EFI.
The BMC or the BIOS firmware may detect an error that involves a piece of hardware. In normal situations, the firmware may be able to work around or ignore this issue and still boot the system. I'm looking for a Redfish API to tell the BMC that we want the firmware to stop the boot and do no recovery if this occurs. If the BIOS supports this concept, then the BMC can inform the BIOS using whatever communication method they have. My main interest here is in a Redfish API to the BMC to turn this setting on or off.
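As a rough sketch of what a client request for such a setting could look like, the snippet below builds (but does not send) a Redfish PATCH to a system resource. The `/redfish/v1/Systems/{id}` path and the `Oem` object are standard Redfish conventions, but the `Example` vendor block and the `StopBootOnHardwareError` property are placeholders invented for illustration; no property name has been agreed on in this thread.

```python
import json
import urllib.request


def build_setting_request(bmc_url: str, system_id: str, enabled: bool) -> urllib.request.Request:
    """Build a PATCH request that turns the stop-on-hardware-error setting on or off."""
    # Hypothetical payload: "Example" and "StopBootOnHardwareError" are placeholders.
    body = {"Oem": {"Example": {"StopBootOnHardwareError": enabled}}}
    url = f"{bmc_url.rstrip('/')}/redfish/v1/Systems/{system_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Build the request only; sending it would also need authentication and TLS setup.
request = build_setting_request("https://bmc.example", "system", True)
```

The BMC would then be responsible for passing the setting on to the BIOS firmware, if the BIOS supports the concept, over whatever channel the two already share.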