|
Post by tdoedline on Oct 9, 2024 13:42:27 GMT
Hi, I work for a hardware company that develops storage and network HBAs. We are looking to further integrate the management and serviceability of our adapters in Redfish, based on customer feedback. In reviewing the documentation and presentations, we understand that there are various data sources that are used by the BMC to collect data that is presented via Redfish. For example, there is a good diagram that illustrates these relationships on page 8 of the following presentation: That diagram shows data sources in the UEFI System firmware (e.g. HII resources) as well as data sources exposed by the hardware itself (e.g. via RDE). Our question is about features and functionality implemented via software in our drivers. For example, we have a software RAID feature exposed by both our UEFI driver and our OS drivers. Where does that fit into this model? For example, we may want to be able to display the software RAID volumes exposed by the driver, or change global software RAID features on the adapter like turning on or off the feature on a per-HBA basis. Our issue is that we don't see where software features exposed by a driver fit into the specifications we've seen. Since these features are honored by our software drivers, they are not known to the firmware and thus can't be exported via PLDM or RDE. Similarly, we could try to expose these features via HII REST_STYLE formsets (such as those detailed in words, a user wouldn't be able to see or change these settings while the OS is booted. Are we missing a portion of the specifications that fits this use case? Are there mechanisms to bridge this gap in OS drivers so that they can also present a data source while the OS is running?
Thank you for any help on this matter.
|
|
|
Post by malbolge on Oct 10, 2024 16:17:20 GMT
This is a familiar problem with RDE. The schema exposes a bunch of properties: some can be applied by the firmware, while others are host-software-level constructs that the firmware is completely oblivious to. While the request goes to the FW via RDE, the change would have to be applied by the SW.
First, I'd recommend starting with the properties that your FW can actually support. Note that virtually all properties in the schema, not counting housekeeping ones, are optional. If there's no clear way for your FW to support nuanced RAID, the most immediate solution would be to skip those properties.
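As a rough illustration of skipping unsupported properties, here is a minimal Python sketch. It assumes the firmware keeps a set of property names it can back on its own; `build_resource`, `HOUSEKEEPING`, the property names and the sample values are all made up for this example and are not taken from any real firmware or from the RDE specification.

```python
# Hypothetical sketch: every name and value here is illustrative only.
HOUSEKEEPING = ("@odata.id", "Id", "Name")

def build_resource(candidate: dict, fw_supported: set) -> dict:
    """Keep housekeeping properties plus the ones the firmware can back itself."""
    return {
        key: value
        for key, value in candidate.items()
        if key in HOUSEKEEPING or key in fw_supported
    }

# Everything the schema would allow us to report for one controller.
candidate = {
    "@odata.id": "/redfish/v1/Systems/1/Storage/1/Controllers/1",
    "Id": "1",
    "Name": "HBA 1",
    "FirmwareVersion": "1.2.3",
    "SpeedGbps": 12,
    "SupportedRAIDTypes": ["RAID0", "RAID1"],  # known only to the host driver here
}

# The firmware only claims what it can actually answer for.
resource = build_resource(candidate, fw_supported={"FirmwareVersion", "SpeedGbps"})
print(sorted(resource))
```

The driver-only property is left out entirely rather than reported with a guessed value, which matches the "skip those properties" advice above.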
A more permanent solution, I'm afraid, would involve building some sort of proprietary FW<->SW bridge that interrogates the SW about things beyond FW scope. All of this would be 100% proprietary and, from DMTF's perspective, your own internal implementation detail.
This seems to be a common pain point for devices that want to "bolt-on" RDE on top of an existing stack. The classic SW/FW hierarchy assumes SW is the master and often, the sole source of configuration changes. SW talks, FW listens. FW issuing OS-level changes through the SW inverts this model.
An idea I've been floating around is a separate companion app that provides SW functionality to RDE FW, acting as a standalone proxy.
|
|
|
Post by tdoedline on Oct 13, 2024 20:02:12 GMT
Hi, Thank you for the confirmation that we weren't missing something about the overall design of providing data to the BMC for use with Redfish.
And yes, there is a benefit to the firmware being the common source of data via RDE, in that the firmware is always present and "running" whether or not the system is booted into an OS. One downside is that the firmware must then know everything about the system being managed in order to report as much data as possible. So any "upper level" functionality that the firmware is unaware of (such as software RAID or how the device is mapped into the OS) cannot be reported.
This also means that some types of firmware interfaces make this much more difficult. For example, in the past we've worked with Fibre Channel and SAS controller designs that are pure protocol engines, which means that the firmware doesn't do topology discovery and requires that this process be done by the host driver. That doesn't fit very well with this model, as the firmware has very little insight into the contents of the topology.
That is a great point. One way to get around this would be for the software-to-firmware interface to allow the firmware to be updated with data that is acquired by the host driver. For example, in a case where the SAS host driver does topology discovery, the host driver could report topology information to the firmware for presentation via RDE. As you said, this would be something custom in the software-to-firmware API that would require firmware work, but it is doable.
Unfortunately, we have some product lines where we use third-party controllers that don't allow firmware customization, so in that case we're limited to what the firmware already supports.
So is it valid to say that, with Redfish overall, the "system" being managed is really the hardware as viewed by the BMC, and that the running Operating System has no way to provide data to the BMC for exposure via Redfish?
That seems to be what this post is alluding to: redfishforum.com/thread/454/get-operating-system-information
The only reason that UEFI data is pushed to the BMC is because the System Firmware is treated as an extension of the hardware and is tightly coupled to it, which is alluded to in this post: redfishforum.com/thread/1048/redfish-host-interface
Has there been any discussion of a "generic" way that an Operating System can provide data to the BMC, similar to how UEFI pushes data about the System Firmware to the BMC during the boot process? This would allow Operating Systems to provide a way for host software to report data to the BMC, though it sounds like this may be outside the scope of what DMTF is trying to accomplish.
Thank you again
|
|
|
Post by malbolge on Oct 14, 2024 11:13:57 GMT
Redfish went into broad deployment onto BMCs, and BMCs used whatever means were at their disposal to build their Redfish Resources. AFAIK that involved a multitude of tools, protocols, glue code and proxies.
For example, on some servers with Redfish, for non-RDE devices, the BMC still exposes a lot of data about the network interfaces. It does so in part through a software companion app running on the host OS. While I haven't ever investigated its function thoroughly, my understanding is that under the hood it does the equivalent of lspci/netstat/other cmd-line tools, and pushes that to the BMC through some channel.
AFAIK, data for the Resources any BMC exposes is harvested and applied through a mix of whatever protocols the managed device in question had available - NVMe-MI for SSDs, NC-SI for NICs, PMBus for PSUs... The burden of integrating all that was on the BMC, or more generally, the device that hosts the Redfish service. With functionality like PropertyValueConflict annotations for when the user gives conflicting orders, or mutually exclusive states, a Redfish service that relies on external protocols would have to have extremely deep knowledge of the device it's managing. It wasn't practical or sustainable, and thus RDE was born to allow the device to take ownership and interact with Redfish directly.
I do get your point about the FW being unaware of stuff like topology. It's the same on the networking side, where the HW/FW talks in terms of frame size and neither knows nor cares how those bytes are divided between user-usable data payload (MTU) and headers.
I believe there's quite a lot of complexity involved in a generic solution - different OSes, multi-host environments, deployments without a host OS, various levels of virtualization. A generic fits-all solution might be very complex, incomplete or only temporarily complete. Add to that, Redfish is a model that's supposed to be draped over multiple types of devices.
A fancy, smart Redfish-enabled network switch, for example, with its single-purpose, locked-down OS, has access to both OS-level stuff and registers.
I don't get this part. If the firmware is locked down, then presumably you can't fit RDE onto it either?
Nope, not unless the BMC vendor builds some custom solution.
Although in your case, I believe the chain should be BMC->FW->SW->FW->BMC, not a parallel BMC->SW chain. A situation where both the SW and FW manage parts of Redfish would get messy quickly, especially on a per-property level. Imagine a PATCH payload containing two properties, one applied by SW and the other by FW, and they are in conflict. Who resolves it? Your FW would decide what PDRs it serves, what Resources it exposes, what operation types it supports etc. - as a single source. And when needed, the FW would reach out to SW for help. This way you don't need the BMC to do anything special. From the BMC's perspective, it's just talking to FW. And RDE does support long-running tasks and delayed ETags in terms of timing, so who the FW talks to becomes an "implementation detail".
The issue of wanting to expose extra HW functionality not covered by the standard driver model isn't new. Any graphics card you buy nowadays has a companion app for playing around with LEDs, fan curves and whatnot. I'm thinking that the next step is a similar RDE bridge utility that proxies requests from an RDE FW device into the SW. Make it fully optional, and make some Resources/Properties require this app, with a clean fallback to null/skip plus info that you need to install the app to have full RDE/Redfish functionality. Note that there are two FW to host OS connections - the blue regular, existing PCIe mechanism for in-band configuration, shuttling data etc., and the new PCIe to app connection.
My guess is the earliest DMTF can try to come up with a standard for this is once there's one or more such custom solutions to draw lessons from.
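To make the "FW as single source" idea a bit more concrete, here is a minimal Python sketch of a PATCH handler in which the firmware is the only arbiter. It is a toy model under stated assumptions, not anything defined by RDE: the ownership map `OWNERS`, the property names, the conflict rule in `find_conflict` and the `sw_proxy` callback (standing in for the optional companion app) are all hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical ownership map: which side ultimately applies each property.
OWNERS = {"LinkSpeedGbps": "fw", "SoftwareRAIDEnabled": "sw"}

def find_conflict(patch: dict) -> Optional[str]:
    """Made-up rule for illustration: software RAID cannot run on a 3 Gb/s link."""
    if patch.get("SoftwareRAIDEnabled") is True and patch.get("LinkSpeedGbps") == 3:
        return "SoftwareRAIDEnabled conflicts with LinkSpeedGbps=3"
    return None

def handle_patch(
    patch: dict,
    fw_state: dict,
    sw_proxy: Optional[Callable[[dict], None]] = None,
) -> Tuple[bool, str]:
    """The firmware validates the whole payload first; nothing is applied on failure."""
    unknown = sorted(k for k in patch if k not in OWNERS)
    if unknown:
        return False, "unsupported properties: " + ", ".join(unknown)

    conflict = find_conflict(patch)
    if conflict:
        return False, conflict

    sw_part = {k: v for k, v in patch.items() if OWNERS[k] == "sw"}
    if sw_part and sw_proxy is None:
        # Clean fallback: tell the user what is missing instead of half-applying.
        return False, "install the companion app to change: " + ", ".join(sorted(sw_part))

    # All checks passed: apply the firmware-owned part, then forward the rest.
    for key, value in patch.items():
        if OWNERS[key] == "fw":
            fw_state[key] = value
    if sw_part:
        sw_proxy(sw_part)
    return True, "ok"

fw_state = {"LinkSpeedGbps": 12}
driver_settings = {}  # stands in for whatever the host driver keeps
print(handle_patch({"LinkSpeedGbps": 6, "SoftwareRAIDEnabled": True},
                   fw_state, driver_settings.update))
print(handle_patch({"SoftwareRAIDEnabled": False}, fw_state, None))
```

Because the firmware sees the whole payload before touching anything, a conflict between a SW-applied and a FW-applied property is resolved in one place, and the BMC never needs to know that a proxy exists.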
|
|
|
Post by tdoedline on Oct 15, 2024 18:27:31 GMT
Thank you for the history and context of Redfish overall. That isn't obvious to me from the presentations and the specifications. And yes, our goal overall as a company is to figure out what fits best for our many product lines, regarding the several different ways we can present data to the BMC. For example, some products have a fully functional RDE interface that we can enable and support quite easily. Others have a partial implementation (i.e. the firmware knows some things and not others), which may be the best we can do for now. So some of this research is to determine, near term, what is the best we can do to enable the most that we can for our customers. Secondarily, we want to then figure out what "gaps" we may have, either in display of data or configuration, that we can then work with our third-party vendors to improve on future products.
So overall you've done a great job at confirming that we aren't missing anything about Redfish overall (e.g. some hidden way to push data from an Operating System using some sort of software daemon), as well as laying out the constraints of our implementations, including some interesting research we may want to do regarding some proxy-like applications that could fill in the gap.
As for your questions, we have a couple different scenarios in product lines we support:
1.) Hardware-only devices that don't have firmware
2.) Devices with firmware that doesn't support RDE, and also can't be modified
3.) Devices with firmware that does support RDE, but can't be modified
4.) Devices where we create our own firmware that doesn't yet support RDE, but can be modified to support that
So we probably won't have a single, consistent implementation across our many products (which we were planning on doing if we could do this in software in the OS). Rather, we'd have to pick and choose what fits best for each type of device. And yes, in the software case we were asking about, we were planning on disabling any firmware reporting of data and just reporting it through software to prevent the conflict scenario you describe. But since that isn't an option, we'll figure out how best to fit into the existing interfaces.
Thank you again for your time in responding to this question, this was very helpful in confirming our suspicions and giving further context to the specifications.
|
|
|
Post by malbolge on Oct 21, 2024 14:01:41 GMT
Relying on the OS isn't ideal. If I mess up some settings on my storage controller, I can presumably lock myself out of booting the OS. No matter how broken/misconfigured the OS might be, OOB manageability should be available for at least diagnostics, if not recovery of operational settings. If the host OS is up then, practically speaking, SSHing onto it in-band and managing it that way is faster and more direct than Redfish. In other words, all else being equal, a device that requires the OS to be up for its participation in Redfish is less useful than one that does not.
You will probably run into issues with 3 - Redfish requires its settings to be persistent. I understand that the device's non-volatile memory (FW *and* settings) is static? That would preclude any changes, and make the Redfish model RO.
We'll have to compare notes on the proxy solution one day. I do believe the RDE-into-OS is a design gap that sooner or later will need to be addressed somehow, but someone needs to make the first mistakes.
One thing that can be worked on in the meantime is grouping properties into relevant schemas. On the networking side, for example, the schemas are pretty well organized into those that can be fulfilled by HW/FW alone, and those that need to be populated by SW. The natural direction this leads towards on the NIC side is that the proxy owns entire (high-level, software-centric) Resources - is it like that in storage schemas, or are you finding that firmware- and software-related properties are mixed into one schema?
|
|
|
Post by tdoedline on Nov 9, 2024 18:44:07 GMT
Thank you for the reply. For your comments on (3), what I meant was that the firmware supports RDE, but the RDE implementation itself can't be modified. What I mean by that is that we are stuck with whatever implementation our vendor gives us and can't add any features that we've added in our solution. In that fixed firmware implementation though, there can be read-only and read-write settings, they are just defined purely by the firmware vendor.
As for the schema organization, from a quick scan they seem to be divided into logical blocks as well, where the firmware / hardware can control one area while the driver controls another. However, I think that may differ depending on the implementation, where some SAS firmware does the topology discovery itself (and can easily report that information), and other SAS firmware relies on the driver to do discovery. Another option in the latter case would be for that sort of firmware to create an API where the host driver can report what it found back to the firmware for reporting via RDE. This can be a custom, simple API completely defined by the third party. In that way the firmware still generates the RDE data, it just gets the contents from the host driver, which can be updated asynchronously. That would be a feature added by the third-party SAS controller vendor though, so it would be up to them to decide whether that is worth implementing.
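The driver-reports-to-firmware idea could look roughly like the following Python sketch. It only models the data flow: `TopologyCache`, `report`, `read_for_rde` and the generation counter are hypothetical names invented here, and a real controller would implement this in its own firmware-to-driver interface rather than in Python.

```python
import threading
from typing import List, Optional

class TopologyCache:
    """Firmware-side store for topology reported by the host driver (hypothetical API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._devices: Optional[List[str]] = None  # None until the driver has reported
        self._generation = 0                       # bumped on every driver report

    def report(self, devices: List[str]) -> None:
        """Called by the host driver whenever it finishes a discovery pass."""
        with self._lock:
            self._devices = list(devices)
            self._generation += 1

    def read_for_rde(self) -> dict:
        """Called when the firmware builds a response; never waits on the driver."""
        with self._lock:
            devices = None if self._devices is None else list(self._devices)
            return {"Devices": devices, "Generation": self._generation}

cache = TopologyCache()
print(cache.read_for_rde())  # before any driver report: Devices is None

# The driver reports asynchronously, here simulated with a thread.
driver = threading.Thread(target=cache.report, args=(["sas-target-a", "sas-target-b"],))
driver.start()
driver.join()
print(cache.read_for_rde())
```

The firmware stays the only party that generates the RDE data. When the OS is down or the driver has not reported yet, it simply serves an empty (null) view instead of blocking.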
|
|