MC behavior while RDE device is not accessible

Abner Chang
Minnow

Posts: 6

Post by Abner Chang on Jan 9, 2024 6:44:06 GMT

My question is whether the RDE spec defines the MC behavior (maybe I missed it) for an RDE device that was registered earlier but later becomes inaccessible in some scenarios.

For example, the RDE device is in a power-off state and a user issues an HTTP PATCH to the properties of a URI that is handled through RDE. What should the MC do in this case? Do we define the behavior, for example that the MC could store the HTTP actions somewhere and replay them when the RDE device is ready for RDEOperationInit? Or is this MC implementation specific?

Thanks
Abner

malbolge
Minnow

Posts: 37

Post by malbolge on Jan 10, 2024 13:39:06 GMT

For PCIe/OCP cards at least, you should be able to talk to the device even if it's running on 3.3V / D3Hot only. PCIe/MCTP will be down of course, but SMBus/MCTP should be up. In that case, the device may report some null property values and missing Resources (since whatever needs +12V is unpowered and its state is unknown or not relevant).

If there's no way for the BMC to access the device, then the BMC should probably remove it from the Redfish hierarchy altogether. This should be part of the BMC's PLDM logic, not strictly RDE, though: if a device doesn't meet the PLDM base requirements for responding in time and the retransmission count is exceeded, it should be dropped as a PLDM device. As a consequence it is dropped as an RDE device, and as a further consequence its Resources are unplugged from the BMC-owned hierarchy of Resources.
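
The cascade described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ManagementController` and `Device` classes, the `on_request_result` hook, and the `MAX_RETRIES` value are all hypothetical names and numbers, not taken from the RDE or PLDM specs or from any BMC code base. A real implementation would take its retry and timeout limits from the PLDM base timing parameters.

```python
from dataclasses import dataclass

# Hypothetical retry budget; a real MC would use the PLDM base timing values.
MAX_RETRIES = 2


@dataclass
class Device:
    eid: int              # MCTP endpoint ID (illustrative)
    resources: list       # Redfish URIs this device contributes
    failures: int = 0     # consecutive unanswered requests


class ManagementController:
    """Toy model of the PLDM -> RDE -> Redfish removal cascade."""

    def __init__(self):
        self.pldm_devices = {}     # eid -> Device
        self.rde_devices = set()   # eids that negotiated RDE
        self.redfish_tree = set()  # URIs currently exposed to clients

    def register(self, dev):
        self.pldm_devices[dev.eid] = dev
        self.rde_devices.add(dev.eid)
        self.redfish_tree.update(dev.resources)

    def on_request_result(self, eid, responded):
        """Called once per PLDM request attempt (initial try or retry)."""
        dev = self.pldm_devices.get(eid)
        if dev is None:
            return
        if responded:
            dev.failures = 0
            return
        dev.failures += 1
        # Initial attempt plus MAX_RETRIES retries have all gone unanswered.
        if dev.failures > MAX_RETRIES:
            self._drop(dev)

    def _drop(self, dev):
        # Dropped as a PLDM device, hence as an RDE device,
        # hence its Resources leave the Redfish hierarchy.
        del self.pldm_devices[dev.eid]
        self.rde_devices.discard(dev.eid)
        self.redfish_tree.difference_update(dev.resources)
```

The point of structuring it this way is that the Redfish layer never decides on its own that a device is gone; it only reacts to the PLDM layer giving up on it.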

I believe caching HTTP requests wouldn't be against the spec if you wanted to do it, but it may lead to other issues. What if the device boots without RDE? What if its RDE configuration changed while it was down and you've got a cached request to a physical function that isn't there anymore? What if the device randomly assigns PDR numbers and ResourceIDs every boot and the BMC needs to divine what the new target for the request is? What if it's just a week-old request? At what point do you decide it has expired?
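
To make those questions concrete, here is a minimal sketch of the checks a BMC would need before replaying a deferred PATCH. Everything in it is an assumption for illustration: the `DeferredPatch` record, the `replayable` function, the `config_signature` fingerprint, and the `MAX_AGE_SECONDS` expiry are hypothetical, and none of this behavior is defined by the RDE spec.

```python
from dataclasses import dataclass

# Hypothetical expiry policy; the spec gives no guidance on a value.
MAX_AGE_SECONDS = 300.0


@dataclass
class DeferredPatch:
    uri: str               # Redfish URI the client targeted
    body: dict             # PATCH payload
    config_signature: str  # hypothetical fingerprint of the device's RDE
                           # configuration at the time the request was queued
    queued_at: float       # timestamp in seconds


def replayable(req, now, device_has_rde, current_signature, current_uris):
    """Return (ok, reason) for replaying a deferred request to a returned device."""
    if not device_has_rde:
        return False, "device came back without RDE"
    if now - req.queued_at > MAX_AGE_SECONDS:
        return False, "request expired"
    if current_signature != req.config_signature:
        # Resource IDs may have been reassigned; the old target is unknowable.
        return False, "RDE configuration changed"
    if req.uri not in current_uris:
        return False, "target resource no longer exists"
    return True, "ok"
```

Each early return corresponds to one of the failure cases listed above, which is a fair indication of how much bookkeeping a cache would need just to avoid replaying a request against the wrong target.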

Abner Chang
Minnow

Posts: 6

Post by Abner Chang on Jan 16, 2024 3:58:33 GMT

Thanks for the feedback.
As the RDE spec mainly focuses on the RDE device, some host BMC behaviors are not clearly elaborated, such as the situation I mentioned. Do you think we should specify the host BMC behavior in the RDE spec, maybe optionally, to address this situation? Especially for open source implementations such as OpenBMC, we may need a unified solution.

malbolge
Minnow

Posts: 37

Post by malbolge on Jan 22, 2024 12:58:11 GMT

Jan 16, 2024 3:58:33 GMT Abner Chang said:

Thanks for the feedback.
As the RDE spec mainly focuses on the RDE device, some host BMC behaviors are not clearly elaborated, such as the situation I mentioned. Do you think we should specify the host BMC behavior in the RDE spec, maybe optionally, to address this situation? Especially for open source implementations such as OpenBMC, we may need a unified solution.

Awesome news about OpenBMC picking up RDE and I'm looking forward to plugging my RDE device into it. Any tips on what (preferably cheap) box I can pick up that I can flash with some OpenBMC experimental builds once they're public?

Implementing it is the best occasion to introduce corrections, errata and clarifications to the spec.

Remember to code to the specification. RDE is complex, and you may find behaviors in devices in the field that aren't explicit in the spec but aren't prohibited either. Those usually follow the path of least resistance for a given vendor to implement while preserving spec compliance. Spec compliance still leaves ample degrees of freedom, however, and I wouldn't be surprised if you found devices that don't explicitly violate the spec yet have various irreconcilable differences between each other. Whenever you see something that calls for a switch(VENDOR) kind of differentiation, that's a good sign it needs to be addressed in the spec.

As for the caching case, a paragraph or two about caching requests may make this clearer for future implementors. My recommendation would be to discourage, but not prohibit, caching. Both an RDE device and a BMC may make special beyond-the-spec provisions to enable caching for a select number of hand-picked cases, but in general, once a device drops off the bus, it ought to drop from the Redfish Resource tree.

Last Edit: Jan 22, 2024 12:59:57 GMT by malbolge

Abner Chang
Minnow

Posts: 6

Post by Abner Chang on Jan 24, 2024 6:49:32 GMT

There is no implementation yet; however, our BMC team is going to work on one. Also, I'm not sure how much we can leverage from the current OpenBMC PLDM implementation. I will certainly keep you updated if we have something.
RDE is complex and the implementations from RDE device vendors differ slightly; this is what I learned from my previous work at an OEM. Either giving implementation guidance in the spec or letting an open source BMC implementation set the de facto RDE MC behavior would be fine for this case. However, I prefer the former: briefly describing in the spec the basic MC behavior when an RDE device is inaccessible. This would help align MC implementations with the spec. There is currently no definition of MC behavior for the case where the RDE device is inaccessible. Maybe we don't have to mention "cache" specifically, but some high-level guidance to the MC for dealing with this case would be good.

billscherer
Minnow

Posts: 5

Post by billscherer on Feb 12, 2024 20:22:17 GMT

The primary purpose of RDE is to enable a BMC to serve as a proxy between an embedded device and the "upper layer" Redfish service implementation. For policy questions like this -- whether information should be cached for a resource that is not always present -- the Redfish spec, not the RDE spec, is the one to consider. In this case, the Redfish spec has an entire section on absent resources (though it doesn't specifically discuss resources that are transiently absent). I think that any discussion of caching should be there, not in the RDE spec.