Redfish severities for events, logs, and metrics

Post by scoombs on Jun 22, 2021 18:14:30 GMT

Hello, to introduce myself, my name is Susan Coombs, and I'm with Verizon. Thank you for all of your efforts with Redfish, which are helping us to standardize monitoring. As we continue to expand our use of Redfish, a few questions have come up. We currently monitor Redfish events, logs, and metrics for a variety of servers, and we also monitor other network elements. We'd like to standardize severity values and make them more granular. Standardized severities support correlation of Redfish and other monitoring data across vendors, network elements, and monitoring data types (metric, event, log, etc.), which helps identify root causes. Finer-grained severity values help with troubleshooting at the appropriate level: systems can be configured to receive more severity levels while troubleshooting and fewer in typical production environments, which reduces chatter (chatter reduction and log verbosity were also mentioned in a previous post: redfishforum.com/post/693/thread ).

Currently (apart from Redfish), we generally use syslog RFC 5424 as a standard for severity, as defined in section 6.2.1, tools.ietf.org/html/rfc5424#section-6.2.1 :
Numerical Code   Severity
0                Emergency: system is unusable
1                Alert: action must be taken immediately
2                Critical: critical conditions
3                Error: error conditions
4                Warning: warning conditions
5                Notice: normal but significant condition
6                Informational: informational messages
7                Debug: debug-level messages

Table 2. Syslog Message Severities

We are attempting to map Redfish severity and health values, which appear to include only "OK", "Warning", and "Critical", to syslog severity levels. Some vendors include alternate severities in their logs; for example, an Oem.Hpe.Severity of "Informational" frequently comes in the same Redfish log as a Severity of "OK". In general, would "OK" as a Redfish Severity map to "Informational" in syslog, as a starting point? A previous post, redfishforum.com/post/693/thread , indicates that Redfish "Critical" implies "not functional". Would Redfish "Critical" then map to syslog "Alert", and could Redfish "Warning" map to syslog "Warning"?
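To make the question concrete, here is a minimal sketch of the tentative mapping described above. The mapping itself (OK to Informational, Warning to Warning, Critical to Alert) is only the starting point proposed in this post, not something defined by Redfish or RFC 5424, and the names SyslogSeverity, REDFISH_TO_SYSLOG, and to_syslog_severity are made up for illustration.

```python
from enum import IntEnum


class SyslogSeverity(IntEnum):
    """Severity codes from RFC 5424, section 6.2.1 (lower code = more severe)."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


# Assumption: this is the tentative mapping proposed in the post,
# not a mapping defined by any Redfish or syslog standard.
REDFISH_TO_SYSLOG = {
    "OK": SyslogSeverity.INFORMATIONAL,
    "Warning": SyslogSeverity.WARNING,
    "Critical": SyslogSeverity.ALERT,
}


def to_syslog_severity(redfish_severity: str) -> SyslogSeverity:
    """Translate a Redfish Severity string into a syslog severity code."""
    try:
        return REDFISH_TO_SYSLOG[redfish_severity]
    except KeyError:
        # Unknown or OEM-specific values are rejected rather than guessed.
        raise ValueError(f"unmapped Redfish severity: {redfish_severity!r}") from None
```

With this mapping, syslog levels 0, 2, 3, 5, and 7 are never produced from the three Redfish values, which is the granularity gap the questions here are about.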

It might be challenging to update the severity and health values that currently exist in Redfish, which appear to include only "OK", "Warning", and "Critical". Could we instead agree on how to map the existing Redfish severities, along the lines described in the preceding paragraph, while standardizing on syslog severities in the future? That might address the concern about needing more entries in EventSeverity, raised in redfishforum.com/post/448/thread , without introducing the interoperability concerns that were raised as an objection to adding severities: vendors would be following syslog RFC 5424 for severity, which would in fact improve interoperability with other systems, including syslog and SNMP traps.

Similarly, where Redfish metrics include thresholds implying severities, for example UpperThresholdCritical and UpperThresholdFatal, would it be more consistent to use the same severity framework? Since "Fatal" doesn't appear to be a category in either Redfish events or syslog severities, perhaps these could be replaced with UpperThresholdCritical and UpperThresholdAlert, so that threshold severity would align with syslog, as well as with the Redfish event generated when the threshold is crossed?
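As a rough illustration of the threshold idea, the sketch below derives a syslog severity from a metric reading. UpperThresholdAlert is only the name suggested in this post and does not exist in Redfish today; the function name, the parameter names, and the choice to treat a reading equal to the threshold as a crossing are all assumptions for the example.

```python
from typing import Optional

# Syslog severity codes from RFC 5424, section 6.2.1.
SYSLOG_ALERT = 1
SYSLOG_CRITICAL = 2
SYSLOG_INFORMATIONAL = 6


def severity_for_reading(
    reading: float,
    upper_threshold_critical: Optional[float] = None,
    upper_threshold_alert: Optional[float] = None,  # hypothetical threshold name
) -> int:
    """Return the syslog severity implied by the highest threshold crossed.

    Assumption: a reading at or above a threshold counts as crossing it.
    """
    # Check the most severe threshold first so it wins when both are crossed.
    if upper_threshold_alert is not None and reading >= upper_threshold_alert:
        return SYSLOG_ALERT
    if upper_threshold_critical is not None and reading >= upper_threshold_critical:
        return SYSLOG_CRITICAL
    return SYSLOG_INFORMATIONAL
```

The point of the sketch is that the threshold name and the severity of the resulting event would come from the same scale, so no separate translation step is needed.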

What would be involved in standardizing on syslog severity in the future for Redfish events, logs, and metrics? This would improve Redfish support for production operations by reducing chatter in contexts where it's not needed, and would facilitate correlation with monitoring data from other systems, enhancing interoperability.
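For what it's worth, the chatter-reduction use case could look something like the sketch below once every record carries a syslog severity code. The record layout (a dict with "severity" and "message" keys), the sample messages, and the function name are hypothetical and only meant to show the filtering idea.

```python
# Syslog severity codes from RFC 5424, section 6.2.1 (lower code = more severe).
SYSLOG_WARNING = 4
SYSLOG_DEBUG = 7


def filter_by_severity(records, max_code):
    """Keep records whose severity code is at or below max_code."""
    return [r for r in records if r["severity"] <= max_code]


# Hypothetical records, already normalized to syslog severity codes.
records = [
    {"severity": 1, "message": "power supply failed"},
    {"severity": 4, "message": "fan speed above warning threshold"},
    {"severity": 6, "message": "firmware inventory refreshed"},
    {"severity": 7, "message": "sensor poll completed"},
]

# Typical production setting: Warning and more severe only.
production_view = filter_by_severity(records, SYSLOG_WARNING)

# Troubleshooting setting: everything, including Debug.
troubleshooting_view = filter_by_severity(records, SYSLOG_DEBUG)
```

The same cutoff could then apply to Redfish data and to data from other network elements, since all of it would share one scale.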

Redfish Specification Forum

Post by jautor on Jul 22, 2021 16:30:22 GMT