Post by suichen on Jun 10, 2021 19:04:32 GMT
Hello,
We would like to export a few health metrics pertaining to the BMC through the BMC's Redfish interface. The metrics are as follows:
- Available memory (the MemAvailable entry in /proc/meminfo) on the BMC
- Available RWFS partition space (the number of free blocks returned by statvfs()) on the BMC
- I2C error counters on the BMC
- BMC's boot time (the FinishTimestampMonotonic property in systemd's DBus service)
- BMC's memory ECC counters (exported onto DBus by the phosphor-ecc daemon, if using OpenBMC)
- List of processes running on the BMC
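For the first two metrics, a minimal sketch of how the raw readings could be collected on a Linux-based BMC is shown below. The helper names (`parse_meminfo`, `read_mem_available_kb`, `free_blocks`) and the mount point argument are assumptions made for illustration; they are not part of OpenBMC or Redfish.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_mem_available_kb(path="/proc/meminfo"):
    """Return the MemAvailable entry (in kB) from the given meminfo file."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemAvailable"]


def free_blocks(mount_point):
    """Return the number of free blocks reported by statvfs() for a mount point.

    The RWFS mount point is platform-specific, so it is passed in by the caller.
    """
    return os.statvfs(mount_point).f_bfree
```

Keeping the parser separate from the file read makes it easy to check against captured `/proc/meminfo` text on a development machine.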
We think there are at least two sets of questions we need to ask regarding these BMC metrics:
- Which schemas should be used to represent those metrics
- Under which URIs the nodes should be added
We would appreciate advice on both sets of questions.
First, the questions on which schemas to use for each of the metrics listed above:
- The closest one we can find for "memory available" is the MemoryMetrics.HealthData.RemainingSpareBlockPercentage field. If I understand correctly, we can divide the MemAvailable reading by the total BMC memory to obtain the percentage and populate this field. Is this understanding correct?
- The closest one we can find for "available RWFS partition space" is Volume.RemainingCapacityPercent. Can a "flash partition" be considered a "Volume"?
- I2C appears to be used only as an enumeration value in the Protocol fields of a few types of nodes such as Storage, Volume, Fabric, etc. There doesn't appear to be a schema for I2C ports themselves that exposes information such as the number of transactions performed, errors encountered, etc. What would be the right way to propose a new schema?
- ComputerSystem.Boot appears to contain information pertaining to the current boot process, such as whether the boot source is overridden. There doesn't seem to be a boot time, though a Description field exists that appears to take a string. Is it okay if we use the ComputerSystem.Boot schema for the BMC's boot info? Can a BMC be considered a ComputerSystem in this case? If so, should we populate the boot info in the Description field or propose that a boot time field be added to the schema?
- MemoryMetrics.CurrentPeriod.CorrectableECCErrorCount and MemoryMetrics.CurrentPeriod.UncorrectableECCErrorCount are the ones we want, and the question is the same as all of the above: these metrics appear to be intended for the host. Is it okay to use them for the BMC's metrics?
- We can't find a schema that defines "list of processes" in a structured way, and the closest one we have found so far is the MetricCollection in TelemetryService. One reason is that a "list of processes" is "telemetry" by nature; another is that MetricCollection is flexible enough to accept strings, so we could simply export the list in a human- and machine-readable format like JSON or XML and put it into the MetricCollection. Is this the right approach?
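To make the first two proposed mappings concrete, here is a minimal sketch of the percentage arithmetic they imply. The function names and the flat dictionary keyed by property path are assumptions for illustration only; whether these Redfish properties are appropriate for BMC metrics is exactly the open question above.

```python
def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 1)


def candidate_properties(mem_available_kb, mem_total_kb, free_blocks, total_blocks):
    """Map raw BMC readings onto the candidate properties named in the post.

    The keys are the property paths under discussion, not a confirmed payload.
    """
    return {
        # MemAvailable divided by total BMC memory
        "MemoryMetrics.HealthData.RemainingSpareBlockPercentage": percent(
            mem_available_kb, mem_total_kb
        ),
        # free blocks from statvfs() divided by the total block count
        "Volume.RemainingCapacityPercent": percent(free_blocks, total_blocks),
    }
```

For example, `candidate_properties(250, 1000, 30, 120)` yields 25.0 for both entries.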
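Then the questions on which URIs to use. There appear to be two main options, the main difference being whether "bmc" should be a parent or a child of the various types of metrics:
1. Put all the nodes related to the BMC under the BMC's node, so that we have: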
/redfish/v1/Managers/bmc/MemorySummary/MemoryMetrics/HealthData/RemainingSpareBlockPercentage
/redfish/v1/Managers/bmc/Storage/StorageID/Volumes/VolumeID/RemainingCapacityPercent
/redfish/v1/Managers/bmc/I2C/stats (?)
/redfish/v1/Managers/bmc/Boot/BootTime
/redfish/v1/Managers/bmc/MemoryMetrics/CurrentPeriod/{Correctable,Uncorrectable}ECCErrorCount
/redfish/v1/Managers/bmc/TelemetryService/MetricReports/*
In this way, "bmc" becomes the parent node of the above resources.
2. Each node stays under its original node, while the BMC is listed as a separate entity under each node's associated URI:
/redfish/v1/Systems/bmc/MemorySummary/MemoryMetrics/HealthData/RemainingSpareBlockPercentage
/redfish/v1/Storage/bmc/Storage/Volumes/VolumeID/RemainingCapacityPercent
/redfish/v1/I2CStats/bmc/stats
/redfish/v1/Systems/bmc/Boot/BootTime
/redfish/v1/Systems/bmc/MemoryMetrics/CurrentPeriod/{Correctable,Uncorrectable}ECCErrorCount
/redfish/v1/TelemetryService/MetricReports/*
In this way, "bmc" is used as a System ID, a Storage ID, and an I2C ID (if we imagine the I2C schema would follow existing metrics). The TelemetryService that specifically exports the BMC's process list will not be moved under the /Managers/bmc node.
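The two layouts can be put side by side with a small sketch that builds the URIs from the paths listed above. The function names and dictionary keys are hypothetical, and the I2C paths are the speculative ones from the post, since no I2C schema exists yet.

```python
ROOT = "/redfish/v1"


def layout_parent(bmc_id="bmc"):
    """Option 1: every metric lives under the BMC's Manager node."""
    base = f"{ROOT}/Managers/{bmc_id}"
    return {
        "memory": f"{base}/MemorySummary/MemoryMetrics/HealthData/RemainingSpareBlockPercentage",
        "storage": f"{base}/Storage/StorageID/Volumes/VolumeID/RemainingCapacityPercent",
        "i2c": f"{base}/I2C/stats",  # speculative path
        "boot": f"{base}/Boot/BootTime",
        "telemetry": f"{base}/TelemetryService/MetricReports",
    }


def layout_child(bmc_id="bmc"):
    """Option 2: the BMC id appears as a member under each existing collection."""
    return {
        "memory": f"{ROOT}/Systems/{bmc_id}/MemorySummary/MemoryMetrics/HealthData/RemainingSpareBlockPercentage",
        "storage": f"{ROOT}/Storage/{bmc_id}/Storage/Volumes/VolumeID/RemainingCapacityPercent",
        "i2c": f"{ROOT}/I2CStats/{bmc_id}/stats",  # speculative path
        "boot": f"{ROOT}/Systems/{bmc_id}/Boot/BootTime",
        "telemetry": f"{ROOT}/TelemetryService/MetricReports",  # not BMC-scoped
    }
```

In option 1 every URI shares the `/redfish/v1/Managers/bmc` prefix, whereas in option 2 the BMC id is spread across several top-level collections and the telemetry reports carry no BMC id at all.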
Which of the two layouts above should we use?
Thanks; any advice is greatly appreciated!