Investigating A Host IPMI System Event Log Alarm

Intelligent Platform Management Interface

The Intelligent Platform Management Interface (IPMI) is a set of specifications that provide standardised interfaces for remote access to hardware health monitoring data and management functions. The specification was developed by Intel, HP, Dell and NEC and has since been adopted by many more hardware manufacturers. Specifically, IPMI provides:

  • Hardware Monitoring (System Temperatures, Voltages, Fan Speeds / Failures, Power Consumption as well as Power Supply failure notifications)
  • Hardware Control & Recovery (Remote console access as well as remote power management mechanisms to restart, power on or shut down the hardware)
  • Event Logging (Recording of out-of-range sensor conditions as well as failed or successful boot sequences and so forth)
  • Hardware Inventory (A list of hardware components detected in a given system and their respective serial or part numbers if accessible)

These functions do not depend on the host system's CPU(s), BIOS or operating system. Instead, a separate microcontroller called the Baseboard Management Controller (BMC) processes the hardware monitoring information and control messages. As long as the host system has power, the above functions should be available.

The Baseboard Management Controller is either equipped with or has access to a small amount of non-volatile storage. This storage serves as an information repository for some of the above IPMI functions. Typically it contains the System Event Log (SEL), the Sensor Data Record (SDR) Repository and the Field Replaceable Unit (FRU) inventory.

Host IPMI System Event Log Alarm

A VMware ESXi host had triggered the Host IPMI System Event Log Alarm in VMware vCenter. Selecting the host in the vSphere Client and navigating to the Hardware Status tab showed the following:

After expanding the System Event Log item, it was apparent that the log was full. To view the contents of the log, issue the following command once connected to the host via SSH:

localcli hardware ipmi sel list

In this case the log was full of messages like these:

Record:1:
Record Id: 1
When: 2015-02-07T12:31:21
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Boot C: boot completed
Sensor Number: 0
Raw:
Formatted-Raw:

Record:7:
Record Id: 7
When: 2015-02-07T13:01:25
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Stop/Shutdown OS graceful shutdown
Sensor Number: 0
Raw:
Formatted-Raw:
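With a full log, reading every record by eye gets tedious. The following is a minimal Python sketch that splits the listing into records and counts how often each message occurs. It assumes the output has exactly the layout shown in the excerpt above (a `Record:N:` line followed by `Key: value` lines). The names `parse_sel_list` and `SAMPLE_SEL_LIST` are made up for illustration, and other ESXi builds may format the listing differently.

```python
from collections import Counter

# Sample text copied from the excerpt above; on a real host this would be
# the captured output of "localcli hardware ipmi sel list".
SAMPLE_SEL_LIST = """\
Record:1:
Record Id: 1
When: 2015-02-07T12:31:21
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Boot C: boot completed
Sensor Number: 0
Raw:
Formatted-Raw:

Record:7:
Record Id: 7
When: 2015-02-07T13:01:25
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Stop/Shutdown OS graceful shutdown
Sensor Number: 0
Raw:
Formatted-Raw:
"""


def parse_sel_list(text):
    """Split the listing into one dict per record (hypothetical helper)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Record:"):
            # A "Record:N:" line opens a new record.
            current = {}
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first record
        # Split on the first colon only, so timestamps and messages that
        # contain colons stay intact in the value.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return records


records = parse_sel_list(SAMPLE_SEL_LIST)
message_counts = Counter(record.get("Message", "") for record in records)
for message, count in message_counts.most_common():
    print(count, message)
```

Counting by message makes it easy to see whether the log is dominated by routine boot and shutdown entries, as it was here, or by something that needs attention.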

To clear the log, issue the following command, again whilst connected to the host via SSH:

localcli hardware ipmi sel clear
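Clearing the log discards its records, so it may be worth saving a copy first. The sketch below is one possible way to do that, not a VMware-provided procedure. It assumes a Python 3 interpreter is available wherever the commands are run, and the helper names (`run_command`, `archive_then_clear`) and the archive path are hypothetical. Only the two `localcli` commands shown in this post are used.

```python
import subprocess

# The two commands from this post, as argument lists.
LIST_CMD = ["localcli", "hardware", "ipmi", "sel", "list"]
CLEAR_CMD = ["localcli", "hardware", "ipmi", "sel", "clear"]


def run_command(argv):
    """Run a command and return its standard output as text."""
    return subprocess.check_output(argv, universal_newlines=True)


def archive_then_clear(archive_path, run=run_command):
    """Save the current SEL listing to a file, then clear the log.

    The runner is a parameter so the function can be exercised without
    touching a real host. If listing or writing fails, an exception is
    raised before the clear command is ever issued.
    """
    listing = run(LIST_CMD)
    with open(archive_path, "w") as handle:
        handle.write(listing)
    run(CLEAR_CMD)
    return len(listing)
```

On a host this might be called as `archive_then_clear("/tmp/sel-archive.txt")`, where the path is only an example. Choose a location that survives a reboot if the copy needs to be kept.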

To view the IPMI System Event Log (SEL) properties, issue the following command. Its output also shows when the log was last cleared:

localcli hardware ipmi sel get

IpmiSELConfig:
Enabled: true
Formatted-Raw:
Last Added: 2015-03-09T09:45:38
Last Cleared: 2013-08-26T13:54:40
Maximum Records: 64
Overflow: true
Raw:
Sel-Clock: 2015-09-18T19:13:22
Total Records: 64
Version: 0x51 (1.5)
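In the output above, Total Records equals Maximum Records and Overflow is true, which is what a full log looks like. As a rough illustration, the Python sketch below reads those properties and reports whether the log is full. It assumes the `Key: value` layout shown above, and the names `parse_sel_properties`, `sel_is_full` and `SAMPLE_SEL_GET` are made up for this example.

```python
# Sample text copied from the output above; on a real host this would be
# the captured output of "localcli hardware ipmi sel get".
SAMPLE_SEL_GET = """\
IpmiSELConfig:
Enabled: true
Formatted-Raw:
Last Added: 2015-03-09T09:45:38
Last Cleared: 2013-08-26T13:54:40
Maximum Records: 64
Overflow: true
Raw:
Sel-Clock: 2015-09-18T19:13:22
Total Records: 64
Version: 0x51 (1.5)
"""


def parse_sel_properties(text):
    """Return the non-empty 'Key: value' pairs as a dict."""
    props = {}
    for line in text.splitlines():
        # Split on the first colon only so timestamps keep their colons.
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            props[key.strip()] = value.strip()
    return props


def sel_is_full(props):
    """Treat the log as full if it overflowed or reached its capacity."""
    total = int(props["Total Records"])
    maximum = int(props["Maximum Records"])
    overflow = props.get("Overflow", "false").lower() == "true"
    return overflow or total >= maximum


props = parse_sel_properties(SAMPLE_SEL_GET)
print("Last cleared:", props["Last Cleared"])
print("Log full:", sel_is_full(props))
```

Header-only lines such as `IpmiSELConfig:` and empty fields such as `Raw:` are skipped because they carry no value.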

Once the log had been cleared, the status in vCenter was updated accordingly:

IPMI Alert Cleared

References:

VMware KB: 1033725
