Investigating A Host IPMI System Event Log Alarm

Intelligent Platform Management Interface

The Intelligent Platform Management Interface (IPMI) is a set of specifications that define standardised interfaces for remote access to hardware health monitoring data and management functions. The specification was developed by Intel, HP, Dell and NEC but has since been adopted by many more hardware manufacturers. Specifically, IPMI provides:

  • Hardware Monitoring (System Temperatures, Voltages, Fan Speeds / Failures, Power Consumption as well as Power Supply failure notifications)
  • Hardware Control & Recovery (Remote console access as well as remote power management mechanisms to restart, power on or shutdown the hardware)
  • Event Logging (Recording of out of range sensor conditions as well as failed or successful boot sequences and so forth)
  • Hardware Inventory (A list of hardware components detected in a given system and their respective serial or part numbers if accessible)

These functions are not dependent on the host system's CPU(s), BIOS or Operating System. Instead, a separate micro-controller called the Baseboard Management Controller (BMC) is responsible for processing the hardware monitoring information and control messages. As long as the host system has power, the above functions should be available.

The Baseboard Management Controller is either equipped with or has access to a small amount of non-volatile storage. This storage serves as an information repository for some of the above IPMI functions. Typically it contains the System Event Log (SEL), the Sensor Data Record (SDR) Repository and the Field Replaceable Unit (FRU) inventory.

Host IPMI System Event Log Alarm

A VMware ESXi host had triggered the Host IPMI System Event Log Alarm in VMware vCenter. Selecting the host in the vSphere Client and navigating to the Hardware Status tab showed the following:

After expanding the System Event Log item it was apparent that the log was full. To view the contents of the log, issue the following command once connected to the host via SSH:

localcli hardware ipmi sel list

In this case the log was full of messages like these:

Record Id: 1
When: 2015-02-07T12:31:21
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Boot C: boot completed
Sensor Number: 0

Record Id: 7
When: 2015-02-07T13:01:25
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Stop/Shutdown OS graceful shutdown
Sensor Number: 0
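When the log holds dozens of records, it can help to summarise which messages are filling it. The following is a minimal Python sketch, assuming the record layout shown above (one "Key: value" pair per line, with records separated by blank lines). The function name parse_sel_records is hypothetical, and the sample text is simply the two records shown above rather than output read from a live host.

```python
from collections import Counter


def parse_sel_records(text):
    """Split SEL listing text into a list of dicts, one per record.

    Assumes records are separated by blank lines and that each line
    has the form 'Key: value', as in the listing shown above.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record being built, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


SAMPLE = """\
Record Id: 1
When: 2015-02-07T12:31:21
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Boot C: boot completed
Sensor Number: 0

Record Id: 7
When: 2015-02-07T13:01:25
Event Type: 111 (Unknown)
SEL Type: 2 (System Event)
Message: Assert + OS Stop/Shutdown OS graceful shutdown
Sensor Number: 0
"""

records = parse_sel_records(SAMPLE)
message_counts = Counter(record["Message"] for record in records)
for message, count in message_counts.most_common():
    print(count, message)
```

A summary like this makes it easier to judge whether the log filled up with routine boot and shutdown events or with something that needs attention before the log is cleared.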

To clear the log, issue the following command, again whilst connected to the host via SSH:

localcli hardware ipmi sel clear

To view the IPMI System Event Log (SEL) properties, issue the following command. The output also shows when the log was last cleared:

localcli hardware ipmi sel get

Enabled: true
Last Added: 2015-03-09T09:45:38
Last Cleared: 2013-08-26T13:54:40
Maximum Records: 64
Overflow: true
Sel-Clock: 2015-09-18T19:13:22
Total Records: 64
Version: 0x51 (1.5)
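The properties above can also be checked programmatically. The following is a minimal Python sketch, assuming the "Key: value" output format shown above; the function names parse_sel_properties and sel_is_full are hypothetical, and the sample text is copied from the output above rather than captured from a live host.

```python
def parse_sel_properties(text):
    """Parse 'Key: value' lines into a dict (format assumed from the output above)."""
    properties = {}
    for line in text.splitlines():
        # Split at the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            properties[key.strip()] = value.strip()
    return properties


def sel_is_full(properties):
    """Treat the log as full if it has overflowed or reached its maximum size."""
    overflow = properties.get("Overflow", "false").lower() == "true"
    total = int(properties.get("Total Records", 0))
    maximum = int(properties.get("Maximum Records", 0))
    return overflow or (maximum > 0 and total >= maximum)


SAMPLE = """\
Enabled: true
Last Added: 2015-03-09T09:45:38
Last Cleared: 2013-08-26T13:54:40
Maximum Records: 64
Overflow: true
Sel-Clock: 2015-09-18T19:13:22
Total Records: 64
Version: 0x51 (1.5)
"""

properties = parse_sel_properties(SAMPLE)
print("Log full:", sel_is_full(properties))
print("Last cleared:", properties["Last Cleared"])
```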

Having cleared the log the status in vCenter was updated accordingly:

IPMI Alert Cleared


VMware KB: 1033725

Firefox’s Certificate Store

Firefox does not use the built-in Windows Trusted Root Certification Authorities store; instead, it uses its own repository to store certificates. This became apparent after replacing the self-signed certificates used by HP iLO with certificates issued by the Certificate Authority in my Lab. The Lab contains an offline Root CA and an online Intermediate CA, both running Windows Server 2012 R2.

Microsoft Internet Explorer and Google Chrome did not report any certificate issues after importing the Root and Intermediate Certificate Authority certificates into the Trusted Root Certification Authorities store on the Windows 7 workstation. However, Mozilla Firefox continued to display the following warning message:

Firefox Certificate Error

Take the following steps to overcome this issue:

  • Browse to the web interface (certsrv) of the online Certificate Authority using its fully qualified domain name, for example https://FQDN/certsrv
  • When prompted, supply your credentials to login and click OK

  • Click on the link labelled Download a CA certificate, certificate chain, or CRL

Certsrv Home Page

  • From the download page click on the link labelled Download CA certificate chain

Certificate Download Page

  • When prompted, ensure Save file is selected then click OK to download the p7b certificate file

Certificate Download

  • Once the file has downloaded, click on the three black lines in the upper right hand corner of the Firefox window to display the menu. Then select Options

Firefox Options Menu

  • From the Options menu select Advanced. Then click on the Certificates tab and finally click on the View Certificates button

Firefox Advanced Options Menu

  • Select the Authorities tab on the Certificate Manager window and then click on the Import button

Certificate Manager

  • Browse to the location of the p7b file downloaded earlier, then click Open. The certificate should now have been imported successfully into Firefox’s certificate repository. Click OK to close the Certificate Manager window.
  • Browse to the site that had previously displayed the warning message. No further messages should be shown if the correct certificates were imported

Setting up an NTP Server in Ubuntu

Following on from my article on Computer time keeping and the Network Time Protocol, this article outlines the steps required to set up an NTP server in Ubuntu Server 14.04 LTS.

Before following the steps in this article and setting up your own NTP server, it is worth considering which time sources you wish to use. Most Internet Service Providers (ISPs) operate at least one customer accessible NTP server. This should be the closest to your Computer in terms of network hops and is worth considering.

The NTP Pool Project is also worth a look; it offers pools of NTP servers by country. Taking the UK as an example, it currently offers four separate groups of UK-based NTP servers.



From a terminal prompt issue the following command to install ntpd:

sudo apt-get install ntp


Having successfully installed the NTP daemon, the configuration file should be updated to point to three or more time servers. To open the configuration file ready for editing issue the following command:

sudo nano /etc/ntp.conf

Use the cursor keys to scroll down the file until the lines beginning with the word server are visible.


Update the four lines to match the fully qualified domain names or IP addresses of the NTP servers you wish to use, ensuring that the word server remains at the very beginning of each line. To exit nano and save the changes to the ntp.conf file, press the following key combinations:

CTRL+X, then Y when prompted to save the changes, then Enter to confirm the file name
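The same edit can be scripted instead of being made by hand in nano. The following is a minimal Python sketch that works on the text of a configuration file and swaps its server lines for a new list. The function name replace_server_lines and the example.com and example.net host names are hypothetical placeholders, not real time servers, and the sample configuration is an illustrative fragment rather than the full default ntp.conf.

```python
def replace_server_lines(conf_text, servers):
    """Return conf_text with its 'server' lines replaced by one line per entry in servers.

    The new lines are written where the first existing 'server' line was.
    Commented-out lines (for example '#server ...') are left untouched.
    """
    output, inserted = [], False
    for line in conf_text.splitlines():
        if line.split()[:1] == ["server"]:
            if not inserted:
                output.extend("server " + name for name in servers)
                inserted = True
            continue  # drop the old server line
        output.append(line)
    if not inserted:
        # No server lines were present, so append the new ones at the end.
        output.extend("server " + name for name in servers)
    return "\n".join(output) + "\n"


# Illustrative fragment only; the host names are placeholders.
SAMPLE_CONF = """\
driftfile /var/lib/ntp/ntp.drift
server old1.example.net
server old2.example.net
restrict 127.0.0.1
"""

NEW_SERVERS = ["ntp1.example.com", "ntp2.example.com", "ntp3.example.com"]
updated_conf = replace_server_lines(SAMPLE_CONF, NEW_SERVERS)
print(updated_conf)
```

Writing the result back to /etc/ntp.conf would require root privileges, and it is sensible to keep a copy of the original file first.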

The NTP daemon needs to be restarted in order for the changes made to the configuration file to take effect. To do this, issue the following command:

sudo service ntp restart

Checking the synchronisation status

To check the synchronisation status of the new NTP server issue the following command:

ntpq -p

Initially the output of the above command will look similar to this as the server begins communication with the remote NTP servers:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
                                  2 u    1   64    1    9.488    0.480   0.000
                                  2 u    -   64    1   13.091   -0.008   0.000
 ntp1.warwicknet .INIT.          16 u    -   64    0    0.000    0.000   0.000
 ntp3.wirehive.n .INIT.          16 u    -   64    0    0.000    0.000   0.000

Once the initialisation has completed and things have settled down, the output of the ntpq -p command should look more like this. The NTP server marked with an asterisk (*) is the current primary time reference:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*                                 2 u   41   64  377    8.794    0.355   0.162
                                  2 u   35   64  377   12.537   -0.014   0.177
-ntp1.warwicknet                  2 u   41   64  377    9.430    3.731   0.183
+ntp3.wirehive.n                  2 u   33   64  377   12.310    0.145   0.0787

Additional Information

Below is an explanation of what the various columns in the above output of the ntpq -p command relate to:

Tally Code

The leftmost character shown in the output of ntpq -p after initialisation provides an insight into the workings of the clock selection algorithm. A character is set for each peer or server association and can take on one of the following values:

Symbol   Message    Description
(space)  reject     The peer is discarded as unreachable, synchronised to this server (synch loop) or at an outrageous synchronisation distance.
x        falsetick  The peer is discarded by the intersection algorithm as a falseticker.
.        excess     The peer is discarded as not among the first ten peers sorted by synchronisation distance and so is probably a poor candidate for further consideration.
-        outlyer    The peer is discarded by the clustering algorithm as an outlier.
+        candidate  The peer is a survivor and a candidate for the combining algorithm.
#        selected   The peer is a survivor, but not among the first six peers sorted by synchronisation distance. If the association is ephemeral, it may be demobilised to conserve resources.
*        sys.peer   The peer has been declared the system peer and lends its variables to the system variables.
o        pps.peer   The peer has been declared the system peer and lends its variables to the system variables. However, the actual system synchronisation is derived from a pulse-per-second (PPS) signal, either indirectly via the PPS reference clock driver or directly via kernel interface.
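The tally codes lend themselves to a simple lookup when processing ntpq -p output in a script. The following is a minimal Python sketch; the names TALLY_CODES and tally_of are hypothetical, and the sample rows are shortened versions of the rows shown earlier. The caller is assumed to have already skipped the column heading and separator lines.

```python
# Tally code (first character of a peer row) mapped to its message.
TALLY_CODES = {
    " ": "reject",
    "x": "falsetick",
    ".": "excess",
    "-": "outlyer",
    "+": "candidate",
    "#": "selected",
    "*": "sys.peer",
    "o": "pps.peer",
}


def tally_of(row):
    """Return the tally message for one peer row of ntpq -p output.

    Only the first character of the row is inspected. Heading and
    separator lines must be filtered out by the caller.
    """
    if not row:
        raise ValueError("empty row")
    return TALLY_CODES.get(row[0], "unknown")


SAMPLE_ROWS = [
    "-ntp1.warwicknet     2 u   41   64  377    9.430    3.731   0.183",
    "+ntp3.wirehive.n    2 u   33   64  377   12.310    0.145   0.0787",
    " ntp1.warwicknet .INIT.          16 u    -   64    0    0.000    0.000   0.000",
]

for row in SAMPLE_ROWS:
    print(tally_of(row), "|", row.strip())
```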


Remote

  • The FQDN or IP address of the remote peer or server this device is syncing to. If this field displays LOCAL then no other peers or servers could be contacted to synchronise with


Refid

  • The time source the remote peer or server is synchronised to. This field can take on one of the following values per association:

Value       Description
IP address  The IP address of a remote peer or server
.LOCL.      This local host, used when there are no remote peers or servers available
.PPS.       This stands for Pulse Per Second and will be provided by a reference clock such as an Atomic clock
.IRIG.      Inter-Range Instrumentation Group time code
.ACTS.      American NIST time standard telephone modem
.NIST.      American NIST time standard telephone modem
.PTB.       German PTB time standard telephone modem
.USNO.      American USNO time standard telephone modem
.CHU.       CHU (HF, Ottawa, ON, Canada) time standard radio receiver
.DCFa.      DCF77 (LF, Mainflingen, Germany) time standard radio receiver
.HBG.       HBG (LF, Prangins, Switzerland) time standard radio receiver
.JJY.       JJY (LF, Fukushima, Japan) time standard radio receiver
.LORC.      LORAN-C station (MF) time standard radio receiver. Note, no longer operational (superseded by eLORAN)
.MSF.       MSF (LF, Anthorn, Great Britain) time standard radio receiver
.TDF.       TDF (MF, Allouis, France) time standard radio receiver
.WWV.       WWV (HF, Ft. Collins, CO, America) time standard radio receiver
.WWVB.      WWVB (LF, Ft. Collins, CO, America) time standard radio receiver
.WWVH.      WWVH (HF, Kauai, HI, America) time standard radio receiver
.GOES.      American Geosynchronous Orbit Environment Satellite
.GPS.       American GPS
.GAL.       Galileo European GNSS
.ACST.      Manycast server
.AUTH.      Authentication error
.AUTO.      Autokey sequence error
.BCST.      Broadcast server
.CRYPT.     Autokey protocol error
.DENY.      Access denied by server
.INIT.      Association initialised
.MCST.      Multicast server
.RATE.      Polling rate exceeded
.TIME.      Association timeout
.STEP.      Step time change; the offset is less than the panic threshold (1000 s) but greater than the step threshold (128 ms)
.MRS.       Multi Reference Sources – A time source that has access to many different time and frequency references for redundancy


St

  • The stratum of the remote peer or server


T

  • The type of client, server or connection used. Possible values include:

Value  Description
u      Unicast or manycast client
b      Broadcast or multicast client
l      Local reference clock
s      Symmetric peer
A      Manycast server
B      Broadcast server
M      Multicast server


When

  • The time since the last successful poll, shown in seconds, minutes, hours or days


Poll

  • The polling interval in seconds; this typically ranges between 64 and 1024 seconds


Reach

  • The reach column is used to display the outcome of the last eight transactions between the NTP daemon and a given remote peer or time server. The status (success = 1 or fail = 0) of each transaction is added to an 8-bit left-shifting shift register
  • Each time the NTP daemon sends out a request for a time update, the entire 8-bit register is shifted one bit to the left, with the state of the most recent poll entering from the right
  • This means that unsuccessful requests can be tracked over eight poll intervals before the information is overwritten in the shift register to make room for new poll status information
  • The reachability value is displayed in octal. Eight successful polls would produce 1111 1111 in binary, which is 255 in decimal and 377 in octal. Below is a table showing the progression of a single failed poll through the 8-bit shift register and the corresponding octal values that may be displayed:

Shift Register (binary)  Octal Value
1111 1110                376
1111 1101                375
1111 1011                373
1111 0111                367
1110 1111                357
1101 1111                337
1011 1111                277
0111 1111                177
1111 1111                377
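The octal values in the table can be decoded mechanically. The following is a minimal Python sketch of that conversion; the function names decode_reach and polls_since_last_failure are hypothetical, and the input is assumed to be the reach value exactly as ntpq -p prints it.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, oldest first, as a list of booleans.

    reach_octal is the reach column as printed by ntpq -p, for example "377".
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits")
    # The leftmost bit is the oldest poll, the rightmost bit the newest.
    return [bit == "1" for bit in format(value, "08b")]


def polls_since_last_failure(reach_octal):
    """Count the successful polls made since the most recent failure.

    Returns None when no failure is recorded in the register.
    """
    newest_first = reversed(decode_reach(reach_octal))
    for age, succeeded in enumerate(newest_first):
        if not succeeded:
            return age
    return None


for reach in ["376", "367", "177", "377"]:
    bits = format(int(reach, 8), "08b")
    print(reach, bits, polls_since_last_failure(reach))
```

A value of 376 therefore means the most recent poll failed, while 177 means the only recorded failure happened seven polls ago and is about to be shifted out of the register.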


Delay

  • Round trip communication delay to the remote peer or server in milliseconds


Offset

  • Mean offset (phase) between the time reported by this local host and the time reported by the remote peer or server (RMS, milliseconds)


Jitter

  • Mean deviation (jitter) in the time reported for that remote peer or server (RMS of difference of multiple time samples, milliseconds)


Computer Time Keeping and the Network Time Protocol (NTP)

How Computers keep track of the passage of time

Most Computer Operating Systems measure the passage of time using one of the following methods:

  • Tick counting – A hardware device is configured by the Operating System to fire an interrupt at a pre-determined rate, for example 100 times per second. The Operating System then processes these interrupts, called ticks, and by keeping track of the number of ticks in software it can determine how much time has passed
  • Tickless timekeeping – A hardware counter is used to keep a count of the number of time units that have passed since the Computer booted up. The Operating System can then read the value from the counter as required.

Timing Devices

However, not all Computers have the type of hardware counter required for tickless time keeping. Below is a list of different Computer timing devices; the exact functionality provided by each of these devices is outside the scope of this article:

  • Time Stamp Counter (TSC) [1]
  • High Precision Event Timer (HPET) [2]
  • Programmable Interval Timer (PIT) [3]
  • CMOS Real Time Clock (RTC) [4]
  • Advanced Programmable Interrupt Controller (APIC) timers [5]
  • Advanced Configuration and Power Interface (ACPI) timer [6]

Tick counting has disadvantages when compared to Tickless timekeeping: it places an additional burden on the CPU, which must process the interrupts in a timely manner to keep time accurately. In contrast, because Tickless timekeeping uses a separate hardware counter, it usually provides time at a finer granularity and higher precision.

The counter used in Tickless timekeeping must increment at a constant rate and be sufficiently large that it does not overflow and wrap around too often. When a wraparound does occur, it must do so in a way that can be detected and counted by the Operating System.
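To make the wraparound requirement concrete, the following is a minimal Python sketch of the arithmetic involved. The function names are hypothetical, and the counter widths and frequencies used in the examples (a 24-bit counter at 3.579545 MHz and a 64-bit counter at 3 GHz) are illustrative assumptions rather than figures taken from this article.

```python
SECONDS_PER_YEAR = 365.25 * 86_400


def seconds_until_wrap(bits, frequency_hz):
    """Time taken for a free-running counter of the given width to overflow."""
    return (2 ** bits) / frequency_hz


def elapsed_ticks(previous, current, bits):
    """Ticks elapsed between two counter reads, allowing for one wraparound.

    Modular subtraction gives the right answer as long as the counter
    wrapped at most once between the two reads.
    """
    return (current - previous) % (2 ** bits)


# Illustrative figures only: a narrow, slow counter wraps within seconds...
print("24-bit @ 3.579545 MHz:", seconds_until_wrap(24, 3_579_545), "seconds")

# ...while a wide, fast counter takes far longer than any realistic uptime.
years = seconds_until_wrap(64, 3e9) / SECONDS_PER_YEAR
print("64-bit @ 3 GHz:", years, "years")

# A read of 0xFFFFF0 followed by a read of 0x000010 spans one wraparound.
print("Ticks across a wrap:", elapsed_ticks(0xFFFFF0, 0x000010, 24))
```

The narrower the counter, the more frequently the Operating System has to read it in order to be sure that no more than one wraparound has happened between reads.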

In addition to accounting for the passage of time, Operating Systems must also keep track of Wall-clock time [7], also referred to as absolute time. Wall-clock time is generally obtained early on in the Computer’s boot and Operating System start up sequence from the battery backed Real-time clock. If no Real-time clock is available the Computer can query a network time server to obtain the current time. The progression of time is then measured and tracked using one of the methods outlined above.

Clock Drift

The Wall-clock time within physical Computers tends to drift; the time reported by the Operating System may be either ahead of or behind the current time by some margin [8] [9] [10]. This loss of timing accuracy can be attributed to a number of factors:

  • Temperature – Increases or decreases in temperature can affect the rate at which Quartz crystals oscillate causing small variations in CPU Clock frequency. A higher frequency may cause the Wall-clock time to accelerate, a lower frequency may cause the Wall-clock time to decelerate and appear to pass more slowly
  • Dynamic CPU Frequency Scaling – the adjustment of the CPU’s clock frequency in order to either conserve power or reduce the heat generated by the CPU [11]. As with fluctuations in temperature, changes to the CPU’s clock frequency need to be accounted for to keep track of the time accurately
  • CMOS RTC Resolution – Typically the Wall-clock time provided by the Real-time clock on boot is only given to the nearest second, leading to a loss of timing resolution
  • Lost Ticks – Failure to process or acknowledge an interrupt generated by a timing device due to high system load or other factors
  • Clock Frequency Measurement – It is not always possible to determine the exact frequency of a timing device directly in software; this is true for the APIC Timer and Time Stamp Counter. In such situations approximations of the current frequency must be made using lower resolution timing devices which can lead to a loss of timing accuracy
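A small frequency error adds up quickly. The following is a minimal Python sketch of that arithmetic, expressing the error in parts per million (ppm). The function name and the example error figures are assumptions chosen for illustration and are not measurements from this article.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(error_ppm):
    """Seconds gained or lost per day for a clock with the given frequency error.

    A positive error means the clock runs fast, a negative one that it runs slow.
    """
    return error_ppm * SECONDS_PER_DAY / 1_000_000


# Illustrative error figures only.
for ppm in (1, 20, 50, -50):
    print(ppm, "ppm ->", drift_seconds_per_day(ppm), "seconds per day")
```

Under these assumptions an oscillator that is off by 50 ppm drifts by a little over four seconds a day, which is why regular correction against a reference is needed.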

Clock Drift in Virtual Machines

  • Clock drift is often much worse within virtual machines. This is mainly due to the competition for, and scheduling of access to, the underlying hardware resources provided by the physical host [12]
  • The introduction of the hypervisor adds a layer of abstraction and prevents direct access to the physical timing devices within the host. Most hypervisors employ techniques to mitigate this; however, timekeeping inaccuracies may still occur, especially when the physical host is under high CPU load
  • To guard against this it is critical that each and every Virtual Machine (VM) is configured to query and obtain regular time updates from a group of accurate Network Time Protocol (NTP) servers [13]

The Advantages of Accurate and Synchronised Time

Keeping accurate time between computer systems is essential for a multitude of reasons including but not limited to:

  • Log file analysis following a software glitch, hardware failure or network intrusion event. Accurately time stamped logs, if still present, will make it easier to determine the order in which devices or systems failed or were compromised
  • The timely execution of scheduled tasks such as backup operations or data synchronisation events
  • The time stamping and processing of transactions

Network Time Protocol

The Network Time Protocol and associated client and server software provides a method of synchronising the clocks used in computer systems to a reference time source. NTP was originally designed by David L. Mills in 1985 (Original RFC 958) [14]. The most recent revision of NTP is version 4 (RFC 5905) [15]. This version is backwards compatible with version 3 (RFC 1305) [16]. NTP superseded the Time Protocol (RFC 868) [17] [18] and the ICMP Timestamp message (RFC 792) [19].

NTP messages containing timestamps are exchanged between the client and server using the User Datagram Protocol (UDP) [20] as the transport mechanism on port 123. NTP is capable of accuracies of less than a millisecond on Local Area Networks (LANs) and up to a few milliseconds on Wide Area Networks (WANs).
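The timestamp exchange can be illustrated without touching the network. The following is a minimal Python sketch that builds a client request and applies the clock offset and round trip delay formulas given in RFC 5905 to four timestamps. The function names are hypothetical, and the timestamp values are made up to model a client that is 50 ms behind the server with a 10 ms one-way delay in each direction.

```python
import struct


def build_client_request():
    """Build a minimal 48-byte NTP client request.

    The first byte packs the leap indicator (0), version (4) and
    mode (3, client); the remaining 47 bytes are left as zero.
    """
    first_byte = (0 << 6) | (4 << 3) | 3
    return struct.pack("!B47x", first_byte)


def offset_and_delay(t1, t2, t3, t4):
    """Apply the RFC 5905 offset and delay formulas.

    t1: client transmit time   t2: server receive time
    t3: server transmit time   t4: client receive time
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


request = build_client_request()
print(len(request), hex(request[0]))

# Made-up timestamps in seconds: client 50 ms behind, 10 ms each way,
# and 1 ms of processing time on the server.
offset, delay = offset_and_delay(t1=0.000, t2=0.060, t3=0.061, t4=0.021)
print("offset:", offset, "delay:", delay)
```

In a real client the request would be sent to UDP port 123 and the timestamps read from the reply; that step is left out here so that the sketch runs without network access.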

NTP uses an algorithm called the intersection algorithm [21] to construct a list of potential candidate peers that could be used as time synchronisation sources. It then computes a confidence interval for each and drops peers (false tickers) that are deemed to be unreliable time sources. The techniques used in the intersection algorithm were adapted from an earlier algorithm devised by Keith Marzullo [22] [23]. The algorithms used in NTP are able to mitigate the effects of variable network latency.
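The idea behind the interval-based selection can be sketched briefly. The following minimal Python example implements Marzullo's original algorithm, not the refined intersection algorithm that NTP itself uses. The function name marzullo and the sample intervals are illustrative assumptions; each interval stands for one source's estimate plus or minus its error bound.

```python
def marzullo(intervals):
    """Find the interval that agrees with the largest number of sources.

    intervals is a list of (low, high) pairs. Returns a tuple
    (count, low, high), or (0, None, None) for an empty list.
    """
    edges = []
    for low, high in intervals:
        edges.append((low, -1))   # an interval opens here
        edges.append((high, +1))  # an interval closes here
    # At equal offsets, openings (-1) sort before closings (+1).
    edges.sort()

    best = count = 0
    best_low = best_high = None
    for index, (offset, kind) in enumerate(edges):
        count -= kind  # +1 on an opening edge, -1 on a closing edge
        if count > best:
            # count only rises on an opening edge, which is always
            # followed by at least its own closing edge.
            best = count
            best_low = offset
            best_high = edges[index + 1][0]
    return best, best_low, best_high


# Three sources whose intervals all overlap between 11 and 12.
print(marzullo([(8, 12), (11, 13), (10, 12)]))

# The third source disagrees with the other two and is outvoted.
print(marzullo([(8, 12), (11, 13), (14, 15)]))
```

A source whose interval falls outside the best overlap is the kind of peer that would be treated as a false ticker.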

NTP Implementations

Under Linux the client and server NTP implementation is called ntpd and runs as a daemon; it is available for installation in many different Linux distributions [24]. In Microsoft Windows Operating Systems the NTP Client runs as a service and is called W32Time [25].

Time Sources

In NTP, time sources are arranged in a hierarchical structure. Each tier of the hierarchy is known as a stratum, with each stratum being assigned a number, starting at zero for the uppermost tier.

  • Stratum 0 – This tier contains the extremely precise reference clocks which are typically either Caesium or Rubidium atomic clocks, GPS clocks or other radio based clocks. These clocks are directly connected to a computer. The clocks generate a pulse per second which the computer can detect and is used to mark the start of the next second
  • Stratum 1 – This tier contains computers that are directly connected to the reference clocks. Consequently the system clocks within these computers are synchronised to within a few microseconds of the stratum 0 devices. Stratum 1 time servers may peer with other stratum 1 time servers for validation and redundancy
  • Stratum 2 – Time servers in this tier will communicate and synchronise with Stratum 1 time servers over a network link. Stratum 2 time servers should query at least three Stratum 1 servers for redundancy and reliability. Ideally this communication and synchronisation should occur over diverse internet connections. In addition to this Stratum 2 time servers should also peer with at least two other Stratum 2 time servers that query different Stratum 1 time servers
  • Stratum 3 – This tier may contain computers that are synchronised to Stratum 2 time servers. Alternatively they can also act as time servers providing time for Stratum 4 computers. The same peering rules used for Stratum 2 time servers should be applied. This level of fan out may only be required in larger enterprise environments to handle the volume of requests

The diagram below depicts a robust NTP topology with a significant amount of redundancy. This amount of redundancy probably isn’t required for most NTP deployments. The three Stratum 0 reference clocks are connected to three separate computers. The three Stratum 1 computers peer with each other and exchange time with the Stratum 2 computers over a network link. The Stratum 2 computers also peer with each other.

In the event of a hardware failure or connectivity issues between any one Stratum 2 computer and the Stratum 1 computers, the Stratum 2 computer could potentially contact one of its Stratum 2 peers to obtain the time. The Stratum 2 computers provide the time to the Stratum 3 computers.

NTP Topology example

An example of a robust NTP Topology

Coordinated Universal Time

NTP will ensure that a given computer's clock is synchronised to Coordinated Universal Time (UTC) [26] [27]. UTC is an official standard [28] for the computation of time; as such it should not be thought of as a Time Zone.

The UTC time standard is widely used throughout the world. The time in a given country, region or territory can be calculated by adding or subtracting an offset of a certain number of hours and minutes [29]. For example, subtracting 5 hours from UTC gives the local time in New York City while Eastern Standard Time is in effect.
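The offset calculation can be shown with Python's standard datetime module. The following is a minimal sketch using a fixed offset of minus five hours; the date chosen is arbitrary, and a fixed offset does not account for daylight saving time, which a full time zone database would handle.

```python
from datetime import datetime, timedelta, timezone

# An arbitrary winter moment expressed in UTC.
utc_time = datetime(2015, 1, 15, 12, 0, tzinfo=timezone.utc)

# A fixed offset of five hours behind UTC, labelled EST for display.
eastern_standard = timezone(timedelta(hours=-5), "EST")

# Convert the same instant to the offset time zone.
local_time = utc_time.astimezone(eastern_standard)

print(utc_time.isoformat())
print(local_time.isoformat())
```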

Two components are combined in order to determine UTC, namely Universal Time (UT1) and International Atomic Time (TAI).

Universal Time (UT1)

UT1, also referred to as Astronomical Time, is linked to the rotation of the Earth. It is used to determine the actual length of a day on Earth [30]. UT1 and the length of a day are subject to small variations, which can be attributed to a number of factors including:

  • Zonal Tides – The displacement of the Earth’s surface caused by the gravity of the Moon and Sun (smaller than 2.5 ms)
  • Oceanic Tides – The rise and fall of sea levels caused by the combined effects of gravitational forces exerted by the Moon, Sun, and rotation of the Earth (smaller than 0.03 ms)
  • Atmospheric Circulation
  • Internal Effects – Related to the movement of the Earth’s liquid core
  • Angular momentum – The transfer of rotational momentum due to the Moon's orbital motion

International Atomic Time (TAI)

TAI is derived from a few hundred extremely precise atomic clocks housed in time laboratories around the world [31]. The atomic clocks used in such laboratories may only slip by one second over the course of 20 to 300 million years depending on the type of clock used.

One second is defined by the International System of Units (SI) as the duration of exactly 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the Caesium-133 atom. The Atomic clocks used in the time laboratories are specifically designed to detect and count these oscillations.

The time laboratories provide time data from their atomic clocks to the Bureau International des Poids et Mesures (BIPM) [32]. The Time Department within BIPM then combines this time data to form TAI.

Leap Seconds

The pace of TAI is regularly compared to UT1. To compensate for the variations and gradual slowing of the Earth’s rotation, Leap seconds [33] are inserted as required to keep UTC within 0.9 seconds of UT1.

At the time of writing the most recent Leap second was added on the 30th of June 2015. At this point TAI was exactly 36 seconds ahead of UTC. The computation of UTC since 1972 has required the addition of 26 leap seconds; the other 10 seconds were added at the start of 1972 to compensate for an initial discrepancy in timing. At present Leap seconds are added on either 30 June or 31 December as required [34].
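The 36 second figure follows directly from the numbers above. The following is a minimal Python sketch of the arithmetic; the constant and function names are hypothetical, and the conversion is only valid for UTC times after the leap second of 30 June 2015 and before any later leap second.

```python
from datetime import datetime, timedelta

INITIAL_OFFSET_1972 = 10        # seconds applied at the start of 1972
LEAP_SECONDS_SINCE_1972 = 26    # leap seconds added up to 30 June 2015

TAI_MINUS_UTC = timedelta(seconds=INITIAL_OFFSET_1972 + LEAP_SECONDS_SINCE_1972)


def utc_to_tai(utc_time):
    """Convert a UTC time to TAI using the fixed offset described above.

    Only valid between the leap second of 30 June 2015 and the next one.
    """
    return utc_time + TAI_MINUS_UTC


print(TAI_MINUS_UTC.total_seconds())
print(utc_to_tai(datetime(2015, 7, 1, 0, 0, 0)))
```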

[1] Wikipedia: Time Stamp Counter
[2] Wikipedia: High Precision Event Timer
[3] Wikipedia: Programmable Interval Timer
[4] Wikipedia: Real Time Clock
[5] Wikipedia: Advanced Programmable Interrupt Controller
[6] Wikipedia: Advanced Configuration & Power Interface
[7] Wikipedia: Wall Clock Time
[8] Wikipedia: Clock Drift
[9] Journal of Computer Networks and Communications: Internal Clock Drift Estimation in Computer Clusters
[10] NTP.Org FAQ’s: Clock Quality
[11] Wikipedia: Dynamic Frequency Scaling
[12] VMware: Timekeeping in VM’s
[13] Wikipedia: Network Time Protocol
[14] The Internet Engineering Task Force (IETF): Network Time Protocol (NTP) – RFC958 – 1985
[15] The Internet Engineering Task Force (IETF): Network Time Protocol Version 4 – Protocol and Algorithms Specification – RFC5905 – 2010
[16] The Internet Engineering Task Force (IETF): Network Time Protocol Version 3 – Specification, Implementation and Analysis – RFC1305 – 1992
[17] The Internet Engineering Task Force (IETF): Time Protocol – RFC868 – 1983
[18] Wikipedia: Time Protocol
[19] The Internet Engineering Task Force (IETF): Internet Control Message Protocol – RFC792 – 1981
[20] The Internet Engineering Task Force (IETF): User Datagram Protocol – RFC768 – 1980
[21] Wikipedia: Intersection Algorithm
[22] Maintaining the Time in a Distributed System – Keith Marzullo & Susan Owicki – 1983
[23] Wikipedia: Keith Marzullo
[24] NTP Project Information Page
[25] Microsoft: Windows Time Service Tools and Settings
[26] Wikipedia: Coordinated Universal Time (UTC)
[27] Time and Date: About UTC
[28] ITU: ITU-R TF.460-6 – Standard-frequency and time-signal emissions
[29] Wikipedia: List of UTC Time Offsets
[30] International Earth Rotation and Reference Systems Service (IERS): Universal Time (UT1) and Length of Day (LOD)
[31] Wikipedia: International Atomic Time
[32] Bureau International des Poids et Mesures (BIPM): Work Programme – Time
[33] Wikipedia: Leap Second
[34] National Institute of Standards and Technology (NIST): Leap Seconds FAQ’s