Using MRTG to monitor NTP
I suggest first installing MRTG and becoming familiar with configuring it on your own system. You can then extend the MRTG configuration to include timekeeping monitoring and a whole lot more. The following commands would typically live in your C:\mrtg\bin\ directory. I have also used MRTG for computer performance measurement, and for monitoring the signal level and error rate of a satellite data service.
The rest of this note is written assuming you have MRTG
installed and working correctly. It describes how to extract the data NTP
can report (even from remote clients) into a form which MRTG can use, and offers
some plotting suggestions. Finally, a couple of other monitoring
alternatives are mentioned.
A couple of extra steps are needed on Windows Vista and Windows 7 & 8 which may not be needed on earlier versions of Windows. The SNMP service, which many users do not need, is not installed by default, so you must first add that Windows feature and then configure the security settings for the SNMP service.
These steps are also described here.
Likely you will also need to add ntpd.exe to the Windows
Firewall, to allow it to accept incoming connections.
Using fixed scaling and bipolar display
Version for Internet-synced sources, displays +/- 100 ms
Most devices synced to Internet sources should keep their offset within +/- 100 milliseconds. You may well do better! As MRTG cannot plot negative numbers, I chose to plot the offset plus a bias, with a fixed scaling which puts the zero-offset line in the middle of the graph. You need a command-line program to extract the offset from the output of the NTP query command "ntpq -c rv". As MRTG requires Perl, I wrote a simple program, GetNTP.pl, in Perl as well.
To test whether this is working, open a command prompt, change to your MRTG bin directory (e.g. C:\mrtg\bin\), and enter the command:
perl GetNTP.pl pc-name
pc-name can be this PC, or another one on your network. You would expect to get back a four-line response, where "109" is the offset plus 100 milliseconds in the example below:
0
109
0
0
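The Perl script itself is not reproduced here. As a rough illustration only, the same idea can be sketched in Python (not the language of the original GetNTP.pl): take the text returned by "ntpq -c rv", pick out the offset= field (which ntpq reports in milliseconds), add the 100 ms bias, and build the four lines MRTG expects. The function names, the sample text, the clipping to 0..200 and the use of 0 for the unused lines are assumptions of this sketch.

```python
import re
import subprocess

BIAS_MS = 100        # shifts a zero offset to the middle of the graph
FULL_SCALE_MS = 200  # assumed top of the graph: +100 ms after biasing


def parse_offset_ms(rv_text):
    """Return the system offset in milliseconds from 'ntpq -c rv' output."""
    match = re.search(r"\boffset=([-+]?\d+(?:\.\d+)?)", rv_text)
    if match is None:
        raise ValueError("no offset= field in ntpq output")
    return float(match.group(1))


def mrtg_lines(rv_text):
    """Build the four-line reply MRTG reads from an external script."""
    biased = round(parse_offset_ms(rv_text) + BIAS_MS)
    biased = max(0, min(FULL_SCALE_MS, biased))  # assumption: clip to plot range
    # Line 1: first value (unused here), line 2: biased offset,
    # lines 3 and 4: uptime and target name, left as 0 as in the sample.
    return ["0", str(biased), "0", "0"]


def query_host(host):
    """Run ntpq against a host and return its reply text (ntpq must be on the PATH)."""
    result = subprocess.run(["ntpq", "-c", "rv", host],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Made-up fragment of an "rv" reply, used so the sketch runs without ntpq.
SAMPLE_RV = (
    "stratum=2, precision=-20, rootdelay=15.300, rootdisp=30.100,\n"
    "offset=9.123, frequency=-3.456, sys_jitter=0.500\n"
)
print("\n".join(mrtg_lines(SAMPLE_RV)))
```

For a live reading, the sample text would be replaced by query_host("pc-name"), or query_host("127.0.0.1") for the local PC.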
Here's a sample of the output, although this PC keeps rather more accurate time than the +/- 100 ms scale shows. You can click on the graph to see the four time periods which MRTG normally displays:
Here is a sample of the output with a +/- 3 ms scale. Click on the graph for more examples:
If you are running version 4.2.4 or earlier of NTP on Windows Vista, Windows 7/8 or later, it's possible that you have both IPv4 and IPv6 working on your network, as IPv6 is started automatically on these later operating systems. I found that under these circumstances running ntpq against the local PC name didn't work, possibly because ntpd wasn't binding to both the IPv4 and IPv6 addresses (this is still under investigation). To get this to work properly, I found that you could either:
For the local PC, use the form:
Target[odin_ntp]: `perl GetNTP.pl 127.0.0.1`
Adding a "zero" line in the middle of the graph
By making the MaxBytes and MaxBytes2 values different, you can get MaxBytes
plotted as a red dotted line on the graph. If you make MaxBytes half of MaxBytes2,
that line nicely indicates the nominal (zero-offset) value.
So for a 100 ms offset from zero, for example, you would set MaxBytes to 100 and MaxBytes2 to 200.
With Windows PCs synced from a local stratum-1 reference clock, a range of +/- 3 milliseconds is more appropriate than +/- 100 milliseconds. A different Perl script is required to extract the offset data. One oddity here is that when specifying 6000 (µs) as the maximum value for the graph, MRTG seemed to set a value slightly greater than the 6 ms I wanted, so I had to set the maximum to 5990. This had the unfortunate effect that when the offset exceeded 6000, the last value less than 6000 was plotted, rather than the 6000 limit. Hence I changed the Perl script to limit the positive value it returned to 5985 in an attempt to ensure that values over the limit are displayed as such.
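The limiting described above can be sketched as follows. This is a Python illustration rather than the Perl script referred to in the text. The 3000 µs bias is inferred from the +/- 3 ms range, the 5985 ceiling comes from the paragraph above, and the floor at 0 is an assumption made because MRTG cannot plot negative values.

```python
BIAS_US = 3000      # inferred: puts zero offset mid-way on a 0..6000 µs scale
CEILING_US = 5985   # from the text: stay just under the 5990 graph maximum
FLOOR_US = 0        # assumption: MRTG cannot plot negative numbers


def biased_offset_us(offset_ms):
    """Convert an ntpq offset (milliseconds) to a biased, limited µs value."""
    value = round(offset_ms * 1000) + BIAS_US
    return max(FLOOR_US, min(CEILING_US, value))


# A few hand-picked offsets in milliseconds, including two out-of-range ones.
for sample_ms in (0.0, 1.25, -2.5, 4.0, -7.0):
    print(sample_ms, "->", biased_offset_us(sample_ms))
```

Any offset above about +2.985 ms comes back as 5985, so an over-range reading sits at the top of the graph instead of being dropped.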
Note the MaxBytes and MaxBytes2 values for a +/- 3 ms offset from zero, as shown below:
Here is a sample of the output. Click on the graph for more examples:
Version for ref-clock sources, displays +/- 20 µs
In February 2006, I added a simple stratum 1 server, and added a different version of the Perl script to cover the more limited range of +/- 20 microseconds (displayed as 0..40 µs). By July 2006, the GPS was failing more often (tree leaf growth?), so I modified the script to limit both positive and negative excursions (as without the GPS the server could be hundreds of microseconds out).
This script was later modified for rather less accurate
Windows-based ref-clock systems, so that an offset swing of +/- 500 µs could be
displayed on a scale of 0..1000 µs.
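Both ranges can be served by one routine if the half-range is made a parameter. A minimal Python sketch follows (again an illustration, not the author's Perl script; the function name and the choice to return whole microseconds are assumptions). An offset of +/- half_range_us is mapped onto 0..2 x half_range_us, and excursions in either direction are held at the edges.

```python
def to_graph_scale(offset_ms, half_range_us):
    """Map an offset in ms onto 0..2*half_range_us, limiting both directions."""
    offset_us = offset_ms * 1000.0
    limited = max(-half_range_us, min(half_range_us, offset_us))
    return round(limited + half_range_us)


# +/- 20 µs shown as 0..40 µs (GPS-disciplined server)
print(to_graph_scale(0.005, 20))    # a 5 µs offset
print(to_graph_scale(-0.350, 20))   # far outside the range, held at the bottom
# +/- 500 µs shown as 0..1000 µs (Windows-based ref-clock systems)
print(to_graph_scale(0.120, 500))
```

Limiting before adding the bias keeps the two directions symmetric, so a lost GPS signal pins the trace to the top or bottom of the graph rather than producing a negative value.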
Here's a sample of the output:
Here's an interesting result - a PC which is normally fairly lightly loaded runs a particular job once a week. During the job, the CPU is used intensively, and jumps up from a few percent to almost 100% usage. The CPU gets hot and warms the interior of the PC and hence the clock crystal, so NTP starts to compensate for the warming by changing the system clock divider. While the rate is changing to accommodate the new crystal frequency, there is an offset as a result. This is quite neatly captured in the graphs below. Note that the offset may exceed 3.0 milliseconds - it's clipped for presentation purposes.
.. and here's another view from my NTP Plotter program showing how offset is related to the rate of change of frequency,
.. and another view, this time from the Meinberg NTP Time Server Monitor program:
New method with automatic scaling
An alternative first suggested by John Say is to plot positive and negative offsets as two separate graphs. Although John didn't use this, it would allow automatic scaling rather than the fixed scaling of the earlier approach. I have based the suggested Perl script and MRTG configuration file on John's approach, and I'm using microseconds rather than milliseconds as it suits my systems better (although the prospect of seeing kµsec rather than milliseconds is rather offensive!). These files are my first attempt, and will likely be revised in the light of experience.
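The two-trace idea can be sketched as below. This is a Python illustration, not the Perl script mentioned above; which trace carries the positive offset and which the negative is an assumption here, as are the function names and the 0 placeholders for the last two lines. Because neither value is ever negative and no bias is added, MRTG is free to scale the graph automatically.

```python
def split_offset_us(offset_ms):
    """Return (positive_us, negative_us); at most one of them is non-zero."""
    offset_us = round(offset_ms * 1000)
    if offset_us >= 0:
        return offset_us, 0
    return 0, -offset_us


def mrtg_reply(offset_ms):
    """Four lines for MRTG: two values, then uptime and name placeholders."""
    positive_us, negative_us = split_offset_us(offset_ms)
    return "\n".join([str(positive_us), str(negative_us), "0", "0"])


print(mrtg_reply(0.042))   # small positive offset, in milliseconds
print(mrtg_reply(-1.5))    # negative offset, reported as a magnitude
```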
Here is a sample of this format of output:
Earlier Information - where it all started
Here is my earlier information on the topic - a text file.