Monitoring with MRTG

 

Using MRTG to monitor NTP

I suggest first installing MRTG and becoming familiar with how to install and configure it on your own system.  You can then extend the MRTG configuration to include timekeeping monitoring and a whole lot more.  The following scripts would typically live in your C:\mrtg\bin\ directory.  I have also used MRTG for computer performance measurement, and for monitoring the signal level and error rate of a satellite data service.
 

Using MRTG on Windows Vista and Windows-7

There are a couple of steps which are needed on Windows-7 (and Windows Vista) which may not be needed on earlier versions of Windows.  This is because software which may not be needed by many users is not installed by default on Windows-7.  You need to:

  • Add the Windows SNMP component (in the Management and Monitoring tools)
  • Add access to the SNMP data for the "public" community
  • Allow read-only access to the SNMP data from any host

These steps are described here.
 

Using fixed scaling and bipolar display

Version for Internet-synced sources, displays +/- 100ms

You need a command-line program to extract the output from an NTP query command.  As MRTG requires Perl, I wrote this program in Perl as well:

File: GetNTP.pl
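# Expects node name as a parameter
# Returns the offset in milliseconds, with 100 added
$ntp_str = `ntpq -c rv $ARGV[0]`;       # execute "ntpq -c rv <node>"
$val = (split(/\,/,$ntp_str))[20];      # get the offset string
$val =~ s/offset=//i;                   # remove the "offset="
$val = int ($val + 100);                # -100..+100 ms => 0..200
if ($val < 0) {                         # limit negative value to 0
  $val = 0;
}
print "0\n";                            # four lines for MRTG: first value (unused)
print "$val\n";                         # second value: the shifted offset
print "0\n";                            # uptime (unused)
print "0\n";                            # target name (unused)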

To test whether this is working, get a command prompt, CD to C:\mrtg\bin\, and enter the command:

perl  GetNTP.pl  pc-name

pc-name can be this PC, or another one on your network.  You would expect to get back a four-line response:

0
109
0
0
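
The script above picks out the offset by its position (the 21st comma-separated field) in the ntpq reply. That position may differ between ntp versions, so a variant that looks for the field by name could be more robust. The following is only a minimal sketch under that assumption: the helper name extract_offset and the sample text are made up for illustration, and a real script would feed it the output of ntpq -c rv instead.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the "offset=" value (milliseconds) out of the
# text returned by "ntpq -c rv <host>", wherever it appears in the list.
sub extract_offset {
    my ($rv_text) = @_;
    return $1 if $rv_text =~ /\boffset=\s*([-+]?\d+(?:\.\d+)?)/i;
    return;    # nothing found, e.g. ntpq got no reply
}

# Made-up sample text for illustration; real output has many more fields.
my $sample = "stratum=2, precision=-20, rootdelay=15.625,\n"
           . "offset=9.437, frequency=-3.214, jitter=1.105";

my $offset = extract_offset($sample);
my $val = defined $offset ? int($offset + 100) : 0;   # same +100 ms shift
$val = 0 if $val < 0;                                 # limit negative value to 0
print "0\n$val\n0\n0\n";                              # the four lines MRTG expects
```

With the sample text above, this prints the same four-line response as shown for GetNTP.pl.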

IPv6 note

If you are running version 4.2.4 or earlier of ntp on Windows Vista, Windows-7 or later, it's possible that you have both IPv4 and IPv6 working on your network, as IPv6 is started automatically on these later operating systems.  I found that under these circumstances the command ntpq <local-pc-name> didn't work, possibly because ntpd wasn't binding to both the IPv4 and IPv6 addresses (this is still under investigation).  To get this to work properly, I found that you could either:

  • use the IPv4 numeric address in the mrtg.cfg Target line
  • add an entry to \etc\hosts for the PC such as:  192.168.0.5  pc-name
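
Another option that may be worth trying, although it has not been tested in the setup described here, is ntpq's -4 switch, which forces host names on the command line to be resolved to IPv4 addresses. The sketch below only builds and prints the command line; the helper name ntpq_command is hypothetical, and pc-name is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: build the ntpq command line with IPv4 forced.
sub ntpq_command {
    my ($host) = @_;
    return "ntpq -4 -c rv $host";
}

my $host = @ARGV ? $ARGV[0] : "pc-name";   # "pc-name" is a placeholder
print ntpq_command($host), "\n";

# In GetNTP.pl the first line would then become:
#   $ntp_str = `ntpq -4 -c rv $ARGV[0]`;
```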
      

Adding a "zero" line in the middle of the graph

By making the MaxBytes and MaxBytes2 values different, you can get MaxBytes plotted as a red dotted line on the graph.  If you make MaxBytes half of MaxBytes2, that line nicely indicates the nominal (zero-offset) value.  So for a 100ms offset from zero, for example, you could change the lines as shown below.
  

Extract from mrtg.cfg
Target[odin_ntp]: `perl GetNTP.pl odin`
MaxBytes[odin_ntp]: 100
MaxBytes2[odin_ntp]: 200
Unscaled[odin_ntp]: dwmy
Timezone[odin_ntp]: GMT
Title[odin_ntp]: NTP statistics for Odin - offset from NTP
Options[odin_ntp]: integer, gauge, nopercent, growright
YLegend[odin_ntp]: offset+100 ms
ShortLegend[odin_ntp]: ms
LegendI[odin_ntp]: 
LegendO[odin_ntp]: offset:&nbsp;
Legend1[odin_ntp]: n/a
Legend2[odin_ntp]: time offset in ms, with 100ms offset added to ensure it's positive!
PageTop[odin_ntp]: <H1>NTP -- PC Odin</H1>

Here is a sample of the output, click on the graph for more examples:

 

 

Version for ref-clock-synced sources, displays +/- 3ms

One oddity here is that when specifying 6000 (µs) as the maximum value for the graph, MRTG seemed to set a value slightly greater than the 6ms I wanted, so I had to set the maximum to 5990.  This had the unfortunate effect that when the offset exceeded 6000, the last value less than 6000 was plotted, rather than the 6000 limit.  Hence I changed the Perl script to limit the positive value it returned to 5985 in an attempt to ensure that values over the limit are displayed as such. 

File: GetNTP3000usec.pl
$ntp_str = `ntpq -c rv $ARGV[0]`;
$val = (split(/\,/,$ntp_str))[20];
$val =~ s/offset=//i;
$val = 1000.0 * $val;    # convert to microseconds
$report = int ($val + 3000);
# limit negative value to 0
if ($report < 0) {
  $report = 0;
}
# limit positive value to just under 6000
if ($report > 5985) {
  $report = 5985;
}
print "0\n";
print "$report\n";
print "0\n";
print "$ARGV[0]\n";

Note the MaxBytes and MaxBytes2 values below, for a +/-3ms offset from zero:

File: narvik-ntp-b.inc
#---------------------------------------------------------------
#	PC Narvik - timekeeping
#---------------------------------------------------------------

Target[narvik_ntp-b]: `perl GetNTP3000usec.pl narvik`
MaxBytes[narvik_ntp-b]: 5990
MaxBytes2[narvik_ntp-b]: 3000
Unscaled[narvik_ntp-b]: dwmy
Title[narvik_ntp-b]: NTP statistics for Narvik - offset from UTC
Options[narvik_ntp-b]: integer, gauge, nopercent, growright
YLegend[narvik_ntp-b]: offset + 3ms
kMG[narvik_ntp-b]: ,ms,,,,
ShortLegend[narvik_ntp-b]: µs
LegendI[narvik_ntp-b]: 
LegendO[narvik_ntp-b]: offset + 3000µs:&nbsp;
Legend1[narvik_ntp-b]: n/a
Legend2[narvik_ntp-b]: time offset in µs, with 3000µs offset added to ensure it's positive.
PageTop[narvik_ntp-b]: <H1>NTP -- PC Narvik</H1>

Here is a sample of the output, click on the graph for more examples:

 

 

Version for ref-clock sources, displays +/- 20µs

In February 2006, I added a simple stratum 1 server, and added a different version of the Perl script to cover the more limited range of +/-20 microseconds (displayed as 0..40µs).  By July 2006, the GPS was failing more often (tree leaf growth?), so I modified the script to limit on both positive and negative excursions (as without the GPS the server could be hundreds of microseconds out).  

File: GetNTP20microseconds.pl
$ntp_str = `ntpq -c rv $ARGV[0]`;
$val = (split(/\,/,$ntp_str))[20];
$val =~ s/offset=//i;
$val = 1000.0 * $val;        # convert to microseconds
$report = int ($val + 20);
if ($report < 0) {
  $report = 0;
}
if ($report > 40) {
  $report = 40;
}
print "0\n";
print "$report\n";
print "0\n";
print "0\n";

This script was later modified for rather less accurate Windows-based ref-clock systems, so that a swing of +/- 500µs could be displayed on a scale of 0..1000µs.
  
File: GetNTP500microseconds.pl
$ntp_str = `ntpq -c rv $ARGV[0]`;
$val = (split(/\,/,$ntp_str))[20];
$val =~ s/offset=//i;
$val = 1000.0 * $val;           # convert to microseconds
$report = int ($val + 500);     #  -500..+500 microseconds => 0..1000
if ($report < 0) {
  $report = 0;
}
if ($report > 1000) {
  $report = 1000;
}
print "0\n";
print "$report\n";
print "0\n";
print "0\n";
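
The fixed-scale scripts above all follow the same pattern: convert the offset to the reporting unit, add half the display range so that zero offset plots at mid-scale, and limit the result to the range of the graph. As a sketch only, the arithmetic could be collected into one routine. The name scale_offset and its parameters are hypothetical, and unlike GetNTP.pl it always applies an upper limit.

```perl
use strict;
use warnings;

# Hypothetical helper combining the scripts above: shift an offset so that
# zero lands mid-scale, then clamp it to the range of the graph.
#   $offset_ms     offset as reported by ntpq, in milliseconds
#   $units_per_ms  1 to report milliseconds, 1000 to report microseconds
#   $half_range    value added so that zero offset plots at mid-scale
#   $cap           largest value to report (defaults to 2 * $half_range)
sub scale_offset {
    my ($offset_ms, $units_per_ms, $half_range, $cap) = @_;
    $cap = 2 * $half_range unless defined $cap;
    my $report = int($offset_ms * $units_per_ms + $half_range);
    $report = 0    if $report < 0;      # limit negative value to 0
    $report = $cap if $report > $cap;   # limit positive value to the cap
    return $report;
}

print scale_offset( 9.437,     1,  100), "\n";        # as GetNTP.pl, plus an upper limit
print scale_offset( 4.2,    1000, 3000, 5985), "\n";  # as GetNTP3000usec.pl
print scale_offset(-0.0155, 1000,   20), "\n";        # as GetNTP20microseconds.pl
print scale_offset(-0.75,   1000,  500), "\n";        # as GetNTP500microseconds.pl
```

The optional fourth parameter is there for the 5985 limit used with the +/- 3ms graph, where the cap has to sit just below the MaxBytes value rather than at twice the half-range.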


 

A warm CPU affects timekeeping

Here's an interesting result - a PC which is normally fairly lightly loaded runs a particular job once a week.  During the job, the CPU is used intensively, and jumps up from a few percent to almost 100% usage.  The CPU gets hot and warms the interior of the PC, and hence the clock crystal, so NTP starts to compensate for the warming by changing the system clock divider.  While the rate is changing to accommodate the new crystal frequency, there is an offset as a result.  This is quite neatly captured in the graphs below.  Note that the offset may exceed 3.0 milliseconds - it's clipped for presentation purposes.

.. and here's another view from my NTP Plotter program showing how offset is related to the rate of change of frequency:

.. and another view, this time from the Meinberg NTP Time Server Monitor program:

  

New method with automatic scaling

An alternative first suggested by John Say is to plot positive and negative offsets as two separate graphs.  Although John didn't use this, it would allow automatic scaling rather than the fixed scaling of the earlier approach.  I have based the suggested Perl script and MRTG configuration file on John's approach, and I'm using microseconds rather than milliseconds as it suits my systems better (although the prospect of seeing kµsec rather than milliseconds is rather offensive!)  These files are my first attempt, and will likely be revised in the light of experience.

File: GetNTPoffset.pl
# Expects node name as a parameter
# Returns 1st value for negative offsets, 2nd value for positive
# Returns microseconds of offset
$ntp_str = `ntpq -c rv $ARGV[0]`;       # execute "ntpq -c rv <node>"
$val = (split(/\,/,$ntp_str))[20];      # get the offset string
$val =~ s/offset=//i;                   # remove the "offset="
$val = int (1000 * $val);               # convert to microseconds
$nval = $val;                           # prepare the negative value
if ($val < 0) {
  $nval = -$nval;                       # make the value positive
  $val = 0;                             # ensure zero return for the positive
} else {
  $nval = 0;                            # ensure zero return for the negative
}
print "$nval\n";                        # return four lines, incoming (negative)
print "$val\n";                         # outgoing (positive)
print "0\n";                            # uptime (unused)
print "$ARGV[0]\n";                     # node name
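
To see what the splitting step does with particular offsets, here it is as a small function that can be tried without a running ntpd. This is a sketch only: the name split_offset and the sample offsets are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical function form of the splitting step in GetNTPoffset.pl.
# Takes the offset in milliseconds, returns (negative part, positive part)
# in whole microseconds; at most one of the two is non-zero.
sub split_offset {
    my ($offset_ms) = @_;
    my $us = int(1000 * $offset_ms);    # int() truncates towards zero
    return $us < 0 ? (-$us, 0) : (0, $us);
}

for my $offset_ms (0.25, -1.5, -0.0004) {
    my ($neg, $pos) = split_offset($offset_ms);
    print "$offset_ms ms -> in=$neg out=$pos\n";
}
```

Because int() truncates towards zero, any offset smaller than one microsecond in either direction is reported as zero on both traces.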

File: narvik-ntp-p.inc
#---------------------------------------------------------------
#	PC Narvik - timekeeping
#---------------------------------------------------------------

Target[narvik_ntp-p]: `perl GetNTPoffset.pl narvik`
MaxBytes[narvik_ntp-p]: 100000
Title[narvik_ntp-p]: NTP statistics for Narvik - offset from UTC
Options[narvik_ntp-p]: integer, gauge, nopercent, growright
Colours[narvik_ntp-p]: BLUE#0033FF, RED#FF0000, BLUE#0033FF, RED#FF0000, 
YLegend[narvik_ntp-p]: offset +/- us
ShortLegend[narvik_ntp-p]: µs
LegendI[narvik_ntp-p]: offset µs (-):&nbsp;
LegendO[narvik_ntp-p]: offset µs (+):&nbsp;
Legend1[narvik_ntp-p]: Time offset in µs (-)
Legend2[narvik_ntp-p]: Time offset in µs (+)
PageTop[narvik_ntp-p]: <H1>NTP -- PC Narvik</H1>

Here is a sample of this format of output:

 

Earlier Information - where it all started

Here is my earlier information on the topic - a text file.

 
Copyright © David Taylor, Edinburgh Last modified: 2010 Mar 25 at 14:58