Events

Leap-Second - 2012-June-30

[This section is under development]

I saw no showstopper bugs with either my FreeBSD or Windows systems, although many Linux bugs were reported.

FreeBSD 8.2 PC fed from the Garmin GPS 18 LVC behaved perfectly.
PC Pixie on the graphs, running ntpd 4.2.7p255.
Refclock Windows PCs fed from either Garmin GPS 18/x LVC or Sure Electronics GPS boards appear to have reset some 17 to 24 minutes after 00:00. This suggests that they were using the GPS for the time (as well as for the precise edge), and that the GPS devices took some time to reflect the leap-second in their serial outputs. Perhaps it would have been better had they believed the time coming from other servers rather than the GPS serial time? The reset was followed by a further period of about an hour before full accuracy was restored.
PCs Alta, Bacchus, Feenix and Stamsund on the graphs, all running ntpd 4.2.7p285.
LAN/WAN synced Windows PCs saw no glitch.
PCs Hydra, Molde, Narvik, Torvik and Ystad on the graphs, all running ntpd 4.2.7p285.

Offset graphs (148 KB PNG file)

Zip archive of peerstats, loopstats and event log entries for GPS-synced PC Alta (338 KB Zip file)

Event log entries for PC Alta

Level	Date and Time	Source	Event ID	Task Category	Message
Warning	01/07/2012 01:24:33	NTP	2	None	clock would have gone backward 1 times, max 1000612.1 usec 
Information	01/07/2012 01:24:32	NTP	3	None	HZ 64.102 using 43 msec timer 23.256 Hz 64 deep   
Information	01/07/2012 01:00:00	NTP	3	None	Leap second announcement disarmed

Leap-Second - 2008-December-31

In contrast to earlier events, the leap-second at the end of 2008 went off without a hitch. Here is a screen shot showing December 31 and January 01. As you can see, there was no disruption at the transition from Dec 31 to Jan 01. The analog clock on my main PC took two seconds to go between 00:00:00 and 00:00:01, so my thanks to the person who wrote that part of the Windows NTP port for getting the code right! I left the heating on over the transition, as normally the temperature step at 06:00 in the morning and the gradual cool-down late at night cause a change in the offset. This is particularly noticeable on the GPS-synced PC Pixie because of the accuracy it normally achieves. You can see the morning effect on the extreme left of the Pixie plot.

Leap-second at the end of 2008

At 06:30 UTC on January 31, just one of the servers I referenced indicated that it had not stepped and was therefore a full second ahead. NTP on the client had marked it as a false-ticker, shown with an "x" in the first column. Note from the graphs above that the presence of this one incorrect server did not affect PC Stamsund.

C:\Davids\NTP>ntpq -p stamsund
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*pixie           .PPS.            1 u  401 1024  377    3.461   -0.287   0.155
+dns0.rmplc.co.u 195.66.241.3     2 u  328 1024  377   25.216    1.052   0.703
xmango.haggett.o 66.187.233.4     2 u  276 1024  377   23.728  999.634   0.352
+utserv.mcc.ac.u 193.62.22.98     2 u  289 1024  377   28.272    1.833   1.459
-linnaeus.inf.ed 129.215.64.235   3 u   47 1024  377   33.133    1.648   2.730
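The listing above can also be checked by program. What follows is only a minimal sketch, assuming the ten-column layout shown in that listing and the usual ntpq tally characters in the first column; the names Peer, parse_peers and falsetickers are my own for illustration and are not part of NTP.

```python
from dataclasses import dataclass

# Tally characters that ntpq prints in the first column of each peer line.
TALLY_MEANING = {
    "*": "system peer",
    "+": "candidate",
    "-": "outlier",
    "x": "falseticker",
    " ": "rejected",
}

# The listing from PC Stamsund, copied from the text above.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*pixie           .PPS.            1 u  401 1024  377    3.461   -0.287   0.155
+dns0.rmplc.co.u 195.66.241.3     2 u  328 1024  377   25.216    1.052   0.703
xmango.haggett.o 66.187.233.4     2 u  276 1024  377   23.728  999.634   0.352
+utserv.mcc.ac.u 193.62.22.98     2 u  289 1024  377   28.272    1.833   1.459
-linnaeus.inf.ed 129.215.64.235   3 u   47 1024  377   33.133    1.648   2.730
"""


@dataclass
class Peer:
    tally: str        # first character of the line
    remote: str       # server name as truncated by ntpq
    refid: str
    stratum: int
    offset_ms: float  # offset column, in milliseconds


def parse_peers(text):
    """Turn 'ntpq -p' output into a list of Peer records."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header and the '====' ruler.
        if not line.strip() or line.startswith("="):
            continue
        if line.lstrip().startswith("remote"):
            continue
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # not a peer line in the layout assumed here
        peers.append(Peer(line[0], fields[0], fields[1],
                          int(fields[2]), float(fields[8])))
    return peers


def falsetickers(peers):
    """Return the peers that ntpd has marked with 'x'."""
    return [p for p in peers if p.tally == "x"]


if __name__ == "__main__":
    for p in falsetickers(parse_peers(SAMPLE)):
        print(p.remote, p.offset_ms, TALLY_MEANING[p.tally])
```

Run on the listing above, this picks out the one server whose offset is close to a full second.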

Bill Unruh posted output from his GPS18 LVC on the comp.protocols.time.ntp newsgroup. He writes:

Here is the output from my GPS 18LVC. The most interesting thing is that GPRMC and PGRMF reported the leap second differently.
$GPRMC,235959,A,4915.6384,N,12310.7450,W,000.0,257.9,311208,018.9,E*62
$PGRMF,488,345613,311208,235959,14,4915.6384,N,12310.7450,W,A,2,0,258,2,1*0C
$GPRMC,000000,A,4915.6384,N,12310.7450,W,000.0,257.9,311208,018.9,E*63
$PGRMF,488,345614,311208,235959,15,4915.6384,N,12310.7450,W,A,2,0,258,3,1*0B
$GPRMC,000000,A,4915.6384,N,12310.7450,W,000.0,257.9,010109,018.9,E*63
$PGRMF,488,345615,010109,000000,15,4915.6384,N,12310.7450,W,A,2,0,258,2,1*0A
$GPRMC,000001,A,4915.6384,N,12310.7450,W,000.0,257.9,010109,018.9,E*62
$PGRMF,488,345616,010109,000001,15,4915.6384,N,12310.7450,W,A,2,0,258,3,2*0A
I.e. it repeats 00:00:00 but it also states that the difference between UTC and GPS time went from 14 to 15 on the first 00:00:00. It is also weird that on the first one, the time reported in the GPRMC string is different from that in the PGRMF. In PGRMF it is 23:59:59 that is repeated (which is closer to the UTC definition).
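The PGRMF lines can be cross-checked with a little arithmetic. The sketch below assumes that the first five PGRMF fields are the GPS week, the seconds into the GPS week, the UTC date, the UTC time and the GPS-UTC offset, which is how I read Garmin's proprietary sentence and which agrees with Bill's remark about 14 and 15. The function names are my own. It subtracts the reported offset from the GPS seconds and compares the result with the time of day the receiver printed.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400

# The four PGRMF sentences quoted above.
SENTENCES = [
    "$PGRMF,488,345613,311208,235959,14,4915.6384,N,12310.7450,W,A,2,0,258,2,1*0C",
    "$PGRMF,488,345614,311208,235959,15,4915.6384,N,12310.7450,W,A,2,0,258,3,1*0B",
    "$PGRMF,488,345615,010109,000000,15,4915.6384,N,12310.7450,W,A,2,0,258,2,1*0A",
    "$PGRMF,488,345616,010109,000001,15,4915.6384,N,12310.7450,W,A,2,0,258,3,2*0A",
]


@dataclass
class Pgrmf:
    gps_week: int
    gps_seconds: int   # seconds into the GPS week (assumed field meaning)
    utc_date: str      # ddmmyy as reported
    utc_time: str      # hhmmss as reported
    leap_seconds: int  # GPS-UTC offset as reported


def parse_pgrmf(sentence):
    """Extract the first five data fields of a PGRMF sentence."""
    body = sentence.strip().lstrip("$").split("*")[0]  # drop '$' and checksum
    fields = body.split(",")
    if fields[0] != "PGRMF":
        raise ValueError("not a PGRMF sentence")
    return Pgrmf(int(fields[1]), int(fields[2]), fields[3], fields[4],
                 int(fields[5]))


def derived_utc_time(p):
    """Time of day implied by GPS seconds minus the reported offset."""
    seconds_of_day = (p.gps_seconds - p.leap_seconds) % SECONDS_PER_DAY
    hours, rest = divmod(seconds_of_day, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}{minutes:02d}{seconds:02d}"


if __name__ == "__main__":
    for s in SENTENCES:
        p = parse_pgrmf(s)
        print(p.gps_seconds, p.leap_seconds, p.utc_time, derived_utc_time(p))
```

On this reading the PGRMF time of day equals the GPS seconds minus the offset in all four sentences, so the repeated 23:59:59 follows directly from the offset changing from 14 to 15 one second before midnight.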

Other resources

David Malone has a fascinating Web site at http://www.maths.tcd.ie/~dwmalone/time/leap2008.html where the behaviour of a number of systems (VLF and short-wave radio, NTP and DVB) is shown. This is a must-visit Web site if you are at all interested in leap-second issues.

False Leap-Second - 2006-July-01

On 2006 July 01 I noticed that the timekeeping on two of my PCs had gone wild. The offset graphs (shown below) were similar to ones I had seen before, with the value in the drift file being set to a large positive value (more than 400 ppm) which was grossly incorrect. Stopping NTP, restoring a correct drift value to the file ntp.drift, and restarting NTP cured the problem. I did this at about 07:30 clock time. Noticing that the transient had started at 01:00 clock time (00:00 UTC), I wondered if it had any connection with leap seconds. Sure enough, the Windows Event Log on both problem PCs showed a positive leap second being inserted at 01:00. Arrgh!

The announcement wasn't coming from my own stratum 1 server (at least I hope it wasn't, as not all client PCs were affected). On PC Bacchus, a positive leap second was detected by NTP at 09:10 (clock) on June 06, but the event log does not show which server provided this duff information. The NTP on that PC was restarted on March 04 and was using NTP UK pool servers, plus ntp2c.mcc.ac.uk. On PC Stamsund, a positive leap second was detected by NTP at 09:17 (clock) on June 06. Its servers included UK pool servers, and 130.88.200.6 (utserv.mcc.ac.uk). (Interestingly, those are the two PCs which I didn't touch after the leap-second problems at the start of the year. Coincidence?)

On checking with the command ntpq -c rv <host-name>, neither the utserv.mcc.ac.uk server nor my own simple stratum-1 server was showing a leap-second flag (leap=00). (Thanks to David Malone for the syntax of this command; he checked a number of servers after the 2006 leap-second issue.) Karel Sandler reported: "According to the www.pool.ntp.org there are 57 UK servers today. All these servers (3 S1, 32 S2, 20 S3 and 2 S4) have 'leap 00' (at Jul 1 23:36 UTC) and all those three S1 were OK according to the pool scores. But - one more server (S1) has been there until Jul 1 03 AM UTC. Maybe, don't know."
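For anyone wanting to repeat this check across many servers, the leap value can be pulled out of saved ntpq output. This is only a sketch, assuming the flag appears as leap= followed by two binary digits as in the leap=00 quoted above; the meanings in the table are those of the two-bit leap indicator in the NTP protocol, and the function names are my own.

```python
import re

# Meaning of the two-bit leap indicator carried by NTP.
LEAP_MEANING = {
    "00": "no warning",
    "01": "last minute of the day has 61 seconds (insert a leap second)",
    "10": "last minute of the day has 59 seconds (delete a leap second)",
    "11": "alarm condition (clock not synchronised)",
}


def leap_flag(rv_text):
    """Return the two leap bits found in the text, or None if absent."""
    match = re.search(r"\bleap=([01]{2})\b", rv_text)
    return match.group(1) if match else None


def announces_leap(rv_text):
    """True if the text carries a pending insert or delete announcement."""
    return leap_flag(rv_text) in ("01", "10")


if __name__ == "__main__":
    # Illustrative fragment only; real ntpq output carries many more fields.
    text = "leap=00"
    flag = leap_flag(text)
    print(flag, "-", LEAP_MEANING[flag], "-", announces_leap(text))
```

A server showing leap=01 well away from the end of June or December would be a candidate for the source of a spurious announcement.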

So I am not 100% sure if the July transient was a hangover from the January problems, or if some servers were incorrectly sending out a positive-leap-second-is-due announcement. I will make the following suggestions:

- to the NTP Pool managers: that the servers in the pool should be monitored for spurious leap-second announcements

- to the NTP Developers: that NTP be more robust before acting on the leap-second announcement from a single server.
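To make the second suggestion concrete, here is one possible policy, sketched purely as an illustration. It is not a description of how ntpd behaves: the function name, the server names and the minimum of three sources are all my own assumptions. The idea is that an announcement is acted upon only when more than half of the usable servers agree on it.

```python
from collections import Counter


def accepted_leap(flags, min_sources=3):
    """Decide which leap flag to believe from several servers.

    flags maps a server name to its two leap bits ("00", "01", "10" or
    "11"). Servers in the alarm state ("11") are ignored. An announcement
    is accepted only if a strict majority of the remaining servers agree;
    otherwise the safe answer "00" (no leap second) is returned.
    """
    usable = [f for f in flags.values() if f != "11"]
    if len(usable) < min_sources:
        return "00"  # too few sources to trust any announcement
    flag, votes = Counter(usable).most_common(1)[0]
    return flag if votes * 2 > len(usable) else "00"


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    one_rogue = {"server-a": "01", "server-b": "00", "server-c": "00"}
    agreed = {"server-a": "01", "server-b": "01", "server-c": "00"}
    print(accepted_leap(one_rogue))
    print(accepted_leap(agreed))
```

With a rule like this, a single server sending a false announcement, as may have happened in July 2006, would be outvoted by the others.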

My thanks to those who helped diagnose the problem.

Screen-shot of the July 2006 transient on my PCs


NTP Leap-Second Behaviour - 2005-2006

The end of December 2005 was the first occasion for several years on which a leap second was inserted in the UTC time scale to bring it back into line with the Earth's rotation. The software I use to keep my PCs' time correct, NTP, has full provision for handling leap seconds, and can also control the computer either via the Unix kernel or by the Windows SetSystemTimeAdjustment routine, so that the leap second is handled smoothly. However, it seems that not all external systems were running the current software, or knew about the leap second correctly. I was running two versions of the NTP software for Windows, both NTP 4.2.0b. On Bacchus and Hermes I had Meinberg version 1431, and on Odin and Stamsund Meinberg version 1436, a beta test version which was able to insert the leap-second correctly on Windows, a function which the basic OS lacks.

What happened

Rather than being out on the streets of Edinburgh celebrating the New Year, I watched how the different systems handled the leap-second! It seemed that about half the Internet servers I sync to inserted the leap-second, but about half did not. This confused NTP, as it did not know which of the two clusters of servers was telling the correct time. Like many NTP users, I have no reference time source other than the Internet. This appeared to result in NTP assuming the worst: that the computer's clock rate was way off. It thought that the clock drift was nearly 500 parts per million, the maximum it could be. As a result, the timekeeping was all over the place (although mostly within the 128 ms range NTP allows before it steps the computer clock). I left the computers in this state at around 01:30 UTC.

At 07:40 UTC I returned to see how things were going. Two of the PCs (Bacchus and Stamsund), whilst still having errors greatly in excess of their normal levels, at least had more reasonable (non-limiting) drift values, so I left them to try and sort themselves out. The other two PCs (Hermes and Odin) were still showing grossly incorrect drift values (as seen in the file ntp.drift), and were having gross time errors. Consequently, I decided to stop the NTP client, delete the drift file, and restart the client on those PCs. This resulted in Hermes settling down correctly within a couple of hours, but Odin took somewhat longer (10 hours) to determine a sensible drift value.

Conclusions

It's difficult to know what conclusions to draw, as I am not an NTP expert!

If I had a local reference clock, would the system have behaved differently? Perhaps. Some people did report, though, that their accurate reference clock, which had made the leap-second transition correctly, was ignored: NTP saw many external clocks without the leap second, and discarded the accurate local reference clock.
Did having the newer software which transitioned the OS cleanly on Odin and Stamsund make any difference? Difficult to say, but note that the two systems without the code (Bacchus and Hermes) showed an initial positive offset (they lacked the leap second and were therefore ahead of the new correct time), whereas the two systems with the software (Odin and Stamsund) showed an initial negative offset. So I think that the software did make a difference.
How would the system have behaved if all external reference sources had inserted the leap second correctly? I can't say, but one could only hope it would have been better!
Work is needed by the owners of many NTP servers to ensure that their servers do insert the leap-second notice correctly, and that the servers themselves perform that transition smoothly.
I might have been better off with the beta leap-second compliant software installed on my Internet facing servers rather than the local clients. I hadn't done this because it was beta software! If I had, the Internet-facing servers might have rejected the servers which were in error, and the clients might have been less confused.
PC Bacchus shows an interesting recovery curve - you can see a high rate error from about 04:00-07:00 followed by a reset, then a smaller rate error from 08:00-14:00 and another reset, an even smaller rate error from 16:00-21:00 and another reset. This would appear to be NTP iterating to determine the correct drift value.

My thanks to Martin Burnicki of Meinberg, Germany, for providing the update to NTP for Windows which inserted the leap second. Martin also compared his code against a precision source, and the most interesting results appear here.

Screen-shot of the January 2006 transient on my four PCs

NTP Glitch - December 2005

What happened

Briefly, something happened on 2005 December 02 (probably a software change) which caused NTP on my Windows XP Pro PC to show much more instability than I had seen since the installation of XP Service Pack 2. After much useful discussion on the comp.protocols.time.ntp newsgroup, I determined that running the MultiMedia timer continuously at 1 ms resolution cured the instability. The instability appears to come from the OS making time steps both when changing from regular to MM timer mode and when changing back again. I modified one of my own programs to provide a function to enable the MM Timer, and Martin Burnicki of Meinberg, Germany, provided a version of the Windows NTP software where that function could be enabled while the NTP software was running. I subsequently installed that version of the NTP software on Odin, where it completely cured the problem, and also on a Windows 2000 PC (Stamsund) which had always exhibited instability. It seems that running certain software (something like JavaScript or Flash under Internet Explorer) could set and reset the MM Timer mode, and thereby cause a timekeeping instability.

The start of the problem - 2005, Friday 2nd December

Curing NTP instability on a Windows 2000 system

The instability of this system was completely cured by having the MM Timer running continuously, avoiding the glitches when it was enabled or disabled. The same software restored the Windows XP system to stable timekeeping. The glitch at the end of the graph is the leap-second issue described above. The Windows version of NTP has now been modified to include the -M parameter at startup which enables the MM timer, and thus provides much better timekeeping (on systems where this is a problem).

Typical Drift Values

This is just for my own reference, but I might as well record it here as drift came into the leap-second discussion. The values are in parts per million (ppm), as stored in the file ntp.drift, and were recorded on the dates shown.

PC Name	CPU	Motherboard	OS	2006	Late 2007	31 Dec 2008	23 Mar 2011	16 Jan 2012
Alta	i5-760 quad-core	ASUSTeK P7P55 LX	Win-7/64				+3.344	+3.372
Bacchus	AMD 266			-74.237
Bacchus	Pentium III 550 MHz	Gigabyte Intel 440ZX/BX	Win 2000		-5.629	-5.039	-5.124	-6.846
Feenix	Pentium 4 1.9 GHz	Dell 4400	Win XP			-4.315	-4.891	-5.814
Gemini	AMD-64 X2 4400	ASUS A8N SLI deluxe	Vista SP2	-15.240	3.113	2.085	+3.252	+24.758
Hermes	Pentium III 1 GHz			-95.345	-92.276
Hydra	AMD64 3200+	Compaq SR1619UK	Win-7/64	13.395	12.958	10.435	+4.806	+4.606
Narvik	Intel E6600	Dell 9216	Win XP		11.077	11.289	+12.473	+13.213
Odin				-8.022
Stamsund	Pentium 4 2.8 GHz HT	ASUS P4P800 deluxe	Win-7/32	-11.822	-11.354	-9.136	+3.327	+3.267
Ystad	Atom N455 1.7 GHz HT	Samsung N150P	Win-7/32					+1.452
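To put the figures in the table into perspective, a drift in ppm can be converted into seconds gained or lost per day if left uncorrected. The sketch below is simple arithmetic with function names of my own choosing; it also shows how quickly a clock running at the 500 ppm limit mentioned earlier would accumulate the 128 ms at which NTP steps the clock.

```python
SECONDS_PER_DAY = 86400


def ppm_to_seconds_per_day(ppm):
    """Seconds gained (positive) or lost (negative) per day at this drift."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def seconds_to_accumulate(error_seconds, ppm):
    """Time in seconds for an uncorrected drift to build up a given error."""
    return error_seconds / (abs(ppm) * 1e-6)


if __name__ == "__main__":
    # Hermes in 2006, taken from the table above.
    print(ppm_to_seconds_per_day(-95.345))
    # The 500 ppm limit, and the time it takes to reach 128 ms at that rate.
    print(ppm_to_seconds_per_day(500.0))
    print(seconds_to_accumulate(0.128, 500.0))
```

So a value such as the -95.345 ppm recorded for Hermes corresponds to a little over eight seconds a day, whereas a clock wrongly believed to be 500 ppm out would be driven through the 128 ms limit in a matter of minutes.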

Related Links