Monthly Archives: August 2018

Some NTP Issues

This article covers:

NTP status and what it means

NTP stands for Network Time Protocol, a protocol for synchronizing clocks over a network.

When you maintain your own NTP server on Linux, the ntpq command shows the server's current status.

On a Linux NTP client, ntptime shows the current synchronization state, and ntpdate synchronizes the time manually.

ntpdate

Forcing a time sync

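  1. If ntpd is running, stop it first and then sync (ntpdate cannot use the NTP port while ntpd holds it)
service ntp stop
ntpdate time.nist.gov
service ntp start
  2. Or add the -u option, which makes ntpdate send from an unprivileged port so ntpd can stay running
ntpdate -u time.nist.gov
  3. Or set the time by hand
date -u --set='2016-09-20 08:14:17.427319'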

ntpq

ntpq -pn
root@owning:~# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+193.228.143.14  192.36.143.151   2 u  142  256  317  247.201   20.400  15.036
-202.112.10.37   10.3.8.150       5 u  177  256  373   49.110    5.891   1.191
+61.216.153.106  118.163.81.63    3 u  151  256  377   53.737   -4.461   1.382
*120.25.108.11   10.137.53.7      2 u  215  256  377    7.486    7.879   1.613
-91.189.89.198   17.253.34.125    2 u  215  256  377  402.811  -35.993   9.119
root@owning:~#

Meaning of the output

  • remote and refid: remote NTP server, and its NTP server
  • st: stratum of server
    # from https://serverfault.com/questions/277375/ntpdate-d-server-dropped-strata-too-high
    
    NTP increases the stratum for each level in the hierarchy - a NTP server pulling time from a "stratum 1" server would advertise itself as "stratum 2" to its clients.
    
    A stratum value of "16" is reserved for unsynchronized servers meaning that your internal NTP server at 192.168.92.82 thinks not to have a reliable timesource (i.e. not synchronizing to a higher-level stratum server).
    
  • t: type of server (local, unicast, multicast, or broadcast)

  • poll: how frequently to query server (in seconds)
  • when: how long since last poll (in seconds)
  • reach: octal bitmask of success or failure of the last 8 queries (left-shifted, newest in the lowest bit); 377 = 11111111 = all recent queries were successful; 257 = 10101111 = the 4 most recent were successful, the 5th and 7th most recent failed

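    The results of the last eight queries (one query every poll seconds).

    This value is tricky: it is an 8-bit register that is converted to octal and then displayed as a string.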

    00 000 000 -> 0
    00 000 001 -> 1
    00 000 011 -> 3
    00 000 111 -> 7
    00 001 111 -> 17
    00 011 111 -> 37
    00 111 111 -> 77
    01 111 111 -> 177
    11 111 111 -> 377
    
    10 101 111 -> 257
    

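    In testing, after the NTP server is restarted, the remote gets its * once reach becomes 17. That value means the four most recent queries succeeded: the one sent right at startup plus three more, one per poll interval. An example follows.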

    # Third query: reach is 7.
    Wed Aug 15 11:56:02 CST 2018
     remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    202.108.6.95    10.6.63.22       2 u   65   64    7   24.777  -15.577   0.082
    
    # Fourth query: reach is 17.
    Wed Aug 15 11:56:03 CST 2018
     remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *202.108.6.95    10.6.63.22       2 u    -   64   17   24.327  -15.328   0.209
    
  • delay: network round trip time (in milliseconds)
  • offset: difference between local clock and remote clock (in milliseconds)
  • jitter: difference of successive time values from server (high jitter could be due to an unstable clock or, more likely, poor network performance)
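The reach encoding described in the list above can be unpacked mechanically. Here is a minimal Python sketch, assuming the value is passed in as the octal string that ntpq prints; the name decode_reach is made up for illustration and is not part of any NTP tool.

```python
from typing import List


def decode_reach(reach: str) -> List[bool]:
    """Decode the octal reach column of ntpq into per-query results.

    Returns 8 booleans, most recent query first.
    """
    value = int(reach, 8)  # ntpq prints the 8-bit register in octal
    if not 0 <= value <= 0o377:
        raise ValueError("reach out of range: %r" % reach)
    # Bit 0 holds the most recent query, bit 7 the oldest one.
    return [bool((value >> i) & 1) for i in range(8)]


if __name__ == "__main__":
    for sample in ("7", "17", "257", "377"):
        marks = "".join("Y" if ok else "n" for ok in decode_reach(sample))
        print("%3s -> %s (newest first)" % (sample, marks))
```

Reading the result newest first makes it easy to answer the practical question, "did the last few polls succeed?", without converting octal by hand.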

References

What do the symbols before remote mean?

According to the NTP source code, the symbol shown is chosen from the status value returned when the upstream server is queried.

char flash3[] = " x.-+#*o"; /* flash decode for peer status version 3 */

So you have to consult RFC 1305 to learn what each symbol means:


  • ' ' 0, rejected
  • 'x' 1, passed sanity checks (tests 1 through 8 in Section 3.4.3)
  • '.' 2, passed correctness checks (intersection algorithm in Section 4.2.1)
  • '-' 3, passed candidate checks (if limit check implemented)
  • '+' 4, passed outlyer checks (clustering algorithm in Section 4.2.2)
  • '#' 5, current synchronization source; max distance exceeded (if limit check implemented)
  • '*' 6, current synchronization source; max distance okay. Only when the NTP server's remote (upstream) has reached this fully synchronized state can other NTP clients sync time from the server.
  • 'o' 7, reserved
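To connect the flash3 string with the RFC 1305 meanings, here is a small Python sketch that looks up the first character of a peer line. It assumes the line is raw `ntpq -pn` output whose first column is the symbol (a space when there is none); the names FLASH3, MEANINGS and tally are hypothetical, and the wording of the meanings is shortened from the list above.

```python
from typing import Tuple

# Same character order as flash3[] in the ntpq source quoted above;
# the index of a character is the peer selection status code.
FLASH3 = " x.-+#*o"

# Shortened RFC 1305 descriptions, indexed by status code.
MEANINGS = [
    "rejected",
    "passed sanity checks",
    "passed correctness checks",
    "passed candidate checks",
    "passed outlyer checks",
    "current synchronization source; max distance exceeded",
    "current synchronization source; max distance okay",
    "reserved",
]


def tally(line: str) -> Tuple[int, str]:
    """Return (status code, meaning) for one peer line of ntpq output."""
    if not line or line[0] not in FLASH3:
        # Assumption: anything else is not an unmodified ntpq peer line.
        raise ValueError("not an ntpq peer line: %r" % line)
    code = FLASH3.index(line[0])
    return code, MEANINGS[code]


if __name__ == "__main__":
    print(tally("*120.25.108.11   10.137.53.7      2 u  215  256  377"))
    print(tally(" 202.108.6.95    10.6.63.22       2 u   56   64    7"))
```

A script that waits for the server to become usable could poll ntpq and check for status code 6 on any line.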

Reference

from ntp source code

    case MODE_CLIENT:
        if (ISREFCLOCKADR(&srcadr))
            type = 'l'; /* local refclock*/
        else if (SOCK_UNSPEC(&srcadr))
            type = 'p'; /* pool */
        else if (IS_MCAST(&srcadr))
            type = 'a'; /* manycastclient */
        else
            type = 'u'; /* unicast */
        break;

ntptime

[root@node01 ~]# ntptime
ntp_gettime() returns code 5 (ERROR)
  time de9e7743.14f3881c  Thu, May 10 2018 15:46:11.081, (.081841975),
  maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
ntp_adjtime() returns code 5 (ERROR)
  modes 0x0 (),
  offset 0.000 us, frequency -8.823 ppm, interval 1 s,
  maximum error 16000000 us, estimated error 16000000 us,
  status 0x2041 (PLL,UNSYNC,NANO),
  time constant 9, precision 0.001 us, tolerance 500 ppm,
[root@node01 ~]#

If ntpd is running on the system and you set the host time by hand (date -u --set XXX), ntptime may report the error shown above.
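The status word in that output (0x2041) is a bitmask. As a sketch, the Python snippet below decodes it using the STA_* bit values from the Linux timex header; the function names are hypothetical. It reproduces the names ntptime prints in parentheses, and the UNSYNC bit is the one that goes along with the ERROR return code in the output above.

```python
from typing import List

# STA_* status bits as defined in the Linux timex header.
STA_FLAGS = [
    (0x0001, "PLL"),
    (0x0002, "PPSFREQ"),
    (0x0004, "PPSTIME"),
    (0x0008, "FLL"),
    (0x0010, "INS"),
    (0x0020, "DEL"),
    (0x0040, "UNSYNC"),
    (0x0080, "FREQHOLD"),
    (0x0100, "PPSSIGNAL"),
    (0x0200, "PPSJITTER"),
    (0x0400, "PPSWANDER"),
    (0x0800, "PPSERROR"),
    (0x1000, "CLOCKERR"),
    (0x2000, "NANO"),
    (0x4000, "MODE"),
    (0x8000, "CLK"),
]


def decode_status(status: int) -> List[str]:
    """List the flag names set in an ntptime/adjtimex status word."""
    return [name for bit, name in STA_FLAGS if status & bit]


def is_synchronized(status: int) -> bool:
    """True when the UNSYNC bit (0x0040) is clear."""
    return not status & 0x0040


if __name__ == "__main__":
    print(decode_status(0x2041), is_synchronized(0x2041))
```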

Miscellaneous

timedatectl

root@owning:~# timedatectl status
      Local time: Thu 2018-05-17 11:59:14 CST
  Universal time: Thu 2018-05-17 03:59:14 UTC
        RTC time: Thu 2018-05-17 03:59:12
        Timezone: Asia/Shanghai (CST, +0800)
     NTP enabled: yes
NTP synchronized: yes
 RTC in local TZ: no
      DST active: n/a
root@owning:~#

Problems and troubleshooting

Problem: ntpdate fails with no server suitable for synchronization found

When the NTP client syncs from the NTP server, it fails with no server suitable for synchronization found.

[root@client ~]# ntpdate -dv -b 192.168.2.61
15 Aug 11:38:35 ntpdate[3123]: ntpdate 4.2.6p5@1.2349-o Fri Jan 26 02:18:05 UTC 2018 (1)
Looking for host 192.168.2.61 and service ntp
host found : bigtable-01
transmit(192.168.2.61)
receive(192.168.2.61)
transmit(192.168.2.61)
receive(192.168.2.61)
transmit(192.168.2.61)
receive(192.168.2.61)
transmit(192.168.2.61)
receive(192.168.2.61)
192.168.2.61: Server dropped: strata too high
server 192.168.2.61, port 123
stratum 16, precision -22, leap 11, trust 000
refid [192.168.2.61], delay 0.02580, dispersion 0.00003
transmitted 4, in filter 4
reference time:    00000000.00000000  Mon, Jan  1 1900  8:05:43.000
originate timestamp: df1e1ebb.39c5652e  Wed, Aug 15 2018 11:38:35.225
transmit timestamp:  df1e1ebb.424aafc4  Wed, Aug 15 2018 11:38:35.258
filter delay:  0.02583  0.02591  0.02580  0.02589
         0.00000  0.00000  0.00000  0.00000
filter offset: -0.03335 -0.03343 -0.03338 -0.03343
         0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00003
offset -0.033386

15 Aug 11:38:35 ntpdate[3123]: no server suitable for synchronization found
[root@client ~]#

In this output the stratum is 16. That is a reserved value which means the NTP server itself is not yet synchronized.

192.168.2.61: Server dropped: strata too high
server 192.168.2.61, port 123
stratum 16, precision -22, leap 11, trust 000
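The check can be automated. Below is a rough Python sketch that pulls stratum and leap out of the `ntpdate -d` status line and flags a server that is not ready. The function name and the "usable" rule are assumptions for illustration (a simplification, not ntpdate's exact logic); the leap values are the standard two-bit leap indicator, where 11 signals an unsynchronized clock.

```python
import re
from typing import Dict, Union

# Meaning of the two-bit NTP leap indicator.
LEAP_MEANING = {
    "00": "no warning",
    "01": "last minute of the day has 61 seconds",
    "10": "last minute of the day has 59 seconds",
    "11": "alarm condition (clock not synchronized)",
}

STATUS_RE = re.compile(r"stratum (\d+), precision (-?\d+), leap ([01]{2})")


def parse_status_line(line: str) -> Dict[str, Union[int, str, bool]]:
    """Parse a 'stratum N, precision P, leap LL, ...' line from ntpdate -d."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("no stratum/leap fields in: %r" % line)
    stratum = int(match.group(1))
    leap = match.group(3)
    return {
        "stratum": stratum,
        "precision": int(match.group(2)),
        "leap": LEAP_MEANING[leap],
        # Simplified rule: strata 1..15 are valid, 16 means unsynchronized.
        "usable": 1 <= stratum <= 15 and leap != "11",
    }


if __name__ == "__main__":
    print(parse_status_line("stratum 16, precision -22, leap 11, trust 000"))
    print(parse_status_line("stratum 3, precision -22, leap 00, trust 000"))
```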

Running ntpq -pn on the NTP server shows the same thing: its time is already close to the upstream's, but it is not fully synchronized yet (there is no * in front of the remote).

# Check on the NTP server
[root@server ~]# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 202.108.6.95    10.6.63.22       2 u   56   64    7   24.749  -23.655   0.091
[root@server ~]#

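After it starts, the NTP server has to wait several poll intervals (1 + 3 queries; the reach field shows how far along it is) before it considers itself synchronized with the upstream.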
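To see why reach passes through 1, 3 and 7 before it shows 17, here is a small Python sketch of the shift register, assuming every query succeeds. update_reach and reach_history are hypothetical helpers for illustration, not ntpd's code.

```python
from typing import Iterable, List


def update_reach(reach: int, success: bool) -> int:
    """Shift the 8-bit reach register left and record the newest query in bit 0."""
    return ((reach << 1) | int(success)) & 0xFF


def reach_history(results: Iterable[bool]) -> List[str]:
    """Return the octal reach value, as ntpq would print it, after each query."""
    reach = 0  # the register starts empty after a restart
    history = []
    for ok in results:
        reach = update_reach(reach, ok)
        history.append(format(reach, "o"))
    return history


if __name__ == "__main__":
    # Four successful queries in a row after a restart.
    print(reach_history([True, True, True, True]))
```

With a poll interval of 64 seconds, the fourth value in that sequence is the 17 seen in the ntpq output once the * appears.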

[root@server ~]# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*202.108.6.95    10.6.63.22       2 u   43   64   17   24.478  -23.552   0.092
[root@server ~]#

Once the NTP server is synchronized, retrying on the NTP client succeeds.

[root@client ~]# ntpdate -dv -b 192.168.2.61
15 Aug 11:51:56 ntpdate[6619]: ntpdate 4.2.6p5@1.2349-o Fri Jan 26 02:18:05 UTC 2018 (1)
Looking for host 192.168.2.61 and service ntp
host found : bigtable-01
transmit(192.168.2.61)
receive(192.168.2.61)
transmit(192.168.2.61)
receive(192.168.2.61)
transmit(192.168.2.61)
receive(192.168.2.61)
transmit(192.168.2.61)
receive(192.168.2.61)
server 192.168.2.61, port 123
stratum 3, precision -22, leap 00, trust 000
refid [192.168.2.61], delay 0.02580, dispersion 0.00003
transmitted 4, in filter 4
reference time:    df1e21ae.f96502d4  Wed, Aug 15 2018 11:51:10.974
originate timestamp: df1e21dc.2d33be99  Wed, Aug 15 2018 11:51:56.176
transmit timestamp:  df1e21dc.28b5ea58  Wed, Aug 15 2018 11:51:56.159
filter delay:  0.02588  0.02592  0.02580  0.02589
         0.00000  0.00000  0.00000  0.00000
filter offset: 0.017501 0.017394 0.017443 0.017392
         0.000000 0.000000 0.000000 0.000000
delay 0.02580, dispersion 0.00003
offset 0.017443

15 Aug 11:51:56 ntpdate[6619]: step time server 192.168.2.61 offset 0.017443 sec
[root@client ~]#

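If network restrictions mean the NTP server simply cannot sync with an upstream, but the internal network still needs an NTP server, you can have the server fall back on its own local clock and act as a somewhat less reliable time source. Add the following to /etc/ntp.conf: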

server 127.127.1.0
fudge 127.127.1.0 stratum 8

Then restart ntpd.

Reference clock type 1 is a computer’s internal clock. It should be
used only if an NTP server should (continue to) serve time when it
(temporarily or permanently) has no real reference clock available,
and should always be fudged to high stratum.

References