DNS resolving issue

1 post / 0 new
#1 Sun, 2017-02-19 13:43
moshemorad
  • moshemorad's picture
  • Offline
  • Last seen: 9 months 1 week ago
  • Joined: 2017-02-19

Hi,

We are running alpine (3.4) in a docker container over a Kubernetes cluster (GCP).

We have been seeing some anomalies where our thread is stuck for 2.5 sec.
After some research using strace we saw that DNS resolving gets timed-out once in a while.

Here are some examples:

23:18:27 recvfrom(5, "\f\361\201\203\0\1\0\0\0\1\0\0\2db\6devone\5*****\3net\3svc\7cluster\5local\0\0\1\0\1\7cluster\5local\0\0\6\0\1\0\0\0<\0D\2ns\3dns\7cluster\5local\0\nhostmaster\7cluster\5local\0X\243\213\360\0\0p\200\0\0\34 \0\t:\200\0\0\0<", 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.3.240.10")}, [16]) = 148 <0.000045>
23:18:27 recvfrom(5, 0x7ffdd0e1fb90, 512, 0, 0x7ffdd0e1f640, 0x7ffdd0e1f61c) = -1 EAGAIN (Resource temporarily unavailable) <0.000014>
23:18:27 clock_gettime(CLOCK_REALTIME, {1487114307, 714908396}) = 0 <0.000015>
23:18:27 poll([{fd=5, events=POLLIN}], 1, 2499) = 0 (Timeout) <2.502024>

09:04:27 recvfrom(5<UDP:[0.0.0.0:36148]>, "\354\211\201\203\0\1\0\0\0\1\0\0\2db\6devone\5*****\3net\3svc\7cluster\5local\0\0\1\0\1\7cluster\5local\0\0\6\0\1\0\0\0<\0D\2ns\3dns\7cluster\5local\0\nhostmaster\7cluster\5local\0X\244\30\220\0\0p\200\0\0\34 \0\t:\200\0\0\0<", 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.3.240.10")}, [16]) = 148 <0.000041>
09:04:27 recvfrom(5<UDP:[0.0.0.0:36148]>, 0x7ffec3d9b0b0, 512, 0, 0x7ffec3d9ab60, 0x7ffec3d9ab3c) = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
09:04:27 clock_gettime(CLOCK_REALTIME, {1487149467, 555317749}) = 0 <0.000008>
09:04:27 poll([{fd=5<UDP:[0.0.0.0:36148]>, events=POLLIN}], 1, 2498) = 0 (Timeout) <2.499671>

09:18:47 recvfrom(5<UDP:[0.0.0.0:47282]>, " B\201\200\0\1\0\1\0\0\0\0\2db\6devone\5*****\3net\0\0\1\0\1\300\f\0\1\0\1\0\0\0\200\0\4h\307\16N", 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.3.240.10")}, [16]) = 53 <0.000011>
09:18:47 recvfrom(5<UDP:[0.0.0.0:47282]>, 0x7ffdd0e1fb90, 512, 0, 0x7ffdd0e1f640, 0x7ffdd0e1f61c) = -1 EAGAIN (Resource temporarily unavailable) <0.000008>
09:18:47 clock_gettime(CLOCK_REALTIME, {1487150327, 679292144}) = 0 <0.000005>
09:18:47 poll([{fd=5<UDP:[0.0.0.0:47282]>, events=POLLIN}], 1, 2497) = 0 (Timeout) <2.498797>

And a good example:

08:22:25 recvfrom(5<UDP:[0.0.0.0:59162]>, "\20j\201\203\0\1\0\0\0\1\0\0\2db\6devone\5*****\3net\3svc\7cluster\5local\0\0\34\0\1\7cluster\5local\0\0\6\0\1\0\0\0<\0D\2ns\3dns\7cluster\5local\0\nhostmaster\7cluster\5local\0X\244\n\200\0\0p\200\0\0\34 \0\t:\200\0\0\0<", 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.3.240.10")}, [16]) = 148 <0.000014>
08:22:25 recvfrom(5<UDP:[0.0.0.0:59162]>, 0x7ffec3d9aeb0, 512, 0, 0x7ffec3d9ab60, 0x7ffec3d9ab3c) = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
08:22:25 clock_gettime(CLOCK_REALTIME, {1487146945, 638264715}) = 0 <0.000010>
08:22:25 poll([{fd=5<UDP:[0.0.0.0:59162]>, events=POLLIN}], 1, 2498) = 1 ([{fd=5, revents=POLLIN}]) <0.000010>

In the past we already had some issues with DNS resolving in older an version(3.3), which have been resolved since we moved to 3.4 (or so we thought).

Is this a known issue?
Does anybody have a solution / workaround / suggestion what to do?

Thanks a lot.