Unusable server idle connection leading to error and redispatch · Issue #2092 · haproxy/haproxy · GitHub




Detailed Description of the Problem

When using an h2 frontend and h2 backend under low load, the proxy uses 1-2 connections to each backend, and these connections can become idle. However, the single remaining idle backend connection appears to get into a state where the underlying TCP connection has closed, but HAProxy keeps trying to use it and never attempts to create a new one, despite every request failing (with flags SC, iirc) or being redispatched. If all backends fall into this state, all requests fail with a 503. The affected backends are not put into the down state, though, as the HTTP check on a secondary port keeps passing, and using observe layer7 does not help. Testing with requests sent directly to the backend from the HAProxy host, it responds fine, so this does not appear to be a connectivity issue (plus the health checks are working).

Below is the stats UI when the proxy is in this state; you can see server_2 is redispatching everything to server_1. It reports 1 idle connection, but the bytes in/out counters do not increase over time while the session and redispatch (Redis) counters are incrementing. Similar values are reported by `show servers conn` on the runtime API.

(Screenshots: stats pages haproxy-1 and haproxy-2)
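For reference, this is roughly how I query the idle-connection state (a sketch; it assumes a stats socket is configured at /var/run/haproxy.sock, which is not shown in the minimal config below):

```
# Ask the runtime API for per-server connection state in be_example;
# the output includes the number of idle connections per thread
echo "show servers conn be_example" | socat stdio /var/run/haproxy.sock
```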

Looking at socket stats with ss, the idle connection to server_2 does not exist, so it seems HAProxy is repeatedly trying to reuse a closed socket:

```
root@hostname:~# nsenter -t $(pidof -s haproxy) -n ss -tpa | grep 4252
ESTAB 0 0 [ipv6 of haproxy]:53446 [ipv6 of server_1]:4252 users:(("haproxy",pid=1474250,fd=33))
```

Observing tcpdump on both sides, I see zero traffic (neither connection attempts nor data) on the main port to server_2, but I can see the health checks on the secondary port.
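The captures were along these lines (a sketch; same netns trick as the ss command above, ports as in the config below):

```
# Application port to server_2: expect zero packets while the bug is active
nsenter -t $(pidof -s haproxy) -n tcpdump -ni any 'port 4252'

# Health-check port: the checks remain visible here
nsenter -t $(pidof -s haproxy) -n tcpdump -ni any 'port 10480'
```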

If I then issue `set server ... state maint` followed by `set server ... state ready` on the runtime API for the affected backend server, the problem resolves.
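Concretely, the workaround looks like this (assuming the same stats socket as above; server_2 is one of the servers from the server-template in the config below):

```
# Force the server into maintenance, dropping its idle connections...
echo "set server be_example/server_2 state maint" | socat stdio /var/run/haproxy.sock

# ...then bring it back; fresh, working connections are made afterwards
echo "set server be_example/server_2 state ready" | socat stdio /var/run/haproxy.sock
```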

Some environment-specific details:

- Running under Docker; HAProxy and the backends are on the same host and network
- The h2 connections are carrying gRPC, which may have very long-lived requests
- Observed in multiple recent versions (at least 2.6 and 2.7)
- Potentially related to switching to IPv6 for connections to the backends, but it may have been happening before. I can test reverting this if you think it may be relevant.

Things I have tried:

- Reducing various associated timeouts (server, http-keep-alive, server-fin, etc.). Even after the keep-alive timeout should have triggered, the connection is not closed.
- Using check with observe layer7; this makes no difference.
- Using redispatch. This works around the failed backend, but to recover the backend it must again be cycled manually; it will not auto-recover.

I have searched for related issues here and on Discourse; the only potentially related topic I could find was this one, which has no solution: https://discourse.haproxy.org/t/frequent-connection-retries-and-timeouts-with-grpc/5689/1

If there are any API commands you would like me to try while it is in this state, please let me know.

Expected Behavior

The failed idle connection should be closed and a new one opened, without having to manually force the server into maintenance.

Steps to Reproduce the Behavior

I have not yet found a way of deterministically triggering this every time, but it happens intermittently when both the backend and HAProxy are starting at the same time. It seems like a race condition: sometimes no servers are affected, often 1 is affected, and sometimes multiple are.

Do you have any idea what may have caused this?

I see the following note on the HAProxy site:

> pool-purge-delay: Frequency at which HAProxy will close idle and unused connections. Only half of the idle connections will be closed after this period of time. The default is 5 seconds.

If there is only a single idle connection, does 'half' of 1 mean none get closed?
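For clarity, this is the knob I am asking about (a sketch only; I have not tried it, and 500ms is just an illustrative value, not a recommendation):

```
global
    # Close idle/unused backend connections more often than the default 5s;
    # only half of the idle pool is closed at each interval
    pool-purge-delay 500ms
```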

Alternatively, is the use of srvtcpka preventing the connection from ever looking idle?
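If that theory is plausible, a simple test might be to disable the server-side keepalive lines in the defaults section (these are the lines from my config below, shown commented out):

```
defaults
    # Disabled for testing the srvtcpka theory
    #option srvtcpka
    #srvtcpka-idle 60s
    #srvtcpka-intvl 60s
```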

Do you have an idea how to solve the issue?

No response

What is your configuration?

Minimal config of the affected frontend/backend:

```
global
    maxconn 1000
    log stdout local0

defaults
    log global
    mode http
    maxconn 1000
    option contstats
    option log-separate-errors
    option redispatch

    # These values have been experimented with extensively and cannot resolve the problem
    timeout connect 10s
    timeout client 1d
    timeout server 1d
    timeout http-request 5s
    timeout http-keep-alive 30s
    timeout client-fin 10s
    timeout server-fin 10s
    option clitcpka
    clitcpka-idle 60s
    clitcpka-intvl 60s
    option srvtcpka
    srvtcpka-idle 60s
    srvtcpka-intvl 60s

resolvers dns_resolver
    nameserver dns_1 127.0.0.11:53
    accepted_payload_size 8192
    hold valid 2s

frontend fe_example
    bind :::42520 v6only proto h2
    option httplog
    option logasap
    option forwardfor
    default_backend be_example

backend be_example
    balance leastconn
    option httpchk
    http-check connect port 10480
    http-check send meth GET uri / ver HTTP/1.1
    http-check expect status 200
    default-server proto h2 check observe layer7 resolvers dns_resolver init-addr none inter 3s fastinter 1s fall 3 rise 2 on-marked-down shutdown-sessions
    server-template server_ 5 the_backend_dns_name:4252
```

Output of `haproxy -vv`

```
HAProxy version 2.7.5-8d23021 2023/03/17 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2024.
Known bugs: http://www.haproxy.org/bugs/bugs-2.7.5.html
Running on: Linux 5.10.0-21-cloud-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -PTHREAD_EMULATION -QUIC +RT +SHM_OPEN +SLZ -STATIC_PCRE -STATIC_PCRE2

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=1).
Built with OpenSSL version : OpenSSL 1.1.1n  15 Mar 2022
Running on OpenSSL version : OpenSSL 1.1.1n  15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.3.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace
```

Last Outputs and Backtraces

No crash

Additional Information

No response


