Opened 7 years ago
Last modified 11 months ago
#1450 new defect
TCP connections can cause process to hang
Reported by: | sjamaan | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | someday |
Component: | core libraries | Version: | 4.13.0 |
Keywords: | tcp, spiffy, sockets, hang, scheduler | Cc: | Jim Ursetto, felix winkelmann, Mario Domenech Goulart |
Estimated difficulty: | hard |
Description
As reported by Jim Ursetto, his Spiffy server (which is running on localhost behind an NGINX proxy) stops responding at some point. It is not yet clear exactly when this happens.
As Jim says:
I believe I’ve tracked down the problem, although not the solution yet. It seems the file descriptor table is filled up with half open sockets. lsof shows:
…
chickadee 13361 jim 1019u sock 0,7 0t0 2365037 can't identify protocol
chickadee 13361 jim 1020u sock 0,7 0t0 2366414 can't identify protocol
chickadee 13361 jim 1021u sock 0,7 0t0 2368047 can't identify protocol
chickadee 13361 jim 1022u sock 0,7 0t0 2368343 can't identify protocol
And this message on Linux seems to occur when sockets are half-open (or half-closed, if you are a pessimist).
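To watch a leak like this build up from inside a process, rather than running lsof against it repeatedly, something like the following could be used. This is a minimal Python sketch, not CHICKEN code. The name `count_socket_fds` and the `max_fd` default of 1024 are assumptions for illustration; the default only mirrors the descriptor numbers in the lsof output above, which approach 1024.

```python
import os
import stat


def count_socket_fds(max_fd=1024):
    """Count descriptors of the current process that refer to sockets.

    max_fd is an assumed upper bound on descriptor numbers to probe.
    """
    count = 0
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            # Descriptor is not open; skip it.
            continue
        if stat.S_ISSOCK(mode):
            count += 1
    return count
```

A count that keeps growing while the number of live connections stays flat would point at the same kind of leak. For a separate process such as the server in this report, lsof remains the tool to use.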
And later:
my hunch is that this happens when the connecting side hangs up while we are still sending. Which is probably obvious from my previous description of the problem. I was going to start attacking it by inserting strategic sleeps at various locations to try and manually trigger it, although you probably have a better way. Obviously you do appear to catch all errors in spiffy, so I’m not sure if one of those error handlers is neglecting to close a socket, if the socket close fails and isn’t reported, or if this is a deeper bug inside the tcp unit.
I hear faint echoes of #340...
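The scenario in Jim's hunch (the peer hangs up while we are still sending) can be reproduced in isolation. The sketch below is in Python rather than CHICKEN and uses a local socket pair instead of a real TCP connection, so it only illustrates the failure mode and the close-on-every-path pattern; it does not show where the tcp unit or Spiffy go wrong. The function name `send_after_peer_hangup` is made up for this example.

```python
import socket


def send_after_peer_hangup(payload=b"x" * 4096):
    """Send on a socket whose peer has already closed its end.

    Returns the name of the exception raised by the send (or None)
    and the socket's descriptor number after cleanup.
    """
    ours, peer = socket.socketpair()
    peer.close()  # the connecting side hangs up
    error = None
    try:
        ours.sendall(payload)
    except (BrokenPipeError, ConnectionResetError) as exc:
        error = type(exc).__name__
    finally:
        # Close on every path, including the error path; skipping
        # this is what would leave a descriptor behind.
        ours.close()
    return error, ours.fileno()
```

After `close()` the socket object reports -1 as its descriptor, which confirms that the error path released it. Over real TCP the first send after a hangup may still succeed and only a later one fails, so a faithful reproduction against Spiffy would need several writes.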
Change History (6)
comment:1 Changed 6 years ago by
comment:2 Changed 6 years ago by
Milestone: | 4.14.0 → 5.1 |
---|
comment:3 Changed 5 years ago by
Milestone: | 5.1 → 5.2 |
---|
Getting ready for 5.1, moving tickets which won't make it in to 5.2.
comment:4 Changed 5 years ago by
Milestone: | 5.2 → 5.3 |
---|
comment:5 Changed 3 years ago by
Milestone: | 5.3 → 5.4 |
---|
I think this is too obscure to be fixed any time soon, shifted to 5.4
comment:6 Changed 11 months ago by
Milestone: | 5.4 → someday |
---|
I’ve investigated this bug a little, and it seems to be triggered when reading from or writing to the socket raises a broken pipe exception. The sockets seem to stay half-open even after the ports have been closed.
Forcing the socket closed with file-close (in spiffy) makes the issue disappear. I’ll have to dig a bit more; I don’t know what the POSIX spec says about the state of sockets in this case.
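As a rough analogue of that workaround, the sketch below closes the raw descriptor directly and bypasses the higher-level close path. It is Python, not the actual Spiffy change, and `force_close` is a hypothetical helper name. It assumes the caller no longer needs the socket object afterwards.

```python
import os
import socket


def force_close(sock):
    """Release the descriptor behind sock at the OS level."""
    # detach() hands over the raw descriptor and marks the socket
    # object as closed; it returns -1 if the socket was already closed.
    fd = sock.detach()
    if fd != -1:
        os.close(fd)
```

Closing at the descriptor level hides the symptom without explaining why the normal close path leaves the socket behind, so this is a stopgap and not a diagnosis.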