Opened 6 years ago

Last modified 6 months ago

#1450 new defect

TCP connections can cause process to hang

Reported by: sjamaan
Owned by:
Priority: major
Milestone: someday
Component: core libraries
Version: 4.13.0
Keywords: tcp, spiffy, sockets, hang, scheduler
Cc: Jim Ursetto, felix winkelmann, Mario Domenech Goulart
Estimated difficulty: hard

Description

As reported by Jim Ursetto, his Spiffy server (which is running on localhost behind an NGINX proxy) will stop responding at some point. It is not yet clear exactly when this happens.

As Jim says:

I believe I’ve tracked down the problem, although not the solution yet. It seems the file descriptor table is filled up with half open sockets. lsof shows:


chickadee 13361 jim 1019u sock 0,7 0t0 2365037 can't identify protocol
chickadee 13361 jim 1020u sock 0,7 0t0 2366414 can't identify protocol
chickadee 13361 jim 1021u sock 0,7 0t0 2368047 can't identify protocol
chickadee 13361 jim 1022u sock 0,7 0t0 2368343 can't identify protocol

And this message on Linux seems to occur when sockets are half-open (or half-closed, if you are a pessimist).

And later:

my hunch is that this happens when the connecting side hangs up while we are still sending. Which is probably obvious from my previous description of the problem. I was going to start attacking it by inserting strategic sleeps at various locations to try and manually trigger it, although you probably have a better way. Obviously you do appear to catch all errors in spiffy, so I’m not sure if one of those error handlers is neglecting to close a socket, if the socket close fails and isn’t reported, or if this is a deeper bug inside the tcp unit.

I hear faint echoes of #340...
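One way to watch for the leak described above without running lsof is to count, from inside the process, how many open descriptors are sockets. The following is a minimal sketch in Python rather than CHICKEN, for illustration only; `count_socket_fds` is a hypothetical helper, and the 65536 cap on the scan is an arbitrary assumption to keep the loop short on systems with very high descriptor limits.

```python
import os
import resource
import socket
import stat


def count_socket_fds(max_fd=None):
    """Count open descriptors in this process that refer to sockets."""
    if max_fd is None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # Assumption: cap the scan; the soft limit may be "infinity" or huge.
        if soft == resource.RLIM_INFINITY or soft > 65536:
            soft = 65536
        max_fd = soft
    count = 0
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # this descriptor number is not open
        if stat.S_ISSOCK(mode):
            count += 1
    return count
```

If this number climbs steadily while the server is handling requests and never drops back, sockets are being abandoned without being closed, which would match the lsof output quoted above.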

Change History (6)

comment:1 Changed 6 years ago by Kooda

I’ve investigated this bug a little, and it seems to be triggered when reading from or writing to the socket raises a broken pipe exception. The sockets seem to stay half-open even after the ports have been closed.

Forcibly closing the socket with file-close (in spiffy) makes the issue disappear.

I’ll have to dig a bit more; I don’t know what the POSIX spec says about the state of sockets in this case.
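The general pattern behind the file-close workaround can be sketched as follows: when a write fails because the peer has hung up, the descriptor still has to be released explicitly, or it stays allocated. This is a Python illustration under stated assumptions, not the code of the tcp unit or of spiffy; `send_response` is a hypothetical name, and a Unix socket pair stands in for the TCP connection so that the broken pipe error occurs on the first write.

```python
import os
import socket


def send_response(conn, payload):
    """Send payload and always release the descriptor, even on error.

    Returns True on success, False if the peer had already hung up.
    """
    try:
        conn.sendall(payload)
        return True
    except (BrokenPipeError, ConnectionResetError):
        return False
    finally:
        # Without this close, the descriptor stays allocated after the error.
        conn.close()


def demo():
    """Simulate a client that disconnects before the server writes."""
    server, client = socket.socketpair()
    client.close()  # the connecting side hangs up first
    return send_response(server, b"HTTP/1.1 200 OK\r\n\r\n")
```

Closing in `finally` rather than only on the success path is the design point: the error handler and the cleanup are separate concerns, so an exception during the write cannot skip the close.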

comment:2 Changed 6 years ago by Kooda

Milestone: 4.14.0 → 5.1

comment:3 Changed 5 years ago by sjamaan

Milestone: 5.1 → 5.2

Getting ready for 5.1, moving tickets which won't make it in to 5.2.

comment:4 Changed 5 years ago by felix winkelmann

Milestone: 5.2 → 5.3

comment:5 Changed 3 years ago by felix winkelmann

Milestone: 5.3 → 5.4

I think this is too obscure to be fixed any time soon; shifted to 5.4.

comment:6 Changed 6 months ago by felix winkelmann

Milestone: 5.4 → someday