Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#1074 closed defect (fixed)

intarweb request parsing and Spiffy handling of said requests is inconsistent in case of improper request line URIs

Reported by: RvdH
Owned by: sjamaan
Priority: major
Milestone: someday
Component: unknown
Version: 4.8.x
Keywords: bad-request connection
Cc: andyjpb@…
Estimated difficulty:

Description

Not sure in which egg the actual bug resides, but sometimes the server closes the connection instead of returning a 400 Bad Request.

You can simulate a Bad Request by starting Spiffy and requesting http://127.0.0.1:8080//. Requesting a double slash returns a 400. (Whether that is a valid 400 or not is open for debate.)

But requesting the following URLs closes the connection, without returning anything:

" is actually %22, so I think the problem lies somewhere with handling %-characters.

Change History (17)

comment:1 Changed 11 years ago by sjamaan

Summary: [intarweb/spiffy] Connection closed on "Bad Request" with %-character → uri-generic: percent-encodings in the authority section cause problems

Congrats on finding a weird corner case which uri-generic doesn't handle :)

Here's one example: (uri-reference "//%20") causes the error: Error: (char-numeric?) bad argument type - not a character: (#\% #\2 #\0)

I'll have a look at fixing it.

Spiffy seems correct, though: it should return a 400, because these weird URIs seem to cause the browser to send invalid request lines like GET //%20/ HTTP/1.0, which violates the production Request-URI = "*" | absoluteURI | abs_path | authority. (This is a protocol-relative URI reference, which is valid as neither an absolute URI, an abs_path, nor an authority.)
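To make that production concrete, here is a hypothetical check built on uri-generic's exported accessors (uri-reference, absolute-uri?, uri-host, uri-path-absolute?); valid-request-uri? itself is made up for illustration and is not Spiffy's actual code:

    ;; A request target is acceptable when it is "*", an absolute URI,
    ;; or an absolute path without an authority part. (The bare
    ;; authority form used by CONNECT is omitted for brevity.)
    (use uri-generic)

    (define (valid-request-uri? str)
      (let ((uri (uri-reference str)))
        (and uri
             (or (equal? str "*")                   ; "*", e.g. for OPTIONS
                 (absolute-uri? uri)                ; absoluteURI
                 (and (not (uri-host uri))          ; abs_path: a path...
                      (uri-path-absolute? uri)))))) ; ...with no authority

    ;; (valid-request-uri? "/index.html") => #t
    ;; (valid-request-uri? "//%20/")      => #f  (protocol-relative)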

You should probably send a bug report to the vendors of the browsers which generate these request lines, too ;)

comment:2 Changed 11 years ago by sjamaan

I've fixed the one problem in uri-generic (trunk), but I'm unsure how to proceed now.

Spiffy still disconnects on 3 of the 5 example requests after this patch. The reason is, like I said, that the browser is generating invalid URIs. (The //%20/ case is a valid URI reference; it's just not a valid reference type to use in this context. All the others really cause the browser to produce invalid URI syntax like /%, /%%22 and //%%20/.)

The problem is that the HTTP protocol version to use is determined from the request line. However, the request line itself can't even be parsed properly (because the URI cannot be parsed), so strictly speaking we're not even speaking the protocol, and we can't signal an in-protocol error (400 Bad Request).

I COULD change the intarweb code to parse, say, METHOD [random cruft] HTTP/1.0 into a request object with no URI, and let Spiffy detect this and return a 400 (since obviously the client is trying to speak HTTP/1.0; it just fails to send a 100% correct request line). However, strictly speaking that's not correct, and all code that uses intarweb would then need to be able to detect this invalid half-parsed state with an incomplete object being returned.
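For illustration, a hypothetical sketch of that first option; parse-request-line and its return shape are made up for this example and are not intarweb's actual API:

    ;; Recognise "METHOD <cruft> HTTP/major.minor" and return #f in the
    ;; URI slot when the target doesn't parse, leaving the caller
    ;; (Spiffy) to answer with a 400.
    (use irregex uri-generic)

    (define (parse-request-line line)
      (let ((m (irregex-match
                '(: (submatch (+ upper)) " "          ; method
                    (submatch (+ (~ #\space))) " "    ; request target
                    "HTTP/" (submatch (+ num)) "." (submatch (+ num)))
                line)))
        (and m
             (list (string->symbol (irregex-match-substring m 1))
                   ;; uri-reference yields #f on unparseable targets
                   (uri-reference (irregex-match-substring m 2))
                   (string->number (irregex-match-substring m 3))
                   (string->number (irregex-match-substring m 4))))))

    ;; (parse-request-line "GET /% HTTP/1.0") => (GET #f 1 0)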

If I don't do that, then for consistency I should probably reject these invalid URIs outright by refusing to recognise them on the request line, so that all the examples above produce a disconnection.

I'm really sitting on the fence here. Both options seem problematic: one is incorrect (sending an HTTP response of "400 Bad Request" to what is essentially a non-HTTP request), and the other is just unfriendly (abruptly disconnecting upon receiving a 99% correct "HTTP request"). What do you think?

comment:3 Changed 11 years ago by sjamaan

Summary: uri-generic: percent-encodings in the authority section cause problems → intarweb request parsing and Spiffy handling of said requests is invalid in case of improper request line URIs

comment:4 Changed 11 years ago by sjamaan

Summary: intarweb request parsing and Spiffy handling of said requests is invalid in case of improper request line URIs → intarweb request parsing and Spiffy handling of said requests is inconsistent in case of improper request line URIs

comment:5 Changed 11 years ago by RvdH

First of all, I think the server should always respond with a status code. Disconnecting is (as you mentioned as well) not really nice.

W3C says (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) the following on 400 Bad Request:

The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.

If the URI is not a valid URI, you could say that is malformed syntax, so returning a 400 is probably correct.

If you do not like the idea of returning a 400, you could instead return a 500 (Internal Server Error) for every exception/error that appears during parsing.

comment:6 Changed 11 years ago by andyjpb

I've been asked for my opinion on this and I've mulled this over somewhat.

I was initially in favour of eagerly trying to return a 400 error (in Spiffy itself) if we could sense that the client might be trying to speak HTTP even though we couldn't parse a valid URL. My reasons for this were twofold: it can be somewhat "expensive" to set up an HTTP connection, especially with SSL, and it's nice to get an error message when things go wrong.

However, errors like that are not the common case so the "performance" advantage doesn't hold up to scrutiny. Neither does trying to send the error message. The protocol only works provided both sides stay in sync. If we think they're speaking HTTP and they're not, either deliberately or due to a mistake on their part, we should finish the connection up as quickly as possible and move on.

Moreover, on another, unrelated (Apache) server, I happen to have recently been seeing lots of requests like this in the access log:

"POST //%63%67%69%2d%62%69%6e/%70%68%70?%2d%64+%61%6c%6c%6f%77%5f%75%72%6c%5f%69%6e%63%6c%75%64%65%3d%6f%6e+%2d%64+%73%61%66%65%5f%6d%6f%64%65%3d%6f%66%66+%2d%64+%73%75%68%6f%73%69%6e%2e%73%69%6d%75%6c%61%74%69%6f%6e%3d%6f%6e+%2d%64+%64%69%73%61%62%6c%65%5f%66%75%6e%63%74%69%6f%6e%73%3d%22%22+%2d%64+%6f%70%65%6e%5f%62%61%73%65%64%69%72%3d%6e%6f%6e%65+%2d%64+%61%75%74%6f%5f%70%72%65%70%65%6e%64%5f%66%69%6c%65%3d%70%68%70%3a%2f%2f%69%6e%70%75%74+%2d%64+%63%67%69%2e%66%6f%72%63%65%5f%72%65%64%69%72%65%63%74%3d%30+%2d%64+%63%67%69%2e%72%65%64%69%72%65%63%74%5f%73%74%61%74%75%73%5f%65%6e%76%3d%30+%2d%64+%61%75%74%6f%5f%70%72%65%70%65%6e%64%5f%66%69%6c%65%3d%70%68%70%3a%2f%2f%69%6e%70%75%74+%2d%6e HTTP/1.1" 404 406

as well as

"-" 408 0 

and

"GET  HTTP/1.1" 400 299

The last request seems to produce the following in the error log:

Invalid URI in request GET  HTTP/1.1

Apache seems to produce a response for each of these (404/408/400) but that doesn't mean that we should.
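As an aside, the percent-encoded POST above is easy to unmask; a small sketch, assuming uri-generic's uri-decode-string:

    (use uri-generic)

    ;; Decode just the start of the request target from the first log line:
    (display (uri-decode-string "//%63%67%69%2d%62%69%6e/%70%68%70"))
    ;; prints: //cgi-bin/php
    ;; The remainder decodes to php-cgi options such as
    ;; -d allow_url_include=on, i.e. an automated exploit probe rather
    ;; than a confused browser.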

This is a known avenue for security exploits, and trying to make the best of the connection just spreads the attack surface further up the stack. Also, in order to provide good, robust support for keep-alive connections, Spiffy is already generally very conservative about what it allows to happen inside a connection whilst still allowing it to stay open upon conclusion of the request.

I think we do actually have a "free" choice here, but it is sensible for us to come down on the side of closing the connection, or we risk causing complications in the future.

comment:7 in reply to:  5 Changed 11 years ago by andyjpb

Replying to RvdH:

First of all, I think the server should always respond with a status code. Disconnecting is (as you mentioned as well) not really nice.

W3C says (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) the following on 400 Bad Request:

The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.

If the URI is not a URI, you could say it is malformed syntax, so returning a 400 is probably correct.

If you do not like the idea of returning a 400, you could instead return a 500 (Internal Server Error) for every exception/error that appears during parsing.

This is very much along the lines of my own thinking when I first read this ticket. However, we should strictly return a 4xx series error rather than a 5xx series error because the 4xx class is reserved for errors caused by the client and the 5xx class for errors caused by the server.

Moreover, the W3C spec isn't too detailed on what to do with respect to ensuring synchronization between both sides when the connection is coming up, although it does talk a bit about what to do when it's going down:

If the client is sending data, a server implementation using TCP SHOULD be careful to ensure that the client acknowledges receipt of the packet(s) containing the response, before the server closes the input connection. If the client continues sending data to the server after the close, the server's TCP stack will send a reset packet to the client, which may erase the client's unacknowledged input buffers before they can be read and interpreted by the HTTP application. 
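In code, that careful shutdown is usually called a "lingering close". A hedged sketch over CHICKEN's tcp ports (lingering-close is a name made up here, not an existing Spiffy procedure; in and out are the two ports tcp-accept returns):

    (use tcp extras)

    (define (lingering-close in out response)
      (display response out)
      (flush-output out)
      ;; Drain whatever the client is still sending before closing; if
      ;; we close while the client's data is in flight, the resulting
      ;; TCP RST can discard our response before the client reads it.
      ;; A real server would bound this drain with (tcp-read-timeout).
      (read-string #f in)
      (close-input-port in)
      (close-output-port out))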

The scenario outlined in this ticket is particularly tricky because the error occurs during the parsing of the very first line of the request and the result of that parsing is used to actually bring the HTTP connection itself up on top of the raw TCP connection.

It appeals to common sense that we'd return a 400 error here because that's what we'd do if we had trouble decoding some other part of the URI: for example the query string. However, there are much wider security and connection stability concerns at stake in this particular instance. On balance, I think we should do the safest thing, rather than the most elegant or most useful thing.

comment:8 Changed 11 years ago by andyjpb

Cc: andyjpb@… added

comment:9 Changed 11 years ago by sjamaan

On balance, I think we should do the safest thing, rather than the most elegant or most useful thing.

This.

By returning an HTTP response to some application that does not speak HTTP, you might be sending it off the deep end. The current case is just an edge case of trying to speak HTTP but failing.

It's funny: the security rule is the opposite of the IETF rule (Postel's law): be conservative in what you accept, for the other party may be trying to trick you. Anyway, the browser is severely misbehaving here, which shouldn't happen in the first place, so anything we decide to do is acceptable. The browser needs to be fixed not to send such bad requests.

comment:10 Changed 11 years ago by RvdH

This is very much along the lines of my own thinking when I first read this ticket. However, we should strictly return a 4xx series error rather than a 5xx series error because the 4xx class is reserved for errors caused by the client and the 5xx class for errors caused by the server.

What I meant was: if the server experiences any error, it should return a 500. It is the most basic status code a server can send. Think about it: for any request an HTTP server receives, it is allowed to return a 500.

Of course this doesn't mean that we should return a 500 in this case. We know it's an error on the client side, so returning a 400 is better.

comment:11 Changed 11 years ago by RvdH

However, there are much wider security and connection stability concerns at stake in this particular instance.

Sorry, but I can think of none.

On balance, I think we should do the safest thing, rather than the most elegant or most useful thing.

It has nothing to do with safety, elegance or usefulness. It has to do with conformance to a specification. For example, non-conformance to a specification is exactly what went wrong with all the different browsers trying to implement HTML.

If you claim to be an HTTP server, you implement the HTTP specification. It is that simple.

comment:12 Changed 11 years ago by RvdH

Modern browsers nowadays don't send bogus requests. But the point is: not all clients are browsers. Actually, more and more of the clients are non-browsers.

If you develop web services (i.e. server-to-server communication), it is extremely useful to have a web server that sends you correct status codes back, so you know what you did wrong as a developer.

That's why (claiming to be an HTTP server) it matters what you send back and that's why it matters that you adhere to the HTTP standard.

comment:13 in reply to:  11 Changed 11 years ago by sjamaan

Replying to RvdH:

However, there are much wider security and connection stability concerns at stake in this particular instance.

Sorry, but I can think of none.

That's irrelevant. An attacker will find them for you.

It's pretty obvious from the request log fragment Andy posted that there are attacks being attempted "in the wild", in exactly this dusty corner of the spec we're looking at. Disconnecting the assholes who try these things is better than serving them a friendly response telling them what they did wrong, so that they can tweak their code to jump through this hoop and get deeper into the system to exploit it. Like Andy said, it spreads the attack surface.

On balance, I think we should do the safest thing, rather than the most elegant or most useful thing.

It has nothing to do with safety, elegance or usefulness.

I don't understand why you are so rigid about this. Elegance, usefulness and especially safety can trump conformance with a spec if there are good reasons. There are plenty of examples where specs got it wrong and are blatantly insecure. That's for example why browsers have stopped applying CSS styling to "visited" links. According to your logic, they should just keep violating their users' privacy because compliance with a spec is more important than safety.

It has to do with conformance to a specification. For example, non-conformance to a specification is exactly what went wrong with all the different browsers trying to implement HTML.

Actually, part of the HTML mess is self-inflicted by browsers trying to be so liberal in accepting cruft and trying to make the best of it. Because different browsers massage cruft into something different, some sites break in some browsers. It'd be much better if browsers refused to display malformed HTML, but the spec says you must be tolerant of errors.

And that leads to fun things like deviations in attribute quotation and HTML tree rearrangement, making XSS attacks easier to accomplish and harder to filter out.

If you claim to be an HTTP server, you implement the HTTP specification. It is that simple.

If it were that simple, there wouldn't be so many hacks and workarounds in intarweb for broken servers.

But thanks for hardening my resolve in this matter. I've updated intarweb to refuse invalid URIs in the request line, so the above requests now consistently cause the connection to be dropped.

comment:14 in reply to: 12 Changed 11 years ago by sjamaan

Resolution: fixed
Status: new → closed

Replying to RvdH:

Modern browsers nowadays don't send bogus requests. But the point is: not all clients are browsers.

That's exactly the problem. You can't assume a well-behaved client. The worst kind of "people" are on the internet, actively trying to abuse software which assumes all clients play by the rules.

Actually, more and more of the clients are non-browsers.

If you develop web services (i.e. server-to-server communication), it is extremely useful to have a web server that sends you correct status codes back, so you know what you did wrong as a developer.

I understand this. However, if a developer is exceeding request limits, they're doing something Very Wrong. Again, they can always contact the admin and ask what went wrong. Then the admin can decide whether to trust this person or not.

That's why (claiming to be an HTTP server) it matters what you send back and that's why it matters that you adhere to the HTTP standard.

But in this case it's the client who isn't adhering to that same standard: they're not even sending HTTP/1.x requests! So your logic is upside-down; these requests fall outside the spec, and the spec does not say anything about what to do with non-HTTP requests. We are still conforming to the RFC when we reject these requests as being non-HTTP by disconnecting.

comment:15 in reply to:  14 Changed 11 years ago by sjamaan

Replying to sjamaan:

Replying to RvdH:

Modern browsers nowadays don't send bogus requests.

That's exactly the problem. You can't assume a well-behaved client.

Besides, they do send bogus requests! Simply clicking any of those links above causes a bogus request to be sent, by any run-of-the-mill web browser.

Like I stated a few times before, those URIs you posted result in invalid requests that do not conform with the RFC's request line definition. That's the whole topic of this discussion.

comment:16 Changed 11 years ago by RvdH

But thanks for hardening my resolve in this matter. I've updated intarweb to refuse invalid URIs in the request line, so the above requests now consistently cause the connection to be dropped.

Good for you!

comment:17 in reply to:  16 Changed 11 years ago by sjamaan

Replying to RvdH:

But thanks for hardening my resolve in this matter. I've updated intarweb to refuse invalid URIs in the request line, so the above requests now consistently cause the connection to be dropped.

Good for you!

I did not mean that sarcastically or in a mean-spirited way. Your comments really made me think, and thanks to this discussion I was able to make up my mind in this direction. It just happened to be a different direction than the one you were arguing for.
