Advanced HTTPClient Info

Authorization

Authorization is briefly described in Getting Started. In addition to the 'Basic' authorization scheme mentioned there, HTTPClient will handle other schemes too. Both HTTPConnection.addAuthorization() and AuthorizationInfo.addAuthorization() can be used to supply the necessary information; use the params argument to specify the necessary name/value pairs.

When confronted with an authorization request, HTTPClient will query all known authorization info for a possible candidate (the match must be for the host, port, scheme and realm). If no suitable info is found, or if the server rejects any found info, an authorization handler is called to try to get the necessary info from the user. If the user does not give any information, or if the information they give is also rejected, then the retrying is terminated and the last failure status is returned to the caller. The default handler currently only understands requests for the 'Basic' authorization scheme; you may however set your own handler via the AuthorizationInfo.setAuthHandler() method. The handler given must implement the AuthorizationHandler interface, or it must be null.
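The four-part match can be pictured as a lookup keyed on host, port, scheme and realm. The following is only a sketch of that idea, not HTTPClient's code: the class AuthStoreSketch and its methods are made up for illustration, and the case handling (host and scheme compared case-insensitively, realm compared exactly) is an assumption based on the usual HTTP rules.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical credential store; the names here are not part of HTTPClient.
class AuthStoreSketch {
    // All four fields must match for stored info to be a candidate.
    static final class Key {
        final String host;
        final int port;
        final String scheme;
        final String realm;

        Key(String host, int port, String scheme, String realm) {
            this.host = host.toLowerCase(Locale.ROOT);     // assumption: case-insensitive
            this.port = port;
            this.scheme = scheme.toLowerCase(Locale.ROOT); // assumption: case-insensitive
            this.realm = realm;                            // assumption: compared exactly
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return port == k.port && host.equals(k.host)
                && scheme.equals(k.scheme) && realm.equals(k.realm);
        }

        @Override public int hashCode() {
            return Objects.hash(host, port, scheme, realm);
        }
    }

    private final Map<Key, String> store = new HashMap<>();

    // Remember credentials for one host/port/scheme/realm combination.
    void add(String host, int port, String scheme, String realm, String credentials) {
        store.put(new Key(host, port, scheme, realm), credentials);
    }

    // Returns the stored credentials, or null if nothing matches on all four fields.
    String find(String host, int port, String scheme, String realm) {
        return store.get(new Key(host, port, scheme, realm));
    }
}
```

With this model, info stored for port 80 is not a candidate for a challenge from the same host on port 8080, and a different realm on the same server needs its own entry.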

A server may send multiple authorization challenges in the response, in which case the above algorithm is modified to go through the list of challenges in the same order as they were sent, trying to get authorization info for each challenge in the list and going to the next challenge if either no info was found or the server rejects that info. If the end of the list is reached without achieving authorization then the authorization handler is called on each challenge (in the same order) until either an authorization request is successful or the list is exhausted, in which case the response to the last failed request is returned.
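The two passes over the challenge list can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: challenges and credentials are plain strings, and the three function arguments (stored, handler, serverAccepts) are hypothetical stand-ins for the known-info lookup, the authorization handler, and the act of re-sending the request.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of the challenge-handling order described above.
class ChallengeLoopSketch {
    /**
     * Returns the credentials the server accepted, or null if every attempt failed
     * (in which case the caller would see the last failure response).
     *
     * stored        looks up already-known info for a challenge (null if none)
     * handler       asks the user for info for a challenge (null if the user declines)
     * serverAccepts stands in for retrying the request with the given credentials
     */
    static String authorize(List<String> challenges,
                            Function<String, String> stored,
                            Function<String, String> handler,
                            Predicate<String> serverAccepts) {
        // Pass 1: try known info for each challenge, in the order sent.
        for (String challenge : challenges) {
            String creds = stored.apply(challenge);
            if (creds != null && serverAccepts.test(creds)) {
                return creds;
            }
        }
        // Pass 2: only now call the handler, again in the order sent.
        for (String challenge : challenges) {
            String creds = handler.apply(challenge);
            if (creds != null && serverAccepts.test(creds)) {
                return creds;
            }
        }
        return null; // list exhausted without success
    }
}
```

Note that the handler is never consulted for the first challenge until stored info has been tried for every challenge in the list.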

Proxies

Support for proxies (including SOCKS) is fully implemented. However, their use is subject to a number of security restrictions (see security for more information on the various security policies and the consequences that arise from them). If you are using an HTTP proxy then use the HTTPConnection.setProxyServer() method. If you are using SOCKS then the method to use is HTTPConnection.setSocksServer(). Note that both can be set at the same time, in which case a request is sent via the SOCKS server to the proxy server, which in turn relays the request to the desired destination. See the package documentation for more info on these methods and the corresponding ACLs HTTPClient understands.

Persistent Connections (Keep-Alive's)

The Hypertext Transfer Protocol originally allowed only one request per TCP connection. However, establishing a TCP connection is fairly expensive time-wise, so some implementors of HTTP/1.0 added so-called Keep-Alive's to keep a connection open after a request was completed and to allow further requests to be made over that connection. Unfortunately this was not well defined and in a number of cases the implementation is broken. HTTP/1.1 defines persistent connections correctly and even makes them the default, but uses a different keyword.

The HTTPClient will by default try to do persistent connections for both HTTP/1.0 and HTTP/1.1 unless it's talking to a proxy server that is only HTTP/1.0 compliant. To disable persistent connections you can specify a Connection header with the value close. Example:

    // con is an existing HTTPConnection instance
    NVPair[] def_hdrs = { new NVPair("Connection", "close") };
    con.setDefaultHeaders(def_hdrs);

This will disable persistent connections for all future requests (unless overridden by a Connection header on the request method call).

Keeping the connection to the server open after a request is fine as long as another request follows within a short period of time. However, when you are done you should let the library know by passing the above Connection: close header with the last request. Furthermore, to limit the length of time the connection will be held open, a timer is started after each request which will close the connection if no further requests arrive within the next 10 seconds.
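The effect of the idle timeout can be pictured with a little bookkeeping. The sketch below is not HTTPClient's code: the library uses a timer, whereas this model takes explicit timestamps so the behaviour is easy to follow. The class IdleCloseSketch and its methods are made up for illustration.

```java
// Hypothetical model of the 10 second idle timeout; timestamps are passed in
// explicitly instead of being driven by a real timer.
class IdleCloseSketch {
    static final long IDLE_TIMEOUT_MS = 10_000; // the 10 seconds mentioned above

    private boolean open = false;
    private long lastRequestAt;

    // Records a request at time nowMs. Returns true if an open connection
    // was reused, false if a new one had to be established.
    boolean request(long nowMs) {
        expireIfIdle(nowMs);
        boolean reused = open;
        open = true;
        lastRequestAt = nowMs; // every request restarts the idle period
        return reused;
    }

    // Reports whether the connection is still open at time nowMs.
    boolean isOpen(long nowMs) {
        expireIfIdle(nowMs);
        return open;
    }

    // Closes the connection if no request arrived within the timeout.
    private void expireIfIdle(long nowMs) {
        if (open && nowMs - lastRequestAt >= IDLE_TIMEOUT_MS) {
            open = false;
        }
    }
}
```

In this model a request 5 seconds after the previous one reuses the connection, while a request 11 seconds later pays for a new TCP connection.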

Note that most of this is transparent as far as the functioning of the requests is concerned; the only difference you will notice is in the time required for a request to be sent. Also note that persistent connections are only done within the context of a given instance of HTTPConnection; that is, if you create two instances both pointing at the same server then they will create separate connections to the server.

Pipelining

If the connection is kept open across requests then the requests may be pipelined. Pipelining here means that a new request is sent before the response to a previous request is received. This can obviously speed up requests, so HTTPClient allows pipelining (at the expense of some extra code to keep track of the outstanding requests). To fully utilize this, however, the HTTPConnection must be switched to raw mode (see HTTPConnection.setRawMode()). The reason is that to handle redirections and authorization challenges the status code of the response must be read before returning from Get(), Post(), etc., which of course implies that HTTPClient will wait for the response before returning. In raw mode, on the other hand, this whole handling of various return codes is skipped and control is returned to the caller before the response is received. This however means that you will have to handle authorization requests and redirections yourself.

In spite of all the possible pipelining going on underneath, the programming model still stays simple: for every request you send you get a response back which contains the headers and data of the server's response. Now with pipelining the fields in the response aren't necessarily filled yet (i.e. the actual response headers and data haven't been read off the net), but the first call to any method in the response (e.g. a getStatusCode()) will wait till the response has actually been read and parsed. Also, any previous requests will be forced to read their responses if they have not already done so (so e.g. if you send two consecutive requests and receive responses r1 and r2, calling r2.getHeader("Content-type") will first read the complete response for r1, and then read the response for r2). All this should be completely transparent, except for the fact that a call to one response may sometimes take a second to return, while the same call to a different response will return immediately with the desired info.
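The r1/r2 behaviour can be modelled with responses that are filled in lazily and strictly in request order. This is a minimal sketch under assumptions, not how HTTPClient is written: PipelineSketch, LazyResponse, send() and getBody() are hypothetical names, and a list of strings stands in for the data arriving on the socket.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of lazily read, pipelined responses.
class PipelineSketch {
    // Stands in for the socket: responses arrive in the order requests were sent.
    private final Iterator<String> wire;
    // Responses handed out to the caller but not yet read off the wire.
    private final Deque<LazyResponse> pending = new ArrayDeque<>();
    // Records the order in which responses were actually read.
    final List<String> readLog = new ArrayList<>();

    PipelineSketch(List<String> responsesOnWire) {
        this.wire = responsesOnWire.iterator();
    }

    // "Sends" a request and returns at once; nothing has been read yet.
    LazyResponse send(String request) {
        LazyResponse r = new LazyResponse(request);
        pending.addLast(r);
        return r;
    }

    final class LazyResponse {
        private final String request;
        private String body; // null until read off the wire

        LazyResponse(String request) {
            this.request = request;
        }

        // The first call forces every earlier outstanding response to be read,
        // then this one; later calls return immediately.
        String getBody() {
            while (body == null) {
                LazyResponse head = pending.removeFirst();
                head.body = wire.next();
                readLog.add(head.request);
            }
            return body;
        }
    }
}
```

Calling getBody() on the second response first reads the first response and then the second, so a later getBody() on the first response returns without touching the wire.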

One problem still left is that if an IOException occurs while reading the response for any request, then the IOException is raised not only for that response but also for all other outstanding responses (i.e. for all requests later in the pipe). This will be fixed in the next version, which will retry these outstanding requests.

Protocol Version

The HTTPClient will generate either HTTP/1.0 or HTTP/1.1 requests depending on what the server version is. The very first request is done using HTTP/1.1; if the server replies with an HTTP/1.0 or HTTP/0.9 response then the HTTPClient will switch to HTTP/1.0 for further requests; otherwise it will stay with HTTP/1.1. HTTP/0.9 requests are never generated. The main difference between HTTP/1.0 and HTTP/1.1 requests as far as the HTTPClient is concerned is in the use of tokens for persistent connections; most other differences are in headers not relevant for the HTTPClient.
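The version switching amounts to a small piece of state per connection. The sketch below illustrates it with hypothetical names (VersionSketch, onResponse(), persistenceHeader()); it is not taken from the library. One assumption goes beyond the text: once downgraded to HTTP/1.0, this model never switches back.

```java
// Hypothetical model of the request-version choice described above.
class VersionSketch {
    // The very first request always uses HTTP/1.1.
    private String requestVersion = "HTTP/1.1";

    // Version to use for the next request.
    String nextRequestVersion() {
        return requestVersion;
    }

    // Call with the version the server used in its response.
    void onResponse(String serverVersion) {
        if (serverVersion.equals("HTTP/1.0") || serverVersion.equals("HTTP/0.9")) {
            requestVersion = "HTTP/1.0";
        }
        // Otherwise keep the current version (assumption: no upgrade after a downgrade).
    }

    // Header needed to ask for a persistent connection, or null if none is needed.
    // HTTP/1.0 has to ask explicitly with the Keep-Alive token; in HTTP/1.1
    // persistence is the default and "close" is the token that turns it off.
    String persistenceHeader() {
        return requestVersion.equals("HTTP/1.0") ? "Connection: Keep-Alive" : null;
    }
}
```

This shows the token difference mentioned above: the same wish for a persistent connection needs an explicit header in HTTP/1.0 and none in HTTP/1.1.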

HTTP Headers

All request methods accept optional headers to be sent with the request. Here is a list of possible request and response headers as defined in the HTTP/1.1 spec. I have added comments to some of them, but for further info I recommend getting the specs (every header is described in a paragraph of its own in the spec, so you can read just the part that interests you and ignore the rest).

Request Headers

Response Headers

Further Reading

* General HTTP Info at W3C
* HTTP/1.0 Spec (RFC 1945)
* HTTP/1.1 Spec (RFC 2068)


Ronald Tschalär / 23 March 1997 / ronald@innovation.ch.