Tuesday, December 25, 2012

HTTP protocol and URI fragments (#)

I've been creating JMeter scripts for load testing an intranet web application. I usually use the JMeter recording proxy, but since I was already familiar with the application and this was just a small addition to existing scripts, I simply duplicated an existing HTTP Sampler and changed the path and some parameters/headers as needed. I used the Google Chrome developer tools to see which URL the HTTP request was sent to.

So I copied the path part of the URI, "/message/1002#1003", and put it into my JMeter script to generate the same HTTP request.

The thing is, when I ran this request from JMeter, I got an HTTP 500 error. I was going mad; it made no sense, because the same request with all the needed headers/cookies worked from Chrome. I looked into the server's Apache access log and saw that the request sent from JMeter for the path "/message/1002#1003" was indeed recorded with that path and a 500 result. The surprise came when I generated the same request from Google Chrome: the Apache log showed a request with a 200 result, but the path was "/message/1002", without the trailing "#1003"!

That got me thinking: why would a client ever send the part after the '#'? It makes no sense, since the application/web server doesn't care which part of the result we are looking at; the response is the same whatever value follows the '#' sign. That part should stay within the scope of the browser/client and never be sent to the HTTP server as part of the URI.

Looking at the spec (http://tools.ietf.org/html/rfc3986#section-3.5, fifth paragraph), I found confirmation of what I suspected:
Fragment identifiers have a special role in information retrieval systems as the primary form of client-side indirect referencing, allowing an author to specifically identify aspects of an existing resource that are only indirectly provided by the resource owner. As such, the fragment identifier is not used in the scheme-specific processing of a URI; instead, the fragment identifier is separated from the rest of the URI prior to a dereference, and thus the identifying information within the fragment itself is dereferenced solely by the user agent, regardless of the URI scheme. Although this separate handling is often perceived to be a loss of information, particularly for accurate redirection of references as resources move over time, it also serves to prevent information providers from denying reference authors the right to refer to information within a resource selectively. Indirect referencing also provides additional flexibility and extensibility to systems that use URIs, as new media types are easier to define and deploy than new schemes of identification.
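
To make the "separated from the rest of the URI prior to a dereference" part concrete, here is a minimal Python sketch of what a fragment-stripping client does before it sends a request. The host name "intranet.example" and the helper name request_target are made up for illustration; only urllib.parse.urlsplit comes from the standard library.

```python
from urllib.parse import urlsplit


def request_target(uri):
    """Split a URI into what goes on the wire and what stays in the client.

    Returns (target, fragment): the target is the path plus the optional
    query string, and the fragment is the part after '#', which is kept
    by the user agent and never sent to the server.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, parts.fragment


# Hypothetical host; the path is the one from this post.
target, fragment = request_target("http://intranet.example/message/1002#1003")
print(target)    # /message/1002
print(fragment)  # 1003
```

This is the value that ended up in the Apache log for the Chrome request: the path without the "#1003" part.
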

In conclusion, I now know that:

  1. The Google Chrome developer tools don't show the request URL exactly as it is sent to the server: the displayed URL still includes the part after the '#' (other tools, such as HTTP Watch, show the real one).
  2. Apache JMeter will sometimes send the value after the '#' sign as part of the path, depending on the HTTP implementation used by the sampler (the HC3.1 and Java implementations skip it, but HC4 sends it).
  3. The part of a URI that follows the '#' sign is called the "URI fragment".
  4. In any case, in future tests it makes no sense to include URI fragments in the path: they are handled only by the client/browser and mean nothing in a JMeter request.
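
The effect from point 2 can be reproduced outside JMeter with a small local experiment. This is only a sketch under stated assumptions: Python's http.client stands in for an HTTP client that passes the path through untouched (it is not JMeter or HC4), the throwaway server runs on 127.0.0.1, and path_seen_by_server is a hypothetical helper name.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urldefrag


def path_seen_by_server(request_target):
    """Send one GET to a throwaway local server and return the path it received."""
    seen = []

    class Recorder(BaseHTTPRequestHandler):
        def do_GET(self):
            seen.append(self.path)  # the raw path from the request line
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the demo output quiet

    server = HTTPServer(("127.0.0.1", 0), Recorder)  # port 0: any free port
    server.timeout = 5  # do not wait forever if the client fails
    worker = threading.Thread(target=server.handle_request)  # serve one request
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", request_target)  # the path is sent as given
        conn.getresponse().read()
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return seen[0]


if __name__ == "__main__":
    copied = "/message/1002#1003"  # path as copied from the developer tools
    print(path_seen_by_server(copied))                 # /message/1002#1003
    print(path_seen_by_server(urldefrag(copied).url))  # /message/1002
```

The first call shows what the server gets when the client does not strip the fragment; the second strips it with urllib.parse.urldefrag first, which is what the path in the JMeter sampler should look like.
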