Put away your magnifying glass and scalpel: all you'll need for dissecting your web server log files is your computer, a text editor and a little bit of analytical thought. In this article, we're going to take a look at the components of a web server log and discuss how analytics packages use these fields to provide meaningful data to you.
We'll be concentrating on two of the most popular web servers: Microsoft's Internet Information Server (a.k.a. IIS) and the open source web server Apache. Both of these web servers enable data to be served up to a user through an Internet browser. As a user browses a web site, most of their actions are logged to a file which is kept on the web server. These logs can then be fed into analytics packages like ClickTracks for analysis.
Web Log Fields
A web server doesn't discriminate: it logs every field it is set up to record, whether your analytics package needs it or not. Let's take a look at the most common log file fields and explore what each field is used for.
- Date and Time
- Client IP Address
- HTTP Method
- Requested file and Query string
- User Agent
- Referrer
- Status code
- Cookie (preferable, but not required)
- Virtual server name (required only for multi-domain logs)
First we'll dissect the IIS web server log file since its layout is the simplest.
I've loaded a sample IIS log into a simple text editor (Notepad or WordPad would both work well). The image below shows what you can expect to see in an IIS log file. IIS files are a bit easier to read: since they typically provide a header, all you have to do is line up each header column with its corresponding value.
[Image: sample IIS log file, with nomenclature]
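Lining up the header with its values can also be done in a few lines of Python. This is only a sketch: the sample lines are made up for illustration, and it assumes the log carries a `#Fields:` header line naming the columns, with values separated by single spaces.

```python
# Sample lines are invented for illustration; your own log will list
# whichever fields your server is configured to record.
SAMPLE_IIS_LOG = """\
#Software: Microsoft Internet Information Services 6.0
#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)
2006-03-01 12:00:01 192.0.2.10 GET /index.html - 200 Mozilla/4.0+(compatible;+MSIE+6.0)
2006-03-01 12:00:05 192.0.2.10 GET /products.asp id=42 200 Mozilla/4.0+(compatible;+MSIE+6.0)
"""


def parse_iis_log(text):
    """Pair each value with the column name from the latest #Fields header."""
    fields = []
    rows = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split(" "))))
    return rows


rows = parse_iis_log(SAMPLE_IIS_LOG)
print(rows[1]["cs-uri-stem"], rows[1]["cs-uri-query"])
```

Each row comes back as a dictionary keyed by header name, so a missing field shows up as a missing key rather than a shifted column.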
Now let's look at the slightly more complicated Apache log file format.
Apache log files are a bit trickier to parse, because there's no header line in the file. Compare your own Apache log to the diagram below to get an idea of what's what.
[Image: sample Apache log file, with nomenclature]
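Because there is no header, a parser has to know the field order in advance. The sketch below assumes Apache's common "combined" layout (host, identity, user, time, request, status, size, referrer, user agent); the sample line is made up, and the group names are my own labels, not Apache's.

```python
import re

# One named group per field of the assumed "combined" layout.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample line for illustration.
SAMPLE_LINE = (
    '192.0.2.10 - - [01/Mar/2006:12:00:01 -0500] '
    '"GET /index.html?src=ppc HTTP/1.1" 200 5120 '
    '"http://www.example.com/start.html" "Mozilla/5.0 (Windows; U) Firefox/1.5"'
)

match = COMBINED_RE.match(SAMPLE_LINE)
entry = match.groupdict() if match else {}
for name, value in entry.items():
    print(f"{name:10} {value}")
```

This simple pattern does not handle escaped quotes inside a field, so treat lines that fail to match as candidates for a closer look rather than discarding them silently.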
Now that you have a clear understanding of what fields your web server is logging and what those fields look like in IIS and Apache, let's get to the real question: What does it all mean? Let's examine the most important log file fields, one by one.
Date and Time
This is the field that stores the date and time a particular object (like an image) was requested. This field is crucial to building a visitor session and finding metrics like Time on Site.
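As a rough illustration of how a metric like Time on Site falls out of this field, the sketch below parses three made-up timestamps in Apache's usual layout and measures the span between the first and last request. Real analytics packages apply more rules than this.

```python
from datetime import datetime

# Layout of the timestamp as it appears between brackets in an Apache log.
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Invented timestamps for one visitor's requests.
hits = [
    "01/Mar/2006:12:00:01 -0500",
    "01/Mar/2006:12:03:31 -0500",
    "01/Mar/2006:12:07:46 -0500",
]

times = [datetime.strptime(hit, APACHE_TIME_FORMAT) for hit in hits]
time_on_site = max(times) - min(times)
print(time_on_site)  # 0:07:45
```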
Client IP Address
This is the IP address of the machine that accessed your web site. Although IP addresses aren't necessarily unique to any one visitor (as most visitors surf the web via a dynamic IP address provided by their ISP and not their own dedicated static IP and pipe), the IP address can still be useful in partitioning the log file into visitor sessions.
You may also notice a Server IP field in your log files. This is the field that logs the IP address of the web server machine that served a particular web site. This field is interchangeable with the Virtual Host field (see below for definition of virtual host).
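Here is one way the partitioning by client IP might look. It is a sketch under a stated assumption: a gap of more than 30 minutes between requests from the same IP starts a new session. That cutoff is my own choice for the example, not something your web server defines.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, adjust to taste


def build_sessions(hits):
    """Group (ip, datetime) hits into per-IP sessions, split on long gaps."""
    sessions = {}  # ip -> list of sessions, each a list of datetimes
    for ip, when in sorted(hits, key=lambda hit: hit[1]):
        ip_sessions = sessions.setdefault(ip, [])
        if ip_sessions and when - ip_sessions[-1][-1] <= SESSION_TIMEOUT:
            ip_sessions[-1].append(when)  # continues the current session
        else:
            ip_sessions.append([when])  # first hit, or gap too long
    return sessions


# Invented hits for illustration.
hits = [
    ("192.0.2.10", datetime(2006, 3, 1, 12, 0)),
    ("192.0.2.10", datetime(2006, 3, 1, 12, 10)),
    ("198.51.100.7", datetime(2006, 3, 1, 12, 5)),
    ("192.0.2.10", datetime(2006, 3, 1, 14, 0)),
]
sessions = build_sessions(hits)
for ip, ip_sessions in sessions.items():
    print(ip, [len(session) for session in ip_sessions])
```

Since several visitors can share one IP address, sessions built this way are an approximation; a cookie, where available, gives a more reliable key.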
HTTP Method
This is the field that stores the way that the web site was accessed. There are several possible values for the HTTP Method, but the two most common are GET and POST. This field isn't logged as a separate field in some flavors of web servers. For example, IIS logs it as a separate field but Apache combines the request and the HTTP Method into one field.
Requested file and Query string
This is the field that logs the object (file) that's being requested. In many cases, the object doesn't get requested by itself; in fact, it's very common for an object to be requested with query parameters appended to it. These parameters are logged as a part of the Query String.
Once again, depending on your particular web server, these two fields can be separate or together. For example, in IIS these two fields are logged separately and in Apache they're logged together in one field.
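For an Apache log, where the method, file and query string share one field, the pieces can be pulled apart again. A minimal sketch, assuming a well-formed request of the form "METHOD target PROTOCOL"; the sample request is made up.

```python
from urllib.parse import urlsplit


def split_request(request):
    """Split an Apache request field into method, file, query and protocol."""
    method, target, protocol = request.split(" ")
    parts = urlsplit(target)
    return {
        "method": method,
        "file": parts.path,
        "query": parts.query,
        "protocol": protocol,
    }


fields = split_request("GET /products.asp?id=42&src=ppc HTTP/1.1")
print(fields)
```

Malformed requests (which do turn up in real logs) will raise a ValueError here, so production code would want to catch that and set the line aside.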
The Query String is a very important (and sometimes mandatory) parameter. Not only is this the field that stores the URL parameters on your site, making it possible to track dynamic sites, but it also stores the tracking parameters that you use in your PPC landing URLs. Tracking parameters are crucial to distinguishing between organic vs. PPC traffic.
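To make the PPC distinction concrete, here is a small sketch that looks for a tracking parameter in the query string. The parameter name `src` and the value `ppc` are hypothetical; use whatever you actually append to your landing URLs.

```python
from urllib.parse import parse_qs

TRACKING_PARAM = "src"  # hypothetical tracking parameter name
PPC_VALUE = "ppc"       # hypothetical value marking paid clicks


def traffic_source(query_string):
    """Label a request as PPC when the tracking parameter says so."""
    params = parse_qs(query_string)
    if PPC_VALUE in params.get(TRACKING_PARAM, []):
        return "ppc"
    return "organic or other"


print(traffic_source("id=42&src=ppc"))
print(traffic_source("id=42"))
```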
User Agent
The user agent is also known as the client signature (and nope, this isn't the visitor's John Hancock!). This is the field that logs the browser signature of the client that accesses a web site. For example, Netscape and Firefox browsers will have the string "Mozilla" in their User Agents. Internet Explorer browsers will have the string "MSIE" in their User Agents (alongside "Mozilla", which they also include). Robots and spiders also have their own user agent signatures: Google's spider will have the string "Googlebot" in its signature.
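A simple substring check is enough to sort most user agents into rough buckets. The sketch below is illustrative only; the bucket names are my own. Note the order of the tests: Internet Explorer and Googlebot signatures also contain "Mozilla", so the generic check has to come last.

```python
def classify_user_agent(agent):
    """Sort a user agent string into a rough bucket by substring."""
    lowered = agent.lower()
    if "googlebot" in lowered:
        return "robot"
    if "msie" in lowered:
        return "Internet Explorer"
    if "firefox" in lowered:
        return "Firefox"
    if "mozilla" in lowered:
        return "other Mozilla-compatible"
    return "unknown"


print(classify_user_agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
print(classify_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```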
Referrer
This is the field that logs the web page from which the visitor arrived. The referrer field can show you search engines, affiliates and even advertisements. But that's not all—we also can discover the keyword that was used when the visitor searched in a search engine and came across your site.
The referrer field is also very important in building a visitor session. It's almost impossible to build an accurate visitor session if we don't know where the visitor came from; the referrer field lets you, in essence, 'follow' a visitor from one page to the next.
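Keyword discovery works because the search engine's results page URL, which arrives as the referrer, carries the search terms in its own query string. The sketch below assumes the terms sit in a parameter named `q`; other engines may use a different name, so treat that as an assumption to verify against your own logs.

```python
from urllib.parse import parse_qs, urlsplit

KEYWORD_PARAM = "q"  # assumed name of the search-terms parameter


def search_keyword(referrer):
    """Return the search phrase carried in a referrer URL, or None."""
    query = urlsplit(referrer).query
    terms = parse_qs(query).get(KEYWORD_PARAM)
    return terms[0] if terms else None


print(search_keyword("http://www.google.com/search?q=web+log+analysis&hl=en"))
```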
Status code
This is the field that stores whether the requested action was successful or not. There are several possible values for this field; here are a few of the more common:
- 200 level = Status OK. Action completed successfully
- 300 level = Upon requesting a particular file, the visitor is redirected to request another file
- 400 level = Client error. The most familiar is 404, the file not found error
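Since the level is just the first digit of the code, grouping status codes takes one integer division. A small sketch covering the three levels listed above; the labels are my own.

```python
STATUS_LEVELS = {2: "success", 3: "redirect", 4: "client error"}


def status_class(status):
    """Map a status code (string or int) to its level, e.g. 404 -> client error."""
    level = int(status) // 100
    return STATUS_LEVELS.get(level, "other")


for code in ("200", "302", "404"):
    print(code, status_class(code))
```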
Cookie (preferable, but not required)
This field is optional but very beneficial. If you place cookies on your visitors' machines, the cookies will be logged in this field, making them available to be used in your reporting. The presence of a persistent cookie lets you accurately track unique and return visitors. Persistent cookies can also help in tracking your latent conversions from your ads. Plus, with the introduction of cookies, you're now able to store additional information that isn't available through standard web server logs, and then report on this information later. The moral of the story? Just say yes to cookies.
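To show how a logged cookie becomes a visitor key, here is a sketch that pulls one value out of the cookie field. The cookie name `visitor_id` is hypothetical, and the code assumes the field holds the raw Cookie header as Apache logs it, with "-" meaning no cookie was sent.

```python
from http.cookies import SimpleCookie


def visitor_id(cookie_field, name="visitor_id"):
    """Return the named cookie's value from a logged cookie field, or None."""
    if cookie_field in ("", "-"):
        return None  # no cookie was sent with this request
    jar = SimpleCookie()
    jar.load(cookie_field)
    return jar[name].value if name in jar else None


print(visitor_id("visitor_id=abc123; theme=dark"))
```

Counting distinct values of this key, rather than distinct IP addresses, is what makes unique and return visitor counts more dependable.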
Virtual server name (required only for multi-domain logs)
Typically, each web site will have its own set of log files. A multi-domain log file is a log file where the requests for multiple web sites are logged to one log file. So, in this case, there has to be something in every log file line that ties it to a particular web site. The Virtual Host field does this, and makes it possible to accurately filter the log file entries by web site.
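Filtering a multi-domain log by site can be sketched as below. It assumes the virtual host is the first space-separated field on each line, which is where Apache's `%v` lands if you put it at the front of your format string; the sample lines are made up.

```python
def lines_for_site(lines, site):
    """Keep only log lines whose first field names the given web site."""
    wanted = site.lower()
    return [line for line in lines if line.split(" ", 1)[0].lower() == wanted]


# Invented, shortened log lines for illustration.
log_lines = [
    'www.example.com 192.0.2.10 - - [01/Mar/2006:12:00:01 -0500] "GET / HTTP/1.1" 200 5120',
    'shop.example.com 192.0.2.11 - - [01/Mar/2006:12:00:02 -0500] "GET /cart HTTP/1.1" 200 734',
    'www.example.com 192.0.2.12 - - [01/Mar/2006:12:00:03 -0500] "GET /about HTTP/1.1" 200 2048',
]
print(len(lines_for_site(log_lines, "www.example.com")))
```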
What if one of my fields is missing? How do I make all this happen?
After reading this article, you may notice that some of the fields we described aren't showing up in your log files. In that case, you just need to turn the fields 'on' by making changes to your web server. Depending on your setup, you may need to contact your hosting company to get this done or you can make changes yourself if you have control of the server.
Apache: Look for a file called httpd.conf. Open this file for editing and look for the section on logging. You'll notice a logging string that corresponds to the format you see in the log file. If you use the following string, it should record and report on all the important fields.
LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{cookie}i\"" combined
Check that your CustomLog directive refers to the same format nickname (combined in this example). Then save the file, and restart Apache.
IIS: Tweaks to IIS log file formats need to be made in the Internet Services Manager, which is typically found in your Windows Control Panel > Administrative Tools.
Right-click on the web site in question, and select Properties. Then move to the section on logging. You can simply check or uncheck the fields that you need. Then, just like with Apache, restart the IIS service to apply the changes.
Happy tracking!