Dissecting Log Files
Web Analytics
Written by Lyris HQ Staff Writer   
Friday, 02 May 2008
Put away your magnifying glass and scalpel - all you'll need for dissecting your web server log files is your computer, a text editor and a little bit of analytical thought. In this article, we're going to take a look at the components of a web server log and discuss how analytics packages use these fields to provide meaningful data to you.

We'll be concentrating on two of the most popular web servers: Microsoft's Internet Information Services (a.k.a. IIS) and the open source web server Apache. Both of these web servers deliver content to a user through a web browser. As a user browses a web site, most of their actions are logged to a file which is kept on the web server. These logs can then be fed into analytics packages like ClickTracks for analysis.

Web Log Fields


A web server doesn't discriminate: it logs fields whether your analytics package needs them or not. Let's take a look at the most common log file fields and explore what each field is used for.

  • Date and Time
  • Client IP Address
  • HTTP Method
  • Requested file and Query string
  • User Agent
  • Referrer
  • Status code
  • Cookie (preferable, but not required)
  • Virtual server name (required only for multi-domain logs)


First we'll dissect the IIS web server log file since its layout is the simplest.

I've loaded a sample IIS log into a simple text editor (Notepad or WordPad both work well). The image below is what you can expect to see in an IIS log file. IIS files are a bit easier to read: since they typically provide a header, all you have to do is line up each header column with its corresponding value.


IIS Log Format









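If you'd rather let a script do the lining up, here's a minimal Python sketch. It assumes the log uses the W3C extended layout that IIS typically writes, where a "#Fields:" line names the columns. The sample lines are made up for illustration, and the function name parse_w3c_log is ours, not part of any analytics package.

```python
def parse_w3c_log(lines):
    """Yield one dict per request line, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # Header row: remember the column names for the lines that follow.
            fields = line[len("#Fields:"):].split()
        elif line.startswith("#"):
            continue  # other directives such as #Software or #Date
        elif fields:
            # Data row: pair each value with its header column.
            yield dict(zip(fields, line.split()))


# Made-up sample in the W3C extended layout (spaces inside values appear as "+").
sample = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)",
    "2008-05-02 10:15:00 192.0.2.10 GET /index.html - 200 Mozilla/5.0+(Windows)",
]
rows = list(parse_w3c_log(sample))
print(rows[0]["cs-method"], rows[0]["cs-uri-stem"], rows[0]["sc-status"])
```

Because the column names come from the header itself, the same loop keeps working when fields are switched on or off on the server.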

Now let's look at the slightly more complicated Apache log file format.
Apache log files are a bit trickier to parse, because there's no header line in the file. Compare your own Apache log to the diagram below to get an idea of what's what.


Apache Log Format








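Since Apache gives you no header, a parser has to know the layout in advance. The sketch below assumes Apache's widely used "combined" layout (host, identity, user, time, request, status, bytes, referrer, user agent); your own LogFormat may differ, so treat the pattern as a starting point. The sample line is invented, and values containing escaped quotes aren't handled.

```python
import re

# Assumed layout: host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of named fields, or None if the line doesn't fit the layout."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample line for illustration.
line = ('192.0.2.10 - - [02/May/2008:10:15:00 -0700] "GET /index.html?id=7 HTTP/1.1" '
        '200 5120 "http://www.example.com/start.html" "Mozilla/5.0 (Windows)"')
entry = parse_combined(line)
print(entry["host"], entry["request"], entry["status"])
```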

Now that you have a clear understanding of what fields your web server is logging and what those field results look like in IIS and Apache, let's get to the real question: What does it all mean? Let's examine the most important log file fields, one by one.

Date and Time


This is the field that stores the date and time a particular object (like an image) was requested. This field is crucial to building a visitor session and finding metrics like Time on Site.

Client IP Address


This is the IP address of the machine that accessed your web site. Although IP addresses aren't necessarily unique to any one visitor (as most visitors surf the web via a dynamic IP address provided by their ISP and not their own dedicated static IP and pipe), the IP address can still be useful in partitioning the log file into visitor sessions.
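To make the idea of a visitor session concrete, here's a rough Python sketch that groups hits by IP address and starts a new session after a long gap between requests. The 30-minute cutoff is our assumption (a common convention, not something your analytics package is guaranteed to use), and the hits are invented. Time on Site is then just the last timestamp minus the first.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cutoff, not a fixed standard


def build_sessions(hits):
    """Group (ip, timestamp) hits into per-IP sessions, split on long gaps."""
    sessions = {}
    for ip, ts in sorted(hits, key=lambda hit: (hit[0], hit[1])):
        ip_sessions = sessions.setdefault(ip, [])
        if ip_sessions and ts - ip_sessions[-1][-1] <= SESSION_TIMEOUT:
            ip_sessions[-1].append(ts)  # still the same visit
        else:
            ip_sessions.append([ts])    # first hit, or gap too long: new visit
    return sessions


def time_on_site(session):
    """Time between the first and last request of one session."""
    return session[-1] - session[0]


fmt = "%Y-%m-%d %H:%M:%S"
# Invented hits for illustration.
hits = [
    ("192.0.2.10", datetime.strptime("2008-05-02 10:15:00", fmt)),
    ("192.0.2.10", datetime.strptime("2008-05-02 10:20:30", fmt)),
    ("192.0.2.10", datetime.strptime("2008-05-02 14:00:00", fmt)),
    ("192.0.2.77", datetime.strptime("2008-05-02 10:16:00", fmt)),
]
sessions = build_sessions(hits)
print(len(sessions["192.0.2.10"]), time_on_site(sessions["192.0.2.10"][0]))
```

Keep in mind the caveat above: several visitors can share one IP address, so IP-only sessions are an approximation.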

You may also notice a Server IP field in your log files. This is the field that logs the IP address of the web server machine that served a particular web site. This field is interchangeable with the Virtual Host field (see below for definition of virtual host).

HTTP Method


This is the field that stores the way that the web site was accessed. There are several possible values for the HTTP Method, but the two most common are GET and POST. This field isn't logged as a separate field in some flavors of web servers. For example, IIS logs it as a separate field but Apache combines the request and the HTTP Method into one field.

Requested file and Query string


This is the field that logs the object (file) that's being requested. In many cases, the object doesn't get requested by itself. In fact, it's very common for an object to be requested with query parameters appended to it. These parameters are logged as a part of the Query String.

Once again, depending on your particular web server, these two fields can be separate or together. For example, in IIS these two fields are logged separately and in Apache they're logged together in one field.

The Query String is a very important (and sometimes mandatory) parameter. Not only is this the field that stores the URL parameters on your site, making it possible to track dynamic sites, but it also stores the tracking parameters that you use in your PPC landing URLs. Tracking parameters are crucial to distinguishing between organic vs. PPC traffic.
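Here's a small Python sketch of how an Apache-style request field could be pulled apart into the HTTP Method, the requested file and the Query String parameters, and how a tracking parameter might then flag PPC traffic. The parameter name "campaign" is purely hypothetical; substitute whatever your own landing URLs carry.

```python
from urllib.parse import urlsplit, parse_qs

TRACKING_PARAM = "campaign"  # hypothetical name; use the one in your PPC landing URLs


def split_request(request):
    """Split 'METHOD /file?query PROTOCOL' into method, file and query parameters."""
    method, url, _protocol = request.split(" ", 2)
    parts = urlsplit(url)
    return method, parts.path, parse_qs(parts.query)


def is_ppc(params):
    """Treat a hit as PPC traffic when the tracking parameter is present."""
    return TRACKING_PARAM in params


# Invented request for illustration.
method, path, params = split_request(
    "GET /landing.html?campaign=spring_ppc&id=7 HTTP/1.1"
)
print(method, path, params, is_ppc(params))
```

With IIS logs the split is already done for you, since the file and the query string arrive as separate fields.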

User Agent


The user agent is also known as the client signature (and nope, this isn't the visitor's John Hancock!). This is the field that logs the browser signature of the client that accesses a web site. For example, Netscape and Firefox browsers will have the string "Mozilla" in their User Agents. Internet Explorer browsers will have the string "MSIE" in their User Agents (usually alongside "Mozilla", which most browsers include for compatibility). Robots and spiders also have their own user agent signatures: Google's spider will have the string "Googlebot" in its signature.
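A very rough way to sort user agents is plain substring matching, as in the Python sketch below. The marker list is our own assumption and far from complete, and the order of the checks matters: Internet Explorer has historically sent "MSIE" inside a string that also starts with "Mozilla", so the more specific tokens are tested first.

```python
ROBOT_MARKERS = ("googlebot", "bot", "spider", "crawler")  # assumed, incomplete list


def classify_user_agent(user_agent):
    """Return a coarse label for a user agent string using substring checks."""
    ua = user_agent.lower()
    if any(marker in ua for marker in ROBOT_MARKERS):
        return "robot"
    if "msie" in ua:
        return "Internet Explorer"
    if "firefox" in ua:
        return "Firefox"
    if "mozilla" in ua:
        return "other Mozilla-compatible browser"
    return "unknown"


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14",
]
for ua in samples:
    print(classify_user_agent(ua))
```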

Referrer


This is the field that logs the web page from which the visitor arrived. The referrer field can show you search engines, affiliates and even advertisements. But that's not all—we also can discover the keyword that was used when the visitor searched in a search engine and came across your site.

The referrer field is also very important in building a visitor session. It's almost impossible to build an accurate visitor session if we don't know where the visitor came from. The referrer field lets you, in essence, 'follow' a visitor from one page to the next.
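Keyword discovery works because many search engines have carried the search phrase in the referrer's own query string. The Python sketch below pulls it out, assuming the phrase sits in a parameter such as "q"; the parameter names are assumptions that vary by engine, and some engines don't pass the phrase along at all.

```python
from urllib.parse import urlsplit, parse_qs

SEARCH_PARAMS = ("q", "p", "query")  # assumed parameter names; they vary by engine


def search_keyword(referrer):
    """Return (referring host, search phrase or None) for a referrer URL."""
    parts = urlsplit(referrer)
    params = parse_qs(parts.query)
    for name in SEARCH_PARAMS:
        if name in params:
            return parts.netloc, params[name][0]
    return parts.netloc, None


# Invented referrers for illustration.
print(search_keyword("http://www.google.com/search?q=web+analytics&hl=en"))
print(search_keyword("http://www.example.com/partners.html"))
```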

Status code


This is the field that stores whether the requested action was successful or not. There are several possible values for this field; here are a few of the more common:

  • 200 level = Status ok. Action completed successfully
  • 300 level = Upon requesting a particular file, the visitor is redirected to request another file
  • 400 level = Client error, the most familiar being the File not found error (404)
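In a script, the levels above boil down to a range check on the number. This Python sketch also includes the 500 level (server errors), which isn't in the list but does turn up in logs; the status values are invented for illustration.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its level."""
    code = int(code)
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


# Invented status values, as they'd appear in the status code field.
statuses = ["200", "200", "302", "404", "500"]
summary = Counter(status_class(status) for status in statuses)
print(summary["success"], summary["redirect"], summary["client error"])
```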


Cookie (preferable, but not required)


This field is optional but very beneficial. If you place cookies on your visitors' machines, the cookies will be logged in this field, making them available to be used in your reporting. The presence of a persistent cookie lets you accurately track unique and return visitors. Persistent cookies can also help in tracking your latent conversions from your ads. Plus, with the introduction of cookies, you're now able to store additional information that isn't available through standard web server logs, and then report on this information later. The moral of the story? Just say yes to cookies.
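As a sketch of how a persistent cookie turns into a unique visitor count, the Python below splits each logged cookie field into name/value pairs and counts distinct values of one cookie. The cookie name "visitor_id" is hypothetical, the sample fields are invented, and the "name=value; name=value" layout assumed here is the Apache style (IIS writes "+" in place of spaces).

```python
VISITOR_COOKIE = "visitor_id"  # hypothetical cookie name; use your site's own


def parse_cookie_field(field):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}; '-' or '' means no cookies."""
    cookies = {}
    if field in ("", "-"):
        return cookies
    for pair in field.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def unique_visitors(cookie_fields):
    """Count distinct visitor cookie values across all logged requests."""
    ids = set()
    for field in cookie_fields:
        visitor = parse_cookie_field(field).get(VISITOR_COOKIE)
        if visitor:
            ids.add(visitor)
    return len(ids)


# Invented cookie fields, one per logged request.
fields = [
    "visitor_id=abc123; lang=en",
    "visitor_id=abc123",
    "visitor_id=zzz999; lang=fr",
    "-",
]
print(unique_visitors(fields))
```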

Virtual server name (required only for multi-domain logs)


Typically, each web site will have its own set of log files. A multi-domain log file is a log file where the requests for multiple web sites are logged to one log file. So, in this case, there has to be something in every log file line that ties it to a particular web site. The Virtual Host field does this, and makes it possible to accurately filter the log file entries by web site.
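Filtering a multi-domain log can be as simple as checking one field per line. The Python sketch below assumes the virtual host is the first space-separated field on each line (as it would be with a format string that starts with %v); the host names and log lines are invented.

```python
def lines_for_site(lines, site):
    """Yield the remainder of each line whose leading virtual host matches site."""
    wanted = site.lower()
    for line in lines:
        vhost, _, rest = line.partition(" ")
        if vhost.lower() == wanted:
            yield rest


# Invented multi-domain log lines, shortened for illustration.
log = [
    'www.example.com 192.0.2.10 - - [02/May/2008:10:15:00 -0700] "GET / HTTP/1.1" 200 5120',
    'shop.example.com 192.0.2.77 - - [02/May/2008:10:16:00 -0700] "GET /cart HTTP/1.1" 200 830',
    'www.example.com 192.0.2.10 - - [02/May/2008:10:17:00 -0700] "GET /about HTTP/1.1" 200 2048',
]
site_lines = list(lines_for_site(log, "www.example.com"))
print(len(site_lines))
```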

What if one of my fields is missing? How do I make all this happen?


After reading this article, you may notice that some of the fields we described aren't showing up in your log files. In that case, you just need to turn the fields 'on' by making changes to your web server. Depending on your setup, you may need to contact your hosting company to get this done or you can make changes yourself if you have control of the server.

Apache: Look for a file called httpd.conf. Open this file for editing and look for the section on logging. You'll notice a logging string that corresponds to the format you see in the log file. If you use the following string, it should record and report on all the important fields.

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{cookie}i\"" combined

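Check that the CustomLog directive for your site references the same format nickname (combined, in this example). Then save the file, and restart Apache.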

IIS: Tweaks to IIS log file formats need to be made in the Internet Services Manager, which is typically found in your Windows Control Panel > Administrative Tools.

Right-click the web site in question and select Properties. Then move to the section on logging. You can simply check or uncheck the fields that you need. Then, just like with Apache, restart the IIS service to apply the changes.

Happy tracking!
