Home-made Apache log analyzer to count hits

10 Jan 2012
How to report the number of hits from localhost and elsewhere based on the Apache log file.

Introduction

There are many log analyzers for Apache log files, but there are always special cases that none of them handles, or that they do not handle the way you would like.

This is a simple Perl script that counts hits based on the client IP address.

Background

In this case, I had to count how many hits came from "localhost" (that is, from "127.0.0.1") and how many came from elsewhere. Below I show and explain the script that does this.

Using the code

The log file generated by Apache has lots of rows, all of them starting like this:

127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...
217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...

Each row starts with the IP address, followed by a space, a dash (-), and then more data.
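To make that layout concrete, here is a minimal sketch that pulls the first field out of one sample row using split. The helper name first_field is made up for this illustration; the script discussed in this article uses index() and substr() instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the article's script):
# returns everything before the first space of a log row.
sub first_field {
    my ($line) = @_;
    # The limit of 2 makes split cut only at the first space.
    my ($field) = split / /, $line, 2;
    return $field;
}

my $sample = '139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...';
print first_field($sample), "\n";   # prints 139.12.0.2
```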

The script looks like this:

#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "Usage: $0 FILE\n";
open my $fh, '<', $file or die "Could not open '$file': $!";

my $local  = 0;
my $remote = 0;
while (my $line = <$fh>) {
    my $length = index ($line, " ");
    my $ip = substr($line, 0, $length);
    if ($ip eq "127.0.0.1") {
        $local++;
    } else {
        $remote++;
    }
}

print "Local: $local Remote: $remote\n";

Save it as "analyzer.pl" and run it as "perl analyzer.pl FILE", where FILE is the path to the Apache log file.

Let's go over it.

The first line is called the shebang (sh-bang). It is only needed if you'd like to run the script directly as a Unix/Linux executable.

use strict;
use warnings;

These pragmas are similar to compiler flags in other languages. They help you avoid common programming mistakes. I call them a safety net, and I would not write any Perl script without them.

my $file = shift or die "Usage: $0 FILE\n";   

is better explained in two parts:

my $file = shift

removes the first element from @ARGV, the array holding the command-line parameters, and assigns it to the $file variable, which has just been declared using the "my" keyword.

Then there is the "or" logical operator.

If the user provides a filename, the left-hand side of "or" evaluates to true and the script goes on. If the user has not provided a command-line parameter, the left-hand side is false, the right-hand side of "or" kicks in, and Perl stops executing after displaying the usage message:

Usage: analyzer.pl FILE

open my $fh, '<', $file or die "Could not open '$file': $!";

The above is a similar logical expression. The left-hand side opens $file for reading and puts the filehandle in the new $fh variable. If this succeeds, open returns true and the script goes on. If it fails, open returns false and the right-hand side kicks in: Perl displays an error message that includes the reason stored in $! and stops executing.
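Both statements work because "or" binds more loosely than the assignment and than the argument list of open. The following sketch shows the same "EXPR or ..." shape without calling die, so all the outcomes can be printed. The helper first_arg_or and the filename access.log are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: same "EXPR or ..." shape as in the script,
# but it returns a default value instead of calling die.
sub first_arg_or {
    my ($default, @args) = @_;
    # Parsed as: (my $value = shift @args) or return $default;
    my $value = shift(@args) or return $default;
    return $value;
}

print first_arg_or('(none)', 'access.log'), "\n";  # access.log
print first_arg_or('(none)'), "\n";                # (none): nothing to shift
print first_arg_or('(none)', '0'), "\n";           # (none): the string "0" is false in Perl
```

The last call points at a small caveat of the idiom: a file literally named "0" would be treated as a missing argument.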

Then we declare two scalar variables and assign 0 to each one of them. We'll use them as counters for the number of lines that start with "127.0.0.1" and the other lines.

The while loop reads the file line by line and executes the content of the block for every line. The loop stops when there are no more lines to read.

while (my $line = <$fh>) {
}
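To try the loop without a real log file, here is a small sketch that reads from an in-memory filehandle, which Perl provides when open is given a reference to a scalar. The helper name count_lines and the sample text are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines that can be read from a filehandle.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    # <$fh> returns undef at end of file, which ends the loop.
    while (my $line = <$fh>) {
        $count++;
    }
    return $count;
}

my $text = "first line\nsecond line\n";
open my $fh, '<', \$text or die "Could not open in-memory file: $!";
print count_lines($fh), "\n";   # prints 2
```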

The index() function takes a string and a substring and returns the position of the first occurrence of the substring (the second parameter) in the string. It uses 0-based indexing and we are looking for a space, so the resulting number equals the length of the IP address at the start of the current line. If the line contains no space, index() returns -1.

my $length = index ($line, " "); 
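A short sketch of what index() returns for one of the sample rows, and for a string that contains no space at all:

```perl
use strict;
use warnings;

my $line = '127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...';
my $pos  = index($line, ' ');

print "$pos\n";                        # 9: "127.0.0.1" is nine characters long
print index('no-spaces', ' '), "\n";   # -1: the substring was not found
```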

substr() takes a string, an offset, and a length, and returns the substring of that length starting at the offset. In our case, that is the IP address of the current line.

my $ip = substr($line, 0, $length);  
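One thing to be aware of: when a line has no space, index() returns -1, and substr() with a negative length leaves that many characters off the end instead of failing, so such a line would silently be counted as remote. A minimal guarded variant might look like this; extract_ip is a hypothetical name, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text before the first space,
# or undef when the line contains no space at all.
sub extract_ip {
    my ($line) = @_;
    my $length = index($line, ' ');
    return if $length < 0;              # malformed line: nothing to extract
    return substr($line, 0, $length);
}

print extract_ip('217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...'), "\n";  # 217.1.20.22
print defined extract_ip("garbage\n") ? "found\n" : "skipped\n";             # skipped
```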

The only thing that remains is to check if this is "localhost" or not and increment the appropriate counter.

if ($ip eq "127.0.0.1") {
    $local++;
} else {
    $remote++;
}

Once the loop finishes, we print the results:

print "Local: $local Remote: $remote\n"; 

That's it. You can now use this script and improve it based on the explanation. See the next article for a full source analysis of all hits.
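As one possible improvement, and only as a sketch that is not taken from the follow-up article, the two counters could be replaced by a hash that counts hits per address. The helper name count_hits_by_ip is an assumption, and the sample rows are built from the addresses in the excerpt near the top of this article.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference that maps each
# leading field (the client address) to its number of hits.
sub count_hits_by_ip {
    my ($fh) = @_;
    my %hits;
    while (my $line = <$fh>) {
        my $length = index($line, ' ');
        next if $length < 0;                    # skip lines without a space
        $hits{ substr($line, 0, $length) }++;
    }
    return \%hits;
}

# Sample rows read from an in-memory filehandle instead of a real log file.
my $log = join '', map { "$_ - - [10/Apr/2007:10:39:11 +0300] ...\n" }
    '127.0.0.1', '127.0.0.1', '139.12.0.2';
open my $in, '<', \$log or die "Could not open in-memory log: $!";

my $hits = count_hits_by_ip($in);
print "$_: $hits->{$_}\n" for sort keys %$hits;
```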

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
