Introduction
There are many log analyzers for Apache log files, but there are always special cases that none of them handle, or that they do not handle the way you would like.
This is a simple Perl script that allows you to count the visitors based on their IP address.
Background
In this case, I had to count how many hits came from "localhost" (that is, from "127.0.0.1") and how many came from elsewhere. Below I show and explain the script that does this.
Using the code
The log file generated by Apache has lots of rows, all of them starting like this:
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...
217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...
Each row starts with the IP address, followed by a space, a dash (-), and then more data.
The script looks like this:
#!/usr/bin/perl
use strict;
use warnings;
my $file = shift or die "Usage: $0 FILE\n";
open my $fh, '<', $file or die "Could not open '$file': $!";
my $local = 0;
my $remote = 0;
while (my $line = <$fh>) {
    my $length = index ($line, " ");
    my $ip = substr($line, 0, $length);
    if ($ip eq "127.0.0.1") {
        $local++;
    } else {
        $remote++;
    }
}
print "Local: $local Remote: $remote\n";
Save it as "analyzer.pl" and run it as "perl analyzer.pl FILE", where FILE is the path to the Apache log file.
Let's go over it.
The first line is called the shebang (or sh-bang). It is only needed if you would like to run the script directly as a Unix/Linux executable.
use strict;
use warnings;
These are very similar to compiler flags in other languages. They help you avoid common programming mistakes. I call them a safety net, and I would not write any Perl script without them.
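To see the safety net at work, here is a minimal sketch of the kind of typo that "use strict" rejects. The variable names $total and $totl are made up for this illustration, and the snippet is compiled through a string eval only so that the error can be captured and printed instead of aborting the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a tiny snippet at run time so that the compile error raised by
# "use strict" ends up in $@ instead of stopping this script.
# $totl is a deliberate typo for $total and is never declared with "my".
my $snippet = 'use strict; my $total = 0; $totl = 1; 1;';
my $ok      = eval $snippet;
my $error   = $@;

print "strict rejected the snippet:\n$error" if !$ok;
```

Without "use strict", Perl would silently create a new global variable called $totl and the mistake would go unnoticed.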
my $file = shift or die "Usage: $0 FILE\n";
is better explained in two parts:
my $file = shift
takes the first element from @ARGV, the array holding the command-line parameters, and moves it to the $file variable, which has just been declared using the "my" keyword.
Then there is the "or" logical operator.
If the user provides a filename, the left-hand side of "or" evaluates to true and the script goes on. If the user has not provided a command-line parameter, the right-hand side of the "or" kicks in: Perl prints the usage message and stops executing:
Usage: analyzer.pl FILE
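Both branches of the "shift or die" expression can be tried without touching the real command line. The sketch below assumes a hypothetical helper, get_file(), that simulates @ARGV, and a made-up filename, access.log. Inside a subroutine a bare shift reads @_ rather than @ARGV, so the helper names @ARGV explicitly.

```perl
use strict;
use warnings;

# get_file() is a hypothetical helper: it runs the same "shift or die"
# expression against a simulated command line instead of the real one.
sub get_file {
    local @ARGV = @_;              # pretend these were the arguments
    my $file = shift @ARGV         # explicit @ARGV: a bare shift here would read @_
        or die "Usage: $0 FILE\n";
    return $file;
}

my $with_arg = get_file('access.log');    # left-hand side is true
my $without  = eval { get_file() };       # die is caught by eval
my $usage    = $@;                        # holds the usage message

print "With argument: $with_arg\n";
print "Without argument: $usage";
```

Because the message passed to die ends with a newline, Perl prints it as is, without appending the script name and line number.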
open my $fh, '<', $file or die "Could not open '$file': $!";
The above is a similar logical expression. The left-hand side opens $file for reading and puts the filehandle in the new $fh variable. If this is successful, open returns true and the script goes on. If it fails, open returns false and the right-hand side kicks in: Perl displays an error message, which includes the reason for the failure taken from $!, and stops executing.
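The failure branch can be observed with a path that does not exist. In this sketch the path is a made-up one that is assumed to be missing on your machine, and eval is used only to catch the die so that the message can be printed.

```perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist on this machine.
my $missing = '/no/such/directory/access.log';

# eval catches the exception thrown by die so the message can be inspected.
my $opened = eval {
    open my $fh, '<', $missing or die "Could not open '$missing': $!";
    close $fh;
    1;
};
my $message = $@;

print $opened ? "Opened $missing\n" : "Failed: $message";
```

The choice of "or" over "||" matters here. "||" binds more tightly than the comma, so without parentheses around the arguments of open it would test $file instead of the result of open, and die would never run for a failed open.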
Then we declare two scalar variables and assign 0 to each one of them. We'll use them as counters for the number of lines that start with "127.0.0.1" and the other lines.
The while loop reads the file line by line and executes the content of the block for every line. The loop stops when we have finished reading the file.
while (my $line = <$fh>) {
}
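The loop can be tried without a real log file by opening a filehandle on a string, which Perl allows when open is given a reference to a scalar. The sketch below uses two rows from the excerpt above. The $lines counter and the chomp call are additions for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Two sample rows taken from the log excerpt, kept in a string.
my $data = "127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...\n"
         . "139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...\n";

# Opening a reference to a scalar gives a read handle over the string.
open my $fh, '<', \$data or die "Could not open in-memory file: $!";

my $lines = 0;
while (my $line = <$fh>) {
    chomp $line;                 # remove the trailing newline
    $lines++;
    print "Line $lines: $line\n";
}
close $fh;
```

Each value read from the handle still carries its trailing newline, which is why the sketch calls chomp before printing. The original script does not need it because it only looks at the start of each line.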
The index() function gets a string and a substring and returns the location of the substring (the second parameter) in the first string. It uses 0-based indexing and we are looking for a space, so the resulting number is also the length of the IP address in the current line.
my $length = index ($line, " ");
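As a small illustration, here is index() applied to one of the sample rows, together with the value it returns when the substring is not found. The '#' character is just an arbitrary example of something that does not occur in the row.

```perl
use strict;
use warnings;

my $line = '139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...';

my $length  = index($line, ' ');    # 10: offset of the first space
my $missing = index($line, '#');    # -1: '#' does not occur in the line

print "First space at offset $length\n";
print "No match gives $missing\n";
```

The -1 case is worth keeping in mind: if a row had no space at all, the script would pass -1 as the length to substr(), which then returns the line without its last character rather than an IP address.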
substr() gets a string, an index (offset), and a length. It returns the substring located at that position. In our case, that happens to be the IP address of the current line.
my $ip = substr($line, 0, $length);
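Here is the same call on a sample row, next to the two-argument form of substr(), which returns everything from the given offset to the end of the string. The $rest variable is an addition for illustration only.

```perl
use strict;
use warnings;

my $line   = '217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...';
my $length = index($line, ' ');

my $ip   = substr($line, 0, $length);     # from offset 0, $length characters
my $rest = substr($line, $length + 1);    # no length: everything after the space

print "IP: $ip\n";
print "Rest: $rest\n";
```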
The only thing that remains is to check if this is "localhost" or not and increment the appropriate counter.
if ($ip eq "127.0.0.1") {
    $local++;
} else {
    $remote++;
}
Once the loop finishes, we print the results:
print "Local: $local Remote: $remote\n";
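To check the whole script against the four sample rows shown earlier, the loop can be wrapped in a subroutine and fed from a string. This is only a sketch: count_hits() is a hypothetical name, and the "next if" guard for rows without a space is an addition that the original script does not have.

```perl
use strict;
use warnings;

# count_hits() is a hypothetical wrapper around the loop from the script.
# It takes an open filehandle and returns the two counters.
sub count_hits {
    my ($fh) = @_;
    my ($local, $remote) = (0, 0);
    while (my $line = <$fh>) {
        my $length = index($line, ' ');
        next if $length < 0;              # added guard: skip rows without a space
        my $ip = substr($line, 0, $length);
        if ($ip eq '127.0.0.1') { $local++ } else { $remote++ }
    }
    return ($local, $remote);
}

# The four sample rows from the beginning of the article.
my $sample = <<'END';
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] ...
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] ...
217.1.20.22 - - [10/Apr/2007:10:40:54 +0300] ...
END

open my $fh, '<', \$sample or die "Could not open in-memory file: $!";
my ($local, $remote) = count_hits($fh);
close $fh;

print "Local: $local Remote: $remote\n";    # prints "Local: 2 Remote: 2"
```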
That's it. You can now use this script and even improve it based on the explanation. See the next article for a full analysis of all hits by source.