tail -# file, in PHP

I read Kevin’s Read Line from File article today and thought I would add some sample code I wrote a while back to address a similar task in PHP – How to perform the equivalent of “tail -# filename” in PHP.

A typical task in some applications such as a shoutbox or a simple log file viewer is how to extract the last ‘n’ lines from a (very) large file – efficiently.  There are several approaches one can take to solving the problem (as always in programming there are many ways to skin the proverbial cat) – taking the easy way out and shelling out to run the tail command; or reading the file line by line keeping the last ‘n’ lines in memory with a queue.

However, here I will demonstrate a fairly tuned method of seeking to the end of the file and stepping back to read sufficient lines to the end of the file. If insufficient lines are returned, it incrementally looks back further in the file until it either can look no further, or sufficient lines are returned.

Assumptions need to be made in order to tune the algorithm for the competing challenges of :

  • Reading enough lines in
  • as few reads as possible

My approach to this is to provide an approximation to the average size of a line: $linelength
Multiply by ‘n’: $linecount
and we have the byte $offset from the end of the file that we will seek to to begin reading.

A check we need to do right here is that we haven’t offset to before the beginning of the file, and so we have to override $offset to match the file size.

Also as I’m being over-cautious, I’m going to tell it to offset $linecount + 1 – The main reason for this is by seeking to a specific byte location in the file, we would have to be very lucky to land on the first character of a new line – therefore we must perform a fgets() and throw away that result.

Typically, I want it to be able to read ‘n’ lines forward from the offset given, however if that proves insufficient, I’m going to grow the offset by 10% and try again. I also want to make it so that the algorithm is better able to tune itself if we grossly underestimate what $linelength should be.  In order to do this, we’re going to track the string length of each line we do get back and adjust the offset accordingly.

In our example, lets try reading the last 10 lines from Apache’s access_log

So let’s look at the code so far, nothing interesting, we’re just prepping for the interesting stuff:

$linecount  10;  // Number of lines we want to read
$linelength 160// Apache's logs are typically ~200+ chars
// I've set this to < 200 to show the dynamic nature of the algorithm
// offset correction.
$file '/usr/local/apache2/logs/access_log.demo';
$fsize filesize($file);


// check if file is smaller than possible max lines
$offset = ($linecount+1) * $linelength;
if ($offset $fsize$offset $fsize;

Next up we’re going to open the file and using our method of seeking to the end of the file, less our offset, here is the meat of our routine:

$fp fopen($file'r');
if (
$fp === false) exit;

$lines = array(); // array to store the lines we read.

$readloop true;
while(
$readloop) {
// we will finish reading when we have read $linecount lines, or the file
// just doesn’t have $linecount lines

// seek to $offset bytes from the end of the file
fseek($fp– $offsetSEEK_END);

  // discard the first line as it won't be a complete line
// unless we're right at the start of the file
if ($offset != $fsizefgets($fp);

// tally of the number of bytes in each line we read
$linesize 0;

// read from here till the end of the file and remember each line
while($line fgets($fp)) {
array_push($lines$line);
$linesize += strlen($line); // total up the char count

// if we’ve been able to get more lines than we need
// lose the first entry in the queue
// Logically we should decrement $linesize too, but if we
// hit the magic number of lines, we are never going to use it
if (count($lines) > $linecountarray_shift($lines);
}

// We have now read all the lines from $offset until the end of the file
if (count($lines) == $linecount) {
// perfect – have enough lines, can exit the loop
$readloop false;
} elseif (
$offset >= $fsize) {
// file is too small – nothing more we can do, we must exit the loop
$readloop false;
} elseif (
count($lines) < $linecount) {
// try again with a bigger offset
$offset intval($offset 1.1);  // increase offset 10%
// but also work out what the offset could be if based on the lines we saw
$offset2 intval($linesize/count($lines) * ($linecount+1));
// and if it is larger, then use that one instead (self-tuning)
if ($offset2 $offset$offset $offset2;
// Also remember we can’t seek back past the start of the file
if ($offset $fsize$offset $fsize;
echo 
‘Trying with a bigger offset: ‘$offset“n”;
    // and reset
$lines = array();
  }
}

// Let’s have a look at the lines we read.
print_r($lines);

At first glance it might seem line overkill for the task, however stepping through the code you can see the expected while loop with fgets() to read each line. The only thing we are doing at this stage is shifting the first line of the $lines array if we happen to read too many lines, and also tallying up how many characters we managed to read for each line.

If we exit the while/fgets loop with the correct number of lines, then all is well, we can exit the main retry loop and we have the result in $lines.

Where the code gets interesting is what we do if we don’t achieve the required number of lines.  The simple fix is to step back by a further 10% by increasing the offset and trying again, but remember we also counted up the number of bytes we read for each line we did get, so we can very simply obtain an average real-file line size by dividing this with the number of lines in our $lines array. This enables us to override the previous offset value to something larger, if indeed we were wildly off in our estimates.

By adjusting our offset value and letting the loop repeat, the routine will try again and repeat until it succeeds or fails gracefully by the file not having sufficient lines, in which case it’ll return what it could get.

For the complete, working, source code please visit:

http://pgregg.com/projects/php/code/tail-10.phps

Sample execution:

http://pgregg.com/projects/php/code/tail-10.php

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

All content © Paul Gregg, 1994 - 2024
This site http://pgregg.com has been online since 5th October 2000
Previous websites live at various URLs since 1994