Mon, 14 Sep 2009

Quick Log File Processing with Perl

A common thing to want to do as a sysadmin is match and print text from a file in a particular output format. There are lots of ways to do this using shell tools - grep, sed and awk are used frequently - but I’d like to show you a common Perl idiom for doing this type of task.

Perl was originally designed to be a replacement for the various shell tools, and while it has grown into much more over the years, it is still a great tool to have in your command line toolbox. Here’s an example. Let’s say you want to print the date, time, IP address and URL each time your website is crawled by a Googlebot. The Apache access log will look something like this:

...
10.249.66.234 - - [12/Sep/2009:19:22:51 -0400] "GET /robots.txt HTTP/1.1" 404 424 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
10.249.66.234 - - [12/Sep/2009:19:22:51 -0400] "GET / HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
...

A quick solution is this, all in one line:

serenity:~# perl -wnle 'print "Googlebot accessed \"$4\" from $1 on $2 at $3" if (/^ (\d+\.\d+\.\d+\.\d+) .+? \[ (.+?) : (.+?) \s .+? GET\s+(.+?)\s+HTTP .+ Googlebot/x)' /var/log/apache2/access.log
Googlebot accessed "/robots.txt" from 10.249.66.234 on 12/Sep/2009 at 19:22:51
Googlebot accessed "/" from 10.249.66.234 on 12/Sep/2009 at 19:22:51
serenity:~#

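There are four command line options used here:

-w: Enable warnings
-n: Wrap the code in an implicit while (<>) { ... } loop, so it runs once for each line of input
-l: Strip the newline from each input line and add one to the end of each print
-e: Run the code given as the next argument instead of reading a script file

See the perlrun manpage for details; there is much more to Perl’s command line processing.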

I build the regular expression by picking a target line and going through it from left to right, adding expressions as I go. I use the /x modifier to make it easier to read; this makes Perl ignore unescaped whitespace in the regexp, which is why a literal space has to be written as \s. I also use Perl’s non-greedy quantifier quite a bit. This is the question mark in expressions like .+? \[, a little snippet that matches one or more of any character, followed by a left bracket. The question mark ensures that the first such left bracket is matched. Without it, Perl’s regexp engine would happily consume characters to the end of the line and then backtrack to the last left bracket it could find. The greedy form .+ \[ would also work for us, since there is only one left bracket in each line, but the non-greedy form can be a performance improvement when parsing large text files. For more info, I encourage you to read Mastering Regular Expressions by Jeffrey Friedl, or start with the Regular Expression Tutorial.

This method has a few advantages. For one, it relies on just one tool, not a few disparate ones. Perl is portable to many operating systems, so you could use this to parse text files on Windows, for example. You also have the ability to load modules on the command line with the ‘-M’ switch. This gives you access to all of CPAN, potentially a huge time-saver.

posted at: 21:20


Fri, 17 Jul 2009

Monitoring and Alerting on Linux Logfiles

As a sysadmin, I’ve found it’s always useful to monitor the system logs on Linux or Unix servers for specific patterns of activity that can indicate security or system issues. It’s even nicer to get alerts when that activity occurs. Some time ago I wanted a simple solution that would let me continuously monitor the ClamAV updater (freshclam) logfiles and send email alerts; the result was this script. Recently I wanted something a bit more general, so I wrote this Perl script, Logmon, which monitors any logfile for a specific pattern and generates email or syslog alerts.

Installing Logmon

Logmon needs a few non-core Perl modules to run, namely Mail::Mailer, Proc::Daemon, Unix::Syslog and File::Tail, but these can be installed pretty easily as packaged modules or via CPAN. On Debian/Ubuntu systems, all the needed modules are pre-packaged for you:

apt-get install libmailtools-perl libunix-syslog-perl libfile-tail-perl libproc-daemon-perl

On Red Hat/Fedora servers, you can use yum:

yum install perl-MailTools perl-Unix-Syslog perl-File-Tail perl-Proc-Daemon

To pull in all the modules from CPAN, you can use this one-liner:

for m in Mail::Mailer Proc::Daemon Unix::Syslog File::Tail; do perl -MCPAN -e "install $m"; done

Once the modules are installed, download the script and double check that it will run without error, then copy it to your path and make it executable. You should see no errors about missing modules when you run the script under ‘perl -cwT’.

perl -cwT ./logmon.pl
install -m 755 logmon.pl /usr/local/bin/

Running Logmon

Logmon is meant to be both simple and secure. Here is the usage summary:

logmon.pl synopsis: Daemon that periodically checks logfile for a pattern and send alerts
Pattern is always required. If no other options are given, defaults to syslog alerts
and monitors /var/log/messages for given pattern.

Usage: logmon.pl -p pattern [-m alerts@example.com] [-f logfile] [-u run as user] [-g run as group] [-i max interval] [-v] [-d] [-h]

-m: Email destination for alerts
-f: logfile to monitor
-p: Pattern to match against lines in logfile (Perl regexp, match is case-insensitive)
-u: Run with permissions of user
-g: Run with permissions of group
-i: Max time to sleep between checks
-d: Debug output to STDOUT, do not daemonize
-v: Verbose logging (use with caution or you may have endless alerts)
-h: This help text

Running the script is pretty straightforward: you specify a pattern to match against with -p (this is the only required parameter), and optionally an email recipient (-m) and a logfile to watch (-f). Here is an example. Let’s say you want to get alerts whenever MySQL detects a crashed table. On my Ubuntu box, the syslog event for this looks like this:

Jul 17 08:02:49 kaylee mysqld[1532]: 090717 8:02:49 [ERROR] /usr/sbin/mysqld: Table './mysql/user' is marked as crashed and should be repaired

And here is the command line usage:

logmon.pl -p 'mysqld.+?table.+?crashed' -m you@example.com -u nobody -g adm -f /var/log/syslog -i 30

Note that the pattern match is case-insensitive. When run this way, Logmon will detach itself from your terminal and run as a daemon, sleeping no more than 30 seconds between checks of /var/log/syslog for the supplied pattern. I recommend you use the -u and -g options to force Logmon to drop its privileges. Just make sure you specify a user or group that has read permission on the specified logfile. On Debian and Ubuntu servers, all the system logs are readable by the group ‘adm’.

Other Options and Tips for Testing Logmon

Logmon will send alerts to your default system log if you leave off the -m option. If you also use the -v option for verbose logging, these syslog alerts will include the pattern and the matched data. Be careful with that combination: if the file you are monitoring is the same one the alerts are written to, each verbose alert matches the pattern itself, and you’ll get an endless series of alerts added to the system log. For this reason, alerts sent to syslog are very generic by default (email alerts are always verbose).

To test Logmon, use the -v and -d options together. Run this way, Logmon will not daemonize itself, and will just print alerts and activity to your console (STDERR).

Errors

Any errors that cause the script to die while it’s running as a daemon can be found in your system log.

Startup and Shutdown

I haven’t yet written any startup scripts for Logmon, although I plan to. For now, just start it from one of your system’s startup scripts, and if you have to stop it you can just use pkill logmon. Please send any bugs or suggestions to the email address in the script header, or leave a comment here.

posted at: 02:17
