Mon, 14 Sep 2009

Quick Log File Processing with Perl

A common sysadmin task is to match text in a file and print it in a particular output format. There are lots of ways to do this using shell tools (grep, sed and awk are used frequently), but I’d like to show you a common Perl idiom for this type of task.

Perl was originally designed to be a replacement for the various shell tools, and while it has grown into much more over the years, it is still a great tool to have in your command line toolbox. Here’s an example. Let’s say you want to print the date, time, IP address and URL each time your website is crawled by a Googlebot. The Apache access log will look something like this:

...
10.249.66.234 - - [12/Sep/2009:19:22:51 -0400] "GET /robots.txt HTTP/1.1" 404 424 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
10.249.66.234 - - [12/Sep/2009:19:22:51 -0400] "GET / HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
...

A quick solution is this, all in one line:

serenity:~# perl -wnle 'print "Googlebot accessed \"$4\" from $1 on $2 at $3" if (/^ (\d+\.\d+\.\d+\.\d+) .+? \[ (.+?) : (.+?) \s .+? GET\s+(.+?)\s+HTTP .+ Googlebot/x)' /var/log/apache2/access.log
Googlebot accessed "/robots.txt" from 10.249.66.234 on 12/Sep/2009 at 19:22:51
Googlebot accessed "/" from 10.249.66.234 on 12/Sep/2009 at 19:22:51
serenity:~#

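There are four command line options used here:

-w enables warnings.
-n wraps the code in a loop that reads the input one line at a time, without printing each line automatically.
-l strips the trailing newline from each input line and adds one back to everything you print.
-e supplies the code to run on the command line instead of in a script file.

See the perlrun manpage for details; there is much more to Perl’s command line processing.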

I build the regular expression by picking a target line and going through it from left to right, adding expressions as I go. I use the /x modifier to make it easier to read; this makes Perl ignore whitespace in the regexp. I also use Perl’s non-greedy quantifier quite a bit: this is the question mark in expressions like .+? \[. This little snippet matches one or more of any character, followed by a left bracket. The question mark ensures that the first such left bracket is matched. Without it, Perl’s regexp engine would happily chomp away at characters and match the last left bracket it found in the line. The greedy form .+ \[ would still work for us, since there is only one left bracket in each line, but the non-greedy form turns out to be a performance improvement when parsing large text files, because the engine can stop at the first bracket instead of running to the end of the line and backtracking. For more info, I encourage you to read Mastering Regular Expressions by Jeffrey Friedl, or start with the Regular Expression Tutorial.

This method has a few advantages. For one, it relies on just one tool, not a few disparate ones. Perl is portable to many operating systems, so you could use this to parse text files on Windows, for example. You also have the ability to load modules on the command line with the ‘-M’ switch. This gives you access to all of CPAN, potentially a huge time-saver.

posted at: 21:20


Wed, 02 Sep 2009

Troubleshooting SSH Connections

I’ve helped a few people recently who have had trouble getting OpenSSH working properly; I’ve also had my share of issues over the years. Generally problems with SSH connections fall into two groups - network related and server related. Most of these problems can be fixed fairly quickly if you know what to look for.

Network Related

These will typically be caused by improper routing or firewall configurations. Here are some things to check.

1. If your SSH server sits behind a firewall or router, make sure the default route of your internal SSH server points back to that firewall or router. Seems obvious, but it’s common to forget about the return trip packets need to make. This will display your default gateway:

netstat -rn | grep '^0'

Sometimes the default gateway is just one of your server interfaces. This is OK as long as that interface is directly connected to something that knows how to get back to your client.

2. While you’re at it, make sure the incoming SSH packets are actually getting to your SSH server. Tcpdump works very nicely for this, though you’ll need to be root to run it on the server:

tcpdump -n -i eth0 tcp port 22 and host [IP address of client]

Just replace eth0 with your client-facing interface name. If you don’t see incoming SSH packets during connection attempts, it’s probably due to a firewall or router access list.
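If you want the gateway address by itself, for example to ping it, the routing-table check from step 1 can be wrapped in a small helper. This is a sketch only: default_gateway is a made-up name, and it assumes the gateway is the second column of the netstat -rn output, with the default route listed as 0.0.0.0 (Linux) or default (BSD-style).

```shell
# default_gateway: read `netstat -rn` output on stdin, print the first default gateway.
default_gateway() {
    awk '$1 == "0.0.0.0" || $1 == "default" { print $2; exit }'
}

# Usage: netstat -rn | default_gateway
```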

SSH Server Problems

All of these issues revolve around SSH server configuration settings - not misconfigurations necessarily, just settings you may not be aware of.

1. Permissions can be a problem - in its default configuration, OpenSSH sets StrictModes to yes and won’t allow any connections if the account you’re trying to SSH into has group- or world-writable permissions on its home directory, ~/.ssh directory, or ~/.ssh/authorized_keys file. I typically just make the two directories mode 700 and the authorized_keys file mode 600. The sshd man page suggests this one-liner:

chmod go-w ~/ ~/.ssh ~/.ssh/authorized_keys

2. On Debian or Ubuntu systems, it is possible the keys you are using to connect are blacklisted. This is only an issue for keys generated on Debian or Debian-based clients, and stems from the now-famous Debian OpenSSL vulnerability disclosed in May of 2008. To detect any such blacklisted keys, run ssh-vulnkey on the client, while logged into the account you are connecting from. Debian and Ubuntu SSH servers will reject any such keys unless the PermitBlacklistedKeys directive in the /etc/ssh/sshd_config file is set to yes. I don’t recommend you actually leave this security check disabled, but it can be useful to temporarily disable it during testing.

3. Finally, if all else fails, you can see exactly what the SSH server is doing by running it in debug mode on a non-standard port:

/usr/sbin/sshd -d -p 2222

Then, on the client, connect and watch the server output:

ssh -vv -p 2222 [Server IP]

Note the -vv option to provide verbose client output. This alone can sometimes help debug connection issues.
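The permissions check from step 1 can also be scripted. The helper below is a minimal sketch, not what sshd itself does: is_strictmodes_safe is a made-up name, it only looks at the group and world write bits in the ls -ld output, and it ignores the file ownership that StrictModes also checks.

```shell
# is_strictmodes_safe PATH
# Returns 0 if PATH is not group- or world-writable, 1 if it is,
# and 2 if PATH does not exist or cannot be listed.
is_strictmodes_safe() {
    perms=$(ls -ld -- "$1" 2>/dev/null | cut -c1-10)
    [ -n "$perms" ] || return 2
    case "$perms" in
        ?????w????|????????w?) return 1 ;;   # 6th char = group write, 9th = world write
        *) return 0 ;;
    esac
}

# Run this as the account you are trying to SSH into.
for p in ~ ~/.ssh ~/.ssh/authorized_keys; do
    is_strictmodes_safe "$p" || echo "review: $p (missing or group/world-writable)"
done
```

Anything it reports can then be fixed with the chmod go-w one-liner from the sshd man page.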

posted at: 22:17
