Friday 26 September 2008

One other little-known command

Do you know about "join"? It's a nice little utility that comes with the "coreutils" package, which means that it's there by default on most, if not all, Unix/Linux distributions. What it does is join (duh) the lines of two files, based on the value of a field they have in common.

For example, the other day I had to correlate Apache requests from the logs of one Apache server acting as a proxy with the logs of another Apache server used as a backend. Because of missing configuration, the proxy Apache logged public IP addresses but no domain names (virtual hosts), and the backend logged domain names but only the private IP addresses of the proxy systems. So I took copies of the two Apaches' combined logs and awk'ed them to keep only the IP, the URI requested, and the domain name requested. This was necessary because it's better to have clear field numbers with join, and Apache can, if configured to do it, log the User-Agent of a client, which spans multiple fields if your field separator is a space.
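As a rough sketch of that trimming step for the proxy side: the log lines below are made-up samples, and the field positions assume Apache's stock combined format, where the client IP is the first whitespace-separated field and the requested URI is the seventh. The file name access_proxy.log is also just an assumption. The backend log would be handled the same way, with the virtual host's field number depending on how its LogFormat was set up.

```shell
# Hypothetical sample lines in Apache's combined format (not the real logs).
cat > access_proxy.log <<'EOF'
203.0.113.7 - - [26/Sep/2008:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux)"
198.51.100.4 - - [26/Sep/2008:10:00:01 +0200] "GET /about.html HTTP/1.1" 200 128 "-" "Mozilla/5.0 (X11; Linux)"
EOF

# Keep only the client IP ($1) and the requested URI ($7), then sort on
# the URI (field 2), because join expects its inputs sorted on the join field.
awk '{ print $1, $7 }' access_proxy.log | sort -k 2,2 > log_proxy.txt

cat log_proxy.txt
```

The multi-word User-Agent at the end of each line is simply dropped, so every line of log_proxy.txt has exactly two fields.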

With the logs trimmed down like this, and sorted on the join field (join expects both inputs to be sorted on it), you can just run the command:

join -1 2 -2 2 -o "1.1 0 2.3" log_proxy.txt log_backend.txt


This tells join to use the second field from the first file (-1 2, aka log_proxy.txt) and the second field from the second file (-2 2, aka log_backend.txt), and to join the matching lines together. The result is written to standard output, following the format stated in -o: the first file's first field (the public IP), the join field (0, or the URI in this case, which appears in both files), and the third field of the second file (the virtual host domain). You can obviously adjust the field numbers if necessary, or change various settings such as the field separator (default is whitespace), all of which is clearly documented in the man page.
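To make this concrete, here is a tiny end-to-end sketch. The IP addresses, URIs, and domain names are invented sample data, not taken from the real logs, and both files are already sorted on their second field. The file joined.txt is just a name picked for the example.

```shell
# Hypothetical proxy log: public IP, URI (sorted on the URI).
cat > log_proxy.txt <<'EOF'
198.51.100.4 /about.html
203.0.113.7 /index.html
EOF

# Hypothetical backend log: private proxy IP, URI, virtual host (sorted on the URI).
cat > log_backend.txt <<'EOF'
10.0.0.5 /about.html www.example.org
10.0.0.5 /index.html www.example.com
EOF

# Match lines on the URI and print: public IP, URI, virtual host.
join -1 2 -2 2 -o "1.1 0 2.3" log_proxy.txt log_backend.txt > joined.txt

cat joined.txt
```

Each output line pairs a public IP from the proxy with the virtual host the backend saw for the same URI, for example "198.51.100.4 /about.html www.example.org". A URI that appears in only one of the two files is left out by default.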

In other news, I seem to have become fairly well known at my workplace as the Ubuntu wizard... Maybe the Ubuntu laptop bag helps? Anyway... Friday I was asked by Marc about installing Ubuntu on a Lenovo T60, and yesterday I was asked by Richard about my thoughts on and experiences with Intrepid. I've already managed to get one Ubuntu machine in use as a server, and hopefully there will be more to come in the future.
