HELLO! HERE IS YOUR WEEK 4 HOMEWORK

Your week four homework has two parts. The first part is to make your week 3 homework use threads: run its tasks in parallel, and synchronize access to shared state where appropriate.
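
As a rough sketch of the pattern (not a solution to your week 3 homework), here is one way to run independent tasks on threads and guard a shared hash with a Mutex. The three lambdas are made-up stand-ins for whatever your tasks actually do, and the names `tasks`, `results`, and `lock` are only for illustration.

```ruby
# Hypothetical stand-ins for the week 3 tasks; each one just returns a value.
tasks = [
  -> { 1 + 1 },
  -> { 'ruby'.upcase },
  -> { [3, 1, 2].sort }
]

results = {}
lock = Mutex.new

threads = tasks.each_with_index.map do |task, index|
  Thread.new do
    value = task.call        # the work itself runs without holding the lock
    lock.synchronize do      # only one thread writes to the hash at a time
      results[index] = value
    end
  end
end

threads.each(&:join)         # wait for every task to finish
```

Keeping the slow work outside `synchronize` and locking only around the shared write is usually what "synchronize when appropriate" comes down to.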

The second part is writing an Apache log processor. For this project:

  • Provide a command-line tool that safely modifies files in place
  • Replace the IP address at the beginning of each line of an Apache logfile with its domain name
  • The tool must be multithreaded
  • The output must maintain the original log order
  • Cache name lookups across invocations
  • Age the cached items (expire old lookups)
  • Provide a command-line option to limit the number of threads (you should determine a sensible default)
  • Look at “resolv” for DNS lookups
  • This homework can be completed using only tools from Ruby’s standard library
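
To make the caching and expiry requirements a little more concrete, here is a minimal sketch of a lookup cache that survives between runs. Everything in it is an assumption rather than a required design: the class name `LookupCache`, the JSON file format, the one-hour default `ttl`, and the injectable `clock` (there so tests can control time). The `save` method writes to a temporary file and then renames it, which is also one common reading of "safely modifies files".

```ruby
require 'json'

# Hypothetical cache of IP => name lookups, stored as JSON between runs.
class LookupCache
  # path: where the cache file lives; ttl: seconds before an entry expires.
  # clock is injectable so tests can control the current time.
  def initialize(path, ttl: 3600, clock: -> { Time.now.to_i })
    @path = path
    @ttl = ttl
    @clock = clock
    @lock = Mutex.new
    @entries = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Returns a cached name that has not expired, or calls the block to look one up.
  def fetch(ip)
    @lock.synchronize do
      entry = @entries[ip]
      return entry['name'] if entry && @clock.call - entry['at'] < @ttl
    end
    name = yield(ip) # the slow lookup runs outside the lock
    @lock.synchronize { @entries[ip] = { 'name' => name, 'at' => @clock.call } }
    name
  end

  # Writes to a temporary file, then renames it over the real one, so a
  # crash in the middle of a write never leaves a half-written cache behind.
  def save
    tmp = "#{@path}.tmp"
    @lock.synchronize { File.write(tmp, JSON.generate(@entries)) }
    File.rename(tmp, @path)
  end
end
```

A caller might wrap a real lookup as `cache.fetch(ip) { |addr| Resolv.getname(addr) }` and call `cache.save` once before exiting. Note that two threads asking for the same uncached IP at the same moment will both do the lookup; whether that matters is a design decision for you.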

The file format will look like this:

208.77.188.166 - - [29/Apr/2009:16:07:38 -0700] "GET / HTTP/1.1" 200 1342
75.119.201.189 - - [29/Apr/2009:16:07:44 -0700] "GET /favicon.ico HTTP/1.1" 200 1406
75.146.57.34 - - [29/Apr/2009:16:08:38 -0700] "GET / HTTP/1.1" 304 -
75.119.201.189 - - [29/Apr/2009:16:09:53 -0700] "GET / HTTP/1.1" 200 1340
208.77.188.166 - - [29/Apr/2009:16:11:51 -0700] "GET / HTTP/1.1" 304 -
75.146.57.34 - - [29/Apr/2009:16:12:00 -0700] "GET / HTTP/1.1" 304 -
75.119.201.189 - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
208.77.188.166 - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
75.119.201.189 - - [29/Apr/2009:16:13:17 -0700] "GET / HTTP/1.1" 304 -
75.146.57.34 - - [29/Apr/2009:16:13:50 -0700] "GET / HTTP/1.1" 200 1294
75.146.57.34 - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/main.css?1240264242 HTTP/1.1" 200 2968
74.125.67.100 - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/home.css?1240264242 HTTP/1.1" 200 7829

We want the file to end up looking like this:

example.com - - [29/Apr/2009:16:07:38 -0700] "GET / HTTP/1.1" 200 1342
example.com - - [29/Apr/2009:16:07:44 -0700] "GET /favicon.ico HTTP/1.1" 200 1406
example.com - - [29/Apr/2009:16:08:38 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:09:53 -0700] "GET / HTTP/1.1" 200 1340
example.com - - [29/Apr/2009:16:11:51 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:12:00 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:17 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:50 -0700] "GET / HTTP/1.1" 200 1294
example.com - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/main.css?1240264242 HTTP/1.1" 200 2968
example.com - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/home.css?1240264242 HTTP/1.1" 200 7829

The domain names above are placeholders, not the real names for those addresses, but your output format should be the same.
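
The transformation shown above can be sketched as a single substitution on each line. This is one possible approach, not the required one: the constant `LEADING_IP` and the helpers `name_for` and `rewrite_line` are hypothetical names, the pattern only handles IPv4, and falling back to the original IP when no name is found is an assumption about what you want. The resolver is passed in as a parameter so tests can avoid real DNS.

```ruby
require 'resolv'

# Matches a dotted-quad IPv4 address at the very start of a line,
# followed by whitespace (the whitespace is not part of the match).
LEADING_IP = /\A\d{1,3}(?:\.\d{1,3}){3}(?=\s)/

# Hypothetical helper: reverse-resolve an IP, keeping the IP if no name exists.
def name_for(ip)
  Resolv.getname(ip)
rescue Resolv::ResolvError
  ip
end

# Replaces the leading IP of one log line; lines without one are returned unchanged.
# resolver is anything that responds to call(ip), which makes it easy to fake in tests.
def rewrite_line(line, resolver = method(:name_for))
  line.sub(LEADING_IP) { |ip| resolver.call(ip) }
end
```

In a test you can pass a lambda such as `->(ip) { 'example.com' }` as the resolver and compare the result against the expected line.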

Here is an example session using the command line tool you will write for your homework:

$ apache_lookup my_logs.log

After execution you should be able to examine my_logs.log and see that the IP addresses have been replaced with domain names. You should also be able to do this:

$ apache_lookup -t 100 my_logs.log

The above execution should spawn 100 threads to process your logfile.
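
Here is a hedged sketch of how the `-t` option and the "maintain log order" requirement could fit together, using `optparse` and `Queue` from the standard library. The helper names `parse_options` and `ordered_parallel_map`, the long form `--threads`, and the default of 10 threads are all assumptions; picking and justifying the real default is part of the homework.

```ruby
require 'optparse'

# Parses a hypothetical -t/--threads flag. The default of 10 is a placeholder.
def parse_options(argv)
  options = { threads: 10 }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: apache_lookup [-t THREADS] LOGFILE'
    opts.on('-t', '--threads N', Integer, 'Maximum number of worker threads') do |n|
      options[:threads] = n
    end
  end
  options[:files] = parser.parse(argv) # whatever is left over is the file list
  options
end

# Runs the block over every item on thread_count worker threads and
# returns the results in the same order as the input.
def ordered_parallel_map(items, thread_count)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once the queue is empty, pop returns nil instead of blocking

  results = Array.new(items.size)
  lock = Mutex.new

  workers = Array.new([thread_count, 1].max) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        value = yield(item)                        # slow work, done without the lock
        lock.synchronize { results[index] = value } # slot index preserves input order
      end
    end
  end

  workers.each(&:join)
  results
end
```

Because each result is stored at the index of the line it came from, the lines can finish in any order and still be written back in log order.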

This homework MUST be test-driven. We will dock points for code that is not test-driven. You MUST turn your homework in as a tar.gz or as a gem. Remember to email the mailing list, use IRC, start early, test all the time, and have fun!
