Your week 4 homework has two parts. The first part is to make your week 3 homework use threads: run its tasks in parallel, and synchronize where appropriate.
The second part is to write an Apache log processor. Requirements for this project:
- Provide a command-line tool that safely modifies files
- Replace the IP address at the beginning of each line of an Apache logfile with its domain name
- Must be multithreaded
- Must maintain log order
- Cache name lookups across invocations
- Age the cache items (expire lookups)
- Provide a command-line option to limit the number of threads (you should determine a sensible default)
- Look at “resolv” for DNS lookups
- This homework can be completed using only tools from Ruby’s standard library
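To make the two cache requirements concrete, here is a minimal sketch of an expiring cache that is saved to disk between runs. It is one possible approach, not the expected solution. The class name `LookupCache`, the YAML file format, the one-hour `ttl` default, and the injectable `clock` (there only to make expiry easy to test) are all assumptions made for illustration.

```ruby
require "yaml"

# Hypothetical helper: caches IP => name lookups, expires entries after
# `ttl` seconds, and saves them to a YAML file for later invocations.
class LookupCache
  def initialize(path, ttl: 3600, clock: -> { Time.now.to_i })
    @path  = path
    @ttl   = ttl
    @clock = clock
    @lock  = Mutex.new
    # Load entries left behind by a previous run, if any.
    @data  = File.exist?(path) ? (YAML.load_file(path) || {}) : {}
  end

  # Return the cached name for ip if it is still fresh; otherwise call the
  # block to resolve it and remember the answer.
  def fetch(ip)
    @lock.synchronize do
      entry = @data[ip]
      return entry["name"] if entry && @clock.call - entry["at"] < @ttl
    end
    name = yield(ip) # resolve outside the lock so other threads keep working
    @lock.synchronize { @data[ip] = { "name" => name, "at" => @clock.call } }
    name
  end

  # Persist the cache so the next invocation can reuse it.
  def save
    @lock.synchronize { File.write(@path, @data.to_yaml) }
  end
end
```

A caller would wrap each lookup as `cache.fetch(ip) { |addr| Resolv.getname(addr) }` and call `cache.save` before exiting. Note that two threads asking for the same uncached address at the same moment will both resolve it; whether that matters is a design decision for you to make.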
The file format will look like this:
208.77.188.166 - - [29/Apr/2009:16:07:38 -0700] "GET / HTTP/1.1" 200 1342
75.119.201.189 - - [29/Apr/2009:16:07:44 -0700] "GET /favicon.ico HTTP/1.1" 200 1406
75.146.57.34 - - [29/Apr/2009:16:08:38 -0700] "GET / HTTP/1.1" 304 -
75.119.201.189 - - [29/Apr/2009:16:09:53 -0700] "GET / HTTP/1.1" 200 1340
208.77.188.166 - - [29/Apr/2009:16:11:51 -0700] "GET / HTTP/1.1" 304 -
75.146.57.34 - - [29/Apr/2009:16:12:00 -0700] "GET / HTTP/1.1" 304 -
75.119.201.189 - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
208.77.188.166 - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
75.119.201.189 - - [29/Apr/2009:16:13:17 -0700] "GET / HTTP/1.1" 304 -
75.146.57.34 - - [29/Apr/2009:16:13:50 -0700] "GET / HTTP/1.1" 200 1294
75.146.57.34 - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/main.css?1240264242 HTTP/1.1" 200 2968
74.125.67.100 - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/home.css?1240264242 HTTP/1.1" 200 7829
We want the file to end up looking like this:
example.com - - [29/Apr/2009:16:07:38 -0700] "GET / HTTP/1.1" 200 1342
example.com - - [29/Apr/2009:16:07:44 -0700] "GET /favicon.ico HTTP/1.1" 200 1406
example.com - - [29/Apr/2009:16:08:38 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:09:53 -0700] "GET / HTTP/1.1" 200 1340
example.com - - [29/Apr/2009:16:11:51 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:12:00 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:15 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:17 -0700] "GET / HTTP/1.1" 304 -
example.com - - [29/Apr/2009:16:13:50 -0700] "GET / HTTP/1.1" 200 1294
example.com - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/main.css?1240264242 HTTP/1.1" 200 2968
example.com - - [29/Apr/2009:16:13:55 -0700] "GET /stylesheets/home.css?1240264242 HTTP/1.1" 200 7829
These domain names aren’t correct, but your output format should be similar.
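The transformation from the first listing to the second can be sketched as below: each line's leading address is swapped for a name by a pool of worker threads, and every result is stored at its original index so the log order survives. This is a rough illustration under assumptions, not the required design. The names `rewrite_lines` and `IP_AT_START`, the default of 4 threads, and the choice to keep the address when a lookup fails are all hypothetical; the pattern only handles dotted IPv4 addresses like the ones in the sample.

```ruby
require "resolv"

# Matches a dotted IPv4 address at the very start of a log line.
IP_AT_START = /\A\d{1,3}(?:\.\d{1,3}){3}(?=\s)/

# Replace the leading IP of every line using up to `threads` workers.
# `resolver` is any callable that maps an IP string to a hostname.
def rewrite_lines(lines, threads: 4, resolver: Resolv.method(:getname))
  jobs = Queue.new
  lines.each_with_index { |line, index| jobs << [line, index] }
  jobs.close # workers get nil from pop once the queue is drained

  results = Array.new(lines.size)
  workers = Array.new(threads) do
    Thread.new do
      while (job = jobs.pop)
        line, index = job
        # Each worker writes only to its own slot, so order is preserved.
        results[index] = line.sub(IP_AT_START) do |ip|
          begin
            resolver.call(ip)
          rescue Resolv::ResolvError
            ip # assumption: leave the address in place when no name is found
          end
        end
      end
    end
  end
  workers.each(&:join)
  results
end
```

Passing the resolver in as an argument keeps the function testable without touching the network: a test can hand it a lambda backed by a hash, while the real tool uses `Resolv.getname`.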
Here is an example session using the command line tool you will write for your homework:
$ apache_lookup my_logs.log
After execution you should be able to examine my_logs.log and see that the IP addresses have been replaced with domain names. You should also be able to do this:
$ apache_lookup -t 100 my_logs.log
The above execution should spawn 100 threads to process your logfile.
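As a starting point for the two sessions above, here is a hedged sketch of parsing the `-t` option with `optparse` and of replacing the logfile safely by writing a temporary file and renaming it over the original. The method names `parse_args` and `replace_file`, the `--threads` long form, the temp-file naming scheme, and the default of 10 threads are assumptions for illustration only; choosing a sensible default is still part of your homework.

```ruby
require "optparse"

DEFAULT_THREADS = 10 # assumed placeholder; determine your own default

# Parse ARGV-style arguments into [thread_count, logfile_path].
def parse_args(argv)
  threads = DEFAULT_THREADS
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: apache_lookup [-t THREADS] LOGFILE"
    opts.on("-t", "--threads N", Integer, "Maximum number of threads") do |n|
      threads = n
    end
  end
  rest = parser.parse(argv) # returns the non-option arguments
  raise ArgumentError, parser.banner unless rest.size == 1 && threads > 0
  [threads, rest.first]
end

# Write the new lines to a temp file next to the original, then rename it
# into place, so a crash never leaves a half-written log behind.
def replace_file(path, lines)
  tmp_path = "#{path}.#{Process.pid}.tmp"
  mode = File.stat(path).mode & 0o777 # keep the original permissions
  File.open(tmp_path, File::WRONLY | File::CREAT | File::EXCL, mode) do |tmp|
    lines.each { |line| tmp.write(line) }
  end
  File.rename(tmp_path, path)
ensure
  # Clean up the temp file if something failed before the rename.
  File.delete(tmp_path) if tmp_path && File.exist?(tmp_path)
end
```

With these in place, the executable would call `threads, path = parse_args(ARGV)`, read the file, process the lines, and finish with `replace_file(path, new_lines)`. Because the temp file sits in the same directory as the log, the rename stays on one filesystem.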
This homework MUST be test-driven. We will dock points for code that is not test-driven. You MUST turn your homework in as a tar.gz or as a gem. Remember to email the mailing list, use IRC, start early, test all the time, and have fun!