Graphing requests per second out of apache log files
angelo | March 19, 2009I’ve been doing some analysis on some log files I got from a client. Since I have to propose an infrastructure that can support these web applications, I’d like to do some reconnaisance first, and see what the traffic is, and how many requests/sec are actually happening. Are we talking 10 req/s or 1000 req/s?
So, I wrote some scripts to graph out the requests/sec from an apache logfile, and I want to share this with you.
First of all, I want to merge the logfiles, so I get one big log file of all cluster nodes. For this, I use the logresolvemerge.pl script that comes with AWStats.
perl logresolvemerge.pl example.node1.log example.node2.log > example.log
Then basicly what I do, is I sort the log file (aparently apache log files are not really chronologic, probably because of the difference between the time a request is made, and the time a request is completed), then I run a script to get the number of requests per minute using this requests.rb script. I then create an rrd file, post all the information about the requests per minute into it, and then I export graphs
To sort the logfile, save this script as _sortlog.sh (you might want to use another location than ~/tmp as temporary sorting dir:
if [ ! -f $1 ]; then
   echo "Usage: $0 "
   exit
fi
echo "Sorting $1"
sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n -T ~/tmp $1 > $2
Here’s a complete script. Keep in mind I use FreeBSD, so on other *nix flavours you might have to change the script a bit (use expr instead of gexpr, etc). Requirements are the _sortlog.sh script and the requests.rb script mentioned above, and on FreeBSD the packages gexpr and rrdtool12 (or maybe rrdtool 1.3 wil word as well). The 4 timestamps mentioned in the script are for the exporting of the graphs. (My blog will replace straigt double quotes with the curly ones as well, you should replace them back):
if [ $# -ne 1 ]; then
   echo "Usage: $0 "
   exit
fi
TIMESTAMP_WEEK_END=1237258800
TIMESTAMP_DAY_BEGIN=1237071600
TIMESTAMP_DAY_END=1237158000
echo `date` sorting log file "$LOGFILENAME".log to "$LOGFILENAME"_sorted.log..
sh _sortlog.sh "$LOGFILENAME".log "$LOGFILENAME"_sorted.log
echo `date` calculating requests per minute in file "$LOGFILENAME"_requests.txt..
cat "$LOGFILENAME"_sorted.log | ./requests.rb > "$LOGFILENAME"_requests.txt
echo `date` making script "$LOGFILENAME"_createrrd.sh..
FIRSTTIMESTAMP=`gexpr "$FIRSTTIMESTAMP - 60"`
echo --step 60Â \\ >> "$LOGFILENAME"_createrrd.sh
echo --start $FIRSTTIMESTAMP \\ >> "$LOGFILENAME"_createrrd.sh
echo DS:requests:ABSOLUTE:60:0:U \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:1:500 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:1440:3985 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:1440:3985 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:1:500 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:1440:3985 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:1440:3985 >> "$LOGFILENAME"_createrrd.sh
rm "$LOGFILENAME".rrd
sh "$LOGFILENAME"_createrrd.sh
echo `date` making script "$LOGFILENAME"_updaterrd.sh..
 AWKSTRING="{print \"rrdtool update $LOGFILENAME.rrd \" \$1 \":\" \$2}"
 echo `echo ${line} | awk "$AWKSTRING"` >> "$LOGFILENAME"_updaterrd.sh
done
sh "$LOGFILENAME"_updaterrd.sh
echo --imgformat=PNG \\ >> "$LOGFILENAME"_graph_week.sh
echo --start=$TIMESTAMP_WEEK_BEGIN \\ >> "$LOGFILENAME"_graph_week.sh
echo --end=$TIMESTAMP_WEEK_END \\ >> "$LOGFILENAME"_graph_week.sh
echo --title=\"$LOGFILENAME requests\" \\ >> "$LOGFILENAME"_graph_week.sh
echo --rigid \\ >> "$LOGFILENAME"_graph_week.sh
echo --base=1000 \\ >> "$LOGFILENAME"_graph_week.sh
echo --height=120 \\ >> "$LOGFILENAME"_graph_week.sh
echo --width=500 \\ >> "$LOGFILENAME"_graph_week.sh
echo --alt-autoscale-max \\ >> "$LOGFILENAME"_graph_week.sh
echo --lower-limit=0 \\ >> "$LOGFILENAME"_graph_week.sh
echo --vertical-label=\"requests per second\" \\ >> "$LOGFILENAME"_graph_week.sh
echo --slope-mode \\ >> "$LOGFILENAME"_graph_week.sh
echo --font TITLE:9:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo --font AXIS:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo --font LEGEND:8:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo --font UNIT:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:a=\"$LOGFILENAME.rrd\":requests:AVERAGE \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:b=\"$LOGFILENAME.rrd\":requests:LAST \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:c=\"$LOGFILENAME.rrd\":requests:MIN \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:d=\"$LOGFILENAME.rrd\":requests:MAX \\ >> "$LOGFILENAME"_graph_week.sh
echo AREA:a#6EA1007F:\"HTTP Requests\"Â \\ >> "$LOGFILENAME"_graph_week.sh
echo GPRINT:b:LAST:\"Current\\:%8.0lf\"Â \\ >> "$LOGFILENAME"_graph_week.sh
echo GPRINT:a:AVERAGE:\"Average\\:%8.0lf\"Â \\ >> "$LOGFILENAME"_graph_week.sh
echo GPRINT:d:MAX:\"Maximum\\:%8.0lf\"Â \\ >> "$LOGFILENAME"_graph_week.sh
echo LINE1:a#6EA100FF:\"\" >> "$LOGFILENAME"_graph_week.sh
sh "$LOGFILENAME"_graph_week.sh
echo --imgformat=PNG \\ >> "$LOGFILENAME"_graph_day.sh
echo --start=$TIMESTAMP_DAY_BEGIN \\ >> "$LOGFILENAME"_graph_day.sh
echo --end=$TIMESTAMP_DAY_END \\ >> "$LOGFILENAME"_graph_day.sh
echo --title=\"$LOGFILENAME requests\" \\ >> "$LOGFILENAME"_graph_day.sh
echo --rigid \\ >> "$LOGFILENAME"_graph_day.sh
echo --base=1000 \\ >> "$LOGFILENAME"_graph_day.sh
echo --height=120 \\ >> "$LOGFILENAME"_graph_day.sh
echo --width=500 \\ >> "$LOGFILENAME"_graph_day.sh
echo --alt-autoscale-max \\ >> "$LOGFILENAME"_graph_day.sh
echo --lower-limit=0 \\ >> "$LOGFILENAME"_graph_day.sh
echo --vertical-label=\"requests per second\" \\ >> "$LOGFILENAME"_graph_day.sh
echo --slope-mode \\ >> "$LOGFILENAME"_graph_day.sh
echo --font TITLE:9:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo --font AXIS:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo --font LEGEND:8:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo --font UNIT:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:a=\"$LOGFILENAME.rrd\":requests:AVERAGE \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:b=\"$LOGFILENAME.rrd\":requests:LAST \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:c=\"$LOGFILENAME.rrd\":requests:MIN \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:d=\"$LOGFILENAME.rrd\":requests:MAX \\ >> "$LOGFILENAME"_graph_day.sh
echo AREA:a#6EA1007F:\"HTTP Requests\"Â \\ >> "$LOGFILENAME"_graph_day.sh
echo GPRINT:b:LAST:\"Current\\:%8.0lf\"Â \\ >> "$LOGFILENAME"_graph_day.sh
echo GPRINT:a:AVERAGE:\"Average\\:%8.0lf\"Â \\ >> "$LOGFILENAME"_graph_day.sh
echo GPRINT:d:MAX:\"Maximum\\:%8.0lf\"Â \\ >> "$LOGFILENAME"_graph_day.sh
echo LINE1:a#6EA100FF:\"\" >> "$LOGFILENAME"_graph_day.sh
sh "$LOGFILENAME"_graph_day.sh
You can run this script using the command:
$ ./go.sh example.log
And the output would look like:
Wed Mar 18 17:30:30 CET 2009 started script for log example.log..
Wed Mar 18 17:30:30 CET 2009 sorting log file example.log to example_sorted.log..
Sorting example.log
Wed Mar 18 17:54:23 CET 2009 calculating requests per minute in file example_requests.txt..
Thu Mar 19 03:22:42 CET 2009 making script example_createrrd.sh..
Thu Mar 19 03:22:42 CET 2009 deleting example.rrd..
rm: example.rrd: No such file or directory
Thu Mar 19 03:22:42 CET 2009 executing script example_createrrd.sh..
Thu Mar 19 03:22:42 CET 2009 making script example_updaterrd.sh..
rm: example_updaterrd.sh: No such file or directory
Thu Mar 19 03:23:10 CET 2009 executing script example_updaterrd.sh..
Thu Mar 19 03:23:27 CET 2009 making script example_graph_week.sh..
Thu Mar 19 03:23:27 CET 2009 executing script example_graph_week.sh..
595×199
Thu Mar 19 03:23:27 CET 2009 making script example_graph_day.sh..
Thu Mar 19 03:23:27 CET 2009 executing script example_graph_day.sh..
595×199
Thu Mar 19 03:23:27 CET 2009 done
Note the requests.rb file can take long, the above output was for an 18GB file. The graphs will perhaps look like this:


My firewall/asterisk/mail scanner is a Compaq Deskpro EN/SFF, a very small Pentium III 500 box, with 256MB RAM and a 13GB ATA disk. I am very happy with it, since it scans inbound and outbound email all day using amavisd-new, spamassassin and ClamAV, and at the same time I can call through asterisk without a glitch. And it also runs OpenVPN server, DHCP server, BIND, NTP daemon, IPv6 tunnel, PF firewall with QoS, etc. And all that with only 30W power consumption! On auction sites, they go for around 25-30 EUR a piece. You don’t get a Cisco router with that flexibility and features for that price 




