Graphing requests per second out of apache log files
March 19, 2009 | 11:00 amI’ve been doing some analysis on some log files I got from a client. Since I have to propose an infrastructure that can support these web applications, I’d like to do some reconnaisance first, and see what the traffic is, and how many requests/sec are actually happening. Are we talking 10 req/s or 1000 req/s?
So, I wrote some scripts to graph out the requests/sec from an apache logfile, and I want to share this with you.
First of all, I want to merge the logfiles, so I get one big log file of all cluster nodes. For this, I use the logresolvemerge.pl script that comes with AWStats.
perl logresolvemerge.pl example.node1.log example.node2.log > example.log
Then basicly what I do, is I sort the log file (aparently apache log files are not really chronologic, probably because of the difference between the time a request is made, and the time a request is completed), then I run a script to get the number of requests per minute using this requests.rb script. I then create an rrd file, post all the information about the requests per minute into it, and then I export graphs
To sort the logfile, save this script as _sortlog.sh (you might want to use another location than ~/tmp as temporary sorting dir:
if [ ! -f $1 ]; then
echo "Usage: $0 "
exit
fi
echo "Sorting $1"
sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n -T ~/tmp $1 > $2
Here’s a complete script. Keep in mind I use FreeBSD, so on other *nix flavours you might have to change the script a bit (use expr instead of gexpr, etc). Requirements are the _sortlog.sh script and the requests.rb script mentioned above, and on FreeBSD the packages gexpr and rrdtool12 (or maybe rrdtool 1.3 wil word as well). The 4 timestamps mentioned in the script are for the exporting of the graphs. (My blog will replace straigt double quotes with the curly ones as well, you should replace them back):
if [ $# -ne 1 ]; then
echo "Usage: $0 "
exit
fi
TIMESTAMP_WEEK_END=1237258800
TIMESTAMP_DAY_BEGIN=1237071600
TIMESTAMP_DAY_END=1237158000
echo `date` sorting log file "$LOGFILENAME".log to "$LOGFILENAME"_sorted.log..
sh _sortlog.sh "$LOGFILENAME".log "$LOGFILENAME"_sorted.log
echo `date` calculating requests per minute in file "$LOGFILENAME"_requests.txt..
cat "$LOGFILENAME"_sorted.log | ./requests.rb > "$LOGFILENAME"_requests.txt
echo `date` making script "$LOGFILENAME"_createrrd.sh..
FIRSTTIMESTAMP=`gexpr "$FIRSTTIMESTAMP - 60"`
echo --step 60 \\ >> "$LOGFILENAME"_createrrd.sh
echo --start $FIRSTTIMESTAMP \\ >> "$LOGFILENAME"_createrrd.sh
echo DS:requests:ABSOLUTE:60:0:U \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:1:500 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:AVERAGE:0.5:1440:3985 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MIN:0.5:1440:3985 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:1:500 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:MAX:0.5:1440:3985 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:1:600 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:6:700 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:24:775 \\ >> "$LOGFILENAME"_createrrd.sh
echo RRA:LAST:0.5:1440:3985 >> "$LOGFILENAME"_createrrd.sh
rm "$LOGFILENAME".rrd
sh "$LOGFILENAME"_createrrd.sh
echo `date` making script "$LOGFILENAME"_updaterrd.sh..
AWKSTRING="{print \"rrdtool update $LOGFILENAME.rrd \" \$1 \":\" \$2}"
echo `echo ${line} | awk "$AWKSTRING"` >> "$LOGFILENAME"_updaterrd.sh
done
sh "$LOGFILENAME"_updaterrd.sh
echo --imgformat=PNG \\ >> "$LOGFILENAME"_graph_week.sh
echo --start=$TIMESTAMP_WEEK_BEGIN \\ >> "$LOGFILENAME"_graph_week.sh
echo --end=$TIMESTAMP_WEEK_END \\ >> "$LOGFILENAME"_graph_week.sh
echo --title=\"$LOGFILENAME requests\" \\ >> "$LOGFILENAME"_graph_week.sh
echo --rigid \\ >> "$LOGFILENAME"_graph_week.sh
echo --base=1000 \\ >> "$LOGFILENAME"_graph_week.sh
echo --height=120 \\ >> "$LOGFILENAME"_graph_week.sh
echo --width=500 \\ >> "$LOGFILENAME"_graph_week.sh
echo --alt-autoscale-max \\ >> "$LOGFILENAME"_graph_week.sh
echo --lower-limit=0 \\ >> "$LOGFILENAME"_graph_week.sh
echo --vertical-label=\"requests per second\" \\ >> "$LOGFILENAME"_graph_week.sh
echo --slope-mode \\ >> "$LOGFILENAME"_graph_week.sh
echo --font TITLE:9:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo --font AXIS:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo --font LEGEND:8:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo --font UNIT:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:a=\"$LOGFILENAME.rrd\":requests:AVERAGE \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:b=\"$LOGFILENAME.rrd\":requests:LAST \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:c=\"$LOGFILENAME.rrd\":requests:MIN \\ >> "$LOGFILENAME"_graph_week.sh
echo DEF:d=\"$LOGFILENAME.rrd\":requests:MAX \\ >> "$LOGFILENAME"_graph_week.sh
echo AREA:a#6EA1007F:\"HTTP Requests\" \\ >> "$LOGFILENAME"_graph_week.sh
echo GPRINT:b:LAST:\"Current\\:%8.0lf\" \\ >> "$LOGFILENAME"_graph_week.sh
echo GPRINT:a:AVERAGE:\"Average\\:%8.0lf\" \\ >> "$LOGFILENAME"_graph_week.sh
echo GPRINT:d:MAX:\"Maximum\\:%8.0lf\" \\ >> "$LOGFILENAME"_graph_week.sh
echo LINE1:a#6EA100FF:\"\" >> "$LOGFILENAME"_graph_week.sh
sh "$LOGFILENAME"_graph_week.sh
echo --imgformat=PNG \\ >> "$LOGFILENAME"_graph_day.sh
echo --start=$TIMESTAMP_DAY_BEGIN \\ >> "$LOGFILENAME"_graph_day.sh
echo --end=$TIMESTAMP_DAY_END \\ >> "$LOGFILENAME"_graph_day.sh
echo --title=\"$LOGFILENAME requests\" \\ >> "$LOGFILENAME"_graph_day.sh
echo --rigid \\ >> "$LOGFILENAME"_graph_day.sh
echo --base=1000 \\ >> "$LOGFILENAME"_graph_day.sh
echo --height=120 \\ >> "$LOGFILENAME"_graph_day.sh
echo --width=500 \\ >> "$LOGFILENAME"_graph_day.sh
echo --alt-autoscale-max \\ >> "$LOGFILENAME"_graph_day.sh
echo --lower-limit=0 \\ >> "$LOGFILENAME"_graph_day.sh
echo --vertical-label=\"requests per second\" \\ >> "$LOGFILENAME"_graph_day.sh
echo --slope-mode \\ >> "$LOGFILENAME"_graph_day.sh
echo --font TITLE:9:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo --font AXIS:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo --font LEGEND:8:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo --font UNIT:7:/usr/local/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:a=\"$LOGFILENAME.rrd\":requests:AVERAGE \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:b=\"$LOGFILENAME.rrd\":requests:LAST \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:c=\"$LOGFILENAME.rrd\":requests:MIN \\ >> "$LOGFILENAME"_graph_day.sh
echo DEF:d=\"$LOGFILENAME.rrd\":requests:MAX \\ >> "$LOGFILENAME"_graph_day.sh
echo AREA:a#6EA1007F:\"HTTP Requests\" \\ >> "$LOGFILENAME"_graph_day.sh
echo GPRINT:b:LAST:\"Current\\:%8.0lf\" \\ >> "$LOGFILENAME"_graph_day.sh
echo GPRINT:a:AVERAGE:\"Average\\:%8.0lf\" \\ >> "$LOGFILENAME"_graph_day.sh
echo GPRINT:d:MAX:\"Maximum\\:%8.0lf\" \\ >> "$LOGFILENAME"_graph_day.sh
echo LINE1:a#6EA100FF:\"\" >> "$LOGFILENAME"_graph_day.sh
sh "$LOGFILENAME"_graph_day.sh
You can run this script using the command:
$ ./go.sh example.log
And the output would look like:
Wed Mar 18 17:30:30 CET 2009 started script for log example.log..
Wed Mar 18 17:30:30 CET 2009 sorting log file example.log to example_sorted.log..
Sorting example.log
Wed Mar 18 17:54:23 CET 2009 calculating requests per minute in file example_requests.txt..
Thu Mar 19 03:22:42 CET 2009 making script example_createrrd.sh..
Thu Mar 19 03:22:42 CET 2009 deleting example.rrd..
rm: example.rrd: No such file or directory
Thu Mar 19 03:22:42 CET 2009 executing script example_createrrd.sh..
Thu Mar 19 03:22:42 CET 2009 making script example_updaterrd.sh..
rm: example_updaterrd.sh: No such file or directory
Thu Mar 19 03:23:10 CET 2009 executing script example_updaterrd.sh..
Thu Mar 19 03:23:27 CET 2009 making script example_graph_week.sh..
Thu Mar 19 03:23:27 CET 2009 executing script example_graph_week.sh..
595×199
Thu Mar 19 03:23:27 CET 2009 making script example_graph_day.sh..
Thu Mar 19 03:23:27 CET 2009 executing script example_graph_day.sh..
595×199
Thu Mar 19 03:23:27 CET 2009 done
Note the requests.rb file can take long, the above output was for an 18GB file. The graphs will perhaps look like this:




It’s here, and ready to 




