This year’s first OSCON 2008 tutorial is by brian d foy of Stonehenge Consulting Services. Personally I don’t think I’ll gain very much from this tutorial, but it should be full of some kind of useful information anyhow.
brian d foy is the author of Mastering Perl, which is available in dead tree form and online.
This tutorial isn’t for masters of Perl, but for people who just want to control Perl code, and this isn’t necessarily the right (or only) way to do it, but it’s the way that foy has done it. We’ll cover profiling, benchmarking, configuration, logging, and lightweight persistence.
The path to mastery runs from being an apprentice (beginners, supervised), to journeyman (competent in the trade, study under masters) and through to becoming a master (developed own style and created masterpiece). We don’t quite do this in software, but this is the idea behind the “mastering” in “Mastering Perl”.
Kicking off with profiling. Profiling is going to show you the performance of your program, but not just for speed. It depends on what you’re trying to do (e.g. save memory). As part of your process, find out what your requirements are and focus on those.
When profiling, you can use perl’s -d command-line switch to load a special profiler such as Devel::DProf (for example, perl -d:DProf program.pl) and discover what’s going on inside your program. Unfortunately, when you’re running things like this, the CPU time (at least from Devel::SmallProf) is useless these days because CPUs are so incredibly fast. Pay attention to the “wall time” instead.
Whenever you have a performance problem, a better algorithm is usually the best solution. Instead of tweaking individual lines (switching a foreach to a map, for example), rewriting with a new algorithm (iteration instead of recursion, for example) can get you somewhere.
If you memoize (using the Memoize module) you can store results for functions that you call often. Calculate the answer the first time in, then store the result and use that for future calls.
Devel::DProf is more complex than Devel::SmallProf, and it can be used for better profiling.
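To make the memoizing idea concrete, here is a minimal sketch using the core Memoize module. The fib function and the $calls counter are my own illustration, not something from the tutorial slides.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real function body runs

# Naive recursive Fibonacci: very slow without caching.
sub fib {
    my ($n) = @_;
    $calls++;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}

memoize('fib');   # replace fib() with a wrapper that caches results

my $result = fib(25);
print "fib(25) = $result after $calls real calls\n";
```

Because the recursive calls also go through the cached wrapper, each distinct argument is only computed once.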
Basics of profiling: profiling counts something. All the code runs through a central reporting mechanism. While recording, the program is slower. At the end you get a report, and you use that report to make a decision.
Sometimes you want to profile something that won’t get captured by a standard profiler, like recording the number of DBI queries made. It’s fairly easy to just roll your own counter (do something like add a line $Queries{$sql_statement}++;). Luckily there’s already DBI::Profile that you can use to profile DBI code.
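A roll-your-own counter along those lines might look like the sketch below. The run_query sub is a hypothetical stand-in for wherever your program hands SQL to the database; only the %Queries counting line comes from the talk.

```perl
use strict;
use warnings;

my %Queries;   # SQL text => number of times it was issued

# Hypothetical wrapper around the place where SQL is executed.
sub run_query {
    my ($sql_statement) = @_;
    $Queries{$sql_statement}++;
    # ... a real program would prepare and execute the statement here ...
    return;
}

run_query('SELECT * FROM users WHERE id = ?') for 1 .. 3;
run_query('SELECT COUNT(*) FROM orders');

# Report the most frequently issued statements first.
for my $sql (sort { $Queries{$b} <=> $Queries{$a} } keys %Queries) {
    printf "%3d  %s\n", $Queries{$sql}, $sql;
}
```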
If you want to make sure that your tests are actually testing your code, use the Devel::Cover module, which will tell you which lines of code are covered by tests, which branches are tested, etc.
Benchmarking is used to compare like to like — comparing the operation of one program on different hardware, comparing two programs that do the same thing in different ways, etc. Before we benchmark we need to think about what we’re doing, which is why profiling was covered first.
Every time we look at something, we change the situation. The act of measuring with the intent of benchmarking affects the outcome. And the tools have inherent uncertainties. We want both precision (repeatability) and accuracy (getting the right answer). All things being equal, there are lies, damned lies, and benchmarks. Everyone has an agenda, you don’t run testbeds as production, and skepticism wins the day.
Speed isn’t the only metric, and it might not even be the most important one. There are other things like disk use, IO, CPU time, concurrent users, memory use, bandwidth use, network lag, responsiveness, etc. And what about programmer time?
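If you want to look at memory use, try the Devel::Peek module, which dumps how Perl stores a variable internally. The related Devel::Size module reports how many bytes a data structure such as a hash or an array occupies, so you can find out where all the memory in your application is being used.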
Benchmarking can be done with Benchmark.pm. Unfortunately it’s misused by a lot of people. It only measures speed, and its algorithm introduces an error of about 7%. It only measures time on the local CPU as well. And be skeptical about the output — if you find that grep is running four million times a second, something’s probably wrong with your tests.
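For reference, a small comparison with Benchmark.pm could be sketched like this. The two subs and the @words data are made up for illustration, and in the spirit of the talk the numbers it prints should be treated with suspicion.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = map { "word$_" } 1 .. 1_000;

# Two ways of building the same uppercased list.
# A negative count means "run each sub for at least that many CPU seconds".
my $rows = cmpthese(-1, {
    foreach_push => sub {
        my @out;
        push @out, uc $_ foreach @words;
        return scalar @out;
    },
    map_list => sub {
        my @out = map { uc $_ } @words;
        return scalar @out;
    },
});
```

cmpthese prints a table of rates and relative percentages; it says nothing about memory, IO or programmer time.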
When reporting benchmarks, report the benchmarking computer, operating system, application details, etc.
Moving on to configuration now. The goal of configuration is to keep people from bothering you: it allows users to modify the behaviour of your program without asking you to change it. Don’t let it get out of hand though…
Letting users edit the code itself is popular but wrong. Don’t let them do this, because they’ll edit it with Word and muck things up. Using a separate config.pl is only marginally better, because a syntax error still kills the program.
Environment variables are a good way to allow users to configure a program, but they’re really only fine for command-line users. If you use these, set up defaults first and then override them in your program if the corresponding environment variable is set. (sidenote: “defined-or” operator in Perl 5.10: my $VERBOSE = $ENV{VERBOSE} // 1;)
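Here is a rough sketch of the defaults-then-override pattern. The load_config sub and the DEBUG and LOGDIR settings are hypothetical names of mine; the environment is passed in as a hash reference so the same code can be exercised without touching the real %ENV.

```perl
use strict;
use warnings;

# Hypothetical settings for illustration: built-in defaults come first.
my %defaults = (
    VERBOSE => 1,
    DEBUG   => 0,
    LOGDIR  => '/tmp',
);

# Override each default only when the variable is actually set.
# The defined test also works on Perls older than 5.10, which lack //.
sub load_config {
    my ($env) = @_;
    my %config;
    for my $name (keys %defaults) {
        $config{$name} = defined $env->{$name} ? $env->{$name} : $defaults{$name};
    }
    return \%config;
}

my $config = load_config(\%ENV);
print "$_ = $config->{$_}\n" for sort keys %$config;
```

Testing for definedness rather than truth matters here: a user who sets VERBOSE=0 gets 0, not the default of 1.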
Perl’s %Config from the Config module stores the options given to Configure when configuring and building Perl. This could be used to check if e.g. your program can use threads.
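As a quick sketch of that check, assuming you only care whether the perl binary was built with threads:

```perl
use strict;
use warnings;
use Config;   # exports the read-only %Config hash

if ($Config{usethreads}) {
    print "This perl was built with thread support\n";
}
else {
    print "No thread support; stick to a single-threaded code path\n";
}

print "Built for $Config{archname}, version $Config{version}\n";
```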
Command-line switches can be used as well, but the command-line could be alien to your users. Unfortunately there is no standard for command-line switches:
% foo -i -t -r
% foo -i -t -d/usr/local
% foo -i -t -d=/usr/local
% foo -i -t -d /usr/local
% foo -itr
% foo -debug -verbose=1
% foo --debug=1 -it
Perl comes with a way to handle simple switches: put the -s command-line option on your shebang line (a command-line option to enable command-line options… turtles all the way down). This is an easy way to do basic command-line option handling without needing to bring in a separate module. For anything more involved, that’s where Getopt::Long comes in.
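A minimal Getopt::Long sketch, for comparison. The option names and defaults are hypothetical, loosely modelled on the foo examples above, and the parse_args wrapper is mine rather than anything from the talk.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical options for illustration; defaults are set before parsing.
sub parse_args {
    my @args = @_;
    my %opt = ( debug => 0, verbose => 1, dir => '/usr/local' );
    GetOptionsFromArray(
        \@args,
        'debug!'    => \$opt{debug},     # --debug or --nodebug
        'verbose=i' => \$opt{verbose},   # --verbose=2 or --verbose 2
        'dir|d=s'   => \$opt{dir},       # --dir=/path or -d /path
    ) or die "Usage: $0 [--debug] [--verbose=N] [--dir=PATH]\n";
    return \%opt;
}

my $opt = parse_args(@ARGV);
print "$_ = $opt->{$_}\n" for sort keys %$opt;
```

Parsing a copy of the argument list keeps @ARGV untouched, which makes the sub easy to call from tests.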
You could also use something like ConfigReader::Simple or INI files. INI files have the advantage of being able to set different sections apart from each other. There’s a module to read INI files: Config::IniFiles. Config::Scoped is almost like code. None of these are brian’s endorsements; there are a lot of modules in the Config namespace so find one that’s in the format you have to use (or want to use).
There’s also a module called AppConfig that bundles this all together — command-line switches, config files, and a lot of other stuff.
And on to logging. Try to log without changing the program. Programmers don’t want to have to change the program to get information out of it. There’s also a bunch of different messages that could be interesting to different groups of people: error messages to users, debugging messages to developers, progress information, and any other extra information. Everybody seems to want to invent their own logging wheel, but there are two major modules: Log::Dispatch and Log::Log4perl. Log::Log4perl can be configured using a config file, meaning you don’t need to change your code to change how things are logged.
Getting a bit fancier, you can use the DBI appender to Log::Log4perl to log to a database. Log::Log4perl can also reload a config file on the fly, so you don’t have to stop the program to change the logging.
Lightweight persistence allows data to stick around between program runs. You pick up where you left off last time, or other programs can use the results. “Lightweight” refers to anything too small for DBI. Starting off, Data::Dumper can be used. YAML does much the same, but its format is nicer than Data::Dumper’s. And there’s also Storable (note: use nstore rather than store, since nstore writes in network byte order and the file can be read on other architectures). And for more transparent stuff, there’s DBM::Deep.
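A small Storable round trip might look like the following sketch. The %state hash is made-up example data, and a temporary file stands in for wherever your program would really keep its state.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical state that a program wants to keep between runs.
my %state = (
    last_run => 1216598400,
    seen     => [ 'alpha', 'beta' ],
);

# A throwaway file for the demo; a real program would use a fixed path.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

nstore(\%state, $file);          # network byte order, so the file is portable
my $restored = retrieve($file);  # returns a reference to a fresh copy

print "Restored ", scalar @{ $restored->{seen} }, " items from $file\n";
```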
Main points: Profile your application before you try to improve it. Be very careful and skeptical with benchmarks. Make your program flexible through configuration. Use Log4perl. And use lightweight persistence.