Perl Worst Practices is the logical antithesis to Best Practice Perl. In this tutorial, Damian Conway will tell us about all the bad things in Perl.
Best practices lead to code that’s maintainable, robust, and efficient: you code predictably, repeatably, and consistently, adhering to “the rules”.
Worst practice programming leads to code that’s innovative, entertaining, and fun by coding intuitively. We’re going to look at some extremely Worst Practice Code, and draw a series of positive suggestions on how to code and how not to code. We’ll see Damian’s idea of obfuscated Perl, which is a scary prospect.
SelfGOL is his obfuscated program, and it can self-replicate, rewrite other Perl programs so that they self-replicate, detect un-rewritable programs, play Conway’s “Game of Life”, and animate any text as a cycling marquee banner. The whole entry is fewer than 1000 bytes of code. If you want to see it, it’s on the Perl 5 Wiki. Actually, now that I compare that version with the SelfGOL in the handout book, they’re different. Oh well.
SelfGOL is full of dark backstreets, everything you wanted to know about Perl and should have been afraid to ask.
When looking at SelfGOL (or any other hideous program), first run it through perltidy. Unfortunately, this breaks SelfGOL, but that’s another story.
Principle 1: Sane and consistent layout makes code more maintainable (but it isn’t a magic bullet if the code itself is beyond help).
Principle 2: Using built-in features isn’t necessarily smarter or cleaner (even though fellow developers’ futile struggles to recall those features can be highly amusing).
The next line of SelfGOL ($;=$/;) uses punctuation variables. $/ is the input record separator, which defaults to a newline, so using it in place of the newline character shortens your code! $; is the subscript separator that Perl uses to emulate multidimensional hashes:
$hash{'x','y'} = $val;
Perl joins the subscripts with the contents of $; to make a single key:
$hash{'x'.$;.'y'} = $val;
So the whole line copies a newline into $;, turning it into a newline “constant” that’s difficult to read.
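To see both punctuation variables in action, here’s a small sketch (the %grid hash and its values are made up for illustration). It shows a comma-separated subscript being joined with $;, and then SelfGOL’s trick of copying $/ into $; so that $; can stand in for a newline, localised here so nothing else is affected.

```perl
use strict;
use warnings;

my %grid;

# A list subscript is joined with $; (by default "\034") into one key.
$grid{'x', 'y'} = 42;
my $same = $grid{ join($;, 'x', 'y') };    # the same entry, spelled out
print "found $same\n";

# SelfGOL's trick: copy the input record separator (a newline by default)
# into $; so that $; can be used as a newline "constant".
my $newline = do {
    local $; = $/;    # localised so the rest of the program keeps "\034"
    'row one' . $; . 'row two' . $;;
};
print $newline;
```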
Principle 3: Obscure and obsolete features are obscure and obsolete for a reason (and retasking them for even more obscure purposes doesn’t help).
The next line (seek+DATA,undef$/,!$s;) looks for the __DATA__ block. The input record separator ($/) is cleared, which means that the next read from the DATA handle will slurp in everything. undef$/ also returns undef, which becomes 0 in numeric context, so seek is told to move 0 bytes. The third argument, !$s, is seek’s WHENCE flag: if -s was given on the command line, !$s is false (0, “from the start of the file”); otherwise it is 1 (“from the current position”). In other words, with -s, seek to the start of the file; without -s, do nothing.
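Spelled out one step at a time, that logic might look like the sketch below. It’s an illustration rather than SelfGOL’s code: it uses an in-memory filehandle instead of DATA so it runs on its own, skips one line to mimic DATA starting partway through a file, and uses a hypothetical $want_whole_file flag in place of -s.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);    # exports SEEK_SET (0), SEEK_CUR (1), SEEK_END (2)

# Readable version of SelfGOL's  seek+DATA,undef$/,!$s;
# $want_whole_file is a hypothetical stand-in for the -s switch.
sub slurp_rest {
    my ($text, $want_whole_file) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $skipped = <$fh>;    # start partway through, as DATA does

    # With -s: rewind to the start. Without it: move 0 bytes from here (a no-op).
    seek $fh, 0, ($want_whole_file ? SEEK_SET : SEEK_CUR)
        or die "seek failed: $!";

    local $/;               # undefined record separator: the next read slurps everything
    my $rest = <$fh>;
    close $fh;
    return $rest;
}

print slurp_rest("line one\nline two\n", 1);   # both lines
print slurp_rest("line one\nline two\n", 0);   # only the second line
```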
Principle 4: Each statement should do one thing only (since that’s the upper limit most brains can comprehend).
The next line ($s && print) asks “was there a -s on the command line?” If $s is set, it prints $_, which holds everything slurped in by a line I skipped ($_=<DATA>;). Voilà: that’s the entire bit of code that prints the whole program.
Principle 5: Relying on default behaviour makes code very slightly easier to write and vastly harder to read (because most readers can see better than they can think).
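To make Principle 5 concrete, here’s a tiny sketch comparing the bare print that SelfGOL uses (which silently prints $_ to the currently selected handle) with an explicit version. Output is captured in a string through an in-memory filehandle so the two can be compared; $print_source is a hypothetical stand-in for $s.

```perl
use strict;
use warnings;

my $captured = '';
open my $out, '>', \$captured or die "Cannot capture output: $!";
my $previous = select $out;    # send bare print calls to $out

my $print_source = 1;          # hypothetical stand-in for SelfGOL's $s
for ("implicit\n") {           # aliases $_ to the string, like $_=<DATA>;
    $print_source && print;             # SelfGOL style: prints $_ by default
    print {$out} $_ if $print_source;   # explicit: says what and where
}

select $previous;              # restore the original default handle
close $out;
print "captured: $captured";
```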
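Continuing that line (|| (*{q;::\;
;}=sub{$d=$d-1?$d:$0;s;';\t#$d#;,$_}) && $g && do {) doesn’t make much sense at first, because most of it sets up another subroutine. The newline is important: the quote q;::\; continues across the line break and is closed by the ; on the next line, so it produces the string “::;” followed by a newline (the \; is an escaped delimiter). Assigning an anonymous sub to that typeglob creates a subroutine whose name is “semicolon newline”.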
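Here’s a readable sketch of what that typeglob assignment does: it installs an anonymous subroutine under a name (";\n") that no ordinary sub declaration could use, so it can only be reached through a symbolic reference. The sub body is a made-up placeholder, not SelfGOL’s, and the last lines show the saner alternative of a lexical code reference.

```perl
use strict;
use warnings;

{
    no strict 'refs';    # symbolic glob names are forbidden under strict
    # SelfGOL spells this name ::;<newline>, which also means package main.
    *{"main::;\n"} = sub { return "called with @_" };
}

# The only way to call it is through the name as a string.
my $result = do { no strict 'refs'; &{"main::;\n"}('hello') };
print "$result\n";

# The maintainable alternative: keep the code in a lexical variable.
my $marker = sub { return "called with @_" };
print $marker->('hello'), "\n";
```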
Principle 6: Randomly placed subroutine definitions are static (in the radio interference sense).
Continuing ($y = ( $x ||= 20 ) * ( $y || 8 );), this sets the width ($x) and height ($y) of the board from the command line. If $x isn’t set, then ||= will assign 20. Likewise, if $y is missing, it defaults to 8. Note the twist: $y starts out holding the height of the board, but after this line it holds the number of cells in the board. Why would you do this? Because it suits the internal representation of the board. The obvious representation would be an array of arrays, so obviously we can’t use that. Instead we “unroll” the arrays (C programmers would find this natural) and use modulo arithmetic. The board is a 1D string storing a 2D board with a genus-one 3D topology.
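One way to do that unrolling is sketched below. The cell_index helper is hypothetical: it maps a (row, column) pair onto the flat string and wraps both coordinates, which is what makes the board a torus. SelfGOL itself just takes its flat index modulo $y, so its exact wrapping behaviour may differ.

```perl
use strict;
use warnings;

my $x      = 20;              # width (SelfGOL's default)
my $height = 8;               # height (SelfGOL's default)
my $y      = $x * $height;    # like SelfGOL, $y ends up as the number of cells
my $board  = '.' x $y;        # the whole 2-D board in one flat string

# Hypothetical helper: flat index of (row, col), wrapping both edges.
# Perl's % returns a non-negative result for a positive right operand,
# so row -1 wraps to the last row and column -1 to the last column.
sub cell_index {
    my ($row, $col) = @_;
    return ($row % $height) * $x + ($col % $x);
}

substr($board, cell_index(-1, -1), 1) = 'O';    # bottom-right corner
print index($board, 'O'), "\n";                  # 159
```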
Principle 7: Choose data structures that simplify your task (even if the task is to make data structures incomprehensible).
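The next line (sub l { sleep &f; }) is a utility “pause” subroutine inside a nested block, but it isn’t scoped to that block: named subroutines are always package-wide. &f; (an ampersand call with no parentheses) calls f() with the current sub’s @_ rather than a new argument list.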
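The difference between &f; and f() is easy to miss, so here’s a small sketch with made-up sub names: the ampersand form without parentheses hands the callee the caller’s own @_, while the parenthesised call passes an empty list.

```perl
use strict;
use warnings;

sub count_args { return scalar @_ }

sub via_ampersand { return &count_args }     # reuses the caller's @_
sub via_parens    { return count_args() }    # passes an empty argument list

print via_ampersand(1, 2, 3), "\n";   # 3
print via_parens(1, 2, 3), "\n";      # 0
```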
Principle 8: Just because you use some operation frequently doesn’t mean it should be in a utility function.
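The next line (sub'f{pop || 1}): 'f is the same as ::f (it’s Perl 4 syntax). With no argument, pop pops @_, and if that yields nothing (or a false value such as 0), f returns 1. This means that &f returns the last element of the caller’s @_, which is $_[0] when there’s only one argument.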
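A quick way to check those claims is to call f directly. The sketch below names the sub plain f (no Perl 4 apostrophe) and adds a hypothetical first_arg_or_one for comparison; note how f(3, 7) returns the last argument and f(0) quietly becomes 1.

```perl
use strict;
use warnings;

sub f { pop || 1 }    # SelfGOL's sub'f, written without the apostrophe

# Hypothetical readable alternative: name the argument, and only default
# when it's actually missing.
sub first_arg_or_one {
    my ($n) = @_;
    return defined $n ? $n : 1;
}

print join(' ', f(5), f(), f(3, 7), f(0)), "\n";   # 5 1 7 1
print join(' ', first_arg_or_one(5), first_arg_or_one(),
                first_arg_or_one(3, 7), first_arg_or_one(0)), "\n";   # 5 1 3 0
```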
Principle 9: Encapsulating the familiar can decrease maintainability (refactoring isn’t a substitute for sanity).
In the next line, p() prints the GoL board. It starts with lots of newlines to clear the screen ($= is the page length, which defaults to 60). Then it splits the board up into chunks that are $x characters wide ($b =~ /.{$x}/g): .{$x} matches $x characters at a time, and /g in list context returns a list of all the matches. This means that $b is broken every $x characters, and passing those chunks as the second and subsequent arguments to join $; means that the rows the regular expression returns are joined with $;, which (thanks to $;=$/) now holds a newline.
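The row-splitting step is worth isolating. This sketch uses a made-up 5×2 board and joins with "\n" directly rather than with the repurposed $;. One caveat: if the string length isn’t a multiple of $x, the leftover characters are silently dropped.

```perl
use strict;
use warnings;

my $x     = 5;                     # board width
my $board = 'O...O' . '.O.O.';     # two rows stored as one flat string

# /.{$x}/g in list context returns every successive $x-character chunk.
my @rows = $board =~ /.{$x}/g;

# Join the rows with newlines, which is what SelfGOL's join $; ends up doing.
print join("\n", @rows), "\n";
```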
Principle 10: Treat any clever one-line solution as an alarm bell (or as an antipersonnel mine with a six-month delay fuse)
Next line (sub n { substr( $b, &f % $y, 3 ) =~ tr,O,O,; }) sets up a subroutine that counts live cells in the cell’s 3-neighbourhood. &f takes the first argument, modulo-$y wraps the board at the end of the string, we grab three characters, and use tr/O/O/ to count ‘O’s.
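Here’s a readable sketch of that neighbour count; live_in_window is a hypothetical name, and the board is passed in rather than read from a global. tr with an empty replacement list counts matching characters without changing them, and a window starting in the last two cells is truncated rather than wrapped.

```perl
use strict;
use warnings;

# Hypothetical readable version of  substr($b, &f % $y, 3) =~ tr,O,O,;
sub live_in_window {
    my ($board, $start) = @_;
    my $cells  = length $board;                      # SelfGOL's $y
    my $window = substr($board, $start % $cells, 3);
    return $window =~ tr/O//;    # counts 'O's, leaves $window unchanged
}

print live_in_window('O.OO.', 0), "\n";   # window 'O.O' -> 2
print live_in_window('OO..O', 3), "\n";   # window '.O' (truncated at the end) -> 1
```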
Principle 11: Familiarity breeds comprehension (it breeds contempt too)[1]
The next block (which I’m not going to list here) updates the GoL board. It starts off with @_[~~@_]=@_;, which duplicates $_[0] into $_[1]. ~ is the bitwise complement operator, so ~~ complements twice, which leaves a number unchanged but forces scalar context: ~~@_ is a shorter way to write scalar(@_). --($f = &f) caches the first argument minus one. $m = substr( $b, &f, 1 ); grabs the current cell. Note the two &f calls: because &f shares the caller’s @_ and pops from it, each call removes an element, which is why the @_[~~@_]=@_ line was needed to duplicate the first argument. The next block of code counts the number of neighbours, removing the count of the current cell, which is (${m} eq +O=>). +O=> is just the letter O, because the + is a no-op and the => is the fat comma, which quotes the bareword before it. This count is used as an index into the state table ( $w, $w, $m, O ).
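The duplicate-then-pop-twice dance can be reproduced in isolation. In this sketch, prev_and_current is a hypothetical name; it borrows SelfGOL’s f and its @_[~~@_]=@_ line to show why both &f calls see the same value.

```perl
use strict;
use warnings;

sub f { pop || 1 }

# Hypothetical demo of SelfGOL's argument-duplication trick.
sub prev_and_current {
    @_[~~@_] = @_;      # ~~@_ is scalar(@_), so with one argument: $_[1] = $_[0]
    my $prev = &f;      # first &f pops the copy off the shared @_ ...
    $prev--;
    my $current = &f;   # ... so the second &f still finds the original
    return ($prev, $current);
}

print join(' ', prev_and_current(10)), "\n";   # 9 10
```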
Principle 12: Table-driven solutions are clean, efficient, and extensible (as long as you don’t mind losing a little comprehensibility).
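A table-driven Life rule in the spirit of ( $w, $w, $m, O ) might look like this sketch. It assumes '.' for a dead cell and 'O' for a live one (SelfGOL’s actual $w may differ), and next_state is a hypothetical name: the live-neighbour count indexes the table, and any count past the end of the table means death by overcrowding.

```perl
use strict;
use warnings;

# Hypothetical table-driven Life rule; '.' = dead, 'O' = alive (assumed).
sub next_state {
    my ($current, $neighbours) = @_;
    my @table = ('.', '.', $current, 'O');   # indexed by live-neighbour count
    return $table[$neighbours] // '.';       # 4+ neighbours: past the table, so dead
}

print join(' ', map { next_state('O', $_) } 0 .. 4), "\n";   # . . O O .
print join(' ', map { next_state('.', $_) } 0 .. 4), "\n";   # . . . O .
```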
At this point Chris pointed out that all of the explanations of the code are in the handout book, so now I’m just going to list the principles, some choice quotes, some little notes, and some exercises for the reader.
EftR: What does q++ do?
Principle 13: Building a messy data structure and then cleaning it up is often easier than building it cleanly in the first place (and to hell with the purists).
“It’s not a real Perl program unless there’s an eval somewhere in the first ten lines.”
Principle 14: Some code is better compiled at run-time (but the urge to use an eval is nature’s way of letting you know there’s not yet enough pain or misery in your life).
$i ? $b : $c = $b; can be used to choose which lvalue gets assigned to. The ternary operator has higher precedence than assignment, so this parses as ($i ? $b : $c) = $b;.
Principle 15: Parentheses are our friends (because if you can remember all 24 levels of Perl’s precedence, you need to get a life).
Principle 16: Edge cases suck (and edge cases of familiar constructs suck worst of all).
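Both halves of that story fit in a few lines. The variable names below are made up; the second statement is the classic edge case documented in perlop, where the missing parentheses make an odd number get both increments.

```perl
use strict;
use warnings;

my ($left, $right) = ('L', 'R');
my $use_left = 0;

# The conditional operator yields an lvalue, so it can choose the target.
($use_left ? $left : $right) = 'new';    # assigns to $right

# The edge case: without parentheses, this line...
my $n = 5;
$n % 2 ? $n += 10 : $n += 2;
# ...parses as (($n % 2) ? ($n += 10) : $n) += 2, so an odd $n gets both.
print "$left $right $n\n";    # L new 17
```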
$g =~ s?\d+? ($&+1)%$y ?e; might be Damian’s favourite line in SelfGOL. It modifies the actual code stored as a string in $g, specifically the assignment $i = 0: it finds the first run of digits in that string and replaces it with that number plus one, modulo $y. So each time the string is evaled again, $i is initialized to a different value. Also note the delimiters on the s: question marks. Using ? as the delimiter of a match means that the match will only succeed once between calls to the reset function. What? Note that this only applies to matches, though, not to substitutions! So in this case, the ? doesn’t do anything!
If you’re wondering, the eval comes in the next line which isn’t here.
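Here’s a cut-down sketch of that self-modifying pattern. The code string, the loop, and $y = 5 are made up for illustration; what it shares with SelfGOL is the s///e that rewrites the first number in the code before each string eval.

```perl
use strict;
use warnings;

my $y = 5;                        # modulus (made-up value)
my $g = 'my $i = 0; $i * 10';     # code kept as a string, like SelfGOL's $g

my @results;
for (1 .. 3) {
    # Replace the first run of digits with (that number + 1) % $y.
    # SelfGOL writes s?\d+? ($&+1)%$y ?e; here the delimiters are slashes.
    $g =~ s/\d+/($& + 1) % $y/e;
    my $value = eval $g;          # compile and run the rewritten code
    die $@ if $@;
    push @results, $value;
}
print "@results\n";               # 10 20 30
```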
Principle 17: Code should do what it seems to be doing (especially when it seems to be doing something subtle).
Principle 18: Conceptual elegance is no guarantee of actual maintainability (nor a good substitute for it).
Principle 19: If you’re going to have default values, define them near the place they may actually be used (or at least in some place where they’ll be easily found).
Principle 20: No matter how good you think your error messages are, they’re still too brief, too obscure, and too hard to decipher (even if you’ve already taken Principle 20 into account).
Principle 21: Avoid using obsolete and arcane magic punctuation variables with unfamiliar default values and unexpected global effects (even if you happen to enjoy a little self-inflicted pain in other recreational situations).
Principle 22: The fundamental complexity of any problem is irreducible (optimisations merely redistribute the pain differently).
Principle 23: Code that breaks when it’s reformatted is already broken (though on a much more profound and interesting level).
Principle 24: If it’s impossible to understand, it’ll be impossible to maintain (on the bright side, of course, such code is highly stable).
In computer science, a quine is a program that prints its own source code, typically by concatenating (partial) copies of itself. It’s similar to a linguistic quine, such as “added to itself yields a sentence”.
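For comparison, here’s a classic short Perl quine. It is not SelfGOL’s technique (SelfGOL re-reads its own file through DATA); it’s the concatenation approach, where a string holds a template of the program and prints itself inside that template. The quine is kept in a string and its output captured so the two can be compared.

```perl
use strict;
use warnings;

# A classic short Perl quine, stored as a string so its output can be checked.
my $source = q{$_=q{print"\$_=q{$_};eval"};eval};

my $output = '';
{
    open my $fh, '>', \$output or die "Cannot capture output: $!";
    my $previous = select $fh;    # capture what the quine prints
    local $_;
    eval $source;
    die $@ if $@;
    select $previous;
    close $fh;
}

print $output eq $source ? "prints itself\n" : "not a quine\n";
```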
Principle 25: Phenomimetic retrodeterministic nominativism does not improve code comprehensibility (then again, did the name make it sound like it might?)
“Far too few people are using their source code as a user interface.”
Principle 26: Don’t allow dynamic behaviour to violate static expectations (and the easiest way to do that is reusing over-scoped variables for unrelated purposes).
One of the most beautiful lines in SelfGOL: $"=",";. $" is the list separator that Perl inserts between elements when an array is interpolated into a double-quoted string, and it is normally a single space.
Principle 27: Explicit behaviours are better than implicit behaviours (especially when the specification of the implicit behaviour is syntactically baroque and hard to spot, and the behaviour itself is unknown to the majority of programmers).
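Here’s a small sketch contrasting the implicit $" behaviour with the explicit join that Principle 27 argues for (the @cells array is made up). Note that the sketch localises $", whereas SelfGOL’s $"=","; changes it for the whole program.

```perl
use strict;
use warnings;

my @cells = ('O', '.', 'O');

my $default = "@cells";                 # $" is ' ' by default: "O . O"
my $commas  = do {
    local $" = ',';                     # SelfGOL's $"=","; but scoped
    "@cells";                           # "O,.,O"
};
my $explicit = join ',', @cells;        # Principle 27: say it outright

print "$default\n$commas\n$explicit\n";
```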
Principle 28: Code that pre-caches or precomputes its data is much easier to maintain than code that caches or computes on the fly (when you’re running at multiple gigahertz, acquiring your data a few thousand operations early is still plenty JIT enough).
The line of Perl Damian is most proud of writing: y=[====y=]==||&d. With whitespace added and “standard” delimiters substituted in, this is y/[// == y/]// || &d. Each tr counts one kind of square bracket in $_, and if there aren’t equal numbers of opening and closing brackets, it calls the d() function. There is no d() function, so Perl throws an exception, the eval catches it, and the program prints “No”.
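A readable sketch of the same check, with a hypothetical brackets_balanced helper and an explicit die in place of calling a nonexistent d(). Like the original, it only compares counts, so "][" passes even though it isn’t properly nested.

```perl
use strict;
use warnings;

# Readable version of  y=[====y=]==||&d
# tr with an empty replacement list counts characters without changing them.
sub brackets_balanced {
    my ($text) = @_;
    return ($text =~ tr/[//) == ($text =~ tr/]//);
}

for my $text ('[a[b]]', '[[c]', '][') {
    my $verdict = eval {
        brackets_balanced($text) or die "unbalanced\n";   # explicit, unlike &d
        'Yes';
    };
    $verdict = 'No' unless defined $verdict;
    print "$text: $verdict\n";
}
```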
Principle 29: Coding is an art, but code shouldn’t be art (evolution made programmers boring, pedestrian, and aesthetically challenged for good reasons).
Tags: oscon, oscon08, perl, damian conway
[1] But what doesn’t?