Everybody says that Perl is a lot like other languages, especially C, sed, and awk. That may have been true in the early days of Perl, but it’s not really true now. Perl has many features those languages never offered, and you might not know about them. You may have been writing C programs in Perl all along; now it’s time to do better and start writing Perl programs in Perl. This tutorial covers three techniques that will help you write better Perl: caching and memoization, iterators, and parsing.
Many functions depend only on their arguments, so why make them redo potentially compute-intensive calculations? Once you’ve calculated a result, you can cache it, that is, save it in memory, so the next time the function is given the same input it can do a simple memory lookup instead. Sorting dates of the form “May 17, 1998” is a good example. A straightforward way to sort these dates is to convert each one into a sortable string or number and sort on that. You could use YYYYMMDD, so that “May 17, 1998” becomes 19980517, a number you can compare directly against other dates. However, this conversion isn’t free: you have to split the string into month, day, and year, convert the month name into a number, and join it all back together again. A sort compares each date against several others, so without a cache the same date gets converted over and over. If you’ve already converted “May 17, 1998” to 19980517, why not store the result so you don’t have to convert it again? Simply put the result into a cache structure (in Perl, that’s probably going to be a hash), and the next time through, check whether a result already exists in the cache, doing the calculation only if it doesn’t.
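Here is a minimal sketch of that hash cache. It assumes full English month names and exactly the “Month D, YYYY” format; the names `date_key`, `%cache`, and `$conversions` are illustrative only. The counter shows that each distinct date is converted once, even though `sort` calls the key function many times.

```perl
use strict;
use warnings;

my %month_num = (
    January   => 1,  February => 2,  March    => 3,  April    => 4,
    May       => 5,  June     => 6,  July     => 7,  August   => 8,
    September => 9,  October  => 10, November => 11, December => 12,
);

my %cache;              # "May 17, 1998" => 19980517
my $conversions = 0;    # how many times we actually did the work

sub date_key {
    my ($date) = @_;
    # Cache hit: this exact date string was converted before
    return $cache{$date} if exists $cache{$date};

    # Cache miss: split, convert the month name, combine into YYYYMMDD
    $conversions++;
    my ($mon, $day, $year) = $date =~ /^(\w+)\s+(\d+),\s*(\d{4})$/
        or die "Unrecognized date: $date\n";
    my $m = $month_num{$mon} // die "Unknown month: $mon\n";
    return $cache{$date} = $year * 10_000 + $m * 100 + $day;
}

my @dates  = ('May 17, 1998', 'January 3, 2001', 'December 25, 1997', 'May 17, 1998');
my @sorted = sort { date_key($a) <=> date_key($b) } @dates;
print "$_\n" for @sorted;
print "conversions: $conversions\n";    # conversions: 3 (one per distinct date)
```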
Adding a cache like this by hand is easy, but it’s formulaic, so it can be automated with the Memoize module, which wraps a function in exactly this kind of lookup. Memoize isn’t appropriate for every function, though. Random number generation is a good example of where it can bite you: you want the function to compute a new random number on every call, not hand back a cached result. More generally, memoize only functions whose results depend on nothing but their arguments.
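A short sketch using Memoize, which ships with Perl: `memoize('fib')` replaces the named function with a caching wrapper. The Fibonacci function is a stock illustration chosen here, not something from the text. The second half, with the illustrative name `roll_die`, demonstrates the random-number pitfall.

```perl
use strict;
use warnings;
use Memoize;

# Naive recursive Fibonacci: exponential time without caching.
sub fib {
    my ($n) = @_;
    return $n if $n < 2;
    return fib($n - 1) + fib($n - 2);
}
# Replace fib with a caching wrapper; the recursive calls go through
# the wrapper too, so each fib($k) is computed only once.
memoize('fib');

print fib(50), "\n";    # 12586269025, returned almost instantly

# Counter-example: a function that is supposed to vary between calls.
sub roll_die { return 1 + int rand 6 }
memoize('roll_die');

my @rolls = map { roll_die() } 1 .. 5;
print "@rolls\n";       # the same number five times: the first result was cached
```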
Switching tracks a bit: an iterator can be seen as an object interface to a list, in which you move through the list by repeatedly asking for the next element. This can save a lot of memory, because the iterator doesn’t need to hold the entire list in memory at once, only enough state to produce the next element.
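In Perl the “object” is often just a closure. In this minimal sketch (the name `upto` is an illustrative choice), the iterator stores only its current position, so a range of a billion integers costs no more memory than a range of four.

```perl
use strict;
use warnings;

# Return an iterator over the integers $m..$n. Each call returns the
# next integer, or undef when the range is exhausted. Only $m and $n
# are stored; the list itself is never built.
sub upto {
    my ($m, $n) = @_;
    return sub {
        return $m <= $n ? $m++ : undef;
    };
}

my $it = upto(3, 6);
while (defined(my $x = $it->())) {
    print "$x\n";    # prints 3, 4, 5, 6 on separate lines
}
```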
Perl programmers are used to using iterators whether they realize it or not. A filehandle is an iterator that allows you to step through a file one line at a time.
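For example, a `while` loop over a filehandle pulls one line per iteration and never loads the whole file. To keep this sketch self-contained, it opens an in-memory filehandle on a string (a standard Perl feature) in place of a real file.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird line\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# Each <$fh> in scalar context returns the next line, or undef at end of file.
my $count = 0;
while (my $line = <$fh>) {
    chomp $line;
    $count++;
    print "$count: $line\n";
}
close $fh;
```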
Iterators have to retain state between calls, because each call needs to remember where the previous one left off. One drawback of iterators is that they’re like toothpaste tubes: it’s easy to get the data out but hard to put it back in. It’s difficult to go back to the beginning of the list and start again. If you want to do that, you should use a stream, which is closely related to an iterator. The main difference is in the interface: a stream lets you look ahead at what’s coming in the list, while an iterator just hands you the next item. Because a stream keeps the elements it has already computed, holding on to its first element is enough to start over.
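A minimal sketch of a lazy stream, loosely modeled on the linked-list streams built in Higher-Order Perl: each node is a `[head, tail]` pair whose tail starts out as a code reference that computes the rest of the stream only when asked, then caches the result in place. The names `node`, `head`, `tail`, `upfrom`, and `take` are illustrative. Keeping a reference to the first node is what allows peeking ahead and restarting.

```perl
use strict;
use warnings;

# A stream node is an array ref [head, tail]. The tail is another node,
# undef (end of stream), or a code ref that produces the rest on demand.
sub node { my ($h, $t) = @_; return [$h, $t] }
sub head { my ($s) = @_; return $s->[0] }

sub tail {
    my ($s) = @_;
    # Force the pending computation the first time, then cache the node
    if (ref($s->[1]) eq 'CODE') {
        $s->[1] = $s->[1]->();
    }
    return $s->[1];
}

# Infinite stream of integers starting at $n
sub upfrom {
    my ($n) = @_;
    return node($n, sub { upfrom($n + 1) });
}

# Return the first $k elements as a list; the stream itself is unchanged
sub take {
    my ($s, $k) = @_;
    my @out;
    while ($s && $k-- > 0) {
        push @out, head($s);
        $s = tail($s);
    }
    return @out;
}

my $nums = upfrom(1);
print join(' ', take($nums, 5)), "\n";    # 1 2 3 4 5
print head(tail($nums)), "\n";            # 2 (look ahead without consuming)
print join(' ', take($nums, 3)), "\n";    # 1 2 3 (start again from the head)
```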
Parsing is the process of taking raw input, such as a string of text, and turning it into a usable data structure. To write a parser, it’s usually easiest to start with a grammar: a set of rules that describe how the parts of the input are put together. Grammars are often written in Backus-Naur form, or BNF. You can then build an iterator (a lexer) that walks through the input string and produces a stream of tokens, along with a parser that consumes those tokens according to the grammar, turning the input string into a usable data structure.
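As a sketch under simple assumptions, here is a lexer and a hand-written recursive-descent parser for integer arithmetic, one of several ways to put those pieces together. The grammar, in an extended BNF notation, is: expr ::= term (("+" | "-") term)*, term ::= factor (("*" | "/") factor)*, factor ::= NUMBER | "(" expr ")". The lexer is a closure that returns one token per call; the parser builds a tree of nested arrays, and `evaluate` is included only to show that the tree is usable. All names are illustrative.

```perl
use strict;
use warnings;

# Lexer: an iterator returning one [TYPE, VALUE] token per call, undef at end.
sub make_lexer {
    my ($input) = @_;
    return sub {
        $input =~ /\G\s+/gc;                          # skip whitespace
        return undef if (pos($input) // 0) >= length $input;
        return ['NUM', $1] if $input =~ /\G(\d+)/gc;
        return ['OP',  $1] if $input =~ /\G([-+*\/()])/gc;
        die "Unexpected character at position " . (pos($input) // 0) . "\n";
    };
}

# Parser: one token of lookahead; each grammar rule becomes a closure.
sub parse {
    my ($next) = @_;
    my $tok     = $next->();                          # current lookahead token
    my $advance = sub { my $t = $tok; $tok = $next->(); return $t };
    my $is_op   = sub { $tok && $tok->[0] eq 'OP' && $tok->[1] eq $_[0] };

    my ($expr, $term, $factor);
    $factor = sub {
        return $advance->()->[1] if $tok && $tok->[0] eq 'NUM';
        if ($is_op->('(')) {
            $advance->();
            my $tree = $expr->();
            die "Expected ')'\n" unless $is_op->(')');
            $advance->();
            return $tree;
        }
        die "Unexpected token\n";
    };
    $term = sub {
        my $left = $factor->();
        while ($is_op->('*') || $is_op->('/')) {
            my $op = $advance->()->[1];
            $left = [$op, $left, $factor->()];        # left-associative
        }
        return $left;
    };
    $expr = sub {
        my $left = $term->();
        while ($is_op->('+') || $is_op->('-')) {
            my $op = $advance->()->[1];
            $left = [$op, $left, $term->()];
        }
        return $left;
    };

    my $tree = $expr->();
    die "Trailing input\n" if $tok;
    return $tree;
}

# Walk the tree: leaves are numbers, inner nodes are [op, left, right].
sub evaluate {
    my ($t) = @_;
    return $t unless ref $t;
    my ($op, $l, $r) = @$t;
    ($l, $r) = (evaluate($l), evaluate($r));
    return $op eq '+' ? $l + $r
         : $op eq '-' ? $l - $r
         : $op eq '*' ? $l * $r
         :              $l / $r;
}

my $tree = parse(make_lexer('2 * (3 + 4) - 5'));
print evaluate($tree), "\n";    # 9
```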
It turns out that Perl isn’t much like C or awk or sed; it’s more like Lisp. Lisp has been around a lot longer than Perl, so it would be very useful to talk to Lisp users and learn from what they’ve discovered along the way. The only problem is that nobody wants to do that, because Lisp users are really bitter. Luckily for us, Mark-Jason Dominus has done it (talked to Lisp users, that is, not gotten really bitter), and his book Higher-Order Perl is the fortunate result.