Archive for category OSCON 2007

OSCON 2007: Perl 6 Update, by Larry Wall

In the most recent of a long string of Perl 6 update presentations, Larry Wall talked about some of the new features of Perl 6. Usually he teams up with Damian Conway, but unfortunately Damian isn’t here this year.

Last year there were 207 separate patches to the spec; this year there were 412. We’re in the final 20% of the 80/20 rule (80% of the project takes 80% of the time, the other 20% takes 80% of the time).

Class and module names can now have adverbial modifiers, so you can supply versioning information when you’re using or writing classes:

use Codex:author<cpan:LDAVINCI>:ver<2.0.1>;

Inline comments are now in:

$this = $code && #{this is a comment} $code2;

There’s no more defaulting to $_.

You can zip arrays together:

my %hash = @keys Z @values;

File test operators are gone, replaced with:

if $file ~~ :r { say 'Can read' };
my $size = $file ~~ :s;
my $size = $file.:s;

There is an explicit parallelism operator:

%hash = hyper map { $_, value($_) }, @array;

STM is implemented:

contend {
    maybe { foo() }
    maybe { bar() }
    maybe { defer if $oops }
}

You can create a fixed-size array, and Perl will crash exceptionally if you try to access outside of the bounds:

@calendar[12];
my @spacetime[100;100;100;**];

Negative indices no longer count backwards from the end. You have to use the ‘whatever’ operator for that:

$array[*-N];

POD has been cleaned up and extended.

Also see Chris’s coverage.

[tags]oscon, oscon07, perl, perl 6, larry wall, damian conway[/tags]


OSCON 2007: DBD::Gofer, A Stateless DBI Proxy, by Tim Bunce

DBD::Gofer (Gofer, hereafter) is a scalable stateless proxy architecture for DBI. It’s transport independent, efficient, well-tested, scalable, cacheable, simple, and reliable. A request contains all of the required information to connect and execute the request, so it’s stateless. A few transports that come with Gofer include a null transport, which is very useful for testing, as you don’t need anything on the backend to communicate with. A stream transport can allow you to ssh to a remote system to self-start a server:

ssh -xq user@host.domain \
    perl -MDBI::Gofer::Transport::stream \
    -e run_stdio_hex

You can also use an http transport to send requests over HTTP. With this you can use HTTPS for security, use web techniques for scaling and availability, and it supports web caching. A gearman transport distributes requests to a pool of workers through Gearman.

DBD::Gofer is a proxy driver. It accumulates details of DBI method calls, and delays forwarding requests for as long as possible. It aims to be as transparent as possible, but you can tune this to trade transparency for speed. Policies are implemented as classes, and there are three supplied: pedantic, classic, and rush.

Gofer doesn’t support transactions. You can’t modify $dbh attributes after you connect. It can’t use temporary tables, locks, and other per-connection persistent state, except if you go through stored procedures. Code using last_insert_id needs a simple change.

Gofer can automatically retry on failure. This is disabled by default, but is easy to enable. You’re also allowed to define your own behaviour, overriding the default behaviour to retry if $request->is_idempotent is true.
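The retry rule can be illustrated with a small sketch. It is written in Python rather than Perl, and none of it is Gofer's actual API: `Request`, `TransportError`, `send_with_retry`, and `make_flaky_transport` are hypothetical names invented for the illustration. The only idea taken from the talk is that a failed request is retried only when it is marked idempotent.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a proxied database request."""
    sql: str
    is_idempotent: bool  # True if repeating the request is harmless


class TransportError(Exception):
    """Raised by the (hypothetical) transport when delivery fails."""


def send_with_retry(request, transport, max_attempts=3):
    """Send a request, retrying only when a repeat is known to be safe."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return transport(request)
        except TransportError:
            # Give up if a retry could repeat a side effect, or if we
            # have run out of attempts.
            if not request.is_idempotent or attempts >= max_attempts:
                raise


def make_flaky_transport(failures):
    """Build a fake transport that fails `failures` times, then succeeds."""
    state = {"remaining": failures, "calls": 0}

    def transport(request):
        state["calls"] += 1
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransportError("connection dropped")
        return "ok: " + request.sql

    transport.state = state
    return transport
```

With a transport that fails twice, an idempotent SELECT goes through on the third attempt, while a non-idempotent INSERT fails on the first error and is never resent.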

In the future, Gofer will allow for HTTP caching via appropriate headers. The server needs to agree to caching; if it does, caching will happen just like it does for web pages. Gofer’s future also includes JSON, which turns DBI into a web service that could have clients in a wide range of languages.

[tags]oscon, oscon07, perl, database, dbd::gofer, tim bunce, dbi[/tags]


OSCON 2007: Subversion: Powerful New Toys, by Justin Erenkrantz

Subversion is version control software that we’ve recently switched to at the Joint Astronomy Centre. We don’t use any of the really advanced features of Subversion, but after this session we might.

Linus Torvalds called Subversion “the most pointless project ever started.” Awesome.

This session touches on new features since release 1.0. The first such feature is the FSFS repository backend. This was introduced in 1.1 and made the default in 1.2. The 1.0 implementation used a Berkeley DB backend which didn’t scale — the apache.org installation kept getting deadlocks, infinite loops, and corruption, all of which are bad things for version control software.

Optional locking was introduced in 1.2. This allows people to indicate when they’re working on a file that won’t merge well. It’s useful for binary files like Word documents or images. Locks are breakable by force, so you can’t get locked out; the developers don’t want the locks to get in the way of development. An svn:needs-lock property can be set on a file to essentially make it read-only.

WebDAV auto-versioning was introduced in 1.2 to allow DeltaV clients (Mac OS X Finder, Linux WebDAV FS, etc.) to commit to the repository. A caveat is that while versions are technically preserved, the automatic versions often look weird: files often get deleted and added, which plays havoc with diffs. It’s a “feature” of the WebDAV clients and not of Subversion.

Path-based authorization was introduced in 1.3, which allows administrators to partition access to the repository. Authorization tends to be quite slow, but the overhead is in httpd, not Subversion. It’s fairly safe, as for every recursive operation, each path needs to be checked for access.

Operational logging was introduced in 1.3, which allows administrators to track commits, updates, checkouts, etc.

Improved language bindings were added in 1.3, allowing for Python, Ruby, and Perl SWIG bindings. The build requirement of SWIG was removed, and the interfaces were made much more language-friendly.

Improved working copy handling was added in 1.4. A flat file is now used instead of XML, as XML led to fairly large overheads.

Repository replication was added in 1.4, which allows you to replicate the entire repository locally (svnsync init / svnsync sync). It’s an easy way to let third parties mirror your repository. It’s strictly one-way, though: you can’t commit to the mirror and have it show up on the master repository. SVK relies on this.

A new binary diff algorithm was added to 1.4. It gives you substantial space savings (50% in many cases) and faster operations.

ra_serf is an alternative WebDAV approach that can replace ra_neon (aka ra_dav), and was introduced in 1.4. ra_serf does HTTP pipelining, so it’s faster than ra_neon. It also uses concurrent connections, giving further speed improvements.

In the near future, 1.5 is coming out. True merge-tracking is the last big feature that they’re waiting for before releasing it. Features I’ll talk about from this point on will be in 1.5.

WebDAV transparent mirroring will allow for geographically distributed mirrors. You’ll be able to set up a master server that will distribute updates to slaves via rsync or svnsync, and then clients will be able to read from and commit to these slave servers.

You’ll be able to do a partial checkout of a particular directory. There are a few bugs that need ironing out, and it needs finishing, but it’s in the trunk and will be in 1.5. It will allow you to check out an individual file in a directory, check out only a certain number of directory depths…

Interactive conflict resolution is coming. When you do a merge or update and have a conflict, you can bring up an editor and resolve the conflict immediately before the update completes instead of having to push off the resolution until after the update is completed.

Another quote from Linus: “Merging in Subversion is a complete disaster. They have a plan and their plan sucks too. It is incredible how stupid these people are.” Merge tracking will allow Subversion to, get this, track and record merges. svnmerge.py was introduced in 1.3, but it’s client-side (SVK does something like this as well). In 1.5 this will be pushed into the server.

In the distant future, some features that could come along are speed improvements, offline commits, local branches, and better merging. Subdirectory detachability will remove the .svn directory from each directory in your checkout. New repository formats may be coming. Clients could migrate towards thin Python applications. Atomic renames. Distributed repositories.

Slides for this presentation are available online.

[tags]oscon, oscon07, subversion, svk[/tags]


OSCON 2007: Wednesday Morning Keynotes

Wednesday morning sees three keynotes and one interview session. Tim O’Reilly will be talking about Open Source on the O’Reilly Radar, James Reinders and Dirk Hohndel will present Threading Building Blocks again, and Simon Peyton-Jones will talk about Transactional Memory for Concurrent Programming. The interview sees Tim O’Reilly sitting down with Mark Shuttleworth.

Nat Torkington welcomed everybody to the conference, talking about the opening-up of society. The principles of Open Source software are creeping into other facets of life: computer hardware, body hacking, open democracy.

The O’Reilly Radar is essentially a brain-dump of what Tim O’Reilly’s thinking about. One of his fundamental beliefs is that the future is all around us, and we just have to see the patterns to determine what’s going to be big. He sees an issue in the ties between hardware and software, using Google as an example. Google’s software could be open and available, but it probably wouldn’t do anybody any good without the hardware to run it on. Open Source did not spring just from a set of licenses, but from a set of practices. We should be paying attention to projects like Wikipedia and OpenID. OpenID is racing to keep identity on the web open. Factors that have helped Open Source succeed are frictionless distribution, collaborative development, freedom to build on, adapt, or extend, and the freedom to fork. There are a number of projects that look interesting, including Hadoop, Foxmarks, StumbleUpon, and Intel’s Threading Building Blocks.

Which leads into James Reinders and Dirk Hohndel talking about TBB and multi-core parallelism. It was well done, with Reinders in a suit giving a presentation on how you can buy TBB, only to be interrupted by Hohndel, who then talked about Intel’s work with open source. Intel works on graphics drivers, wireless drivers, power, and the kernel, and is moving into mobile and internet Linux devices. TBB needs to be open everywhere, across compilers, processors, and operating systems. It provides algorithm templates, a new memory allocator, and other things that I didn’t write down. It can be used with C code, so long as you compile it with a C++ compiler.

Simon Peyton-Jones says that there are two ways to do parallel programming: task parallelism and data parallelism. This keynote’s focus is task parallelism, where the state of the art is 30 years old: locks and condition variables. They’re fundamentally flawed: “it’s like building a skyscraper out of bananas.” So what’s wrong with locks? You can get race conditions, deadlocks, lost wakeups, and diabolical error recovery. And they’re absurdly hard to get right. Atomic blocks, the solution to all this, make it easy again. How do you do this? Take your code, wrap atomic{ } around it, and then the code inside runs as “all or nothing” in a manner very similar to transactions in database land. So how do you actually make this work? One way could be to execute the code without taking any locks, and as you run through it you write only to a thread-local transaction log, not to memory. When you’re done, the transaction tries to commit to memory. Transactional memory looks incredibly cool, and it’s available in STM Haskell now.
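To make the log-then-commit idea concrete, here is a toy sketch in Python. It is not how STM Haskell is implemented, and every name in it (`TVar`, `Transaction`, `atomically`, `transfer`) is an assumption made for the illustration. The block runs without taking locks and records its reads and writes in a private log; a single lock is held only for the short commit step, which re-runs the block if anything it read has changed in the meantime.

```python
import threading


class TVar:
    """A shared variable with a version number for conflict detection."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised at commit time when a variable we read has since changed."""


_commit_lock = threading.Lock()  # serialises commits only, not the block body


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> pending value (the transaction log)

    def read(self, tvar):
        if tvar in self.writes:           # see our own uncommitted write
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value         # logged, not yet visible to others

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:  # someone committed underneath us
                    raise Conflict()
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(block):
    """Run block(tx) as all-or-nothing, retrying on conflict."""
    while True:
        tx = Transaction()
        try:
            result = block(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # throw the log away and start over


def transfer(src, dst, amount):
    """Move `amount` from one TVar to another in a single transaction."""
    def block(tx):
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(block)
```

Many threads can call `transfer` on the same pair of variables without any of the calling code mentioning a lock. A real implementation also has to stop a doomed transaction from acting on an inconsistent snapshot before it reaches commit, which this sketch ignores.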

And then I zoned out, so that’s the end of this post.

[tags]oscon, oscon07, tim o’reilly, haskell, intel, transactional memory[/tags]


OSCON 2007: Simple Ways To Be A Better Programmer, by Michael G Schwern

This tutorial isn’t about specific programming tips; it’s more about tools that you can apply to your programming skills.

Computer Science + People = Software Development

Now that you’ve grown up and don’t have structured learning, how do you learn new things? You have to unlearn how you previously learned, because those techniques don’t really apply very well any more. But why learn? Learning takes time, time that you could spend doing things that you already know how to do. One reason to learn is that you’re doing it wrong and you need to learn how to do it right. Also, cross-training is sexy; you can steal ideas from different areas and apply them to programming. If you tie yourself to a specific technology, that technology will eventually die and you’ll die with it.

A lot of people don’t get over the discomfort hump when they’re learning new things. Over time your discomfort with something new will diminish. Having a substitute skill (e.g. QWERTY when you’re trying to learn Dvorak) also prevents people from adopting new skills.

And a lot of people are afraid to say “I don’t know” and “tell me about it” — counter-intuitively, asking these questions can actually help your credibility. So ask them. And ask why, as doing so can teach you things when you try to find the answer. Knowing how something failed isn’t necessarily the same as knowing why something failed.

One of the ways to learn is to fail. You can’t learn from your mistakes if you don’t make any mistakes in the first place. And failing is something that you’ll do if you start learning new things, so failure shouldn’t be seen as a failure, it should be seen as a victory.

Hang out with people who like to learn. Don’t hang out with the 9-to-5ers who are just there to earn a paycheque. Join a community, as the community can help you learn with its massed knowledge. Try broadcasting your problem via blogs, IRC, user groups, etc.

Go be stupid. Go somewhere where nobody knows you’re supposed to be smart.

Learn a different language or language type. There’s procedural (Perl, C, assembly), functional (Haskell, Scheme, Lisp), object-oriented (C++, Java), and declarative (Star Trek computer: “Tea, Earl Grey, hot”, SQL).

Watch out for communication breakdowns. If possible, rely on face-to-face communication instead of email, where the communication is more “people talking at each other”. Video is nice as well, then try voice, then try IM. Don’t second-guess what people say. Assume that what they wrote is what they meant.

If you’re having an argument, you become emotionally invested in that argument and you really want to win. Find the courage to say “you’re right” to the person you’re arguing with. If there’s a problem somewhere in your code, don’t focus on the person who caused that problem, focus on the actual problem with the code. Tools are not good or bad, they have advantages and disadvantages. If you find you’re going back and forth with the same person on a mailing list, odds are good you’re talking past each other and there’s a source of miscommunication somewhere. Bitching about a problem isn’t the same as doing something about it.

The problem with programmers is that they care about code. Everyone else cares about the goals.

Avoid communication breakdowns by being available. If your business’s regular hours are 9-to-5, try to make yourself available for a slice of that, even if it’s just via IM.

Get stuff out of your head so that if you get hit by a bus, the project doesn’t die. There are various ways to do this: code documentation, a wiki, a bug tracking system, or Getting Things Done philosophies.

Don’t do fear-based programming, because you’ll never do anything interesting. If you work in small chunks, you can get things done quicker, and a failure will have less impact on the larger picture. An antidote to fear-based programming is testing. Test first, test during, test automatically, and make sure all your tests pass all the time. Test your bugs: whenever you fix a bug, write a test that checks the fix is correct.
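A regression test for a bug fix can be very small. The sketch below uses Python's unittest module rather than anything shown in the tutorial, and the `average` function and its empty-list bug are invented for the illustration.

```python
import unittest


def average(values):
    """Return the mean of values; an empty list yields 0.0.

    The empty-list guard is the (hypothetical) bug fix: an earlier
    version divided by zero here.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_does_not_crash(self):
        # Pins the fix in place so the bug cannot quietly come back.
        self.assertEqual(average([]), 0.0)

    def test_normal_case_still_works(self):
        # Guards against the fix breaking the behaviour that already worked.
        self.assertEqual(average([2, 4, 6]), 4.0)
```

Saved to a file, the tests run with `python -m unittest`, and they keep running on every later change, which is what makes them an antidote to fear.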

Use effective version control. Use a code-test-commit cycle. Commit small chunks. Commit one idea per commit. Branch tasks off to their own branch so you’re not always committing to the trunk (more modern version control software such as SVK makes this easy).

Two things to take home: software is about people, and do things in small chunky pieces.

[tags]oscon, oscon07, programming, michael schwern[/tags]


OSCON 2007: Learning Ajax, by Alex Russell

Ajax is all the rage these days. The asynchronous nature of interaction with websites is all over the place, from Google Maps to Facebook to… uh… Google Maps. Faced with a Perl tutorial that I’d already attended, I decided to wade into the Ajax pool.

The problems with developing applications for the web are that browsers were essentially dead for half a decade after the release of IE6, that it’s hard to “subclass” HTML tags, and a couple of other things that I missed. To “fix” HTML, we need to present more and more information to the users, do it more quickly than we do now, present data change better, and allow for better layer markup and behaviour.

Most of “Ajax” isn’t actually about Ajax. It’s more of a system of how users interact with data, and improving that interface. So why did it take so long for Ajax to take off? For the longest time JavaScript was a four-letter word. Browsers sucked. It was hard to make discrete requests, which more than likely meant using <iframe> tags or cookies. From a developer’s point of view, all of these things were pains. Luckily in 1999 Microsoft introduced XMLHttpRequest (XHR hereafter), which was eventually implemented in Mozilla in 2002. It allows GET and POST, and you are allowed to pass things other than XML across it. Unfortunately, the API isn’t yet standardized, so things could potentially change in the future.

During a request event there are four edge triggers that get fired, and these can be queried through the xhrObj.readyState property on an XHR object. The example given checks to see if the value is equal to 4, which is a “magic number” that doesn’t jibe well with me. Can’t they come up with some named constants?
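For what it's worth, named constants for these states are easy to sketch. The example is in Python rather than JavaScript, and the names are an assumption as far as this tutorial goes: they follow the ones the XMLHttpRequest specification later adopted, not anything shown in the session. `is_complete` is a hypothetical helper.

```python
from enum import IntEnum


class ReadyState(IntEnum):
    """Symbolic names for the numeric readyState values 0 through 4."""
    UNSENT = 0
    OPENED = 1
    HEADERS_RECEIVED = 2
    LOADING = 3
    DONE = 4


def is_complete(ready_state):
    """True once the response has been fully received."""
    return ready_state == ReadyState.DONE
```

A check like `is_complete(state)` says what it means, where a bare comparison against 4 does not.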

Some examples of inspection and debugging tools include Firebug, TamperData, SquareFree Shell, Drosera, Microsoft Script Editor, etc, etc, etc.

Some important XHR members to know are setRequestHeader() (to set headers to tell the server various things about the client), abort() (to abort, duh), getResponseHeader() (to see info about what the server’s sent back), overrideMimeType() (to override the base description of what’s being sent), and the onerror handler (for error handling). To put it all together, we still need to get at the returned data, handle errors, put the new data in the UI, and communicate what happened to the user.

So what to send over the wire? We could use XML or various other plain-text options (JSON, HTML snippets…). Pros of XML include native parsing in browsers, XPath (mostly) works, and it’s simple to use if the data is already XML. Cons of XML include its size and spotty (but improving) support for XSLT. Pros of JSON include that it’s not XML, it’s small on the wire, and it’s really fast to parse. Its cons include the lack of any equivalent of XSLT. Pros of JavaScript include the ability to send behaviour, it’s small, and really fast to parse. Its cons include the ability to send behaviour (which is insecure) and the lack of any equivalent of XSLT.
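The size difference is easy to see by serializing the same records both ways. This is a rough sketch in Python with made-up data; the element names and the `to_json` and `to_xml` helpers are assumptions, and the exact gap depends heavily on how the XML is designed (attributes instead of child elements would narrow it).

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample data: two talks with three fields each.
records = [
    {"id": 1, "title": "Perl 6 Update", "speaker": "Larry Wall"},
    {"id": 2, "title": "Learning Ajax", "speaker": "Alex Russell"},
]


def to_json(items):
    """Serialize to compact JSON with no optional whitespace."""
    return json.dumps(items, separators=(",", ":"))


def to_xml(items):
    """Serialize to XML with one child element per field."""
    root = ET.Element("talks")
    for item in items:
        talk = ET.SubElement(root, "talk")
        for key, value in item.items():
            ET.SubElement(talk, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


json_payload = to_json(records)
xml_payload = to_xml(records)

# Character counts of each payload; the JSON one is the smaller here
# because every XML field name appears twice, in open and close tags.
print(len(json_payload), len(xml_payload))
```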

When you’re writing Ajax-y code, you have to make sure that all of your GET requests are idempotent. Pay attention to cache control headers so you treat the server nicely. Make sure that you give the user feedback, and for God’s sake test your stuff on all the major browsers. Don’t use Ajax just because you can. Don’t build UIs that can’t operate without JavaScript, unless this is an explicit design choice.

When developing Ajax web apps, use a library. There are many excellent closed-source libraries (none of which are listed), along with many excellent open-source ones (Dojo, script.aculo.us, and YUI are three examples).

This actually answers my earlier concerns about magic numbers: libraries handle that for you.

So why Ajax? Why now? The ceiling of HTML and CSS was rapidly approaching, and browsers were stagnant. Developers hadn’t fully trawled through the available features, only “discovering” XHR recently. It also avoids plugins, and the UI improvements are useful and important to users. And why shouldn’t you use Ajax? It’s complex and costs more to develop, you have to deal with quirks in browsers and software upgrades. JavaScript isn’t a “normal” language, and the Open Web isn’t really evolving quickly enough.

It’s interesting to note that the early prominent Ajax applications were in “dead” categories: Google Maps is an excellent example of how a UI change brought users in.

There are four key things that Ajax developers should adhere to: discoverability, recoverability, context, and feedback. Discoverability refers to how the user discovers data changes and what data is available. Recoverability refers to the ability to go back, which can mean not breaking the back button. Context allows users to know where they are in the process, and feedback refers to giving the user information about what’s going on. (I think I got all that right…) A few groups have developed some UI design patterns: the Yahoo Pattern Library and ajaxpatterns.org are two.

When you’re optimizing Ajax, there are three things you need to pay attention to: bandwidth, latency, and parallel vs. serial requests. Web Inspector is a really cool way to help debug and optimize. Strategies to optimize, from easy to hard, are using the cache, making the code smaller, requesting fewer things, requesting in parallel, delaying loading until you need a specific thing, and moving the host closer to the client.

When using the cache, expect a 50% cache miss rate. Set expiry headers (such as Expires) far in the future for static things like images and JavaScript files to take advantage of caching. You can also cache on the server-side with things like Squid, memcached, mod_cache, or database tuning.

To make code smaller, look into gzip encoding with mod_deflate. You can also strip out white space and comments for deployed code, and strip dead code out.
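To get a feel for what gzip encoding buys, the sketch below compresses a chunk of JavaScript source with Python's gzip module, standing in for what mod_deflate would do on the server. The script text is made up and highly repetitive, so the ratio it prints is far better than real code would see.

```python
import gzip

# Made-up, deliberately repetitive JavaScript source.
script = (
    "function greet(name) {\n"
    "    // say hello to the user\n"
    "    return 'Hello, ' + name;\n"
    "}\n"
) * 200

raw = script.encode("utf-8")
compressed = gzip.compress(raw)

# Size before and after compression, in bytes.
print(len(raw), len(compressed))
```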

Alex Russell presented this tutorial, which is also available here.

[tags]oscon, oscon07, ajax, alex russell, web programming[/tags]


OSCON 2007: Intel open-sources parallel programming library

Tonight Alasdair, Chris and I attended an Intel “media event”. As upstanding members of the media (okay, we all have blogs), we were “invited” (okay, we signed up on a web page) to attend a grand announcement from Intel.

Let me say, drinking a beer on the bus to the announcement was cool. I was living like a rock star, if only for ten minutes.

Let me also say that drinking another beer before the announcement was cool.

Anyways, Intel’s announcement. Intel is open-sourcing Threading Building Blocks, which allows for easier programming and deployment of multi-threaded C++ code. It’s released under the GPL version 2, and will be shipped with a number of operating systems in the near future, including Red Flag Linux, Novell, and Solaris.

Honestly, all this talk about multiple cores and massively parallel systems sounds like a solution in need of a problem. I hope that they’re not talking about the average user, the Joe Bloggs who uses his computer for email and web surfing and playing Solitaire, because they don’t need a four-core processor. Hell, they hardly need a dual-core processor. Pushing beyond four cores? I don’t see why home users need it.

Higher-end users definitely need it. Scientific users, definitely. Industry, most likely. And I don’t know enough about Intel’s finances to know if these users make up a substantial amount of their business…

[tags]intel, parallel programming, threading building blocks, oscon, oscon07[/tags]


OSCON 2007: Advanced Parsing Techniques, by Mark-Jason Dominus

Mark-Jason Dominus kicks off Monday’s afternoon tutorial with Advanced Parsing Techniques (for Perl).

What’s parsing? It’s the process of taking an unstructured input (such as a sequence of characters) and turning it into a data structure. Parse::RecDescent is a closed system parser, but we’re going to look at an open one: HOP::Parser.

An example of something we might want to parse is a mathematical function that a user has input from a webpage (e.g. (x^2 + 3*x) * sin(x*2) +14). An easy solution is to use eval to turn user input into compiled Perl code. The problem with this is it’s easy to make it go wrong. An alternative would be to implement an evaluator for expressions. This would take your string and turn it into a list of tokens, a process called lexing. Perl is good for this because of its regular expression engine.
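A regex-driven lexer for expressions like the one above is only a few lines. This sketch is in Python rather than Perl and is not taken from the tutorial; the token names and the `tokenize` function are assumptions made for the illustration.

```python
import re

# Each token type is a named regex; order matters where patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/^()]"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Turn a string into a list of (type, text) tokens, dropping spaces."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("3*x +14")` yields a NUMBER, an OP, a NAME, an OP, and a NUMBER, and anything the patterns don't cover raises an error instead of reaching an eval.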

When you’re parsing, you generally need a grammar, which describes all of the expressions and tokens and how they interrelate. Parse::RecDescent basically uses this method, where each grammar rule becomes a function.
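The "each grammar rule becomes a function" idea can be sketched as a small recursive-descent evaluator. This is Python, not Perl, and it is neither Parse::RecDescent nor HOP::Parser; the grammar, the `Parser` class, and the `evaluate` function are assumptions made for the illustration. The assumed grammar is: expr is terms joined by + or -, term is factors joined by * or /, factor is an atom optionally raised to a factor with ^, and atom is a number, a variable, a function call, or a parenthesised expr.

```python
import math
import re

TOKEN_PATTERN = r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^()]"
FUNCTIONS = {"sin": math.sin, "cos": math.cos}


class Parser:
    def __init__(self, text, variables):
        self.tokens = re.findall(TOKEN_PATTERN, text)
        # findall silently skips unknown characters, so check nothing was lost.
        if "".join(self.tokens) != "".join(text.split()):
            raise ValueError("input contains characters the lexer does not know")
        self.pos = 0
        self.variables = variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def expr(self):    # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):    # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):  # factor := atom ('^' factor)?   (right-associative)
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return base ** self.factor()
        return base

    def atom(self):    # atom := NUMBER | NAME '(' expr ')' | NAME | '(' expr ')'
        token = self.take()
        if token == "(":
            value = self.expr()
            self.take(")")
            return value
        if token[0].isdigit():
            return float(token)
        if not (token[0].isalpha() or token[0] == "_"):
            raise ValueError(f"unexpected token {token!r}")
        if self.peek() == "(":
            self.take("(")
            argument = self.expr()
            self.take(")")
            return FUNCTIONS[token](argument)
        return self.variables[token]


def evaluate(text, **variables):
    """Parse and evaluate an expression, e.g. evaluate("3*x", x=2)."""
    parser = Parser(text, variables)
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

This version computes the value as it parses instead of building a data structure first, which keeps the sketch short; returning tree nodes from each rule function would be the same shape.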

So far we’ve done a lot of parsing (including a lot of examples that aren’t included here), but what about evaluation? Enter a bunch of examples of how to do this that I’m not going to reproduce. Go buy his book. :-)

[tags]oscon, oscon07, perl, parsing, mark-jason dominus, higher-order perl[/tags]


OSCON 2007: Sneak Peek at Tim O’Reilly’s questions for Mark Shuttleworth

Tomorrow morning Tim O’Reilly is going to have a sit-down Q&A keynote session with Ubuntu’s Mark Shuttleworth. I managed to get ahold of some of the questions that are going to be asked:

  • So, what cologne do you use? You smell really good.
  • Have you upgraded to Vista yet, or are you waiting for SP1 too?
  • C# …. great programming language … or greatest?
  • Have you ever thought of giving more of your money away, say to other people on stage with you right now?
  • Open source is great, everything should be open source. Including your budgets. Give me commit access to your financials, would you?
  • Ubuntu is too hard to say. Why don’t you go with something easy, like 7?

Questions courtesy gnat on IRC.

[tags]oscon, oscon07, tim o’reilly, ubuntu, mark shuttleworth[/tags]


OSCON 2007: Taming Legacy Perl, by Peter Scott

I arrived late to this tutorial because I set my alarm for 6:30PM instead of 6:30AM. I know now that it’s possible to wake up, shower, get stuff together, and get to the convention center in 25 minutes.

I arrived to Peter Scott talking about Test::More, which isn’t all that surprising given the topic: taming legacy Perl. I use Test::More in my modules (sometimes not very well), but I did learn that Test::More has functions like eq_hash for hash testing. Other useful modules he touched upon include Test::NoWarnings, Test::MockObject, and Test::MockModule.

Getting a bit away from legacy code, he recommends using h2xs when you’re starting to write a module, if only to get your Makefile.PL for free. Or you could use ExtUtils::ModuleMaker, Module::Start, or Module::Starter. Create a t directory, stick your test programs in there, then if you do perl Makefile.PL; make test it’ll run your tests.

Remember that your tests are programs: use good development practices, use strict and use warnings, keep them small, refactor as necessary, etc.

To test web apps, check out the WWW::Mechanize module, which mimics a web browser, and Test::WWW::Mechanize to test for HTML-y things.

Should you rewrite legacy code? The major advantages to doing this include:

  • you own the new code
  • you understand it
  • you remember it
  • testing is much more fun
  • it probably won’t take as long as you think…
  • …or, not rewriting will take longer than you think

The tutorial seems to be moving away from legacy code and towards good coding habits now… Code should be pretty to look at. You should add comments where you had to think a lot. You can use perltidy to fix up even the worst layout.

Analyzing code can be done using Devel::Cover and Devel::Coverage. These modules look at branching, code line touches, etc., to make sure that your tests adequately cover all possible conditions in your code.

Moving back to inherited code, you have to watch for the apparent level of Perl expertise: does it use Perl-ish structures such as regular expressions or hashes? Does it use parallel arrays or hashes instead of lists of lists? Does it call unnecessary external programs instead of using modules? Does it use my, local, or use? To reduce bloat, look for opportunities to use third-party modules, put duplicated code into subroutines, put duplicated subroutines into modules, and don’t optimize for speed before optimizing the code for clarity.

Replace magic numbers with symbolic names. HTML strings should be replaced with a templating system (like HTML::Template or Text::Template). Use the /x modifier to document large regular expressions, or build them up using the qr// operator. Move variable declaration to the latest possible point. Use use strict. Use CPAN.
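The documented-regex advice carries over to other languages too. As an illustration outside Perl, Python's re.VERBOSE flag plays the same role as the /x modifier: whitespace in the pattern is ignored and # starts a comment. The date pattern and the `parse_date` helper are made up for the example.

```python
import re

DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return (year, month, day) as integers, or None if text doesn't match."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return None
    return tuple(int(match.group(name)) for name in ("year", "month", "day"))
```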

…and document using POD.

[tags]oscon, oscon07, peter scott, perl, legacy code[/tags]
