Saturday, 20 June 2009

The new \N regex escape

At the French Perl Workshop 2009, a talk on Perl 6 parsing by Stéphane Payrard reminded me of the existence of \N in Perl 6. It's a regex escape that is the exact opposite of \n : it will match any character that is not a newline, independently from the presence or absence of the single line match modifier /s. So I have now added it in Perl 5. This is my first feature contribution to the regex engine!

Sunday, 14 June 2009

Back from the FPW 2009

The French Perl Workshop 2009 has ended. I took some pictures (not much). I had the occasion to try on a smaller French audience the talk I'll give in Lisbon for YAPC::Europe.

One of the interesting presentations was made by the rtgi guys, who mapped the CPAN authors and modules, as well as the Perl community as seen on the web, on cpan-explorer.org. Enjoy the maps.

Saturday, 13 June 2009

The Future of Perl 5

Bjarne Stroustrup is quoted having said: It is my firm belief that all successful languages are grown and not merely designed from first principles. Being a perlist, (or a perlian? a perlard?), I can only agree. Perl 5 was grown, and continues to grow; although most of its growth is now outsourced to the CPAN (with modules like Devel::Declare, Method::Signatures, Moose, MooseX::*, and so on.)

Thus, I see little added value in adding new, complex syntax to Perl 5. My preference goes to making Perl 5 easier to extend. (Here you can feel the influence of the Perl 6 approach.) The core's job is not to duplicate and override the efforts made on CPAN to extend Perl, but to encourage them. CPAN is, after all, the killer app of Perl.

The big advantages of outsourcing syntax experiments to CPAN is that the community can run many experiments in parallel, and have those experiments reach a larger public (because installing a CPAN module is easier than compiling a new perl); that enables also to prune more quickly the failed approaches. This is a way to optimize and dynamize innovation.

So, a patch to add a form of subroutine parameter declaration, or to add some syntactic sugar for declaring classes, are probably not going to be included in Perl 5 today. Those would extend the syntax, but not help extensibility -- actually they would hinder it, imposing and freezing a single core way to do things. And saying that there is only one way to do it is a bit unperlish. It's basically for the same reason that we don't add an XML parser or a template preprocessor in the core: no single, one-size-fits-all solution has emerged.

Now, if a one single way of declaring subroutine parameters or classes emerges and stabilizes, it makes sense to add it in the core, and even to re-implement it in C, for efficiency reasons. But whatever is added will also impose some backwards compatibility requirements on the future core releases: we must be careful to avoid getting stuck with useless or ugly syntax. -- In turn, that means that new syntax can eventually be added for purely aesthetic reasons (like the stacking of filetest operators in 5.10).

Those general considerations don't mean, however, that all new syntax is to be ruled out from the core: if some new syntax is introduced in a way that improves the internals or can be taken advantage of by CPAN modules, it's worthwhile to include. For example, we can consider adding a way to declare methods differently than ordinary subroutines (possibly via a new keyword method that would supplement sub) : so we can forbid calling them as subroutines, or by magic-goto from a subroutine. We could also add a way for a subroutine to know whether it has been called as a method or as a subroutine. That kind of thing. Improving the possibilities of introspection and self-checking of Perl do improve its extensibility.

What about, then, the future directions of Perl 5? The Big Picture? The priorities? The Plan for 5.12 and beyond?

New syntax is nice and shiny. But for me, there are more important and urgent features that are needed now. New syntax introduces new bugs, and, Perl 5 being what it is, new edge cases; we should aim to reduce those instead.

In my view, (I could even use the word vision, but I won't do it), the future of Perl 5 should be mostly organised around two directions:


  • clean-up and orthogonalisation

  • giving more facilities for extensibility



To elaborate on that a bit:

Clean-up and orthogonalisation : the big TO-DO here for 5.12 is the clean-up of the internal handling of UTF-8 strings and the abstraction leakage that ensues in some corners of it. (This was referred to as the Unicode bug by many people.) Briefly, perl builtins like lc() or uc(), or regex metacharacters like \w, have a behaviour that depends on whether the string they operate on have the internal UTF-8 flag set. This shouldn't be the case -- that flag should be kept strictly internal. Fixing this will make the life easier to many people that handle Unicode strings with Perl, but it will break backwards compatibility. That's why it's not planned for a 5.10.x release.

Another field that could be improved is the behaviour of autovivification, which is currently not extremely consistent. Sometimes also autovivification is annoying -- a common example is a test like exists($h->{a}{b}), that autovivs $h->{a}.

Any one of those two cleanups, once implemented, would be important enough to deserve a 5.12.0 release (at least that's what I'm thinking this week).

Giving more facilities for extensibility : Providing more hooks to module writers. Perl is pretty hookable already; but the creativity of modules writers has no limits. Vincent for example was talking yesterday about making the internal function op_free(), used to free code from the memory, hookable -- which would help for some evil manipulations of eval(), the details of which I don't remember at the moment. More generally, a better API for manipulating the optree internals would be useful. I would like to have more hooks in the tokenizer as well -- Devel::Declare, for example, could benefit from this.

Once those goals are achieved, it will be time to add new syntax, on steadier grounds. The core will never have an XML parser, because the diversity of needs for parsing XML makes the diversity of modules necessary and welcomed; this is not true for object models -- many competing object models are not necessarily a good thing, especially inside a same application. But large-scale experimentation on CPAN enabled the community to make Moose much better than whatever a handful of P5Pers could have designed by themselves.

And now I shall STFUAWSC.