Monday, December 28, 2009

Dist::Zilla -- part 1

Inspired by RJBS's advent article on Dist::Zilla, I'm getting ready to give it a spin.

Install Dist::Zilla

Install a bundle. This worked fine, but didn't bring in Dist::Zilla itself:

% cpan Dist::Zilla::PluginBundle::Git

./Build install -- OK

Attempting to install the base Dist::Zilla failed:

% cpan Dist::Zilla

Running make test
Has already been tested successfully
Running make install
Already tried without success

Cleaning my .cpan/build directory and trying again.

Before cleaning up the files, I checked the Makefile.PL output for the prereqs, to see if I could narrow down the issue. I then cleared out my build space, manually installed the prerequisites, and then installed Dist::Zilla. This worked.

[andrew@mini]% perl Makefile.PL 0 ~/.cpan/build/Dist-Zilla-1.093400-1o8qqf
Warning: prerequisite Config::INI::MVP::Reader 0.024 not found.
Warning: prerequisite Config::MVP 0.092990 not found.
Warning: prerequisite Hash::Merge::Simple 0 not found.
Warning: prerequisite Moose::Autobox 0.09 not found.
Warning: prerequisite MooseX::Types::Path::Class 0 not found.
Warning: prerequisite PPI 0 not found.
Warning: prerequisite String::Flogger 1 not found.
Warning: prerequisite namespace::autoclean 0 not found.
Writing Makefile for Dist::Zilla

[andrew@mini]% rm -rf ~/.cpan/build/*

[andrew@mini]% cpan Config::INI::MVP::Reader Config::MVP Hash::Merge::Simple Moose::Autobox MooseX::Types::Path::Class PPI String::Flogger namespace::autoclean

...[snip]...[this brought in a lot of deps]
/usr/bin/make install -- OK

[andrew@mini]% cpan Dist::Zilla

Installing /apps/perl5/bin/dzil
Appending installation info to /apps/perl5/lib/perl5/i486-linux-gnu-thread-multi/perllocal.pod
/usr/bin/make install -- OK

And if I'm going to cargo cult from RJBS and use his tool, then I might as well go all the way by installing the RJBS plugin bundle.

cpan Dist::Zilla::PluginBundle::RJBS

Now what? Using Dist::Zilla

% dzil new My-Project
will create new dist My-Project in obj(/home/andrew/src/My-Project)
$VAR1 = {};
% cd My-Project
% ls
% cat dist.ini
name = My-Project
version = 1.000
author = andrew
license = Perl_5
copyright_holder = andrew


% mkdir lib t

Now, create a stub module in lib/My/, something like this (copied straight from the quoted article):

use strict;
package My::Project;
# ABSTRACT: our top-secret project for playing bowling against WOPR

use Games::Bowling::Scorecard;
use Games::War::Nuclear::Thermonuclear::Global;
use Path::Resolver 2.012;

=method play_a_game

This method starts a game. It's a strange game.

=cut

sub play_a_game { ... }

1;


The # ABSTRACT comment will be pulled out and used as metadata.

Now, let's build the module:

% dzil build
beginning to build My-Project
guessing dist's main_module is lib/My/
extracting distribution abstract from lib/My/
couldn't find a place to insert VERSION section to lib/My/
rewriting release test xt/release/pod-coverage.t
rewriting release test xt/release/pod-syntax.t
writing My-Project in My-Project-1.000
writing archive to My-Project-1.000.tar.gz
And now take a look at what it built:

% find My-Project-1.000

This created the META.yml file, built the MANIFEST, created two additional tests: release-pod-syntax and release-pod-coverage, built a README and copied in the correct LICENSE file. And then it tarred it all up for me. Excellent.

There are additional plugins that can be used within the dist.ini file. [@Git] will verify that all the files are checked into git before doing the build. [@RJBS] will use the RJBS bundle to pull in all the steps he normally uses for a module. Searching for Dist::Zilla::Plugin on cpan produces 6 pages of results.
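For example, a dist.ini using the Git bundle might look like this (a sketch based on the generated dist.ini above; it assumes the Dist::Zilla::PluginBundle::Git installed earlier):

```ini
; dist.ini -- sketch only
name             = My-Project
version          = 1.000
author           = andrew
license          = Perl_5
copyright_holder = andrew

; verify the working tree is all checked into git before building
[@Git]
```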

I'll post an update as I work on using dzil for a real module, and let you know how it goes. So far, I'm pretty excited at keeping my code and boilerplate separated.

Saturday, December 26, 2009

More Perl Advent Calendars

After I finished my post on perl advent calendars I stayed up for two more hours and finished reading all of RJBS's calendar. A midnight to 2am well spent.

Working through the perl advent calendar, I found links to a few more.

The Plack Advent Calendar has miyagawa walking us through the new hawtness that is PSGI & Plack. It runs us from day 1 of installing Plack, through creating apps and multiple apps living on one install, and beyond. A nice addition to the Plack documentation. We should all learn this one, and start building against the PSGI specification so our webapps can be deployed across a range of server types.

There is also a site with several advent calendars up: hacker, casual, dbix-skinny, and data-model. They are all written in Japanese, but the code snippets are in Perl.

Day 2 of hacker track has a nice piece on opts which is a DSL (domain specific language) for command line parsing, a nice wrapper around Getopt::Long.
use opts;

opts my $foo => { isa => 'Str', default => 'bar' },
     my $bar => { isa => 'Int', required => 1, alias => 'x' };

Merry Christmas indeed!

installing hadoop on ubuntu karmic

Mixing and matching a couple of guides, I've installed a local hadoop instance on my netbook. Here are my notes from the install process.

I'll refer to the guides by number later. Doc 1 is the current #1 hit for 'ubuntu hadoop' on google, so it seemed a good spot to start.



1) created a hadoop user and group, as per document 1. Also an ssh key for the hadoop user (currently passwordless; will check that soon).

2) added the jaunty-testing repo from cloudera, see doc 2. They don't have a karmic package yet. Add /etc/apt/sources.list.d/cloudera.list:

#deb karmic-testing contrib
#deb-src karmic-testing contrib
#no packages for karmic yet, trying jaunty-testing, jaunty-stable, jaunty-cdh1 or jaunty-cdh2
deb jaunty-testing contrib
deb-src jaunty-testing contrib

3) install hadoop:

[andrew@mini]% sudo aptitude install hadoop                                                                   0 ~/src
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Reading extended state information      
Initializing package states... Done
"hadoop" is a virtual package provided by:
  hadoop-0.20 hadoop-0.18 
You must choose one to install.
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Reading extended state information      
Initializing package states... Done

3b) sudo aptitude update, sudo aptitude install hadoop-0.20

[andrew@mini]% sudo aptitude install hadoop-0.20                                                              0 ~/src
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Reading extended state information      
Initializing package states... Done
The following NEW packages will be installed:
  hadoop-0.20 hadoop-0.20-native{a} 
0 packages upgraded, 2 newly installed, 0 to remove and 25 not upgraded.
Need to get 20.1MB of archives. After unpacking 41.9MB will be used.
Do you want to continue? [Y/n/?] Y
Writing extended state information... Done
[... snip ...]
Initializing package states... Done
Writing extended state information... Done
4) this has set up our config information in /etc/hadoop-0.20, also symlinked as /etc/hadoop. The config is loaded from /etc/hadoop/conf/ (aka /etc/hadoop-0.20/conf.empty/).

Modify the config to point to our JVM. Since I installed Sun Java 1.6 (aka Java6), I updated it to: export JAVA_HOME=/usr/lib/jvm/java-6-sun

5) update the rest of the configs.
Snapshotted conf.empty to ~/config/hadoop/conf and started making edits, as per doc 1. Symlinked into /etc/hadoop/conf.

files available at document #3, my github config project, hadoop/conf subdir.
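For reference, these are the kind of edits doc 1 walks through (a sketch: the property names are the stock Hadoop 0.20 ones, but the port numbers and tmp path here are illustrative, not necessarily what's in my github conf):

```xml
<!-- conf/core-site.xml -->
  <property>
  <property>

<!-- conf/hdfs-site.xml: single node, so keep just one copy of each block -->
  <property>

<!-- conf/mapred-site.xml -->
  <property>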

6) switch to hadoop user
sudo -i -u hadoop

7) initialize hdfs (as hadoop user)
mkdir ~hadoop/tmp
chmod a+rwx ~hadoop/tmp
hadoop namenode -format

8) fire it up: (as hadoop user)

hadoop@mini:/usr/lib/hadoop/logs$ /usr/lib/hadoop/bin/
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hadoop@mini:/usr/lib/hadoop/logs$ /usr/lib/hadoop/bin/
starting namenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-namenode-mini.out
localhost: starting datanode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-datanode-mini.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-mini.out
starting jobtracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-jobtracker-mini.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-tasktracker-mini.out

9) Check that it is running via jps
hadoop@mini:/usr/lib/hadoop/logs$ jps
12001 NameNode
12166 DataNode
12684 Jps
12568 TaskTracker
12409 JobTracker
12332 SecondaryNameNode

(note to self, why don't we have hadoop completion in zsh? Must rectify)

10) Run example. See doc 1:
hadoop jar hadoop-0.20.0-examples.jar wordcount gutenberg gutenberg-output

hadoop@mini:~/install$ hadoop jar hadoop-0.20.1+152-examples.jar wordcount gutenberg gutenberg-output
09/12/25 23:24:19 INFO input.FileInputFormat: Total input paths to process : 3
09/12/25 23:24:20 INFO mapred.JobClient: Running job: job_200912252310_0001
09/12/25 23:24:21 INFO mapred.JobClient:  map 0% reduce 0%
09/12/25 23:24:33 INFO mapred.JobClient:  map 66% reduce 0%
09/12/25 23:24:39 INFO mapred.JobClient:  map 100% reduce 0%
09/12/25 23:24:42 INFO mapred.JobClient:  map 100% reduce 33%
09/12/25 23:24:48 INFO mapred.JobClient:  map 100% reduce 100%
09/12/25 23:24:50 INFO mapred.JobClient: Job complete: job_200912252310_0001

hadoop@mini:~/install$ hadoop dfs -ls gutenberg-output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2009-12-25 23:24 /user/hadoop/gutenberg-output/_logs
-rw-r--r--   1 hadoop supergroup      21356 2009-12-25 23:24 /user/hadoop/gutenberg-output/part-r-00000

It Lives!

Wednesday, December 23, 2009

Perl Advent Calendar(s)

It's that time of year again! Ok, I'm 23 days late for the start, but I'm still within the valid range. Perl Advent Calendars!

I just stumbled across RJBS's Perl Advent Calendar. The article for Today (2009-12-24) is on Email::Sender::Simple. This is a great module. Seriously, if you are doing any email sending, this is what you should use.

I found and used this module over this past summer, while updating some legacy php code that included an emailed report. I was not looking forward to reimplementing the email sending routines in my perl version, but then I found Email::Sender::Simple and it is (as you'd expect from the name) super simple. Unless you want it to be more complex, and then it'll be more complex.

This whole advent calendar is pretty cool, because it is all stuff he's been working on. A brief moment of webstalking, and now I know who RJBS is. He wrote jgal, an igal clone, way back when. I should have an email around here from when I first switched from igal to jgal. That must have been a long time ago... it was, among other things, PRE-FLICKR! Hmm, scratch that, seems I was confusing jgal with jigl, both of which were igal inspired. I did just find an email from Oct 2003 personally announcing jigl v2.0 (Jason's image gallery).

Some perl advent calendars:

Saturday, December 19, 2009


In a fit of inspiration, I flipped through my videos on my XBMC this morning and realized that I still have a copy of "MIT: Introduction to Algorithms" recorded in 2001. This morning I've been watching episode 3, "Divide and Conquer", recorded 9/12/2001.

It's interesting to see the people reacting to 9/11, when it was still a "great tragedy" before it became "terrible attacks", when the call was "Feel this pain, yes, but get out and keep making the world better. That's what we must do after an event like this." And then right back into math and CS. I'm glad they didn't have any vignette on "divide and conquer" as practised by the Romans or British (I see that they did make a comment like that in the 2005 lesson).

This is a nice refresher. It's been a long time since I've done formal analysis of algorithm run-times beyond "back of the envelope" estimations. If the cost of splitting a problem in two is small compared to the speed/complexity improvement of solving two half-size problems, then this is a win. See also map-reduce et al.
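The standard recurrence behind that hand-wave (textbook merge-sort-style analysis, not taken from the lecture itself):

```text
T(n) = 2*T(n/2) + c*n    # two half-size subproblems, plus linear split/merge cost
T(n) = O(n log n)        # recursion tree has ~log2(n) levels, each costing c*n total
```

Compare with O(n^2) for the naive quadratic approach: the split is cheap, the subproblems shrink fast, so dividing wins.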

Perhaps it is also time to go through "Algorithms in Perl" and update it for "modern perl"isms? That'd make an interesting blog thread.
A recording from 2005 is available on and on youtube.

Peteris Krumins watched and blogged about all of these episodes. In fact, I think he's the one who transcoded them from RealMedia and posted them. Thanks for releasing these under a CC license, MIT!

Friday, December 11, 2009

Dec Perl Mongers was a blast

We'll have the slides up soon along with some Perl+VIM tips. Really, I mean it this time. Do you have some favorite perl/shell/vim integration tips you'd like to pass along?

In the meantime, you can follow along in the git repository. Including the drama where, 10 minutes before presentation time, I merged in changes from my cohort that deleted 90% of the files, and then I blindly pushed those up to the public master.

You can look directly at the Vroom slides from . I'll render those to html and push them somewhere for viewing.

Omnicompletion actually worked during my talk/demo. I think that's the first time I've ever actually had it work. Totally exciting. While preparing the slides, I found that I was missing a configuration option in my vimrc, so my ftplugin/perl directory was getting skipped.

Tuesday, December 1, 2009

plack, is it really this easy?

I just wrote my first plack-based prototype, built around Plack::Request. Plack::Request isn't normally used directly, but I'm using it here as a simple prototype to run a feasibility test for a proposed new project.

I took a coworker's existing MyEngine ModPerl2 prototype and refactored it into and Then I reused in listed below. (In between I added some tests for, yes, it is all currently boilerplate scaffolding; that doesn't mean it shouldn't be tested.)

All this server has to do is take an incoming request, pass it to a function from the (business) Logic module to get a list of possible responses, then call a helper from the Logic module to pick the winner, and finally return the winner to the caller via JSON. Picking the possible responses can be time consuming and will be capped somewhere in the 100-250ms range, none of which is important yet outside of the business logic case. That will get interesting when I look into parallelizing the possible-responses code (likely moving from prefork to Coro as the plack server to accommodate this).
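As a sketch of that request flow (the two Logic helpers here are hypothetical stand-ins, not the real business module), the Plack::Request-based app looks roughly like:

```perl
use strict;
use warnings;
use Plack::Request;
use JSON;  # exports encode_json

# Hypothetical stand-ins for the real (business) Logic module.
sub possible_responses { my ($req) = @_; return [ 'a', 'b', 'c' ] }
sub pick_winner        { my ($candidates) = @_; return $candidates->[0] }

my $app = sub {
    my $env = shift;
    my $req = Plack::Request->new($env);

    # 1. gather candidates, 2. pick the winner, 3. return it as JSON
    my $candidates = possible_responses($req);
    my $winner     = pick_winner($candidates);

    my $res = $req->new_response(200);
    $res->content_type('application/json');
    $res->body( encode_json({ winner => $winner }) );
    return $res->finalize;
};
```

Run it with plackup, as shown below.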

Next up I'll benchmark this on a production-caliber machine to test for overhead, latency, throughput, and max connections. These will provide upper bounds, since the business logic is mostly empty scaffolding at this point. Testing will be against both the Standalone::Prefork and Coro plack server backends.

Running the server via Standalone::Prefork server:
plackup --server Standalone::Prefork --port 8080
Running the server via Coro server:
plackup --server Coro --port 8080