Wednesday, September 22, 2010

September Los Angeles Perl Mongers

Game on!

I'm too sick to be here, but I came into the office tonight to run the meeting. Mad props to Tommy for driving down from Westlake to present tonight. Aran is also sick, so he's bailing on presenting. His presentation "12 cpan modules in 12 penta minutes" may well be cursed.

Big crowd tonight, 20+.

Monday, September 20, 2010

convert to CPAN Testers 2.0

I woke up to find a bevy of "Mail Delivery Failure" messages in my inbox. It seems the CPAN Testers reports I emailed in bounced because CPAN Testers 2.0 dropped support for incoming email reports in favor of HTTP. I'm excited about the HTTP switch, as I hated not being able to send test reports from machines that lacked an email configuration.
This message was created automatically by the mail system (ecelerity).

A message that you sent could not be delivered to one or more of its recipients. This is a permanent error. The following address(es) failed:

>>> (after RCPT TO): 550 cpan-testers no longer accepts test submissions via email. Please convert to CPAN Testers 2.0 and the http submission method. Instructions are at:

Let's follow the wiki article and get upgraded!

Upgrade steps for a current CPAN::Reporter user:

  1. Upgrade CPAN::Reporter and add the Test::Reporter::Transport::Metabase module
    # check current version:
    % perl -MCPAN::Reporter -l -e 'print $CPAN::Reporter::VERSION'
    % cpanm CPAN::Reporter
    Successfully installed CPAN::Reporter
    % cpanm Test::Reporter::Transport::Metabase
    Successfully installed Test-Reporter-Transport-Metabase-1.999008
  2. Create a profile using 'metabase-profile' and move it into place.
    % metabase-profile
    Enter full name: ...
    Enter email address: ...
    Enter password/secret: ...
    Writing profile to 'metabase_id.json'
    % mkdir ~/.cpantesters
    % mv metabase_id.json ~/.cpantesters
    % chmod 400 ~/.cpantesters/metabase_id.json
  3. Update ~/.cpanreporter/config.ini to add a transport line
    # add a transport line to ~/.cpanreporter/config.ini
    % echo 'transport = Metabase uri id_file ~/.cpantesters/metabase_id.json' >> ~/.cpanreporter/config.ini
  4. Test
    % cpan Hadoop::Streaming
    CPAN::Reporter: Test result is 'pass', All tests successful.
    CPAN::Reporter: preparing a CPAN Testers report for Hadoop-Streaming-0.102520
    CPAN::Reporter: sending test report with 'pass' via Metabase
  5. Verify: check the Metabase tail log for my entry.
    % wget --quiet -O- | grep -i grangaard
    [2010-09-20T20:47:06Z] [Andrew Grangaard] [pass] [SPAZM/Hadoop-Streaming-0.102520.tar.gz] [i486-linux-gnu-thread-multi] [perl-v5.10.1] [3acd2e9e-c4f8-11df-b898-64160c3e84b1] [2010-09-20T20:47:06Z]
Dr. Frankenstein, it lives!
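If you'd rather not eyeball grep output, here's a minimal Perl sketch that splits a tail-log line like the one in step 5 into its eight bracketed fields and filters by tester name. The field names (timestamp1, tester, grade, and so on) are my own guesses at what each column means, not a documented Metabase format, and parse_tail_line/reports_for are hypothetical helpers.

```perl
use strict;
use warnings;

# Hypothetical labels for the eight bracketed columns seen in the tail log;
# these are guesses, not an official Metabase schema.
my @FIELDS = qw(timestamp1 tester grade dist platform perl guid timestamp2);

# Parse one tail-log line into a hashref, or return nothing if it doesn't
# have exactly eight [bracketed] fields.
sub parse_tail_line {
    my ($line) = @_;
    my @values = $line =~ /\[([^\]]*)\]/g;
    return unless @values == @FIELDS;
    my %rec;
    @rec{@FIELDS} = @values;
    return \%rec;
}

# Keep only the parsed records whose tester matches the given regex.
sub reports_for {
    my ($tester_re, @lines) = @_;
    return grep { $_->{tester} =~ $tester_re }
           map  { parse_tail_line($_) } @lines;
}

my $sample = '[2010-09-20T20:47:06Z] [Andrew Grangaard] [pass] [SPAZM/Hadoop-Streaming-0.102520.tar.gz] [i486-linux-gnu-thread-multi] [perl-v5.10.1] [3acd2e9e-c4f8-11df-b898-64160c3e84b1] [2010-09-20T20:47:06Z]';

for my $r (reports_for(qr/grangaard/i, $sample)) {
    print "$r->{grade} $r->{dist}\n";
}
```

Against the live log, you'd feed the lines from the wget command into reports_for (for example, reading them from STDIN) instead of the hard-coded sample.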

Tuesday, September 7, 2010

Hadoop::Streaming PAUSE registration submitted

Submitted a PAUSE (Perl Authors Upload SErver) request to register Hadoop::Streaming in the User Interface tree at CPAN. I wasn't really sure which top-level category to put it in, but settled on UI because it provides a simple adaptation of Hadoop's Streaming interface.

Woo, my first registered module space. Update: oooh, brian d foy!

On Wed, Sep 08, 2010 at 06:50:34AM +0200, Perl Authors Upload Server wrote:
> The next version of the Module List will list the following module:
>   modid:       Hadoop::Streaming
>   DSLIP:       RdpOp
>   description: simple interface to Hadoop Streaming
>   userid:      SPAZM (Andrew Grangaard)
>   chapterid:   8 (User_Interfaces)
>   enteredby:   BDFOY (brian d foy)
>   enteredon:   Wed Sep  8 04:50:33 2010 GMT
> The resulting entry will be:
> Hadoop::
> ::Streaming       RdpOp simple interface to Hadoop Streaming         SPAZM
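For anyone else squinting at "RdpOp": each letter of the DSLIP code is one field (Development stage, Support level, Language used, Interface style, Public license). Below is a small Perl decoder. The lookup tables are partial and written from memory of the PAUSE module-list conventions, so treat them as an approximation and check PAUSE for the authoritative list; decode_dslip is a hypothetical helper.

```perl
use strict;
use warnings;

# Partial DSLIP tables (only codes I'm fairly sure of); anything else
# decodes as 'unknown'.
my @DSLIP = (
    [ 'Development stage' => {
        i => 'idea', c => 'pre-alpha', a => 'alpha', b => 'beta',
        R => 'released', M => 'mature', S => 'standard (core)',
    } ],
    [ 'Support level' => {
        m => 'mailing list', d => 'developer', u => 'Usenet newsgroup',
        n => 'none', a => 'abandoned',
    } ],
    [ 'Language used' => {
        p => 'Perl only', c => 'C and Perl', h => 'hybrid (optional C)',
        '+' => 'C++ and Perl', o => 'Perl and another language',
    } ],
    [ 'Interface style' => {
        f => 'plain functions', r => 'unblessed references or ties',
        O => 'object oriented', h => 'hybrid (object and function)',
    } ],
    [ 'Public license' => {
        p => 'Standard-Perl (GPL or Artistic)', g => 'GPL', l => 'LGPL',
        b => 'BSD', a => 'Artistic', o => 'other',
    } ],
);

# Return one [label, letter, meaning] triple per DSLIP position.
sub decode_dslip {
    my ($code) = @_;
    my @chars = split //, $code;
    my @out;
    for my $i (0 .. $#DSLIP) {
        my ($label, $table) = @{ $DSLIP[$i] };
        my $c = $chars[$i] // '-';            # missing position
        push @out, [ $label, $c, $table->{$c} // 'unknown' ];
    }
    return @out;
}

printf "%-17s %s  %s\n", @$_ for decode_dslip('RdpOp');
```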

Monday, September 6, 2010

Hadoop::Streaming 0.102490 pushed to CPAN

I've pushed a new release of Hadoop::Streaming to CPAN. It should be available in a couple of hours, depending on how long it takes your CPAN mirror to do the mirror update dance.

The release includes expanded documentation in the base Hadoop::Streaming placeholder file. Also included is a Hadoop::Streaming::Combiner role for creating combiners. Combiners are like reducers that run post-map, per-merge. One can reuse the reducer as the combiner if the reducer produces the same key/value format on output as it takes on input.
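To make the "same format in and out" point concrete, here's a plain-Perl sketch of a word-count reducer that reads sorted key<TAB>count lines and emits key<TAB>sum. It deliberately doesn't use the Hadoop::Streaming roles (combine_lines is a hypothetical helper), but because its output looks exactly like its input, the same logic could run as both combiner and reducer.

```perl
use strict;
use warnings;

# combine_lines is a hypothetical helper, not part of Hadoop::Streaming.
# Input: "key\tcount" lines already sorted by key (as Hadoop delivers them).
# Output: "key\tsum" lines, the same shape as the input, which is what lets
# this logic serve as both combiner and reducer.
sub combine_lines {
    my @out;
    my ($current, $sum);
    for my $line (@_) {
        (my $copy = $line) =~ s/\r?\n\z//;    # strip the trailing newline
        my ($key, $count) = split /\t/, $copy, 2;
        next unless defined $count;           # skip malformed lines
        if (defined $current && $key ne $current) {
            push @out, "$current\t$sum";      # key changed: flush the total
            $sum = 0;
        }
        $current = $key;
        $sum += $count;
    }
    push @out, "$current\t$sum" if defined $current;   # flush the last key
    return @out;
}

# Demo with a few pre-sorted map outputs.
print "$_\n" for combine_lines("apple\t1\n", "apple\t2\n", "pear\t1\n");
```

In a real streaming job the script would instead do print "$_\n" for combine_lines(<STDIN>); and be handed to Hadoop Streaming as both the -combiner and the -reducer. Running it twice over its own output gives the same totals, which is exactly the property that makes the reuse safe.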

After writing the new documentation, tests, and code, I ran the test suite with dzil test. Once the tests passed, it was a simple one-step push to CPAN and GitHub via dzil release. AWESOME! Dist::Zilla makes maintaining CPAN modules brilliantly easy.

Happy Labor Day!


CPAN - Comprehensive Perl Archive Network
Hadoop::Streaming perl modules

There's no Step Two!

[andrew@mini]% dzil release                                1 ~/src/hadoop-streaming-frontend
[DZ] beginning to build Hadoop-Streaming
[DZ] guessing dist's main_module is lib/Hadoop/
[DZ] extracting distribution abstract from lib/Hadoop/
[DZ] writing Hadoop-Streaming in Hadoop-Streaming-0.102490
[DZ] writing archive to Hadoop-Streaming-0.102490.tar.gz
[@Basic/TestRelease] Extracting /home/andrew/src/hadoop-streaming-frontend/Hadoop-Streaming-0.102490.tar.gz to .build/dVEDcaew44
Checking if your kit is complete...
Looks good
Writing Makefile for Hadoop::Streaming
cp lib/Hadoop/ blib/lib/Hadoop/
cp lib/Hadoop/Streaming/ blib/lib/Hadoop/Streaming/
cp lib/Hadoop/Streaming/Role/ blib/lib/Hadoop/Streaming/Role/Emitter.p
cp lib/Hadoop/Streaming/Reducer/Input/ blib/lib/Hadoop/Streamin
cp lib/Hadoop/Streaming/ blib/lib/Hadoop/Streaming/
cp lib/Hadoop/Streaming/Reducer/Input/ blib/lib/Hadoop/Streaming/Redu
cp lib/Hadoop/Streaming/Role/ blib/lib/Hadoop/Streaming/Role/
cp lib/Hadoop/Streaming/Reducer/ blib/lib/Hadoop/Streaming/Reducer/
cp lib/Hadoop/Streaming/ blib/lib/Hadoop/Streaming/
Manifying blib/man3/Hadoop::Streaming::Combiner.3pm
Manifying blib/man3/Hadoop::Streaming.3pm
Manifying blib/man3/Hadoop::Streaming::Role::Emitter.3pm
Manifying blib/man3/Hadoop::Streaming::Reducer::Input::ValuesIterator.3pm
Manifying blib/man3/Hadoop::Streaming::Reducer::Input::Iterator.3pm
Manifying blib/man3/Hadoop::Streaming::Reducer.3pm
Manifying blib/man3/Hadoop::Streaming::Role::Iterator.3pm
Manifying blib/man3/Hadoop::Streaming::Reducer::Input.3pm
Manifying blib/man3/Hadoop::Streaming::Mapper.3pm
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00-load.t ....... ok
t/01-wordcount.t .. 8/? # perl path -> /usr/bin/perl
t/01-wordcount.t .. ok
t/02-analog.t ..... ok
All tests successful.
Files=3, Tests=19,  3 wallclock secs ( 0.05 usr  0.01 sys +  2.34 cusr  0.20 csys =  2.60 CPU) 
Result: PASS   

[@Basic/TestRelease] all's well; removing .build/dVEDcaew44
*** Preparing to upload Hadoop-Streaming-0.102490.tar.gz to CPAN ***

Do you want to continue the release process? [y/N]: y
[@Git/Check] branch master is in a clean state
[@Basic/UploadToCPAN] registering upload with PAUSE web server
[@Basic/UploadToCPAN] POSTing upload for Hadoop-Streaming-0.102490.tar.gz
[@Basic/UploadToCPAN] PAUSE add message sent ok [200]
[@Git/Commit] Committed Changes
[@Git/Tag] Tagged v0.102490
[@Git/Push] pushing to origin

Thursday, September 2, 2010

github + cpan = gitpan

gitPAN is a clone of every distribution on CPAN in git form: nearly twenty-two thousand public repositories. It is not a place for module development. Instead, it is a place to easily pull the current source for a module, make a patch, and send it to the maintainer, without having to find where she keeps her golden copy.
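Since the whole point is grabbing source quickly, here's a tiny Perl sketch that turns a module name into a clone URL. It assumes the gitPAN repository is named after the distribution, that the distribution name is the module name with '::' swapped for '-' (true for Hadoop::Streaming, not for every module), and that the repositories live under a "gitpan" account on GitHub; gitpan_clone_url is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumptions (not verified for every module): the repository is named after
# the distribution, the distribution is the module name with '::' -> '-',
# and repositories live under the 'gitpan' GitHub account.
sub gitpan_clone_url {
    my ($module) = @_;
    (my $dist = $module) =~ s/::/-/g;
    return "https://github.com/gitpan/$dist.git";
}

print 'git clone ', gitpan_clone_url('Hadoop::Streaming'), "\n";
```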

I read about gitpan a while ago, but when I went looking for it last week I couldn't come up with the right search terms. A search for [github cpan] doesn't show gitpan on the first page; it is crowded out by the many Perl modules developed on GitHub for release to CPAN, by modules like Net::GitHub and GitHub::Import, and by an interesting PerlMonks discussion of (informal) Perl naming conventions for GitHub projects.

Even now that I know the name, information is hard to find! From the FAQ section of the README:

What is gitPAN?
gitPAN is a project to import the entire history of CPAN (known as BackPAN) into a set of git repositories, one per distribution.

Why is gitPAN?
CPAN (and thus BackPAN) is a pile of tarballs organized by author. It is difficult to get the complete history of a distribution, especially one that has changed authors or is released by multiple authors (for example, Moose). Because releases are regularly deleted from CPAN even sites like provide an incomplete history. Having the complete history of each distribution in its own repository makes the full distribution history easy to access.

gitPAN also hopes to make patching CPAN modules easier. Ideally you simply clone the gitPAN repository and work. New releases can be pulled and merged from gitPAN.

gitPAN hopes to showcase using a repository as an archive format, rather than a pile of tarballs. A repository is far more useful than a pile of tarballs, and contrary to many people's expectations, the repository is turning out smaller.

Finally, gitPAN is being created in the hope that "if you build it they will come". Getting data out of CPAN in an automated fashion has traditionally been difficult.

Where is gitPAN?
The repositories are on at (watch out, it may overload your browser).

Code, discussion, and issues can be had at


How can I contact gitPAN?

Twitter: #gitpan


google search for [github cpan]
google search for [gitpan]
gitpan at github -- 21,976 public repositories and counting!
Schwern's announcement of gitpan on his use.perl blog.
discussion of gitpan and code:
gitpan issues:
a page with 4 links at