Thursday, February 18, 2010

SCaLE

SCaLE 8x, the Southern California Linux Expo, starts tomorrow, Feb 19, 2010. "Where Open Source Happens." SCaLE 8x will be held at the Westin LAX.
The premier Open Source conference in the U.S., now in its 8th year, has content for everyone! If you're looking to learn, choose talks from a developer's track, a beginner track, or one of the three general interest tracks.

Looks like a fun event for this weekend.

Vacation

Dear managers,

Please set a good example by taking your vacations. Leave the laptop at home and really unwind. Encourage your employees to take theirs. The short-term cost of the week or two lost in a couple of project schedules will be made up in improved worker productivity, quality, and engagement on return. If you can't accommodate that amount of slowdown, you probably have other problems too.

I'll let you know next week when I'm back from Hawaii. This is my third visit to the Kaimana Klassik ultimate tourney. I've had a total blast, though I have yet to find time to sneak in a day or two of work like I'd hoped. Ten days is a long, luxurious, wonderful trip -- and it drains my vacation account down to roughly 0 hours.

No perl posts for you this week! I did do some fun refactoring while flying over.

Friday, February 12, 2010

Bil & Ted's Excellent Adventure

Can't afford to attend TED in person? Consider a BIL conference.

BIL 2010 starts tomorrow, Friday Feb 12th, in Long Beach. I'd go if I weren't on vacation in Hawai'i. Props to Tilly for telling me about this conference at last month's Mindshare LA. I hope you're going and taking notes.

BIL 2010 is happening right now in Long Beach, CA. View the livestream or the BIL2010 website.

BIL is an ad-hoc conference for people changing the world in big ways. It's a place for passionate people to come together to energize, brainstorm, and take action. We invite you to bring your world into ours.

John Smart -- Evo Devo Universe
Understanding the universe in an evolutionary and developmental way
Kiem Tjong -- Democratizing the University Innovation Process
Why is it so difficult to commercialize university technologies, and how can we better harvest the work of our brightest minds?
Cameron Sinclair -- Open Source Architecture: From TED Prize to reality
Open Architecture Network, winner of the TED Prize and the world’s first open source site for architectural solutions to humanitarian crises.
Brad Templeton -- The Evils of Cloud Computing
There is a darkness to this cloud computing movement: we are giving other companies all our personal data to store and handle.
David Hale -- Pillbox – Identify Unknown Drugs
This tool allows users to visually search for and identify unknown pills using appearance information and high resolution images.
Alexandros Pagidas -- University of the Future
Current education does not create free, creative and wise individuals, but workers for the requirements of the market. Most universities give you an education that will supply you with a career – not a good life.
Jayson Elliot -- Rethinking the Modern GUI
Why are we stuck with the same user interface computers had 30 years ago? Jayson explores the future of the Graphical User Interface.
John Schloendorn -- A Garage-Level Biomedical Research Effort is Taking on Death
Using out-of-the-box thinking and a shoestring budget to attempt to cure human aging.
Chia Hwu -- Communities in Healthcare
Healthcare is about people. How can communities be used to improve it?
Deep Learning for Artificial Intelligence
How do we build natural intelligence into machines?

Thursday, February 11, 2010

LA Perl Mongers, January Recap

We had a great meeting this month! Thank you, guests and presenters. Twenty mongers were in attendance at the start, with a few more filing in later. I thought I had over-ordered by getting five large pizzas and one XL, but they disappeared within 20 minutes.

Looking forward to seeing you all again on Wed February 24, 2010.

My talk on Hadoop::Streaming had to cover so much background that there wasn't much time to get into the code. I look forward to doing a part two where I walk through a non-trivial example. The talk and slides covered an overview of Hadoop, Hadoop streaming, writing streaming jobs manually, writing streaming jobs using Hadoop::Streaming from CPAN, testing jobs locally, and finally running jobs on the cluster.

The slides will be posted soon. There was a hardware failure on the *.pm.org hosting box on Tuesday into Wednesday, so I didn't push an update. You can see them in raw vroom format in the Git Repo.

Aran was kind enough to come down from Thousand Oaks and present a nice talk on "Coding Concisely in Perl." This was a fun, interactive presentation discussing various ways to make your code (or your inherited code) cleaner and more representative of its core ideas. Topics ranged from simple changes, like using ternaries for variable assignment and using map/grep to replace cut-and-paste code, to a look at the ValueClick method of mocking the production DB. Their DB mock layer uses a shim to create a local SQLite database by autoloading the schemas from the production databases. It's quite clever.

The slides on DB testing have company code in them, so we are waiting for his CTO to approve them for publication. I expect to have them up on la.pm.org soon.

Monday, January 25, 2010

perlbal

I'm looking for information on using and extending perlbal. Perlbal is an HTTP load balancer / reverse proxy from Six Apart/Danga.

There is a bit of documentation bundled with the source code.

I've found some good info in the slides from a YAPC 2009 talk: perlbal-tutorial. It's hosted at slideshare.net, so it requires Flash or a login.

OK, now that I've read all that, how do I set it up as a simple proxy/balancer between my internal service and n (small) external hosts, with connection pooling and persistent connections from perlbal to the external hosts?

My internal service will need to quickly send a request to each of the remote servers and then aggregate/modify the collective response. My internal service won't wait very long for the responses from the n hosts: anything slower than roughly 200ms will just be dropped from the algorithm.
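
Here's my current best guess at a minimal config for that shape, pieced together from the docs and the tutorial slides. The host/port values are placeholders, and the persistence and connect-ahead knobs are exactly the part I still need to verify, so treat this as a sketch rather than a working setup:

# perlbal.conf sketch -- unverified, placeholder hosts
CREATE POOL external_hosts
  POOL external_hosts ADD 192.168.0.11:8080
  POOL external_hosts ADD 192.168.0.12:8080

CREATE SERVICE balancer
  SET listen          = 127.0.0.1:8000
  SET role            = reverse_proxy
  SET pool            = external_hosts
  SET persist_client  = on
  SET persist_backend = on
  SET connect_ahead   = 2
ENABLE balancer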

... investigation continues.

Saturday, January 23, 2010

Reading List Updates.

I can't go home and not go to Powell's. I can't go to Powell's and not buy some books. I don't want to change either of those facts.

On my trip home over Thanksgiving I had time on my way out to look at books at the airport Powell's location. There were 5 employee recommendations in the business section, and I had trouble not buying them all. I came home with 4 books that day. Today I'm moving two of them into the reading list.

See if you can spot the hidden thread that ties these items together.

Present Like a PRO
"The field guide to mastering the art of business, professional, and public speaking."
I flipped through just a couple of pages and decided to stop flipping and move it directly into the buy pile. As soon as I finish this post I'm going to spend an hour curled up on the couch with it; I hope to have a review for you soon. It touches on all the parts of a business presentation, from the mechanics of audiovisuals (aka slides), to practicing, to connecting with your audience. I've heard it said that "the fear of public speaking is the number one fear of American adults." If it's not my number one fear, that is only because I have so many big fears. And I'm planning on knocking those fears off, one (or more) at a time.
How to Connect with Anyone
"96 All-New Little Tricks for Big Success in Relationships"
This was another recommended book that fell quickly into the buy pile. Flipping through the first couple of points, I could tell these were tips I could follow: each includes both the high-level concept, e.g. "extend eye contact," and specific steps to get there. A fake-it-till-you-make-it approach, of which I am so fond. I may be a "social engineer," as the marketing ladies so sweetly refer to me, but I'm still an engineer and a nerd. Make a game of it: "Look at their eyes. What color are they? Not just green, but green with little flecks of black and, is that gold? What is the shape? The ratio of height to width?" Now go practice. Practice on your friends. Practice in the mirror. Get to where it feels comfortable and you no longer need to play the games. At that point you can actually look into their eyes just to look at them, connect, and listen. 95 more tips to go.
Moose::Manual and Moose::Manual::Roles
I've let some of my basic Moose understanding sink in enough that it's time to branch into the concept of Moose roles. These are like helpers or mix-ins in other languages. Roles are not instantiated directly, but composed into another class. The role interface specifies a contract on both sides: the role must provide certain functions, and it can require that certain functions be present in the consuming class. Once composed, the role's functions are directly accessible in the class (a minimal sketch follows below). There are several roles in the Hadoop::Streaming distribution that I wish to extend. With my presentation around the corner (Wednesday), this will pop to the top of the reading list very soon.
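
Here's the minimal sketch I have in my head (toy classes, not taken from Hadoop::Streaming): a role that provides one method and requires another from whatever consumes it.

package Greeter;
use Moose::Role;

requires 'name';    # consuming class must provide name()

sub greet {         # provided to every class that does this role
    my $self = shift;
    return "Hello, " . $self->name;
}

package Person;
use Moose;
with 'Greeter';     # compose the role; greet() is now a Person method

has name => ( is => 'ro', isa => 'Str', required => 1 );

package main;
my $person = Person->new( name => 'Andrew' );
print $person->greet(), "\n";    # Hello, Andrew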

Tuesday, January 19, 2010

hadoop: restart the cluster and run a job

Following the steps in the previous hadoop post, I have a working single instance hadoop cluster on my laptop. Here's a refresher on restarting it and using it.

Start up Hadoop

#login as hadoop user:
sudo -i -u hadoop

#start cluster:
/usr/lib/hadoop/bin/start-all.sh

#check cluster is running via jps:
jps
30957 SecondaryNameNode
31046 JobTracker
30792 DataNode
30638 NameNode
31533 Jps
31205 TaskTracker
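
And to cover the "run a job" half of the title: with the daemons back up, re-running the wordcount example from the install post looks like this. Run it from the directory holding the examples jar; the output directory must not already exist, hence the new name (gutenberg-output2 is just a placeholder).

#still as the hadoop user:
hadoop dfs -ls gutenberg
hadoop jar hadoop-0.20.1+152-examples.jar wordcount gutenberg gutenberg-output2
hadoop dfs -cat gutenberg-output2/part-r-00000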

Upcoming Tech Events (Jan and Feb)

After the Christmas lull everything is going full bore again! Let's find out what fun tech events are coming up. Fun for the whole nerd family!

January

Jan 1-17

Saturday Jan 9: LiLAX Linux Users Group. Dan Kegel, Google Chrome engineer, talking about WINE (Wine Is Not an Emulator). Really wanted to hit this one; bummed I couldn't fit it in. I would have tried a little harder if I'd known Dan was speaking. Recurs on the second Saturday of the month.
Tue: Hadoop Meetup: quarterly community meeting
Bar Camp San Diego Jan 16-17. Any word on a new date for BarCampLA?

Jan 18-24

Happy MLK Day!
Tuesday: 1/19: LA Talks Tech, 7pm.
Thursday: mindshare.la: Mars, social cachet, and flexible reality.

Jan 25-31

Mon: 1/25: SXSW Interactive L.A. Networking Event 7-9pm. RSVP by Friday.
Wed: 1/27: Perl Mongers at Rubicon Project 7-9pm. Hadoop + Perl
Thu: 1/28: Twistup LA: come see the area's top startups.

February

Tue: Feb 9: Hadoop Meetup @mahalo.
Sat: Feb 13: LiLAX Linux Users Group (estimated; second Saturday of the month).
Feb 22: LA WebDev meetup @Santa Monica Public Library. First event of the year. I'll be on vacation. Someone tell me how it goes, thanks.

Looking for something less established, where you can get in on the ground floor? Have you considered CrashSpace.org, the new public hacking space in Culver City? Sean just got it opened in late December, and there are <100 members. Get in early. You can probably even still make your own event, like "Show and Tell Friday," "Take-Apart Tuesday," or "Sunday Craftday."

LA Perl Mongers - January 27, 2010

The LA.pm.org website[0] is (finally) updated with information about the next LA Perl Mongers meeting, to be held Wednesday January 27, 2010.

Our first presentation will be an introduction to using the streaming interface[1] to Hadoop[2] from Perl, using Hadoop::Streaming[3]. After a brief overview of Hadoop, the talk will focus on building streaming jobs and getting the necessary infrastructure in place to support them.

Our second presentation slot is open, ready for a volunteer. Otherwise I'll do a second presentation on packaging CPAN distributions using Dist::Zilla[4]. "Dist::Zilla - distribution builder; installer not included!"


[0] http://la.pm.org
[1] http://hadoop.apache.org/common/docs/r0.15.2/streaming.html
[2] http://hadoop.apache.org
[3] http://search.cpan.org/~spazm/Hadoop-Streaming-0.100060/
[4] http://search.cpan.org/dist/Dist-Zilla/

Tuesday, January 12, 2010

Hack day with Kenny: Fey::ORM, testing and screen.

After sleeping through the Linux@LAX users group meeting (sorry, guys), I rolled up to Kenny's (Kenny Flegal), where he had invited me for a day of coding and authentic Salvadoran food. Win-win!

I showed him briefly the topic of my upcoming Mongers presentation, but mostly we looked at his current project. He is forking a GPL-licensed project to recreate part of the functionality and extend it in a different direction. Along the way he's rewriting the app layer in Perl, replacing command-line PHP scripts.

We discussed the various clauses of the GNU Affero GPL with regard to hosting the project during the initial revs. Can he have a public repository before he has finished changing all references from the old name to the new name and adding "prominent notices stating that you modified it, and giving a relevant date" as per Section 5, paragraph a? We decided that he probably could, but that it'd be easier to start with a private repo and not publish until that part is done. That seems sub-optimal from a "getting the source to the people" mindset, but more optimal for protecting the good name of the original project and publishers. For a fork that won't follow upstream patches, does one just add a single prominent notice to that effect, like "forked from project XYZ on 2010-01-02"?

Along with switching from PHP to Perl, he's pulling the hard-coded SQL out of the scripts and moving to an ORM. He's picked Dave Rolsky's impressive Fey ORM. This project has a ridiculously complex set of schemas, with inconsistent table names and no explicit foreign key constraints. As such, it is extra work to get the Fey schema situated.

Kenny started to give me a run-through of some of the code, but it was awkward with both of us on laptops trying to see the code conveniently. I made him stop and set up a screen session for sharing, as described in my previous post on screen. This was more difficult than I expected; the problem eventually turned out to be that Ubuntu 9.04 and beyond has moved /usr/bin/screen to /usr/bin/screen.real and made screen a shell wrapper. The screen multiuser ACL system requires that the screen binary be setuid (chmod +s), so with this setup we needed to make screen.real setuid. That took a while to notice.

Once we had a shared session open, it was much easier for him to give me a guided tour of the codebase and database/SQL setup. Once that was clear, it was time to get some code started. He showed me some of the Fey::ORM model code and how he was migrating the individual SQL statements over to the ORM. He had been plugging away on the model code for a while, starting by creating a comment for every line of SQL in the application, including the file and line of the caller.

The next step was clear: we needed some tests. We set to work getting an initial test of the model code. First we installed Fey::ORM::Mock as a mock layer. This works at a higher level than a standard DBD::Mock interface, allowing better testing of the Fey::ORM features. The test didn't pass at first due to missing data in the mock object, so we grabbed a list of the fields that mapped to DB fields and started adding values to get past constraint failures on the data. Once we had a minimal set of data, we started to see problems with the ORM schema description: the lack of well-defined foreign key constraints meant we needed to define that structure explicitly for the ORM. More boilerplate code into the model. We repeated this test-and-update cycle a few more times, adding more data-linkage descriptions.

I took a brief break from our pairing and jumped to a different screen window to install some goodies. I grabbed a copy of the configuration files from the December la.pm.org talk and started updating his config. He didn't have a .vimrc, .vim, or .perltidyrc on this brand-new dev box, so I pulled those in from the repo. I showed him how much time using ":make" in vim could slice off his build/test cycle, and he was super excited. (OK, not until the third or fourth try, but he eventually got the hang of it.)

To get around some issues in code placement, I modified the .vimrc and .vim/ftplugin/compiler code to add -MFindBin::libs to the calls to perl -c and prove. This allowed the parent libs/ directory to be found for these non-installed modules. This is a bit of a hack and I'll get it removed as we move closer to an initial release and pick a packaging tool, possibly Dist::Zilla.

An open question is the speed of Fey::ORM. It takes a big startup hit while building the models from the schema and interacting with the database. This is supposed to lead to a big speed gain at runtime from aggressive caching of that information. All I know for certain is that the compile-run-test cycle was really slow. This is my first time using Fey, so I don't know how this plays out normally. It could just be that the number of cross-linked tables in the DB config was causing additional slowdown.

By this point we had already had two delicious Salvadoran meals and it was approaching midnight. The first was home-cooked fried (skinless) chicken for lunch, and the second was pupusas at an excellent local place in Van Nuys. I was all coded out, which made for a perfect transition to the party at Andy Bandit's that night, conveniently just 6 miles from Kenny's.

All in all, a fine Saturday.

Thursday, January 7, 2010

Perl Iron Man challenge -- is the cron still running?

Am I linking to the correct image files?
In Matt's announcement, he said to link to the images in http://ironman.enlightenedperl.org/munger/mybadge/male/. I see all of those images as having a modified date of 2009-10-03. Perhaps that is just a quirk of the linked files, as that is the modified date for the live image files. All of the CSV files seem to be really old too. Is that not the path that the CSV cron job is using?

Now I'm wondering if I really missed a day or two in there, pushing it to 10 days between posts while I was slacking (for weeks on end) by doing work-work instead of blogging, or if I really have hit IRON MAN status after 6 months of blogging. I thought today was 6 months, but it's actually 7 months! I still show up as Bronze Man status. I hope I didn't push too many 10-day windows in a row and risk missing the 4-posts-per-rolling-32-days requirement. (I don't think I knew that was a requirement before reading the rules just now. Well, I must have known it at some point, as I made a post about it many moons ago.)


[andrew@mini]% date -d '2009-06-06 + 7 months'
Wed Jan 6 00:00:00 PST 2010

Tuesday, January 5, 2010

CPAN upload, ONE!

My very first CPAN upload is happening *RIGHT NOW*. Go go gadget CPAN upload! And now off to sleep, really.


2010-01-05 13:26:15 $$23346 v1048: Info: Need to get uriid[S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz] (paused:333)
2010-01-05 13:26:15 $$23346 v1048: Info: Going to fetch uriid[S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz] (paused:619)
2010-01-05 13:26:15 $$23346 v1048: Info: Requesting a GET on uri [ftp://pause.perl.org/incoming/Hadoop-Streaming-0.100050.tar.gz] (paused:641)
2010-01-05 13:26:16 $$23346 v1048: Debug: ls[-rw-rw-r-- 1 root root 16457 2010-01-05 13:26 /home/ftp/tmp/S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz
]zcat[/bin/zcat]tpath[/home/ftp/tmp/S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz]ret[]stat[2057 470407906 33204 1 0 0 0 16457 1262694376 1262694375 1262694375 4096 40]: No child processes (paused:696)
2010-01-05 13:26:16 $$23346 v1048: Info: Got S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz (size 16457) (paused:492)
2010-01-05 13:26:18 $$23346 v1048: Info: Sent 'has entered' email about uriid[S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz] (paused:555)
2010-01-05 13:27:43 $$23346 v1048: Info: Verified S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz (paused:304)
2010-01-05 13:27:43 $$23346 v1048: Info: Started mldistwatch for lpath[/home/ftp/pub/PAUSE/authors/id/S/SP/SPAZM/Hadoop-Streaming-0.100050.tar.gz] with pid[24954] (paused:309)
2010-01-05 13:27:53 $$23346 v1048: Debug: Reaped child[24954] (paused:64)

I think it should show up soon on my CPAN page, or perhaps on the Hadoop::Streaming search.cpan page.

Hadoop Streaming - running a job

Logs from running a hadoop streaming job in a freshly set-up environment:
  1. copy input files
  2. run streaming jar (now located in contrib/streaming)
  3. hope.

1) copy input files to dfs:
hadoop dfs -copyFromLocal examples/wordcount wordcount

2) run streaming job:
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar \
-input wordcount \
-output wordcountout \
-mapper examples/wordcount/map.pl \
-reducer examples/wordcount/reduce.pl

packageJobJar: [/home/hadoop/tmp/hadoop-hadoop/hadoop-unjar5876487782773207253/] [] /tmp/streamjob4555454909817451366.jar tmpDir=null
10/01/05 03:29:34 INFO mapred.FileInputFormat: Total input paths to process : 1
10/01/05 03:29:35 INFO streaming.StreamJob: getLocalDirs(): [/home/hadoop/tmp/hadoop-hadoop/mapred/local]
10/01/05 03:29:35 INFO streaming.StreamJob: Running job: job_201001050303_0003
10/01/05 03:29:35 INFO streaming.StreamJob: To kill this job, run:
10/01/05 03:29:35 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201001050303_0003
10/01/05 03:29:35 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0003
10/01/05 03:29:36 INFO streaming.StreamJob: map 0% reduce 0%
10/01/05 03:30:19 INFO streaming.StreamJob: map 100% reduce 100%
10/01/05 03:30:19 INFO streaming.StreamJob: To kill this job, run:
10/01/05 03:30:19 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201001050303_0003
10/01/05 03:30:19 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0003
10/01/05 03:30:19 ERROR streaming.StreamJob: Job not Successful!
10/01/05 03:30:19 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Sigh. Failure. Time to debug the job. It definitely needs the -file flag to ship the executables to the remote machines:
-file map.pl
-file reduce.pl

Step back and run a simpler job.
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar -input wordcount -output wordcountout3 -mapper /bin/cat -reducer /bin/wc


10/01/05 03:36:17 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0004
10/01/05 03:36:17 ERROR streaming.StreamJob: Job not Successful!
10/01/05 03:36:17 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

This is the example code. Why did it fail again? Is there a problem with my input file?

I'll pick this back up tomorrow (or later). A little birdie just told me it was 3:40am. Well past my bedtime. I'd like to finish this up, but that's what I've been saying since I started at midnight. I have the Hadoop::Streaming::Mapper and ::Reducer modules all packaged up and ready to make an initial push to CPAN, but first I need to get an example running under hadoop. I did finish writing the tests for the non-hadoop case, and those are clean and ready. Feel free to follow along at my github repository.

Update

or maybe I'll just try a few more times...

hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-0.20.1+152-streaming.jar -input wordcount -output wordcountout7 -mapper map.pl -reducer reduce.pl -file examples/wordcount/map.pl -file examples/wordcount/reduce.pl


packageJobJar: [examples/wordcount/map.pl, examples/wordcount/reduce.pl, /home/hadoop/tmp/hadoop-hadoop/hadoop-unjar390944251948922559/] [] /tmp/streamjob7610913425753318391.jar tmpDir=null
10/01/05 03:59:11 INFO mapred.FileInputFormat: Total input paths to process : 1
10/01/05 03:59:11 INFO streaming.StreamJob: getLocalDirs(): [/home/hadoop/tmp/hadoop-hadoop/mapred/local]
10/01/05 03:59:11 INFO streaming.StreamJob: Running job: job_201001050303_0010
10/01/05 03:59:11 INFO streaming.StreamJob: To kill this job, run:
10/01/05 03:59:11 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201001050303_0010
10/01/05 03:59:11 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201001050303_0010
10/01/05 03:59:12 INFO streaming.StreamJob: map 0% reduce 0%
10/01/05 03:59:23 INFO streaming.StreamJob: map 100% reduce 0%
10/01/05 03:59:35 INFO streaming.StreamJob: map 100% reduce 100%
10/01/05 03:59:38 INFO streaming.StreamJob: Job complete: job_201001050303_0010
10/01/05 03:59:38 INFO streaming.StreamJob: Output: wordcountout7

Success!


hadoop dfs -ls wordcountout7;

Found 2 items
drwxr-xr-x - hadoop supergroup 0 2010-01-05 03:59 /user/hadoop/wordcountout7/_logs
-rw-r--r-- 1 hadoop supergroup 125 2010-01-05 03:59 /user/hadoop/wordcountout7/part-00000


hadoop dfs -cat wordcountout7/part*

apple 2
bar 2
baz 1
c 1
c++ 2
cpan 9
foo 2
haskell 4
lang 1
lisp 1
ocaml 2
orange 2
perl 9
python 1
ruby 4
scheme 1
search 1
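
For reference, a hand-rolled map.pl/reduce.pl pair for this kind of job looks roughly like the following. This is the "manual" streaming style (read STDIN, print key<TAB>value); the versions in my repo are built around Hadoop::Streaming, so treat this as a sketch of the idea rather than the exact scripts used above.

#!/usr/bin/env perl
# map.pl -- emit "word<TAB>1" for every word on stdin
use strict;
use warnings;

while ( my $line = <STDIN> ) {
    chomp $line;
    print "$_\t1\n" for grep { length } split /\s+/, $line;
}

#!/usr/bin/env perl
# reduce.pl -- input arrives sorted by key, so sum runs of identical words
use strict;
use warnings;

my ( $current, $count ) = ( undef, 0 );
while ( my $line = <STDIN> ) {
    chomp $line;
    my ( $word, $n ) = split /\t/, $line;
    if ( defined $current && $word ne $current ) {
        print "$current\t$count\n";
        $count = 0;
    }
    $current = $word;
    $count += $n;
}
print "$current\t$count\n" if defined $current;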

Monday, December 28, 2009

Dist::Zilla -- part 1

Inspired by RJBS's advent article on Dist::Zilla, I'm getting ready to give it a spin.

Install Dist::Zilla

Installing a bundle worked fine, but didn't bring in Dist::Zilla itself:

% cpan Dist::Zilla::PluginBundle::Git

...
JQUELIN/Dist-Zilla-Plugin-Git-1.093410.tar.gz
./Build install -- OK

Attempting to install the base Dist::Zilla failed:


% cpan Dist::Zilla

...
Running make test
Has already been tested successfully
Running make install
Already tried without success

Cleaning my .cpan/build directory and trying again.

Before cleaning up the files, I checked the Makefile for the prereqs to see if I could narrow down the issue. I then cleared out my build space, manually installed the prerequisites, and installed Dist::Zilla. This worked.

[andrew@mini]% perl Makefile.PL 0 ~/.cpan/build/Dist-Zilla-1.093400-1o8qqf
Warning: prerequisite Config::INI::MVP::Reader 0.024 not found.
Warning: prerequisite Config::MVP 0.092990 not found.
Warning: prerequisite Hash::Merge::Simple 0 not found.
Warning: prerequisite Moose::Autobox 0.09 not found.
Warning: prerequisite MooseX::Types::Path::Class 0 not found.
Warning: prerequisite PPI 0 not found.
Warning: prerequisite String::Flogger 1 not found.
Warning: prerequisite namespace::autoclean 0 not found.
Writing Makefile for Dist::Zilla

[andrew@mini]% rm -rf ~/.cpan/build/*

[andrew@mini]% cpan Config::INI::MVP::Reader Config::MVP Hash::Merge::Simple Moose::Autobox MooseX::Types::Path::Class PPI String::Flogger namespace::autoclean

...[snip]...[this brought in a lot of deps]
/usr/bin/make install -- OK

[andrew@mini]% cpan Dist::Zilla

Installing /apps/perl5/bin/dzil
Appending installation info to /apps/perl5/lib/perl5/i486-linux-gnu-thread-multi/perllocal.pod
RJBS/Dist-Zilla-1.093400.tar.gz
/usr/bin/make install -- OK

And if I'm going to cargo cult from RJBS and use his tool, then I might as well go all the way by installing the RJBS plugin bundle.

cpan Dist::Zilla::PluginBundle::RJBS

Now what? Using Dist::Zilla


% dzil new My-Project
will create new dist My-Project in obj(/home/andrew/src/My-Project)
$VAR1 = {};
% cd My-Project
% ls
dist.ini
% cat dist.ini
name = My-Project
version = 1.000
author = andrew
license = Perl_5
copyright_holder = andrew

[@Classic]

% mkdir lib t

Now, create a stub module in lib/My/Project.pm, something like this (copied straight from the quoted article):

use strict;
package My::Project;
# ABSTRACT: our top-secret project for playing bowling against WOPR

use Games::Bowling::Scorecard;
use Games::War::Nuclear::Thermonuclear::Global;
use Path::Resolver 2.012;

=method play_a_game

$project->play_a_game($num_of_players);

This method starts a game. It's a strange game.

=cut

sub play_a_game { ... }

1;

The # ABSTRACT comment will be pulled out and used as metadata.

Now, let's build the module:


% dzil build
...
beginning to build My-Project
guessing dist's main_module is lib/My/Project.pm
extracting distribution abstract from lib/My/Project.pm
couldn't find a place to insert VERSION section to lib/My/Project.pm
rewriting release test xt/release/pod-coverage.t
rewriting release test xt/release/pod-syntax.t
writing My-Project in My-Project-1.000
writing archive to My-Project-1.000.tar.gz
And now take a look at what it built:


% find My-Project-1.000
My-Project-1.000
My-Project-1.000/Makefile.PL
My-Project-1.000/t
My-Project-1.000/t/release-pod-syntax.t
My-Project-1.000/t/release-pod-coverage.t
My-Project-1.000/dist.ini
My-Project-1.000/README
My-Project-1.000/LICENSE
My-Project-1.000/META.yml
My-Project-1.000/lib
My-Project-1.000/lib/My
My-Project-1.000/lib/My/Project.pm
My-Project-1.000/MANIFEST

This created the META.yml file, built the MANIFEST, created two additional tests (release-pod-syntax and release-pod-coverage), built a README, and copied in the correct LICENSE file. Then it tarred it all up for me. Excellent.

There are additional plugins that can be used within the dist.ini file. [@Git] will verify that all the files are checked into git before doing the build. [@RJBS] will pull in all the steps RJBS normally uses for a module. Searching for Dist::Zilla::Plugin on CPAN produces six pages of results.
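
For example, switching the generated dist.ini over to those bundles should be as small an edit as this (I haven't run this variant yet, so consider it untested):

name = My-Project
version = 1.000
author = andrew
license = Perl_5
copyright_holder = andrew

[@RJBS]
[@Git]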

I'll post an update as I work on using dzil for a real module, and let you know how it goes. So far, I'm pretty excited at keeping my code and boilerplate separated.

Saturday, December 26, 2009

More Perl Advent Calendars

After I finished my post on perl advent calendars I stayed up for two more hours and finished reading all of RJBS's calendar. A midnight to 2am well spent.

Working through the perl advent calendar, I found links to a few more.

Plack Advent Calendar: miyagawa walking us through the new hawtness that is PSGI & Plack.
PSGI, Plack
Runs us from day 1 of installing Plack through creating apps, multiple apps living on one install, and beyond. A nice addition to the Plack documentation. We should all learn this one and start building against the PSGI specification so our webapps can be deployed across a range of server types.

Perl-users.jp has several advent calendars up: hacker, casual, dbix-skinny, data-model.
They are all written in Japanese, but the code snippets are in Perl.

Day 2 of the hacker track has a nice piece on opts, a DSL (domain-specific language) for command-line parsing that is a nice wrapper around Getopt::Long.
use opts;

opts my $foo => { isa => 'Str', default => 'bar' },
     my $bar => { isa => 'Int', required => 1, alias => 'x' };

Merry Christmas indeed!

installing hadoop on ubuntu karmic

Mixing and matching a couple of guides, I've installed a local hadoop instance on my netbook. Here are my notes from the install process.

I'll refer to the guides by number later. Doc 1 is the current #1 hit for 'ubuntu hadoop' on google, so it seemed a good spot to start.

Documents:

  1. http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
  2. http://archive.cloudera.com/docs/_apt.html
  3. http://github.com/spazm/config/tree/master/hadoop/conf/

1) created a hadoop user and group, as per document 1. Also an ssh key for the hadoop user (currently passwordless; I'll revisit that soon).

2) added the jaunty-testing repo from Cloudera, see doc 2. They don't have a karmic package yet. Add /etc/apt/sources.list.d/cloudera.list:


#deb http://archive.cloudera.com/debian karmic-testing contrib
#deb-src http://archive.cloudera.com/debian karmic-testing contrib
#no packages for karmic yet, trying jaunty-testing, jaunty-stable, jaunty-cdh1 or jaunty-cdh2
deb http://archive.cloudera.com/debian jaunty-testing contrib
deb-src http://archive.cloudera.com/debian jaunty-testing contrib

3) install hadoop:

[andrew@mini]% sudo aptitude install hadoop                                                                   0 ~/src
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Reading extended state information      
Initializing package states... Done
"hadoop" is a virtual package provided by:
  hadoop-0.20 hadoop-0.18 
You must choose one to install.
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Reading extended state information      
Initializing package states... Done

3b) sudo aptitude update, sudo aptitude install hadoop-0.20

[andrew@mini]% sudo aptitude install hadoop-0.20                                                              0 ~/src
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Reading extended state information      
Initializing package states... Done
The following NEW packages will be installed:
  hadoop-0.20 hadoop-0.20-native{a} 
0 packages upgraded, 2 newly installed, 0 to remove and 25 not upgraded.
Need to get 20.1MB of archives. After unpacking 41.9MB will be used.
Do you want to continue? [Y/n/?] Y
Writing extended state information... Done
[... snip ...]
Initializing package states... Done
Writing extended state information... Done
4) this has set up our config information in /etc/hadoop-0.20, also symlinked as /etc/hadoop/
hadoop-env.sh is loaded from /etc/hadoop/conf/hadoop-env.sh (aka /etc/hadoop-0.20/conf.empty/hadoop-env.sh)

Modify hadoop-env.sh to point to our JVM. Since I installed Sun Java 1.6 (aka Java 6), I updated it to: export JAVA_HOME=/usr/lib/jvm/java-6-sun

5) update the rest of the configs.
Snapshotted conf.empty to ~/config/hadoop/conf and started making edits, as per doc 1. Symlinked into /etc/hadoop/conf.

Files are available at document #3, my github config project, hadoop/conf subdir.

6) switch to hadoop user
sudo -i -u hadoop

7) initialize HDFS (as hadoop user)
mkdir ~hadoop/tmp
chmod a+rwx ~hadoop/tmp
hadoop namenode -format

8) fire it up: (as hadoop user)

/usr/lib/hadoop/bin/start-all.sh
hadoop@mini:/usr/lib/hadoop/logs$ /usr/lib/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hadoop@mini:/usr/lib/hadoop/logs$ /usr/lib/hadoop/bin/start-all.sh
starting namenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-namenode-mini.out
localhost: starting datanode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-datanode-mini.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-mini.out
starting jobtracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-jobtracker-mini.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-tasktracker-mini.out

9) Check that it is running via jps
hadoop@mini:/usr/lib/hadoop/logs$ jps
12001 NameNode
12166 DataNode
12684 Jps
12568 TaskTracker
12409 JobTracker
12332 SecondaryNameNode

(Note to self: why don't we have hadoop completion in zsh? Must rectify.)

10) Run an example. See doc 1:
hadoop jar hadoop-0.20.0-examples.jar wordcount gutenberg gutenberg-output

hadoop@mini:~/install$ hadoop jar hadoop-0.20.1+152-examples.jar wordcount gutenberg gutenberg-output
09/12/25 23:24:19 INFO input.FileInputFormat: Total input paths to process : 3
09/12/25 23:24:20 INFO mapred.JobClient: Running job: job_200912252310_0001
09/12/25 23:24:21 INFO mapred.JobClient:  map 0% reduce 0%
09/12/25 23:24:33 INFO mapred.JobClient:  map 66% reduce 0%
09/12/25 23:24:39 INFO mapred.JobClient:  map 100% reduce 0%
09/12/25 23:24:42 INFO mapred.JobClient:  map 100% reduce 33%
09/12/25 23:24:48 INFO mapred.JobClient:  map 100% reduce 100%
09/12/25 23:24:50 INFO mapred.JobClient: Job complete: job_200912252310_0001
...

hadoop@mini:~/install$ hadoop dfs -ls gutenberg-output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2009-12-25 23:24 /user/hadoop/gutenberg-output/_logs
-rw-r--r--   1 hadoop supergroup      21356 2009-12-25 23:24 /user/hadoop/gutenberg-output/part-r-00000

It Lives!

Wednesday, December 23, 2009

Perl Advent Calendar(s)

It's that time of year again! OK, I'm 23 days late for the start, but I'm still within the valid range. Perl Advent Calendars!

I just stumbled across RJBS's Perl Advent Calendar. The article for today (2009-12-24) is on Email::Sender::Simple. This is a great module. Seriously, if you are doing any email sending, this is what you should use.

I found and used this module over this past summer, while updating some legacy PHP code that included an emailed report. I was not looking forward to reimplementing the email-sending routines in my Perl version, but then I found Email::Sender::Simple, and it is (as you'd expect from the name) super simple. Unless you want it to be more complex, and then it'll be more complex.
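
For reference, the basic usage really is about this small (the addresses and body are placeholders):

use Email::Sender::Simple qw(sendmail);
use Email::Simple;
use Email::Simple::Creator;

my $email = Email::Simple->create(
    header => [
        To      => 'reports@example.com',
        From    => 'cron@example.com',
        Subject => 'nightly report',
    ],
    body => "report body goes here\n",
);

sendmail($email);    # throws an exception on failure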

This whole advent calendar is pretty cool, because it is all stuff he's been working on. A brief moment of webstalking, and now I know who RJBS is. He wrote jgal, an igal clone, way back when. I should have an email around here from when I first switched from igal to jgal. That must have been a long time ago... it was, among other things, PRE-FLICKR! Hmm, scratch that: it seems I was confusing jgal with jigl, both of which were igal-inspired. I did just find an email from Oct 2003 personally announcing jigl v2.0 (Jason's Image Gallery).

Some perl advent calendars:

Saturday, December 19, 2009

Algorithms

In a fit of inspiration, I flipped through my videos on my XBMC this morning and realized that I still have a copy of "MIT: Introduction to Algorithms" recorded in 2001. This morning I've been watching episode 3, "Divide and Conquer", recorded 9/12/2001.

It's interesting to see the people reacting to 9/11, when it was still a "great tragedy" before it became "terrible attacks," when the call was "Feel this pain, yes, but get out and keep making the world better. That's what we must do after an event like this." And then right back into math and CS. I'm glad they didn't have any vignette on "divide and conquer" as practiced by the Romans or British (I see that they did make a comment like that in the 2005 lesson).

This is a nice refresher. It's been a long time since I've done formal analysis of algorithm run-times beyond "back of the envelope" estimations. If the cost of splitting a problem in two is small compared to the speed/complexity gain of solving two half-size problems, then the split is a win. See also map-reduce et al.
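
The canonical example: merge sort splits the list in half, recurses, and pays O(n) to merge, so T(n) = 2T(n/2) + O(n), which works out to O(n log n). A quick Perl version of the idea:

sub merge_sort {
    my @list = @_;
    return @list if @list <= 1;    # base case: already sorted

    my $mid   = int( @list / 2 );
    my @left  = merge_sort( @list[ 0 .. $mid - 1 ] );     # divide
    my @right = merge_sort( @list[ $mid .. $#list ] );    # divide

    my @merged;                                           # conquer: O(n) merge
    while ( @left && @right ) {
        push @merged, ( $left[0] <= $right[0] ? shift @left : shift @right );
    }
    return ( @merged, @left, @right );    # one side may still have leftovers
}

print join( ' ', merge_sort( 33, 5, 8, 1, 21, 2 ) ), "\n";    # 1 2 5 8 21 33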

Perhaps it is also time to go through "Algorithms in Perl" and update it for "modern perl"isms? That'd make an interesting blog thread.
A recording from 2005 is available on video.google.com and on youtube.

Peteris Krumins watched and blogged about all of these episodes. In fact, I think he's the one who transcoded them from RealMedia and posted them on video.google.com. Thanks for releasing these under a CC license, MIT!
http://www.catonmat.net/blog/summary-of-mit-introduction-to-algorithms/

Friday, December 11, 2009

Dec Perl Mongers was a blast

We'll have the slides up soon along with some Perl+VIM tips. Really, I mean it this time. Do you have some favorite perl/shell/vim integration tips you'd like to pass along?

In the meantime, you can follow along in the git repository, including the drama where, 10 minutes before presentation time, I merged in changes from my cohort that deleted 90% of the files and then blindly pushed them up to the public master.

You can look directly at the Vroom slides at http://github.com/spazm/config/blob/master/slides/slides.vroom. I'll render them to HTML and push them somewhere for viewing.

Omnicompletion actually worked during my talk/demo. I think that's the first time I've ever had it work. Totally exciting. While preparing the slides, I found that I was missing a configuration option in my .vimrc, so my ftplugin/perl directory was getting skipped.

Tuesday, December 1, 2009

plack, is it really this easy?

I just wrote my first Plack-based prototype, built around Plack::Request. Plack::Request isn't normally used directly, but I'm using it here for a simple prototype to run a feasibility test for a proposed new project.

I took a coworker's existing MyEngine ModPerl2 prototype and refactored it into MyEnginePrototype.pm and MyEngineLogic.pm. Then I reused MyEngineLogic.pm in the MyEnginePlackPrototype.pm listed below. (In between, I added some tests for MyEngineLogic.pm; yes, it is all currently boilerplate scaffolding, but that doesn't mean it shouldn't be tested.)

All this server has to do is take an incoming request, pass it to a function from the (business) Logic module to get a list of possible responses, call a helper from the Logic module to pick the winner, and finally return the winner to the caller as JSON. Picking the possible responses can be time-consuming and will be capped somewhere in the 100-250ms range, none of which matters yet outside of the business logic. That will get interesting when I look into parallelizing the possible-responses code (likely moving from prefork to Coro as the Plack server to accommodate this). A rough sketch of the shape follows below.
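
This is not the actual MyEnginePlackPrototype.pm, just a minimal sketch of its shape; pick_candidates() and pick_winner() are hypothetical stand-ins for the MyEngineLogic functions:

# minimal Plack::Request app sketch (stand-in logic, not the real MyEngineLogic)
use strict;
use warnings;
use Plack::Request;
use JSON;

sub pick_candidates {    # stand-in: the real version fans out requests to n hosts
    return ( { id => 1, score => rand }, { id => 2, score => rand } );
}

sub pick_winner {        # stand-in: pick the highest-scoring candidate
    my @sorted = sort { $b->{score} <=> $a->{score} } @_;
    return $sorted[0];
}

my $app = sub {
    my $env = shift;
    my $req = Plack::Request->new($env);

    my @candidates = pick_candidates( $req->parameters );
    my $winner     = pick_winner(@candidates);

    my $res = $req->new_response(200);
    $res->content_type('application/json');
    $res->body( encode_json($winner) );
    return $res->finalize;
};

$app;    # plackup uses the file's return value as the PSGI app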

Next up, I'll benchmark this on a production-caliber machine to test for overhead, latency, throughput, and max connections. These numbers will be upper bounds, since the business logic is mostly empty scaffolding at this point. Testing will be against both the Standalone::Prefork and Coro Plack server backends.

Running the server via Standalone::Prefork server:
plackup --server Standalone::Prefork --port 8080 MyEnginePlackPrototype.pm
Running the server via Coro server:
plackup --server Coro --port 8080 MyEnginePlackPrototype.pm
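
For the first overhead pass, ApacheBench is probably enough to get rough latency and throughput numbers (the URL here is a placeholder; the real test will use the actual request format):
ab -k -n 10000 -c 100 http://localhost:8080/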