Saturday, December 31, 2011

28c3: Effective Denial of Service attacks against web application platforms

Synopsis

Hour-long presentation from the 28C3 security conference on hash-collision attacks against web apps. The core idea: most web stacks automatically parse the query params into a hash of key-value pairs, so by sending keys engineered to collide, an attacker can waste huge amounts of CPU time with a single HTTP POST. In languages that don't randomly perturb their hashing functions (mostly DJBX33A or DJBX33X variants), collisions can be easily found and exploited.

This affects node.js/V8, PHP, Python, ASP, Ruby, Java, and others. Only perl (5.8.1, ~2003) and CRuby (1.9, ~2008) ship randomized hashing functions. The V8 devs are unconcerned "because v8 is a client side language," which caused a bit of alarm for the node.js folks.
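To make the mechanism concrete, here's a minimal illustrative Perl sketch of DJBX33A (the times-33 hash PHP uses -- not Perl's own hash function). The two-byte blocks "Ez" and "FY" genuinely collide under it, and concatenations of colliding blocks also collide:

#!/usr/bin/perl
# Sketch of DJBX33A, the times-33 hash used by PHP (not Perl's own hash).
use strict;
use warnings;

sub djbx33a {
    my ($str) = @_;
    my $h = 5381;
    $h = ( $h * 33 + ord $_ ) & 0xFFFFFFFF for split //, $str;
    return $h;
}

# "Ez" and "FY" hash identically (33*69+122 == 33*70+89), and any
# concatenation of colliding blocks collides too -- n blocks yield
# 2**n colliding keys, which is the attacker's POST payload.
printf "%-4s => %u\n", $_, djbx33a($_) for qw(EzEz EzFY FYEz FYFY);

All four keys print the same value; scale the block count up and every insert lands in one bucket, degrading the hash table into a linked list.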

The presenters started their investigation after reading a mention in the perl security faq (perldoc perlsec), in the section Algorithmic Complexity Attacks, that before 5.8.1 perl had a security flaw where hash collisions could be exploited. 5.8.1 was released in 2003.

           In Perls before 5.8.1 one could rather easily generate data that as
           hash keys would cause Perl to consume large amounts of time because
           internal structure of hashes would badly degenerate.  In Perl 5.8.1
           the hash function is randomly perturbed by a pseudorandom seed
           which makes generating such naughty hash keys harder.  See
           "PERL_HASH_SEED" in perlrun for more information.
Thread from the node.js mailing list: "HOLY CRAP. nearly all nodejs http servers are vulnerable to DoS and apparently, the V8 guys seem to not care much"


Friday, December 30, 2011

Talent, Bias and Diversity

Inc.com has a nice article on talent: You Can't Predict Talent; Foster It. The author's tips for fostering talent: an open atmosphere, extravagant diversity, time didn't matter, and stretch goals were just the start. The article leads off with this quote:
In his recent book Thinking Fast and Slow, behavioral economist Daniel Kahneman tells the story of observing army recruits out on exercises and his belief that he could spot the potential leaders amongst them. Years later, it turned out he'd been almost entirely wrong. His confident judgment had been a morass of bias, heuristics, and narrative fallacies.

This got me thinking about a more-or-less complete non sequitur, vaguely related to open atmosphere and extravagant diversity.

As humans, we have brains built for pattern matching, so we find patterns. We apply these patterns all day long into systems of heuristics: "have I been in this position before? What did I do then? Did it work?" Actions that match our heuristics feel right because they work as expected and match previous experience.

We're making more of these patterns every day, and we are awesome in our ability to match situations against our patterns.

Problem: we're not good at evaluating whether our past decisions were correct. This introduces systemic bias. Bias expands directly with the homogeneity of your peer group.

Fixes? Think different. Use DATA to evaluate your decisions. Interact outside of your comfortable peer group. Be aware that you're using shortcuts, and take the long way every so often to see if it really is longer. Think.

Thursday, December 1, 2011

Perl Advent Calendars, 2011 edition.

It's Advent Calendar time in the perl ecosystem! Start each day with a delicious treat of knowledge.

I've found a half dozen English-language perl advent calendars, starting with the original Perl Advent Calendar. For extra fun I've included the Japanese-language calendars as well -- I can still read the perl; it's just the prose that is lost in translation.

Ricardo (RJBS) has taken over the Perl Advent calendar this year, which is awesome. Sadly, that means he won't be doing his own "month of rjbs" calendar. I've added a link to his 2010 calendar, in case you missed it the first time around. He's starting the month with Day 1: cpanm and local::lib.

For a second year, Miyagawa has skipped updating the Plack advent calendar. Check out the 2009 edition linked below. He has given us plenty of other presents this year: Carton, etc.

Perl Advent
http://perladvent.org/2011/
(Formerly the perladvent.pm.org calendar)
Perl Dancer -- the dancer mini web framework
http://advent.perldancer.org/2011/
Catalyst Advent Calendar -- The Catalyst Web Framework
http://www.catalystframework.org/calendar/
Perl 6
http://perl6advent.wordpress.com/
For the adventurous: Japanese Perl Advent Calendars, 9 different tracks!
http://perl-users.jp/articles/advent-calendar/2011/
AnySan Track
Casual Track
dbix Track
English Track
Hacker Track
Test Track
Acme Track
Teng Track
Amon2 Track

Ricardo's 2010 advent calendar -- a month of RJBS
http://advent.rjbs.manxome.org/2010/
2009 Plack calendar
http://advent.plackperl.org/

One bonus list, for the sysadmin in your life:

SysAdvent - The Sysadmin Advent Calendar.
http://sysadvent.blogspot.com/
Evil: If I were creating the world I wouldn't mess about with butterflies and daffodils. I would have started with lasers, eight o'clock, Day One!
-- Time Bandits

Saturday, November 12, 2011

Git::CPAN::Patch

Just saw an announcement of Git::CPAN::Patch over at Yanick's blog.

This is an awesome tool for creating patches for other people's modules (OPM(tm)). It creates a git repo from the current sources so you can get straight to patching. New in v0.7.0: if the module has a public repo, it'll pull directly from that. Rockstar!

Git::CPAN::Patch could already seed a local repository with the latest distribution of a module, or its whole BackPAN history, or its GitPAN mirror. But with version 0.7.0, it can now go straight for the meat and clone the distribution's official git repository, provided that it's specified in its META.json or META.yml.
-- Read more: http://babyl.dyndns.org/techblog/entry/new-and-improved-git-cpan-patch-0.7.0

It provides two tools: git cpan-sources and git cpan-init.

HackDay! 11/12/11

David (DDubs!) came over for a HACKDAY today. Fun times!

He's in a maze of twisty little passages, all alike. He's installing SVN on localhost, with the full WebDAV setup. Why? So he can practice migrating SVN to git. Yes, crazy land. I'm looking forward more to the actual project, a Bayesian classifier to practice with Moose.

I've updated my App::PM::Website tool for maintaining the Los Angeles Perl Mongers website. It now uses Lingua::EN::Numbers::Ordinate to render dates like "Wednesday the 7th" instead of "Wednesday the 7." That's one more piece of manual hard coding replaced with software. Woot! Come on out on December 7th for Mike's talk on VOIP!
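For reference, the module's ordinate() does the suffix work; a minimal sketch (the day value is a hypothetical example):

use Lingua::EN::Numbers::Ordinate qw(ordinate);

my $day = 7;                                    # hypothetical day-of-month
print "Wednesday the ", ordinate($day), "\n";   # Wednesday the 7th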

I really do want to get this code into a state where it's releasable to CPAN and usable by other monger groups. Maybe by making my vapor App::PM::Toolbox into reality? Why am I so tentative about releasing that?

I also updated and released v0.113160 of the Hadoop::Streaming perl module, mostly to use Any::Moose. I made the changes a few months ago at a user's request, sent him a beta version for testing, and it got lost in the weeds. I merged the code over today from feature/mouse and pushed it out. While I was in there, I updated the documentation in the main package to show how to use the '-archive' flag to hadoop.
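As a reminder of what the module gives you, here's a minimal word-count mapper along the lines of the Hadoop::Streaming docs (the package name is mine):

#!/usr/bin/perl
package WordCount::Mapper;
use Moose;
with 'Hadoop::Streaming::Mapper';

# Called once per input line; emit key/value pairs for the reducer.
sub map {
    my ( $self, $line ) = @_;
    $self->emit( $_ => 1 ) for split /\s+/, $line;
}

package main;
WordCount::Mapper->run;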

I love typing 'dzil release' and having my Changes file updated and checked into git, the release tagged, the distribution built, and the bundle pushed to PAUSE for CPAN.

If I had checked github first, I would have seen this lovely change request waiting where a user had made the Any::Moose conversion for me. My first incoming github change request. ROCKSTAR!

This post brought to you by the number 30 and the letter D (eltron).

2011-11-13 00:52:18 $$5747 v1049: Info: Need to get uriid[S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz] (paused:337)
2011-11-13 00:52:18 $$5747 v1049: Info: Going to fetch uriid[S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz] (paused:625)
2011-11-13 00:52:18 $$5747 v1049: Info: Requesting a GET on uri [ftp://pause.perl.org/incoming/Hadoop-Streaming-0.113160.tar.gz] (paused:647)
2011-11-13 00:52:19 $$5747 v1049: Info: renamed '/home/ftp/tmp/S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz' to '/home/ftp/pub/PAUSE/authors/id/S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz' (paused:760)
2011-11-13 00:52:19 $$5747 v1049: Info: Got S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz (size 28088) (paused:496)
2011-11-13 00:52:20 $$5747 v1049: Info: Sent 'has entered' email about uriid[S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz] (paused:561)
2011-11-13 00:53:46 $$5747 v1049: Info: Verified S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz (paused:308)
2011-11-13 00:53:46 $$5747 v1049: Info: Started mldistwatch for lpath[/home/ftp/pub/PAUSE/authors/id/S/SP/SPAZM/Hadoop-Streaming-0.113160.tar.gz] with pid[11168] (paused:313)

Wednesday, October 26, 2011

Perl for newbies

I've heard the tutorials at Perl-Begin.org are a good way to get started with perl. Current, modern perl. Not some 15-year-old script kiddie intro.

Speaking of which, it might be interesting to make a learnperlthehardway book, à la Write Your Own LxTHW and the companion site LearnCodeTheHardWay.org.

Sunday, October 16, 2011

Homework!

It's been a long time since I've had schoolwork. Some habits die hard -- here I am 1 week in and rushing/procrastinating/just-plain-late with my first assignments!

I'm taking two courses at Stanford this fall. Maybe you know someone else in one of them? There are one hundred thousand people in my AI class. 100,000. Even our largest freshman intro courses at Caltech had fewer than 215 people, so this is nearly three decimal orders of magnitude larger. Wow. I'm not sure how many people are signed up for the Machine Learning class, but it's probably similarly large.

Back to work!

Saturday, October 15, 2011

Author dependencies in Dist::Zilla

Wanna edit/tweak/build/play with a Dist::Zilla based perl module you've checked out? Does it seem daunting because "the module installer isn't included," or "what if I don't have the same helper modules that the author uses?" Worry not!

It's easy to get the minimal pieces installed, so let's get to it.

Basic steps for building/using a Dist::Zilla based module from raw source:

  1. Check out module source
  2. install Dist::Zilla:
    cpanm Dist::Zilla
  3. install author deps:
    dzil authordeps | cpanm
  4. install module deps:
    dzil listdeps | cpanm
  5. build module with dzil:
    dzil build

1. Check out module source

For my example, I'm migrating my own App::PM::Website sources from an old laptop to a new one.

Looking for example code to checkout? Try searching for dist.ini on github to find an interesting perl module. ;)

% git clone git@github.com:spazm/app-pm-website 
% cd app-pm-website

Check the code out of the repository and cd into the top level.

2. Install Dist::Zilla

On this relatively clean perl 5.12.3 install, cpanm Dist::Zilla brought in 91 distributions.
% cpanm Dist::Zilla
--> Working on Dist::Zilla
Fetching http://search.cpan.org/CPAN/authors/id/R/RJ/RJBS/Dist-Zilla-4.300002.tar.gz ... OK
[... snip ...]
Building and testing Dist-Zilla-4.300002 ... OK
Successfully installed Dist-Zilla-4.300002
91 distributions installed

3. Install author dependencies

dzil authordeps will show the modules necessary for Dist::Zilla to build the module from raw source into a built module. Pipe this to cpanm to install the modules.
% dzil authordeps
Dist::Zilla::Plugin::MetaResources
Dist::Zilla::Plugin::AutoPrereqs
Dist::Zilla::Plugin::Repository
Dist::Zilla::Plugin::NextRelease
Dist::Zilla::PluginBundle::Basic
Dist::Zilla::Plugin::AutoVersion
Dist::Zilla::PluginBundle::Git
Dist::Zilla::Plugin::PkgVersion
Dist::Zilla::Plugin::MetaJSON
Dist::Zilla::Plugin::PodWeaver

% dzil authordeps | cpanm
Dist::Zilla::Plugin::MetaResources is up to date. (4.300002)
--> Working on Dist::Zilla::Plugin::Repository
[... snip ...]
Successfully installed Dist-Zilla-Plugin-PodWeaver-3.101641
11 distributions installed

4. Install module dependencies

Install the authordeps before module dependencies, in case authordeps are required for dzil to calculate the module dependencies. E.g. I needed PodWeaver installed via authordeps before I could run dzil listdeps to see the module dependencies.
% dzil listdeps
App::Cmd
App::Cmd::Command
App::Cmd::Tester
base
Config::YAML
Data::Dumper
Date::Parse
DateTime
DateTime::Format::Strptime
ExtUtils::MakeMaker
HTTP::DAV
Net::Netrc
POSIX
strict
Template
Test::Class
Test::Class::Load
Test::More
warnings

% dzil listdeps | cpanm
App::Cmd is up to date. (0.312)
base is up to date. (2.15)
--> Working on Config::YAML
[...snip...]
Successfully installed Test-Class-0.36
Test::More is up to date. (0.98)
10 distributions installed

5. Build module

dzil build will build the module into a directory and tar it up ready for cpan.

Similarly, dzil test will build the code and run the tests; you'll use this to verify your changes to the target module.

% dzil build
[DZ] beginning to build App-PM-Website
[DZ] guessing dist's main_module is lib/App/PM/Website.pm
[DZ] extracting distribution abstract from lib/App/PM/Website.pm
[DZ] writing App-PM-Website in App-PM-Website-0.112890
[DZ] building archive with Archive::Tar; install Archive::Tar::Wrapper for improved speed
[DZ] writing archive to App-PM-Website-0.112890.tar.gz
And now I can get back to the task at hand: improving this module. I'll let you know how that goes, too.

Friday, September 16, 2011

Fix linux DNS issues with .local addresses on MS domain

B.L.U.F.:

Microsoft recommends .local as the root of internal domains and serves those names via unicast DNS. Linux uses .local as the root of multicast DNS. If you're stuck on a broken MS network like this, reconfigure your linux multicast DNS to use a different domain, like .alocal.

To do this, add a "domain-name=.alocal" line to the "[server]" section of "/etc/avahi/avahi-daemon.conf", then restart avahi-daemon: "sudo service avahi-daemon restart".

#/etc/avahi/avahi-daemon.conf
[server]
domain-name=.alocal

You may need to flush the DNS, mDNS, and resolver caches, as well as restart your web browsers to clear their internal caches.

Background.

I was seeing the strangest behavior on my work linux box. I could look up local addresses, but not contact them in my browser. Turns out I could look them up but not ping them, either.
% host foo
foo.corp.local is an alias for bar.corp.local
bar.corp.local has address 10.1.2.3

% host foo.corp.local
foo.corp.local is an alias for bar.corp.local
bar.corp.local has address 10.1.2.3

% ping foo -q -c 1
PING bar.corp.local (10.1.2.3) 56(84) bytes of data.

--- bar.corp.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms

% ping foo.corp.local
unknown host foo.corp.local
I spent a while thinking this was a resolver issue in /etc/resolv.conf, since I knew that was getting modified by the VPN. Everything was fine in the resolver. What I'd forgotten about was /etc/nsswitch.conf! The hosts line in /etc/nsswitch.conf put mdns4_minimal before dns AND set a reply of "NOTFOUND" from mdns to propagate back directly without hitting DNS.
# /etc/nsswitch.conf hosts line:
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
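You can watch that ordering from Perl, too: gethostbyname() resolves through the nsswitch hosts line just like ping does, while host talks straight to DNS. A quick sketch (the hostnames are the placeholders from above):

#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# gethostbyname() follows /etc/nsswitch.conf, so with the broken config
# it reports "unknown host" for foo.corp.local even though DNS-only
# tools resolve it fine.
for my $host (qw(foo foo.corp.local)) {
    my $packed = gethostbyname($host);
    print "$host: ", $packed ? inet_ntoa($packed) : "unknown host", "\n";
}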
We could side-step the problem by removing mdns4_minimal from the hosts search path, but that could lead to long DNS timeouts on mistyped .local addresses. (OK, that's not a very bad side effect, but let's fix it correctly anyway.)

Dig a little deeper into .local and mdns, and you'll find Avahi. Avahi "facilitates service discovery on a local network via the mDNS/DNS-SD protocol suite," what Apple calls Bonjour or Zeroconf. They have a warning page about unicast .local DNS zones that gets to the crux of the problem : linux has mdns (multicast dns) support configured for .local, but Microsoft support suggests using .local with unicast DNS. The two don't get along at all.

mDNS/DNS-SD is inherently incompatible with unicast DNS zones .local. We strongly recommend not to use Avahi or nss-mdns in such a network setup. N.B.: nss-mdns is not typically bundled with Avahi and requires a separate download and install.
-- Avahi and Unicast Dot Local wiki page

Fixes:

  1. move avahi mdns from .local to a different name (e.g. .alocal)
  2. or Remove mdns from /etc/nsswitch.conf or remove mdns module.
For the former, add a domain-name=.alocal line to the [server] section of /etc/avahi/avahi-daemon.conf, then restart avahi-daemon: sudo service avahi-daemon restart.

If that doesn't work (and you restarted your browsers, with their insidious DNS cache, right?), you can try removing mdns from the hosts entry in /etc/nsswitch.conf. Replace this line:

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
with this line:
hosts: files dns

Tuesday, August 9, 2011

Demand Media buys RSS Graffiti

Demand Media is announcing today their purchase of RSS Graffiti. This is the last week of our El Segundo office; next Monday I'll be working in Santa Monica again. Super exciting.

"Demand Media Acquires RSS Graffiti RSS Graffiti's Talented Engineering Team and Popular Facebook Application to Help Accelerate Demand Media's Social Publishing Strategy"
-- http://www.marketwatch.com/story/demand-media-acquires-rss-graffiti-2011-08-09
"In addition to releasing its Q2 earnings today, content aggregator Demand Media (NYSE: DMD) has made two acquisitions: it’s bought a long-running partner, the blog ad network IndieClick and RSS Graffiti, a Los Angeles-based developer of social media products, primarily Facebook. "
-- http://paidcontent.org/article/419-demand-media-buys-indieclick-for-14-million-expands-google-ad-deal/

Friday, August 5, 2011

irrational markets

Two quotes to ponder from economist John Maynard Keynes. I searched this morning and found the first one, a variant of which I've been saying for years -- I really wanted to think it was my own creation. I'm glad I looked it up, because I also found the second quote, on updating opinions to follow facts rather than updating "facts" to follow opinions.

Markets can remain irrational a lot longer than you and I can remain solvent.
---- John Maynard Keynes

When the facts change, I change my mind. What do you do, sir?
---- John Maynard Keynes

Thank you wikiquote!

Sunday, July 17, 2011

Wedding Dance

My latest Dancer project is up at SweetieBeast.us, a website for my pending wedding. T-minus-13 days and counting!

The source is up at github. Look in the wedding-dancer directory for the dancer project.

The dancer code is very short, because I'm not really using any dancer bits. I'm using dancer to provide an interface around templating code and page layout wrapping. As such, the pages are all pure templates.

Version 0.1 was very close to the following code snippet. auto_page turns my template files into corresponding routes. I then created a template for each of my pages, spent a day futzing with CSS, and poof: wedding website!

package sweetiebeast;
use Dancer ':syntax';
our $VERSION = '0.1';
set auto_page => 1;
get '/' => sub { template 'index'; };
1;

When I launched, I put the site behind an apache ProxyPass directive, so I added Dancer::Plugin::ProxyPath to create links relative to the external proxy. I query proxy->uri_for to explicitly create link targets in the model, to simplify my views.

package sweetiebeast;
use Dancer ':syntax';
use Dancer::Plugin::ProxyPath;
our $VERSION = '0.2';

set auto_page => 1;
get '/' => sub { template 'index'; };

before_template sub {
    my $tokens = shift;
    $tokens->{uri_base}           = proxy->uri_for("/");
    $tokens->{uri_for_index}      = proxy->uri_for("/");
    $tokens->{uri_for_faq}        = proxy->uri_for("/faq");
    $tokens->{uri_for_location}   = proxy->uri_for("/location");
    $tokens->{uri_for_travel}     = proxy->uri_for("/travel");
    $tokens->{uri_for_guests}     = proxy->uri_for("/guests");
    $tokens->{uri_for_images}     = proxy->uri_for("/images");
};
true;

Monday, June 27, 2011

Macintosh Fork Failure

Random failures on my MacBook Air today. Oh, I see: fork failures. The default maxproc setting is ridiculously low (512).

Let's double that:

% sudo sysctl -w kern.maxproc=1024
kern.maxproc: 532 -> 1024
% sudo sysctl -w kern.maxprocperuid=1024
kern.maxprocperuid: 512 -> 1024
With the kernel limits raised, let's check the ulimits, which now constrain only my shells.
% ulimit -a
-t: cpu time (seconds)         unlimited
-f: file size (blocks)         unlimited
-d: data seg size (kbytes)     unlimited
-s: stack size (kbytes)        8192
-c: core file size (blocks)    0
-v: address space (kb)         unlimited
-l: locked-in-memory size (kb) unlimited
-u: processes                  266
-n: file descriptors           2560
% ulimit -u 512
Now, why am I still getting fork errors when opening a new terminal? What process has this 266 locked in?

Friday, June 17, 2011

my first yapc

I'm looking forward to my trip to Asheville, NC for YAPC::na. June 26-30. Yet Another Perl Conference, North America.

I was supposed to go last year while I was at Rubicon Project, but my trip got canceled (some mix of busy, crunch time, and politics).

I haven't been to a focused perl conference since the short-lived "O'Reilly University" circa 2000 in NYC. MJD was the bomb! It'll be interesting to compare with our <shameless_plug>excellently run LA.pm</shameless_plug> and all the other LA tech meet-ups.

What do I need to do to prep? Rest my brain and liver, I suppose.

Grub prompt after upgrade to ubuntu 11-04

I got bit by this when I upgraded "mini", my 11" Acer, to Ubuntu 11.04. Rebooting left me at a grub prompt. The exercise left me with an improved feel for the grub2 interface and boot prompt -- it's actually pretty slick, with TAB completion. Improved in-grub help would have been nifty; instead I found myself doing research on my phone.

Using information from Aaron Kelley's blog[1] and from tab completion, I performed the following steps, which led to a successful boot. After the root command, filename tab completion looks at that drive.

Boot from grub prompt:

root (hd0,5)
linux /vmlinuz root=/dev/sda ro
initrd /initrd.img
boot

Alternative, equivalent boot from grub prompt:

linux (hd0,5)/vmlinuz root=/dev/sda ro
initrd (hd0,5)/initrd.img
boot
After booting, these steps updated grub on the MBR for my root drive:
sudo grub-install /dev/sda
sudo update-grub

For a taste of the flip side, I restarted into Windows for Windows Update. It looks like it's been 6 months since I booted Windows 7, so there's 300MB to download. Question: will it be finished or pwned when I get home tonight?

links:

  1. http://aaron-kelley.net/blog/2011/04/grub-prompt-after-upgrade-to-ubuntu-11-04/

Saturday, June 4, 2011

migrate Subversion repository/dump to Git & Github, with tags and branches

A simple and complete method for migrating from Subversion to git, bringing over past tags and branches.

I migrated a work repository from Subversion to Git last week. It went surprisingly smoothly, though there were more steps than I expected from reading various selected sources.

My repository was in "standard layout" containing only a single tree/project. I was sent a dump of the repository created with svndump. If you have direct access to your svn repository, skip step one. Similarly you can skip the github steps or replace them with your own "primary git server."

Note: if you don't want to maintain the svn tags and branches, there is a simple single-step solution using git-svn. This also works if you just want a personal git checkout of a remote svn repository.
git svn clone --no-metadata --stdlayout -A users.txt http://svn/repo/here/ new_checkout_name/

Overview:

  1. import svn dump into svn repository
  2. create mapping file of svn committer to email address
  3. create empty git repository
  4. cd to git repository
  5. import from svn repo to git repo
  6. pull branch and tag information from svn to git
  7. create github repository
  8. add remote repository
  9. push all tags and branches to github repository.
Steps:
  1. import svn dump into svn repository from dump file svn_repo.dump
    % svnadmin create svnrepo
    % svnadmin load svnrepo < svn_repo.dump
  2. create mapping file of svn committer to email address
    Create a file called authors.txt and fill it with line-separated "svn-name = git-name <email>" pairs.
    #example:
    % cat authors.txt
    user_a = Alpha <alpha@example.com>
    user_b = B Dawg <bdawg@github.example.com>
  3. create empty git repository
    % mkdir gitrepo
    % git init gitrepo
  4. import from svn repo to git repo
    #   git svn clone file:///pathto/svnrepo /pathto/gitrepo --no-metadata -A authors.txt --stdlayout
    svn repo path must be an absolute path when using file://
    % git svn clone file://$(pwd)/svnrepo gitrepo/ --no-metadata -A authors.txt --stdlayout
  5. cd to the git repository
    % cd gitrepo
  6. pull branch and tag information from svn to git
    tags and branches are both viewed as remote branches; tags are prefixed with "tags/".
    For each branch you want to migrate, make a local branch.
    For each tag, make a local tag.
    % git svn fetch                          # fetch remote branches and tags from svn
    % git branch -r                          # view remote branches
    % git branch local_branch remote_branch  # for each branch to migrate
    % git tag local_tag tags/remote_tag      # for each tag to migrate
    % git branch                             # check work by viewing local branches and ...
    % git tag -l                             # ... local tags 
  7. create github repository
    ( github repo ui, create yourrepo )
  8. add remote repository
    % git remote add origin git@github.com:yourname/yourrepo
  9. push all tags and branches to github repository.
    % git push origin --tags   # push all tags
    % git push origin --all    # push all branches 

Crashed Kindle

Wow, I crashed my Kindle3-3G today. I tried to use the "back page" button after jumping forward a chapter, and it wedged. The screen froze and didn't change when I power cycled. A soft reset (hold the power switch for "15" seconds) caused the screen to clear (all black, then all clear). I could only tell it was on by checking the light in my case.

The reboot key-combo is "shift-alt-r". After a couple of soft-resets I tried a reboot and it worked! woo!

Of course, I had to verify. I navigated to the same page and hit back again. Now it's back to being wedged.

No, it isn't the case causing crashes in my case, it's my own converted-from-chm-emailed-to-my-kindle ebook.

Edit: update, she's working again now that I'm done writing this post. Perhaps it was just really really busy finding that page?

Friday, April 29, 2011

Dude, what function am I in?

The ctags plugin for vim shows the name of the function containing the current line. This is helpful when you're refactoring functions that are longer than a screenful.

I've been using this (exuberant) ctags based approach for years. It's quite handy and works across more languages than I use: 41 languages from ant to YACC. I've tested with perl, python, C, java and ruby.

The plugin has options to display the function name in the status bar or in the xterm menu bar. It takes care of generating the ctags files on the fly.

I've blogged before on using vim with a tags file, recommending ctags to create a tags file for your codebase. Using the tags browser to jump to subroutines defined in other files is just ... super fun.

ctags plugin for vim:
http://www.vim.org/scripts/script.php?script_id=610
exuberant ctags:
http://ctags.sf.net/
My blogpost on effective VIM with perl and tags. http://www.lowlevelmanager.com/2010/12/perl-tags-and-vim-effective-code.html

These are the options I use with the plugin:

  let g:ctags_statusline=1
  let g:ctags_title=0
  let g:generate_tags=1 
  let g:ctags_regenerate=1

PS. OSX comes with a different version of ctags; you'll need to install exuberant ctags for the ctags plugin to work. It's available in both fink and MacPorts.

Wednesday, April 27, 2011

More LWP SSL 500 issues! Now with HTTP::DAV crossover

LWP::UserAgent is returning a 500-level error in the case of a self-signed site key. This is similar to my previous post on this topic (Fixed: 500 can't verify SSL peers):
For https://... default to verified connections with require IO::Socket::SSL and Mozilla::CA modules to be installed. Old behaviour can be requested by setting the PERL_LWP_SSL_VERIFY_HOSTNAME environment variable to 0. The LWP::UserAgent got new ssl_opts method to control this as well.
I use HTTP::DAV to push changes to the Los Angeles Perl Mongers website. When I tried to push updates for tonight's meeting, HTTP::DAV threw an error: 'The URL "https://groups.pm.org/groups/losangeles/" is not DAV enabled or not accessible.'

This seemed familiar -- in fact, I filed a bug against HTTP::DAV (Bug 59674) to pass through SSL errors from LWP when SSL libraries were not installed. (Cosimo, I'm sorry I didn't respond in a timely manner when the fixes were proposed. Thanks for fixing it!) The fix for 59674 included specific messages for various classes of errors out of LWP::UserAgent.

## Error conditions
my %err = (
    'ERR_WRONG_ARGS'    => 'Wrong number of arguments supplied.',
    'ERR_UNAUTHORIZED'  => 'Unauthorized. ',
    'ERR_NULL_RESOURCE' => 'Not connected. Do an open first. ',
    'ERR_RESP_FAIL'     => 'Server response: ',
    'ERR_501'           => 'Server response: ',
    'ERR_405'           => 'Server response: ',
    'ERR_GENERIC'       => '',
);

LWP::UserAgent is returning a 500-level error in the case of a self-signed site key -- not 501, as in the prior case.

Example:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Data::Dumper;
my $url = "https://groups.pm.org/groups/losangeles";
my $ua=LWP::UserAgent->new();
my $resp = $ua->get( $url );
print Dumper $resp;
Output Snippet:
LWP::Protocol::https::Socket: SSL connect attempt failed with unknown errorerror:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed at /Library/Perl/5.10.0/LWP/Protocol/http.pm line 51.

One can get around this by setting an environment variable (export PERL_LWP_SSL_VERIFY_HOSTNAME=0), or by using the ssl_opts option to LWP::UserAgent. A third, preferred solution would be to import the key and mark it as "OK" on the client side.

I pushed my website changes by using the PERL_LWP_SSL_VERIFY_HOSTNAME=0 workaround. Now let's see if I can figure out workarounds 3 and 2.
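For the curious, here's what workarounds 2 and 3 might look like in code. The ssl_opts key is real LWP 6 API, but the saved-certificate path is a hypothetical stand-in:

use LWP::UserAgent;

# Workaround 2: disable verification for this agent only (better than
# a process-wide environment variable).
my $ua = LWP::UserAgent->new(
    ssl_opts => { verify_hostname => 0 },
);

# Workaround 3 (sketch): keep verification on, but trust a locally
# saved copy of the site's self-signed certificate.
my $ua_pinned = LWP::UserAgent->new(
    ssl_opts => {
        verify_hostname => 1,
        SSL_ca_file     => "$ENV{HOME}/.certs/groups-pm-org.pem",
    },
);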

Friday, April 22, 2011

FIXED: 500 Can't verify SSL peers without knowning which Certificate Authorities to trust

Having problems using LWP::UserAgent with SSL hosts? Try installing Mozilla::CA.

500 Can't verify SSL peers without knowning which Certificate Authorities to trust

Props to Corion for posting this tip to perlmonks.

2011-03-08 Release 6.00

Unbundled all modules not in the LWP:: namespace from the libwww-perl distribution. The new broken-out CPAN distributions are File-Listing, HTML-Form, HTTP-Cookies, HTTP-Daemon, HTTP-Date, HTTP-Message, HTTP-Negotiate, Net-HTTP, and WWW-RobotRules. libwww-perl-6 requires these to be installed.

This release also drops the unmaintained lwp-rget script from the distribution.

Perl v5.8.8 or better is now required. For older versions of perl please stay with libwww-perl-5.837.

For https://... default to verified connections with require IO::Socket::SSL and Mozilla::CA modules to be installed. Old behaviour can be requested by setting the PERL_LWP_SSL_VERIFY_HOSTNAME environment variable to 0. The LWP::UserAgent got new ssl_opts method to control this as well.

Support internationalized URLs from command line scripts and in the proxy environment variables.

The lwp-dump script got new --request option.

The lwp-request script got new -E option, contributed by Tony Finch.

Protocol handlers and callbacks can raise HTTP::Response objects as exceptions. This will abort the current request and make LWP return the raised response.

-- LWP 6.00 release notes 2011/03/08(emphasis mine).

Monday, April 18, 2011

CPANTS: down?

I tried to pull up a report on CPANTS (cpants.perl.org) today (The CPAN Testing Service) and was greeted by the sad message quoted below.

Is this because cpants.perl.org wasn't getting used, in favor of www.cpantesters.org? I was afraid we'd lost the awesome CPAN testing infrastructure. According to the CPAN::Testers documentation, CPANTS was a project to track "Kwalitee" & metadata of modules on CPAN. So what does that mean?

CPANTS - The CPAN Testing Service
CPANTS is currently down due to terminal bitrot.
If you're interested in having it back, please contact domm AT cpan.org

CPANTS and Site maintained by Thomas Klausner - domm AT cpan.org

Hosting and bandwidth provided by Vienna.pm.
-- cpants.perl.org

Saturday, April 16, 2011

Moving at Scale : Etsy

I know the Etsy engineers like to graph, graph, graph, but I hadn't heard much about the rest of their release cycle. They have taken specific steps to "stay a startup as we grow larger." Watch as they expound on the benefits of one-button deployment, empowering releases, branching-in-code, A/B testing, and configurable features.

Warning, the video is a couple of hours long.

Watch live streaming video from etsy at livestream.com
Source

  1. Make deployments easier
  2. No-fault post-mortems
  3. everyone works on head, no branches in source control, lots of if(0){ ... } blocks
  4. configuration file that determines which features are live, and for what percentage of the population (see the sketch after this list)
  5. public logging and metrics
  6. ...
  7. Change are quick, easy and encouraged.
  8. maintains the "can do" "default is yes" attitude of the early startup.
  9. mean time to notice problems: <5 minutes, mean time to fix problems: 4.5 minutes.
  10. ...
  11. profit
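Item 4 is the piece I find most interesting. A minimal sketch of the percentage-rollout idea (all names hypothetical -- this is not Etsy's code):

#!/usr/bin/perl
use strict;
use warnings;

my %live_pct = ( new_checkout => 10 );    # feature live for 10% of users

# Bucket each user stably, so a given user always sees the same side.
sub feature_on {
    my ( $name, $user_id ) = @_;
    my $pct = $live_pct{$name} || 0;
    return ( $user_id % 100 ) < $pct;
}

for my $uid ( 1 .. 5 ) {
    printf "user %d: new_checkout %s\n", $uid,
        feature_on( 'new_checkout', $uid ) ? 'ON' : 'off';
}

Bumping a feature from 10 to 100 is then a one-line config change, no deploy of new code required.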

They have released two projects as open source, available on github:

logster
https://github.com/etsy/logster
Forked from ganglia-logtailer, they removed the daemon mode to simplify the code.
Logster is a utility for reading log files and generating metrics in Graphite or Ganglia. It is ideal for visualizing trends of events that are occurring in your application/system/error logs. For example, you might use logster to graph the number of occurrences of HTTP response code that appears in your web server logs.
statsd
Github: https://github.com/etsy/statsd
Blogpost: Measure Anything, Measure Everything
A network daemon for aggregating statistics (counters and timers), rolling them up, then sending them to graphite.
They make heavy use of Graphite for displaying graphs. http://graphite.wikidot.com/

Saturday, April 2, 2011

Better programming reading list

How to become a better programmer in 90 days -- a lightning talk.
http://www.foraker.com/how-to-become-a-better-programmer-in-90-days/

I stumbled upon this lightning talk by Neal Enssle, in which he recommends three books to improve our craft. I admit I haven't read any of them, but I have heard of some of the authors.


The Passionate Programmer, Chad Fowler
Love what you do, your attitude comes through. Highlight a daily win.
Clean Code, Robert C. Martin
A language for describing our code.
Refactoring, Martin Fowler
60 refactoring patterns.

Thursday, March 31, 2011

MacBook Air 13" -- lockup in iTunes, fixed?

My new MacBook Air 13" froze twice last week. (full lockup, requiring reboot), both times as I was launching a fresh install of iTunes. There is an update out this morning to fix that, w00t. Those lock-ups were my only complaint about my new Air. Let's hope this fixes it.

"This update addresses an issue that makes the system unresponsive when using iTunes. It is recommended for all 13" MacBook Air (Late 2010) users running Mac OS X v10.6.7."
-- Mac OS X 10.6.7 Supplemental Update for 13" MacBook Air

In other news: yes, I bought a MacBook Air 13" last Friday. My previous Mac purchase was a 68000 Mac SE running 8.6, purchased 20 years ago. Fully pimped out, this was a big impulse buy on eBay (after a week of watching eBay and craigslist, it still felt like an impulse to start and finish bidding last Friday morning). It did run 3x as much $$ as the Acer (4x retail). What will I do with two tiny laptops?

It is both bigger and smaller than my lovely Acer 1810T. The 13" Air has a hugely larger footprint than the Acer, at 13" vs 11". The whole body is slightly lighter and much thinner. Having gotten used to the size of the 11", this feels really big. How do people carry around 17" monsters? The screen is larger and has more pixels at 1440x900 vs 1366x768, albeit at a lower DPI of 130 vs 135 dots per diagonal inch. It also has more pixels than the 13" MacBook Pro; why are they still using a lower-res screen on those?

The 13" Air has a larger battery spec and seems to have considerably larger real-world battery usage. The battery lasts a full workday, though I get nervous when it says an hour or two left. Wake from sleep is wicked fast, and sleep seems to be a very low power situation. I put my acer to hibernate and it takes about 20 seconds to boot with the SSD -- ~8 of that is just the bios time to get to the bootloader. I haven't been able to compare wake from hibernate on the Air since I can't force hibernation -- it happens automatically if it is in sleep mode "long enough."

I'm still trying to get used to the change in keyboards -- the wrist rest area is ginormous on the Air. The large wrist area makes it sit in my lap differently than the Acer, and typing is more awkward. It does rest more softly in the lap -- similar weight over a larger surface area.

I found myself trawling eBay looking for an 11" as a companion to this one. The 11" Air is a very close alternative to the 1810T, if you can live without a second mouse button and dedicated page-up/page-down buttons. The larger trackpad really is nicer than the Acer's, though I was quite used to that smaller trackpad. The 11" has roughly the same specs: a slightly faster chip, similar quoted battery life, same footprint, much thinner, and a tad lighter. Counting the big SSD I put into the 1810T, the prices are comparable.

Note: if you are looking to buy an Air, know that the RAM is soldered at the factory, so there is a large difference between the 2G and 4G versions. Do the resellers (Best Buy, Fry's, buy.com, etc.) even get 4G models to sell? It is possible to replace the SSD -- it's non-standard, but replacements are available. The RAM you won't get to change.

Wednesday, March 30, 2011

Startup Toolkit.

I'd like to see a project of a "startup kit," a collection of opensource tools that are known to work well-enough together: revision control + code review + continuous integration + issue tracking + deployment&configuration.

I don't need perfect, but I would like good-enough.

I suppose this is close to what Atlassian provides, but that's a little too much corporate proprietary synergistic kool-aid for me. Maybe I'm judging too harshly after the Rubicon Atlassian experience, which was soured more by the lack of system and sysadmin resources to upgrade/install/approve the tools than by the tools themselves.

Some options I'm considering now:

  • Issue Tracking
    • Trac
    • bugzilla
    • redmine
    • github issue tracker
  • Source control
    • git, shared via github private account
  • Continuous Integration
    • hudson/jenkins
    • buildbot
    • cruise-control / cruise-control-rb
  • Code Review
    • mondrian (hosted by google appengine)
  • Configuration management and bootstrapping onto ec2
    • chef (pretty far down the chef road at this point)
    • puppet
    • cfengine
  • Monitoring (using all of these for different aspects):
    • nagios
    • ylastic
    • aws console
  • statistics and graphs
    • ganglia
  • log aggregation
    • splunk -- hadn't considered this, but was suggested by friends at GumGum, to monitor logs across a cloud of app-servers
This should be enough to get a scalable environment for developing, testing, and deploying an application. After that, it's time to pick the additional infrastructure pieces needed by the application (data persistence, caching, logging, ...).

TODO(andrew): add links for the projects.

using git-flow with github

We've decided to use nvie's "Git Branching Model" for a work project using github as a "central" repository (by convention). The gitflow documentation is aimed at people running their own local repositories and using the default git-flow settings for branch names.

I want to make some branch naming changes from the original model. First I was going to swap develop and master with master and production. I've gone a step further and done away with master. I want my branches to be develop, qa, and production.

First, how to initialize this in a new repo:

% mkdir fakerepo
% cd fakerepo
% git init
Initialized empty Git repository in /Users/andrew/src/fakerepo/.git/

% git flow init
No branches exist yet. Base branches must be created now.
Branch name for production releases: [master] production
Branch name for "next release" development: [develop] 

How to name your supporting branch prefixes?
Feature branches? [feature/] 
Release branches? [release/] qa
Hotfix branches? [hotfix/] 
Support branches? [support/] 
Version tag prefix? [] 

% git branch
* develop
  production

#git flow saves elements to the local repo configuration, .git/config
% git config -l | grep gitflow
gitflow.branch.master=production
gitflow.branch.develop=develop
gitflow.prefix.feature=feature/
gitflow.prefix.release=qa
gitflow.prefix.hotfix=hotfix/
gitflow.prefix.support=support/
gitflow.prefix.versiontag=
Now, how to push to the matching github repository, spazm/fakerepo? Normally I'd use:
 git remote add origin git@github.com:spazm/fakerepo.git
 git push -u origin master
But now I don't have a local master. Should I be using the publish features?
# Add manually, rather than via git flow * publish
%  git remote add origin git@github.com:spazm/fakerepo.git

% git push --set-upstream origin develop
Counting objects: 2, done.
Writing objects: 100% (2/2), 173 bytes, done.
Total 2 (delta 0), reused 0 (delta 0)
To git@github.com:/spazm/fakerepo.git
 * [new branch]      develop -> develop
Branch develop set up to track remote branch develop from origin.

% git push --set-upstream origin production
Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:/spazm/fakerepo.git
 * [new branch]      production -> production
Branch production set up to track remote branch production from origin.

% git branch
* develop
  production

% git branch -r
  origin/develop
  origin/production

Now, what does the next user need to do after he clones the remote repo?

% git clone git@github.com:spazm/fakerepo.git
% cd fakerepo
% git status
# On branch develop
nothing to commit (working directory clean)

% git branch
* develop

% git branch -r 
  origin/HEAD -> origin/develop
  origin/develop
  origin/production

% git config -l | grep gitflow

#blank, the gitflow configuration items aren't present in the checkout.
% git flow init

Which branch should be used for bringing forth production releases?
   - develop
Branch name for production releases: [] production
Local branch 'production' does not exist.

#manually bring over tracking branches
% git branch --track production origin/production
Branch production set up to track remote branch production from origin.

% git flow init
Which branch should be used for bringing forth production releases?
   - develop
   - production
Branch name for production releases: [production] 

Which branch should be used for integration of the "next release"?
   - develop
Branch name for "next release" development: [develop] 

How to name your supporting branch prefixes?
Feature branches? [feature/] 
Release branches? [release/] qa
Hotfix branches? [hotfix/] 
Support branches? [support/] 
Version tag prefix? [] 

% git config -l |grep gitflow
gitflow.branch.master=production
gitflow.branch.develop=develop
gitflow.prefix.feature=feature/
gitflow.prefix.release=qa
gitflow.prefix.hotfix=hotfix/
gitflow.prefix.support=support/
gitflow.prefix.versiontag=
Question: Do I really have to manually set up the tracking branches when I clone, prior to running git flow init? That seems messy. I'll keep digging and let you know.

[Undisclosed Startup], my new gig

I started working for [Undisclosed Startup] in December, after leaving The Rubicon Project in September.

These four months have flown by.

I'm having a blast getting back into Startup mode. Rubicon is doing great but has grown to hundreds of engineers. [Undisclosed Startup] has a single digit number of employees and ~4 engineers. At some point we'll de-stealth and show the world ... awesomeness.

I'm definitely getting to "wear a lot of hats," ranging from code reviews to company goal planning. I'm now much more familiar with Amazon EC2. I've led the push to switch to git (so far, so awesome) and an SVN layout cleanup. I'm doing a bunch of sysadmin-ish tasks (oh, how I loathe the term dev-ops, yet here we are) -- dear Opscode/Chef, your documentation is all over the place, but the product is neat. I'm leveraging my dearly earned hadoop knowledge from Rubicon, maintaining and extending the perl web UI, picking up python, and testing testing testing! Maybe I'll even use my low-level-manager hat?!

Apologies in advance for my dearth of future entries: startup-land takes a lot of time and energy. I do hope I at least write about some of the multitude of research topics I'm digging into.

Saturday, March 12, 2011

patching github projects

Wow, adding changes to projects on github is surprisingly easy once you figure out the steps. I made a patch for vim-space earlier today, after I found an error in a comment that had cost me an hour yesterday.

I nearly got the github flow correct on the first pass. Here's what I did:

  1. fork the project (via UI on github)
  2. clone locally
    git clone git@github.com:spazm/vim-space.git
  3. make change to plugin.vim in my working directory, test:
    vim plugin.vim ...
  4. create an issue for the original project (via UI on github)
  5. commit change locally:
    git add plugin.vim
    git commit plugin.vim -m "[issue 8] ..."
  6. push change to my cloned repo on github:
    git push origin master
  7. create the patch file:
    git format-patch HEAD^..HEAD
  8. upload the patch file to the issue. Nope, can't upload files to the issue tracker at github.
  9. create pull request:
    git request-pull HEAD^ origin HEAD
    The following changes since commit 9640d4d1ee980e352abd96e2c0ef13372d1c14cd:
    
      Merge remote branch 'remotes/origin/master' (2010-04-13 16:22:27 +0200)
    
    are available in the git repository at:
      git@github.com:spazm/vim-space.git master
    
    Andrew Grangaard (1):
          [issue 8] correct loaded_space to space_loaded in plugin
    
     plugin/space.vim |    4 ++--
     1 files changed, 2 insertions(+), 2 deletions(-)
  10. Copy and paste this pull request into issue 8.

This was close to the correct flow. While the issue tracker on github doesn't allow file uploads in comments, it does track pull requests. So the flow should look like:

Correct Flow

  1. Fork repo (github UI)
  2. clone locally:
    git clone git@github.com:spazm/vim-space.git
  3. Make Changes, test
    vim plugin.vim ...
  4. commit locally:
    git add plugin.vim
    git commit -m "loaded_space fix"
  5. push to github:
    git push origin master
  6. issue pull request(github UI)
  7. done. bask in the warm glow of improving an open source project.

The pull request stage will create a tracking issue. Slickness. The fork and pull request are handled in the github UI. Check the github documentation on pull requests for pretty pictures.

Proper form would be to create a topic branch in the cloned repository; this becomes more useful if you plan to make any further changes or modifications to the project. Make the upstream girl's job as easy as possible, to enhance the likelihood of your changes being accepted.

[LA.pm] March Los Angeles Perl Mongers

LA.pm.org will be back at Rent.com this month. Thanks for hosting again!

What:   Los Angeles Perl Mongers Meeting
When:   7-9pm
Date:   Wednesday, March 23, 2011
Where:  Rent.com - 2425 Olympic Blvd Suite 400 E, Santa Monica, CA 90404
Theme:  Perl!
RSVP:   Responses always appreciated.

As always, I'm looking for presenters. What are you doing in the greater perl infrastructure?

Looking forward to seeing my fellow mongers in 11 days.

RSVP at facebook

Vroom - vim overrides and mapping for presentation remote

The second-best thing about using Vroom (formerly Vroom::Vroom) for presentations is hearing, "I've never seen a presentation in vim before." The best thing is using vim as both the editing and display platform.

I realized yesterday while cramming to finish my "intro to git" presentation for work that my recently added vim plugin "magic space" (space.vim) interferes with the default Vroom vimrc bindings. Space.vim rebinds space in a variety of ways to repeat the previous movement commands (search, movement, etc).

Disabling the plugin took a couple of iterations because the documentation seemed incorrect -- it references loaded_space while the code checks space_loaded. Actually, the documentation was correct; it's the comments in the plugin code that are wrong. I opened a ticket, cloned the github repo, pushed the changes, and updated the ticket with a pull request via git request-pull. Not bad for 5 minutes.

This encouraged me to expand my Vroom knowledge. First, I found the syntax for adding a vimrc override to my presentation. Today, I created a user specific vroom vimrc override in $HOME/.vroom/vimrc to disable magic space when I'm running Vroom presentations.

Since I was already adding custom configuration, I dug out my nerdnite presentation remote and found that its buttons are mapped to PageUp, PageDown, and b. These are what PowerPoint uses for previous page, next page, and blank screen. They aren't particularly useful while presenting in vim, so I remapped them in the vroom override to previous and next file.

You think people are shocked by a presentation in vim? Wait until you pull out your laser pointer remote to advance the slides. priceless.

My Intro to Git vroom presentation
https://github.com/spazm/presentations/tree/master/open42-git-intro
Vroom vimrc override file:
https://github.com/spazm/config/blob/master/vroom/vimrc
vim-space
https://github.com/spiiph/vim-space
my fork with fix for issue 8
Issue 8

"" Custom .vimrc overrides for Vroom
"" These will be automatically added to all 
"" vroom autogenerated .vimrc files

" disable space.vim during Vroom presentations.
" in my version of space.vim the docs refer to
" g:loaded_space, but the code checks g:space_loaded
let g:space_loaded = 1

" My presentation remote has three buttons
" forward: sends PageDown
" back:    sends PageUp
" blank:   sends b
" Bind PageUp to previous doc and PageDown to next doc
map <PageUp> :N<CR>:<CR>gg
map <PageDown> :n<CR>:<CR>gg
" map b ???

Sunday, February 27, 2011

Thrift!

Sweet, a new module popped onto CPAN this weekend: Thrift::XS, an XS version of Thrift. This is doubly nice -- one, it's a faster XS implementation; two, it's available directly on CPAN. The current module from the Apache Thrift project requires finding and downloading the whole package.

Thrift is a streaming serialization format and RPC framework. See also Protocol Buffers and Avro.

DESCRIPTION

Thrift::XS provides faster versions of Thrift::BinaryProtocol and Thrift::MemoryBuffer. On average it is about 4-6 times faster.

Thrift compact protocol support is also available, just replace Thrift::XS::BinaryProtocol with Thrift::XS::CompactProtocol.

To use, simply replace your Thrift initialization code with the appropriate Thrift::XS version.
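A round-trip sketch, assuming the usual Thrift Perl protocol/transport API carries over to the XS classes as the docs suggest:

use Thrift::XS::MemoryBuffer;
use Thrift::XS::BinaryProtocol;

# Write a value through the XS binary protocol and read it back.
my $transport = Thrift::XS::MemoryBuffer->new;
my $protocol  = Thrift::XS::BinaryProtocol->new($transport);

$protocol->writeString('Hello, Thrift::XS');

my $value;
$protocol->readString(\$value);
print "$value\n";                 # Hello, Thrift::XS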

Scale9x: Take Advantage of Modern Perl

Chromatic's talk on Modern Perl at Scale9x is in about an hour -- 11:30am, Sun Feb 27, 2011. If you can't make it, at least check out the live stream.

I really shouldn't have gone to Scale yesterday, since I'm so sick, and it wiped me out. Yet here I am, contemplating going again today. I do want to get my copy of Modern Perl autographed, after all.

Perl's recent renaissance has produced amazing tools that you too can use today.

This talk explains the philosophy of language design apparent in Perl 5 along the two fundamental axes of the language: lexical scoping and pervasive value and amount contexts. It also discusses several important pragmas and language extensions to improve Perl 5's defaults, to reduce the chance of errors, to allow better abstractions, and to encourage the writing of great code.

Speaker: chromatic x
-- http://www.socallinuxexpo.org/scale9x/presentations/take-advantage-modern-perl

Friday, February 25, 2011

Mining of Massive Datasets textbook.

I started reading Mining of Massive Datasets on vacation. I didn't get very far into it, as it isn't exactly light beach reading. The first bit is a review -- covering things I mostly don't know, so that was a fun start. I now have a better feel for IDF and TF.IDF, for instance.

Infolab seems down at the moment.

Mining of Massive Datasets.
http://infolab.stanford.edu/~ullman/mmds.html

new business opportunities

[This business opportunity] is a wide-open space with lots of people jumping into the pool without knowing how to swim.
We should be able to make a mint selling life preservers.
--me

Thursday, February 10, 2011

LA Hadoop

Great attendance at the Los Angeles Hadoop Users Group (LA HUG) meetup last night on "Productizing Hadoop." Cloudera provided a great speaker to discuss the dos and don'ts of migrating hadoop from play/development to full enterprise mode (from hunter-gatherer to modern city). The Hadoop infrastructure has come a long way since my first LA hadoop meetup a year-plus ago -- better support for multi-tenancy with auth and authz, more tools built on top of hadoop, and less need to roll your own scripts for everything.

Props to Shopzilla for hosting.

This was a much shyer crowd than we see at LA Perl Mongers (LA.pm). Only one other person asked a question at the end. At PM, we tend to pepper questions and feedback throughout the presentation, making everything a group production.

Cpanm 1.1 -- now with mirror support!

There is a new version of cpanm (App-cpanminus) that supports --mirror and --mirror-only to allow offline usage.

Kick ass! Thanks again miyagawa

cpanm 1.1 is shipped, and with `--mirror-only` option, you can use it with your local minicpan mirror, or your own company's CPAN index (aka DarkPAN).

The only reason a few experienced perl programmers who love cpanm couldn't use it offline or at work was the lack of proper mirror index querying support.

cpanm always has required an internet connection to resolve module name and dependencies, and always relies on CPAN Meta DB and search.cpan.org to query package index.

It's been a fair requirement for 95% of the usage, but again, for experienced hackers who spend most of their airplane time hacking code on their laptops, offline support falling back to a local minicpan would be really nice. (Even though many airlines nowadays provide in-flight Wi-Fi :))

So I opened a bug to support `--mirror-only` option to bypass these internet queries and parse mirror's own 02packages.txt.gz file for module resolution a while ago, and a couple of people have tried implementing it in their own branches. (Thank you!)

Today I merged one of those implementations, and improved a little bit to make it run even faster and more network efficient. The way to use it is really simple, just run cpanm with options such as:

cpanm --mirror ~/minicpan --mirror-only Plack

and it will use your minicpan local mirror as the only place to resolve module names and download tarballs from. (TIP: you can alias this like `minicpanm` to save typing)

---- http://bulknews.typepad.com/blog/2010/11/cpanm-11-hearts-minicpan-and-darkpan.html

Sunday, January 23, 2011

Recover Iphone contacts from raw backup

Just as I got my new phone ( t-mobile MyTouch4G -- love it!) my 23 month old iphone completely refused to charge from either the wall or computer. So how to get my contacts off?

I have a full mirror of my iphone (3G) filesystem, created using rsync+ssh from within my jailbroken phone. It is way cooler to back up over wifi than through a tethered cable; besides, I had no other choice, as the data connector died after 14 months. I could charge from a wall adapter, but not from any USB host. This crippled setup worked long enough for me to escape my AT&T contract.

Useful tidbits:

  1. contacts are stored in AddressBook.sqlitedb
  2. file is stored in /private/var/mobile/Library/AddressBook/AddressBook.sqlitedb
  3. There is a second, bare-schema database in /private/var/root/Library/AddressBook/
  4. The database is in sqlite3 format.
  5. Person entries are stored in ABPerson table
  6. Phone number/email/etc entries are stored in ABMultiValue table
We can open this file in sqlite3 and export it to a usable comma-separated file without any other external tools. The person entries are stored in ABPerson, the phone number entries in ABMultiValue; we join the two tables together in our CSV output.

The following snippet copies the database to /tmp, opens it in sqlite3, and exports to contacts.csv:

# copy the db to /tmp, then open it
cp /private/var/mobile/Library/AddressBook/AddressBook.sqlitedb /tmp
cd /tmp
sqlite3 AddressBook.sqlitedb
sqlite> .mode csv
sqlite> .output contacts.csv
sqlite> select ROWID, first, last, identifier, value, record_id from ABPerson p join ABMultiValue mv on (ROWID=record_id);
sqlite> .quit
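
If you'd rather script the export, the same join works from Perl via DBI and DBD::SQLite. A minimal sketch (the paths, output file name, and column list simply mirror the sqlite3 session above):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=/tmp/AddressBook.sqlitedb',
    '', '', { RaiseError => 1 } );

# join person rows to their phone/email entries, as in the sqlite3 session
my $sth = $dbh->prepare(q{
    SELECT p.ROWID, p.First, p.Last, mv.identifier, mv.value, mv.record_id
      FROM ABPerson p
      JOIN ABMultiValue mv ON ( p.ROWID = mv.record_id )
});
$sth->execute;

open my $fh, '>', 'contacts.csv' or die "contacts.csv: $!";
while ( my @row = $sth->fetchrow_array ) {
    # naive CSV: switch to Text::CSV if any field may contain commas or quotes
    print {$fh} join( ',', map { defined $_ ? $_ : '' } @row ), "\n";
}
close $fh;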

The file locations and names were surprisingly hard to find. On the bright side, I didn't need to decode any plist files.

There are more interesting fields in ABPerson and ABMultiValue; feel free to update the select to grab them.

sqlite> .tables
ABGroup                    ABPersonMultiValueDeletes
ABGroupChanges             ABPersonSearchKey
ABGroupMembers             ABPhoneLastFour
ABMultiValue               ABRecent
ABMultiValueEntry          ABStore
ABMultiValueEntryKey       FirstSortSectionCount
ABMultiValueLabel          LastSortSectionCount
ABPerson                   _SqliteDatabaseProperties
ABPersonChanges
sqlite> .schema ABPerson
CREATE TABLE ABPerson (ROWID INTEGER PRIMARY KEY AUTOINCREMENT, First TEXT, Last TEXT, Middle TEXT, FirstPhonetic TEXT, MiddlePhonetic TEXT, LastPhonetic TEXT, Organization TEXT, Department TEXT, Note TEXT, Kind INTEGER, Birthday TEXT, JobTitle TEXT, Nickname TEXT, Prefix TEXT, Suffix TEXT, FirstSort TEXT, LastSort TEXT, CreationDate INTEGER, ModificationDate INTEGER, CompositeNameFallback TEXT, ExternalIdentifier TEXT, StoreID INTEGER, DisplayName TEXT, ExternalRepresentation BLOB, FirstSortSection TEXT, LastSortSection TEXT, FirstSortLanguageIndex INTEGER DEFAULT 2147483647, LastSortLanguageIndex INTEGER DEFAULT 2147483647);
sqlite> .schema ABMultiValue
CREATE TABLE ABMultiValue (UID INTEGER PRIMARY KEY, record_id INTEGER, property INTEGER, identifier INTEGER, label INTEGER, value TEXT);

"DBIx::Class::DeploymentHandler is awesome" is an article on using DBIx::Class::DeploymentHandler (along with SQL::Translator) to automatically produce database version upgrade and downgrade scripts from DBIx::Class schema definitions and schema layout diagrams.

Awesome. This is why I follow the Perl Iron Man blogging feed. Great stuff in there!
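
I haven't tried it on a project yet, but from the article the basic shape is roughly this (a sketch only; My::Schema, the connect string, and the script directory are placeholders):

use strict;
use warnings;
use DBIx::Class::DeploymentHandler;
use My::Schema;    # hypothetical DBIx::Class schema class with $VERSION set

my $schema = My::Schema->connect('dbi:SQLite:dbname=app.db');

my $dh = DBIx::Class::DeploymentHandler->new({
    schema           => $schema,
    script_directory => 'sql',         # generated DDL and diff scripts land here
    databases        => ['SQLite'],    # SQL dialects to generate
});

# first deployment
$dh->prepare_install;
$dh->install;

# later, after bumping $VERSION in My::Schema:
# $dh->prepare_upgrade;   # writes upgrade scripts (prepare_downgrade for the reverse)
# $dh->upgrade;           # applies them to the connected database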

Day one with R, head first data analysis

Awesome. I installed R (r-project) about 10 minutes ago, and I just created my first scatterplot! This is a long way from my days with p-fit and n-fit.

I'm reading Head First Data Analysis, published by the fine folks at O'Reilly. I'm enjoying reading this Head First book. Going in, I always think the asides, cartoons and irreverent colloquial manner will be off-putting, but it really does flow nicely. I look forward to comparing it to my other new O'Reilly book, Data Analysis with Open Source Tools (released in Nov 2010).

On page 291, we see this "Ready Bake Code," which pulls a CSV from their website, loads it into R, and plots a scatter plot of a subset of the data.

employees <- read.csv( "http://www.headfirstlabs.com/books/hfda/hfda_ch10_employees.csv", header=TRUE)
head( employees, n=30 )
plot ( employees$requested[employees$negotiated==TRUE], employees$received[employees$negotiated==TRUE] )

Boom, I have a scatter plot of the subset of employees where the NEGOTIATED field is TRUE, comparing the requested to the received.

I did a full install onto my Ubuntu laptop by adding the official r-project aptitude repository, which gave me a slightly newer version than what was available in the default Ubuntu 10.10 (Maverick) repositories. CRAN asks you to pick a mirror manually; I chose my local UCLA mirror.
# Create /etc/apt/sources.list.d/r.list
deb http://cran.stat.ucla.edu/bin/linux/ubuntu maverick/
# add key (optional,but preferred)
gpg --keyserver subkeys.pgp.net --recv-key E2A11821
gpg -a --export E2A11821 | sudo apt-key add -
# update aptitude
sudo aptitude update
# install r
sudo aptitude install r-base
# launch R (not 'r' -- that's a shell built-in)
R

Tuesday, January 11, 2011

LA Tech Events -- getting busy again.

After the hibernation month of December, it seems like tech events are coming out of the woodwork here in January!

Tonight (2011-01-11) is CloudCamp LA, an un-conference on all things "Cloud," hosted at MorphLabs in El Segundo. More than 200 people are pre-registered! All the in-person tickets are gone, but there are still 30 slots to watch the streamed video from home (as of 11:11am on 1/11/11).

Tonight also features a round of lightning talks at ScaleLA: the Los Angeles High Scalability Meetup (formerly the Hadoop Meetup), hosted by the wonderful folks at Mahalo in Santa Monica.

Tomorrow is the Thousand Oaks Perl Mongers. TO is a bit of a drive from down here, but I try to make it every couple of months to visit my ValueClick peeps. Not gonna happen this month, though.

Friday is TEDxCaltech: Feynman's Vision (50 Years Later). Tilly and I will be volunteering. I'm still unsure if I can make Thursday night's volunteer dinner, where volunteers mingle with presenters.

Next Tuesday, Stephen Hawking returns for a presentation at Caltech. One wonders how much longer he'll be out in public. Caltech alumni can register for a ticket lottery (deadline noon on 1/13); everyone else can show up and wait in line.

Wednesday the 19th brings back Los Angeles Perl Mongers; it feels like forever since our November meeting. January finds us visiting our friends at Rent.com -- thanks for hosting! My presentation is still TBD, but I hope to have the directions and presentations squared away this week. Thanks for coming!

Thursday the 20th is another wonderful Mindshare, back in the comfortable digs of The Independent theater downtown -- now with a complimentary pre-event happy hour! Their schedule is also TBD; good to know I'm not alone on that front.

Just over the horizon, February brings SCALE -- the Southern California Linux Expo, Feb 25-27, 2011. Make your plans now!

L.A. Nerdnite also took off the month of December, as our beloved venue, the Air Conditioned Supper Club, closed or took on new management. Look for an announcement soon of a new hip venue. Who is up for a Hollywood BYOB picnic experience?

SCALE presentation proposals: denied.

Sigh. Neither of my modern perl SCALE proposals was accepted -- dev-track proposals for hands-on demonstrations of using Hadoop Streaming with big data and of quickly building web applications with Dancer. I hope we get a perl mongers booth/table.

I'm glad to hear there were so many presentation proposals. Sounds like we'll have some great talks!

Dear Speaker,

The SCALE committee has reviewed your proposal(s). Unfortunately, your proposal, while excellent, was not accepted. SCALE again had many high quality submissions, so we could only accept a small fraction of those submitted (47 out of 160 submissions).

We thank you for your interest in SCALE and we appreciate your submittal! We hope you'll participate in future SCALE events. The latest updates for the conference are available at http://www.socallinuxexpo.org

Monday, January 3, 2011

New Year, New Releases

I opened my cpan mail today and received a lovely email from a user of one of my CPAN modules, Hadoop::Streaming. Reading a nice comment was a wonderful way to start the first Monday of this New Year. Included with the praise was a bug report -- double plus good!

You have absolutely no idea (or perhaps you do) how happy I was to see that there is a hadoop streaming module for perl. So I thank you for making this available! I wonder if you are still working on it or have plans to continue working on it? Are there many users to your knowledge? Finally, I've tried to run the example code myself under perl 5.12.2 and receive bareword errors when running the mapper locally.

---- Frank S Fejes III

Looking at the package and the error output in his email, I realized that my hastily pushed-out synopsis example had never been code reviewed -- it wouldn't compile, because I wasn't quoting the arguments to the Moose keyword 'with.'
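
The class of bug looks roughly like this (a minimal sketch of a consumer of the role, not the actual Synopsis code):

package My::WordCount::Mapper;    # hypothetical consumer of the role
use Moose;

# Broken: an unquoted bareword argument fails to compile:
#   with Hadoop::Streaming::Mapper;
# Fixed: quote the role name handed to 'with':
with 'Hadoop::Streaming::Mapper';

sub map {
    my ( $self, $line ) = @_;
    # ... process one input line and emit key/value pairs ...
}

1;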

It was a small matter to fix and a breeze to push to CPAN via the magic of Dist::Zilla. Thanks, Ricardo!

  1. clone code from github repo : git clone git@github.com:spazm/hadoop-streaming-frontend.git
  2. edit lib/Hadoop/Streaming.pm to fix the Synopsis pod
  3. add comment to Changes file
  4. commit the change locally and back to github: git commit && git push origin master
  5. magic Dist::Zilla command, dzil release (see the dist.ini sketch after this list), which took care of:
    1. checking for modified files not checked in to git
    2. running pod weaver,
    3. running tests,
    4. updating the release version in Changes file,
    5. git commit Changes,
    6. git push origin master,
    7. git branch,
    8. tar+gz release,
    9. push release to CPAN
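
All of that is driven by a small dist.ini. A minimal sketch (the plugin selection here is illustrative, and the usual name/author/license stanza is omitted):

[@Basic]       ; gather files, run tests, build the tarball, upload to CPAN
[PodWeaver]    ; weave pod sections into the modules
[NextRelease]  ; stamp the release version and date into the Changes file
[@Git]         ; check for uncommitted files, then commit, tag, and push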

While checking my CPAN mail, I also found a CPANTS fail notice for Net::HTTP::Factual, which is built on Net::HTTP::Spore. Spore v0.3 came out and changed the spec format again; v0.2 was in turn different from v0.1.

I tweaked my Factual .spec to work with Spore v0.2 or v0.3 and pushed it up to CPAN. Same magic Dist::Zilla command.

Freshly available on CPAN: