Tuesday, October 27, 2009

LA Perl Mongers: Pushed back a week

Our next L.A. Perl Mongers meeting has been pushed back a week.
Instead of 10/28 we'll now meet on Wednesday, November 4th.

Topics:

  1. Tommy Stanton: Testing for the Win! Test::More, Test::Most and Test::Unit.
  2. Andrew Grangaard: either "Care and Feeding of third party modules, revisited -- a local::lib example" or "Hadoop with Perl"

la.pm.org

Wednesday, October 21, 2009

Rakudo October Release -- aka Thousand Oaks

Congrats Aran, Shawn and Todd!

Exciting news. Jon has decided to name the October Rakudo release after the TO group, due to the excitement and buzz from our perl6 hackathon.

Rakudo Perl follows a monthly release cycle, with each release code named after a Perl Mongers group. The October 2009 release is code named "Thousand Oaks" for their amazing Perl 6 hackathon, their report at http://www.lowlevelmanager.com/2009/09/perl-6-hackathon.html, and just because I like the name :-)
-- Rakudo.org

Thanks for inviting me to the perl6 hackathon; that was loads of fun. Crazy to remember a time when I could spend 8 hours on a Saturday doing fun hacking and not working... Looking forward to doing it again, maybe hosting down here in Santa Monica (but it's hard to hack when the beach is right there, just calling). Maybe this time I'll even write some perl6 (though PIR was fun too).


Subject: Rakudo October Release
From: Jonathan Scott Duff <perlpilot@<elided>.com>
Date: Tue, 20 Oct 2009 21:42:54 -0500
To: Andrew
Hello there, I'm handling the Rakudo release for October in a couple of days and I wanted to let /someone/ from TO.pm know that I've chosen Thousand Oaks as the code name for this release for two reasons:

1) You guys held a Perl 6 hackathon TO++
2) I just like the name "Thousand Oaks" :-)

Your blog was linked in the release document, so you're it for me contacting TO.pm.

I realize I probably could have just emailed TO's mailing list, but for some reason that didn't sit well with me. It felt as if it would be too abrupt. In any case, feel free to share the news with the rest of TO.pm (or just wait for the release announcement if you want :-)

Anyway, cheers,

-Scott
--
Jonathan Scott Duff
perlpilot@<elided>.com

Monday, October 19, 2009

Cowboy culture.

Cowboys.

What is a cowboy, in this context? He's writing scripts, not programs, not software. He's someone who runs off on his own, thinking he understands what he's working on when he's really cargo-culting, guessing, and assuming. He loves making changes live in production and uses phrases like "I think it should work, right after this patch." He patches things up in a belt-and-suspenders way that obscures the original logic or business case. He has yet to see the light of testing (at all, let alone automated unit testing with high coverage). He checks things into source control after the fact, and what's checked in may or may not match what is in production (which may well vary from machine to machine). And: "hey, this 6000-line function should basically work, most of the time, but when it doesn't, I have errors sent to /dev/null in the crontab."

Let's say you're a cowboy or you work with one. How do you survive? Adapt? Reform?

Blast off and nuke it from orbit; it's the only way to be sure.
--Aliens
If only we could just nuke it. But that's not going to happen until you have a replacement. The fact that the replacement is cleaner and more understandable won't be enough unless it is also faster and provably more correct, and quite likely it will need to be bug-for-bug compatible with the old code.

What's the problem with just rewriting it all from scratch? First, code-archeology time. When dealing with legacy code that is spaghetti, undocumented, and untested, you will spend most of your time figuring out what it does and why. Is it important that this test short-circuits? Why is most of the logic up here, but some down there? Does the sort order of this array of key names matter? Are there implied dependencies?

Now, the only reason you're in there is that something is broken, and someone has allocated time and resources to fix it, firefighting style. Otherwise, no one wants to go near that code, especially given that the code owner seems to be putting in herculean efforts to keep it running (staying up late most nights handling error escalations that ring his pager at 4am). Of course, it's also probably terribly important code to the business (it handles logs or stats or something similar in the direct money path), so it HAS TO BE DONE! OMG OMG THE SKY IS FALLING FIX IT RIGHT NOW!

Resist the urge to just jump in and make that one change. Yes, someone will be yelling "why isn't this ready," and you'll have to be firm in your reply: in a chaotic system, you can't know that a simple change won't affect the whole system negatively. Blind refactoring may just further hide the business logic and cause you to rewrite and obscure current bugs. You have to assume there are problems beyond the one you are fixing; no one has noticed the others yet (or they noticed and failed to report). You don't want to be the one called at 4am because your simple fix took payroll offline when the 3am job kicked in expecting some old, broken format.

I recommend a two-fold approach. First, start with a light bottom-up refactor. Trim the lexical variables down to their minimum needed scope, and change the seven $t variables into useful names that match that scope (big scope == bigger, more descriptive name). Pull blocks of that behemoth function out into smaller functions. Find a test case that exercises the important features, and make sure the original code runs on it reproducibly, giving the same output each time. Really, start with this test case: I've been burned so many times chasing my tail because my test output didn't match the original, due to some random or non-causal output from the original code. Then find any external dependencies (the database, time-of-day, phase-of-moon) and start thinking about how you'll test them.
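
A golden-master check can be as simple as capturing one known-good run and insisting the old script keeps reproducing it. A minimal sketch; the script name old_report.pl, its flags, and the golden file are all invented for illustration:

    use strict;
    use warnings;
    use Test::More tests => 1;

    # captured once from a known-good run:
    #   perl old_report.pl --date 2009-10-01 > report.golden
    my $golden = 'report.golden';
    my $output = `perl old_report.pl --date 2009-10-01`;

    open my $fh, '<', $golden or die "can't read $golden: $!";
    my $expected = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    is( $output, $expected, 'legacy script reproduces the golden output' );

Run it a few times in a row before trusting it; if it flaps, you've found one of those random or non-causal outputs, and you can hunt it down now rather than mid-rewrite.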

Once you have that done, start working top-down. You can't do this step until you've been steeped in the code a bit, because you won't know the right questions to ask the business sponsors. You have to know how the code does things without letting that set your view of how things should be done -- a fine line to tread. Now look at the problem and the problem domain and ask whether this approach is still valid, given the way the data and data model have changed. Can you bring some patterns into the code, separating out pre-processing, report definition, and number crunching? What can you learn from the evolution of the old code, over multiple passes of tweaks and updates, about what the business has learned? There is value in that knowledge, if you can separate the wheat from the chaff -- the important changes from the incidental, accidental, and cargo-cult-copy-and-pasted ones. Build your modules from the ground up to be reusable and modular, yet designed for the business case the current script handles. Test as you go; you'll be so much happier.

Now you have your middle rewrite written: it has some of your top-down and all of your bottom-up changes. Test it against some minimal output, comparing with the old script. You're going to be running this test a lot, so please write a script to automate it. You'll thank yourself later, which is nicer than cursing yourself later because that half-hour test went wonky when you broke your shell command history. Now you'll be adding bug-for-bug compatibility, to make sure your code produces output matching current production. Add those bugs, really. And then document them in your bug tracker, and make sure they really are bugs and not "oh my goodness, of course I need the fact that the reports come out sorted by the third character of the report name" functionality that someone expects.
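
The automation doesn't need to be fancy. A sketch of such a harness, with hypothetical script names:

    #!/usr/bin/perl
    # run old and new with the same arguments and compare their output
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my $old = `perl old_report.pl @ARGV`;
    my $new = `perl new_report.pl @ARGV`;

    if ( $old eq $new ) {
        print "MATCH\n";
        exit 0;
    }

    # on mismatch, dump both outputs to temp files and show a unified diff
    my ( $ofh, $ofile ) = tempfile();
    print {$ofh} $old;
    close $ofh;
    my ( $nfh, $nfile ) = tempfile();
    print {$nfh} $new;
    close $nfh;
    system 'diff', '-u', $ofile, $nfile;
    exit 1;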

When they match, make the switch. Now there should be no visible difference from the outside, but you can go to town on your in-between scaffolding code. That's why you put in all those unit tests: now you can hack up chunks of internals and know you aren't affecting the eventual output. Soon you'll have a business-critical chunk of software that won't call you for help at 4am, a program you're proud to have rescued from its prior life as a "script written by a cowboy."

Update: Todd sent me links to two of his cartoons from asciiville, from when he was dealing with a "slew of these cowpies".

BTW: I loved the Cowboy Culture post. I had to fix a slew of those cowpies about a year ago, and I drew a couple of toons as a release. I thought you might enjoy these, as we are on the same wavelength with respect to cowboy coding. :)

bedtime stories

the new recruit

Monday, October 12, 2009

Congrats on the new book, Wei-Hwa!

Wei-Hwa Huang has a new book out, Mutant Sudoku. As one of the world's top puzzlers, he's sure to bring an interesting take on the game. Congratulations on your new book!

I remember when he first introduced me to Sudoku, via a terrible pun involving a certain Count from Star Wars, about two years before the US sudoku craze began. I didn't really appreciate the quality of the joke until much later, and now I can't seem to find the quote in the gale logs. I did find a mention of the sudoku t-shirt he made in silk screening at Caltech, circa 1996. So clearly he's been aware of these puzzles for a long time.

My sophomore year I took the Putnam Exam and was happy to emerge from the test alive; I think I even got a few points of partial credit. Wei-Hwa, as a frosh, did amazingly well. Now that I look it up, I see he was a Putnam Fellow in 1993. Top five in the country. Wow.

Seriously, how many people do I know with their own Wikipedia entry?

Sunday, October 11, 2009

LA perl mongers October update and September recap

September's Perl Mongers meeting was awesome: we had two presentations (both from me!). The October meeting will be Wednesday, October 28th. The first presentation was an example of getting real work done with Perl, specifically using JIRA::Client (a thin wrapper around SOAP::Lite) to access a JIRA bug-tracking installation and pull bug counts. The slides included fully working example code, and the author of JIRA::Client commented on the blog post. That is so exciting! That's what community and social coding feel like.
Nice example. It inspired me to make it easier to get from filter names to filter ids. I just released JIRA::Client version 0.16 which implicitly casts filter names into filter ids in the calls to getIssueCountForFilter and getIssuesFromFilterWithLimit.
-- Gnustavo
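
For flavor, here is a cut-down sketch of the kind of example the talk walked through; the server URL, credentials, and filter name are all made up:

    use strict;
    use warnings;
    use JIRA::Client;

    my $jira = JIRA::Client->new(
        'https://jira.example.com',   # hypothetical JIRA install
        'reporter',                   # hypothetical credentials
        'secret',
    );

    # as of JIRA::Client 0.16, a saved filter *name* works here as well as an id
    my $count = $jira->getIssueCountForFilter('open-bugs');
    print "open bugs: $count\n";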
The second presentation was a discussion of "care and feeding of third-party Perl modules." We started with my blog post and went around the room discussing what approaches people had tried, which ones they liked, and which ones they found lacking. Tommy was kind enough to run the video camera for part of the discussion, so once I transfer the tape, we'll have something to upload.

Some of the main points to come out of the discussion: the importance of staying up-to-date; of having unit tests for the features you expect and use from external modules; and of green-field testing (making sure you can build it all from scratch in your test environment). The need for a company to institute some sort of versioning on top of CPAN came up a few times.
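
As a tiny illustration of that unit-test point: pin down the exact upstream behaviors you rely on, so an upgrade that changes them fails in your suite rather than in production. The module choice here is mine, just for illustration:

    use strict;
    use warnings;
    use Test::More tests => 2;
    use List::Util qw(first sum);

    # the two behaviors our (hypothetical) reports actually depend on
    is( sum( 1, 2, 3 ), 6, 'sum adds numbers the way our reports assume' );
    is( ( first { $_ > 10 } 5, 15, 25 ), 15, 'first returns the first match' );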

Some novel ideas included: source-control hooks that check for external modules used in a given commit and update all those modules to the current release, requiring the programmer to verify that his/her check-in works with current modules; a single repository for third-party modules and other code that can be easily pulled into any internal project or repository (local::lib helps here); and considering what is pushing you to be "up-to-date" -- maybe you don't actually need it (sacrilege!).

Some related tasks that need to happen soon: upload the video and notes from the second presentation before the October meeting; update the website with the October date (Edit: done); put up the slides for the JIRA::Client talk; find a speaker for October (Tommy signed up, but then had to defer to November); find a November date that works around Thanksgiving; decide if we need to cut back to a single speaker (or two speakers every two months).

Tuesday, October 6, 2009

risks and mistakes

People who don't take risks generally make about two big mistakes a year. People who take risks generally make about two big mistakes a year.
-- Peter Drucker

This was our quote of the day yesterday at work. Serendipity that Mallory would pick one of my favorite quotes on the one-year anniversary of my coming to work for the Rubicon Project.

I think I've made my share of mistakes over the past year -- mostly from not taking big enough chances and not adapting and changing quickly enough. Something to think on during the coming year.

Sunday, October 4, 2009

vimdiff ... where have you been all my life?

I'm finally giving vimdiff a try. I normally get by with diff, diff -u (unified), and diff -u -w (unified, ignoring whitespace), and their svn equivalents svn diff (unified) and svn diff -x -w (to ignore whitespace).

I knew vimdiff existed, and had used it trivially once or twice, but never felt a need to dive in.

Right now, I'm looking at a file of a coworker's modified code, trying to figure out which version it originally corresponded to. I've looked at the diffs, and I think I know which version it is, but I was having trouble comparing the lines to see what some of the diffs mean.

I popped it up in vimdiff (vimdiff file1 file2) and now I have a lovely side-by-side view of the two files, with coupled scrolling. Chunks of unmodified code are folded out of the way. The vim folding commands work normally: zo will open a given fold if I need to see that code, and zc will close it back up; zR opens all folds and zM closes them all.

The normal vim window commands switch between the two frames. Since scrolling in either file scrolls both, there isn't a big need to switch frames except when examining long lines. By default, line-wrap is off for the diff, so long lines appear truncated until the viewport is scrolled. Ctrl-W w switches between the two frames; jump to the end of a long line with $. Again, both frames scroll together left/right just as with up/down. Alternatively, :set wrap in command mode enables word wrapping; this needs to be done in each frame independently. If literal tabs are making your lines too long, try displaying tabs as something smaller: four characters (:set ts=4) or two (:set ts=2). Again, this must be applied to each buffer independently.

I really like the intra-line difference highlighter. The whole changed line is highlighted, and the changed portions are highlighted in a different color: purple and red respectively in my setup. That helps me pinpoint the exact character changes, so I can focus on seeing the "why" of the change instead of digging for the "what".

vimdiff and svn is not an either/or proposition: svn allows an external diff command via the --diff-cmd switch. Unfortunately, vimdiff doesn't work out-of-the-box with this flag, as svn passes extra information to the diff program. I have a very short wrapper called vimdiff-svn-wrapper that I use to drop the extra arguments. I keep it in my path and run svn diff --diff-cmd vimdiff-svn-wrapper filename to diff filename, displaying the output in vimdiff.
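
The whole wrapper amounts to throwing away everything but the file names. A sketch of such a wrapper (not my exact script): svn invokes the diff command as diff -u -L <label> -L <label> <old> <new>, so keep only the last two arguments and hand them to vimdiff.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # svn passes: -u -L <label1> -L <label2> <file1> <file2>
    # we only want the two file names at the end
    my ( $old, $new ) = @ARGV[ -2, -1 ];
    exec 'vimdiff', $old, $new or die "couldn't exec vimdiff: $!";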

On the other end of the spectrum is svnvimdiff from vim.org, which runs vimdiff on the output of svn diff. It's messy in the way it uses temp files, and when I just tried the version I downloaded last year, it didn't work for me, so I wrote a new version. Had I checked the link first, I'd have seen the original is now on version 1.5 while I had version 1.2. My version uses vimdiff-svn-wrapper with svn diff --diff-cmd, but I directly copied his clever method of finding the modified svn files by parsing the output of svn stat.

Time to get back to figuring out the changes in his code...

Dear Moose

Dear Moose,
CC: TDD

Just a quick note to say, "Thanks for being awesome!"

Hanging out with you both this weekend was awesome. I love the little test-driven proof-of-concept program I wrote with you guys. It was a blast!

I wish I could share the code with my other friends, but you know it's Antony's project, and he's a bit paranoid that his idea will escape into the wild. Normally I think that's a silly attitude, but in this case I understand: his project is definitely not something I'd thought of or considered previously, and after a first stab at research it still seems novel. More importantly, after he described it I couldn't stop myself from blurting out "I WANT ONE!", which is a good initial reaction for a consumer product.

I had ideas for a simulator for the concept bouncing around my head all week, and I finally got both time and energy together on Saturday night. Moose, making objects with you let me focus on the features rather than the boilerplate. Ten has lines later, I had a loadable module. These should be Ints, these should be Bools, this one can't be changed -- I told you, and *poof*, my module had input verification. Very slick. Then I got started writing tests for new features, so I could write the features.
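
Something like this, as a made-up look-alike (the class and attribute names are invented; the real project stays under wraps):

    package Simulator::Node;
    use Moose;

    has id       => ( is => 'ro', isa => 'Int',  required => 1 );  # can't be changed
    has active   => ( is => 'rw', isa => 'Bool', default  => 0 );
    has capacity => ( is => 'rw', isa => 'Int',  default  => 10 );

    sub receive {
        my ( $self, $message ) = @_;
        return 0 unless $self->active;
        # ... real handling elided ...
        return 1;
    }

    no Moose;
    __PACKAGE__->meta->make_immutable;

    package Simulator::Message;
    use Moose;

    has from => ( is => 'ro', isa => 'Int', required => 1 );
    has to   => ( is => 'ro', isa => 'Int', required => 1 );

    no Moose;
    __PACKAGE__->meta->make_immutable;

    1;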

Write test, fail test, write feature, pass test.
--TDD
TDD, it seemed so strange when you first said it, but I'm starting to get it now. It's getting to be a real rush to test the code and see it pass. I'm glad our buddy Vim was there to help: :make for the win.

The design is simple enough that refactoring becomes more obvious. With minimal boilerplate overhead, it was easy to pull my messages out into first-class objects. Maybe that will eventually become a role?

Here's a look-alike of the test I wrote to verify that I could build the objects and get a message object to pass between them.
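
It went roughly like this, using the same invented names as the sketch above:

    use strict;
    use warnings;
    use Test::More tests => 4;

    use Simulator::Node;
    use Simulator::Message;

    my $sender   = Simulator::Node->new( id => 1, active => 1 );
    my $receiver = Simulator::Node->new( id => 2, active => 1 );
    isa_ok( $sender,   'Simulator::Node' );
    isa_ok( $receiver, 'Simulator::Node' );

    my $msg = Simulator::Message->new( from => $sender->id, to => $receiver->id );
    isa_ok( $msg, 'Simulator::Message' );

    ok( $receiver->receive($msg), 'receiver accepted the message' );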

Looking forward to hanging out with you again soon. Take care!

peace,
Andrew

Thursday, October 1, 2009

local::lib

I used local::lib this week. Wow, it really is a nice way to install CPAN modules into my source-control tree, to build an app-specific perl5 lib.
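
The basic moves, as a minimal sketch (the extlib/ directory name and the module choice are mine, for illustration):

    # bootstrap modules into an in-tree directory, e.g. from the shell:
    #   perl -MCPAN -Mlocal::lib=./extlib -e 'CPAN::install("JIRA::Client")'
    #
    # then in the application, point @INC at it before loading anything:
    use strict;
    use warnings;
    use local::lib 'extlib';   # prepends extlib/lib/perl5 to @INC
    use JIRA::Client;          # now resolved from the in-tree library

Check extlib into the source tree, and anyone who checks out the project gets the exact module versions the app was tested against.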