Monday, October 19, 2009

Cowboy culture.

Cowboys.

What is a cowboy, in this context? He's writing scripts, not programs not software. He's someone who runs off on his own, thinking he understands what his is working when he is really cargo culting and making guesses and assumptions. Loves making changes live in production and uses phrases like "I think it should work, right after this patch." Patching things up in a belt-and-suspenders way that obscures the original logic or business case. They have yet to see the light of testing (at all, let alone automated unit testing with high coverage), check things into source control after-the-fact which may or may not match what is in production (which well may vary from machine to machine). And "hey, this 6000 line function should basically work, most of the time, but when it doesn't I have errors sent to /dev/null in the crontab."

Let's say you're a cowboy or you work with one. How do you survive? Adapt? Reform?

Blast off and nuke it from orbit, its the only way to be sure.
--Aliens
If only we could just nuke it. But that's not going to happen until you have a replacement. The fact that the replacement is cleaner and more understandable won't be enough unless it also faster and provable more correct and quite likely it'll need to be bug-for-bug compatible with the old code.

What's the problem with just rewriting it all from scratch? First, code archeology time. When dealing with legacy code, that is spaghetti and undocumented and untested, you will spend most of your time figuring out what it does and why. Is it important that this test short circuits? Why is most of the logic up here, but some is down there? Does the sort order of this array of keynames matter? Are there implied dependencies?

Now, the only reason you're in there is that something is broken, and now someone has allocated some time and resources to fix it, firefighting style. Otherwise, no one wants to go near that code. Especially given that the code owner seems to be putting in herculean efforts to keep it running (staying up late most nights handling error escalations that ring his pager at 4am). Of course, it also probably terribly important code to the business (it handles logs or stats or something similar in the direct money path) so it HAS TO BE DONE! OMG OMG THE SKY IS FALLING FIX IT RIGHT NOW!

Resist the urge to just jump in and make that one change. Yes, someone will be yelling "why isn't this ready" and you'll have to be firm on your reply that you can't know that simple change won't affect the whole system negatively in a chaotic system. Blind refactoring my just further hide the business logic and cause you to rewrite and obscure current bugs. You have to assume there are other problems than the one you are fixing, just no one has noticed the others (or noticed and failed to report). You don't want to be the one called at 4am because your simple fix took payroll offline when the 3am job kicked in and was expecting some old, broken format.

I recommend a two fold approach. First you start with a light bottom-up refactor. Trim the lexical variables down to their minimum needed scope, and change the seven $t variables into useful names that match their scope (big scope == big more descriptive name). Pull blocks of that behemoth function into smaller blocks. Find a test case that exercises the important features, and make sure the original code runs on it reproducibly, giving the same output each time. Really start with this test case. I've been burned so many times chasing my tail because my test output doesn't match the original, due to some random or non-causal output from the original code. Then find any external dependencies (the database, time-of-day, phase-of-moon) and start thinking about how you'll test them.

Once you have that done, start working top-down. You can't do this step until you've been a bit steeped in the code as you won't know the right questions to ask the business sponsors. You have to know how it does things without letting it set your mind-view of how things will be done -- a fine line to tread. Now we look at the problem and the problem domain and wonder if this approach is still valid, given the way the data and data model have changed. Can you use bring some patterns into the code, separate out pre-processing and report definition and number crunching? What can you learn from the evolution of the old code, over multiple passes of tweaks and updates about what we've learned from the business? There is value in that knowledge, if you can separate the wheat from the chaff, the important changes from the incidental, accidental and cargo-cult-copy-and-pasted changes. Build your modules from the ground up to be reusable and modular and yet designed for the business case that the current script handles. Test as you go, you'll be so much happier.

Now, you have your middle rewrite written. It has some of your top down and all of your bottom-up changes. Test it against some minimal output, comparing with the old script. You're going to be running this test a lot, please write a script to automate it. You'll thank yourself later which is nicer than cursing yourself out later because that 1/2 hour test has gone wonky because you broke your shell command history. Now you'll be adding bug-for-bug compatibility, to make sure your code produces output to match current production. Add those bugs, really. And then document them in your bug tracking and make sure they really are bugs and not "oh my goodness, of course I need the fact that the reports come out sorted by the third character of the report name" expected functionality by someone.

When they match, make the switch. Now there should be no visible difference from the outside. But now you can go to town on your in-between scaffolding code. That's why you put in all those unit tests. Now you can hack up chunks of internals and know you aren't affecting the eventual output. Soon you'll have a business critical chunk of software that won't call you for help at 4am, a program you're proud to have rescued from it's prior life as a "script written by a cowboy."

Update: Todd sent me links to two of his cartoons from asciiville, from when he was dealing with a "slew of these cowpies".

BTW: I loved the Cowboy Culture post. I had to fix a slew of those cowpies about a year ago, and I drew a couple of toons as a release. I thought you might enjoy these as we are on the same wavelength with respect to cowboy coding.. :)

bedtime stories

the new recruit

No comments: