Grumpy Old Business IT: March 2012

Wednesday, March 28, 2012

Experience is Wasted on the Young

Ageism is legendary in the computer technology business: we all decry the fact that managers are pressured to hire "kids right out of college who grew up with this technology." But the pressure continues and the outcome remains the same: otherwise inexplicably bad software.

For the sake of brevity, let us choose an example of a hot new development technology: Ruby on Rails (http://rubyonrails.org/). This assumes, of course, that something can still be considered "hot" if I have heard of it.

If you hire a group of neophytes who "grew up with" Ruby on Rails, what do you get? You get a group of programmers who can hit the ground running in your chosen development technology, which is good. What do you lack? You lack domain knowledge in your target domain, you lack real-world project experience, you lack experience in all that a working, commercial system must do other than meet a technical spec; in fact, you lack just about all relevant experience outside the narrow domain of Ruby on Rails development.

The only scenario I can imagine being well-served by a neophyte team of Ruby on Rails programmers is a contest of some sort where the goal is to create a Ruby on Rails app in a short time that will impress a panel of college students.

Organizations keep going down this path and keep being disappointed that the software they get isn't very good. In fact, it is often worse than the software it replaces. And there usually isn't a progression within a domain, with software getting better and better and feature sets evolving to keep the best features and lose the apparently-cool-but-actually-useless ones.

This is because evolution is a cumulative process: if you keep starting from scratch, you keep making the same mistakes. This is the unavoidable truth even if you are using ever-more-hip environments to make those mistakes: perhaps you will be able to make those mistakes faster and cheaper and on more platforms than you could before, but you will still make those mistakes.

I know that there are software Mozarts out there, able to write great code from an early age, but "great code" does not mean "great software." "Great code" usually means "code that impresses other programmers." All too often, users of the software built with that code are left wondering how the author of an irritatingly hard-to-use program is regarded so highly by his technical peers.

As a practical example from my own experience, I offer this common task: loading data into a database. I am an expert in "moving data around" so I am often called in to assist with this particular job. I often find a young, inexperienced team of programmers who are expert in a particular technology, but not computer system experts per se. Increasingly, the home team are all experts in some interactive-oriented technology, such as Ruby on Rails or Visual BASIC, and not at all comfortable with batch-oriented technology such as SQL transaction sets.

The common mistake for the young and inexperienced is to try to force a batch operation into an interactive paradigm, to write code to take the input data and force it through the system as if a superhuman typist had entered it at the keyboard. This mistake has two possible failure modes; sometimes it fails in both ways at the same time:

The loading does not work reliably
The loading takes an unusably long time

When the loading is unreliable, the issues are almost always exception-handling and/or data processing.

Experience tells me that usually the right way to handle a batch update is to apply the transactions one by one, putting aside the failures for later human review, correction and reapplication. Only rarely does the user want an entire data set rejected because of a typo in one of the fields of one of the records.

Users generally are far more comfortable with a UI that tells them how many records were in the batch, how many were applied to the database and how many were not applied. They really like being able to inspect and correct the bad records and re-process them until the batch is closed. Does this fit the textbook SQL COMMIT & ROLLBACK paradigm? No, it does not. Sadly, our team of neophytes will have to learn the hard way why the users are so unhappy with their work. Maybe as our team gets more experience, they will get better, if they can find a job after they hit 28.
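To make the one-record-at-a-time idea concrete, here is a minimal sketch, assuming Python and its standard sqlite3 module. The customer table, its columns and the load_batch function are hypothetical names invented for illustration; a real loader would depend on the actual schema and database manager.

```python
import sqlite3

# Hypothetical target table, for illustration only.
SCHEMA = """
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    balance REAL NOT NULL
)
"""

def load_batch(conn, records):
    """Apply each record in its own transaction and set failures aside.

    Returns the counts a user would want to see, plus the rejected
    records with the reason each one failed.
    """
    applied = 0
    rejected = []
    for rec in records:
        try:
            # "with conn" commits on success and rolls back on error,
            # so one bad record cannot sink the rest of the batch.
            with conn:
                conn.execute(
                    "INSERT INTO customer (id, name, balance) VALUES (?, ?, ?)",
                    (int(rec["id"]), rec["name"], float(rec["balance"])),
                )
            applied += 1
        except (sqlite3.Error, ValueError, KeyError) as exc:
            rejected.append((rec, str(exc)))
    return {"total": len(records), "applied": applied, "rejected": rejected}
```

The rejected list is what a review screen would show: the user corrects those records and feeds them back through load_batch until nothing is left to fix.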

When the loading is slow, the issue is almost always trying to put a large number of transactions through a pipe designed to service human input.

The overhead in servicing human input is vast, but the human data rate is so low that this overhead is quite acceptable. Turn that equation on its head--fast data rate, no user to react to the various edit checks--and you have a very poor computing paradigm.

Instead of trying to put data through the human channel, experience tells me that it is often quick and easy and reliable to process the incoming data directly into SQL INSERT, DELETE or UPDATE statements and let the database manager do its thing. If you need to make multiple passes over the data to establish relationships, then do that. If you need to sort through transactions to weed out duplicates or just take the most recent change, do that too.
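As a rough illustration of the "weed out duplicates, keep the most recent change, then hand it to the database" step, here is a sketch assuming Python and SQLite. The price table, the sku/ts/amount fields and both function names are made up for this example, and INSERT OR REPLACE is SQLite's own syntax; other database managers spell an upsert differently.

```python
import sqlite3

# Hypothetical target table, for illustration only.
SCHEMA = "CREATE TABLE price (sku TEXT PRIMARY KEY, amount REAL NOT NULL)"

def latest_changes(changes):
    """First pass: keep only the most recent change for each sku."""
    latest = {}
    for change in changes:
        current = latest.get(change["sku"])
        if current is None or change["ts"] >= current["ts"]:
            latest[change["sku"]] = change
    return list(latest.values())

def apply_changes(conn, changes):
    """Second pass: turn the surviving changes straight into SQL."""
    rows = [(c["sku"], c["amount"]) for c in latest_changes(changes)]
    with conn:  # one transaction for the whole pass
        conn.executemany(
            "INSERT OR REPLACE INTO price (sku, amount) VALUES (?, ?)", rows
        )
    return len(rows)
```

Nothing here pretends to be a user at a keyboard: the incoming data is reduced in memory and the database manager receives plain statements in bulk.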

When you are done fine-tuning your batch process, you can schedule the job to run in the background and voila! yesterday's data is loaded before the start of work today. The computer doesn't mind the extra work, so make the process redundant and have it try over and over again so that delays are not a problem and so that accidental resubmission of data is rejected.
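One possible way to make a scheduled load safe to retry is to record each batch's identifier in the same transaction as its data, so that a second submission of the same batch is refused. The sketch below assumes Python and SQLite; the loaded_batch and reading tables and the load_once function are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical tables, for illustration only.
SCHEMA = """
CREATE TABLE loaded_batch (batch_id TEXT PRIMARY KEY);
CREATE TABLE reading (batch_id TEXT NOT NULL, value REAL NOT NULL);
"""

def load_once(conn, batch_id, values):
    """Load a batch at most once.

    Returns True if the batch was loaded now, False if it had already
    been loaded (an accidental resubmission or a harmless retry).
    """
    try:
        with conn:  # the marker row and the data commit or roll back together
            conn.execute(
                "INSERT INTO loaded_batch (batch_id) VALUES (?)", (batch_id,)
            )
            conn.executemany(
                "INSERT INTO reading (batch_id, value) VALUES (?, ?)",
                [(batch_id, v) for v in values],
            )
    except sqlite3.IntegrityError:
        # The primary key on loaded_batch rejected a repeated batch_id.
        return False
    return True
```

With this in place the background job can simply run again after a delay or a failure: a batch that already went in is skipped, and a batch that failed halfway left nothing behind.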

Experience tells me that these operational aspects are what makes a data loading process robust and useful. I don't see how being expert with a given development environment, but ignorant of real-world issues, would make me better at my job.

Now if I could only convince the ever-growing cadre of number-crunching, budget-watching managers who came up through the management track that this delightful fallacy is rarely a good strategy. Maybe I need to find a Ruby on Rails book and add some more buzzwords to my CV.

Thursday, March 22, 2012

Little Things Mean a Lot, Eventually

Being a science-oriented male, I have a perhaps under-nuanced view of the world. At both work and home, I tend to prioritize my tasks, which is usually good.

But when there is not enough time to do everything, this method has a huge drawback: because there is always a high priority task to do, low priority tasks never get done. This causes problems both at work and home: eventually the owners and stakeholders of even low priority tasks tire of waiting for attention.

As is so often the case, I drew on Linux kernel development for inspiration in my life. [This is a huge exaggeration for comic effect. Really. I hope.]

In the early days, the Linux kernel's scheduling was relatively simple: the more important tasks, such as disk I/O, had high priority and relatively un-time-critical tasks such as servicing the keyboard had relatively low priority. This meant that early Linux systems were amazingly good at getting performance out of mediocre hardware BUT performance degradation was not graceful. In other words, when things went bad they went very bad very quickly.

I remember hammering on the keyboard in frustration, having provoked a database server into taking on more work than it could handle, my typing apparently having no effect, until the system decided to see what was happening in keyboard land, whereupon all of my frustrated input, all of it, was faithfully processed to amusingly ill effect. Well, amusing with the distance of 19 years. At the time, I was beside myself.

Soon thereafter, the raise-the-priority-with-age algorithms were fine-tuned, giving me a much better experience even when I do something really stupid. Or when other people do something stupid. When stupidity raises its ugly head.
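The raise-the-priority-with-age idea can be shown with a toy scheduler. This is a sketch in Python and emphatically not the Linux kernel's actual algorithm: the Task class, the pick_next function and the aging_rate parameter are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    base_priority: int  # higher means more important
    waited: int = 0     # scheduling rounds spent waiting since last run

def pick_next(tasks, aging_rate=1):
    """Run the task with the highest effective priority.

    Effective priority is base_priority + aging_rate * waited, so a
    low-priority task that keeps waiting eventually gets its turn.
    With aging_rate=0 this degrades to pure fixed-priority scheduling.
    """
    chosen = max(tasks, key=lambda t: t.base_priority + aging_rate * t.waited)
    for task in tasks:
        task.waited = 0 if task is chosen else task.waited + 1
    return chosen
```

Given a busy high-priority task and a neglected low-priority one, the low-priority task is starved forever when aging_rate is 0, but is guaranteed a turn once its waiting time outweighs the gap in base priority.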

Human beings seem to have an innate sense that age should raise the priority of a task. Intellectually, we all want to do the most important tasks (or easiest, or most praise-garnering tasks) but viscerally we also want the small things which have been bugging us forever to get done.

As IT people, we often forget that. I know that I am inclined to avoid what I consider "trivia" because I feel that I am so important that such things are a waste of my time. Alas, I cannot seem to get much buy-in to this view of the importance of my time, so I had to institute "laundry day" in our consultancy: a day every couple of weeks dedicated to doing the laundry: fixing or adding all the small, silly, useless things that users value and that they will eventually rage about unless we get to them.

We see many IT organizations which do not make time for trivia, to very ill effect. We sympathize with IT folks who are not given this opportunity: if your boss does not let you do the laundry, it isn't your fault that you stink (figuratively speaking: the tendency of programmers toward low standards of hygiene is another matter entirely).

Which reminds me: currently, I am WAY overdue to replace one of the outside lights, which lights are important to my wife but not to me. I will try to get to that today even if I feel that I have more important things to do. Sigh.

Wednesday, March 14, 2012

Uneven Distribution of IT Talent or Expectation?

Once upon a time, I wrote a book about IT management. This was something of a vanity project: I got some stuff off my chest and then I felt better. (This was before blogs, which are much better suited for venting.)

Since I have a side business publishing books, this vanity project at least had some practical value: we debugged the process with my vanity project before moving on to other projects.

However it came to be, I was struck by the reaction to my book: the few people who read it fall neatly into one of two camps:

This book is useless: it describes solutions to management problems that no one has because no IT environment is this bad.
This book is a much-needed guide to the horror that is current American corporate IT.

This bifurcated reaction raises a question, regardless of my book per se: is IT talent massively unevenly distributed, or does expectation or understanding of IT performance vary massively from organization to organization?

I have not been able to make much headway answering this question: many of the people I have surveyed are MBAs who seem both confident in their ability to assess IT and greatly lacking in actual ability to assess IT at a fundamental level.

On the other hand, since IT management is increasingly about pleasing MBAs, perhaps I am just out of date about what constitutes IT competence.

In my consulting practice, we see a wide range of IT competence with a strong bias toward low competence. So I am inclined to believe that talent is vastly unevenly distributed. But why would that be? Is financial and accounting talent similarly unevenly distributed?

Many colleagues have pointed out to me that this bias in my observation is to be expected: we are a consulting firm, after all, and only incompetent organizations need our services.

I am not a big fan of this idea: I maintain that rational, competent IT organizations can choose to outsource specific jobs for any or all of the following reasons:

There is not enough time to acquire the expertise in-house.
This is a pilot project with an uncertain future.
You are not a development shop and cannot find an off-the-shelf solution.

In the past, we have done projects with companies internationally recognized for their excellence, so I dispute the notion that only weak organizations consume IT consulting, even though there is some of that.

My quest continues: someday I may know if the talent or the expectation is what varies so greatly from organization to organization.

Wednesday, March 7, 2012

Almost Never Say Die

In designing, creating, deploying and supporting information systems, I follow the military model: I call lower-level decisions and tasks "tactical" and higher-level ones "strategic."

Like many (most? all?) successful people, I hate to fail and I don't do it very often, at least at the strategic level. In fact, I can't remember ever truly failing at the strategic level. Failure at the strategic level is usually catastrophic: entire projects which are never used, technology that just doesn't work, solutions that either do not solve the problem or solve the wrong problem. Avoiding strategic failure is usually a matter of either knowing your problem space or recognizing early that your solution is not right.

In a sense, I fail at the tactical level all the time: I try out approaches and reject them if they do not work sufficiently well. I very rarely, if ever, simply throw up my hands in despair and tell the users to live with it.

(I do sometimes tell the users that their boss won't pay for the fix, but that is a different story, even if that story looks the same to the poor users.)

The key to useful failure, as opposed to disastrous failure, is a sense of what should be. How do you know that this job is taking too long and that a different approach is warranted? Because you have a sense of how long the job should take.

How do you know when a restructuring, an expensive, time-consuming, bug-introducing restructuring, is required in order to support an apparently trivial upgrade? Because you have a sense of the present time and energy the lack of restructuring is costing and the future pain that the restructuring will avoid.

For example, I threw in the towel yesterday: I was supposed to complete a relatively minor upgrade to one of our apps. The implementation was complicated by the fact that the client had insisted that databases were too big a hammer to use to crack the walnut of configuration, so this app uses text files to store records and support a simple Web-based editor UI. I realized that, having extended and re-jiggered this code for years, this final upgrade was a bridge too far: a job that should have taken an hour or two was going to take much longer. In fact, after an hour, I had just about figured out my approach.

I was also behind on delivery because I kept avoiding this project because I knew, deep down, that it would take too long and have too many bugs in the initial implementation and that it would, by my standards, be a failure. And I hate to fail.

Yesterday, after my planning and before my coding, I had to admit to myself that the right thing to do is to rewrite the entire configuration handling, basing it on...a database, as one would expect. This will make the editor easy--and easily stolen, er, re-used, from other parts of the same project. This will make the current and future changes easier, faster and safer. This will make the end user experience better than it is now. This will make the apps which use the configuration information simpler and more dependable. It is the right thing to do.

I will still fail to deliver this upgrade in what I consider to be a timely manner. But I will succeed in the upgrade and I will make the app better, faster and more robust.

One of the joys of getting older is the ability to recognize tactical failure, or impending tactical failure, because once you recognize it, you can avoid it or side-step it.

Here I swerve into the uncomfortable area of ageism: the flip side of our industry's love affair with youth and energy over age and experience is a frequent lack of strategic sense. I cannot count the number of times I have had to tell a young programmer, who has proudly shown me his implementation of something, "it took you 12 hours to do a 4 hour job: you should have come to me for help instead of plowing ahead." I don't care that he (always he) stayed at his desk for 12 hours. I don't care that he did it "himself." I don't care that he feels that his implementation is special; alas, I only care that time and energy were wasted.

I am middle-aged: I no longer am willing to code for 14 hours at my desk, bullishly pounding away until the tactical job at hand is done. I haven't tried in years; perhaps I simply cannot do it any more. I hope never to find out, because I am now sure that only strategic failure requires such tactical sacrifice.

On the other hand, I also no longer end up having people, when I am done with my death march, asking me "why did you spend all that time on that job? Couldn't you sense that something was wrong?"

I can sense that something is wrong. I can even admit it. And that makes all the difference.
