Grumpy Old Business IT

Wednesday, March 7, 2012

Almost Never Say Die

In designing, creating, deploying and supporting information systems, I follow the military model: I call lower-level decisions and tasks "tactical" and higher-level ones "strategic."

Like many (most? all?) successful people, I hate to fail and I don't do it very often, at least at the strategic level. In fact, I can't remember ever truly failing at the strategic level. Failure at the strategic level is usually catastrophic: entire projects which are never used, technology that just doesn't work, solutions that either do not solve the problem or solve the wrong problem. Avoiding strategic failure is usually a matter of either knowing your problem space or recognizing early that your solution is not right.

In a sense, I fail at the tactical level all the time: I try out approaches and reject them if they do not work sufficiently well. I very rarely, if ever, simply throw up my hands in despair and tell the users to live with it.

(I do sometimes tell the users that their boss won't pay for the fix, but that is a different story, even if that story looks the same to the poor users.)

The key to useful failure, as opposed to disastrous failure, is a sense of what should be. How do you know that this job is taking too long and that a different approach is warranted? Because you have a sense of how long the job should take.

How do you know when a restructuring, an expensive, time-consuming, bug-introducing restructuring, is required in order to support an apparently trivial upgrade? Because you have a sense of the present time and energy that the lack of restructuring is costing and of the future pain that the restructuring will avoid.

For example, I threw in the towel yesterday: I was supposed to complete a relatively minor upgrade to one of our apps. The implementation was complicated by the fact that the client had insisted that databases were too big a hammer to use to crack the walnut of configuration, so this app uses text files to store records and support a simple Web-based editor UI. I realized that, after years of extending and re-jiggering this code, this final upgrade was a bridge too far: a job that should have taken an hour or two was going to take much longer. In fact, after an hour, I had just about figured out my approach.

I was also behind on delivery because I kept avoiding this project because I knew, deep down, that it would take too long and have too many bugs in the initial implementation and that it would, by my standards, be a failure. And I hate to fail.

Yesterday, after my planning and before my coding, I had to admit to myself that the right thing to do is to rewrite the entire configuration handling, basing it on...a database, as one would expect. This will make the editor easy--and easily stolen, er, re-used, from other parts of the same project. This will make the current and future changes easier, faster and safer. This will make the end user experience better than it is now. This will make the apps which use the configuration information simpler and more dependable. It is the right thing to do.

I will still fail to deliver this upgrade in what I consider to be a timely manner. But I will succeed in the upgrade and I will make the app better, faster and more robust.

One of the joys of getting older is the ability to recognize tactical failure, or impending tactical failure, because once you recognize it, you can avoid it or side-step it.

Here I swerve into the uncomfortable area of ageism: the flip side of our industry's love affair with youth and energy over age and experience is a frequent lack of strategic sense. I cannot count the number of times I have had to tell a young programmer, who has proudly shown me his implementation of something, "it took you 12 hours to do a 4 hour job: you should have come to me for help instead of plowing ahead." I don't care that he (always he) stayed at his desk for 12 hours. I don't care that he did it "himself." I don't care that he feels that his implementation is special; alas, I only care that time and energy were wasted.

I am middle-aged: I am no longer willing to code for 14 hours at my desk, bullishly pounding away until the tactical job at hand is done. I haven't tried in years; perhaps I simply cannot do it any more. I hope never to find out, because I am now sure that only strategic failure requires such tactical sacrifice.

On the other hand, I also no longer end up having people, when I am done with my death march, asking me "why did you spend all that time on that job? Couldn't you sense that something was wrong?"

I can sense that something is wrong. I can even admit it. And that makes all the difference.

Wednesday, February 29, 2012

Prose Before Code

I am going to criticize my fellow software developers in particular and engineers in general: I claim that as a group we write boring, confusing, rambling prose.

I realize that I am inviting criticism of my writing by doing this and I welcome the scrutiny: my claim is that writing adequately is within almost anyone's grasp and that I follow my own advice. Alas, if you find this post boring, confusing or rambling, then I have a problem.

I will confess at the outset that while I make my living writing software, I like to write expository prose. I always have. I like writing speeches, I like writing toasts, I like writing presentations, I like writing white papers. You name an expository prose form and I have probably written at least one of it.

Of course, enjoying pursuing an activity is not a reliable indicator of how competently you pursue that activity. Sad, but true: you can love doing something and still be terrible at it; witness my attempts to play Mario Kart on the Wii with my daughter. How can I be that bad? It is an enduring mystery. As my daughter often asks, "why can I beat you when I can't even drive yet?"

Sadly, unlike a video game, writing often does not provide adequate feedback, mostly because we do such a bad job of soliciting it. (Hint: asking a colleague to review an awesome thing you just wrote awesomely does not provoke honest feedback.)

I am even interested in the teaching of writing; I don't do it myself but I know many who do and I have had many enjoyable conversations with professionals about the trials and tribulations of teaching writing at various levels.

While I am in a confessional mood, I will admit that I was both an English major and a hardcore computer programming student in college. I am, unsurprisingly, also a big fan of reading. I am interested in critical reading, recreational reading, skimming technical documents, serious reading of literature, referring to references and even reading advertising copy.

What I do not generally enjoy reading is expository prose written by engineers of almost any stripe, computer programmers included. My prejudice is shared by all the teachers of writing I have ever queried, from the ardent high school teacher to the burned out English-as-a-second-language coach who had just retired from helping a major American car manufacturer get more-or-less comprehensible prose from foreign automotive engineers.

I am baffled by the extreme badness of so much engineer prose for the following reasons:

Being a terrible writer takes ambition. You have to go for it, you have to attempt the bold and broad, the complex and the complicated. If you stick to short sentences and use standard vocabulary, you may well be boring, but you won't be cringe-inducingly awful.
Being a terrible writer requires that you flout the rules, that you go your own way, that you find your own special drummer and follow him or her without regard to where he or she takes you.
Being a terrible writer usually requires that you ignore your reader and do whatever you want to do, that you make assumptions about what the reader knows, or finds amusing, or finds clever.

But engineering and its training are all about the rules. No one expects you to rediscover the laws of Physics or the principles of circuit design or best practices in software development. Instead, you are expected to know these already and adhere to them. So why do so many engineers feel that picking up a pen, or sitting at a keyboard, is license to just go for it? After all, the compiler or interpreter or circuit board or laws of motion have no sense of humor whatsoever. We should be used to leaving the flourishes for our hobbies and our social interactions. (I know, I know, we are not a group with whom most people rush to party.)

When I corner a hapless professional writing teacher at a dinner party and run through my standard rant, the most common explanation offered is the student empowerment movement in American education. Be free of the rules! Express yourself! Be original! Let your natural talent soar! Find your voice!

What twaddle. Most of us have precious little natural talent; fewer have a pleasant writing voice that appeals to most people.

Furthermore, the forms and conventions are very effective. They give your reader a sense of orientation, of familiarity and of confidence that they know what is coming. Do I want to be startled by the originality of a white paper on how to make database calls from a given programming language to a given database management system? No, I do not: I want to acquire the desired information quickly and effectively. I have searched for this kind of information many times before and will do so many times again before I retire: excitement, suspense or high style are neither desired nor required. Just the facts, ma'am.

For example, the Unix "man page" format frees the writer from having to figure out how best to document a given function and instead lets the writer concentrate on writing the actual documentation. On the other end of the equation, the format means that the reader knows what is coming and how best to read the document.

I have been guilty of this desire to run amok myself: the first time I wrote a business plan, I was appalled at how outdated and stupid the classic form seemed to be. I knew that I could do better. I yearned to do better. I gritted my teeth and half-heartedly adhered to the form, and the plan was not a big hit with the investors and possible recruits at whom it was aimed.

I asked a friend who is in the business of reading business plans for some advice. Instead of undertaking a critique of my plan in particular, he gave me a sense of the audience in general. He said that he had a stack of a couple of hundred business plans on his desk, which he was supposed to quickly whittle down to a dozen or so. That dozen or so were to be passed along to the next, more senior, reviewer, and so on. In order to review hundreds of plans, the plans had to adhere rather strictly to the format: non-standard plans were usually tossed aside immediately; very very very occasionally, the non-standard plans were so awesome that he read them later, retrieving them from the discard pile. But almost always, the non-standard plans were removed as part of the first pass without really being considered.

This underscores a painful lesson I learned early on in my writing career: very few of us have readers who are obligated or deeply motivated to read whatever we write. Most of us have readers who will plow ahead only so long as they are getting more out of the writing than they are putting into it. One of my secondary school writing teachers used to put a red line in the left margin. I asked him why he did that and I will never forget his answer: "to mark the point at which I stopped reading."

Sadly, many engineers fail even when the deck is stacked in our favor: we often write documents other people feel obliged to read, such as manuals and implementation notes. And still we abuse our readers to the point where they give up part way through, leaving them to flail with whatever piece of technology the documentation was supposed to illuminate.

There are many good books on writing. There are many tips and tricks. I will not attempt a mini-recap here. Instead, I will beg my fellow engineers to seek honest feedback from readers, to consider readers and to find a set of rules to which they can adhere. Just because we can get away with self-indulgent and awful prose does not mean that we should.

Wednesday, February 22, 2012

The "I Don't Know" Factor

In our consulting practice, we often commiserate with each other about our clients' poor internal communication. We spend much of our time navigating our clients' internal expertise trees, running down answers to questions that arise as we try to provide service.

We usually shake our heads in wonder at how often we have to plant ourselves, physically, in people's offices before they will answer our questions. We often note, with some bitterness, that the answer is some form of "I have no idea" and then we have to move on to the next node in the expertise tree.

Last week I was struck by an admittedly obvious-in-hindsight thought: what if these two phenomena, the long chase and the unsatisfying answer, are actually two facets of the same underlying issue? What if many of the various issues I have encountered--and about which I have written--are all related? What if, for various reasons I will recapitulate below, the answer is often "I don't know" and this ignorance is why no one ever wants to get back to us?

It seems to be taboo in our business to say "I don't know" which is a shame: if you don't know and don't feel that you can cop to that, then you don't have any good options. In fact, the only option most people have is to then become a lying weasel, frequently resorting to rudeness as a diversionary tactic.

In my experience, true experts say "I don't know" often and quickly. They usually add "but I can find out" or "here is how you can find out". If you are confident in your competence, then not knowing something is rarely shameful. In fact, as a domain expert who is rarely asked a question to which I do not immediately know the answer, I can attest that being stumped by a question is a mildly exciting change of pace and a chance to learn something.

I should point out that my rarely being stumped has less to do with my innate awesomeness and more to do with the fact that I have almost 30 years of experience in basically the same small field. At this point, I damn well ought to know 99% of what I encounter all day. And I do.

I should also point out that my deep expertise in my own small area leads people to ask me questions wildly out of my area, as though experthood were separate from subject matter, like height or eye color. When this happens, I am no more likely to have a useful answer than anyone else, but depending on how important the client and how deep the boredom, sometimes I go looking for the answer. I like learning things. Sadly, this trait is not universal.

In my experience, posers struggle valiantly to avoid saying "I don't know" and either evade or dissemble. Does this fool anyone? No, it does not. But at least people eventually stop asking you questions, which is a victory of sorts, I suppose.

But why is it not acceptable to say "I don't know" in the workplace, at least the IT workplace? Is this related to the cultural shift away from knowledge and toward opinion? Perhaps in the future, we will all have our own truth and wonder why no technology works properly.

While I am at it, why isn't it ok to say "I have no opinion" either? Must everyone care about everything? I have an iPhone, I really like it, but I have no experience with Android, Droid, Blackberry or Windows Phone and so have no opinion about them. I just don't. Why does that annoy some people?

Lest I appear to be merely an arrogant would-be know-it-all, I offer the following list of reasons an IT person might legitimately simply not know something:

Sometimes there is too much to know
Sometimes our careers flame out, making us IT zombies
Sometimes we end up as baby sitters for other people's technology

But sometimes we are just insecure jerks who don't like dealing with people when we can deal with nice, quiet, unemotional and unjudgemental technology instead. But you didn't hear it from me.

Wednesday, February 15, 2012

Modern Modularity & Iterative Development

I recently wrote an app in about two hours: a highly tailored, very powerful tool for a clerk. And I did it as a Web CGI app using straight Perl without the benefit of a rapid application development environment or even a framework like Mason.

Hurray for me. Except, of course, I didn't really do that. It is true that I wrote the app in about 45 minutes and then spent a little over an hour in a tight user feedback / iterative development cycle, resulting in the finished product. But this is the movie version, the marketing hype, the storyline that ignores all the boring ground work that went before.

It would be more accurate to say that I spent about a dozen hours over several weeks on various stages of this project: still fast, still cheap, just not spectacular (or utterly unbelievable).

From my perspective, the process went like this:

The client defined a problem: clean up a server's disk area by removing redundant files stored by a system of ours.
I defined the reconciliation algorithm, using md5sum to confirm that files exist in both places. Along the way we encountered some horrific special cases, but that is par for the course in the real world.
I wrote an audit program to put a random sample of these files in a web page for review; our clerk reviewed several batches of these files spread out over the entire repository. We found that the issues were highly concentrated in time windows, so we tailored our approaches to fit the eras that we found.
Based on her feedback, I reused much of the audit program to create a web-based UI to support her review and I wrote a couple of small helper utilities to automate the review process that the clerk had used. She confirmed her earlier era-based findings.
I added a database table to hold the information that the clerk was tracking on her paper pad and which she had figured out was what she needed to know. This allowed us to break down the repository into manageable chunks and to start tracking the clerk's progress more accurately.
I wrote a daemon to do prep work in the background and another one to automate as much of her review as I could. This cut the number of items requiring human review by 90% and allowed us to start automatically correcting that 90% while she worked on the ugly 10%.
Once all that was in place, I wrote the final app, a UI to review the contents of the database as set up by the daemons and to add editing functionality to support saving the results of the review. Then I added a few links to those helper apps to support the reviewing and I was done.
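
The checksum reconciliation and random-sample audit steps above can be sketched roughly as follows. This is an illustration only, not the original Perl: it is written in Python, it uses hashlib's MD5 in place of the md5sum command, and the function names (md5_of, index_by_checksum, find_redundant, audit_sample) and the two-directory layout are assumptions made up for the example.

```python
import hashlib
import random
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_checksum(root):
    """Map each checksum to the list of files under root with that content."""
    index = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            index.setdefault(md5_of(path), []).append(path)
    return index


def find_redundant(server_dir, repository_dir):
    """Split the server's files into (redundant, needs_review).

    A server file counts as redundant only when a file with identical
    content also exists somewhere in the repository (hypothetical rule).
    """
    repository = index_by_checksum(repository_dir)
    redundant, needs_review = [], []
    for checksum, paths in index_by_checksum(server_dir).items():
        target = redundant if checksum in repository else needs_review
        target.extend(paths)
    return sorted(redundant), sorted(needs_review)


def audit_sample(paths, size, seed=None):
    """Pick a random batch of files for a human to review."""
    rng = random.Random(seed)
    return rng.sample(list(paths), min(size, len(paths)))
```

Nothing here deletes anything: the point of the audit step was to let a person confirm the algorithm's verdicts before any cleanup ran, so the sketch only classifies and samples. The horrific special cases are, of course, not shown.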

This is how I write good user-intensive software: a design phase which moves quickly to practical applications; we figure out the ideal process and the software trails quickly behind until we have a system that we all trust and that does the job.

Note that my utilities and previous code are highly reusable in both senses:

I can use the code with little or no modification in different apps, which means that I only have to develop and validate once but use many times.
The stand-alone web helper apps are easy to string together to enrich the environment provided by the main app. This is an homage to the Unix "string of pearls" philosophy which I find works so well at the O/S level.

In theory I should get both of these benefits from the Object Oriented programming paradigm, but in practice I don't seem to.

In this day and age, web technology allows us to have separate-but-easily-integrated pieces and modern programming languages allow us to reuse chunks of code easily, so there is no excuse for not producing polished and powerful apps rapidly and cheaply.

Wednesday, February 8, 2012

When You Can't Read The Code

As the large enterprise solution becomes more and more common, I find that the keepers of such systems seem less and less interested in putting in the time to become domain experts.

As IT systems consultants, we are often interacting with these enterprise solutions. Some percentage of the time, we encounter behaviors in these systems which we find unexpected or undesirable. These behaviors may or may not be bugs, but we feel that we should understand them before we ignore them or program around them.

Back in the day, when more apps were homegrown and more systems were provided with source code licenses, the people we asked about unexpected or undesired behavior were the client's own technical people. If they did not know the answer, then they could find out by reading the code.

Increasingly, we find that our client contacts are either non-technical people or out-of-date technical people. Only a small minority of contacts are technically competent, helpful providers of deep background and technical information. We often have to bypass the client and go straight to their large system vendors because there is no local technical expertise.

To our dismay, it is our technical brethren, our peeps, who are the hardest to deal with: now that they cannot read the code, they tend to either shrug off our questions or give us shallow guesses off the top of their heads. After all, if you can't be a hard master then why not just give up?

To our delighted surprise, the non-technical people are generally easier to deal with because they know that they don't know and they look for the answer. They will actually contact the vendor for information, or find documentation, or give us the benefit of the user's experience.

While we don't like treating systems as black boxes or as puzzles into which we pour various inputs and monitor the outputs, at least these methods actually turn up results. Getting half-baked guesses from formerly expert techies just wastes our time as we discover just how half-baked the guesses are.

Just recently I spent an afternoon running a series of experiments as we tried various inputs into a large system to see if we could figure out how to accomplish a particular upgrade for the user. We started with a half-baked guess from a supposed expert techie, proved that this techie was wrong about how the large system worked and got down to the business of figuring out how to do what must be done.

It seems to me that technical people have to be able to operate in the absence of hard data. My checklist for answering technical questions goes like this:

Query experts if you can
Read the code if possible
Read the technical documentation if there is any
Query super-users if you can
Read the user documentation if you can
Query experienced normal users if you can
Write code to grok databases, logs, etc
Try to reason from first principles
Guess because there is absolutely no other option

And remember that when talking about technical matters, which are not generally matters of taste or subjective opinion, trust but verify.

Wednesday, February 1, 2012

VMs and O/Ses and Bears, Oh My!

IT Nirvana?

Computer systems offer more bang for the buck than ever before. There are more and better options than ever before. Storage is cheap and plentiful and processors are mighty and don't catch fire when you use them.

In theory, the size of data set or processing problem that is now "easy" should be so big that just about anything I do should be easy. In fact, since there is so much computing resource available to help me do it, just about anything I want to do should be simple as well. But I am finding that this implicit promise of simplicity is not being fulfilled, at least not at the system level.

I Am Not Immune

Consider my humble desktop at work. It is an Ubuntu box which is backed up automatically and offsite by the mighty boxbackup utility. Thanks to VirtualBox, I use virtual machines of various types: a Windows 7 VM for development and to host my iPhones; various Linux VMs for various special purposes.

Recently we lost power while I was out of the office. My trusty desktop shut down gracefully because I run apcupsd. Hurray! apcupsd even told me when the power went off, although in true open source geek fashion, it reported the time in GMT.

However, the VMs did not shut down gracefully when their host did. Apparently, this is something that I have to set up myself using launchd. One of the VMs was trashed when the host shut down; the other two were fine.

So on to restoring the trashed VM: the good news is that I found that my boxbackup repository was up to date; the bad news is that it contained a copy of the crashed VM in it. I am a belt-and-suspenders kind of guy, so I have a manual local back up to check: yes, I have an image in the on-site backup which is a few months old, but that is ok: these VMs do not change much over time. So, problem solved in the short-term but not the long-term.

In the long-term, I need to get some professional sys admin time or I need to change into my super tech costume and chase down these issues myself. I strongly favor the first option: the more I know, the more I value the knowledge of others. There is also the factor of money, though: the longer the Great Recession goes on, the less inclined I am to shell out real money unless I have to.

The 21st Century Data Center So Far: Boo

I bring up the plight of my desktop to emphasize that the rant that follows is not simply a screed against any particular sys admin but an observation about the environment in which most sys admins have to do their jobs.

As we struggle to deliver software and service on our clients' hardware, we constantly run into misconfiguration of hardware, virtual hardware, operating system software and services such as web servers and database servers.

We also run into poorly implemented policy and self-contradictory policy which doesn't help and somehow offends me more: can't we at least agree on a usable definition of what we are trying to do?

As VMs become more and more common, and the ability to deploy them correctly more and more rare, we are being forced to return to the "my software, my hardware, my responsibility" model of the past. Especially when we find that off-brand Unix distributions such as AIX seem to lag so woefully behind current.

In the 1980s and 1990s, we used to drop "departmental servers" into our client's work areas because the company mainframe was too expensive, too dedicated and too central to use. In what is now known as the Apple model, life was good: the client had one contact point and we had a known, stable environment.

In the 2000s, we tried to get with the program and use existing infrastructure such as database servers, DNS and DHCP. This was a nightmare: for one thing, every new IT administration seemed to want to do things differently: Microsoft! Open Source! Back to Microsoft, but maybe running some Open Source software on the Windows server! Ok, how about thin clients which were sort of Windows? Oh, were you still using the old DNS? Sorry about that--wait, let's use Active Directory for authentication! Is it set up correctly? Who knows!

A Computer System of One's Own

Now we are worn out debugging other people's hardware configurations and system software deployments. We are looking to provide software-as-a-service on our hardware. We are currently mulling over the following options:

We charge a monthly usage fee for access to a working system that is on our premises, under our control and accessed over the wild and wooly Internet via a VPN
We charge a monthly usage fee for access to a working system on a host or hosts dropped by us into the data center: we set it up, keep it up and back it up: you provide power, A/C and a lack of ambient water. Seriously. From experience, we can say that the flooded machine room is a no-go.

We are cautiously excited about The Cloud; we might very well end up using Amazon's offerings in this area, once we are sure that our privacy-conscious, mostly health-care clientele can dig it.

Internal IT have been very resistant so far, but we hope that accounting issues and procedural clarity will triumph. We tried to play nicely with the other children, but they kept peeing into the sandbox.

Wednesday, January 25, 2012

Report to Database

A common gig for my consulting company these days is plugging gaps in other vendors' infrastructures.

Once upon a time, this gig was pretty pleasant: every IT-consuming manager seemed to understand that what they bought would be, at best, 80% of what was needed, so they kept time in the schedule and money in the budget to get as much of the last 20% as they could. We were brought in early and worked with the vendor to produce a happy client.

More recently, we find that larger vendors are squeezing us out, but not by doing a great job: instead, the larger vendors are often over-promising, under-delivering, dangling the promise of cheap, expert consulting and customization that never seems to materialize. But when the initial installation is done, the deadline is past and the budget is exceeded.

As a result of this successful strategy, we are brought in when it is too late, when users are fried and management is angry, when time is short and money hard to come by.

The typical story goes like this:

Client considers an enterprise solution (ES) so expensive, it must do all that one could ever want. This thinking seems incredibly naive to me, but there you are.
During the sales phase, the answer to all questions is "yes, of course it does" or "of course we will" or "we are installed in X other similar companies, trust us." I would expect this to raise red flags, but it does not seem to.
During the implementation phase, a different team from the same vendor finds many previous claims to be absurd--"I can't believe that any of our people ever told you that" is what I hear.
When the smoke clears on the installation, the client finds a given business process is not supported by the mighty ES, even by cobbling together functionality from various modules. The vendor's consulting teams come and tell the client to either stop wanting the given functionality or to wait for some bright day when that functionality is released.
For whichever of these failures simply cannot be tolerated, we are called in to plug the gap by providing the required functionality "outboard" of the ES

Given how we come into the environment, it is no surprise that our relationship with the ES vendor is not great. Often the patch requires interaction with the ES database and support from the vendor is either terrible or non-existent. The idea that we all have the same goal (making the client happy) and the same boss (the client) seems to be quite dead.

In this case, I often only have a database black box and a report with at least an identifier and a value for the column I seek. Even when we get schema documentation, it is usually, ahem, bare bones and, shall we say, "out of date."

So I sigh deeply, put on my favorite database hacking music and do the following:

I get permission to use the database (usually a secondary copy)
I get credentials with which to access the database
I am a terrifically ethical person, so I would never decompile someone else's software to get the required information. That would be wrong. Even though the client paid for the database and the software.
I use whatever database description hooks I have, e.g. MySQL's "show databases" and "show tables" and "show columns" commands, or the local equivalent. I have a suite of database debugging tools left over from a previous incarnation as a database vendor, so I can crack databases even if those databases are not relational databases
I write programs to take the outline of the schema and for each table:

get the list of columns and data types
select all columns for all rows

check each column against the known values
store every hit: table:column:value

manually review the output, form guesses about the schema
write programs based on guesses, e.g. frequency distributions
refine guesses about the schema and data based on evidence
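
As a rough illustration of the scan described above, here is a minimal sketch in Python. It uses the built-in sqlite3 module purely as a stand-in for whatever database sits under the ES; the catalog queries (sqlite_master, PRAGMA table_info) are SQLite-specific and would be swapped for the local equivalent, such as MySQL's "show tables" and "show columns". The function names and the idea of ranking columns by hit count are assumptions for the example, not a record of my actual tools.

```python
import sqlite3
from collections import Counter


def list_tables(conn):
    """Return the table names (SQLite-specific catalog query)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def list_columns(conn, table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")')
    return [(row[1], row[2]) for row in rows]


def find_known_values(conn, known_values):
    """Scan every column of every row; return hits as (table, column, value)."""
    wanted = {str(value) for value in known_values}
    hits = []
    for table in list_tables(conn):
        columns = [name for name, _type in list_columns(conn, table)]
        for row in conn.execute(f'SELECT * FROM "{table}"'):
            for column, value in zip(columns, row):
                if value is not None and str(value) in wanted:
                    hits.append((table, column, str(value)))
    return hits


def rank_candidates(hits):
    """Frequency distribution: which table/column pairs matched most often."""
    return Counter((table, column) for table, column, _ in hits).most_common()
```

Feed find_known_values the identifiers and values copied off the report, then read rank_candidates from the top: the columns that match many of the report's values are the first guesses worth checking by hand. A full scan like this is slow on a big database, which is one more reason to run it against a secondary copy.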

If all goes well, after a rather long time, I will have some significant clues to the schema and how the column I care about is named, in which table it lives and how it is used. I can then do what I have to do in the way of pre-processing or post-processing.

I often find out enough to continue to provide patches for various deficiencies until someone figures out that the mighty ES, which should have ended all need for IT work, is being augmented in this way and forbids any more work.

Thank God a new mighty ES seems to come along every couple of years, so I guess this cycle will keep me off the street for a while yet.
