Wednesday, April 11, 2012

Reasoning From Worst Principles

I have nothing against reasoning from first principles; in fact, I really enjoy it. Reasoning from first principles is an excellent alternative when experience and expertise fail us. I particularly like reasoning from first principles when helping people debug large, complex systems: I find going back to basics very helpful in avoiding the prejudices and preconceptions that are usually at the heart of really thorny debugging sagas.

However, there is a situation in which I find reasoning from first principles to be very frustrating: when management values their reasoning from first principles over all other input, including experience and expertise.

I understand that modern IT decisions are complex and cover such a vast array of possibilities that making a truly informed decision is often impossible. But we should not let the fact that we can't be perfect provide us with an excuse to be terrible.

I find that in the face of uncertainty only flexibility offers a high probability of success.

To take a recent example from my own consultancy: we are in the midst of adding disk space to our core server. This is a very common situation, although there are a few complicating factors:
  • Our server is a High Availability Linux cluster with a shared RAID
  • Our server is a bit long in the tooth
  • We really prize our server uptime
In consultation with our sysadmin experts, I was presented with the following options:
  1. Get a new server, which would be more current and have more capacity
  2. In-place hard disk upgrades
  3. External disk upgrades
I evaluated the options thusly:
  1.  New server
    1. Pros: effective, easy fall-back, modernized environment
    2. Cons: expensive, housing old & new, possible environment issues, not indicated except for disk space
  2. In-place upgrade
    1. Pros: leaves most of server intact
    2. Cons: always a risk to crack open a case, requires lots of human attention, process is complex
  3.  External disk upgrades
    1. Pros: cheap in hardware cost and human time, minimum disruption to servers
    2. Cons: probably mediocre performance, another point of failure, chance to trip over cable, etc
I chose option 3 as being the cheapest, fastest, safest option.

Given what happened next, I was probably wrong:
  • the first external SATA enclosures did not play well with the kernel
  • the second external SATA + disk combination takes too long to wake up
  • rearranging the enclosures exposed an issue with the outdated internal disk on the primary node
So now it looks like we will have to do option 3 for now and look into option 2 as well. Oh, well: that's life when you don't know everything. At least the server has stayed up and useful, we haven't spent more than we can afford, and I see a path to my goal. I had to change course with the first set of enclosures and now I will have to change course again. Responding to real-world consequences of actions is how adults deal with reality.

Sadly, in our information systems consulting, we are running into magical thinking with ever-greater frequency. Specifically, we are running into the following template:

We had a problem, P, which we needed to solve. To solve P, we had to decide on a solution. That decision is decision D. In order to make decision D, we over-simplified P and declared that system S would completely solve P. We paid so much for S that we reason, from first principles, that problem P is solved. Since this is the real world and therefore complicated, there are aspects of system S which are imperfect, but we declare that since system S cost so much, it must do so much and it must do enough.

This is not simply a case of my being nit-picky and snarky about the usual inability to think clearly. In this case, the managers have defined the problem as not existing. Fixing a problem that does not exist is either impossible or unnecessary, or both.

When this magical thinking is in place, our customers live under huge tension, caused by the fact that they live with problems but are officially declared to have none.

There is usually the added tension of the implication that not being able to use the expensive solution to solve their problems makes them incompetent; there is often the complication that someone in their chain of command had to consciously over-promise in order to get the money for the expensive solution, so trying to get help is viewed as disloyal.

As an outsider trying to solve the problem, it is very uncomfortable to be told that whatever else we do, we may not name the actual problem since the actual problem is officially nonexistent.

This gives us, the outside consultants, two options: either look for related solutions which might help the actual problem, or be heretical and name the actual problem. Sadly for us, many professionals view consultants as a kind of Foreign Legion: expendable troops. If we poison our relationship with their bosses by saying what they cannot, so what?

We understand that only we care about how uncomfortable or unpleasant our working conditions are and there is good money to be made as IT Foreign Legionnaires, so we cannot really complain about that. But what I can do is point out that once you enter the realm of magical thinking, of reasoning from worst principles, you make success almost impossible and you make your own working environment tense and unproductive. Who needs that?
