05 April 2010

Documenting Source Code - How It Was Driven Home to Me

This happened almost two decades ago. It was my first startup. We had built a radar Moving Target Detector using four DSP processors, plus another DSP as a controller. I had taken it from proof of concept, to field trial, to ruggedized prototype. Over that time the team size had grown from one (me) to three.

The ruggedized prototype was now to be put through acceptance checks. We gave it a final check and then packed it for transport to the customer's site. When we assembled it there and put it into operation, the output would not settle down as it was meant to. That got me into a flap. I was sure that some damage had been done to the hardware during transportation. I had visions of doing a messy hardware debug on the customer's premises. Luckily, I decided to sleep on the problem.

The next day I asked one of my engineers if anything had been done to the code that was in the EEPROMs. He said that there was a long sequence of code which did nothing but write zeroes into the RAM. He had taken that out. The data memory was no longer initialized.

So was my engineer at fault? No, I was. This was code that I had written almost at the start of the project. The purpose of the code was clear to me. But I was no longer "in contact" with the code. And I had failed to document it.

Since then, I have been rather fanatical about documenting the intent of any block of code: documenting the WHAT [is intended to be achieved], and the WHY [it is important that it be achieved]. The HOW is not that important; the code should say that - unless of course the code is convoluted. But then one should not be writing convoluted code in the first place :-)
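As an illustration of the WHAT/WHY style, here is a minimal Python sketch of how that zero-fill routine might have been commented. The original was DSP code held in EEPROM; the function name, the memory size, and the wording of the comment are all assumptions made for this example, not the original source.

```python
DATA_RAM_WORDS = 4096  # hypothetical size, for illustration only


def clear_data_ram(ram: bytearray) -> None:
    """Zero-fill the data memory before processing starts.

    WHAT: Every location in data memory holds zero when this returns.
    WHY:  The detector output does not settle if data memory is left
          uninitialized. This loop looks like it does nothing useful,
          but removing it breaks the system.
    """
    for i in range(len(ram)):
        ram[i] = 0
```

The HOW (a plain loop) needs no comment. The WHY is the part that might have made someone think twice before deleting it.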

This incident led me to formulate my extension to Murphy's Law:
If anything can go wrong, it will - and it will happen in the presence of the customer!

03 April 2010

Toyota Recall: Hubris was the root cause

Quote from Under the Hood of Toyota's Recall: 'A Tremendous Expansion of Complexity'. Watch the video

Prof. Takahiro Fujimoto says:
(Quote)
I would probably say middle managers, particularly at headquarters, started to deviate from the Toyota Way by being arrogant, being overconfident, and also they started not to listen to the problems that customers raised. Toyota is a problem-finding, problem-solving company. This culture is still there in the factories and in product development centers. But in some parts of the headquarters, someone started to say, "Hey, this is our problem. I am responsible for finding my problems and solving my problems. It's not [for] you [outside Toyota] to find our problems."

Sometimes I'm critical of Toyota. But they get angry. They always say, "We want to find problems. So please, give us any clues on the problems you see." But if I actually say, "This is a problem for you," they say, "This is none of your business. We have to find the problem. Not you." This attitude was growing for some time, I think, in some parts of headquarters. That was very dangerous. It is a good time to correct this kind of attitude and go back to the basics of the Toyota system.
(Unquote)

02 April 2010

Phenomenon, Hypothesis, & Defect

A few days ago my washing machine developed a problem. The phenomenon was: the inflow of water would not stop when the power was off, resulting in flooding.

A service engineer came and inspected the machine. He observed the phenomenon. But then he went a step further and created a hypothesis: the water did not stop because the input valve was stuck open with sludge (I live in Noida). Then he formulated the corrective action: a chemical wash of the machine to clean out the sludge. And that was all that went into his inspection report - no mention of the phenomenon.

The workshop people came and collected the machine. The next day they returned a bright-as-new washing machine. But the same phenomenon was still observed! Obviously a chemical wash was not the correct corrective action.

Lesson: Always report the observed phenomenon that is to be investigated. Hypotheses should be marked for what they are - reasonable guesses. Most importantly - do not specify a corrective measure without first establishing the defect.
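One way to hold on to this lesson is to build it into the report form itself. The following is a minimal Python sketch, assuming a hypothetical ServiceReport record; the class, field, and method names are invented for illustration and do not describe any real service system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceReport:
    """Hypothetical inspection report that keeps observation, guess, and fix apart."""

    phenomenon: str                                      # what was actually observed
    hypotheses: List[str] = field(default_factory=list)  # reasonable guesses, marked as such
    defect: Optional[str] = None                         # set only once it has been verified
    corrective_action: Optional[str] = None

    def __post_init__(self) -> None:
        # A report without the observed phenomenon is not accepted.
        if not self.phenomenon.strip():
            raise ValueError("the observed phenomenon must always be reported")

    def establish_defect(self, defect: str) -> None:
        self.defect = defect

    def prescribe(self, action: str) -> None:
        # Refuse a corrective measure until a defect has been established.
        if self.defect is None:
            raise ValueError("establish the defect before specifying a corrective action")
        self.corrective_action = action


report = ServiceReport(phenomenon="Water inflow does not stop when the power is off")
report.hypotheses.append("Input valve stuck open because of sludge")
# Calling report.prescribe("Chemical wash") at this point raises ValueError,
# because the sludge theory is still only a hypothesis.
```

With a form like this, the workshop would at least have received the phenomenon along with the guess.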