Quote: Originally posted by metric_taper
Not as much as you would think. The guy was, and is, the best system engineer I ever worked with. He was an EE by schooling, but he always did a full ground-up design before ever starting to build the machine. He was really good at thinking through all the pitfalls. You should have seen his revision control for software builds, as well as the whole software architecture.

As I recall him telling it, he set up a lab experiment to arrive at that. But that is not what drove the 30 Hz rate; the inner-loop stability control laws of the aircraft were the final design requirement. He did measure how fast you could put your finger on a switch and take it off again, and the time the switch stayed closed for the shortest human press was still within what 30 Hz sampling would catch. Human response is really slow.

Hardware interrupts can be really hard to debug. In this case it's a built-in timer on the Arduino that generates the interrupt. You just have to be sure to write a service routine that does not hang.

I recall a bug that did get fielded on one of the autopilot models. There was a hardware failure that caused a common open-collector interrupt pin to be held low (a single interrupt pin shared by multiple devices). The processor kept servicing the interrupt. There was a comment in the code that said "if it gets here, there's a failure". Well, no code got written to do something about it. It returned from the interrupt, only to start the service routine again. The only issue the pilot complained about was a slow roll over of the aircraft, as the code that writes the sample-and-hold circuits never got to run, so they were not being updated. So the leakage current of the storage capacitor did its thing. They figured out quickly they had a failure. The FAA regulations call this a slow over failure. When they happen for no reason, and never get figured out, accidents can happen. I think the 737 had this issue with a Honeywell system. Normally the pilot can see something is up, as the attitude display shows the wings not level. But I understand they had a few that went overspeed and lost aluminum from the wings before the pilot corrected the problem. Picture nighttime flying, over cloud cover or areas like northern Canada with few street lights, and a slow over that does not get noticed because the pilot is sleeping.
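The switch timing in the quoted post can be checked with a little arithmetic. This is only a sketch: it assumes the switch is polled at a fixed 30 Hz, and the 20 ms and 40 ms press lengths in the usage note are made-up numbers, since the post does not give the measured value. The function names are hypothetical.

```python
import math

SAMPLE_HZ = 30                 # loop rate mentioned in the quoted post
PERIOD_MS = 1000 / SAMPLE_HZ   # about 33.3 ms between samples


def samples_seen(press_start_ms, press_len_ms, period_ms=PERIOD_MS):
    """Count sample instants (0, T, 2T, ...) that land inside one press.

    The switch is treated as closed on the half-open interval
    [press_start_ms, press_start_ms + press_len_ms).
    """
    first = math.ceil(press_start_ms / period_ms)
    end = math.ceil((press_start_ms + press_len_ms) / period_ms)
    return max(0, end - first)


def always_caught(press_len_ms, period_ms=PERIOD_MS):
    """A press is caught at every phase only if it lasts a full period."""
    return press_len_ms >= period_ms
```

With these assumed numbers, a 20 ms press starting 1 ms after a sample is missed (`samples_seen(1.0, 20.0)` is 0), while a 40 ms press is seen at least once no matter when it starts. If the shortest real press outlasts one 33 ms period, polling at 30 Hz cannot miss it.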
Your EE sounds like "an engineer's engineer." I can recall two or three in my entire career. The really good ones will share their thought processes.

My wife, a Computer Scientist, worked with a guy that would sit at his desk for hours on end and look at the ceiling. Eventually he would write some massive amount of code and it would work perfectly the first time. He drove his boss crazy.

As your story illustrates so well, some software bugs only become visible when there is also a hardware failure. It takes a profound understanding of the system to head off these kinds of problems.
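
To make the point concrete, here is a toy model in Python of a shared interrupt line where the "if it gets here" branch actually does something. It is not the autopilot's code and not firmware. The class name, the limit of three spurious passes, and the mask-and-flag response are all assumptions made up for illustration.

```python
SPURIOUS_LIMIT = 3  # hypothetical threshold before declaring a fault


class SharedIrq:
    """Toy model of one interrupt pin shared by several devices."""

    def __init__(self, handlers):
        # Each handler returns True if its device was pending and got serviced.
        self.handlers = list(handlers)
        self.spurious = 0    # consecutive passes where nobody claimed the line
        self.masked = False  # True once the line has been disabled
        self.fault = False   # flag for the main loop to act on

    def service(self):
        """One pass of the service routine for the shared line."""
        if self.masked:
            return
        # Poll every device; do not stop at the first one that claims it.
        claimed = [handler() for handler in self.handlers]
        if any(claimed):
            self.spurious = 0
            return
        # "If it gets here, there's a failure": count it and react.
        self.spurious += 1
        if self.spurious >= SPURIOUS_LIMIT:
            self.masked = True  # stop re-entering the routine forever
            self.fault = True   # let the main loop annunciate or disengage
```

In this sketch a stuck line with no device claiming it gets masked after three passes and raises a fault flag, so the main loop gets to run again instead of starving.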

As to the "if it gets here, there's a failure", I'm in the habit of typing "qaz" any place in my code that is left dangling. Periodically, I do a search for that unique string and clean them up. It has saved me many times.
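
The periodic search can be scripted. A minimal sketch in Python, assuming the code lives in Arduino `.ino` files under one folder. The function names and the file pattern are my own choices for illustration; adjust them to suit.

```python
from pathlib import Path

MARKER = "qaz"  # the placeholder string described above


def find_markers(lines, marker=MARKER):
    """Return (line_number, text) for each line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


def scan_tree(root, pattern="*.ino", marker=MARKER):
    """Search every file matching pattern under root for the marker."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, text in find_markers(handle, marker):
                hits.append((str(path), number, text))
    return hits
```

Calling `scan_tree(".")` from the sketch folder returns a list of file, line number, and text for every leftover marker. An empty list means nothing is left dangling.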

Thanks for the great war stories.

Rick