
Learning from Bugs and Testers: Testing Boeing 777 Full Flight Simulators

Key Takeaways

  • A bug is an opportunity to learn how it made its way into the product.
  • Dashboards can be misleading: Choose your KPIs wisely and measure the right thing.
  • Sometimes you don’t need to code or use the product to find a bug; a poorly written requirement might generate several bugs.
  • Your software might be used in a very unexpected way. Do you have the same definition of corner cases as your customers?
  • Never miss an opportunity to learn: Try new tools, new languages, and new techniques; read bugs from other teams; read books. Be curious.

The aviation industry takes appropriate measures to ensure that every airplane is operated safely. It has sometimes learned this the hard way, and over time it has developed the habit of scrutinizing every reported event in order to prevent another occurrence. This is achieved by understanding the root causes and suggesting changes to the design, the process, or the training; in other words, working on the three Ps: People, Process, Product. In this article, I go over a couple of notable accidents and show you techniques that could be applied to software development.

How are aviation incidents and accidents analyzed?

Let’s start with TK 6491.

A Boeing 747 cargo aircraft was landing in foggy conditions. Unfortunately, the crew didn’t properly check their flight path during the approach, and instead of touching down at the beginning of the runway, they crash-landed in a small village past the end of the runway, killing people in their houses as well as the four pilots.

When we want to debug code, we usually look at the logs. For airplanes, we have the same concept: Flight parameters, sounds from the cockpit, and other data sources are recorded in what is publicly known as “black boxes” (which are actually bright orange). Below are the two “black boxes” recovered from the crash. In software, the equivalent would be a system that collects data during operation, including logs, screenshots, and notes taken by an operator.

Are tools like Elasticsearch, Kibana, etc. helping you quickly and efficiently find what you need in your logs or data lake? Are you logging way too much information? This is the reason why the black boxes themselves evolved: They needed to log the right set of information to support a proper investigation, survive harsh conditions, and be recoverable even from deep seas. The result is a compact, resilient, standardized, and efficient new generation of black boxes.
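To make the parallel concrete, here is a minimal sketch (my own illustration, not from the article or any aviation system; the function and field names are hypothetical) of emitting one compact, structured record per event so that tools like Elasticsearch or Kibana can index and query exactly the fields an investigation needs.

```python
# Minimal sketch (illustration only): emit one compact, structured record per
# operational event so a tool like Elasticsearch/Kibana can index and query
# exactly the fields an investigation needs.
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ops")

def log_event(subsystem: str, event: str, **parameters) -> None:
    """Log a self-contained, queryable record for one operational event."""
    record = {
        "timestamp": time.time(),
        "subsystem": subsystem,
        "event": event,
        "parameters": parameters,  # only the values needed to replay the event
    }
    logger.info(json.dumps(record))

# Hypothetical usage: a web-store checkout timing out.
log_event("checkout", "payment_timeout", order_id="A-123", elapsed_ms=30042)
```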

Flight data and cockpit voice recorders recovered from crash sites.

After recovering the black boxes and their content, a group of experts analyzes the flight data and cockpit sounds and, like detectives, tries to reconstruct the scenario based on observations. For that purpose, they use either flight simulators (desktop versions) or full cockpit versions (used for training pilots), plus various software tools to process the data.

After data processing, they should be able to reconstruct a timeline of the event, most of the time in the form of plots overlaid with the recovered cockpit sounds. The plot below shows TK 6491’s flight profile along with other interesting parameters like the main flight control positions and engine settings. We clearly see the autopilot (blue line) deviating from the expected flight path (green line).

In short, the process is equivalent to what is done at a crime scene: Don’t touch anything (if possible), gather evidence, take pictures, make movies, get data from all possible sources, find witnesses.

Collecting and cross-checking data is an important part of the process. If a sensor is defective, you might not be able to exploit a specific parameter; you will then have to derive the missing data from another source. The hardest Boeing 777 investigation I recall was in London, UK: Both engines stopped working very close to the runway during landing, and the airplane crash-landed. The problem was found after numerous simulations and a great deal of effort. The culprit was ice forming in the fuel system, restricting fuel flow to the engines just when they needed high fuel flow. The ice melted after the crash, leaving almost no evidence in the fuel system.

When an investigation comes to an end, recommendations and sometimes changes to the regulations are introduced. One of the most recent rules was introduced after the Germanwings crash: The flight deck must be occupied by two persons at all times, so if one pilot leaves the cockpit, another crew member has to come in. (Ref: 4U 9525)

A final investigation report example from Transport Canada can be found in the link below: An Air France A340 overshot the end of the runway during landing, went into a ditch, and caught fire. Everybody survived. (Transport Canada report, AF 358)

The recommendation from Transport Canada was: 

The Department of Transport requires that passenger safety briefings include clear direction to leave all carry-on baggage behind during an evacuation.

Unfortunately, the same problem occurred again, killing 41 passengers, because people took their carry-on luggage with them during the evacuation. (Ref: SU 1492)

I have published an article (in French) about safety on board. If you aren’t fluent in French, at least the videos should give you an idea of why we ask passengers to remain seated and strapped in until the engines are shut down, to keep their seatbelts on at all times, and so on. (Ref: Folding your tray, window shades up: Why so important?)

What we can learn is that applying a fix to a Process might not be sufficient. You might also need to work on “People” and/or the “Product.” (As a reader, you could contribute and enhance the safety record by spreading the word about emergency evacuation briefings during take-offs and landings.)

The half-billion and $350 million bugs

Every failure is an opportunity to learn something. Finding and fixing a bug is great, but understanding why the bug is there is where you learn even more. It might be a requirement that wasn’t properly formulated, implemented, tested, or anything else. 

As a software tester, imagine that a bug you have reported was caused by an untrapped exception in a division by zero. If you have seen that bug several times, or if your software is mission-critical, then it is worth asking every team to check whether they have wrapped all code that performs a division in a “try catch” statement. The Ariane 5 rocket blew up because of untrapped exceptions: There go half a billion dollars and a dent in reputation.
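As a minimal sketch of that idea (my own illustration in Python rather than a “try catch” language, with invented names, not taken from any real flight code), every division that depends on external input can be wrapped so an unexpected zero is trapped and reported instead of crashing the whole system.

```python
# Minimal sketch (illustration only, invented names): trap a division by zero
# coming from external input instead of letting the exception take the whole
# system down.
def safe_ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator/denominator, falling back to a default on a zero divisor."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Report and degrade gracefully rather than crash.
        print(f"division by zero trapped: {numerator}/{denominator}, using {default}")
        return default

print(safe_ratio(120.0, 4.0))  # 30.0
print(safe_ratio(120.0, 0.0))  # falls back to 0.0 instead of raising
```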

The US Air Force almost lost an F-22 squadron (state-of-the-art fighter jets valued at $350 million apiece) when the aircraft flew over the International Date Line (IDL), leaving the pilots with only their flight controls (which run on dedicated, separate computers). When you cross the IDL, the date skips by one day, depending on your direction of travel. (Ref: F-22 Raptor International Date Line incident)

A space probe (Mars Polar Lander) crashed on Mars because the software testers forgot to test what would happen when sensors send erroneous data to the mission computer.

Another Martian space probe ($125 million) was lost because one team used metric units while the other didn’t.
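Here is a sketch of one way to guard against that class of bug (my own illustration, not how the probe’s software worked; the type and names are invented, and the only real fact is the pound-force to newton conversion factor): keep the unit attached to the value so mixing metric and imperial data fails loudly at the interface instead of silently corrupting a calculation.

```python
# Minimal sketch (illustration only): carry the unit with the value so unit
# mismatches are caught at the interface rather than hidden in the numbers.
from dataclasses import dataclass

LBF_S_TO_N_S = 4.44822  # pound-force seconds to newton seconds

@dataclass(frozen=True)
class Impulse:
    value: float
    unit: str  # "N*s" or "lbf*s"

    def in_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit}")

ground_value = Impulse(12.4, "lbf*s")             # one team works in imperial units
onboard_value = ground_value.in_newton_seconds()  # the conversion is explicit
print(onboard_value)
```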

A troop transport Airbus A400M crashed near Seville, Spain, because shortly after takeoff the software shut down three of its four engines.

A Boeing 777 crash-landed short of the runway in San Francisco (Asiana 214). That one is close to my heart because, a couple of years before the accident, I had noticed an “odd” feature in the 777. It was odd to me because, given the level of automation found in that cockpit, using that feature required a serious understanding of cockpit automation and aircraft operation. At the time, I raised the question in the 777 community, who considered my finding (a mode where speed protection is removed) not relevant. It wasn’t a bug per se, because the feature was implemented as designed, and to change it I would have had to fight with Boeing and maybe ask for assistance from aviation authorities like the FAA or EASA. At least more training is now given around that feature to alert crews. The bottom line: When you are a tester, you need to explore the product and push it beyond the limits to discover where the limits really are.

Testing full flight simulators (FFS) software

I would like to take you into my world: Flight simulation. Long story short, when I was an airline pilot I had to be trained on flight simulators on a regular basis, and I ended up working 10 years for the company that makes those flight simulators.

Testing flight simulators is a classic process where you perform static code analysis, unit testing, and integration testing, and finally you go into the cockpit and spend a whole month applying test procedures derived from the real airplane to every system.

For debugging, we can look at the logs from the simulation and also at captured flight parameters, as we would with the black boxes used during a crash investigation. Aircraft equipment like ARINC bus analyzers can be used to tap data on the avionics buses, but mostly logs and debuggers are your best friends. On the Boeing 777, there is a dedicated computer (called the MAT) installed in the airplane and used by airline maintenance personnel. That computer can also be connected to the simulator. My point here is that the simulation is so precise that the real test equipment can’t tell the difference between the simulator and the real airplane. As an example, there is a video showing an air conditioning test using the MAT. We can perform that test in the simulator because some of the code executed in the simulator is the same code that runs in the airplane.

The challenge is the complexity and the knowledge required. In a flight simulator, almost every system is simulated: Fuel, anti-icing, electrical, engines, radar, radios, brakes, hydraulics, pneumatics, etc. But we also have to simulate “the world” with all the physics that comes with it: Radio wave propagation, the exact positioning of the GPS satellites in the sky, markings on the runway, hydroplaning, fuel expansion due to temperature, ice accretion, landing gear sounds, battery discharge, turbine blade expansion, pressure changes in the fuel system due to altitude, and many (many) more. The tester needs to know about real-time systems, software, mechanics, aerodynamics, turbine engines, hydraulics, physics, avionics, networks, electronics, etc., and also be able to fly the machine. In my case, my experience as an airline pilot was a real bonus.

The other challenge is the number of test requirements. The aviation authority requires us to apply its test requirements (647 pages of them) on top of the “internal” tests that we do, a mandatory process to bring the simulator to certification. Only once it is certified is an airline entitled to use that specific simulator. So how do we do that?

For flight performance, we inject flight parameters into the simulation and compare the dynamic outputs from the simulator with flight parameters recorded from the real airplane. 
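A minimal sketch of that comparison follows (my own illustration; the parameter names, sample values, and tolerance are invented, not actual certification data): drive the simulation with the recorded inputs, then check that each simulated parameter stays within an allowed tolerance of the value recorded on the real airplane.

```python
# Minimal sketch (invented data): flag every sample where the simulator drifts
# outside the allowed tolerance of the recorded flight test value.
def out_of_tolerance(recorded: list[float], simulated: list[float], tolerance: float) -> list[int]:
    """Return the sample indices where the simulator drifts outside tolerance."""
    return [
        i for i, (real, sim) in enumerate(zip(recorded, simulated))
        if abs(real - sim) > tolerance
    ]

recorded_pitch = [2.0, 2.5, 3.1, 3.0]   # degrees, from flight test data
simulated_pitch = [2.1, 2.4, 3.3, 3.0]  # degrees, from the simulator run
print(out_of_tolerance(recorded_pitch, simulated_pitch, tolerance=0.5))  # -> []
```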

For aircraft systems (fuel, landing gear, and so on) we apply a mix of procedures issued from maintenance manuals, system checkout documents, pilot experience, and some tests derived from the aviation authority requirements.

For the visual system, we use pictures of the actual airport, tools for measuring its geometry, and official drawings from the airport. The simulator has to behave exactly like a real airplane, and the crew shouldn’t see any difference between what they see in the simulator and what they would see at the airport while sitting in their cockpit (with a few exceptions).

To tie back to the TK 6491 accident described at the very beginning, the simulator has a sufficient level of fidelity that we can reproduce the fatal moments of that flight in the simulator. In that specific case, what helps us is the simulator’s implementation of the antennas and of how radio waves propagate.

To put you back in context, one of the causes of the TK 6491 accident was the autopilot capturing the wrong beam from the antenna that guides the plane down to the ground, the glide slope antenna. This should have been noticed by the pilots, but it wasn’t. That antenna is installed near the end of the runway, acting as a lighthouse in the vertical plane: “Follow the light beam and it will guide you toward the end of the runway.” But glide slope antennas aren’t like flashlights.

In the picture below, you can see the beams from the glide slope antennas. Note the real signal that pilots should follow (the true glide slope) and also the other radiated signals. It is physics; we can’t do much about it.


(Ref: antenna polar plots, How antennas radiate signals)

The takeaway is that a simulator can be used for investigation because it mimics the real airplane in many respects, and this also shows how much more complicated testing such a flight simulator can be.

The role testers can play in business improvement

We are now moving on to the core of the subject: learning from failures. As a tester, I can tell you that all the developers I have met are very keen on fixing the bugs that I have reported. The bug is analyzed, the code is changed and put into a release, and if everybody is happy we can close the issue in Jira with the reason “Coding error” or “Requirement unclear.”

But we rarely push a team to understand how that bug made its way into the product, and this is where observations made by testers can be very useful. Testers use the application or the product often; they see all kinds of problems (deployment tools, broken pipelines, requirements, bugs, servers down, etc.), and they usually have a very good feel for what the product should and shouldn’t do. They sit at an ideal observation spot from which they can see and use what has been manufactured.

I have a concrete example from the 777. That airplane is no different from a car: When you order it, you can select a lot of options (in the thousands). Some of those options are hardware and others are software, meaning that either a bit needs to be toggled somewhere to enable an option, or extra hardware needs to be installed, like a switch in the cockpit or blue carpet in the galley. Ultimately, most of them will have an impact on the simulation: Carbon brakes versus steel brakes, another galley, etc. When I was executing tests inside the simulator, I often bumped into issues related to misconfiguration: A missing switch here, a missing indication on the pilot’s screen there, and so on. As a tester, the process is simple (report the issue, wait for a fix, and retest), but I went the extra mile. After investigating why those issues were there, I discovered that we didn’t have a synthetic document listing all the options. Usually, a customer asks for a simulator “like that plane,” serial number xyz. To fix the root cause of that kind of problem (misconfiguration), I would now ask each customer for a copy of the 1400-page “Quick user Guide” from their specific airplane, skim the document, and circle in red the options they had. A software specialist would then process my markings and derive the appropriate option settings, which I would later check in the simulator to confirm the proper configuration. There it is: I had created a process that starts right after contract award and well before any line of code is written.
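To illustrate that final verification step, here is a minimal sketch (my own, with invented option names; the real process was document-based rather than this code): compare the options extracted for the customer’s airplane against what the simulator is actually configured with, and report every mismatch before anyone climbs into the cockpit to test.

```python
# Minimal sketch (option names invented): report every option that differs
# between the customer's airplane and the simulator configuration.
def config_mismatches(aircraft_options: dict, simulator_options: dict) -> dict:
    """Return {option: (expected, actual)} for every option that differs or is missing."""
    diffs = {}
    for option, expected in aircraft_options.items():
        actual = simulator_options.get(option)
        if actual != expected:
            diffs[option] = (expected, actual)
    return diffs

aircraft = {"brakes": "carbon", "hud": True, "extra_fuel_tank": False}
simulator = {"brakes": "steel", "hud": True}
print(config_mismatches(aircraft, simulator))
# {'brakes': ('carbon', 'steel'), 'extra_fuel_tank': (False, None)}
```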

Now, let’s do the math. To process one bug, you need to document it, enter it, debrief it, groom it, plan it, execute the fix, retest, document it in the release notes for production, etc. To be fair, call it 10 hours of work per bug. If you have 50 issues, you have already lost 500 hours.

If your testers are “just testing, writing bugs, and closing them,” then something might be wrong. Some testers love to do just that, while others are ready and willing to suggest improvements, not only to the product but sometimes to the process or the people. Testing is often treated as an expense, and any improvement is perceived as yet more expense, so take the opportunity to learn from your bugs.

Here is an example where you might want to listen to a tester. He complains about the response time of your online store. Before rejecting his complaint, it might be worth knowing how many customers leave the website with a cart full of items because the checkout process is just too slow or requires too many clicks.

The takeaway: A tester might be able to detect patterns based on his experience with the product and might be able to suggest improvements. “Coding error” appearing too often? Arrange coding training for that team. Give testers some time and appropriate tools for mining bug data and they will come back with something useful. But be supportive, because some items might need to be fixed in areas well out of a tester’s reach, at least in a large company.

Dashboards and test results

I don’t appreciate dashboards in general. If a manager wants a status on the product, he should either talk to me or read my report.

A concrete case: I was to test takeoffs and landings under various wind conditions and runway contaminants (water, snow, ice, slush). Unfortunately, the engines weren’t working (impossible to start due to the wrong software/hardware configuration in the cockpit panel). Instead of wasting my night shift, I decided to perform all the tests that could be done in the air with no engines running (flying a 260-ton bird with both engines dead is part of our training as pilots; ask Captain Sullenberger about the Hudson River!), plus some tests on the ground with external power connected to the airplane.

In the morning, when I was asked how my testing went, I reported that it was good: not what I had intended to do, but at least I had tested something else. Had I reported through a dashboard instead, I would have shown 135 checks failed out of 220 because the engines weren’t running, and the dashboard would have displayed more red than green. What would your reaction be when looking at those numbers? “Takeoffs and landings are really bad in that simulator!”

Another point with dashboards: Did someone ever ask you how much effort (pain, most of the time) it took you to bring all those tests to “green”? How did it compare to the previous run or product? “Jenkins was down, I didn’t have the proper version to pull from Git, the database content with synthetic data wasn’t ready so I had to create it manually, the disks were almost full because logs weren’t purged,” and the list goes on.

I usually present the following graph at conferences. It looks nice in terms of the testing progress made after each sprint. The graph is from a fictitious project where about 10 devs are working on a typical web store kind of application. Management asked for the graph and, to keep it simple, each test corresponds to a feature that has been developed.

Yes, at first glance it looks nice and encouraging. But it doesn’t mean much because other data are missing. If I now show the next graph, we can draw out something more interesting, such as the fact that testing progress is slower than development. Of course, it is a simplistic example, but my point is that a graph can be misleading.

The takeaways are:

  • I don’t completely reject dashboards, but before promoting a dashboard, I would like to know what the people interested in it are trying to measure, simply because as a test expert I might have a better, more accurate, more suitable indicator to offer.
  • Flight computers use several redundant sources of data and fuse them to evaluate the validity of what they measure. We should do the same and ensure that we can corroborate data to confirm that what we see actually makes sense (see the sketch below).
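Here is a minimal sketch of that cross-checking idea (my own illustration; the metric names, values, and threshold are invented): take redundant measurements of the same quantity, use the median as the fused value, and flag any source that disagrees too much, instead of trusting a single dashboard number.

```python
# Minimal sketch (invented data): fuse redundant readings with the median and
# flag any source that disagrees too much with the fused value.
from statistics import median

def fuse_and_check(readings: dict[str, float], max_spread: float) -> tuple[float, list[str]]:
    """Return the fused value and the list of sources that disagree with it."""
    fused = median(readings.values())
    suspects = [name for name, value in readings.items()
                if abs(value - fused) > max_spread]
    return fused, suspects

value, suspects = fuse_and_check(
    {"pass_rate_ci": 0.92, "pass_rate_nightly": 0.90, "pass_rate_manual": 0.55},
    max_spread=0.10,
)
print(value, suspects)  # 0.9 ['pass_rate_manual'] -> investigate before reporting green
```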

Lessons learned from testing the Boeing 777 flight simulator software

At least I have learned the airplane inside out! But more importantly, I learned that:

  • Communication skills and face-to-face contact are important. I had a large network in the company: Wiremen, electricians, people from avionics, visual, sound, motion, flight controls, engines, etc., and instead of just writing a Jira ticket I would visit them and talk face to face about an issue. I helped others debug code. (I was no longer just the “test pilot who writes tickets.”) In return, they were supportive, ready to come into the simulator or find me an aircraft part somewhere. I shook about 450 hands when I left!
  • Scrumfall can work. And it worked in our case, but that is another long story to tell.
  • In North America, it is OK to say “NO” as a tester (especially when you put your signature on official documents from the aviation authorities). Saying “no” can trigger a discussion and a negotiation. “No, I can’t perform all those tests in 3 days. Either you take a risk here and there, or you give me resources, or you review your plan. I suggest shallow testing on that module because, in seven months, I never saw a bug in that module. Is that risk acceptable?”
  • A good manager doesn’t care too much about dashboards—he cares about removing roadblocks so his team can get the job done. 
  • Involving testing as early as possible in the requirement/contract phase pays off in the long run; a software architecture might look superb but could be impractical to test. A requirement as seen by the tester might be too complex or debatable, while seeming clear to the author. Most of the time, the tester has to deal with the customer and the product, not the author of the requirements.
  • Test plans. I stopped investing too much in them. A project rarely goes as planned—the same with the test plan. Instead, I evaluate the situation and find the most appropriate tool/method for what is on my plate. And the situation will change anyhow in two months. 
  • Set up the right tools for data mining. Ensure that instead of “software error” or “coding error” you can enter things like “Pointer went out of array bounds in a loop” or “Reached timeout while listening on a socket” (see the sketch after this list).
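As a minimal sketch of what that data mining can look like (my own illustration; the bug IDs and categories are invented), record a specific failure class on every bug so you can later count the classes and decide where root cause analysis will pay off.

```python
# Minimal sketch (invented data): count bugs per specific failure class so the
# biggest classes become candidates for root cause analysis.
from collections import Counter

bugs = [
    {"id": "SIM-101", "category": "array index out of bounds in a loop"},
    {"id": "SIM-102", "category": "timeout while listening on a socket"},
    {"id": "SIM-103", "category": "array index out of bounds in a loop"},
    {"id": "SIM-104", "category": "misconfigured aircraft option"},
]

by_class = Counter(bug["category"] for bug in bugs)
for category, count in by_class.most_common():
    print(f"{count:3d}  {category}")
# The biggest class is where a root cause analysis (five whys, fishbone) starts.
```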

Conclusion

Investigating a problem down to its root causes is beneficial because it is a way to prevent the same kind of error from occurring again. Of course, you can’t afford to do that for every bug you encounter, and this is why each bug needs to be properly documented to facilitate data mining. The aim of data mining is to create bug classes and to perform root cause analysis on the classes, and on a sample of bugs from each class, to see if it leads to the same root causes. For root cause analysis, a simplified approach is the five whys or the fishbone model. Another success during my time on the 777 was the adoption of a mix of waterfall and scrum but, as I wrote before, that subject deserves its own talk at a testing conference.

About the Author

Alexandre Bauduin is a 53-year-old world traveler. He worked in consulting firms, gaining experience in several fields (medical, manufacturing, aerospace, pay TV, data warehousing, to name a few) in different countries (Switzerland, France, Spain, Canada, etc.). His career started in the space industry, where he discovered his passion for aerospace, working on both military and civilian projects. He was sometimes steered away from aerospace, but his passion pushed him to become an airline pilot as a way to really understand how the instruments he had programmed and integrated operate in a cockpit. One of his latest challenges was to organize flight simulator testing in a lean manufacturing environment. He works with milling machines, draftsmanship, accounting and finance, software development, electronic design, and industrial robots, and it is always fun for him to use an oscilloscope or an ARINC bus analyzer, step into assembly language, or stall a Boeing 777!
