During WWII, the US Navy was trying to figure out how to armor its planes. You can’t armor the whole plane, as this would make it too heavy, but you can’t use too little armor either, as this would leave the plane vulnerable.
The optimal point is somewhere in the middle where this tradeoff of armor vs speed is balanced. The mathematicians working with the US Navy gathered the data on where planes had been shot up and compiled the chart below.
The wings and central body seemed to be taking most of the enemy fire, so the initial idea was to reinforce the armor on those parts. However, Abraham Wald, a Hungarian Jewish mathematician who fled Austria to avoid persecution, pointed out that the armor should go on the other parts of the plane: the engines, the nose, and anything that doesn’t have a red dot on the image above.
Wald realized that the data points at their disposal had been gathered only from the planes that made it back to base. The fact that those planes had been hit in certain parts and still returned meant that hits to those parts were the least likely to bring down the plane and its pilot.
The planes that were hit in the engine or the nose almost never made it back in one piece. So, in fact, those were the most vulnerable parts, and that’s why there was so little data on hits to them: the planes that could have provided it were lost.
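To make Wald's reasoning concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the four section names, the idea that every section is equally likely to be hit, and the survival probabilities are invented numbers, not historical data. It only shows how counting hits on returning planes can point at exactly the wrong parts.

```python
import random
from collections import Counter

# Hypothetical chance that a plane survives a hit to each section.
# These numbers are made up for illustration, not taken from any real data.
SURVIVAL_PROB = {"wings": 0.95, "fuselage": 0.90, "nose": 0.40, "engine": 0.20}


def simulate(n_planes=10_000, seed=0):
    """Give each plane one hit and record which planes make it back."""
    rng = random.Random(seed)
    sections = list(SURVIVAL_PROB)
    all_hits = Counter()       # the full data set (never observed in reality)
    returned_hits = Counter()  # what the analysts at the base actually see
    for _ in range(n_planes):
        section = rng.choice(sections)  # assumption: hits are spread evenly
        all_hits[section] += 1
        if rng.random() < SURVIVAL_PROB[section]:
            returned_hits[section] += 1
    return all_hits, returned_hits


all_hits, returned_hits = simulate()
for section in SURVIVAL_PROB:
    print(f"{section:9s} all: {all_hits[section]:5d}  returned: {returned_hits[section]:5d}")
```

In the full data set every section is hit about equally often, but among the planes that return, engine and nose hits are rare. Someone looking only at the second column would conclude that the engine hardly ever gets hit, which is the mistake Wald caught.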
Wald recognized the survivorship bias in their data: in many cases, we tend to pay attention only to the successes and not to the failures. You only hear stories about the entrepreneurs who made it and never about the ones who failed (who may well be the majority), or about the athletes who became stars thanks to hard work, because we tend to forget all the data points that don’t make for a good story.
So, next time you’re asked to analyze a sample, think about Mr. Wald and how he saved many pilots’ lives by considering the whole data set, including the part that never made it back.