GreyMatter

A Test of Significance

Andrew Downard’s writeup on iSixSigma.com – “To p or not to p” – offers some interesting insights into the relevance of statistical decision-making to business problems in the context of Six Sigma.

I got to thinking about this topic after reading an article in the Wall Street Journal about “sloppy analysis” in scientific studies… This situation sounded very familiar to me. In fact, substitute “Six Sigma” for “science” in the first sentence and I think the passage becomes even more apt. We, as a Six Sigma community, rely far too much on formal tests of statistical significance to tell us what to do.

Statistical significance is nothing more and nothing less than a comparison of one thing to another. A comparison of a supposed “signal” to observed “noise” is the classic example. What gets forgotten is that when we experiment or otherwise collect data, we have complete and total control over what goes in both buckets. We decide what gets counted as signal, and what gets counted as noise. So the results depend entirely on how we sample. And there is no statistical test that can assess significance with this in mind, because it’s not a statistical question. It’s a practical one. Want better p-values? Sample differently. Want to make your F-test look good? Or bad? Change how you collect the data. Want that t-test to have a different result? Run the study again. Go ahead. Try. It’s really easy.
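To make that concrete, here is a minimal sketch in Python (using numpy and scipy) of one and the same process measured under two different sampling plans. Every number in it – the 0.5-unit improvement, the 2-unit shift-to-shift difference, the group size of 30 – is an illustrative assumption, not data from any real process.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

TRUE_EFFECT = 0.5    # hypothetical improvement we hope to detect
SHIFT_EFFECT = 2.0   # hypothetical day/night shift difference
N = 30               # observations per group

def sample(improved: bool, mixed_shifts: bool) -> np.ndarray:
    """Draw N observations from the same underlying process; the
    sampling plan decides whether shift-to-shift variation lands
    in the "noise" bucket."""
    shift = rng.integers(0, 2, N) if mixed_shifts else np.zeros(N)
    return (rng.normal(0.0, 1.0, N)             # common-cause noise
            + SHIFT_EFFECT * shift              # lurking shift variation
            + (TRUE_EFFECT if improved else 0.0))

# Plan A: both groups sampled within a single shift.
p_a = stats.ttest_ind(sample(improved=False, mixed_shifts=False),
                      sample(improved=True, mixed_shifts=False)).pvalue

# Plan B: same process, same true effect, but the groups are drawn
# across mixed shifts, so the shift effect inflates the noise.
p_b = stats.ttest_ind(sample(improved=False, mixed_shifts=True),
                      sample(improved=True, mixed_shifts=True)).pvalue

print(f"single-shift sampling: p = {p_a:.3f}")
print(f"mixed-shift sampling:  p = {p_b:.3f}")
```

Nothing about the process changes between the two plans, only what we allow into the noise bucket; on a typical run the mixed-shift p-value comes out larger, and both bounce around from seed to seed, which is rather the point.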

Too often, practitioners of Six Sigma rely on statistical tools and analyses as the foundation of their toolkit. While I am a big believer in data-based decision making, I often find myself in front of internal customers – the folks who run the day-to-day business – presenting slide after slide of heavy-duty statistical analyses, ranging from the simple Chi-Square test of significance to full-fledged DOEs and ANOVAs. Typically, projects with more such slides are regarded, by practitioners and business users alike, as having followed the rigour so necessary to Six Sigma.

I would go so far as to venture that the world of Six Sigma practitioners may very well be divided into those who believe that “statistical analysis” is the essence of all good work in the field, and those who value sound, data-based business understanding and analysis of the data at hand, without placing too much emphasis on knowledge of statistics.

In this context, the writeup by Downard, and Dr. Ioannidis’s work (PDF) on which it is based, provide a much-needed, albeit unpopular, perspective…

So what good are these tests of statistical significance? Well, for enumerative work on historical datasets they can be useful. But in the world of Six Sigma where we are charged with predicting the future behavior of a process, let me be clear: they aren’t much good at all. You should be making your own decisions on what is and isn’t significant in your data. This will be based on tolerance for risk and how well you have sampled the process, among other things. You need to fully understand the level of knowledge you have based on your sampling strategy, assess your confidence in your conclusions accordingly, and make the best decision you can about how to proceed based on the particular situation you are in.
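The flip side deserves a quick illustration too: with a large enough sample, a difference far too small to matter in practice will still clear the conventional 0.05 bar. The 0.01-unit difference and unit standard deviation below are purely illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# A hypothetical 0.01-unit difference on a process with unit standard
# deviation: trivial in practice, yet "statistically significant"
# once the sample is large enough.
for n in (1_000, 100_000, 1_000_000):
    before = rng.normal(0.00, 1.0, n)
    after = rng.normal(0.01, 1.0, n)
    p = stats.ttest_ind(before, after).pvalue
    print(f"n = {n:>9,}: p = {p:.4g}")
```

The test answers a narrow question about signal versus sampled noise; whether a 0.01-unit shift is worth acting on is a business judgment no p-value can make for you.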

As Downard writes in his concluding remarks, “Beyond some basic number-crunching, these are practical questions and concerns, not statistical ones.”