EDN Asia: Test & Measurement

Scientific results evaluation: Rules of Thumb

10 Mar 2016  | Ransom Stephens


In particle physics, where statistics are much easier to come by and systematic uncertainties are usually much lower than in the behavioural sciences, a 3-σ result (a fluctuation that random chance alone would produce only about 0.3% of the time) is considered "evidence," and 5-σ or 6-σ, depending on how pivotal the result is, is required before an experiment may use the word "discover." Even then, an independent experiment must confirm the result before it's taken seriously.
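To make the σ-to-probability mapping concrete, here is a minimal Python sketch. It assumes a Gaussian (normal) distribution, and `two_sided_tail` is a hypothetical helper name rather than a library function. It relies on the standard identity P(|Z| > k) = erfc(k/√2), using `erfc` from Python's `math` module. This is the two-sided convention, which matches the 0.3% quoted for 3-σ. Particle physicists usually quote the one-sided probability for discoveries, which is half as large.

```python
import math

def two_sided_tail(k: float) -> float:
    """Probability that a Gaussian fluctuation lands more than k sigma from the mean, on either side."""
    # For a standard normal variable Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2))

if __name__ == "__main__":
    for k in range(1, 7):
        p = two_sided_tail(k)
        # Print as a percentage and as "1 in N" odds.
        print(f"{k} sigma: {100 * p:.2g}% (about 1 in {1 / p:,.0f})")
```

The printed probabilities are roughly 32% at 1-σ, 4.6% at 2-σ and 0.27% at 3-σ. By 5-σ they fall below one in a million.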

Effect of the people you hang out with on your opinions

Let's try a trickier example, a real one. Work cited in Susan Cain's book Quiet (a book I'm quite fond of because it argues that introverts, people who don't like lots of visitors, are superior to extroverts, like the queen) presented evidence that our opinions are likely to conform to those of the people we hang out with.

The study involved 32 people who were given simple quizzes. When acting alone, they got the right answers 86% of the time. When they worked in groups that were seeded with moles who insisted the wrong answer was right, their success dropped to 59%. The study is pretty interesting, but all we care about here is the degree of significance of the results, that is, where to score the conclusions on the scale from inconclusive to conclusive, evidence to discovery.

With a group of 32 people, we start with a statistical uncertainty of √32, which is about 5.6, or 18% of the group. If we repeated the experiment many times, we'd expect 5 or 6 people with a much different demeanor in 30-ish% (1-σ) of the experiments, 11 people of widely varying demeanor in 5-ish% (2-σ), 17 in 0.3-ish% (3-σ), and so on. For this experiment, the relevant component of demeanor is impressionability, in either direction: less or more impressionable.
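The same arithmetic fits in a few lines of Python. This sketch assumes, as the rule of thumb does, that the statistical uncertainty on a group of n is √n and that the fluctuations are Gaussian. `sigma_counts` is a hypothetical helper name.

```python
import math

def sigma_counts(n: int, max_k: int = 4):
    """Return (k, people, tail probability) for k = 1..max_k sigma in a sample of n.

    Rule of thumb: the statistical uncertainty on n is sqrt(n), so a k-sigma
    fluctuation involves about k * sqrt(n) people. The tail probability is the
    two-sided Gaussian chance of a fluctuation at least that large.
    """
    sigma = math.sqrt(n)
    rows = []
    for k in range(1, max_k + 1):
        people = k * sigma
        tail = math.erfc(k / math.sqrt(2))
        rows.append((k, people, tail))
    return rows

if __name__ == "__main__":
    n = 32
    sigma = math.sqrt(n)
    print(f"sqrt({n}) = {sigma:.2f}, i.e. {100 * sigma / n:.0f}% of the group")
    for k, people, tail in sigma_counts(n):
        print(f"{k} sigma: ~{people:.1f} people, in ~{100 * tail:.2g}% of repeated experiments")
```

For n = 32 it gives about 5.7, 11.3, 17.0 and 22.6 people at 1-σ through 4-σ.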

Should I talk about this experiment at cocktail parties? Well, at the 4-σ level of significance, about 23 of the 32 people (4 × 5.6) could yield widely different results from the original 32, which is pretty bad. Now, if I look closely at the account in Quiet or in The New York Times, I'd see that the research builds on previous research and that the researchers are both careful and very clever. If I wade through the actual publication, Neurobiological Correlates of Social Conformity and Independence During Mental Rotation, the conclusions are more convincing. Still, a sample of 32 people is not enough to ensure that, if the experiment were repeated many times, the average results would be consistent with those obtained from the original 32 people.
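The √N rule also runs in reverse. The relative uncertainty is √N/N = 1/√N, so halving it takes four times as many people. This sketch finds the smallest group whose purely statistical uncertainty falls below a target. `required_sample_size` is a hypothetical helper name, and the calculation says nothing about systematic uncertainties.

```python
import math

def required_sample_size(relative_uncertainty: float) -> int:
    """Smallest n with 1/sqrt(n) <= relative_uncertainty (the sqrt(N) rule of thumb)."""
    if not 0 < relative_uncertainty < 1:
        raise ValueError("relative_uncertainty must be between 0 and 1")
    # 1/sqrt(n) <= r  is equivalent to  n >= 1/r**2
    n = math.ceil(1 / relative_uncertainty**2)
    # Guard against floating-point round-off in either direction.
    while n > 1 and 1 / math.sqrt(n - 1) <= relative_uncertainty:
        n -= 1
    while 1 / math.sqrt(n) > relative_uncertainty:
        n += 1
    return n

if __name__ == "__main__":
    for r in (0.18, 0.10, 0.05, 0.01):
        print(f"{100 * r:.0f}% statistical uncertainty needs about {required_sample_size(r):,} people")
```

A 10% uncertainty needs 100 people, 5% needs 400, and 1% needs 10,000.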

On the other hand, because their conclusion is consistent with my own flavour of "common sense," I'm likely to believe it. This is where we get into trouble. In empirical science, "common sense" is prejudice: believing a result because the underlying mechanism makes sense rather than because the measurement is significant is not science.

At this point it's useful to consider the systematic uncertainties: could the moles in the experiment (the people who insisted on the wrong answers) have telegraphed something to the subjects? Of course they could. To measure that, the experiment needs to be repeated with moles who insist on the wrong answers with different levels of vehemence. It could also be repeated in different coloured rooms, with different music playing, with MSNBC or Fox News in the background, on a Monday before coffee, or under any other variation that might expose unknown, but inevitable, systematic biases.

Not-quite-full disclosure of the Rules of Thumb

These rules of thumb assume that the measurements follow what's called a Gaussian distribution, that is, that the random processes assemble themselves into the so-called bell curve or normal distribution shown in Figure 2. This assumption is usually pretty good, but rarely exactly true. Good experimentalists make extensive measurements of backgrounds to determine the actual distribution and then apply rigorous statistical analysis to their results to determine their uncertainties. Measuring uncertainties usually takes much more effort than measuring the signal itself, which makes it all the more disheartening when science journalists neglect to report experimental uncertainties.
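One way to see "usually pretty good, but rarely exactly true" is to compare the Gaussian with a distribution that is not Gaussian. This sketch assumes a Poisson-distributed count, the counting distribution behind the √N rule. Its mean of 32 is a hypothetical choice that echoes the example above. The sketch computes the exact probability of landing more than k·√32 below or above the mean and prints it next to the Gaussian value. The helper names are hypothetical. Expect the two to agree reasonably well near 1-σ and to drift apart farther out, with the Poisson tails lopsided because the distribution is skewed.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    # Work in log space so large factorials don't overflow.
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def poisson_tails(k: float, mu: float):
    """Exact (lower, upper) probabilities that a Poisson count lies more than k*sqrt(mu) from mu."""
    sigma = math.sqrt(mu)
    limit = int(mu + 20 * sigma) + 1  # terms beyond this are negligible
    lower = sum(poisson_pmf(x, mu) for x in range(limit + 1) if x < mu - k * sigma)
    upper = sum(poisson_pmf(x, mu) for x in range(limit + 1) if x > mu + k * sigma)
    return lower, upper

def gaussian_tail_one_side(k: float) -> float:
    # The Gaussian is symmetric, so each side carries half of erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2)) / 2

if __name__ == "__main__":
    mu = 32.0  # hypothetical mean count
    for k in range(1, 5):
        lower, upper = poisson_tails(k, mu)
        g = gaussian_tail_one_side(k)
        print(f"{k} sigma: Poisson low {lower:.2e}, high {upper:.2e}; Gaussian {g:.2e} each side")
```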

Next time, we'll play Find The Signal!

About the author
Ransom Stephens is a technologist, science writer, novelist, and Raiders fan.
