This is an interesting tidbit from David Lykken’s “Professional Autobiography,” which was once available on the University of Minnesota Psychology Department’s website:
When I was a graduate student circa 1950, I had a job for several months in the Student Counseling Bureau analyzing the returns from an “After High School What?” survey that one of the counseling faculty had administered to 57,000 seniors in Minnesota high schools. In the basement of Eddy Hall, I would run boxes of IBM cards, each bearing the responses of one student, through the IBM sorting machine. A few years later, when I was on the faculty myself, Paul Meehl and I used those data for our unpublished “crud factor” study in which we showed that, in psychology, everything is related to everything else, at least a little bit. We cross-tabulated all possible pairs of 15 categorical variables on the questionnaire and computed Chi-square values. All 105 Chi-squares were statistically significant and 96% of them at p less than 10⁻⁶. Thus, we found that a majority (52%) of Episcopalians “like school” while only a minority (47%) of Lutherans do. Fewer ALC Lutherans than Missouri Synod Lutherans play a musical instrument.
What this silly-sounding study implies is that Group A is bound to differ from Group B on Variable X so that, if your theory predicts that A > B, you have about a 50:50 chance of confirming that prediction empirically, at least if you have a large enough sample, even if your theory is dead wrong.
Meehl used these data as illustrations in a 1967 paper in Philosophy of Science. He pointed out that the physical sciences, whose theories are strong enough to permit point predictions (Group A will average 125% of Group B’s score, rather than merely A > B), use significance tests in a way that is obverse to the way they are used in the soft sciences. Psychologists say, e.g., that X and Y will be correlated positively and, if that much proves true, then we try to “reject the null hypothesis” by showing that the correlation is so far above the zero or null point, that there is less than one chance in 20 (or more) that the true value of the correlation (which our obtained value estimates) could be as low as zero.
One unhappy consequence of this way of proceeding is that our conclusions become more suspect as our experiment gets better! If we use good, reliable measures of X and Y, then we are more likely to detect the (almost inevitable) correlation between them, and the larger our sample, the more likely it is that this detected correlation will be statistically significant, i.e., have a small enough sampling error and be far enough from zero to believe it really is not zero. A cheap, crappy experiment with poor measures and a small sample that can report a statistically significant result is therefore regarded as more persuasive than a good, big study!
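To see how sample size alone drives significance, here is a minimal Python sketch. It takes the 52% versus 47% “like school” gap from the passage and tests it at several group sizes. The group sizes are my own assumptions (the passage does not give them), and the function name `two_group_chi_square` is just a label for this illustration.

```python
import math

def two_group_chi_square(n_per_group, rate_a, rate_b):
    """Pearson chi-square (df = 1) and p-value for a 2x2 table built from
    two equal-sized groups with the given 'yes' rates."""
    yes_a = round(n_per_group * rate_a)
    yes_b = round(n_per_group * rate_b)
    table = [[yes_a, n_per_group - yes_a],
             [yes_b, n_per_group - yes_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Assumed (hypothetical) group sizes; the 52% and 47% rates are from the quote.
for n in (100, 1_000, 10_000):
    stat, p = two_group_chi_square(n, 0.52, 0.47)
    print(f"n per group = {n:>6}: chi2 = {stat:7.2f}, p = {p:.2g}")
```

Under these assumed sizes, the same five-point gap is nowhere near significant at 100 per group, slips under p < 0.05 at 1,000 per group, and falls below 10⁻⁶ at 10,000 per group. Nothing about the effect changed; only the sample did.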
I’ve added some bolding for emphasis. This form of data-mining isn’t really discussed anywhere, from what I can tell, in the popular press or even in academic settings.
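For anyone who wants to try this kind of all-pairs cross-tabulation on their own data, here is a rough sketch of the mechanics in Python. The variable names and the example table are placeholders I made up, not anything from the original survey, and `pearson_chi_square` is simply the textbook statistic, not necessarily the exact procedure Lykken and Meehl ran.

```python
from itertools import combinations

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Placeholder names standing in for the 15 questionnaire variables.
variables = [f"var_{k}" for k in range(1, 16)]
pairs = list(combinations(variables, 2))
print(len(pairs))  # 15 choose 2 = 105, matching the count in the quote

# Made-up 2x3 cross-tabulation for one pair of variables.
example_table = [[30, 50, 20],
                 [20, 50, 30]]
stat, dof = pearson_chi_square(example_table)
print(stat, dof)
```

Each of the 105 pairs would get its own table and its own test, which is exactly why a big enough sample turns up a “finding” nearly everywhere you look.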
Professor Lykken’s autobiography is worth a read.