Recent argy-bargy about a failed replication has exposed a disturbing belief in some corners of psychological research: that one
experiment can be said to “conceptually replicate” another, even if it uses a
completely different methodology.
John
Bargh, a professor of psychology at Yale, made waves recently with a stinging attack on virtually everyone associated with a failed attempt to replicate one
of his previous findings. The specifics of this particular tango de la muerte can be found elsewhere, and I won’t repeat them
here, except to say that I thought Bargh’s misrepresentation of the journal PLoS One was outrageous, offensive, and an extraordinary own goal.
That
aside, Bargh-gate has drawn out a more important issue: the idea of “conceptual replication”. Ed Yong's article, and the comments beneath it, exposed an unusual disagreement, with
some (including Bargh himself) claiming that Bargh et al.'s original findings had been replicated many times, and
others claiming that they had never been replicated.
How
is this possible? Clearly something is awry.
All
scientists, and many non-scientists, will be familiar with the basic idea of replication:
that the best way to tell whether a scientific discovery is real is to repeat the experiment that
originally found it. Replication is one of the bedrocks of science. It helps scientists
achieve consensus and it acts like an immune system, eliminating findings that
are irreproducible due to methodological error, statistical error or fraud.
It
also goes without saying that the most important aspect of replication is to repeat
the original experiment as closely as possible. This is why scientific journal articles contain a Method section, so that other scientists can precisely reproduce your experimental conditions.
Enter
the notion of “conceptual replication”. If you are a scientist and you’ve never
heard of this term, you are not alone. The other day I did a straw poll of my colleagues – who are mostly experimental psychologists and neuroscientists – and got blank looks in response.
The
basic idea is this: that if an experiment shows evidence for a particular
phenomenon, you can “conceptually” replicate it by doing a completely different
experiment that someone – the experimenter, presumably – believes measures a broadly
similar phenomenon. Add a pinch of assumption and a healthy dose of subjectivity,
and voilà, you’ve just replicated the
original ‘concept’.
I
must admit that when I first heard the term “conceptual replication”, I felt
like shining a spotlight on the clouds and calling for Karl Pilkington. Psychology
is already well known for devaluing replication and we do ourselves no favours
by attempting to twist the notion of replication into something it isn’t, and
shouldn’t be.
Here are four reasons why.
1. Conceptual replication is assumption-bound and subjective
From a logical point of view, a conceptual replication can only hold if the different methods used in two
different studies are measuring the same phenomenon. For this to be the case, definitive evidence must exist that they are. But how often does such evidence exist?
Even if we meet this standard (and the bar seems high), how similar must the methods be for a study to qualify as being conceptually replicated? Who decides and by what objective criteria?
2. Conceptual replications can be “unreplicated”
A reliance on conceptual
replications can be easily shown to produce absurd conclusions.
Consider
the following scenario. We have three researchers, Smith, Jones, and
Brown, who publish three scientific papers in a sequence.
Smith
gets the ball rolling by showing evidence for a particular phenomenon.
Jones
then comes along and uses a different method to show evidence for a phenomenon
that looks a bit like the one that
Smith discovered. The wider research community decide that the similarity
crosses some subjective threshold (oof!) and so conclude that Jones
conceptually replicates Smith.
Enter
Brown. Brown isn’t convinced that Smith and Jones are measuring the same
phenomenon and hypothesises that they could actually be describing different phenomena. Brown
does an experiment and obtains evidence suggesting that this is indeed the case.
We
now enter the ridiculous, and frankly embarrassing, situation where a finding
that was previously replicated can become unreplicated. Why? Because we assumed without
evidence that Smith and Jones were measuring the same phenomenon when they were
not. It’s odd to think that a community of scientists would actively engage in
this kind of muddled thinking.
3. Conceptual replications exacerbate confirmation bias
Conceptual
replications are vulnerable to a troubling confirmation bias and a logical
double-standard.
Suppose
two studies draw similar conclusions using very different methods. The second
study could then be argued to "conceptually replicate" the first.
But suppose the second study drew a very different conclusion. Would it be seen to
conceptually falsify the first study? Not in a million years. Researchers would immediately point to the multitude of differences in
methodology as the reason for the different results. And while we are all busily congratulating ourselves for being so clever, Karl Popper is doing somersaults in his grave.
4. Conceptual replication substitutes for, and devalues, direct replication
I
find it depressing and mystifying that direct replication of specific experiments in
psychology and neuroscience is so vital yet so grossly undervalued. Like many
cognitive neuroscientists, I have received numerous rejection decisions over
the years from journals, explaining in reasonable-sounding boilerplate that their decision "on this occasion" was due
to the lack of a sufficiently novel contribution.
Replication has no place
because it is considered boring. Even incremental research is difficult to publish.
Instead, reproducibility has been trumped by novelty and the quest for breakthroughs. Certainty has given way to the X factor. At dark
moments, I wonder if we should just hand over the business of science to Simon
Cowell and be done with it.
The upshot
First,
we must jettison the flawed notion of conceptual replication. It is vital to seek converging evidence for particular phenomena using
different methodologies. But this isn’t
replication, and it should never be regarded as a substitute for
replication. Only experiments can be replicated, not concepts.
Second,
journals should specifically ask reviewers to assess the likely reproducibility
of findings, not just their significance, novelty and methodological rigour. As
reviewers of papers, we should be vocal in praising, rather than criticising, a
manuscript if it directly replicates a previous result. Journal editors should
show some spine and actively value replication when reaching decisions about
manuscripts. It is not acceptable for the psychological community to shrug its shoulders and complain that "that's the way it is" when the policies of journal editing and manuscript reviewing are entirely in our own hands.
Psychology can ill afford the kind of muddled thinking that gives rise to the notion of conceptual replication.
The field has taken big hits lately, with prominent fraud cases such as
Diederik Stapel producing very bad publicity. The irony of the
Stapel case is that if we truly valued actual replication, rather than Krusty-brand replication, his fraud could have been
exposed years sooner, before he had made such a damaging impact. It makes me wonder: how
many researchers have, with the very best of intentions, fallen prey to the
above problems and ‘conceptually’ replicated Stapel's fraudulent discoveries?
Comments

I can say that I had also never heard of the idea of "conceptual replication", and I agree that the idea is fatally flawed for the reasons that you clearly spell out.
Science clearly has good reason to give incentives for novel, ground-breaking findings that push the boundaries of science. Thus, it makes sense to reward scientists who produce such findings. However, a flood of novel findings needs a further process to select which ones actually hold up to long-term scrutiny and comparison with other evidence. This is the role of replication (and replicate-and-extend). But this can only happen when researchers are given incentives that are aligned with that goal. The balance of incentives has tipped too far in the direction of novelty. There is room for both incremental replication work and broadly creative, novel work within science. We need both, in balance. But in order to get both, we need to fund and reward both!
Thanks for commenting, Joe.
I agree that we need incentives to pursue direct replications because at the moment all we have are powerful disincentives.
So how can we encourage replication? What about a replication index for specific journals (or even specific researchers) that reports the extent to which specific experiments and key findings are replicated? Or, because scientists flock to any publication with the word "Nature" in the title like moths to a flame, how about a new journal Nature Replications? Or a funding agency specifically set up to fund direct replications of existing work?
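To make the replication index idea a little more concrete, here is a minimal sketch in Python of how such an index might be computed for a journal. Everything in it is an assumption for illustration only: the Finding record, its field names, and the made-up numbers are mine, and no such dataset or standard metric exists as far as I know.

```python
# Purely illustrative sketch of a journal "replication index": the
# proportion of a journal's published findings that have at least one
# successful direct replication. The Finding record and all numbers
# below are hypothetical assumptions, not a real dataset or metric.

from dataclasses import dataclass

@dataclass
class Finding:
    journal: str
    replication_attempts: int     # published direct replication attempts
    successful_replications: int  # attempts that reproduced the effect

def replication_index(findings: list[Finding], journal: str) -> float | None:
    """Fraction of a journal's findings with >= 1 successful direct replication."""
    published = [f for f in findings if f.journal == journal]
    if not published:
        return None  # index undefined when the journal has no findings
    replicated = sum(1 for f in published if f.successful_replications >= 1)
    return replicated / len(published)

# Example with made-up numbers: one finding out of three has been
# successfully replicated, giving an index of 1/3.
findings = [
    Finding("Journal A", replication_attempts=3, successful_replications=2),
    Finding("Journal A", replication_attempts=0, successful_replications=0),
    Finding("Journal A", replication_attempts=1, successful_replications=0),
]
print(replication_index(findings, "Journal A"))  # 0.333...
```

One could argue about the right definition (should the index count attempts rather than findings, or weight by sample size?), but even a crude number like this would make a journal's track record on reproducibility visible at a glance.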
I hadn't heard of conceptual replication until this recent kerfuffle either. I had always referred to this process as "converging evidence", which I think is a better term for it. The evidence relates, it converges on a concept, but it doesn't try to leverage the word "replication" for increased credibility.
Thanks for this post, I am going to add it to my link roundup and Storify of the whole Bargh/Doyen business. In retrospect, I found myself defending conceptual replication by simply reframing it in my head as the value of converging evidence in science (whatever name we use for it). When Ed Yong had his doubts about weak studies propping each other up, he could have just as easily been worrying about converging evidence (using different measures, or different manipulations). But I agree that calling it "replication" is not kosher, and devalues direct replication (which already has a hard time anyway with current publication biases).
Thanks Cedar, I enjoyed your very thorough round-up piece - and thanks for including a link back to my blog.
ReplyDeleteBargh's comments are sometimes extremely arrogant and nasty personal attacks on Doyen et al. - and I think this does him a huge disservice by focusing us folks on how personnally threatened he is, instead of on the content of his remarks. But set aside that, I'm not at all convinced that (1) he's wrong about what he criticizes in Doyen et al.'s paper and (2) he misrepresents the journal PLos one. Which means I think you're wrong (:-) ) when you're finding him "outrageous, offensive, and an extraordinary own goal" about PLoS one.
How do you think we should protect our science from journals that publish stuff that's not reviewed by professionals in our fields? I know there are endless power and political issues with publishing work that goes against the tide, and the Doyen work possibly ended up here because it didn't have enough influence to pass through the gates of the political interests of powerful people in the field of semantic priming. But still, I don't trust PLoS One. I don't trust it because I expect more than a "test of scientific soundness" from a journal that expects to be taken seriously. I expect it to have reviewers who know what I mean when I say "social cognition", for example, not reviewers who look at my 2 x 2 design and say ok, she's done it right. As a scientist, I need reviews from my peers, preferably people who are not offended when I show their work was crap, to sieve my work and set aside my own biases and the junk I can infuse my work with. So from this point of view alone (there may be tons of others), I need reliable peer review. Then when someone points at a lack of peer review, why does it make more noise than it should?
Convergent evidence and conceptual replication are the same thing. Sure, replication is synonymous with duplication, cloning, copying, mirroring, etc., but the term conceptual replication doesn't hide that it's not an exact replication. If you haven't heard the term before, you might be puzzled about its definition, but you know it's something along the lines of, "replication, but not quite."
You're right to criticize the paucity of replications of specific effects (i.e., exact replications), but conceptual replications (i.e., convergent evidence) are necessary for scientific progress.