It’s been quite a
month in science.
On the bright side, we
probably discovered the Higgs boson (or at least something that smells
pretty Higgsy), and in the last few days the UK
Government and EU
Commission have made a strong commitment to supporting open-access
publishing. In two years, so they say, all published science in Britain will be
freely available to the public rather than being trapped behind corporate
paywalls. This is a tremendous move and I applaud David Willetts for his
political courage and long-term vision.
On the not-so-bright
side, we’ve seen a flurry of academic fraud cases. Barely a day seems to pass without yet another researcher caught
spinning yarns that, on reflection, did sound pretty far-fetched in the first
place. What’s that? Riding up rather than down an escalator makes you
more charitable? Dirty bus stops make you more racist? Academic fraudsters are
more likely to have ground-floor offices? Ok, I made that last one up (or
rather, Neuroskeptic
did) but if such findings sound like bullshit to you, well funnily enough
they actually are. Who says science isn’t self-correcting?
We owe a great debt to
Uri Simonsohn, the one-man internal affairs bureau, for judiciously uncovering at
least three cases of fraudulent practice in psychological research. So far
his investigations have led to two resignations and counting. Bravo. This is a
thankless task that will win him few friends, and for that alone I admire him.
And as if to remind us
that fraud is by no means unique to psychology, enter the towering Godzilla of
mega-fraud – Japanese anaesthesiologist, Yoshitaka Fujii, who
has achieved notoriety by becoming the most
fraudulently productive scientist ever known.
(As an aside, has
anyone ever noticed how the big frauds in science always seem to be perpetrated by
men? Are women more honest or do they just make savvier fraudsters?)
Along with all the
talk of fraud in psychology, we have had to tolerate the usual line-up of ‘psychology
isn’t science’ rants from those who ought to learn something before setting hoof to keyboard.
Fortunately we have Dave Nussbaum to sort these guys out, which he does
with a steady hand
and a sharp blade. Thank you, Dave!
With psychological
science facing challenges and shake-ups on so many different fronts, the time seems ripe for
some self-reflection. I used to believe we had a firm grasp on methodology and best practice. Lately I’ve
come to think otherwise.
So here’s a dirty
dozen of suggested fixes for psychology and cognitive neuroscience research
that I’ve been mulling over for some time. I want to stress that I deserve no
credit for these ideas, which have all been proposed by others.
1. Mandatory inclusion of raw data with manuscript submissions
No ifs. No buts. No hiding
behind the lack of ethics approval, which can be readily obtained, or the
vagaries of the Data Protection Act. Everyone knows data can be anonymised.
2. Random data inspections
We should conduct
fraud checks on a random fraction of submitted data, perhaps using the
methodology developed by Uri Simonsohn (once it is peer reviewed and judged
statistically sound – as I write this, the technique hasn’t yet been
published). Any objective test for fraud must have a very low false
discovery rate because the very worst thing would be for an
innocent scientist to be wrongly indicted. Fraudsters tend to repeat their
behaviour, so the likelihood of false positives in multiple independent data
sets from the same researcher should (hopefully) be infinitesimally small.
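To put rough numbers on that last point, here is a minimal sketch. It assumes the fraud test wrongly flags an honest data set with probability alpha, and that a researcher's data sets are independent of one another. The function names, the 1% rate and the 10,000 researchers are made up for illustration; none of it is taken from Simonsohn's method.

```python
def chance_all_flagged(alpha, k):
    """Probability that k independent, honest data sets are ALL wrongly flagged,
    given a per-data-set false positive rate of alpha."""
    return alpha ** k


def expected_wrongly_indicted(n_researchers, alpha, k):
    """Expected number of honest researchers (out of n_researchers) who would be
    wrongly indicted if indictment requires k flagged data sets."""
    return n_researchers * chance_all_flagged(alpha, k)


alpha = 0.01  # assumed per-data-set false positive rate (illustrative only)
for k in (1, 2, 3):
    print(k, chance_all_flagged(alpha, k), expected_wrongly_indicted(10_000, alpha, k))
```

Under these assumptions, requiring a flag in several independent data sets shrinks the risk to an innocent researcher multiplicatively. The independence assumption is doing a lot of work, though: if one lab's data sets share an unusual but honest quirk, the real risk would be higher.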
3. Registration of research methodology prior to publication
Some time ago, Neuroskeptic
proposed that all publishable research should be pre-registered prior to being
conducted. That way, we would at least know from the absence of published
studies how big the file-drawer is. My first thoughts on reading this were: why
wouldn’t researchers just game the system, “pre” registering their research
after the experiments are conducted? And what about off-the-cuff experiments conjured up over a beer in the pub?
As Neuroskeptic points
out, the first problem could be solved by introducing a minimum 6-month delay
between pre-registration and data submission. Also, all prospective co-authors of a
pre-registration submission would need to co-sign a letter stating that the research
has not yet been conducted.
The second problem is
more complicated, but also tractable. My favourite solution is one posed by Jon Brock. Empirical publications could be divided
into two categories, Experiments and Observations. Experiments
would be the gold standard of hypothesis-driven research. They would be pre-registered
with methods (including sample size) and proposed analyses pre-reviewed and unchangeable
without further re-review. Observations would be publishable but have a lower weight. They could be submitted without
pre-registration, and to protect against false positives, each experiment from
which a conclusion is drawn would be required to include a direct internal
replication.
4. Greater emphasis on replication
It’s a tired cliché,
but if we built aircraft the way we do psychological research, every new plane
would start life exciting and interesting before ending in an equally exciting fireball. Replication
in psychology is dismally undervalued, and I can’t really figure out why this is when
everyone, even journal editors, admits how crucial it is. It’s as though
we’re trapped in some kind of groupthink and can’t get out. One solution,
proposed by Nosek, Spies and Motyl,
is the development of a metric called the Replication Value (RV). The RV would
tell us which effects are most worth replicating. To quote directly from their
paper, which I highly recommend:
Metrics to
identify what is worth replicating. Even if valuation of replication increased, it is not feasible – or
advisable – to replicate everything. The resources required would undermine
innovation. A solution to this is to develop metrics for identifying Replication
Value (RV)– what effects are more worthwhile to replicate than others? The
Open Science Collaboration (2012b) is developing an RV metric based on the
citation impact of a finding and the precision of the existing evidence of the
effect. It is more important to replicate findings with a high RV because they
are becoming highly influential and yet their truth value is still not
precisely determined. Other metrics might be developed as well. Such metrics
could provide guidance to researchers for research priorities, to reviewers for
gauging the “importance” of the replication attempt, and to editors who could,
for example, establish an RV threshold that their journal would consider as
sufficiently important to publish in its pages.
I think this is a
great idea. As part of the manuscript reviewing process, reviewers could assign an RV to
specific experiments. Then, on a rolling basis, the accepted studies that are
assigned the highest weightings would be collated and announced. Journals could
have special issues focusing on replication of leading findings, with specific labs
invited to perform direct replications and the results published regardless of
the outcome. This method could also bring in adversarial collaborations, in which
labs with opposing agendas work together in an attempt to reproduce each other’s results.
5. Standardise acceptable analysis practices
Neuroimaging analyses
have too
many moving parts, and it is easy to delude ourselves that the approach which ends up ‘working’ (after countless reanalyses) is the one we originally intended. Psychological analyses have fewer degrees of freedom but this is still a
major problem. We need to formulate a consensus view on gold standard
practices for excluding outliers, testing and reporting covariates, and
inferential approaches in different situations. Where multiple legitimate options exist, supplementary
information should include analyses of them all, and raw data should be
available to readers (see point 1).
6. Institute standard practices for data peeking
Data peeking isn't necessarily bad, but if we do it then we need to correct for it. Uncorrected peeking runs riot in
psychology and neuroimaging because the pressure to publish and the dependence
of publication on significant results have made chasing p-values the norm. We can
see it in other areas of science too. Take the Higgs. Following initial
hints at 3-sigma last year, the physicists kept adding data until they reached
5-sigma. The fact that their alpha is so stringent in the first place provides
reassurance that they have genuinely discovered something. But if they peeked
and chased then it simply isn’t the 5-sigma discovery that was advertised. (As
a side note: how about we ditch Fisher-based stats altogether and go Bayesian? That way we can actually test that pesky null hypothesis.)
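For anyone who doubts how much uncorrected peeking matters, here is a minimal simulation sketch. It makes some simplifying assumptions: the null hypothesis is true, the data are normal with a known standard deviation of 1, and we run a two-sided z-test every few observations, stopping as soon as it comes up "significant". The function name and all parameter values are illustrative, and the stricter threshold of 2.555 is the commonly tabulated Pocock boundary for ten equally spaced looks, used here as just one possible correction.

```python
import math
import random


def false_positive_rate(n_max, peek_every, z_crit=1.96, n_sims=2000, seed=0):
    """Estimate how often a TRUE null is rejected when we test after every
    `peek_every` observations and stop at the first |z| > z_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # null is true: mean 0, sd 1
            if n % peek_every == 0:
                z = total / math.sqrt(n)  # z-test with known sd = 1
                if abs(z) > z_crit:
                    rejections += 1
                    break  # "significant" result found, so stop collecting
    return rejections / n_sims


fixed = false_positive_rate(n_max=100, peek_every=100)   # single test at n = 100
peeked = false_positive_rate(n_max=100, peek_every=10)   # ten uncorrected looks
corrected = false_positive_rate(n_max=100, peek_every=10, z_crit=2.555)

print("one look:", fixed)
print("ten looks, uncorrected:", peeked)
print("ten looks, stricter threshold:", corrected)
```

With a single planned test the false positive rate should sit near the nominal 5%. With ten uncorrected looks it climbs well above that, and raising the threshold at each look pulls it back down. That is the sense in which peeking is fine as long as we correct for it.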
7. Officially recognise quality of publications over quantity
Everyone agrees
that quality
of publications is paramount, but we still chase quantity and value
‘prolific’ researchers. So how about setting a cap on the number of publications each
researcher or lab can publish per year? That way we would truly have an incentive to make sure of results before publishing them. It would also encourage us
to publish single papers with multiple experiments and more definitive
conclusions.
8. Ditch impact factor and let us never speak of it again
As scientists who
purportedly know something about
numbers, we should be collectively ashamed of ourselves for being conned by journal impact factors
(IF). Nowhere is the ludicrous doublethink of the IF culture more apparent
than in the current REF, where the advice from
universities amounts to “IF of journals is not taken into account in assessing
quality of your REF submissions” while simultaneously advising us to “ensure that
your four submissions are from the highest impact journals”. Complete with
helpful departmental emails reminding us which journals are going up in IF (which is all of them as far as I can tell), the situation really is quite stupid
and embarrassing. Here’s a fact shown by Bjorn Brembs: IF
correlates better with retraction rate than with citation rate. We should replace
IF with article-specific merits such as post-publication ratings, article citation
count, or – shock horror – considered assessment of the article after reading the
damn thing.
9. Open access publication
Much has been said and
written in the last few days about open access, with the Government making
important steps toward an open scientific future in the UK (I recommend following the blogs of Stephen Curry and Mike Taylor for the latest developments and
analysis). For my part, I think the
sooner we eliminate corporate publishers the better. I simply don’t see what
value they add when all of the reviewing and editing is done by us at zero cost.
10. Stop conflating research inputs with research outputs
Getting a research grant is great, but we need to stop
counting grants as outputs. They are inputs. We need to start assessing the quality
of science by balancing outputs against inputs, not by adding them together.
11. Rethink authorship
Academic authorship is
antiquated and not designed for collaborative teams. By rank-ordering authors
from first to last, we make it impossible for multiple co-authors to make a genuinely equal
contribution (Ah, I hear you cry, what about that little asterisk that flags
equal contributions? Well, sorry, but…um…nobody really takes much notice of those).
I think a better
approach would be to list authors alphabetically on all papers and simply
assign % contributions to different areas, such as experimental design,
analysis, data collection, interpretation of results, and manuscript
preparation. Some journals already do this in some form, but I would like to
see this completely replace the current form of authorship.
12. Revise the peer review system
Independent peer
review may be the best mechanism we currently have for triaging science, but it still
sucks. For one thing, it’s usually not independent. I often get asked to review
papers by scientists I know or have even worked with. I’ve even been asked to
review my own papers on occasion, and was once asked to review my own grant
application! (You’ll be glad to know I declined all such instances of
self-review). The review process is random and noisy, and based on such a pitifully
small sample of comments that the notion of it providing meaningful information is, statistically speaking, quite ridiculous.
I personally favour
the idea of cutting down on the number of detailed reviewers per manuscript and
instead calling on a larger number of ‘speed reviewers’, who would simply rate
the paper according to various criteria, without having to write any comments.
As a reviewer, I often find that I can form an opinion of an article relatively
quickly – it is writing the review that takes the most time.
Last week, Paul Knoepfler wrote a provocative
blog post proposing an innovation in peer review in which authors review
the reviewers. Could this help improve quality of reviews? Unfortunately, I don’t think Paul’s system would work (see my
comment on his post here),
but perhaps some kind of independent
meta-review of reviewers could also be a good idea in a limited number of cases.
__
What do you think? Got
better ideas? Please leave any comments below.
** Update 18/7/12, 14:30: On the issue of the gender imbalance in academic fraud, Mark Baxter has kindly reminded me of this case involving Karen M. Ruggiero.