NeuroChambers: Changing the culture of scientific publishing from within

Monday, 8 October 2012

Changing the culture of scientific publishing from within

****************************
Update, 20 October
I have just learned of a related idea being proposed at Psychological Science.
****************************

****************************
Update, 12 November
At 11am we got the all-clear from the publisher. This is going to be reality.
Stay tuned for further developments!
****************************

This is a long post and won’t appeal to everyone. But it could well be the most important thing I have committed to this blog since I started it seven months ago.

What follows is an open letter to my colleagues on the editorial board at the journal Cortex, where I've just been made an associate editor. My new position offers the exciting opportunity to push much needed reforms in scientific publishing, and moreover, to do so from within the machinery of a peer-reviewed journal.

Here I'm pitching one reform in particular: a new kind of empirical article called a Registration Report, which would involve peer review of research methodology prior to data collection. Ever since reading this important post by Neuroskeptic, I’ve been convinced that study registration is the cure to much of what ails us.

Before launching into the text of the letter (or 'working document' as I prefer to think of it), I'd like to offer my thanks to the following people for inspiring – and, indeed, outright generating – the ideas in this proposal: Neuroskeptic, Marcus Munafò, Pete Etchells, Mark Stokes, Frederick Verbruggen, Petroc Sumner, Alex Holcombe, Ed Yong, Dorothy Bishop, Chris Said, Jon Brock, Ananyo Bhattacharya, Alok Jha, Uri Simonsohn, EJ Wagenmakers, and Brian Nosek. And no doubt many others who I have temporarily forgotten (my apologies, I will update the post accordingly as more names come to mind!)

You might ask why I'm blogging this. Well, I think it's important for two reasons. First, it's good to be transparent about the problems facing scientific publishing and the possible solutions we have to choose from. And second, I want this discussion to be open not only to the editorial board of Cortex but to scientists (both senior and junior), science writers and journalists, science policy makers, and science enthusiasts generally - and, in particular, to the scientists who would consider sending their submissions to Cortex. So whoever you are and whatever you do, if you care about scientific publishing then please do leave a comment.

Tell me if you think it's a good idea. Tell me if you think it's a stupid or naïve idea. Tell me where I've missed something crucial or where you see a particularly strong selling point. And above all, tell me this: would you consider submitting your own manuscripts using this new article format? The more interest this proposal receives, and publicly, the better chance I have of convincing my colleagues and the journal publisher to pursue it – or something like it.

Enough preamble. Here’s my open letter.

=====================================================================

Registered Reports: A proposal for a new article format at Cortex

=====================================================================

It is a great privilege to become an associate editor at Cortex.

Cortex was one of the first journals I published in, and I have reviewed at the journal for many years now. I’m particularly humbled to join such a distinguished editorial board.

As delighted as I am to join Cortex, I think we need to be doing more than editing submissions according to standard practices. In most journals, the traditional approach for handling empirical articles is archaic and demonstrably flawed. I believe we should be using our editorial positions to institute reforms that are long overdue.

1. General proposal and rationale

I would therefore like to propose a new form of empirical article at Cortex, called Registered Reports. I hope to start a discussion among the editorial board and wider scientific community about the merits and drawbacks of such a proposal. In addition to emailing this document to the editorial board, I have also published it on my blog for open discussion, so please feel free to reply either confidentially (via email) or publicly (on the blog). This proposal is very much a working document so any edits or comments on the document itself are most welcome.

I need to make one point clear at the outset. At this stage I am not proposing that we drop any of the existing article formats at Cortex. Rather, I am suggesting an additional option for authors.

The cornerstone of Registered Reports is that a large part of the manuscript would be reviewed prior to the experiments being conducted. Initial manuscripts would be submitted before a study has been undertaken and would include a description of the key background literature, hypotheses, experimental procedures, analysis pipeline, a statistical power analysis, and pilot data (where applicable). Following peer review, the article would then be either rejected or accepted in principle for publication.

Once in principle acceptance (IPA) has been obtained, the authors would then proceed to conduct the study, adhering exactly to their peer-reviewed procedures. When the study is complete the authors would submit their finalised manuscript for re-review and would upload their raw data and laboratory log via Figshare for full public access. Pending quality checks and a sensible interpretation of the findings, the manuscript would be published – and, crucially, independently of what the results actually look like.

This form of article has a number of advantages over the traditional publishing model. First and foremost, it is immune to publication bias because the decision to accept or reject manuscripts will be based on the significance of the research question and methodological validity, never on whether results are statistically significant.

Second, by requiring prospective authors to adhere to a preapproved methodology and analysis pipeline, it will eliminate a host of suspect but common practices that increase false discoveries, including p value fishing (i.e. adding subjects to an experiment until statistical significance is obtained – a practice admitted to by 71% of recently surveyed psychologists; [9]) and selective reporting of experiments to reveal manipulations that “work”. Currently, many authors partake in these practices because doing so helps convince editors and reviewers that their research is worthy of publication. By providing IPA prior to data collection, the incentive to engage in these practices will be largely eliminated.

Third, by requiring an a priori power analysis, including a stringent minimum power level (see below), false negatives will be greatly reduced compared with standard empirical reports. This will increase the veracity of non-significant effects.

Taken together, these practices will ensure that articles published as Registered Reports have a substantially higher truth value than regular studies. Such articles can therefore be expected to be more replicable and have a greater impact on the field.

Why should we want to make this change? The life sciences, in general, suffer from a number of serious problems including publication bias [1, 2], low statistical power [3, 4], undisclosed post-hoc analytic flexibility [5, 6, 7], and a lack of data transparency [8]. By valuing findings that are novel and eye-catching over those that are likely to be true, we have incentivised a range of questionable practices at individual and group levels. What’s more, a worryingly high percentage of psychologists admit to engaging in dubious practices such as selectively reporting experiments that produced desirable outcomes (67%) and p value fishing (71%) [9].

So why should we change now? After all, these problems are far from new [10, 11]. My instinctive response to this question is, why haven't we changed already? In addition, there are several reasons why advances in scientific publishing are especially timely. The culture of science is evolving quickly under heightened funding pressure, with an increasing emphasis on transparency and reproducibility [12], open access publication [13], and the rising popularity of the PLoS model and other alternative publication avenues. Furthermore, retractions are at a record high [14], and recent high-profile fraud cases (e.g. Stapel, Smeesters, Sanna, Hauser) are casting a long shadow over our discipline as a whole.

The ideas outlined here are not new and I certainly can’t claim credit for them. I formulated this proposal after a year of discussion with scientists in multiple disciplines (including journal editors), science policy makers, science journalists and writers, and the Science Media Centre, as well as key blog articles (e.g. here, here and here).

I hope I can convince you that Registered Reports would provide an important innovation in scientific publishing and would position Cortex as a leader in the field. If you agree, in principle, then our next step will be to decide on the details. Then, finally, we would need to convince Elsevier to take this journey with us.

If we succeed then it will bring the scientific community one step closer to a system in which the incentive to discover something true, however small, outweighs the incentive to produce ‘good results’. Call me a shameless idealist, but I find that possibility hugely exciting.

2. The proposed mechanism

Registered Reports would work as follows.

(a) Stage 1: Registration review

Authors submit their initial manuscript prior to commencing their experiment(s). The initial submission would include the following sections:

· Background and Hypotheses

o A review of the relevant literature that motivates the research question, and a full description of the aims and experimental hypotheses.

· Methods

o Full description of proposed sample characteristics, including criteria for subject inclusion and exclusion, and detailed description of procedures for defining outliers. Procedures for objectively defining exclusion criteria due to technical errors (e.g. defining what counts as ‘excessive’ head movement during fMRI) or for any other reasons (where applicable) must be documented, including details of how and under what conditions subjects would be replaced.

o A description of experimental procedures in sufficient detail to allow another researcher to repeat the methodology exactly, without requiring any further information.

o Proposed analysis pipeline, including all preprocessing steps, and a precise description of every analysis that will be undertaken and appropriate correction for multiple comparisons. Any covariates or regressors must be stated. Consistent with the guidelines of Simmons et al. (2011; see 5), proposed analyses involving covariates must be reported with and without the covariate(s) included. Neuroimaging studies must document in advance, and in precise detail, the complete pipeline from raw data onwards.

o Where analysis decisions or follow-up experiments are contingent on the outcome of prior analyses, these contingencies must be detailed and adhered to.

o A statistical power analysis. Estimated effect sizes should be justified with reference to the existing literature. To account for existing publication bias, which leads to overestimation of true effect sizes [15, 16], power analysis must be based on the lowest available estimate of the effect size. Moreover, the a priori power (1 - B) must be 0.9 or higher. Setting a high power criterion for discovery of minimal effect sizes is paramount given that this model will lead to the publication non-significant effects.

o In the case of very uncertain effect sizes, a variable sample size and interim data analysis would be permissible but with inspection points stated in advance, appropriate Type I error correction for ‘peeking’ employed [17], and a final stopping rule for data collection outlined.

o Full description of any outcome-neutral criteria that are required for successful testing of the study hypotheses. Such ‘reality checks’ might include the absence of floor or ceiling effects, or other appropriate baseline measures. Editors must ensure that such criteria are not used by reviewers to enforce dogma about accepted ‘truths’. That is, we must allow for the possibility that failure to show evidence for a critical ‘reality check’ can raise doubt about the truth of that accepted reality in the first place.

o Timeline for completion of the study and proposed resubmission date if registration review is successful. Extensions to this deadline can be arranged with the action editor.

· Pilot Data

o Optional. Can be included to establish reality checks, feasibility, or proof of principle. Any pilot data would be published with the final version of the manuscript and will be clearly distinguished from data obtained for the main experiment(s).

In considering papers in the registration stage, reviewers will be asked to assess:

The significance of the research question(s)
The logic, rationale, and plausibility of the proposed hypotheses
The soundness and feasibility of the methodology and analysis pipeline
Whether the level of methodological detail provided would be sufficient to duplicate exactly the proposed experimental procedures and analytic approach

Attempted replications of high profile studies would be welcomed. For replication attempts to be accepted, they must be regarded by the reviewers as significant and important regardless of outcome (i.e. having a high replication value [18] as was the case in the recent attempted replication of precognition effects [19]).

Manuscripts that pass registration review will be issued an in principle acceptance (IPA). This means that the manuscript is accepted for publication pending successful completion of the study according to the exact methods and analytic procedures outlined, as well as a defensible and evidence-based interpretation of the results.

Upon receiving IPA, authors will be informed that any deviation from the stated methods, regardless of how minor it may seem, will be lead to summary rejection of the manuscript. If the authors wish to alter the experimental procedures following IPA but still wish to publish it as a Registered Report in Cortex then the manuscript must be withdrawn and resubmitted as a new Stage 1 submission.

(b) Stage 2: Full manuscript review

Once the study is complete, the authors then prepare and resubmit their manuscript for full review, with the following additions:

· Submission of raw data and laboratory log

o Raw data must be made freely available via the website Figshare (or an alternative free service). Data files must be appropriately time stamped to show that it was collected after IPA and not before. Other than pre-registered and approved pilot data, no data acquired prior to the date of IPA is admissible in the final submission. Raw data must be accompanied by guidance notes, where required, to assist other scientists in replicating the analysis pipeline.

o The authors must collectively certify that all non-pilot data was collected after the date of IPA. A simple laboratory log will be provided outlining the range of dates during which data collection took place.

· Revisions to the Background and Rationale

o The stated hypotheses cannot be altered or appended. However, it is perfectly reasonable for the tone and content of an Introduction to be shaped by the results of a study. Moreover, depending on the timeframe of data collection, new relevant literature may have appeared between registration review and full manuscript review. Therefore, authors will be allowed to update at least part of the Introduction.

· Results & Discussion

o This will be included as per standard submissions. With one exception, all registered analyses must be included in the manuscript. The exception would be (very) rare instances where a registered and approved analysis is subsequently shown to be logically flawed or unfounded in the first place (i.e. the authors, reviewers, and editor made a collective error of judgment and must collectively agree that the analysis is, in fact, inappropriate). In such cases the analysis would still be mentioned in the Method but omitted from the Results (with the omission justified).

o It is sensible that authors may occasionally wish to include additional analyses that were not included in the registered submission; for instance, a new analytic approach might emerge between IPA and full review, or a particularly interesting and unexpected finding may emerge. Such analyses are admissible but must be clearly justified in the text, caveated, and reported in a separate section of the Results titled “Post hoc analyses”. Editors must ensure that authors do not base their conclusions entirely on the outcome of significant post hoc analyses.

o Authors will be required to report exact p values and effect sizes for all inferential tests.

The resubmission will ideally be considered by the same reviewers as in the registration stage, but could also be assessed by fresh reviewers. In considering papers at the full manuscript stage, reviewers will be asked to appraise:

Whether the data are able to test the authors’ proposed hypotheses by passing the approved outcome-neutral criteria (such as absence of floor and ceiling effects)
Whether any changes to the Introduction are reasonable and do not alter the rationale or hypotheses
Whether the authors adhered precisely to the registered experimental procedures
Whether any post-hoc analyses are justified, robust, and add to the informational content of the paper
Whether the authors’ conclusions are justified given the data

Crucially, reviewers will be informed that editorial decisions will not be based on the perceived importance or clarity of the data. Thus while reviewers are free to enter such comments on the record, they will not influence editorial decisions.

Reviews will be anonymous. To maximise transparency, however, the anonymous reviews and authors’ response to reviewers will be published alongside the full paper in an online supplement.

Manuscript withdrawal

It is possible that authors with IPA may seek to withdraw their manuscripts following or during data collection. Possible reasons could include technical error or an inability to complete the study due to other unforeseen circumstances. In all such cases, manuscripts can of course be withdrawn. However, the journal will publicly record each case in a section called Retracted Registrations. This will include the authors, proposed title, an abstract briefly outlining the original aim of the study, and brief reason(s) for the failure to complete the study. Partial retractions are not possible; i.e. authors cannot publish part of a registered study by selectively retracting one of the planned experiments. Such cases must lead to retraction of the entire paper.

3. Concerns, Responses and Discussion Points

Here follows a paraphrased Q & A, including some actual and hypothetical discussions about the proposal with colleagues.

1. Won’t Registered Reports just become a dumping ground for inconclusive null effects?

a. No. The required power level will increase the chances of detecting statistical significance when it reflects reality. Average power in psychology/cognitive neuroscience is low whereas IPA will be contingent on power of 0.9 or above. Thus, any non-significant findings will, by definition, be more conclusive than typically observed in the literature.

b. It is crucial that we provide a respected outlet for well-powered non-significant findings. This will help combat the file drawer effect and reduce the publication of false discoveries. Moreover, authors are welcome to propose superior alternatives to conventional null hypothesis testing, such as Bayesian approaches [20].

c. By guaranteeing publication prior to data being collected, this model would encourage authors to propose large scale studies for more definitive hypothesis testing – studies which investigators would otherwise be reluctant to pursue given the risk of yielding unpublishable null effects.

d. Registration review will be stringent, with reviewers asked to consider the methodology in detail for possible oversights and flaws that could prevent the study from testing the proposed hypotheses.

2. It all sounds too strict. Why would authors submit to this scheme when they can’t change even one small aspect of their experimental procedure without being ‘summarily rejected’? Even grant applications are not so demanding.

a. Yes it is stringent, and so it should be. This format of article is primarily intended for well-prepared scientists who have carefully considered their methodology and hypotheses in advance. And isn’t that how we ought to be doing science most of the time anyway?

b. Note that the strict methodological stringency is coupled with a complete lack of expectation of how the results should look. Whether an experiment supports the stated hypothesis is the one aspect of science that scientists (should) have no control over – yet the traditional publishing model encourages a host of dodgy practices to exert such control. This new model replaces the artificial and counterproductive ‘data stringency’ with constructive ‘methodological stringency’, and so would largely eliminate the pressure for scientists to submit data that perfectly fit their predictions or confirm someone’s theory. I believe many scientists would approach this model with relief rather than trepidation.

3. Authors could game the system by running a complete study as per usual and submitting the methodology for registration review after the fact.

a. No, raw data must be made freely available at the full review stage and time stamped for inspection, along with a laboratory log indicating that data collection took place between dates X and Y. Final submission must also be accompanied by a certification from each author that no data (other than approved pilot data) was collected prior to the date of IPA. Any violation of this rule would be considered misconduct; the article would be retracted by Cortex and referred to Retraction Watch.

4. What’s to stop unscrupulous reviewers stealing my ideas at the registration stage, running the experiments faster than I can (or rejecting my registration submission outright to buy time), and then publishing their own study?

a. This is a legitimate worry, and it is true that there is no perfect defense against bad practice. But we shouldn’t overstate this concern. Gazumping is rare and, in any case, is present in many areas of science. Fear of being scooped doesn’t stop us presenting preliminary data at conferences or writing grant applications. So why should we be so afraid of registration review?

b. Even if an unscrupulous reviewer decided to run a similar/identical experiment following IPA, the decision to publish would not be influenced. So being scooped would not cost the authors a publication once the authors pass IPA.

c. Unlike existing protocol journals, such as BMC Protocols, the IPA submission would not be published in advance of the main paper. So only the reviewers and editors would see it. This will reduce the chances of being gazumped.

5. A lot of the most interesting discoveries in science are serendipitous. Your approach will stifle creativity and data exploration.

a. No, it won’t. Authors will be allowed to include “post-hoc analyses” in the manuscript that were not in the registered submission. They simply won’t be able to pretend that such analyses were planned in advance or adjust their hypotheses to predict unexpected outcomes. And, sensibly, they won’t be able to base the conclusions of their study on the outcome of unplanned analyses – the original registered analyses would take precedence and must also be reported.

b. It should also be noted that a priori analyses in the registration stage could include exploration of possible serendipitous findings.

c. Serendipitous findings are, by their nature, rare. A far greater problem is the proliferation of false positives due to excessive post-hoc flexibility in analysis approaches. So let’s deal with the big problem first.

6. You propose allowing authors to alter the Introduction to include new literature. Doesn’t this create a slippery slope for changing the rationale or hypotheses too?

a. No, but we must be vigilant on this point. I think it is entirely sensible to allow revisions to the Introduction to contextualise the literature based on the findings and to focus on most recent publications that emerged following IPA. After all, we want readers to be engaged as well as informed. However, we must also ensure that such changes are reasonable. Monitoring this aspect in particular would be one of the central reviewing criteria at Stage 2 (see above). In a revised Introduction, the authors would not be permitted to alter the rationale for the study, to state new hypotheses, or to alter the existing hypotheses. These could be flagged in distinct sections of the Introduction that are untouchable following IPA.

7. What if the authors never submit a final manuscript because the results disagree with some desired outcome (such as supporting their preferred explanation)? How can you prevent publication bias on the part of the authors?

a. We can’t stop authors censoring themselves. As noted above, however, if a study is withdrawn following IPA then this will be noted in a Retracted Registrations section of the journal. So there would at least be a public record of the withdrawal and some explanation for why it happened.

b. Note also that if the authors have not submitted by their own stated deadline then the manuscript will be automatically withdrawn, considered retracted, and noted in the Retracted Registrations section. Extensions to the deadline are permissible following prior agreement with the action editor.

8. What would stop authors getting IPA, then running many more subjects than proposed and selectively including only the ones that support their desired hypothesis?

a. Nothing. But doing so is outright fraud, similar to the conduct of Dirk Smeesters [21]. No mechanism can fully guard against fraud, and regular submissions under the traditional publishing route are equally vulnerable to such misbehaviour. Note also that the proposed model requires submission of raw data, which will help protect against such eventualities. Selective exclusion of subjects to attain statistical significance can be detected using the statistical methods developed by Uri Simonsohn [22]. This alone will act as a significant deterrent to fraudsters.

9. How can IPA be guaranteed without knowing the author’s interpretation of the findings?

a. It isn’t. IPA ensures that the article cannot, and will not, be rejected based on the results themselves (with the exception of failing outcome-neutral reality checks, such as floor or ceiling effects, which prevent the stated hypotheses being appropriately tested). Manuscripts can still be rejected if the reviewers and editor believe the author’s interpretation is unreasonable given the data. And they will be rejected summarily if the authors change their experimental procedures in any way following IPA.

10. What if the authors obtain IPA but then realise (after data collection commenced) that part of their proposed methods or analyses were incorrect or suboptimal?

a. In the case of changes to the experimental procedures, the manuscript would have to be fully withdrawn but could be returned to Stage 1 for fresh registration review.

b. In this case of changes to the analysis approach, depending on the nature of the proposed change, Stage 2 may be able to proceed following a phase of interim review and discussion with the editor and reviewers (if all agree that a different form of analysis is preferable). In such cases, the original proposed analysis would still be published in the final article but may not be reported, and the reasons for excluding it would be acknowledged.

11. Cortex already has a long backlog of in-press articles. Adding yet another article format could make this problem worse.

a. I propose that each article published as a Registered Report takes the place of a standard research report, thus requiring similar journal space to the current model.

b. If registered reports become increasingly popular and well cited, the journal could gradually phase the standard report format out altogether, making registration reports the norm.

I hope I can convince you that Registration Reports would be a useful and valid initiative at Cortex. And even if not, I look forward to the ensuing discussion. Below is a list of key supporting references.

[1] Rosenthal R (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86: 638–641.

[2] Thornton A & Lee P (2000). Publication bias in meta-analysis: its causes and consequences

Journal of Clinical Epidemiology, 53: 207–216.

[3] Chase, LJ & Chase, RB (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61: 234-237.

[4] Tressoldi, PE (2012). Replication unreliability in psychology: elusive phenomena or "elusive" statistical power? Frontiers in Psychology, 3: 218.

[5] Simmons JP, Nelson LD, and Simonsohn U. (2011). False-positive psychology: Undisclosed

flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 359-66.

[6] Wagenmakers, EJ (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14: 779–804.

[7] Masicampo, EJ & Lalande, DR (in press). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology.

[8] Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Medicine 2(8): e124. doi:10.1371/journal.pmed.0020124

[9] John, L, Loewenstein, G, & Prelec, D (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23: 524-532 DOI: 10.1177/0956797611430953

[10] Smith MB (1956). Editorial. Journal of Abnormal & Social Psychology, 52:1-4.

[11] Cohen, J (1962). The statistical power of abnormal – social psychological research: A review. Journal of Abnormal & Social Psychology, 65, 145‐153.

[12] http://openscienceframework.org/

[13] http://occamstypewriter.org/scurry/2012/09/05/key-questions-for-open-access-policy-in-the-uk

[14] Fang, FC, Steen, RG & Casadevalld, A. (2012) Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences USA: 10.1073/pnas.1212247109

[15] Lane, DM & Dunlap, WP (1978). Estimating effect size: Bias resulting from the significance criterion in editorial decisions. British Journal of Mathematical and Statistical Psychology, 31: 107‐112.

[16] Hedges LV & Vevea, JL (1996). Estimating effect size under publication bias: Small sample properties and robustness of a random effects selection model. Journal of Educational and Behavioral Statistics, 21: 299-332.

[17] Strube, MJ (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38: 24-27. Software available from here: http://www.artsci.wustl.edu/~socpsy/Snoop.7z

[18] Nosek, B. A., Spies, J. R., & Motyl, M. (in press). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science. arxiv.org/pdf/1205.4251

[19] Ritchie SJ, Wiseman R, French CC (2012) Failing the Future: Three Unsuccessful Attempts to Replicate Bem’s ‘Retroactive Facilitation of Recall’ Effect. PLoS ONE 7(3): e33423. doi:10.1371/journal.pone.0033423

[20] Kruschke, JK (in press). Bayesian estimation supercedes the t test. Journal of Experimental Psychology: General. www.indiana.edu/~kruschke/BEST/BEST.pdf

[21] http://blogs.discovermagazine.com/notrocketscience/2012/06/26/why-a-new-case-of-misconduct-in-psychology-heralds-interesting-times-for-the-field/

[22] http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571&http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571

60 comments:

Neuroskeptic8 October 2012 at 12:37
Great work! I'm very impressed by this and think it would be a big step in the right direction.

Some random thoughts:

1) Authors should be given the option of publishing their registered Protocol after the Registration Review. Either as an online mini-article or they could publish it themselves.

This would help to guard against idea stealing, because it would clearly establish precedent - anyone could steal the idea, but it would be obvious that they'd done so, which would make it much less desirable.

Also, this would help to guard against the possibility of misbehaviour by the second-stage reviewers. If these reviewers decided that they didn't like the data, and tried to block the paper for that reason, the authors would then be able to appeal to the court of public opinion, by pointing to their published (and therefore certified a priori) protocol and saying "Here's what we said we'd do and here's our data - form your own opinions". This is unlikely to happen often, but it would be a crucial check on the power of reviewers.

2) I'm not entirely happy with allowing people to change their Introduction, even for bona fide reasons like new literature emerging. I think it would be a slippery slope. But I can see that without that, you might end up with some really irrelevant Introductions. So why not just allow authors to change the Introduction at will, but, also publish the originally approved one as a Supplement? That would allow readers to judge whether the Intro had been altered for 'naughty' purposes or not.

3) Scientists will rightly object to any proposal that would cause an increase in bureaucracy. On its face, this proposal would "double the amount of peer review" which would be a hassle. I wonder if it could be coupled to some system for integrating peer review with the process of applying for a grant e.g. the journal could agree with Grant Body X that any protocol awarded money by X would be treated ipso facto as "reviewed" and would be fast-tracked through the Registration Review (but not the final review) with only minimal oversight?
ReplyDelete
Replies
EJ8 October 2012 at 12:40
I completely agree. In fact, my colleagues and I have proposed the very same idea in a paper ("An agenda for purely confirmatory research") that is in press for Perspectives on Psychological Science: http://www.ejwagenmakers.com/2012/ConfirmatoryResearchFTW_inpress.pdf

Cheers,
E.J.
ReplyDelete
Replies
Bashir8 October 2012 at 14:30
This may sound a bit daft but, what is the point of this? Let's say your journal implements the idea. How do you see things going? Why would any scientist use this system given other options? Seems like the most obvious differences here are more work and less flexibility for scientists.
ReplyDelete
Replies
StokesBlog8 October 2012 at 14:35
Great work Chris - A very sound proposal, which I would fully support. It will also make everyone think a lot harder in the planning stage of an experiment, rather than just hoping to find something interesting in the data.

Also, given the close link to current grant formats, one could imagine a future where these too processes begin to merge, thus reducing net bureaucracy.

You would want a mechanism for committing folks to stick with the target journal, so this route is not abused as a backup publication. I.e., register with journal X so you are guaranteed at minimum a publication with X, but if the results work out really nicely, why not go for Journal Y with a higher impact!?

Hopefully a more general shift in culture could protect against this, especially if replicability becomes a higher badge of honour than impact factor. Also, the retraction notice might be sufficient disincentive.

Anyway, great work!
ReplyDelete
Replies
Bas Neggers8 October 2012 at 22:08
My compliments for making this effort to change current publication policies.
I fully agree that current practices by scientists brushing up their data creating biases as a result, by journals too eager to publish the latest 'hot' finding, by funding agencies and hiring committees over emphasizing quantity over quality, make published papers less reliable and trustworthy and can discredit science as a whole.

I do however doubt whether scheme proposed here is the solution when other methods of publication are still available and more appealing to very busy scientists, and thus also to those evil scientists who want to publish their rubbish. I have a couple of reasons for my doubts.

First, this proposal will create a pretty big layer of bureaucracy on top of the already ever increasing bureaucracy of institutional ethical review boards (incl monitoring, auditing, etc), grant proposals, and so on. Without a reliable and rigorous review process of your ‘registered articles’, this idea is not going to work, so it requires an elaborate system and lots of work from both scientists and reviewers. I personally very much like the idea of publishing the analysis pipeline in detail, but many authors I know (luckily not all) hardly know or understand their pipeline (a poor TA has to do/code it, or an underpaid smart PhD student) for a classical post-acquisition paper. Let alone before an experiment is conducted and it is unsure whether there is an easy score of ‘hot’ news within reach.

Second, who is to review the high level of detail? Even as it is, it gets extremely hard to find good reviewers.

Third, true discoveries are often coming by surprise, and are all but pre-planned. Think of Kekule’s Benzene ring, the apple falling on Isaac Newton’s head or the all-changing bathing session by Archimedes leading to his Eureka. This proposal would add a rigor and inflexibility to science that can and will kill much needed creativity.

Fourth, I have serious doubts about power analyses. You somehow assume you have an a-priori idea about the expected effect size and the noise in your data. While the latter might be known from some null-data, the former cannot be really known without doing the experiment (especially when it’s a novel idea). So when you already know your result, why do the experiment in the first place?

Therefore, other than making the life of scientists even harder by bloating the publication process, I’d rather recommend a) the submission of the complete research data and b) serious space in journals for replications or ‘failures to replicate’, and credits for scientists who publish those. That might greatly reduce the temptation to publish dodgy results in the first place, which is the whole purpose of this debate. And leave the rest to the ‘market’, in the long run frauds are discovered, especially when a field finally gets applied in the real world (assumed they ever will be). The character flaws leading to the frauds are all too human and will never be fully rooted out, unfortunately.

To me it seems that this concern is currently very alive among psychologists and more generally (neuroimaging) social sicentists. Stapel and Smeesters as recent examples belong to these domains (except the imaging part perhaps), that might explain it. I do by no means want to imply that frauds do not exist in other fields, but there simply seems much less concern in older more established disciplines. This might be due to other working habits and much clearer areas of application requiring a higher degree of scrutiny automatically. Think of material science, pharmacology, physics, etc.
As laudable, noteworthy and interesting Chris’ proposal may be, I do not think it will be accepted by the majority of practicing scientists nor will it solve the issue that he is rightfully addressing. I as a publishing scientist at least hope I will not have to go through this process.
Thank you for inspiring this very important debate.
ReplyDelete
Replies
Unknown9 October 2012 at 09:32
I like this idea, but I do think it underemphasizes some important points regarding what makes a good paper. For instance, how well is it written? How well does it represent the previous literature? How interesting is the theoretical conclusions that are drawn based on the published results (and the current literature). If a paper is accepted based on the proposed methods, then these other elements of a paper don't seem to matter as much anymore.

I imagine it would be quite easy to pass the test of designing a study that makes sense. How one thinks about a set of data is often what makes a paper important or not.

Jeff Bowers
ReplyDelete
Replies
Molly9 October 2012 at 09:50
Brilliant work- thanks for taking the time to flesh out these ideas.

Several have raised concerns about this system creating an extra burden on reviewers- but I'm not sure I entirely agree. While it's true that reviews for this format may take a bit more time than traditional reviews, the value of this system is that a given manuscript will (ideally) only go through a single review process- so in terms of collective hours spent reviewing papers, your proposal may actually reduce the burden on the scientific community.

Consider the process we have now. Papers often face a string of rejections before getting published (and often rejections are based on data, not methods- e.g., null findings). A given paper may go through the review process at 3 or 4 different journals before getting published- so anywhere from 6 to 12 (or more) reviewers may take the time to review the paper. This is extremely inefficient- both for reviewers, and for authors, who must spend a substantial amount of time re-formatting the manuscript for different journals. None of this is time well spent. In contrast, the extra time involved for authors and reviewers in your proposed system *is* time well spent- the steps you outline guard against all sorts of problems that are rife in the scientific literature.

Finally, a question: presumably there would be leeway for revise-and-resubmit at the initial stage? Perhaps the most valuable contribution of this system would be its ability to prevent the wasting of resources on poorly designed studies. Too often I'm sent to review papers with fatal design flaws that render the data uninterpretable. Your proposed system would allow reviewers to point out such flaws *before* resources have been invested in data collection. Reviewers could suggest ways to improve the design, and papers could be IPA if they incorporate these suggestions. It might also be nice to give reviewers the option to be named when the paper is published, so they can be recognized for their contributions to the experimental design.
ReplyDelete
Replies
Neuroskeptic9 October 2012 at 15:29
Regarding the comparison with physics and the question of why physics seems immune to these problems and therefore doesn't need this kind of reform - which has been raised in this thread and elsewhere - two points:

1. The nature of physics (and astronomy) means most important work is de facto 'registered' because so many people are involved in planning, building the equipment, etc.

Indeed it's usually public knowledge not only what the experiments will be, but what any given result will mean: all that remains when the experiment goes live is to actually gather the numbers.

That's a great system!

Physics has it by default, because of the nature of most physical experiments. Biology is not so lucky, but through publishing reform we might be able to create it.
ReplyDelete
Replies
Sam Schwarzkopf9 October 2012 at 22:57
[1/2]

Interesting post, Chris, and a nice debate! I think you identify some of the main problems with the current publishing system. But I must say I remain unconvinced on many points. Most of this has already been discussed, such as increased bureaucracy, reduced flexibility, greater reviewer workload, etc. Some of the answers you give are fairly compelling but I'm unsure that there would be sufficient incentive for people to submit to this format in the present environment.

Molly above makes a great point that reviewer workload would be reduced under this system because there would only be one paper submission. However, crucially this is only true as long as the entire system is changed to the pre-registered approach. You are modestly proposing introducing this as an option in one journal to begin with. While this is much more realistic than a major overhaul of the current system it also means that this is not realistically improving matters. If the pre-registered paper gets rejected at the post-results stage, what is going to stop people from submitting it elsewhere (perhaps with "polished" results)? You say that there will be a record for the retracted registration but how many editors/reviewers at a different journal will know about that? Even if most people were aware of this new format at Cortex, would you really expect them to check for every single submitted manuscript if it had been submitted there previously?

There has also been much talk about serendipitous or revising your methods. I agree that there are researcher degrees of freedom that your approach would help to control but on the other hand I fear there would be great loss of flexibility in carrying out any research. Let's say I've been developing some new analytical procedure over the past years. It is not unreasonable that we might have started collecting some data trying to address actual experimental questions using these procedures, even though they are still under development (and for many procedures development is a continuous process). Obviously, there is something wrong with trying lots of different parameters and only reporting the single one that produces an interesting result. But surely there must be some wiggle room to allow for improvement. You emphasise the importance of well-planned experiments but conducting scientific research is often a learning process. It is not unusual that you discover something or have some form of insight about your procedure that truly enhances the method. The process you describe would stifle this sort of creativity and development.
ReplyDelete
Replies
Sam Schwarzkopf9 October 2012 at 22:58
[2/2]

Don't get me wrong, I can see the benefits of pre-registering experimental protocols. But I am wondering if the following wouldn't be more realistic and efficient: you create a new platform on which you can register your experimental design prior to carrying out the study. This is just like your pre-registration with an introduction and a methods section. But crucially there is no review and it is not bound to any journal. It is simply a registered protocol. Then, after you collected and analysed the data you submit a manuscript at a journal. During this process you can link to the pre-registered protocol to show what you proposed and reviewers can assess in how far your protocol has changed from the original. This could easily become part of the general peer review process. While this does not allow for scrutinising the design before the experiment is done (definitely a nice point about your proposal), it does not increase reviewer workload to the same degree as your proposal. It also is independent of being a particular format in one particular journal. Rather it is an optional step authors could take within the conventional system. Providing a pre-registered protocol and showing only minor or inconsequential changes in your final submission will be a credit to the authors and thus gives people an incentive for doing it.

Note that this idea still suffers from one problem that you also haven't addressed in your proposal. It would be fine to expect people to have time stamped data to prove that they really collected the data after submitting the proposal - but who is going to verify this? You would have to come up with a centralised, standardised system for keeping track of data time stamps. I am not sure how realistic that is.
ReplyDelete
Replies
EJ10 October 2012 at 22:58
I do agree with you, Chris, that it is important to eliminate wiggle room.
ReplyDelete
Replies
Emperor Joshua13 October 2012 at 08:44
Hi - thanks for setting out this idea so clearly. I pitched this to the editorial board of the journal I'm an editor for this morning, and it looks like there might be a real chance of this getting underway. I'll let you know how things go!
ReplyDelete
Replies
aner14 October 2012 at 16:49
(Disclaimer: I did not read the whole post)
This proposal correctly identifies some of the shortcoming of the current cycle of acceptance. However, I do not think it materially solves them. Specifically, the problem of accept/reject being based on the style and compelling nature of the write-up has been pushed from the final write-up to the proposal. In the proposed cycle, the most 'charismatic' authors would still win, but this time because they are able to convince the review board at the proposal stage. This might even amplify the problem since now these authors don't have to rely on completed data collection, but are free to spin their tales any way they want. This is similar to the problem of selecting a politician based on promises instead of past performance (though admittedly, being held to those promises is where the analogy ends). It would seem like the pre-approval phase already happens in the form of grant proposals, though again, grants don't follow up to see that the proposal has been strictly adhered to.
There is merit and peril in requiring strict adherence to a pre-set plan. I doubt many published authors would say that their results and conclusions came from a course that was known at the onset.
The biggest win I can see in this sort of cycle you propose is that the community would see more negative results getting published. Too often, a good scientific method is followed and then the results don't get published because they are 'negative', and the community is doomed to repeat history for lack of knowing it. In that respect, I support a plan to pre-accept publications regardless of how 'compelling' the results are, mainly because I find 'negative' results much more compelling than the average reviewer.
ReplyDelete
Replies
Unknown5 December 2012 at 08:39
Hey Chris - I wanted to let you know I have shared this article with FORCE11.org, and want to invite you to join this group - a community that shares the very same interests.
ReplyDelete
Replies
Unknown5 December 2012 at 08:49
Chris - I cannot edit my previous comment, but you can find the post here: http://force11.org/node/4222
ReplyDelete
Replies
phayes7 December 2012 at 09:43
“Moreover, authors are welcome to propose superior alternatives to conventional null hypothesis testing, such as Bayesian approaches [20].”

Kruschke is too kind. As has been pointed out again and again over the last several decades, NHST/orthodox statistics is an ill-conceived, “fundamentally irrational” and gratuitously arcane 'system' of pseudo-inference. It's unfit for purpose and it's appalling that it's still being allowed to damage science and mislead/confuse scientists and others. The suggestion to merely welcome authors who propose to do inference, and consequently science, properly is insufficient (and deeply ironic).
ReplyDelete
Replies
Anonymous19 December 2012 at 11:19
Hi Chris,

Somehow I've managed to miss this until now - apologies. It seems like a really long-overdue, radical and extremely important initiative, and I'm totally supportive of the main thrust of it - fantastic work.

I would like to take issue with one aspect though (and essentially echo Bas's comments above) - the emphasis on prospective power analyses. I understand why you've built it in to this proposal, but what you say above... "it is straightforward to estimate the likely effect size of the BOLD response" ...is just not true. Your ability to detect an effect in a BOLD fMRI experiment is dependent on many factors that aren't taken into account in a standard power analysis. I'm thinking of the statistical efficiency of your design, the scanner parameters (TR, TE, flip-angle, number of slices, acquisition matrix, phase-encoding direction etc.) used, and the hardware (magnet strength, 12 vs 24 vs 32 channel head-coils, even how good the scanner shim is on specific occasions). Inter-subject variability can be (more-or-less) estimated from previous experiments, however since the timing and amplitude of the HRF appears to vary (systematically?) across brain regions even within the same subject, it doesn't make much sense to estimate a single value for variance, or indeed power. And this is to say nothing of all the different variables that could be introduced during analysis (pre-processing, inclusion of noise-modelling regressors in the model, etc.). I'm firmly of the opinion that formal a priori power analyses are not helpful in neuroimaging (post-hoc power analyses are of course trivial, and also unhelpful). It's essentially plucking numbers out of thin air, and the only purpose it serves is to reassure ethics committees and grant reviewers that the issue has been considered.

An alternative, empirical approach is Murphy and Garavan (2004):
http://www.sciencedirect.com/science/article/pii/S1053811904000977 but even this is only suggestive since it was based on one particular (Go/No-Go) cognitive task.

Really, the only workable approach is essentially to do what most researchers do - roughly estimate participant numbers required from previous literature - which, as you mention above, is a kind of power analysis. However, we should do this with our eyes open and with full awareness that it's a pretty rough estimate. My problem with formal power analyses are that they misleadingly suggest that this process can be an exact one.
ReplyDelete
Replies
Anonymous18 January 2013 at 12:29
I have a quibble with the power analysis. Power analysis reinforces the idea that the study is about rejecting the null hypothesis. We should really care about the magnitude of effects. It's often even easier, and always more meaningful, to propose the size of your effect confidence intervals. Then you're not making a statement about passing the null goal but what you believe to be a meaningful effect size. Furthermore, this analysis leaves one open to discovering a different variance than predicted and modifying the N mid study. You're doing it to obtain a specific predetermined sensitivity, not to make an effect 'significant'.
ReplyDelete
Replies
Anonymous6 March 2013 at 17:01
What (other than a lack of funding) could prevent a researcher from doing lots of secret probe tests first, finding out what works and what doesn't, doing as much p-fishing as necessary; then, armed with this knowledge, write his registration submission describing the best series he did as if he's just considering it, get his IPA... wait some time and submit his best-fished preliminary tests that are guaranteed to fit the registration?

Of course this scenario is a bit far-fetched, and overall the IPA system will be a great improvement. I just don't think it's going to be a silver bullet against all kinds of bias.
ReplyDelete
Replies
Niko Busch26 April 2013 at 16:17
Thanks for your blogpost! I am really excited to see what comes out of Cerebral Cortex's initiative. While contemplating on submitting one of my own manuscripts for pre-registration, I realized that I either lack understanding of measures of statistical power, or there is a fundamental problem and unfair practice for a particular research technique: psychophysics.

The typical psychophysical study includes only a handful (+/- 2) participants. Since estimates of statistical power only look at sample size and effect size, the average psychophysical study would be considered severely underpowered. If I understand correctly, this type of study would not meet the requirements of Cerebral Cortex.

However, in psychophysics we usually torture each participant in thousands of trials, way more than in the average neuroimaging study. Intuitively, I feel that the "within-subject sample size" must be a major determinant of a study's trustworthiness. However, power measures seem to account only for "across-subjects sample size".

Are there any specific plans at Cerebral Cortex to let studies compensate small sample sizes with large trial numbers? Are there even statistical procedures that account for the number of repeated measurements in the calculation of statistical power?
ReplyDelete
Replies
Unknown2 May 2013 at 22:28
Dear Chris,

First of all, I would like to thank you for your contribution in trying to improve our scientific ways. I got, however, two concerns that I haven't read elsewhere on this page.

Firstly, in FAQ 3 it states: "raw data must be made freely available at the full review stage and time stamped for inspection, along with a laboratory log indicating that data collection took place between dates X and Y".
Although it doesn't state to whom the data is freely available, most medical centers across the world are reluctant to make patient information/data freely available. Many privacy problems could arise from this demand (e.g. patients could be identified on the basis of MR images), which might pose huge problems for local medical ethic committees. Furthermore, time stamps and laboratory logs will require a lot of effort and resources from scientists and/or medical centers in order to become a trustworthy instrument for verification and validation of data assessment.

Secondly, while rightfully shifting the emphasis of good science from "positive results" to a "well designed study", the importance of the hypothesized effect size is increased greatly. Is it not realistic to expect an exaggeration of the hypothesized effect size? This will increase the study's power and, perhaps, the liklyhood of publication. It won't be a problem, when a study doesn't (fully) live up to those expectations, for null results will be published as well. I fear that the effect size estimation might result in attempts to falsely hype one's study without consequences.
ReplyDelete
Replies
Anonymous8 September 2017 at 12:45
"I would therefore like to propose a new form of empirical article at Cortex, called Registered Reports"

I don't understand something and have a few questions: why is this called "Registered Reports"?

Is there anything that is "registered" and if so, how is that done, what does this "registration" imply, where is it "registered", and is this accessible to the reader?

The only form of "registration" i know of is pre-registration which involves a time-stamped, frozen, description of the research and analysis-plan that can be included in the paper so readers can check this information. But it seems to me that this is not the kind of "registration" that is involved in Registered Reports.
ReplyDelete
Replies

Add comment