Monday 8 October 2012

Changing the culture of scientific publishing from within


****************************
Update, 20 October
 I have just learned of a related idea being proposed at Psychological Science.
 ****************************

****************************
Update, 12 November
At 11am we got the all-clear from the publisher. This is going to be reality.
Stay tuned for further developments! 
****************************

This is a long post and won’t appeal to everyone. But it could well be the most important thing I have committed to this blog since I started it seven months ago.

What follows is an open letter to my colleagues on the editorial board at the journal Cortex, where I've just been made an associate editor. My new position offers the exciting opportunity to push much needed reforms in scientific publishing, and moreover, to do so from within the machinery of a peer-reviewed journal.

Here I'm pitching one reform in particular: a new kind of empirical article called a Registered Report, which would involve peer review of the research methodology prior to data collection. Ever since reading this important post by Neuroskeptic, I’ve been convinced that study registration is the cure for much of what ails us.

Before launching into the text of the letter (or 'working document' as I prefer to think of it), I'd like to offer my thanks to the following people for inspiring – and, indeed, outright generating – the ideas in this proposal: Neuroskeptic, Marcus Munafò, Pete Etchells, Mark Stokes, Frederick Verbruggen, Petroc Sumner, Alex Holcombe, Ed Yong, Dorothy Bishop, Chris Said, Jon Brock, Ananyo Bhattacharya, Alok Jha, Uri Simonsohn, EJ Wagenmakers, and Brian Nosek. And no doubt many others who I have temporarily forgotten (my apologies, I will update the post accordingly as more names come to mind!)

You might ask why I'm blogging this. Well, I think it's important for two reasons. First, it's good to be transparent about the problems facing scientific publishing and the possible solutions we have to choose from. And second, I want this discussion to be open not only to the editorial board of Cortex but to scientists (both senior and junior), science writers and journalists, science policy makers, and science enthusiasts generally - and, in particular, to the scientists who would consider sending their submissions to Cortex. So whoever you are and whatever you do, if you care about scientific publishing then please do leave a comment.

Tell me if you think it's a good idea. Tell me if you think it's a stupid or naïve idea. Tell me where I've missed something crucial or where you see a particularly strong selling point. And above all, tell me this: would you consider submitting your own manuscripts using this new article format? The more interest this proposal receives, and publicly, the better chance I have of convincing my colleagues and the journal publisher to pursue it – or something like it.

Enough preamble. Here’s my open letter.

=====================================================================

Registered Reports: A proposal for a new article format at Cortex

=====================================================================

It is a great privilege to become an associate editor at Cortex.

Cortex was one of the first journals I published in, and I have reviewed at the journal for many years now. I’m particularly humbled to join such a distinguished editorial board.

As delighted as I am to join Cortex, I think we need to be doing more than editing submissions according to standard practices. In most journals, the traditional approach for handling empirical articles is archaic and demonstrably flawed. I believe we should be using our editorial positions to institute reforms that are long overdue.  

1. General proposal and rationale

I would therefore like to propose a new form of empirical article at Cortex, called Registered Reports. I hope to start a discussion among the editorial board and wider scientific community about the merits and drawbacks of such a proposal. In addition to emailing this document to the editorial board, I have also published it on my blog for open discussion, so please feel free to reply either confidentially (via email) or publicly (on the blog). This proposal is very much a working document so any edits or comments on the document itself are most welcome.

I need to make one point clear at the outset. At this stage I am not proposing that we drop any of the existing article formats at Cortex. Rather, I am suggesting an additional option for authors.

The cornerstone of Registered Reports is that a large part of the manuscript would be reviewed prior to the experiments being conducted. Initial manuscripts would be submitted before a study has been undertaken and would include a description of the key background literature, hypotheses, experimental procedures, analysis pipeline, a statistical power analysis, and pilot data (where applicable). Following peer review, the article would then be either rejected or accepted in principle for publication.

Once in principle acceptance (IPA) has been obtained, the authors would then proceed to conduct the study, adhering exactly to their peer-reviewed procedures. When the study is complete the authors would submit their finalised manuscript for re-review and would upload their raw data and laboratory log via Figshare for full public access. Pending quality checks and a sensible interpretation of the findings, the manuscript would be published – and, crucially, independently of what the results actually look like.

This form of article has a number of advantages over the traditional publishing model. First and foremost, it is immune to publication bias because the decision to accept or reject manuscripts will be based on the significance of the research question and methodological validity, never on whether results are statistically significant.

Second, by requiring prospective authors to adhere to a preapproved methodology and analysis pipeline, it will eliminate a host of suspect but common practices that increase false discoveries, including p value fishing (i.e. adding subjects to an experiment until statistical significance is obtained – a practice admitted to by 71% of recently surveyed psychologists; [9]) and selective reporting of experiments to reveal manipulations that “work”. Currently, many authors partake in these practices because doing so helps convince editors and reviewers that their research is worthy of publication. By providing IPA prior to data collection, the incentive to engage in these practices will be largely eliminated.

Third, by requiring an a priori power analysis, including a stringent minimum power level (see below), false negatives will be greatly reduced compared with standard empirical reports. This will increase the veracity of non-significant effects.

Taken together, these practices will ensure that articles published as Registered Reports have a substantially higher truth value than regular studies. Such articles can therefore be expected to be more replicable and have a greater impact on the field.

Why should we want to make this change? The life sciences, in general, suffer from a number of serious problems including publication bias [1, 2], low statistical power [3, 4], undisclosed post-hoc analytic flexibility [5, 6, 7], and a lack of data transparency [8]. By valuing findings that are novel and eye-catching over those that are likely to be true, we have incentivised a range of questionable practices at individual and group levels. What’s more, a worryingly high percentage of psychologists admit to engaging in dubious practices such as selectively reporting experiments that produced desirable outcomes (67%) and p value fishing (71%) [9].

So why should we change now? After all, these problems are far from new [10, 11]. My instinctive response to this question is, why haven't we changed already? In addition, there are several reasons why advances in scientific publishing are especially timely. The culture of science is evolving quickly under heightened funding pressure, with an increasing emphasis on transparency and reproducibility [12], open access publication [13], and the rising popularity of the PLoS model and other alternative publication avenues. Furthermore, retractions are at a record high [14], and recent high-profile fraud cases (e.g. Stapel, Smeesters, Sanna, Hauser) are casting a long shadow over our discipline as a whole.

The ideas outlined here are not new and I certainly can’t claim credit for them. I formulated this proposal after a year of discussion with scientists in multiple disciplines (including journal editors), science policy makers, science journalists and writers, and the Science Media Centre, as well as key blog articles (e.g. here, here and here).

I hope I can convince you that Registered Reports would provide an important innovation in scientific publishing and would position Cortex as a leader in the field. If you agree, in principle, then our next step will be to decide on the details. Then, finally, we would need to convince Elsevier to take this journey with us.

If we succeed then it will bring the scientific community one step closer to a system in which the incentive to discover something true, however small, outweighs the incentive to produce ‘good results’. Call me a shameless idealist, but I find that possibility hugely exciting.

2. The proposed mechanism

Registered Reports would work as follows.

(a) Stage 1: Registration review
Authors submit their initial manuscript prior to commencing their experiment(s). The initial submission would include the following sections:
·      Background and Hypotheses
o   A review of the relevant literature that motivates the research question, and a full description of the aims and experimental hypotheses.
·      Methods
o   Full description of proposed sample characteristics, including criteria for subject inclusion and exclusion, and detailed description of procedures for defining outliers. Procedures for objectively defining exclusion criteria due to technical errors (e.g. defining what counts as ‘excessive’ head movement during fMRI) or for any other reasons (where applicable) must be documented, including details of how and under what conditions subjects would be replaced.
o   A description of experimental procedures in sufficient detail to allow another researcher to repeat the methodology exactly, without requiring any further information.
o   Proposed analysis pipeline, including all preprocessing steps, and a precise description of every analysis that will be undertaken and appropriate correction for multiple comparisons. Any covariates or regressors must be stated. Consistent with the guidelines of Simmons et al. (2011; see 5), proposed analyses involving covariates must be reported with and without the covariate(s) included. Neuroimaging studies must document in advance, and in precise detail, the complete pipeline from raw data onwards.
o   Where analysis decisions or follow-up experiments are contingent on the outcome of prior analyses, these contingencies must be detailed and adhered to.
o   A statistical power analysis. Estimated effect sizes should be justified with reference to the existing literature. To account for existing publication bias, which leads to overestimation of true effect sizes [15, 16], the power analysis must be based on the lowest available estimate of the effect size. Moreover, the a priori power (1 − β) must be 0.9 or higher. Setting a high power criterion for the detection of minimal effect sizes is paramount given that this model will lead to the publication of non-significant effects. A worked sketch of such a power calculation follows this list.
o   In the case of very uncertain effect sizes, a variable sample size and interim data analysis would be permissible but with inspection points stated in advance, appropriate Type I error correction for ‘peeking’ employed [17], and a final stopping rule for data collection outlined.
o   Full description of any outcome-neutral criteria that are required for successful testing of the study hypotheses. Such ‘reality checks’ might include the absence of floor or ceiling effects, or other appropriate baseline measures. Editors must ensure that such criteria are not used by reviewers to enforce dogma about accepted ‘truths’. That is, we must allow for the possibility that failure to show evidence for a critical ‘reality check’ can raise doubt about the truth of that accepted reality in the first place.
o   Timeline for completion of the study and proposed resubmission date if registration review is successful. Extensions to this deadline can be arranged with the action editor.
·      Pilot Data
o   Optional. Can be included to establish reality checks, feasibility, or proof of principle. Any pilot data would be published with the final version of the manuscript and will be clearly distinguished from data obtained for the main experiment(s).
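
To make the power and interim-analysis requirements above concrete, here is a minimal sketch of the kind of calculation a Stage 1 submission might report. It is an illustration only, not part of the proposal itself: it assumes a simple two-group design analysed with an independent-samples t-test, uses Python's statsmodels package, and treats d = 0.35 as a hypothetical 'lowest available' effect-size estimate from prior literature. The alpha split at the end is just one conservative way of handling pre-registered interim 'peeks'; the sequential methods cited above [17] may be preferable where applicable.

```python
# Minimal sketch of an a priori power analysis for a Stage 1 submission.
# Assumptions (hypothetical): two independent groups compared with a t-test,
# lowest published effect size estimate d = 0.35, alpha = .05, power >= .90.
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()

# Solve for the per-group sample size that yields power >= 0.9 for the
# most conservative (smallest) effect size estimate available.
n_per_group = power_analysis.solve_power(
    effect_size=0.35,          # Cohen's d: lowest available estimate, not the mean
    alpha=0.05,
    power=0.90,
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")

# For genuinely uncertain effect sizes, one conservative way to pre-register
# interim 'peeks' is to split alpha across a fixed number of planned looks
# (Bonferroni-style); formal sequential corrections [17] are more efficient.
planned_looks = 3
print(f"Per-look alpha with {planned_looks} planned looks: {0.05 / planned_looks:.4f}")
```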

In considering papers in the registration stage, reviewers will be asked to assess:
  • The significance of the research question(s)
  • The logic, rationale, and plausibility of the proposed hypotheses
  • The soundness and feasibility of the methodology and analysis pipeline
  • Whether the level of methodological detail provided would be sufficient to duplicate exactly the proposed experimental procedures and analytic approach

Attempted replications of high profile studies would be welcomed. For replication attempts to be accepted, they must be regarded by the reviewers as significant and important regardless of outcome (i.e. having a high replication value [18] as was the case in the recent attempted replication of precognition effects [19]).

Manuscripts that pass registration review will be issued an in principle acceptance (IPA). This means that the manuscript is accepted for publication pending successful completion of the study according to the exact methods and analytic procedures outlined, as well as a defensible and evidence-based interpretation of the results.

Upon receiving IPA, authors will be informed that any deviation from the stated methods, regardless of how minor it may seem, will lead to summary rejection of the manuscript. If the authors wish to alter the experimental procedures following IPA but still wish to publish the study as a Registered Report in Cortex, then the manuscript must be withdrawn and resubmitted as a new Stage 1 submission.

(b) Stage 2: Full manuscript review
Once the study is complete, the authors then prepare and resubmit their manuscript for full review, with the following additions:

·      Submission of raw data and laboratory log
o   Raw data must be made freely available via the website Figshare (or an alternative free service). Data files must be appropriately time stamped to show that the data were collected after IPA and not before. Other than pre-registered and approved pilot data, no data acquired prior to the date of IPA are admissible in the final submission. Raw data must be accompanied by guidance notes, where required, to assist other scientists in replicating the analysis pipeline.
o   The authors must collectively certify that all non-pilot data was collected after the date of IPA. A simple laboratory log will be provided outlining the range of dates during which data collection took place.
·      Revisions to the Background and Rationale
o   The stated hypotheses cannot be altered or appended. However, it is perfectly reasonable for the tone and content of an Introduction to be shaped by the results of a study. Moreover, depending on the timeframe of data collection, new relevant literature may have appeared between registration review and full manuscript review. Therefore, authors will be allowed to update at least part of the Introduction.
·      Results & Discussion
o   This will be included as per standard submissions. With one exception, all registered analyses must be included in the manuscript. The exception would be (very) rare instances where a registered and approved analysis is subsequently shown to be logically flawed or unfounded in the first place (i.e. the authors, reviewers, and editor made a collective error of judgment and must collectively agree that the analysis is, in fact, inappropriate). In such cases the analysis would still be mentioned in the Method but omitted from the Results (with the omission justified).
o   Understandably, authors may occasionally wish to include additional analyses that were not part of the registered submission; for instance, a new analytic approach might emerge between IPA and full review, or a particularly interesting and unexpected finding may emerge. Such analyses are admissible but must be clearly justified in the text, appropriately caveated, and reported in a separate section of the Results titled “Post hoc analyses”. Editors must ensure that authors do not base their conclusions entirely on the outcome of significant post hoc analyses.
o   Authors will be required to report exact p values and effect sizes for all inferential tests (a minimal worked example follows this list).
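
As a purely illustrative sketch of the reporting requirement above (exact p values plus effect sizes), the following snippet runs an independent-samples t-test on simulated data and computes Cohen's d from the pooled standard deviation. The group sizes and numbers are hypothetical and are not drawn from any real study.

```python
# Illustration only: reporting an exact p value and an effect size (Cohen's d)
# for a simple two-group comparison on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)  # hypothetical condition A
group_b = rng.normal(loc=0.4, scale=1.0, size=40)  # hypothetical condition B

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d using the pooled standard deviation of the two groups
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

df = len(group_a) + len(group_b) - 2
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```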


The resubmission will ideally be considered by the same reviewers as in the registration stage, but could also be assessed by fresh reviewers. In considering papers at the full manuscript stage, reviewers will be asked to appraise:

  • Whether the data are able to test the authors’ proposed hypotheses by passing the approved outcome-neutral criteria (such as absence of floor and ceiling effects)
  • Whether any changes to the Introduction are reasonable and do not alter the rationale or hypotheses
  • Whether the authors adhered precisely to the registered experimental procedures
  • Whether any post-hoc analyses are justified, robust, and add to the informational content of the paper
  • Whether the authors’ conclusions are justified given the data

Crucially, reviewers will be informed that editorial decisions will not be based on the perceived importance or clarity of the data. Thus while reviewers are free to enter such comments on the record, they will not influence editorial decisions.

Reviews will be anonymous. To maximise transparency, however, the anonymous reviews and authors’ response to reviewers will be published alongside the full paper in an online supplement.

Manuscript withdrawal
It is possible that authors with IPA may seek to withdraw their manuscripts following or during data collection. Possible reasons could include technical error or an inability to complete the study due to other unforeseen circumstances. In all such cases, manuscripts can of course be withdrawn. However, the journal will publicly record each case in a section called Retracted Registrations. This will include the authors, proposed title, an abstract briefly outlining the original aim of the study, and brief reason(s) for the failure to complete the study. Partial retractions are not possible; i.e. authors cannot publish part of a registered study by selectively retracting one of the planned experiments. Such cases must lead to retraction of the entire paper.

3. Concerns, Responses and Discussion Points

What follows is a paraphrased Q & A, drawing on both actual and hypothetical discussions about the proposal with colleagues.

1.     Won’t Registered Reports just become a dumping ground for inconclusive null effects?
a.     No. The required power level will increase the chances of detecting a statistically significant effect when a real effect exists. Average power in psychology/cognitive neuroscience is low, whereas IPA will be contingent on power of 0.9 or above. Thus, any non-significant findings will, by definition, be more conclusive than those typically observed in the literature.
b.     It is crucial that we provide a respected outlet for well-powered non-significant findings. This will help combat the file drawer effect and reduce the publication of false discoveries. Moreover, authors are welcome to propose superior alternatives to conventional null hypothesis testing, such as Bayesian approaches [20].
c.     By guaranteeing publication prior to data being collected, this model would encourage authors to propose large scale studies for more definitive hypothesis testing – studies which investigators would otherwise be reluctant to pursue given the risk of yielding unpublishable null effects.
d.     Registration review will be stringent, with reviewers asked to consider the methodology in detail for possible oversights and flaws that could prevent the study from testing the proposed hypotheses.

2.     It all sounds too strict. Why would authors submit to this scheme when they can’t change even one small aspect of their experimental procedure without being ‘summarily rejected’? Even grant applications are not so demanding.
a.     Yes it is stringent, and so it should be. This format of article is primarily intended for well-prepared scientists who have carefully considered their methodology and hypotheses in advance. And isn’t that how we ought to be doing science most of the time anyway?
b.     Note that this methodological stringency is coupled with a complete absence of expectations about how the results should look. Whether an experiment supports the stated hypothesis is the one aspect of science that scientists (should) have no control over – yet the traditional publishing model encourages a host of dodgy practices to exert such control. This new model replaces the artificial and counterproductive ‘data stringency’ with constructive ‘methodological stringency’, and so would largely eliminate the pressure for scientists to submit data that perfectly fit their predictions or confirm someone’s theory. I believe many scientists would approach this model with relief rather than trepidation.

3.     Authors could game the system by running a complete study as per usual and submitting the methodology for registration review after the fact.
a.     No, raw data must be made freely available at the full review stage and time stamped for inspection, along with a laboratory log indicating that data collection took place between dates X and Y. Final submission must also be accompanied by a certification from each author that no data (other than approved pilot data) was collected prior to the date of IPA. Any violation of this rule would be considered misconduct; the article would be retracted by Cortex and referred to Retraction Watch.

4.     What’s to stop unscrupulous reviewers stealing my ideas at the registration stage, running the experiments faster than I can (or rejecting my registration submission outright to buy time), and then publishing their own study?
a.     This is a legitimate worry, and it is true that there is no perfect defense against bad practice. But we shouldn’t overstate this concern. Gazumping is rare and, in any case, is present in many areas of science. Fear of being scooped doesn’t stop us presenting preliminary data at conferences or writing grant applications. So why should we be so afraid of registration review?
b.     Even if an unscrupulous reviewer decided to run a similar or identical experiment following IPA, the decision to publish the registered study would not be influenced. So being scooped would not cost the authors a publication once they have obtained IPA.
c.     Unlike existing protocol journals, such as BMC Protocols, the IPA submission would not be published in advance of the main paper. So only the reviewers and editors would see it. This will reduce the chances of being gazumped.

5.     A lot of the most interesting discoveries in science are serendipitous. Your approach will stifle creativity and data exploration.
a.     No, it won’t. Authors will be allowed to include “post-hoc analyses” in the manuscript that were not in the registered submission. They simply won’t be able to pretend that such analyses were planned in advance or adjust their hypotheses to predict unexpected outcomes. And, sensibly, they won’t be able to base the conclusions of their study on the outcome of unplanned analyses – the original registered analyses would take precedence and must also be reported.
b.     It should also be noted that a priori analyses in the registration stage could include exploration of possible serendipitous findings.
c.     Serendipitous findings are, by their nature, rare. A far greater problem is the proliferation of false positives due to excessive post-hoc flexibility in analysis approaches. So let’s deal with the big problem first.

6.     You propose allowing authors to alter the Introduction to include new literature. Doesn’t this create a slippery slope for changing the rationale or hypotheses too?
a.     No, but we must be vigilant on this point. I think it is entirely sensible to allow revisions to the Introduction to contextualise the literature based on the findings and to focus on the most recent publications that emerged following IPA. After all, we want readers to be engaged as well as informed. However, we must also ensure that such changes are reasonable. Monitoring this aspect in particular would be one of the central reviewing criteria at Stage 2 (see above). In a revised Introduction, the authors would not be permitted to alter the rationale for the study, to state new hypotheses, or to alter the existing hypotheses. These could be flagged in distinct sections of the Introduction that are untouchable following IPA.

7.     What if the authors never submit a final manuscript because the results disagree with some desired outcome (such as supporting their preferred explanation)? How can you prevent publication bias on the part of the authors?
a.     We can’t stop authors censoring themselves. As noted above, however, if a study is withdrawn following IPA then this will be noted in a Retracted Registrations section of the journal. So there would at least be a public record of the withdrawal and some explanation for why it happened.
b.     Note also that if the authors have not submitted by their own stated deadline then the manuscript will be automatically withdrawn, considered retracted, and noted in the Retracted Registrations section. Extensions to the deadline are permissible following prior agreement with the action editor.

8.     What would stop authors getting IPA, then running many more subjects than proposed and selectively including only the ones that support their desired hypothesis?
a.     Nothing. But doing so is outright fraud, similar to the conduct of Dirk Smeesters [21]. No mechanism can fully guard against fraud, and regular submissions under the traditional publishing route are equally vulnerable to such misbehaviour. Note also that the proposed model requires submission of raw data, which will help protect against such eventualities. Selective exclusion of subjects to attain statistical significance can be detected using the statistical methods developed by Uri Simonsohn [22]. This alone will act as a significant deterrent to fraudsters.

9.     How can IPA be guaranteed without knowing the author’s interpretation of the findings?
a.     It isn’t. IPA ensures that the article cannot, and will not, be rejected based on the results themselves (with the exception of failing outcome-neutral reality checks, such as floor or ceiling effects, which prevent the stated hypotheses being appropriately tested). Manuscripts can still be rejected if the reviewers and editor believe the author’s interpretation is unreasonable given the data. And they will be rejected summarily if the authors change their experimental procedures in any way following IPA.

10.  What if the authors obtain IPA but then realise (after data collection has commenced) that part of their proposed methods or analyses was incorrect or suboptimal?
a.     In the case of changes to the experimental procedures, the manuscript would have to be fully withdrawn but could be returned to Stage 1 for fresh registration review.
b.     In the case of changes to the analysis approach, depending on the nature of the proposed change, Stage 2 may be able to proceed following a phase of interim review and discussion with the editor and reviewers (if all agree that a different form of analysis is preferable). In such cases, the originally proposed analysis would still be described in the final article, but its results need not be reported, and the reasons for excluding it would be acknowledged.

11.  Cortex already has a long backlog of in-press articles. Adding yet another article format could make this problem worse.
a.     I propose that each article published as a Registered Report takes the place of a standard research report, thus requiring similar journal space to the current model.
b.     If Registered Reports become increasingly popular and well cited, the journal could gradually phase the standard report format out altogether, making Registered Reports the norm.

I hope I can convince you that Registered Reports would be a useful and valid initiative at Cortex. And even if not, I look forward to the ensuing discussion. Below is a list of key supporting references.




[1] Rosenthal R (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86: 638–641.

[2] Thornton A & Lee P (2000). Publication bias in meta-analysis: its causes and consequences. Journal of Clinical Epidemiology, 53: 207–216.

[3] Chase, LJ & Chase, RB (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61: 234-237.

[4] Tressoldi, PE (2012). Replication unreliability in psychology: elusive phenomena or "elusive" statistical power? Frontiers in Psychology, 3: 218.

[5] Simmons JP, Nelson LD, & Simonsohn U (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22: 1359-1366.

[6] Wagenmakers, EJ (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14: 779–804.

[7] Masicampo, EJ & Lalande, DR (in press). A peculiar prevalence of p values just below .05. Quarterly Journal of Experimental Psychology.

[8] Ioannidis JPA (2005). Why Most Published Research Findings Are False. PLoS Medicine 2(8): e124. doi:10.1371/journal.pmed.0020124

[9] John, L, Loewenstein, G, & Prelec, D (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23: 524-532 DOI: 10.1177/0956797611430953

[10] Smith MB (1956). Editorial. Journal of Abnormal & Social Psychology, 52:1-4.

[11] Cohen, J (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal & Social Psychology, 65: 145–153.

[14] Fang, FC, Steen, RG & Casadevall, A (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences USA. doi: 10.1073/pnas.1212247109

[15] Lane, DM & Dunlap, WP (1978). Estimating effect size: Bias resulting from the significance criterion in editorial decisions. British Journal of Mathematical and Statistical Psychology, 31: 107–112.

[16] Hedges LV & Vevea, JL (1996). Estimating effect size under publication bias: Small sample properties and robustness of a random effects selection model. Journal of Educational and Behavioral Statistics, 21: 299-332.

[17] Strube, MJ (2006). SNOOP: A program for demonstrating the consequences of premature and repeated null hypothesis testing. Behavior Research Methods, 38: 24-27. Software available from here: http://www.artsci.wustl.edu/~socpsy/Snoop.7z

[18] Nosek, B. A., Spies, J. R., & Motyl, M. (in press). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science. arxiv.org/pdf/1205.4251

[19] Ritchie SJ, Wiseman R, French CC (2012) Failing the Future: Three Unsuccessful Attempts to Replicate Bem’s ‘Retroactive Facilitation of Recall’ Effect. PLoS ONE 7(3): e33423. doi:10.1371/journal.pone.0033423

[20] Kruschke, JK (in press). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General. www.indiana.edu/~kruschke/BEST/BEST.pdf

60 comments:

  1. Great work! I'm very impressed by this and think it would be a big step in the right direction.

    Some random thoughts:

    1) Authors should be given the option of publishing their registered Protocol after the Registration Review. Either as an online mini-article or they could publish it themselves.

    This would help to guard against idea stealing, because it would clearly establish precedent - anyone could steal the idea, but it would be obvious that they'd done so, which would make it much less desirable.

    Also, this would help to guard against the possibility of misbehaviour by the second-stage reviewers. If these reviewers decided that they didn't like the data, and tried to block the paper for that reason, the authors would then be able to appeal to the court of public opinion, by pointing to their published (and therefore certified a priori) protocol and saying "Here's what we said we'd do and here's our data - form your own opinions". This is unlikely to happen often, but it would be a crucial check on the power of reviewers.

    2) I'm not entirely happy with allowing people to change their Introduction, even for bona fide reasons like new literature emerging. I think it would be a slippery slope. But I can see that without that, you might end up with some really irrelevant Introductions. So why not just allow authors to change the Introduction at will, but, also publish the originally approved one as a Supplement? That would allow readers to judge whether the Intro had been altered for 'naughty' purposes or not.

    3) Scientists will rightly object to any proposal that would cause an increase in bureaucracy. On its face, this proposal would "double the amount of peer review" which would be a hassle. I wonder if it could be coupled to some system for integrating peer review with the process of applying for a grant e.g. the journal could agree with Grant Body X that any protocol awarded money by X would be treated ipso facto as "reviewed" and would be fast-tracked through the Registration Review (but not the final review) with only minimal oversight?

    1. Thanks, I'm glad you like it!

      1) Good idea. Of course, one aim of this model is to prevent reviewers from rejecting manuscripts based on data, but it's true that reviewers could do so under some other guise, so offering the option for separate publication of the protocol would provide additional insurance to the authors.

      2) I like this too. In fact, to make life easy, how about we just publish the complete original approved submission separately as a supplement. Then readers can compare every aspect between the IPA and final versions, including the Introduction. I do think some leeway is needed in terms of updating the literature or it will detract from readability, which will in turn reduce impact.

      3) This is a concern - and it does seem unavoidable that this model will increase load on reviewers (on the other hand, perhaps it would also lead, in the long run, to fewer publications per scientist and less salami slicing). The idea of integrating with funding agencies is appealing in principle but I suspect would be very difficult in practice: grant applications often don't allow space for the kind of methodological detail required for IPA under this proposed model.

  2. I completely agree. In fact, my colleagues and I have proposed the very same idea in a paper ("An agenda for purely confirmatory research") that is in press for Perspectives on Psychological Science: http://www.ejwagenmakers.com/2012/ConfirmatoryResearchFTW_inpress.pdf

    Cheers,
    E.J.

    1. Thanks E.J. - will read this with interest. And good to know that we are all arriving at similar conclusions. Suggests we are truly on to something here.

  3. This may sound a bit daft, but what is the point of this? Let's say your journal implements the idea. How do you see things going? Why would any scientist use this system given other options? Seems like the most obvious differences here are more work and less flexibility for scientists.

    1. It's a very good question. So long as the current system of post-results review was around, it would be easier in many ways for scientists to opt for that, and many would.

      Eventually, one hopes that there'd be a cultural change, or a policy shift on the part of the key players, such that pre-registration was seen as the gold standard. I think that'll take several years, maybe a generation.

      However early adopters would have advantages: in particular, researchers who suspect that their results will have a hard time getting published because they're 'inconvenient' ought to favor the new system as it would make it much harder for reviewers to block their papers on spurious grounds.

      e.g. people who are proposing to replicate 'classic' studies and expect to find no effect, amongst many other examples.

    2. Agreed. Also, two other points I would make here:

      1) It may create slightly more work for reviewers but it isn't really any extra work for authors (unless deciding in advance what your experiment is counts as "extra"). It simply structures the work in a different way.

      2) Surely another huge incentive for scientists to participate in this model is that it removes, in one fell swoop, any pressure for scientists to produce "perfect results". One could obtain IPA and then simply press ahead with the study, knowing that publication is virtually guaranteed. Isn't this a major stress relief?

  4. Great work Chris - A very sound proposal, which I would fully support. It will also make everyone think a lot harder in the planning stage of an experiment, rather than just hoping to find something interesting in the data.

    Also, given the close link to current grant formats, one could imagine a future where these two processes begin to merge, thus reducing net bureaucracy.

    You would want a mechanism for committing folks to stick with the target journal, so this route is not abused as a backup publication. I.e., register with journal X so you are guaranteed at minimum a publication with X, but if the results work out really nicely, why not go for Journal Y with a higher impact!?

    Hopefully a more general shift in culture could protect against this, especially if replicability becomes a higher badge of honour than impact factor. Also, the retraction notice might be sufficient disincentive.

    Anyway, great work!

    1. Thanks Mark. Yes, as Neuroskeptic said above, it would be nice to integrate with the mechanism for reviewing grants. Tough logistically but certainly ideal. I guess that would be a second-phase operation.

      That's a good point about gaming the system that I hadn't considered. We'd need rules about that kind of thing (similar to current rules against concurrent manuscript submissions).

    2. I'm not too concerned about this: just get authors to sign a form upon approval of the protocol saying in effect "if you take your work elsewhere, we will publish the protocol and the fact that you originally submitted it here".

      You can't force people to play ball but that would be a good incentive, as it would look bad especially if they subsequently changed the protocol, and published 'good' results elsewhere!

    3. Agreed, but I think it would have to be binding (like current rules banning publication of the same data twice). Especially because the temptation to go elsewhere might arise not because they breached protocol, but because everything worked out super dandy and they wanted a shot at a higher journal. In this case, the publication of the protocol by Cortex would actually give the Nature/Science paper more credibility (though the authors could look bad, depending on how the community respond to this initiative).

  5. My compliments for making this effort to change current publication policies.
    I fully agree that current practices make published papers less reliable and trustworthy and can discredit science as a whole: scientists brushing up their data and creating biases as a result, journals too eager to publish the latest 'hot' finding, and funding agencies and hiring committees overemphasizing quantity over quality.

    I do however doubt whether the scheme proposed here is the solution when other methods of publication are still available and more appealing to very busy scientists, and thus also to those evil scientists who want to publish their rubbish. I have a couple of reasons for my doubts.

    First, this proposal will create a pretty big layer of bureaucracy on top of the already ever increasing bureaucracy of institutional ethical review boards (incl monitoring, auditing, etc), grant proposals, and so on. Without a reliable and rigorous review process of your ‘registered articles’, this idea is not going to work, so it requires an elaborate system and lots of work from both scientists and reviewers. I personally very much like the idea of publishing the analysis pipeline in detail, but many authors I know (luckily not all) hardly know or understand their pipeline (a poor TA has to do/code it, or an underpaid smart PhD student) for a classical post-acquisition paper. Let alone before an experiment is conducted and it is unsure whether there is an easy score of ‘hot’ news within reach.

    Second, who is to review the high level of detail? Even as it is, it gets extremely hard to find good reviewers.

    Third, true discoveries often come by surprise and are anything but pre-planned. Think of Kekule’s benzene ring, the apple falling on Isaac Newton’s head or the all-changing bathing session by Archimedes leading to his Eureka. This proposal would add a rigor and inflexibility to science that can and will kill much needed creativity.

    Fourth, I have serious doubts about power analyses. You somehow assume you have an a-priori idea about the expected effect size and the noise in your data. While the latter might be known from some null-data, the former cannot be really known without doing the experiment (especially when it’s a novel idea). So when you already know your result, why do the experiment in the first place?

    Therefore, other than making the life of scientists even harder by bloating the publication process, I’d rather recommend a) the submission of the complete research data and b) serious space in journals for replications or ‘failures to replicate’, and credits for scientists who publish those. That might greatly reduce the temptation to publish dodgy results in the first place, which is the whole purpose of this debate. And leave the rest to the ‘market’, in the long run frauds are discovered, especially when a field finally gets applied in the real world (assumed they ever will be). The character flaws leading to the frauds are all too human and will never be fully rooted out, unfortunately.

    To me it seems that this concern is currently very alive among psychologists and more generally (neuroimaging) social scientists. Stapel and Smeesters as recent examples belong to these domains (except the imaging part perhaps), which might explain it. I do by no means want to imply that frauds do not exist in other fields, but there simply seems much less concern in older more established disciplines. This might be due to other working habits and much clearer areas of application requiring a higher degree of scrutiny automatically. Think of material science, pharmacology, physics, etc.
    As laudable, noteworthy and interesting as Chris’ proposal may be, I do not think it will be accepted by the majority of practicing scientists, nor will it solve the issue that he is rightfully addressing. I as a publishing scientist at least hope I will not have to go through this process.
    Thank you for inspiring this very important debate.

    1. Good points, but I disagree. These concerns are by no means limited to psychology. They're also serious concerns in: genetics of disease; epidemiology; animal models for drug development / pharmacology; and of course clinical trials where the concerns were recognized early and we already have a pre-registration system (albeit based on a centralized registry clinicaltrials.gov rather than a Journal-based system, but the intent is the same.)

      As to the point:

      "Therefore, other than making the life of scientists even harder by bloating the publication process, I’d rather recommend a) the submission of the complete research data and b) serious space in journals for replications or ‘failures to replicate’, and credits for scientists who publish those. That might greatly reduce the temptation to publish dodgy results in the first place, which is the whole purpose of this debate."

      I just don't see this working. It's already been tried, repeatedly - there have been endless Journals of Null Results and so forth - and it hasn't worked. Furthermore I have real concerns about the possible consequences if people are allowed to use 'questionable practices' in the pursuit of replications (or failures to replicate) - you can easily see how the same tricks (publication bias, p-value fishing) could be used to conjure up a replication or a null replication, just as they're today used to create results in the first place.

      In the long run, you're right, it would probably sort itself out by the 'market' but I would say that the long run might be decades in any particular case, and that could mean huge amounts of wasted time, effort and (perhaps most important) goodwill.

    2. Hello Bas,
      Thanks for commenting. These are important criticisms that I’ve heard from other scientists too, and I’m going to respond to each of them in turn.

      First, this proposal will create a pretty big layer of bureaucracy on top of the already ever increasing bureaucracy of institutional ethical review boards (incl monitoring, auditing, etc), grant proposals, and so on. Without a reliable and rigorous review process of your ‘registered articles’, this idea is not going to work, so it requires an elaborate system and lots of work from both scientists and reviewers.

      I think this overestimates the bureaucratic load. All of the mechanisms needed to institute the system already exist, including rigorous peer review. At its core, all this proposal does is split the review process into two phases, one before data collection and one afterward. So the major extra work for reviewers would be careful consideration of the methods and analysis pipeline, and good reviewers do this already in reviewing papers and grants.

      I agree that there would be an increase in reviewer load by virtue of the fact that each phase of manuscript consideration could involve a separate revision stage. But the final product would be a much higher quality publication with much greater replicability, not only because of the increased methodological rigour but also because the methods would be expounded in a level of detail that permits genuine replication.

      For me, being a careful reviewer seems a very small price to pay for better science and elimination of the current incentive structure to produce 'good results'.

      I personally very much like the idea of publishing the analysis pipeline in detail, but many authors I know (luckily not all) hardly know or understand their pipeline (a poor TA has to do/code it, or an underpaid smart PhD student) for a classical post-acquisition paper. Let alone before an experiment is conducted and it is unsure whether there is an easy score of ‘hot’ news within reach.

      The system I’m proposing would be immune to this kind of cowboy science (isn't that a good thing?) and would reward the scientists who plan properly. Keep in mind that I'm not suggesting that we replace the existing system with the new one, only that we add an option for authors which doesn't currently exist.

      Second, who is to review the high level of detail? Even as it is, it gets extremely hard to find good reviewers.

      Yes - this system needs good reviewers, as all systems do. But, honestly, if the reason we reject a rigorous system of scientific publication is because we don’t trust ourselves to review it properly, then the entire field is already in crisis. I don't think we've reached that point yet.

      (part 1/3 cont')

    3. Response to Bas (part 2/3)

      Third, true discoveries often come by surprise and are anything but pre-planned. Think of Kekule’s benzene ring, the apple falling on Isaac Newton’s head or the all-changing bathing session by Archimedes leading to his Eureka. This proposal would add a rigor and inflexibility to science that can and will kill much needed creativity.

      A fair point but it is important to clarify that there is nothing in this mechanism to prevent serendipitous findings being published in registration reports (see FAQ #5 above). As described, full manuscript submissions could include a section “Post hoc analyses” where authors can report unexpected findings or unplanned analyses. They simply won’t be able to pretend, as many do now, that unexpected findings were expected, or that post-hoc analyses were planned a priori. Doesn't this fully address such concerns?

      Fourth, I have serious doubts about power analyses. You somehow assume you have an a-priori idea about the expected effect size and the noise in your data. While the latter might be known from some null-data, the former cannot be really known without doing the experiment (especially when it’s a novel idea). So when you already know your result, why do the experiment in the first place?

      I disagree. The great majority of experiments in psychology and cognitive neuroscience are increments of existing paradigms (or repetitions of existing tasks in new contexts), for which there is a wealth of information about effect sizes in the existing literature. Power analysis has been sadly neglected in our field compared to other areas, and to our great detriment.

      I could list dozens of examples from my own corner of study: behavioural tasks that measure attention, awareness, cognitive control, combined with TMS or fMRI. There are very few truly novel paradigms, and even where they arise, it is straightforward to estimate the likely effect size of the BOLD response or behavioural measures from similar tasks. Authors simply aren’t accustomed to formally assessing power. We have become quite lazy at this and just run experiments based on heuristics of how many subjects we think we’ll need based on conventions set by existing studies (in itself this decision process is, of course, a crude power analysis).

      Even so, let’s consider the (rare) case of a truly novel task or question in which the effect size is impossible to estimate. The proposed mechanism is protected against this (see Section 2a above):

      In the case of very uncertain effect sizes, a variable sample size and interim data analysis would be permissible but with inspection points stated in advance, appropriate Type I error correction for ‘peeking’ employed [17], and a final stopping rule for data collection outlined.

      Another option would be to run a pilot study to estimate the effect size, including this as pilot data during registration review, and basing the power analysis on the estimated effect size accordingly.

      What we shouldn’t be doing is using rule-of-thumb estimates of required sample size, completing the data collection and then simply adding subjects to an experiment until it achieves statistical significance -- at least not without correcting for peeking. But this is what many researchers currently do (see John, L, Loewenstein, G, & Prelec, D (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23: 524-532. DOI: 10.1177/0956797611430953).

      (part 2/3 cont')

    4. Response to Bas (part 3/3)

      Therefore, other than making the life of scientists even harder by bloating the publication process, I’d rather recommend a) the submission of the complete research data and b) serious space in journals for replications or ‘failures to replicate’, and credits for scientists who publish those. That might greatly reduce the temptation to publish dodgy results in the first place, which is the whole purpose of this debate.

      I agree that mandatory submission of research data, across the board, would be excellent, but it isn’t sufficient on its own. Poor research practices, such as undisclosed analytic flexibility, can easily lead to ‘successful’ replication of false positives.

      And leave the rest to the ‘market’, in the long run frauds are discovered, especially when a field finally gets applied in the real world (assumed they ever will be). The character flaws leading to the frauds are all too human and will never be fully rooted out, unfortunately.

      First, let me reiterate that this proposal isn’t about tackling fraud. No system, and especially not the one we currently have, can prevent deliberate fraud. This is about cleaning up the broad grey area of dubious but legal practices by eliminating the incentive for authors to engage in them in the first place. This proposal frees good scientists from the pressure to achieve ‘good results’, and it would make it difficult for less scrupulous scientists to publish in the Registered Reports format. We would essentially be creating a clean part of town rather than tidying up the whole city.

      To me it seems that this concern is currently very alive among psychologists and more generally (neuroimaging) social scientists. Stapel and Smeesters as recent examples belong to these domains (except the imaging part perhaps), which might explain it. I do by no means want to imply that frauds do not exist in other fields, but there simply seems much less concern in older more established disciplines. This might be due to other working habits and much clearer areas of application requiring a higher degree of scrutiny automatically. Think of material science, pharmacology, physics, etc.

      There is no evidence, that I’m aware of, that psychology is more prone to fraud than other fields, like anesthesiology (!) for instance. But even so, as I say, this proposal isn’t intended to provide a cure for academic fraud. It’s about instituting the changes in work practices that we need to produce better, more replicable psychology and cognitive neuroscience. You mentioned physics, which I think is a great example of a field that is years, perhaps decades, ahead of us in terms of publishing practices. To get anywhere near to this we need to make serious cultural changes.

      Thanks very much for your comment! If you happen to stop by the blog again, I’d be interested in your follow-up thoughts.

    5. I agree. I think it is a good and important point that replications are by no means immune to bad practices. With all the current tricks available, and at a low statistical threshold, it should be quite easy to replicate a false positive if you really want to.

      It is more important to improve replicability in the first place. It is unrealistic to hope that the current system can be self-correcting, even if we did more to champion replications.

      Firstly, there are simply too many questionable results out there to replicate. Where do we start? And when do we decide that a result has in fact been sufficiently replicated? We could reconsider the process as accumulation of evidence, rather than categorical replication, by meta-analyses that consider effect sizes and confidence intervals of each new study to update the current evidence for a given hypothesis (of course, taking into account effective peeking between studies). Then we are effectively imposing a stricter statistical threshold (but depending on how the variance is treated, it could have the advantage of generalising to the population of studies).

      Secondly, and perhaps more importantly, accumulation (or replication) is only as good as the evidence available. While all the current factors still exist to bias published results, the body of evidence is similarly biased. If we demand that every result is replicated, we would find a lot of replicated false positives (assuming the same biases exist). Of course, replication could be used as a vehicle of independent verification (i.e., making sure the effect was not due to bad practices), but this seems a clumsy method of policing science (esp. as it might be hard to identify conflicts of interest between groups - you replicate mine if I replicate yours?!). Why not just tackle these issues at source?

      Pre-registration of all study details, including hypotheses and proposed statistical methods will certainly help reduce the fundamental problem. This proposed model is most clearly designed for data collection for hypothesis testing. Increasingly, I believe it is important (financially and ethically) to make more out of the masses of data already collected (esp. neuroimaging and neurophysiology), but there is no reason why a similar model can’t be adopted. If there were a repository of data, researchers could request access to full data sets based on the proposed analyses and hypothesis tests. Pilot sample data sets could be supplied without registration to validate the proposed analyses for proof-of-principle.
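
      To make that cumulative-evidence idea concrete, here is a minimal sketch in Python. It uses simple fixed-effect (inverse-variance) pooling with invented effect sizes; a real implementation would likely need a random-effects model and a correction for the between-study 'peeking' mentioned above.

```python
# A minimal sketch of cumulative evidence via fixed-effect (inverse-variance)
# meta-analysis. The effect sizes and standard errors are invented for illustration.
import numpy as np
from scipy import stats

effects = np.array([0.45, 0.20, 0.05, 0.30])   # hypothetical standardised effects (d)
ses     = np.array([0.20, 0.15, 0.18, 0.25])   # their standard errors

weights = 1.0 / ses ** 2
for k in range(1, len(effects) + 1):
    w, d = weights[:k], effects[:k]
    pooled = np.sum(w * d) / np.sum(w)          # inverse-variance weighted mean
    se = np.sqrt(1.0 / np.sum(w))
    lo, hi = pooled + np.array([-1, 1]) * stats.norm.ppf(0.975) * se
    print(f"After study {k}: d = {pooled:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```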

    6. Dear all,

      a very interesting debate. I still have second thoughts about the registration-data acquisition-publication system, as it is a bit of a thing you would imagine working in a scientific utopia (an ideal world, so to speak). And our world is far from ideal, as most will agree, a fact we simply have to work with in such a way that we get the most interesting science and innovation out of it. This system simply assumes scientists will opt for this alternative (which I highly doubt), that enough reviewers can be found to actually scrutinize the process multiple times and in much more depth than before (which I doubt even more), and that scientists stick to the system during the entire timeline.
      And second, I did understand (and read in the FAQ) that the proposed system would allow serendipitous findings in a section of the manuscript. But who is going to read a really interesting serendipitous finding when it is buried in a large report? And when it comes down to it, does it really matter whether this finding is framed as something the authors hypothesized a priori? Statistically there is of course the p-value inflation when testing multiple hypotheses, but even the proposed system has that risk as in the registration phase people will use pilot data and again not always disclose the number of tested hypotheses. Requiring the original data and/or asking other scientists to replicate it would solve part of that problem. I've heard that in genetics it is already getting hard to publish something when you didn't replicate it yourself (but correct me if I'm wrong, I'm not into genetics).
      So I'd rather plead for the conventional format where an entire study is submitted at once (with all data!), but with a drastically improved and more rigorous peer review system (paid per hour, for example). Alternatively, a post-publication rating and/or commenting system, combined with a citation score or the like, could be used. The nonsense would eventually get filtered out that way by the review, and/or by low citation scores or negative ratings/comments.

      But no doubt current flawed practices need to be tackled somehow. It would be interesting to try out the system Chris proposed and see whether it will get accepted in the 'marketplace'. Otherwise we will never know. I hope I am wrong with my skepticism and concerns, but I felt an urge to express them here.

    7. Hi Bas. Thanks for dropping back in; your comments are well made, and it's important that these concerns are voiced.

      This system simply assumes scientists will opt for this alternative (which I highly doubt), that enough reviewers can be found to actually scrutinize the process multiple times and in much more depth than before (which I doubt even more), and that scientists stick to the system during the entire timeline.

      This is possible, but it is unclear at this stage what the response would be. There are certainly a number of incentives for scientists to submit through this scheme: the most obvious being that experiments can be run purely objectively, knowing that a publication is guaranteed. We don't know wider opinion (which is why I'm blogging this), and I can only speak for myself and my colleagues who like the idea - but I would certainly submit to this format of article. Most of the experiments in my lab are hypothesis-driven and planned well in advance, so I would have nothing to lose and everything to gain by securing publication in advance of data acquisition. No more fear of null effects!

      And when it comes down to it, does it really matter whether this finding is framed as something the authors hypothesized a priori? Statistically there is of course the p-value inflation when testing multiple hypotheses, but even the proposed system has that risk as in the registration phase people will use pilot data and again not always disclose the number of tested hypotheses.

      First: yes, I think it does matter whether findings are expected or serendipitous, precisely because we're not making unequivocal discoveries like dinosaur skeletons or stars; we're dealing with noisy data sets in which data mining can produce apparent serendipity out of pure noise (a short simulation of this appears below). Such false positives are potentially a huge waste of resources, creating a zeitgeist and leading other researchers down a blind alley.

      Second: yes, pilot data can be used in the registration phase, and this of course could be the product of data mining. But it would very quickly be shown to be unreliable if the subsequent highly controlled (and registered) experiments failed to validate it in some way (for instance, by replicating the effects). The main experiments would thus serve as independent validation of whatever the authors wish to claim from the pilot data. This, in turn, would also incentivise careful and appropriate analysis of pilot data; after all, who wants to face a situation where their own results fail to replicate?
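
      To illustrate the data-mining point, here is a minimal simulation with invented study dimensions; it only shows that, with enough measures and no true effects, a few comparisons will cross p < .05 by chance.

```python
# A minimal simulation: with many measures and no true effects, some comparisons
# will come out 'significant' at p < .05 purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_measures = 20, 50                       # hypothetical study size

group_a = rng.normal(size=(n_subjects, n_measures))   # pure noise, no real effects
group_b = rng.normal(size=(n_subjects, n_measures))

p_values = stats.ttest_ind(group_a, group_b).pvalue   # one t-test per measure
print(f"'Significant' measures out of {n_measures}: {(p_values < 0.05).sum()}")
```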

      So I'd rather plead for the conventional format where an entire study is submitted at once (with all data!), but with a drastically improved and more rigorous peer review system (paid per hour, for example).

      I agree about needing a more rigorous peer review system, and I would argue that this is precisely what the proposed mechanism will provide. Also, Molly's point below is well made: that in the long run, this proposed mechanism could actually reduce rather than increase the burden on reviewers. And there are ways to incentivise participation as a reviewer in this process, e.g. by providing reviewers with the option of being identified on the final publication as contributing to the experimental design.

      Alternatively, a post-publication rating and/or commenting system, combined with a citation score or the like, could be used. The nonsense would eventually get filtered out that way by the review, and/or by low citation scores or negative ratings/comments.

      Yes, I also think there is a lot of merit in the post-publication idea!

      As you say, there are many uncertainties in this proposed mechanism, in terms of uptake by authors and enthusiasm from readers. I guess we won't know for sure until we try something like this.

      Thanks again for your comments!

  6. I like this idea, but I do think it underemphasizes some important points regarding what makes a good paper. For instance, how well is it written? How well does it represent the previous literature? How interesting are the theoretical conclusions that are drawn based on the published results (and the current literature)? If a paper is accepted based on the proposed methods, then these other elements of a paper don't seem to matter as much anymore.

    I imagine it would be quite easy to pass the test of designing a study that makes sense. How one thinks about a set of data is often what makes a paper important or not.

    Jeff Bowers

    Replies
    1. Thanks for commenting, Jeff.

      For instance, how well is it written? How well does it represent the previous literature?

      Good points, and these could be added as review criteria. Implicit in this proposal is the requirement that manuscripts are clearly written.

      How interesting are the theoretical conclusions that are drawn based on the published results (and the current literature)?

      This is more problematic. I think accepting or rejecting manuscripts based on the interest value of the results is dangerous for science, and is precisely why we suffer from publication bias. Certainly, the interest value of the research question is important to consider (and is one of the review criteria at the registration stage).

      But the purpose of this manuscript format is to insulate science against arbitrary subjective judgements of which results reviewers/editors find "interesting".

      I imagine it would be quite easy to pass the test of designing a study that makes sense. How one thinks about a set of data is often what makes a paper important or not.

      I should clarify that manuscripts at the full review stage could still be rejected if the interpretation based on the data is unsound or superficial. Do you think an additional criterion should be added at this stage?

  7. Brilliant work- thanks for taking the time to flesh out these ideas.

    Several have raised concerns about this system creating an extra burden on reviewers- but I'm not sure I entirely agree. While it's true that reviews for this format may take a bit more time than traditional reviews, the value of this system is that a given manuscript will (ideally) only go through a single review process- so in terms of collective hours spent reviewing papers, your proposal may actually reduce the burden on the scientific community.

    Consider the process we have now. Papers often face a string of rejections before getting published (and often rejections are based on data, not methods- e.g., null findings). A given paper may go through the review process at 3 or 4 different journals before getting published- so anywhere from 6 to 12 (or more) reviewers may take the time to review the paper. This is extremely inefficient- both for reviewers, and for authors, who must spend a substantial amount of time re-formatting the manuscript for different journals. None of this is time well spent. In contrast, the extra time involved for authors and reviewers in your proposed system *is* time well spent- the steps you outline guard against all sorts of problems that are rife in the scientific literature.

    Finally, a question: presumably there would be leeway for revise-and-resubmit at the initial stage? Perhaps the most valuable contribution of this system would be its ability to prevent the wasting of resources on poorly designed studies. Too often I'm sent to review papers with fatal design flaws that render the data uninterpretable. Your proposed system would allow reviewers to point out such flaws *before* resources have been invested in data collection. Reviewers could suggest ways to improve the design, and papers could receive IPA if they incorporate these suggestions. It might also be nice to give reviewers the option to be named when the paper is published, so they can be recognized for their contributions to the experimental design.

    Replies
    1. Thanks Molly - that is a very good point that the reviewer load should actually be less under this system; and the more scientists who choose to adopt this system over the standard model, the lower the load will get.

      Yes - this system would very much encourage revise-and-resubmit at the registration stage, for all the reasons you outline. Giving the reviewer the opportunity to be named when the paper is published is also a great idea and an additional positive incentive for reviewers to participate keenly.

  8. Regarding the comparison with physics and the question of why physics seems immune to these problems and therefore doesn't need this kind of reform - which has been raised in this thread and elsewhere - two points:

    1. The nature of physics (and astronomy) means most important work is de facto 'registered' because so many people are involved in planning, building the equipment, etc.

    Indeed it's usually public knowledge not only what the experiments will be, but what any given result will mean: all that remains when the experiment goes live is to actually gather the numbers.

    That's a great system!

    Physics has it by default, because of the nature of most physical experiments. Biology is not so lucky, but through publishing reform we might be able to create it.

  9. [1/2]

    Interesting post, Chris, and a nice debate! I think you identify some of the main problems with the current publishing system. But I must say I remain unconvinced on many points. Most of this has already been discussed, such as increased bureaucracy, reduced flexibility, greater reviewer workload, etc. Some of the answers you give are fairly compelling but I'm unsure that there would be sufficient incentive for people to submit to this format in the present environment.

    Molly above makes a great point that reviewer workload would be reduced under this system because there would only be one paper submission. However, crucially this is only true as long as the entire system is changed to the pre-registered approach. You are modestly proposing introducing this as an option in one journal to begin with. While this is much more realistic than a major overhaul of the current system it also means that this is not realistically improving matters. If the pre-registered paper gets rejected at the post-results stage, what is going to stop people from submitting it elsewhere (perhaps with "polished" results)? You say that there will be a record for the retracted registration but how many editors/reviewers at a different journal will know about that? Even if most people were aware of this new format at Cortex, would you really expect them to check for every single submitted manuscript if it had been submitted there previously?

    There has also been much talk about serendipitous findings or revising your methods. I agree that there are researcher degrees of freedom that your approach would help to control, but on the other hand I fear there would be a great loss of flexibility in carrying out any research. Let's say I've been developing some new analytical procedure over the past years. It is not unreasonable that we might have started collecting some data trying to address actual experimental questions using these procedures, even though they are still under development (and for many procedures development is a continuous process). Obviously, there is something wrong with trying lots of different parameters and only reporting the single one that produces an interesting result. But surely there must be some wiggle room to allow for improvement. You emphasise the importance of well-planned experiments but conducting scientific research is often a learning process. It is not unusual that you discover something or have some form of insight about your procedure that truly enhances the method. The process you describe would stifle this sort of creativity and development.

    Replies
    1. Hi Sam, thanks for these valuable comments. Like Bas above, you make excellent points. I’ve replied below to each criticism.

      Some of the answers you give are fairly compelling but I'm unsure that there would be sufficient incentive for people to submit to this format in the present environment.

      This may well be true (it’s impossible to know for sure) but I can’t get away from the truism that to institute changes in the environment we have to change the environment. So the only path I can see to creating the necessary incentive structure is to trial a registration model like this and see what happens.

      If the pre-registered paper gets rejected at the post-results stage, what is going to stop people from submitting it elsewhere (perhaps with "polished" results)?

      Nothing. But I could just as easily level that criticism at the current publishing system, except that the proposed mechanism has a tremendous advantage: the rate of rejections at the post-results stage will be minimal compared to standard journals because the manuscript can’t be rejected based on the results.

      Rejections at Stage 2 could only realistically arise if the authors insisted on an interpretation of the data that the reviewers felt was unsupported by the evidence (and my preference would be to keep Introductions and Discussions light under this model), or if they changed their experimental procedure, or if they changed their analysis without justification and interim consultation (see FAQ #10 above).

      You say that there will be a record for the retracted registration but how many editors/reviewers at a different journal will know about that? Even if most people were aware of this new format at Cortex, would you really expect them to check for every single submitted manuscript if it had been submitted there previously?

      For anyone who is interested to know, a simple PubMed search would do the trick, because registered retractions would be published. Over time, as this model is picked up by other journals, you could envisage journal cover letters requiring a standard statement of whether the article was previously retracted from any registration format, and why.

      But in the meantime, I don’t see this as a stumbling block because in the rare instances where manuscripts are retracted, this needn’t have a strong bearing on consideration by another journal. The registration retraction is simply there so that in the case of extraordinary (or questionable) results, reviewers and readers would be able to check whether it was withdrawn and why. Also, Neuroskeptic argues above that Cortex could publish the protocol anyway even if the article is retracted. There is merit in that idea.

      (cont')

    2. Let's say I've been developing some new analytical procedure over the past years. It is not unreasonable that we might have started collecting some data trying to address actual experimental questions using these procedures, even though they are still under development (and for many procedures development is a continuous process).… surely there must be some wiggle room to allow for improvement.

      Indeed, and there already is wiggle room in the proposed system for precisely this purpose. See FAQ #10 above (and 10b, in particular).

      It is not unusual that you discover something or have some form of insight about your procedure that truly enhances the method. The process you describe would stifle this sort of creativity and development.

      I agree 100% with your first point, and disagree 100% with the second! All this proposal requires is that researchers commit to a proposed experimental procedure before running the experiment (common sense, surely). And it encourages authors to consider their analyses in advance of running an experiment (also common sense, I would argue), while still allowing the flexibility to (a) report post-hoc analyses or serendipitous findings, and (b) develop the analysis approach in consultation with the reviewers throughout the course of the process (as noted above in FAQ 10B).

      The only difference between this and a standard article is that this evolution of ideas would be transparently documented, allaying all concerns about analytic cherry picking. That is, in such cases where authors want to change an analysis, the final version of the manuscript would note the original proposed analysis in the Methods and that the authors developed an improved and appropriately justified alternative in consultation with the reviewers.

      All that said, this format of article would be most suitable to studies in which the analysis approach is well established rather than a work-in-progress. It would be particularly well suited for replication studies, or novel hypothesis-driven approaches with clear a priori analytic pipelines.

  10. [2/2]

    Don't get me wrong, I can see the benefits of pre-registering experimental protocols. But I am wondering if the following wouldn't be more realistic and efficient: you create a new platform on which you can register your experimental design prior to carrying out the study. This is just like your pre-registration with an introduction and a methods section. But crucially there is no review and it is not bound to any journal. It is simply a registered protocol. Then, after you have collected and analysed the data, you submit a manuscript to a journal. During this process you can link to the pre-registered protocol to show what you proposed and reviewers can assess to what extent your protocol has changed from the original. This could easily become part of the general peer review process. While this does not allow for scrutinising the design before the experiment is done (definitely a nice point about your proposal), it does not increase reviewer workload to the same degree as your proposal. It is also independent of being a particular format in one particular journal; rather, it is an optional step authors could take within the conventional system. Providing a pre-registered protocol and showing only minor or inconsequential changes in your final submission would be a credit to the authors and thus give people an incentive to do it.

    Note that this idea still suffers from one problem that you also haven't addressed in your proposal. It would be fine to expect people to have time stamped data to prove that they really collected the data after submitting the proposal - but who is going to verify this? You would have to come up with a centralised, standardised system for keeping track of data time stamps. I am not sure how realistic that is.

    Replies
      But I am wondering if the following wouldn't be more realistic and efficient: you create a new platform on which you can register your experimental design prior to carrying out the study. This is just like your pre-registration with an introduction and a methods section. But crucially there is no review and it is not bound to any journal. It is simply a registered protocol. Then, after you have collected and analysed the data, you submit a manuscript to a journal. During this process you can link to the pre-registered protocol to show what you proposed and reviewers can assess to what extent your protocol has changed from the original.

      I think this would certainly be a step in the right direction (I think I linked to BMC Protocols somewhere above) but it is also very easily gamed. Let’s say I want to claim the kudos of having my protocol pre-registered while also preserving the post-hoc flexibility to show statistically significant effects that will get me into a high-impact journal. All I need do is anticipate the points in my design where flexibility is needed, then make my protocol sufficiently vague on those points. This would be very easy to finesse, especially if the protocol isn’t peer reviewed. Then I will have afforded myself the necessary wiggle room to behave in exactly the way I would have otherwise, but – disturbingly – I can now also claim that my method is pre-registered and reliable.

      It would be fine to expect people to have time stamped data to prove that they really collected the data after submitting the proposal - but who is going to verify this? You would have to come up with a centralised, standardised system for keeping track of data time stamps. I am not sure how realistic that is.

      I don’t think such a centralised system is necessary. Under the proposed model, along with the data files, a simple lab log is uploaded indicating when data collection started and finished, and all authors must certify to this fact. Ultimately, faking time stamps on data files or falsifying a lab log would be considered gross misconduct and a probable career-ending move not only for the fraudster but possibly also for his/her co-authors who certified that the data was collected after IPA.

      Existing evidence suggests that genuine fraud like this is very rare – what’s far more common is the grey area of dubious but legal practices that scientists engage in to produce the significant results they need to publish ‘good papers’ (see reference 9 above).

      Thanks for the great discussion. If you happen to drop back in, I'd be interested in any follow-up thoughts you might have.

    2. For me the key here is preregistration. This can be done for instance via the Open Science Framework website. Yes, people can still cheat with less formal methods for preregistration (changing time stamps etc.), but at least it will then be very clear that this is in fact research fraud. The main purpose of preregistration is to prevent hindsight bias and confirmation bias. Actually, these biases are so strong that I believe preregistration to be the only cure that is adequate.

    3. Thanks for the detailed answers, Chris. I'll be heading to SfN soon and don't want to get drawn into a long discussion before I get back, so I will only reply to the major points. But it might be useful for you to hear people's thoughts, so here goes:

      I admit a lot of what you say makes sense. I certainly like the fact that this approach would focus on "good ideas, well executed" rather than "good data". But I still think that the proposal would be too inflexible to cope with the organic way science works in the real world. At the very least my suggestion would be to allow minor modification by default. Since the original protocol is published, any changes will be visible. Stage 2 reviewers will explicitly assess whether the changes to the study are justifiable and, wherever applicable, can request that the results from the original protocol be reported. Either way, authors will be instructed to declare any differences from their original proposal in the final manuscript (and the original will always be publicly available via a link).

      I don't disagree with you (and EJ) that wiggle room should be minimised, but the kind of wiggle room this is intended to stop is the researcher degrees of freedom that can polish a turd (for lack of a better phrase) or allow exploratory studies to be reported as confirmatory. Now EJ will no doubt tell me that there are no confirmatory studies out there... ;). I am very much in favour of any changes that will make the distinction between exploration and hypothesis-driven work clearer. However, I disagree that every confirmatory experiment must be carried out via a railroad-track pipeline that allows no modification.

      EJ, you frequently point out how difficult it was to carry out the Bem replication using a fixed protocol. The reason for this is not that it's extremely difficult to do hypothesis-driven research but that in the real world this isn't how things work. When your aim is a perfect replication, I agree you have to follow the original protocol by the book. However, we're not robots; we can flexibly make decisions and we should use that trait. The trouble starts not when you deviate from your pre-planned procedure but when you adapt your hypotheses to fit the data (within the same study; it's perfectly correct to do so afterwards) or when you adapt your data to fit your hypothesis.

      To be continued..

    4. I can see how preregistration would help stop these things but I believe you need to retain flexibility. I agree there is merit in well-planned experiments, but even the best among us won't always make perfect plans, as that requires (ironically enough) precognition. Honouring well-planned experiments also biases the system towards established procedures (as you say, the format would best fit studies with a clear prediction and an established pipeline). I am sure there are many of those around, but the most exciting science is that which pushes the envelope, applies new approaches, etc. Don't get me wrong, incremental work is essential for scientific progress (all the open questions from sensationalist findings have to be addressed, after all). But incremental work is also by definition safer - obviously you can still cheat there to support/refute your theory of choice, but my guess is it's a lot less problematic. Pro and con evidence will accumulate and the record self-corrects eventually. More crucial, to my mind, is that the kinds of studies people always talk about in discussions like these are not your majority of incremental experiments but those "novel, ground-breaking" ones in high-impact papers, or sensational ones like finding evidence for precognition.

      Anyway, I have to go and so won't get back into the question of editorial and reviewer workload at this point. I think these things really are best addressed empirically by a field test. However, what I did want to mention is this: I understand why you want to start modestly, requesting this change in only one journal for a single format. None of our predictions are worth much until we try the idea. But I think you should perhaps aim higher to increase the chances of success. Rather than introducing a single format, wouldn't it be better to have pre-registration as an option for most formats in the journal? I can see that having a special format for this would be more contained, but I think people would be more likely to take up this chance if it were a general publishing option offered by the journal.

      Finally, thanks again for the lively discussion here. I do appreciate and admire the initiative you are clearly taking with this. I'm curious to hear what the editorial board and publisher will have to say in response to your letter!

    5. Rather than introducing a single format, wouldn't it be better to have pre-registration as an option for most formats in the journal?

      I agree, and I considered pitching this, but felt it would be seen as too much too soon (as far as I know, no journals have adopted a model like this before).

      Enjoy SfN!

  11. I do agree with you, Chris, that it is important to eliminate wiggle room.

  12. Hi - thanks for setting out this idea so clearly. This morning I pitched it to the editorial board of the journal I'm an editor for, and it looks like there might be a real chance of this getting underway. I'll let you know how things go!

    Replies
    1. That's exciting news, good luck with it! Please do post any updates here on the blog and I'll tweet them.

  13. (Disclaimer: I did not read the whole post)
    This proposal correctly identifies some of the shortcomings of the current cycle of acceptance. However, I do not think it materially solves them. Specifically, the problem of accept/reject being based on the style and compelling nature of the write-up has been pushed from the final write-up to the proposal. In the proposed cycle, the most 'charismatic' authors would still win, but this time because they are able to convince the review board at the proposal stage. This might even amplify the problem since now these authors don't have to rely on completed data collection, but are free to spin their tales any way they want. This is similar to the problem of selecting a politician based on promises instead of past performance (though admittedly, being held to those promises is where the analogy ends). It would seem like the pre-approval phase already happens in the form of grant proposals, though again, grants don't follow up to see that the proposal has been strictly adhered to.
    There is merit and peril in requiring strict adherence to a pre-set plan. I doubt many published authors would say that their results and conclusions came from a course that was known at the outset.
    The biggest win I can see in this sort of cycle you propose is that the community would see more negative results getting published. Too often, a good scientific method is followed and then the results don't get published because they are 'negative', and the community is doomed to repeat history for lack of knowing it. In that respect, I support a plan to pre-accept publications regardless of how 'compelling' the results are, mainly because I find 'negative' results much more compelling than the average reviewer.

    Replies
    1. Hi, thanks for commenting.

      This might even amplify the problem since now these authors don't have to rely on completed data collection, but are free to spin their tales any way they want.

      I disagree. Authors would have to adhere strictly to the experimental procedures, and the manuscript could still be rejected at the full review stage if the interpretation was deemed unsupported by the evidence - see FAQ #9 above. So, under the proposed model, authors would have much less carte blanche to introduce spin than under the traditional publishing model.

  14. Hey Chris - I wanted to let you know I have shared this article with FORCE11.org, and want to invite you to join this group - a community that shares the very same interests.

  15. Chris - I cannot edit my previous comment, but you can find the post here: http://force11.org/node/4222

  16. “Moreover, authors are welcome to propose superior alternatives to conventional null hypothesis testing, such as Bayesian approaches [20].”

    Kruschke is too kind. As has been pointed out again and again over the last several decades, NHST/orthodox statistics is an ill-conceived, “fundamentally irrational” and gratuitously arcane 'system' of pseudo-inference. It's unfit for purpose and it's appalling that it's still being allowed to damage science and mislead/confuse scientists and others. The suggestion to merely welcome authors who propose to do inference, and consequently science, properly is insufficient (and deeply ironic).
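
    For readers wondering what one such alternative might look like in practice, here is a minimal, illustrative sketch (not a method prescribed by the proposal): approximating a Bayes factor for a two-group comparison from the BIC difference between a null and an alternative model (after Wagenmakers, 2007), using simulated data.

```python
# A minimal sketch: approximate Bayes factor for a group difference via the BIC
# difference between a null (intercept-only) and an alternative (group) model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(0.5, 1.0, 30)])  # simulated scores
group = np.repeat([0.0, 1.0], 30)

null_fit = sm.OLS(y, np.ones_like(y)).fit()          # intercept-only model
alt_fit = sm.OLS(y, sm.add_constant(group)).fit()    # intercept + group effect

bf10 = np.exp((null_fit.bic - alt_fit.bic) / 2.0)    # BIC approximation to BF10
print(f"Approximate BF10 (evidence for a group difference) = {bf10:.2f}")
```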

  17. Hi Chris,

    Somehow I've managed to miss this until now - apologies. It seems like a really long-overdue, radical and extremely important initiative, and I'm totally supportive of the main thrust of it - fantastic work.

    I would like to take issue with one aspect though (and essentially echo Bas's comments above) - the emphasis on prospective power analyses. I understand why you've built it in to this proposal, but what you say above... "it is straightforward to estimate the likely effect size of the BOLD response" ...is just not true. Your ability to detect an effect in a BOLD fMRI experiment is dependent on many factors that aren't taken into account in a standard power analysis. I'm thinking of the statistical efficiency of your design, the scanner parameters used (TR, TE, flip-angle, number of slices, acquisition matrix, phase-encoding direction, etc.), and the hardware (magnet strength, 12 vs 24 vs 32 channel head-coils, even how good the scanner shim is on specific occasions). Inter-subject variability can be (more or less) estimated from previous experiments; however, since the timing and amplitude of the HRF appears to vary (systematically?) across brain regions even within the same subject, it doesn't make much sense to estimate a single value for variance, or indeed power. And this is to say nothing of all the different variables that could be introduced during analysis (pre-processing, inclusion of noise-modelling regressors in the model, etc.). I'm firmly of the opinion that formal a priori power analyses are not helpful in neuroimaging (post-hoc power analyses are of course trivial, and also unhelpful). It's essentially plucking numbers out of thin air, and the only purpose it serves is to reassure ethics committees and grant reviewers that the issue has been considered.

    An alternative, empirical approach is Murphy and Garavan (2004):
    http://www.sciencedirect.com/science/article/pii/S1053811904000977 but even this is only suggestive since it was based on one particular (Go/No-Go) cognitive task.

    Really, the only workable approach is essentially to do what most researchers do - roughly estimate the participant numbers required from the previous literature - which, as you mention above, is a kind of power analysis. However, we should do this with our eyes open and with full awareness that it's a pretty rough estimate. My problem with formal power analyses is that they misleadingly suggest that this process can be an exact one.
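
    To make the 'rough estimate' point concrete, here is a minimal sketch of a conventional prospective power calculation across a range of assumed effect sizes; the values are illustrative, and none of the fMRI-specific complications above are captured.

```python
# A minimal sketch of a prospective power analysis over a range of assumed effect
# sizes for a two-group design; the d values are purely illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.3, 0.5, 0.8):
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.9)
    print(f"Assumed d = {d}: ~{n:.0f} participants per group for 90% power")
```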

    Replies
    1. Thanks Matt, that's an excellent point and we'll need to think carefully about how best to ensure that fMRI studies are sufficiently well powered. I do think, though, that relying solely on rules of thumb and what's already been published would be risky (due to publication bias and consequent over-estimation of true effect sizes). Luckily I'm in the same department as Kevin Murphy (from the ref you cite) so I'll go have a chat with him!

    2. Ah... Yes, of course you are... Weirdly enough, I actually met Kevin for the first time last week myself! Say hi for me!

  18. I have a quibble with the power analysis. Power analysis reinforces the idea that the study is about rejecting the null hypothesis. We should really care about the magnitude of effects. It's often even easier, and always more meaningful, to propose a target width for your effect-size confidence intervals. Then you're not making a statement about passing a null-hypothesis threshold but about what you believe to be a meaningful effect size. Furthermore, this approach leaves one open to discovering a different variance than predicted and modifying the N mid-study: you're doing it to obtain a specific predetermined sensitivity, not to make an effect 'significant'.
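
    A minimal sketch of what such 'planning for precision' could look like, assuming two equal groups and the usual large-sample approximation to the standard error of a standardised mean difference; the target width is an arbitrary illustrative choice.

```python
# A minimal sketch of planning for precision: choose N so the expected 95% CI
# half-width for a standardised mean difference d falls below a target, rather
# than to reach significance. Uses SE(d) ~ sqrt(2/n) for two equal groups and small d.
import numpy as np

target_halfwidth = 0.2            # desired precision for d, in SD units (illustrative)
z = 1.96                          # 95% confidence level
n = 2
while z * np.sqrt(2.0 / n) > target_halfwidth:
    n += 1
print(f"~{n} participants per group for a 95% CI of about ±{target_halfwidth} around d")
```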

  19. What (other than a lack of funding) could prevent a researcher from doing lots of secret probe tests first, finding out what works and what doesn't, and doing as much p-fishing as necessary; then, armed with this knowledge, writing his registration submission describing the best series he did as if he's just considering it, getting his IPA... waiting some time and submitting his best-fished preliminary tests that are guaranteed to fit the registration?

    Of course this scenario is a bit far-fetched, and overall the IPA system will be a great improvement. I just don't think it's going to be a silver bullet against all kinds of bias.

    Replies
    1. Thanks for this comment - it's a scenario I get asked about a lot. This is addressed in FAQ 3 above, and also in the guidelines. There is no defence against the most serious fraud in this (or any) publishing mechanism. However, there are various mechanisms in place that would deter such actions. In brief:
      1) raw data, time-stamped, must be uploaded with the full submission
      2) authors and co-authors must certify that any data in the main experiments was collected after, and not before, IPA
      3) authors who complete all their experiments in advance are taking a substantial risk because it is likely that reviewers will require changes to the experimental protocol. This would of course make any data collected in advance via a different non-IPA-ed methodology quite useless

      The other, perhaps more likely, possibility is that researchers could conduct a series of pilot experiments and then (even implicitly) cherry-pick the ones that supported their hypothesis, even if by chance. They could then submit a Stage 1 manuscript including the cherry-picked pilot data (which is allowed at the first submission) as proof of concept and propose to repeat the experiment in the main study, e.g. as a replication. However, this is also a risky strategy because if the pilot study is a false positive then it is unlikely that the main experiment would replicate it. The authors would then face the unenviable choice between retracting their Stage 2 paper (remembering that authors agree for any retractions to be published) or publishing a failure to replicate their own result internally.

      Naturally there is no perfect defence against fraud, as a single author, working alone, could lie at every stage of this process. However there is even less to deter such frauds in the existing publishing systems. Overall I think the RR initiative is well protected against the grey area of misconduct, and the various checks should hopefully dissuade more serious fraudsters from participating in it.

  20. Thanks for your blogpost! I am really excited to see what comes out of Cerebral Cortex's initiative. While contemplating submitting one of my own manuscripts for pre-registration, I realized that either I lack understanding of measures of statistical power, or there is a fundamental problem and unfair practice for a particular research technique: psychophysics.

    The typical psychophysical study includes only a handful (+/- 2) participants. Since estimates of statistical power only look at sample size and effect size, the average psychophysical study would be considered severely underpowered. If I understand correctly, this type of study would not meet the requirements of Cerebral Cortex.

    However, in psychophysics we usually torture each participant in thousands of trials, way more than in the average neuroimaging study. Intuitively, I feel that the "within-subject sample size" must be a major determinant of a study's trustworthiness. However, power measures seem to account only for "across-subjects sample size".

    Are there any specific plans at Cerebral Cortex to let studies compensate small sample sizes with large trial numbers? Are there even statistical procedures that account for the number of repeated measurements in the calculation of statistical power?

    Replies
    1. Ooops - my comment should of course have referred to the journal "Cortex", not "Cerebral Cortex". My apologies!

    2. Hi Niko, thanks for commenting. That's a good point. It seems to me that any power analysis needs to be tailored for the sample from which you are seeking to draw inferences. In a typical study with multiple subjects, we seek to generalise from a sample of subjects to the population. In a psychophysical experiment with single-subject analysis, we're instead seeking to generalise from a sample of the single subject's responses to that subject's population. So any statistical tests (and power analyses) would apply at the individual subject level, rather than at the group level.

      I don't know of any formal power analysis procedures for N=1 designs, but the principles should be the same. Provided the independence assumption is satisfied, I can't see anything to stop investigators conducting simulations of data sets within subjects, varying the number of trials (analogous to varying the number of subjects in a group study) and then determining the sensitivity of the within-subject analyses to detect effects of varying magnitude.
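
      To make that simulation idea concrete, here is a minimal sketch for a two-condition, single-subject design; the effect size, trial-level noise, and trial counts are all invented, and trials are assumed to be independent.

```python
# A minimal sketch of simulation-based power for a single subject: estimate the
# probability of detecting a two-condition difference as a function of trial count.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
effect, sd, n_sims = 0.1, 1.0, 2000          # hypothetical per-trial effect and noise

for n_trials in (100, 400, 1600):
    cond_a = rng.normal(effect, sd, size=(n_sims, n_trials))
    cond_b = rng.normal(0.0, sd, size=(n_sims, n_trials))
    p = stats.ttest_ind(cond_a, cond_b, axis=1).pvalue
    print(f"{n_trials} trials per condition: estimated power = {(p < 0.05).mean():.2f}")
```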

    3. Dear Niko
      just treat every single trial (instead of single subjects) as the analysis unit.
      Quite obviously, the assumption in psychophysics is that all normal subjects have virtually identical perceptual systems, and will show virtually identical effects - otherwise it would be weird to test only 2 subjects!
      If that assumption holds, strictly speaking, you can legitimately test even a single subject. Now, each trial will be a unit, and you will have very powerful tests (on the strength of thousands of trials).
      Best
      Alessio Toraldo

  21. Dear Chris,

    First of all, I would like to thank you for your contribution in trying to improve our scientific ways. I have, however, two concerns that I haven't read elsewhere on this page.

    Firstly, in FAQ 3 it states: "raw data must be made freely available at the full review stage and time stamped for inspection, along with a laboratory log indicating that data collection took place between dates X and Y".
    Although it doesn't state to whom the data is freely available, most medical centers across the world are reluctant to make patient information/data freely available. Many privacy problems could arise from this demand (e.g. patients could be identified on the basis of MR images), which might pose huge problems for local medical ethics committees. Furthermore, time stamps and laboratory logs will require a lot of effort and resources from scientists and/or medical centers in order to become a trustworthy instrument for the verification and validation of data acquisition.

    Secondly, while the proposal rightfully shifts the emphasis of good science from "positive results" to a "well designed study", it greatly increases the importance of the hypothesized effect size. Is it not realistic to expect an exaggeration of the hypothesized effect size? This would inflate the study's estimated power and, perhaps, the likelihood of publication. It won't be a problem when a study doesn't (fully) live up to those expectations, for null results will be published as well. I fear that the effect size estimation might result in attempts to falsely hype one's study without consequences.

    Replies
    1. Hello Wouter,
      Thanks for these very astute comments.

      To address them in turn:

      1. On the anonymity of MRI scans. I agree that this could be a concern for some studies (e.g. patient work, sensitive genetic research etc.) In these cases, structural scans could be uploaded in a skull-stripped form, which would greatly reduce (or even negate) the chances of any individual being identified.

      2. On the effort and resources required for time stamps and lab logs. I don't see why this should be an issue. Whenever a raw data file is created, a time stamp is created with it. The researcher need never physically create a time stamp. A laboratory log is slightly more work, but Cortex will not require a detailed schedule. How hard is it for researchers to note the date when data is collected? (Information which, in any case, can be extracted automatically from the file creation times; see the sketch after point 3 below.)

      3. On exaggeration of predicted effect sizes. This is a genuine concern, and it is omnipresent whenever prospective power analysis is undertaken. The solution lies in careful peer review and editorial oversight: scrutinizing power estimations at the pre-registration stage and ensuring that they are well justified.
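
      Picking up point 2, here is a minimal sketch of how the data-collection period might be pulled automatically from file times; the directory and file pattern are hypothetical, and, as the next comment notes, file times are not always reliable.

```python
# A minimal sketch: summarise the data-collection period from raw-data file times.
# The path is hypothetical, and file times can be unreliable (see the comment below).
import datetime
import glob
import os

files = sorted(glob.glob("raw_data/*.dat"))
times = [datetime.datetime.fromtimestamp(os.path.getmtime(f)) for f in files]
if times:
    print(f"Data collection ran from {min(times):%Y-%m-%d} to {max(times):%Y-%m-%d}")
else:
    print("No raw data files found")
```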

    2. I'm not sure if this comment thread is still being followed, but I wanted to chime in with a quick comment on point 2. My laboratory collects time sensitive data (ERPs and eyetracking) and does so using computers running DOS. A nice feature of DOS is that we can turn off all extraneous processes during running to ensure the timing; this includes the system clock. The machines are not networked (by design and by necessity), so the clock is never updated. Therefore the date/time stamp on my raw data files is probably (I haven't actually checked recently) several years off by now. We transfer the data by CD, and of course a time stamp is generated then too, but it may not be the day the file was created. The point is just that the time stamp is perhaps not as straightforward as you state here, even without any question of fraud or bad intent. Files get transferred, opened, etc. and the time stamp changes. Might be hard for at least some labs to put that forward as an accurate representation of when the data was collected. A log is, of course, easy, but there are -- often good -- reasons that the data files might not match the log.

  22. "I would therefore like to propose a new form of empirical article at Cortex, called Registered Reports"

    I don't understand something and have a few questions: why is this called "Registered Reports"?

    Is there anything that is "registered" and if so, how is that done, what does this "registration" imply, where is it "registered", and is this accessible to the reader?

    The only form of "registration" I know of is pre-registration, which involves a time-stamped, frozen description of the research and analysis plan that can be included in the paper so readers can check this information. But it seems to me that this is not the kind of "registration" that is involved in Registered Reports.

    Replies
    1. Hi - this post is now quite old so I recommend having a look at https://cos.io/rr/ -- it contains the answers to your questions.

    2. Hi again - I was just alerted by Andrew Gelman to your comment over on his blog and I realised you are the same Anonymous who commented on my RR Qual post about public preregistration, and so the context of what you are asking here became clear (I misinterpreted your Q as a more basic one about how RRs work). I will add a comment later today over on Andrew's blog to address this issue - it is an important Q and wasn't ignored, I just didn't have time to reply at the time you wrote on the Qual blog.

    3. Hi again. Thank you so much for your reply over there. I responded over there with one final and possibly crucial issue.

      You don't have to post this comment; this is just to let you know I replied over there.

      I am putting my trust in you guys. Please don't fuck up "Registered Reports".

    4. We are indeed trying not to! Thanks for contributing.

    5. I couldn't figure out how to reply to your comment on Andrew's blog given all the threading, but just to say here:

      1) yes, a public declaration that no data or analysis was collected/conducted prior to the date of IPA is a good idea. Authors have to state this in their Stage 2 cover letter anyway, but stating it publicly as well can only be a good thing.

      2) You say "Anyway, as an outsider I would summarize “Registered Reports” as it exists now, as a format which has effectively managed to completely get rid of 2 crucial aspects of “pre-registration”: (possible) accountability and transparency." I don't agree, but I do think the "outsider" perspective can be very helpful precisely because those working very closely with an initiative can sometimes develop blindspots or have different priorities. It's a useful check.

      I appreciate your input, which reinforced issues we were already thinking about, and I am sorry again that I didn't respond sooner.

      PS. re your pre-registered prediction about getting noticed, I shall hold you to the same standard that you are holding RRs and trust only your *publicly* preregistered protocol :-)
