The research examines what drives decisions in data analysis and how academics go about testing a hypothesis. It does so by comparing the analyses of different researchers who tested the same hypotheses on the same dataset. Analysts reported radically different analyses and dispersed empirical outcomes, including, in some cases, significant effects in opposite directions. Decisions about how to operationalize variables explained the inconsistency in results beyond statistical choices (i.e., which analysis or covariates to use).
"Our findings illustrate the importance of analytical choices and how different statistical methods can lead to different conclusions," says Martin Schweinsberg. "An academic research question can sometimes be investigated in different ways, even if the answers are derived from the same dataset and by analysts without any incentives to find a particular result, and this research highlights this."
To conduct the research, Professor Schweinsberg recruited a crowd of analysts from all over the world to test two hypotheses regarding the effects of scientists' gender and professional status on active participation in group conversations. The analysts worked with data from the online academic forum Edge, covering scientific group discussions over nearly two decades (1996-2014). The dataset contained more than 3 million words from 728 contributors and 150 variables related to the conversation, its contributors, or the textual level of the transcript. Then, using the new platform DataExplained, developed by co-authors Michael Feldman, Nicola Staub, and Abraham Bernstein, the analysts examined the data in R to identify whether there was a link between a scientist's gender or professional status and their level of verbosity.
Analysts used different sample sizes, statistical approaches, and covariates, which led to several different results for the same hypotheses. The findings therefore varied from analyst to analyst, yet each was defensible. By using DataExplained, Professor Schweinsberg and colleagues were able to understand precisely how these analytical choices differed, despite the data and hypotheses being the same. A qualitative study of the R code used by analysts revealed a process model for the psychology behind data analyses.
Professor Schweinsberg says, "Our study illustrates the benefits of transparent and open science practices. Subjective analytical choices are unavoidable, and we should embrace them because a collection of diverse analytical backgrounds and approaches can reveal the true consistency of an empirical claim."
This research shows the critical role that subjective researcher decisions play in shaping reported empirical results. According to the researchers, these findings stress the importance of open, publicly available data, systematic robustness checks in academic research, and as much transparency as possible regarding both the analytic paths taken and those not taken, so that research is as accurate as possible. They also suggest humility when communicating research findings and caution in applying them to organizational decision-making.