March 8, 2006
Reproducibility Should Be Minimum Standard for Epidemiologic Research
Epidemiologic findings, especially those on which public policies are based, are strengthened when they can be replicated by others. However, full replication by independent investigators is not always possible, due to time constraints or a lack of funding. A commentary by researchers from the Johns Hopkins Bloomberg School of Public Health suggests that analytic research data should be made available so that reproducibility of epidemiologic studies can be the new minimum standard for which investigators strive. The commentary will be published in the May 1, 2006, print edition of the American Journal of Epidemiology and can currently be viewed on the journal’s webpage.
Roger D. Peng, PhD
“All epidemiologic studies should be held to the standard of full replication. But in cases where this is not possible, study investigators should make it possible for others to reproduce their findings,” said Roger D. Peng, PhD, lead author of the commentary and an advocate for making research reproducible by others.
Peng and colleagues said the first requirement in reproducibility is that the analytical data set must be made available for others to view and use. The availability of data sets enables other investigators to verify previously published findings, conduct alternative analyses of the same data, eliminate uninformed criticisms and expedite the exchange of information among scientists. In addition to providing the computer code, or instructions for data analysis, authors must also explain how the computer code is linked to the data and which code sections apply to which data.
The Hopkins researchers acknowledged that making data available to others gives the original investigator little control over how the data will be used. They suggest a system by which partial rights are licensed to interested investigators according to how the data will be used.
“Providing others with partial rights to the data benefits both the original investigator and those interested in the data. The recipients obtain access to the data and the donor meets data disclosure obligations and maintains some control over others’ use of the data,” said Peng, who is an assistant professor in the Bloomberg School’s Department of Biostatistics.
As an example of how epidemiologic studies can meet the reproducibility minimum, Peng and his coauthors applied the standard to a study on quantification of air pollution risk, called the National Morbidity, Mortality and Air Pollution Study (NMMAPS). They created the Internet Health and Air Pollution Surveillance System—at www.ihapss.jhsph.edu—to disseminate the entire database and software used for the study. Other scientists are now able to fully reproduce the study results, apply the study’s methodology to their own data or apply their methodology to the NMMAPS data.
A handful of scholarly journals, such as Science and Nature, already require authors to place biologic data in public databases, and the National Institutes of Health requires grantees to have a data-sharing policy. Biologists have already made great strides toward integrating research databases, sharing software and making their analyses reproducible.
“Reproducibility is feasible now. Journals can play an important role in ensuring that their published work is reproducible,” said Scott L. Zeger, PhD, professor and chair of the Bloomberg School’s Department of Biostatistics.
The compendium for the commentary by Peng and his colleagues, which links the full study with its data and code, can be found at www.biostat.jhsph.edu/~rpeng/reproducible/.
The study was supported by grants from the National Institute of Environmental Health Sciences (NIEHS), the NIEHS Center in Urban Environmental Health and the Health Effects Institute.
Public Affairs media contacts for the Johns Hopkins Bloomberg School of Public Health: Kenna Lowe or Tim Parsons at 410-955-6878 or email@example.com.