Publication Date
2024
Document Type
Dissertation/Thesis
First Advisor
Alhoori, Hamed
Degree Name
Ph.D. (Doctor of Philosophy)
Legacy Department
Department of Computer Science
Abstract
Validating scientific claims is an elaborate process that requires researchers to invest considerable time and effort to determine whether those claims hold. It spans repeatability, reproducibility, and replicability, where each subsequent step demands more effort and time. With the evolving nature of computational experiments and the exponential pace at which scientific publications are disseminated, researchers find it challenging to devote the time, energy, and resources needed to assess claims made in papers of interest. Preemptive signals of reproducible effort can help researchers filter academic works of interest and adjust their efforts based on need. Through this research, I present new standards for measuring reproducible effort under the unified "Effort of Reproducibility" framework. The currently established standards for reproducibility do not adapt well to the various sub-domains of artificial intelligence, so it is imperative to identify potentially significant reproducibility signals across topics, journals, and conferences. In this body of research, I gathered qualitative metrics that encapsulate reproducible effort, built information extraction pipelines to aid longitudinal data collection on reproducibility, and trained representational learning models that predict the distribution of factors contributing to the ease or difficulty of reproducing previously published studies. I found that linguistic features of papers are informative: higher readability scores and greater lexical diversity correlate positively with reproducibility. The representational learning models I built predicted outcomes on a reproducibility spectrum with a high degree of accuracy, underscoring the potential of machine learning to augment preemptive reproducibility checks. Additionally, detailed documentation of experimental artifacts and clarity of experimental design are strongly associated with increased reproducibility. These insights form the basis of the proposed "Effort of Reproducibility" framework, which allows for a dynamic assessment of reproducible effort based on evolving standards and methodologies in the computational sciences. These findings will assist the scholarly community in enhancing the reliability of scientific findings with methods, metrics, and models that estimate reproducible effort, and the results can foster a more collaborative and transparent research environment.
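To illustrate the kind of linguistic-feature analysis the abstract describes, the sketch below computes a readability score and a lexical-diversity measure for a paper's text and feeds them into a simple classifier. This is a minimal sketch under stated assumptions: the specific features (Flesch reading ease, type-token ratio), the syllable heuristic, and the logistic-regression model are illustrative choices, not the dissertation's actual pipeline, which uses representational learning models over richer inputs.

```python
# Hypothetical sketch: score a paper's text with two linguistic features
# (readability and lexical diversity) and train a simple classifier.
# Feature definitions and model choice are illustrative assumptions only.
import re

from sklearn.linear_model import LogisticRegression


def flesch_reading_ease(text: str) -> float:
    """Approximate Flesch reading ease from sentence, word, and syllable counts."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    # Crude syllable heuristic: count vowel groups per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)


def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique word types over total word tokens."""
    tokens = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    return len(set(tokens)) / max(1, len(tokens))


def train_reproducibility_model(corpus):
    """Fit a classifier on (text, label) pairs, where label 1 = reproduced.
    The labeled corpus is assumed to be available; it is not constructed here."""
    X = [[flesch_reading_ease(text), type_token_ratio(text)] for text, _ in corpus]
    y = [label for _, label in corpus]
    return LogisticRegression().fit(X, y)
```

The point of the sketch is only to show why readability and lexical diversity can serve as preemptive reproducibility signals: both are cheap to compute from the paper text alone, before any reproduction attempt is made.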
Recommended Citation
Akella, Akhil Pandey, "Effort of Reproducibility: Metrics, Methods, and Models to Navigate the Landscape of Reproducible Research" (2024). Graduate Research Theses & Dissertations. 7947.
https://huskiecommons.lib.niu.edu/allgraduate-thesesdissertations/7947
Extent
102 pages
Language
en
Publisher
Northern Illinois University
Rights Statement
In Copyright
Rights Statement 2
NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.
Media Type
Text
