Publication Date

2024

Document Type

Dissertation/Thesis

First Advisor

Alhoori, Hamed

Degree Name

Ph.D. (Doctor of Philosophy)

Legacy Department

Department of Computer Science

Abstract

Validating scientific claims is an elaborate process in which researchers invest considerable time and effort to establish whether a claim holds. It spans repeatability, reproducibility, and replicability, with each successive step demanding more time and effort. Given the evolving nature of computational experiments and the exponential pace at which scientific publications are disseminated, researchers find it challenging to devote the time, energy, and resources needed to assess the claims made in papers of interest. Preemptive signals of reproducible effort can help researchers filter academic works of interest and adjust their efforts accordingly. Through this research, I present new standards for measuring reproducible effort under the unified "Effort of Reproducibility" framework. The currently established standards for reproducibility do not adapt well to the various sub-domains of artificial intelligence, so it is imperative to identify potentially significant reproducibility signals across topics, journals, and conferences. In this body of research, I gathered qualitative metrics that encapsulate reproducible effort, built information extraction pipelines to support longitudinal data collection on reproducibility, and trained representational learning models that predict the distribution of factors contributing to the ease or difficulty of reproducing previously published studies. I found that the linguistic features of papers are informative: higher readability scores and greater lexical diversity correlate positively with reproducibility. The representational learning models I built predicted outcomes along a reproducibility spectrum with a high degree of accuracy, underscoring the potential of machine learning to augment preemptive reproducibility checks. Additionally, detailed documentation of experimental artifacts and clarity of experimental design were strongly associated with increased reproducibility. These insights form the basis of the proposed "Effort of Reproducibility" framework, which allows for a dynamic assessment of reproducible effort that tracks evolving standards and methodologies in the computational sciences. These contributions offer the scholarly community methods, metrics, and models for estimating reproducible effort, enhancing the reliability of scientific findings and fostering a more collaborative and transparent research environment.
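To make the linguistic signals named in the abstract concrete, the following Python snippet is a minimal sketch of how a readability score (Flesch Reading Ease) and a lexical-diversity measure (type-token ratio) could be computed for a paper's text. It is an illustration, not the dissertation's actual pipeline; the vowel-group syllable heuristic and the sample text are assumptions made for demonstration.

```python
# Minimal sketch (not the dissertation's pipeline): two linguistic features
# the abstract associates with reproducibility.
import re


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).

    Syllables are approximated by counting vowel groups, a crude heuristic.
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)


def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique word types divided by total word tokens."""
    tokens = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return len(set(tokens)) / max(1, len(tokens))


if __name__ == "__main__":
    sample = "We release our code and data. Each experiment is documented and seeded."
    print(f"readability: {flesch_reading_ease(sample):.1f}")
    print(f"lexical diversity: {type_token_ratio(sample):.2f}")
```

In a pipeline like the one the abstract describes, feature values of this kind would be one input, alongside signals such as artifact documentation and experimental-design clarity, to a model predicting position on a reproducibility spectrum.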

Extent

102 pages

Language

en

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text
