Publication Date

2024

Document Type

Dissertation/Thesis

First Advisor

Koop, David

Second Advisor

Alhoori, Hamed

Degree Name

M.S. (Master of Science)

Legacy Department

Department of Computer Science

Abstract

In recent years, computational reproducibility, which refers to obtaining consistent results when an experiment is rerun, has become a major concern in many research communities. Jupyter Notebook, a web-based computational notebook application, offers useful features for running and publishing computational experiments in interactive environments. However, rerunning a notebook does not always reproduce its experimental results. This thesis develops novel methods for evaluating reproducibility by comparing different types of outputs between original and rerun notebooks. It also explores using machine learning models to predict the reproducibility of notebooks automatically, without rerunning them. By building classifiers on various structural and stylistic features, as well as linguistic features such as the readability of a notebook's markdown text, this research found that machine learning models can classify notebooks by their reproducibility and identify important factors that contribute to it.

Extent

68 pages

Language

en

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text
