Publication Date
2024
Document Type
Dissertation/Thesis
First Advisor
Koop, David
Second Advisor
Alhoori, Hamed
Degree Name
M.S. (Master of Science)
Legacy Department
Department of Computer Science
Abstract
In recent years, computational reproducibility, which refers to obtaining consistent results when an experiment is rerun, has become a major concern in various research communities. Jupyter Notebook, a web-based computational notebook application, offers useful features for running and publishing computational experiments in interactive environments. However, rerunning a notebook does not always reproduce its experimental results. This thesis aims to develop novel methods for evaluating reproducibility by comparing different types of outputs between original and rerun notebooks. It also explores using machine learning models to predict the reproducibility of notebooks automatically, without the need to rerun them. By building classifiers on various structural and stylistic features, as well as linguistic features such as the readability of the notebooks' markdown text, this research found that machine learning models can classify notebooks by their reproducibility and identify important factors that contribute to it.
Recommended Citation
Hossain, A S M Shahadat, "Evaluating Computational Reproducibility of Jupyter Notebooks Using Machine Learning and Natural Language Processing" (2024). Graduate Research Theses & Dissertations. 7962.
https://huskiecommons.lib.niu.edu/allgraduate-thesesdissertations/7962
Extent
68 pages
Language
en
Publisher
Northern Illinois University
Rights Statement
In Copyright
Rights Statement 2
NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.
Media Type
Text
