Publication Date


Document Type


First Advisor

Alhoori, Hamed; Rogness, Daniel

Degree Name

B.A. (Bachelor of Arts)

Legacy Department

Department of Computer Science


Empirical research should always be backed by substantial, verifiable data so that anyone who wishes to reproduce a study, or replicate it with different data, can verify that its claims are accurate. We attempt to apply a novel method for discovering reproducible research papers. Future research can use this technique to build a better understanding of the reproducibility crisis. We collected scholarly data from three different sources and combined them to obtain a dataset of 657 papers. The dataset comprises papers that have been verified as reproducible and papers that have been shown to be irreproducible. After cleaning, the dataset contained 237 papers marked reproducible and 36 marked irreproducible. We then used three models (Gaussian Naive Bayes, Multinomial Naive Bayes, and AdaBoost) to classify the texts based on structural and linguistic characteristics of the papers. Finally, we used a Long Short-Term Memory (LSTM) recurrent neural network and compared its results with those of the three models.
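The cleaned class counts above are strongly imbalanced, which affects how any classifier accuracy should be read. The following is a minimal sketch, not taken from the thesis, that computes the accuracy of a trivial classifier that always predicts the majority class. Only the three counts come from the abstract. The function name and the assumption that the two labeled groups make up the entire cleaned dataset are illustrative.

```python
# Counts taken from the abstract; everything else is an illustrative assumption.
TOTAL_COLLECTED = 657   # papers gathered from the three sources
REPRODUCIBLE = 237      # papers marked reproducible after cleaning
IRREPRODUCIBLE = 36     # papers marked irreproducible after cleaning


def majority_baseline_accuracy(class_counts):
    """Accuracy of a classifier that always predicts the most common class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


counts = {"reproducible": REPRODUCIBLE, "irreproducible": IRREPRODUCIBLE}

# Assumes the two labeled groups account for the whole cleaned dataset.
cleaned_total = sum(counts.values())
removed = TOTAL_COLLECTED - cleaned_total
baseline = majority_baseline_accuracy(counts)

print(f"cleaned: {cleaned_total}, removed: {removed}, baseline: {baseline:.3f}")
```

Under these assumptions, a model would need to exceed roughly 87% accuracy to outperform always answering "reproducible", so class-sensitive measures such as per-class precision and recall may be more informative than accuracy alone.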


10 pages




Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type