Publication Date

2024

Document Type

Dissertation/Thesis

First Advisor

Freedman, Reva

Degree Name

M.S. (Master of Science)

Legacy Department

Department of Computer Science

Abstract

Research, publish, repeat. This is the basic cycle of anyone in academia. Individuals in academia conduct research, write their research up as a paper or journal article, submit it to a conference or journal, and repeat the process. If you are skilled, you may even win a conference's coveted best paper award. In this research, I compare full papers to short papers and full papers to best papers. I start by fine-tuning three transformer models for classification. After calculating lexical diversity and readability metrics for the papers, I use those features to train three traditional machine learning models. Lastly, I perform hypothesis tests to see whether the averages of the lexical diversity metrics differ and whether the averages of the readability metrics differ. I find that for the full versus short comparison, the transformer models can distinguish the papers, and the hypothesis tests indicate significant differences for most of the lexical and readability metrics. However, the traditional machine learning models were unable to distinguish short papers from full papers. For full papers versus best papers, neither the transformer models nor the traditional machine learning models could distinguish the two classes. However, the hypothesis tests do indicate a significant difference between full papers and best papers for most of the lexical and readability metrics.
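The metrics-then-test portion of the pipeline described above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the abstract does not name the specific metrics or statistical test used, so type-token ratio (a common lexical diversity measure) and Welch's t-statistic are assumed stand-ins, and the toy texts are invented.

```python
import math

def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic for comparing the means of two groups
    with possibly unequal variances."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Toy corpora standing in for full papers and short papers.
full = [
    "the model and the data and the model",
    "we train the model on the data we have",
    "results of the model match results of the baseline",
]
short = [
    "concise abstracts pack many distinct words",
    "brief papers often show varied vocabulary choices",
    "short texts rarely repeat their tokens",
]

full_ttr = [type_token_ratio(t) for t in full]
short_ttr = [type_token_ratio(t) for t in short]
t_stat = welch_t(full_ttr, short_ttr)
```

A per-paper metric vector like this (one value per metric, per paper) is also exactly the kind of feature table a traditional classifier could be trained on.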

Extent

52 pages

Language

en

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text
