First Advisor

Eads, Michael

Degree Name

Ph.D. (Doctor of Philosophy)

Legacy Department

Department of Physics


Predicting students’ performance to identify which students are at risk of receiving a D/Fail/Withdraw (DFW) grade, and to ensure their timely graduation, is not just desirable but necessary for most educational institutions. In the US, not only are Science, Technology, Engineering, and Mathematics (STEM) majors becoming less popular among students, but the graduation rate of STEM students is also steadily declining. The shortage of STEM graduates in the US is a serious problem that will put the country at a disadvantage in international technological competition. For the US to secure its status as an international technological leader, its institutions must be more vigilant in predicting the grades of STEM students in order to improve retention in STEM fields. Early grade prediction allows a school to monitor students’ progress in a course and increases their chances of graduating. Predicting grades is especially beneficial for at-risk STEM students because it allows timely pedagogical interventions that better equip them to succeed in their courses and keep them from dropping out. Identifying at-risk students is a complicated problem because many factors must be considered. Traditional approaches to analyzing and using students’ data have had mixed results in identifying factors that predict performance early in the semester. Machine learning can identify patterns that other approaches miss, which makes it a promising method currently available for grade prediction. This study uses machine learning algorithms to identify the key factors that predict, early in the semester, which students are at risk. The results show that demographic variables such as gender, academic level, and age are of little value in predicting student success. 
The factors most strongly correlated with the course grade were the students’ cumulative college grade point average (GPA) and the grades they received on the first four homework assignments of the semester. These variables are used as inputs to three machine learning algorithms: logistic regression, decision trees, and random forests. With machine learning, grades can be predicted with 80% to 97% accuracy within the first two to four weeks of the semester, which allows the university to intervene early.
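The abstract describes feeding cumulative GPA and the first four homework grades into classifiers that flag DFW risk. The following is a minimal, self-contained sketch of that idea in plain Python. It is not the study's code: it ranks features by Pearson correlation with the course grade and fits a logistic regression by batch gradient descent. Everything in it is an assumption made for illustration, including the feature names, the 0-4 GPA and 0-100 homework scales, the DFW cutoff of 60, and the synthetic cohort produced by the hypothetical `make_toy_cohort` helper. It uses no data or results from the thesis, and the decision tree and random forest models are not shown.

```python
import math
import random

# Hypothetical feature layout: cumulative GPA (0-4 scale) followed by the
# first four homework scores (0-100 scale). Names are illustrative only.
FEATURES = ["cum_gpa", "hw1", "hw2", "hw3", "hw4"]
DFW_CUTOFF = 60.0  # assumed course-grade cutoff below which a student counts as D/F/W


def make_toy_cohort(n=200, seed=0):
    """Generate a synthetic cohort. This is NOT data from the study."""
    rng = random.Random(seed)
    rows, grades = [], []
    for _ in range(n):
        gpa = rng.uniform(1.5, 4.0)
        # Homework scores loosely track GPA, clipped to the 0-100 range.
        hws = [min(100.0, max(0.0, 25 * gpa + rng.gauss(0, 10))) for _ in range(4)]
        grades.append(10 * gpa + 0.15 * sum(hws) + rng.gauss(0, 5))
        rows.append([gpa] + hws)
    return rows, grades


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def standardize(rows):
    """Z-score each column; return scaled rows plus the means/stds used."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Plain batch gradient descent on mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# --- Usage on the toy cohort ---
rows, grades = make_toy_cohort()
labels = [1 if g < DFW_CUTOFF else 0 for g in grades]  # 1 = at risk of DFW

# Rank features by absolute correlation with the final course grade.
ranking = sorted(
    ((name, pearson([r[j] for r in rows], grades)) for j, name in enumerate(FEATURES)),
    key=lambda item: abs(item[1]),
    reverse=True,
)

X, means, stds = standardize(rows)
w, b = fit_logistic(X, labels)


def dfw_risk(raw_row):
    """Predicted DFW probability for one unscaled student record."""
    x = [(v - m) / s for v, m, s in zip(raw_row, means, stds)]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


accuracy = sum((dfw_risk(r) >= 0.5) == bool(l) for r, l in zip(rows, labels)) / len(rows)
print(ranking)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

The accuracy printed here is measured on the same toy records used for training, so it says nothing about real-world performance. A realistic evaluation would hold out students (for example with cross-validation) and would typically rely on a library such as scikit-learn, whose `LogisticRegression`, `DecisionTreeClassifier`, and `RandomForestClassifier` share a common `fit`/`predict_proba` interface.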


108 pages




Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Included in

Physics Commons