Publication Date


Document Type

Student Project

First Advisor

Freedman, Reva

Degree Name

B.S. (Bachelor of Science)


Department of Computer Science


This research project measured how accurately natural language processing techniques classify medical text, with the aim of reinforcing the relevance of artificial intelligence in the medical field. Sentiment analysis (determining whether a text is positive or negative) was performed on the prescription drug reviews in an open-source dataset using four models: a lexical model, a neural network, a support vector machine, and a logistic regression model. Each model’s effectiveness was gauged by its accuracy, i.e., the percentage of unlabeled drug reviews it classified correctly. The machine learning models classified the text accurately, while the lexical model could not reliably produce an accurate prediction. The project also analyzed the significance of the preprocessing technique known as ‘stemming’, which made a negligible difference in accuracy (<1%).
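
To make the evaluation setup concrete, the following is a minimal sketch of a lexical sentiment classifier scored by accuracy, with and without stemming. It is not the project's code. The word lists, the four sample reviews, the tie-breaking rule, and the function names (`toy_stem`, `tokenize`, `lexical_predict`, `accuracy`) are all hypothetical, and `toy_stem` is a crude suffix stripper standing in for a real stemming algorithm.

```python
# Hypothetical lexicon: illustrative words only, not the project's word lists.
POSITIVE = {"help", "effective", "great", "relief"}
NEGATIVE = {"bad", "nausea", "worse", "pain", "terrible"}

# Hypothetical labeled reviews, invented for this sketch.
REVIEWS = [
    ("This drug helped my migraines.", "positive"),
    ("Terrible nausea and the pain got worse.", "negative"),
    ("Great relief within a week.", "positive"),
    ("It caused pains in my stomach.", "negative"),
]


def toy_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, stem=False):
    """Lowercase, split on whitespace, trim punctuation, optionally stem."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    return [toy_stem(w) for w in words] if stem else words


def lexical_predict(text, stem=False):
    """Count lexicon hits; ties default to 'positive' (an assumption)."""
    tokens = tokenize(text, stem)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"


def accuracy(reviews, stem=False):
    """Fraction of reviews whose predicted label matches the given label."""
    correct = sum(lexical_predict(text, stem) == label for text, label in reviews)
    return correct / len(reviews)


if __name__ == "__main__":
    print("accuracy without stemming:", accuracy(REVIEWS))
    print("accuracy with stemming:", accuracy(REVIEWS, stem=True))
```

On a handful of invented reviews like these, stemming can change the score considerably, because a single inflected word ("pains") decides a prediction. That is an artifact of the toy data; on the project's dataset the reported difference was under 1%.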