Publication Date

2024

Document Type

Dissertation/Thesis

First Advisor

Hou, Minmei M.

Degree Name

M.S. (Master of Science)

Legacy Department

Department of Computer Science

Abstract

This study presents a bioinformatic analysis of SARS-CoV-2 genomes sampled in 2022, aimed at characterizing the genetic diversity and evolutionary patterns of the virus. A random sample of 1,000 genomes, drawn from a large collection of sequenced genomes, was used to perform an all-vs-all pairwise alignment, which formed the basis of a genetic distance matrix defined by the number of substitutions. This matrix served as the input for clustering analysis, enabling the identification of distinct genomic variants and their relationships. Multidimensional Scaling (MDS) was used to visualize the data in two dimensions, revealing several distinct clusters that suggest the presence of diverse viral strains. Hierarchical clustering methods, including Ward, Average, Complete, and Single linkage, were compared for their effectiveness in grouping sequences by genetic similarity. The Ward method proved the most effective, producing clear and coherent clusters. Each cluster was then annotated with a set of reference genomes representing major variants and subvariants, using both minimum and average distance measures, which provided insights into the genetic relationships among the variants. In addition, each sampled genome was annotated with its most similar reference genome, for comparison with the cluster-based annotation. Our findings highlight the genetic variation within SARS-CoV-2 and demonstrate the utility of advanced bioinformatics methods for understanding the complexities of viral evolution. This research contributes to the global effort to track the progression of the COVID-19 pandemic.
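The distance and annotation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. It assumes sequences are already aligned to equal length (the thesis aligns each pair separately), that gap ('-') and ambiguous ('N') positions are ignored when counting substitutions, and that cluster labels have already been produced by a hierarchical clustering step. All function names (`count_substitutions`, `distance_matrix`, `nearest_reference`, `annotate_clusters`) are hypothetical.

```python
from __future__ import annotations

from statistics import mean

# Assumption: gaps and ambiguous bases are not counted as substitutions.
SKIP = {"-", "N"}


def count_substitutions(a: str, b: str) -> int:
    """Count aligned positions where both sequences carry a base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in SKIP and y not in SKIP
    )


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Symmetric all-vs-all matrix of substitution counts (zero diagonal)."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = count_substitutions(seqs[i], seqs[j])
    return d


def nearest_reference(sample: str, refs: dict[str, str]) -> str:
    """Per-genome annotation: name of the reference with the fewest substitutions."""
    return min(refs, key=lambda name: count_substitutions(sample, refs[name]))


def annotate_clusters(
    samples: list[str], labels: list[int], refs: dict[str, str]
) -> dict[int, tuple[str, str]]:
    """Per-cluster annotation: (best reference by minimum distance, best by average distance)."""
    clusters: dict[int, list[str]] = {}
    for seq, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(seq)
    result = {}
    for lab, members in clusters.items():
        # Distances from every cluster member to each reference genome.
        dists = {
            name: [count_substitutions(m, ref) for m in members]
            for name, ref in refs.items()
        }
        by_min = min(dists, key=lambda name: min(dists[name]))
        by_avg = min(dists, key=lambda name: mean(dists[name]))
        result[lab] = (by_min, by_avg)
    return result


if __name__ == "__main__":
    samples = ["ACGTAC", "ACGTAA", "TCGTTC", "TCGATC"]
    refs = {"refA": "ACGTAC", "refB": "TCGTTC"}
    labels = [0, 0, 1, 1]  # assumed output of a clustering step
    print(distance_matrix(samples))
    print([nearest_reference(s, refs) for s in samples])
    print(annotate_clusters(samples, labels, refs))
```

On this toy input the cluster-based and per-genome annotations agree. In practice the labels could come from SciPy's `scipy.cluster.hierarchy.linkage` (with `method` set to `"ward"`, `"average"`, `"complete"`, or `"single"`) followed by `fcluster`, and the two-dimensional view from scikit-learn's `MDS` with `dissimilarity="precomputed"`; this is one possible route and not necessarily the one used in the thesis.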

Extent

109 pages

Language

en

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text
