Publication Date

2016

Document Type

Dissertation/Thesis

First Advisor

Basu, Sanjib

Degree Name

Ph.D. (Doctor of Philosophy)

Legacy Department

Department of Statistics

LCSH

Bayesian statistical decision theory

Abstract

Appropriate feature selection is a fundamental problem in statistics. Models with a large number of features or variables require special attention because of the computational complexity of the huge model space. This is generally known as the variable or model selection problem in statistics, whereas in machine learning and related literature it is also called feature selection, attribute selection, or variable subset selection. Variable selection is the process of efficiently selecting an optimal subset of relevant variables for use in model construction. The central assumption in this methodology is that the data contain many redundant variables: those that provide no significant additional information beyond the optimally selected subset of variables. Variable selection is widely used across application areas of data analytics, ranging from the selection of genes in large-scale microarray studies and of biomarkers for targeted therapy in cancer genomics to the selection of predictors in business analytics. Under the Bayesian approach, the formal way to perform this selection is to choose the model with the highest posterior probability. The problem can therefore be viewed as an optimization over the model space, in which the objective function is the posterior probability of a model and the maximization is taken with respect to the models. We propose an efficient method for implementing this optimization and illustrate its feasibility in high-dimensional problems. In various simulation studies, the new approach is shown to be efficient and to outperform other statistical feature selection methods, namely the median probability model and a sampling method with frequency-based estimators. Theoretical justifications are provided. Applications to logistic regression and survival regression are discussed.
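
The highest-posterior-probability view of model selection can be made concrete with a small sketch. The Python example below is illustrative only and is not the dissertation's algorithm: it assumes a Gaussian linear model, approximates posterior model probabilities with the usual BIC-based approximation under equal prior model probabilities, and enumerates the model space exhaustively, which is feasible only for a handful of predictors. The dissertation's contribution is an efficient search over the model space that remains feasible in high dimensions.

```python
# Minimal sketch (not the dissertation's method): treat model selection as
# optimization of an approximate posterior model probability over all
# subsets of predictors, and return the maximizer.
from itertools import combinations
import numpy as np

def bic_score(y, X):
    """BIC of a Gaussian linear model fitted by least squares."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + k * np.log(n)

def highest_posterior_model(y, X):
    """Enumerate every predictor subset and return the one with the largest
    approximate posterior probability exp(-BIC/2), normalized over models."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    models, bics = [], []
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            design = np.hstack([intercept, X[:, list(subset)]])
            models.append(subset)
            bics.append(bic_score(y, design))
    bics = np.array(bics)
    # BIC-based posterior probabilities under equal prior model probabilities.
    weights = np.exp(-0.5 * (bics - bics.min()))
    probs = weights / weights.sum()
    best = int(np.argmax(probs))
    return models[best], probs[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 6
    X = rng.standard_normal((n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(n)  # true subset {0, 2}
    model, prob = highest_posterior_model(y, X)
    print("selected predictors:", model, "approx. posterior probability:", round(prob, 3))
```

Exhaustive enumeration costs 2^p model fits, which is exactly the computational bottleneck the abstract refers to; an efficient method must explore the model space without visiting every subset.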

Comments

Advisors: Sanjib Basu. Committee members: Michael Geline; Balakrishna Hosmane; Alan Polansky; Duchwan Ryu. Includes bibliographical references.

Extent

x, 111 pages

Language

eng

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text
