Publication Date
2016
Document Type
Dissertation/Thesis
First Advisor
Basu, Sanjib
Degree Name
Ph.D. (Doctor of Philosophy)
Legacy Department
Department of Statistics
LCSH
Bayesian statistical decision theory
Abstract
Appropriate feature selection is a fundamental problem in statistics. Models with a large number of features or variables require special attention because of the computational complexity of the huge model space. In statistics this is generally known as the variable selection or model selection problem, whereas in machine learning and other literature it is also known as feature selection, attribute selection, or variable subset selection. Variable selection is the process of efficiently selecting an optimal subset of relevant variables for use in model construction. The central assumption of this methodology is that the data contain many redundant variables, that is, variables that provide no significant information beyond that already carried by the optimally selected subset. Variable selection is widely used across application areas of data analytics, ranging from optimal selection of genes in large-scale microarray studies, to optimal selection of biomarkers for targeted therapy in cancer genomics, to selection of optimal predictors in business analytics. Under the Bayesian approach, the formal way to perform this selection is to choose the model with the highest posterior probability. The problem can therefore be viewed as an optimization problem over the model space, where the objective function is the posterior probability of a model and the maximization is carried out over the models. We propose an efficient method for implementing this optimization and illustrate its feasibility in high-dimensional problems. In various simulation studies, the new approach is shown to be efficient and to outperform other statistical feature selection methods, namely the median probability model and sampling methods with frequency-based estimators. Theoretical justifications are provided. Applications to logistic regression and survival regression are discussed.
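As an illustration only, the idea of treating selection as maximization of posterior model probability can be sketched for a small linear regression. This is not the method proposed in the dissertation, which the abstract does not specify. The sketch assumes a uniform prior over models, approximates each model's log marginal likelihood by -BIC/2, and enumerates all 2^p models, which is feasible only for small p. The function names (`bic_log_evidence`, `posterior_over_models`, `map_and_median_models`) are hypothetical. It also computes the median probability model mentioned in the abstract, taken here as the set of variables whose posterior inclusion probability is at least 0.5.

```python
import itertools
import numpy as np


def bic_log_evidence(X, y, subset):
    """Approximate the log marginal likelihood of a Gaussian linear model by -BIC/2.

    Assumption: the BIC approximation stands in for an exact Bayesian evidence.
    """
    n = len(y)
    # Design matrix: an intercept column plus the selected predictor columns.
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]
    return -0.5 * (n * np.log(rss / n) + k * np.log(n))


def posterior_over_models(X, y):
    """Enumerate every subset of columns and return (models, posterior probabilities).

    Assumption: a uniform prior over models, so the posterior is proportional
    to the approximate evidence.
    """
    p = X.shape[1]
    models = [s for r in range(p + 1)
              for s in itertools.combinations(range(p), r)]
    log_ev = np.array([bic_log_evidence(X, y, s) for s in models])
    weights = np.exp(log_ev - log_ev.max())  # subtract the max for numerical stability
    return models, weights / weights.sum()


def map_and_median_models(X, y):
    """Return the highest-posterior model, the median probability model,
    and the posterior inclusion probability of each variable."""
    models, probs = posterior_over_models(X, y)
    map_model = models[int(np.argmax(probs))]
    p = X.shape[1]
    inclusion = np.array([
        sum(pr for m, pr in zip(models, probs) if j in m) for j in range(p)
    ])
    median_model = tuple(j for j in range(p) if inclusion[j] >= 0.5)
    return map_model, median_model, inclusion
```

Exhaustive enumeration grows as 2^p, which is why high-dimensional problems call for a search or optimization strategy over the model space rather than a full listing.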
Recommended Citation
Maity, Arnab Kumar, "Bayesian variable selection in linear and non-linear models" (2016). Graduate Research Theses & Dissertations. 1613.
https://huskiecommons.lib.niu.edu/allgraduate-thesesdissertations/1613
Extent
x, 111 pages
Language
eng
Publisher
Northern Illinois University
Rights Statement
In Copyright
Rights Statement 2
NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.
Media Type
Text
Comments
Advisors: Sanjib Basu. Committee members: Michael Geline; Balakrishna Hosmane; Alan Polansky; Duchwan Ryu. Includes bibliographical references.