Graduate Research Theses & Dissertations

Assessing The Performance and Merit of The Random Survival Forest and Cox Models on A Pancreatic Cancer Data Set

Carl Edward Mueller

Publication Date

2019

Document Type

Dissertation/Thesis

First Advisor

Zhou, Haiming

Degree Name

M.S. (Master of Science)

Legacy Department

Department of Statistics and Actuarial Science

Abstract

Random Survival Forest (RSF) is one of the most powerful and easily applied machine learning models for survival data. RSF sacrifices some of the interpretability of the decision trees used to grow the forest in order to significantly reduce the bias and variance of the basic classification and regression tree (CART) paradigm. The lessened interpretability and higher computational intensity of RSF mean that it may not always be the preferred method, even in settings where black-box methods are readily used. By contrast, the Cox Proportional Hazards (PH) model is incredibly flexible, resistant to overfitting, and transparently estimable. The tradeoff for the Cox PH model is the difficulty of constructing it under rigorous best practices, particularly when model assumptions are violated and complex extensions are required.

This thesis finds the best performing RSF, as measured by the Concordance Index (CI), for a cohort of pancreatic cancer patients from the Surveillance, Epidemiology and End Results (SEER) index and compares the predictive power of that model to a carefully constructed Cox model, while providing commentary on the relative strengths and weaknesses of each approach. Our conclusion marries the pros and cons of the classical and contemporary to cap a complete and cohesive narrative that insists on analyst customization for the machine learning technique and careful construction in the context of the Cox approach.
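As an illustration of the Concordance Index used above as the performance measure, the following is a minimal sketch of a Harrell-style computation in plain Python. It is not taken from the thesis: the function name `concordance_index`, its arguments, and the toy data are all hypothetical, and pairs with tied observed times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs whose predicted risk ordering
    agrees with the observed ordering of survival times.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped here (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk for the earlier event: agreement
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 1.0 indicates that every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. The same function could score the risk output of either an RSF or a Cox model, which is what makes the measure usable for comparing the two.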

Recommended Citation

Mueller, Carl Edward, "Assessing The Performance and Merit of The Random Survival Forest and Cox Models on A Pancreatic Cancer Data Set" (2019). Graduate Research Theses & Dissertations. 7478.
https://huskiecommons.lib.niu.edu/allgraduate-thesesdissertations/7478

Extent

117 pages

Language

eng

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text

Included in

Statistics and Probability Commons
