Early Prediction of Sepsis from Clinical Data | Statistical Learning II: Multivariate Analysis

Use data collected during a patient's ICU stay to predict whether the patient will develop sepsis.

Pre-processed the Computing in Cardiology Challenge 2019 dataset by imputing missing values for each type of variable on a per-patient basis, then split the data into a training set and a validation set.
Trained CART and Random Forest models on the full data and on the summary data, then chose the Random Forest built on the summary data as the final model after comparing the performance of both methods.
Applied the final model to the test set, achieving an AUC of 0.906 and a BER of 0.222.
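
The per-patient imputation, the summary features, and the two reported metrics can be illustrated with a minimal Python sketch. It is not the project's actual code. It assumes forward-fill imputation with a global-median fallback, assumes min/max/mean/last as the summary features, and assumes BER means balanced error rate (the average of the false negative rate and the false positive rate). All function names and toy values are hypothetical, and model fitting is omitted.

```python
from statistics import median


def forward_fill(values, fallback):
    """Fill None entries with the patient's last observed value.

    Entries before the first observation get `fallback` (assumed here to be
    a median computed over all patients).
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last if last is not None else fallback)
    return filled


def summarize(values):
    """Collapse one patient's hourly series into summary features (assumed set)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
        "last": values[-1],
    }


def balanced_error_rate(y_true, y_pred):
    """Assumed definition: BER = 0.5 * (FN / positives + FP / negatives)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return 0.5 * (fn / pos + fp / neg)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is the quadratic-time pairwise form,
    fine for a toy example but slow on large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: hourly heart-rate readings per patient, None marks a missing value.
heart_rate = {"p1": [None, 80, None, 95], "p2": [70, None, None, 72]}
observed = [v for series in heart_rate.values() for v in series if v is not None]
global_median = median(observed)

# One row of summary features per patient, ready to feed a classifier.
features = {
    pid: summarize(forward_fill(series, global_median))
    for pid, series in heart_rate.items()
}
print(features)

# Toy labels, hard predictions, and predicted scores for the two metrics.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.5, 0.1]
print(balanced_error_rate(y_true, y_pred), auc(y_true, scores))
```

Imputing within each patient's own record keeps one patient's values from leaking into another's. Under the assumed definition, BER weights both classes equally, which matters when sepsis cases are a small minority of ICU stays.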