OLS Linear Regression Model Based on the NCSU.EDU Diabetes Dataset
- This exercise explores the NCSU.EDU Diabetes Dataset with an ordinary least squares (OLS) linear regression model. The model is used to predict the progression of diabetes one year after baseline. The coding exercise is provided in the Python Jupyter notebook below.
- The NCSU Diabetes Data website notes: From Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499, we have "Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline."
- The OLS linear regression model summary using all ten baseline variables showed that several variables were statistically insignificant. For this exercise, a variable was treated as statistically insignificant when its p-value (the P>|t| column of the model summary) was greater than 0.05. Additional information on the statsmodels OLS class can be found here: Statsmodels.org Regression.linear_model.OLS
- Tables and graphs can be found here: OLS Linear Regression Model Based on the NCSU.EDU Diabetes Dataset Python Jupyter Notebook.
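
The significance rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the notebook's actual code. The helper names `split_by_p_value` and `fit_ols_p_values`, the DataFrame argument `df`, and the response column name `"Y"` are assumptions made for the example. The statsmodels calls used (`sm.add_constant`, `sm.OLS(...).fit()`, `results.summary()`, `results.pvalues`) are part of the library's public API.

```python
from typing import Dict, List, Mapping, Tuple

ALPHA = 0.05  # threshold used in this exercise: P>|t| above this is "insignificant"


def split_by_p_value(
    p_values: Mapping[str, float], alpha: float = ALPHA
) -> Tuple[List[str], List[str]]:
    """Return (significant, insignificant) variable names.

    A variable is insignificant when its p-value is strictly greater than alpha,
    so a p-value of exactly alpha counts as significant.
    """
    significant = [name for name, p in p_values.items() if p <= alpha]
    insignificant = [name for name, p in p_values.items() if p > alpha]
    return significant, insignificant


def fit_ols_p_values(df, target: str = "Y") -> Dict[str, float]:
    """Fit OLS of `target` on every other column of a pandas DataFrame.

    Assumption: `df` holds the ten baseline variables plus the response column,
    here hypothetically named "Y". Returns p-values keyed by variable name.
    """
    # Imported lazily so split_by_p_value can be used without statsmodels installed.
    import statsmodels.api as sm

    # sm.OLS does not add an intercept on its own, so add a constant column.
    X = sm.add_constant(df.drop(columns=[target]))
    results = sm.OLS(df[target], X).fit()
    print(results.summary())  # the P>|t| column holds the p-values
    # Drop the intercept so only the baseline variables are classified.
    return results.pvalues.drop("const", errors="ignore").to_dict()


# Placeholder p-values for illustration only; these are NOT results from the dataset.
example_p_values = {"x1": 0.001, "x2": 0.30, "x3": 0.05}
significant, insignificant = split_by_p_value(example_p_values)
print("significant:", significant)
print("insignificant:", insignificant)
```

With a real DataFrame loaded from the dataset, `split_by_p_value(fit_ols_p_values(df))` would give the two lists for the full ten-variable model. The model could then be refit on the significant variables only, if that is the intended next step.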