Recap We’ve covered various approaches to explaining model predictions globally. Today we will learn about another model-specific post hoc analysis: understanding the workings of gradient boosting predictions. As in past posts, the Cleveland heart dataset and tidymodels principles will be used. Refer to the first post of this series for more details. Gradient Boosting Besides the random forest introduced in a past post, another tree-based ensemble model is gradient boosting.
Recap This is a continuation of the series on explaining machine learning model predictions, specifically those of random forest models. We can rely on the random forest package itself to explain predictions based on impurity importance or permutation importance. Today, we will explore external packages which aid in explaining random forest predictions. External packages There are a few external packages which calculate variable importance for random forest models, apart from the conventional measurements found within the random forest package.
Intro Recap There are 2 approaches to explaining models: (1) Use simple interpretable models. This approach was covered in the previous posts, where we looked at logistic regression and decision trees as examples of white box models. (2) Conduct post-hoc interpretation on models. There are two types of post-hoc analysis which can be done: model specific and model agnostic. Direction of post In the next few posts, we will look at model specific post-hoc analysis, which involves ranking the variables according to their importance to the model.
Syntax highlighting Previously, I posted entries without any syntax highlighting, as I was satisfied with basic blogdown and Hugo functions, until a Disqus member commented on the previous post suggesting that I use it. Thus, I tasked myself to learn more about syntax highlighting and to implement it in future posts. Now I’d like to share what I’ve learned. There are various ways to highlight syntax in Hugo but the preferred approach for blogdown is to use Highlight.
Introduction This is a follow-up post on using simple models to explain machine learning predictions. In the last post, we introduced logistic regression, and in today’s entry we will learn about decision trees. We will continue to use the Cleveland heart dataset and use tidymodels principles where possible. The details of the Cleveland heart dataset were also described in the last post.
#library
library(tidyverse)
library(tidymodels)
#import
heart<-read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data", col_names = F)
# Renaming var
colnames(heart)<- c("age", "sex", "rest_cp", "rest_bp", "chol", "fast_bloodsugar","rest_ecg","ex_maxHR","ex_cp", "ex_STdepression_dur", "ex_STpeak","coloured_vessels", "thalassemia","heart_disease")
#elaborating cat var
##simple ifelse conversion
heart<-heart %>% mutate(sex= ifelse(sex=="1", "male", "female"),fast_bloodsugar= ifelse(fast_bloodsugar=="1", ">120", "<120"), ex_cp=ifelse(ex_cp=="1", "yes", "no"), heart_disease=ifelse(heart_disease=="0", "no", "yes"))
## complex ifelse conversion using `case_when`
heart<-heart %>% mutate(
rest_cp=case_when(rest_cp== "1" ~ "typical",rest_cp=="2" ~ "atypical", rest_cp== "3" ~ "non-CP pain",rest_cp== "4" ~ "asymptomatic"),
rest_ecg=case_when(rest_ecg=="0" ~ "normal",rest_ecg=="1" ~ "ST-T abnorm",rest_ecg=="2" ~ "LV hyperthrophy"),
ex_STpeak=case_when(ex_STpeak=="1" ~ "up/norm", ex_STpeak== "2" ~ "flat",ex_STpeak== "3" ~ "down"),
thalassemia=case_when(thalassemia=="3.