Fitted vs predict in R

There are two functions in R that seems almost similar yet different:

  1. fitted()
  2. predict()

First let’s prepare some data first.

# Packages
library(dplyr)

# Data
set.seed(123)
dat <- 
  iris %>% 
  mutate(twoGp = sample(c("Gp1", "Gp2"), 150, replace = T), #create two group factor
         twoGp = as.factor(twoGp))

This is our data.

summary(dat)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species   twoGp   
##  setosa    :50   Gp1:76  
##  versicolor:50   Gp2:74  
##  virginica :50           
##                          
##                          
## 

fitted() is used to get a predicted values or \(\hat{y}\) based on the data. Let’s see this on the logistic regression.

logR <- glm(twoGp ~ ., family = binomial(), data = dat)

These are the fitted values.

fitted(logR) %>% head()
##         1         2         3         4         5         6 
## 0.4074988 0.3385228 0.3772767 0.3555640 0.4255196 0.4602198

For predict(), we have three types:

  1. Response
  2. Link - default
  3. Terms

If no new data supplied to predict(), it will use the original data used to fit the model.

1. Response

The type response is identical to fitted().

predict(logR, type = "response") %>% head()
##         1         2         3         4         5         6 
## 0.4074988 0.3385228 0.3772767 0.3555640 0.4255196 0.4602198

We can confirm this as below.

all.equal(fitted(logR), predict(logR, type = "response"))
## [1] TRUE

Thus, fitted() and predict(type = "response") give use predicted probabilities on the scale of the response variable. The first observation of this values can be interpreted as probability of Gp2 (Gp1 is a reference group) for first observation is 0.41.

2. Link

predict(type = "link") gives us predicted probabilities on the logit scale or log odds prediction.

predict(logR, type = "link") %>% head() #similar to predict(logR)
##          1          2          3          4          5          6 
## -0.3743150 -0.6698840 -0.5011235 -0.5946702 -0.3001551 -0.1594578

So, the log odds prediction of Gp2 for the first observation is -0.37. Hence, we can get the same values if we apply a link function to the fitted values.

The link function for logistic regression is:

\[ ln(\frac{\mu}{1 - \mu}) \] So, we apply this link function to the fitted values.

logOddsProb <- log(fitted(logR) / (1 - fitted(logR))) 
head(logOddsProb)
##          1          2          3          4          5          6 
## -0.3743150 -0.6698840 -0.5011235 -0.5946702 -0.3001551 -0.1594578

We can further confirm this as we did previously.

all.equal(logOddsProb, predict(logR, type = "link"))
## [1] TRUE

Also, we can conclude predict(type = "link") give use a fitted values before an application of link function (log odds).

3. Terms

Lastly, we have predict(type = "terms"). This type gives us a matrix of fitted values for each variable of each observation in the model on the scale of linear predictor.

predict(logR, type = "terms") %>% head() 
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1   0.07988782  0.28070682    0.4819893  -0.2736677 -0.9178543
## 2   0.10138230 -0.03635661    0.4819893  -0.2736677 -0.9178543
## 3   0.12287679  0.09046877    0.5024299  -0.2736677 -0.9178543
## 4   0.13362403  0.02705608    0.4615487  -0.2736677 -0.9178543
## 5   0.09063506  0.34411951    0.4819893  -0.2736677 -0.9178543
## 6   0.04764610  0.53435757    0.4206675  -0.2188976 -0.9178543

So, if we add up the values of the first observation and the constant (or intercept), we will get the same values as the log odds prediction (predict(type = "link")).

predTerm <- predict(logR, type = "terms")
sum(predTerm[1, ], attr(predTerm, "constant")) #add up the first observation and the constant
## [1] -0.374315
logOddsProb[1]
##         1 
## -0.374315

Those values also similar to if we calculate manually using coefficient from the summary().

\[ LogOdds(Gp2) = \beta_0 + \beta_1(Sepal.Length) + \beta_2(Sepal.Width) + \] \[ \beta_3(Petal.Length) + \beta_4(Petal.Width) + \beta_5(Species) \] So, this is the values we get from the first observation.

coef(logR)[1] + coef(logR)[2]*dat$Sepal.Length[1] + coef(logR)[3]*dat$Sepal.Width[1] + coef(logR)[4]*dat$Petal.Length[1] + coef(logR)[5]*dat$Petal.Width[1] + 0 #setosa species
## (Intercept) 
##   -0.374315

However, in predict(type = "terms") the values is centered, thus we have a different values for constant/intercept and for \(\beta_1(Sepal.Length)\),\(\beta_2(Sepal.Width)\) and so on. For example, the values for intercept for both models are:

# Intercept/constant from predict(type = "terms")
attr(predTerm, "constant")
## [1] -0.02537694
# Intercept/constant from summary()
coef(logR)[1]
## (Intercept) 
##   -1.814251

References:

Tengku Muhammad Hanis
Tengku Muhammad Hanis
Lead academic trainer

My research interests include medical statistics and machine learning application.

Related