May 12, 2021 R language tutorial
We use regression analysis to create models that describe the effect of predictor variables on a response variable. Sometimes we have a categorical variable with values such as Yes/No or Male/Female. Simple regression analysis then gives a separate result for each value of the categorical variable. In this case, we can study the effect of the categorical variable by using it together with the predictor variable and comparing the regression lines at each level of the categorical variable. This analysis is called analysis of covariance, also known as ANCOVA.
Consider the dataset mtcars, which is built into R. In it, the field "am" indicates the type of transmission (automatic or manual). It is a categorical variable with values 0 and 1. The miles per gallon ("mpg") of a car can depend on it as well as on the value of horsepower ("hp").
We study the effect of the value of "am" on the regression between "mpg" and "hp". This is done by fitting the models with the aov() function and then using the anova() function to compare them.
Create a data frame from the dataset mtcars that contains the fields "mpg", "hp" and "am". Here we take "mpg" as the response variable, "hp" as the predictor variable, and "am" as the categorical variable.
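Before modeling, it can help to confirm how "am" is coded. The following is a small sketch and not part of the original walkthrough. The labels "automatic" and "manual" follow the coding given above (0 = automatic, 1 = manual), and the names am_counts and am_factor are illustrative choices.

```r
# Count the cars at each level of the transmission variable.
am_counts <- table(mtcars$am)
print(am_counts)

# Optional: a labeled factor makes the levels easier to read in output.
# The name am_factor is illustrative and is not used elsewhere in this tutorial.
am_factor <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("automatic", "manual"))
print(table(am_factor))
```

Because "am" takes only the values 0 and 1, leaving it numeric or converting it to a factor should give the same fit in the models discussed here; the factor mainly improves readability.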
input <- mtcars[,c("am","mpg","hp")]
print(head(input))
When we execute the code above, it produces the following results -
                  am  mpg  hp
Mazda RX4          1 21.0 110
Mazda RX4 Wag      1 21.0 110
Datsun 710         1 22.8  93
Hornet 4 Drive     0 21.4 110
Hornet Sportabout  0 18.7 175
Valiant            0 18.1 105
We create a regression model with "hp" as the predictor and "mpg" as the response variable, taking into account the interaction between "am" and "hp".
# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp*am,data = input)
print(summary(result))
When we execute the code above, it produces the following results -
            Df Sum Sq Mean Sq F value   Pr(>F)
hp           1  678.4   678.4  77.391 1.50e-09 ***
am           1  202.2   202.2  23.072 4.75e-05 ***
hp:am        1    0.0     0.0   0.001    0.981
Residuals   28  245.4     8.8
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The results show that both horsepower and transmission type have a significant effect on miles per gallon, because in both cases the p-value is less than 0.05. However, the interaction between the two variables is not significant, because its p-value is greater than 0.05.
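To make "comparing the regression lines at each level" concrete, the sketch below reads the two fitted lines off the interaction model. It uses lm() rather than aov() only because the coefficients are convenient to extract that way; both fit the same linear model. The names fit, b and lines_by_am are illustrative.

```r
# Fit the interaction model with lm(); aov() fits the same linear model.
fit <- lm(mpg ~ hp * am, data = mtcars)
b <- coef(fit)

# Intercept and slope of the mpg-on-hp line for each transmission type.
# For am = 1, the "am" and "hp:am" coefficients shift the baseline line.
lines_by_am <- data.frame(
  am        = c(0, 1),
  intercept = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["am"]]),
  slope     = c(b[["hp"]],          b[["hp"]] + b[["hp:am"]])
)
print(lines_by_am)
```

When the interaction is negligible, as the table above suggests, the two slopes should be nearly equal while the intercepts differ.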
Next, we create the regression model without the interaction between "am" and "hp".
# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp+am,data = input)
print(summary(result))
When we execute the code above, it produces the following results -
            Df Sum Sq Mean Sq F value   Pr(>F)
hp           1  678.4   678.4   80.15 7.63e-10 ***
am           1  202.2   202.2   23.89 3.46e-05 ***
Residuals   29  245.4     8.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The results show that, in this model as well, horsepower and transmission type both have a significant effect on miles per gallon, because in both cases the p-value is less than 0.05.
Now we can compare the two models to conclude whether the interaction of the variables really matters. To do this, we use the anova() function.
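As a rough illustration of what the model without interaction implies, the sketch below predicts "mpg" for two hypothetical cars that differ only in transmission type. The horsepower value of 150 and the names fit_add, new_cars and pred are assumptions made for this example.

```r
# Additive model: one common hp slope, separate intercepts per transmission type.
fit_add <- lm(mpg ~ hp + am, data = mtcars)

# Two hypothetical cars with the same horsepower but different transmissions.
new_cars <- data.frame(hp = c(150, 150), am = c(0, 1))
pred <- predict(fit_add, newdata = new_cars)
print(cbind(new_cars, predicted_mpg = pred))

# The gap between the two predictions equals the "am" coefficient,
# whatever horsepower value is chosen.
print(unname(pred[2] - pred[1]))
print(coef(fit_add)[["am"]])
```

In this model the two regression lines are parallel by construction, so the effect of transmission type is a constant shift in "mpg".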
# Get the dataset.
input <- mtcars

# Create the regression models.
result1 <- aov(mpg~hp*am,data = input)
result2 <- aov(mpg~hp+am,data = input)

# Compare the two models.
print(anova(result1,result2))
When we execute the code above, it produces the following results -
Model 1: mpg ~ hp * am
Model 2: mpg ~ hp + am
  Res.Df    RSS Df  Sum of Sq     F Pr(>F)
1     28 245.43
2     29 245.44 -1 -0.0052515 6e-04 0.9806
Since the p-value is greater than 0.05, we conclude that the interaction between horsepower and transmission type is not significant. Therefore, miles per gallon depends on the horsepower of the car in a similar manner for both automatic and manual transmissions.
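The p-value of this comparison (0.9806) matches the p-value of the hp:am row in the first ANOVA table (0.981). The sketch below checks this directly. Here the smaller model is listed first in anova(), a common convention that gives a positive Df in the comparison row; the names full, reduced, p_table and p_nested are illustrative.

```r
full    <- aov(mpg ~ hp * am, data = mtcars)
reduced <- aov(mpg ~ hp + am, data = mtcars)

# p-value of the hp:am row (third row) in the ANOVA table of the full model.
p_table <- summary(full)[[1]][["Pr(>F)"]][3]

# p-value from the F test comparing the two nested models.
cmp <- anova(reduced, full)
p_nested <- cmp[["Pr(>F)"]][2]

print(c(table = p_table, nested = p_nested))
```

The two values agree because the interaction is the last term entered in the full model, so its sequential F test is the same test as the comparison of the two models.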