Coding With Fun

R language analysis of covariance (ANCOVA)


May 12, 2021 R language tutorial




We use regression analysis to create models that describe the effect of variation in predictor variables on the response variable. Sometimes we also have a categorical variable with values such as Yes/No or Male/Female. Simple regression analysis then gives a separate result for each value of the categorical variable. In this case, we can study the effect of the categorical variable by using it together with the predictor variable and comparing the regression lines for each level of the categorical variable. This kind of analysis is called analysis of covariance, also known as ANCOVA.
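The idea of comparing regression lines can be sketched directly in R. This minimal example uses the built-in mtcars data from the example below. It fits a separate simple regression of mpg on hp for each value of am and collects the intercept and slope of each line. The names fits and group_lines are illustrative only.

```r
# Fit one simple regression of mpg on hp for each level of the
# categorical variable am (0 = automatic, 1 = manual).
fits <- lapply(split(mtcars, mtcars$am), function(d) lm(mpg ~ hp, data = d))

# Collect the intercept and slope of each group's regression line
# into a matrix with one row per level of am.
group_lines <- t(sapply(fits, coef))
print(group_lines)

# ANCOVA asks whether these lines differ: different slopes indicate an
# interaction between am and hp, while equal slopes with different
# intercepts mean am shifts mpg by a constant amount.
```

Fitting mpg ~ hp * am on all the data reproduces exactly these two lines. Its hp:am coefficient equals the difference between the manual and automatic slopes.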

Example

Consider the built-in R dataset mtcars. In it, the field "am" indicates the type of transmission (0 = automatic, 1 = manual). It is a categorical variable with values 0 and 1. The miles per gallon ("mpg") of a car can also depend on its horsepower ("hp").

We study the effect of the value of "am" on the regression between "mpg" and "hp". This is done by fitting models with the aov() function and then comparing them with the anova() function.

Input data

Create a data frame from the dataset mtcars that contains the fields "mpg", "hp" and "am". Here we take "mpg" as the response variable, "hp" as the predictor variable and "am" as the categorical variable.

input <- mtcars[,c("am","mpg","hp")]
print(head(input))

When we execute the code above, it produces the following results -

                   am   mpg   hp
Mazda RX4          1    21.0  110
Mazda RX4 Wag      1    21.0  110
Datsun 710         1    22.8   93
Hornet 4 Drive     0    21.4  110
Hornet Sportabout  0    18.7  175
Valiant            0    18.1  105
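In mtcars, "am" is stored as a numeric 0/1 column. A common practice, which the steps below do not require, is to convert it to a factor with readable labels so that R treats it explicitly as categorical. The labels "automatic" and "manual" are our own choice. For a two-level variable this recoding does not change the ANOVA table, as this minimal sketch checks.

```r
# Copy the three columns of interest from the built-in mtcars dataset.
input <- mtcars[, c("am", "mpg", "hp")]

# Recode am (0/1) as a labelled factor so that R treats it as categorical.
input$am <- factor(input$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Number of cars with each transmission type.
print(table(input$am))

# Fit the interaction model with the factor version of am.
result_factor <- aov(mpg ~ hp * am, data = input)
print(summary(result_factor))
```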

Analysis of covariance

We create a regression model with "hp" as the predictor and "mpg" as the response variable, taking into account the interaction between "am" and "hp".
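Written as an equation, the interaction model fitted below takes the following form. The β symbols are our own notation, not something R prints. Substituting the two values of am gives one regression line per transmission type:

```latex
\text{mpg} = \beta_0 + \beta_1\,\text{hp} + \beta_2\,\text{am} + \beta_3\,(\text{hp} \times \text{am}) + \varepsilon

\begin{aligned}
\text{am} = 0:&\quad \text{mpg} = \beta_0 + \beta_1\,\text{hp} \\
\text{am} = 1:&\quad \text{mpg} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{hp}
\end{aligned}
```

So β3 is the difference in slope between manual and automatic cars, and testing the hp:am term amounts to testing whether β3 = 0.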

Model with interaction between the categorical variable and the predictor variable

# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp*am,data = input)
print(summary(result))

When we execute the code above, it produces the following results -

            Df Sum Sq Mean Sq F value   Pr(>F)    
hp           1  678.4   678.4  77.391 1.50e-09 ***
am           1  202.2   202.2  23.072 4.75e-05 ***
hp:am        1    0.0     0.0   0.001    0.981    
Residuals   28  245.4     8.8                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The results show that both horsepower and transmission type have a significant effect on miles per gallon, because in both cases the p-value is less than 0.05. However, the interaction between these two variables is not significant, because its p-value (0.981) is greater than 0.05.
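The same conclusion can be read from the coefficient table of the equivalent linear model, since aov() fits the same model as lm(). Because hp:am is the last term in the sequential ANOVA and has one degree of freedom, the t-test on its coefficient gives the same p-value as the F-test above (F is the square of t). A minimal sketch:

```r
# Fit the interaction model with lm(); aov() fits the same linear model.
fit_int <- lm(mpg ~ hp * am, data = mtcars)

# Coefficient table: estimate, standard error, t value and p-value per term.
coefs <- summary(fit_int)$coefficients
print(coefs)

# The hp:am row is the estimated change in the hp slope for manual cars.
print(coefs["hp:am", ])
```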

Model without interaction between the categorical variable and the predictor variable

# Get the dataset.
input <- mtcars

# Create the regression model.
result <- aov(mpg~hp+am,data = input)
print(summary(result))

When we execute the code above, it produces the following results -

            Df  Sum Sq  Mean Sq   F value   Pr(>F)    
hp           1  678.4   678.4   80.15 7.63e-10 ***
am           1  202.2   202.2   23.89 3.46e-05 ***
Residuals   29  245.4     8.5                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The results show that both horsepower and transmission type have a significant effect on miles per gallon, because in both cases the p-value is less than 0.05.
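Without the interaction term, the model assumes one common slope for hp and lets am shift only the intercept, which gives two parallel regression lines. This minimal sketch recovers those lines from lm(), which fits the same model as the aov() call above. The name parallel_lines is illustrative.

```r
# Additive model: a common hp slope, with am shifting the intercept.
fit_add <- lm(mpg ~ hp + am, data = mtcars)
b <- coef(fit_add)

# Intercept and slope of the fitted line for each transmission type.
parallel_lines <- rbind(
  automatic = c(intercept = unname(b["(Intercept)"]), slope = unname(b["hp"])),
  manual    = c(intercept = unname(b["(Intercept)"] + b["am"]), slope = unname(b["hp"]))
)
print(parallel_lines)
```

The am coefficient is the constant vertical distance between the two lines, that is, the estimated difference in mpg between manual and automatic cars with the same horsepower.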

Compare the two models

Now we can compare the two models to determine whether the interaction between the variables really matters. To do this, we use the anova() function.

# Get the dataset.
input <- mtcars

# Create the regression models.
result1 <- aov(mpg~hp*am,data = input)
result2 <- aov(mpg~hp+am,data = input)

# Compare the two models.
print(anova(result1,result2))

When we execute the code above, it produces the following results -

Model 1: mpg ~ hp * am
Model 2: mpg ~ hp + am
  Res.Df    RSS Df  Sum of Sq     F Pr(>F)
1     28 245.43                           
2     29 245.44 -1 -0.0052515 6e-04 0.9806

Since the p-value (0.9806) is greater than 0.05, we conclude that the interaction between horsepower and transmission type is not significant. Therefore, miles per gallon depends on the horsepower of the car in a similar manner for both automatic and manual transmissions.
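To see what anova() computes here, the F statistic can be reproduced by hand from the residual sums of squares (RSS) and residual degrees of freedom of the two nested models. This minimal sketch uses deviance(), which returns the RSS for a linear model, and df.residual().

```r
# Full model (with interaction) and reduced model (without it).
full    <- aov(mpg ~ hp * am, data = mtcars)
reduced <- aov(mpg ~ hp + am, data = mtcars)

rss_full    <- deviance(full)       # residual sum of squares, full model
rss_reduced <- deviance(reduced)    # residual sum of squares, reduced model
df_full     <- df.residual(full)    # 32 cars - 4 coefficients = 28
df_reduced  <- df.residual(reduced) # 32 cars - 3 coefficients = 29

# F = (extra RSS per extra parameter) / (residual mean square of the full model)
f_stat  <- ((rss_reduced - rss_full) / (df_reduced - df_full)) / (rss_full / df_full)
p_value <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

A large p-value means that dropping the interaction term barely increases the RSS, so the simpler additive model is adequate.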