May 12, 2021 R language tutorial
Regression analysis is a widely used statistical tool for modeling the relationship between two variables. One of these variables is called the predictor variable, and its values are collected through experiments. The other is called the response variable, and its value is derived from the predictor variable.
In linear regression, the two variables are related by an equation in which the exponent (power) of both variables is 1. Mathematically, a linear relationship is a straight line when drawn as a graph. A nonlinear relationship, in which the exponent of any variable is not equal to 1, produces a curve.
The general mathematical equation for linear regression is -
y = ax + b
The following is a description of the parameters used -
y is the response variable.
x is a predictor.
a and b are constants called the coefficients: a is the slope of the line and b is the intercept.
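To make the equation concrete, here is a minimal sketch that evaluates y = ax + b in R. The function name linear_response and the values a = 2 and b = 1 are hypothetical, chosen only for illustration; they are not estimated from any data.

```r
# Hypothetical helper: evaluates the straight line y = a * x + b.
linear_response <- function(x, a, b) {
  a * x + b
}

# Illustrative coefficients (assumed, not taken from any data).
a <- 2
b <- 1

x <- c(0, 1, 2, 3)
y <- linear_response(x, a, b)
print(y)

# For a straight line, equal steps in x give equal steps in y (the slope a).
print(diff(y))
```

Because the exponent of x is 1, every one-unit step in x changes y by the same amount, which is what makes the graph a straight line.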
A simple example of regression is predicting a person's weight when their height is known. To do this, we need a relationship between a person's height and weight.
The steps to create a relationship are -
Conduct experiments to collect samples of observations of height and corresponding weight.
Create a relationship model using the lm() function in the R language.
Find the coefficients from the model you created and use them to write the mathematical equation.
Get a summary of the relationship model to see the typical error in prediction. The individual prediction errors are known as residuals.
To predict the weight of a new person, use the predict() function in R.
Below is sample data for observation -
# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131

# Values of weight
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
The lm() function creates a model of the relationship between the predictor and the response variable.
The basic syntax of the lm() function in linear regression is -
lm(formula,data)
The following is a description of the parameters used -
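formula is a symbolic description of the relationship between x and y, written as y ~ x.
data is the data frame that contains the variables named in the formula. It is optional; if it is omitted, the variables are looked up in the calling environment.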
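The data argument can be illustrated with a short sketch. It stores the sample observations from this tutorial in a data frame and fits the model from it. The names people, height, weight and model are chosen for this sketch only.

```r
# Sample observations from this tutorial, stored in a data frame.
# The column names height and weight are chosen for this sketch.
people <- data.frame(
  height = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131),
  weight = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
)

# weight ~ height reads "weight modeled as a function of height".
# lm() looks up both variables as columns of the data frame.
model <- lm(weight ~ height, data = people)
print(coef(model))
```

Keeping the observations in a data frame keeps related columns together and lets the formula use descriptive variable names.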
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y ~ x)

print(relation)
When we execute the code above, it produces the following results -
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   -38.4551       0.6746  
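The coefficients can also be pulled out of the model programmatically and turned into the equation y = ax + b. The sketch below uses coef(); the variable names cf, a, b and equation are chosen for illustration.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# coef() returns a named numeric vector with entries "(Intercept)" and "x".
cf <- coef(relation)
b <- cf[["(Intercept)"]]  # intercept
a <- cf[["x"]]            # slope

# Write the fitted line as text, rounded to four decimal places.
equation <- sprintf("y = %.4f * x + (%.4f)", a, b)
print(equation)
```

The intercept is the b term and the coefficient labeled x is the slope a, so the fitted line predicts roughly 0.67 kg of additional weight per additional centimeter of height.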
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y ~ x)

print(summary(relation))
When we execute the code above, it produces the following results -
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.3002 -1.6629  0.0412  1.8944  3.9775 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -38.45509    8.04901  -4.778  0.00139 ** 
x             0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared:  0.9548, Adjusted R-squared:  0.9491 
F-statistic: 168.9 on 1 and 8 DF,  p-value: 1.164e-06
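To see where the residual figures in the summary come from, the following sketch extracts the residuals with residuals() and recomputes the residual standard error by hand. The variable names res and rse are chosen for illustration.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# A residual is the observed value minus the fitted value, one per observation.
res <- residuals(relation)
print(round(res, 4))

# Residual standard error: square root of the sum of squared residuals
# divided by the residual degrees of freedom (10 observations - 2 coefficients).
rse <- sqrt(sum(res^2) / df.residual(relation))
print(rse)
```

The value of rse should agree with the "Residual standard error" line of the summary, and the smallest and largest entries of res with the Min and Max columns.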
The basic syntax of predict() in linear regression is -
predict(object, newdata)
The following is a description of the parameters used -
object is the model that has been created using the lm() function.
newdata is a data frame that contains the new values of the predictor variable. Its column name must match the predictor name used in the formula.
# The predictor vector.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# The response vector.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y ~ x)

# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
When we execute the code above, it produces the following results -
       1 
76.22869 
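predict() also accepts several new values at once, and its result can be checked against the equation y = ax + b. The sketch below does both; the heights 160 and 180 and the names new_heights, predicted and manual are chosen for illustration.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# New heights; the column must be named x to match the formula y ~ x.
new_heights <- data.frame(x = c(160, 170, 180))
predicted <- predict(relation, new_heights)
print(predicted)

# The same predictions computed by hand from the fitted coefficients.
manual <- coef(relation)[["x"]] * new_heights$x + coef(relation)[["(Intercept)"]]
print(manual)
```

Both approaches give the same numbers, which shows that for a simple linear model predict() is just evaluating the fitted line at the new predictor values.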
# Create the predictor and response variables.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Give the chart file a name.
png(file = "linearregression.png")

# Plot weight (y) on the horizontal axis and height (x) on the vertical axis.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")

# Add the fitted line of height on weight; abline() is called after plot().
abline(lm(x ~ y))

# Save the file.
dev.off()
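When we execute the code above, it saves the chart to the file linearregression.png in the working directory: a scatter plot of the observations with the regression line drawn through them.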