May 12, 2021 R language tutorial
Poisson regression covers regression models in which the response variable is a count rather than a fractional number, for example the number of births or the number of wins in a series of football matches. In addition, the values of the response variable are assumed to follow a Poisson distribution.
The general mathematical equation for Poisson regression is -
log(y) = a + b1x1 + b2x2 + ... + bnxn
The following is a description of the parameters used -
y is the response variable; the equation models the log of its expected count.
a and b1, b2, ..., bn are numeric coefficients.
x1, x2, ..., xn are the predictor variables.
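Because the left-hand side is a logarithm, the same equation can be rewritten on the scale of the count itself. The worked form below is a sketch of that step; the symbol \mu for the expected value of y is an assumption introduced here for clarity and is not part of the notation above.

```latex
\log(\mu) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
\quad\Longleftrightarrow\quad
\mu = e^{a} \cdot e^{b_1 x_1} \cdot e^{b_2 x_2} \cdots e^{b_n x_n}
```

Read this way, raising x1 by one unit while the other predictors stay fixed multiplies the expected count by e^{b_1}, so a negative coefficient corresponds to a factor below 1.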
The function used to create a Poisson regression model is glm().
The basic syntax of glm() for Poisson regression is -
glm(formula, data, family)
The following is a description of the parameters used in the above function -
formula is a symbolic description of the relationship between the variables.
data is the dataset that supplies the values of these variables.
family is an R object that specifies the details of the model. For Poisson regression its value is poisson.
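As a minimal sketch of how the family argument can be supplied, the snippet below fits the same model three ways. The data frame toy and its columns count and x are hypothetical values made up for illustration; only glm() and poisson() come from base R.

```r
# Hypothetical toy data: a count response and one numeric predictor.
toy <- data.frame(
  count = c(2, 3, 6, 7, 8, 9, 10, 12, 15),
  x     = 1:9
)

# family can be given as the family function itself ...
fit_fun <- glm(count ~ x, data = toy, family = poisson)

# ... as a family object with the link spelled out (log is the default) ...
fit_obj <- glm(count ~ x, data = toy, family = poisson(link = "log"))

# ... or as a character string naming the family function.
fit_chr <- glm(count ~ x, data = toy, family = "poisson")

# All three calls describe the same model, so the coefficients agree.
print(coef(fit_fun))
print(all.equal(coef(fit_fun), coef(fit_obj)))
print(all.equal(coef(fit_fun), coef(fit_chr)))

# The link function actually used by the fitted model.
print(family(fit_fun)$link)
```

Passing the bare function name, as the tutorial does, is the shortest form; spelling out the link only matters when a non-default link is wanted.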
We have a built-in dataset called warpbreaks, which describes the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Let's consider breaks as the response variable, which is a count of the number of breaks. The wool type (wool) and the tension level (tension) are the predictor variables.
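Before fitting anything, it can help to confirm how the data set is laid out. The short sketch below uses only base R and the columns that warpbreaks actually contains (breaks, wool and tension).

```r
# warpbreaks ships with base R (package 'datasets'), so nothing needs loading.
str(warpbreaks)

# Factor levels of the two predictors; the first level of each is the
# baseline that glm() compares the other levels against.
print(levels(warpbreaks$wool))
print(levels(warpbreaks$tension))

# Number of looms observed for each wool/tension combination.
loom_counts <- table(warpbreaks$wool, warpbreaks$tension)
print(loom_counts)
```

Knowing the baseline levels matters later, because the model output reports every other level relative to them.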
input <- warpbreaks
print(head(input))
When we execute the code above, it produces the following results -
  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L
output <- glm(formula = breaks ~ wool + tension, data = warpbreaks, family = poisson)
print(summary(output))
When we execute the code above, it produces the following results -
Call:
glm(formula = breaks ~ wool + tension, family = poisson, data = warpbreaks)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6871  -1.6503  -0.4269   1.1902   4.2616

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.69196    0.04541  81.302  < 2e-16 ***
woolB       -0.20599    0.05157  -3.994 6.49e-05 ***
tensionM    -0.32132    0.06027  -5.332 9.73e-08 ***
tensionH    -0.51849    0.06396  -8.107 5.21e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 210.39  on 50  degrees of freedom
AIC: 493.06

Number of Fisher Scoring iterations: 4
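The Estimate column is on the log scale, so the numbers are easier to read once exponentiated. The sketch below refits the same model and shows one way to do that; the variable names rate_ratios, by_hand, new_loom and by_predict are illustrative choices, not part of the tutorial.

```r
# Refit the model from the tutorial so this snippet runs on its own.
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Exponentiated coefficients: multiplicative effects on the expected count
# relative to the baseline (wool A, tension L).
rate_ratios <- exp(coef(output))
print(round(rate_ratios, 3))

# Expected number of breaks for wool B at high tension, computed by hand
# from the regression equation ...
b <- coef(output)
by_hand <- exp(b["(Intercept)"] + b["woolB"] + b["tensionH"])

# ... and the same quantity from predict() on the response scale.
new_loom <- data.frame(
  wool    = factor("B", levels = levels(warpbreaks$wool)),
  tension = factor("H", levels = levels(warpbreaks$tension))
)
by_predict <- predict(output, newdata = new_loom, type = "response")

print(c(by_hand = unname(by_hand), by_predict = unname(by_predict)))
```

The two values should agree, which confirms that predict() with type = "response" simply applies the inverse of the log link to the linear predictor.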
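In the summary, we look for p-values less than 0.05 in the last column to decide whether a predictor has an effect on the response variable. As the output shows, wool type B and tension levels M and H all have a significant effect on the break count. Their estimates are negative, so each is associated with fewer expected breaks than the baseline of wool type A at tension L.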
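The same check can be done programmatically rather than by reading the printed table. The following is a minimal sketch that pulls the p-values out of the summary object; the names coef_table, p_values and significant are illustrative.

```r
# Refit the model from the tutorial so this snippet runs on its own.
output <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# coef() on a summary object returns the coefficient table as a matrix.
coef_table <- coef(summary(output))

# The last column holds the p-values shown in the printed summary.
p_values <- coef_table[, "Pr(>|z|)"]

# Flag every term whose p-value falls below the 0.05 threshold.
significant <- p_values < 0.05
print(significant)
```

The 0.05 cutoff is the conventional threshold used in the text above; a stricter or looser value can be substituted in the comparison.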