May 12, 2021 R language tutorial
In the random forest approach, a large number of decision trees are created. Every observation is fed into every decision tree, and each tree votes for a class. The class chosen by the majority of the trees is taken as the final prediction for that observation.
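The majority vote described above can be sketched in a few lines of R. The vector of per-tree votes below is hypothetical data made up for illustration:

```r
# A minimal sketch of majority voting. tree_votes is a hypothetical
# vector holding each tree's predicted class for one observation.
tree_votes <- c("yes", "yes", "no", "yes", "no")

# Count the votes per class and pick the class with the most votes.
vote_counts <- table(tree_votes)
majority <- names(which.max(vote_counts))
print(majority)   # "yes"
```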
Each tree is built on a bootstrap sample of the data, so some observations are left out when building any given tree. The classification error measured on these left-out observations is called the OOB (out-of-bag) error estimate, and it is reported as a percentage.
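Once a forest is fitted, the OOB estimate can be read directly from the fitted object. The sketch below assumes the randomForest package is installed and uses the built-in iris data set for illustration:

```r
# A sketch of extracting the OOB error estimate from a fitted forest.
library(randomForest)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris)

# err.rate holds the cumulative error rate per number of trees grown;
# the "OOB" column of its last row is the overall OOB error estimate.
oob_error <- fit$err.rate[nrow(fit$err.rate), "OOB"]
print(oob_error)
```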
The R language pack "randomForest" is used to create random forests.
Use the following command in the R console to install the package. You must also install any dependent packages.
install.packages("randomForest")
The package "randomForest" has the function randomForest(), which is used to create and analyze random forests.
The basic syntax for creating random forests in the R language is -
randomForest(formula, data)
The following is a description of the parameters used -
formula is a formula describing the predictor and response variables.
data is the name of the data set used.
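As a minimal illustration of this syntax, the sketch below fits a forest on the built-in iris data set, assuming the randomForest package is installed. The choice of predictors here is arbitrary:

```r
# A minimal sketch of the randomForest(formula, data) syntax.
library(randomForest)

# formula: Species is the response, two measurements are predictors.
# data: the data set containing those columns.
model <- randomForest(Species ~ Sepal.Length + Sepal.Width, data = iris)
print(model)
```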
We'll create a random forest using the built-in data set named readingSkills (supplied by the party package). It records each person's "age", "shoeSize", reading "score", and whether that person is a native speaker; we will predict nativeSpeaker from the other variables.
The following is sample data.
# Load the party package. It will automatically load other required packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))
When we execute the code above, it produces the following result -
Loading required package: methods
Loading required package: grid
  nativeSpeaker age shoeSize    score
1           yes   5 24.83189 32.29385
2           yes   6 25.95238 36.63105
3            no  11 30.42170 49.60593
4           yes   7 28.66450 40.28456
5           yes  11 31.88207 55.46085
6           yes  10 30.07843 52.83124
We'll use the randomForest() function to create the random forest and examine its results.
# Load the party package. It will automatically load other required packages.
library(party)
library(randomForest)

# Create the forest.
output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score,
                              data = readingSkills)

# View the forest results.
print(output.forest)

# Importance of each predictor.
print(importance(output.forest, type = 2))
When we execute the code above, it produces the following results -
Call:
 randomForest(formula = nativeSpeaker ~ age + shoeSize + score,
              data = readingSkills)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of error rate: 1%
Confusion matrix:
    no yes class.error
no  99   1        0.01
yes  1  99        0.01

         MeanDecreaseGini
age              13.95406
shoeSize         18.91006
score            56.73051
From the random forest shown above, we can conclude that the reading score and shoe size are the important factors in determining whether someone is a native speaker. In addition, the model has an OOB error rate of only 1%, which means we can expect predictions to be about 99% accurate.
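In practice, the fitted forest would then be used to classify new observations with predict(). The sketch below rebuilds the model from above and applies it to a single hypothetical new record (the values in new_obs are invented for illustration):

```r
# A sketch of classifying a new observation with the fitted forest.
library(party)          # supplies the readingSkills data set
library(randomForest)

output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score,
                              data = readingSkills)

# Hypothetical new observation with the same predictor columns.
new_obs <- data.frame(age = 8, shoeSize = 28.5, score = 42.0)
prediction <- predict(output.forest, newdata = new_obs)
print(prediction)
```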