May 12, 2021 R language tutorial
In the random forest approach, a large number of decision trees are created. Every observation is fed into every decision tree, and each tree votes for a class. The class chosen by the majority of the trees is taken as the final prediction for that observation.
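The majority vote described above can be sketched in a few lines of R. The vector of per-tree votes below is hypothetical data made up for illustration:

```r
# A minimal sketch of majority voting. tree_votes is a hypothetical
# vector holding each tree's predicted class for one observation.
tree_votes <- c("yes", "yes", "no", "yes", "no")

# Count the votes per class and pick the class with the most votes.
vote_counts <- table(tree_votes)
majority <- names(which.max(vote_counts))
print(majority)   # "yes"
```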
Each tree is built on a bootstrap sample of the data, so some observations are left out when building any given tree. The classification error measured on these left-out observations is called the OOB (out-of-bag) error estimate, and it is reported as a percentage.
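Once a forest is fitted, the OOB estimate can be read directly from the fitted object. The sketch below assumes the randomForest package is installed and uses the built-in iris data set for illustration:

```r
# A sketch of extracting the OOB error estimate from a fitted forest.
library(randomForest)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris)

# err.rate holds the cumulative error rate per number of trees grown;
# the "OOB" column of its last row is the overall OOB error estimate.
oob_error <- fit$err.rate[nrow(fit$err.rate), "OOB"]
print(oob_error)
```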
The R language pack "randomForest" is used to create random forests.
Use the following command in the R console to install the package. You must also install any dependent packages.
install.packages("randomForest")
The package "randomForest" has the function randomForest(), which is used to create and analyze random forests.
The basic syntax for creating random forests in the R language is -
randomForest(formula, data)
The following is a description of the parameters used -
formula is a formula describing the predictor and response variables.
data is the name of the data set used.
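As a minimal illustration of this syntax, the sketch below fits a forest on the built-in iris data set, assuming the randomForest package is installed. The choice of predictors here is arbitrary:

```r
# A minimal sketch of the randomForest(formula, data) syntax.
library(randomForest)

# formula: Species is the response, two measurements are predictors.
# data: the data set containing those columns.
model <- randomForest(Species ~ Sepal.Length + Sepal.Width, data = iris)
print(model)
```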
We'll create a random forest using the built-in data set named readingSkills (supplied by the party package). It records each person's "age", "shoeSize", reading "score", and whether that person is a native speaker; we will predict nativeSpeaker from the other variables.
The following is sample data.
# Load the party package. It will automatically load other required packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))
When we execute the code above, it produces the following result -
Loading required package: methods
Loading required package: grid
  nativeSpeaker age shoeSize    score
1           yes   5 24.83189 32.29385
2           yes   6 25.95238 36.63105
3            no  11 30.42170 49.60593
4           yes   7 28.66450 40.28456
5           yes  11 31.88207 55.46085
6           yes  10 30.07843 52.83124
We'll use the randomForest() function to create the random forest and examine its results.
# Load the party package. It will automatically load other required packages.
library(party)
library(randomForest)

# Create the forest.
output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score,
                              data = readingSkills)

# View the forest results.
print(output.forest)

# Importance of each predictor.
print(importance(output.forest, type = 2))
When we execute the code above, it produces the following results -
Call:
 randomForest(formula = nativeSpeaker ~ age + shoeSize + score,
              data = readingSkills)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of error rate: 1%
Confusion matrix:
    no yes class.error
no  99   1        0.01
yes  1  99        0.01

         MeanDecreaseGini
age              13.95406
shoeSize         18.91006
score            56.73051
From the random forest shown above, we can conclude that the reading score and shoe size are the important factors in determining whether someone is a native speaker. In addition, the model has an OOB error rate of only 1%, which means we can expect predictions to be about 99% accurate.
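In practice, the fitted forest would then be used to classify new observations with predict(). The sketch below rebuilds the model from above and applies it to a single hypothetical new record (the values in new_obs are invented for illustration):

```r
# A sketch of classifying a new observation with the fitted forest.
library(party)          # supplies the readingSkills data set
library(randomForest)

output.forest <- randomForest(nativeSpeaker ~ age + shoeSize + score,
                              data = readingSkills)

# Hypothetical new observation with the same predictor columns.
new_obs <- data.frame(age = 8, shoeSize = 28.5, score = 42.0)
prediction <- predict(output.forest, newdata = new_obs)
print(prediction)
```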