May 12, 2021 R language tutorial
A decision tree is a graph that represents choices and their results in the form of a tree. The nodes in the graph represent events or choices, and the edges represent decision rules or conditions. Decision trees are used mostly in machine learning and data mining applications using R.
Examples of the use of decision trees include predicting whether an e-mail is spam or not spam, predicting whether a tumor is cancerous, and predicting whether a loan is a good or bad credit risk based on a set of observed factors. Typically, a model is created from observed data, also known as training data. A set of validation data is then used to verify and improve the model. Given a new set of predictor values, we use the model to decide which category the data belongs to. R has packages for creating and visualizing decision trees; this tutorial uses the R package "party".
Use the following command in the R console to install the package. You must also install any packages it depends on.
install.packages("party")
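When a script may be run more than once, it can be convenient to install the package only when it is missing. The following is a minimal sketch using base R's requireNamespace(); the helper name ensure_package and the choice of CRAN mirror are assumptions made for illustration, not part of the original tutorial.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # install.packages() also installs the package's required dependencies
    # by default. The mirror URL is an assumption; any CRAN mirror works.
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ensure_package("party")
```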
The "party" package has a function, ctree(), for creating and analyzing decision trees.
The basic syntax for creating a decision tree in R is:
ctree(formula, data)
The following is a description of the parameters used:
formula is a formula that describes the predictor and response variables.
data is the name of the dataset used.
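To make the formula argument more concrete, the sketch below builds a formula without fitting anything, so it needs only base R. The variable names are the columns of the readingSkills dataset used in this tutorial; all.vars() and reformulate() are base R functions, and the object names f, g and predictors are chosen for illustration.

```r
# The response goes on the left of ~ and the predictors on the right.
f <- nativeSpeaker ~ age + shoeSize + score

# List the variables the formula mentions (the response comes first).
print(all.vars(f))

# Build the same formula programmatically from character vectors.
predictors <- c("age", "shoeSize", "score")
g <- reformulate(predictors, response = "nativeSpeaker")
print(g)
```

Either f or g could then be passed as the first argument of ctree().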
We'll create a decision tree using the readingSkills dataset, which ships with the party package. Each record gives a person's reading score ("score") together with their age ("age"), shoe size ("shoeSize"), and whether that person is a native speaker ("nativeSpeaker").
Here is the sample data.
# Load the party package. It will automatically load other dependent packages.
library(party)

# Print some records from data set readingSkills.
print(head(readingSkills))
When we execute the code above, it produces the following result:
  nativeSpeaker age shoeSize    score
1           yes   5 24.83189 32.29385
2           yes   6 25.95238 36.63105
3            no  11 30.42170 49.60593
4           yes   7 28.66450 40.28456
5           yes  11 31.88207 55.46085
6           yes  10 30.07843 52.83124
Loading required package: methods
Loading required package: grid
...............................
...............................
We'll use the ctree() function to create a decision tree and plot it to a PNG file.
# Load the party package. It will automatically load other dependent packages.
library(party)

# Create the input data frame.
input.dat <- readingSkills[c(1:105),]

# Give the chart file a name.
png(file = "decision_tree.png")

# Create the tree.
output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score,
  data = input.dat)

# Plot the tree.
plot(output.tree)

# Save the file.
dev.off()
When we execute the code above, it produces the following result:
null device
          1
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Loading required package: sandwich
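Because the plot is written to a PNG file, it can also help to look at the fitted tree as text in the console. The sketch below refits the same tree and prints it; it assumes the party package is installed, and the exact split values it reports may vary with the package version.

```r
library(party)

# Same training subset and formula as in the example above.
input.dat <- readingSkills[1:105, ]
output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = input.dat)

# print() writes the split rules and terminal nodes to the console.
print(output.tree)
```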
From the decision tree shown above, we can conclude that the model classifies anyone with a reading score below 38.3 and an age above 6 as not a native speaker.
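The introduction mentions checking a model against validation data. One way to try this, sketched below, is to treat the rows of readingSkills that were not used for training as a validation set and compare the tree's predictions with the recorded values. The train/validation split and the object names train, valid, tree, pred, confusion and accuracy are assumptions made for illustration; the tutorial itself only fits the first 105 rows.

```r
library(party)

# Fit on the first 105 rows, as in the tutorial.
train <- readingSkills[1:105, ]
# Assumption: treat the remaining rows as validation data.
valid <- readingSkills[106:nrow(readingSkills), ]

tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = train)

# Predicted class for each validation row.
pred <- predict(tree, newdata = valid)

# Confusion table: rows are predictions, columns are observed values.
confusion <- table(predicted = pred, actual = valid$nativeSpeaker)
print(confusion)

# Share of validation rows classified correctly.
accuracy <- mean(as.character(pred) == as.character(valid$nativeSpeaker))
print(accuracy)
```

A low accuracy here would suggest revisiting the choice of predictors or the size of the training set before relying on the tree.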