May 12, 2021 R language tutorial
In the R language, we can read data from files stored outside the R environment. We can also write data to files that will be stored and accessed by the operating system. R can read and write a variety of file formats, such as csv, Excel, XML, and so on.
In this chapter, we'll learn how to read data from a csv file and then write data to a csv file. The file should exist in the current working directory so that R can read it. Of course, we can also set our own directory and read files from there.
You can use getwd() to check which directory the R workspace points to. You can also use setwd() to set a new working directory.
# Get and print current working directory.
print(getwd())
# Set current working directory (the directory must already exist).
setwd("/web/com")
# Get and print current working directory.
print(getwd())
When we execute the code above, it produces the following results -
[1] "/web/com/1441086124_2016"
[1] "/web/com"
This result depends on your operating system and the directory where you are currently working.
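Because setwd() changes the working directory for the rest of the session, it can be handy to switch directories only temporarily. The following is a minimal sketch, not part of the original tutorial: with_dir() is a hypothetical helper, and tempdir() merely stands in for whatever directory you want to work in.

```r
# Hypothetical helper (not a base R function): evaluate `code` with `dir`
# as the working directory, then switch back to the previous one.
with_dir <- function(dir, code) {
  old <- setwd(dir)     # setwd() invisibly returns the previous directory
  on.exit(setwd(old))   # restore it on exit, even if `code` throws an error
  force(code)           # `code` is a lazy argument, so it runs here, inside `dir`
}

before <- getwd()
inside <- with_dir(tempdir(), getwd())
print(inside)                       # the session's temporary directory
print(identical(getwd(), before))   # TRUE: the old directory is back
```

Using on.exit() rather than a plain setwd(old) at the end means the original directory is restored even when the code inside fails.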
A csv file is a text file in which the values in each row are separated by commas, one value per column.
Let's consider the following .csv file called input.csv.
You can create this file by copying and pasting the data below into Windows Notepad. Save it as input.csv using Notepad's Save As option with the file type set to All Files (*.*).
input.csv
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
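If you prefer not to use Notepad, the same file can be written from R itself. This sketch (an addition to the tutorial) uses writeLines() to save the rows above, unchanged, as input.csv in the current working directory. Note that it overwrites any existing file with that name.

```r
# The rows of input.csv shown above, one string per line.
rows <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR",
  ",Gary,843.25,2015-03-27,Finance",
  "6,Nina,578,2013-05-21,IT",
  "7,Simon,632.8,2013-07-30,Operations",
  "8,Guru,722.5,2014-06-17,Finance"
)

# Write them to input.csv in the current working directory (overwrites it).
writeLines(rows, "input.csv")
print(file.exists("input.csv"))
```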
The following is a simple example of the read.csv() function, which reads a CSV file available in the current working directory:
data <- read.csv("input.csv")
print(data)
When we execute the code above, it produces the following results -
  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5 NA     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance
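read.csv() guesses a type for each column. The sketch below is an addition, not from the original tutorial. It reads the same rows from an inline string through the text argument (so it runs without input.csv), prints the guessed classes, and then uses colClasses to read start_date as a Date. It assumes R 4.0 or later, where text columns stay character instead of becoming factors by default.

```r
# Same rows as input.csv, inlined so the sketch is self-contained.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"

data <- read.csv(text = csv_text)
print(sapply(data, class))   # the empty id on Gary's row becomes NA

# Name only the column to override; the others are still guessed.
typed <- read.csv(text = csv_text, colClasses = c(start_date = "Date"))
print(class(typed$start_date))   # "Date"
```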
By default, the read.csv() function returns its result as a data frame. This is easy to check, as shown below. We can also check the number of columns and rows.
data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
When we execute the code above, it produces the following results -
[1] TRUE
[1] 5
[1] 8
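dim() returns the row and column counts together, and names() lists the column headers. A small sketch, reading just the first two rows of input.csv from an inline string so it runs on its own:

```r
# First two rows of input.csv, inlined for a self-contained example.
data <- read.csv(text = "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations")

print(dim(data))     # rows, then columns: 2 5
print(names(data))   # "id" "name" "salary" "start_date" "dept"
```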
Once the data has been read into a data frame, we can apply any function that works on data frames, as shown in the examples below.
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
# Get the max salary from the data frame.
sal <- max(data$salary)
print(sal)
When we execute the code above, it produces the following results -
[1] 843.25
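Other summary functions work the same way on a column. As a sketch (not part of the original tutorial), the example below computes the mean and range of the salaries and uses which.max() to find whose salary is the maximum. The data is inlined from input.csv so the block runs on its own.

```r
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

print(mean(data$salary))                  # average salary
print(range(data$salary))                 # minimum and maximum
print(data$name[which.max(data$salary)])  # "Gary"
```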
We can get the rows that meet a specific filter condition, similar to a SQL WHERE clause.
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
# Get the max salary from the data frame.
sal <- max(data$salary)
# Get the details of the person with the max salary.
retval <- subset(data, salary == sal)
print(retval)
When we execute the code above, it produces the following results -
  id name salary start_date    dept
5 NA Gary 843.25 2015-03-27 Finance
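subset() is a convenience wrapper, and the same rows can be selected with ordinary [ ] indexing, as sketched below with the data inlined from input.csv. One difference worth knowing: subset() drops rows where the condition evaluates to NA, whereas [ ] returns an all-NA row for them. A filter on the id column (which is NA for Gary) therefore behaves differently under the two styles.

```r
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Same result as subset(data, salary == max(salary)).
top <- data[data$salary == max(data$salary), ]
print(top)

# With an NA in the condition, the two styles differ:
print(nrow(subset(data, id > 5)))   # 3: the NA row is dropped
print(nrow(data[data$id > 5, ]))    # 4: includes an all-NA row
```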
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
# Get all the people working in the IT department.
retval <- subset(data, dept == "IT")
print(retval)
When we execute the code above, it produces the following results -
  id     name salary start_date dept
1  1     Rick  623.3 2012-01-01   IT
3  3 Michelle  611.0 2014-11-15   IT
6  6     Nina  578.0 2013-05-21   IT
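To match several departments at once, the condition can use %in% instead of chaining == tests with |. A sketch, not from the original tutorial, with the data inlined from input.csv:

```r
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Everyone working in either the IT or the HR department.
it_or_hr <- subset(data, dept %in% c("IT", "HR"))
print(it_or_hr)
```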
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
# Get the people in the IT department whose salary is greater than 600.
info <- subset(data, salary > 600 & dept == "IT")
print(info)
When we execute the code above, it produces the following results -
  id     name salary start_date dept
1  1     Rick  623.3 2012-01-01   IT
3  3 Michelle  611.0 2014-11-15   IT
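subset() also takes a select argument that keeps only the listed columns. The sketch below (inlined data, not from the original tutorial) returns just the names and salaries for the same filter:

```r
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Same filter as above, but keep only the name and salary columns.
info <- subset(data, salary > 600 & dept == "IT", select = c(name, salary))
print(info)
```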
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
# Get the people who joined after January 1, 2014.
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
print(retval)
When we execute the code above, it produces the following results -
  id     name salary start_date    dept
3  3 Michelle 611.00 2014-11-15      IT
4  4     Ryan 729.00 2014-05-11      HR
5 NA     Gary 843.25 2015-03-27 Finance
8  8     Guru 722.50 2014-06-17 Finance
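The example above converts start_date again on every comparison. A sketch of an alternative, not from the original tutorial: convert the column to Date once, then filter. Note that > excludes anyone who started exactly on 2014-01-01, so use >= if they should be included. format() with "%Y" extracts the year for a year-based filter.

```r
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

# Convert the column once; later comparisons work on Date values directly.
data$start_date <- as.Date(data$start_date)

# On or after 1 January 2014 (>= includes that day itself).
since_2014 <- subset(data, start_date >= as.Date("2014-01-01"))
print(since_2014)

# Only the people who started during 2014.
joined_2014 <- subset(data, format(start_date, "%Y") == "2014")
print(joined_2014)
```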
R can create a csv file from an existing data frame. The write.csv() function is used to create the csv file. This file is created in the working directory.
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
# Write filtered data into a new file.
write.csv(retval, "output.csv")
newdata <- read.csv("output.csv")
print(newdata)
When we execute the code above, it produces the following results -
  X id     name salary start_date    dept
1 3  3 Michelle 611.00 2014-11-15      IT
2 4  4     Ryan 729.00 2014-05-11      HR
3 5 NA     Gary 843.25 2015-03-27 Finance
4 8  8     Guru 722.50 2014-06-17 Finance
Column X here holds the row names of retval, which write.csv() writes to the file by default. It can be dropped by passing an additional parameter, row.names = FALSE, when writing the file.
# Read the CSV file into a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
# Write filtered data into a new file, without row names.
write.csv(retval, "output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)
When we execute the code above, it produces the following results -
  id     name salary start_date    dept
1  3 Michelle 611.00 2014-11-15      IT
2  4     Ryan 729.00 2014-05-11      HR
3 NA     Gary 843.25 2015-03-27 Finance
4  8     Guru 722.50 2014-06-17 Finance
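To try write.csv() without creating files in the working directory, you can write to a temporary file instead. In this sketch (an addition to the tutorial, with the data inlined from input.csv), tempfile() supplies the paths. Reading the files back shows the extra X column only when row names were written.

```r
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance"
data <- read.csv(text = csv_text)

with_rownames <- tempfile(fileext = ".csv")
without_rownames <- tempfile(fileext = ".csv")

write.csv(data, with_rownames)                        # default: row names written
write.csv(data, without_rownames, row.names = FALSE)  # row names omitted

print(names(read.csv(with_rownames)))     # starts with "X"
print(names(read.csv(without_rownames)))  # original columns only

# Without row names, reading the file back reproduces the data frame.
print(isTRUE(all.equal(read.csv(without_rownames), data)))
```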