Coding With Fun

R language CSV file


May 12, 2021 R language tutorial




In the R language, we can read data from files stored outside the R environment. We can also write data to files that will be stored and accessed by the operating system. The R language can read and write a variety of file formats, such as CSV, Excel, XML and so on.

In this chapter, we'll learn to read data from a CSV file and then write data to a CSV file. The file should exist in the current working directory so that R can read it. Of course, we can also set our own directory and read files from there.

Get and set the working directory

You can use getwd() to check which directory the R workspace points to. You can also use setwd() to set a new working directory.

# Get and print current working directory.
print(getwd())

# Set current working directory.
setwd("/web/com")

# Get and print current working directory.
print(getwd())

When we execute the code above, it produces the following results -

[1] "/web/com/1441086124_2016"
[1] "/web/com"

This result depends on your operating system and the directory where you are currently working.
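As a minimal sketch of reading a file from another directory without changing the working directory, a full path can be built with file.path(). The directory and the file name example.csv below are hypothetical; tempdir() is used only so that the example runs on any machine.

```r
# Hypothetical data directory; tempdir() stands in for a real folder.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "example.csv")

# Create a tiny file so that the example is self-contained.
writeLines(c("id,name", "1,Rick", "2,Dan"), csv_path)

# Check that the file exists before trying to read it.
if (file.exists(csv_path)) {
  example <- read.csv(csv_path)
  print(example)
}

# The working directory is not changed by any of the steps above.
print(getwd())
```

Building the path with file.path() avoids hard-coding a path separator, which differs between operating systems.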

Input as a CSV file

A CSV file is a text file in which the values in each row are separated by commas. Let's consider the following data in a file called input.csv.

You can create this file by copying and pasting this data into Windows Notepad. Save the file as input.csv using the Save As All Files (*.*) option in Notepad.

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
 ,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
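If a text editor is not at hand, the same file can also be created from R itself. The following is only a sketch of one possible way, using writeLines(); it writes input.csv into the current working directory and overwrites any existing file with that name.

```r
# The same rows as listed above, one string per line of the file.
# The id of the fifth record is left blank on purpose, as in the data above.
csv_lines <- c(
  "id,name,salary,start_date,dept",
  "1,Rick,623.3,2012-01-01,IT",
  "2,Dan,515.2,2013-09-23,Operations",
  "3,Michelle,611,2014-11-15,IT",
  "4,Ryan,729,2014-05-11,HR",
  " ,Gary,843.25,2015-03-27,Finance",
  "6,Nina,578,2013-05-21,IT",
  "7,Simon,632.8,2013-07-30,Operations",
  "8,Guru,722.5,2014-06-17,Finance"
)

# Write the lines to input.csv in the current working directory.
writeLines(csv_lines, "input.csv")

# Confirm that the file now exists.
print(file.exists("input.csv"))
```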

Read the CSV file

The following is a simple example of the read.csv() function, which reads a CSV file available in the current working directory -

data <- read.csv("input.csv")
print(data)

When we execute the code above, it produces the following results -

      id    name     salary    start_date      dept
1      1    Rick     623.30    2012-01-01      IT
2      2    Dan      515.20    2013-09-23      Operations
3      3    Michelle 611.00    2014-11-15      IT
4      4    Ryan     729.00    2014-05-11      HR
5     NA    Gary     843.25    2015-03-27      Finance
6      6    Nina     578.00    2013-05-21      IT
7      7    Simon    632.80    2013-07-30      Operations
8      8    Guru     722.50    2014-06-17      Finance

Analyze the CSV file

By default, the read.csv() function returns its output as a data frame. This can be easily checked as shown below. In addition, we can check the number of columns and rows.

data <- read.csv("input.csv")

print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

When we execute the code above, it produces the following results -

[1] TRUE
[1] 5
[1] 8
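Beyond the row and column counts, it is often useful to look at the column names and types that read.csv() inferred. The sketch below uses the text argument with a shortened, inline copy of the data so that it runs without input.csv; the variable name csv_text is only illustrative.

```r
# A shortened inline copy of the data, including the record with a blank id.
csv_text <- "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
 ,Gary,843.25,2015-03-27,Finance"

data <- read.csv(text = csv_text)

# Compact overview of every column: type and first values.
str(data)

# Column names and the class that was inferred for each column.
print(names(data))
print(sapply(data, class))

# Count the missing values in the id column (the blank field is read as NA).
print(sum(is.na(data$id)))
```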

Once the data is read into a data frame, we can apply all the functions that work on data frames, as shown in the following sections.

Get the highest salary

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)

When we execute the code above, it produces the following results -

[1] 843.25
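The salary column has no missing values, so max() works directly. For a column that does contain NA, such as id in this data, max() returns NA unless na.rm = TRUE is passed. A small sketch with inline data, where the three rows are chosen only for illustration:

```r
# Inline data with one missing id, written as NA.
data <- read.csv(text = "id,salary
1,623.3
NA,843.25
3,611")

# Without na.rm, a single missing value makes the result NA.
print(max(data$id))

# With na.rm = TRUE, missing values are ignored.
print(max(data$id, na.rm = TRUE))

# which.max() returns the position of the largest value.
print(which.max(data$salary))
```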

Get details of the person with the highest salary

We can get the rows that meet a specific filter condition, similar to a SQL WHERE clause.

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$salary)

# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)

When we execute the code above, it produces the following results -

      id    name  salary  start_date    dept
5     NA    Gary  843.25  2015-03-27    Finance
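The same row can also be selected without subset(), using ordinary data frame indexing. This is only an alternative sketch with a shortened inline copy of the data; the variable names top and alt are illustrative.

```r
# Shortened inline copy of the data.
data <- read.csv(text = "id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance")

# Logical indexing: keep the rows where salary equals the maximum.
# The trailing comma selects all columns.
top <- data[data$salary == max(data$salary), ]
print(top)

# which.max() gives the index of the first maximum, so this returns one row.
alt <- data[which.max(data$salary), ]
print(alt)
```

If several people share the highest salary, the logical index returns all of them, while which.max() returns only the first.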

Get information about all IT staff

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset( data, dept == "IT")
print(retval)

When we execute the code above, it produces the following results -

       id   name      salary   start_date   dept
1      1    Rick      623.3    2012-01-01   IT
3      3    Michelle  611.0    2014-11-15   IT
6      6    Nina      578.0    2013-05-21   IT

People in IT departments who are paid more than 600

# Create a data frame.
data <- read.csv("input.csv")

info <- subset(data, salary > 600 & dept == "IT")
print(info)

When we execute the code above, it produces the following results -

       id   name      salary   start_date   dept
1      1    Rick      623.3    2012-01-01   IT
3      3    Michelle  611.0    2014-11-15   IT
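Conditions in subset() can be combined in other ways as well. The following sketch, on a shortened inline copy of the data, shows the | operator, the %in% operator, and the select argument for keeping only some columns; the variable names are illustrative.

```r
# Shortened inline copy of the data.
data <- read.csv(text = "name,salary,dept
Rick,623.3,IT
Dan,515.2,Operations
Nina,578,IT
Ryan,729,HR")

# Rows whose department is one of several values.
it_or_hr <- subset(data, dept %in% c("IT", "HR"))
print(it_or_hr)

# Rows that satisfy at least one of two conditions.
high_or_hr <- subset(data, salary > 600 | dept == "HR")
print(high_or_hr)

# Both conditions must hold, and only two columns are kept.
well_paid_it <- subset(data, salary > 600 & dept == "IT",
                       select = c(name, salary))
print(well_paid_it)
```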

People who joined in 2014 or later

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset(data, as.Date(start_date) >= as.Date("2014-01-01"))
print(retval)

When we execute the code above, it produces the following results -

       id   name     salary   start_date    dept
3      3    Michelle 611.00   2014-11-15    IT
4      4    Ryan     729.00   2014-05-11    HR
5     NA    Gary     843.25   2015-03-27    Finance
8      8    Guru     722.50   2014-06-17    Finance
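When several date filters are needed, the start_date column can be converted to the Date class once instead of calling as.Date() inside every filter. A minimal sketch on a shortened inline copy of the data, using >= so that 1 January 2014 itself would be included:

```r
# Shortened inline copy of the data.
data <- read.csv(text = "name,start_date
Rick,2012-01-01
Michelle,2014-11-15
Ryan,2014-05-11
Gary,2015-03-27")

# Convert the column once; the dates are in the default YYYY-MM-DD format.
data$start_date <- as.Date(data$start_date)
print(class(data$start_date))

# Date columns can be compared directly with other Date values.
recent <- subset(data, start_date >= as.Date("2014-01-01"))
print(recent)

# Extract the year of each remaining start date as text.
print(format(recent$start_date, "%Y"))
```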

Write to the CSV file

The R language can create a CSV file from an existing data frame. The write.csv() function is used to create the CSV file. This file is created in the working directory.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")
print(newdata)

When we execute the code above, it produces the following results -

  X      id   name      salary   start_date    dept
1 3      3    Michelle  611.00   2014-11-15    IT
2 4      4    Ryan      729.00   2014-05-11    HR
3 5     NA    Gary      843.25   2015-03-27    Finance
4 8      8    Guru      722.50   2014-06-17    Finance

Column X here comes from the row names of the data frame, which write.csv() writes by default. It can be dropped with an additional parameter when writing to the file.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)

When we execute the code above, it produces the following results -

      id    name      salary   start_date    dept
1      3    Michelle  611.00   2014-11-15    IT
2      4    Ryan      729.00   2014-05-11    HR
3     NA    Gary      843.25   2015-03-27    Finance
4      8    Guru      722.50   2014-06-17    Finance
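To see what row.names = FALSE changes in the file itself, the raw lines can be printed with readLines(). The sketch below uses a small hand-built data frame and a temporary file, both chosen only for illustration, so that it does not touch output.csv.

```r
# A small data frame standing in for the filtered result.
retval <- data.frame(
  id = c(3, 4),
  name = c("Michelle", "Ryan"),
  salary = c(611, 729)
)

# Temporary file path, used so that nothing in the working directory changes.
out_path <- tempfile(fileext = ".csv")

# Write without row names, then inspect the raw text of the file.
write.csv(retval, out_path, row.names = FALSE)
print(readLines(out_path))

# Read the file back and confirm that the column names are unchanged.
newdata <- read.csv(out_path)
print(identical(names(newdata), names(retval)))

# Remove the temporary file.
unlink(out_path)
```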