May 12, 2021 R language tutorial
Data is stored in a normalized format by a relationship database system. T
herefore, to perform statistical calculations, we will need very advanced and complex Sql queries. B
ut the R language can easily connect to many relationship databases, such as MySql, Oracle, Sql servers, and so on, and get records from them as data frames.
Once the data is available in the R-language environment, it becomes a normal R-language dataset and can be operated or analyzed using all powerful packages and functions.
In this tutorial, we'll use MySql as a reference database to connect to the R language.
The R language has a built-in package called RMySQL that provides a local connection to the MySql database. Y ou can install this package in the R-language environment using the following commands.
install.packages("RMySQL")
Once the package is installed, we create a connection object in R to connect to the database. I t uses a user name, password, database name, and host name as input.
# Create a connection Object to MySQL database. # We will connect to the sampel database named "sakila" that comes with MySql installation. mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost') # List the tables available in this database. dbListTables(mysqlconnection)
When we execute the code above, it produces the following results -
[1] "actor" "actor_info" [3] "address" "category" [5] "city" "country" [7] "customer" "customer_list" [9] "film" "film_actor" [11] "film_category" "film_list" [13] "film_text" "inventory" [15] "language" "nicer_but_slower_film_list" [17] "payment" "rental" [19] "sales_by_film_category" "sales_by_store" [21] "staff" "staff_list" [23] "store"
We can use the function dbSendQuery() to query the database table in MySql. T he query is executed in MySql and returns the result set using the R language fetch() function. Finally, it is stored as a data frame in the R language.
# Query the "actor" tables to get all the rows. result = dbSendQuery(mysqlconnection, "select * from actor") # Store the result in a R data frame object. n = 5 is used to fetch first 5 rows. data.frame = fetch(result, n = 5) print(data.frame)
When we execute the code above, it produces the following results -
actor_id first_name last_name last_update 1 1 PENELOPE GUINESS 2006-02-15 04:34:33 2 2 NICK WAHLBERG 2006-02-15 04:34:33 3 3 ED CHASE 2006-02-15 04:34:33 4 4 JENNIFER DAVIS 2006-02-15 04:34:33 5 5 JOHNNY LOLLOBRIGIDA 2006-02-15 04:34:33
We can pass any valid select query to get the results.
result = dbSendQuery(mysqlconnection, "select * from actor where last_name = 'TORN'") # Fetch all the records(with n = -1) and store it as a data frame. data.frame = fetch(result, n = -1) print(data)
When we execute the code above, it produces the following results -
actor_id first_name last_name last_update 1 18 DAN TORN 2006-02-15 04:34:33 2 94 KENNETH TORN 2006-02-15 04:34:33 3 102 WALTER TORN 2006-02-15 04:34:33
We can update the rows in the Mysql table by passing the update query to the dbSendQuery() function.
dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")
After executing the code above, we can see the table updated in the MySql environment.
dbSendQuery(mysqlconnection, "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb) values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)" )
After executing the code above, we can see the rows in the table inserted into the MySql environment.
We can create tables in MySql using the function dbWriteTable(). I f the table already exists, it overrides the table and uses the data frame as input.
# Create the connection object to the database where we want to create the table. mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost') # Use the R data frame "mtcars" to create the table in MySql. # All the rows of mtcars are taken inot MySql. dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)
After executing the code above, we can see the tables created in the MySql environment.
We can delete the table in the MySql database and pass the drop table statement to dbSendQuery(), just as we use it to query the data in the table.
dbSendQuery(mysqlconnection, 'drop table if exists mtcars')
After executing the code above, we can see that the table was deleted in the MySql environment.