Getting Data into (and Out of) R

How do we get data into R? The first method, and sometimes the simplest, is to type the data in. This works well for small data sets. You can also read raw data from a data file using read.table(). There are several helper functions for reading delimited data as well as fixed-length fields; the scan() function permits reading fields of variable length. You can also read data in from a database. The DBI, ODBC, and JDBC interfaces are supported through add-on packages, and drivers are available for most popular databases.

Once data is in, there are several ways to save it from an R workspace. You can save the entire workspace and restore it in a later session. You can also write an R data object (usually a data frame) as a text file with field delimiters. Finally, you can save an R object or objects as a binary file, which can be loaded back into another session. R also allows you to specify a particular output device, which is the standard way to save a graph or a plot. RStudio allows you to save the graph as an image (.jpg, etc.) directly from the plot window.

Data can always be created by typing in values. For example, the vector assignment v <- c(1:10) creates a vector of 10 elements holding the integers 1 through 10. More complicated data structures can be created by composing them from a group of other data structures. Alternatively, first create an empty data structure and fill it in via the editor, or cut and paste from external files. The R script editor allows tweaking of input, and is easier than editing keystrokes in the console window. Remember that we can transform from one object type to another, so we could read data in as a matrix and use the as.data.frame() function to create the data frame.
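The ideas above can be sketched in a few lines. The column names and values below are invented for illustration only.

```r
# Type a small vector directly; 1:10 already yields a vector, so c() is optional.
v <- 1:10

# Compose a data frame from several vectors (hypothetical column names).
scores <- data.frame(
  id    = 1:3,
  name  = c("Ann", "Bob", "Cy"),
  score = c(90.5, 82.0, 77.25),
  stringsAsFactors = FALSE
)

# Start with a matrix, then convert it to a data frame.
m  <- matrix(1:6, nrow = 2, ncol = 3)
df <- as.data.frame(m)   # columns receive the default names V1, V2, V3

# In an interactive session, an empty structure can be filled in by hand
# with the spreadsheet-like editor:
# blank <- edit(data.frame())
```

The edit() call is left commented out because it opens an interactive editor and does not work in a non-interactive script.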

R can read data in many different formats. The read.table() function is the most used, and it has several helper functions: read.csv() and read.delim() for delimited files, and read.fwf() for fixed-length fields. Import functions also exist for data from SPSS, SAS, Systat, and other statistical packages. The file name argument to read.table() can also be a URL, which is useful for reading a data file from the Internet. Consult the help subsystem (help(read.table)) for more options. R accepts a forward slash "/" as the separator character in full pathnames on every platform, including Windows, where a doubled backslash "\\" also works. A file in your documents directory in Windows would be written as "C:/users/janedoe/My Documents/Newscript.R". This makes script files somewhat more portable at the expense of some initial confusion on the part of Windows users.
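As a minimal sketch of the reading functions named above, the following writes two tiny files to a temporary location and reads them back. The file contents and column names are made up so that the example is self-contained.

```r
# Write a small comma-delimited file to a temporary path.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,90.5", "2,Bob,82"), csv_path)

# read.table() needs the header and separator spelled out ...
a <- read.table(csv_path, header = TRUE, sep = ",")
# ... while read.csv() presets them for comma-separated files.
b <- read.csv(csv_path)

# Fixed-width fields: widths gives the number of characters per column.
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("001Ann", "002Bob"), fwf_path)
f <- read.fwf(fwf_path, widths = c(3, 3), col.names = c("id", "name"))

# scan() reads free-form values; here from a string rather than a file.
x <- scan(text = "3.1 4.1 5.9", quiet = TRUE)

# Show the temporary path written with forward slashes.
normalizePath(csv_path, winslash = "/")
```

The same calls accept a URL in place of csv_path when the file lives on the Internet.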

 

Getting Data Out of R

R utilizes a workspace that consists of a collection of data objects, code libraries, and named data sets. Each workspace also supports multiple environments, although we won't address this issue further (see the R reference manual for more details).

R libraries that are not automatically loaded can be loaded into the workspace via the library() function. Data sets bundled with a package can be loaded via the data() function, and previously saved data files via the load() command.
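A short illustration, assuming a standard R installation: the tools package ships with R but is not attached at startup, and mtcars is one of the data sets in the bundled datasets package.

```r
# Attach a package that ships with R but is not attached by default.
library(tools)

# Load a bundled data set into the workspace by name.
data(mtcars)
head(mtcars, 3)

# List what is currently attached to the search path.
search()
```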

Packages that are not part of the standard distribution can be obtained via the install.packages("<packagename>") command (note the use of double quotes). Data objects in R can be exported either as a .csv file or in native format with save(<object name> …, file="<full file path name>"), usually with an .RData extension, and then reloaded into the R workspace via load(file="full path name"). This will repopulate the workspace with that object or objects.
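The two export routes can be sketched as a round trip. The data frame and the temporary file paths are placeholders chosen for this example.

```r
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Export as delimited text; row.names = FALSE keeps the row numbers
# from being written as an extra column.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
back <- read.csv(csv_path)

# Export in R's native binary format, remove the object, then restore it.
rdata_path <- tempfile(fileext = ".RData")
save(df, file = rdata_path)
rm(df)
restored <- load(rdata_path)   # load() returns the names of the objects it restored
```

Note that load() restores objects under their original names, so it silently overwrites any object in the workspace that has the same name.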

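If you choose to save your R workspace, it can be reloaded automatically when R is restarted. A workspace can be saved explicitly with the save.image() command, and other saved workspaces can be loaded into R with the load() command. Lastly, plots can be saved to a file by opening a graphics device such as png(), jpeg(), or pdf(), drawing the plot, and closing the device with dev.off(); the savePlot() command is also available on some platforms. Most platforms will allow .jpg and .png, but check your local R documentation for your particular platform.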
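A minimal sketch of both steps follows, using temporary file paths as placeholders. It uses the pdf() device because that device is available in every R build; png() and jpeg() follow the same open, draw, close pattern where the platform supports them.

```r
# Save every object in the workspace, then restore it with load().
x <- c(2, 4, 6)
image_path <- tempfile(fileext = ".RData")
save.image(file = image_path)
rm(x)
load(image_path)   # x is back in the workspace

# Save a plot by opening a file device, drawing, and closing the device.
plot_path <- tempfile(fileext = ".pdf")
pdf(file = plot_path, width = 6, height = 4)   # size in inches
plot(1:10, (1:10)^2, main = "Example plot")
dev.off()   # closing the device finalizes the file
```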