J Entering and Saving Data

by Doc P, 10 Jun 2020



(Marin video on this topic: R tutorial 1.3)

If our data set is small it is quite simple to type the numbers in to the R console to do our analysis. With a larger data set, especially one we might want to come back to later, it is a good idea to enter the data into a spreadsheet program such as Excel and load it into R.

There are several ways to set up your spreadsheet file but the simplest is to save it as a .csv or comma separated value file. For a simple data set where we are measuring only one thing a single column of information should be entered into your spreadsheet. For example, if we have heart rate data on 10 subjects, we would enter it as follows:

Heart_Rate

62

83

71

68

63

80

74

79

65

69

This can make for a long string of numbers if we have a large data set, but is much easier to load into R than if we had several columns.

Try entering those numbers into your spreadsheet program.

Now save the file as a .csv file (Save As and choose csv, if you are using Excel). I named my file Heart_Rate.csv but you can use any name you want. Save the file to the desktop.

Start R and set your directory to the desktop.

Now type the following code into R:

HR <- read.csv(“HeartRate.csv”)

Type HR to list the data. It should look like this:

HR
## # A tibble: 10 x 1
##    Heart_Rate
##         <dbl>
##  1         62
##  2         83
##  3         71
##  4         68
##  5         63
##  6         80
##  7         74
##  8         79
##  9         65
## 10         69

There are 10 observations of one variable (Heart_Rate). Note that R assumes the first entry is a “Header”, that is, the name of the variable, and does not count it as a value. There is a way to tell R that you do not have a header, but it’s best just to include one.

If we want to add another variable to our observations that is quite simple. Reopen your spreadsheet in Excel and add the following information in column B:

Weight

147

163

158

175

162

134

129

158

147

162

Save the file to the desktop again. Now type the following into R: HR1<-read.csv(“HeartRate1.csv”)

Type HR1 to see the file.

HR1
## # A tibble: 10 x 2
##    Heart_Rate Weight
##         <dbl>  <dbl>
##  1         62    147
##  2         83    163
##  3         71    158
##  4         68    175
##  5         63    162
##  6         80    134
##  7         74    129
##  8         79    158
##  9         65    147
## 10         69    162

We now have 10 observations of 2 variables.

Now type summary(HR1) and you will see the following summary information for the data.

summary(HR1)
##    Heart_Rate        Weight     
##  Min.   :62.00   Min.   :129.0  
##  1st Qu.:65.75   1st Qu.:147.0  
##  Median :70.00   Median :158.0  
##  Mean   :71.40   Mean   :153.5  
##  3rd Qu.:77.75   3rd Qu.:162.0  
##  Max.   :83.00   Max.   :175.0

Now try the following:

summary(Weight)

This will return an error message “object Weight not found”. This is because R does not know about a variable “Weight” only the data set named HR1. There are several ways around this problem,but the easiest is simply to “attach” the data set HR1.

Type the following into R:

attach(HR1)

Then type:

summary(Weight)

which should return the same information we saw previously for the variable “Weight”.

attach(HR1)
## The following object is masked from HR:
## 
##     Heart_Rate
summary(Weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   129.0   147.0   158.0   153.5   162.0   175.0

Attaching the data set HR1 gives R access to the individual variables named by the headers in the data set. Again, there are other ways to do this but, for our purposes, attaching the data set is the simplest.

There are other ways to save data to use in R but for us the .csv file will be our standard.

If you do not have a spreadsheet program you can create a csv file in a text editor or even in a word processing program if you save it as a text file. Our Heart Rate and Weight data would look like this as a text file with commas as separators:

Heart_Rate, Weight

62, 147

83, 163

71, 158

68, 175

63, 162

80, 134

74, 129

79, 158

65, 147

69, 162

If you are using RStudio (and you should be) you can even enter your data in the text pane (upper left panel in the standard configuration), save it and then import it using the Import Dataset command in the Environment pane.

The easiest way to deal with large data sets, however, is to use a spreadsheet program. If you don’t have Excel or a similar spreadsheet program there are free programs available for download from the web.