G Additional Information on Graphing
by Doc P, 10 Jun 2020
Many other types of graphs can be created within R. Without getting too detailed, here are some notes on several types you might want to use and some of the options that can be applied. If you would like to see examples, each section is keyed to the Marin video that addresses that type of plot. Marin uses the LungCapData for his examples. It is probably best to follow the videos so you can see the results, as the sample code below is a bit cryptic because it is not keyed to a particular data set. It is, however, a good reference to remind you of all that can be done to modify the various types of visual presentations.
Bar Charts (2.1)
Bar charts are useful for categorical variables. “variable” in the statement below is the variable we want to plot. First, we need to put it in a table so R can draw the plot. “myvariable1” is just the name I chose for the table.
myvariable1 <- table(variable)
barplot(myvariable1)
To get relative frequency (instead of just a count), use
table(variable)/number
where number is the total number of observations:
myproportion <- table(variable)/number
barplot(myproportion)
main, xlab, ylab, and las can all be used as was shown in the previous crib sheet.
names.arg = c(what names you want) can also be added if you want to change the default labels under the bars.
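As a minimal sketch, here are the bar chart commands above applied to real data. It assumes mtcars, a data set that ships with R, purely as a stand-in for your own data (it is not the LungCapData used in the videos). The names cyl_counts and cyl_props are just illustrative.

```r
# mtcars is built into R; it stands in for your own data (an assumption).
cyl_counts <- table(mtcars$cyl)                 # count of cars per cylinder category
cyl_props  <- cyl_counts / length(mtcars$cyl)   # relative frequency = count / total

barplot(cyl_counts)                             # plain count bar chart

# Same chart as proportions, with titles, horizontal axis labels, and custom bar names
barplot(cyl_props,
        main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Proportion",
        las = 1,
        names.arg = c("Four", "Six", "Eight"))
```

Dividing by length() of the variable saves you from typing the number of observations by hand.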
Pie chart (2.1)
Pie charts are not often used in statistical analysis but are common in news reports.
First, create a table:
variabletab <- table(variable)
then plot the pie chart of variabletab
pie(variabletab)
will produce a pie chart of the variable named. You can add a box around the plot with
box()
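A short sketch of the pie chart steps, again assuming the built-in mtcars data as a stand-in (gear_tab is an illustrative name):

```r
# mtcars is built into R; it stands in for your own data (an assumption).
gear_tab <- table(mtcars$gear)          # counts of cars by number of forward gears
pie(gear_tab, main = "Forward gears")   # one slice per category
box()                                   # draw a box around the finished plot
```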
Boxplots (2.2 and 2.3)
Boxplots are used for numeric variables.
boxplot(numeric variable)
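quantile(numeric variable, probs=c(0, 0.25, 0.5, 0.75, 1))
will return the values for the components of the plot: the minimum, the lower limit of the box, the median, the upper limit of the box, and the maximum. The minimum and maximum match the ends of the whiskers only when the plot shows no outliers, and R draws the box from hinges, which can differ slightly from these quartiles.
main, xlab, ylab, and las can all be used as was shown previously.
ylim will reset the limits for the y variable:
ylim = c(lower limit, upper limit)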
boxplot(Var1~Var2)
will produce a box plot of Var1 broken down by Var2
boxplot(Var1[Var2=="One thing"], Var1[Var2=="The other thing"])
will do the same thing when Var2 takes only those two values.
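To make the boxplot commands concrete, here is a minimal sketch that assumes the built-in mtcars data as a stand-in for your own. In mtcars, am is coded 0 for automatic and 1 for manual transmissions; mpg_q is an illustrative name.

```r
# mtcars is built into R; it stands in for your own data (an assumption).
boxplot(mtcars$mpg, main = "Miles per gallon", ylab = "mpg",
        ylim = c(0, 40), las = 1)

# Minimum, quartiles, median, and maximum of the plotted variable
mpg_q <- quantile(mtcars$mpg, probs = c(0, 0.25, 0.5, 0.75, 1))
print(mpg_q)

# mpg broken down by transmission type, using the formula form
boxplot(mtcars$mpg ~ mtcars$am)

# The same comparison by subsetting, with readable group names
boxplot(mtcars$mpg[mtcars$am == 0], mtcars$mpg[mtcars$am == 1],
        names = c("Automatic", "Manual"))
```

The formula form is usually less typing, and it scales to any number of groups without listing each subset.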
Histograms (2.4)
Histograms are used for numeric variables
hist(Variable)
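hist(Variable, freq = F)
will plot density instead of counts (the areas of the bars add up to 1) as will
hist(Variable, prob = T)
ylim = c(startvalue, endvalue)
to set y limits
breaks=N
for number of break points or bins (R treats a single number as a suggestion and may pick a nearby value)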
breaks=c(a,b,c,…n)
will break at specific points
breaks=seq(from=begin number, to=end number, by= increment)
is another way to set the break points
lines(density(Variable))
adds a density plot. NOTE: the y axis must be Density, not Frequency, so draw the histogram with freq = F first.
col, lwd, main, xlab, ylab can all be added also.
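The histogram options above can be combined as in the following sketch. It assumes the built-in mtcars data as a stand-in for your own, and the break points, limits, and colors are arbitrary choices for illustration.

```r
# mtcars is built into R; it stands in for your own data (an assumption).
h <- hist(mtcars$mpg,
          freq = FALSE,                             # density scale, needed for the curve below
          breaks = seq(from = 10, to = 35, by = 5), # explicit break points covering the data
          ylim = c(0, 0.1),
          main = "Miles per gallon", xlab = "mpg",
          col = "lightgray")

# Overlay a density curve on the same (density) scale
lines(density(mtcars$mpg), col = "blue", lwd = 2)
```

hist() also returns the bin counts and densities, so saving the result (here in h) lets you check exactly what was drawn.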
Stem and Leaf (2.5)
Stem and Leaf plots are used for numeric variables. (I prefer to create them by hand, as they are best used for relatively small data sets and getting R to do them just the way you want can be difficult.)
stem(variable, scale = number)
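A quick sketch of stem(), assuming the built-in mtcars data as a stand-in for your own. The scale argument controls the length of the display, so a larger value spreads the same data over more stems.

```r
# mtcars is built into R; it stands in for your own data (an assumption).
stem(mtcars$mpg)              # default display
stem(mtcars$mpg, scale = 2)   # longer display with more stems
```

The output is printed as text in the console rather than drawn in the plot window.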
Bar Charts and Mosaic Plots (2.6)
Bar charts and mosaic plots are used when you have 2 categorical variables.
First, you need to create a table for the two variables:
Table1 <- table(Var1, Var2)
barplot(Table1)
produces a stacked barplot (this is the default)
barplot(Table1, beside = T)
produces side by side bar plots
barplot(Table1, beside = T, legend.text = T)
for default legend
barplot(Table1, beside = T, legend.text = c("Name1", "Name2"))
produces a custom legend
main, xlab, ylab, and las may also be added to the command with the usual results
col may also be added for color
mosaicplot(Table1)
produces a mosaic plot
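main, xlab, ylab, and las may also be added here; beside and legend.text apply only to barplot, and the color argument of mosaicplot is named color.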
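Here is a minimal sketch of both plots for two categorical variables. It assumes the built-in mtcars data as a stand-in for your own; cyl_by_am, the labels, and the gray shades are illustrative choices.

```r
# mtcars is built into R; it stands in for your own data (an assumption).
# Rows are cylinder counts (4, 6, 8); columns are transmission (0 = automatic, 1 = manual).
cyl_by_am <- table(mtcars$cyl, mtcars$am)

# Side-by-side bars: one cluster per column, one bar per row
barplot(cyl_by_am, beside = TRUE,
        legend.text = c("4 cyl", "6 cyl", "8 cyl"),
        names.arg = c("Automatic", "Manual"),
        main = "Cylinders by transmission",
        col = c("gray30", "gray60", "gray90"))

# Mosaic plot of the same table
mosaicplot(cyl_by_am,
           main = "Cylinders by transmission",
           xlab = "Cylinders", ylab = "Transmission",
           color = c("gray40", "gray80"))
```

The legend entries follow the rows of the table, so the order of the variables in table() decides which one ends up in the legend.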
Scatterplots (2.7)
Scatterplots are used when you have two quantitative variables
plot(Var1, Var2)
Var1 goes on the x axis, Var2 on the y axis
xlab, ylab, and main may be added
range of the variables may be altered with xlim and ylim (FORM: xlim=c(lower.value, upper.value))
size of plotting character can be changed with cex argument
plotting character can be changed with pch argument
col can be added
abline(lm(Var2 ~ Var1))
will add the regression line (lm takes a formula, with the y variable first)
col may be specified for the line color
lty and lwd for line type and line width, may also be specified
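The scatterplot options can be put together as in this sketch, which assumes the built-in mtcars data as a stand-in for your own. The axis limits, plotting character, and colors are arbitrary choices, and fit is an illustrative name.

```r
# mtcars is built into R; it stands in for your own data (an assumption).
plot(mtcars$wt, mtcars$mpg,                 # weight on the x axis, mpg on the y axis
     main = "Fuel economy by weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     xlim = c(1, 6), ylim = c(10, 35),
     cex = 1.2, pch = 16, col = "steelblue")

# Fit mpg as a function of weight: the formula puts y first, then x
fit <- lm(mpg ~ wt, data = mtcars)

# Add the fitted regression line as a dashed, thicker red line
abline(fit, col = "red", lty = 2, lwd = 2)
```

Saving the model in its own object keeps the regression available for later use, for example with summary(fit).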