1.1 Analyzing Categorical Data
c. Pie Charts & Bar Charts
Download the .rmd file, which you can run yourself in your installation of R, here.
In this tutorial, we’re going to look at two ways of visually representing categorical data in R: a pie chart and a bar chart.
First, we’ll load up a dataset that’s actually built right into R’s platform: mtcars.
#importing the data
attach(mtcars)
Let’s check out what mtcars includes:
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
As you can see, this dataset has a number of variables that can be measured on cars. The one we’ll focus on first is gear, which tells us the number of gears each car in the dataset has. Just to get a handle on what we can expect, let’s look at a frequency table of gear, using the $ operator to tell R that we want to look within the mtcars dataset:
table(mtcars$gear)
##
## 3 4 5
## 15 12 5
This (very basic) table tells us that there are 15 cars with 3 gears, 12 with 4 gears, and 5 with 5 gears in our dataset.
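The counts in this table can also be expressed as shares of the whole, which is what a pie chart ultimately displays. Here is a minimal sketch, assuming we store the table in a variable called gear_counts (a name introduced here purely for illustration):

```r
# Frequency table of the number of gears, as above
gear_counts <- table(mtcars$gear)

# prop.table() divides each count by the total number of cars (32)
gear_props <- prop.table(gear_counts)

# Express the proportions as percentages, rounded to one decimal place
round(100 * gear_props, 1)
```

The proportions always add up to 1, so they are a quick sanity check that no category was left out.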
First, we’ll create a pie chart to summarize this distribution:
#pie chart with the pie() function
slices = c(15, 12, 5)
labels.pie = c("3 gears", "4 gears", "5 gears")
pie(slices,
    labels=labels.pie)
This does the trick!
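Typing the counts in by hand works for a small table, but it is easy to mistype a number. As a sketch of one alternative, the slices and labels can be pulled straight from the frequency table. The names slices_auto and labels_auto, the percentage labels, and the chart title are all additions for illustration, not part of the chart above:

```r
# Build the slices from the data instead of typing them in
gear_counts <- table(mtcars$gear)
slices_auto <- as.vector(gear_counts)

# Percentage of cars in each category, rounded to whole numbers
pct <- round(100 * slices_auto / sum(slices_auto))

# names(gear_counts) holds the category values "3", "4", and "5"
labels_auto <- paste0(names(gear_counts), " gears (", pct, "%)")

pie(slices_auto,
    labels = labels_auto,
    main = "Cars by Number of Gears")
```

Because everything is computed from mtcars, the chart would update on its own if the data changed.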
If you want a slightly better-looking pie chart, we can create one using the ggplot2 package. First, install the package using the following chunk. I have commented out (with a #) the line that will install the package for you, so just add it back in by removing the # and run the chunk!
#install.packages("ggplot2")
Now, we can use the ggplot2 package to create a slightly nicer pie chart. This function has a bunch of arguments, which I’ll break down for you.
#ggplot2 pie chart
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
##
## Attaching package: 'ggplot2'
## The following object is masked from 'mtcars':
##
## mpg
pieframe = data.frame(slices, labels.pie)
ggplot(pieframe, aes(x="", y=slices, fill=labels.pie)) +
  geom_bar(stat="identity", width=1) +
  coord_polar("y", start=0) +
  theme_void() +
  geom_text(aes(label = paste0(slices, " cars")), position = position_stack(vjust=0.5)) +
  labs(x = NULL, y = NULL, fill = NULL)
First, because the ggplot() function expects a data frame as input, we had to create a data frame called pieframe (combining the variables slices and labels.pie that we created in our earlier pie chart endeavor).
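To see what ggplot will be working with, it can help to print the data frame. The sketch below repeats the two vectors so it runs on its own. It also shows as.data.frame(), which converts a frequency table into a data frame directly; the name gearframe is introduced here for illustration, and Var1 and Freq are the default column names R assigns:

```r
slices <- c(15, 12, 5)
labels.pie <- c("3 gears", "4 gears", "5 gears")

# One row per category: a count column and a label column
pieframe <- data.frame(slices, labels.pie)
pieframe

# Alternative: let R build a similar data frame from the table itself
gearframe <- as.data.frame(table(mtcars$gear))
gearframe
```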
We pass ggplot the pieframe data, then identify some “aesthetics” through aes() that we want to include in the plot: namely, slices (which sets the size of each piece) and labels.pie (which sets its fill color). A full explanation of how ggplot uses aesthetics is beyond the scope of what we need to do here, although we may touch on this function again throughout the course.
We then tell ggplot to make us a geom_bar(). (Wait, what?? We’re telling it to make us a bar chart?) Sure, but we’re going to map the heights of the bars onto polar coordinates with coord_polar("y", start=0), so each value tells ggplot how far to rotate around a central point instead of how tall to make a bar. geom_bar() isn’t really meant to create a pie chart, but we are able to convince it to do it for us with this input!
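One way to see the trick is to draw the chart in two steps. This sketch assumes ggplot2 is installed and rebuilds pieframe so it runs on its own; the name stacked is introduced here for illustration. The first plot is a single stacked bar, and adding coord_polar() wraps that same bar around a circle:

```r
library(ggplot2)

pieframe <- data.frame(slices = c(15, 12, 5),
                       labels.pie = c("3 gears", "4 gears", "5 gears"))

# Step 1: one stacked bar whose segments have heights 15, 12, and 5
stacked <- ggplot(pieframe, aes(x = "", y = slices, fill = labels.pie)) +
  geom_bar(stat = "identity", width = 1)
print(stacked)

# Step 2: the same bar, with the y values mapped to angle instead of height
print(stacked + coord_polar("y", start = 0))
```

Comparing the two plots side by side shows that each slice of the pie is just one segment of the stacked bar.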
The rest of the pieces are just for making a pretty pie chart. theme_void() gets rid of the background grid and axes that ggplot includes by default, geom_text() places a label in the middle of each slice, and labs() blanks out the axis and legend titles. Again, because we won’t be using ggplot all the time, I’m intentionally glossing over some of the details here.
Similarly, we can craft a bar chart to present the data:
barplot(slices,
        names.arg=labels.pie,
        xlab = "Number of Gears",
        ylab = "Count")
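barplot() also accepts a frequency table directly, which avoids typing the counts by hand. Here is a small sketch; the variable name gear_counts is introduced here for illustration:

```r
# Frequency table of the number of gears
gear_counts <- table(mtcars$gear)

# Bar heights come from the counts; bar names come from the categories (3, 4, 5)
barplot(gear_counts,
        xlab = "Number of Gears",
        ylab = "Count")
```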
Alternatively, we can use ggplot again to create a bar chart:
#ggplot2 bar chart
library(ggplot2)
ggplot(data=pieframe, aes(x=labels.pie, y=slices)) +
  geom_bar(stat="identity") +
  labs(x="Number of Gears", y="Count")
Here, the ggplot arguments may be a bit easier to interpret (since we aren’t trying to convince geom_bar() to make us a pie chart this time!). As you can see, the x-axis shows the categorical variable labels.pie from our pieframe data, while the y-axis shows the count variable slices. We then create a geom_bar() that takes that information and crafts a bar chart out of it. We’ve also added labs() to label the axes.
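Two related shortcuts are worth knowing. geom_col() is ggplot2’s shorthand for geom_bar(stat="identity"), and geom_bar() with its default settings counts the rows in each category itself, so it can work from the raw mtcars data. This is a sketch that assumes ggplot2 is installed; the names p_col and p_count are introduced for illustration:

```r
library(ggplot2)

pieframe <- data.frame(slices = c(15, 12, 5),
                       labels.pie = c("3 gears", "4 gears", "5 gears"))

# Same chart as above, using geom_col() for counts we already computed
p_col <- ggplot(pieframe, aes(x = labels.pie, y = slices)) +
  geom_col() +
  labs(x = "Number of Gears", y = "Count")
print(p_col)

# Let ggplot do the counting; factor() tells it to treat gear as categorical
p_count <- ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar() +
  labs(x = "Number of Gears", y = "Count")
print(p_count)
```

The second version skips the hand-built data frame entirely, which is usually the more convenient route when the raw data is available.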