Data Visualize - R for Data Science

From R for Data Science

How many rows are in penguins? How many columns?

> glimpse(penguins)
Rows: 344
Columns: 8
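
The same counts can also be read off directly with base R instead of scanning the glimpse() output. This is a small sketch, assuming penguins is loaded from the palmerpenguins package; the variable names n_rows and n_cols are only illustrative.

```r
library(palmerpenguins)  # assumption: source of the penguins data frame

n_rows <- nrow(penguins)  # number of observations (rows)
n_cols <- ncol(penguins)  # number of variables (columns)

dim(penguins)             # both at once: rows first, then columns
```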

What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.

> bill_depth_mm
   a number denoting bill depth (millimeters)

Make a scatterplot of bill_depth_mm vs. bill_length_mm. That is, make a scatterplot with bill_depth_mm on the y-axis and bill_length_mm on the x-axis. Describe the relationship between these two variables.

> ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
+     geom_point(mapping = aes(color = species, shape = species)) +
+     geom_smooth(method = "lm") +
+     labs(title = "Bill Depth and Bill Length",
+          subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo",
+          x = "Bill Length (mm)", y = "Bill Depth (mm)",
+          color = "Species", shape = "Species") +
+     scale_color_colorblind()

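Across all penguins pooled together, bill depth tends to decrease as bill length increases: the fitted line slopes downward. The species colors tell a different story, though. Within each species the points trend upward, so longer bills tend to go with deeper bills.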
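
One way to check the relationship numerically is to compute the correlation once for all penguins together and once per species. The sketch below uses only base R and assumes penguins comes from the palmerpenguins package; overall_cor and species_cor are illustrative names.

```r
library(palmerpenguins)  # assumption: source of the penguins data frame

# Correlation for all penguins pooled together; rows with NA are dropped
overall_cor <- cor(penguins$bill_length_mm, penguins$bill_depth_mm,
                   use = "complete.obs")

# The same correlation computed separately for each species
species_cor <- sapply(
  split(penguins, penguins$species),
  function(d) cor(d$bill_length_mm, d$bill_depth_mm, use = "complete.obs")
)

overall_cor
species_cor
```

A negative pooled value next to positive per-species values would suggest that the downward line reflects differences between species rather than the relationship within any one species.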

What happens if you make a scatterplot of species vs. bill_depth_mm? What might be a better choice of geom?
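
With a categorical variable on one axis, geom_point() draws the points in a single strip per species. Many of them overlap, so the spread of bill depths is hard to read. A boxplot is one geom that summarizes each group's distribution more clearly. This is a minimal sketch, assuming ggplot2 and palmerpenguins are installed and placing species on the x-axis; p_box is an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# One box per species summarizes the distribution of bill depth.
# na.rm = TRUE drops rows with a missing bill depth without a warning.
p_box <- ggplot(data = penguins, mapping = aes(x = species, y = bill_depth_mm)) +
  geom_boxplot(na.rm = TRUE) +
  labs(x = "Species", y = "Bill Depth (mm)")

p_box
```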

Why does the following give an error and how would you fix it?

> ggplot(data = penguins) + 
+     geom_point()
Error in `geom_point()`:
! Problem while setting up geom.
ℹ Error occurred in the 1st layer.
Caused by error in `compute_geom_1()`:
! `geom_point()` requires the following missing aesthetics:
 x and y
Run `rlang::last_trace()` to see where the error occurred.

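The error occurs because geom_point() requires the x and y aesthetics and none are mapped. To fix it, add a mapping such as aes(x = bill_length_mm, y = bill_depth_mm) to ggplot() or to geom_point().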
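
A minimal sketch of a fix, assuming the intended plot is the bill length versus bill depth scatterplot from the earlier exercise (any two numeric columns would satisfy geom_point(); p_fixed is an illustrative name):

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Supplying x and y in the global mapping gives geom_point() the
# aesthetics it requires
p_fixed <- ggplot(data = penguins,
                  mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

p_fixed
```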

What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE.

If na.rm is set to TRUE, rows with missing values are removed silently. If it is set to FALSE, they are still removed, but with a warning. The default value is FALSE, so the warning messages appear:

> ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
+     geom_point(mapping = aes(color = species, shape = species), na.rm = FALSE) +
+     geom_smooth(method = "lm") +
+     labs(title = "Bill Depth and Bill Length",
+          subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo",
+          x = "Bill Length (mm)", y = "Bill Depth (mm)",
+          color = "Species", shape = "Species") +
+     scale_color_colorblind()
`geom_smooth()` using formula = 'y ~ x'
Warning messages:
1: Removed 2 rows containing non-finite values (`stat_smooth()`).
2: Removed 2 rows containing missing values (`geom_point()`).
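
For the case the exercise asks for, na.rm can be set to TRUE in geom_point(). This is a sketch under the same assumptions as the plots above (ggplot2 loaded, penguins from palmerpenguins); it leaves out the smoothing layer so that the only layer present is the one with na.rm set, and p_na is an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# na.rm = TRUE tells geom_point() to drop rows with a missing x or y
# value silently instead of warning about them
p_na <- ggplot(data = penguins,
               mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point(mapping = aes(color = species, shape = species), na.rm = TRUE)

p_na
```

The argument is set per layer, so a geom_smooth() layer added to this plot would need its own na.rm = TRUE to silence its warning as well.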

Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().

> ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = bill_depth_mm)) +
+     geom_point(mapping = aes(color = species, shape = species)) +
+     geom_smooth(method = "lm") +
+     labs(caption = "Data come from the palmerpenguins package.",
+          title = "Bill Depth and Bill Length",
+          subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo",
+          x = "Bill Length (mm)", y = "Bill Depth (mm)",
+          color = "Species", shape = "Species") +
+     scale_color_colorblind()

Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?

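bill_depth_mm should be mapped to the color aesthetic, and at the geom level inside geom_point(), so that only the points are colored and geom_smooth() draws a single curve.

> ggplot(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
+     geom_point(mapping = aes(color = bill_depth_mm)) +
+     geom_smooth() +
+     labs(caption = "Data is from the palmerpenguins package")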