From R for Data Science
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = "jitter")
The problem with the first plot is overplotting: many observations share the same cty and hwy values, so their points sit on top of one another and hide how the data is distributed.
The plot can be improved with `geom_point(position = "jitter")`, which adds a small amount of random noise to each point's position so that overlapping points separate and the distribution becomes visible.
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "identity")
The two charts are identical because "identity" is the default value of geom_point’s position argument.
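One way to confirm this without comparing the plots by eye is to look at the position object stored on the layer. This is a minimal sketch; it assumes the layer object exposes a `position` field, which is an internal detail of ggplot2 rather than documented API, and the variable names are only illustrative.

```r
library(ggplot2)

# Build the two layers without drawing anything.
default_layer  <- geom_point()
explicit_layer <- geom_point(position = "identity")

# Assumption: the layer keeps its position adjustment in the `position` field.
# Both calls should report the same position class.
class(default_layer$position)[1]
class(explicit_layer$position)[1]
```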
The width and height parameters of geom_jitter() control the amount of jittering.
From the documentation: width and height: Amount of vertical and horizontal jitter. The jitter is added in both positive and negative directions, so the total spread is twice the value specified here.
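To see the effect of these two parameters, the same data can be plotted with a small and a large amount of jitter. The values 0.1 and 1 below are arbitrary choices for illustration, not recommendations.

```r
library(ggplot2)

# Small jitter: each point moves at most 0.1 units in either direction,
# so points stay close to their true (cty, hwy) values.
p_small <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.1, height = 0.1)

# Large jitter: each point can move up to 1 unit in either direction,
# which separates points more but blurs their true positions.
p_large <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_jitter(width = 1, height = 1)

p_small
p_large
```

Because cty and hwy are whole numbers in mpg, the small setting keeps each cluster visibly centered on its integer value, while the large setting lets neighboring clusters blend together.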
Both geom_jitter() and geom_count() deal with overplotting, where multiple observations fall on the same location in the plot. geom_jitter() adds random noise to each point so that the individual observations become visible; geom_count() uses the sum stat to count the observations at each location and maps that count to point size, so larger circles mark locations with more observations.
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_jitter()
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_count()
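The counts that geom_count() draws can also be inspected directly. The sketch below uses layer_data() to pull out the data computed for the layer; the names `p_count` and `count_data` are only illustrative.

```r
library(ggplot2)

p_count <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_count()

# layer_data() returns the data frame computed for the first layer.
# For geom_count() this has one row per distinct (displ, hwy) location,
# with the number of observations at that location in column n.
count_data <- layer_data(p_count)

# Show the most crowded locations first.
head(count_data[order(-count_data$n), c("x", "y", "n")])

# Every observation is counted once, so the counts should add up to
# the number of rows in mpg.
sum(count_data$n) == nrow(mpg)
```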
The default position adjustment for geom_boxplot() is dodge2. From the documentation for position_dodge2(): dodging preserves the vertical position of a geom while adjusting the horizontal position, and position_dodge2() works with bars and rectangles, but is particularly useful for arranging box plots, which can have variable widths. When color is mapped to fl, the boxplots for the different fuel types within each class are dodged so that they sit side by side instead of overlapping.
ggplot(mpg, aes(x = class, y = hwy, color = fl)) +
  geom_boxplot()
ggsave("r-10-6-1-q5.png")
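For contrast, the same plot can be drawn with position = "identity", which turns dodging off so that the boxplots for each class are drawn at the same horizontal position. This is a sketch for comparison only; the final line relies on the layer's `position` field, which is an internal detail of ggplot2 and an assumption here.

```r
library(ggplot2)

# Default behavior: boxplots within each class are dodged side by side.
p_dodged <- ggplot(mpg, aes(x = class, y = hwy, color = fl)) +
  geom_boxplot()

# No dodging: boxplots within each class overlap at the same x position.
p_overlap <- ggplot(mpg, aes(x = class, y = hwy, color = fl)) +
  geom_boxplot(position = "identity")

p_dodged
p_overlap

# Assumption: the layer stores its position adjustment in `position`.
class(geom_boxplot()$position)[1]
```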