From R for Data Science
The z variable is the depth because its values are smaller than the x and y values. The x and y values are almost the same as each other, which fits a diamond, since it would most likely have nearly equal sides.
# depth
ggplot(diamonds, aes(x = z)) +
  geom_histogram(binwidth = 0.5) +
  coord_cartesian(xlim = c(0, 10))
ggsave("r-10-3-3-q1_1.png")
# length/width
ggplot(diamonds, aes(x = x)) +
  geom_histogram(binwidth = 0.5) +
  coord_cartesian(xlim = c(0, 10))
ggsave("r-10-3-3-q1_2.png")
# length/width
ggplot(diamonds, aes(x = y)) +
  geom_histogram(binwidth = 0.5) +
  coord_cartesian(xlim = c(0, 10))
ggsave("r-10-3-3-q1_3.png")
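The same comparison can be checked numerically instead of by eye. The following is a minimal sketch, assuming ggplot2 and dplyr are installed; the name dims is only illustrative. It uses medians rather than means so that a few extreme measurements do not dominate.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Typical size of each dimension, plus how closely x and y track each other
dims <- diamonds |>
  summarise(
    median_x = median(x),
    median_y = median(y),
    median_z = median(z),
    cor_xy   = cor(x, y)
  )
dims
```

If z is the depth, median_z should come out clearly smaller than the other two, and cor_xy should be close to 1.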
The largest number of diamonds in the dataset are priced just below $1,000, and for the most part the number of diamonds decreases as the price gets higher, with a slight increase between $4,000 and $4,500.
When the binwidth is too large, it groups all of the diamonds below $1,000 together, which obscures the variation in the number of diamonds within the $500 to $1,000 range. When the binwidth is too small and the whole x-axis is shown, without cutting off the upper range of values where there are few observations, it is difficult to tell which price has the most diamonds. A smaller binwidth shows that the number of diamonds peaks around $700 and then decreases as the price approaches $1,000.
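One way to look at the low end of the price range without drawing a plot is to count diamonds per price bin directly. This is a minimal sketch, assuming ggplot2 and dplyr are installed; the $1,500 cutoff, the $100 bin width, and the name price_bins are choices made for illustration only.

```r
library(ggplot2)  # provides diamonds and cut_width()
library(dplyr)

# Count diamonds in $100-wide price bins, looking only at prices under $1,500
price_bins <- diamonds |>
  filter(price < 1500) |>
  count(bin = cut_width(price, width = 100, boundary = 0))
price_bins
```

Changing width to a larger or smaller value shows how much detail each bin size hides or reveals.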
This plot leaves binwidth unset, so it uses the default of 30 bins. It does not show that the number of diamonds decreases suddenly at $1,000, and it does not show the variation just before $1,000 either.
ggplot(diamonds, aes(x = price)) +
  geom_histogram(color = "red")
ggsave("r-10-3-3-q2_2.png")
A better binwidth: not too small, and not too big either.
ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 100, color = "red") +
  coord_cartesian(xlim = c(0, 5000))
ggsave("r-10-3-3-q2_1.png")
diamonds |>
  filter(carat == 1 | carat == 0.99) |>
  count(carat)
# A tibble: 2 × 2
  carat     n
  <dbl> <int>
1  0.99    23
2  1     1558
There are 1,558 diamonds that are 1 carat, but only 23 that are 0.99 carat. Rounding up is probably the cause of the difference.
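To see whether the jump at 1 carat stands out from its neighbours, the same count can be widened to nearby weights. This is a minimal sketch, assuming ggplot2 and dplyr are installed; the 0.95 to 1.05 window and the name near_one are arbitrary choices for illustration.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Count diamonds at each recorded carat value close to 1
near_one <- diamonds |>
  filter(between(carat, 0.95, 1.05)) |>
  count(carat)
near_one
```

If rounding up is what is happening, the counts just below 1 carat should be small compared with the count at exactly 1.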
coord_cartesian() needs an xlim or ylim argument to zoom. If binwidth is left unset, geom_histogram() falls back to its default of 30 bins. If you zoom so that only half a bar falls inside the limits, the plot shows just that half of the bar, cut off at the edge.
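The comparison with xlim() can be sketched by building the same histogram both ways and counting how many diamonds end up in the bars. This is a minimal sketch, assuming ggplot2 is installed; the names p_zoom, p_limit, n_zoom, and n_limit are illustrative, and layer_data() is used here only to inspect the computed bins.

```r
library(ggplot2)

# coord_cartesian(): all diamonds are binned first, then the view is cropped
p_zoom <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 100) +
  coord_cartesian(xlim = c(0, 5000))

# xlim(): sets scale limits, so prices outside 0 to 5000 are dropped
# (with a warning) before the bins are computed
p_limit <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 100) +
  xlim(0, 5000)

# Total number of diamonds counted across the bars of each plot
n_zoom  <- sum(layer_data(p_zoom)$count, na.rm = TRUE)
n_limit <- sum(suppressWarnings(layer_data(p_limit))$count, na.rm = TRUE)
c(n_zoom = n_zoom, n_limit = n_limit)
```

The zoomed plot should still account for every diamond, while the xlim() version counts only those inside the limits, which is why the two approaches can produce different histograms.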