From R for Data Science
Show cut as a proportion within each color, and color as a proportion within each cut.
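library(tidyverse)
# Proportion of each cut within a color: summarize() drops the last
# grouping level (cut), so sum(count) is the total for each color.
diamonds_prop_cut_color <- diamonds |>
  group_by(color, cut) |>
  summarize(count = n(), .groups = "drop_last") |>
  mutate(prop = count / sum(count))
# Proportion of each color within a cut: here sum(count) is the total for each cut.
diamonds_prop_color_cut <- diamonds |>
  group_by(cut, color) |>
  summarize(count = n(), .groups = "drop_last") |>
  mutate(prop = count / sum(count))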
ggplot(diamonds_prop_cut_color, aes(x = cut, y = color)) +
  geom_tile(aes(fill = prop))
ggsave("r-10-5-2-1-q1_1.png")
ggplot(diamonds_prop_color_cut, aes(x = color, y = cut)) +
  geom_tile(aes(fill = prop))
ggsave("r-10-5-2-1-q1_2.png")
ggplot(diamonds, aes(x = color, fill = cut)) +
  geom_bar()
ggsave("r-10-5-2-1-q2.png")
diamonds |>
  count(color, cut)
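One way to confirm which variable acts as the denominator is to check that the proportions sum to 1 within the intended group. The sketch below is only an illustration: the names cut_within_color and prop_totals are made up for this check, and it uses count() as a shortcut for group_by() plus summarize().

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Proportion of each cut within a color (hypothetical name, for illustration)
cut_within_color <- diamonds |>
  count(color, cut, name = "count") |>
  group_by(color) |>
  mutate(prop = count / sum(count)) |>
  ungroup()

# If color is the denominator, every color's proportions should total 1
prop_totals <- cut_within_color |>
  group_by(color) |>
  summarize(total = sum(prop))

prop_totals
```

Swapping the roles of color and cut in group_by() gives the matching check for color within cut.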
This stacked bar chart shows that color G has the most diamonds. Across colors, most diamonds have an Ideal, Premium, or Very Good cut, while Fair and Good cuts are the least common. The height of each segment is the number of diamonds with that cut within that color, which is the same number that count(color, cut) reports.
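The readings from the bar chart can be checked against sorted marginal counts. This is a small sketch rather than part of the original solution; color_counts and cut_counts are names introduced here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Number of diamonds per color, most common first
color_counts <- diamonds |>
  count(color, sort = TRUE)

# Number of diamonds per cut, most common first
cut_counts <- diamonds |>
  count(cut, sort = TRUE)

color_counts
cut_counts
```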
The plot of average departure delay by destination and month is difficult to read because there are too many destinations on the axis. It could be improved by keeping only a certain range of average delays, ordering the destinations by average delay, or selecting only destinations with at least a certain number of flights.
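nycflights13::flights |>
  # optional: keep only destinations whose code starts with "A"
  # filter(str_detect(dest, "^A")) |>
  group_by(dest, month) |>
  # na.rm = TRUE belongs inside mean(); as a summarize() argument it would create a column named na.rm
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = dest, y = factor(month))) +
  geom_tile(aes(fill = avg_delay))
ggsave("r-10-5-2-1-q3_1_rev.png")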
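Two of the improvements suggested above, keeping only destinations with many flights and ordering them by average delay, could look roughly like the sketch below. The cutoff min_flights = 5000 and the name busy_dest_delays are assumptions chosen for illustration, not values from the exercise.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

min_flights <- 5000  # hypothetical cutoff; adjust to taste

busy_dest_delays <- flights |>
  group_by(dest) |>
  filter(n() >= min_flights) |>  # keep destinations with enough flights
  group_by(dest, month) |>
  summarize(avg_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# Put destinations on the y axis, ordered by their mean monthly delay
ggplot(busy_dest_delays, aes(x = factor(month), y = reorder(dest, avg_delay))) +
  geom_tile(aes(fill = avg_delay)) +
  labs(x = "month", y = "destination", fill = "avg dep delay (min)")
```

Moving the destinations to the y axis leaves room for their labels, and reorder() sorts them by the mean of avg_delay so that the destinations with the longest delays end up together.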