From R for Data Science
10.4.1 Exercises
In a histogram, NA values are dropped (ggplot2 warns about the removed rows) because they cannot be placed in any bin. In a bar chart of a categorical variable, NA values are kept and shown as their own category.
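To see the difference on a small scale, here is a minimal sketch using a made-up data frame. The `toy` data, its column names, and the `binwidth` are hypothetical and not taken from the book.

```r
library(ggplot2)

# Hypothetical toy data: one numeric and one categorical column, each with NAs
toy <- data.frame(
  value = c(1.2, 2.5, NA, 3.1, NA, 4.8),
  group = c("a", "b", NA, "a", NA, "b")
)

# Histogram: rows with NA in `value` cannot be binned, so ggplot2 drops them
# and emits a warning when the plot is drawn
p_hist <- ggplot(toy, aes(x = value)) +
  geom_histogram(binwidth = 1)

# Bar chart: NA in the categorical `group` is counted and drawn as its own bar
p_bar <- ggplot(toy, aes(x = group)) +
  geom_bar()

# How many rows are missing in each column
n_missing_value <- sum(is.na(toy$value))
n_missing_group <- sum(is.na(toy$group))
```

Printing `p_hist` and `p_bar` side by side should show the warning for the histogram and an extra NA bar in the bar chart.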
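Setting na.rm = TRUE in mean() and sum() removes the missing values (NA) before the calculation, so the result is computed from the remaining values instead of being NA.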
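A short base R sketch of this behavior, using a hypothetical vector `x` chosen only for illustration:

```r
# Hypothetical vector with one missing value
x <- c(1, 2, NA, 4)

mean(x)  # NA: a single unknown value makes the result unknown
sum(x)   # NA for the same reason

# With na.rm = TRUE the NA is dropped before the calculation
mean_no_na <- mean(x, na.rm = TRUE)  # mean of 1, 2 and 4
sum_no_na  <- sum(x, na.rm = TRUE)   # 1 + 2 + 4
```

Note that the mean is taken over the three remaining values, not over the original length of four.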
Faceting the plot by canceled vs. non-canceled flights and setting scales = "free_y" mitigates the effect of there being far more non-canceled flights than canceled ones. With that change it becomes more apparent that flights are less likely to be canceled between 5 am and 12 pm, and again around 8 pm. The number of canceled flights trends upward as the day progresses, whereas the number of non-canceled flights has two peaks: between 5 and 10 am, and again from 3 to 8 pm.
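library(tidyverse)

# Flag canceled flights (no departure time recorded) and convert the
# scheduled departure time from HHMM format to decimal hours
flights2 <- nycflights13::flights |>
  mutate(
    canceled = is.na(dep_time),
    sched_hour = sched_dep_time %/% 100,
    sched_min = sched_dep_time %% 100,
    sched_dep_time = sched_hour + (sched_min / 60)
  )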
# Frequency polygon of scheduled departure time, colored by cancellation status
ggplot(flights2, aes(x = sched_dep_time)) +
  geom_freqpoly(aes(color = canceled), binwidth = 1/4)
ggsave("r-unusual-values-q3_1.png")
# Same plot, faceted by cancellation status with an independent y-axis per panel
ggplot(flights2, aes(x = sched_dep_time)) +
  geom_freqpoly(aes(color = canceled), binwidth = 1/4) +
  facet_wrap(~ canceled, scales = "free_y")
ggsave("r-unusual-values-q3_2.png")
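The conversion of sched_dep_time from HHMM format to decimal hours used in the code above can be checked on a few values. This is a minimal base R sketch; the `hhmm` vector is hypothetical and only illustrates the arithmetic.

```r
# Hypothetical scheduled departure times in HHMM format
hhmm <- c(515, 1230, 2045)

hours   <- hhmm %/% 100   # integer division keeps the hour part
minutes <- hhmm %% 100    # remainder keeps the minute part

# Decimal hours: 5:15 becomes 5.25, 12:30 becomes 12.5, 20:45 becomes 20.75
decimal_hours <- hours + minutes / 60
```

On this scale, binwidth = 1/4 in geom_freqpoly() corresponds to 15-minute bins.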