From R for Data Science
Exercises 8.2.4
**1-What function would you use to read a file where fields were separated with “|”?**
read_delim(file, delim = "|")
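As a minimal sketch of this answer, the snippet below reads a small pipe-delimited string. The inline data and its column names are made up for illustration, and wrapping the string in I() to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data; I() marks the string as literal data rather than a file path
pipe_text <- I("name|score\nana|1\nbob|2")

# delim tells read_delim() which character separates the fields
df <- read_delim(pipe_text, delim = "|")
df
```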
2-Apart from file, skip, and comment, what other arguments do read_csv() and read_tsv() have in common?
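They have all of their arguments in common. Both are special cases of read_delim() that differ only in the delimiter (comma vs. tab), as the usage from the documentation shows: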
read_csv(
file,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
read_tsv(
file,
col_names = TRUE,
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(1000, n_max),
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
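The comparison can also be done programmatically instead of by eye. This is a small sketch that assumes the installed readr version matches the signatures listed above; it compares argument names only and ignores their order.

```r
library(readr)

csv_args <- names(formals(read_csv))
tsv_args <- names(formals(read_tsv))

# TRUE when both functions accept exactly the same argument names (order ignored)
setequal(csv_args, tsv_args)

# Arguments present in one function but not in the other
setdiff(csv_args, tsv_args)
setdiff(tsv_args, csv_args)
```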
3-What are the most important arguments to read_fwf()?
file and col_positions. The col_positions argument is built with one of the helpers fwf_empty(), fwf_widths(), fwf_positions(), or fwf_cols().
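To make the role of col_positions concrete, here is a minimal sketch that reads the same fixed-width text twice, once described by widths and once by start and end positions. The inline data and the column names code and num are hypothetical, and passing literal data through I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical fixed-width data: a 2-character code followed by a 3-digit number
fwf_text <- I("AB123\nCD456\n")

# Describe the columns by their widths ...
by_width <- read_fwf(
  fwf_text,
  col_positions = fwf_widths(c(2, 3), col_names = c("code", "num"))
)

# ... or by start and end positions; both describe the same layout
by_position <- read_fwf(
  fwf_text,
  col_positions = fwf_positions(
    start = c(1, 3),
    end = c(2, 5),
    col_names = c("code", "num")
  )
)

by_width
by_position
```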
4-Sometimes strings in a CSV file contain commas. To prevent them from causing problems, they need to be surrounded by a quoting character, like " or '. By default, read_csv() assumes that the quoting character will be ". To read the following text into a data frame, what argument to read_csv() do you need to specify?
"x,y\n1,'a,b'"
read_csv("x,y\n1,'a,b'", quote = "'")
5-Identify what is wrong with each of the following inline CSV files. What happens when you run the code?
read_csv("a,b\n1,2,3\n4,5,6")
#correction - add enough column names
read_csv("a,b,c\n1,2,3\n4,5,6")
read_csv("a,b,c\n1,2\n1,2,3,4")
#correction - add enough column names and provide values for NA
read_csv("a,b,c,d\n1,2,,\n1,2,3,4", na = c("", "NA"))
read_csv("a,b\n\"1")
#the \" is an escaped double quote that opens a quoted field which is never closed, and the row has only one value for two columns
read_csv("a,b\n1,")
read_csv("a,b\n1,2\na,b")
#this one runs without error, but both columns are read as character because the last row contains letters
read_csv("a;b\n1;3")
#the fields are separated by ; and read_csv() has no delim argument, so use read_delim() with delim = ";" (or read_csv2())
read_delim("a;b\n1;3", delim=";")
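To see what readr actually complains about in cases like these, one option is to keep the result and inspect it with problems(). This is only a sketch: the exact warnings, and how the extra values are handled, can differ between readr versions, and I() again assumes readr 2.0 or later.

```r
library(readr)

# First example from the exercise: two column names, three values per row
df <- read_csv(I("a,b\n1,2,3\n4,5,6"))

# Parsing issues recorded while reading, returned as a data frame
problems(df)
```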
6-Practice referring to non-syntactic names in the following data frame by:
annoying <- tibble(
`1` = 1:10,
`2` = `1` * 2 + rnorm(length(`1`))
)
Extracting the variable called 1.
annoying |>
select(`1`)
# A tibble: 10 × 1
`1`
<int>
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
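select() returns a one-column tibble. If the goal is the column itself as a vector, base R's $ and [[ also work with non-syntactic names, as long as the name is backticked or quoted. A short sketch, rebuilding the same tibble so that it runs on its own:

```r
library(tibble)

annoying <- tibble(
  `1` = 1:10,
  `2` = `1` * 2 + rnorm(length(`1`))
)

# Backticks are required after $ because 1 is not a syntactic name
annoying$`1`

# [[ takes the column name as an ordinary string
annoying[["1"]]
```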
Plotting a scatterplot of 1 vs. 2.
ggplot(data=annoying, aes(x = `1`, y = `2`)) + geom_point()
Creating a new column called 3, which is 2 divided by 1.
annoying |>
mutate("3"=`2`/`1`)
# A tibble: 10 × 3
`1` `2` `3`
<int> <dbl> <dbl>
1 1 2.49 2.49
2 2 5.47 2.73
3 3 5.89 1.96
4 4 7.97 1.99
5 5 9.18 1.84
6 6 11.9 1.98
7 7 14.7 2.09
8 8 15.8 1.98
9 9 17.1 1.90
10 10 19.1 1.91
Renaming the columns to one, two, and three.
annoying |>
mutate(`3` = `2` / `1`) |>
rename(
one = `1`,
two = `2`,
three = `3`
)