species <- c(
"MountainBeaver",
"Cow",
"GreyWolf",
"Goat",
"GuineaPig",
"Diplodocus",
"AsianElephant",
"Donkey",
"Horse",
"PotarMonkey",
"Cat",
"Giraffe",
"Gorilla",
"Human",
"AfricanElephant",
"Triceratops",
"RhesusMonkey",
"Kangaroo",
"GoldenHamster",
"Mouse",
"Rabbit",
"Sheep",
"Jaguar",
"Chimpanzee",
"Rat",
"Brachiosaurus",
"Mole",
"Pig"
)
bodywt_kg <- c(
1.4,
465,
36.3,
27.7,
1.0,
11700,
2547,
187.1,
521,
10,
3.3,
529,
207,
62,
6654,
9400,
6.8,
35,
0.1,
0.02,
2.5,
55.5,
100,
52.2,
0.3,
87000,
0.1,
192
)
brainwt_kg <- c(
0.0081,
0.423,
0.1195,
0.115,
0.0055,
0.05,
4.603,
0.419,
0.655,
0.115,
0.0256,
0.68,
0.406,
1.32,
5.712,
0.07,
0.179,
0.056,
0.001,
0.0004,
0.0121,
0.175,
NA,
0.44,
0.0019,
0.1545,
NA,
0.18
)
Solution: Introduction to R
The three vectors defined above (species, bodywt_kg, and brainwt_kg) are used in every task below.
Variables and vectors
1. What is the 13th species in the vector?
species[13]
[1] "Gorilla"
2. Get the species at positions 6, 13, and 14.
species[c(6, 13, 14)]
[1] "Diplodocus" "Gorilla"    "Human"
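Indexing can also be run in the other direction. As a minimal sketch, using only the first five species from the vector above (the names `first_five`, `goat_pos`, and `without_goat` are introduced here for illustration), `which()` returns the position of a known value, and a negative index drops that position:

```r
# Illustrative subset: the first five entries of species
first_five <- c("MountainBeaver", "Cow", "GreyWolf", "Goat", "GuineaPig")

# which() returns the position(s) where the condition is TRUE
goat_pos <- which(first_five == "Goat")
goat_pos

# A negative index returns everything except that position
without_goat <- first_five[-goat_pos]
without_goat
```

The same pattern should work on the full species vector, for example to look up where "Gorilla" sits.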
3. How many species are in the dataset? Save the number in a variable called n_species.
Note that if you save the result of a command in a variable, it won’t be printed to the console. You need to print it explicitly by typing the variable name.
n_species <- length(species)
n_species
[1] 28
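There are a couple of other ways to assign and print. The sketch below uses a three-element stand-in vector (`some_species` and `n_some` are illustrative names, not part of the exercise) to show that wrapping an assignment in parentheses prints the value straight away, and that `print()` does the same explicitly:

```r
# Illustrative stand-in for the species vector (three entries only)
some_species <- c("Cow", "Goat", "Horse")

# Assignment alone prints nothing
n_some <- length(some_species)

# Wrapping the assignment in parentheses assigns and prints in one step
(n_some <- length(some_species))

# print() shows the value explicitly, which is handy inside scripts and functions
print(n_some)
```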
Vector arithmetic
4. Calculate the brain-to-body weight ratio for all species and save it in a new variable called ratio.
ratio <- brainwt_kg / bodywt_kg
ratio
 [1] 0.005785714286 0.000909677419 0.003292011019 0.004151624549 0.005500000000
 [6] 0.000004273504 0.001807224185 0.002239444148 0.001257197697 0.011500000000
[11] 0.007757575758 0.001285444234 0.001961352657 0.021290322581 0.000858431019
[16] 0.000007446809 0.026323529412 0.001600000000 0.010000000000 0.020000000000
[21] 0.004840000000 0.003153153153             NA 0.008429118774 0.006333333333
[26] 0.000001775862             NA 0.000937500000
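The two NA entries in the output come from the missing brain weights. As a small sketch on three values taken from the vectors above (Sheep, Jaguar, and Chimpanzee; the `_sub` names are illustrative), division pairs elements by position, and any calculation involving NA yields NA:

```r
# Illustrative subset: Sheep, Jaguar, Chimpanzee (positions 22 to 24)
body_sub  <- c(55.5, 100, 52.2)
brain_sub <- c(0.175, NA, 0.44)

# Division pairs up elements by position: brain_sub[1] / body_sub[1], and so on
ratio_sub <- brain_sub / body_sub
ratio_sub

# The missing brain weight carries through to the result
is.na(ratio_sub)
```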
5. Convert the body weight from kg to grams and save it in a variable called bodywt_g.
bodywt_g <- bodywt_kg * 1000
bodywt_g
 [1]     1400   465000    36300    27700     1000 11700000  2547000   187100
 [9]   521000    10000     3300   529000   207000    62000  6654000  9400000
[17]     6800    35000      100       20     2500    55500   100000    52200
[25]      300 87000000      100   192000
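Multiplying a vector by a single number works because R recycles the number across every element. A minimal sketch on the first three body weights (`kg_sub` and `g_sub` are illustrative names) also checks that converting back recovers the original values:

```r
# Illustrative subset: the first three body weights in kg
kg_sub <- c(1.4, 465, 36.3)

# The single value 1000 is recycled across every element of kg_sub
g_sub <- kg_sub * 1000
g_sub

# Converting back should recover the original values;
# all.equal() compares with a small tolerance for floating-point error
all.equal(g_sub / 1000, kg_sub)
```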
Functions and missing values
6. Calculate the mean body weight and the mean brain weight. What happens for brain weight and how do you fix it?
mean(bodywt_kg)
[1] 4278.44
mean(brainwt_kg)
[1] NA
The result is NA because brainwt_kg contains missing values. To fix this, use the na.rm argument to remove missing values before calculating the mean:
mean(brainwt_kg, na.rm = TRUE)
[1] 0.6125615
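To see what na.rm = TRUE does, it may help to remove the missing values by hand. The sketch below uses four brain weights from the data (Sheep, Jaguar, Chimpanzee, Rat); the variable names are illustrative:

```r
# Illustrative subset: brain weights for Sheep, Jaguar, Chimpanzee, Rat
brain_sub <- c(0.175, NA, 0.44, 0.0019)

# Drop the missing values by hand, then average what is left
mean_manual <- mean(brain_sub[!is.na(brain_sub)])

# Let mean() drop them instead
mean_narm <- mean(brain_sub, na.rm = TRUE)

# Both approaches give the same result
all.equal(mean_manual, mean_narm)
```

Either way, the mean is taken over the non-missing values only, so it is divided by 3 here, not by 4.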
7. Also try sum() and median() on brain weight.
sum(brainwt_kg, na.rm = TRUE)
[1] 15.9266
median(brainwt_kg, na.rm = TRUE)
[1] 0.137
Without na.rm = TRUE, both functions return NA:
sum(brainwt_kg)
[1] NA
median(brainwt_kg)
[1] NA
Optional tasks
- Round the ratio vector to 2 decimal places:
round(ratio, digits = 2)
 [1] 0.01 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.02 0.00
[16] 0.00 0.03 0.00 0.01 0.02 0.00 0.00   NA 0.01 0.01 0.00   NA 0.00
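Most ratios collapse to 0.00 at two decimal places. One possible alternative, not part of the original task, is `signif()`, which keeps a fixed number of significant digits. The sketch below uses three ratios copied from the output above (Diplodocus, Human, and Mouse); `ratio_sub` is an illustrative name:

```r
# Illustrative subset: ratios for Diplodocus, Human, and Mouse
ratio_sub <- c(0.000004273504, 0.021290322581, 0.02)

# round() keeps a fixed number of decimal places, so tiny values become 0
rounded <- round(ratio_sub, digits = 2)
rounded

# signif() keeps a fixed number of significant digits instead
significant <- signif(ratio_sub, digits = 2)
significant
```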
- Try min(), max(), sum(), and sd() on brainwt_kg.
All four functions accept na.rm = TRUE to ignore missing values.
min(brainwt_kg, na.rm = TRUE)
[1] 0.0004
max(brainwt_kg, na.rm = TRUE)
[1] 5.712
sum(brainwt_kg, na.rm = TRUE)
[1] 15.9266
sd(brainwt_kg, na.rm = TRUE)
[1] 1.379513
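Because the four calls differ only in the function name, they can be collected into one step. This goes slightly beyond the exercise and is only a sketch: it uses `sapply()` over a named list of functions, with a four-value subset of the brain weights (`brain_sub`, `summaries`, and `result` are illustrative names):

```r
# Illustrative subset: brain weights for Sheep, Jaguar, Chimpanzee, Rat
brain_sub <- c(0.175, NA, 0.44, 0.0019)

# A named list of the summary functions to apply
summaries <- list(min = min, max = max, sum = sum, sd = sd)

# Call each function on brain_sub, passing na.rm = TRUE every time
result <- sapply(summaries, function(f) f(brain_sub, na.rm = TRUE))
result
```

The result is a named numeric vector with one entry per function.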
- Use sum() together with is.na() to find out how many missing values are in brainwt_kg.
# is.na() returns TRUE for each NA value
is.na(brainwt_kg)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[25] FALSE FALSE  TRUE FALSE
TRUE is treated as 1 and FALSE as 0 when you use sum() on a logical vector. So summing the result gives the number of missing values:
sum(is.na(brainwt_kg))
[1] 2
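The same logical vector can also tell you which species are affected. As a sketch on the last seven entries of species and brainwt_kg (the `_sub` names and `missing_species` are illustrative), indexing with `is.na()` picks out the species with a missing brain weight:

```r
# Illustrative subset: the last seven entries of species and brainwt_kg
species_sub <- c("Sheep", "Jaguar", "Chimpanzee", "Rat",
                 "Brachiosaurus", "Mole", "Pig")
brain_sub <- c(0.175, NA, 0.44, 0.0019, 0.1545, NA, 0.18)

# Positions of the missing values within the subset
missing_pos <- which(is.na(brain_sub))
missing_pos

# Species whose brain weight is missing
missing_species <- species_sub[is.na(brain_sub)]
missing_species

# mean() of a logical vector gives the proportion of TRUE values
mean(is.na(brain_sub))
```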