Day 1 - Introduction to Data Analysis with R
Freie Universität Berlin - Theoretical Ecology
October 13, 2025
The built-in data structure for tables in R is a data frame.
Vectors in R can’t represent a data table in which values are connected across rows.
Data frames are one of the biggest and most important ideas in R, and one of the things that make R different from other programming languages.
(H. Wickham, Advanced R)
| city_name | population_size | city_area |
|---|---|---|
| Istanbul | 15100000 | 2576 |
| Moscow | 12500000 | 2561 |
| London | 9000000 | 1572 |
| Saint Petersburg | 5400000 | 1439 |
| Berlin | 3800000 | 891 |
| Madrid | 3200000 | 604 |
| Kyiv | 3000000 | 839 |
| Rome | 2800000 | 1285 |
| Bucharest | 2200000 | 228 |
| Paris | 2100000 | 105 |
A data frame is a named list of vectors of the same length.
Data frames are created with the function data.frame():
cities <- c(
  "Istanbul", "Moscow", "London",
  "Saint Petersburg", "Berlin", "Madrid",
  "Kyiv", "Rome", "Bucharest", "Paris"
)
population <- c(
  15.1e6, 12.5e6, 9e6, 5.4e6, 3.8e6,
  3.2e6, 3e6, 2.8e6, 2.2e6, 2.1e6
)
area_km2 <- c(
  2576, 2561, 1572, 1439,
  891, 604, 839, 1285, 228, 105
)
cities_dataframe <- data.frame(
  city_name = cities,
  population_size = population,
  city_area = area_km2
)
cities_dataframe
#> city_name population_size city_area
#> 1 Istanbul 15100000 2576
#> 2 Moscow 12500000 2561
#> 3 London 9000000 1572
#> 4 Saint Petersburg 5400000 1439
#> 5 Berlin 3800000 891
#> 6 Madrid 3200000 604
#> 7 Kyiv 3000000 839
#> 8 Rome 2800000 1285
#> 9 Bucharest 2200000 228
#> 10 Paris 2100000 105
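The statement that a data frame is a named list of equal-length vectors can be checked directly in base R. The following is a minimal sketch that rebuilds a shortened, three-city version of `cities_dataframe` so that it runs on its own.

```r
# Shortened three-city version of cities_dataframe (for illustration only)
cities_dataframe <- data.frame(
  city_name = c("Istanbul", "Moscow", "London"),
  population_size = c(15.1e6, 12.5e6, 9e6),
  city_area = c(2576, 2561, 1572)
)

# A data frame is a list ...
is.list(cities_dataframe)

# ... whose elements (the columns) are named ...
names(cities_dataframe)

# ... and all have the same length (one value per row)
lengths(cities_dataframe)
```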
Tibbles are a modern reimagining of the data frame. Tibbles are designed to be (as much as possible) drop-in replacements for data frames.
(H. Wickham, Advanced R)
Have a look at this book chapter for a full list of the differences between data frames and tibbles and the advantages of using tibbles.
Tibbles have the same basic properties as data frames (named list of vectors)
Everything that you can do with data frames, you can do with tibbles
Tibbles are available from the tibble package.
Before we use tibbles, we need to install the package once using the function install.packages:
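The installation step could look like the sketch below. `install.packages("tibble")` is the call the text refers to; the `requireNamespace()` guard is an addition of this sketch (not part of the original slides) so that the package is only downloaded when it is missing.

```r
# Install tibble only if it is not installed yet.
# The guard is an assumption of this sketch; a plain
# install.packages("tibble") typed once in the console works as well.
if (!requireNamespace("tibble", quietly = TRUE)) {
  install.packages("tibble")
}
```

Installation is usually done once in the console rather than in the analysis script itself.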
Then, we need to load the package into our current R session using library:
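A minimal sketch of the loading step, assuming the tibble package is already installed:

```r
# Load the tibble package into the current R session.
# This line typically goes at the top of a script.
library(tibble)
```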
Create a tibble using the tibble() function:
#> # A tibble: 10 × 3
#> city_name population_size city_area
#> <chr> <dbl> <dbl>
#> 1 Istanbul 15100000 2576
#> 2 Moscow 12500000 2561
#> 3 London 9000000 1572
#> 4 Saint Petersburg 5400000 1439
#> 5 Berlin 3800000 891
#> 6 Madrid 3200000 604
#> 7 Kyiv 3000000 839
#> 8 Rome 2800000 1285
#> 9 Bucharest 2200000 228
#> 10 Paris 2100000 105
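The call that produces a tibble like the one printed above is not shown here. A sketch of what it might look like follows, reusing the vectors from the data frame example; the object name `cities_tbl` is an assumption made for illustration.

```r
library(tibble)

# Same vectors as in the data.frame() example
cities <- c(
  "Istanbul", "Moscow", "London",
  "Saint Petersburg", "Berlin", "Madrid",
  "Kyiv", "Rome", "Bucharest", "Paris"
)
population <- c(
  15.1e6, 12.5e6, 9e6, 5.4e6, 3.8e6,
  3.2e6, 3e6, 2.8e6, 2.2e6, 2.1e6
)
area_km2 <- c(
  2576, 2561, 1572, 1439,
  891, 604, 839, 1285, 228, 105
)

# tibble() takes name = vector pairs, just like data.frame()
# (cities_tbl is a hypothetical name)
cities_tbl <- tibble(
  city_name = cities,
  population_size = population,
  city_area = area_km2
)

# Printing shows the dimensions and the type of each column
cities_tbl
```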
How many rows?
Look at the entire table in a separate window with view():
Get a quick summary of all columns:
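The three questions above could be answered with calls like the ones in this sketch. It uses a shortened, three-city tibble with the hypothetical name `cities_tbl` so that it runs on its own; the `interactive()` guard around `view()` is an addition of this sketch, since the viewer window only makes sense in an interactive session.

```r
library(tibble)

# Shortened three-city tibble (hypothetical name, for illustration only)
cities_tbl <- tibble(
  city_name = c("Istanbul", "Moscow", "London"),
  population_size = c(15.1e6, 12.5e6, 9e6),
  city_area = c(2576, 2561, 1572)
)

# How many rows (and columns)?
nrow(cities_tbl)
ncol(cities_tbl)

# Open the entire table in a separate viewer window
if (interactive()) {
  view(cities_tbl)
}

# Quick summary of all columns
summary(cities_tbl)
```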
Indexing tibbles works similarly to indexing vectors, but with two dimensions instead of one:
tibble [ row_index, col_index or col_name ]
[] always returns another tibble.
#> # A tibble: 10 × 2
#> city_name population_size
#> <chr> <dbl>
#> 1 Istanbul 15100000
#> 2 Moscow 12500000
#> 3 London 9000000
#> # ℹ 7 more rows
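The indexing call behind the two-column output above is not shown. A call of roughly the following form would select those two columns; the sketch uses a shortened, three-city tibble with the hypothetical name `cities_tbl` and adds a few other common patterns.

```r
library(tibble)

# Shortened three-city tibble (hypothetical name, for illustration only)
cities_tbl <- tibble(
  city_name = c("Istanbul", "Moscow", "London"),
  population_size = c(15.1e6, 12.5e6, 9e6),
  city_area = c(2576, 2561, 1572)
)

# All rows, two columns selected by name
two_cols <- cities_tbl[, c("city_name", "population_size")]

# The same two columns selected by position
two_cols_pos <- cities_tbl[, 1:2]

# First two rows, all columns
first_rows <- cities_tbl[1:2, ]

# A single cell is still returned as a (1 x 1) tibble
one_cell <- cities_tbl[2, "city_name"]

two_cols
```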
Indexing columns by name is usually preferred to indexing by position
Code is much easier to read
Code is more robust against changes in the order of columns
General rule
Good code produces errors when something unintended or wrong happens
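The points above can be illustrated with a small sketch. It uses a shortened, three-city tibble with the hypothetical name `cities_tbl`; the column reordering and the misspelled name `city_nmae` are made-up scenarios for illustration.

```r
library(tibble)

# Shortened three-city tibble (hypothetical name, for illustration only)
cities_tbl <- tibble(
  city_name = c("Istanbul", "Moscow", "London"),
  population_size = c(15.1e6, 12.5e6, 9e6),
  city_area = c(2576, 2561, 1572)
)

# Suppose the column order changes, e.g. because the input data changed
reordered <- cities_tbl[, c("city_area", "city_name", "population_size")]

# Indexing by position silently returns a different column than before
by_position <- reordered[, 1]

# Indexing by name still returns the intended column
by_name <- reordered[, "city_name"]

# A misspelled column name stops with an error instead of
# silently returning something unintended
outcome <- tryCatch(
  {
    cities_tbl[, "city_nmae"]
    "no error"
  },
  error = function(e) "error"
)
outcome
```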
Select an entire column from a tibble using $ (this returns a vector instead of a tibble):
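A minimal sketch of column selection with `$`, using a shortened, three-city tibble with the hypothetical name `cities_tbl`:

```r
library(tibble)

# Shortened three-city tibble (hypothetical name, for illustration only)
cities_tbl <- tibble(
  city_name = c("Istanbul", "Moscow", "London"),
  population_size = c(15.1e6, 12.5e6, 9e6),
  city_area = c(2576, 2561, 1572)
)

# $ extracts the column as a plain vector
population_vec <- cities_tbl$population_size

is.numeric(population_vec)  # the result is a numeric vector ...
is_tibble(population_vec)   # ... and no longer a tibble

# Because the result is a vector, it can go straight into vector functions
mean(cities_tbl$city_area)
```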
New columns can be added as vectors using the $ operator. The vectors need to have the same length as the tibble has rows.
#> # A tibble: 10 × 4
#> city_name population_size city_area country
#> <chr> <dbl> <dbl> <chr>
#> 1 Istanbul 15100000 2576 Turkey
#> 2 Moscow 12500000 2561 Russia
#> 3 London 9000000 1572 UK
#> 4 Saint Petersburg 5400000 1439 Russia
#> 5 Berlin 3800000 891 Germany
#> 6 Madrid 3200000 604 Spain
#> 7 Kyiv 3000000 839 Ukraine
#> 8 Rome 2800000 1285 Italy
#> 9 Bucharest 2200000 228 Romania
#> 10 Paris 2100000 105 France
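Adding a column as described above could look like the following sketch. It uses a shortened, three-city tibble with the hypothetical name `cities_tbl`; the too-short vector at the end is a made-up example showing what happens when the lengths do not match.

```r
library(tibble)

# Shortened three-city tibble (hypothetical name, for illustration only)
cities_tbl <- tibble(
  city_name = c("Istanbul", "Moscow", "London"),
  population_size = c(15.1e6, 12.5e6, 9e6),
  city_area = c(2576, 2561, 1572)
)

# Add a new column: one value per row, in row order
cities_tbl$country <- c("Turkey", "Russia", "UK")
cities_tbl

# A vector with two values does not fit a tibble with three rows,
# so the assignment stops with an error
outcome <- tryCatch(
  {
    cities_tbl$too_short <- c("a", "b")
    "no error"
  },
  error = function(e) "error"
)
outcome
```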
Tables in R: Data frames and tibbles
Install the package once with install.packages("tibble").
Put library(tibble) at the beginning of your script to load the package.
Indexing with [] returns the result as a tibble; selecting a column with $ returns it as a vector.
Task (15 min)
Tibbles
Find the task description here
Selina Baldauf // Data frames and tibbles