Introduction to RStudio, Importing Data, and Running Code
January 16, 2026
01. What is R?
02. What is RStudio?
03. What is an RStudio project?
04. What is a coding notebook?
05. How do I write and run code in a coding notebook?
06. What is an object and how do I create one?
07. What are R packages and how do I load them?
08. How do I import data?
09. How can I get to know my variables?
10. How can I transform my data?
Programming language used for statistical computing and graphics
Free and open source
Base functionality + thousands of extensions
“RStudio gives you a way to talk to your computer. R gives you a language to speak in.”
Hands-On Programming with R
Many thanks to Julia Silge at Posit
File directory where you can store your data, R scripts/coding notebooks, and output for a given project
Keeps everything together in one place
| Document type | File extension | Description |
|---|---|---|
| R script | .R | Plain text file |
| R Markdown | .Rmd | Combines text, code, and results |
| Quarto | .qmd | Fancier version of R Markdown |
Run a code chunk: Cmd + Shift + Enter / Ctrl + Shift + Enter
Note: To create a new code chunk, click the green +C button at the top of the Source pane and select R (or use the keyboard shortcut Cmd + Option + I / Ctrl + Alt + I)
Task
In your Quarto notebook, write and run code that finds the square root of 60
Note: the square root function in R is sqrt()
Hint
Run a code chunk by clicking the green “play” button or Cmd / Ctrl + Shift + Enter
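As a warm-up for the task above, here is a minimal sketch of what a code chunk might contain. It deliberately uses a different number, so the task is still yours to solve. The values are arbitrary examples.

```r
# A function call: the function name, then its input in parentheses
sqrt(16)

# R also works as a calculator
(3 + 5) * 2
```

Place your cursor inside the chunk and run it; each result prints directly below the chunk.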
<-
object_name <- value
Task
Create an object called my_age that stores your age, then “call” the object so it prints to the Console.
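The assignment pattern above can be sketched with an unrelated value. The object name `favorite_number` is a made-up example, not part of the workshop materials.

```r
# Store a value in an object with the assignment operator <-
favorite_number <- 7

# "Call" the object by typing its name; this prints its value
favorite_number

# Once created, the object can be reused in later calculations
favorite_number * 2
```

After running this, `favorite_number` should also appear in the Environment pane.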
| Task | base R | {tidyverse} |
|---|---|---|
| Keep rows where x > 1 | data[data$x > 1, ] | data \|> filter(x > 1) |
| Keep columns x and y | data[ , c("x", "y")] | data \|> select(x, y) |
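To see that the two styles in the table do the same job, here is a small sketch that runs both on a made-up data frame. The object `data` and its values are invented for illustration, and the sketch assumes the {dplyr} package (part of the {tidyverse}) is installed.

```r
library(dplyr)

# Small made-up data frame, for illustration only
data <- data.frame(
  x = c(0, 2, 5),
  y = c("a", "b", "c"),
  z = c(TRUE, FALSE, TRUE)
)

# Keep rows where x > 1
rows_base <- data[data$x > 1, ]
rows_tidy <- data |> filter(x > 1)

# Keep columns x and y
cols_base <- data[, c("x", "y")]
cols_tidy <- data |> select(x, y)

# Both approaches keep the same values
identical(rows_base$x, rows_tidy$x)
identical(names(cols_base), names(cols_tidy))
```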
First install using the function install.packages()
Note: This is one of the few times you should code directly in the console – do not include this function in your coding notebook
Then load using the function library()
Note: It is generally good practice to load all of your needed packages at the beginning of your script or coding notebook
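The install-then-load routine described above might look like the following sketch. It uses {tidyverse} as the example package, since that is the package family used in these slides. The install line is commented out because it belongs in the Console, not in your notebook.

```r
# Run once, directly in the Console (not in your notebook):
# install.packages("tidyverse")

# Run at the top of your script or notebook in every new session:
library(tidyverse)
```

Installing downloads the package to your computer; loading makes its functions available in the current session.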
read_csv()
Inside the read_csv() parentheses, specify where in your project folder the file is saved and what the file is called
summary() for numeric
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 23.00 31.00 31.88 39.00 77.00
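The import-then-summarize steps above can be sketched as follows. So that the example runs anywhere, it first writes a tiny made-up CSV to a temporary file; the object name `songs`, the column names, and the values are all invented for illustration. The sketch assumes the {readr} package (part of the {tidyverse}) is installed.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file so the example is self-contained.
# In a real project you would instead give the file's location inside your
# project folder, e.g. read_csv("data/my_file.csv") (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(track = c("A", "B", "C"), energy = c(0.2, 0.5, 0.8)),
  csv_path
)

# Import the file and store it as an object
songs <- read_csv(csv_path, show_col_types = FALSE)

# Summarize a numeric variable: minimum, quartiles, median, mean, maximum
summary(songs$energy)
```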
unique() for character
[1] "David Bowie"
[2] "David Bowie (aka Space Oddity)"
[3] "The Man Who Sold the World"
[4] "Hunky Dory"
[5] "Aladdin Sane"
[6] "The Rise and Fall of Ziggy Stardust and the Spiders from Mars"
[7] "Pinups"
[8] "Diamond Dogs"
[9] "Young Americans"
[10] "Station to Station"
[11] "\"Heroes\""
[12] "Low"
[13] "Lodger"
[14] "Scary Monsters (And Super Creeps)"
[15] "Let's Dance"
[16] "Tonight"
[17] "Never Let Me Down"
[18] "Black Tie White Noise"
[19] "Buddha of Suburbia"
[20] "1. Outside (The Nathan Adler Diaries: A Hyper Cycle)"
[21] "Earthling"
[22] "Hours..."
[23] "Heathen"
[24] "Reality"
[25] "The Next Day"
[26] "Blackstar"
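A list like the one above comes from calling unique() on a character variable. Here is a minimal sketch on a made-up vector; the object `albums` and its contents are illustrative only.

```r
# Made-up character vector with repeated values
albums <- c("Low", "Lodger", "Low", "Heathen", "Lodger")

# unique() returns each distinct value once, in order of first appearance
unique(albums)

# Wrapping it in length() counts the distinct values
length(unique(albums))
```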
Task
Using summary(), determine whether the mean or median of energy is greater.
| Task | {dplyr} verb |
|---|---|
| Subset rows | filter() |
| Subset columns | select() |
| Sort | arrange() |
| Create a new variable | mutate() |
Note: {dplyr} is one of the core {tidyverse} packages
|> or %>%
Links together lines of {tidyverse} code
Means “and then”
Cmd/Ctrl + Shift + M
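To make the "and then" reading concrete, here is a short sketch that chains two steps with the pipe and compares the result to the same steps written without it. The data frame `songs` is made up for illustration, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Made-up data, for illustration only
songs <- data.frame(
  track  = c("A", "B", "C"),
  year   = c(1977, 1983, 1977),
  energy = c(0.4, 0.9, 0.6)
)

# Read as: take songs, AND THEN keep the 1977 rows, AND THEN keep two columns
piped <- songs |>
  filter(year == 1977) |>
  select(track, energy)

# The same steps without the pipe: nested calls that read inside-out
nested <- select(filter(songs, year == 1977), track, energy)
```

Both objects hold the same result; the piped version simply lists the steps in the order they happen.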
filter()
== is a conditional equals sign (tests equivalence)
%in% tests for equivalence against a vector of values
c() creates a vector (collection of values)
!= means “does not equal”
Task
Filter bowie to only contain songs that are NOT on the “Low” album
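The three comparison operators used with filter() can be sketched on a made-up data frame. The object `songs` and its values are invented, and the example filters on a different variable than the task so the task remains an exercise. It assumes {dplyr} is installed.

```r
library(dplyr)

# Made-up data, for illustration only
songs <- data.frame(
  track = c("A", "B", "C", "D"),
  year  = c(1967, 1977, 1983, 1977)
)

# == keeps rows where year is exactly 1977
from_1977 <- songs |> filter(year == 1977)

# %in% keeps rows whose year matches any value in the vector built by c()
from_either <- songs |> filter(year %in% c(1967, 1983))

# != keeps rows where year is anything except 1977
not_1977 <- songs |> filter(year != 1977)
```

Note the double equals sign: a single = would not test equivalence.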
select()
Limit columns:
# A tibble: 276 × 3
track year energy
<chr> <dbl> <dbl>
1 Come And Buy My Toys 1967 0.183
2 When I Live My Dream 1967 0.19
3 Love You Till Tuesday 1967 0.346
4 Uncle Arthur 1967 0.337
5 Rubber Band 1967 0.312
6 Sell Me A Coat 1967 0.256
7 There Is A Happy Land 1967 0.338
8 She's Got Medals 1967 0.417
9 We Are Hungry Men 1967 0.365
10 Maid Of Bond Street 1967 0.326
# ℹ 266 more rows
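Output with just these three columns could come from a call like the one sketched below. The data frame `songs` is a made-up stand-in for the workshop data, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Made-up stand-in data with one extra column
songs <- data.frame(
  track  = c("A", "B"),
  album  = c("X", "Y"),
  year   = c(1967, 1977),
  energy = c(0.18, 0.35)
)

# Name the columns to keep; everything else is dropped
limited <- songs |> select(track, year, energy)
names(limited)
```

Columns come back in the order you list them, which makes select() handy for reordering too.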
select()
There are shortcuts for variables with patterns:
# A tibble: 276 × 5
loudness speechiness acousticness instrumentalness liveness
<dbl> <dbl> <dbl> <dbl> <dbl>
1 -16.9 0.0604 0.847 0.00000108 0.103
2 -14.6 0.0294 0.712 0 0.236
3 -15.5 0.0338 0.774 0.0000433 0.0651
4 -15.0 0.0783 0.701 0 0.0882
5 -16.5 0.0725 0.826 0.0000101 0.123
6 -15.3 0.0462 0.743 0 0.154
7 -14.8 0.0275 0.719 0.0000018 0.1
8 -12.0 0.0575 0.376 0.0000015 0.277
9 -14.0 0.0639 0.191 0 0.133
10 -12.6 0.0582 0.795 0 0.34
# ℹ 266 more rows
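The slide's code is not shown here, so which shortcut produced the output above is an assumption. The sketch below shows a few selection helpers that {dplyr} provides, run on a made-up one-row data frame.

```r
library(dplyr)

# Made-up stand-in data, for illustration only
songs <- data.frame(
  track       = "A",
  loudness    = -16.9,
  speechiness = 0.06,
  liveness    = 0.10,
  energy      = 0.18
)

# Columns whose names end with a given suffix
ness_cols <- songs |> select(ends_with("ness"))

# A run of adjacent columns, from the first name through the last
range_cols <- songs |> select(loudness:liveness)

# Columns whose names start with a given prefix
l_cols <- songs |> select(starts_with("l"))
```

Here `ness_cols` and `range_cols` happen to contain the same three columns, while `l_cols` keeps only loudness and liveness.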
Task
Limit the columns in bowie to only include track and speechiness
arrange()
Ascending order by default:
# A tibble: 276 × 2
track speechiness
<chr> <dbl>
1 Where Are We Now? 0.0228
2 Outside 0.0238
3 Everyone Says 'Hi' 0.0241
4 Days 0.0241
5 Something in the Air 0.0245
6 Survive 0.0254
7 Shining Star (Makin' My Love) 0.0261
8 Loving The Alien 0.0262
9 Love Is Lost 0.0264
10 Tonight 0.0265
# ℹ 266 more rows
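A sort like the one above can be sketched on made-up data. The object `songs` and its values are invented, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Made-up data, for illustration only
songs <- data.frame(
  track       = c("A", "B", "C"),
  speechiness = c(0.05, 0.02, 0.87)
)

# arrange() sorts rows from smallest to largest by default
ascending <- songs |> arrange(speechiness)
ascending$track
```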
arrange()
Use desc() for descending order (can also use a - sign):
# A tibble: 276 × 2
track speechiness
<chr> <dbl>
1 Please Mr. Gravedigger 0.87
2 Segue - Nathan Adler - Version #1 0.292
3 Chant of the Ever Circling Skeletal Family 0.281
4 Neighborhood Threat 0.213
5 What in the World 0.207
6 Somebody up There Likes Me 0.203
7 Join The Gang 0.196
8 Right 0.17
9 Battle for Britain (The Letter) 0.163
10 Black Tie White Noise 0.151
# ℹ 266 more rows
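Both ways of sorting in descending order can be sketched on made-up data. The object `songs` and its values are invented, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Made-up data, for illustration only
songs <- data.frame(
  track       = c("A", "B", "C"),
  speechiness = c(0.05, 0.02, 0.87)
)

# desc() reverses the sort order; it works for numbers and text
with_desc <- songs |> arrange(desc(speechiness))

# A minus sign gives the same order, but only for numeric variables
with_minus <- songs |> arrange(-speechiness)
```

Because desc() also handles character variables, it is the safer habit of the two.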
arrange()
Arrange by multiple variables:
# A tibble: 276 × 3
track year speechiness
<chr> <dbl> <dbl>
1 There Is A Happy Land 1967 0.0275
2 When I Live My Dream 1967 0.0294
3 Love You Till Tuesday 1967 0.0338
4 Sell Me A Coat 1967 0.0462
5 She's Got Medals 1967 0.0575
6 Maid Of Bond Street 1967 0.0582
7 Come And Buy My Toys 1967 0.0604
8 We Are Hungry Men 1967 0.0639
9 Rubber Band 1967 0.0725
10 Uncle Arthur 1967 0.0783
# ℹ 266 more rows
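Sorting by more than one variable can be sketched as follows: rows are ordered by the first variable, and ties are broken by the second. The data frame `songs` is made up, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Made-up data, for illustration only
songs <- data.frame(
  track       = c("A", "B", "C"),
  year        = c(1977, 1967, 1967),
  speechiness = c(0.03, 0.08, 0.02)
)

# Sort by year first, then by speechiness within each year
by_both <- songs |> arrange(year, speechiness)
by_both$track
```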
Task
Find the 3 most popular David Bowie songs (according to Spotify)
Note: Browse the environment pane for the variable that measures this (it’s at the bottom)
mutate()
Duration in minutes?
# A tibble: 276 × 3
track duration_sec duration_min
<chr> <dbl> <dbl>
1 Come And Buy My Toys 129 2.15
2 When I Live My Dream 204 3.4
3 Love You Till Tuesday 194 3.23
4 Uncle Arthur 131 2.18
5 Rubber Band 139 2.32
6 Sell Me A Coat 182 3.03
7 There Is A Happy Land 197 3.28
8 She's Got Medals 146 2.43
9 We Are Hungry Men 180 3
10 Maid Of Bond Street 105 1.75
# ℹ 266 more rows
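The conversion shown above is consistent with dividing duration_sec by 60. Here is a sketch using the first two rows of that output as a stand-in data frame; the object name `songs` is an assumption, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Stand-in data: the first two rows of the output above
songs <- data.frame(
  track        = c("Come And Buy My Toys", "When I Live My Dream"),
  duration_sec = c(129, 204)
)

# mutate() adds a new column computed from an existing one
with_minutes <- songs |>
  mutate(duration_min = duration_sec / 60)

with_minutes$duration_min
```

mutate() does not change `songs` itself; to keep the new column, assign the result to an object as done here.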
mutate()
Create multiple variables at once:
# A tibble: 276 × 4
track album duration_min duration_hr
<chr> <chr> <dbl> <dbl>
1 Come And Buy My Toys David Bowie 2.15 0.0358
2 When I Live My Dream David Bowie 3.4 0.0567
3 Love You Till Tuesday David Bowie 3.23 0.0539
4 Uncle Arthur David Bowie 2.18 0.0364
5 Rubber Band David Bowie 2.32 0.0386
6 Sell Me A Coat David Bowie 3.03 0.0506
7 There Is A Happy Land David Bowie 3.28 0.0547
8 She's Got Medals David Bowie 2.43 0.0406
9 We Are Hungry Men David Bowie 3 0.05
10 Maid Of Bond Street David Bowie 1.75 0.0292
# ℹ 266 more rows
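Creating several variables in one mutate() call can be sketched as below. The formulas are inferred from the output above (minutes divided by 60 matches the hours column); the stand-in data frame `songs` is an assumption, and the sketch assumes {dplyr} is installed.

```r
library(dplyr)

# Stand-in data: the first two rows of the output above
songs <- data.frame(
  track        = c("Come And Buy My Toys", "When I Live My Dream"),
  duration_sec = c(129, 204)
)

# Separate new variables with commas; later ones can use earlier ones
with_both <- songs |>
  mutate(
    duration_min = duration_sec / 60,
    duration_hr  = duration_min / 60
  )
```

Because duration_hr is defined after duration_min, it can refer to that new column within the same call.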
Task
Create a new variable in bowie that multiplies danceability by 100 (you pick the name of the new variable)
Then limit your columns to track, danceability, and your new variable.
Note: The symbol for multiplication in R is *
Try the extended exercises!
Come next week!
Email askdata@duke.edu