Introduction to RStudio, Importing Data, and Running Code
September 11, 2025
01. What is R?
02. What is RStudio?
03. What is an RStudio project?
04. What is a coding notebook?
05. How do I write and run code in a coding notebook?
06. What is an object and how do I create one?
07. What are R packages and how do I load them?
08. How do I import data?
09. How can I get to know my variables?
10. How can I transform my data?
Programming language used for statistical computing and graphics
Free and open source
Base functionality + thousands of extensions
“RStudio gives you a way to talk to your computer. R gives you a language to speak in.”
Hands-On Programming with R
Many thanks to Julia Silge at Posit
File directory where you can store your data, R scripts/coding notebooks, and output for a given project
Keeps everything together in one place
Document type | File extension | Description |
---|---|---|
R script | .R | Plain text file |
R Markdown | .Rmd | Combines text, code, and results |
Quarto | .qmd | Fancier version of R Markdown |
Run the current code chunk: Cmd + Shift + Enter (Mac) / Ctrl + Shift + Enter (Windows)
Note: To create a new code chunk, click the green +C button at the top of the Source pane and select R (or use the keyboard shortcut Cmd + Option + I / Ctrl + Alt + I)
Task
In your Quarto notebook, write and run code that finds the square root of 60
Note: the square root function in R is sqrt()
Hint
Run a code chunk by clicking the green “play” button or Cmd / Ctrl + Shift + Enter
Create an object with the assignment operator <-
object_name <- value
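A minimal sketch of the assignment pattern above. The object names `side_length` and `area` are made up for illustration.

```r
# Store a value in an object with the assignment operator
side_length <- sqrt(16)

# Typing the object's name prints its value
side_length

# Objects can be reused in later calculations
area <- side_length * side_length
area
```

Once created, an object appears in the Environment pane in RStudio and keeps its value until you overwrite or remove it.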
Task | base R | {tidyverse} |
---|---|---|
Keep rows where x > 1 | data[data$x > 1, ] | data |> filter(x > 1) |
Keep columns x and y | data[ , c("x", "y")] | data |> select(x, y) |
First install using the function install.packages()
Note: This is one of the few times you should code directly in the console – do not include this function in your coding notebook
Then load using the function library()
Note: It is generally good practice to load all of your needed packages at the beginning of your script or coding notebook
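The two steps above might look like this for the {tidyverse}, assuming you have an internet connection for the one-time install. The install line is commented out so that it is not re-run from a notebook.

```r
# Run once, directly in the console (not in your notebook):
# install.packages("tidyverse")

# Run at the top of every script or notebook that uses the package
library(tidyverse)
```

Installing downloads the package to your computer once. Loading with `library()` has to be repeated in every new R session.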
Import a CSV file with read_csv()
Inside the read_csv() parentheses, specify where in your project folder the file is saved, and what the file is called
summary() for numeric variables
unique() for character variables
[1] "THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY"
[2] "THE TORTURED POETS DEPARTMENT"
[3] "1989 (Taylor's Version) [Deluxe]"
[4] "1989 (Taylor's Version)"
[5] "Speak Now (Taylor's Version)"
[6] "Midnights (The Til Dawn Edition)"
[7] "Midnights (3am Edition)"
[8] "Midnights"
[9] "Red (Taylor's Version)"
[10] "Fearless (Taylor's Version)"
[11] "evermore (deluxe version)"
[12] "evermore"
[13] "folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition]"
[14] "folklore (deluxe version)"
[15] "folklore"
[16] "Lover"
[17] "reputation"
[18] "reputation Stadium Tour Surprise Song Playlist"
[19] "1989 (Deluxe)"
[20] "1989"
[21] "Red (Deluxe Edition)"
[22] "Red"
[23] "Speak Now World Tour Live"
[24] "Speak Now"
[25] "Speak Now (Deluxe Package)"
[26] "Fearless (Platinum Edition)"
[27] "Fearless (International Version)"
[28] "Live From Clear Channel Stripped 2008"
[29] "Taylor Swift (Deluxe Edition)"
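The import and inspection steps above can be sketched as follows. The data here is a tiny made-up stand-in written to a temporary file so the example runs anywhere. The object name `songs`, the column values, and the path `data/my_file.csv` mentioned in the comment are all hypothetical.

```r
library(readr)

# Write a tiny made-up CSV to a temporary location so the example can run.
# In a real project you would instead point to a file in your project
# folder, e.g. read_csv("data/my_file.csv") (hypothetical path).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("name,album,energy",
    "Song A,Album 1,0.4",
    "Song B,Album 1,0.7",
    "Song C,Album 2,0.5"),
  csv_path
)

# Import the file and store it as an object
songs <- read_csv(csv_path, show_col_types = FALSE)

# Numeric variable: minimum, quartiles, mean, maximum
summary(songs$energy)

# Character variable: the distinct values it contains
unique(songs$album)
```

The `$` operator pulls a single column out of the data frame so that `summary()` and `unique()` can work on it.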
Task | {dplyr} verb |
---|---|
Subset rows | filter() |
Subset columns | select() |
Sort | arrange() |
Create a new variable | mutate() |
Note: {dplyr} is one of the core {tidyverse} packages
The pipe: |> or %>%
Links together lines of {tidyverse} code
Means “and then”
Keyboard shortcut: Cmd/Ctrl + Shift + M
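To illustrate the "and then" reading, here is a small sketch that writes the same two steps with and without the pipe. The `toy_songs` data and its values are made up for illustration.

```r
library(dplyr)

# Made-up example data
toy_songs <- tibble(
  name   = c("Song A", "Song B", "Song C"),
  energy = c(0.4, 0.7, 0.5)
)

# Without the pipe: functions are nested, so you read from the inside out
nested <- arrange(filter(toy_songs, energy > 0.45), energy)

# With the pipe: start with the data, and then filter, and then sort
piped <- toy_songs |>
  filter(energy > 0.45) |>
  arrange(energy)

piped
```

Both versions return the same rows. The piped version lists the steps in the order they happen.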
filter()
== is a conditional equals sign (tests equivalence)
filter()
%in% tests for equivalence against a vector of values
c( ) creates a vector (collection of values)
filter()
!= means “does not equal”
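A short sketch of the three comparison operators above inside filter(). The `toy_songs` data is made up for illustration.

```r
library(dplyr)

# Made-up example data
toy_songs <- tibble(
  name  = c("Song A", "Song B", "Song C", "Song D"),
  album = c("Album 1", "Album 1", "Album 2", "Album 3")
)

# == keeps rows where album is exactly "Album 1"
one_album <- toy_songs |>
  filter(album == "Album 1")

# %in% keeps rows where album matches any value in the vector
two_albums <- toy_songs |>
  filter(album %in% c("Album 1", "Album 3"))

# != keeps rows where album is anything except "Album 1"
other_albums <- toy_songs |>
  filter(album != "Album 1")
```

Note that a single = inside filter() is not a comparison. Use == when testing for equivalence.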
select()
Limit columns:
# A tibble: 582 × 3
name energy loudness
<chr> <dbl> <dbl>
1 Fortnight (feat. Post Malone) 0.386 -11.0
2 The Tortured Poets Department 0.428 -8.44
3 My Boy Only Breaks His Favorite Toys 0.563 -7.36
4 Down Bad 0.366 -10.4
5 So Long, London 0.533 -11.4
6 But Daddy I Love Him 0.72 -7.68
7 Fresh Out The Slammer 0.483 -9.39
8 Florida!!! (feat. Florence + The Machine) 0.573 -7.12
9 Guilty as Sin? 0.428 -8.37
10 Who’s Afraid of Little Old Me? 0.338 -10.6
# ℹ 572 more rows
select()
There are shortcuts for variables with patterns:
# A tibble: 582 × 5
acousticness instrumentalness liveness loudness speechiness
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.502 0.0000153 0.0961 -11.0 0.0308
2 0.0483 0 0.126 -8.44 0.0255
3 0.137 0 0.302 -7.36 0.0269
4 0.56 0.000001 0.0946 -10.4 0.0748
5 0.73 0.00264 0.0816 -11.4 0.322
6 0.384 0 0.135 -7.68 0.104
7 0.624 0 0.111 -9.39 0.0399
8 0.178 0 0.309 -7.12 0.138
9 0.607 0 0.0921 -8.37 0.0261
10 0.315 0 0.106 -10.6 0.048
# ℹ 572 more rows
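The code behind the two select() outputs above is not shown. The sketch below uses made-up data to illustrate calls that would give columns of that shape: naming columns directly, and the `ends_with()` helper. That the second output came from `ends_with("ness")` is an assumption based on its column names.

```r
library(dplyr)

# Made-up example data with a few of the same column names
toy_songs <- tibble(
  name        = c("Song A", "Song B"),
  energy      = c(0.4, 0.7),
  loudness    = c(-11.0, -8.4),
  speechiness = c(0.03, 0.05),
  duration_ms = c(200000, 250000)
)

# Limit columns by naming them
by_name <- toy_songs |>
  select(name, energy, loudness)

# Shortcut: keep every column whose name ends in "ness"
by_pattern <- toy_songs |>
  select(ends_with("ness"))
```

Related helpers include `starts_with()` and `contains()`, which match on the beginning or any part of a column name.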
arrange()
Ascending order by default:
# A tibble: 582 × 2
name speechiness
<chr> <dbl>
1 Teardrops On My Guitar - Radio Single Remix 0.0231
2 Teardrops On My Guitar - Radio Single Remix 0.0234
3 SuperStar 0.0239
4 All Too Well 0.0243
5 All Too Well 0.0243
6 Invisible 0.0243
7 Stay Stay Stay 0.0245
8 Stay Stay Stay 0.0245
9 Invisible 0.0246
10 Stay Beautiful 0.0246
# ℹ 572 more rows
arrange()
Use desc()
for descending order:
# A tibble: 582 × 2
name speechiness
<chr> <dbl>
1 I Wish You Would - Voice Memo 0.912
2 Blank Space - Voice Memo 0.721
3 I Know Places - Voice Memo 0.589
4 I Forgot That You Existed 0.519
5 Vigilante Shit 0.387
6 Vigilante Shit 0.387
7 Vigilante Shit 0.387
8 So Long, London 0.322
9 So Long, London 0.322
10 Glitch 0.259
# ℹ 572 more rows
arrange()
Arrange by multiple variables:
# A tibble: 582 × 3
name album speechiness
<chr> <chr> <dbl>
1 Bad Blood 1989 0.181
2 Shake It Off 1989 0.165
3 Wildest Dreams 1989 0.0741
4 I Know Places 1989 0.0661
5 Blank Space 1989 0.0646
6 I Wish You Would 1989 0.0511
7 How You Get The Girl 1989 0.0492
8 Style 1989 0.0382
9 Out Of The Woods 1989 0.0372
10 Clean 1989 0.035
# ℹ 572 more rows
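The three arrange() patterns above can be sketched on made-up data as follows. The exact calls used for the slides are not shown, so treat these as illustrations rather than the original code.

```r
library(dplyr)

# Made-up example data
toy_songs <- tibble(
  name        = c("Song A", "Song B", "Song C", "Song D"),
  album       = c("Album 2", "Album 1", "Album 1", "Album 2"),
  speechiness = c(0.05, 0.03, 0.20, 0.10)
)

# Ascending order by default
lowest_first <- toy_songs |>
  arrange(speechiness)

# desc() reverses the order
highest_first <- toy_songs |>
  arrange(desc(speechiness))

# Multiple variables: sort by album first, then by speechiness
# (highest first) within each album
within_album <- toy_songs |>
  arrange(album, desc(speechiness))
```

When several variables are given, later variables only break ties in the earlier ones.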
mutate()
Song duration in seconds (rather than milliseconds)?
# A tibble: 582 × 3
name duration_ms duration_sec
<chr> <dbl> <dbl>
1 Fortnight (feat. Post Malone) 228965 229.
2 The Tortured Poets Department 293048 293.
3 My Boy Only Breaks His Favorite Toys 203801 204.
4 Down Bad 261228 261.
5 So Long, London 262974 263.
6 But Daddy I Love Him 340428 340.
7 Fresh Out The Slammer 210789 211.
8 Florida!!! (feat. Florence + The Machine) 215463 215.
9 Guilty as Sin? 254365 254.
10 Who’s Afraid of Little Old Me? 334084 334.
# ℹ 572 more rows
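The code for the output above is not shown. A call along these lines would produce it, sketched here on a two-row stand-in built from the first two rows of that output. The object name `tswift_sample` is made up.

```r
library(dplyr)

# Two rows copied from the output above, as a stand-in for the full data
tswift_sample <- tibble(
  name        = c("Fortnight (feat. Post Malone)",
                  "The Tortured Poets Department"),
  duration_ms = c(228965, 293048)
)

# Create duration_sec by converting milliseconds to seconds,
# then keep only the columns of interest
durations <- tswift_sample |>
  mutate(duration_sec = duration_ms / 1000) |>
  select(name, duration_ms, duration_sec)

durations
```

mutate() adds the new column to the end of the data. The original object is unchanged unless the result is assigned back with <-.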
mutate()
Create multiple variables at once:
tswift |>
  mutate(duration_sec = duration_ms / 1000,
         duration_min = duration_sec / 60) |>
  select(name, album, duration_sec, duration_min)
# A tibble: 582 × 4
name album duration_sec duration_min
<chr> <chr> <dbl> <dbl>
1 Fortnight (feat. Post Malone) THE TORT… 229. 3.82
2 The Tortured Poets Department THE TORT… 293. 4.88
3 My Boy Only Breaks His Favorite Toys THE TORT… 204. 3.40
4 Down Bad THE TORT… 261. 4.35
5 So Long, London THE TORT… 263. 4.38
6 But Daddy I Love Him THE TORT… 340. 5.67
7 Fresh Out The Slammer THE TORT… 211. 3.51
8 Florida!!! (feat. Florence + The Machine) THE TORT… 215. 3.59
9 Guilty as Sin? THE TORT… 254. 4.24
10 Who’s Afraid of Little Old Me? THE TORT… 334. 5.57
# ℹ 572 more rows
Task
Create a new variable in tswift that multiplies danceability by 100 (you pick the name of the new variable)
Then limit your columns to name, danceability, and your new variable.
Note: The symbol for multiplication in R is *
Try the extended exercises!
Come next week!
Email askdata@duke.edu