Learn R Workshop Part I

Introduction to RStudio, Importing Data, and Running Code

McCall Pitcher
Center for Data and Visualization Sciences

September 11, 2025

Questions we will answer today


01. What is R?

02. What is RStudio?

03. What is an RStudio project?

04. What is a coding notebook?

05. How do I write and run code in a coding notebook?

06. What is an object and how do I create one?

07. What are R packages and how do I load them?

08. How do I import data?

09. How can I get to know my variables?

10. How can I transform my data?

01.
What is R?

What is R?

  • Programming language used for statistical computing and graphics

  • Free and open source

  • Base functionality + thousands of extensions

02.
What is RStudio?

What is RStudio?


“RStudio gives you a way to talk to your computer. R gives you a language to speak in.”

Hands-On Programming with R

Many thanks to Julia Silge at Posit

03.
What is an RStudio project?

What is an RStudio project?

  • File directory where you can store your data, R scripts/coding notebooks, and output for a given project

  • Keeps everything together in one place

04.
What is a coding notebook?

What is a coding notebook?

  • Code written directly in the Console does not get saved!
  • You should document your code so it’s reproducible
Document type   Extension   Description
R script        .R          Plain text file
R Markdown      .Rmd        Combines text, code, and results
Quarto          .qmd        Fancier version of R Markdown
  • Today we will use Quarto!

05.
How do I write and run code in a coding notebook?

How do I write and run code in a coding notebook?

  • Place code inside a “code chunk”
  • Run by pressing the green “play” button in the corner, or hit Cmd + Shift + Enter / Ctrl + Shift + Enter


Note: To create a new code chunk, click the green +C button at the top of the Source pane and select R (or use the keyboard shortcut Cmd + Option + I / Ctrl + Alt + I )
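
As a small illustration, the lines below are the kind of code that could go inside a code chunk. The numbers are made up for this example. Running the chunk prints each result directly beneath it.

```r
# Basic arithmetic: R works like a calculator
2 + 3

# Functions take their input inside parentheses
round(3.14159, digits = 2)
```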

05. Your Turn!

01:00

Task

In your Quarto notebook, write and run code that finds the square root of 60

Note: the square root function in R is sqrt()

Hint

Run a code chunk by clicking the green “play” button or Cmd / Ctrl + Shift + Enter

06.
What is an object and how do I create one?

What is an object and how do I create one?

  • Something you store in R
  • Can be a single value, a collection of values, or something even more complex like a function or a plot
  • Create using the assignment operator <-
    • The object name on the left gets its value from whatever you place on the right: object_name <- value

What is an object and how do I create one?

Store objects

# McCall's favorite number
fav_number <- 11

# McCall's favorite word
fav_word <- "quintessence"

Call objects

fav_number
[1] 11
fav_word 
[1] "quintessence"
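
Objects can also hold a collection of values or the result of a calculation. A short sketch, using made-up object names and values that are not part of the workshop data:

```r
# Store a collection of values (a vector) with c()
quiz_scores <- c(8, 10, 9)

# Store the result of a calculation
average_score <- mean(quiz_scores)

# Call the object to print it
average_score
```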

06. Your Turn!

00:40

Task

Create an object called my_age that stores your age, then “call” the object so it prints to the Console.

Hint

# store favorite number
fav_number <- 11

# call favorite number
fav_number
[1] 11

07.
What are R packages and how do I load them?

What are R packages and how do I load them?

  • R has a lot of functionality built-in, often referred to as “base R”
  • However, R is set up to allow users to write packages that extend this functionality

What are R packages and how do I load them?

  • {tidyverse} is a widely used collection of R packages designed to streamline data manipulation, exploration, and visualization
  • Many find {tidyverse} syntax to be more intuitive than base R

What are R packages and how do I load them?


Task                    base R                  {tidyverse}
Keep rows where x > 1   data[data$x > 1, ]      data |> filter(x > 1)
Keep columns x and y    data[ , c("x", "y")]    data |> select(x, y)
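
To see that the two styles give the same result, here is a minimal sketch on a made-up data frame (the values and the extra column z are assumptions for illustration only). It assumes the {dplyr} package, part of {tidyverse}, is installed.

```r
library(dplyr)

# Made-up data frame for illustration
data <- data.frame(
  x = c(0, 2, 3),
  y = c("a", "b", "c"),
  z = c(TRUE, FALSE, TRUE)
)

# Keep rows where x > 1
base_rows <- data[data$x > 1, ]
tidy_rows <- data |> filter(x > 1)

# Keep columns x and y
base_cols <- data[ , c("x", "y")]
tidy_cols <- data |> select(x, y)
```

Both versions keep the same two rows and the same two columns; the difference is only in how the request is written.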

What are R packages and how do I load them?

First install using the function install.packages()

Note: This is one of the few times you should code directly in the console – do not include this function in your coding notebook


Then load using the function library()

Note: It is generally good practice to load all of your needed packages at the beginning of your script or coding notebook
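
A minimal sketch of the two steps, using {tidyverse} as the example package. The install line is commented out here because it only needs to run once, in the Console.

```r
# Run once in the Console (left commented out on purpose):
# install.packages("tidyverse")

# Run at the top of your notebook in every new R session
library(tidyverse)
```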

08.
How do I import data?

How do I import data?

  • Today you will learn how to import a comma-separated values (CSV) file, one of the most common plain-text data formats
  • Rectangular data (rows and columns) are stored in a tabular data structure called a data frame
  • Data include information about Taylor Swift’s Spotify music (downloaded here)

How do I import data?

  • There are multiple ways to load a CSV. We will use the {tidyverse} function read_csv()
  • Inside the read_csv() parentheses, specify where in your project folder the file is saved and what the file is called
# import data
tswift <- read_csv("data/taylor_swift_spotify.csv")
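
The same pattern can be tried without the workshop file. The sketch below writes a tiny, made-up CSV to a temporary file and reads it back; the file contents, column names, and the object name songs are assumptions for illustration only.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,popularity", "Song A,70", "Song B,55"), csv_path)

# Import it and store the result as an object
# (show_col_types = FALSE hides the column-type message)
songs <- read_csv(csv_path, show_col_types = FALSE)

# Call the object to print the data frame
songs
```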

09.
How can I get to know my variables?

How can I get to know my variables?

summary() for numeric

summary(tswift$popularity)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   45.00   62.00   57.86   70.00   93.00 

unique() for character

unique(tswift$album)
 [1] "THE TORTURED POETS DEPARTMENT: THE ANTHOLOGY"                                       
 [2] "THE TORTURED POETS DEPARTMENT"                                                      
 [3] "1989 (Taylor's Version) [Deluxe]"                                                   
 [4] "1989 (Taylor's Version)"                                                            
 [5] "Speak Now (Taylor's Version)"                                                       
 [6] "Midnights (The Til Dawn Edition)"                                                   
 [7] "Midnights (3am Edition)"                                                            
 [8] "Midnights"                                                                          
 [9] "Red (Taylor's Version)"                                                             
[10] "Fearless (Taylor's Version)"                                                        
[11] "evermore (deluxe version)"                                                          
[12] "evermore"                                                                           
[13] "folklore: the long pond studio sessions (from the Disney+ special) [deluxe edition]"
[14] "folklore (deluxe version)"                                                          
[15] "folklore"                                                                           
[16] "Lover"                                                                              
[17] "reputation"                                                                         
[18] "reputation Stadium Tour Surprise Song Playlist"                                     
[19] "1989 (Deluxe)"                                                                      
[20] "1989"                                                                               
[21] "Red (Deluxe Edition)"                                                               
[22] "Red"                                                                                
[23] "Speak Now World Tour Live"                                                          
[24] "Speak Now"                                                                          
[25] "Speak Now (Deluxe Package)"                                                         
[26] "Fearless (Platinum Edition)"                                                        
[27] "Fearless (International Version)"                                                   
[28] "Live From Clear Channel Stripped 2008"                                              
[29] "Taylor Swift (Deluxe Edition)"                                                      
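
The same two functions can be tried on a small, made-up data frame (the object name and values below are assumptions for illustration). Wrapping unique() in length() counts the distinct values, which is handy when the list is long.

```r
# Made-up data frame for illustration
songs <- data.frame(
  album = c("Red", "Red", "Lover", "1989"),
  popularity = c(60, 70, 80, 90)
)

# Numeric variable: minimum, quartiles, median, mean, maximum
summary(songs$popularity)

# Character variable: the distinct values, and how many there are
unique(songs$album)
length(unique(songs$album))
```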

09. Your Turn!

00:40

Task

Using summary(), determine whether the mean or median of danceability is greater.

Hint

# info about numeric var
summary(tswift$popularity)

10.
What are some ways I can transform my data?

What are some ways I can transform my data?


Task                    {dplyr} verb
Subset rows             filter()
Subset columns          select()
Sort                    arrange()
Create a new variable   mutate()


Note: {dplyr} is one of the core {tidyverse} packages

The pipe operator


|>     or    %>%


  • Links together steps of {tidyverse} code by passing the result on its left into the function on its right

  • Means “and then”

  • Cmd/Ctrl + Shift + M
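
A minimal sketch of what "and then" means, using made-up numbers and base R functions only. It assumes R 4.1 or later, where the |> pipe is available.

```r
# Without the pipe: read from the inside out
sum(sqrt(c(1, 4, 9)))

# With the pipe: read top to bottom
# "take these numbers, and then take square roots, and then add them up"
c(1, 4, 9) |>
  sqrt() |>
  sum()
```

Both versions do the same calculation; the piped version lists the steps in the order they happen.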

10.1
Subset rows with filter()

Subset rows with filter()

  • Limit to one album?
  • == tests for equality (note the two equals signs)
tswift |> 
  filter(album == "Red")

Subset rows with filter()

  • Limit to two (or more) albums?
  • %in% tests whether a value matches any element of a vector
  • c( ) creates a vector (collection of values)
tswift |> 
  filter(album %in% c("Red", "reputation"))

Subset rows with filter()

  • Limit based on multiple conditions?
  • Conditions separated by commas must all be true
  • != means “does not equal”
tswift |> 
  filter(album != "Midnights",
         danceability > .7)
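
To check how multiple conditions combine, here is a sketch on a made-up data frame (the object name songs and its values are assumptions, not the workshop data). It compares the comma form with the explicit & ("and") operator.

```r
library(dplyr)

# Made-up data frame for illustration
songs <- tibble(
  name = c("Song A", "Song B", "Song C", "Song D"),
  album = c("Red", "Lover", "Midnights", "Red"),
  danceability = c(0.80, 0.75, 0.90, 0.50)
)

# Comma-separated conditions: a row is kept only if all are true
both <- songs |>
  filter(album != "Midnights",
         danceability > .7)

# The same filter written with an explicit & ("and")
both_and <- songs |>
  filter(album != "Midnights" & danceability > .7)

both
```

Song C is dropped because it is on "Midnights", and Song D is dropped because its danceability is not above .7.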

10.1 Your Turn!

01:00

Task

Filter tswift to only contain songs that are NOT on the “Lover” album

Hint

# exclude one album
tswift |> 
  filter(album != "Midnights")

10.2
Subset columns with select()

Subset columns with select()

Limit columns:

# keep only some variables
tswift |> 
  select(name, energy, loudness)
# A tibble: 582 × 3
   name                                      energy loudness
   <chr>                                      <dbl>    <dbl>
 1 Fortnight (feat. Post Malone)              0.386   -11.0 
 2 The Tortured Poets Department              0.428    -8.44
 3 My Boy Only Breaks His Favorite Toys       0.563    -7.36
 4 Down Bad                                   0.366   -10.4 
 5 So Long, London                            0.533   -11.4 
 6 But Daddy I Love Him                       0.72     -7.68
 7 Fresh Out The Slammer                      0.483    -9.39
 8 Florida!!! (feat. Florence + The Machine)  0.573    -7.12
 9 Guilty as Sin?                             0.428    -8.37
10 Who’s Afraid of Little Old Me?             0.338   -10.6 
# ℹ 572 more rows

Subset columns with select()

There are shortcuts for variables with patterns:

tswift |> 
  select(ends_with("ness"))
# A tibble: 582 × 5
   acousticness instrumentalness liveness loudness speechiness
          <dbl>            <dbl>    <dbl>    <dbl>       <dbl>
 1       0.502         0.0000153   0.0961   -11.0       0.0308
 2       0.0483        0           0.126     -8.44      0.0255
 3       0.137         0           0.302     -7.36      0.0269
 4       0.56          0.000001    0.0946   -10.4       0.0748
 5       0.73          0.00264     0.0816   -11.4       0.322 
 6       0.384         0           0.135     -7.68      0.104 
 7       0.624         0           0.111     -9.39      0.0399
 8       0.178         0           0.309     -7.12      0.138 
 9       0.607         0           0.0921    -8.37      0.0261
10       0.315         0           0.106    -10.6       0.048 
# ℹ 572 more rows
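
Two other selection shortcuts, sketched on a made-up data frame (the object name and values are assumptions for illustration): starts_with() matches on the beginning of a column name, and a minus sign drops a column instead of keeping it.

```r
library(dplyr)

# Made-up data frame for illustration
songs <- tibble(
  name = c("Song A", "Song B"),
  duration_ms = c(200000, 180000),
  loudness = c(-7.5, -9.1),
  liveness = c(0.10, 0.30)
)

# Keep columns whose names start with "l"
l_cols <- songs |>
  select(starts_with("l"))

# Keep everything except duration_ms
no_duration <- songs |>
  select(-duration_ms)
```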

10.2 Your Turn!

00:40

Task

Limit the columns in tswift to only include name and speechiness

Hint

# keep only some variables
tswift |> 
  select(name, energy, loudness)

10.3
Sort rows with arrange()

Sort rows with arrange()

Ascending order by default:

tswift |> 
  select(name, speechiness) |> 
  arrange(speechiness)
# A tibble: 582 × 2
   name                                        speechiness
   <chr>                                             <dbl>
 1 Teardrops On My Guitar - Radio Single Remix      0.0231
 2 Teardrops On My Guitar - Radio Single Remix      0.0234
 3 SuperStar                                        0.0239
 4 All Too Well                                     0.0243
 5 All Too Well                                     0.0243
 6 Invisible                                        0.0243
 7 Stay Stay Stay                                   0.0245
 8 Stay Stay Stay                                   0.0245
 9 Invisible                                        0.0246
10 Stay Beautiful                                   0.0246
# ℹ 572 more rows

Sort rows with arrange()

Use desc() for descending order:

tswift |> 
  select(name, speechiness) |> 
  arrange(desc(speechiness))
# A tibble: 582 × 2
   name                          speechiness
   <chr>                               <dbl>
 1 I Wish You Would - Voice Memo       0.912
 2 Blank Space - Voice Memo            0.721
 3 I Know Places - Voice Memo          0.589
 4 I Forgot That You Existed           0.519
 5 Vigilante Shit                      0.387
 6 Vigilante Shit                      0.387
 7 Vigilante Shit                      0.387
 8 So Long, London                     0.322
 9 So Long, London                     0.322
10 Glitch                              0.259
# ℹ 572 more rows

Sort rows with arrange()

Arrange by multiple variables:

tswift |> 
  select(name, album, speechiness) |> 
  arrange(album, desc(speechiness))
# A tibble: 582 × 3
   name                 album speechiness
   <chr>                <chr>       <dbl>
 1 Bad Blood            1989       0.181 
 2 Shake It Off         1989       0.165 
 3 Wildest Dreams       1989       0.0741
 4 I Know Places        1989       0.0661
 5 Blank Space          1989       0.0646
 6 I Wish You Would     1989       0.0511
 7 How You Get The Girl 1989       0.0492
 8 Style                1989       0.0382
 9 Out Of The Woods     1989       0.0372
10 Clean                1989       0.035 
# ℹ 572 more rows
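
Sorting by several variables is easier to follow on a few rows. In this sketch (made-up data, hypothetical object names), rows are grouped by album first, and speechiness only decides the order within each album.

```r
library(dplyr)

# Made-up data frame for illustration
songs <- tibble(
  name = c("Song A", "Song B", "Song C", "Song D"),
  album = c("Red", "1989", "Red", "1989"),
  speechiness = c(0.03, 0.18, 0.05, 0.04)
)

# Sort by album, then by speechiness (highest first) within each album
sorted <- songs |>
  arrange(album, desc(speechiness))

sorted
```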

10.3 Your Turn!

00:40

Task

Find the 3 most popular Taylor Swift songs (according to Spotify)

Note: Browse the environment pane for the variable that measures this (it’s near the bottom)

Hint

# sort descending
tswift |> 
  arrange(desc(duration_ms))

10.4
Creating new variables with mutate()

Creating new variables with mutate()

Song duration in seconds (rather than milliseconds)?

tswift |> 
  mutate(duration_sec = duration_ms / 1000) |> 
  select(name, duration_ms, duration_sec)
# A tibble: 582 × 3
   name                                      duration_ms duration_sec
   <chr>                                           <dbl>        <dbl>
 1 Fortnight (feat. Post Malone)                  228965         229.
 2 The Tortured Poets Department                  293048         293.
 3 My Boy Only Breaks His Favorite Toys           203801         204.
 4 Down Bad                                       261228         261.
 5 So Long, London                                262974         263.
 6 But Daddy I Love Him                           340428         340.
 7 Fresh Out The Slammer                          210789         211.
 8 Florida!!! (feat. Florence + The Machine)      215463         215.
 9 Guilty as Sin?                                 254365         254.
10 Who’s Afraid of Little Old Me?                 334084         334.
# ℹ 572 more rows

Creating new variables with mutate()

Create multiple variables at once:

tswift |> 
  mutate(duration_sec = duration_ms  / 1000,
         duration_min = duration_sec / 60) |>  
  select(name, album, duration_sec, duration_min) 
# A tibble: 582 × 4
   name                                      album     duration_sec duration_min
   <chr>                                     <chr>            <dbl>        <dbl>
 1 Fortnight (feat. Post Malone)             THE TORT…         229.         3.82
 2 The Tortured Poets Department             THE TORT…         293.         4.88
 3 My Boy Only Breaks His Favorite Toys      THE TORT…         204.         3.40
 4 Down Bad                                  THE TORT…         261.         4.35
 5 So Long, London                           THE TORT…         263.         4.38
 6 But Daddy I Love Him                      THE TORT…         340.         5.67
 7 Fresh Out The Slammer                     THE TORT…         211.         3.51
 8 Florida!!! (feat. Florence + The Machine) THE TORT…         215.         3.59
 9 Guilty as Sin?                            THE TORT…         254.         4.24
10 Who’s Afraid of Little Old Me?            THE TORT…         334.         5.57
# ℹ 572 more rows
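
One detail worth noting: mutate() returns a new data frame and leaves the original untouched unless the result is assigned to an object. A minimal sketch on made-up data (the object names songs and songs_min are assumptions for illustration):

```r
library(dplyr)

# Made-up data frame for illustration
songs <- tibble(
  name = c("Song A", "Song B"),
  duration_ms = c(180000, 240000)
)

# Without assignment, the result is only printed; songs is unchanged
songs |>
  mutate(duration_min = duration_ms / 1000 / 60)

# Assign the result to an object to keep the new column
songs_min <- songs |>
  mutate(duration_min = duration_ms / 1000 / 60)
```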

10.4 Your Turn!

01:30

Task

Create a new variable in tswift that multiplies danceability by 100 (you pick the name of the new variable)

Then limit your columns to name, danceability, and your new variable.

Note: The symbol for multiplication in R is *

Hint

# new variable + select
tswift |> 
  mutate(duration_sec = duration_ms / 1000) |>
  select(name, duration_ms, duration_sec)

Resources for further learning