Learn R:

Introduction to RStudio, Importing Data, and Running Code

McCall Pitcher
Center for Data and Visualization Sciences

January 16, 2026

duke.is/LearnR-1-S26

Questions we will answer today


01. What is R?

02. What is RStudio?

03. What is an RStudio project?

04. What is a coding notebook?

05. How do I write and run code in a coding notebook?

06. What is an object and how do I create one?

07. What are R packages and how do I load them?

08. How do I import data?

09. How can I get to know my variables?

10. What are some ways I can transform my data?

01.
What is R?

  • Programming language used for statistical computing and graphics

  • Free and open source

  • Base functionality + thousands of extensions

02.
What is RStudio?


“RStudio gives you a way to talk to your computer. R gives you a language to speak in.”

Hands-On Programming with R

What is RStudio?

Many thanks to Julia Silge at Posit

03.
What is an RStudio project?

  • File directory where you can store your data, R scripts/coding notebooks, and output for a given project

  • Keeps everything together in one place

04.
What is a coding notebook?

  • Code written directly in the Console does not get saved!
  • You should document your code so it’s reproducible
Document type    Extension    Description
R script         .R           Plain text file
R Markdown       .Rmd         Combines text, code, and results
Quarto           .qmd         Fancier version of R Markdown
  • Today we will use Quarto!

05.
How do I write and run code in a coding notebook?

  • Place code inside a “code chunk”
  • Run the chunk by clicking the green “play” button in its top-right corner, or press Cmd + Shift + Enter (Mac) / Ctrl + Shift + Enter (Windows)


Note: To create a new code chunk, click the green +C button at the top of the Source pane and select R (or use the keyboard shortcut Cmd + Option + I / Ctrl + Alt + I)
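
Whatever goes inside a code chunk is ordinary R. As a minimal sketch (the numbers are arbitrary and not part of the workshop materials), a chunk might contain:

```r
# R works like a calculator
2 + 3

# functions take their input inside parentheses
sqrt(16)
```

Running the chunk prints each result directly below it in the notebook.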

05. Your Turn!

Task

In your Quarto notebook, write and run code that finds the square root of 60

Note: the square root function in R is sqrt()

Hint

Run a code chunk by clicking the green “play” button or Cmd / Ctrl + Shift + Enter

06.
What is an object and how do I create one?

  • Anything you store in R under a name
  • Can be a single value, a collection of values, or something even more complex like a function or a plot
  • Create using the assignment operator <-
    • The object name on the left gets the value of whatever you place on the right: object_name <- value

What is an object and how do I create one?

Store objects

# McCall's favorite number
fav_number <- 11

# McCall's favorite numbers
fav_numbers <- c(11, 22, 33)

Evaluate objects

fav_number + 4
[1] 15
fav_numbers + 4 
[1] 15 26 37
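
Objects are not limited to numbers. The sketch below stores a character vector and a small function; the names fav_albums and add_four are made up for illustration.

```r
# a collection of text values (a character vector)
fav_albums <- c("Low", "Heathen")

# a function is an object too
add_four <- function(x) {
  x + 4
}

# use the stored objects
length(fav_albums)
add_four(11)
```

Both objects appear in the Environment pane once the code has run.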

06. Your Turn!

Task

Create an object called my_age that stores your age, then “call” the object so it prints to the Console.

Hint

# store favorite number
fav_number <- 11

# call favorite number
fav_number
[1] 11

07.
What are R packages and how do I load them?

  • R has a lot of functionality built-in, often referred to as “base R”
  • However, R is set up to allow users to write packages that extend this functionality

What are R packages and how do I load them?

  • {tidyverse} is a widely used collection of R packages designed to streamline data manipulation, exploration, and visualization
  • Many find {tidyverse} syntax to be more intuitive than base R

What are R packages and how do I load them?


Task                     base R                   {tidyverse}
Keep rows where x > 1    data[data$x > 1, ]       data |> filter(x > 1)
Keep columns x and y     data[ , c("x", "y")]     data |> select(x, y)
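
To try both styles side by side, here is a minimal sketch on a toy data frame. The data frame and its values are invented for illustration, and the code assumes {tidyverse} is already installed.

```r
library(tidyverse)  # provides filter() and select()

# toy data frame, made up for this example
data <- data.frame(
  x = c(0, 2, 3),
  y = c("a", "b", "c"),
  z = c(TRUE, FALSE, TRUE)
)

# keep rows where x > 1
base_rows <- data[data$x > 1, ]
tidy_rows <- data |> filter(x > 1)

# keep columns x and y
base_cols <- data[ , c("x", "y")]
tidy_cols <- data |> select(x, y)

tidy_rows
tidy_cols
```

Each pair keeps the same rows or columns; only the syntax differs.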

What are R packages and how do I load them?

First install using the function install.packages()

Note: This is one of the few times you should code directly in the Console. Do not include this function in your coding notebook, or the package will be reinstalled every time the notebook runs.


Then load using the function library()

Note: It is generally good practice to load all of your needed packages at the beginning of your script or coding notebook
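
Put together, the two steps might look like the sketch below. The install line is commented out because it belongs in the Console and only needs to run once per computer.

```r
# Run once, in the Console: downloads and installs the package
# install.packages("tidyverse")

# Run at the top of your notebook in every new R session
library(tidyverse)
```

Note that the package name is quoted inside install.packages() but not inside library().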

08.
How do I import data?

  • Today you will learn how to import a comma-separated values file, or CSV, one of the most common plain text data formats
  • Rectangular data (rows and columns) are stored in a tabular data structure called a data frame
  • Data include information about David Bowie’s Spotify music (downloaded here)

How do I import data?

  • There are multiple ways to load a CSV. We will use the {tidyverse} function read_csv()
  • Inside the read_csv() parentheses, specify where in your project folder the file is saved, and what the file is called
# import data
bowie <- read_csv("data/david_bowie_spotify.csv")
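
If you want to practice without the workshop file, the sketch below writes a tiny made-up CSV to a temporary location and imports it the same way. The file contents, csv_path, and songs are all hypothetical.

```r
library(tidyverse)

# write a tiny made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("track,year,energy",
    "Song A,1970,0.4",
    "Song B,1980,0.7"),
  csv_path
)

# import the file; the path goes inside quotes
songs <- read_csv(csv_path)

# check the result: number of rows, column names, and column types
glimpse(songs)
```

read_csv() guesses each column's type from its contents, so it is worth checking the result right after importing.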

09.
How can I get to know my variables?

summary() for numeric variables

summary(bowie$popularity)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   23.00   31.00   31.88   39.00   77.00 

unique() for character variables

unique(bowie$album)
 [1] "David Bowie"                                                  
 [2] "David Bowie (aka Space Oddity)"                               
 [3] "The Man Who Sold the World"                                   
 [4] "Hunky Dory"                                                   
 [5] "Aladdin Sane"                                                 
 [6] "The Rise and Fall of Ziggy Stardust and the Spiders from Mars"
 [7] "Pinups"                                                       
 [8] "Diamond Dogs"                                                 
 [9] "Young Americans"                                              
[10] "Station to Station"                                           
[11] "\"Heroes\""                                                   
[12] "Low"                                                          
[13] "Lodger"                                                       
[14] "Scary Monsters (And Super Creeps)"                            
[15] "Let's Dance"                                                  
[16] "Tonight"                                                      
[17] "Never Let Me Down"                                            
[18] "Black Tie White Noise"                                        
[19] "Buddha of Suburbia"                                           
[20] "1. Outside (The Nathan Adler Diaries: A Hyper Cycle)"         
[21] "Earthling"                                                    
[22] "Hours..."                                                     
[23] "Heathen"                                                      
[24] "Reality"                                                      
[25] "The Next Day"                                                 
[26] "Blackstar"                                                    
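
In both calls, the $ operator pulls a single column out of the data frame by name. A minimal sketch on a toy data frame (the values are invented, not taken from the Bowie data):

```r
# toy data frame, made up for this example
songs <- data.frame(
  album = c("A", "A", "B", "C"),
  popularity = c(10, 20, 30, 80)
)

# numeric column: five-number summary plus the mean
pop_summary <- summary(songs$popularity)
pop_summary

# character column: each distinct value, listed once
albums <- unique(songs$album)
albums
```

Here the mean (35) is larger than the median (25) because one song is far more popular than the rest.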

09. Your Turn!

Task

Using summary(), determine whether the mean or median of energy is greater.

Hint

# info about numeric var
summary(bowie$popularity)

10.
What are some ways I can transform my data?


Task                     {dplyr} verb
Subset rows              filter()
Subset columns           select()
Sort                     arrange()
Create a new variable    mutate()


Note: {dplyr} is one of the core {tidyverse} packages

The pipe operator


|>     or    %>%


  • Links together lines of {tidyverse} code

  • Means “and then”

  • Keyboard shortcut: Cmd/Ctrl + Shift + M
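
The pipe passes whatever is on its left into the function on its right as the first argument. A minimal sketch with made-up numbers, assuming R 4.1 or later (the version that introduced |>):

```r
nums <- c(4, 9, 16)

# without the pipe: read from the inside out
nested <- round(mean(sqrt(nums)), 1)

# with the pipe: take nums, and then sqrt, and then mean, and then round
piped <- nums |>
  sqrt() |>
  mean() |>
  round(1)

nested
piped
```

Both versions give the same answer; the piped version lists the steps in the order they happen.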

10.1
Subset rows with filter()

  • Limit to one album?
  • == tests whether two values are equal
bowie |> 
  filter(album == "Let's Dance")

Subset rows with filter()

  • Limit to two (or more) albums?
  • %in% tests whether a value matches any element of a vector
  • c( ) creates a vector (collection of values)
bowie |> 
  filter(album %in% c("Let's Dance", "Hunky Dory"))

Subset rows with filter()

  • Limit based on multiple conditions?
  • != means “does not equal”
  • Conditions separated by a comma must all be true (the comma means “and”)
bowie |> 
  filter(album != "Blackstar",
         danceability > .7)
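
The three filtering patterns can be tried on a small table. This is a sketch only: the songs tibble and its values are invented, and it assumes {tidyverse} is installed.

```r
library(tidyverse)

# toy data, made up for this example
songs <- tibble(
  track = c("Song A", "Song B", "Song C", "Song D"),
  album = c("One", "One", "Two", "Three"),
  danceability = c(0.8, 0.5, 0.9, 0.75)
)

# one album
one_album <- songs |>
  filter(album == "One")

# either of two albums
two_albums <- songs |>
  filter(album %in% c("One", "Two"))

# multiple conditions: a comma and & both mean "and"
with_comma <- songs |>
  filter(album != "Two", danceability > .7)
with_amp <- songs |>
  filter(album != "Two" & danceability > .7)

with_comma
```

Only Song A and Song D satisfy both conditions in the last example.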

10.1 Your Turn!

Task

Filter bowie to only contain songs that are NOT on the “Low” album

Hint

# exclude one album
bowie |> 
  filter(album != "Blackstar")

10.2
Subset columns with select()

Limit columns:

# keep only some variables
bowie |> 
  select(track, year, energy)
# A tibble: 276 × 3
   track                  year energy
   <chr>                 <dbl>  <dbl>
 1 Come And Buy My Toys   1967  0.183
 2 When I Live My Dream   1967  0.19 
 3 Love You Till Tuesday  1967  0.346
 4 Uncle Arthur           1967  0.337
 5 Rubber Band            1967  0.312
 6 Sell Me A Coat         1967  0.256
 7 There Is A Happy Land  1967  0.338
 8 She's Got Medals       1967  0.417
 9 We Are Hungry Men      1967  0.365
10 Maid Of Bond Street    1967  0.326
# ℹ 266 more rows

Subset columns with select()

There are shortcuts for variables with patterns:

bowie |> 
  select(ends_with("ness"))
# A tibble: 276 × 5
   loudness speechiness acousticness instrumentalness liveness
      <dbl>       <dbl>        <dbl>            <dbl>    <dbl>
 1    -16.9      0.0604        0.847       0.00000108   0.103 
 2    -14.6      0.0294        0.712       0            0.236 
 3    -15.5      0.0338        0.774       0.0000433    0.0651
 4    -15.0      0.0783        0.701       0            0.0882
 5    -16.5      0.0725        0.826       0.0000101    0.123 
 6    -15.3      0.0462        0.743       0            0.154 
 7    -14.8      0.0275        0.719       0.0000018    0.1   
 8    -12.0      0.0575        0.376       0.0000015    0.277 
 9    -14.0      0.0639        0.191       0            0.133 
10    -12.6      0.0582        0.795       0            0.34  
# ℹ 266 more rows
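
ends_with() has a sibling, starts_with(), that matches the beginning of a column name instead. A minimal sketch on a made-up one-row tibble (the column names mirror a few from the Bowie data; the values are invented):

```r
library(tidyverse)

# toy data, made up for this example
songs <- tibble(
  track = "Song A",
  year = 1970,
  loudness = -12,
  liveness = 0.1,
  energy = 0.4
)

# columns whose names end in "ness"
ness_cols <- songs |>
  select(ends_with("ness"))

# columns whose names start with "l"
l_cols <- songs |>
  select(starts_with("l"))

names(ness_cols)
names(l_cols)
```

In this toy table both helpers happen to pick the same two columns, loudness and liveness.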

10.2 Your Turn!

Task

Limit the columns in bowie to only include track and speechiness

Hint

# keep only some variables
bowie |> 
  select(track, year, energy)

10.3
Sort rows with arrange()

Ascending order by default:

bowie |> 
  select(track, speechiness) |> 
  arrange(speechiness)
# A tibble: 276 × 2
   track                         speechiness
   <chr>                               <dbl>
 1 Where Are We Now?                  0.0228
 2 Outside                            0.0238
 3 Everyone Says 'Hi'                 0.0241
 4 Days                               0.0241
 5 Something in the Air               0.0245
 6 Survive                            0.0254
 7 Shining Star (Makin' My Love)      0.0261
 8 Loving The Alien                   0.0262
 9 Love Is Lost                       0.0264
10 Tonight                            0.0265
# ℹ 266 more rows

Sort rows with arrange()

Use desc() for descending order (a minus sign, -, also works for numeric variables):

bowie |> 
  select(track, speechiness) |> 
  arrange(desc(speechiness))
# A tibble: 276 × 2
   track                                      speechiness
   <chr>                                            <dbl>
 1 Please Mr. Gravedigger                           0.87 
 2 Segue - Nathan Adler - Version #1                0.292
 3 Chant of the Ever Circling Skeletal Family       0.281
 4 Neighborhood Threat                              0.213
 5 What in the World                                0.207
 6 Somebody up There Likes Me                       0.203
 7 Join The Gang                                    0.196
 8 Right                                            0.17 
 9 Battle for Britain (The Letter)                  0.163
10 Black Tie White Noise                            0.151
# ℹ 266 more rows

Sort rows with arrange()

Use desc() for descending order (a minus sign, -, also works for numeric variables):

bowie |> 
  select(track, speechiness) |> 
  arrange(-speechiness)
# A tibble: 276 × 2
   track                                      speechiness
   <chr>                                            <dbl>
 1 Please Mr. Gravedigger                           0.87 
 2 Segue - Nathan Adler - Version #1                0.292
 3 Chant of the Ever Circling Skeletal Family       0.281
 4 Neighborhood Threat                              0.213
 5 What in the World                                0.207
 6 Somebody up There Likes Me                       0.203
 7 Join The Gang                                    0.196
 8 Right                                            0.17 
 9 Battle for Britain (The Letter)                  0.163
10 Black Tie White Noise                            0.151
# ℹ 266 more rows

Sort rows with arrange()

Arrange by multiple variables:

bowie |> 
  select(track, year, speechiness) |> 
  arrange(year, speechiness)
# A tibble: 276 × 3
   track                  year speechiness
   <chr>                 <dbl>       <dbl>
 1 There Is A Happy Land  1967      0.0275
 2 When I Live My Dream   1967      0.0294
 3 Love You Till Tuesday  1967      0.0338
 4 Sell Me A Coat         1967      0.0462
 5 She's Got Medals       1967      0.0575
 6 Maid Of Bond Street    1967      0.0582
 7 Come And Buy My Toys   1967      0.0604
 8 We Are Hungry Men      1967      0.0639
 9 Rubber Band            1967      0.0725
10 Uncle Arthur           1967      0.0783
# ℹ 266 more rows
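
Ascending and descending order can be mixed within one arrange() call. A minimal sketch on invented data, sorting by year (oldest first) and then by speechiness (highest first) within each year:

```r
library(tidyverse)

# toy data, made up for this example
songs <- tibble(
  track = c("Song A", "Song B", "Song C", "Song D"),
  year = c(1980, 1970, 1980, 1970),
  speechiness = c(0.05, 0.20, 0.10, 0.03)
)

# earlier variables take priority; later ones break ties
sorted <- songs |>
  arrange(year, desc(speechiness))

sorted
```

The 1970 songs come first (Song B, then Song D), followed by the 1980 songs (Song C, then Song A).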

10.3 Your Turn!

Task

Find the 3 most popular David Bowie songs (according to Spotify)

Note: Browse the Environment pane for the variable that measures this (it’s at the bottom)

Hint

# sort descending
bowie |> 
  arrange(desc(speechiness))

10.4
Creating new variables with mutate()

Duration in minutes?

bowie |> 
  mutate(duration_min = duration_sec / 60) |> 
  select(track, duration_sec, duration_min)
# A tibble: 276 × 3
   track                 duration_sec duration_min
   <chr>                        <dbl>        <dbl>
 1 Come And Buy My Toys           129         2.15
 2 When I Live My Dream           204         3.4 
 3 Love You Till Tuesday          194         3.23
 4 Uncle Arthur                   131         2.18
 5 Rubber Band                    139         2.32
 6 Sell Me A Coat                 182         3.03
 7 There Is A Happy Land          197         3.28
 8 She's Got Medals               146         2.43
 9 We Are Hungry Men              180         3   
10 Maid Of Bond Street            105         1.75
# ℹ 266 more rows

Creating new variables with mutate()

Create multiple variables at once:

bowie |> 
  mutate(duration_min = duration_sec / 60,
         duration_hr = duration_min / 60) |>  
  select(track, album, duration_min, duration_hr) 
# A tibble: 276 × 4
   track                 album       duration_min duration_hr
   <chr>                 <chr>              <dbl>       <dbl>
 1 Come And Buy My Toys  David Bowie         2.15      0.0358
 2 When I Live My Dream  David Bowie         3.4       0.0567
 3 Love You Till Tuesday David Bowie         3.23      0.0539
 4 Uncle Arthur          David Bowie         2.18      0.0364
 5 Rubber Band           David Bowie         2.32      0.0386
 6 Sell Me A Coat        David Bowie         3.03      0.0506
 7 There Is A Happy Land David Bowie         3.28      0.0547
 8 She's Got Medals      David Bowie         2.43      0.0406
 9 We Are Hungry Men     David Bowie         3         0.05  
10 Maid Of Bond Street   David Bowie         1.75      0.0292
# ℹ 266 more rows
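
mutate() returns a new table and leaves the original untouched, so the new variable is kept only if you assign the result to an object. A minimal sketch on invented data (songs and songs_min are hypothetical names):

```r
library(tidyverse)

# toy data, made up for this example
songs <- tibble(
  track = c("Song A", "Song B"),
  duration_sec = c(120, 210)
)

# prints the result, but songs itself is unchanged
songs |>
  mutate(duration_min = duration_sec / 60)
names(songs)

# assign the result to keep the new column
songs_min <- songs |>
  mutate(duration_min = duration_sec / 60)
names(songs_min)
```

The same applies to filter(), select(), and arrange(): without an assignment, the result is only printed.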

10.4 Your Turn!

Task

Create a new variable in bowie that multiplies danceability by 100 (you pick the name of the new variable)

Then limit your columns to track, danceability, and your new variable.

Note: The symbol for multiplication in R is *

Hint

# new variable + select
bowie |> 
  mutate(duration_min = duration_sec / 60) |>
  select(track, duration_sec, duration_min)

Resources for further learning