Learn R:

Introduction to RStudio, Importing Data, and Running Code

McCall Pitcher
Center for Data and Visualization Sciences

January 16, 2026

duke.is/LearnR-1-S26

Questions we will answer today

01. What is R?

02. What is RStudio?

03. What is an RStudio project?

04. What is a coding notebook?

05. How do I write and run code in a coding notebook?

06. What is an object and how do I create one?

07. What are R packages and how do I load them?

08. How do I import data?

09. How can I get to know my variables?

10. How can I transform my data?

01.
What is R?

What is R?

Programming language used for statistical computing and graphics
Free and open source
Base functionality + thousands of extensions

02.
What is RStudio?

What is RStudio?

“RStudio gives you a way to talk to your computer. R gives you a language to speak in.”

Hands-On Programming with R

What is RStudio?

Many thanks to Julia Silge at Posit

03.
What is an RStudio project?

What is an RStudio project?

File directory where you can store your data, R scripts/coding notebooks, and output for a given project
Keeps everything together in one place

04.
What is a coding notebook?

What is a coding notebook?

Code written directly in the Console does not get saved!

You should document your code so it’s reproducible

Document type	Extension	Description
R script	.R	Plain text file
R Markdown	.Rmd	Combines text, code, and results
Quarto	.qmd	Fancier version of R Markdown

Today we will use Quarto!

05.
How do I write and run code in a coding notebook?

How do I write and run code in a coding notebook?

Place code inside a “code chunk”

Run by pressing the green “play” button in the corner, or hit Cmd + Shift + Enter / Ctrl + Shift + Enter

Note: To create a new code chunk, click the green +C button at the top of the Source pane and select R (or use the keyboard shortcut Cmd + Option + I / Ctrl + Alt + I )
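
As a small illustration of what might go inside a code chunk (the numbers are arbitrary and not part of the workshop data), each line below would print its result beneath the chunk when run:

```r
# Arithmetic works like a calculator
2 + 3

# Functions take their input inside parentheses
sqrt(16)

# Anything after a # is a comment and is ignored by R
round(3.14159, digits = 2)
```

Running the whole chunk evaluates the lines in order, top to bottom.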

05. Your Turn!

Task

In your Quarto notebook, write and run code that finds the square root of 60

Note: the square root function in R is sqrt()

Hint

Run a code chunk by clicking the green “play” button or Cmd / Ctrl + Shift + Enter

06.
What is an object and how do I create one?

What is an object and how do I create one?

Something you store in R

Can be a single value, a collection of values, or something even more complex like a function or a plot

Create using the assignment operator <-
- The object name on the left is assigned the value on the right: object_name <- value

What is an object and how do I create one?

Store objects

# McCall's favorite number
fav_number <- 11

# McCall's favorite numbers
fav_numbers <- c(11, 22, 33)

Evaluate objects

fav_number + 4

[1] 15

fav_numbers + 4

[1] 15 26 37
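
Objects are not limited to numbers. A brief sketch, using made-up names (fav_album, fav_albums, add_four) that are not part of the workshop materials:

```r
# A single character value (text goes in quotes)
fav_album <- "Low"

# A collection of character values
fav_albums <- c("Low", "Lodger", "Blackstar")

# A function is also an object
add_four <- function(x) {
  x + 4
}

# Use the stored objects
length(fav_albums)
add_four(11)
```

Once created, each object appears in the Environment pane and can be reused anywhere later in the notebook.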

06. Your Turn!

Task

Create an object called my_age that stores your age, then “call” the object so it prints to the Console.

Hint

# store favorite number
fav_number <- 11

# call favorite number
fav_number

[1] 11

07.
What are R packages and how do I load them?

What are R packages and how do I load them?

R has a lot of functionality built-in, often referred to as “base R”

However, R is set up to allow users to write packages that extend this functionality

What are R packages and how do I load them?

{tidyverse} is a widely used collection of R packages designed to streamline data manipulation, exploration, and visualization

Many find {tidyverse} syntax to be more intuitive than base R

What are R packages and how do I load them?

Task	base R	{tidyverse}
Keep rows where x > 1	`data[data$x > 1, ]`	`data |> filter(x > 1)`
Keep columns x and y	`data[ , c("x", "y")]`	`data |> select(x, y)`
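
To see the two styles side by side, here is a minimal sketch on a toy data frame. The data frame and its values are invented for illustration, and {dplyr} is assumed to be installed:

```r
library(dplyr)

# Hypothetical toy data, not the workshop data set
data <- data.frame(
  x = c(0, 1, 2, 3),
  y = c("a", "b", "c", "d"),
  z = c(TRUE, FALSE, TRUE, FALSE)
)

# Keep rows where x > 1
base_rows <- data[data$x > 1, ]
tidy_rows <- data |> filter(x > 1)

# Keep columns x and y
base_cols <- data[, c("x", "y")]
tidy_cols <- data |> select(x, y)

# Both approaches keep the same values
base_rows$x
tidy_rows$x
names(base_cols)
names(tidy_cols)
```

Both versions return the same rows and columns; the {tidyverse} version names the action (filter, select) explicitly.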

What are R packages and how do I load them?

First install using the function install.packages()

Note: This is one of the few times you should code directly in the console – do not include this function in your coding notebook

Then load using the function library()

Note: It is generally good practice to load all of your needed packages at the beginning of your script or coding notebook
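
Put together, the two steps might look like the sketch below. The install line is left as a comment on purpose, since it belongs in the Console and only needs to run once:

```r
# Run once, directly in the Console (not in the notebook):
# install.packages("tidyverse")

# Load at the top of the notebook, in every new R session
library(tidyverse)
```

Installing downloads the package to your computer; loading makes its functions available in the current session.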

08.
How do I import data?

How do I import data?

Today you will learn how to import a comma-separated values file, or CSV, one of the most common plain text data formats

Rectangular data (rows and columns) are stored in a tabular data structure called a data frame

Data include information about David Bowie’s Spotify music (downloaded here)

How do I import data?

There are multiple ways to load a CSV. We will use the {tidyverse} function read_csv()

Inside the read_csv() parentheses, specify in quotes where in your project folder the file is saved, and what the file is called

# import data
bowie <- read_csv("data/david_bowie_spotify.csv")
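
To try read_csv() without the workshop file, the sketch below first writes a tiny, made-up CSV to a temporary location and then imports it. The file contents and the object names (csv_path, songs) are invented for illustration:

```r
library(readr)

# Write a small, made-up CSV to a temporary file (stand-in for a real data file)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("track,year,energy",
    "Song A,1971,0.45",
    "Song B,1977,0.61"),
  csv_path
)

# Import it as a data frame and store it as an object
songs <- read_csv(csv_path)

# Check the number of rows and columns, then the column names
dim(songs)
names(songs)
```

read_csv() guesses each column's type from the file, so year and energy come in as numbers and track as text.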

09.
How can I get to know my variables?

How can I get to know my variables?

summary() for numeric

summary(bowie$popularity)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   23.00   31.00   31.88   39.00   77.00

unique() for character

unique(bowie$album)

 [1] "David Bowie"                                                  
 [2] "David Bowie (aka Space Oddity)"                               
 [3] "The Man Who Sold the World"                                   
 [4] "Hunky Dory"                                                   
 [5] "Aladdin Sane"                                                 
 [6] "The Rise and Fall of Ziggy Stardust and the Spiders from Mars"
 [7] "Pinups"                                                       
 [8] "Diamond Dogs"                                                 
 [9] "Young Americans"                                              
[10] "Station to Station"                                           
[11] "\"Heroes\""                                                   
[12] "Low"                                                          
[13] "Lodger"                                                       
[14] "Scary Monsters (And Super Creeps)"                            
[15] "Let's Dance"                                                  
[16] "Tonight"                                                      
[17] "Never Let Me Down"                                            
[18] "Black Tie White Noise"                                        
[19] "Buddha of Suburbia"                                           
[20] "1. Outside (The Nathan Adler Diaries: A Hyper Cycle)"         
[21] "Earthling"                                                    
[22] "Hours..."                                                     
[23] "Heathen"                                                      
[24] "Reality"                                                      
[25] "The Next Day"                                                 
[26] "Blackstar"
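
Both functions also work on standalone vectors, which makes them easy to try out. A minimal sketch with invented values (scores and albums are hypothetical, not columns of bowie):

```r
# Hypothetical values, for illustration only
scores <- c(1, 2, 3, 4, 10)
albums <- c("Low", "Lodger", "Low", "Heathen", "Lodger")

# Minimum, quartiles, median, mean, and maximum of a numeric vector
summary(scores)

# Each distinct value, in order of first appearance
unique(albums)

# How many distinct values there are
length(unique(albums))
```

Here the mean (4) is pulled above the median (3) by the single large value, which is the kind of comparison summary() makes easy.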

09. Your Turn!

Task

Using summary(), determine whether the mean or median of energy is greater.

Hint

# info about numeric var
summary(bowie$popularity)

10.
What are some ways I can transform my data?

What are some ways I can transform my data?

Task	{dplyr} verb
Subset rows	`filter()`
Subset columns	`select()`
Sort	`arrange()`
Create a new variable	`mutate()`

Note: {dplyr} is one of the core {tidyverse} packages

The pipe operator

|> or %>%

Links together lines of {tidyverse} code
Means “and then”
Cmd/Ctrl + Shift + M
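
A small sketch of what “and then” means in practice, using made-up numbers. The native pipe |> requires R 4.1 or later; %>% comes from the {magrittr} package, which is loaded with {tidyverse}:

```r
# Made-up numbers for illustration
values <- c(1, 4, 9)

# Nested: read from the inside out
nested <- sum(sqrt(values))

# Piped: read top to bottom ("take values, and then sqrt, and then sum")
piped <- values |>
  sqrt() |>
  sum()

nested
piped
```

Both versions give the same answer; the pipe passes the result on its left in as the first argument of the function on its right.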

10.1
Subset rows with filter()

Subset rows with `filter()`

Limit to one album?
== tests whether two values are equal (a single = is used for assignment and argument names instead)

bowie |> 
  filter(album == "Let's Dance")

Subset rows with `filter()`

Limit to two (or more) albums?
%in% tests whether a value matches any element in a vector of values
c( ) creates a vector (collection of values)

bowie |> 
  filter(album %in% c("Let's Dance", "Hunky Dory"))
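
Behind the scenes, both == and %in% return one TRUE or FALSE per row, and filter() keeps the rows that are TRUE. A minimal sketch on a made-up vector (not the bowie data):

```r
# Made-up vector of album names
albums <- c("Low", "Hunky Dory", "Let's Dance", "Low")

# == compares each element against one value
albums == "Low"

# %in% compares each element against several values at once
albums %in% c("Let's Dance", "Hunky Dory")
```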

Subset rows with `filter()`

Limit based on multiple conditions?
!= means “does not equal”
Conditions separated by a comma must all be true for a row to be kept

bowie |> 
  filter(album != "Blackstar",
         danceability > .7)
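
The sketch below shows how several conditions combine, using a toy data frame with invented values and assuming {dplyr} is installed. Separating conditions with a comma behaves like joining them with &:

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
songs <- data.frame(
  track = c("A", "B", "C", "D"),
  album = c("Blackstar", "Low", "Low", "Lodger"),
  danceability = c(0.8, 0.75, 0.4, 0.9)
)

# Comma-separated conditions: every condition must be TRUE
with_comma <- songs |>
  filter(album != "Blackstar",
         danceability > .7)

# The same filter written with &
with_and <- songs |>
  filter(album != "Blackstar" & danceability > .7)

with_comma$track
with_and$track
```

Track A is dropped because of its album, and track C because its danceability is too low.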

10.1 Your Turn!

Task

Filter bowie to only contain songs that are NOT on the “Low” album

Hint

# exclude one album
bowie |> 
  filter(album != "Blackstar")

10.2
Subset columns with select()

Subset columns with `select()`

Limit columns:

# keep only some variables
bowie |> 
  select(track, year, energy)

# A tibble: 276 × 3
   track                  year energy
   <chr>                 <dbl>  <dbl>
 1 Come And Buy My Toys   1967  0.183
 2 When I Live My Dream   1967  0.19 
 3 Love You Till Tuesday  1967  0.346
 4 Uncle Arthur           1967  0.337
 5 Rubber Band            1967  0.312
 6 Sell Me A Coat         1967  0.256
 7 There Is A Happy Land  1967  0.338
 8 She's Got Medals       1967  0.417
 9 We Are Hungry Men      1967  0.365
10 Maid Of Bond Street    1967  0.326
# ℹ 266 more rows

Subset columns with `select()`

There are shortcuts for variables with patterns:

bowie |> 
  select(ends_with("ness"))

# A tibble: 276 × 5
   loudness speechiness acousticness instrumentalness liveness
      <dbl>       <dbl>        <dbl>            <dbl>    <dbl>
 1    -16.9      0.0604        0.847       0.00000108   0.103 
 2    -14.6      0.0294        0.712       0            0.236 
 3    -15.5      0.0338        0.774       0.0000433    0.0651
 4    -15.0      0.0783        0.701       0            0.0882
 5    -16.5      0.0725        0.826       0.0000101    0.123 
 6    -15.3      0.0462        0.743       0            0.154 
 7    -14.8      0.0275        0.719       0.0000018    0.1   
 8    -12.0      0.0575        0.376       0.0000015    0.277 
 9    -14.0      0.0639        0.191       0            0.133 
10    -12.6      0.0582        0.795       0            0.34  
# ℹ 266 more rows
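
ends_with() has siblings such as starts_with(), and these helpers can be mixed with plain column names. A minimal sketch on a toy data frame (values invented, {dplyr} assumed installed):

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
songs <- data.frame(
  track = c("A", "B"),
  duration_sec = c(129, 204),
  loudness = c(-16.9, -14.6),
  liveness = c(0.103, 0.236)
)

# Columns whose names end in "ness"
songs |>
  select(ends_with("ness")) |>
  names()

# Columns whose names start with "dur"
songs |>
  select(starts_with("dur")) |>
  names()

# A helper combined with a named column
songs |>
  select(track, ends_with("ness")) |>
  names()
```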

10.2 Your Turn!

Task

Limit the columns in bowie to only include track and speechiness

Hint

# keep only some variables
bowie |> 
  select(track, year, energy)

10.3
Sort rows with arrange()

Sort rows with `arrange()`

Ascending order by default:

bowie |> 
  select(track, speechiness) |> 
  arrange(speechiness)

# A tibble: 276 × 2
   track                         speechiness
   <chr>                               <dbl>
 1 Where Are We Now?                  0.0228
 2 Outside                            0.0238
 3 Everyone Says 'Hi'                 0.0241
 4 Days                               0.0241
 5 Something in the Air               0.0245
 6 Survive                            0.0254
 7 Shining Star (Makin' My Love)      0.0261
 8 Loving The Alien                   0.0262
 9 Love Is Lost                       0.0264
10 Tonight                            0.0265
# ℹ 266 more rows

Sort rows with `arrange()`

Use desc() for descending order (can also use - sign):

bowie |> 
  select(track, speechiness) |> 
  arrange(desc(speechiness))

# A tibble: 276 × 2
   track                                      speechiness
   <chr>                                            <dbl>
 1 Please Mr. Gravedigger                           0.87 
 2 Segue - Nathan Adler - Version #1                0.292
 3 Chant of the Ever Circling Skeletal Family       0.281
 4 Neighborhood Threat                              0.213
 5 What in the World                                0.207
 6 Somebody up There Likes Me                       0.203
 7 Join The Gang                                    0.196
 8 Right                                            0.17 
 9 Battle for Britain (The Letter)                  0.163
10 Black Tie White Noise                            0.151
# ℹ 266 more rows

Sort rows with `arrange()`

The - sign gives the same descending order for numeric variables:

bowie |> 
  select(track, speechiness) |> 
  arrange(-speechiness)

# A tibble: 276 × 2
   track                                      speechiness
   <chr>                                            <dbl>
 1 Please Mr. Gravedigger                           0.87 
 2 Segue - Nathan Adler - Version #1                0.292
 3 Chant of the Ever Circling Skeletal Family       0.281
 4 Neighborhood Threat                              0.213
 5 What in the World                                0.207
 6 Somebody up There Likes Me                       0.203
 7 Join The Gang                                    0.196
 8 Right                                            0.17 
 9 Battle for Britain (The Letter)                  0.163
10 Black Tie White Noise                            0.151
# ℹ 266 more rows

Sort rows with `arrange()`

Arrange by multiple variables:

bowie |> 
  select(track, year, speechiness) |> 
  arrange(year, speechiness)

# A tibble: 276 × 3
   track                  year speechiness
   <chr>                 <dbl>       <dbl>
 1 There Is A Happy Land  1967      0.0275
 2 When I Live My Dream   1967      0.0294
 3 Love You Till Tuesday  1967      0.0338
 4 Sell Me A Coat         1967      0.0462
 5 She's Got Medals       1967      0.0575
 6 Maid Of Bond Street    1967      0.0582
 7 Come And Buy My Toys   1967      0.0604
 8 We Are Hungry Men      1967      0.0639
 9 Rubber Band            1967      0.0725
10 Uncle Arthur           1967      0.0783
# ℹ 266 more rows
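
Sort directions can also be mixed across variables. The sketch below uses a toy data frame with invented values and assumes {dplyr} is installed:

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
songs <- data.frame(
  track = c("A", "B", "C", "D"),
  year = c(1977, 1967, 1977, 1967),
  speechiness = c(0.05, 0.03, 0.20, 0.08)
)

# Oldest year first; within each year, highest speechiness first
sorted <- songs |>
  arrange(year, desc(speechiness))

sorted$track
```

The first variable sets the main order, and each later variable only breaks ties in the ones before it.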

10.3 Your Turn!

Task

Find the 3 most popular David Bowie songs (according to Spotify)

Note: Browse the environment pane for the variable that measures this (it’s at the bottom)

Hint

# sort descending
bowie |> 
  arrange(desc(speechiness))

10.4
Creating new variables with mutate()

Creating new variables with `mutate()`

Duration in minutes?

bowie |> 
  mutate(duration_min = duration_sec / 60) |> 
  select(track, duration_sec, duration_min)

# A tibble: 276 × 3
   track                 duration_sec duration_min
   <chr>                        <dbl>        <dbl>
 1 Come And Buy My Toys           129         2.15
 2 When I Live My Dream           204         3.4 
 3 Love You Till Tuesday          194         3.23
 4 Uncle Arthur                   131         2.18
 5 Rubber Band                    139         2.32
 6 Sell Me A Coat                 182         3.03
 7 There Is A Happy Land          197         3.28
 8 She's Got Medals               146         2.43
 9 We Are Hungry Men              180         3   
10 Maid Of Bond Street            105         1.75
# ℹ 266 more rows

Creating new variables with `mutate()`

Create multiple variables at once:

bowie |> 
  mutate(duration_min = duration_sec / 60,
         duration_hr = duration_min / 60) |>  
  select(track, album, duration_min, duration_hr)

# A tibble: 276 × 4
   track                 album       duration_min duration_hr
   <chr>                 <chr>              <dbl>       <dbl>
 1 Come And Buy My Toys  David Bowie         2.15      0.0358
 2 When I Live My Dream  David Bowie         3.4       0.0567
 3 Love You Till Tuesday David Bowie         3.23      0.0539
 4 Uncle Arthur          David Bowie         2.18      0.0364
 5 Rubber Band           David Bowie         2.32      0.0386
 6 Sell Me A Coat        David Bowie         3.03      0.0506
 7 There Is A Happy Land David Bowie         3.28      0.0547
 8 She's Got Medals      David Bowie         2.43      0.0406
 9 We Are Hungry Men     David Bowie         3         0.05  
10 Maid Of Bond Street   David Bowie         1.75      0.0292
# ℹ 266 more rows
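
One detail worth seeing in code: mutate() returns a new data frame and leaves the original untouched unless the result is assigned with <-. A minimal sketch on a toy data frame (values and the name songs_min are invented, {dplyr} assumed installed):

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
songs <- data.frame(
  track = c("A", "B"),
  duration_sec = c(120, 210)
)

# Without assignment the result only prints; songs itself is unchanged
songs |>
  mutate(duration_min = duration_sec / 60)
names(songs)

# Assign the result to keep the new column
songs_min <- songs |>
  mutate(duration_min = duration_sec / 60)
names(songs_min)
songs_min$duration_min
```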

10.4 Your Turn!

Task

Create a new variable in bowie that multiplies danceability by 100 (you pick the name of the new variable)

Then limit your columns to track, danceability, and your new variable.

Note: The symbol for multiplication in R is *

Hint

# new variable + select
bowie |> 
  mutate(duration_min = duration_sec / 60) |>
  select(track, duration_sec, duration_min)

Resources for further learning

Try the extended exercises!
Come next week!
Email askdata@duke.edu
R for Data Science (2e)
