A simple Shiny app for visualizing F1 data (Lab 7)

From Medicine to Formula 1

Around the time I started medical school, I became interested in longevity science, an emerging field within medicine. One of the physicians carving out this new niche in the profession is Dr. Peter Attia, whose practice specializes in increasing a patient’s healthspan and preparing patients for a “Centenarian Olympics.” After listening to this episode of Peter Attia’s podcast The Drive (in which he interviews Dr. Luke Bennett, team doctor for Mercedes), I thought I would check out the show Drive to Survive on Netflix.

As part of the BMI 625 course at OHSU, we needed to build an R Shiny application that wrangled and visualized a data set of our choice. Given my new interest in Formula 1, I thought it would be interesting to see what the expected points might be for any team and driver during any race weekend: basically, the average number of points a constructor or driver scores per grand prix. Fortunately, GitHub user UpperCase78 saved me from most of the data scraping. Please check out their repo to see the underlying data.
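To make the metric concrete, here is a minimal sketch on made-up data (the race labels, driver letters, team letters, and points are all hypothetical). It assumes one row per driver per race, and it assumes each distinct `Track` value identifies one race weekend, which may not hold for every season in the real data.

```r
library(dplyr)

# Toy results: two races, two teams, one row per driver per race (made-up points).
results <- tibble(
  Track  = c("Race 1", "Race 1", "Race 1", "Race 1",
             "Race 2", "Race 2", "Race 2", "Race 2"),
  Driver = c("A", "B", "C", "D", "A", "B", "C", "D"),
  Team   = c("X", "X", "Y", "Y", "X", "X", "Y", "Y"),
  Points = c(25, 18, 15, 12, 18, 25, 12, 15)
)

# Driver average: total points divided by races entered (one row per race).
driver_ppgp <- results %>%
  group_by(Driver) %>%
  summarise(ppgp = sum(Points) / n(), .groups = "drop")

# Constructor average: total points divided by distinct race weekends,
# because each weekend contributes one row per car.
team_ppgp <- results %>%
  group_by(Team) %>%
  summarise(ppgp = sum(Points) / n_distinct(Track), .groups = "drop")
```

In this toy example, driver A averages 21.5 points per grand prix and team X averages 43. Dividing the team total by `n()` instead would give 21.5, the average per car entry rather than per race weekend.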

This analysis is an overly simple way to break down a complex sport like F1, but I think it was a good start, and it got me thinking about downstream analyses I could do (breakdowns by track, for example). I might keep working on this Shiny app by adding more nuanced data sets and more complex analyses.

You can check out the Shiny app here or below.

Data Wrangling

For those interested, here is how I cleaned up the data and calculated points per grand prix for drivers and constructors in R.

library(tidyverse)  # read_csv(), the dplyr verbs, and stringr's str_detect()

# 2013-2018
f1_2013 <- read_csv("data/PreviousSeasons/Formula1_2013season_raceResults.csv") %>%
  mutate(Year = 2013)
f1_2014 <- read_csv("data/PreviousSeasons/Formula1_2014season_raceResults.csv") %>%
  mutate(Year = 2014)
f1_2015 <- read_csv("data/PreviousSeasons/Formula1_2015season_raceResults.csv") %>%
  mutate(Year = 2015)
f1_2016 <- read_csv("data/PreviousSeasons/Formula1_2016season_raceResults.csv") %>%
  mutate(Year = 2016)
f1_2017 <- read_csv("data/PreviousSeasons/Formula1_2017season_raceResults.csv") %>%
  mutate(Year = 2017)
f1_2018 <- read_csv("data/PreviousSeasons/Formula1_2018season_raceResults.csv") %>%
  mutate(Year = 2018)

# 2019
race2019 <- read_csv("data/formula1_2019season_raceResults.csv") %>%
  mutate(Year = 2019)

# 2020
race2020 <- read_csv("data/formula1_2020season_raceResults.csv") %>%
  mutate(Year = 2020)

# 2021
race2021 <- read_csv("data/formula1_2021season_raceResults.csv") %>%
  mutate(Year = 2021)

# 2022
race2022 <- read_csv("data/Formula1_2022season_raceResults.csv") %>%
  mutate(Year = 2022)

varnames <- c("Track", "Driver", "Team", "Points", "Fastest Lap", "Year")

f1_data_1 <- rbind(
  f1_2013
  , f1_2014
  , f1_2015
  , f1_2016
  , f1_2017
  , f1_2018
) %>%
  select(all_of(varnames))

f1_data_2 <- rbind(
  race2019
  , race2020
) %>%
  select(all_of(varnames))

f1_data_3 <- rbind(
  race2021
  , race2022
) %>%
  select(all_of(varnames))

f1_data <- rbind(
  f1_data_1
  , f1_data_2
  , f1_data_3
) %>%
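  group_by(Year, Driver) %>%
  # driver points per grand prix: season points / races entered
  # (assuming one row per driver per race)
  mutate(driver_ppgp = round(sum(Points)/n(), 2)) %>%
  ungroup() %>%
  group_by(Year, Team) %>%
  # n() counts rows (one per car per race), so this is the team's
  # average per car entry within the season
  mutate(team_ppgp = round(sum(Points)/n(), 2)) %>%
  ungroup() %>%
  # default to the original name; overwritten below where teams were renamed
  mutate(Latest_Team_Name = Team)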

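# Collapse teams across name changes; otherwise the constructor
# visualizations become unusable.
# Order matters: later assignments overwrite earlier ones, so an engine-supplier
# match (e.g. "Ferrari" in "STR Ferrari") is corrected by a later team-specific line.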
# Using the latest available team name based on information from:
# https://wtf1.com/post/these-are-all-the-f1-team-changes-in-the-last-decade/

f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Renault")] <- "Alpine"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Mercedes")] <- "Mercedes"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Ferrari")] <- "Ferrari"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Red Bull")] <- "Red Bull"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Toro Rosso")] <- "AlphaTauri"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "AlphaTauri")] <- "AlphaTauri"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "STR Ferrari")] <- "AlphaTauri"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Sauber")] <- "Alfa Romeo"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Alfa Romeo")] <- "Alfa Romeo"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Williams")] <- "Williams"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Force India")] <- "Aston Martin"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Aston Martin")] <- "Aston Martin"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Racing Point")] <- "Aston Martin"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "McLaren")] <- "McLaren"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Lotus")] <- "Lotus"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Haas")] <- "Haas"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Marussia")] <- "Manor"
f1_data$Latest_Team_Name[str_detect(f1_data$Team, "Caterham")] <- "Caterham"

# write out
write_csv(f1_data, "f1_data.csv")
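Because the renaming above relies on the order of the assignments, one alternative is to put all the rules into a single `case_when()` call, where the first matching condition wins. The following is only a sketch of that idea, not the code behind the app: it covers a handful of teams, and apart from "STR Ferrari" the raw team names are hypothetical and may be spelled differently in the real data set.

```r
library(dplyr)
library(stringr)

# Hypothetical raw team names; the real data set may spell these differently.
teams <- tibble(
  Team = c("STR Ferrari", "Force India Mercedes", "Ferrari", "Mercedes")
)

teams <- teams %>%
  mutate(Latest_Team_Name = case_when(
    # Team-specific patterns come first so they win over engine-supplier matches.
    str_detect(Team, "Toro Rosso|STR Ferrari|AlphaTauri") ~ "AlphaTauri",
    str_detect(Team, "Force India|Racing Point|Aston Martin") ~ "Aston Martin",
    str_detect(Team, "Ferrari") ~ "Ferrari",
    str_detect(Team, "Mercedes") ~ "Mercedes",
    TRUE ~ Team  # leave unmatched names unchanged
  ))
```

Note that the ordering logic is the reverse of the sequential assignments: in `case_when()` the most specific rules go first, whereas with repeated overwrites they go last.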

This post was excerpted from Lab 7 of BMI 625