In this post, I will explain how to add possessions to the NBA play-by-play table. The data we’ll use is the play-by-play (pbp) with the lineups for each team added to every event. I explain in detail the steps to get it here and here. Let’s load it:
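library(tidyverse) # read_csv(), dplyr verbs, stringr, purrr
library(nbastatR) # game_logs()
library(zoo) # na.locf0()
lineup_game_stats <- read_csv("https://github.com/ramirobentes/NBA-in-R/releases/download/lineup-game-stats-7f61e89/data.csv",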
col_types = c(timeQuarter = "c")) %>%
mutate(across(starts_with("description"), ~ coalesce(., "")))
team_logs <- game_logs(seasons = 2022, result_types = "team")
## Acquiring NBA basic team game logs for the 2021-22 Regular Season
A team possession happens when there is:
- a made field goal
- a missed field goal where the shooting team doesn’t get the rebound
- a turnover
- a 2 or 3 free throw trip where the shooting team doesn’t keep the ball after the last attempt (via offensive rebound or flagrant/clear path foul)
In this post, I will follow the pbpstats.com model, which also counts a possession when a team gets the ball with more than 2 seconds remaining in the quarter and just dribbles it out without doing any of the above. On the other hand, if a team gets the ball with 2 seconds or less remaining and misses a field goal or turns the ball over, it is considered a heave and thus not counted as a possession.
Let’s start by identifying every field goal, turnover and 2 or 3 free throw trip that isn’t the result of a flagrant/clear path foul, as well as the team that’s on offense:
possession_initial <- lineup_game_stats %>%
mutate(possession = case_when(numberEventMessageType %in% c(1, 2, 5) ~ 1,
numberEventMessageType == 3 & numberEventActionType %in% c(12, 15) ~ 1,
TRUE ~ 0),
team_possession = case_when(is.na(slugTeamPlayer1) & possession == 1 & descriptionPlayHome == "" ~ slugTeamAway,
is.na(slugTeamPlayer1) & possession == 1 & descriptionPlayVisitor == "" ~ slugTeamHome,
TRUE ~ slugTeamPlayer1))
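To make the rule above concrete, here is a minimal sketch on a handful of made-up events. The toy table and the meaning I attach to action type 11 (free throw 1 of 2) are assumptions for illustration; only the `case_when()` logic is taken from the code above.

```r
library(dplyr)

# Hypothetical toy events; column names mirror the play-by-play table
toy_events <- tibble::tribble(
  ~numberEventMessageType, ~numberEventActionType, ~slugTeamPlayer1,
  1,  1, "MIL",   # made field goal
  2,  1, "BKN",   # missed field goal
  4,  0, "MIL",   # rebound
  3, 11, "BKN",   # free throw 1 of 2 (assumed action type)
  3, 12, "BKN",   # free throw 2 of 2
  5,  1, "MIL"    # turnover
)

# Same flagging rule as in possession_initial
toy_flagged <- toy_events %>%
  mutate(possession = case_when(
    numberEventMessageType %in% c(1, 2, 5) ~ 1,
    numberEventMessageType == 3 & numberEventActionType %in% c(12, 15) ~ 1,
    TRUE ~ 0
  ))

toy_flagged$possession
# 1 1 0 0 1 1
```

Only the last free throw of the trip is flagged, while the rebound and the first free throw are left at 0.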
There are 2 unique situations where the play-by-play doesn’t explicitly tell us that a turnover has occurred:
- when there’s a lane violation on the shooting team in the last free throw of a trip, the free throw gets “canceled”. Sometimes the play-by-play will describe it as “lane violation turnover” and keep the first one as “free throw 1 of 2”. In this case, we don’t need to do anything, because it will already be counted as a possession for being a turnover. However, there are rare times when the pbp describes it only as “lane”, and turns the previous free throw into 1 of 1. When this happens, we need to add a possession to the free throw.
- when a coach challenges a foul call and is able to reverse it, there is a jump ball. If the team that won the challenge also wins the jump ball and recovers the ball, it should count as a turnover for the opposing team.
Let’s find these instances:
lane_description_missing <- possession_initial %>%
group_by(idGame, secsPassedGame) %>%
filter(sum(numberEventMessageType == 3 & numberEventActionType == 10) > 0, # free throw 1 of 1
sum(numberEventMessageType == 7 & numberEventActionType == 3) > 0, # lane
sum(numberEventMessageType == 1) == 0) %>% # no made field goal (and 1)
ungroup() %>%
mutate(possession = ifelse(numberEventMessageType == 3 & numberEventActionType == 10, 1, possession)) %>%
select(idGame, numberEvent, team_possession, possession)
lane_description_missing
## # A tibble: 3 x 4
## idGame numberEvent team_possession possession
## <dbl> <dbl> <chr> <dbl>
## 1 22100645 135 PHI 0
## 2 22100645 136 MIA 1
## 3 22100645 137 MIA 0
jumpball_turnovers <- possession_initial %>%
group_by(idGame, numberPeriod) %>%
mutate(prev_poss = zoo::na.locf0(ifelse(possession == 1, team_possession, NA)),
next_poss = zoo::na.locf0(ifelse(possession == 1, team_possession, NA), fromLast = TRUE)) %>%
ungroup() %>%
mutate(slugTeamPlayer1 = case_when(numberEventMessageType == 9 & descriptionPlayHome == "" ~ slugTeamAway,
numberEventMessageType == 9 & descriptionPlayVisitor == "" ~ slugTeamHome,
TRUE ~ slugTeamPlayer1)) %>%
group_by(idGame, secsPassedGame) %>%
mutate(team_reb_chall = sum(numberEventMessageType == 9 & numberEventActionType == 7) > 0 &
sum(numberEventMessageType == 4 & is.na(namePlayer1)) > 0) %>%
ungroup() %>%
filter(numberEventMessageType == 10 & numberEventActionType == 1 &
lag(numberEventMessageType) == 9 & lag(numberEventActionType) == 7 &
slugTeamPlayer3 == lag(slugTeamPlayer1) &
prev_poss == next_poss &
lag(team_reb_chall) == FALSE) %>%
mutate(team_possession = ifelse(slugTeamPlayer3 == slugTeamPlayer1, slugTeamPlayer2, slugTeamPlayer1),
possession = 1) %>%
select(idGame, numberEvent, team_possession, possession)
jumpball_turnovers
## # A tibble: 2 x 4
## idGame numberEvent team_possession possession
## <dbl> <dbl> <chr> <dbl>
## 1 22100328 291 CHI 1
## 2 22100473 266 HOU 1
We know that a team cannot have 2 possessions in a row. In our current table, this is happening a lot, mainly due to offensive rebounds. So let’s identify the times when the same team had consecutive possessions and disregard the first one. We’ll also apply the lane violation and jump ball corrections we just found:
change_consec <- possession_initial %>%
rows_update(lane_description_missing, by = c("idGame", "numberEvent")) %>%
rows_update(jumpball_turnovers, by = c("idGame", "numberEvent")) %>%
filter(possession == 1 | (numberEventMessageType == 6 & numberEventActionType == 30)) %>% # when there is a technical for too many players on the court (message type 6 and action type 30), the ball goes back to the other team but it's not counted as a turnover
group_by(idGame, numberPeriod) %>%
filter(possession == lead(possession) & team_possession == lead(team_possession)) %>%
ungroup() %>%
mutate(possession = 0) %>%
select(idGame, numberEvent, possession)
# replacing in original data
poss_pack <- possession_initial %>%
rows_update(lane_description_missing, by = c("idGame", "numberEvent")) %>%
rows_update(jumpball_turnovers, by = c("idGame", "numberEvent")) %>%
rows_update(change_consec, by = c("idGame","numberEvent"))
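As a small illustration of the consecutive-possession step, here is a sketch on a made-up sequence (the `toy_` objects are hypothetical). When the next flagged event belongs to the same team, the earlier one is set to 0 with `rows_update()`, just like above.

```r
library(dplyr)

# Hypothetical sequence of events already flagged as possessions
toy_poss <- tibble::tibble(
  numberEvent     = 1:5,
  team_possession = c("MIL", "MIL", "BKN", "MIL", "MIL"),
  possession      = 1
)

# Rows whose next flagged event belongs to the same team
toy_drop <- toy_poss %>%
  filter(possession == lead(possession) & team_possession == lead(team_possession)) %>%
  mutate(possession = 0) %>%
  select(numberEvent, possession)

# Overwrite only those rows in the original table
toy_fixed <- toy_poss %>%
  rows_update(toy_drop, by = "numberEvent")

toy_fixed$possession
# 0 1 1 0 1
```

Events 1 and 4 are zeroed out because MIL is flagged again right after them, so each stretch of same-team events ends up with a single possession.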
Ideally, the possession count would stop here. However, since we’re following the pbpstats.com model, we need to account for the times when a team gets the ball with more than 2 seconds remaining in the quarter and doesn’t do anything with it. In order to do that, we need to find the start of every possession. When a possession follows a made field goal, free throw or turnover, it starts at the time of that event. When it follows a missed field goal, it starts at the time of the defensive rebound.
start_possessions <- poss_pack %>%
mutate(slugTeamPlayer1 = case_when(is.na(slugTeamPlayer1) & descriptionPlayHome == "" ~ slugTeamAway,
is.na(slugTeamPlayer1) & descriptionPlayVisitor == "" ~ slugTeamHome,
TRUE ~ slugTeamPlayer1)) %>%
select(idGame, numberPeriod, timeQuarter, numberEventMessageType, slugTeamPlayer1,
descriptionPlayHome, descriptionPlayVisitor, numberEvent) %>%
filter(numberEventMessageType %in% c(1:5)) %>%
group_by(idGame, numberPeriod) %>%
mutate(start_poss = case_when(slugTeamPlayer1 != lag(slugTeamPlayer1) & numberEventMessageType == 4 ~ timeQuarter,
slugTeamPlayer1 != lag(slugTeamPlayer1) & numberEventMessageType != 4 ~ lag(timeQuarter))) %>%
mutate(start_poss = ifelse(is.na(start_poss) & row_number() == 1, "12:00", start_poss)) %>% # when it starts at the beginning of quarter
ungroup()
start_possessions
## # A tibble: 227,040 x 9
## idGame numberPeriod timeQuarter numberEventMessageType slugTeamPlayer1
## <dbl> <dbl> <chr> <dbl> <chr>
## 1 22100001 1 11:42 2 MIL
## 2 22100001 1 11:39 4 BKN
## 3 22100001 1 11:27 3 BKN
## 4 22100001 1 11:27 3 BKN
## 5 22100001 1 11:27 4 BKN
## 6 22100001 1 11:25 4 MIL
## 7 22100001 1 11:13 2 MIL
## 8 22100001 1 11:10 4 MIL
## 9 22100001 1 11:01 2 MIL
## 10 22100001 1 10:59 4 BKN
## # ... with 227,030 more rows, and 4 more variables: descriptionPlayHome <chr>,
## # descriptionPlayVisitor <chr>, numberEvent <dbl>, start_poss <chr>
Add it to the original table and identify heaves, according to the pbpstats.com definition:
poss_pack_start <- poss_pack %>%
left_join(start_possessions %>%
select(idGame, numberEvent, start_poss)) %>%
group_by(idGame, numberPeriod) %>%
mutate(start_poss = zoo::na.locf0(start_poss)) %>%
ungroup() %>%
mutate(heave = ifelse(numberEventMessageType %in% c(2, 5) & possession == 1 & as.integer(str_sub(start_poss, 4, 5)) <= 2 & str_starts(start_poss, "00:") & (lead(shotPtsHome) + lead(shotPtsAway) == 0), 1, 0),
possession = ifelse(heave == 1, 0, possession))
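The heave check above reads the seconds part of the clock string and pairs it with a test for the `00:` prefix. As an alternative sketch, a small helper can convert the whole clock to seconds, so the comparison doesn’t depend on checking the minutes separately. The name `clock_to_secs` is hypothetical, and it assumes the clock is always in the "MM:SS" format shown in `timeQuarter`.

```r
# Hypothetical helper: convert a "MM:SS" clock string to seconds remaining
clock_to_secs <- function(clock) {
  parts <- strsplit(clock, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

clock_to_secs(c("00:02", "00:03", "01:02", "12:00"))
# 2 3 62 720
```

With this, a condition like `clock_to_secs(start_poss) <= 2` treats "01:02" as 62 seconds instead of looking only at the "02".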
We need to identify the team that had the ball at the end of the quarter. To do that, let’s find the last possession in every quarter:
last_possessions <- poss_pack_start %>%
group_by(idGame, numberPeriod) %>%
filter(cumsum(possession) >= max(cumsum(possession)) & possession == 1) %>%
ungroup()
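To see why the `cumsum()` filter keeps exactly one row per quarter, here is a minimal sketch with made-up data (the `toy_` objects are hypothetical). The running total only reaches its maximum from the last flagged event onward, and `possession == 1` then discards the unflagged rows after it.

```r
library(dplyr)

# Hypothetical events from two quarters of one game
toy_quarter <- tibble::tibble(
  numberPeriod = c(1, 1, 1, 1, 2, 2, 2),
  numberEvent  = 1:7,
  possession   = c(1, 0, 1, 0, 0, 1, 0)
)

toy_last <- toy_quarter %>%
  group_by(numberPeriod) %>%
  filter(cumsum(possession) >= max(cumsum(possession)) & possession == 1) %>%
  ungroup()

toy_last$numberEvent
# 3 6
```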
The only way the team that had the last possession could also be the team to have the ball at the end of the quarter is if they got an offensive rebound after a missed field goal or free throw. So let’s see which teams got the last rebound of every quarter, and how many seconds were left on the clock:
last_rebounds <- poss_pack_start %>%
group_by(idGame, numberPeriod) %>%
filter(numberEventMessageType == 4 & !(lag(numberEventMessageType) == 3 & lag(numberEventActionType) %in% c(18:20, 27:29))) %>%
filter(row_number() == max(row_number())) %>%
ungroup() %>%
mutate(rebound_team = case_when(is.na(slugTeamPlayer1) & descriptionPlayHome == "" ~ slugTeamAway,
is.na(slugTeamPlayer1) & descriptionPlayVisitor == "" ~ slugTeamHome,
TRUE ~ slugTeamPlayer1)) %>%
select(idGame, numberPeriod, rebound_team, timeQuarterReb = timeQuarter)
When a team makes a field goal and gets fouled (and-1), then misses the free throw, the start of the next possession should be at the moment of the defensive rebound instead of the made field goal (which counts as possession). Therefore, let’s identify these situations:
missedft_and1_last <- poss_pack_start %>%
semi_join(last_possessions %>%
select(idGame, secsPassedGame)) %>%
group_by(idGame, secsPassedGame) %>%
filter(sum(numberEventMessageType == 1) > 0 & sum(numberEventMessageType == 3 & numberEventActionType == 10) > 0 & sum(str_detect(descriptionPlayHome, "MISS") | str_detect(descriptionPlayVisitor, "MISS")) > 0) %>%
ungroup() %>%
filter(numberEventMessageType == 1) %>%
select(idGame, numberEvent)
missedft_and1_last
## # A tibble: 9 x 2
## idGame numberEvent
## <dbl> <dbl>
## 1 22100076 350
## 2 22100177 367
## 3 22100195 110
## 4 22100305 329
## 5 22100312 447
## 6 22100434 221
## 7 22100596 331
## 8 22100635 430
## 9 22100653 437
Now we can find the teams that kept the ball at the end of the quarter, when the previous possession ends in a missed fg/ft:
addit_poss_reb <- last_possessions %>%
left_join(last_rebounds, by = c("idGame", "numberPeriod")) %>%
left_join(missedft_and1_last %>%
mutate(and1_ft = 1)) %>%
filter(numberEventMessageType == 2 | (numberEventMessageType == 3 & (str_detect(descriptionPlayHome, "MISS") | str_detect(descriptionPlayVisitor, "MISS"))) | and1_ft == 1) %>%
filter(rebound_team != team_possession, # ignore offensive rebounds
as.integer(str_sub(timeQuarterReb, 4, 5)) >= 3) %>% # more than 2 seconds remaining in quarter
transmute(idGame, numberPeriod, start_poss = timeQuarterReb,
team_possession = rebound_team, possession)
addit_poss_reb
## # A tibble: 205 x 5
## idGame numberPeriod start_poss team_possession possession
## <dbl> <dbl> <chr> <chr> <dbl>
## 1 22100006 4 00:08 WAS 1
## 2 22100007 2 00:03 MEM 1
## 3 22100007 4 00:05 MEM 1
## 4 22100008 4 00:07 MIN 1
## 5 22100011 4 00:20 UTA 1
## 6 22100015 4 00:10 MIA 1
## 7 22100018 4 00:07 NYK 1
## 8 22100021 4 00:05 BKN 1
## 9 22100022 4 00:09 CHI 1
## 10 22100023 4 00:11 HOU 1
## # ... with 195 more rows
And when it ends in a made fg/ft or turnover:
addit_poss_made <- last_possessions %>%
filter(numberEventMessageType %in% c(1, 5) | (numberEventMessageType == 3 & !str_detect(descriptionPlayHome, "MISS") & !str_detect(descriptionPlayVisitor, "MISS"))) %>%
anti_join(missedft_and1_last) %>% # removing and-1 field goals that were followed by a missed free throw (the time of the rebound will be used instead)
left_join(team_logs %>%
distinct(idGame, .keep_all = TRUE) %>%
select(idGame, slugTeam, slugOpponent)) %>%
mutate(team_possession_next = ifelse(team_possession == slugTeam, slugOpponent, slugTeam)) %>%
filter(as.integer(str_sub(timeQuarter, 4, 5)) >= 3) %>%
transmute(idGame, numberPeriod, start_poss = timeQuarter,
team_possession = team_possession_next, possession)
addit_poss_made
## # A tibble: 347 x 5
## idGame numberPeriod start_poss team_possession possession
## <dbl> <dbl> <chr> <chr> <dbl>
## 1 22100002 4 00:07 GSW 1
## 2 22100008 1 00:03 HOU 1
## 3 22100009 4 00:08 PHI 1
## 4 22100010 4 00:14 SAS 1
## 5 22100012 1 00:12 DEN 1
## 6 22100012 4 00:21 DEN 1
## 7 22100013 1 00:03 SAC 1
## 8 22100014 4 00:03 ATL 1
## 9 22100017 4 00:13 CHA 1
## 10 22100020 4 00:05 TOR 1
## # ... with 337 more rows
Now let’s put it all together and add some information to the other columns:
additional_possessions <- bind_rows(addit_poss_reb, addit_poss_made) %>%
mutate(numberEventMessageType = 0,
numberEventActionType = 0,
numberOriginal = 0,
descriptionPlayNeutral = "Last possession of quarter") %>%
left_join(poss_pack %>%
filter(numberEventMessageType == 13) %>%
select(-c(numberOriginal, numberEventMessageType, numberEventActionType,
descriptionPlayNeutral, possession, team_possession))) %>%
mutate(numberEvent = numberEvent - 0.5)
final_poss_pack <- poss_pack_start %>%
bind_rows(additional_possessions) %>%
arrange(idGame, numberEvent) %>%
select(-c(hasFouls, subOpp, canSub))
final_poss_pack %>%
select(idGame, descriptionPlayHome, descriptionPlayVisitor, team_possession, possession) %>%
filter(!is.na(team_possession))
## # A tibble: 275,728 x 5
## idGame descriptionPlayHome descriptionPlayVi~ team_possession possession
## <dbl> <chr> <chr> <chr> <dbl>
## 1 22100001 "Jump Ball Lopez vs. ~ "" MIL 0
## 2 22100001 "MISS Allen 27' 3PT J~ "" MIL 1
## 3 22100001 "" "Durant REBOUND (~ BKN 0
## 4 22100001 "Antetokounmpo S.FOUL~ "" MIL 0
## 5 22100001 "" "MISS Claxton Fre~ BKN 0
## 6 22100001 "" "MISS Claxton Fre~ BKN 1
## 7 22100001 "Antetokounmpo REBOUN~ "" MIL 0
## 8 22100001 "MISS Antetokounmpo 2~ "" MIL 0
## 9 22100001 "Lopez REBOUND (Off:1~ "" MIL 0
## 10 22100001 "MISS Antetokounmpo 1~ "" MIL 1
## # ... with 275,718 more rows
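The `numberEvent - 0.5` step is what places each added row right before the event it was built from once the table is sorted. Here is a minimal sketch of the idea with hypothetical data, assuming the row being copied is the end-of-period event.

```r
library(dplyr)

# Hypothetical tail of a quarter
toy_pbp <- tibble::tibble(
  numberEvent = c(10, 11, 12),
  descriptionPlayNeutral = c("made shot", "timeout", "End of 1st Period")
)

# Build the extra row from the end-of-period row and shift it half a step back
toy_extra <- toy_pbp %>%
  filter(descriptionPlayNeutral == "End of 1st Period") %>%
  mutate(numberEvent = numberEvent - 0.5,
         descriptionPlayNeutral = "Last possession of quarter")

toy_combined <- bind_rows(toy_pbp, toy_extra) %>%
  arrange(numberEvent)

toy_combined$numberEvent
# 10.0 11.0 11.5 12.0
```

Using a half step means no existing event number is overwritten and the original ordering is preserved.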
We finally have a table with the possessions in the play-by-play! There’s just one thing left to do: just like with points in the plus minus, whenever there is a substitution in between free throws, the possession should be credited to the players who were on the floor when the foul occurred. Since we are using the last free throw of a trip as the trigger for possessions, this would not be the case, as the player who subbed in would already be on the court. Therefore, we are going to create a new column where the possession is counted at the moment of the foul. First, let’s identify the occasions when the possession ended on free throws, and find the foul that originated them:
fouls_possessions <- final_poss_pack %>%
filter(numberEventMessageType == 3 & possession == 1) %>%
select(idGame, secsPassedGame, player_foul = namePlayer1, team_possession, numberEvent_ft = numberEvent) %>%
left_join(final_poss_pack %>%
filter(numberEventMessageType == 6 & !numberEventActionType %in% c(6, 9, 11, 13, 14, 15, 16, 17)) %>% # fouls
mutate(description = ifelse(slugTeamPlayer1 == slugTeamHome, descriptionPlayHome, descriptionPlayVisitor)) %>%
select(idGame, secsPassedGame, player_foul = namePlayer2, numberEvent_foul = numberEvent, description)) %>%
add_count(idGame, secsPassedGame, player_foul, name = "number_plays") %>%
filter(!(number_plays > 1 & !str_detect(description, " S.FOUL |\\.PN\\)"))) # if the same player is fouled twice in the same second, keeps only the shooting foul
# there are occasions when the namePlayer2, who is supposed to be the player who got fouled, is wrong, leading to NAs in the join. When this happens, we will join without the player.
missing_comp <- fouls_possessions %>%
filter(is.na(numberEvent_foul)) %>%
left_join(final_poss_pack %>%
filter(numberEventMessageType == 6 & !numberEventActionType %in% c(6, 9, 11, 13, 14, 15, 16, 17)) %>%
mutate(description = ifelse(slugTeamPlayer1 == slugTeamHome, descriptionPlayHome, descriptionPlayVisitor)) %>%
select(idGame, secsPassedGame, numberEvent_foul = numberEvent, description),
by = c("idGame", "secsPassedGame"),
suffix = c("", "_new")) %>%
mutate(numberEvent_foul = numberEvent_foul_new,
description = description_new) %>%
select(-c(numberEvent_foul_new, description_new))
Now we create the column with the possession at the moment of the foul and add it to the original table:
fouls_possessions <- fouls_possessions %>%
rows_update(missing_comp, by = c("idGame", "secsPassedGame", "player_foul", "team_possession", "numberEvent_ft", "number_plays")) %>%
select(idGame, secsPassedGame, team_possession, numberEvent_ft, numberEvent_foul) %>%
pivot_longer(cols = starts_with("numberEvent"),
names_to = "type_play",
values_to = "numberEvent",
names_prefix = "numberEvent_") %>%
mutate(possession_players = ifelse(type_play == "foul", 1, 0)) %>%
select(-type_play)
final_poss_pack <- final_poss_pack %>%
mutate(possession_players = possession) %>%
rows_update(fouls_possessions, by = c("idGame", "numberEvent"))
final_poss_pack %>%
select(idGame, descriptionPlayHome, descriptionPlayVisitor, team_possession, possession, possession_players) %>%
filter(!is.na(team_possession))
## # A tibble: 275,728 x 6
## idGame descriptionPlayHome descriptionPlayVi~ team_possession possession
## <dbl> <chr> <chr> <chr> <dbl>
## 1 22100001 "Jump Ball Lopez vs. ~ "" MIL 0
## 2 22100001 "MISS Allen 27' 3PT J~ "" MIL 1
## 3 22100001 "" "Durant REBOUND (~ BKN 0
## 4 22100001 "Antetokounmpo S.FOUL~ "" BKN 0
## 5 22100001 "" "MISS Claxton Fre~ BKN 0
## 6 22100001 "" "MISS Claxton Fre~ BKN 1
## 7 22100001 "Antetokounmpo REBOUN~ "" MIL 0
## 8 22100001 "MISS Antetokounmpo 2~ "" MIL 0
## 9 22100001 "Lopez REBOUND (Off:1~ "" MIL 0
## 10 22100001 "MISS Antetokounmpo 1~ "" MIL 1
## # ... with 275,718 more rows, and 1 more variable: possession_players <dbl>
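Here is a minimal sketch of what `possession_players` does on a single free throw trip with a substitution in the middle. The events are made up and the `toy_` objects are hypothetical; the point is only that the 1 moves from the last free throw to the foul.

```r
library(dplyr)

# Hypothetical trip: the possession is flagged on the last free throw
toy_trip <- tibble::tibble(
  numberEvent = 1:4,
  play        = c("shooting foul", "free throw 1 of 2", "substitution", "free throw 2 of 2"),
  possession  = c(0, 0, 0, 1)
)

# One row for the foul (gets the possession) and one for the last free throw (loses it)
toy_moves <- tibble::tibble(
  numberEvent        = c(1L, 4L),
  possession_players = c(1, 0)
)

toy_trip_players <- toy_trip %>%
  mutate(possession_players = possession) %>%
  rows_update(toy_moves, by = "numberEvent")

toy_trip_players$possession_players
# 1 0 0 0
```

The team-level `possession` column is untouched, so team totals stay the same while the lineup on the floor at the time of the foul gets the credit.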
This is our final table. I recommend you save it every time you run the code to this point, and use the output to do play-by-play analysis. One of the analyses we can do is extract lineup and player stats from it, including plus minus, number of possessions and total time, just like we did in this post. We’ll do it by getting the stats for each lineup stint:
lineup_stats <- final_poss_pack %>%
select(idGame, numberEvent, slugTeamHome, slugTeamAway, numberPeriod, timeQuarter, secsPassedGame,
newptsHome, newptsAway, lineupHome, lineupAway, possession_players, team_possession) %>%
mutate(possession_home = ifelse(team_possession == slugTeamHome & possession_players == 1, 1, 0),
possession_away = ifelse(team_possession == slugTeamAway & possession_players == 1, 1, 0)) %>%
pivot_longer(cols = starts_with("lineup"),
names_to = "lineupLocation",
names_prefix = "lineup",
values_to = "lineup") %>%
mutate(ptsTeam = ifelse(lineupLocation == "Home", newptsHome, newptsAway),
ptsOpp = ifelse(lineupLocation == "Away", newptsHome, newptsAway),
possTeam = ifelse(lineupLocation == "Home", possession_home, possession_away),
possOpp = ifelse(lineupLocation == "Away", possession_home, possession_away),
slugTeam = ifelse(lineupLocation == "Home", slugTeamHome, slugTeamAway),
slugOpp = ifelse(lineupLocation == "Away", slugTeamHome, slugTeamAway)) %>%
distinct(idGame, slugTeam, slugOpp, numberPeriod, timeQuarter, secsPassedGame, ptsTeam, ptsOpp,
possTeam, possOpp, lineup, teamLocation = lineupLocation, numberEvent) %>%
arrange(idGame, numberEvent) %>%
group_by(idGame, slugTeam) %>%
mutate(lineupChange = lineup != lag(lineup),
lineupChange = coalesce(lineupChange, FALSE)) %>%
group_by(idGame, slugTeam) %>%
mutate(lineupStint = cumsum(lineupChange)) %>%
ungroup() %>%
arrange(idGame, lineupStint, numberEvent) %>%
group_by(idGame, slugTeam, lineup, lineupStint, numberPeriod) %>%
summarise(totalPossTeam = sum(possTeam),
totalPossOpp = sum(possOpp),
initialScoreTeam = ptsTeam[row_number() == min(row_number())],
initialScoreOpp = ptsOpp[row_number() == min(row_number())],
finalScoreTeam = ptsTeam[row_number() == max(row_number())],
finalScoreOpp = ptsOpp[row_number() == max(row_number())],
initialTime = secsPassedGame[row_number() == min(row_number())],
finalTime = secsPassedGame[row_number() == max(row_number())]) %>%
ungroup() %>%
arrange(idGame, lineupStint) %>%
group_by(idGame, slugTeam) %>%
mutate(finalTime = ifelse(row_number() == max(row_number()), finalTime, lead(initialTime))) %>%
ungroup() %>%
mutate(across(contains("Score"), as.numeric)) %>%
mutate(totalScoreTeam = finalScoreTeam - initialScoreTeam,
totalScoreOpp = finalScoreOpp - initialScoreOpp,
netScoreTeam = totalScoreTeam - totalScoreOpp,
totalTime = finalTime - initialTime) %>%
arrange(idGame, lineupStint)
lineup_stats
## # A tibble: 36,217 x 17
## idGame slugTeam lineup lineupStint numberPeriod totalPossTeam totalPossOpp
## <dbl> <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 22100001 BKN Blake ~ 0 1 8 9
## 2 22100001 MIL Brook ~ 0 1 7 6
## 3 22100001 BKN Blake ~ 1 1 5 5
## 4 22100001 MIL Gianni~ 1 1 7 7
## 5 22100001 BKN James ~ 2 1 0 0
## 6 22100001 MIL George~ 2 1 2 2
## 7 22100001 BKN Kevin ~ 3 1 0 0
## 8 22100001 MIL George~ 3 1 0 1
## 9 22100001 BKN Jevon ~ 4 1 5 4
## 10 22100001 MIL Brook ~ 4 1 2 2
## # ... with 36,207 more rows, and 10 more variables: initialScoreTeam <dbl>,
## # initialScoreOpp <dbl>, finalScoreTeam <dbl>, finalScoreOpp <dbl>,
## # initialTime <dbl>, finalTime <dbl>, totalScoreTeam <dbl>,
## # totalScoreOpp <dbl>, netScoreTeam <dbl>, totalTime <dbl>
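The stint numbering comes from the `lineup != lag(lineup)` plus `cumsum()` pattern. Here is a minimal sketch with a made-up sequence of lineups for one team (the `toy_` objects are hypothetical):

```r
library(dplyr)

# Hypothetical lineup strings across six events
toy_lineups <- tibble::tibble(
  numberEvent = 1:6,
  lineup      = c("A, B", "A, B", "A, C", "A, C", "A, B", "A, B")
)

toy_stints <- toy_lineups %>%
  mutate(lineupChange = coalesce(lineup != lag(lineup), FALSE), # first row has no previous lineup
         lineupStint  = cumsum(lineupChange))

toy_stints$lineupStint
# 0 0 1 1 2 2
```

Note that when the "A, B" lineup returns, it gets a new stint number rather than reusing 0, which is why the summary above groups by both `lineup` and `lineupStint`.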
To add a little more information, we are just going to create columns showing the number of reserves in the lineups:
lineup_stats <- lineup_stats %>%
left_join(lineup_stats %>%
filter(lineupStint == 0) %>%
distinct(idGame, slugTeam, starters = lineup)) %>%
mutate(across(c(lineup, starters), ~ str_split(., ", "), .names = "{.col}_list")) %>%
mutate(reserves = map_int(map2(lineup_list, starters_list, setdiff), length)) %>%
select(-c(contains("list"), starters))
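For a single lineup, the reserve count is just the number of players who aren’t in the starting five. A small base R sketch with made-up player names:

```r
# Hypothetical lineup strings in the same ", "-separated format
starters <- "A, B, C, D, E"
lineup   <- "A, B, F, D, G"

# Players in the current lineup who did not start
reserves <- length(setdiff(strsplit(lineup, ", ")[[1]],
                           strsplit(starters, ", ")[[1]]))
reserves
# 2
```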
Now we can do a number of different analyses from this table, at the individual player, lineup or team level.
- Lineups (season):
lineup_stats %>%
group_by(lineup, slugTeam) %>%
summarise(across(starts_with("total"), sum)) %>%
ungroup() %>%
mutate(pts_100poss = totalScoreTeam / totalPossTeam * 100,
pts_opp_100poss = totalScoreOpp / totalPossOpp * 100,
net_100poss = pts_100poss - pts_opp_100poss) %>%
filter(totalTime >= 100 * 60) %>% # minimum 100 minutes
arrange(-net_100poss)
## # A tibble: 47 x 10
## lineup slugTeam totalPossTeam totalPossOpp totalScoreTeam totalScoreOpp
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Bogdan Bogd~ ATL 289 286 383 285
## 2 Anthony Edw~ MIN 372 367 485 360
## 3 Joe Ingles,~ UTA 413 410 499 415
## 4 Al Horford,~ BOS 287 290 323 272
## 5 Eric Pascha~ UTA 230 226 266 225
## 6 Darius Garl~ CLE 293 288 341 291
## 7 Bojan Bogda~ UTA 950 947 1182 1053
## 8 Caris LeVer~ IND 293 288 364 324
## 9 Danny Green~ PHI 442 431 505 443
## 10 Fred VanVle~ TOR 299 303 340 310
## # ... with 37 more rows, and 4 more variables: totalTime <dbl>,
## # pts_100poss <dbl>, pts_opp_100poss <dbl>, net_100poss <dbl>
- Team (by game):
lineup_stats %>%
group_by(idGame, slugTeam) %>%
summarise(across(starts_with("total"), sum)) %>%
ungroup() %>%
mutate(pts_100poss = totalScoreTeam / totalPossTeam * 100,
pts_opp_100poss = totalScoreOpp / totalPossOpp * 100,
net_100poss = pts_100poss - pts_opp_100poss) %>%
arrange(-pts_100poss)
## # A tibble: 1,290 x 10
## idGame slugTeam totalPossTeam totalPossOpp totalScoreTeam totalScoreOpp
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 22100330 MEM 98 98 152 79
## 2 22100347 BOS 96 97 145 117
## 3 22100325 DAL 95 93 139 107
## 4 22100637 GSW 95 95 138 96
## 5 22100377 SAC 99 100 142 130
## 6 22100245 MIN 97 96 138 95
## 7 22100538 LAL 98 99 139 106
## 8 22100459 CHI 94 93 133 118
## 9 22100410 DAL 85 84 120 96
## 10 22100631 DEN 100 99 140 108
## # ... with 1,280 more rows, and 4 more variables: totalTime <dbl>,
## # pts_100poss <dbl>, pts_opp_100poss <dbl>, net_100poss <dbl>
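As a quick check of the per-100-possession arithmetic, we can plug in the numbers from the first row of the table above (MEM in game 22100330: 152 points on 98 possessions, 79 points allowed on 98 opponent possessions):

```r
# Values copied from the first row of the output above
totalScoreTeam <- 152
totalPossTeam  <- 98
totalScoreOpp  <- 79
totalPossOpp   <- 98

pts_100poss     <- totalScoreTeam / totalPossTeam * 100
pts_opp_100poss <- totalScoreOpp / totalPossOpp * 100
net_100poss     <- pts_100poss - pts_opp_100poss

round(c(pts_100poss, pts_opp_100poss, net_100poss), 1)
# 155.1  80.6  74.5
```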
In the next post, I will write some thoughts and ideas on possession counts. Thanks for reading!