Adding possession count to NBA play-by-play

In this post, I will explain how to add possessions to the NBA play-by-play table. The data we’ll use is the play-by-play (pbp) with each team’s lineup added to every event. I explain in detail the steps to get it here and here. Let’s load it:

library(tidyverse)
library(nbastatR)   # provides game_logs()

lineup_game_stats <- read_csv("https://github.com/ramirobentes/NBA-in-R/releases/download/lineup-game-stats-7f61e89/data.csv",
                              col_types = c(timeQuarter = "c")) %>%
  mutate(across(starts_with("description"), ~ coalesce(., "")))

team_logs <- game_logs(seasons = 2022, result_types = "team")
## Acquiring NBA basic team game logs for the 2021-22 Regular Season

A team possession happens when there is:

  • a made field goal
  • a missed field goal where the shooting team doesn’t get the rebound
  • a turnover
  • a 2 or 3 free throw trip where the shooting team doesn’t keep the ball after the last attempt (via offensive rebound or flagrant/clear path foul)

In this post, I will follow the pbpstats.com model, which also counts a possession when a team gets the ball with more than 2 seconds remaining in the quarter and simply dribbles out the clock without doing any of the above. On the other hand, if a team gets the ball with 2 seconds or less remaining and misses a field goal or turns the ball over, the play is considered a heave and is not counted as a possession.

Let’s start by identifying every field goal, every turnover and the last free throw of every 2 or 3 shot trip that isn’t the result of a flagrant/clear path foul, as well as the team that’s on offense:

possession_initial <- lineup_game_stats %>%
  mutate(possession = case_when(numberEventMessageType %in% c(1, 2, 5) ~ 1,
                                numberEventMessageType == 3 & numberEventActionType %in% c(12, 15) ~ 1,
                                TRUE ~ 0),
         team_possession = case_when(is.na(slugTeamPlayer1) & possession == 1 & descriptionPlayHome == "" ~ slugTeamAway,
                                     is.na(slugTeamPlayer1) & possession == 1 & descriptionPlayVisitor == "" ~ slugTeamHome,
                                     TRUE ~ slugTeamPlayer1))
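To see what these two `case_when()` calls produce, here is a minimal sketch on a made-up table. The six rows and their descriptions are invented for illustration (they are not real play-by-play events), and the free throw action types 11 and 12 are assumed to mean "1 of 2" and "2 of 2".

```r
library(dplyr)

# Toy events (made-up values, not real play-by-play rows)
toy_pbp <- tibble(
  numberEventMessageType = c(1, 4, 5, 3, 3, 5),
  numberEventActionType  = c(1, 0, 1, 11, 12, 1),
  slugTeamPlayer1        = c("MIL", "BKN", "BKN", "MIL", "MIL", NA),
  descriptionPlayHome    = c("made shot", "", "", "FT 1 of 2", "FT 2 of 2", ""),
  descriptionPlayVisitor = c("", "rebound", "turnover", "", "", "team turnover"),
  slugTeamHome = "MIL",
  slugTeamAway = "BKN"
)

toy_flagged <- toy_pbp %>%
  mutate(possession = case_when(numberEventMessageType %in% c(1, 2, 5) ~ 1,
                                numberEventMessageType == 3 & numberEventActionType %in% c(12, 15) ~ 1,
                                TRUE ~ 0),
         # events with no player team fall back to the side whose description is filled in
         team_possession = case_when(is.na(slugTeamPlayer1) & possession == 1 & descriptionPlayHome == "" ~ slugTeamAway,
                                     is.na(slugTeamPlayer1) & possession == 1 & descriptionPlayVisitor == "" ~ slugTeamHome,
                                     TRUE ~ slugTeamPlayer1))

toy_flagged %>%
  select(numberEventMessageType, team_possession, possession)
```

The rebound and the first free throw stay at 0, while the team turnover with a missing `slugTeamPlayer1` is assigned to the away team because the home description is empty.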

There are 2 special situations in which the play-by-play doesn’t explicitly tell us that a turnover has occurred:

  • when there’s a lane violation by the shooting team on the last free throw of a trip, the free throw gets “canceled”. Sometimes the play-by-play will describe it as “lane violation turnover” and keep the first free throw as “free throw 1 of 2”. In this case, we don’t need to do anything, because it will already count as a possession for being a turnover. However, there are rare times when the pbp describes it only as “lane” and turns the previous free throw into 1 of 1. When this happens, we need to add a possession to the free throw.
  • when a coach challenges a foul call and is able to reverse it, there is a jump ball. If the team that won the challenge wins the jump ball and recovers the ball, it should count as a turnover for the opposing team.

Let’s find these instances:

lane_description_missing <- possession_initial %>%
  group_by(idGame, secsPassedGame) %>%
  filter(sum(numberEventMessageType == 3 & numberEventActionType == 10) > 0,   # free throw 1 of 1
         sum(numberEventMessageType == 7 & numberEventActionType == 3) > 0,    # lane
         sum(numberEventMessageType == 1) == 0) %>%                            # no made field goal (and 1)
  ungroup() %>%
  mutate(possession = ifelse(numberEventMessageType == 3 & numberEventActionType == 10, 1, possession)) %>%
  select(idGame, numberEvent, team_possession, possession)

lane_description_missing
## # A tibble: 3 x 4
##     idGame numberEvent team_possession possession
##      <dbl>       <dbl> <chr>                <dbl>
## 1 22100645         135 PHI                      0
## 2 22100645         136 MIA                      1
## 3 22100645         137 MIA                      0
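The grouped `filter()` above keeps whole groups of simultaneous events based on what the group contains. A minimal sketch of that pattern on made-up rows (the `toy_events` table is hypothetical) shows why the and-1 case is excluded:

```r
library(dplyr)

# Two timestamps: at 100s a "1 of 1" free throw plus a lane violation,
# at 200s the same pair but preceded by a made field goal (an and-1)
toy_events <- tibble(
  idGame = 1,
  secsPassedGame = c(100, 100, 200, 200, 200),
  numberEventMessageType = c(3, 7, 1, 3, 7),
  numberEventActionType  = c(10, 3, 1, 10, 3)
)

lane_candidates <- toy_events %>%
  group_by(idGame, secsPassedGame) %>%
  filter(sum(numberEventMessageType == 3 & numberEventActionType == 10) > 0,   # free throw 1 of 1
         sum(numberEventMessageType == 7 & numberEventActionType == 3) > 0,    # lane
         sum(numberEventMessageType == 1) == 0) %>%                            # no made field goal
  ungroup()

lane_candidates
```

Only the two rows at 100 seconds survive, because the group at 200 seconds contains a made field goal.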
Next, the jump balls that follow a successful challenge:

jumpball_turnovers <- possession_initial %>%
  group_by(idGame, numberPeriod) %>%
  mutate(prev_poss = zoo::na.locf0(ifelse(possession == 1, team_possession, NA)),
         next_poss = zoo::na.locf0(ifelse(possession == 1, team_possession, NA), fromLast = TRUE)) %>%
  ungroup() %>%
  mutate(slugTeamPlayer1 = case_when(numberEventMessageType == 9 & descriptionPlayHome == "" ~ slugTeamAway,
                                     numberEventMessageType == 9 & descriptionPlayVisitor == "" ~ slugTeamHome,
                                     TRUE ~ slugTeamPlayer1)) %>%
  group_by(idGame, secsPassedGame) %>%
  mutate(team_reb_chall = sum(numberEventMessageType == 9 & numberEventActionType == 7) > 0 &
           sum(numberEventMessageType == 4 & is.na(namePlayer1)) > 0) %>% 
  ungroup() %>%
  filter(numberEventMessageType == 10 & numberEventActionType == 1 & 
           lag(numberEventMessageType) == 9 & lag(numberEventActionType) == 7 &
           slugTeamPlayer3 == lag(slugTeamPlayer1) &
           prev_poss == next_poss &
           lag(team_reb_chall) == FALSE) %>%
  mutate(team_possession = ifelse(slugTeamPlayer3 == slugTeamPlayer1, slugTeamPlayer2, slugTeamPlayer1),
         possession = 1) %>%
  select(idGame, numberEvent, team_possession, possession)

jumpball_turnovers
## # A tibble: 2 x 4
##     idGame numberEvent team_possession possession
##      <dbl>       <dbl> <chr>                <dbl>
## 1 22100328         291 CHI                      1
## 2 22100473         266 HOU                      1

We know that a team cannot have 2 possessions in a row. In our current table this happens a lot, mainly because of offensive rebounds. So let’s identify the times when the same team had consecutive possessions and disregard the first one. We’ll also apply the lane violation and jump ball updates found above:

change_consec <- possession_initial %>%
  rows_update(lane_description_missing, by = c("idGame", "numberEvent")) %>%
  rows_update(jumpball_turnovers, by = c("idGame", "numberEvent")) %>%
  # a technical foul for too many players on the court (message type 6, action type 30)
  # sends the ball back to the other team, but it's not counted as a turnover
  filter(possession == 1 | (numberEventMessageType == 6 & numberEventActionType == 30)) %>%
  group_by(idGame, numberPeriod) %>%
  filter(possession == lead(possession) & team_possession == lead(team_possession)) %>%
  ungroup() %>%
  mutate(possession = 0) %>%
  select(idGame, numberEvent, possession)

# replacing in original data
poss_pack <- possession_initial %>%
  rows_update(lane_description_missing, by = c("idGame", "numberEvent")) %>%
  rows_update(jumpball_turnovers, by = c("idGame", "numberEvent")) %>%
  rows_update(change_consec, by = c("idGame","numberEvent"))
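The `lead()` comparison is the core of this step. As a minimal sketch, assuming a made-up table `toy_poss` in which every row is already flagged as a possession, the first row of each same-team pair is reset to 0:

```r
library(dplyr)

toy_poss <- tibble(
  numberEvent = 1:5,
  team_possession = c("MIL", "MIL", "BKN", "MIL", "MIL"),
  possession = c(1, 1, 1, 1, 1)
)

# rows whose NEXT possession belongs to the same team
first_of_pair <- toy_poss %>%
  filter(possession == lead(possession) & team_possession == lead(team_possession)) %>%
  mutate(possession = 0) %>%
  select(numberEvent, possession)

# write the zeros back into the full table
toy_fixed <- toy_poss %>%
  rows_update(first_of_pair, by = "numberEvent")

toy_fixed
```

Events 1 and 4 are zeroed out, so MIL and BKN alternate again. The last row is never dropped because `lead()` returns `NA` there and `filter()` discards `NA` conditions.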

Ideally, the possession count would stop here. However, since we’re following the pbpstats.com model, we need to account for the times when a team gets the ball with more than 2 seconds remaining in the quarter and doesn’t do anything with it. To do that, we need to find the start of every possession. When a possession follows a made field goal, free throw or turnover, it starts at the time of that event. When it follows a missed field goal, it starts at the time of the defensive rebound.

start_possessions <- poss_pack %>%
  mutate(slugTeamPlayer1 = case_when(is.na(slugTeamPlayer1) & descriptionPlayHome == "" ~ slugTeamAway,
                                     is.na(slugTeamPlayer1) & descriptionPlayVisitor == "" ~ slugTeamHome,
                                     TRUE ~ slugTeamPlayer1)) %>% 
  select(idGame, numberPeriod, timeQuarter, numberEventMessageType,  slugTeamPlayer1, 
         descriptionPlayHome, descriptionPlayVisitor, numberEvent) %>%
  filter(numberEventMessageType %in% c(1:5)) %>%
  group_by(idGame, numberPeriod) %>%
  mutate(start_poss = case_when(slugTeamPlayer1 != lag(slugTeamPlayer1) & numberEventMessageType == 4 ~ timeQuarter, 
                                slugTeamPlayer1 != lag(slugTeamPlayer1) & numberEventMessageType != 4 ~ lag(timeQuarter))) %>%
  mutate(start_poss = ifelse(is.na(start_poss) & row_number() == 1, "12:00", start_poss)) %>%  # when it starts at the beginning of quarter
  ungroup()

start_possessions
## # A tibble: 227,040 x 9
##      idGame numberPeriod timeQuarter numberEventMessageType slugTeamPlayer1
##       <dbl>        <dbl> <chr>                        <dbl> <chr>          
##  1 22100001            1 11:42                            2 MIL            
##  2 22100001            1 11:39                            4 BKN            
##  3 22100001            1 11:27                            3 BKN            
##  4 22100001            1 11:27                            3 BKN            
##  5 22100001            1 11:27                            4 BKN            
##  6 22100001            1 11:25                            4 MIL            
##  7 22100001            1 11:13                            2 MIL            
##  8 22100001            1 11:10                            4 MIL            
##  9 22100001            1 11:01                            2 MIL            
## 10 22100001            1 10:59                            4 BKN            
## # ... with 227,030 more rows, and 4 more variables: descriptionPlayHome <chr>,
## #   descriptionPlayVisitor <chr>, numberEvent <dbl>, start_poss <chr>
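A small sketch of the `start_poss` rule on four made-up events (one quarter, no grouping needed) may help. The table `toy_events` is hypothetical: MIL misses, BKN rebounds and scores, then MIL misses again.

```r
library(dplyr)

toy_events <- tibble(
  timeQuarter = c("11:42", "11:39", "11:27", "11:13"),
  numberEventMessageType = c(2, 4, 1, 2),
  slugTeamPlayer1 = c("MIL", "BKN", "BKN", "MIL")
)

toy_starts <- toy_events %>%
  mutate(start_poss = case_when(
    # team changed on a rebound: possession starts at the rebound
    slugTeamPlayer1 != lag(slugTeamPlayer1) & numberEventMessageType == 4 ~ timeQuarter,
    # team changed on anything else: possession started at the previous event
    slugTeamPlayer1 != lag(slugTeamPlayer1) & numberEventMessageType != 4 ~ lag(timeQuarter)
  )) %>%
  mutate(start_poss = ifelse(is.na(start_poss) & row_number() == 1, "12:00", start_poss))

toy_starts
```

BKN’s possession starts at the 11:39 rebound, while MIL’s second possession starts at 11:27, the time of BKN’s made basket. Rows where the team did not change stay `NA` and are filled forward in the next step.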

Add it to the original table and identify heaves, according to the pbpstats.com definition:

poss_pack_start <- poss_pack %>%
  left_join(start_possessions %>%
              select(idGame, numberEvent, start_poss)) %>%
  group_by(idGame, numberPeriod) %>%
  mutate(start_poss = zoo::na.locf0(start_poss)) %>%
  ungroup() %>%
  mutate(heave = ifelse(numberEventMessageType %in% c(2, 5) & possession == 1 & as.integer(str_sub(start_poss, 4, 5)) <= 2 & str_starts(start_poss, "00:") & (lead(shotPtsHome) + lead(shotPtsAway) == 0), 1, 0),
         possession = ifelse(heave == 1, 0, possession))
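The heave check reads the clock with `str_sub()` and `str_starts()`. As an alternative sketch (not the code used in this post), the clock string can be converted to seconds first. The helper names `clock_to_secs()` and `is_heave_window()` are hypothetical, and the clock is assumed to always be in "MM:SS" form.

```r
# Convert "MM:SS" strings to seconds remaining in the quarter
clock_to_secs <- function(clock) {
  parts <- strsplit(clock, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# TRUE when a possession started with `cutoff` seconds or less on the clock
is_heave_window <- function(start_clock, cutoff = 2L) {
  clock_to_secs(start_clock) <= cutoff
}

clock_to_secs(c("11:42", "00:02"))
is_heave_window(c("00:02", "00:03", "01:02"))
```

Working with total seconds means a start time such as "01:02" is not mistaken for 2 seconds, which is why the original condition also needs the `str_starts(start_poss, "00:")` check.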

We need to identify the team that had the ball at the end of the quarter. To do that, let’s find the last possession in every quarter:

last_possessions <- poss_pack_start %>%
  group_by(idGame, numberPeriod) %>%
  filter(cumsum(possession) >= max(cumsum(possession)) & possession == 1) %>%
  ungroup()
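The `cumsum()` filter can look cryptic at first, so here is a minimal sketch on a made-up table (`toy_quarters` is hypothetical) with two quarters:

```r
library(dplyr)

toy_quarters <- tibble(
  numberPeriod = c(1, 1, 1, 1, 2, 2, 2),
  numberEvent  = 1:7,
  possession   = c(1, 0, 1, 0, 0, 1, 0)
)

toy_last <- toy_quarters %>%
  group_by(numberPeriod) %>%
  # the running count equals its maximum only from the last possession onwards;
  # possession == 1 then drops the trailing non-possession rows
  filter(cumsum(possession) >= max(cumsum(possession)) & possession == 1) %>%
  ungroup()

toy_last
```

The result keeps event 3 for the first quarter and event 6 for the second, one row per quarter.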

The only way the team that had the last possession could also be the team to have the ball at the end of the quarter is if they got an offensive rebound after a missed field goal or free throw. So let’s see which teams got the last rebound of every quarter, and how many seconds were left on the clock:

last_rebounds <- poss_pack_start %>%
  group_by(idGame, numberPeriod) %>%
  filter(numberEventMessageType == 4 & !(lag(numberEventMessageType) == 3 & lag(numberEventActionType) %in% c(18:20, 27:29))) %>%
  filter(row_number() == max(row_number())) %>%
  ungroup() %>%
  mutate(rebound_team = case_when(is.na(slugTeamPlayer1) & descriptionPlayHome == "" ~ slugTeamAway,
                                  is.na(slugTeamPlayer1) & descriptionPlayVisitor == "" ~ slugTeamHome,
                                  TRUE ~ slugTeamPlayer1)) %>%
  select(idGame, numberPeriod, rebound_team, timeQuarterReb = timeQuarter)

When a team makes a field goal and gets fouled (and-1), then misses the free throw, the next possession should start at the moment of the defensive rebound rather than at the made field goal (which is the event that counts as the possession). Therefore, let’s identify these situations:

missedft_and1_last <- poss_pack_start %>%
  semi_join(last_possessions %>%
              select(idGame, secsPassedGame)) %>%
  group_by(idGame, secsPassedGame) %>%
  filter(sum(numberEventMessageType == 1) > 0 & sum(numberEventMessageType == 3 & numberEventActionType == 10) > 0 & sum(str_detect(descriptionPlayHome, "MISS") | str_detect(descriptionPlayVisitor, "MISS")) > 0) %>%
  ungroup() %>%
  filter(numberEventMessageType == 1) %>%
  select(idGame, numberEvent)

missedft_and1_last
## # A tibble: 9 x 2
##     idGame numberEvent
##      <dbl>       <dbl>
## 1 22100076         350
## 2 22100177         367
## 3 22100195         110
## 4 22100305         329
## 5 22100312         447
## 6 22100434         221
## 7 22100596         331
## 8 22100635         430
## 9 22100653         437

Now we can find the teams that kept the ball at the end of the quarter, when the previous possession ends in a missed fg/ft:

addit_poss_reb <- last_possessions %>%
  left_join(last_rebounds, by = c("idGame", "numberPeriod")) %>%
  left_join(missedft_and1_last %>%
              mutate(and1_ft = 1)) %>%
  filter(numberEventMessageType == 2 | (numberEventMessageType == 3 & (str_detect(descriptionPlayHome, "MISS") | str_detect(descriptionPlayVisitor, "MISS"))) | and1_ft == 1) %>%
  filter(rebound_team != team_possession,  # ignore offensive rebounds
         as.integer(str_sub(timeQuarterReb, 4, 5)) >= 3) %>%   # more than 2 seconds remaining in quarter
  transmute(idGame, numberPeriod, start_poss = timeQuarterReb, 
            team_possession = rebound_team, possession)

addit_poss_reb
## # A tibble: 205 x 5
##      idGame numberPeriod start_poss team_possession possession
##       <dbl>        <dbl> <chr>      <chr>                <dbl>
##  1 22100006            4 00:08      WAS                      1
##  2 22100007            2 00:03      MEM                      1
##  3 22100007            4 00:05      MEM                      1
##  4 22100008            4 00:07      MIN                      1
##  5 22100011            4 00:20      UTA                      1
##  6 22100015            4 00:10      MIA                      1
##  7 22100018            4 00:07      NYK                      1
##  8 22100021            4 00:05      BKN                      1
##  9 22100022            4 00:09      CHI                      1
## 10 22100023            4 00:11      HOU                      1
## # ... with 195 more rows

And when it ends in a made fg/ft or turnover:

addit_poss_made <- last_possessions %>%
  filter(numberEventMessageType %in% c(1, 5) | (numberEventMessageType == 3 & !str_detect(descriptionPlayHome, "MISS") & !str_detect(descriptionPlayVisitor, "MISS"))) %>%
  anti_join(missedft_and1_last) %>%    # removing and-1 field goals that were followed by a missed free throw (the rebound time will be used instead)
  left_join(team_logs %>%
              distinct(idGame, .keep_all = TRUE) %>%
              select(idGame, slugTeam, slugOpponent)) %>%
  mutate(team_possession_next = ifelse(team_possession == slugTeam, slugOpponent, slugTeam)) %>%
  filter(as.integer(str_sub(timeQuarter, 4, 5)) >= 3) %>%
  transmute(idGame, numberPeriod, start_poss = timeQuarter, 
            team_possession = team_possession_next, possession)

addit_poss_made
## # A tibble: 347 x 5
##      idGame numberPeriod start_poss team_possession possession
##       <dbl>        <dbl> <chr>      <chr>                <dbl>
##  1 22100002            4 00:07      GSW                      1
##  2 22100008            1 00:03      HOU                      1
##  3 22100009            4 00:08      PHI                      1
##  4 22100010            4 00:14      SAS                      1
##  5 22100012            1 00:12      DEN                      1
##  6 22100012            4 00:21      DEN                      1
##  7 22100013            1 00:03      SAC                      1
##  8 22100014            4 00:03      ATL                      1
##  9 22100017            4 00:13      CHA                      1
## 10 22100020            4 00:05      TOR                      1
## # ... with 337 more rows

Now let’s put it all together and add some information to the other columns:

additional_possessions <- bind_rows(addit_poss_reb,  addit_poss_made) %>%
  mutate(numberEventMessageType = 0,
         numberEventActionType = 0,
         numberOriginal = 0,
         descriptionPlayNeutral = "Last possession of quarter") %>%
  left_join(poss_pack %>%
              filter(numberEventMessageType == 13) %>%
              select(-c(numberOriginal, numberEventMessageType, numberEventActionType,
                        descriptionPlayNeutral, possession, team_possession))) %>%
  mutate(numberEvent = numberEvent - 0.5)

final_poss_pack <- poss_pack_start %>%
  bind_rows(additional_possessions) %>%
  arrange(idGame, numberEvent) %>%
  select(-c(hasFouls, subOpp, canSub))

final_poss_pack %>%
  select(idGame, descriptionPlayHome, descriptionPlayVisitor, team_possession, possession) %>%
  filter(!is.na(team_possession))
## # A tibble: 275,728 x 5
##      idGame descriptionPlayHome    descriptionPlayVi~ team_possession possession
##       <dbl> <chr>                  <chr>              <chr>                <dbl>
##  1 22100001 "Jump Ball Lopez vs. ~ ""                 MIL                      0
##  2 22100001 "MISS Allen 27' 3PT J~ ""                 MIL                      1
##  3 22100001 ""                     "Durant REBOUND (~ BKN                      0
##  4 22100001 "Antetokounmpo S.FOUL~ ""                 MIL                      0
##  5 22100001 ""                     "MISS Claxton Fre~ BKN                      0
##  6 22100001 ""                     "MISS Claxton Fre~ BKN                      1
##  7 22100001 "Antetokounmpo REBOUN~ ""                 MIL                      0
##  8 22100001 "MISS Antetokounmpo 2~ ""                 MIL                      0
##  9 22100001 "Lopez REBOUND (Off:1~ ""                 MIL                      0
## 10 22100001 "MISS Antetokounmpo 1~ ""                 MIL                      1
## # ... with 275,718 more rows
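The `numberEvent - 0.5` step is what places each added possession right before the event it was joined to. A minimal sketch with made-up event numbers, assuming message type 13 marks the end of a period (`toy_events` and `synthetic_row` are hypothetical names):

```r
library(dplyr)

toy_events <- tibble(
  numberEvent = c(10, 11, 12),
  description = c("made shot", "rebound", "End of period")
)

# the added possession borrows the end-of-period event number, minus 0.5
synthetic_row <- tibble(
  numberEvent = 12 - 0.5,
  description = "Last possession of quarter"
)

toy_with_extra <- toy_events %>%
  bind_rows(synthetic_row) %>%
  arrange(numberEvent)

toy_with_extra
```

After sorting, the new row sits between events 11 and 12, so it falls inside the quarter without colliding with any existing event number.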

We finally have a table with the possessions in the play-by-play! There’s just one thing left to do: just like with points in the plus-minus, whenever there is a substitution between free throws, the possession should be credited to the players who were on the floor when the foul occurred. Since we are using the last free throw of a trip as the trigger for possessions, that would not be the case, as the player who subbed in would already be on the court. Therefore, we are going to create a new column where the possession is counted at the moment of the foul. First, let’s identify the occasions when the possession ended on free throws, and find the foul that originated them:

fouls_possessions <- final_poss_pack %>%
  filter(numberEventMessageType == 3 & possession == 1) %>%
  select(idGame, secsPassedGame, player_foul = namePlayer1, team_possession, numberEvent_ft = numberEvent) %>%
  left_join(final_poss_pack %>%
              filter(numberEventMessageType == 6 & !numberEventActionType %in% c(6, 9, 11, 13, 14, 15, 16, 17)) %>%  # fouls
              mutate(description = ifelse(slugTeamPlayer1 == slugTeamHome, descriptionPlayHome, descriptionPlayVisitor)) %>%
              select(idGame, secsPassedGame, player_foul = namePlayer2, numberEvent_foul = numberEvent, description)) %>%
  add_count(idGame, secsPassedGame, player_foul, name = "number_plays") %>%
  filter(!(number_plays > 1 & !str_detect(description, " S.FOUL |\\.PN\\)"))) # if the same player is fouled twice in the same second, keeps only the shooting foul

# there are occasions when namePlayer2, which is supposed to be the player who got fouled, is wrong, leading to NAs in the join. When this happens, we join without the player.
missing_comp <- fouls_possessions %>%
  filter(is.na(numberEvent_foul)) %>%
  left_join(final_poss_pack %>%
              filter(numberEventMessageType == 6 & !numberEventActionType %in% c(6, 9, 11, 13, 14, 15, 16, 17)) %>%
              mutate(description = ifelse(slugTeamPlayer1 == slugTeamHome, descriptionPlayHome, descriptionPlayVisitor)) %>%
              select(idGame, secsPassedGame, numberEvent_foul = numberEvent, description),
            by = c("idGame", "secsPassedGame"),
            suffix = c("", "_new")) %>%
  mutate(numberEvent_foul = numberEvent_foul_new,
         description = description_new) %>%
  select(-c(numberEvent_foul_new, description_new))

Now we create the column with the possession at the moment of the foul and add it to the original table:

fouls_possessions <- fouls_possessions %>%
  rows_update(missing_comp, by = c("idGame", "secsPassedGame", "player_foul", "team_possession", "numberEvent_ft", "number_plays")) %>%
  select(idGame, secsPassedGame, team_possession, numberEvent_ft, numberEvent_foul) %>%
  pivot_longer(cols = starts_with("numberEvent"),
               names_to = "type_play",
               values_to = "numberEvent",
               names_prefix = "numberEvent_") %>%
  mutate(possession_players = ifelse(type_play == "foul", 1, 0)) %>%  
  select(-type_play)

final_poss_pack <- final_poss_pack %>%
  mutate(possession_players = possession) %>%
  rows_update(fouls_possessions, by = c("idGame", "numberEvent"))

final_poss_pack %>%
  select(idGame, descriptionPlayHome, descriptionPlayVisitor, team_possession, possession, possession_players) %>%
  filter(!is.na(team_possession))
## # A tibble: 275,728 x 6
##      idGame descriptionPlayHome    descriptionPlayVi~ team_possession possession
##       <dbl> <chr>                  <chr>              <chr>                <dbl>
##  1 22100001 "Jump Ball Lopez vs. ~ ""                 MIL                      0
##  2 22100001 "MISS Allen 27' 3PT J~ ""                 MIL                      1
##  3 22100001 ""                     "Durant REBOUND (~ BKN                      0
##  4 22100001 "Antetokounmpo S.FOUL~ ""                 BKN                      0
##  5 22100001 ""                     "MISS Claxton Fre~ BKN                      0
##  6 22100001 ""                     "MISS Claxton Fre~ BKN                      1
##  7 22100001 "Antetokounmpo REBOUN~ ""                 MIL                      0
##  8 22100001 "MISS Antetokounmpo 2~ ""                 MIL                      0
##  9 22100001 "Lopez REBOUND (Off:1~ ""                 MIL                      0
## 10 22100001 "MISS Antetokounmpo 1~ ""                 MIL                      1
## # ... with 275,718 more rows, and 1 more variable: possession_players <dbl>
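To make the move from free throw to foul concrete, here is a minimal sketch on one made-up free throw trip with a substitution in the middle. The tables `toy_pbp` and `toy_trips` are hypothetical, with the foul/free throw pairing assumed to be already known.

```r
library(dplyr)
library(tidyr)

toy_pbp <- tibble(
  numberEvent = 1:4,
  description = c("S.FOUL", "FT 1 of 2", "SUB", "FT 2 of 2"),
  possession  = c(0, 0, 0, 1)
)

# one row per trip: the last free throw and the foul that caused it
toy_trips <- tibble(numberEvent_ft = 4L, numberEvent_foul = 1L)

toy_moves <- toy_trips %>%
  pivot_longer(cols = starts_with("numberEvent"),
               names_to = "type_play",
               values_to = "numberEvent",
               names_prefix = "numberEvent_") %>%
  mutate(possession_players = ifelse(type_play == "foul", 1, 0)) %>%  # 1 on the foul, 0 on the free throw
  select(-type_play)

toy_result <- toy_pbp %>%
  mutate(possession_players = possession) %>%
  rows_update(toy_moves, by = "numberEvent")

toy_result
```

The team-level `possession` column still fires on the last free throw, while `possession_players` fires on the foul, before the substitution.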

This is our final table. I recommend saving it every time you run the code to this point and using the output for play-by-play analysis. One analysis we can do is to extract lineup and player stats from it, including plus-minus, number of possessions and total time, just like we did in this post. We’ll do it by getting the stats for each lineup stint:

lineup_stats <- final_poss_pack %>%
  select(idGame, numberEvent, slugTeamHome, slugTeamAway, numberPeriod, timeQuarter, secsPassedGame, 
         newptsHome, newptsAway, lineupHome, lineupAway, possession_players, team_possession) %>%
  mutate(possession_home = ifelse(team_possession == slugTeamHome & possession_players == 1, 1, 0),
         possession_away = ifelse(team_possession == slugTeamAway & possession_players == 1, 1, 0)) %>%
  pivot_longer(cols = starts_with("lineup"),
               names_to = "lineupLocation",
               names_prefix = "lineup",
               values_to = "lineup") %>%
  mutate(ptsTeam = ifelse(lineupLocation == "Home", newptsHome, newptsAway),
         ptsOpp = ifelse(lineupLocation == "Away", newptsHome, newptsAway),
         possTeam = ifelse(lineupLocation == "Home", possession_home, possession_away),
         possOpp = ifelse(lineupLocation == "Away", possession_home, possession_away),
         slugTeam = ifelse(lineupLocation == "Home", slugTeamHome, slugTeamAway),
         slugOpp = ifelse(lineupLocation == "Away", slugTeamHome, slugTeamAway)) %>%
  distinct(idGame, slugTeam, slugOpp, numberPeriod, timeQuarter, secsPassedGame, ptsTeam, ptsOpp,
           possTeam, possOpp, lineup, teamLocation = lineupLocation, numberEvent) %>%
  arrange(idGame, numberEvent) %>%
  group_by(idGame, slugTeam) %>%
  mutate(lineupChange = lineup != lag(lineup),
         lineupChange = coalesce(lineupChange, FALSE)) %>%
  group_by(idGame, slugTeam) %>%
  mutate(lineupStint = cumsum(lineupChange)) %>%
  ungroup() %>%
  arrange(idGame, lineupStint, numberEvent) %>%
  group_by(idGame, slugTeam, lineup, lineupStint, numberPeriod) %>%
  summarise(totalPossTeam = sum(possTeam),
            totalPossOpp = sum(possOpp),
            initialScoreTeam = ptsTeam[row_number() == min(row_number())],
            initialScoreOpp = ptsOpp[row_number() == min(row_number())],
            finalScoreTeam = ptsTeam[row_number() == max(row_number())],
            finalScoreOpp =  ptsOpp[row_number() == max(row_number())],
            initialTime = secsPassedGame[row_number() == min(row_number())],
            finalTime = secsPassedGame[row_number() == max(row_number())]) %>%
  ungroup() %>%
  arrange(idGame, lineupStint) %>%
  group_by(idGame, slugTeam) %>%                              
  mutate(finalTime = ifelse(row_number() == max(row_number()), finalTime, lead(initialTime))) %>%  
  ungroup() %>%
  mutate(across(c(contains("Score")), ~ as.numeric(.), .names = "{col}")) %>%
  mutate(totalScoreTeam = finalScoreTeam - initialScoreTeam,
         totalScoreOpp = finalScoreOpp - initialScoreOpp,
         netScoreTeam = totalScoreTeam - totalScoreOpp,
         totalTime = finalTime - initialTime) %>%
  arrange(idGame, lineupStint)

lineup_stats
## # A tibble: 36,217 x 17
##      idGame slugTeam lineup  lineupStint numberPeriod totalPossTeam totalPossOpp
##       <dbl> <chr>    <chr>         <int>        <dbl>         <dbl>        <dbl>
##  1 22100001 BKN      Blake ~           0            1             8            9
##  2 22100001 MIL      Brook ~           0            1             7            6
##  3 22100001 BKN      Blake ~           1            1             5            5
##  4 22100001 MIL      Gianni~           1            1             7            7
##  5 22100001 BKN      James ~           2            1             0            0
##  6 22100001 MIL      George~           2            1             2            2
##  7 22100001 BKN      Kevin ~           3            1             0            0
##  8 22100001 MIL      George~           3            1             0            1
##  9 22100001 BKN      Jevon ~           4            1             5            4
## 10 22100001 MIL      Brook ~           4            1             2            2
## # ... with 36,207 more rows, and 10 more variables: initialScoreTeam <dbl>,
## #   initialScoreOpp <dbl>, finalScoreTeam <dbl>, finalScoreOpp <dbl>,
## #   initialTime <dbl>, finalTime <dbl>, totalScoreTeam <dbl>,
## #   totalScoreOpp <dbl>, netScoreTeam <dbl>, totalTime <dbl>
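The stint numbering in the pipeline above comes from a running count of lineup changes. A minimal sketch with made-up single-letter lineups (`toy_lineups` is hypothetical):

```r
library(dplyr)

toy_lineups <- tibble(
  numberEvent = 1:5,
  lineup = c("A", "A", "B", "B", "A")
)

toy_stints <- toy_lineups %>%
  mutate(lineupChange = coalesce(lineup != lag(lineup), FALSE),  # first row has no previous lineup
         lineupStint = cumsum(lineupChange))

toy_stints
```

Lineup "A" gets stint 0 the first time and stint 2 when it returns, which is why the summary groups by both `lineup` and `lineupStint`.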

To add a little more information, we are just going to create columns showing the number of reserves in the lineups:

lineup_stats <- lineup_stats %>%
  left_join(lineup_stats %>%
              filter(lineupStint == 0) %>%
              distinct(idGame, slugTeam, starters = lineup)) %>%
  mutate(across(c(lineup, starters), ~ str_split(., ", "), .names = "{.col}_list")) %>%
  mutate(reserves = map_int(map2(lineup_list, starters_list, setdiff), length)) %>%
  select(-c(contains("list"), starters))
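The reserve count is just the number of players in a lineup who are not in the starting five. A base R sketch of the same idea, with a hypothetical helper `count_reserves()` and made-up player names:

```r
# Count players in `lineup` that do not appear in `starters`
# (both are comma-separated strings, as in the lineup column)
count_reserves <- function(lineup, starters) {
  lineup_players  <- strsplit(lineup, ", ", fixed = TRUE)
  starter_players <- strsplit(starters, ", ", fixed = TRUE)
  mapply(function(l, s) length(setdiff(l, s)), lineup_players, starter_players)
}

count_reserves("Player A, Player B, Player C, Player F, Player G",
               "Player A, Player B, Player C, Player D, Player E")
```

Here two of the five players (F and G) are not starters, so the function returns 2.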

Now we can do a number of different analyses from this table, at the individual player, lineup or team level.

  • Lineups (season):

lineup_stats %>%
  group_by(lineup, slugTeam) %>%
  summarise(across(starts_with("total"), sum)) %>%
  ungroup() %>%
  mutate(pts_100poss = totalScoreTeam / totalPossTeam * 100,
         pts_opp_100poss = totalScoreOpp / totalPossOpp * 100,
         net_100poss = pts_100poss - pts_opp_100poss) %>%
  filter(totalTime >= 100 * 60) %>% # minimum 100 minutes
  arrange(-net_100poss)
## # A tibble: 47 x 10
##    lineup       slugTeam totalPossTeam totalPossOpp totalScoreTeam totalScoreOpp
##    <chr>        <chr>            <dbl>        <dbl>          <dbl>         <dbl>
##  1 Bogdan Bogd~ ATL                289          286            383           285
##  2 Anthony Edw~ MIN                372          367            485           360
##  3 Joe Ingles,~ UTA                413          410            499           415
##  4 Al Horford,~ BOS                287          290            323           272
##  5 Eric Pascha~ UTA                230          226            266           225
##  6 Darius Garl~ CLE                293          288            341           291
##  7 Bojan Bogda~ UTA                950          947           1182          1053
##  8 Caris LeVer~ IND                293          288            364           324
##  9 Danny Green~ PHI                442          431            505           443
## 10 Fred VanVle~ TOR                299          303            340           310
## # ... with 37 more rows, and 4 more variables: totalTime <dbl>,
## #   pts_100poss <dbl>, pts_opp_100poss <dbl>, net_100poss <dbl>

  • Team (by game):

lineup_stats %>%
  group_by(idGame, slugTeam) %>%
  summarise(across(starts_with("total"), sum)) %>%
  ungroup() %>%
  mutate(pts_100poss = totalScoreTeam / totalPossTeam * 100,
         pts_opp_100poss = totalScoreOpp / totalPossOpp * 100,
         net_100poss = pts_100poss - pts_opp_100poss) %>%
  arrange(-pts_100poss)
## # A tibble: 1,290 x 10
##      idGame slugTeam totalPossTeam totalPossOpp totalScoreTeam totalScoreOpp
##       <dbl> <chr>            <dbl>        <dbl>          <dbl>         <dbl>
##  1 22100330 MEM                 98           98            152            79
##  2 22100347 BOS                 96           97            145           117
##  3 22100325 DAL                 95           93            139           107
##  4 22100637 GSW                 95           95            138            96
##  5 22100377 SAC                 99          100            142           130
##  6 22100245 MIN                 97           96            138            95
##  7 22100538 LAL                 98           99            139           106
##  8 22100459 CHI                 94           93            133           118
##  9 22100410 DAL                 85           84            120            96
## 10 22100631 DEN                100           99            140           108
## # ... with 1,280 more rows, and 4 more variables: totalTime <dbl>,
## #   pts_100poss <dbl>, pts_opp_100poss <dbl>, net_100poss <dbl>
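As a quick check of the per-100-possession arithmetic, here is the calculation for the first row of the team-by-game output above (MEM, game 22100330). The helper name `per_100()` is hypothetical.

```r
# points per 100 possessions
per_100 <- function(points, possessions) points / possessions * 100

pts_100poss     <- per_100(152, 98)   # MEM scored 152 on 98 possessions
pts_opp_100poss <- per_100(79, 98)    # and allowed 79 on 98 possessions
net_100poss     <- pts_100poss - pts_opp_100poss

round(c(pts_100poss, pts_opp_100poss, net_100poss), 1)
# 155.1  80.6  74.5
```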

In the next post, I will write some thoughts and ideas on possession counts. Thanks for reading!
