Analyzing NBA lineup stats

This is part 3 in our series of posts on NBA lineup data. In the last post, we extracted lineup stats from the play-by-play. Now we are going to use those outputs to reproduce some numbers from published articles. You can find the play-by-play table with lineups added here and the lineup stats table here. Let's load both of them:

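library(tidyverse)
library(glue)

# play-by-play table with lineups added (output of the previous post)
lineup_game <- read_csv("LineupGame0107.csv",
                        col_types = cols(timeQuarter = "c"))

# lineup stats table computed from that play-by-play
lineup_stats <- read_csv("https://raw.githubusercontent.com/ramirobentes/NBAblog/master/LineupStatsNBA.csv")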

As a first step, we will find how lineups perform with a specific group of players on the floor and/or another group off it. Last week, Zach Lowe published a column with the 13 most fascinating lineups to watch in Orlando. We will use it to show how to reproduce some of the stats it cites.

Let’s start with a basic example:

We want to find how a lineup performs when three players (Embiid, Horford and Simmons) are on the floor at the same time. The simplest solution would be something like this:

lineup_stats %>%
  filter(str_detect(lineup, "Ben Simmons") &
           str_detect(lineup, "Joel Embiid") &
           str_detect(lineup, "Al Horford")) %>%
  summarise(plusMinus = sum(netScoreTeam))
## # A tibble: 1 x 1
##   plusMinus
##       <dbl>
## 1        -9

But let’s look at this other example:

Now we have to account for a player on the floor (Giannis Antetokounmpo) and players off the floor (every big man on the Bucks roster: the Lopez brothers, Ilyasova and D.J. Wilson). Sticking to our previous approach, we would write something like this:

lineup_stats %>%
  filter(str_detect(lineup, "Giannis Antetokounmpo") &
           !str_detect(lineup, "Lopez|Ilyasova|D.J. Wilson")) %>%
  summarise(plusMinus = sum(netScoreTeam),
            totalTime = sum(totalTime)) %>%
  # seconds -> "minutes:seconds"; round first so the seconds part never reads "60"
  mutate(totalTime = paste0(round(totalTime) %/% 60, ":", str_pad(round(totalTime) %% 60, side = "left", width = 2, pad = "0")))
## # A tibble: 1 x 2
##   plusMinus totalTime
##       <dbl> <chr>    
## 1       140 259:47
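The expression that turns totalTime (in seconds) into a minutes:seconds string is repeated in almost every block of this post. A small helper would remove that repetition. The sketch below is hypothetical: format_minutes() is not part of any package and is not the post's own code. It uses only base R and rounds to whole seconds before splitting, so a value such as 59.6 seconds is shown as 1:00 instead of the 0:60 that rounding only the remainder would give.

```r
# Hypothetical helper (not from the original post): format seconds as "M:SS"
format_minutes <- function(seconds) {
  secs <- round(seconds)             # round first so the seconds part stays within 0-59
  sprintf("%d:%02d",
          as.integer(secs %/% 60),   # whole minutes
          as.integer(secs %% 60))    # remaining seconds, zero-padded to two digits
}

format_minutes(c(15587, 59.6, 125))  # returns "259:47", "1:00" and "2:05"
```

Because sprintf() is vectorized, the helper could be dropped into any of the pipelines here as mutate(totalTime = format_minutes(totalTime)).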

However, we want our code to be as scalable as possible: a single solution should work in different situations, no matter how many players are supposed to be on or off the court. Instead of adding a new line to our filter for every player added to the analysis, we will build a pattern that handles any number of them.

First, I want to introduce a small trick I learned while doing this. I always thought a str_detect() pattern could only combine several elements when we wanted to detect element X OR element Y, just as we did above with Lopez, Ilyasova or D.J. Wilson. It turns out we can also detect element X AND element Y, using regex lookaheads. Here's how, using another example from Lowe's article:

lineup_stats %>%
  # each (?=.*Name) lookahead requires that name somewhere in the lineup string
  filter(str_detect(lineup, "(?=.*Seth Curry)(?=.*Porzingis)(?=.*Doncic)(?=.*Finney-Smith)(?=.*Hardaway Jr.)")) %>%
  summarise(plusMinus = sum(netScoreTeam),
            totalTime = sum(totalTime)) %>%
  mutate(totalTime = paste0(round(totalTime) %/% 60, ":", str_pad(round(totalTime) %% 60, side = "left", width = 2, pad = "0")))
## # A tibble: 1 x 2
##   plusMinus totalTime
##       <dbl> <chr>    
## 1        30 122:22
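Why does the AND version work? Each (?=.*Name) is a lookahead: it checks that Name appears somewhere ahead of the current position without consuming any characters, so all the lookaheads are tested from the same starting point and the whole pattern matches only when every name is present, in any order. The lineup strings below are made up for illustration. The sketch uses base R's grepl() with perl = TRUE, because base R's default regex engine does not support lookaheads, while the ICU engine behind stringr does.

```r
# Made-up lineup strings, for illustration only
lineups <- c(
  "Al Horford, Ben Simmons, Joel Embiid, Josh Richardson, Tobias Harris",
  "Ben Simmons, Furkan Korkmaz, Joel Embiid, Josh Richardson, Tobias Harris",
  "Al Horford, Furkan Korkmaz, Matisse Thybulle, Mike Scott, Raul Neto"
)

# OR: at least one of the names appears
or_pattern <- "Simmons|Embiid|Horford"
# AND: one zero-width lookahead per name, all checked from the same position
and_pattern <- "(?=.*Simmons)(?=.*Embiid)(?=.*Horford)"

grepl(or_pattern, lineups)                # TRUE TRUE TRUE
grepl(and_pattern, lineups, perl = TRUE)  # TRUE FALSE FALSE
```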

This makes our code easier to scale, because we no longer need a separate str_detect() for every element we want to consider. Writing a lookahead by hand for each player is still a bit repetitive, though, so let's make it even easier by building the pattern from a vector of names:

players_wanted <- c("Seth Curry", "Porzingis", "Doncic", "Finney-Smith", "Hardaway Jr.")
paste(map_chr(players_wanted, ~ glue("(?=.*{.x})")), collapse = "")
## [1] "(?=.*Seth Curry)(?=.*Porzingis)(?=.*Doncic)(?=.*Finney-Smith)(?=.*Hardaway Jr.)"
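Since paste0() is vectorized, the same pattern can also be built without map_chr() and glue(): paste0() wraps each name in a lookahead and collapse joins the pieces into one string. This is an alternative sketch, not the approach used in the rest of the post. Note that the dot in "Hardaway Jr." is a regex wildcard; it still matches the literal dot, so it is harmless here, although it would also match other characters.

```r
players_wanted <- c("Seth Curry", "Porzingis", "Doncic", "Finney-Smith", "Hardaway Jr.")

# wrap every name in a lookahead, then join the pieces into a single pattern
and_pattern <- paste0("(?=.*", players_wanted, ")", collapse = "")

# order inside the lineup string does not matter (made-up example string)
grepl(and_pattern,
      "Dorian Finney-Smith, Kristaps Porzingis, Luka Doncic, Seth Curry, Tim Hardaway Jr.",
      perl = TRUE)  # TRUE
```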

Now all we need to provide is a vector with the names of the players we want. The next step is to wrap everything in a function. Since we want to account for players both on and off the floor, it takes two arguments, with and without:

function_players <- function(with = NULL, without = NULL){
  lineup_stats %>%
    # keep stints with ALL `with` players on the floor (one lookahead per player)
    filter(if (!is.null(with)) str_detect(lineup, paste(map_chr(with, ~ glue("(?=.*{.x})")), collapse = "")) else TRUE) %>%
    # drop stints with ANY `without` player on the floor
    filter(if (!is.null(without)) !str_detect(lineup, paste(without, collapse = "|")) else TRUE) %>%
    summarise(games = n_distinct(idGame), 
              totalTime = sum(totalTime),
              plusMinus = sum(netScoreTeam)) %>%
    # seconds -> "minutes:seconds", rounding first so the seconds never read "60"
    mutate(totalTime = paste0(round(totalTime) %/% 60, ":", str_pad(round(totalTime) %% 60, side = "left", width = 2, pad = "0")),
           with_plr = paste(with, collapse = ", "),
           without_plr = paste(without, collapse = ", ")) %>%
    select(with_plr, without_plr, everything())
}

The function applies each filter only when the corresponding argument is supplied; when it is NULL, the condition is simply TRUE and every row is kept. So if we only want the stats for a group of players on the floor, we can leave out the second argument (players off the floor), and vice versa (passing it by name, without = ...). Let's test it:

function_players("Zion Williamson", c("Jaxson Hayes", "Derrick Favors", "Jahlil Okafor", "Nicolo Melli"))
## # A tibble: 1 x 5
##   with_plr        without_plr                          games totalTime plusMinus
##   <chr>           <chr>                                <int> <chr>         <dbl>
## 1 Zion Williamson Jaxson Hayes, Derrick Favors, Jahli~    11 48:10            19
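The `if (...) ... else TRUE` construct works because filter() keeps rows where the condition is TRUE, and a single TRUE is recycled to every row, so an omitted argument keeps everything. The same idea can be sketched in base R on a plain character vector: start from an all-TRUE logical vector and narrow it only for the names that were supplied. filter_lineups() below is a hypothetical helper, not the post's function, and it swaps in a different matching technique: one fixed-string grepl() per name instead of a single regex, which also sidesteps metacharacters such as the dots in "D.J. Wilson". The lineup strings are made up.

```r
# Hypothetical helper: keep lineup strings that contain every name in `with`
# and none of the names in `without`; a NULL argument adds no condition.
filter_lineups <- function(lineups, with = NULL, without = NULL) {
  keep <- rep(TRUE, length(lineups))                   # start by keeping every lineup
  for (name in with) {                                 # loop body never runs for NULL
    keep <- keep & grepl(name, lineups, fixed = TRUE)  # must be on the floor
  }
  for (name in without) {
    keep <- keep & !grepl(name, lineups, fixed = TRUE) # must be off the floor
  }
  lineups[keep]
}

# Made-up Bucks lineups, for illustration only
bucks <- c(
  "Brook Lopez, Eric Bledsoe, Giannis Antetokounmpo, Khris Middleton, Wesley Matthews",
  "Donte DiVincenzo, George Hill, Giannis Antetokounmpo, Khris Middleton, Pat Connaughton",
  "Ersan Ilyasova, Eric Bledsoe, Khris Middleton, Kyle Korver, Robin Lopez"
)

filter_lineups(bucks, with = "Giannis Antetokounmpo",
               without = c("Lopez", "Ilyasova", "D.J. Wilson"))  # second lineup only
filter_lineups(bucks)                                           # all three, no filters
```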

We can also break the results down by every lineup that fits the criteria:

function_players_group <- function(with = NULL, without = NULL){
  lineup_stats %>%
    filter(if (!is.null(with)) str_detect(lineup, paste(map_chr(with, ~ glue("(?=.*{.x})")), collapse = "")) else TRUE) %>%
    filter(if (!is.null(without)) !str_detect(lineup, paste(without, collapse = "|")) else TRUE) %>%
    # same filters as function_players(), but summarised per lineup
    group_by(lineup) %>%
    summarise(games = n_distinct(idGame), 
              totalTime = sum(totalTime),
              plusMinus = sum(netScoreTeam)) %>%
    ungroup() %>%
    # sort while totalTime is still numeric, then format it as minutes:seconds
    arrange(-totalTime) %>%
    mutate(totalTime = paste0(round(totalTime) %/% 60, ":", str_pad(round(totalTime) %% 60, side = "left", width = 2, pad = "0")))
}

function_players_group("Bam Adebayo", c("Kelly Olynyk", "Meyers Leonard", "Chris Silva", "Udonis Haslem", "James Johnson"))
## # A tibble: 87 x 4
##    lineup                                              games totalTime plusMinus
##    <chr>                                               <int> <chr>         <dbl>
##  1 Bam Adebayo, Derrick Jones Jr., Duncan Robinson, J~    24 161:59           37
##  2 Bam Adebayo, Derrick Jones Jr., Goran Dragic, Jimm~    20 49:53            -9
##  3 Bam Adebayo, Derrick Jones Jr., Jimmy Butler, Kend~    11 43:03            -2
##  4 Bam Adebayo, Derrick Jones Jr., Goran Dragic, Jimm~    10 38:36             0
##  5 Bam Adebayo, Derrick Jones Jr., Duncan Robinson, J~    11 35:54            20
##  6 Bam Adebayo, Duncan Robinson, Goran Dragic, Jae Cr~     8 27:32            15
##  7 Bam Adebayo, Derrick Jones Jr., Duncan Robinson, G~    14 27:05            18
##  8 Andre Iguodala, Bam Adebayo, Duncan Robinson, Gora~    12 25:33             7
##  9 Bam Adebayo, Duncan Robinson, Jae Crowder, Jimmy B~     7 23:47             6
## 10 Bam Adebayo, Derrick Jones Jr., Goran Dragic, Kend~     7 23:22            12
## # ... with 77 more rows

Finally, let’s find every combination of n players in a lineup. What pair of players has played the most minutes together?

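lineup_combinations <- lineup_stats %>%
  select(idGame, slugTeam, lineup, lineupStint, netScoreTeam, totalTime) %>%
  # one row per player in each stint
  separate_rows(lineup, sep = ", ") %>%
  group_by(idGame, slugTeam, lineupStint, netScoreTeam, totalTime) %>%
  # one row per pair of players in the stint; dplyr >= 1.1.0 deprecates returning
  # several rows per group from summarise() and recommends reframe() instead
  summarise(combinations = combn(lineup, m = 2, simplify = FALSE)) %>%
  ungroup() %>%
  # sort each pair so the same two players always produce the same string
  mutate(combinations = map_chr(combinations, ~ paste(sort(.), collapse = ", ")))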

lineup_combinations
## # A tibble: 539,240 x 6
##      idGame slugTeam lineupStint netScoreTeam totalTime combinations            
##       <dbl> <chr>          <dbl>        <dbl>     <dbl> <chr>                   
##  1 21900001 NOP                0            1       434 Brandon Ingram, Derrick~
##  2 21900001 NOP                0            1       434 Brandon Ingram, JJ Redi~
##  3 21900001 NOP                0            1       434 Brandon Ingram, Jrue Ho~
##  4 21900001 NOP                0            1       434 Brandon Ingram, Lonzo B~
##  5 21900001 NOP                0            1       434 Derrick Favors, JJ Redi~
##  6 21900001 NOP                0            1       434 Derrick Favors, Jrue Ho~
##  7 21900001 NOP                0            1       434 Derrick Favors, Lonzo B~
##  8 21900001 NOP                0            1       434 JJ Redick, Jrue Holiday 
##  9 21900001 NOP                0            1       434 JJ Redick, Lonzo Ball   
## 10 21900001 NOP                0            1       434 Jrue Holiday, Lonzo Ball
## # ... with 539,230 more rows
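Each five-man stint expands into choose(5, 2) = 10 pairs, which is why the first ten rows above all belong to the same New Orleans stint. The base-R sketch below reproduces those ten pairs for that lineup. Sorting inside each pair gives every duo a single spelling, so the same two players always end up in the same group whatever order they were listed in.

```r
lineup <- c("Brandon Ingram", "Derrick Favors", "JJ Redick", "Jrue Holiday", "Lonzo Ball")

pairs <- combn(lineup, m = 2, simplify = FALSE)   # list of two-player character vectors
length(pairs) == choose(5, 2)                     # TRUE: 10 pairs per five-man lineup

# sort each pair so "A, B" and "B, A" become the same key
pair_keys <- vapply(pairs, function(p) paste(sort(p), collapse = ", "), character(1))
pair_keys[1]   # "Brandon Ingram, Derrick Favors"
```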

Still from Lowe’s column:

lineup_combinations %>%
  group_by(combinations) %>%
  summarise(totalTime = sum(totalTime),
            plusMinus = sum(netScoreTeam)) %>%
  ungroup() %>%
  arrange(-totalTime) %>%
  mutate(totalTime = paste0(round(totalTime) %/% 60, ":", str_pad(round(totalTime) %% 60, side = "left", width = 2, pad = "0")))
## # A tibble: 4,347 x 3
##    combinations                     totalTime plusMinus
##    <chr>                            <chr>         <dbl>
##  1 Bojan Bogdanovic, Rudy Gobert    1910:57         303
##  2 James Harden, P.J. Tucker        1834:12         149
##  3 Tomas Satoransky, Zach LaVine    1610:08         -19
##  4 Donovan Mitchell, Rudy Gobert    1606:48         185
##  5 Cedi Osman, Collin Sexton        1603:49        -315
##  6 Damian Lillard, Hassan Whiteside 1601:43         113
##  7 Al Horford, Tobias Harris        1594:46         144
##  8 Royce O'Neale, Rudy Gobert       1590:18         277
##  9 Bojan Bogdanovic, Royce O'Neale  1580:27         259
## 10 Ben Simmons, Tobias Harris       1577:10          90
## # ... with 4,337 more rows

To find three- and four-man combinations, we just need to change the m argument of combn().
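To make the combination size a parameter, the pair step can be wrapped in a function of m. combo_keys() below is a hypothetical base-R helper that works on a single lineup; it is not part of the post's pipeline. Sorting the names before calling combn() keeps every combination in alphabetical order, because combn() preserves the order of its input. A five-man lineup yields choose(5, 3) = 10 three-man and choose(5, 4) = 5 four-man combinations.

```r
# Hypothetical helper: all m-player combinations of one lineup, as "A, B, C" keys
combo_keys <- function(lineup, m) {
  vapply(combn(sort(lineup), m, simplify = FALSE),
         paste, character(1), collapse = ", ")
}

lineup <- c("Brandon Ingram", "Derrick Favors", "JJ Redick", "Jrue Holiday", "Lonzo Ball")
combo_keys(lineup, m = 3)   # 10 three-man combinations
combo_keys(lineup, m = 4)   # 5 four-man combinations
```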

I have been asked on Twitter about adding offensive and defensive efficiency to these stats. However, efficiency depends on possessions, and reproducing the official possession count used by NBA.com from the play-by-play data is a little complicated; I haven't been able to replicate it yet, so any advice is welcome! In the next post, we will look at other stats when players are on or off the court. Thanks for reading!
