Quantifying the Individual: How to Partition the Signal of Statistics

NBA: Milwaukee Bucks at Los Angeles Clippers Jayne Kamin-Oncea-USA TODAY Sports

Statistics have been on my mind. Mitchell’s write-up of Seth Partnow’s book discussed how Brook’s lackluster rebounding totals obscure the fact that his box-outs clear space for others to grab rebounds instead. Adam’s wrap-up last week aimed to determine when the Bucks’ Big Three is “on” by examining whether they exceeded their season averages in points and assists in games they played together. I listened to Timothée Chalamet’s evergreen rap video from his statistics class in high school.

In mulling statistics, I returned to a question that surfaced in comments on my piece last week, which tried to analyze how Giannis has revolutionized the game of basketball. Basketball is increasingly a game of stars, and individual statistics - points scored, championships won, etc. - are its currency. Yet it is difficult to evaluate a team sport solely through the lens of individuals. Thus, what follows is my attempt to make sense of statistics: their components, their drawbacks, and ultimately what they should be.

Let’s start with the basics: what is a statistic? It is a quantity calculated from a sample, typically used to estimate a parameter of the population from which that sample was drawn. In basketball terms, we assume a population of plays, minutes, games, seasons, etc.; select a subset of those instances; and then calculate a metric of interest. When Giannis scores 30 points in a game (sample), that is a statistic that estimates his scoring output over the course of the season (population).

The law of large numbers dictates that the bigger the sample, the closer its average tends to get to the actual population parameter. For the same reason that flipping a coin 10 times may net eight heads, but flipping a coin 100 times will almost certainly not yield 80 heads, Giannis’ scoring average over 10 games will likely be closer to his season average than his tally on a given night. The fact that statistics fluctuate is both fundamental and important. As the season continues and the sample grows, the statistic approaches the true population parameter. The noise of individual games slowly recedes to reveal a signal.
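For the code-inclined, the coin-flip intuition can be checked with a quick simulation. This is only a minimal sketch: the function name `typical_miss`, the number of trials, and the seed are my own illustrative choices, and a fair coin stands in for a player's "true" rate.

```python
import random

def typical_miss(num_flips, trials=2000, seed=0):
    """Average gap between the observed share of heads and the true 50%."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total_gap = 0.0
    for _ in range(trials):
        # Flip a fair coin num_flips times and count the heads.
        heads = sum(rng.random() < 0.5 for _ in range(num_flips))
        total_gap += abs(heads / num_flips - 0.5)
    return total_gap / trials

# Print the typical miss for small, medium, and large samples.
for num_flips in (10, 100, 1000):
    print(num_flips, round(typical_miss(num_flips), 3))
```

The typical miss shrinks as the number of flips grows, which is the same reason a 10-game scoring average is a steadier estimate than a single night.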

It is worthwhile, however, to more deeply partition the noise. An obvious source of noise is the opponent. The Bucks play different teams over the course of a season, and these teams vary in general quality as well as their ability to match up with particular players. Against good teams with good personnel (e.g., the Suns), individual statistics typically drop; against bad teams with bad personnel (e.g., the Lakers), individual statistics typically soar. A larger sample size can account for this variability by including a bigger swath of opponents that represent the overall schedule.

Another source of noise is one’s teammates. On a given night, one’s teammates will play better or worse. This can cut both ways. A better game from one’s teammates may be a rising tide that lifts all boats, providing more opportunities to pad one’s individual statistics. Alternatively, a better game from one’s teammates may reduce the need for the player to pile up individual statistics. (I’ll note that this parallels debates about immigration without going into detail.) Regardless, fluctuation in the performance of teammates should also even out in expectation over the course of a season.

The last source of noise is oneself. Independent of the opponent and one’s teammates, it may be a particularly good or bad day at the office. But, as above, this source of variation also evens out in the long run.

This leaves the signal. Based on the sources of noise discussed above, the signal should be the value of a player, above and beyond variation in opponent, teammates, and even the player themselves. As approximations of that signal, statistics represent an estimate of that value.
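One way to picture this partition is a toy model in which each game's point total is a fixed signal plus three independent noise terms. The sketch below is purely hypothetical: the 28-point "true value" and the spread of each noise source are made-up numbers, and real noise sources are unlikely to be independent or normally distributed.

```python
import random

def simulate_season(true_value, num_games=82, seed=1):
    """Per-game point totals: a fixed signal plus three made-up noise terms."""
    rng = random.Random(seed)
    games = []
    for _ in range(num_games):
        opponent = rng.gauss(0, 4)    # hypothetical: tougher or softer defense that night
        teammates = rng.gauss(0, 3)   # hypothetical: teammates running hot or cold
        self_noise = rng.gauss(0, 5)  # hypothetical: the player's own good or bad day
        games.append(true_value + opponent + teammates + self_noise)
    return games

def average(values):
    return sum(values) / len(values)

season = simulate_season(true_value=28.0)
# One game, a 10-game stretch, and the full season, in that order.
print(round(season[0], 1), round(average(season[:10]), 1), round(average(season), 1))
```

Because every noise term is centered on zero in this toy model, the season average drifts toward the assumed signal as games accumulate.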

However, this is not the case. Individual statistics can be critiqued for myriad reasons. The statistics that commonly inhabit a box score were not necessarily chosen for their importance; they are simply readily observable. A variety of modern analytic techniques aim to ameliorate their flaws, although often in mechanistically unclear ways due to the use of proprietary black box algorithms. My focus here is not to beat a dead horse. Instead, I want to focus on purportedly individual statistics that do not actually index the individual per se.

The most egregious example is the assist. An assist is recorded when a player passes the ball to another player who makes a shot. The most challenging hurdle to recording an assist is often the latter part of the equation: the ability of a teammate to score. The assisting player can help by putting their teammate in a better position (a point that I will revisit below), but this individual statistic ultimately rests on someone other than the individual.

As a case in point, Old Friend Scott Skiles - do we say Old Friend for former coaches? - holds the record for the most assists in an NBA game. He piled up 30 assists - bully for him. Meanwhile, his Orlando Magic teammates piled in 133 points on 57% shooting. Their season averages? 106 points and 46% shooting. With all due respect, Skiles’ teammates earned him that record.

Let’s turn to rebounds. A rebound is recorded when a player retains possession of a ball following a missed shot. This is almost a mirror image of an assist: instead of the player starting with the ball and handing it off for a teammate to score, a teammate (or opponent) does not score and the player ends up with the ball. Here, the individual statistic again misses the mark by resting on someone other than the individual, only this time it requires a miss rather than a make.

In a similar manner, let’s look at Charles Oakley, who holds the record for the most rebounds in an NBA game. He snatched 35 rebounds for the Chicago Bulls in a loss to the Cleveland Cavaliers. The Cavs were close to their season average, shooting 51% that night compared to their usual 49%. But his Bulls teammates shot an abysmal 40%, well below their 49% season average. This fueled his 16 offensive rebounds, only two shy of the single-game NBA record. (As a side note, did y’all know that Zaza captured said record as a Buck? Maybe I need to get back to trivia...) With all due credit to Oakley, he owes a thank you to MJ and his teammates for losing the game due to their poor shooting performance.

These individual statistics stand out, but I think that other, more indirect arguments could be made for points, turnovers, steals, blocks, and even championships. The exemplars of Skiles and Oakley represent extreme cases that average out in the long run, but they show that the signal of individual statistics is fundamentally pinned to factors besides the individual. This supports our intuition that it is harder to gauge individual statistics on bad teams. Bad teams likely hit fewer shots, deflating assist totals and inflating rebound totals. Unlucky teams that consistently catch opponents on hot shooting nights (cough cough the Bucks) will rebound less. Consistent trends do not average out with larger samples over time; rather, they are baked into our evaluations of individuals.
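The difference between noise that averages out and a trend that gets baked in can be sketched with a second toy simulation. Everything here is an assumption for illustration: the same passer sets up 15 shots a game, and the only thing that changes is whether teammates convert 50% or 42% of them.

```python
import random

def average_assists(teammate_fg_pct, passes_per_game=15, num_games=2000, seed=2):
    """Long-run assists per game for one passer, given how well teammates shoot."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_games):
        # Each set-up pass becomes an assist only if the teammate makes the shot.
        total += sum(rng.random() < teammate_fg_pct for _ in range(passes_per_game))
    return total / num_games

good_team = average_assists(0.50)  # hypothetical good-shooting teammates
bad_team = average_assists(0.42)   # hypothetical poor-shooting teammates
print(round(good_team, 2), round(bad_team, 2))
```

Even over 2,000 simulated games, the gap between the two averages does not close. A bigger sample removes night-to-night luck, but it cannot remove a consistent difference in who is catching the passes.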

To me, the heart of the matter is ontological: what should individual statistics be? It would be great if we could quantify the impact of an individual that is wholly removed from their opponents, teammates, and daily fluctuations. To be sure, a lot of advanced statistics attempt to do as much! But I think it is more interesting to consider whether that should be our goal in the first place.

In my view, much of sport analytics can be traced back to baseball. When a pitcher is facing a batter, it makes sense to think in terms of individual statistics. (Although certain statistics, like RBIs, still rely on the contributions of teammates.) Applying this individualistic paradigm to team sports like basketball and soccer, however, paves over interdependencies between players. In almost all situations, a batter should try to hit the ball; likewise, a bowler should try to knock down all of the pins, for any Jim Gaffigan fans out there. The same cannot be said in basketball. On a drive, should Giannis score or kick it out to a teammate? It depends.

At issue here is the signal, the true value of the player. In baseball, it is relatively linear: more hits = good. In basketball, it is more complex: more points = ?. Giannis should not score every time he drives. When he instead dishes it to Pat, Bobby, or gRay, his statistics become contingent on their shot-making prowess. Yet, this contingency is not complete. Giannis can make super-human wrap-around passes that others do not attempt, and he has the chemistry to find his teammates in their ideal spots. This leaves his individual statistics to be a finicky bouillabaisse of his and others’ efforts. But maybe, in a team sport like basketball, that’s the point.

My hunch is that individual statistics should indeed be agnostic of opponent, but try to find a balance between the individual’s abilities and their capacity to make their teammates better. This may require some fine-tuning of existing statistics, or already be represented in statistics that I am too ignorant to have found or understood. At the very least, I think that it is important to read box scores with a more interdependent mindset. Giannis had a lot of rebounds - how badly were the opponents shooting? Jrue had a lot of assists - how well were his fellow Bucks shooting? The goal is not to delegitimize the individual, but to place their contributions within the broader context of the team.

The key with most statistics is to be aware of their drawbacks. Individual statistics in basketball implicitly assume that they are capturing the signal of the individual player and averaging out the noise - without accounting for interdependencies that are part of the signal itself. In general, the assignment of numbers to any sort of entity runs the risk of overlooking contextual factors that contribute to those entities - a lesson that is as true in basketball as it is elsewhere.

Author’s note: Will trivia ever return? Probably.