Unexpected Findings of Possession Efficiency

I have spent the last few weeks chasing what I thought was a very elegant relationship between the number of possessions in a game and its result. I’m hesitant to call it a failure, but I have certainly left this exercise with many more questions than answers.

Essentially, my argument was that as the number of possessions in a game decreases, the likelihood of an upset increases. This is simply an extension of the fact that a weaker team is more likely to win against a stronger team in a single-game series than in a multiple-game series.
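To make the premise concrete, here is a minimal sketch under a strong simplifying assumption: every possession is an independent trial with a fixed chance of producing a goal, and both teams get the same number of possessions. The function names and the scoring rates (one goal per 180 possessions versus one per 100) are hypothetical, chosen only to resemble the ranges in the tables in this post. Because low-possession games are mostly draws, the sketch reports the weaker team's share of decisive (non-drawn) games rather than its raw win probability.

```python
from math import comb


def goal_pmf(n, p):
    """Binomial probabilities of scoring 0..n goals from n possessions."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def upset_share(n, p_weak, p_strong):
    """Share of decisive (non-drawn) games won by the weaker team."""
    weak = goal_pmf(n, p_weak)
    strong = goal_pmf(n, p_strong)
    # A team wins when it scores i goals and the opponent scores fewer than i.
    weak_wins = sum(weak[i] * sum(strong[:i]) for i in range(1, n + 1))
    strong_wins = sum(strong[i] * sum(weak[:i]) for i in range(1, n + 1))
    return weak_wins / (weak_wins + strong_wins)


# Hypothetical per-possession scoring rates, not fitted to any data.
P_WEAK, P_STRONG = 1 / 180, 1 / 100

for possessions in (50, 100, 200, 400):
    print(possessions, round(upset_share(possessions, P_WEAK, P_STRONG), 3))
```

Under these assumptions the weaker team's share shrinks as the possession count grows, which is the intuition behind the original argument.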

Possessions per game seems to be normally distributed, but my first hint that I was wrong came from the fact that goals per game was actually negatively correlated with possessions per game (albeit weakly). I expected at least a positive correlation, perhaps even a strong one.

Then, I ranked each EPL team on the number of possessions per goal and also looked at the standard deviation of game-over-game possession efficiencies.

Team Poss. Per Goal Std. Dev.
Manchester United 98.5769 64.07804134
Chelsea 110.3333 65.59348760
Arsenal 111.9583 63.99903954
Manchester City 127.9000 70.44894066
West Bromwich Albion 134.6786 63.68292183
Newcastle United 139.0179 74.94802860
Liverpool 139.2373 67.37557058
Tottenham Hotspur 144.8545 59.69067099
Blackpool 147.8364 65.99140546
Everton 161.2353 56.45666225
Aston Villa 170.8333 61.44454270
Bolton Wanderers 172.5962 73.02691820
Fulham 173.7959 65.40885368
Stoke City 178.9783 66.84436451
Wolverhampton Wanderers 182.8478 66.98477993
Blackburn Rovers 183.4565 64.24138325
West Ham United 183.8837 61.57649119
Sunderland 189.3111 66.84921414
Wigan Athletic 200.7500 65.84863113
Birmingham City 217.8919 53.02540996

While possession efficiency, as expected, correlates strongly with league position, I was concerned by the large standard deviations. Improving scoring efficiency over a season will obviously help a team in the long run, but with this much game-to-game variation it seems unlikely that the improvement would be visible on a game-by-game basis.

Here is a list of teams sorted by their average number of possessions per game and the associated z-score. The results here are interesting.

The league-wide average number of possessions per game was 213.4105, and the standard deviation was 22.1311.

Team Possessions Per Game Z Score
Bolton Wanderers 236.1842 1.02903654
Sunderland 224.1842 0.48681315
Fulham 224.1053 0.48324589
Blackburn Rovers 222.0789 0.39168624
Wolverhampton Wanderers 221.3421 0.35839182
Stoke City 216.6579 0.14673445
Everton 216.3947 0.13484358
Liverpool 216.1842 0.12533089
Aston Villa 215.7895 0.10749460
Blackpool 213.9737 0.02544764
Birmingham City 212.1579 -0.05659932
Arsenal 212.1316 -0.05778841
Wigan Athletic 211.3158 -0.09465009
Tottenham Hotspur 209.6579 -0.16956253
West Ham United 208.0789 -0.24090771
Newcastle United 204.8684 -0.38597625
Manchester United 202.3421 -0.50012854
Manchester City 201.9474 -0.51796484
Chelsea 200.3421 -0.59049910
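For anyone who wants to reproduce the z-scores in this table, the calculation is a one-liner. This sketch uses the league mean and standard deviation quoted above; the function and constant names are my own.

```python
LEAGUE_MEAN = 213.4105  # league-wide possessions per game, from the post
LEAGUE_STD = 22.1311    # league-wide standard deviation, from the post


def z_score(value, mean=LEAGUE_MEAN, std=LEAGUE_STD):
    """Number of standard deviations a value sits above or below the mean."""
    return (value - mean) / std


# Three rows taken from the table above.
possessions_per_game = {
    "Bolton Wanderers": 236.1842,
    "Blackpool": 213.9737,
    "Chelsea": 200.3421,
}

for team, ppg in possessions_per_game.items():
    print(f"{team}: {z_score(ppg):+.4f}")
```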

The outlier is Bolton Wanderers, whose z-score is by far the largest, and that is pretty interesting. If I saw a prototypical team such as Manchester United or Stoke in this situation, I wouldn’t be surprised. What’s up with Bolton?

Also, given the original premise that weaker teams want to decrease the rate of possessions (and conversely for strong teams), why do we see Manchester United, Manchester City and Chelsea occupying 3 of the lowest 4 positions in average possessions per game? Shouldn’t they be the teams that would benefit the most from increasing the rate of play?

The first thought that comes to mind is that stronger teams find themselves in winning situations more often, and therefore actually can benefit more from slowing the game down.

By looking at the total number of possessions that a team had in losing situations, divided by the number of goals scored from those losing situations, we can get a picture of scoring efficiency when we expect it to be in the team’s best interest to push the rate of play. More importantly, we can see how each team’s efficiency changes based on the game state.

Team Losing Total Delta
Tottenham Hotspur 86.47 144.85 58.38
Aston Villa 121.71 170.83 49.12
Manchester United 73.25 98.57 25.33
Wigan Athletic 149.33 200.75 51.42
Everton 121.89 161.23 39.35
Fulham 137.76 173.79 36.03
Newcastle United 111.47 139.01 27.54
Wolverhampton Wanderers 150.16 182.84 32.69
Bolton Wanderers 143.83 172.59 28.76
Blackpool 123.64 147.83 24.20
Sunderland 160.92 189.31 28.39
West Ham United 192.64 183.88 -8.76
Manchester City 136.13 127.90 -8.23
West Bromwich Albion 146.00 134.67 -11.32
Arsenal 123.18 111.95 -11.22
Birmingham City 241.90 217.89 -24.01
Blackburn Rovers 204.00 183.45 -20.54
Stoke City 218.08 178.97 -39.11
Liverpool 180.22 139.23 -40.98
Chelsea 165.90 110.33 -55.57

While I was not surprised to see Manchester United’s efficiency improve from one goal every 98 possessions to a staggering one goal every 73 possessions when losing (about 34% more goals per possession), I was incredibly surprised to find Chelsea and Liverpool at the bottom. What would cause such an incredible disparity between top clubs?
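The percentage change quoted for Manchester United depends on which way round you measure it, so here is a small sketch of the version I find most natural: the relative change in goals per possession. The function name is hypothetical; the inputs are the Losing and Total columns from the table above.

```python
def efficiency_gain(poss_per_goal_total, poss_per_goal_losing):
    """Relative change in goals per possession when losing vs. overall."""
    overall_rate = 1 / poss_per_goal_total
    losing_rate = 1 / poss_per_goal_losing
    return losing_rate / overall_rate - 1


# (Total, Losing) possessions per goal, taken from the table above.
print(f"Manchester United: {efficiency_gain(98.57, 73.25):+.1%}")
print(f"Chelsea: {efficiency_gain(110.33, 165.90):+.1%}")
```

A positive value means the team scores more often per possession when chasing the game; a negative value means it scores less often.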

This metric is a decent measurement of a team’s ability to chase a game. Apparently, this quality isn’t necessarily required to finish in top league positions.

I wonder if I could find a correlation between the number of fans leaving early and this derived metric!

A Case for Possession – How Goals Change Games

Possession statistics are notoriously misleading. Both Chimu Solutions and Soccer Statistically have found that MLS teams that possess the ball more than their opponent actually win less than 50% of the time. 5 Added Minutes found similar trends in the EPL, suggesting that winning teams only had more possession an unconvincing 50.1% of the time.

All of these posts are fantastic at pointing out the problems with the possession percentage metric and how misleading it can be from a 1,000-foot view. This is surely not a popular viewpoint at a time when the media loves to shove Barcelona’s possession statistics down our throats. Questioning long-held beliefs is incredibly healthy for the future of soccer analytics.

However, I think everyone would agree that possession does mean something. It’s quantifying this something that has proven to be difficult. By slowly crossing out the things that this something could possibly be, we will eventually be left with what it has to be.

Let’s begin by thinking of each goal scored during the 2010-2011 EPL season as an individual game. This game’s length is the amount of time between each goal. For example, sticking with this blog’s Fulham theme, let’s look at Fulham’s 2-2 draw with Manchester United at the beginning of last season.

There were 4 “games” in this fixture: the 0 to 10th minute period before Manchester scored, the 10th to 54th minute before Fulham scored, the 54th to 84th minute before Manchester scored and the 84th to 89th minute before Fulham scored.
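The splitting step can be sketched in a few lines. This is only an illustration of the idea, not the scripts I actually ran; the function name and the `(minute, scorer)` input format are assumptions.

```python
def split_into_games(goals, start=0):
    """Turn a list of (minute, scorer) goals into goal-delimited segments."""
    segments = []
    previous = start
    for minute, scorer in sorted(goals):
        # Each "game" runs from the previous goal (or kickoff) to this goal.
        segments.append({"start": previous, "end": minute, "scorer": scorer})
        previous = minute
    return segments


# Goal times from the Fulham vs. Manchester United 2-2 draw described above.
goals = [
    (10, "Manchester United"),
    (54, "Fulham"),
    (84, "Manchester United"),
    (89, "Fulham"),
]

for game in split_into_games(goals):
    print(game)
```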

This is a possession breakdown of the 4 “games”. The team that won the possession battle during 3 of these 4 periods ended up scoring the eventual goal.

While not conclusive, it’s very clear that individual goals (not necessarily game results) are connected with possession statistics in some way. A simple 90 minute possession statistic of 57% to 43% clearly doesn’t tell the whole story of this 2-2 draw.

This is a time series that shows the rolling average of possession percentage over the course of the game. This shows the ebbs and flows of the game with considerably more granularity than grouping by the 4 goal times. Understandably (and as expected), it shows that goals seem to cause dramatic inflection points.

Also, the Manchester surge somewhere between the 30th minute and Fulham’s 54th minute goal helps explain why the 3rd column in the previous graph isn’t so heavily skewed in Fulham’s favor.
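A rolling possession average like the one in the time series can be sketched as a trailing window. The input format here is an assumption made for illustration: one entry per minute naming the team that had most of the ball in that minute. The window length and all names are hypothetical, not the settings behind the chart.

```python
def rolling_possession(minute_owner, team, window=10):
    """Trailing-window possession share (in percent) for `team`, per minute."""
    shares = []
    for i in range(len(minute_owner)):
        # Look back at most `window` minutes, including the current one.
        recent = minute_owner[max(0, i - window + 1): i + 1]
        shares.append(100 * recent.count(team) / len(recent))
    return shares


# Hypothetical ten-minute spell: five minutes of Fulham, then five of United.
owners = ["FUL"] * 5 + ["MUN"] * 5
print(rolling_possession(owners, "FUL", window=5))
```

A shorter window reacts faster to swings after a goal but is noisier; a longer one smooths the curve at the cost of blurring the inflection points.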

Now, the real question is how some of these trends fare on the larger season-wide scale.

I wrote a few scripts that calculate possession percentages for each previously defined sub “game” over the course of the season.

For all goals that resulted in a team gaining a lead (172 of them), the distribution of possession percentages of the scoring team is as follows:

While 45% to 50% possession is the largest bucket, this distribution of goals is very clearly weighted toward the higher-possession side, suggesting that possession does indeed correlate positively with scoring lead-gaining goals. Also, 55.8% (96/172) of go-ahead goals were scored by teams that held over 50% possession in the time leading up to the goal. I think this is pretty significant.
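One rough way to put a number on how unusual 96 out of 172 is would be an exact one-sided binomial test against a coin flip. This is a sketch of a check I did not run for the original analysis, and it assumes each go-ahead goal is independent and that, absent any possession effect, the scoring team would hold the majority of the ball half the time.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


share = 96 / 172
p_value = binomial_tail(96, 172)
print(f"share = {share:.3f}, one-sided p-value = {p_value:.3f}")
```

The smaller the p-value, the harder it is to explain the 55.8% figure as chance alone.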

By looking at goal distributions filtered by particular game states, we can begin to get a clearer look at possession statistics.

However, I am still very cautious of some of these findings. I believe that there remains plenty to be said about teams that employ approaches that are “more Stoke than Samba”.

In order to score, a team must significantly risk losing possession of the ball. In Barcelona’s example, when they are playing against a much weaker opponent, worthy risks come up more often, so they are more likely to exchange possession for a scoring opportunity. In games where there aren’t as many opportunities, they retain the ball for longer periods of time.

In reality, teams do not trade goal-scoring opportunities for a larger share of possession.

Understanding Asymmetry – Fulham 2012


Not all 4-4-2s are the same. Some have flat midfields and some have diamond midfields. Some have a pair of strikers and some have strikers deployed one on top of the other. Some have central midfielders who drop back into the defensive line, and some have central wingers. It should come as no surprise that some formations are also asymmetrical.

The modern game has evolved, yet our naming convention remains heavily rooted in the dark ages. The rise of the 4-2-3-1 is helpful because it recognizes that players are beginning to play “between the lines”, but there is still significant room for growth.

As a youth coach I have noticed something about the general manner in which we train our youth players. The first 14-15 years of their lives, we are training them how to “play their position”. Afterwards, we are burdened with trying to get them to play dynamically. It is easy to play against a “cookie-cutter” 4-4-2. It is much tougher to play against a 4-4-2 that is so tailored to a particular team’s strengths that it is hard to actually classify them as a 4-4-2. In this post, we are going to look at Fulham’s current 2011-2012 campaign; especially the interesting relationship between Clint Dempsey (#23) and John Arne Riise (#3) and the team’s general asymmetry.

This is my first post that introduces passing network graphs. By placing each player at their average position on the field, and drawing a line between each pair of players whose thickness scales with the number of passes exchanged between them, we get a very interesting look at how a team performed during a particular game. In order to reduce the amount of noise in the visualization, there is a threshold of 4 passes for a line to be drawn. Also, the size of each player’s circle (node) scales with the positional deviation of that player over the course of the game.
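The data behind such a graph can be sketched as follows. This is an illustrative reconstruction, not the code that produced the images: the input formats, the function name, and the choice to measure positional deviation as the combined standard deviation of x and y are all assumptions, as is treating the threshold as "at least 4 passes".

```python
from collections import Counter
from statistics import mean, pstdev


def build_network(touches, passes, min_passes=4):
    """touches: {player: [(x, y), ...]}; passes: [(passer, receiver), ...]."""
    nodes = {}
    for player, points in touches.items():
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        # Node position = average location; node size = positional spread.
        spread = (pstdev(xs) ** 2 + pstdev(ys) ** 2) ** 0.5
        nodes[player] = {"x": mean(xs), "y": mean(ys), "size": spread}
    # Count passes exchanged in either direction between each pair.
    pair_counts = Counter(frozenset(p) for p in passes if p[0] != p[1])
    edges = {
        tuple(sorted(pair)): count
        for pair, count in pair_counts.items()
        if count >= min_passes  # drop thin lines to reduce noise
    }
    return nodes, edges


# Hypothetical touches and passes, keyed by shirt number.
touches = {3: [(60, 10), (70, 20)], 23: [(80, 20), (90, 40)], 4: [(40, 50), (40, 50)]}
passes = [(3, 23)] * 3 + [(23, 3)] * 2 + [(4, 3)] * 2

nodes, edges = build_network(touches, passes)
print(nodes[3], edges)
```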

Case 1: Fulham vs. Everton (1-3) | Gameweek 9

The main thing that I want to draw attention to is John Arne Riise (#3) as the outside fullback and how much further up the field he plays than his right-back counterpart. Also, Riise’s and Clint Dempsey’s (#23) circles are larger than any other player’s, meaning that they tended to patrol the largest area.

After looking at a lot of these graphs, it seems that Fulham is playing a variation of a 4-3-3 in this match, with some very interesting quirks. Up the left flank, Riise is playing at least as high as the three central midfielders. This allows Dempsey to pinch inwards, acting as an inverted winger.

Since the two more traditional strikers, Zamora and Johnson, are deployed more centrally, with Riise and Dempsey harassing the left wing, it seems that Fulham is playing completely without a right winger, and it looks very intentional.

Also, it seems that Fulham tries to compensate for Riise’s forward deployment by playing Steve Sidwell (#4) as a holding midfielder.

Case 2: Fulham vs. Chelsea (1-1) | Gameweek 18

This is a very interesting side that Fulham fielded against a high-pressure Chelsea side. Notice Riise’s much more subdued deployment and the lack of a clear holding midfielder (Dembele, Murphy and Dempsey are certainly not holding midfielders).

Yet, it is still very clear that Fulham aims to attack up the left flank with the rare deployment of a classical left-winger in Kerim Frei (#21) instead of the hybrid system along the left that Dempsey usually shares with Riise. Clint finds himself still on his natural left side but in a considerably different role.

Case 3: Fulham vs. Bolton (2-0) | Gameweek 16

I included this side because I felt that it was one of the more aesthetically pleasing graphs. It is perhaps not a coincidence that I feel similarly about how Fulham played on this particular outing.

Clint Dempsey: Average Position vs. Result


I have been experimenting with some different positional visualization ideas and this is hopefully the first of a handful of related posts. Once I stop being technically inept (and/or lazy), and figure out how to properly plug MySQL into Java/Processing, I can mass produce these for every player in the league. I picked Dempsey because he played very regularly for Fulham (35 starts) last season and was relatively integral to their success.

What you’re looking at is the average position of Clint Dempsey during the 37 EPL games that he appeared in for Fulham during the 2010-2011 season. Light green circles represent his position when Fulham won, the dark green circles represent when Fulham drew, and the red circles are when Fulham lost.

All circles are connected to the season’s average position via a line to show how different the position was from the “norm”. The concentric opaque circles represent one and two standard deviations from the average.

We can make a couple of interesting observations from this visualization. First is the obvious tendency for Dempsey to drift forward during Fulham wins.

Astute readers will point out that Dempsey was deployed as both a striker and an outside winger during the season. I recognize this, but it’s tough to discount that Fulham didn’t lose in the 10 games where Dempsey was deployed furthest up the pitch. I recognize that this correlation does not necessarily imply causation. A player shifting backwards could be caused by his team losing rather than being the cause of the team losing.

The other interesting observation is that the further Clint’s average position is from his season average, the more likely Fulham is to win. For positions beyond one standard deviation, Fulham seems to be about three times as likely to win.
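The comparison of win rates inside and outside one standard deviation can be sketched like this. The data below is made up purely to show the mechanics, and the definition of "standard deviation" for a two-dimensional position (root-mean-square distance from the season average) is an assumption on my part.

```python
from math import hypot
from statistics import mean


def win_rate_by_distance(games):
    """games: [(x, y, result)] with result in {"W", "D", "L"}."""
    cx = mean(x for x, _, _ in games)
    cy = mean(y for _, y, _ in games)
    dists = [hypot(x - cx, y - cy) for x, y, _ in games]
    # Assumed definition: RMS distance from the season-average position.
    sd = mean(d * d for d in dists) ** 0.5

    def rate(flags):
        picked = [result for (_, _, result), keep in zip(games, flags) if keep]
        return sum(r == "W" for r in picked) / len(picked) if picked else None

    inside = rate([d <= sd for d in dists])
    outside = rate([d > sd for d in dists])
    return sd, inside, outside


# Hypothetical per-game average positions and results.
games = [
    (0, 0, "L"), (0, 0, "D"), (0, 0, "W"), (0, 0, "L"),
    (10, 0, "W"), (-10, 0, "W"),
]
sd, inside, outside = win_rate_by_distance(games)
print(sd, inside, outside)
```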

A further idea for this kind of visualization is to include some extra dimensionality. For example, if I weight the size of each circle based upon the positional standard deviation during that game, it would add some meaningful context to some of the outlying data points.

Clutch Goal-scoring in the English Premier League 2010

Extending the work done by Ford Bohrmann (Twitter: @SoccerStatistic) at SoccerStatistically for his Outcome Probability Calculator, I put together a method for weighting the relative importance of a particular goal.

Using Ford’s formulas, the percentage chance of victory can be calculated from the current score and the current minute. For example, a home team up by 1 goal in the 80th minute has a 90.5% chance of winning and an 8.5% chance of drawing. The away team only has a 1% chance of pulling out a victory.

However, if the away team manages to score a goal in the 80th minute, these statistics change dramatically. Suddenly, the home team has only a 17.5% chance of winning, a 70.7% chance of drawing, and an 11.8% chance of losing. This goal increased the away team’s chance of winning by 10.8 percentage points (from 1% to 11.8%). The goal also increased the chance of a draw by 62.2 percentage points (from 8.5% to 70.7%).

Now, we compare the total expected points before and after the goal by weighting each outcome by its chance of happening. Since a victory is worth 3 points and a draw is worth 1 point, we sum the products of each outcome’s point value and its probability.

For example, before the goal, the away team is expected to walk off with: (0.085)(1 point) + (0.010)(3 points) = 0.115 points.

After the goal, the away team is expected to get: (0.707)(1 point) + (0.118)(3 points) = 1.061 points.

Therefore, the worth (or weight) of this goal is the difference between the two expected values: (1.061 points) – (0.115 points) = 0.946 points
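The weighting can be sketched in a few lines. The function names are hypothetical, and the probabilities are the 80th-minute figures quoted above, taking the away side's post-goal win probability as 11.8%, which is what remains once the home side's 17.5% win and 70.7% draw chances are accounted for.

```python
def expected_points(p_win, p_draw):
    """Expected league points: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


def goal_weight(before, after):
    """Change in expected points caused by a goal; each arg is (p_win, p_draw)."""
    return expected_points(*after) - expected_points(*before)


# Away team's outlook in the 80th minute, before and after equalising.
away_before = (0.010, 0.085)
away_after = (0.118, 0.707)

print(round(expected_points(*away_before), 3))
print(round(expected_points(*away_after), 3))
print(round(goal_weight(away_before, away_after), 3))
```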

After weighting each goal by its expected point value during the 2010 English Premier League season, these are the average expected point values of each team’s goals scored:

Team Average Goal Value
Bolton Wanderers 0.9979
Tottenham Hotspur 0.9882
Wigan Athletic 0.9744
Birmingham City 0.9735
Aston Villa 0.9727
West Bromwich Albion 0.9723
Everton 0.9161
Fulham 0.8796
Manchester United 0.8741
Sunderland 0.8544
Liverpool 0.8479
Manchester City 0.8322
Wolverhampton Wanderers 0.8260
Blackpool 0.8060
Blackburn Rovers 0.7824
Arsenal 0.7779
Chelsea 0.7479
West Ham United 0.7233
Stoke City 0.7212
Newcastle United 0.6658

Bolton, it seems, is the most clutch goal-scoring team during the 2010 EPL season – closely followed by Tottenham. Newcastle’s goals had the lowest average impact on the game.

Average expected value for goal scored against:

Team Average Goal Against Value
Everton 1.0433
Stoke City 0.9963
Bolton Wanderers 0.9610
Liverpool 0.9493
Fulham 0.9101
Blackpool 0.8864
Newcastle United 0.8785
Manchester United 0.8774
Chelsea 0.8515
Aston Villa 0.8486
Wolverhampton Wanderers 0.8457
Manchester City 0.8357
Birmingham City 0.8341
Blackburn Rovers 0.8151
Tottenham Hotspur 0.8085
Sunderland 0.7974
West Ham United 0.7802
Arsenal 0.7755
West Bromwich Albion 0.7416
Wigan Athletic 0.7062

In 2010, Everton gave up the most clutch goals, closely followed by Stoke City. On the other hand, Wigan and West Brom were the most stingy, giving up the least value for each goal conceded.

Drafting for Value in the MLS SuperDraft

There are approximately 530 currently active players in the MLS, of which about 200 initially entered the league via the MLS SuperDraft.

Using guaranteed compensation, draft selection number and year drafted, a second-degree polynomial regression provides a formula that predicts the compensation a player can expect based on the pick number at which they were selected in the MLS SuperDraft and how many years it has been since they entered the MLS. A standard linear regression yields a coefficient of determination of only 29%; this polynomial (non-linear) regression improves that to 39%.

The base salary for a player who entered the MLS via the SuperDraft, according to this statistical model, is $158,962.21. Depending on the player’s selection number in the draft and how many years the player has been in the league, this expected compensation value fluctuates either up or down. For each pick that the player remained undrafted, they lose $6,627.34 off their base salary but pick up $88.43 multiplied by their pick number squared. In other words, a pick’s expected value decreases with each selection, but the size of the decrease shrinks as the pick number grows.

For example, the salary for a rookie player selected with the third pick will have the expected initial salary of:

$139,876.06 = $158,962.21 – $6,627.34*(3) + $88.43*(3^2)

As the player ages, his salary is expected to increase $1,014.40 per year squared, lose $1,552.70 per year, and gain $106.18 per year multiplied by the player’s initial draft pick number.

For example, after this player has been in the league for two years, his expected salary grows to:

$141,465.34 = $139,876.06 + $1,014.40*(2^2) – $1,552.70*(2) + $106.18*(2)(3)
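The two formulas above fit comfortably in a few lines of code. This sketch uses the coefficients quoted in the post; the function and constant names are my own, and the split into a rookie-salary term and a growth term is just a convenient way to organise the same arithmetic.

```python
BASE_SALARY = 158_962.21  # model intercept quoted in the post


def rookie_salary(pick):
    """Expected first-year compensation for a given SuperDraft pick number."""
    return BASE_SALARY - 6_627.34 * pick + 88.43 * pick**2


def salary_growth(pick, years):
    """Expected change in compensation after `years` in the league."""
    return 1_014.40 * years**2 - 1_552.70 * years + 106.18 * years * pick


def expected_salary(pick, years=0):
    """Expected compensation for a pick after `years` in the league."""
    return rookie_salary(pick) + salary_growth(pick, years)


# The worked example from the post: the third pick as a rookie and after two years.
print(f"${expected_salary(3):,.2f}")
print(f"${expected_salary(3, years=2):,.2f}")
```

Keeping the growth term separate makes it easy to compare how quickly different picks are expected to appreciate, independent of where they start.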

Using these same formulas, we can develop a table of relative draft pick values, as well as their expected value after multiple years.

Full table is available at: http://dl.dropbox.com/u/380945/mlsSuperdraft.xls

This chart shows that the value of top picks, while initially high, tends not to increase as dramatically as that of lower draft picks. For example, the compensation of a player selected with a number three pick is expected to rise only $11,293.76 after four years of being in the league. On the other hand, the 38th draft pick’s compensation is expected to rise $26,158.96 over the same period.

According to the chart, exchanging a number three draft pick for any other two draft picks in the first round (given 18 selection picks per round) would be an upgrade. If this hypothetical team were to exchange its number three draft pick for the 17th and 18th draft picks, the combined expected salary of the two players would be slightly more than that of the 3rd pick alone. However, their combined value is expected to increase by $34,904.40 over four years. In comparison, the 3rd pick in the same draft would have been expected to increase in value by only $11,293.76.

Because of the MLS’s single entity structure, maximizing growth in player value is perhaps even more important than maximizing the total value of the team. The player market in the MLS is very similar to playing the stock market, but only worrying about stock value fluctuations, not current stock value. According to this model, it may be in a team’s best interest to invest in “penny stocks”. Essentially, what this chart is suggesting is that it is much harder for a good player to double their value than it is for a lesser rated player.

However, there are certainly statistically relevant ramifications of taking this “penny stock” approach. A player’s fluctuation in value certainly correlates very strongly with the number of minutes that they play during a season. Also, it is much easier for a team to provide one top draft pick with playing minutes than to provide two lower draft picks with a significant share of time. This methodology doesn’t work if these investments ride the bench all season.

Also, there are clear salary cap and roster size-limit complications with taking this approach. With a top draft pick you can expect their salary to remain relatively static. With a lower draft pick (who manages to remain rostered), their value is expected to increase by about $10,000 in the first two years. For teams already pushing the salary cap, lower pick investments may not be the best avenue of growth. For teams with confidence in their ability to maximize a young player’s potential and have salary cap room to spare for long-term investments, this avenue is most certainly worth exploring.

Now, by calculating the expected compensation for every drafted player in the league, we quickly learn which players were good draft picks versus players that were not good draft picks. We will classify every player that has a lower actual compensation total than the expected compensation total as a bad pick. Conversely, we will classify every player that has a higher actual compensation total than the expected compensation total as a good pick. Notice, this classification does not imply that a particular player is a good (or bad) investment at this current point in his career.

Using this methodology (determining the difference between the player’s current salary and their algorithmically calculated expected salary), the ten best MLS SuperDraft picks (that are still currently active in the MLS) of all time are:

Year Pick Name Current Salary Expected Salary Difference
2004 1 *Freddy Adu $594,884 $192,003 $402,881
2001 16 Brian Ching $412,500 $178,464 $234,036
2008 42 Geoff Cameron $245,000 $54,454 $190,546
2004 2 Chad Marshall $320,000 $186,384 $133,616
2006 1 Marvell Wynne $301,667 $170,550 $131,117
2010 8 Dilly Duka $223,000 $111,914 $111,086
2002 50 Davy Arnaud $258,750 $164,643 $94,107
2005 35 Gonzalo Segares $167,750 $84,832 $82,918
2009 41 Danny Cruz $123,000 $45,551 $77,449
2004 28 Khari Stephenson $178,333 $102,373 $75,960

*Freddy Adu is a special case because he has spent a lot of time outside of the MLS before returning. He was also on loan as a designated player and therefore only $415,000 of the player’s salary counted against the salary cap. Even at the league maximum, he is the best draft pick of all time with a positive differential of over $300,000.

By breaking down players based upon position, we can begin to determine what positions tend to do better than others in the draft.

Metric Striker Midfielder Defender Goalkeeper
Average Value Change $5,805 $9,523 -$13,023 $5,307
Standard Deviation of Value Change $91,058 $49,612 $42,919 $40,609

According to these results, you are expected to make, on average, roughly 64% more money drafting a Midfielder than a Striker ($9,523 versus $5,805). With the standard deviation of the Striker difference being roughly twice that of any other position, it suggests that Strikers are risky picks, but have a greater potential for large payoff.

The model we have constructed suggests that a team’s total salary fluctuation, year to year, is much more heavily related to players that were selected late in the draft than to players that were selected early. Because of this result, the careful selection and development of late-round picks appears to be more related to a team’s financial growth than early-round picks.

It is important to remember that these conclusions are merely a guideline for drafting with potential value in mind. With a sample of only a decade of MLS SuperDraft results, it remains difficult to consider this guideline complete. As with any guideline, there will always be exceptions to these rules. Hopefully, with this mathematical model, franchises can better understand the risks that they are taking.