
A Million Simulated Seasons - Numberphile
video description
I once ran a similar simulation on handball to find out whether the sudden long strings of goals one way (where teams seem very evenly matched and follow each other closely, then suddenly one team pulls ahead by 5, 6 or even 7 goals in a row) are statistically expected. I found that they weren't. The teams I tested were expected to have up to 4 goals in a row based on keeper save percentages and goals scored per attack, but never up to 7 or 8 as we can sometimes observe. There are clearly "human" factors entering into it. My guess was that teams who fail "statistically" in their attacks (or defense) a few times in a row will have a few players start to lose faith in their strategy and begin riskier and less optimal play out of desperation, until a conscious decision is made to bring people back in line.
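A check of this kind could be sketched as a small Monte Carlo simulation. The version below is only an illustration under simplifying assumptions: the two teams alternate attacks, and every attack scores independently with a fixed probability. The function names, the 0.5 scoring probability and the 50 attacks per team are hypothetical placeholders, not the figures used in the handball analysis described above.

```python
import random

def longest_unanswered_run(p_a, p_b, attacks_per_team, rng):
    """Longest run of consecutive goals by one team in one simulated match.

    Assumption: teams alternate attacks, and each attack scores
    independently with a fixed probability (p_a or p_b). A missed
    attack does not end a run; only a goal by the other team does.
    """
    longest = 0
    current_team = None
    current_run = 0
    for i in range(2 * attacks_per_team):
        team = "A" if i % 2 == 0 else "B"
        p = p_a if team == "A" else p_b
        if rng.random() < p:  # this attack ends in a goal
            if team == current_team:
                current_run += 1
            else:
                current_team = team
                current_run = 1
            longest = max(longest, current_run)
    return longest

def run_length_distribution(p_a, p_b, attacks_per_team, n_matches, seed=0):
    """Share of simulated matches whose longest run equals each length."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_matches):
        run = longest_unanswered_run(p_a, p_b, attacks_per_team, rng)
        counts[run] = counts.get(run, 0) + 1
    return {k: v / n_matches for k, v in sorted(counts.items())}

if __name__ == "__main__":
    # Hypothetical inputs: both teams score on half of their attacks.
    for length, share in run_length_distribution(0.5, 0.5, 50, 10_000).items():
        print(length, round(share, 4))
```

Comparing the simulated shares with how often long runs occur in real matches would show whether independent attacks alone can account for them.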
Date: 2022-04-09
Comments and reviews: 9
rellimkram
While it would take a lot longer than the hour used for this simulation, I'd reckon using EA's FIFA games to simulate Premier League seasons would be quite accurate. I don't know of anyone using the FIFA games for such things, but considering the track record of EA's Madden games for predicting the Super Bowl, I'd wager it'd be more accurate than what was used for this video.
As for why I'd assume it would be more accurate: EA's sports games take in massive amounts of information, such as (but not limited to) each player's performance and habits as well as each team's general tactics, rather than just the final league table from a season.
Bo
I think you could improve the model with two simple things:
1. Score with a multivariate Poisson distribution, since the numbers of goals scored by two teams playing each other are obviously not independent. If one team scores, the other team will play more offensively to level the score, because they have no points left to lose.
2. Replace the (multivariate) Poisson distribution with the actual distribution of outcomes seen in historical match data, smoothed a little, especially in the tail (or use the 5% tail of the Poisson distribution 5% of the time). That would solve the problem of the 0-0s.
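One common way to make two Poisson goal counts dependent is the "common shock" construction, sketched below. This is an assumption about what a multivariate Poisson model could look like here, not the method used in the video: each score is an independent Poisson term plus a shared Poisson term. The function names and rates are hypothetical. Note that this construction can only produce positive correlation, so it does not directly model a trailing team changing its tactics.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) sample (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def bivariate_poisson_score(lam_home, lam_away, lam_shared, rng):
    """One scoreline from a common-shock bivariate Poisson model.

    home = A + C and away = B + C with independent Poisson A, B, C.
    The shared term C gives the two goal counts a covariance of
    lam_shared; the expected home goals are lam_home + lam_shared.
    """
    shared = poisson(lam_shared, rng)
    home = poisson(lam_home, rng) + shared
    away = poisson(lam_away, rng) + shared
    return home, away

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical rates, chosen only for illustration.
    scores = [bivariate_poisson_score(1.2, 0.9, 0.3, rng) for _ in range(100_000)]
    goalless = sum(1 for s in scores if s == (0, 0)) / len(scores)
    print("share of 0-0 draws:", round(goalless, 4))
```

For the second suggestion, the sampler would instead draw scorelines from a smoothed table of historical results.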
variousthings
In the video it's stated that nil-nil draws are not reflected very well in this simulation.
Perhaps a way to make the results more realistic would be to split each simulated game into two halves. Then you could incorporate the historical data on how the situation at half time affects the final result:
E.g., if it's 0-0 at half time, is it more likely that it will stay that way to the end, or that one team will pull ahead and win?
Or if one team is ahead by one goal at half time, is it more likely that they'll continue to score more, or that the losing team will equalise and it will end up a draw?
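The two-halves idea could be sketched as follows. This is a minimal illustration, assuming each half uses half of a team's full-match scoring rate and that the second-half rates are scaled by a factor that depends on the half-time score. The multipliers in `second_half_factors` are made-up placeholders; real values would have to be estimated from historical half-time and full-time results.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) sample (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def second_half_factors(home_ht, away_ht):
    """Made-up multipliers for second-half scoring rates, as (home, away).

    Placeholder values only: real ones would be fitted to historical
    half-time and full-time scores.
    """
    if home_ht == away_ht:
        return 0.9, 0.9      # level at the break
    if home_ht > away_ht:
        return 0.95, 1.15    # away side is trailing
    return 1.15, 0.95        # home side is trailing

def simulate_match(lam_home, lam_away, rng, factors=second_half_factors):
    """Return ((home_ht, away_ht), (home_ft, away_ft)) for one match."""
    home_ht = poisson(lam_home / 2, rng)
    away_ht = poisson(lam_away / 2, rng)
    f_home, f_away = factors(home_ht, away_ht)
    home_ft = home_ht + poisson(f_home * lam_home / 2, rng)
    away_ft = away_ht + poisson(f_away * lam_away / 2, rng)
    return (home_ht, away_ht), (home_ft, away_ft)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical full-match scoring rates.
    print(simulate_match(1.6, 1.1, rng))
```

Passing a different `factors` function makes it easy to compare alternative assumptions about how teams react to the half-time score.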
education
I think you need to take your analysis a step further and realize that you can AT LEAST come up with an estimated distribution for achieving each position. Then you would be asking things like "what is the 95% confidence of getting relegated/winning the season?" Also, when you talk about Manchester City being anomalously high, that would generally mean that the rest of the field is anomalously low, assuming a relatively consistent % of draws per season. The fact that the pool of points varies depending on game outcomes also makes this a very difficult analysis.
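Estimating a distribution over finishing positions from simulated seasons could look roughly like this. The sketch assumes the simulation has already produced one final table per season, given as a list of team names in finishing order. The function names and the four-team toy data are hypothetical.

```python
from collections import Counter, defaultdict

def position_distribution(tables):
    """Share of seasons in which each team finished in each position.

    tables: list of seasons, each a list of team names in finishing
    order. Returns {team: {position: share}}, positions counted from 1.
    """
    counts = defaultdict(Counter)
    for table in tables:
        for position, team in enumerate(table, start=1):
            counts[team][position] += 1
    n = len(tables)
    return {
        team: {pos: c / n for pos, c in sorted(ctr.items())}
        for team, ctr in counts.items()
    }

def probability_of(dist, team, positions):
    """Estimated probability that `team` finishes in any of `positions`."""
    return sum(dist[team].get(pos, 0.0) for pos in positions)

if __name__ == "__main__":
    # Toy data: four simulated seasons of a four-team league.
    tables = [
        ["A", "B", "C", "D"],
        ["A", "C", "B", "D"],
        ["B", "A", "C", "D"],
        ["A", "B", "D", "C"],
    ]
    dist = position_distribution(tables)
    print("P(A wins the league):", probability_of(dist, "A", [1]))   # 0.75
    print("P(D finishes last):", probability_of(dist, "D", [4]))     # 0.75
```

With a 20-team league, the relegation probability would be `probability_of(dist, team, [18, 19, 20])`.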
NetAndyCz
Basing your data on just one season seems weird; it is bound to lead to all sorts of anomalies. I think using at least the last 10 seasons would be better, maybe giving more weight to more recent seasons. The data seems to show numbers for how the last season might have played out, but the average numbers required to drop or to win should be based on data from all the seasons, or at least the last decade. Otherwise the model is biased towards reproducing the numbers that happened last season.
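Weighting recent seasons more heavily could be done with a geometric decay, as in the sketch below. The decay scheme, the function name `weighted_rate` and the sample numbers are assumptions chosen for illustration, not something taken from the video.

```python
def weighted_rate(rates_recent_first, decay=0.8):
    """Weighted average of per-season rates, most recent season first.

    A season k years back gets weight decay**k, so decay=1.0 is a
    plain average and smaller values favour recent seasons more.
    """
    if not rates_recent_first:
        raise ValueError("need at least one season")
    weights = [decay ** k for k in range(len(rates_recent_first))]
    total = sum(w * r for w, r in zip(weights, rates_recent_first))
    return total / sum(weights)

if __name__ == "__main__":
    # Hypothetical goals-per-game figures for one team, newest first.
    goals_per_game = [2.8, 2.1, 2.3, 1.9, 2.0]
    print(round(weighted_rate(goals_per_game), 3))
    print(round(weighted_rate(goals_per_game, decay=1.0), 3))
```

The same function could be applied to goals conceded, giving each team a smoothed attacking and defensive rate to feed into the simulation.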
Uncle
Simulations are great, but this was totally skewed because you only used one season's worth of stats. If you type in that Manchester is expected to score 5 goals in a game and Southampton is expected to score 0, why be surprised when Manchester keeps winning and scores 114 points? Why not just analyze real-life data or make your simulations more balanced? In Madden, when you simulate just 20 years, your players retire and new ones are drafted, changing the scope of the league.
Attimiss
Ever thought about running information in a loop through two hard drives that communicate through one processor? Information is gathered on one hard drive, which sends that data through the processor to the other hard drive, which then sends it back to the first hard drive through the same processor. This creates a loop of information that flows with very few interruptions.
exceltraining
Purely and solely on numbers and stats, that's quite interesting. I'd be interested to see if you were able to come up with streaks: unbeaten runs, winning runs, losing runs, etc. Would they only be 38 games, or could some of them be half a million games? However, over a million seasons, only 6 teams win the league? I know it's stats, but that will never be correct. Something else needs to be factored in to that single season you're basing it on.
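Streaks could be measured with a single pass over a team's sequence of results, as in this sketch. It assumes the simulation records each game for a team as "W", "D" or "L"; the function name and the sample sequence are hypothetical.

```python
def longest_run(results, wanted):
    """Length of the longest run of consecutive results found in `wanted`.

    results: iterable of "W", "D" or "L" for one team, in playing order.
    wanted:  set of results that keep the run alive, e.g. {"W"} for a
             winning run or {"W", "D"} for an unbeaten run.
    """
    best = 0
    current = 0
    for result in results:
        if result in wanted:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best

if __name__ == "__main__":
    results = list("WWDLWWWDL")  # made-up sequence of nine games
    print("winning run:", longest_run(results, {"W"}))        # 3
    print("unbeaten run:", longest_run(results, {"W", "D"}))  # 4
    print("losing run:", longest_run(results, {"L"}))         # 1
```

Whether a run can exceed 38 games depends only on how the input is built: feeding one season at a time caps it at 38, while concatenating a team's seasons lets runs carry across season boundaries.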
David
For the record, "soccer" is short for "association football", which was one of several different types of football, such as rugby football, that existed around 1863. So one could argue that "soccer" is the more specific, correct name for the sport, whereas "football" is more generic and could refer to multiple sports that use that name.
Why yes, I am from America. Why do you ask? :P