So what do we have this time?
Well to be honest, perhaps most will want to skip the theory and just use the formulae – they are towards the end.
But if you want to compile your own ratings then the theory is worth wading through, so that later you have a better idea of what ratings can do, and what they can’t. After all, anyone can come up with ratings - but only very few use mathematically sound ones and know what they are up to!
Fair enough - so last time out you gave a bit of theory showing how a Least-Squares (LS) solution could be worked out to give a mathematically sound ratings system.
Yes - and at the moment we are using a very basic metric, and the purest form of Least-Squares solution. However, I and others happen to think that this form of LS solution is too sensitive to use as a predictive model, and no end of modifications have been tried to tame it.
Oversensitive?
Well take for example a ‘Hammering’ - like Team A beats Team B 5-1. These tend to cause quite a large ripple as they filter through the system. As a consequence, some have tried changing the metric by truncating large score differences.
There are also quite complicated ways of weighting the importance of games so that consistent teams and results have more importance to the system. This way 'freak' results have less impact. I think that all these changes, though perhaps mathematically sound, are all forms of fine tuning - and that it's more pragmatic to just look for a better metric in the first place.
I will add though, that for sports where fixture scheduling is not uniform, and where players/teams may play varying numbers of games (Champions League, Tennis, Boxing etc), this pure form of LS solution is probably a good one to stick with - or to modify as outlined above.
Well let’s stick with League Football for now. There’s plenty of matches and the freak results will tend to dissipate if we look at a large sample of games.
OK - in this case we can make some simplifications to the rating system which actually make the ratings more stable and reliable.
To recap, we are using a ratings system where,
Each team's rating = Average of all previous opponents' ratings + Average performance against those opponents (as given by the metric)
Now, let us consider two teams, Team A, and Team B, that have just played a game against each other. We will assume that it is some time after the season (rating period) has already started.
We know the result of the game, and now want to update their ratings.
There are several variables here, but only two unknowns, (the new ratings for A and B). So we are going to set up the ('mother of all') system of simultaneous equations to solve...
Let's use:
RatA for the new updated rating for Team A
NumA for the number of games that Team A has played in the rating period, counting the game just played
OldA for the rating of Team A prior to the update.
(similarly for Team B)
Lastly, p is the performance result of the game using the metric between A and B. (It will be +p for team A, and -p for team B)
So, to obtain the new rating for A, we use;
RatA = ( 1 / NumA ) * ( ( NumA - 1 ) * OldA + RatB + p )
and
RatB = ( 1 / NumB ) * ( ( NumB - 1 ) * OldB + RatA - p )
If you solve these simultaneously to find RatA and RatB, you will get (after some considerable amount of tidying up!)
RatA = OldA + ( NumB - 1 ) / (( NumA * NumB ) - 1 ) * ( p - ( OldA - OldB ))
and similarly for team B
RatB = OldB + ( NumA - 1 ) / (( NumA * NumB ) - 1 ) * ( -p - ( OldB - OldA ))
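For anyone who wants to see the 'tidying up', here is one way to get from the pair of simultaneous equations to the result for Team A. It is only a sketch of the algebra, using the same symbols as above; the result for Team B follows by swapping A and B and changing the sign of p.

```latex
\begin{align*}
\mathrm{NumA}\,\mathrm{RatA} &= (\mathrm{NumA}-1)\,\mathrm{OldA} + \mathrm{RatB} + p \\
\mathrm{NumB}\,\mathrm{RatB} &= (\mathrm{NumB}-1)\,\mathrm{OldB} + \mathrm{RatA} - p \\[4pt]
% multiply the first equation by NumB and substitute the second for NumB * RatB
\mathrm{NumA}\,\mathrm{NumB}\,\mathrm{RatA}
  &= \mathrm{NumB}(\mathrm{NumA}-1)\,\mathrm{OldA}
   + (\mathrm{NumB}-1)\,\mathrm{OldB} + \mathrm{RatA} - p + \mathrm{NumB}\,p \\
(\mathrm{NumA}\,\mathrm{NumB}-1)\,\mathrm{RatA}
  &= \mathrm{NumB}(\mathrm{NumA}-1)\,\mathrm{OldA}
   + (\mathrm{NumB}-1)(\mathrm{OldB} + p) \\[4pt]
% use NumB(NumA-1) = (NumA*NumB - 1) - (NumB - 1)
\mathrm{RatA}
  &= \mathrm{OldA} + \frac{\mathrm{NumB}-1}{\mathrm{NumA}\,\mathrm{NumB}-1}
     \bigl(p - (\mathrm{OldA}-\mathrm{OldB})\bigr)
\end{align*}
```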
So these equations are a simplified version of the LS solution!
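If you would rather see this as code, here is a minimal Python sketch of the two-team update. The function name update_pair and the numbers in the usage lines are made up for illustration, and I am assuming that NumA and NumB count the game just played. Note that the formula breaks down (0/0) if both teams are on their very first game, so the sketch refuses that case.

```python
def update_pair(old_a, old_b, num_a, num_b, p):
    """Closed-form solution of the two simultaneous rating equations.

    num_a and num_b are read here as counting the game just played
    (an assumption); p is the result from Team A's point of view.
    """
    denom = num_a * num_b - 1
    if denom == 0:
        # Both teams on their first game: the formula is 0/0.
        raise ValueError("need at least one team with an earlier game")
    surprise = p - (old_a - old_b)  # result minus what the old ratings implied
    rat_a = old_a + (num_b - 1) / denom * surprise
    rat_b = old_b - (num_a - 1) / denom * surprise
    return rat_a, rat_b


# Hypothetical numbers: A (rated 10, 12th game) beats B (rated 8, 9th game) by 3.
rat_a, rat_b = update_pair(10.0, 8.0, 12, 9, 3)

# Plug the new ratings back into the original pair of equations;
# each check value should match the rating next to it.
check_a = ((12 - 1) * 10.0 + rat_b + 3) / 12
check_b = ((9 - 1) * 8.0 + rat_a - 3) / 9
print(round(rat_a, 4), round(check_a, 4))
print(round(rat_b, 4), round(check_b, 4))
```

The 'surprise' term is the same for both teams, just with opposite sign, so only the weighting differs between A and B.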
Each team's rating is only updated after each game that they play. This is a big difference from the previous LS solution, where all teams' ratings are affected when a new result is fed into the system. One consequence of this is that the system is desensitized - and (personal) practice shows this simplified system is better as a prediction tool when used in this way.
Now we can make a further simplification when dealing with league-based systems, or wherever teams have usually played roughly the same number of games as each other. (It doesn't really matter if a few teams are out of synch with the rest, since we are looking at trends. But if you are studying 'form', with only a few games rated for each team, it might be best to stay with the formulae above.)
The assumption is that teams have played about the same number of games, so NumA = NumB. Since ( NumA - 1 ) / (( NumA * NumA ) - 1 ) = 1 / ( NumA + 1 ), that simplifies the above equations to,
RatA = OldA + ( 1 / ( NumA + 1 )) * ( p - ( OldA - OldB ))
with a similar result for team B
RatB = OldB + ( 1 / ( NumB + 1 )) * ( -p - ( OldB - OldA ))
This last set of equations is really easy to use. If you have a record of the previous ratings of the teams, the number of games played, and the new result - you just plug them into the formula, and the new rating pops out.
In practice, I use the second set of formulae for my ratings projects - and only use the first set when the number of games is small (i.e. start of season).
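As a rough illustration of 'plug in and the rating pops out', here is a small Python sketch of the simplified update, with the fuller two-team formula alongside for comparison. The function names and the game in the usage lines are invented for the example. When both teams have played exactly the same number of games the two should agree.

```python
def update_simple(old_a, old_b, num_a, num_b, p):
    """Simplified update, assuming both teams have played about the same
    number of games. p is the result from Team A's point of view."""
    surprise = p - (old_a - old_b)
    rat_a = old_a + surprise / (num_a + 1)
    rat_b = old_b - surprise / (num_b + 1)
    return rat_a, rat_b


def update_general(old_a, old_b, num_a, num_b, p):
    """The fuller two-team formula, for comparison only."""
    denom = num_a * num_b - 1
    surprise = p - (old_a - old_b)
    return (old_a + (num_b - 1) / denom * surprise,
            old_b - (num_a - 1) / denom * surprise)


# Hypothetical game: A (rated 12) loses 0-1 to B (rated 9), 20 games each.
simple = update_simple(12.0, 9.0, 20, 20, -1)
general = update_general(12.0, 9.0, 20, 20, -1)
print(simple)
print(general)
```

Here the higher-rated team lost, so the 'surprise' is negative for A: its rating drops and B's rises by the same amount.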
There is one big problem though before rushing off and using these...
These formulas were derived assuming that you know the previous, old rating, of each team. Of course, when you start your system, what ratings do you give?
There are two ways of tackling this.
If you have a large databank of results, you can just start each rating at a nominal value (anything, say 10?, as it's only the differences in ratings that are important, not the absolute values), and roll through the data. There is an interesting mathematical result that proves that for all rating systems of these types, the ratings will tend towards the true values, given enough time...so how much is enough time? You can do a lot of complex maths here - but I've found that a couple of seasons, or even about 50 games for each team, are more than enough for the system to settle down.
To speed up the settling down, you can even start the ratings off 'where you think they should be...' This is perhaps necessary when you have very little back data. A more mathematically based process for this is to iterate over a small number of games - but I won't go into this unless someone asks...
Better, IMO, is to have plenty of data, and let the system settle down itself...
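To make the 'roll through the data' idea concrete, here is a minimal Python sketch. It starts every team at a nominal 10 and applies the simplified update game by game. Everything specific in it is an assumption for illustration: the function name roll_ratings, the tuple layout of the results, the team names and scores, and the use of plain goal difference as the metric p.

```python
from collections import defaultdict


def roll_ratings(results, start=10.0):
    """Roll through results in date order, updating both teams after each game.

    results: iterable of (team_a, team_b, goals_a, goals_b) tuples.
    Uses the simplified update throughout, with goal difference as p.
    """
    ratings = defaultdict(lambda: start)  # nominal starting value
    games = defaultdict(int)              # games played so far, per team
    for team_a, team_b, goals_a, goals_b in results:
        games[team_a] += 1                # count the game just played
        games[team_b] += 1
        p = goals_a - goals_b
        surprise = p - (ratings[team_a] - ratings[team_b])
        new_a = ratings[team_a] + surprise / (games[team_a] + 1)
        new_b = ratings[team_b] - surprise / (games[team_b] + 1)
        ratings[team_a], ratings[team_b] = new_a, new_b
    return dict(ratings)


# Invented results, purely to show the call.
results = [
    ("Reds", "Blues", 2, 0),
    ("Blues", "Greens", 1, 1),
    ("Greens", "Reds", 0, 3),
]
for team, rating in sorted(roll_ratings(results).items()):
    print(team, round(rating, 2))
```

With only three games the numbers mean very little, of course. The point is just the mechanics: feed in a couple of seasons of real results and read off the ratings at the end.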
You say this is simplified – but I’m a bit lost. Any chance of an example?
Say, Team A, with a rating of 10, beats Team B, with a rating of 8.
Let's say the score was 3 - 0, a little more than we might have predicted.
Let's also say that this was the tenth game of the season for each team, so NumA = NumB = 10...
so, using the last formula above, an approximated LS solution, gives
New Rating for Team A = 10 + (1 / 11) * ( 3 - (10 - 8))
New Rating for Team A = 10.09
and for Team B,
New Rating for Team B = 8 + (1 / 11) * ( -3 - (8-10))
New Rating for Team B = 7.91
All we have done here is find an easier way of using an LS solution that incorporates all games played since the start, with each game of equal value.
The answers from repeated use of the formulae from start to finish, will give pretty much the same as a pure LS solution used on all the games.
There are some differences though, and when using it as a predictive tool (later) the fact that the system is somewhat desensitized is a significant factor IMO.
Excellent – is that it then?
Well, believe it or not, I actually use yet another modification to the above formula!! It’s a very simple change - so that the formula can be used soundly over a specific number of games for each team - rather than use all the games in the database right from the start for working out each rating, but I'll leave that until next time.