Developing a Rating System, Part 3
Right – I think I’m ready to construct my own ratings system now. Show me a sound mathematical way to go about this.
Fine. Let’s take an example and try to keep it as simple as possible. We’ll look at football and use this very straightforward metric – that the ratings describe by how many goals one team is better than another.
Suppose we have three football teams, and two results so far.
Let’s say that Arsenal beat Birmingham 2-0, and that later Birmingham beat Charlton 3-0.
Well, I think there is an easy way to rate those results. Charlton seem the worst so let me give them a rating of 0. I’d then take the direct comparison between Birmingham and Charlton to give Birmingham 3 better than Charlton. Arsenal are 2 better than Birmingham. So I get,
Arsenal 5
Birmingham 3
Charlton 0
OK. Now, we could perhaps use our ratings to make a prediction that if Arsenal were to play Charlton then they would beat Charlton by five goals.
That sounds reasonable. After all, the metric we are using is that the ratings describe by how many goals one team is better than another.
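The chaining approach and the prediction fit in a few lines of Python. This is only a sketch: the helper names `chained_ratings` and `predicted_margin` are illustrative, and it assumes the results form a single chain ending at the team we rate 0, as they do here.

```python
# Results so far as (winner, loser, winning margin), in the order played.
chain = [("Arsenal", "Birmingham", 2), ("Birmingham", "Charlton", 3)]

def chained_ratings(chain, base_team):
    """Give base_team a rating of 0, then walk back up the chain:
    each winner is rated its margin above the team it beat.
    Assumes every loser has already been rated when we reach it."""
    ratings = {base_team: 0}
    for winner, loser, margin in reversed(chain):
        ratings[winner] = ratings[loser] + margin
    return ratings

def predicted_margin(ratings, team, opponent):
    """Predicted goals by which `team` beats `opponent` (negative means a defeat)."""
    return ratings[team] - ratings[opponent]

ratings = chained_ratings(chain, base_team="Charlton")
print(ratings)                                           # {'Charlton': 0, 'Birmingham': 3, 'Arsenal': 5}
print(predicted_margin(ratings, "Arsenal", "Charlton"))  # 5
```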
Now if Arsenal did subsequently beat Charlton by five goals, we could just keep this rating list, and perhaps use it again the next time these teams meet. Of course, and this is why we like football and sport in general, it’s unlikely that this will happen. What if Charlton actually beat Arsenal 1-0? How would you rate the teams now?
Hmmm. Now that’s a serious problem. OK, I give up…Show me the Maths!
It's just a little algebra and some matrix work, which can be done very easily on a spreadsheet even if you have forgotten (or never really knew) what matrices are... and you can skip this bit and still apply the method later.
Using Ra, Rb and Rc for the ratings of the three teams, and a metric that measures performance as the goal difference between the teams in each match, we can form the following equations...
Team A beat Team B by 2-0
Ra - Rb = 2
Team B beat Team C by 3-0
Rb - Rc = 3
Team C beat Team A by 1-0
-Ra + Rc = 1
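Turning results into equations is mechanical, so it is easy to automate once there are more than a handful of matches. A minimal sketch, assuming each result is stored as (team1, team2, goals1, goals2); the function name `match_equations` is hypothetical. Each match gives one row: +1 under the first-named team, -1 under its opponent, and the goal difference on the right-hand side.

```python
teams = ["Arsenal", "Birmingham", "Charlton"]  # column order: Ra, Rb, Rc
results = [
    ("Arsenal", "Birmingham", 2, 0),
    ("Birmingham", "Charlton", 3, 0),
    ("Charlton", "Arsenal", 1, 0),
]

def match_equations(teams, results):
    """One equation per match: +1 for team1, -1 for team2,
    right-hand side = team1's goals minus team2's goals."""
    rows, rhs = [], []
    for team1, team2, goals1, goals2 in results:
        row = [0] * len(teams)
        row[teams.index(team1)] = 1
        row[teams.index(team2)] = -1
        rows.append(row)
        rhs.append(goals1 - goals2)
    return rows, rhs

rows, rhs = match_equations(teams, results)
for row, diff in zip(rows, rhs):
    print(row, "=", diff)
# [1, -1, 0] = 2
# [0, 1, -1] = 3
# [-1, 0, 1] = 1
```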
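It's also necessary to fix the scale by choosing a base rating: every equation involves only differences between ratings, so adding the same number to all three ratings would satisfy them just as well. Of the many ways to do this, I prefer putting the rating of the last team on the list to zero. It’s what we did earlier as well.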
So there is this fourth equation,
Rc = 0
So there are 4 equations and only 3 unknown ratings... this is an over-determined system of equations, and as such there is generally no exact solution. In other words, no rating list can exactly reproduce all three results as described; it can't be done (full stop!). Anyhow, we can still find the best solution to this system of equations, the one that minimises the errors.
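A quick way to see that these particular results have no exact solution: add the three match equations together. Every rating cancels on the left-hand side, but the goal differences on the right do not, so the three results contradict each other whatever ratings we choose (and whatever value we fix Rc at).

$$
(R_a - R_b) + (R_b - R_c) + (-R_a + R_c) = 2 + 3 + 1 \quad\Longrightarrow\quad 0 = 6
$$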
The mathematician's preferred tool in this situation is a least-squares approximation, which is a little bit like drawing a line of best fit through a set of points that you suspect should lie on a straight line but don't.
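To make "minimises the errors" concrete: for any rating list, the error in a match is the predicted goal difference minus the actual one, and least squares chooses the ratings that make the sum of the squared errors as small as possible. A minimal Python sketch (the function name `sum_squared_errors` is just for illustration), scoring the chained list from earlier:

```python
# Each result: (first team, second team, goal difference from the first team's view).
results = [
    ("Arsenal", "Birmingham", 2),
    ("Birmingham", "Charlton", 3),
    ("Charlton", "Arsenal", 1),
]

def sum_squared_errors(ratings, results):
    """Add up (predicted goal difference - actual goal difference)^2 over all matches."""
    total = 0
    for team1, team2, goal_diff in results:
        predicted = ratings[team1] - ratings[team2]
        total += (predicted - goal_diff) ** 2
    return total

# The chained ratings fit the first two results exactly,
# but miss Charlton's 1-0 win over Arsenal by six goals.
first_attempt = {"Arsenal": 5, "Birmingham": 3, "Charlton": 0}
print(sum_squared_errors(first_attempt, results))  # 36
```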
Ah, least squares. I’ve heard of this.
Well, me too! I first read about applying this method to sports prediction about twenty years ago, in work by a mathematician called Stefani, and it’s his ideas that started me off down this road…
Right, where are all these matrices then… I’ve got my old maths textbook handy…
Here goes…
So, to start off, write the first three equations in matrix form.
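$$
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R_a \\ R_b \\ R_c \end{bmatrix}
=
\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}
$$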
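The 3-by-3 matrix on its own cannot give us the ratings. A quick NumPy check (a sketch; the variable names are ours) shows why the base rating in the fourth equation is needed: every row sums to zero, so adding the same number to all three ratings changes no predicted goal difference, and the matrix has rank 2 rather than 3.

```python
import numpy as np

# Coefficient matrix for (Ra, Rb, Rc), one row per match.
A3 = np.array([[1, -1, 0],
               [0, 1, -1],
               [-1, 0, 1]])

# Every row sums to zero, so shifting all ratings by the same amount
# (here by 1) changes none of the predicted goal differences.
print(A3 @ np.ones(3))            # [0. 0. 0.]

# Consequently only two of the three columns are independent.
print(np.linalg.matrix_rank(A3))  # 2
```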
But the fourth equation says Rc = 0, which gives
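$$
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R_a \\ R_b \\ 0 \end{bmatrix}
=
\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}
$$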
which effectively eliminates the last column of the first matrix (every entry in it is multiplied by 0) and the last row of the second matrix (that's matrix multiplication for you!!)
$$
\begin{bmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{bmatrix}
\begin{bmatrix} R_a \\ R_b \end{bmatrix}
=
\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}
$$
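The same elimination in NumPy, as a sketch: with Rc fixed at 0, dropping the Rc column leaves every product unchanged. The trial ratings below are arbitrary illustrative values, not results.

```python
import numpy as np

# Full coefficient matrix: columns are Ra, Rb, Rc.
A3 = np.array([[1, -1, 0],
               [0, 1, -1],
               [-1, 0, 1]])

# Fixing Rc = 0 means the last column is always multiplied by zero,
# so we can drop it and keep only the Ra and Rb columns.
A = A3[:, :2]
print(A)

# Check with arbitrary trial ratings (Ra, Rb), padded with Rc = 0.
trial = np.array([4.0, 2.0])
full = np.append(trial, 0.0)
print(np.array_equal(A3 @ full, A @ trial))  # True
```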
Now a bit of matrix algebra!!
It will be easier to write A for the first matrix; r for the second, a column matrix holding the ratings of Team A and Team B; and g for the third, a column matrix holding the goal difference in each match played.
So, we now have the much easier to write,
A.r = g
To find the least-squares solution, first pre-multiply each side by the transpose of A, which I'll call At:
At.A.r = At.g
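Strictly speaking, A.r = g has no exact solution here, so "pre-multiplying both sides" is really a shortcut. The same equation (the so-called normal equations) comes out of the standard least-squares argument: write the sum of squared errors S as a function of r and set its derivative to zero.

$$
\begin{aligned}
S(r) &= (A r - g)^{T}(A r - g) = r^{T}A^{T}A\,r - 2\,r^{T}A^{T}g + g^{T}g \\
\frac{\partial S}{\partial r} &= 2A^{T}A\,r - 2A^{T}g = 0 \\
\Longrightarrow\quad A^{T}A\,r &= A^{T}g
\end{aligned}
$$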
Now pre-multiply each side by the inverse of the 2-by-2 matrix At.A (it does have an inverse here), which I'll call inv(At.A):
inv(At.A).At.A.r = inv(At.A).At.g
giving
r = inv(At.A).At.g
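For this example the matrices are small enough to work through by hand, which is a useful sanity check on whatever the spreadsheet produces:

$$
A^{T}A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix},\qquad
A^{T}g = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$

$$
\operatorname{inv}(A^{T}A) = \frac{1}{3}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},\qquad
r = \frac{1}{3}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
$$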
So there it is!!! To get the ratings, just do that series of matrix calculations.
Any spreadsheet can crank that out. Put matrix A into some cells and find its transpose with the spreadsheet's built-in functions (in Excel, for example, TRANSPOSE, MMULT and MINVERSE handle the transpose, multiplication and inverse). Then, working from left to right in the final equation above, multiply the inverse of (A transpose times A) by A transpose, and multiply the result by g.
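For anyone who prefers code to a spreadsheet, here is the same sequence of steps in Python with NumPy: a sketch, not the author's own tool. `np.linalg.lstsq` is NumPy's built-in least-squares solver; it gives the same ratings without forming the inverse explicitly, which tends to be the more numerically robust choice as the number of teams and matches grows.

```python
import numpy as np

# Reduced system with Rc fixed at 0: columns are Ra, Rb.
A = np.array([[1, -1],
              [0, 1],
              [-1, 0]], dtype=float)
g = np.array([2, 3, 1], dtype=float)  # goal differences, one per match

# Same steps as the spreadsheet: transpose, multiply, invert, multiply.
At = A.T
r = np.linalg.inv(At @ A) @ At @ g
print(np.round(r, 6))  # [1. 1.]

# NumPy's least-squares solver should agree.
r_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(A, g, rcond=None)
print(np.allclose(r, r_lstsq))  # True
```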
I get this
Ra = 1
Rb = 1
and don't forget we put Rc = 0
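As a check, we can compare the match-by-match errors of this least-squares list with the chained list from the start. A short sketch (the helper name `match_errors` is illustrative):

```python
results = [
    ("Arsenal", "Birmingham", 2),
    ("Birmingham", "Charlton", 3),
    ("Charlton", "Arsenal", 1),
]
least_squares = {"Arsenal": 1, "Birmingham": 1, "Charlton": 0}
first_attempt = {"Arsenal": 5, "Birmingham": 3, "Charlton": 0}

def match_errors(ratings, results):
    """Predicted minus actual goal difference for each match."""
    return [(ratings[t1] - ratings[t2]) - diff for t1, t2, diff in results]

for name, ratings in [("least squares", least_squares), ("first attempt", first_attempt)]:
    errs = match_errors(ratings, results)
    print(name, errs, sum(e * e for e in errs))
# least squares [-2, -2, -2] 12
# first attempt [0, 0, -6] 36
```

The least-squares ratings spread the six-goal contradiction evenly, missing every result by two goals, and their total squared error (12) is a third of the chained list's (36).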
Yes – but I’m glad we can get the computer to do all this! Is that it then?
In principle, yes. But of course there are loads of refinements to add - and I think we should leave those for next time...
Read previous installments:
Part One | Part Two