Saturday, January 3, 2015

Explaining the Rating System (II)

Other sports have advantages over MMA as far as rating systems go. They may have hundreds of games a season with only a dozen or two teams, and they may have a balanced schedule, so that a team's record is a fairly good indication of its skill. A team may have many players contributing to its success, so that its skill level stays relatively constant over time even when it loses or gains players through injuries or trades. Many games are decided by the accumulation of points, so "worse" teams may win a few battles but ultimately lose the war.

Contrast that with MMA, where everyone has a puncher's chance, where "styles make fights", and, in the extreme case, where "worse" fighters can lay-and-pray or wall-and-stall to a decision win. Skills can vary greatly as fighters learn new techniques or lose their chins. And there are thousands upon thousands of fighters, with only a fraction having more than a dozen or two fights in their whole careers.

As motivation, let me talk about the simple case where two fighters fight each other a bunch of times. Suppose one of them wins s times out of n fights. What's the best guess at the probability that he wins the next fight? You might want to say it's just his win percentage s/n, but this would mean, for example, that a fighter who was 1-0 would have a 100% chance of winning the next fight. That's not very reasonable.

Laplace's rule of succession says that the best guess is (s+1)/(n+2). In the 1-0 case this is (1+1)/(1+2) = 2/3 or 67%. In general, this means that we add two fights to a fighter's record, one win and one loss. So if a fighter wins 100 fights in a row, his chance of winning the 101st is 101/102 = 99.02%.
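As a quick sanity check (a minimal sketch of the formula above, not CCR's actual code), the rule is one line:

```python
def rule_of_succession(wins, fights):
    """Laplace's rule of succession: estimate the win probability after
    adding one win and one loss to the record."""
    return (wins + 1) / (fights + 2)

print(rule_of_succession(1, 1))      # 1-0 fighter: 2/3
print(rule_of_succession(100, 100))  # 100-0 fighter: 101/102, about 0.9902
```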

CCR uses the Bradley-Terry framework, which says that the likelihood that fighter i beats fighter j is Prob(i beats j) = p_i/(p_i + p_j), where p_i and p_j are the fighters' strengths. Often in this setup the strengths are converted to ratings by setting p_i = c^r_i, where c is a constant (for example, c = 10^(1/400) in the Elo rating system). This way the probability depends on the difference of the two ratings rather than the ratio of the two strengths, and differences are often easier to work with and understand intuitively.
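To make that concrete, here's a small sketch (the function names are mine, not CCR's) showing that with the Elo constant c = 10^(1/400), a 400-point rating gap works out to 10:1 odds whether you compute it from strengths or from ratings:

```python
def prob_from_strengths(p_i, p_j):
    """Bradley-Terry: Prob(i beats j) from the raw strengths."""
    return p_i / (p_i + p_j)

def prob_from_ratings(r_i, r_j, c=10 ** (1 / 400)):
    """The same probability from ratings, with p = c**r.
    Note it depends only on the difference r_i - r_j."""
    return 1.0 / (1.0 + c ** (r_j - r_i))

# A 400-point gap at the Elo constant means strengths in a 10:1 ratio,
# so the favorite wins with probability 10/11, about 0.909.
print(prob_from_ratings(1600, 1200))
print(prob_from_strengths(10, 1))
```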

CCR tries to find the most likely set of ratings given the fight outcomes. Bayes' theorem says that Prob(ratings | results) = Prob(results | ratings) * Prob(ratings) / Prob(results). Since the results are fixed, Prob(results) is constant, so maximizing the left-hand side (the probability of the ratings given the fight results) is the same as maximizing Prob(results | ratings) * Prob(ratings). Prob(results | ratings), the probability of the fight results given the ratings, is just the product of Prob(i beats j) over all fights where fighter i beat fighter j. And Prob(ratings), called the "prior", is where the assumption about the extra win and loss comes in. That is, Prob(ratings) is equal to the product of Prob(i beats 0)*Prob(0 beats i) over all fighters i, where fighter 0 is the "average fighter" (or "dummy") with fixed rating r_0 (usually 0).
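As a concrete sketch of the quantity being maximized (my illustration, not CCR's code, taking c = e so that p_i = exp(r_i)):

```python
import math

def log_posterior(ratings, fights, dummy_rating=0.0):
    """Log of Prob(results | ratings) * Prob(ratings) for the Bradley-Terry
    model with p = exp(r). `ratings` maps fighter -> rating; `fights` is a
    list of (winner, loser) pairs. Maximizing this over the ratings gives
    the most likely ratings given the results."""
    def prob(r_i, r_j):  # Prob(i beats j)
        return 1.0 / (1.0 + math.exp(r_j - r_i))

    # Likelihood: product of Prob(i beats j) over all fights (sum of logs).
    loglik = sum(math.log(prob(ratings[w], ratings[l])) for w, l in fights)
    # Prior: each fighter gets one win and one loss against the dummy.
    logprior = sum(math.log(prob(r, dummy_rating)) + math.log(prob(dummy_rating, r))
                   for r in ratings.values())
    return loglik + logprior
```

With one fight between 'a' and 'b', ratings that favor the actual winner score higher, while the prior pulls everyone toward the dummy's rating.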

Solving for the maximum produces a decent set of ratings. (** I'm leaving out one additional detail which I'll discuss in a few paragraphs.) But there were some anomalies. There were some... can crushers. Not that there's anything wrong with that... Some undefeated guys seemed to be rated a little too high, or maybe some veterans seemed to be rated a little too low. Not a huge deal, but enough to make me take another look at the prior.

If MMA has an advantage over other sports from a rating perspective, it's in the match-making. Fights are generally between fighters of comparable skill, so just knowing the match-ups (and not the results) tells us something about how the ratings should cluster. The prior used above corresponds to a win and a loss against the average fighter, but the rule of succession was meant for an average opponent (one typical of those a given fighter actually faces). Thus, to account for match-making, CCR divides each fighter's extra win and loss equally among their opponents, including the two dummies.

(Dividing it only among their real opponents can lead to unreasonable ratings. For example, suppose a person had only one fight that was a loss to Jon Jones. Jones currently has a rating of 102 and the "average fighter" has a rating of 32. Let's take those to be fixed values for the moment. The original prior corresponded to a win and loss to the dummy, meaning a total record of 0-1 against Jones and 1-1 against the dummy. This translates to a rating of 31.98 -- he doesn't drop much from the Jones loss because he was expected to lose badly. The prior corresponding to a win and loss divided equally among his real opponents means a record of 1-2 against Jones. This translates to a rating of 95, which is good enough to be #6 on the pound-for-pound list! Finally, the prior with the extra win and loss divided equally among his three opponents (Jones and the two dummies) means a record of 0.333-1.333 against Jones and 0.667-0.667 against the dummy. This translates to a rating of 43, which seems to be a reasonable compromise between the other two numbers. (Anthony Pina's actual rating is 43.))
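The three numbers above can be reproduced with a short script (my own sketch, not CCR's code; it uses the scale described in the notes below, where ten rating points equal one natural-log unit). The derivative of the log-likelihood is monotone decreasing in the rating, so simple bisection finds the maximum:

```python
import math

def win_prob(r_i, r_j, scale=10.0):
    """Bradley-Terry win probability on a scale of 10 rating points per
    natural-log unit (so a 1-point edge is roughly 52.5%)."""
    return 1.0 / (1.0 + math.exp((r_j - r_i) / scale))

def solve_rating(records, lo=-200.0, hi=200.0, scale=10.0):
    """Find the rating maximizing the log-likelihood of `records`, a list
    of (opponent_rating, wins, losses) with opponent ratings held fixed.
    Wins and losses may be fractional. Bisection on the derivative works
    because the derivative is monotone decreasing."""
    def dloglik(r):
        total = 0.0
        for opp, w, l in records:
            p = win_prob(r, opp, scale)
            total += (w * (1 - p) - l * p) / scale
        return total
    for _ in range(100):
        mid = (lo + hi) / 2
        if dloglik(mid) > 0:
            lo = mid  # derivative still positive: maximum is above mid
        else:
            hi = mid
    return (lo + hi) / 2

JONES, DUMMY = 102.0, 32.0
# Original prior: 0-1 vs Jones plus 1-1 vs the dummy
r1 = solve_rating([(JONES, 0, 1), (DUMMY, 1, 1)])
# Extra win/loss split among real opponents only: 1-2 vs Jones
r2 = solve_rating([(JONES, 1, 2)])
# Split among Jones and the two dummies: 1/3-4/3 vs Jones, 2/3-2/3 vs dummy
r3 = solve_rating([(JONES, 1 / 3, 4 / 3), (DUMMY, 2 / 3, 2 / 3)])
print(round(r1, 2), round(r2), round(r3))  # about 31.98, 95, 43
```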

** The additional detail: CCR also accounts for a fighter's skill changing through time. Instead of calculating one rating, it calculates a fighter's rating at the time of each fight. Those ratings are connected by a Wiener process (no joke), which depends on a parameter w. If w is very small, the ratings are nearly constant, meaning a fighter's skill is treated as constant over their whole career. If w is very large, the ratings are nearly disconnected from each other, meaning a fighter's rating is determined by only their last fight. So it's about finding the right balance between the two. Ultimately there's a range of reasonable values, but I settled on w = 6 points/sqrt(year). (This approach is called Whole-History Rating, and Rémi Coulom's paper has more details, including the algorithm used to determine the ratings.)
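As a sketch of how that connection might enter the objective (my illustration of the standard Wiener-process prior, not the WHR paper's code): each fighter's consecutive ratings contribute a Gaussian log-penalty whose variance grows linearly with the time between fights.

```python
def wiener_log_prior(ratings_by_time, w=6.0):
    """Log-density (up to additive constants) linking one fighter's ratings
    at successive fight dates via a Wiener process: the change between two
    fights is Gaussian with variance w**2 * (elapsed years).
    `ratings_by_time` is a list of (time_in_years, rating) pairs."""
    total = 0.0
    for (t1, r1), (t2, r2) in zip(ratings_by_time, ratings_by_time[1:]):
        var = w * w * (t2 - t1)
        total += -((r2 - r1) ** 2) / (2 * var)
    return total

# A constant rating costs nothing; with w = 6, changing by 6 points over
# one year costs half a log unit.
print(wiener_log_prior([(0.0, 50.0), (1.0, 50.0)]))  # 0.0
print(wiener_log_prior([(0.0, 50.0), (1.0, 56.0)]))  # -0.5
```

Small w makes any change expensive (near-constant ratings); large w makes changes nearly free (each fight stands alone), matching the two extremes described above.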

A few other notes: CCR ignores no contests, disqualifications and fights against unknown opponents, but it includes amateur and TUF fights (since those give some indication of a fighter's skill). It counts a draw as 1/2 win and 1/2 loss and a split decision as 2/3 win and 1/3 loss. It scales the natural ratings by a factor of ten and rounds them, because rating differences smaller than that are negligible. (A one point rating difference translates to a win probability of 52.5%-47.5% or moneyline of ±110.)
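The one-point figure is easy to check (assuming, as above, that ten rating points correspond to one natural-log unit):

```python
import math

# One rating point is a tenth of a natural-log unit, so a one-point edge
# translates to a win probability of 1/(1 + e^(-0.1)).
p = 1 / (1 + math.exp(-0.1))
print(round(p, 3))  # 0.525

# For comparison, a -110 moneyline favorite has implied probability 110/210.
print(round(110 / 210, 3))  # 0.524
```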

Finally, the growth of MMA has led to ratings inflation. To adjust for this, CCR sets the rating of the #100 active fighter to 75. (A fighter is considered inactive if they haven't fought in the past 18 months.) The idea is that we can compare fighters of different eras by looking at how much better they were than their contemporaries. The next post has some graphs that should illustrate this.

The nice thing about CCR compared to Fight Matrix or ScoreCardMMA is that it doesn't depend on arbitrary formulas or parameters. Every dynamic rating system will have a parameter (like w) that controls how quickly the ratings change. But beyond that, there's nothing to tweak with CCR. The ratings are what they are, like it or not.
