HEMA Ratings Beta Released

by Meg Floyd


Petter Brodin and Markus Koivisto have finally released a much-anticipated beta of their HEMA Ratings system, which ranks fighters globally according to the submitted statistics of several events, dating all the way back to Swordfish 2011.

Fighters are ranked by weapon and tournament. Currently the system has data for the following weapons: steel longsword (open and women’s), rapier and dagger, saber, sword and buckler, and sidesword. If some of the ratings seem a bit off for American fencers, keep in mind the data for Longpoint 2016 and Longpoint South are missing, which will likely bump everyone around some.

Fighters are ranked using a number generated by the Glicko-2 algorithm, a rating system for estimating players’ strengths in games of skill, which you can read about in detail here. It’s also notably used for chess ratings and online game servers.

How does it work? The About page says, “The key assumptions at work here are the following:

The performance of each player in each match is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, we assume that the mean value of the performances of any given player changes only slowly over time.

Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.

Suppose two players, both rated 1700, played a tournament game with the first player defeating the second. Suppose that the first player had just returned to tournament play after many years, while the second player plays every weekend. In this situation, the first player’s rating of 1700 is not a very reliable measure of his strength, while the second player’s rating of 1700 is much more trustworthy. Our intuition tells us that

– (1) the first player’s rating should increase by a large amount (more than 16 points) because his rating of 1700 is not believable in the first place, and that defeating a player with a fairly precise rating of 1700 is reasonable evidence that his strength is probably much higher than 1700, and

– (2) the second player’s rating should decrease by a small amount because his rating is already precisely measured to be near 1700, and that he loses to a player whose rating cannot be trusted, so that very little information about his own playing strength has been learned.”
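To make the quoted intuition concrete, here is a rough sketch of a single-game update using the simpler Glicko-1 formulas (not the full Glicko-2 algorithm, and not HEMA Ratings' actual code). The rating deviation (RD) values of 300 for the returning player and 50 for the regular are assumptions chosen purely for illustration; a larger RD means a less trustworthy rating.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def g(rd):
    """Discount factor: the less certain the opponent's rating, the less a result counts."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def update(rating, rd, opp_rating, opp_rd, score):
    """Glicko-1 update after one game; score is 1 for a win, 0.5 draw, 0 loss."""
    g_opp = g(opp_rd)
    expected = 1 / (1 + 10 ** (-g_opp * (rating - opp_rating) / 400))
    d2_inv = Q**2 * g_opp**2 * expected * (1 - expected)  # information gained from the game
    precision = 1 / rd**2 + d2_inv
    new_rating = rating + (Q / precision) * g_opp * (score - expected)
    new_rd = math.sqrt(1 / precision)
    return new_rating, new_rd


# Hypothetical RDs: player A is returning after years away, player B competes weekly.
new_a, new_rd_a = update(1700, 300, 1700, 50, score=1)  # A wins
new_b, new_rd_b = update(1700, 50, 1700, 300, score=0)  # B loses

print(f"A gains {new_a - 1700:.1f} points, RD {new_rd_a:.0f}")
print(f"B loses {1700 - new_b:.1f} points, RD {new_rd_b:.0f}")
```

Under these assumed numbers, the returning player's rating jumps by far more than 16 points while the regular's drops only slightly, matching points (1) and (2) above.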

Brodin said in a recent Facebook post that there are plans to add search functionality and profile pages for each fighter, along with other features.

If you’re a tournament organizer and would like to submit your event, please use the Contact Page. For a full list of events used in the data set, see the Events Page.


