Statistical Aggregation of WFTDA Sanctioned Roller Derby

WFTDA Rank Comparisons

July 23rd, 2008

The new WFTDA ranks are now up.  They’re available on the WFTDA History page, where you can see how the new ranks compare to last quarter, and on the WFTDA Compare page, where you can see how they compare with the FTS ranks.

Fixed a bug on the WFTDA Compare page where Low Participation teams were throwing off the difference arrows for a couple of teams (thanks Southbay for the find).

Scrapping Divisions and Active Status

July 14th, 2008

We’re no longer tracking a team’s Division or Active/Inactive status.  They were purely informational aside from the ability to filter out inactive teams on a few pages.  We never had an up-to-date, reliable source for that info - and while we were getting lots of updates from teams on their current status (thanks!), it’s hardly a meaningful metric if just as many teams are not keeping us up to date.

Furthermore I’m totally confused on what Divisions even mean to WFTDA anymore, or if they’ve tossed the thing themselves.  So meh, we’re letting it go.

Low Participators As Opponents

July 2nd, 2008

When we introduced the low participation rankings, we didn’t address what happens when you play against one of these teams.  We’ve never actually penalized the team’s ratings or what it means to play them.  We just separate them to indicate that comparing them to other teams should be done cautiously as they’ve only played one or two bouts.  Since an opponent’s skill is determined by their rating, this could potentially be a problem for the opponent of one of these teams.  For example, a team with a winning record (and only two bouts) may be placed higher than its future games will bear out.  Unchecked, another team’s win against this team would give more points than it should, and so on, with the effect rippling outward.

The goal is to keep dramatic results in check for those that play a low participator.  But not so much so that it’s not worth playing one.

We now rate newbies with 2 or fewer games at the lowest rating for that quarter.  We don’t change their actual rating, however; we only penalize what it means to play one of these teams.  It doesn’t seem necessary to penalize their visible rating since we segregate them into a Low Participation section.  If you believe the two games played were indicative of their games to come, you can more or less interpret that rating at face value.  Otherwise you’re just waiting for more games to play out, and a more accurate reading to surface.

Another question to address: Should veteran low participators be docked?  These aren’t newbies entering from the bottom of the ranks - they fell from above because they aren’t keeping up the number of games they play in a year.  We decided no, they won’t be docked.  Since we never change the calculated rating when a team falls into Low Participation, they’ll enter with their historically calculated rating.  The potential penalty of only two games on record is hopefully enough.  We don’t think you can penalize the veterans in the same way as newbies, because these are likely to be good or at least experienced teams, and if they aren’t represented as such in the system, outcomes will be treated as unexpected when they are not.
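To make the newbie/veteran distinction concrete, here is a minimal Python sketch of the rule described above.  It is not the site’s actual code: the `Team` fields, the `is_veteran` flag, and the `rating_as_opponent` name are all made up for illustration, and the lowest rating for the quarter is simply passed in.

```python
from dataclasses import dataclass

LOW_PARTICIPATION_MAX_GAMES = 2  # "2 or fewer games", per the post


@dataclass
class Team:
    name: str
    rating: float         # calculated (visible) rating; never modified here
    games_on_record: int  # bouts currently counted for this team
    is_veteran: bool      # hypothetical flag: fell into Low Participation from above


def rating_as_opponent(team: Team, lowest_rating: float) -> float:
    """Rating to use when scoring someone else's result against `team`."""
    is_low = team.games_on_record <= LOW_PARTICIPATION_MAX_GAMES
    if is_low and not team.is_veteran:
        # Newbie low participator: treated as the lowest-rated opponent.
        return lowest_rating
    # Regular teams and veteran low participators keep their calculated rating.
    return team.rating
```

The visible rating stays untouched in every case; only the value an opponent is scored against changes.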

Bout Decay

June 27th, 2008

The system now only considers bouts played within the 365 days before today’s date.  We experimented with a smooth falloff, where bouts were counted fully for maybe 6 months, followed by another 6 months of linear falloff.  This would make a bout played 9 months ago worth half its actual weighting.  This would be good because it makes more recent bouts worth more, which is definitely intuitive and nicely corrects for rematches by weighting the more recent bout higher.

But we’re doing on-demand rankings - rankings that people check up on frequently.  And having a smooth falloff period would make the ratings change slightly every day as the transition period slides over a team’s bout record.  I think this would make for very poor readability, as there would be up and down arrows all over the place indicating slight changes as the oldest bouts slowly fade out of existence.  For example: if today a bout is weighted .5 because it’s halfway through a six-month falloff period, tomorrow it will be about .495.  And in two days it will be worth about .489.  So a team’s rating will constantly be in a state of transition, which may be a nice model, but it also throws a wrench in what it means to watch teams change every week.

So instead we have a sharp cutoff at 365 days, and every bout within 365 days is weighted equally.  A year after a bout is played, the bout will disappear, and a team will move accordingly.  This could still be confusing, and to the uninitiated it won’t make sense why a team moved without playing a game.  We’ll have to come up with some way of making that situation clear.
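The two time-based weightings above can be sketched in a few lines of Python.  This is only an illustration under assumptions: the function names are invented, and "maybe 6 months" is pinned to 183 days so that the linear part runs over the remaining 182 days.

```python
from datetime import date

WINDOW_DAYS = 365
FULL_WEIGHT_DAYS = 183  # assumed length of the "counted fully" period


def cutoff_weight(bout_date: date, today: date) -> float:
    """Sharp cutoff: every bout inside the window counts fully."""
    age = (today - bout_date).days
    return 1.0 if 0 <= age <= WINDOW_DAYS else 0.0


def smooth_weight(bout_date: date, today: date) -> float:
    """Smooth falloff: full weight at first, then a linear fade to zero."""
    age = (today - bout_date).days
    if age < 0 or age >= WINDOW_DAYS:
        return 0.0
    if age <= FULL_WEIGHT_DAYS:
        return 1.0
    return (WINDOW_DAYS - age) / (WINDOW_DAYS - FULL_WEIGHT_DAYS)
```

With `cutoff_weight`, a bout’s contribution only changes on the single day it leaves the window.  With `smooth_weight`, every bout older than the full-weight period gets a slightly different value each day, which is exactly the day-to-day drift we wanted to avoid.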

There are a couple of hacks for this issue:
1) You could wait until the team plays a new bout before calculating how their old bouts have decayed.  This would be like accruing interest and paying up when you play again.  The major issue here is that when you see how a bout affects a rating, you’d be seeing two factors at once: that bout’s outcome, and the sum of all the decay accrued since their last bout.  That sounds pretty undesirable to me because it would make watching teams rise and fall unintuitive.  At least now, when something blinks out of existence it will (probably) happen on some random day of the week so you’ll see it isolated and understand its effect.

2) You could decay bouts as a function of number of games played instead of time.  So if the 3rd oldest bout is worth 50%, everyone’s 3rd oldest bout is worth 50%, whether that bout was in the last month or one year ago.  If everyone played relatively even schedules this is probably the best compromise - since it’s based on bout number and not time, you don’t have to worry about teams floating day to day.  I’m pretty sure this is the basic concept behind David Dyte’s decay [http://www.covehurst.net/ddyte/derby/dd-ratings.html].
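As a rough Python sketch of the second hack, the weight here depends only on a bout’s position in the team’s history and never on its date.  The weight table is made up, positions are counted from the most recent bout (an assumption), and this is not meant to reproduce David Dyte’s actual formula.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical weights by position, newest bout first.
POSITION_WEIGHTS = [1.0, 1.0, 0.75, 0.5, 0.25]


def position_weights(bout_dates: List[date]) -> List[Tuple[date, float]]:
    """Pair each bout date with a weight based only on its recency position."""
    ordered = sorted(bout_dates, reverse=True)  # newest first
    weighted = []
    for position, bout_date in enumerate(ordered):
        in_table = position < len(POSITION_WEIGHTS)
        weighted.append((bout_date, POSITION_WEIGHTS[position] if in_table else 0.0))
    return weighted
```

Two teams with the same number of bouts get the same sequence of weights no matter how their schedules are spread across the calendar, and nothing changes until a team plays again.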

The following infographic hopefully helps to illustrate these systems.  The first would be our ideal model.  The second is our compromised choice.  The third is a different approach as a function of bouts instead of time.

I think the smooth, time-dependent falloff is most viable if you’re doing end-of-season rankings or rankings at some other milestone: if we were calculating rankings every quarter, this would be the choice.  This problem is in need of a more elegant solution.

Upcoming and Unofficial Bouts

May 18th, 2008

So we’re now tracking upcoming (official) bouts for a league, as well as their unofficial bouts [more info on what official/unofficial means].  The unofficial ones were always in the database, but we’ve kept them hidden because we knew we didn’t want them affecting rankings, and that we likely didn’t want them affecting stats.  They still affect neither, but we’re showing them now on a team’s page because we believe it helps paint a more complete picture of a team you’re studying.  What could make this better is to see exactly why the bout is unofficial.  Right now the vast majority of them are mixed bouts (travel vs local), but there are some that are unofficial because one of the teams wasn’t a WFTDA member on the bout’s date.

Upcoming bouts, originally, were helping us keep up on entering new scores every weekend.  They’re going public though because I really think they help contextualize the rankings.  When I see a team that seems out of place, the first thing I look to after seeing their bouting history is who they’re playing next.  For teams that are dropping for their lack of bouting, it’s reassuring to see what they have coming up.  As well as on a team’s page, you can find all upcoming bouts we have recorded on the Upcoming Bouts page.

WFTDA History

April 23rd, 2008

Added WFTDA’s new rankings to this page and made accessible all the old WFTDA rankings back to 2006 Q4.

I’ve been trying to do this site with the least amount of extra browser junk (i.e. Flash). I’m not sure how long I can keep that up as we move into some richer visualizations that I have in mind. But for now, Javascript (+jQuery) has been holding up quite well. jQuery is pulling off a scroll effect here that I quite like, and I’m loading old rankings on demand to keep the first page load as quick as possible. After the AJAX call, the ranking differences are calculated client-side - the same method as other parts of the site. I’ve kept those calculations client-side because I imagine an interface in the future where you can arbitrarily hold up any set of rankings from any date and have those difference arrows be calculated automatically.

With all this processing happening in the browser, I’m hoping I’m not making this site too clunky on old computers and browsers…

Teams By Join Date

April 19th, 2008

This new section lists teams by the date that they joined WFTDA. This helps us nail down an aspect of what makes a bout official to our rankings: both teams had to have been WFTDA approved on the day they played for us to count it.

For original WFTDA leagues, we’re counting all 2005 bouts as official. So with January 1st, 2005 as the cutoff, we’re demoting the four 2004 bouts we have on record to unofficial. Hopefully those bouts are ancient enough that there’s not much controversy in that - my understanding is that these bouts were before any real standardization anyway.
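The membership part of the "official" test can be sketched as below.  This is a hedged illustration, not our actual code: the team names and the `is_official` helper are hypothetical, original leagues are modeled simply as having joined on January 1st, 2005, and other reasons a bout can be unofficial (like mixed travel vs local bouts) are ignored.

```python
from datetime import date
from typing import Dict

ORIGINAL_LEAGUE_CUTOFF = date(2005, 1, 1)


def is_official(bout_date: date, home: str, away: str,
                join_dates: Dict[str, date]) -> bool:
    """True if both teams were WFTDA members on the day of the bout."""
    for team in (home, away):
        joined = join_dates.get(team)
        if joined is None or bout_date < joined:
            return False
    return True


# Hypothetical join dates; original leagues get the 2005 cutoff.
join_dates = {
    "Original A": ORIGINAL_LEAGUE_CUTOFF,
    "Original B": ORIGINAL_LEAGUE_CUTOFF,
    "Newcomer": date(2007, 6, 1),
}
```

Under these assumptions a 2004 bout between two original leagues comes back unofficial, while the same matchup in 2005 counts.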

That means those four disappear for now until we get an Unofficial Bouts pane happening for each team.

Dev Blog

April 9th, 2008

I’ve added this new blog to act as the site’s changelog and developer blog. I’ve always been hesitant to spam the front page with every new thing we add, so that’s what this is for. If you want the slightly more technical news of any and everything that’s new, subscribe to this blog’s feed.

I’ve gotten rid of the Algorithm Changelog, which I’ve merged into this blog under its own category.

Also, I back filled the last couple weeks of additions/changes.

FTS Logical Errors

April 8th, 2008

Added a new section that identifies logical errors in our rankings:

  • A win against a team that is above them in ranking
  • A loss against a team that is below them in ranking

We had identified a lot of these just by working closely with the rankings in the past, but seeing them all together like this really helps us see where the rankings need to be tweaked, and which errors are basically inevitable and unsolvable.
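Finding these errors is simple to sketch in Python.  Both bullet points describe the same bout seen from either side, so one check covers them.  The `Bout` type and `logical_errors` function are illustrative names only, and rank 1 is assumed to be the top of the rankings.

```python
from typing import Dict, List, NamedTuple


class Bout(NamedTuple):
    winner: str
    loser: str


def logical_errors(ranks: Dict[str, int], bouts: List[Bout]) -> List[Bout]:
    """Bouts where the winner is ranked below (a larger number than) the loser."""
    return [b for b in bouts if ranks[b.winner] > ranks[b.loser]]


# Hypothetical three-team cycle: A beat B, B beat C, C beat A.
ranks = {"A": 1, "B": 2, "C": 3}
bouts = [Bout("A", "B"), Bout("B", "C"), Bout("C", "A")]
errors = logical_errors(ranks, bouts)
```

A cycle like this one is the unsolvable kind: whichever way the three teams are ordered, at least one of the three bouts gets flagged.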

Hopefully it also helps people understand more about the reality of ranking systems by pointing out these problem areas.

Last Week’s Rankings

April 7th, 2008

Added a third column to Today’s Ranks showing the state of the rankings at the end of last weekend. Now that we’re at the end of the quarter, and most teams have played this quarter, it was getting hard to see who had actually played recently. Arrows show changes from the chart to its left, so arrows in Last Week’s chart show changes since last quarter, and arrows in Today’s Rankings only show changes compared to last week.
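The "compare with the chart to its left" rule can be sketched as follows.  The site does this in the browser with Javascript; the Python below is just an illustration with hypothetical names and made-up rankings, where a positive number means a team moved up.

```python
from typing import Dict, List, Optional


def rank_changes(current: List[str], previous: List[str]) -> Dict[str, Optional[int]]:
    """Places gained (+) or lost (-) by each team relative to `previous`."""
    prev_pos = {team: i for i, team in enumerate(previous)}
    changes: Dict[str, Optional[int]] = {}
    for i, team in enumerate(current):
        # None marks a team that was not in the previous chart at all.
        changes[team] = prev_pos[team] - i if team in prev_pos else None
    return changes


# Hypothetical columns, ordered left to right on the page.
last_quarter = ["A", "B", "C"]
last_week = ["B", "A", "C"]
today = ["B", "C", "A"]

last_week_arrows = rank_changes(last_week, last_quarter)
today_arrows = rank_changes(today, last_week)
```

Each column is only ever compared with its left-hand neighbor, so a team that moved before last weekend and has been idle since shows no arrow in Today’s Rankings.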