Updates stopped, full archive available

In case this isn’t clear enough, I’ve stopped the update script for the website; it won’t update stats or graphs anymore. My free time has been running thin for a while, and it has become increasingly difficult to monitor the website and correct mistakes in data reading when they occurred – which was too often, since I had to use proxies to read data from GameSpy’s servers. Also, my solution ran on a shared hosting service, which didn’t have the CPU cycles available to do updates or generate graphics as well as I wanted. Instead of letting it run indefinitely and grab data that’s more and more unreliable, I decided to just shut it down and save the time for other things.

For what it’s worth, though, I’m making the data I’ve gathered over the past 4 years or so available to anyone. The files are dumps of the complete tables, in CSV format. You can open them in any tool that supports CSV files, such as Microsoft Excel or OpenOffice Calc. You will also need 7-Zip to uncompress the .7z files.

This data is not for the faint of heart. Do not expect to open it in a spreadsheet and make a beautiful chart; it won’t work. For one thing, the data is pretty massive. For another, there’s no time consistency between the data rows, since I had to resort to occasionally averaging data to make the tables smaller. Read below for more information.

  • games: the list of games as they appear on GameSpy’s site, plus each game’s id (the same as GameSpy’s list id), developer name, and short title (used on the mods graph). All games found on GameSpy’s top 40 list were automatically added to this list (and subsequently to the stats lists). If a game had the read_mods field on, it was also read from GameSpy’s website even if it didn’t make it into the top 40 games.
  • games_stats: the game stats. Contains the number of players and servers. The year/month/day/hour fields contain the time the data was gathered. However, if, when adding the last hour (23), the table already had 23 other stats for that game on that same year/month/day (that is, the game had statistics for all hours of a given day), the script would average them all (set the statistics as “day”) and set the hour to “0”. It’s a bit ambiguous (hour “0” can mean either the averaged stats or the stats for hour 0), but it worked. In theory, if the data were 100% consistent, you’d never find any value other than 0 in the “hour” field, since everything would be averaged eventually. But since the script does not average the data when fewer than 24 samples are available for a day (for example, when the server was down, when GameSpy was down, or when the script timed out before adding), it would sometimes leave the hourly rows alone. They’re still averaged by the graph generation script; this is mainly a database thing, meant to make the table smaller and faster to read.
  • mods: the list of mods, similar to the games one. Their id is internal. Also added automatically as new mods were found.
  • mods_stats: same as games_stats, but for mods.
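
The daily averaging described for games_stats can be sketched roughly as below. This is only an illustration, not the original script: it collapses days after the fact instead of at the moment hour 23 is added, and the names game_id, players and servers are assumptions about the columns (only year/month/day/hour are named above).

```python
from collections import defaultdict

def collapse_full_days(rows):
    """Replace each day that has all 24 hourly samples with one averaged row at hour 0.

    Each row is a dict; the keys game_id, players and servers are assumed names.
    Days with fewer than 24 distinct hours are left untouched.
    """
    by_day = defaultdict(list)
    for row in rows:
        key = (row["game_id"], row["year"], row["month"], row["day"])
        by_day[key].append(row)

    result = []
    for key in sorted(by_day):
        samples = sorted(by_day[key], key=lambda r: r["hour"])
        if {r["hour"] for r in samples} == set(range(24)):
            game_id, year, month, day = key
            result.append({
                "game_id": game_id, "year": year, "month": month, "day": day,
                "hour": 0,  # 0 now means "daily average", hence the ambiguity
                "players": sum(r["players"] for r in samples) / len(samples),
                "servers": sum(r["servers"] for r in samples) / len(samples),
            })
        else:
            result.extend(samples)  # incomplete day: keep the hourly rows
    return result
```

Anyone reading the dumps has to deal with the same ambiguity: a row with hour 0 is a daily average only if it is the sole row for that game and day.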

In the CSV files, field values are separated by semicolons and enclosed in quotes. The first line contains the field names.
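
For anyone who would rather script it than use a spreadsheet, here is a minimal sketch of reading that format with Python’s standard csv module. The column names and values in the sample are made up for illustration; the real ones are in the first line of each dump.

```python
import csv
import io

def read_dump(fileobj):
    """Read a semicolon-separated, quote-enclosed dump into a list of dicts.

    The first line is used as the field names. All values come back as strings.
    """
    reader = csv.DictReader(fileobj, delimiter=";", quotechar='"')
    return list(reader)

# Tiny in-memory sample; the header and the row are hypothetical.
sample = io.StringIO(
    '"id";"year";"month";"day";"hour";"players";"servers"\n'
    '"1";"2006";"6";"19";"0";"1200";"300"\n'
)
rows = read_dump(sample)
print(rows[0]["players"])
```

With a real file, the same function would be called on something like open("games_stats.csv", newline="") (the file name is an assumption). Since the tables are large, iterating over the reader row by row instead of building a list may be the better choice.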

Problematic ranges

Due to the problems that occurred a few times when reading data, some of the data is ignored when making the graphs and is simulated instead: the graph-making script simply ignores these days and interpolates values from the nearest available data (previously it’d simply plot the data as available, making it look weird, but I changed it later and made it bypass these days). The list of “problematic days” (hard-coded into the graphic plotting script) is as follows:

  • 20/june/2006 to 28/june/2006 (all data): on these days, the script didn’t really work because (apparently) it was blocked by the GameSpy server. Data for these days isn’t available.
  • 12/october/2006 to 13/october/2006 (mods): I introduced a feature that allowed the mods of the top 20 games to be read automatically (previously it would only read the mods of a few selected games). However, this created a bug in which the games that were set to always have their mod data read (i.e., HL1, HL2) would simply ignore the mod data. That is to say, the mod data on these couple of days is lost for many games, so the data is instead simulated for all mods.
  • 6/february/2007 to 13/february/2007 (all data): GameSpy had some kind of crash on their stats, which stopped working on these days, giving either cached results (on the games pages) or no result at all (on the mods pages). Data for these days is more or less available but is corrupted and should not be used.
  • 7/april/2007 to 11/april/2007 (all data): GameSpy went crazy and a few games were getting ignored (including Half-Life 1 and 2, Enemy Territory, Call of Duty 1 and 2, Quake 3, and others), listing little to no players and servers.
  • 13/november/2007 to 14/november/2007 (all data): The update script was broken for a few hours (server was down?) so data isn’t available.
  • 28/june/2008 to 4/july/2008 (Unreal Tournament): there was some kind of problem reading data for those days (it generated some super high numbers), so the game and mod data was deleted and interpolated from the known numbers.
  • 29/june/2008 to 2/july/2008 (Unreal Tournament 2004): there was some kind of problem reading data for those days (it generated some super high numbers), so the game and mod data was deleted and interpolated from the known numbers.
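
If you want to reproduce this bypass on the raw dumps, the idea can be sketched as follows. The plotting script’s exact interpolation method isn’t documented here, so plain linear interpolation between the nearest good days is an assumption, and only one of the ranges listed above is included as an example.

```python
from datetime import date, timedelta

# One range from the list above; extend with the others as needed.
PROBLEM_RANGES = [(date(2006, 6, 20), date(2006, 6, 28))]

def is_problematic(day, ranges=PROBLEM_RANGES):
    """True if the day falls inside any (start, end) range, inclusive."""
    return any(start <= day <= end for start, end in ranges)

def fill_problem_days(series, ranges=PROBLEM_RANGES):
    """Drop problematic days from a {date: value} series and interpolate over them.

    Every gap between two consecutive good days is filled linearly, which also
    covers days that are simply missing from the input.
    """
    good = sorted(d for d in series if not is_problematic(d, ranges))
    filled = {d: float(series[d]) for d in good}
    for left, right in zip(good, good[1:]):
        gap = (right - left).days
        for step in range(1, gap):
            t = step / gap
            filled[left + timedelta(days=step)] = (
                series[left] + t * (series[right] - series[left])
            )
    return filled
```

For a per-game range such as the Unreal Tournament ones, the same function can be called with a different ranges argument for that game’s series only.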

Remember that this data belongs to GameSpy and they must be credited wherever needed. I merely store the statistics found on their page.

I started gathering the data from GameSpy on 14/12/2004 (for reference, that was shortly after CS:S release, if I remember correctly). The graphs on the site only show the last 365 days.

As mentioned above, the data I’ve gathered is not 100% consistent, but it’s pretty close. I still stand by the data I gathered; while there’s no website with 100% accurate stats out there, it is my belief that GameSpy has been the most consistent one on average. There’s a lot of criticism of it, as there should be (though a lot of it is unfounded or just plain false), but the alternatives the critics point to leave a lot more to be desired, perform a lot worse on several other games, or just use a very different methodology. For example, it was common for people to point to Steam stats when stating that GameSpy numbers were wrong, but they failed to notice that Steam stats also included local/singleplayer numbers in their counts, while GameSpy stats use information from publicly known servers only. That is all to say, GameSpy stats were not perfect, but they were the best of the bunch given my objectives.

It’s not my intention to continue a discussion that will never have an end, though. To be quite honest, this has been a great experiment for a few years. I certainly liked seeing how the audience for each game fluctuated over the years, and when new games were released. It’s too bad it has to close down; here’s hoping we’ll see something better or more accurate in the future.

Do not adjust your web set

Users may have noticed that, lately, some graphics are showing some pretty odd results, with certain games shooting high up on the chart or some other games disappearing.

Contrary to popular belief, this is not a sign of the apocalypse, that the website or its database has been fucked up beyond recognition, or (and this is my favorite) that this is some kind of conspiracy to make a game “popular”/“unpopular” artificially. Instead, this behavior is due to the incorrect reading of (more or less) one day’s worth of data. It’s not an ongoing error; it’s something that happened once and now is already back to normal.

The problem is, with the way I render the graphics – doing an evened-out 7-day average plot, to dilute the effect of weekends – having invalid data for a single date is enough to make the plot lines pretty weird for almost half a month. So that’s what you’re seeing right now; given time, things will look like they looked before, with game averages back to their original positions.
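
To see why one bad day contaminates its neighbours, here is a toy sketch. It assumes a plain centered 7-day mean, which is not necessarily how the site’s “evened-out” plot is computed, and the numbers are invented.

```python
def moving_average(values, window=7):
    """Centered moving average; the window shrinks near the ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

daily = [100.0] * 21   # three quiet weeks (made-up numbers)
daily[10] = 800.0      # a single day read incorrectly

smoothed = moving_average(daily)
affected = [i for i, value in enumerate(smoothed) if value != 100.0]
print(affected)        # indices of the days whose plotted value is distorted
```

Even with this simple window, a single spike distorts every day whose window contains it; any extra smoothing on top of that spreads the effect further.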

Xbox Live top games of 2007

Major Nelson – of the Xbox Team – has posted a few lists with the most popular games for Live, the online system used on the Microsoft consoles. The list is ranked by number of unique users (or sales, in the case of Live Arcade titles) and covers the entire year of 2007. No raw numbers are listed, but it’s still interesting.

On Call of Duty 4’s absence

Since a few visitors have noticed Call of Duty 4 hasn’t shown up on the site’s graphs, here’s a small update on the situation.

Just to make it clear, the game is indeed doing very well online. If you check the ServerSpy or Game-monitor statistics, you’ll see it’s doing well enough to put it quite strongly at the spot of 3rd most played online PC FPS.

The reason it’s not showing on this website’s graphics is because GameSpy stats – which is the source I use for my data – isn’t covering it yet. They probably still have to code the server query logic into their system, and hopefully they will have it soon. So unfortunately the post-release data is lost, but it’s a sure bet the game is having a big impact.

And before someone points out that I should switch to some other data source instead, let me remind you that none of the FPS statistics out there are perfect. Each has its own little perks and little issues with specific games. It just so happens that GameSpy now has a major issue with CoD 4, but the sanest thing to do is to expect it to recover soon. Just switching the data source to another website would invalidate any possible comparison to the data already gathered, requiring a database reset, so it’s something I’d like to avoid.

Update: it’s working now, as CoD4 is being listed on GameSpy stats. Check the comments for more information.

Enemy Territory: Quake Wars, 3 months later

Three months after the release of id Software and Splash Damage’s Enemy Territory:Quake Wars, how well is it doing online?

Three-year special

Slightly more noisy this year, here’s the new special graph, in commemoration of 3 years of data gathering and graph generation. Only the most major titles are drawn.

Certain mods now listed as retail games

I have finally changed the way some of the graphics are generated on the website: now, certain mods are treated as full games and listed on the games graphics.

This move was needed because many full-fledged games based on the HL2 engine – games such as TF2, Counter-Strike:Source, Day of Defeat:Source, The Ship, Dark Messiah of Might & Magic, among others – are listed as HL2 mods. Technically, they might be mods after all, but since they’re sold as separate titles – with different SKUs and all – it makes sense to have them listed separately.

However, I went ahead and took the separation a bit further. Some original HL1 mods – Counter-Strike, Day of Defeat and Team Fortress Classic – are also separate now. This one is a bit controversial, as they are HL1 mods, and one can, technically, get them installed without the need to buy anything other than the original HL1 game. However, they’re also sold separately and generally considered to be very different games with separate communities and all. They also take a huge chunk of the online player base. So, as such, they are now treated as full games.

Due to this change, it’s now finally possible to see what kind of impact a few recent releases had on the online player base, especially when comparing TF2 to the rest of the games. For example, it’s possible to see that TF2 managed to get to 4th place in online popularity quite fast, but has now dropped to 6th. I’ll have some more in-depth posts about this game and others in the future, as soon as December is over.

Finally, I will probably tweak these graphics – and what should and shouldn’t be considered a full game – a bit further in the future. The data remains the same – all mod-to-game “upgrading” is done when the graphics are rendered, not when the data is read – so I can always go back and undo anything.

The HL2 and HL1 mods are still listed on their original game pages, but that will probably change soon too.

Valve releases Team Fortress 2 statistics

In addition to other Half-Life 2: Episode 2 statistics (released a few weeks ago), Valve has now updated their Game and Player Statistics page and added some Team Fortress 2 Gameplay Stats with some cool information such as class, weapon and map breakdowns across a number of different parameters. There’s some really, really awesome information there.

As a follow-up, other interesting discussions about the statistics are taking place over at Rock, Paper, Shotgun and at ShackNews.

New Steam statistics

Valve – developers of Half-Life, Counter-Strike and Team Fortress, among other games – have recently updated their game statistics page. They added some very nifty gameplay statistics for Half-Life 2: Episode 2 and, more recently, they’ve also started a new edition of their player hardware survey, which automatically gathers hardware and software data from Steam users around the globe.

The big news is that this version of the survey also detects whether the user has some specific applications installed – in my case, it successfully detected Firefox, OpenOffice, and ZoneAlarm. Data from those specific results is not listed on the survey results page yet, though. Personally, being both a Firefox and an OpenOffice zealot, I was pretty happy to see something like this popping up there.

Finally, Tom at The Steam Review has some additional discussion about it.

Update: awesomely-named RockPaperShotgun.com also has an analysis of some of the preliminary results of the new survey.

Run to the hills: it’s multiplayer first person shooter invasion

If this October’s (legendary) release of TF2 (see how well it’s doing here) and ET:QW (follow it here) didn’t drain all the productivity you might have left, feel free to make use of your stash of sick days at work as the demo for UT3 has just been released (it’ll soon show up here too).

Lastly, just as a quick note: in the future I’ll arbitrarily separate some specific mod data from their specific game data, so games like TF2 and CS:S won’t show up as being mods of HL2 (they’re sold as separate SKUs after all). I’ll also have to remake the graphic generating algorithm to work around the server script execution time limit – the reason why the mods page isn’t getting updated. No data is lost, though, as gathering them is a separate process that’s still doing fine.

This website gathers data for various First Person Shooter games for PCs, and then builds graphics with those numbers. This brings no answers, just questions. Where do we go from here?