Updates stopped, full archive available
In case this isn’t clear enough: I’ve stopped the update script for the website, so it won’t update stats or graphs anymore. My free time has been running thin for a while, and it has become increasingly difficult to monitor the website and correct mistakes in data reading when they occurred, which was too often, since I had to use proxies to read data from GameSpy’s servers. Also, my solution ran on a shared hosting service, which didn’t have the CPU cycles available to run updates or generate graphs as well as I wanted. Instead of letting it run indefinitely and grab data that’s more and more unreliable, I decided to just shut it down and save the time for other things.
For what it’s worth, though, I’m making the data I’ve gathered over the past 4 years or so available to anyone. The files are dumps of the complete tables, in CSV format. You can open them in any tool that supports CSV files, such as Microsoft Excel or OpenOffice Calc. You will also need 7-Zip to uncompress the .7z files.
This data is not for the faint of heart. Do not expect to open it in a spreadsheet and make a beautiful chart; it won’t work. For one thing, the data is pretty massive; for another, there’s no time consistency to the data rows, since I had to resort to occasionally averaging data to make the tables smaller. Read below for more information.
- games: the list of games as they appear on GameSpy’s site, plus each game’s id (the same as GameSpy’s list id), its developer name, and its short title (used on the mods graph). All games found on GameSpy’s top 40 list were automatically added to this list (and subsequently to the stats lists). If a game had the read_mods field on, it was also read from GameSpy’s website even if it didn’t make it into the top 40 games.
- games_stats: the game stats. Contains the number of players and servers. The fields year/month/day/hour contain the time the data was gathered. However, if, when adding the last hour (23), the table already had 23 other stats for that game on that same year/month/day (that is, the game had statistics for all hours of a given day), the script would average them all (set the statistics as “day”) and set the hour to “0”. It’s a bit ambiguous (hour “0” can mean either the averaged stats or the stats for hour 0), but it worked. In theory, if the data were 100% consistent, you’d never find any value other than 0 in the “hour” field, since it would all be averaged eventually. But since the script did not average the data when fewer than 24 samples were available for a day (for example, when the server was down, when GameSpy was down, or when the script timed out before adding), it would sometimes leave the hourly data alone. Those rows are still averaged by the graph generation script; the averaging is mainly a database thing, meant to make the table smaller and faster to read.
- mods: the list of mods, similar to the games one. Their id is internal. Also added automatically as new mods were found.
- mods_stats: same as games_stats, but for mods.
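The hourly-to-daily averaging described for games_stats can be sketched roughly as below. This is only an illustration in Python, not the original update script: the column names players and servers, the dict-based row layout, the rounding, and the function name collapse_day are all assumptions.

```python
from statistics import mean


def collapse_day(rows):
    """Collapse one game's rows for a single day into one averaged row.

    `rows` is a list of dicts with the keys year, month, day, hour,
    players and servers (the last two names are assumed). If all 24
    hours (0-23) are present, a single averaged row with hour 0 is
    returned; otherwise the rows are returned unchanged.
    """
    hours = {row["hour"] for row in rows}
    if hours != set(range(24)):
        return rows  # incomplete day: leave the hourly samples alone

    first = rows[0]
    return [{
        "year": first["year"],
        "month": first["month"],
        "day": first["day"],
        "hour": 0,  # ambiguous marker: averaged day or a real hour-0 sample
        # Rounding to whole numbers is an assumption of this sketch.
        "players": round(mean(row["players"] for row in rows)),
        "servers": round(mean(row["servers"] for row in rows)),
    }]
```

Because a day with missing samples is returned untouched, anything reading the table still has to cope with a mix of averaged rows and hourly rows.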
In the CSV files, field values are separated by semicolons and enclosed in quotes. The first line contains the field names.
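For anyone who would rather script it than use a spreadsheet, a minimal Python sketch for reading one of the dumps follows. The function name iter_table and the UTF-8 default encoding are assumptions; the delimiter, quoting and header line follow the description above.

```python
import csv


def iter_table(path, encoding="utf-8"):
    """Yield each row of a dumped table as a dict keyed by field name.

    The file is read lazily, one row at a time, because the stats
    tables are large. The default encoding is an assumption.
    """
    with open(path, newline="", encoding=encoding) as handle:
        # Semicolon-separated, double-quoted fields; the first line
        # holds the field names and becomes the dict keys.
        yield from csv.DictReader(handle, delimiter=";", quotechar='"')
```

All values come back as strings, so numeric fields such as year or hour need an explicit int() before doing any arithmetic on them.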
Problematic ranges
Due to problems that occurred a few times when reading data, some of the data is ignored when making the graphs and is simulated instead: the graph-making script simply ignores these days and interpolates values from the nearest available data (previously it would plot the data as available, making the graphs look weird, but I later changed it to bypass these days). The list of “problematic days” (hard-coded into the graph-plotting script) is as follows:
- 20/june/2006 to 28/june/2006 (all data): on these days, the script didn’t really work because (apparently) it was blocked by the gamespy server. Data for these days isn’t available.
- 12/october/2006 to 13/october/2006 (mods): I introduced a feature that allowed the mods of the 20 top games to be read automatically (previously it would only read the mods of a few selected games). However, this created a bug in which the games that were set to always have their mod data read (i.e., HL1, HL2) would simply ignore the mod data. That is to say, the mod data for these couple of days is lost for many games, so the data is instead simulated for all mods.
- 6/february/2007 to 13/february/2007 (all data): GameSpy had some kind of crash on their stats, which stopped working on these days, giving either cached results (on the games pages) or no result at all (on the mods pages). Data for these days is more or less available but is corrupted and should not be considered.
- 7/april/2007 to 11/april/2007 (all data): GameSpy went crazy and a few games were getting ignored (including Half-Life 1 and 2, Enemy Territory, Call of Duty 1 and 2, Quake 3, and others), listing little to no players and servers.
- 13/november/2007 to 14/november/2007 (all data): The update script was broken for a few hours (server was down?) so data isn’t available.
- 28/june/2008 to 4/july/2008 (Unreal Tournament): there was some kind of problem reading data for those days (it generated some super high numbers), so the game and mod data were deleted and interpolated from the known numbers.
- 29/june/2008 to 2/july/2008 (Unreal Tournament 2004): there was some kind of problem reading data for those days (it generated some super high numbers), so the game and mod data were deleted and interpolated from the known numbers.
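If you plot the data yourself, you will probably want to skip these ranges in a similar way. The sketch below is not the original graph script: it assumes plain linear interpolation between the nearest good days on each side, uses only three of the “all data” ranges from the list, and ignores the per-game and mods-only ranges. The names PROBLEMATIC_RANGES, is_problematic and value_on are made up for the example.

```python
from datetime import date

# Three of the "all data" ranges from the list above, inclusive on both ends.
PROBLEMATIC_RANGES = [
    (date(2006, 6, 20), date(2006, 6, 28)),
    (date(2007, 2, 6), date(2007, 2, 13)),
    (date(2007, 4, 7), date(2007, 4, 11)),
]


def is_problematic(day):
    """Return True if `day` falls inside one of the hard-coded ranges."""
    return any(start <= day <= end for start, end in PROBLEMATIC_RANGES)


def value_on(day, series):
    """Return the value to plot for `day`.

    `series` maps date -> value (for example, daily average players).
    Good days return their stored value. Problematic or missing days are
    linearly interpolated between the nearest good days on each side,
    or copied from the only good neighbour if one side has no data.
    Returns None when no good data exists at all.
    """
    if day in series and not is_problematic(day):
        return series[day]

    good = sorted(d for d in series if not is_problematic(d))
    before = [d for d in good if d < day]
    after = [d for d in good if d > day]

    if before and after:
        left, right = before[-1], after[0]
        weight = (day - left).days / (right - left).days
        return series[left] + weight * (series[right] - series[left])
    if before:
        return series[before[-1]]
    if after:
        return series[after[0]]
    return None
```

Any value stored for a day inside a listed range is ignored, even when it looks plausible, which matches the idea of bypassing those days entirely.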
Remember that this data belongs to GameSpy and they must be credited wherever needed. I merely store the statistics found on their page.
I started gathering the data from GameSpy on 14/12/2004 (for reference, that was shortly after the CS:S release, if I remember correctly). The graphs on the site only show the last 365 days.
As mentioned above, the data I’ve gathered is not 100% consistent, but it’s pretty close. I still stand by it; while there’s no website with 100% accurate stats out there, it is my belief that GameSpy has been the most consistent one on average. There’s a lot of criticism of it, as there should be (though a lot of it is unfounded or just plain false), but the alternatives the critics point to leave a lot more to be desired: they perform a lot worse on several other games, or just use a very different methodology. For example, it was common for people to point to Steam stats when stating that GameSpy numbers were wrong, but they failed to notice that Steam stats also included local/singleplayer numbers in their counts, while GameSpy stats use information from publicly known servers only. That is all to say, GameSpy stats were not perfect, but they were the best of the bunch given my objectives.
It’s not my intention to continue a discussion that will never have an end, though. To be quite honest, this has been a great experiment for a few years. I certainly liked seeing how the audience for each game fluctuated over the years, and when new games were released. It’s too bad it has to close down; here’s hoping we’ll see something better or more accurate in the future.
December 28th, 2008 at 3:08 am
:-/
December 28th, 2008 at 6:21 am
Indeed 🙁
December 28th, 2008 at 6:15 pm
I remember keeping track of csports.net numbers for about a year (2002?). The more anomalies that popped up, the more discouraging it became as you realized how inaccurate it all was.
I like the way Steam stats does it, consistently disregarding bots and including singleplayer, but a graph that showed both perspectives would be perfect. Also, it’s a relatively new method and only includes games run through Steam. So, it’s only accurate to games that require Steam, mostly VALVe games. Oh well.
Thanks for this, though. I might trouble myself with this data sometime. 😉
December 29th, 2008 at 6:16 am
The problem with the way Steam does things is that it counted all kinds of singleplayer and local LAN games. Not a *problem* per se, but it’s different from what I was trying to do (measure public online stats), so for me the numbers were useless. For example, CS 1.6 numbers would be highly inflated not because of its online presence, but because of its popularity among cybercafés with local, closed servers. For me that’s not part of the online popularity.
Nowadays I’m convinced the best way to do it would be with a mixed solution: GameSpy for some things (especially games whose master servers are maintained by GS), game-monitor for others, maybe a separate master server and query list for some games… things like that. But that would be a more serious endeavor than what OGZ was.
December 30th, 2008 at 1:32 am
Oh, I wasn’t saying it was a good way to get the statistics for what you were after. I was saying that it represents an interesting data set to me when looking at the larger picture. Just as much as I like to see what pure public play popularity is, I’d also love to see numbers across the board for people just playing the game. The only way to do that is through a client on everyone’s computer and Steam is the closest to doing that. Unfortunately it doesn’t track non-Steam distributed games.
January 13th, 2009 at 4:59 pm
Well, shame to see you go – but thanks for the mad amount of work this has been over the last few years. It has always been good to see the fruits of our own labors up in the charts!
Cheers,
Alan
February 10th, 2009 at 8:23 am
R.I.P. 🙁
February 24th, 2009 at 12:12 pm
Hey, good job on keeping it up for so long!