FAQ
These are the frequently asked questions about this page. Actually, no one has ever asked them, as the site is brand new; I just came up with them before someone bothered to ask. This is supposed to be informative, anyway, so everything I remember about the whole experiment is listed here.
Q. Why?
A. For quite some time, I was hoping one of the sites that already grab player share data would build something like this, but one day I got tired of waiting and decided to do it myself. My justification was that I wanted to learn how to use regular expressions in PHP for data parsing – something I’d never done – and that I wanted to create some pretty graphic components using Flash, my main development platform.
Q. So why is there no Flash content on the website?
A. Because I later dropped the Flash frontend entirely – way too much metadata for it to load – and switched to a static graphic presentation using GDLib, something I’d never done before either. So I guess the learning justification still stands…
The original Flash version was my dream of a cool interactive graphic. There was so much I could do – dynamic time range changing, zooming, game toggling for better views, percent/raw data switching, players/servers/players-per-server switching – and it would have been really nice once finished. However, while I left the project hibernating so it could build up a large data set, the Flash solution proved impractical. Large chunks of data – say, one year’s worth – would be way too heavy to download and to draw. So instead of building a huge data queue with local caching and updating, I opted to drop the intelligent/interactive frontend idea altogether and create several different static graphs which are easy to view.
Q. Why didn’t you do it with SVG?
A. Same reason it wasn’t done with Flash: way too much data to download and to draw. Just think about it: a graph of one year of data, for the mods stats, would take 620,500 points just for the data lines. Can you imagine how big an SVG file would be for that many complex lines? And how slow it could get? In the end, a static ~15 KB PNG image is just a better option. I have full control over the size of the images that are created, so in the future maybe I’ll create zoomed-in versions of them or increase the size of the standard images.
Q. Is the raw data available for download or purchase?
A. All the data I’ve gathered over the years is available to third parties as a free download. I don’t sell the data. However, there are a few important things that need to be understood: first, the data is not 100% consistent (some days are missing, for example); second, I average each day’s data once the full 24 hours have been gathered (because the database is already too big), so there is no hourly data available; third, the data is gathered by me but comes from GameSpy stats, so you should give credit where credit’s due; and fourth, the database download might put a bit of stress on the SQL server and counts as part of my bandwidth, so I reserve the right to refuse access to anybody if I feel like it.
If you still want the data, contact me and explain why you need it, and I’ll provide a download link. The link provides the database in CSV format (which can be opened in Excel or OpenOffice Calc) and can be visited any time you need a new, updated download (it’s a permanent link). The data is not pretty, but it’s quite useful if you want some better, custom filtering.
Q. The graph is fucking hard to make out, and those colors suck.
A. I know. Sorry. I’m still playing around with the possibilities. Colors are currently random. I’m testing different solutions for this issue.
Q. Do these graphs represent the popularity of a game?
A. Absolutely not. What they do represent is the percentage of online players on a given day for specific games. They don’t represent the number of offline or single-player players, the quality of a game, the maturity of a game community, or the overall fun you can have with a game.
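To make "percentage of online players" concrete, here is a minimal sketch of how a share could be computed from one day's player counts. It is written in Python rather than the PHP the site runs on, and the function name and sample numbers are made up for illustration; it is not the site's actual code.

```python
def player_shares(players_by_game):
    """Return each game's percentage of the total online players counted that day."""
    total = sum(players_by_game.values())
    if total == 0:
        # No players counted at all: every share is zero.
        return {game: 0.0 for game in players_by_game}
    return {game: 100.0 * count / total
            for game, count in players_by_game.items()}


# Hypothetical daily averages, not real data.
sample_day = {"Game A": 600, "Game B": 300, "Game C": 100}
shares = player_shares(sample_day)
for game, share in shares.items():
    print(f"{game}: {share:.1f}%")
```

Note that a share only says how the counted online players split between games; it says nothing about anyone who was not on a counted server.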
Q. Are you affiliated with GameSpy?
A. Neither I nor this page is affiliated with GameSpy. This page isn’t supported by GameSpy. I’ve sent them an email explaining the site’s intent but they never replied – I hope they won’t block my data mining, that’s all. Also, ASE is my preferred gaming frontend – even though it’s basically dead, I’m a registered user – so if you want to bitch about how you don’t like/like GameSpy software, please find someone else, as I don’t care about your frontend preference.
Q. How is the data mined?
A. Some PHP scripts do the trick. Once an hour, every day, update.php visits the GameSpy Stats page for all games and saves the numbers in a huge database. Then it goes to each of the games’ pages (including some games that aren’t always listed on the main page, which I opted to force updating) and grabs that data too, so it knows the mods data. New games and mods are also added as they appear on the stats. Then, once a day, make_graph.php averages all this hourly data and creates some day-based graphics – the graphics you see on each stats page. The data is also compressed sometimes – all hourly data is averaged and stored as daily data – so the database won’t get too big. For reference, as of August 2006, there are 156,147 records stored in 4 different MySQL tables, taking 3.8 MB.
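The hourly-to-daily averaging step can be pictured roughly like this. This is only a sketch in Python, not the actual make_graph.php code; the record layout (day, game, players) and the function name are assumptions made for the example, and the numbers are invented.

```python
from collections import defaultdict


def average_by_day(hourly_records):
    """Collapse (day, game, players) hourly samples into one average per (day, game)."""
    sums = defaultdict(lambda: [0, 0])  # (day, game) -> [player total, sample count]
    for day, game, players in hourly_records:
        entry = sums[(day, game)]
        entry[0] += players
        entry[1] += 1
    # Average over the samples that exist, so a missed hour does not count as zero.
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical samples, not real data.
hourly = [
    ("2006-08-01", "Game A", 100),
    ("2006-08-01", "Game A", 200),
    ("2006-08-01", "Game B", 50),
    ("2006-08-02", "Game A", 300),
]
daily = average_by_day(hourly)
print(daily[("2006-08-01", "Game A")])
```

Dividing by the number of samples actually gathered is one possible way to handle missed hourly runs; whether the real script does the same is not something the text above states.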
Q. But I don’t use GameSpy software! How can it know what I am playing?
A. It’s a common misunderstanding that the GameSpy data is grabbed by their client or game browser. This is not true. What GameSpy – and every other online player statistics page – does is parse the list of servers available for each game and sum the number of players on each game or gametype. So even if you use your obscure mIRC script to play Quake 3: Arena matches with your IRC friends, you’ll be counted as long as you’re on a publicly known server. There is no bias in favor of games that feature GameSpy frontend or backend software either; at worst, there is just increased accuracy in cases where GameSpy hosts the master servers (Battlefield 2 and Battlefield 2142, from what I know).
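In other words, the counting happens on the server side. A minimal sketch of the idea, in Python and with an invented server list (the field names and addresses are placeholders, not anything GameSpy actually exposes):

```python
from collections import Counter


def count_players(servers):
    """Sum the players reported by each known server, grouped by game."""
    totals = Counter()
    for server in servers:
        totals[server["game"]] += server["players"]
    return totals


# Hypothetical server list; how a player found the server does not matter.
servers = [
    {"game": "Quake 3: Arena", "address": "192.0.2.10:27960", "players": 8},
    {"game": "Quake 3: Arena", "address": "192.0.2.11:27960", "players": 3},
    {"game": "Battlefield 2", "address": "192.0.2.20:16567", "players": 40},
]
totals = count_players(servers)
print(totals["Quake 3: Arena"])
```

A server that is not on the list simply never enters the loop, which is why only players on publicly known servers get counted.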
Q. The numbers for <some game> are different from what I found on <some other statistics page>. Why?
A. Yes, that happens. Specific game numbers such as servers and players vary from one statistics website to another. This makes these statistics a bit less accurate.
Unfortunately, there’s no easy solution to this – right now, I’m only using GameSpy data, believing it’s as demographically distributed as possible. But there are lots of variations between each statistics-gathering website and each game – and no website has consistently lower numbers, so it’s unfair to classify one website as being more or less accurate than another. Which website seems “most accurate” depends on the game you’re basing your comparisons on.
The thing is, all statistics websites must have a list of servers available for each game. For most games, these lists are public (maintained by a master server). For some (especially old games), they’re not, so the websites depend on private lists, or on the server admin’s ability and willingness to ping some public list so his server will be known. Different services maintain different lists, so it’ll always be impossible to have a definitive list.
With that said, when viewing the graphics, you should always remember they aren’t scientifically accurate. It’s an approximation. All statistics pages are.
Also remember that, on a graph, what matters is the player fluctuation from one game to another, not the absolute number of players. If <game x> is losing players and <game y> is gaining players, it means one is fading out and the other is getting popular, simple as that; if <game z> is gaining players rapidly, it means it’s becoming popular, either by way of word of mouth or player retention (or both). Even if a statistics website is ignoring a bunch of servers, it’s to be expected that this behavior will be reflected globally.
Think of it as market research: when people do research, they don’t interview 100% of the population, they just interview a sample that gives them a rough idea of what the entire population thinks. The same goes for this data: it will never be 100% accurate, so you have to take some margin of error into account.
Q. But SteamPowered.com stats show very different numbers. Counter-Strike has two times the number of players listed by GameSpy! Which one is correct?
A. As crazy as it sounds, both are. The definition of “players” they take into account is different, though. Simply put, GameSpy stats list the number of online players found on public servers, while SteamPowered statistics count players on public servers as well as private or local ones.
To be more precise, GameSpy “sees” servers which are broadcast online and which anyone can join (even if passworded). It does so based on a master server list. Players found on such servers are listed on GameSpy stats.
SteamPowered stats, on the other hand, are run by Valve, which maintains Half-Life, Half-Life 2, and all derivative games based on their Source engine, such as Counter-Strike and Counter-Strike: Source. As such, they have internal access to Steam data, and create their listing not only with data from public servers, but with data from private servers as well. This includes LAN-based servers which are authenticated online, servers that are not flagged to be listed as public servers, and single-player games of all kinds.
SteamPowered stats show a more complete picture of play, which is not covered by any other statistics website. As such, it would be unfair to compare its stats to those of any other game, since we only have the “public” numbers available for other games. In this vein, the numbers found on websites such as GameSpy, ServerSpy, and Game-monitor are the correct ones to use, because they take into consideration public online players only.
There is no conspiracy theory to be drawn here. This information that SteamPowered also uses ‘private’ data on their stats has been confirmed by Valve staff (thanks!) through email.
Q. Are bots counted on these stats?
A. It’s difficult to answer. Most of the time, yes – since some admins choose to add a minimum number of bots to their online servers to "keep it going". But since only real, public and broadcast servers are added to the list, bots are probably kept to a minimum, so I don’t believe this impacts the graph data a lot. Also, it’s impossible to avoid reading this bot data, and there’s nothing I or anyone else can do about it. And there’s no such thing as "<x> stats doesn’t count bots!" – not all games report bots correctly, if at all. CSports.net claims it doesn’t count bots, but it does so by adding the common bot names to some filters, so it’s not 100% accurate – it can block people who use bot names, will still let bots with custom names slip through, etc.
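To see why a name-based filter fails in both directions, consider this small Python sketch. The bot name list and the player names are invented for the example; it is not how CSports.net or any other site is known to implement its filter.

```python
# Hypothetical filter list of "common bot names".
COMMON_BOT_NAMES = {"bot", "grunt", "sarge"}


def filter_by_name(player_names, bot_names=COMMON_BOT_NAMES):
    """Keep only the names that do not match the bot name list (case-insensitive)."""
    return [name for name in player_names if name.lower() not in bot_names]


# Suppose "Sarge" is a real person who likes the name,
# and "xX_CustomBot_Xx" is a bot that an admin renamed.
on_server = ["Alice", "Sarge", "xX_CustomBot_Xx", "grunt"]
kept = filter_by_name(on_server)
print(kept)
```

The human called "Sarge" is dropped and the renamed bot is kept, so the filtered list is wrong both ways even though the filter did exactly what it was told.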
Q. Why don’t you use another source of data such as ServerSpy or Game-monitor instead, or mix them all?
A. I’d prefer not to mix numbers from several different sources in the statistics; each of them has a different way of gathering those numbers, and I want a level playing field. I agree the current solution is not very good, but in my eyes, it’s the best way to detect fluctuations in player share, even though the absolute numbers are not that precise.
And unfortunately, replacing the data source would invalidate any possible comparison to the data already gathered, requiring a database reset, so it’s something I’d like to avoid. Each source has its own little perks and little issues with specific games, so it’s not like there’s a good reason to switch, anyway. This is a possibility for the future, but right now, it’s a pretty distant one; using GameSpy stats is as good as using any of the other websites available.
Q. Can I link the images directly on my site/page/forum post/blog?
A. Please don’t. This will just hurt my bandwidth, and graphics vary with time anyway. Link to the website instead, or host an image at some other image-hosting service.
Q. But this page doesn’t count my game because <X>!
A. So… what do you want me to do? This is just a graph of known online player shares for certain games. If the amount is incorrect for some technical reason, what can we do? Blame the developers? Blame GameSpy? Blame the players? Whatever it is, it doesn’t matter; it’s just a number.
Q. Are other types of graph planned?
A. Yes. Right now, I have the groundwork done for developer-share graphics, and for mods for specific games (like, one entire page for each game). Just the graph generator data is missing, so I’ll probably do it soon. I also have stats for the number of servers (and of course, players/server) but right now I don’t think there’s much cool stuff to be done with that.
Q. Why did a lot of new mods appear in October 2006, and a few other mods lose players?
A. I had made a mistake in the way graphs are generated. Previously, to avoid asking GameSpy’s server for too much data, the script would only ask for the stats main page (game stats) and then read the specific game page (mod stats) for a few selected games. The mistake was that only games with a "force mod read" attribute enabled would get their mod data read; this means that new games that suddenly appeared on the list (say, Battlefield 2142) would get ignored until I set that attribute for them.
This is the wrong way to do it, simply because the popularity of the game should rule the decision on whether to read its mod data or not. So from October 2006 on, I’ve made a change that makes the script read the mods for the top 20 games on the main page, plus games that have their "force mod read" attribute set. That’s why there are a lot of new lines on the mods graphics: mods that didn’t appear there but were quite popular were finally listed.
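The new selection rule amounts to "top 20 by popularity, plus anything explicitly forced". A rough Python sketch of that rule follows; the function and parameter names are hypothetical, the game names are placeholders, and the real update.php may do this differently.

```python
def games_to_read_mods(ranked_games, forced, top_n=20):
    """Pick the games whose mod pages should be fetched.

    ranked_games: game names from the main stats page, most popular first.
    forced: set of game names flagged "force mod read" (they may be missing
            from the main page altogether).
    """
    selected = list(ranked_games[:top_n])
    # Add forced games that did not already make the top N.
    selected += [game for game in sorted(forced) if game not in selected]
    return selected


# Small illustration with top_n=2 instead of 20.
ranked = ["Game A", "Game B", "Game C", "Game D"]
forced = {"Game D", "Game Z"}
chosen = games_to_read_mods(ranked, forced, top_n=2)
print(chosen)
```

Under the old behavior only the forced set would have been read, so a newly popular game stayed invisible on the mods graphs until someone flagged it by hand.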
Additionally, the reason why a few mods dropped in numbers suddenly (for example, Counter-Strike and Counter-Strike: Source) wasn’t that players moved to other games; rather, there was a bug in my code that I only noticed a couple of days after implementing the change. In brief, some of the top 20 games on the initial list were being ignored (for example, Half-Life) and their mod numbers weren’t being read anymore (although the games’ numbers were).
This succession of mistakes makes it impossible to reliably determine the immediate impact of the launch of BF2142 and its demo (released this month) on other games and mods, but at least everything is fixed now, and should work better in the future.