2011 Lies, Damn Lies and Eclipse Upload Statistics

Here are the results of the 2011 Eclipse Release Torrent Experiment.

As a Friend of Eclipse, I get access to Eclipse releases a day in advance. I used that opportunity to seed all the torrents and make them available for the masses, just like in 2009 and 2010. In 2009 I published download popularity charts. This is a repeat of that 2009 experiment.

Downloads By Product

Whereas the two-day experiment from 2009 showed a disproportionate number of JEE downloads, this time the other major products have closed the gap. The most surprising of these is CDT. Keep it up, folks.

Downloads By Platform

The only platform where 64-bit downloads are not a majority is Windows, and even then it’s only by a small amount. This is a big change since 2009.

Here’s a breakdown of 32-bit vs. 64-bit releases, by product:

Here you can see that the SDK, CPP and Java releases still belong to the 32-bit OSes, and JEE and Reporting are popular among 64-bit OSes.

Once again, Windows is more popular than the other two combined, for all products, with only JEE coming close to being contended by Linux and OSX.

Raw Data

Here are the upload counts, platform x product. Make of it what you will.

and the same data without Windows

Finally, here’s a graph of upload statistics, normalized against the 48-hour mark. The purpose of this graph was to prove or debunk the theory that 48 hours is more than enough time to deduce product and platform popularity.

Yeah, I’m not making much out of that, either.

Conclusion

Total Uploads: 1489
Total data uploaded: 270 GB

Well, it’s been fun collecting the data once again this year. Here’s the shirt I wore while composing this post.

You might also feel this way, so buy your own.

By the way...

Are you a Friend of Eclipse?

Addenda

Some notes about data collection:

  • As opposed to last time, I ran this for eight full days. Actually it’s now been thirteen days since the release, but I’m only publishing information about the first eight days for the sake of laziness.
  • This data only represents uploads after the public release, and not uploads during the advance seeding.
  • I lost 10 hours or so in the middle of the week because my machine froze and I didn’t know. I don’t think this had a significant impact on the data.
  • I did nothing to cap bandwidth for any of these Eclipse distributions. However, I suspect that after a day my upload bandwidth was capped.
  • As opposed to the 2009 experiment, I downloaded all possible releases. Wow, is this a large release.

I collected data on both the megabytes uploaded from my torrent client and the ratio of upload to download. Since all copies of my products were fully downloaded, an upload ratio of 2.5 means that the equivalent of 2.5 copies of that distribution were uploaded from my machine. Of course, torrents don’t ship data as whole files; they ship pieces of the distributions, here and there.
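To make the ratio concrete, here is a minimal sketch of the arithmetic. The function name and the numbers are made up for illustration and are not taken from the experiment; the only thing assumed is that the ratio is megabytes uploaded divided by the size of the fully downloaded distribution.

```python
def upload_ratio(uploaded_mb: float, size_mb: float) -> float:
    """Return how many copies' worth of a distribution were uploaded.

    Assumes the distribution was fully downloaded, so the amount
    downloaded equals the distribution size.
    """
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return uploaded_mb / size_mb


# Hypothetical figures: the same 300 MB uploaded for two distributions,
# where the second is 50% larger than the first.
small = upload_ratio(uploaded_mb=300.0, size_mb=100.0)
large = upload_ratio(uploaded_mb=300.0, size_mb=150.0)

print(small, large)
```

With equal megabytes uploaded, the larger distribution ends up with the lower ratio, which is why the two measurements can rank products differently.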

These are my four primary, and therefore potentially disputable, assumptions:

  • More popular products will be uploaded by more people.
  • Upload ratios are a better measurement of popularity than megabytes uploaded. If product X is 50% larger than product Y, equal bandwidth dedicated to X and Y does not denote equal popularity. I just think people using torrents aren’t really worried about the size of their Eclipse installations.
    • Similarly, I consider negligible any difference in compression ratios between the win32 zip file format and other gzipped tar files.
  • 48 hours is more than enough time to collect data, and taking more than the first 48 hours of data will not yield significantly different results. Nonetheless, I collected much more data than that.
    • I presume that files that are not as well seeded as others will take more time to initially download, and as such, will not contribute much to the other uploads during the first part of this process, and so may exaggerate the results slightly. Given that, I suspect the correctness of the 48-hour window will be the most disputed assumption.
  • People don’t care about how long it takes if they’re using BitTorrent. I assume it’s a “set and forget” type of tool.
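One way the 48-hour assumption could be checked is sketched below. This is only a guess at a reasonable method, not necessarily how the graph in this post was produced: it turns upload ratios into popularity shares and divides each product's later share by its share at the 48-hour mark. The function names and all numbers are hypothetical.

```python
def popularity_shares(ratios):
    """Convert per-product upload ratios into fractions of the total."""
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("no uploads recorded")
    return {name: value / total for name, value in ratios.items()}


def normalize_to_mark(later, at_mark):
    """Divide each product's later share by its share at the reference mark.

    A value near 1.0 means the product's relative popularity did not
    change after the mark. Products with a zero share at the mark are
    not handled in this sketch.
    """
    later_shares = popularity_shares(later)
    mark_shares = popularity_shares(at_mark)
    return {name: later_shares[name] / mark_shares[name] for name in mark_shares}


# Hypothetical upload ratios at 48 hours and at eight days.
at_48h = {"jee": 4.0, "java": 3.0, "cpp": 1.0}
at_8d = {"jee": 8.0, "java": 6.0, "cpp": 2.0}

drift = normalize_to_mark(at_8d, at_48h)
print(drift)
```

In this made-up case every ratio doubled, so every product keeps the same share and the normalized values are all 1.0, which is what the assumption predicts.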

4 responses to “2011 Lies, Damn Lies and Eclipse Upload Statistics”

  1. Neat stuff, Robert.  Did you lump the Linux Tools package in with the CDT package?  The package is only available for Linux so if you’re not on a Linux browser, the EPP download page won’t show it.

    http://www.eclipse.org/downloads/packages/eclipse-ide-cc-linux-developers-includes-incubating-components/indigor

  2. I did not. All the torrents I used were made available via the email I received the day prior to the Indigo release. It included these products:

    * Eclipse IDE for Java EE Developers
    * Eclipse IDE for Java Developers
    * Eclipse IDE for C/C++ Developers
    * Eclipse Modeling Tools
    * Eclipse for RCP and RAP Developers
    * Eclipse for Javascript Web Developers
    * Eclipse for Testers
    * Eclipse IDE for Java and Report Developers
    * Eclipse IDE for Parallel Application Developers
    * Eclipse IDE for Scout Developers
    * Eclipse Classic 3.7

  3. I don’t know why the Linux C/C++ package wasn’t included in the Friends of Eclipse torrent email but it definitely exists since I’ve been seeding at home for a few weeks now. The stats for that are uninteresting, though, since it’s only available for Linux x86 and x86_64 🙂

  4. I’ll point this out to Ian.
