Eclipse Scava Downloads


This web site hosts the open datasets generated during the Crossminer research project. Crossminer ended in 2019. Since then, the datasets have been maintained by Castalia Solutions as a service to the Eclipse and research communities.

The datasets include various data retrieved from the Eclipse forge (mailing lists, project development data, and AERI stacktraces) in convenient CSV and JSON formats. Each dataset comes with an R Markdown document that describes its content and gives hints on how to use it. The examples provided mostly use the R statistical analysis software.
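For readers who prefer Python to R, the CSV and JSON files can be read with the standard library alone. The sketch below is illustrative only: the column names (`project`, `list`, `messages`, `commits`) and the sample rows are hypothetical placeholders, not the real schema, which is documented in each dataset's R Markdown document.

```python
import csv
import io
import json

# Hypothetical sample content, for illustration only. The real file names
# and columns are described in each dataset's R Markdown document.
SAMPLE_CSV = """project,list,messages
project.a,a-dev,120
project.b,b-dev,45
"""

SAMPLE_JSON = '[{"project": "project.a", "commits": 300}, {"project": "project.b", "commits": 12}]'


def load_csv(fh):
    """Parse a CSV file object into a list of dicts keyed by the header row."""
    return list(csv.DictReader(fh))


def load_json(fh):
    """Parse a file object holding a JSON array of records into a list of dicts."""
    return json.load(fh)


if __name__ == "__main__":
    # io.StringIO stands in for open(path, newline="", encoding="utf-8").
    rows = load_csv(io.StringIO(SAMPLE_CSV))
    total = sum(int(r["messages"]) for r in rows)
    print(len(rows), total)  # prints "2 165"
    records = load_json(io.StringIO(SAMPLE_JSON))
    print([r["project"] for r in records])
```

With a downloaded file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. Note that `csv.DictReader` returns every value as a string, so numeric columns must be converted explicitly.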

All datasets are published under the Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0).

All data is anonymised. Please see the dedicated document to learn more about privacy and the anonymisation mechanism.

We're open: if you would like to contribute, or have a request or question, please see the Eclipse GitLab project page.

Eclipse projects

We generate comprehensive data extracts of a set of Eclipse projects, including data sources like:

These datasets are updated weekly, every Sunday at 2 am. If you would like a project to be added, please submit an issue.
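Because the update schedule is fixed, a local mirror can work out whether its copy of an extract is out of date. The following is a minimal sketch, assuming the schedule is exactly "Sunday, 02:00"; the page does not state a timezone, so the functions work on naive datetimes expressed in whatever timezone the update job uses. The helper names `next_update` and `is_stale` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed schedule: every Sunday at 02:00 (timezone not stated on the page).
UPDATE_WEEKDAY = 6  # Sunday; datetime.weekday() counts Monday as 0
UPDATE_HOUR = 2


def next_update(now):
    """Return the first scheduled update strictly after `now`."""
    days_ahead = (UPDATE_WEEKDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=UPDATE_HOUR, minute=0, second=0, microsecond=0
    )
    # On a Sunday at or after 02:00, the next run is one week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


def is_stale(downloaded_at, now):
    """True if at least one scheduled update has run since `downloaded_at`."""
    return next_update(downloaded_at) <= now


if __name__ == "__main__":
    print(next_update(datetime(2024, 1, 3, 12, 0)))  # a Wednesday -> 2024-01-07 02:00:00
```

A mirror script could call `is_stale(last_download, datetime.now())` before fetching, and skip the download when nothing new has been published.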


Eclipse mailing lists

The Eclipse Mailing lists dump is an extract of all emails posted on the Eclipse mailing lists.

More information can be found on the official Eclipse page for mailing lists.

AERI Stacktraces

The AERI stacktraces dataset is a list of exceptions encountered by users of the Eclipse IDE, as collected by the AERI system. The Automated Error Reporting (AERI) system was developed by Code Trails and collects information about exceptions. It is installed by default in the Eclipse IDE and has helped hundreds of projects support their users better and fix bugs. This dataset is a dump of all records collected over a couple of years, with useful information about the exceptions and their environment.

The dataset was last updated on 2018-02-11.



More information about the AERI system can be found on the Code Trails website.

About Scava

Scava is the Eclipse spin-off of Crossminer, an EU-funded research project. More information can be found at the following places:



All code, unless otherwise stated, is published under the Eclipse Public License 2.0 (EPL-2.0).