Learning from Vulnerabilities Dataset

Dataset Supporting the ESORICS CyberICPS 2020 Workshop Paper "Learning From Vulnerabilities - Categorising, Understanding and Detecting Weaknesses in Industrial Control Systems" by Richard J. Thomas and Tom Chothia.


Referencing this Dataset

We encourage the use of our Dataset by the research community. If you do use it, we ask that you cite the Dataset and credit the University of Birmingham.

This Dataset is licensed under the Creative Commons Attribution 4.0 International license.

The citation and BibTeX entry are given below.

R.J. Thomas and T. Chothia. (2020) "Learning from Vulnerabilities - Categorising, Understanding and Detecting Weaknesses in Industrial Control Systems" in: Katsikas S. et al. (eds) Computer Security. CyberICPS 2020. Lecture Notes in Computer Science. Springer, Cham.
@InProceedings{uob-esorics2020,
  author="Richard J. Thomas and Tom Chothia",
  title="Learning from Vulnerabilities - Categorising, Understanding and Detecting Weaknesses in Industrial Control Systems",
  booktitle="Computer Security",
  year="2020",
  publisher="Springer International Publishing",
  address="Cham"
}

The Dataset

Everything you need to know about this Dataset.

Dataset Information:

The 'Learning from Vulnerabilities' Dataset was curated by scraping CISA ICS-CERT Advisories, the NIST NVD CVE feeds, MITRE CVE exports and the MITRE CWE list. The workflow that imports the data held in these sources to form our Dataset is given in our paper.

This Dataset contains all ICS advisories between 2011 and March 2020. Some key statistics are given below:

Data Schema

The Dataset has been broken down into a set of tables for simple referencing and to provide 'single sources of truth'. The schema is given below, with a description of the fields contained in those tables. Each schema has matching SQL and CSV files. Where text is given in '[]', it refers to the corresponding Dataset file.

Dataset Releases

This Dataset is available as a set of SQL, CSV and JSON files for database servers and integration with other data analysis tools and software.
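As a rough illustration of pulling one of the CSV files into an analysis script, the sketch below uses only the Python standard library. The file naming convention (`<schema>.<format>`), the presence of a header row, and the `id` and `name` columns are assumptions made for this example and should be checked against the released files.

```python
import csv
import io


def dataset_filename(schema: str, fmt: str) -> str:
    """Build a file name for a schema; the '<schema>.<format>' pattern is assumed."""
    fmt = fmt.lower()
    if fmt not in {"sql", "csv", "json"}:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{schema}.{fmt}"


def read_csv_rows(handle):
    """Return every CSV row as a dict keyed by the header line."""
    return list(csv.DictReader(handle))


# In-memory stand-in for:
#   open(dataset_filename("esorics2020-icsa_vendors", "csv"), newline="")
# The column names and values here are hypothetical placeholders.
sample = io.StringIO("id,name\n1,Example Vendor A\n2,Example Vendor B\n")
rows = read_csv_rows(sample)

print(dataset_filename("esorics2020-icsa_vendors", "csv"))
print(rows[0]["name"])
```

Replacing the `io.StringIO` stand-in with a real file handle is the only change needed to read a downloaded file, provided the assumptions above hold.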

Base Tables

These tables define common data that do not change (e.g. Vendors, CWEs and mappings). These are used as foreign keys for the original, validation and full datasets.

Base tables supporting the rest of the dataset
Schema File                     SQL  CSV  JSON
esorics2020-icsa_vendors        SQL  CSV  JSON
esorics2020-icsa_cwe            SQL  CSV  JSON
esorics2020-cwe_groups          SQL  CSV  JSON
esorics2020-cwe_group_member    SQL  CSV  JSON
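To show what using a base table as a foreign key might look like in practice, here is a minimal sketch with an in-memory SQLite database. The table names are shortened, and the columns (`id`, `name`, `title`, `vendor_id`) and rows are hypothetical placeholders, not the Dataset's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Hypothetical, simplified tables standing in for the vendor base table
# and an advisory table that refers to it.
conn.executescript("""
CREATE TABLE icsa_vendors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE icsa_alert (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    vendor_id INTEGER NOT NULL REFERENCES icsa_vendors(id)
);
INSERT INTO icsa_vendors VALUES (1, 'Example Vendor A'), (2, 'Example Vendor B');
INSERT INTO icsa_alert VALUES
    (10, 'Placeholder advisory 1', 1),
    (11, 'Placeholder advisory 2', 1),
    (12, 'Placeholder advisory 3', 2);
""")


def advisories_per_vendor(connection):
    """Count advisories for each vendor by joining on the foreign key."""
    query = """
        SELECT v.name, COUNT(a.id)
        FROM icsa_vendors AS v
        LEFT JOIN icsa_alert AS a ON a.vendor_id = v.id
        GROUP BY v.id, v.name
        ORDER BY v.name
    """
    return connection.execute(query).fetchall()


counts = advisories_per_vendor(conn)
print(counts)
```

Keeping vendors in one table means a vendor name is stored once, which is the 'single source of truth' idea; the join recovers the name wherever an advisory refers to it.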

Original Dataset

The Dataset containing ICS advisories from 2011 to August 2019.

Original Dataset Files
Schema File                                       SQL  CSV  JSON
esorics2020-icsa_alert                            SQL  CSV  JSON
esorics2020-icsa_statistics                       SQL  CSV  JSON
esorics2020-icsa_statistics_paper_dev             SQL  CSV  JSON
esorics2020-extended_icsa_statistics_paper_dev    SQL  CSV  JSON

Validation Dataset

The Dataset of ICS Advisories and data from September 2019 to March 2020, which were used to validate the categorisation in our paper.

Validation Dataset Files
Schema File                                                  SQL  CSV  JSON
esorics2020-validation_icsa_vendors                          SQL  CSV  JSON
esorics2020-validation_icsa_alert                            SQL  CSV  JSON
esorics2020-validation_icsa_statistics_paper_dev             SQL  CSV  JSON
esorics2020-extended_validation_icsa_statistics_paper_dev    SQL  CSV  JSON

Full Dataset

The full dataset of ICS Advisories and data through to March 2020.

Full Dataset Files
Schema File                                     SQL  CSV  JSON
esorics2020-merged_icsa_vendors                 SQL  CSV  JSON
esorics2020-merged_icsa_alert                   SQL  CSV  JSON
esorics2020-merged_icsa_statistics_paper_dev    SQL  CSV  JSON
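The date ranges suggest that the full dataset covers the original and validation periods together. If you want to check that reading against the files you downloaded, a sketch along the following lines may help. It assumes each advisory row carries some unique identifier; the identifiers used here are made up for illustration.

```python
def compare_merged(original_ids, validation_ids, merged_ids):
    """Compare the merged identifiers with the union of the two source sets."""
    original = set(original_ids)
    validation = set(validation_ids)
    expected = original | validation
    actual = set(merged_ids)
    return {
        "missing": sorted(expected - actual),        # in a source set, absent from merged
        "unexpected": sorted(actual - expected),     # in merged, absent from both sources
        "overlap": sorted(original & validation),    # present in both source sets
    }


# Hypothetical identifiers; real values would be read from the alert files.
report = compare_merged(
    original_ids=["A-1", "A-2"],
    validation_ids=["B-1"],
    merged_ids=["A-1", "A-2", "B-1"],
)
print(report)
```

Three empty lists would be consistent with the merged files being the union of the other two; anything else is worth inspecting before relying on that assumption.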

Have Questions?

If you have any questions, please feel free to get in touch with us. Our contact addresses are in the paper.