Chemical outlier dataset

2019-07-29T07:09:36Z (GMT) by Mario Lovric

The objects are numbered. The Y variable is the boiling point. The other features are structural features of the molecules. In the outlier column, outliers are marked with a value of 1.
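As a minimal sketch of how the outlier column might be used, the Python snippet below separates flagged rows from the rest. The file name `chemical_outliers.csv`, the column name `outlier`, and both helper functions are assumptions made for illustration, not taken from the dataset itself; adjust them to match the downloaded file.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def load_rows(path: str) -> List[Row]:
    """Read a CSV file into a list of dictionaries keyed by column header."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def split_by_outlier_flag(
    rows: Iterable[Row], flag_column: str = "outlier"
) -> Tuple[List[Row], List[Row]]:
    """Return (inliers, outliers); a row is an outlier when its flag equals 1.

    The flag column name is an assumption; pass the real header if it differs.
    """
    inliers: List[Row] = []
    outliers: List[Row] = []
    for row in rows:
        # The flag may be stored as "1" or "1.0", so compare numerically.
        if float(row[flag_column]) == 1.0:
            outliers.append(row)
        else:
            inliers.append(row)
    return inliers, outliers


# Hypothetical usage (file name assumed):
# rows = load_rows("chemical_outliers.csv")
# inliers, outliers = split_by_outlier_flag(rows)
# print(len(outliers) / len(rows))  # share of flagged rows
```

Keeping the flag column as a parameter avoids hard-coding a header that may be spelled differently in the actual file.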

The data is derived from a published chemical dataset on boiling point measurements [1] and from public data [2]. Features were generated with the RDKit Python library [3]. Known outliers (~5%) were then added to the dataset, chosen for their significant structural differences, i.e. polar versus non-polar molecules.

  1. Cherqaoui D., Villemin D. Use of a Neural Network to determine the Boiling Point of Alkanes. J Chem Soc Faraday Trans. 1994;90(1):97–102.
  2. https://pubchem.ncbi.nlm.nih.gov/
  3. RDKit: Open-source cheminformatics; http://www.rdkit.org