Chemical outlier dataset
The objects are numbered. The Y-variable is the boiling point. The other features are structural features of the molecules. In the outlier column, outliers are marked with a value of 1.
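As a minimal sketch of how the outlier column could be used, the following splits rows into inliers and outliers. The column names (id, boiling_point, n_carbons, outlier) and the three sample rows are made-up placeholders for illustration, not the actual headers or values of the dataset; only the convention that outliers carry a value of 1 comes from the description above.

```python
import csv
import io

# Made-up placeholder rows; the real file's column names and values may differ.
SAMPLE = """id,boiling_point,n_carbons,outlier
1,10.0,4,0
2,20.0,5,0
3,30.0,2,1
"""


def split_by_outlier_flag(text, flag_column="outlier"):
    """Return (inliers, outliers) as lists of row dicts.

    A row counts as an outlier when its flag column equals "1".
    """
    inliers, outliers = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row[flag_column].strip() == "1":
            outliers.append(row)
        else:
            inliers.append(row)
    return inliers, outliers


inliers, outliers = split_by_outlier_flag(SAMPLE)
print(len(inliers), "inliers,", len(outliers), "outliers")
```

For the real file, the same function could be applied to the file contents once the actual name of the outlier column is substituted for the assumed flag_column.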
The data are derived from a published chemical dataset of boiling point measurements and from public data. Features were generated with the RDKit Python library. Known outliers (~5%) were added to the dataset based on significant structural differences, i.e. polar versus non-polar molecules.
- Cherqaoui D., Villemin D. Use of a Neural Network to Determine the Boiling Point of Alkanes. J. Chem. Soc., Faraday Trans. 1994;90(1):97–102.
- RDKit: Open-source cheminformatics; http://www.rdkit.org