Dataset with manually validated version histories of Stack Overflow posts
We used this dataset to evaluate different string similarity metrics for SOTorrent (http://sotorrent.org/).
The dataset was created with this tool: https://github.com/sotorrent/so-posthistory-gt
The dataset was used in this project: https://github.com/sotorrent/metrics-comparison
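
For readers unfamiliar with string similarity metrics, the following is a minimal sketch of one such metric: a token-based Jaccard similarity between two versions of a text block. It is an illustration only; the function names and sample strings are hypothetical, and it is not taken from the linked repositories or claimed to be one of the metrics evaluated there.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a, b):
    """Token-based Jaccard similarity: |A ∩ B| / |A ∪ B|, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        # Two empty texts are treated as identical (an assumption of this sketch).
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical example: two versions of the same text block.
old_version = "Use a for loop to iterate over the list."
new_version = "Use a for loop to iterate over the array."
print(jaccard_similarity(old_version, new_version))
```

A metric like this could be compared against manually validated links between post versions by checking whether the similarity score exceeds a chosen threshold for blocks that are known to be predecessors and successors of each other.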