Forró em Vinil Dataset
In the paper "Measuring disruption in song similarity networks", Falcão et al. measured disruption in a collection of songs from Forró, a Brazilian music tradition.
This dataset contains all the audio information used in that analysis. A song similarity network was built from 27,352 audio files, and the disruption information was derived from that network.
list_songs.txt and list_features.txt contain, respectively, an indexed list of all the audio files analysed and their MFCC-based feature vectors. The two files are aligned by index: the i-th line of list_features.txt holds the feature vector of the i-th song listed in list_songs.txt.
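The index-based mapping between the two files can be sketched in Python as follows. This is a minimal illustration, not part of the dataset: the helper name pair_by_index is hypothetical, and the code assumes only that each file holds one entry per line. It leaves each feature line as raw text because the exact layout of a feature line (separator, number of coefficients) is not described here.

```python
def pair_by_index(song_lines, feature_lines):
    """Pair the i-th song entry with the i-th feature line.

    Both arguments are iterables of text lines, for example open file
    objects. Assumption: one entry per line in each file.
    """
    songs = [line.rstrip("\n") for line in song_lines]
    features = [line.rstrip("\n") for line in feature_lines]
    # The mapping is purely positional, so a length mismatch means
    # the two inputs cannot be aligned reliably.
    if len(songs) != len(features):
        raise ValueError(
            f"{len(songs)} songs but {len(features)} feature lines"
        )
    return list(zip(songs, features))


# Possible usage with the dataset files (paths assumed to be local):
# with open("list_songs.txt", encoding="utf-8") as s, \
#         open("list_features.txt", encoding="utf-8") as f:
#     pairs = pair_by_index(s, f)
```

Raising an error on a length mismatch is a deliberate choice in this sketch: silently truncating to the shorter file would hide a misaligned download.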
Similiarity Network.txt contains the similarity network built from the similarities calculated between the feature vectors. The network is stored in GEXF (Graph Exchange XML Format) and can be visualized with software such as Gephi.
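Because GEXF is plain XML, the network can also be read without a graph tool. The sketch below uses only the Python standard library and relies on the standard GEXF layout of node elements (id, label) and edge elements (source, target, optional weight). The function name parse_gexf and the tiny sample document are illustrative assumptions, not taken from the dataset, and the code ignores the XML namespace because the GEXF version used by the file is not stated here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def parse_gexf(xml_data):
    """Return (nodes, edges) from GEXF content given as str or bytes.

    nodes maps node id -> label; edges is a list of
    (source id, target id, weight) tuples. A missing weight
    attribute is treated as 1.0.
    """
    root = ET.fromstring(xml_data)
    nodes = {}
    edges = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "node":
            nodes[elem.get("id")] = elem.get("label")
        elif name == "edge":
            weight = float(elem.get("weight", 1.0))
            edges.append((elem.get("source"), elem.get("target"), weight))
    return nodes, edges


# Tiny hand-written sample, for illustration only.
sample = (
    '<gexf xmlns="http://www.gexf.net/1.2draft"><graph>'
    '<nodes><node id="0" label="a"/><node id="1" label="b"/></nodes>'
    '<edges><edge id="0" source="0" target="1" weight="0.5"/></edges>'
    '</graph></gexf>'
)
nodes, edges = parse_gexf(sample)

# Possible usage with the dataset file (path assumed to be local):
# with open("Similiarity Network.txt", "rb") as fh:
#     nodes, edges = parse_gexf(fh.read())
```

For a network of this size, a dedicated graph library may be more convenient. The standard-library version is shown only to make the file structure explicit.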
Disruption Ranking.txt summarizes the Disruption Indexes of all songs, in ascending order.