Feeling that it was somehow meant to be, I decided to use some simple audio processing, machine learning and visualization techniques to analyze my music collection. Of course, I know nothing about music theory, so the analysis involves no music theory at all; it is purely for fun.
Install Python, add it to the PATH environment variable, and use pip to install the required modules;
Take the file provided in the relevant files and add it to the environment variables as well, for example:
Let's start tinkering
For convenience, all music files are converted to .wav format before being analyzed.
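The post doesn't say which tool did the conversion. Assuming the ffmpeg command-line tool is installed and on the PATH, a batch conversion could be sketched like this (the helper names and the list of input extensions are hypothetical):

```python
import subprocess
from pathlib import Path


def wav_command(src, out_dir):
    """Build an ffmpeg command converting one audio file to .wav (hypothetical helper)."""
    src = Path(src)
    dst = Path(out_dir) / (src.stem + ".wav")
    # -y overwrites an existing output file; -i names the input file.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_all(src_dir, out_dir, patterns=("*.mp3", "*.flac", "*.m4a")):
    """Convert every matching file in src_dir to .wav inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pattern in patterns:
        for src in sorted(Path(src_dir).glob(pattern)):
            # check=True raises CalledProcessError if ffmpeg fails on a file.
            subprocess.run(wav_command(src, out), check=True)
```

For example, `convert_all("music", "wav")` would write `wav/<name>.wav` for each source file.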
Start with the simplest!
Let's take a look at the sound waveforms of different singers first:
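To plot a waveform, the .wav file first has to be read into an array. A minimal sketch using only the standard-library `wave` module plus NumPy, assuming 16-bit PCM files (the helper name `load_wav_mono` is hypothetical); stereo files are averaged down to one channel:

```python
import wave

import numpy as np


def load_wav_mono(path):
    """Read a 16-bit PCM .wav file; return (samples scaled to [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        frames = wf.readframes(wf.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    # Little-endian signed 16-bit integers, scaled to floats.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float64) / 32768.0
    if n_channels > 1:
        # Channels are interleaved frame by frame; average them to get mono.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sample_rate
```

The returned array can be handed straight to a plotting library such as matplotlib.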
The full waveforms look far too cluttered, probably because there is simply too much data. So I changed strategy and plotted only the first 10 seconds of each song for comparison. After all, a good start is half the battle, right?
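Keeping only the opening 10 seconds is just a slice of `seconds * sample_rate` samples. A small sketch (both helper names are hypothetical), with a matching time axis in seconds for the plot:

```python
import numpy as np


def first_seconds(samples, sample_rate, seconds=10.0):
    """Return only the opening `seconds` of a waveform (all of it if the song is shorter)."""
    n = int(round(seconds * sample_rate))
    return np.asarray(samples)[:n]


def time_axis(n_samples, sample_rate):
    """Time in seconds of each sample, for the x-axis of a waveform plot."""
    return np.arange(n_samples) / sample_rate
```

With matplotlib, each clip could then be drawn with `plt.plot(time_axis(len(clip), rate), clip)`.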
It's still somewhat interesting, but I can't make much of it: the waveforms of songs by the same singer seem to differ from one another about as much as those of songs by different singers.
Then again, there is no rule that songs by the same singer must have very similar waveforms, or that songs by different singers must have very different ones.
Well, it's a bit confusing; let's not take it too seriously. So let's try extracting some features from the songs instead.
The song features we intend to extract are:
(1) Statistical moments of the song waveform, namely the mean, standard deviation, skewness and kurtosis. We obtain these features at different time scales by smoothing the waveform with windows of increasing length (1, 10, 100 and 1000 samples);
(2) To capture short-term changes in the signal, we also compute the same statistical moments (mean, standard deviation, skewness and kurtosis) of the amplitude of the waveform's first-order difference, again at different time scales using the same smoothing windows;
(3) Finally, we compute frequency-domain features of the waveform. Here we only compute the fraction of the song's energy in each of 10 equal frequency bands. Running the fast Fourier transform directly on the raw waveform data would be too expensive, so the data is first passed through a smoothing window of length 5 before the FFT.
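The three feature groups above can be sketched in NumPy as follows. This is a minimal sketch under several assumptions, not the post's own code: the input is a mono NumPy array; each scale applies a moving average to the unsmoothed signal (the post's "incremental smoothing" may instead smooth successively); kurtosis is the plain fourth standardized moment rather than excess kurtosis; the difference features use the signed difference `np.diff(x)` (taking `np.abs` of it is another reading of "amplitude"); and the length-5 smoothing before the FFT is read as averaging non-overlapping blocks of 5 samples, which is what actually shrinks the FFT fivefold. All function names are hypothetical.

```python
import numpy as np


def moments(x):
    """Mean, standard deviation, skewness and (non-excess) kurtosis of a 1-D array."""
    x = np.asarray(x, dtype=np.float64)
    mean, std = x.mean(), x.std()
    if std == 0:
        return [mean, 0.0, 0.0, 0.0]
    z = (x - mean) / std
    return [mean, std, np.mean(z ** 3), np.mean(z ** 4)]


def moving_average(x, window):
    """Moving average over `window` samples via a cumulative sum (O(n) for any window)."""
    x = np.asarray(x, dtype=np.float64)
    if window <= 1:
        return x
    c = np.cumsum(np.concatenate(([0.0], x)))
    return (c[window:] - c[:-window]) / window


def band_energy_ratios(x, n_bands=10, block=5):
    """Fraction of spectral energy in each of `n_bands` equal-width frequency bands."""
    x = np.asarray(x, dtype=np.float64)
    n = len(x) // block * block
    # Average non-overlapping blocks of `block` samples (assumed reading of "smoothing").
    x = x[:n].reshape(-1, block).mean(axis=1)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power.sum()
    if total == 0:
        return [0.0] * n_bands
    return [band.sum() / total for band in np.array_split(power, n_bands)]


def song_features(x, windows=(1, 10, 100, 1000)):
    """16 waveform moments + 16 difference moments + 10 band ratios = 42 features."""
    x = np.asarray(x, dtype=np.float64)
    diff = np.diff(x)
    features = []
    for w in windows:
        features.extend(moments(moving_average(x, w)))
    for w in windows:
        features.extend(moments(moving_average(diff, w)))
    features.extend(band_energy_ratios(x))
    return features
```

The cumulative-sum moving average costs the same for a window of 1000 as for a window of 10, whereas `np.convolve` would scale with the window length, which matters for songs with millions of samples.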
In total, this gives 42 feature values per song (16 + 16 + 10). Let's try using them to cluster the 43 songs I've downloaded over the past few days.
First, to make the results easy to visualize, we use PCA to reduce the data from 42-dimensional features to 2-dimensional features. For convenience, we simply call the library implementation (s…) directly; the printed results are as follows:
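The post calls a library for this step. To show what PCA does underneath, here is a from-scratch NumPy sketch (the name `pca_2d` is hypothetical): center each feature, take the SVD, and project onto the two directions with the largest singular values.

```python
import numpy as np


def pca_2d(features):
    """Project an (n_songs, n_features) matrix onto its first two principal components."""
    X = np.asarray(features, dtype=np.float64)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    projected = X_centered @ Vt[:2].T
    # Share of total variance captured by each of the two kept components.
    explained = S ** 2 / np.sum(S ** 2)
    return projected, explained[:2]
```

scikit-learn's `PCA(n_components=2).fit_transform(X)` should give the same projection up to the sign of each component.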
OK, now we can cluster the dimension-reduced data. This time we implement the k-means clustering algorithm ourselves instead of simply calling a library. The final clustering result (k=4) is shown in the figure below:
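The post's own k-means code isn't reproduced here, so this is one possible from-scratch version: standard Lloyd iterations, with the initialisation (k distinct random data points) and stopping rule (centroids stop moving) chosen as assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

On the 2-D PCA output this would be called as `labels, centroids = kmeans(projected, k=4)`. Because the result depends on the random initial centroids, running it with a few different seeds and keeping the tightest clustering is a common refinement.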
Next, we normalize each of the 42 features, then repeat the PCA and clustering above with k=3. The final clustering result is shown in the image below:
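The post doesn't say which normalization it used. Assuming z-score standardization (each feature rescaled to zero mean and unit variance across all songs), the step is short; the function name is hypothetical:

```python
import numpy as np


def standardize(features):
    """Z-score each column of an (n_songs, n_features) matrix."""
    X = np.asarray(features, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns would divide by zero; map them to 0 instead.
    std[std == 0] = 1.0
    return (X - mean) / std
```

Without this step, features with large numeric ranges dominate both the PCA directions and the Euclidean distances used by k-means; after it, every feature contributes on an equal footing.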
Emmm, it looks like it's worse.
But I realized that I've loved the song "Tail Ring" for 8 years!
It's still great, haha
Of course, there is a problem here: the 42 features were chosen by hand, so they may not characterize the songs very well, and they are correlated with one another (their correlations are not zero), which means some of the features are redundant.
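One quick way to check for this redundancy is to compute Pearson correlations between every pair of feature columns and list the strongly correlated pairs. A sketch; the 0.9 threshold and the function name are arbitrary, hypothetical choices:

```python
import numpy as np


def redundant_pairs(features, threshold=0.9):
    """Return (i, j, r) for feature columns i < j whose |correlation| >= threshold."""
    X = np.asarray(features, dtype=np.float64)
    # rowvar=False: columns are variables (features), rows are observations (songs).
    corr = np.corrcoef(X, rowvar=False)
    n = corr.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs
```

A constant feature column yields NaN correlations, which never pass the threshold test, so it is simply not reported.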
The article at https://www.christianpeccei.com/musicmap/ uses a genetic algorithm to select 18 of the 42 features as the final feature vector for each song. Its results are as follows:
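The linked article's exact genetic algorithm and fitness function are not described here, so the following is only a generic sketch of GA-based feature selection under assumed choices. Each individual is a set of exactly 18 feature indices. The better half of each generation survives (elitism). Crossover samples 18 indices from the union of two parents, and mutation swaps one selected index for an unselected one. The `fitness` callable is a hypothetical placeholder; in practice it might score clustering quality using only the selected columns.

```python
import numpy as np


def ga_select(n_features, n_select, fitness, pop_size=40, generations=100,
              mutation_rate=0.2, seed=0):
    """Search for a subset of `n_select` feature indices that maximizes `fitness`."""
    rng = np.random.default_rng(seed)

    def random_individual():
        return np.sort(rng.choice(n_features, size=n_select, replace=False))

    def crossover(a, b):
        pool = np.union1d(a, b)  # features present in either parent
        return np.sort(rng.choice(pool, size=n_select, replace=False))

    def mutate(ind):
        ind = ind.copy()
        if rng.random() < mutation_rate:
            outside = np.setdiff1d(np.arange(n_features), ind)
            if len(outside) > 0:
                # Replace one selected feature with one not currently selected.
                ind[rng.integers(n_select)] = rng.choice(outside)
        return np.sort(ind)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]
        elite = [population[i] for i in order[: pop_size // 2]]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            children.append(mutate(crossover(elite[i], elite[j])))
        population = elite + children
    scores = [fitness(ind) for ind in population]
    return population[int(np.argmax(scores))]
```

Because the elite always survives, the best score never decreases from one generation to the next.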
I was too lazy to reproduce that, so I simply re-clustered using his conclusions, with the following results (k=3):
Emmm, it's six of one, half a dozen of the other.
Well, consider it a way to learn the basics of audio processing, machine learning, and visualization.
All source code and materials are available in the relevant files.