
1D CNN Architectures for Music Genre Classification



Abstract:

This paper proposes a 1D residual convolutional neural network (CNN) architecture for music genre classification and compares it with other recent 1D CNN architectures. The 1D CNNs learn a representation and a discriminant directly from the raw audio signal. Several convolutional layers capture the time-frequency characteristics of the audio signal and learn various filters relevant to the music genre recognition task. The proposed approach splits the audio signal into overlapped segments using a sliding window to comply with the fixed-length input constraint of the 1D CNNs. As a result, music genre classification can be carried out on a single audio segment or by aggregating the predictions over several audio segments, which improves the final accuracy. The performance of the proposed 1D residual CNN is assessed on a public dataset of 1,000 audio clips. The experimental results have shown that it achieves a mean accuracy of 80.93% in classifying music genres and outperforms other 1D CNN architectures.
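As a rough illustration of the segmentation and aggregation strategy described above, the Python sketch below slices a raw waveform into fixed-length, overlapping segments with a sliding window and then averages the per-segment class probabilities to obtain a clip-level prediction. The segment length, hop size, sampling rate, and number of classes used in the example are illustrative assumptions, not values taken from the paper.

import numpy as np

def split_into_segments(waveform, segment_len, hop):
    # Slice a 1D raw audio waveform into fixed-length, overlapping segments.
    # segment_len and hop are in samples; overlap = segment_len - hop.
    starts = range(0, max(len(waveform) - segment_len + 1, 1), hop)
    segments = [waveform[s:s + segment_len] for s in starts]
    # Zero-pad the last segment if the clip is shorter than segment_len.
    segments = [np.pad(seg, (0, segment_len - len(seg))) for seg in segments]
    return np.stack(segments)

def aggregate_predictions(per_segment_probs):
    # per_segment_probs: array of shape (num_segments, num_classes) holding
    # the softmax output of the 1D CNN for each segment of one clip.
    # Averaging the probabilities and taking the argmax gives the clip label.
    return int(np.argmax(per_segment_probs.mean(axis=0)))

# Example with assumed values: a 30 s clip at 22,050 Hz, ~1.5 s segments, 50% overlap.
clip = np.random.randn(30 * 22050).astype(np.float32)
segments = split_into_segments(clip, segment_len=33075, hop=16537)
probs = np.random.rand(len(segments), 10)  # stand-in for per-segment CNN outputs
probs /= probs.sum(axis=1, keepdims=True)
print(segments.shape, aggregate_predictions(probs))

Averaging probabilities is only one plausible aggregation rule; majority voting over the per-segment labels is an equally simple alternative, and this excerpt does not state which rule the authors adopt.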
Date of Conference: 05-07 December 2021
Date Added to IEEE Xplore: 24 January 2022
Conference Location: Orlando, FL, USA

I. Introduction

In the last decade, music has become increasingly accessible to everyone across the globe. The exponential growth of musical data has prompted the development of new tools that are more personalized than radio broadcasts, for example. In addition to allowing consumers to listen to their favorite songs, these tools also allow them to discover music from different backgrounds. To this end, several approaches to music information retrieval (MIR) have been developed [1]. MIR focuses on researching and developing computational systems to advance the access, organization, and understanding of music information.

References
[1] T. Lidy, C. N. Silla, O. Cornelis, F. Gouyon, A. Rauber, C. A. A. Kaestner, and A. L. Koerich, "On the suitability of state-of-the-art music information retrieval methods for analyzing, categorizing and accessing non-western and ethnic music collections," Signal Processing, vol. 90, no. 4, pp. 1032–1048, 2010.
[2] G. Tzanetakis and P. R. Cook, "Musical genre classification of audio signals," IEEE Trans Speech Audio Process, vol. 10, no. 5, pp. 293–302, 2002.
[3] C. H. L. Costa, J. D. Valle, and A. L. Koerich, "Automatic classification of audio data," in IEEE Intl Conf Syst, Man, Cybernetics, 2004, pp. 562–567.
[4] D. Turnbull and C. Elkan, "Fast recognition of musical genres using RBF networks," IEEE Trans Knowl Data Eng, vol. 17, no. 4, pp. 580–584, 2005.
[5] A. L. Koerich and C. Poitevin, "Combination of homogeneous classifiers for musical genre classification," in IEEE Intl Conf Syst, Man, Cybernetics, vol. 1, 2005, pp. 554–559.
[6] N. Scaringella, G. Zoia, and D. Mlynek, "Automatic genre classification of music content: a survey," IEEE Signal Process. Mag., vol. 23, no. 2, pp. 133–141, 2006.
[7] C. N. Silla, A. L. Koerich, and C. A. A. Kaestner, "A machine learning approach to automatic music genre classification," Journal of the Brazilian Computer Society, vol. 14, no. 3, pp. 7–18, 2008.
[8] S. Dieleman and B. Schrauwen, "End-to-end learning for music audio," in IEEE Intl Conf on Acoustics, Speech and Signal Process, 2014, pp. 6964–6968.
[9] K. Choi, G. Fazekas, M. Sandler, and K. Cho, "Convolutional recurrent neural networks for music classification," in IEEE Intl Conf on Acoustics, Speech and Signal Process, 2017, pp. 2392–2396.
[10] Y. M. Costa, L. S. Oliveira, and C. N. Silla, "An evaluation of convolutional neural networks for music classification using spectrograms," Applied Soft Computing, vol. 52, pp. 28–38, 2017.
[11] J. Pons, O. Nieto, M. Prockup, E. Schmidt, A. Ehmann, and X. Serra, "End-to-end learning for music audio tagging at scale," in Intl Society for Music Inf Retrieval Conf, 2018, pp. 1–8.
[12] J. Lee, J. Park, K. L. Kim, and J. Nam, "SampleCNN: End-to-end deep convolutional neural networks using very small filters for music classification," Applied Sciences, vol. 8, no. 1, 2018.
[13] T. Kim, J. Lee, and J. Nam, "Comparison and analysis of SampleCNN architectures for audio classification," IEEE Journal of Selected Topics in Signal Processing, vol. PP, no. 8, pp. 1–1, 2019.
[14] K. M. Koerich, M. Esmaeilpour, S. Abdoli, A. S. Britto Jr., and A. L. Koerich, "Cross-representation transferability of adversarial attacks: From spectrograms to audio waveforms," in Intl Joint Conf on Neural Networks, 2020, pp. 1–7.
[15] S. Abdoli, P. Cardinal, and A. L. Koerich, "End-to-end environmental sound classification using a 1D convolutional neural network," Expert Systems with Applications, vol. 136, pp. 252–263, 2019.
[16] Y. M. G. Costa, L. E. S. Oliveira, A. L. Koerich, and F. Gouyon, "Music genre recognition based on visual features with dynamic ensemble of classifiers selection," in 20th Intl Conf Syst, Signals, Image Proc, 2013, pp. 55–58.
[17] T. Kim, J. Lee, and J. Nam, "Sample-level CNN architectures for music auto-tagging using raw waveforms," in IEEE Intl Conf on Acoustics, Speech and Signal Process, 2018, pp. 366–370.
[18] Y. M. G. Costa, L. E. S. Oliveira, A. L. Koerich, F. Gouyon, and J. G. Martins, "Music genre classification using LBP textural features," Signal Processing, vol. 92, no. 11, pp. 2723–2737, 2012.
[19] A. L. Koerich, "Improving the reliability of music genre classification using rejection and verification," in Intl Society for Music Inf Retrieval Conf, 2013, pp. 511–516.
[20] M. Esmaeilpour, P. Cardinal, and A. Koerich, "A robust approach for securing audio classification against adversarial attacks," IEEE Trans Inf Forensics Security, vol. 15, pp. 2147–2159, 2020.
[21] M. Esmaeilpour, P. Cardinal, and A. L. Koerich, "From sound representation to model robustness," arXiv preprint arXiv:2007.13703, pp. 1–12, 2021.
[22] T. N. Sainath, R. J. Weiss, A. W. Senior, K. W. Wilson, and O. Vinyals, "Learning the speech front-end with raw waveform CLDNNs," in 16th Annual Conf of the Intl Speech Communication Assoc, 2015, pp. 1–5.
[23] R. Collobert, C. Puhrsch, and G. Synnaeve, "Wav2letter: an end-to-end convnet-based speech recognition system," CoRR, vol. abs/1609.03193, 2016.
[24] Z. Zhu, J. H. Engel, and A. Y. Hannun, "Learning multiscale features directly from waveforms," in 17th Annual Conf Intl Speech Communication Assoc, N. Morgan, Ed., 2016, pp. 1305–1309.
[25] J. Thickstun, Z. Harchaoui, and S. M. Kakade, "Learning features of music from scratch," in 5th Intl Conf on Learning Repres, 2017.
[26] C. N. Silla Jr., A. L. Koerich, and C. A. A. Kaestner, "The Latin Music Database," in Intl Society for Music Inf Retrieval Conf, Philadelphia, USA, 2008, pp. 451–456.
[27] G. Song, Z. Wang, F. Han, and S. Ding, "Transfer learning for music genre classification," in IFIP Adv in Inf and Communication Technology, vol. 510, 2017, pp. 183–190.
[28] S. Oramas, O. Nieto, F. Barbieri, and X. Serra, "Multi-label music genre classification from audio, text and images using deep features," in 18th Intl Society for Music Inf Retrieval Conf, 2017, pp. 23–30.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conf on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[30] Intl Telecommunications Union, Algorithms to measure audio programme loudness and true-peak audio level, Recommendation ITU-R BS.1770-4, Geneva, Switzerland, 2015.
