
MFCC based hybrid fingerprinting method for audio classification through LSTM | ||
International Journal of Nonlinear Analysis and Applications | ||
Volume 12, Special Issue, Esfand 2021, Pages 2125-2136; full text (988.1 K) | ||
Article type: Research Paper | ||
DOI: 10.22075/ijnaa.2022.6049 | ||
Authors | ||
K. Banuroopa; D. Shanmuga Priyaa* | ||
Department of Computer Science, Karpagam Academy of Higher Education, Coimbatore, India | ||
Received: 11 Aban 1400; Revised: 08 Azar 1400; Accepted: 16 Azar 1400 (Solar Hijri calendar) | ||
Abstract | ||
In this paper, a novel audio fingerprinting methodology for audio classification is proposed. The fingerprint of an audio signal is a unique digest that identifies the signal. The proposed model uses audio fingerprinting to create a unique fingerprint for each audio file. The fingerprints are created by extracting an MFCC spectrum, taking the mean of the spectra, and converting the resulting spectrum into a binary image. These images are then fed to an LSTM network to classify the environmental sounds in the UrbanSound8K dataset; the model achieves an accuracy of 98.8% across all 10 folds of the dataset. | ||
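The fingerprinting step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say exactly how the mean is used, so the sketch assumes each MFCC coefficient is thresholded against its own mean over time. The function names `binary_fingerprint` and `to_lstm_sequence`, the 40 x 173 matrix size, and the random stand-in data are all hypothetical; in practice the MFCC matrix might come from a routine such as `librosa.feature.mfcc`.

```python
import numpy as np


def binary_fingerprint(mfcc: np.ndarray) -> np.ndarray:
    """Turn an MFCC matrix of shape (n_mfcc, n_frames) into a binary image.

    Assumption (not stated in the abstract): a cell becomes 1 when the
    coefficient exceeds that coefficient's mean over all frames, else 0.
    """
    if mfcc.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_mfcc, n_frames)")
    mean_per_coeff = mfcc.mean(axis=1, keepdims=True)  # shape (n_mfcc, 1)
    return (mfcc > mean_per_coeff).astype(np.uint8)


def to_lstm_sequence(fingerprint: np.ndarray) -> np.ndarray:
    """Reorder the binary image to (n_frames, n_mfcc), one row per time step,
    which is the layout a recurrent layer typically consumes."""
    return fingerprint.T.astype(np.float32)


# Stand-in data: a random matrix in place of real MFCCs (hypothetical sizes).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(40, 173))

fingerprint = binary_fingerprint(mfcc)
sequence = to_lstm_sequence(fingerprint)
print(fingerprint.shape, sequence.shape)
```

Each clip then yields one sequence of binary vectors, and a batch of such sequences, paired with the ten UrbanSound8K class labels, would be what the LSTM classifier is trained on.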
Keywords | ||
Audio fingerprinting; MFCC; Audio Classification; LSTM | ||