Wavelet packet transform and multilayer perceptron to identify voices with a mild degree of vocal deviation



Introduction. Laryngeal disorders are characterized by a change in the vibratory pattern of the vocal folds. These disorders may have an organic origin, marked by anatomical modification of the folds, or a functional origin, caused by vocal abuse or misuse. The most common diagnostic methods rely on invasive imaging procedures that cause patient discomfort. In addition, mild voice deviations do not prevent individuals from using their voices, which makes the problem difficult to identify and increases the possibility of complications.

Aim. The goal of the present paper was therefore to develop a noninvasive alternative for identifying voices with a mild degree of vocal deviation by applying the Wavelet Packet Transform (WPT) and the Multilayer Perceptron (MLP), a type of Artificial Neural Network (ANN).

Methods. A dataset of 74 audio files was used. Shannon energy and entropy measures were extracted using the Daubechies 2 and Symlet 2 families, and the processing step was then performed with the MLP ANN.
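
As an illustration of the feature-extraction step, the sketch below computes per-node energy and Shannon entropy from a wavelet packet decomposition using the 4-tap Daubechies 2 filter. It is a minimal sketch under stated assumptions, not the authors' implementation: the periodic boundary handling, the decomposition depth of 3, the normalized form of the Shannon entropy, and all function names are choices made here for illustration.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Daubechies 2 (4-tap) orthonormal scaling (low-pass) filter.
LOW = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
       (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching high-pass filter via the quadrature mirror relation.
HIGH = [((-1) ** k) * LOW[len(LOW) - 1 - k] for k in range(len(LOW))]


def analyze(signal, taps):
    """Filter with periodic wrap-around, then keep every second sample.

    Assumption: periodic extension; other boundary modes are possible.
    """
    n = len(signal)
    return [sum(taps[k] * signal[(2 * i + k) % n] for k in range(len(taps)))
            for i in range(n // 2)]


def wavelet_packet_leaves(signal, levels):
    """Split every node into low and high bands at each level (full tree)."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [band
                 for node in nodes
                 for band in (analyze(node, LOW), analyze(node, HIGH))]
    return nodes


def energy(coeffs):
    """Sum of squared coefficients of one node."""
    return sum(c * c for c in coeffs)


def shannon_entropy(coeffs):
    """Entropy of the normalized squared coefficients (assumed definition)."""
    total = energy(coeffs)
    if total == 0.0:
        return 0.0
    probs = [c * c / total for c in coeffs]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def wavelet_packet_features(signal, levels=3):
    """Return (energies, entropies), one value per leaf node."""
    leaves = wavelet_packet_leaves(signal, levels)
    return ([energy(leaf) for leaf in leaves],
            [shannon_entropy(leaf) for leaf in leaves])
```

Calling `wavelet_packet_features(frame, levels=3)` on a frame whose length is a multiple of 8 (and at least 32) yields eight energies and eight entropies, which could serve as the input vector of a classifier such as an MLP. Because the filter pair is orthonormal, the leaf energies sum to the energy of the input frame, which gives a quick sanity check on the decomposition.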

Results. The Symlet 2 family generalized more effectively, reaching 99.75% and 99.56% accuracy with the Shannon energy and entropy measures, respectively. The Daubechies 2 family obtained lower accuracy rates: 91.17% and 70.01%, respectively.

Conclusion. The combination of WPT and MLP achieved high accuracy in identifying voices with a mild degree of vocal deviation.




Mateus Morikawa
Danilo Hernane Spatti
María Eugenia Dajer



