English Accent Classification Using Deep Learning Techniques

Sarah Jassim Ahmed; Husam Ali Abdulmohsin

doi:10.30526/39.2.4277

Authors

Sarah Jassim Ahmed Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq. https://orcid.org/0009-0003-1582-5443
Husam Ali Abdulmohsin Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq. https://orcid.org/0009-0002-9351-4176

Keywords:

Accent classification, Deep learning, English accents, Speech recognition, Attention mechanism

Abstract

Recent advances in deep learning for speech applications have increased interest in accent classification. The growing demand for accurate speech recognition requires machines to identify accents reliably, which remains a critical challenge in speech processing. The variety of English accents poses significant difficulties for automatic speech recognition (ASR) systems, adversely affecting transcription accuracy and speaker intelligibility. This study addresses this challenge by developing a deep learning model capable of accurately classifying regional English dialects across the United Kingdom and Ireland. The proposed system combines a one-dimensional Convolutional Neural Network with a Gated Recurrent Unit (1D-CNN-GRU) and uses Mel-Frequency Cepstral Coefficients (MFCCs) as acoustic features. The UK and Ireland English Dialect (UIED) dataset, consisting of 17,877 recordings across six accent categories (Welsh, Northern, Southern, Scottish, Irish, and Midlands English), was used for evaluation. Experimental results indicate that the proposed model surpasses previous techniques, achieving an accuracy of 98.71%, which underscores its efficacy in capturing accent-specific temporal and spectral patterns. The findings support the development of accent-robust ASR systems and establish a basis for future research using transformer-based embeddings and prosodic features.
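
To make the data flow of a 1D-CNN-GRU classifier concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It assumes each recording has already been converted to a sequence of MFCC frames, and it uses untrained random weights. The layer sizes (13 coefficients, kernel width 3, 8 convolution channels, 16 hidden units), the omission of bias terms, and all function names are hypothetical choices made for illustration only.

```python
import math
import random

# The six accent categories named in the abstract.
ACCENTS = ["Welsh", "Northern", "Southern", "Scottish", "Irish", "Midlands"]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(frames, kernels, width):
    """Valid 1D convolution along the time axis, followed by ReLU.

    frames: list of T feature vectors (one per MFCC frame).
    kernels: one weight row per output channel, each of length
    width * n_features.
    """
    out = []
    for t in range(len(frames) - width + 1):
        window = [x for frame in frames[t:t + width] for x in frame]
        out.append([max(0.0, a) for a in matvec(kernels, window)])
    return out


def gru_step(x, h, p):
    """One GRU update (biases omitted; one common gating convention)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def classify(frames, params):
    """Convolution extracts local patterns; the GRU summarises them over time."""
    feats = conv1d_relu(frames, params["conv"], params["width"])
    h = [0.0] * len(params["gru"]["Uz"])
    for x in feats:
        h = gru_step(x, h, params["gru"])
    return softmax(matvec(params["out"], h))


def init_params(rng, n_mfcc=13, width=3, channels=8, hidden=16):
    """Hypothetical layer sizes, chosen only to keep the example small."""
    gru = {}
    for gate in "zrh":
        gru["W" + gate] = rand_matrix(rng, hidden, channels)
        gru["U" + gate] = rand_matrix(rng, hidden, hidden)
    return {
        "width": width,
        "conv": rand_matrix(rng, channels, width * n_mfcc),
        "gru": gru,
        "out": rand_matrix(rng, len(ACCENTS), hidden),
    }


rng = random.Random(0)
params = init_params(rng)
# Stand-in for real MFCCs: 50 frames of 13 random coefficients.
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(50)]
probs = classify(frames, params)
predicted = ACCENTS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label carries no meaning. The sketch only shows how a frame sequence is reduced to one probability per accent category. In practice a framework such as TensorFlow would supply trained convolution and GRU layers.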

Author Biographies

Sarah Jassim Ahmed, Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq.

Husam Ali Abdulmohsin, Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq.

Husam Al-Asadi currently works at the Department of Computer Science, University of Baghdad. Husam does research in Parallel Computing, Distributed Computing and Computer Communications (Networks). Their most recent publication is 'Hybrid soft computing approach for determining water quality indicator Euphrates River'.
