Citation Information : International Journal on Smart Sensing and Intelligent Systems. Volume 8, Issue 1, Pages 235-254, DOI: https://doi.org/10.21307/ijssis-2017-757
License : (CC BY-NC-ND 4.0)
Received Date : 05-November-2014 / Accepted: 12-January-2015 / Published Online: 01-March-2015
Every individual has a unique speaking style, and this variation influences their speech characteristics. A speaker's native dialect is one of the major factors shaping these characteristics, which in turn affect the performance of automatic speech recognition (ASR) systems. In this paper, we describe a method to identify Hindi dialects and examine the contribution of different acoustic-phonetic features to this task. Mel frequency cepstral coefficients (MFCC), perceptual linear prediction coefficients (PLP), and PLP derived from a Mel-scale filter bank (MFPLP) are extracted as spectral features from the spoken utterances. These features are then used to measure the capability of auto-associative neural networks (AANN) to capture the non-linear relations specific to the information carried by the spectral features. Prosodic features are used to capture long-range characteristics. Based on these features, we measure how efficiently AANN models the intrinsic characteristics that dialects impart to speech features.
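Both MFCC and MFPLP rely on a Mel-scale filter bank, so a small sketch of the Mel warping may help make the idea concrete. The conversion formula below is one widely used convention; the filter count (24) and the 0 to 8000 Hz band are illustrative assumptions and are not taken from the paper, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595*log10 convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale.

    The band [f_min_hz, f_max_hz] is split into n_filters + 1 equal steps
    in mels; the interior points are the filter centers.
    """
    m_lo = hz_to_mel(f_min_hz)
    m_hi = hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]


# Assumed, illustrative settings: 24 filters over 0-8000 Hz.
centers = mel_center_frequencies(n_filters=24, f_min_hz=0.0, f_max_hz=8000.0)
for i, fc in enumerate(centers, start=1):
    print(f"filter {i:2d}: center = {fc:8.1f} Hz")
```

Because the centers are equally spaced in mels, they sit close together at low frequencies and spread out at high frequencies, which is the perceptually motivated resolution that Mel-based spectral features are designed to provide.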