Citation Information : International Journal on Smart Sensing and Intelligent Systems. Volume 7, Issue 1, Pages 283-300, DOI: https://doi.org/10.21307/ijssis-2017-656
License : (CC BY-NC-ND 4.0)
Received Date : 05-November-2013 / Accepted: 08-February-2014 / Published Online: 27-December-2017
This paper proposes a new perceptual hashing algorithm for speech content identification in the compressed domain, based on MDCT (Modified Discrete Cosine Transform) spectrum entropy. It aims primarily to address the high computational complexity and poor real-time performance that arise when traditional identification methods are applied to compressed speech. The process begins by extracting the MDCT coefficients, which are intermediate decoding results of compressed speech in MP3 format. To reduce the computational complexity, these coefficients are divided into sub-bands and the MDCT spectrum energy of each sub-band is calculated. The sub-band energies are then mapped to a function analogous to a probability mass function in information entropy theory. This function serves as the perceptual feature from which binary hash values are extracted. Experimental results show that the proposed algorithm remains robust to content-preserving operations while also maintaining efficiency. Because only partial decoding is required, the real-time performance can meet the requirements of applications in real-time communication terminals.
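The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width sub-band split, the default of 8 sub-bands, the bit rule (compare the entropy of consecutive frames), and all function names are assumptions made for illustration. The MDCT coefficients are assumed to have already been obtained from a partial MP3 decoder and are passed in as plain lists of numbers, one list per frame.

```python
import math


def subband_energies(coeffs, n_bands):
    """Split one frame of MDCT coefficients into n_bands equal-width
    sub-bands (an assumed layout) and sum the squared coefficients in each.
    Coefficients left over after the last full sub-band are ignored."""
    width = len(coeffs) // n_bands
    if width == 0:
        raise ValueError("frame has fewer coefficients than sub-bands")
    return [
        sum(c * c for c in coeffs[b * width:(b + 1) * width])
        for b in range(n_bands)
    ]


def spectral_entropy(energies):
    """Normalize sub-band energies so they sum to 1 (a probability-mass-like
    function) and return the Shannon entropy in bits."""
    total = sum(energies)
    if total <= 0.0:
        return 0.0  # silent frame: treat entropy as zero
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def perceptual_hash(frames, n_bands=8):
    """Return one hash bit per pair of consecutive frames:
    1 if the entropy rises, 0 otherwise (assumed bit rule)."""
    entropies = [spectral_entropy(subband_energies(f, n_bands)) for f in frames]
    return [1 if nxt > cur else 0 for cur, nxt in zip(entropies, entropies[1:])]


def bit_error_rate(hash_a, hash_b):
    """Fraction of differing bits between two equal-length hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    if not hash_a:
        raise ValueError("hashes must not be empty")
    return sum(a != b for a, b in zip(hash_a, hash_b)) / len(hash_a)
```

In a sketch like this, two clips would be compared by computing `perceptual_hash` for each and checking whether `bit_error_rate` falls below a chosen threshold. The threshold itself would have to be set experimentally and is not specified here.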