Subband-Based Group Delay Segmentation of Spontaneous Speech into Syllable-Like Units

EURASIP Journal on Applied Signal Processing 2004:17, 2614-2625
2004 Hindawi Publishing Corporation

T. Nagarajan
Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600036, India
Email: raju@lantana.iitm.ernet.in

H. A. Murthy
Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai 600036, India
Email: hema@lantana.tenet.res.in

Received 16 January 2004; Revised 17 June 2004
Recommended for Publication by Chin-Hui Lee

In the development of a syllable-centric automatic speech recognition (ASR) system, segmenting the acoustic signal into syllabic units is an important stage. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it must be processed before those boundaries can be extracted. This paper presents a subband-based group delay approach to segmenting spontaneous speech into syllable-like units. The technique exploits the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection. By treating the STE function as the magnitude spectrum of an arbitrary signal, a minimum-phase group delay function is derived. This group delay function is found to be a better representation of the STE function for syllable boundary detection. However, although the group delay function derived from the STE function contains the segment boundaries, those boundaries are difficult to locate in the presence of long silences, semivowels, and fricatives.
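The abstract states the group delay idea only in outline. As a minimal sketch, under assumptions that are ours rather than the paper's, the steps can be written as follows: the STE is taken as the frame-wise sum of squared samples; the STE is inverted so that its valleys (candidate syllable boundaries) become peaks; the inverted curve is treated as a magnitude spectrum sampled on [0, π]; its ordinary log (real) cepstrum is computed and truncated to a causal, low-quefrency window, which is where the cepstrum's deconvolution property smooths the curve; and the minimum-phase group delay is evaluated as τ(ω) = Σ k ĉ[k] cos(ωk). The function names `short_term_energy`, `min_phase_group_delay`, and `boundary_frames`, the `lifter` window length, and the frame parameters are all hypothetical.

```python
import numpy as np


def short_term_energy(x, frame_len=160, hop=80):
    """Frame-wise sum of squared samples (one common STE definition)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])


def min_phase_group_delay(ste, lifter=None, eps=1e-10):
    """Minimum-phase group delay of a signal whose magnitude spectrum is 1/STE.

    Inverting the STE turns its valleys (candidate boundaries) into peaks.
    `lifter` is a hypothetical cepstral window length controlling smoothing.
    """
    # Treat 1/STE, sampled on [0, pi], as a magnitude spectrum |X(w)|.
    mag = 1.0 / (np.asarray(ste, dtype=float) + eps)
    # Real cepstrum of the even extension; length n = 2 * (len(ste) - 1).
    c = np.fft.irfft(np.log(mag))
    n = len(c)
    if lifter is None:
        lifter = max(n // 8, 2)
    lifter = min(lifter, n // 2)
    # Causal complex cepstrum of the minimum-phase signal, truncated to low
    # quefrencies (cepstral smoothing): c[0], then 2 * c[k] for 0 < k < lifter.
    c_hat = np.zeros(n)
    c_hat[0] = c[0]
    c_hat[1:lifter] = 2.0 * c[1:lifter]
    # tau(w) = sum_k k * c_hat[k] * cos(w k), i.e. the real part of DFT(k * c_hat).
    k = np.arange(n)
    return np.fft.rfft(k * c_hat).real  # one value per STE frame


def boundary_frames(tau):
    """Interior local maxima of the group delay: candidate syllable boundaries."""
    return [i for i in range(1, len(tau) - 1)
            if tau[i] > tau[i - 1] and tau[i] >= tau[i + 1]]
```

For an STE curve exp(-0.5 cos 3ω) sampled at 61 frames, the only interior energy valley is at frame 40, and `boundary_frames` applied to the group delay of the inverted curve returns that frame. Using the ordinary log cepstrum keeps the sketch short; a root cepstrum (raising the magnitude to a small power instead of taking its logarithm) is a possible alternative that is not shown here.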
In this paper, these issues are addressed specifically, and algorithms are developed to improve segmentation performance. The speech signal is first passed through a bank of three filters corresponding to three different spectral bands, and the STE function of each filtered signal is computed.
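A minimal sketch of this subband front end follows. It assumes ideal FFT-mask band-pass filters and hypothetical band edges (0 to 1 kHz, 1 to 2.5 kHz, and 2.5 to 4 kHz for 8 kHz audio); the excerpt states neither the filter design nor the band edges, so `band_split`, `subband_ste`, `BANDS_HZ`, and the frame parameters are illustrative names only.

```python
import numpy as np

# Hypothetical band edges in Hz for 8 kHz audio; not taken from the paper.
BANDS_HZ = [(0.0, 1000.0), (1000.0, 2500.0), (2500.0, 4000.0)]


def band_split(x, fs, bands=BANDS_HZ):
    """Split x into subband signals using ideal FFT-mask band-pass filters."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    subbands = []
    for lo, hi in bands:
        # Keep only the bins in [lo, hi) and zero everything else.
        mask = (freqs >= lo) & (freqs < hi)
        subbands.append(np.fft.irfft(spectrum * mask, n=len(x)))
    return subbands


def short_term_energy(x, frame_len=160, hop=80):
    """Frame-wise sum of squared samples."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])


def subband_ste(x, fs, bands=BANDS_HZ, frame_len=160, hop=80):
    """STE function of each subband signal, one array per band."""
    return [short_term_energy(b, frame_len, hop)
            for b in band_split(x, fs, bands)]
```

A practical system would more likely use FIR or IIR band-pass filters; the FFT mask simply keeps the sketch dependency-free and makes each band's content easy to check.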