In file ../include/sigpr/EST_sigpr_utt.h:
void srpd(EST_Wave &sig, EST_Track &fz,
Super resolution pitch tracker.
srpd is a pitch detection algorithm that produces a fundamental frequency contour from a speech waveform. At present only the super resolution pitch determination algorithm is implemented. See (Medan, Yair, and Chazan, 1991) and (Bagshaw et al., 1993) for a detailed description of the algorithm. </para><para>
Frames of data are read in from <parameter>sig</parameter> in chronological order such that each frame is shifted in time from its predecessor by <parameter>pda_frame_shift</parameter>. Each frame is analysed in turn.
</para><para>
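The frame stepping described above can be sketched as follows. This is only an illustration, not the library's code: the helper name `frame_starts` is hypothetical, and it assumes the frame length and <parameter>pda_frame_shift</parameter> have already been converted to whole numbers of samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the Speech Tools API): returns the start
// index, in samples, of every complete analysis frame, in chronological order.
// Assumes frame_length and frame_shift are already expressed in samples.
std::vector<std::size_t> frame_starts(std::size_t num_samples,
                                      std::size_t frame_length,
                                      std::size_t frame_shift)
{
    assert(frame_length > 0 && frame_shift > 0);
    std::vector<std::size_t> starts;
    // Each frame begins frame_shift samples after its predecessor; stop when
    // a full frame no longer fits in the signal.
    for (std::size_t s = 0; s + frame_length <= num_samples; s += frame_shift)
        starts.push_back(s);
    return starts;
}
```

For example, a 10-sample signal with a 4-sample frame and a 3-sample shift gives frames starting at samples 0, 3 and 6.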
The maximum and minimum signal amplitudes are first found over the duration of two segments, each N_min samples long. If the sum of their absolute values is below twice <parameter>noise_floor</parameter>, the frame is classified as silence and no coefficients are calculated. Otherwise, a cross-correlation coefficient is calculated for every n from the period in samples corresponding to <parameter>min_pitch</parameter> to the period in samples corresponding to <parameter>max_pitch</parameter>, in steps of <parameter>decimation_factor</parameter>. In calculating each coefficient, only one in every <parameter>decimation_factor</parameter> samples of the two segments is used. This down-sampling permits rapid estimates of the coefficients over the range N_min <= n <= N_max, and yields a cross-correlation track for the frame being analysed. </para><para>
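A minimal sketch of the silence test and the decimated cross-correlation track is given below. The function names are hypothetical and are not the library's internals. The sketch assumes the two segments are adjacent, covering samples [0, n) and [n, 2n) of the frame, and that the coefficient is the normalised cross-correlation of those segments; see the cited papers for the exact formulation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: silence test over two segments of n_min samples each.
// The frame is silent if |max| + |min| is below twice the noise floor.
bool is_silent(const std::vector<double>& frame, std::size_t n_min,
               double noise_floor)
{
    assert(n_min > 0 && frame.size() >= 2 * n_min);
    auto last = frame.begin() + static_cast<std::ptrdiff_t>(2 * n_min);
    auto mm = std::minmax_element(frame.begin(), last);  // {min, max}
    return std::fabs(*mm.first) + std::fabs(*mm.second) < 2.0 * noise_floor;
}

// Hypothetical helper: normalised cross-correlation between the assumed
// segments [0, n) and [n, 2n), using only every decimation-th sample.
double cross_corr(const std::vector<double>& frame, std::size_t n,
                  std::size_t decimation)
{
    assert(n > 0 && decimation >= 1 && frame.size() >= 2 * n);
    double xy = 0.0, xx = 0.0, yy = 0.0;
    for (std::size_t j = 0; j < n; j += decimation) {
        const double x = frame[j];
        const double y = frame[j + n];
        xy += x * y;
        xx += x * x;
        yy += y * y;
    }
    const double denom = std::sqrt(xx * yy);
    return denom > 0.0 ? xy / denom : 0.0;  // guard against all-zero segments
}

// Hypothetical helper: coefficient track for n = n_min, n_min + decimation,
// ... up to n_max. Element k of the result corresponds to the candidate
// period n_min + k * decimation.
std::vector<double> corr_track(const std::vector<double>& frame,
                               std::size_t n_min, std::size_t n_max,
                               std::size_t decimation)
{
    assert(n_min > 0 && n_min <= n_max && decimation >= 1);
    std::vector<double> track;
    for (std::size_t n = n_min; n <= n_max; n += decimation)
        track.push_back(cross_corr(frame, n, decimation));
    return track;
}
```

A caller would typically run `is_silent` first and build the track only for frames that pass it. For a signal that repeats exactly every 4 samples, `cross_corr` with n = 4 evaluates to 1.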
Local maxima of the track with a coefficient value above a specified threshold form candidates for the fundamental period. The threshold is adaptive and depends on the values of <parameter>v2uv_coef_thresh</parameter>, <parameter>min_v2uv_coef_thresh</parameter>, and <parameter>v2uv_coef_thresh_ratio</parameter>. If the previously analysed frame was classified as unvoiced or silent (which is the initial state), the threshold is set to <parameter>v2uv_coef_thresh</parameter>. Otherwise, the previous frame was classified as voiced, and the threshold is set to <parameter>v2uv_coef_thresh_ratio</parameter> times the cross-correlation coefficient value at the point of the previous fundamental period in the former coefficient track. This product is not permitted to drop below <parameter>min_v2uv_coef_thresh</parameter>.
</para><para>
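The adaptive threshold and the candidate search can be sketched as below. The struct and function names are hypothetical. The sketch assumes the lower bound on the voiced-frame threshold is supplied by <parameter>min_v2uv_coef_thresh</parameter>, and it ignores the two end points of the track when looking for local maxima, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the three threshold parameters named in the text.
struct ThresholdParams {
    double v2uv_coef_thresh;        // used after an unvoiced or silent frame
    double min_v2uv_coef_thresh;    // assumed lower bound after a voiced frame
    double v2uv_coef_thresh_ratio;  // scales the previous frame's peak value
};

// Threshold for the current frame, given the previous frame's classification
// and, if it was voiced, the coefficient at its fundamental period.
double candidate_threshold(bool prev_voiced, double prev_peak_coef,
                           const ThresholdParams& p)
{
    if (!prev_voiced)
        return p.v2uv_coef_thresh;
    return std::max(p.v2uv_coef_thresh_ratio * prev_peak_coef,
                    p.min_v2uv_coef_thresh);
}

// Indices of interior local maxima of the track that exceed the threshold.
// An empty result means the frame would be classified as unvoiced.
std::vector<std::size_t> find_candidates(const std::vector<double>& track,
                                         double threshold)
{
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < track.size(); ++i) {
        const bool is_peak = track[i] > track[i - 1] && track[i] >= track[i + 1];
        if (is_peak && track[i] > threshold)
            peaks.push_back(i);
    }
    return peaks;
}
```

With illustrative values of 0.88, 0.75 and 0.85 for the three parameters, a previous voiced peak of 0.8 gives 0.85 * 0.8 = 0.68, which is raised to the floor of 0.75.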
If no candidates for the fundamental period are found, the frame is classified as unvoiced. Otherwise, the candidates are processed further to identify the most likely true pitch period. During this additional processing, a threshold given by <parameter>anti_doubling_thresh</parameter> is used.
</para><para>
If the <parameter>peak_tracking</parameter> flag is set to true, biasing is applied to the cross-correlation track as described in (Bagshaw et al., 1993). </para><para>
This page is part of the
Edinburgh Speech Tools Library documentation
Copyright University of Edinburgh 1997
Contact:
speech_tools@cstr.ed.ac.uk