INTERSPEECH 2010 Tutorial Program

Discriminative Training - Fundamentals and Applications

  • Chin-Hui Lee (Georgia Institute of Technology)


Recently, discriminative training (DT) has attracted renewed attention in the speech and language processing communities. Without changing the structure or complexity of the models being used, DT can learn parametric representations that achieve better performance and greater robustness than models whose parameters are obtained with conventional training methods. When probability distributions are used to characterize these representations, discriminative training often amounts to learning decision boundaries rather than approximating density functions. Instead of estimating parameters separately to approximate each individual density, DT jointly estimates the parameters of all the competing distributions to meet the performance requirements of a specific problem setting.
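
The contrast between separate density estimation and joint discriminative estimation can be written out with MMI as one representative criterion. The notation is introduced here for illustration only: training tokens x_i with class labels c_i, class models λ_1, ..., λ_K collected in Λ, and class priors P(k).

```latex
% Conventional ML: K separate problems, each using only the tokens of class k
\hat{\lambda}_k^{\mathrm{ML}} = \arg\max_{\lambda_k} \sum_{i:\,c_i = k} \log p(x_i \mid \lambda_k),
\qquad k = 1, \dots, K

% MMI (fixed priors): one joint problem over all class models
\hat{\Lambda}^{\mathrm{MMI}} = \arg\max_{\Lambda} \sum_{i}
\log \frac{p(x_i \mid \lambda_{c_i})\, P(c_i)}{\sum_{j=1}^{K} p(x_i \mid \lambda_j)\, P(j)}
```

Each ML problem involves only the data of class k, whereas the MMI denominator contains every class model, so maximizing it ties the estimate of each λ_k to the competing classes.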

In general, there are two major families of DT methods. The first is function-based DT. Rather than estimating parameters with the conventional minimum mean squared error (MMSE), maximum likelihood (ML), maximum a posteriori (MAP), or maximum entropy (ME) criteria, one chooses a different objective function to optimize. Well-known methods include maximum mutual information (MMI), minimum discrimination information (MDI), and minimum description length (MDL). The choice of objective function often depends on the specific problem to be solved. For example, when two-class categorization is involved, as in text-independent speaker verification, we can model each class with a Gaussian mixture model (GMM) and use the ML, MAP, or MMI estimation criterion to learn the parameters of the competing distributions.
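
As a minimal sketch of function-based DT, the Python below fits two one-dimensional classes, each modeled by a single unit-variance Gaussian standing in for the GMMs mentioned above; all function names and data are hypothetical. It first computes the ML means, each from its own class only, and then runs plain gradient ascent on the MMI criterion, which with fixed class priors reduces to maximizing the summed log posterior of the correct class. Practical MMI training of GMMs or HMMs commonly uses extended Baum-Welch style updates rather than this simple gradient step.

```python
import math

def log_gauss(x, mu, var=1.0):
    """Log density of the 1-D Gaussian N(mu, var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def posteriors(x, mus, priors):
    """Class posteriors P(k | x) by Bayes' rule, using log-sum-exp for stability."""
    s = [log_gauss(x, m) + math.log(p) for m, p in zip(mus, priors)]
    top = max(s)
    e = [math.exp(v - top) for v in s]
    z = sum(e)
    return [v / z for v in e]

def ml_means(data, n_classes):
    """ML estimate: each class mean is fitted to that class's tokens only."""
    sums, counts = [0.0] * n_classes, [0] * n_classes
    for x, c in data:
        sums[c] += x
        counts[c] += 1
    return [s / n for s, n in zip(sums, counts)]

def mmi_objective(data, mus, priors):
    """MMI criterion with fixed priors: sum of log P(correct class | x)."""
    return sum(math.log(posteriors(x, mus, priors)[c]) for x, c in data)

def mmi_step(data, mus, priors, lr=0.02):
    """One gradient-ascent step that updates all class means jointly.
    With unit variance, d log P(c|x) / d mu_k = (1[k == c] - P(k|x)) * (x - mu_k)."""
    grads = [0.0] * len(mus)
    for x, c in data:
        post = posteriors(x, mus, priors)
        for k in range(len(mus)):
            grads[k] += ((1.0 if k == c else 0.0) - post[k]) * (x - mus[k])
    return [m + lr * g for m, g in zip(mus, grads)]

# Hypothetical toy data: (feature, class label) pairs with overlapping classes.
DATA = [(-2.0, 0), (-1.0, 0), (0.5, 0), (-0.5, 1), (1.0, 1), (2.5, 1)]
PRIORS = [0.5, 0.5]

ml_mus = ml_means(DATA, 2)
mmi_mus = list(ml_mus)          # start MMI training from the ML estimate
for _ in range(100):
    mmi_mus = mmi_step(DATA, mmi_mus, PRIORS)

print("ML means: ", ml_mus, "MMI objective:", mmi_objective(DATA, ml_mus, PRIORS))
print("MMI means:", mmi_mus, "MMI objective:", mmi_objective(DATA, mmi_mus, PRIORS))
```

Because every class mean appears in the denominator of each posterior, a token from one class also moves the parameters of its competitor, which is the joint estimation described above.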

The second family is decision-feedback-based DT, in which a decision function that determines the performance of the training and testing procedures on the training set is embedded in the optimization formulation. The parameters are then learned by adjusting their current values to optimize the desired evaluation metric, in the direction guided by the feedback obtained from the current set of decision parameters. Popular techniques include minimum classification error (MCE), minimum verification error (MVE), minimum phone error (MPE), maximal figure-of-merit (MFoM), maximum or minimum area under the receiver operating characteristic curve (AUC), and maximum margin of separation. Again, the choice of technique depends heavily on the decision function and the evaluation metric to be applied. For example, in multi-class text categorization the decision function is usually the argmax operation over the scores of all competing categories, and the evaluation metric can be micro- or macro-averaged F1 or the area under the precision-recall curve. We can then use the MFoM learning algorithm to obtain the parameters of all topic categories with any combination of feature vectors and score functions.
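
The following Python sketch illustrates decision-feedback DT using MCE (one of the techniques listed above, not MFoM) for multi-class linear discriminant functions with an argmax decision. It assumes the widely used smoothed formulation in which a misclassification measure compares the correct-class score with a soft maximum of the competing scores, and a sigmoid turns that measure into a differentiable approximation of the error count. The smoothing constants eta and gamma, the learning rate, the function names, and the toy data are illustrative assumptions.

```python
import math

def scores(W, b, x):
    """Linear discriminant functions g_k(x) = w_k . x + b_k for every class k."""
    return [sum(wi * xi for wi, xi in zip(wk, x)) + bk for wk, bk in zip(W, b)]

def decide(W, b, x):
    """Decision function: argmax over the scores of all competing classes."""
    g = scores(W, b, x)
    return max(range(len(g)), key=lambda k: g[k])

def mce_loss_and_grad(W, b, x, c, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one token and its gradient w.r.t. each score g_k.

    d(x) = -g_c(x) + (1/eta) * log( mean_{j != c} exp(eta * g_j(x)) )
    loss = sigmoid(gamma * d(x)), a smooth stand-in for the 0/1 error count.
    """
    g = scores(W, b, x)
    rivals = [j for j in range(len(g)) if j != c]
    top = max(eta * g[j] for j in rivals)
    e = {j: math.exp(eta * g[j] - top) for j in rivals}
    z = sum(e.values())
    d = -g[c] + (top + math.log(z / len(rivals))) / eta
    loss = 1.0 / (1.0 + math.exp(-gamma * d))
    dloss_dd = gamma * loss * (1.0 - loss)
    grad = [0.0] * len(g)
    grad[c] = -dloss_dd                  # raising the correct score lowers d
    for j in rivals:
        grad[j] = dloss_dd * e[j] / z    # soft-max weighting of the competitors
    return loss, grad

def mce_epoch(W, b, data, lr=0.5):
    """One pass of per-token gradient descent on the smoothed MCE loss."""
    total = 0.0
    for x, c in data:
        loss, grad = mce_loss_and_grad(W, b, x, c)
        total += loss
        for k, gk in enumerate(grad):
            W[k] = [wi - lr * gk * xi for wi, xi in zip(W[k], x)]
            b[k] -= lr * gk
    return total

def error_rate(W, b, data):
    """Fraction of tokens whose argmax decision differs from the label."""
    return sum(decide(W, b, x) != c for x, c in data) / len(data)

# Hypothetical 3-class, 2-D toy data: (feature vector, class label).
DATA = [([2.0, 0.0], 0), ([2.5, 0.5], 0), ([1.5, -0.5], 0),
        ([-1.0, 2.0], 1), ([-1.5, 2.5], 1), ([-0.5, 1.5], 1),
        ([-1.0, -2.0], 2), ([-1.5, -2.5], 2), ([-0.5, -1.5], 2)]

W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
err_before = error_rate(W, b, DATA)      # all scores tie, argmax picks class 0
for _ in range(30):
    mce_epoch(W, b, DATA)
err_after = error_rate(W, b, DATA)
print("error rate before MCE:", err_before, "after:", err_after)
```

Tokens that are already classified with a wide margin, or badly misclassified, sit on the flat tails of the sigmoid and contribute little gradient, so the updates concentrate on tokens near the decision boundary.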

In this tutorial, we will review the theory of popular discriminative training methods commonly used in the speech and language processing communities. We will then describe the utility of DT and show why it offers attractive alternatives to conventional estimation procedures, especially when the underlying distributions of the data or competing classes are not completely known. Next, we will formulate DT algorithms for widely used parametric representations, such as GMMs, hidden Markov models (HMMs), linear discriminant functions (LDFs), artificial neural networks (ANNs), linear discriminant analysis (LDA), and vector quantization. Finally, we will describe properties of DT algorithms and illustrate how DT can be used in many speech and language processing applications, including feature extraction, acoustic modeling and language modeling for automatic speech recognition, speaker recognition, utterance verification, spoken language recognition, and text categorization. We will compare the performance of models obtained before and after DT to show its effectiveness in enhancing the performance and robustness of pattern recognition and verification algorithms.
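
For vector quantization, one classic discriminative codebook update is Kohonen's LVQ1, sketched below in Python as an illustration; it is not necessarily the VQ formulation covered in the tutorial, and the codebook, data, function names, and learning-rate schedule are hypothetical. Unlike k-means, which moves each codeword toward the data it represents regardless of labels, LVQ1 uses the classification decision as feedback: the winning codeword is pulled toward a correctly classified token and pushed away from a misclassified one.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def nearest(codebook, x):
    """Index of the codeword closest to x (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k][0], x))

def classify(codebook, x):
    """Label of the winning (nearest) codeword."""
    return codebook[nearest(codebook, x)][1]

def lvq1_epoch(codebook, data, alpha):
    """One LVQ1 pass: move the winning codeword toward x when its label is
    correct and away from x when it is wrong. Only the winner is updated."""
    for x, c in data:
        k = nearest(codebook, x)
        vec, label = codebook[k]
        sign = 1.0 if label == c else -1.0
        codebook[k] = ([v + sign * alpha * (xi - v) for v, xi in zip(vec, x)], label)

def error_rate(codebook, data):
    """Fraction of tokens assigned to the wrong class by the nearest codeword."""
    return sum(classify(codebook, x) != c for x, c in data) / len(data)

# Hypothetical 1-D toy data and a deliberately poor initial codebook
# (one codeword per class, stored as (vector, label)).
DATA = [([-2.0], 0), ([-1.5], 0), ([-1.0], 0), ([1.0], 1), ([1.5], 1), ([2.0], 1)]
codebook = [([0.8], 0), ([2.5], 1)]

err_before = error_rate(codebook, DATA)
n_epochs = 10
for epoch in range(n_epochs):
    # Linearly decaying learning rate, a common (but not mandatory) LVQ choice.
    lvq1_epoch(codebook, DATA, alpha=0.3 * (1.0 - epoch / n_epochs))
err_after = error_rate(codebook, DATA)
print("error before:", err_before, "after:", err_after, "codebook:", codebook)
```

Only the winning codeword moves on each token, so codewords far from the decision boundary are left largely untouched once they classify their region correctly.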


Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Dr. Lee received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, in 1973, the M.S. degree in Engineering and Applied Science from Yale University, New Haven, in 1977, and the Ph.D. degree in Electrical Engineering with a minor in Statistics from the University of Washington, Seattle, in 1981.
Dr. Lee started his professional career at Verbex Corporation, Bedford, MA, where he was involved in research on connected word recognition. In 1984, he joined Digital Sound Corporation, Santa Barbara, where he engaged in research and product development in speech coding, speech synthesis, speech recognition, and signal processing for the DSC-2000 Voice Server. From 1986 to 2001, he was with Bell Laboratories, Murray Hill, New Jersey, where he became a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. His research interests include multimedia communication, multimedia signal and information processing, speech and speaker recognition, speech and language modeling, spoken dialogue processing, adaptive and discriminative learning, biometric authentication, and information retrieval. From August 2001 to August 2002, he was a visiting professor at the School of Computing, National University of Singapore. In September 2002, he joined the ECE faculty at Georgia Institute of Technology.
Prof. Lee has participated actively in professional societies. He is a member of the IEEE Signal Processing Society (SPS), the IEEE Communications Society, and the International Speech Communication Association (ISCA). From 1991 to 1995, he was an associate editor for the IEEE Transactions on Signal Processing and the IEEE Transactions on Speech and Audio Processing. During the same period, he served as a member of the ARPA Spoken Language Coordination Committee. From 1995 to 1998, he was a member of the Speech Processing Technical Committee, serving as its chairman from 1997 to 1998. In 1996, he helped promote the SPS Multimedia Signal Processing Technical Committee, of which he is a founding member.
Dr. Lee is a Fellow of the IEEE and has published more than 300 papers and 25 patents. He received the SPS Senior Award in 1994 and the SPS Best Paper Award in 1997 and 1999. In 1997, he was awarded the prestigious Bell Labs President's Gold Award for his contributions to the Lucent Speech Processing Solutions product. Dr. Lee frequently lectures to a wide international audience. In 2000, he was named one of six Distinguished Lecturers by the IEEE Signal Processing Society. He was also named one of ISCA's two inaugural Distinguished Lecturers for 2007-2008. In addition, he won the SPS 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition".

This page was last updated on 21-June-2010 3:00 UTC.