INTERSPEECH 2010 Tutorial Program

Mobile Voice Search

  • Mazin Gilbert (AT&T)
  • Alex Acero (Microsoft)


The proliferation of mobile and hand-held devices, along with advances in multimodal and multimedia technologies, is giving rise to a new wave of mobile applications that enable users to find information quickly and more naturally using voice input. Mobility is central to these applications, which capitalize on multimodal input to search for media such as videos, music, business listings, products, and services, or to surf the web or send an SMS.

Mobile voice search has been a central area of interest in both academia and industry. Numerous special sessions and panel discussions on technologies and applications related to mobile voice search have been organized at ICASSP, Interspeech, HLT/ACL, SpeechTek, Voice Search (or AVIOS), SpokenQuery, ASRU, CIVR, and MIR. Over the past year, several companies have announced services that enable consumers to use natural voice interfaces to search local listings and the web, or to send an SMS. Initial reports by AT&T, Google, Microsoft/Tellme, Nuance, and Vlingo/Yahoo show that consumers are rapidly adopting voice search for their everyday needs.

The commercialization of mobile voice search applications is giving rise to new technical challenges. Examples include creating speech recognition systems that are robust to noisy surrounding environments, searching through large amounts of unstructured and semi-structured data, understanding a variety of inquiries ranging from keywords to natural language questions, summarizing multimedia search results, and personalizing the user interface to adapt more easily to user preferences and geo-locations.

This tutorial on Mobile Voice Search is motivated by the explosion of applications on mobile devices that apply multimodal and multimedia processing. It is suitable for researchers and students who would like to acquire a broader perspective on how speech, multimodal, and search technologies are fueling a new generation of mobile applications that are radically changing the way people and businesses communicate. The tutorial aims to bring together different research communities, namely speech and language, multimodal and multimedia, mobile applications, and user interfaces. It will not only be an opportunity to underscore the importance and highlight the impact of mobile voice search research, but will also encourage more interdisciplinary work among these communities.


Mazin Gilbert is the Executive Director of Speech and Language Technologies at AT&T Labs-Research. He holds a Ph.D. in Electrical and Electronic Engineering and an MBA for Executives from the Wharton Business School. Dr. Gilbert has 24 years of research experience working in industry at Bell Labs and AT&T Labs and in academia at Rutgers University, Liverpool University, and Princeton University. He is responsible for advancing and developing AT&T's technologies and prototypes in areas of speech and language processing for mobility, IPTV, and enterprise, including fundamental and forward-looking research in automatic speech recognition, spoken language understanding, voice search, multimodal user interfaces, speech analytics, and social media. Dr. Gilbert has over 100 publications in speech, language, and signal processing and is the author of the book "Artificial Neural Networks for Speech Analysis/Synthesis" (Chapman & Hall, 1994). He holds 35 US patents and is a recipient of several national and international awards, including the Most Innovative Award at SpeechTek 2003 and the AT&T Science and Technology Award, 2006. Dr. Gilbert is a Senior Member of the IEEE. His professional service includes: Member, Editorial Board of IEEE Signal Processing Magazine (2009-present); Member, ISCA Advisory Council (2007-present); Chair, IEEE/ACL Workshop on Spoken Language Technology (2006); Chair, SPS Speech and Language Technical Committee (2004-2006); Teaching Professor, Rutgers University (1998-2001) and Princeton University (2004-2005); Chair, Rutgers University CAIP Industrial Board (2003-2006); Associate Editor, IEEE Transactions on Speech and Audio Processing (1995-1999); Chair, 1999 Workshop on Automatic Speech Recognition and Understanding; Member, SPS Speech Technical Committee (2000-2004); and Technical Chair and Speaker for several international conferences, including ICASSP, SpeechTek, AVIOS, and Interspeech.

Alex Acero received an M.S. degree from the Polytechnic University of Madrid, Madrid, Spain, in 1985, an M.S. degree from Rice University, Houston, TX, in 1987, and a Ph.D. degree from Carnegie Mellon University, Pittsburgh, PA, in 1990, all in Electrical Engineering. Dr. Acero worked in Apple Computer's Advanced Technology Group in 1990-1991. In 1992, he joined Telefonica I+D, Madrid, Spain, as Manager of the speech technology group. Since 1994 he has been with Microsoft Research, Redmond, WA, where he is presently a Research Area Manager directing an organization of 60 engineers conducting research in audio, speech, multimedia, and natural language. He is also an Affiliate Professor of Electrical Engineering at the University of Washington, Seattle. Dr. Acero is the author of the books "Acoustical and Environmental Robustness in Automatic Speech Recognition" (Kluwer, 1993) and "Spoken Language Processing" (Prentice Hall, 2001), and has written invited chapters in 4 edited books and over 200 technical papers. He holds 82 US patents. Dr. Acero is a Fellow of the IEEE. He has served the IEEE Signal Processing Society as Vice President Technical Directions (2007-2009), Director of Industrial Relations (2010-2012), 2006 Distinguished Lecturer, member of the Board of Governors (2004-2005 and 2010-2012), Associate Editor for IEEE Signal Processing Letters (2003-2005) and IEEE Transactions on Audio, Speech, and Language Processing (2005-2007), and member of the editorial boards of IEEE Journal of Selected Topics in Signal Processing (2006-2008) and IEEE Signal Processing Magazine (2008-2010). He also served as member (1996-2000) and Chair (2000-2002) of the Speech Technical Committee of the IEEE Signal Processing Society. He was Publications Chair of ICASSP 1998, Sponsorship Chair of ASRU 1999, General Co-Chair of ASRU 2001, and Sponsorship Chair of ASRU 2009. Since 2004, Dr. Acero, along with co-authors Drs.
Huang and Hon, has been using proceeds from their textbook "Spoken Language Processing" to fund the IEEE Spoken Language Processing Student Travel Grant for the best ICASSP student papers in the speech and language area. Dr. Acero was Sponsorship Co-Chair of Interspeech 2006, a member of the editorial board of Computer Speech and Language, and a member of the Carnegie Mellon University Dean's Leadership Council for the College of Engineering.

This page was last updated on 21-June-2010 3:00 UTC.