The National Academies of Sciences, Engineering and Medicine
Pakistan - U.S. Science and Technology Cooperation Program
Development, Security, and Cooperation
Policy and Global Affairs

Pakistan-US Science and Technology Cooperation Program
Phase 3 (2007 Deadline)

Telephone-Based Speech Interfaces for Access to Information by Nonliterate Users

Roni Rosenfeld, Carnegie Mellon University (CMU)
Sarmad Hussain, National University of Computer and Emerging Sciences
Pakistani Funding (HEC): $60,000
US Funding: $125,000
Project Dates on US side: June 1, 2008 - January 31, 2011

Project Overview

Information access is an essential yet often overlooked tool for socioeconomic development. While literate and affluent members of society have many ways to obtain information, there are alarmingly few options for the relatively impoverished nonliterate majority. Print media are unusable by people who cannot read, television and radio are mostly noninteractive, and face-to-face training is expensive. Although computers can provide an interactive learning experience, they are not viable for a variety of reasons. Cell phones, by contrast, provide a mechanism for human-computer communication that supports automated, self-service information access as well as a host of other automated services. However, limited expertise in speech technology, the dearth of computer-based local language resources, and the lack of targeted research on speech interfaces for nonliterate users have meant that such interfaces have not been developed, much less evaluated.

Dr. Rosenfeld and Dr. Hussain devised their project to take the first steps in this direction in Pakistan. They aimed to design, develop, and evaluate a working access system for health information. Through this research, they investigated the use of speech interfaces in a field-deployed system and developed a speech recognition engine that could be easily adapted to other domains. The project was also expected to build the R&D capacity of Pakistani universities in speech technology and to spread that capacity more widely through the development of coursework, paving the way for similar capacities in multiple Pakistani languages.

Major Results

  • Developed and tested a speech-based, telephone-based automated dialog system in both Urdu and Sindhi for healthcare information access for low-literate community health workers
  • Designed, collected, and prepared an Urdu speech corpus consisting of 42 hours of speech from 82 speakers, completely transcribed and accompanied by a transliteration lexicon
  • Constructed and released three Urdu acoustic models (male, female, both) using Carnegie Mellon University's Sphinx speech recognition system
  • Developed a technique for cross-language pronunciation modeling that allows the rapid deployment of small vocabulary dialog systems in low-resource languages such as Sindhi, Balochi or any local dialects
  • Provided direct training to eight students (four of them Pakistani nationals) and benefited more than 18 individuals (13 of them Pakistani) through informal training and involvement in the project
  • Published eight papers in peer-reviewed international conferences

Quarterly Update

The project deliverables have been completed and the project has closed on both sides. This project was affected by financial, visa, and security-related challenges as well as issues related to Dr. Hussain’s 2010 departure from his university to take another job. Nevertheless, the project produced several positive outcomes, including the collection of a speech corpus, development of acoustic and language models and speech processing tools for public release, curriculum enhancement, facilitation of one student’s master’s thesis, and completion of one publication and eight conference presentations. Dr. Rosenfeld reports that he continues to send students to Pakistan to collaborate with his partners there, and their results have been impressive. The most recent development arising from this collaboration has been the release of Polly, a telephone-based system for reaching low-literate populations via a simple voice-based game, then providing them with development-related voice-based services. As of August 2012, Polly is in active use in Pakistan and has reached nearly 100,000 people. A brief video demonstrating the system is available through this link. Additional reports on the project from inception through completion are available through the links below.

Progress Report Summaries

2010

2009

2008
