Automatic Speech Attribute Transcription

What is ASAT?

Automatic Speech Attribute Transcription (ASAT) is a collaborative speech research paradigm and cyberinfrastructure with applications to Automatic Speech Recognition (ASR).

Speech is the most natural means of communication among human beings. Speech also carries a rich set of human information beyond the word sequence, so mining that information is of great importance both in theory and in practice. It is also critical for the intelligence and security communities to have spoken translation systems that are reliable and achieve high performance.

Although we have learned a great deal about how to build practical automatic speech recognition (ASR) systems for almost any spoken language without a detailed understanding of the language, the existing technology is fragile: careful design practices must be followed rigorously to work around its deficiencies. Furthermore, accuracy often declines dramatically in adverse conditions, to the extent that the ASR system becomes unusable even for cooperative users. Compared with human speech recognition (HSR), state-of-the-art ASR systems usually give much higher error rates, even for rather simple tasks in clean environments.

It is interesting to note that human beings perform speech recognition by integrating multiple knowledge sources from the bottom up. It has long been postulated that a human determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, from acoustics to pragmatics. Indeed, people do not continuously convert a speech signal into words as an ASR system attempts to do. Instead, they detect acoustic and auditory evidence, weigh and combine it to form cognitive hypotheses, and then validate the hypotheses until consistent decisions are reached. This human-based model of speech processing suggests a candidate framework for developing next-generation speech technologies that have the potential to go beyond the current limitations.

In order to bridge the performance gap between ASR and HSR, the narrow notion of speech-to-text in ASR has to be expanded to incorporate all related human information “hidden” in speech utterances. This information includes a set of fundamental speech sounds and their linguistic interpretations, a speaker profile that encompasses gender, accent and other speaker characteristics, the speaking environment that describes the interaction between speech and acoustics, and so on. Collectively, we call this set of speech information speech attributes. They are not only critical for ASR but also useful for many other speech applications.

Because this research is interdisciplinary, a collaborative speech research paradigm that facilitates scientific cooperation is essential. However, efforts to integrate detailed knowledge of acoustics, speech, language and their interactions are hampered by the current formulation of ASR as a “blackbox” of models trained to “remember” the training data. This makes it difficult for the ASR community to take advantage of the vast body of literature developed in the speech and language science communities. Instead of the conventional top-down, network decoding paradigm for ASR, we propose a bottom-up, event detection and evidence combination paradigm for speech research to facilitate collaborative Automatic Speech Attribute Transcription (ASAT).
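
As a rough illustration of what bottom-up detection and evidence combination can look like, the following is a minimal sketch and not the ASAT software. It assumes a set of hypothetical attribute detectors that output a per-frame probability for each attribute, and a hypothetical table of which attributes each phone is expected to show. The attribute names, phone table, probabilities, and the weighted log-score rule are all assumptions chosen for illustration; the hypothesis validation step described above is omitted.

```python
import math

# Hypothetical detector outputs: for each frame, the estimated probability
# that each attribute is present. The values are made up for illustration.
frame_evidence = [
    {"voiced": 0.9, "nasal": 0.8, "fricative": 0.1},
    {"voiced": 0.2, "nasal": 0.1, "fricative": 0.9},
]

# Hypothetical knowledge source: attributes each phone is expected to show.
phone_attributes = {
    "m": {"voiced": True, "nasal": True, "fricative": False},
    "s": {"voiced": False, "nasal": False, "fricative": True},
    "z": {"voiced": True, "nasal": False, "fricative": True},
}


def score_phone(evidence, expected, weights=None, eps=1e-6):
    """Combine attribute evidence into a weighted log score for one phone."""
    total = 0.0
    for attr, present in expected.items():
        # Use p(attribute) if the phone expects it, otherwise 1 - p(attribute).
        p = evidence[attr] if present else 1.0 - evidence[attr]
        w = 1.0 if weights is None else weights.get(attr, 1.0)
        total += w * math.log(max(p, eps))  # eps guards against log(0)
    return total


def best_hypothesis(evidence):
    """Return the highest-scoring phone and the scores of all candidates."""
    scores = {ph: score_phone(evidence, exp) for ph, exp in phone_attributes.items()}
    return max(scores, key=scores.get), scores


# One phone hypothesis per frame, built only from detected evidence.
hypotheses = [best_hypothesis(frame)[0] for frame in frame_evidence]
print(hypotheses)
```

Because each detector contributes an independent term to the score, a detector can be replaced or added without retraining the rest of the system, which is the kind of modularity the plug-and-play goal below refers to.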

The goals of the proposed project are to: (1) develop feature detection and knowledge integration modules to demonstrate ASAT and ASR; (2) build an open source, highly shared, plug-‘n’-play ASAT cyberinfrastructure for collaborative research to lower entry barriers to ASR; and (3) provide an objective evaluation methodology to monitor technology advances in individual modules and across the entire system.

Project Meetings

Oct 13, 2006 (Rutgers)

Feb 23, 2006 (Berkeley)

Apr 28 & 29, 2005 (GaTech)

Nov 12, 2004 (OSU)

Sep 13, 2004

Agendas and slides for all meetings can be found on the Meeting Slides page.

Documentation

Lists all papers published by the partners of the ASAT project.

Software

All tools developed by ASAT partners are available for free download, but you must first register for a username and password to access them. Registration is free but requires a valid e-mail address; your site password will be sent to this address.

ASAT News

Oct 18, 2006
  • All ASAT related news will be put here!
