What is ASAT?
Automatic Speech Attribute Transcription
(ASAT) is a collaborative speech research paradigm and cyberinfrastructure with
applications to Automatic Speech Recognition (ASR).
Speech is the most natural means of communication among human beings. There is also a rich
set of human information embedded in speech beyond just word sequences. Mining of speech information is therefore of great importance both in
theory and in practice. It is also critical for the intelligence
and security communities to have spoken translation systems that
are reliable and achieve high performance. Although we have
learned a great deal about how to build practical automatic
speech recognition, or ASR, systems for almost any spoken
language without the need for a detailed understanding of the
language, the existing technology is somewhat fragile in that
systems must be carefully designed to work around its
deficiencies. Furthermore, the accuracy often
declines dramatically in adverse conditions to an extent that
the ASR system becomes unusable, even for cooperative users. When
compared with human speech recognition, or HSR, the
state-of-the-art ASR systems usually give much larger error
rates even for rather simple tasks operating in clean
environments. It is interesting to note that human beings
perform speech recognition by integrating multiple knowledge
sources from bottom up. It has long been postulated that a human
determines the linguistic identity of a sound based on
detected evidence that exists at various levels of the speech
knowledge hierarchy, from acoustics to pragmatics. Indeed,
people do not continuously convert a speech signal into words as
an ASR system attempts to do. Instead, they detect
acoustic and auditory evidence, weigh and combine it to
form cognitive hypotheses, and then validate the
hypotheses until consistent decisions are reached. The above
human-based model of speech processing suggests a candidate
framework for developing next generation speech technologies
that have the potential to go beyond the current limitations.
In order to bridge the performance gap between ASR and HSR
systems, the narrow notion of speech-to-text in ASR has
to be expanded to incorporate all related human information
“hidden” in speech utterances. This collection of information
includes a set of fundamental speech sounds and their linguistic
interpretations, a speaker profile that encompasses gender,
accent and other speaker characteristics, the speaking
environment that describes the interaction between speech and
acoustics, etc. Collectively, we call this set of speech
information speech attributes. They are not only critical for
ASR but also useful for many other speech applications. Because
the study of speech attributes is interdisciplinary, a
collaborative speech research paradigm that facilitates
scientific cooperation is essential.
However, efforts to integrate detailed knowledge of acoustics,
speech, language and their interactions are hampered by the
current ASR formulation as a “blackbox” of models trained to
“remember” the training data. This makes it difficult for the
ASR community to take advantage of the vast body of literature
developed in the speech and language science communities.
Instead of the conventional top-down, network decoding
paradigm for ASR, we propose a bottom-up, event detection and
evidence combination paradigm for speech research to
facilitate collaborative Automatic Speech Attribute
Transcription (ASAT).
The goals of the proposed project are to: (1) develop
feature detection and knowledge integration modules to
demonstrate ASAT and ASR; (2) build an open source, highly
shared, plug-‘n’-play ASAT cyberinfrastructure for collaborative
research to lower entry barriers to ASR; and (3) provide an
objective evaluation methodology to monitor technology advances
in individual modules and across the entire system.
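The bottom-up detection and evidence-combination idea described above can be illustrated with a toy sketch. This is not the ASAT software. The detector names, the two-element feature frame, the phone-to-attribute table and the log-probability scoring rule are all assumptions chosen for illustration. Each detector is a plug-in function that returns the probability that one speech attribute is present, and a separate combination step weighs that evidence against each phone hypothesis.

```python
import math
from typing import Callable, Dict, List

# Hypothetical types: a frame is a small feature vector, and a detector
# maps a frame to the probability that one speech attribute is present.
Frame = List[float]
Detector = Callable[[Frame], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Toy plug-in detectors (assumed for illustration, not real ASAT modules).
# New detectors can be added to this registry without touching the combiner.
DETECTORS: Dict[str, Detector] = {
    "voiced": lambda f: sigmoid(4.0 * (f[0] - 0.5)),
    "nasal": lambda f: sigmoid(4.0 * (f[1] - 0.5)),
}

# Assumed attribute profile for each phone hypothesis:
# True means the attribute is expected to be present.
PHONE_ATTRIBUTES: Dict[str, Dict[str, bool]] = {
    "m": {"voiced": True, "nasal": True},
    "b": {"voiced": True, "nasal": False},
    "p": {"voiced": False, "nasal": False},
}

EPS = 1e-9  # keeps log() away from zero


def detect(frame: Frame) -> Dict[str, float]:
    """Bottom-up step: run every registered attribute detector on the frame."""
    return {name: detector(frame) for name, detector in DETECTORS.items()}


def combine(evidence: Dict[str, float]) -> Dict[str, float]:
    """Evidence combination: sum log-probabilities, treating attributes
    as independent (a simplifying assumption of this sketch)."""
    scores: Dict[str, float] = {}
    for phone, profile in PHONE_ATTRIBUTES.items():
        score = 0.0
        for attribute, expected in profile.items():
            p = evidence[attribute]
            score += math.log(max(p if expected else 1.0 - p, EPS))
        scores[phone] = score
    return scores


def best_phone(frame: Frame) -> str:
    """Pick the phone hypothesis best supported by the detected evidence."""
    scores = combine(detect(frame))
    return max(scores, key=scores.get)
```

For example, `best_phone([0.9, 0.9])` selects `"m"` because both the voicing and nasality detectors fire, while `best_phone([0.9, 0.1])` selects `"b"`. Keeping detection and combination in separate functions mirrors the modular, plug-'n'-play arrangement named in goal (2), since either side can be replaced independently.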
Project Meetings
Oct 13, 2006 (Rutgers)
Feb 23, 2006 (Berkeley)
Apr 28 & 29, 2005 (GaTech)
Nov 12, 2004 (OSU)
Sep 13, 2004
All agendas and meeting slides can be found in the meeting
slides section.
Documentation
A list of all papers published by the partners
of the ASAT project.
Software
All the tools developed by ASAT partners are available for free
download, but you must first register for a username and
password to access them. Registration is free but requires a
valid e-mail address; your password for site access will be
sent to this address.
ASAT News
Oct 18, 2006
- All ASAT-related news will be posted here!