ASRU 2011


Speech Recognition with Segmental Conditional Random Fields

Geoff Zweig (Microsoft Research)

Novel techniques in speech recognition are often hampered by the long road that must be followed to turn them into fully functional systems capable of competing with the state of the art. In this work, we explore the use of Segmental Conditional Random Fields (SCRFs) as an integrating technology that can augment the best conventional systems with information from novel scientific approaches. We begin by describing the methodology and its relationship to other methods such as augmented statistical models and structured SVMs. We then illustrate the approach with work done at Microsoft and Johns Hopkins University, in which we find that the SCRF framework can appropriately weight information sources as varied as phoneme detections and template matching scores, producing significant gains on Broadcast News, Wall Street Journal, and Voice Search tasks. The talk concludes with a discussion of the research challenges associated with SCRFs.

Outline

0:01:41  Intro
0:04:41  Motivation
0:07:45  From HMMs to SCRFs
0:25:24  The SCARF Toolkit
0:26:02  The SCARF Toolkit - Language Modeling
0:30:23  The SCARF Toolkit - Inputs
0:33:27  The SCARF Toolkit - Features
0:39:53  Experimental Results - Using Multiphone Detectors
0:43:01  Experimental Results - Using Templates
0:46:45  Experimental Results - Using Multiple Sources
0:48:52  Some Research Challenges
0:51:24  Conclusions