Modeling Dialectal Variation for Swiss German Automatic Speech Recognition
(Oral presentation)

Abbas Khosravani (Idiap Research Institute, Switzerland), Philip N. Garner (Idiap Research Institute, Switzerland), Alexandros Lazaridis (Swisscom, Switzerland)

We describe a speech recognition system for Swiss German, a dialectal language spoken in German-speaking Switzerland. Swiss German has no standard orthography, and its written form varies considerably. To reduce the uncertainty this variability introduces, we automatically generate a lexicon from which multiple written forms of a given word in any dialect can be derived. The lexicon is built from a small (incomplete) handcrafted lexicon designed by linguistic experts and contains forms of common words in various Swiss German dialects. We exploit the powerful speech representations of self-supervised acoustic pre-training (wav2vec) to address the low-resource nature of the spoken dialects. For our TV Box voice assistant application, the proposed approach yields an overall relative word error rate improvement of 9% compared to a system based on an expert-generated lexicon.
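
As a reading aid for the 9% figure above: a relative word error rate improvement is conventionally the WER reduction divided by the baseline WER, not a difference in absolute percentage points. The sketch below illustrates that arithmetic only. The function name and both WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative improvement (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values chosen for illustration; not results from the paper.
baseline = 0.200  # WER of a system using an expert-generated lexicon
proposed = 0.182  # WER of a system using an automatically generated lexicon

print(f"relative improvement: {relative_wer_improvement(baseline, proposed):.1%}")
```

With these made-up numbers, an absolute drop of 1.8 points in WER corresponds to a relative improvement of about 9%.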

InterSpeech 2021

Related Recordings

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept
(Oral presentation)

Out-of-vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System
(Oral presentation)
