Layer Pruning on Demand with Intermediate CTC (Oral presentation)

Jaesong Lee (Naver, Korea), Jingu Kang (Naver, Korea), Shinji Watanabe (Carnegie Mellon University, USA)

Deploying an end-to-end automatic speech recognition (ASR) model on mobile or embedded devices is challenging because the device's available computational power and energy-consumption requirements change dynamically in practice. To address this, we present a training and pruning method for ASR based on connectionist temporal classification (CTC) that allows the model depth to be reduced at run time without any extra fine-tuning. To achieve this, we adopt two regularization methods, intermediate CTC and stochastic depth, to train a model whose performance does not degrade much after pruning. We present an in-depth analysis of layer behaviors using singular vector canonical correlation analysis (SVCCA), along with efficient strategies for finding layers that are safe to prune. Using the proposed method, we show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU, while each pruned sub-model maintains the accuracy of an individually trained model of the same depth.
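The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a residual encoder stack in which pruning drops the top layers at run time, stochastic depth skips whole layers at random during training, and the training loss is a weighted sum of a final-layer and an intermediate-layer CTC loss. The names `residual_layer`, `encode`, and `total_loss`, the weight `w=0.3`, and the toy layers are all hypothetical and chosen only for illustration.

```python
import random

def residual_layer(scale):
    """Hypothetical stand-in for one encoder layer: returns the residual branch f(x)."""
    return lambda x: [scale * v for v in x]

def encode(layers, x, depth=None, drop_prob=0.0, rng=None):
    """Apply x <- x + f(x) for the first `depth` layers of the stack.

    depth     : number of bottom layers kept at run time (None keeps all).
    drop_prob : chance of skipping a layer during training (stochastic depth).
    rng       : random.Random instance; when None, no layer is ever skipped.
    """
    kept = layers if depth is None else layers[:depth]
    for f in kept:
        if rng is not None and rng.random() < drop_prob:
            continue  # a skipped layer reduces to the identity mapping
        x = [v + d for v, d in zip(x, f(x))]
    return x

def total_loss(final_ctc, intermediate_ctc, w=0.3):
    """Assumed weighting: (1 - w) * final-layer CTC loss + w * intermediate CTC loss."""
    return (1.0 - w) * final_ctc + w * intermediate_ctc

# Toy stack: each layer doubles its input, so the effect of depth is easy to see.
layers = [residual_layer(1.0) for _ in range(4)]
full = encode(layers, [1.0])              # all 4 layers applied
pruned = encode(layers, [1.0], depth=2)   # top 2 layers removed on demand
```

Because the intermediate loss trains a lower layer to produce usable CTC outputs on its own, truncating the stack at that layer is the case such a model would be expected to tolerate; which layers are actually safe to prune is what the paper's SVCCA analysis addresses.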

InterSpeech 2021

Related Recordings

Pushing the Limits of Non-Autoregressive Speech Recognition
(Oral presentation)

Real-time End-to-End Monaural Multi-speaker Speech Recognition
(Oral presentation)