LACOPE: Latency-Constrained Pitch Estimation for Speech Enhancement
(3 minutes introduction)

Hendrik Schröter (FAU Erlangen-Nürnberg, Germany), Tobias Rosenkranz (Sivantos, Germany), Alberto N. Escalante-B. (Sivantos, Germany), Andreas Maier (FAU Erlangen-Nürnberg, Germany)

Fundamental frequency (f₀) estimation, also known as pitch tracking, has been a long-standing research topic in the speech and signal processing community. Many pitch estimation algorithms, however, fail in noisy conditions or introduce large delays due to their frame size or Viterbi decoding. In this study, we propose a deep learning-based pitch estimation algorithm, LACOPE, which was trained in a joint pitch estimation and speech enhancement framework. In contrast to previous work, this algorithm allows for a configurable latency, down to an algorithmic delay of 0. This is achieved by exploiting the smoothness of the pitch trajectory: a recurrent neural network compensates for the delay introduced by the feature computation by predicting the pitch at a desired point in time, allowing a trade-off between pitch accuracy and latency. We integrate the pitch estimation into a speech enhancement framework for hearing aids. For this application, we allow a delay on the analysis side of approximately 5 ms. The pitch estimate is then used to construct a comb filter in the frequency domain as a post-processing step to remove intra-harmonic noise. Our pitch estimation performance is on par with state-of-the-art algorithms like PYIN or CREPE for spoken speech in all noise conditions while introducing minimal latency.
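
The frequency-domain comb filter mentioned in the abstract could be sketched roughly as below. This is a minimal illustration and not the authors' implementation: the raised-cosine gain shape, the function names `comb_gains` and `apply_comb`, the `min_gain` floor, and the convention that a non-positive f₀ marks an unvoiced frame are all assumptions made for this example.

```python
import numpy as np


def comb_gains(f0_hz, sample_rate, n_fft, min_gain=0.1):
    """Hypothetical per-bin gains: 1.0 at multiples of f0, min_gain midway between them."""
    # Centre frequency (Hz) of every bin of a real-input FFT of length n_fft.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    if f0_hz <= 0:
        # Assumed convention: non-positive f0 means "unvoiced", so pass the frame through.
        return np.ones_like(freqs)
    # Raised cosine with period f0: equals 1 at k * f0 and 0 at (k + 0.5) * f0.
    comb = 0.5 * (1.0 + np.cos(2.0 * np.pi * freqs / f0_hz))
    # Limit the attenuation between harmonics to min_gain.
    return min_gain + (1.0 - min_gain) * comb


def apply_comb(frame, f0_hz, sample_rate, min_gain=0.1):
    """Filter one time-domain frame by scaling its spectrum with the comb gains."""
    spectrum = np.fft.rfft(frame)
    gains = comb_gains(f0_hz, sample_rate, len(frame), min_gain)
    return np.fft.irfft(spectrum * gains, n=len(frame))
```

With a pitch estimate of 200 Hz, for example, components at 200 Hz, 400 Hz, and so on pass unchanged, while a component at 300 Hz is scaled by `min_gain`. A practical system would also need windowing and overlap-add across frames, which this sketch omits.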

Related Recordings

Multiple Sound Source Localization Based on Interchannel Phase Differences in All Frequencies with Spectral Masks
(3 minutes introduction)

Hyungchan Song, Jong Won Shin

Cancellation of Local Competing Speaker with Near-field Localization for Distributed Ad-Hoc Sensor Network
(3 minutes introduction)

Pablo Pérez Zarazaga, Mariem Bouafif Mansali, Tom Bäckström, Zied Lachiri

InterSpeech 2021
