|Hendrik Schröter (FAU Erlangen-Nürnberg, Germany), Tobias Rosenkranz (Sivantos, Germany), Alberto N. Escalante-B. (Sivantos, Germany), Andreas Maier (FAU Erlangen-Nürnberg, Germany)|
Fundamental frequency (f₀) estimation, also known as pitch tracking, is a long-standing research topic in the speech and signal processing community. Many pitch estimation algorithms, however, fail in noisy conditions or introduce large delays due to their frame size or Viterbi decoding. In this study, we propose LACOPE, a deep learning-based pitch estimation algorithm trained in a joint pitch estimation and speech enhancement framework. In contrast to previous work, this algorithm allows for a configurable latency down to an algorithmic delay of 0. This is achieved by exploiting the smoothness of the pitch trajectory: a recurrent neural network compensates for the delay introduced by the feature computation by predicting the pitch at a desired point in time, which allows a trade-off between pitch accuracy and latency. We integrate the pitch estimation into a speech enhancement framework for hearing aids. For this application, we allow a delay of approximately 5 ms on the analysis side. The pitch estimate is then used to construct a comb filter in the frequency domain as a post-processing step to remove intra-harmonic noise. Our pitch estimation performance is on par with state-of-the-art algorithms such as PYIN or CREPE for spoken speech in all noise conditions while introducing minimal latency.
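As a rough illustration of the post-processing step, the following minimal sketch builds per-bin gains for a frequency-domain comb filter from a single pitch estimate. The abstract does not specify the filter shape, so the raised-cosine form, the `floor` attenuation parameter, and the function name `comb_filter_gains` are assumptions made purely for illustration and are not the filter used in LACOPE.

```python
import math


def comb_filter_gains(f0_hz, sample_rate, n_fft, floor=0.5):
    """Return one gain per non-negative FFT bin (n_fft // 2 + 1 values).

    Assumed shape (not from the paper): a raised cosine that equals 1.0
    at integer multiples of f0_hz and drops to `floor` halfway between
    two neighbouring harmonics.
    """
    n_bins = n_fft // 2 + 1

    # Unvoiced frame (no valid pitch): leave the spectrum untouched.
    if f0_hz <= 0.0:
        return [1.0] * n_bins

    bin_hz = sample_rate / n_fft  # frequency spacing between bins
    gains = []
    for k in range(n_bins):
        freq = k * bin_hz
        # comb is 1.0 on a harmonic and 0.0 midway between harmonics.
        comb = 0.5 * (1.0 + math.cos(2.0 * math.pi * freq / f0_hz))
        gains.append(floor + (1.0 - floor) * comb)
    return gains


# Example: 250 Hz pitch, 16 kHz sampling rate, 512-point FFT.
gains = comb_filter_gains(250.0, 16000, 512)
```

In such a scheme, each complex spectral bin of the enhanced frame would be multiplied by its gain before synthesis, so that energy between the harmonics is attenuated while the harmonics themselves pass unchanged.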