Matthew Maciejewski (Johns Hopkins University, USA), Shinji Watanabe (Johns Hopkins University, USA), Sanjeev Khudanpur (Johns Hopkins University, USA)
Speech enhancement techniques are typically evaluated with intrinsic metrics of signal quality. The overwhelming majority of deep learning-based single-channel speech separation studies, for instance, have relied on a single class of metrics for system evaluation. These metrics, usually variants of Signal-to-Distortion Ratio (SDR), measure fidelity to the “ground truth” waveform. This can be problematic, not only because it limits the diversity of evaluation metrics, but also because a perfect ground truth waveform may be unavailable. In this work, we explore the value of speaker verification as an extrinsic metric of separation quality, which additionally serves as evidence of the benefits of separation as pre-processing for downstream tasks.
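To make concrete what "fidelity to the ground truth waveform" means for the SDR-style metrics mentioned above, the following is a minimal sketch of one widely used variant, scale-invariant SDR (SI-SDR). It is an illustration, not necessarily the exact metric used in this work; the function name `si_sdr`, the `eps` stabilizer, and the zero-mean normalization are assumptions made for the example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between a 1-D estimate and its reference.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero and log of zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove DC offset so the projection is not dominated by the mean.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to get the "target" component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything not explained by the scaled reference counts as distortion.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(16000)  # stand-in for a clean source
    estimate = reference + 0.1 * rng.standard_normal(16000)  # noisy "separated" output
    print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")
```

Note that the function cannot be computed at all without `reference`, which is exactly the limitation raised above: when no clean ground truth waveform exists, a metric of this class is unavailable, whereas an extrinsic measure such as speaker verification performance does not require one.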