Information Preservation Pooling for Speaker Embedding
        
       
Min Hyun Han, Woo Hyun Kang, Sung Hwan Mun, Nam Soo Kim
Many recent studies on speaker embedding have focused on the pooling technique. In speaker recognition, pooling plays the important role of summarizing variable-length inputs into a fixed-dimensional output. One of the most popular pooling methods for text-independent speaker verification systems is attention-based pooling, which uses an attention mechanism to assign a different weight to each frame. Utterance-level features are then generated by computing the weighted means and standard deviations of the frame-level features. However, useful information in the frame-level features can be compromised during the pooling step. In this paper, we propose an information preservation pooling method that exploits a mutual information neural estimator to preserve the local information in frame-level features during the pooling step. We conducted the evaluation on the VoxCeleb datasets, which shows that the proposed method reduces the equal error rate by 14.6% compared with the conventional method.
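As an illustration of the conventional attention-based pooling step described in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. The function name `attentive_stats_pooling` is hypothetical, and the attention scores are assumed to be given as input; in a real system they would be produced by a learned attention network.

```python
import numpy as np

def attentive_stats_pooling(frames, scores, eps=1e-8):
    """Pool frame-level features of shape (T, D) into a fixed 2*D vector.

    `scores` holds one unnormalized attention score per frame, shape (T,).
    Assumption: the scores are supplied by the caller for illustration.
    """
    # Softmax over the time axis turns the scores into frame weights.
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()

    # Weighted mean of the frame-level features, shape (D,).
    mean = weights @ frames

    # Weighted variance via E[x^2] - (E[x])^2, clipped for numerical safety.
    var = weights @ (frames ** 2) - mean ** 2
    std = np.sqrt(np.clip(var, eps, None))

    # The utterance-level feature is the concatenation of both statistics.
    return np.concatenate([mean, std])
```

Whatever the number of frames T, the output has length 2*D, which is what lets utterances of different durations be compared. With equal scores for all frames, the result reduces to the ordinary mean and standard deviation.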
	
		
	
	                       
      





