This paper is on model-based compressive sensing for multiparty speech recognition; it is joint work with my colleagues. The problem we focus on is competing speech sources, which is common and one of the most challenging and demanding scenarios in many speech applications, and the goal of our work is basically to perform the speech separation prior to recognition. The scenario that we are considering is the case where the number of microphones is less than the number of sources, so it is actually an underdetermined speech separation problem. When the number of measurements is less than the number of unknown sources, sparse component analysis is one of the most promising approaches to deal with this problem. The idea is that we cast the underdetermined speech separation problem as a sparse recovery, where we leverage compressive sensing theory to solve it. In other words, we propose to integrate sparse component analysis into the front-end processing of speech recognition systems to make them robust to overlapping speech.

In what follows I will give a very brief introduction to compressive sensing to help put the work into context; then I will explain the details of our method, which is blind source separation via model-based compressive sensing; then I will present the experimental setup and the speech recognition results, and conclude.

Compressive sensing, or compressed sensing, is about dimensionality reduction. The idea is that a sparse signal x may be high dimensional, but its dimensionality is somewhat misleading: the true information content lies in only a very few of its coefficients. The information content of a sparse signal like this can be captured by very few measurements.
Such dimensionality-reducing measurements arise naturally in some settings, and in those cases we can leverage compressive sensing theory. Compressive sensing theory rests on three ingredients. The first is a sparse representation: we have to come up with a representation of the signal which is sparse, meaning that very few of its coefficients carry most of the energy of the signal. From a geometric perspective, if the signal lives in R^N, most of that space is in fact empty, and the sparse signals live only on the low-dimensional planes aligned with the coordinate axes; the information content of a signal like this can be captured with only a few measurements. The second ingredient is a measurement process that preserves the information of the sparse vectors: the pairwise distances between sparse vectors remain almost unchanged in the observation domain. The third ingredient is the recovery: given a sparse representation and such measurements, compressive sensing guarantees recovery of the signal, not directly but by searching for the sparsest solution which matches the observations.

In practice we do not have an exactly sparse representation in most cases, but for many natural signals such as images and speech a nearly sparse representation, which we call compressible, can be obtained by some transformation. In the case of speech, such a transformation is in fact the Gabor expansion; it is a kind of spectrogram of the speech, as illustrated here, and you see that very few coefficients of the spectrographic representation have large values. If we sort the coefficients of the signal, the sorted coefficients show a very rapid decay, which follows a power law.
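As a minimal numerical sketch of these ingredients (my own illustration, assuming a random Gaussian measurement matrix, which is a standard choice in the compressive sensing literature rather than necessarily the one used in this work): a few random projections approximately preserve the pairwise distances between sparse vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 100, 5          # ambient dimension, measurements, sparsity

def sparse_vec():
    """Random k-sparse vector in R^n."""
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    return x

# Gaussian measurement matrix, scaled so E||Phi v||^2 = ||v||^2
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Pairwise distances between sparse vectors are roughly preserved
x1, x2 = sparse_vec(), sparse_vec()
d_orig = np.linalg.norm(x1 - x2)
d_meas = np.linalg.norm(Phi @ (x1 - x2))
print(d_meas / d_orig)          # typically close to 1
```

This distance preservation (the restricted isometry idea) is what lets the sparsest consistent solution be the correct one.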
A signal like this is called compressible, and it can be handled within the framework of compressive sensing. Moreover, in the spectrographic map you can even see that there is an underlying structure in the sparse coefficients: for instance, here you see that most of the large coefficients are clustered together. We can further leverage the structure underlying the coefficients to improve the recovery performance and to further reduce the number of required observations.

After this very brief introduction to compressive sensing, I will explain the details of our blind source separation via model-based sparse recovery, which from now on I will just call BSS-MSR. In fact we drew our motivation from the very rich literature in the context of sparse component analysis; I have listed only a few of the papers here, but the list is much longer. In particular, the papers of Özgür Yılmaz and Scott Rickard were very inspiring for us and gave us the intuition that sparse component analysis could help speech recognition systems in overlapping conditions. In sparse component analysis, spatial cues have been used for recovering the sources, and in this context the work of Cevher and colleagues presented at IPSN motivated us to formulate source localization as a sparse recovery problem. Finally, our BSS-MSR is nothing but a sparse component analysis approach which provides a joint framework for source localization and separation. What is new about BSS-MSR is that we exploit the model underlying the sparse coefficients, we deal with convolutive mixtures, and we use an efficient and accurate greedy algorithm.
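A toy illustration of why clustered coefficients help (my own example, not the algorithm of this talk): when the large coefficients form a block, thresholding whole blocks by energy locates them more reliably than thresholding individual entries in noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, block = 64, 8                      # signal length, block size

# Ground truth: one cluster of large coefficients (block-sparse)
x = np.zeros(n)
x[16:24] = 3.0
y = x + 0.9 * rng.standard_normal(n)  # noisy observation

# Plain hard thresholding: keep the 8 largest entries anywhere
idx = np.argsort(np.abs(y))[-8:]
x_plain = np.zeros(n)
x_plain[idx] = y[idx]

# Block thresholding: keep the single block with the largest energy
energies = (y.reshape(-1, block) ** 2).sum(axis=1)
b = energies.argmax()
x_block = np.zeros(n)
x_block[b * block:(b + 1) * block] = y[b * block:(b + 1) * block]

print(np.linalg.norm(x_plain - x), np.linalg.norm(x_block - x))
```

Block energy aggregates evidence over the cluster, so isolated large noise samples are much less likely to be selected.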
The first thing we need is a sparse representation of the unknown signal that we want to recover. The idea is that we discretize the planar area of the room into G grid cells. For this characterization we assume that each speaker occupies an exclusive cell, so if for example three speakers are competing, only three of the cells are active and all the rest have absolutely no energy. That is the kind of spatial sparse representation we can obtain for simultaneous speech sources. For spectral sparsity we use the short-time Fourier transform as a spectro-temporal representation. We then entangle these two representations, the spatial and the spectral, together to obtain the spatio-spectral representation of our unknown, which we denote X; each component of it is the signal coming from one grid cell, and inside it are the spectral components of that source.

The second ingredient is the measurements. Recent work that recognized a natural manifestation of compressive sensing measurements in projections of the Green's function inspired us to model our measurement matrix using the image model, a technique proposed by Allen and Berkley. The idea of the image model is that when the room is reverberant and I am speaking here, it is not only me but also a number of my images with respect to all the walls that speak together. We can then model the Green's function in the frequency domain with a particular form in which each component is attenuated with respect to the distance between the image and the sensor, and delayed accordingly. Using this model we can compute the projection from each grid cell in the room onto each sensor.
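A much-simplified sketch of such an image-model projection (my own code with hypothetical function names; only the direct path and first-order images of a 2-D rectangular room with a single reflection coefficient `beta` are used, whereas the actual Allen-Berkley method sums many higher-order images with per-wall coefficients):

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def greens_freefield(freq_hz, d):
    """Free-field Green's function at distance d:
    1/(4*pi*d) attenuation and a delay of d/C seconds."""
    return np.exp(-2j * np.pi * freq_hz * d / C) / (4 * np.pi * d)

def image_model_projection(freqs, src, mic, room, beta):
    """Sum the contributions of the source and its first-order wall
    images in a 2-D rectangular room of size room = (Lx, Ly)."""
    sx, sy = src
    Lx, Ly = room
    images = [(sx, sy, 1.0),                       # direct path
              (-sx, sy, beta), (sx, -sy, beta),    # mirrored in x=0, y=0
              (2 * Lx - sx, sy, beta),             # mirrored in x=Lx
              (sx, 2 * Ly - sy, beta)]             # mirrored in y=Ly
    g = np.zeros(len(freqs), dtype=complex)
    for ix, iy, refl in images:
        d = np.hypot(ix - mic[0], iy - mic[1])
        g += refl * greens_freefield(freqs, d)
    return g

freqs = np.linspace(100, 4000, 64)
g = image_model_projection(freqs, src=(1.0, 1.5), mic=(2.5, 2.0),
                           room=(4.0, 3.0), beta=0.8)
```

Each such frequency-domain vector becomes one entry pattern of the measurement matrix linking a grid cell to a microphone.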
We stack all these projections to construct our measurement matrix Φ, whose dimensions are the number of microphones by the number of grid cells. Introducing the sparse representation X, which is our unknown, the observations at the M microphones are given by y = ΦX, where Φ is the measurement matrix built with the image model. Our goal is to recover X from these very few measurements. The challenge is that Φ has a nontrivial null space: any source component lying in the null space yields the same observations, so according to linear algebra such a system does not have a unique solution. The desired solution, however, is the sparsest one, and this is where sparse recovery helps: it gives us enough prior information to overcome the ill-posedness of our inverse problem.

What we do here is use a sparse recovery algorithm that was presented yesterday in the session on learning low-dimensional signal models. It belongs to the family of iterative hard thresholding methods. The idea is that, since searching the feasible set for the sparsest solution is NP-hard, a combinatorial problem, iterative hard thresholding approximates the sparsest solution iteratively by keeping only the largest-magnitude coefficients and discarding all the rest. This has a model-based variant, in which we keep only the blocks with the largest energy and discard the rest of the blocks.

Now I move on to our experiments and setup. For the speech corpus we used Aurora 2, which is not overlapping, but we overlapped it with interference selected randomly from the same corpus. We discretized the planar area of the room into a grid of fifty-by-fifty-centimeter cells.
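A minimal sketch of generic iterative hard thresholding for y = ΦX (my own illustration with a random Gaussian Φ, not the model-based algorithm or the image-model matrix of the talk):

```python
import numpy as np

def iht(Phi, y, k, iters=200):
    """Iterative hard thresholding: a gradient step on ||y - Phi x||^2
    followed by keeping only the k largest-magnitude coefficients."""
    m, n = Phi.shape
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # conservative step size
    x = np.zeros(n)
    for _ in range(iters):
        x = x + step * Phi.T @ (y - Phi @ x)   # gradient step
        keep = np.argsort(np.abs(x))[-k:]      # k largest coefficients
        mask = np.zeros(n, dtype=bool)
        mask[keep] = True
        x[~mask] = 0.0                         # hard threshold the rest
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 80, 5
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, k, replace=False)
x_true[support] = (1 + rng.random(k)) * rng.choice([-1, 1], size=k)
y = Phi @ x_true
x_hat = iht(Phi, y, k)
print(np.linalg.norm(x_hat - x_true))
```

The model-based variant described above would replace the per-coefficient threshold with a per-block energy threshold.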
The reverberation time was two hundred milliseconds. In the first scenario we tested our method with up to three competing speakers: the target speech together with interferences one and two active, and in the second scenario with interferences three and four active as well. The results in the case of stereo recording and separation, when three sources are competing, are the following. Aurora 2 is a digit recognition task which provides training in two conditions: in one, the HMM is trained only on clean utterances, and in the other, multi-condition noisy utterances are used to train the acoustic model. The baseline on overlapping speech is fifty-nine percent word accuracy in the clean condition and sixty-one percent with multi-condition training. After performing our separation and then speech recognition, we achieve up to ninety-two percent with multi-condition training, which corresponds to about eighty percent relative improvement.

In the second scenario five sources were active. One of the appealing aspects of this work is that we can vary the number of microphones and the geometry: once we used only two microphones and once four microphones to separate the speech and then perform speech recognition, and the word accuracy rates are provided in the table. As you see, we reach up to ninety-four percent if four microphones are used for the source separation, and the relative improvement is up to eighty-five percent.

The key message I would like to close with is that the information-bearing components for speech recognition are indeed sparse, and these results give some compelling evidence that sparse component analysis is a promising approach to deal with the problem of overlapping speech in realistic speech recognition applications.
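The quoted figure of about eighty percent relative improvement is consistent with measuring the improvement on word error rate rather than on accuracy; a quick check, assuming the 61% and 92% accuracies quoted above:

```python
# Relative improvement measured on word error rate (WER = 100 - accuracy)
acc_before, acc_after = 61.0, 92.0   # % word accuracy, multi-condition training
wer_before, wer_after = 100 - acc_before, 100 - acc_after
rel_improvement = (wer_before - wer_after) / wer_before * 100
print(round(rel_improvement, 1))     # about 80, matching the talk's figure
```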
Moreover, we used a kind of model-based sparse recovery, and we showed that we could go beyond plain sparsity.

[Audience question] That is a good point. We could reconstruct the audio of the separated sources and listen to them, so we could also obtain some kind of quantitative evaluation using measures like SIR, which have been proposed for source separation. But since our ultimate goal was speech recognition, we judged that the speech recognition results would be the best final evaluation of how the system performs source separation.

[Audience question] Yes, we have listened to some samples, and there are some cases of overlap in which you can still hear the background, but not the kind of musical noise that we would expect from binary masking. That is because sparse recovery can in some sense be viewed as a kind of soft masking, so those artifacts are reduced.

[Audience question] The measurement matrix depends on the environment, the inter-element spacing, and many factors that have been considered in detail in the paper. In our case the coherence of Φ was in fact high, so we applied some kind of preconditioning by orthogonalization; the details are in the paper. In theory, for instance for very specific acoustic conditions, the restricted isometry property could still hold.
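The SIR measure mentioned in the first answer can be sketched as a ratio of target energy to residual-interference energy (a simplified definition of my own; the full BSS Eval measures decompose the estimated signal by orthogonal projections before forming the ratio):

```python
import numpy as np

def sir_db(target, interference):
    """Signal-to-interference ratio in dB: energy of the target
    component over energy of the residual interference component."""
    return 10 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

rng = np.random.default_rng(0)
s = rng.standard_normal(16000)        # separated target component (1 s @ 16 kHz)
i = 0.1 * rng.standard_normal(16000)  # residual interference component
print(round(sir_db(s, i), 1))         # around 20 dB for this 10:1 amplitude ratio
```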