This is the media signal processing session on audio and visual signal processing. We have several papers, and each speaker has more or less twenty minutes. We start with the first paper, which is on audio-visual synchronization recovery in multimedia content, presented by the first author, who is from an institute of technology in Switzerland. I myself am from UPF.

Thank you. The title of my talk is audio-visual synchronization recovery in multimedia content. This is the outline of my talk: first I will introduce the core problem of audio-visual synchronization in multimedia content, then I will explain the contribution of my work, then I will describe the proposed method in detail, including the correlation measures used to measure the correlation between the audio and video signals, then I will show some experimental results, and finally I will conclude with a summary and future work.

The problem of audio-visual synchronization in multimedia content can be explained in this context. Multimedia content usually contains both audio and video, so when we talk about the quality of multimedia we have quality components from these two modalities: for audio we have loudness, noise and jitter, and for video we have blurriness, jerkiness, pixel noise, and so on. Another important point is that the two signals interact: the perceived qualities of the two modalities mutually influence each other, and there is also the problem of synchronization between the two, which is the problem I want to address. In daily life we expect synchronized audio and video signals; for example, if my wife calls my name, I expect to see the corresponding mouth movement at the same time.
That is our expectation of synchronization in everyday life. There are studies on this synchronization problem for audio and video signals, and people have found that there is some tolerance in synchronization: for example, there is an intersensory integration window of about two hundred milliseconds, within which audio-visual perception is not degraded as long as the synchronization error stays inside this window. So if you look at this graph, even when the two signals are not perfectly synchronized, within this area people still perceive them as synchronized. Based on many studies of synchronization there is also a standards document from the ITU, which specifies a susceptibility threshold of around plus or minus one hundred milliseconds, and broadcasting systems are supposed to follow this guideline; beyond this boundary people start to perceive the audio and video signals as misaligned.

Unfortunately, asynchrony between the audio and video signals can arise at all steps of the multimedia processing chain. For example, during acquisition, we know the speed of light and the speed of sound are different; during editing, the two signals may have different processing times, or people may simply make a mistake; during transmission, they may suffer different network transfer delays; and during restitution, they may have different decoding delays. The result of this asynchrony is, first of all, that the quality is degraded, so people may get annoyed about it, and furthermore people may fail to understand the content. To solve this problem, in our work we developed an automatic algorithm to detect whether there is
asynchrony between the audio and video signals and to recover the original synchronization. For this we exploit the audio-visual correlation structure that is inherent in the two signals. The features of the method are, first, that we make no assumptions about the content, so we do not need any training, and the method can be applied to any kind of content, both speech and non-speech, as long as there is visible motion responsible for the sound; and second, that we used two different correlation measures and compared their results.

Let me explain the proposed method in detail. The idea is quite simple: given the audio and video signals, without knowing whether they are aligned or not, we shift the audio signal relative to the video signal step by step, measure the correlation at each shift, and pick the shift at which the correlation between the two signals is maximal. The algorithm can be summarized as follows. The first step is to extract features, and then we divide the signals into small units where we can apply the correlation analysis. First we divide the whole signal along the temporal dimension into small segments, which we call temporal blocks; this is applied to both audio and video. Then we further segment the video image frames into small spatial tiles, in our case four by four pixels; by doing this we can find where the sound is actually coming from. Then, for each hypothetical time shift, meaning that we shift the audio signal step by step, and for each temporal block, we run the correlation analysis: the correlation is the maximum, over the tiles, of the correlation between the time-shifted audio and the video signal.
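The shift-and-search procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, the per-frame feature layout, and the pluggable `corr_fn` are all assumptions:

```python
import numpy as np

def estimate_shift(audio_feat, video_feat, max_shift, corr_fn):
    """Estimate the audio-video offset (in frames) by exhaustive search.

    audio_feat : 1-D array, one audio feature value per video frame
    video_feat : 2-D array (frames, tiles), one feature value per spatial tile
    corr_fn    : correlation measure between two 1-D sequences
    """
    best_shift, best_score = 0, -np.inf
    n = len(audio_feat)
    for shift in range(-max_shift, max_shift + 1):
        # overlapping part of the shifted audio and the video signal
        a0, v0 = max(0, shift), max(0, -shift)
        length = n - abs(shift)
        a = audio_feat[a0:a0 + length]
        # correlation against every tile; the maximum is assumed to
        # come from the tile containing the sound source
        score = max(corr_fn(a, video_feat[v0:v0 + length, t])
                    for t in range(video_feat.shape[1]))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

In the talk's terms, `score` at the best shift corresponds to the maximum tile correlation, later averaged over temporal blocks.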
We measure this correlation over the whole image frame and take the maximum, and we expect the location having this maximum correlation to be the sound source. We then compute the average of this maximum correlation over the temporal blocks, from the beginning of the signal to the end. Now, for each time shift we have a correlation measure between the two signals, and we choose the shift with the maximum value. Finally, after all these steps, we refine the time shift to a finer resolution: the time shifts are taken at the resolution of the video frame rate, so we obtain a correlation measure at each video frame; from this kind of correlation curve over the different time shifts we get a discrete maximum, but we then apply a parabolic fit over the three points around it and take the maximum of the parabola, which gives the final time shift.

That part should be quite clear; the question is what kind of correlation measure to use. I compared two different measures, one being mutual information and the other canonical correlation. Mutual information, as you probably know well, is a measure of the statistical dependence between two signals; in particular I used the quadratic mutual information proposed in earlier work, which is based on the quadratic entropy and uses Parzen PDF estimation for the marginal and joint PDFs. The measure is given by this equation; here each PDF is modeled with a sum of Gaussian kernels, one centered on each data point, since it is a Parzen estimate, and there is a parameter that has to be fixed, namely the width of the Gaussian kernels. This is a user parameter that we have to set; in our experiments we did
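The parabolic refinement of the discrete correlation peak mentioned above can be sketched like this (a hypothetical helper under the assumption of uniformly spaced shifts, not the authors' implementation):

```python
import numpy as np

def parabolic_peak(scores, shifts):
    """Refine a discrete correlation peak by fitting a parabola through
    the maximum and its two neighbours (sub-frame-rate resolution)."""
    i = int(np.argmax(scores))
    if i == 0 or i == len(scores) - 1:
        return shifts[i]                    # peak at the border: no refinement
    y0, y1, y2 = scores[i - 1], scores[i], scores[i + 1]
    # vertex of the parabola through the three points, in grid units
    offset = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)
    step = shifts[1] - shifts[0]
    return shifts[i] + offset * step
```

With a true peak between two sampled shifts, the vertex of the fitted parabola recovers it without evaluating extra shifts.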
a search over widths and took the best one. The other correlation measure is canonical correlation, which measures the correlation in the space where the projected signals have maximum correlation; finding this projection is equivalent to finding a common representation space for the two signals. This is the equation of canonical correlation: as you can see, we need to find the projection vectors w, which project the input vectors x and y, corresponding to the audio and video signals, and we maximize the resulting correlation. This optimization problem can be solved as an eigenvalue problem, which is available in many publications. These are the two correlation measures that I used.

Now let me present some experimental results. I tested the algorithm on three audio-visual sequences, two of them speech and the other one non-speech, and I simulated asynchronies between zero and plus or minus one second. For the features I used quite simple methods, because I found them to work very well, although of course more complex features could also be used: as the visual feature I took the pixel intensity and then its derivative along the time dimension, and as the audio feature I computed the energy and then its derivative in the temporal dimension. The analysis unit in time was fifty video frames, which corresponds to around two seconds depending on the sequence, and as I mentioned, the spatial tile was four by four pixels; this is after downsampling the image frames to one sixteenth, that is, by one quarter in each dimension.

Here you can see some of the results. These are the three sequences I used: the first one is a monologue by one person; in the second there are two people, but only one of them is speaking, while the other one
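The canonical correlation measure can be computed, for example, by whitening both feature blocks and taking an SVD of the cross-covariance. This is a small sketch under the standard CCA definition, not the speaker's implementation; the ridge term `reg` is an assumption added for numerical stability:

```python
import numpy as np

def first_canonical_corr(X, Y, reg=1e-6):
    """First canonical correlation between samples X (n, p) and Y (n, q).

    Solves max over wx, wy of corr(X wx, Y wy): after whitening each
    block, the singular values of the whitened cross-covariance are
    exactly the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    s = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)
    return float(s[0])
```

Unlike the Parzen-based mutual information, this measure has no kernel-width parameter to tune, which matches the robustness reported in the talk.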
moves his head a little; and the third one is an animation, which includes the bumping sound of a pen on a table. This is the result: the x axis shows the simulated asynchrony, from zero to plus or minus one thousand milliseconds, and the y axis shows the estimation error in milliseconds; the black bars are the results using mutual information and the white bars are the results using canonical correlation. Looking first at the mutual information results, it is normally okay, but there are some cases where the error is not acceptable, for example here where it is more than four hundred milliseconds, which is outside the acceptability threshold. The main reason for this is that, as I mentioned, for mutual information we need to set the width parameter of the Gaussian kernels; I tried different variances, and this one gave the best results overall, but I could not find a single value that worked everywhere: for some cases one width was best, while for other cases a different width was better. That was the main difficulty in using mutual information. On the other hand, if you look at the canonical correlation results, the error is always less than one hundred milliseconds, and as I mentioned before the acceptability threshold is around one hundred milliseconds, so we can say that for these sequences canonical correlation was successful.

This figure simply shows how the correlation measure changes according to the hypothetical time shift. In this case I used perfectly synchronized signals, so the column in the middle is the correct hypothesis, while the column on the right side is a wrong one, a shift of thirty-one frames, which is around one second. Here you can see that the
canonical correlation measure is larger when the signals are synchronized, in the middle column, than for the wrong hypothesis on the right: in the top case it is 0.8 versus 0.7, and in the bottom case 0.9 versus 0.6. One more thing you can see here is the black area: when I measured the correlation between the different tiles and the audio signal, I took only the tiles that contain motion; when the motion in a tile is negligible, I skipped the analysis to save computation, and those skipped tiles appear as the black parts. In this case you can see that the active tiles are quite a small part of the whole scene.

Finally, the conclusion. To summarize, we proposed an automatic synchronization method and tried different correlation measures, and we found that the quadratic mutual information was quite sensitive to the Gaussian kernel parameter, whereas canonical correlation showed overall quite robust results. One thing I would like to mention here is that a similar approach was also applied to stereoscopic 3D video synchronization: in stereoscopic video you have two video streams, and if they are not synchronized you see double images. I am not sure whether you can see it clearly here, but the lid of the laptop appears twice, once here and once here. So this synchronization problem can also arise in that case; we applied a similar technique and it could solve the problem, and this was presented at a conference last year. Finally, as future work, we would like to test the method on more diverse content, because we used only three sequences here, and we would also like to continue studying the synchronization problem in different media such as mobile TV, HDTV, or 3D TV. Thank you. I think we have time for one question.
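The motion-gating step described in the talk (skipping tiles with negligible motion, shown as black areas in the figure) could look like the following sketch; the relative-energy threshold and the function name are assumptions, not details from the talk:

```python
import numpy as np

def active_tiles(video_feat, rel_threshold=0.05):
    """Indices of tiles whose motion is worth analysing.

    video_feat : (frames, tiles) array of per-tile motion features,
                 e.g. temporal derivatives of pixel intensity.
    Tiles whose motion energy falls below a fraction of the most
    active tile are skipped, saving correlation computations.
    """
    energy = np.square(video_feat).sum(axis=0)
    return np.flatnonzero(energy >= rel_threshold * energy.max())
```

The correlation search would then iterate only over the returned indices instead of every tile in the frame.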
If I remember correctly, that work was only for speech; I think they find the lip area first and then use some lip-specific features to recover the synchronization, as far as I remember. The difference in our case is that we do not do that. Thank you. We move on to the second paper.