So the title of our paper is "Real-Time Conjugate Gradient for Online fMRI Classification." First I will begin with a schematic of the real-time fMRI system, then I will move on to show you some previous work on online learning algorithms and our proposed real-time conjugate gradient, and in the last part I will show you some test results.

Conventionally, fMRI experiments are done in batch processing. The experimenter gives a certain kind of task to the subject in the brain scanner, for example for brain state classification, and by the end of the experiment we have gathered a time series of brain scan images, to which we apply some kind of offline learning algorithm to obtain the inference results. In contrast, in a real-time fMRI system we do not need to wait until the end of the experiment: at each time point we have a 3D brain scan image, and by using an online learning algorithm we can have the inference result for the current brain state before we have the next brain scan image. The benefit of doing so is that the experimenter can use the real-time feedback to monitor the data quality, to do real-time mind reading, or to modify the task while it is still going on, and if we give the real-time feedback back to the subject in the scanner, then we can build a brain-computer interface.

However, the benefits come with challenges. The main challenge for a real-time fMRI system is computational complexity, because we want to process fMRI data, which is usually of dimension 10^5, within one TR, which is usually two to three seconds, and we also want an accurate and adaptive algorithm, so that we can allow the experimenter to modify the task on the fly.

This is our proposed method, which is called real-time conjugate gradient. It is motivated by a widely used algorithm in the neuroimaging community called partial least squares, and we do both the training and the classification in real time. In real fMRI tests our algorithm is fast: it can reach an accuracy of about ninety percent within 0.5 seconds, using an ordinary personal computer. I will also show you test results which show that the algorithm is adaptive.

There are many online learning algorithms out there, and some of them have been applied to fMRI applications, but not all of them are truly online algorithms: some of them are trained offline and only do the prediction online. In our definition of an online learning algorithm, we need both the training and the classification in real time; a minimal sketch of this loop follows below. Here are some examples of online learning algorithms, including the generalized linear model, independent component analysis, and the support vector machine.
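To make the online setting concrete, here is a minimal sketch of the predict-then-update loop just described. Everything in it is illustrative: the toy scan generator, the perceptron-style update, and all names are my stand-ins, not the talk's actual system (which uses the conjugate gradient update described later).

```python
import numpy as np

def stream_of_scans(n=20, dim=1000, seed=0):
    # toy stand-in for the scanner: one (image, label) pair per TR,
    # with block-repeating labels and a weak class-dependent signal
    rng = np.random.default_rng(seed)
    for t in range(n):
        y = 1 if (t // 5) % 2 == 0 else -1
        x = rng.normal(size=dim)
        x[:50] += y                      # planted signal in the first 50 voxels
        yield x, y

w = np.zeros(1000)                       # linear classifier over voxels
for x_t, y_t in stream_of_scans():
    y_hat = 1 if x_t @ w >= 0 else -1    # inference before the next scan arrives
    # true label revealed, then train; this perceptron step stands in for the
    # sliding-window conjugate-gradient update of the talk
    if y_hat != y_t:
        w += y_t * x_t
```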
As I mentioned, our algorithm is based on partial least squares, so let me first give you a brief review of the partial least squares algorithm. Here our input data is a matrix X of dimension n-by-K, where n is the number of examples, which are the brain scan images, and K is the dimension of each image, which is usually on the order of 10^5, and the output Y is the brain state corresponding to each brain scan image. Partial least squares assumes that both the input and the output are generated by the same set of latent factors F, so we can express X as F P^T and Y as F Q, where P and Q are the loading factors for X and Y respectively. Partial least squares is an iterative method: in each iteration it finds a new latent factor, then it does a univariate regression to find the loading factors P and Q, and in the last step it does a rank-one deflation to subtract the contribution of the current latent factor before it moves on to the next iteration. Because it is an iterative method, it is not so efficient in a real-time context.

In 2009 an improvement of the traditional partial least squares was proposed, called ridge partial least squares. The main idea of ridge partial least squares is to add a ridge parameter to the covariance matrix, so that all the latent factors can be extracted in only one step instead of doing multiple iterations. However, this algorithm is still not efficient enough for our desired real-time system. What we want is an algorithm which has comparable performance to partial least squares but is more efficient. When we looked into partial least squares, we found from these two papers that partial least squares is actually a conjugate gradient algorithm, so based on that we propose a new real-time conjugate gradient algorithm to fit into our system.

Let us formalize the problem. In our real-time system, at each time t we receive a new example x_t, which is the brain scan image, and our classifier trained at time t-1 makes a prediction based on the new example. After the classifier makes the prediction, we receive the true label from the subject, and then we update our classifier with this information. The problem becomes a quadratic minimization problem. To make the algorithm more efficient, instead of using all the past examples we take a sliding window of the examples, so at each time we only use the past H examples for the training. There are two benefits of doing this: the first one, as I mentioned, is efficiency, and what is more important, it makes the algorithm adaptive. So now the problem becomes a quadratic minimization problem, and conjugate gradient can solve it.

What is conjugate gradient? It is an algorithm to solve the quadratic minimization problem, and it shares a similar structure with gradient descent. It is an iterative method: with zero initialization, it searches directions which are conjugate to all the previous directions, it does a line search on each direction, and it terminates in H steps.

To further speed up the whole algorithm, we make two major modifications, both shown in the sketch below. The first is a good starting point. Conventionally, conjugate gradient starts at the zero initialization, but because we are using a sliding window of the data, each time we are only adding one new data point and removing one old data point, so it is reasonable to assume that the classifier at time t is very close to the previous classifier. We therefore use the previous training result as the starting point of the search for the current classifier. This makes the algorithm faster, which I will show you later in the experiments, and it also encodes the past memory, which gives the algorithm information about the past. The other modification is that instead of letting the algorithm terminate in H steps, we let it terminate in i_max steps, where i_max mediates between the past memory and the current data: if i_max equals H, then no matter where your starting point is, we do not keep any memory of the past and we only train on the current data; if i_max is less than H, then the algorithm keeps a partial memory of the previous training.
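Here is a minimal sketch of the two modifications, assuming the window objective is the least-squares problem over the last H examples, so that CG solves the normal equations A w = b with A = X^T X (plus a small ridge for positive definiteness, my addition) and b = X^T y. All names, the ridge value, and the tolerance are mine, not the paper's.

```python
import numpy as np

def cg_warm(A_mv, b, w0, i_max, tol=1e-8):
    """Conjugate gradient for A w = b with A symmetric positive definite.
    A_mv(v) computes A @ v; w0 is the warm start; i_max caps the iterations."""
    w = w0.copy()                        # modification 1: start at w_{t-1}
    r = b - A_mv(w)                      # residual of the normal equations
    d = r.copy()                         # initial search direction
    rs = r @ r
    for _ in range(i_max):               # modification 2: stop after i_max <= H
        if np.sqrt(rs) < tol:
            break
        Ad = A_mv(d)
        alpha = rs / (d @ Ad)            # exact line search along d
        w += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d        # keep d conjugate to past directions
        rs = rs_new
    return w

def update_classifier(X_win, y_win, w_prev, i_max, ridge=1e-6):
    # X_win: last H scans (H x K); y_win: their labels; w_prev: warm start
    A_mv = lambda v: X_win.T @ (X_win @ v) + ridge * v   # A = X^T X + ridge*I
    b = X_win.T @ y_win
    return cg_warm(A_mv, b, w_prev, i_max)
```

With i_max = H the solve converges to the window solution and effectively forgets the warm start; with i_max < H the returned classifier still carries information from w_{t-1}, which is exactly the memory-versus-adaptiveness knob described above.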
In the paper we also show that, if we use the same zero initialization, our algorithm coincides step for step with the partial least squares algorithm, which means that with the same initialization our algorithm can have a comparable performance to partial least squares.

Now I will show you some test results. We tested three algorithms: the first is our proposed real-time conjugate gradient algorithm, the second is the partial least squares applied to the window of data, and the third is the traditional ridge partial least squares applied to the window of data. We tested them on three synthetic datasets and three real fMRI datasets.

For the first synthetic dataset, we generated two hundred examples, each of dimension two thousand, and we chose two sets of features, each of dimension one hundred. When the label is one, we set one of the feature sets to have value one, and when the label is minus one, we chose the other feature set. The label is a repeating pattern of one and minus one, and we added Gaussian noise. The second synthetic dataset is generated in a similar way to the first one, except that we randomized the labels. The third one we designed to test the adaptiveness of our algorithm: every one hundred and fifty examples, we use a new model to generate the data. A rough reconstruction of this generator is sketched below.
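As a rough reconstruction of the first synthetic set (200 examples of dimension 2,000, two disjoint feature groups of 100 tied to the two labels, Gaussian noise): the block length of the repeating label pattern and the noise scale were not specified in the talk, so those values are assumptions.

```python
import numpy as np

def make_synthetic(n=200, dim=2000, n_feat=100, block=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, dim))                           # Gaussian noise
    y = np.where((np.arange(n) // block) % 2 == 0, 1, -1)   # repeating +1/-1
    X[y == 1, :n_feat] += 1.0                    # first feature set -> label +1
    X[y == -1, n_feat:2 * n_feat] += 1.0         # second feature set -> label -1
    return X, y

X, y = make_synthetic()
# dataset 2: the same generator with the labels shuffled
# dataset 3: switch to a new feature assignment every 150 examples
```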
For the fMRI datasets, the first task is a visual perception task. We show the subject in the scanner a flashing checkerboard which is either on the left side or on the right side, so when the checkerboard is on the left, the right part of the subject's visual cortex is activated, and vice versa. The label, which is the position of the checkerboard, is a repeating pattern of left and right, and we get a data point of dimension about one hundred and twenty-two thousand every three seconds. The second fMRI dataset is similar to the first one, except that we randomized the labels. For the third one, we used a publicly available dataset, published in a 2001 Science paper, with a category-related object vision task. We took ten runs of one subject from the data: basically, they show either a face image or a house image to the subject in the scanner, and each data point is of dimension about one hundred and sixty-three thousand.

These are the test results of the three algorithms. Here we show the prediction accuracy and also the average training time for each algorithm. As you can see, among these three algorithms ours is always the fastest, and in most cases our algorithm has a higher accuracy than the other two. The only case where it does not do as well as the other two is synthetic dataset three, which is the data where we change the model that generates the labels; later I will show you how we can improve this so that our algorithm can have a comparable performance to the other two.

This is the prediction output on the synthetic dataset. The black line here is the true label of the examples, which is either one or minus one, and the blue line is the prediction output; as you can see, it fits nicely with the true label. There is a global learning curve you can see on the plots, and that is because we choose the previous training result as the starting point, which encodes the memory, so the algorithm gets more and more confident as it sees more and more examples. On the right is the prediction plot for synthetic dataset three. The reason why our algorithm does not do as well as the other two in the model-changing context is that it has the past memory: when the model stays the same, the memory helps you learn faster, but if the model changes, the memory of the past actually hurts you. It is a tradeoff between memory and adaptiveness.

This is the prediction result on the real fMRI data. As I said, I will show you that a good starting point for the algorithm really matters: if we look at the residual in the training phase, we can see that at first we do not have any memory of the past, so the residual is very high, but within ten to twenty time steps the memory helps to reduce the residual from around five thousand to about 0.1 percent of the initial residual, roughly speaking. We can also use this information to improve our performance when the model changes, because every time the model changes, the residual becomes high again. So by detecting a sudden change of the residual, we can make the model forget all of the past memories and start over again, and in this way we can improve the performance of our algorithm so that it has a comparable performance to the other two. A small sketch of this reset heuristic appears at the end.

For some future work: right now we only generate the prediction results, but we have not yet used them as real-time feedback to the experimenter or the subject, so as a next step we are considering using that information to build a brain-computer interface. We also want to try more complicated experiments, and we also want to compare with other real-time algorithms out there. So, thank you, and I would like to take some questions if there are any.

Q: For the real-time conjugate gradient algorithm, it is common to use a preconditioner. Have you thought about that, and is it possible to include it in your algorithm?

A: We apply a kind of normalization of the matrix at the beginning, but we do not do any preconditioning beyond that. It could be a good thing to add to make each training step cheaper.

Q: Okay, thank you.
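Finally, a small sketch of the residual-based reset heuristic mentioned in the results discussion: monitor the CG training residual, and when it jumps suddenly, drop the warm start so the algorithm forgets the past and retrains on the current window alone. The jump test (median-based, factor 10) is my assumption; the talk only says to detect the sudden change of the residual.

```python
import numpy as np

def maybe_reset(w_prev, residual, history, factor=10.0):
    """Return a zero warm start if the residual jumped, else keep w_prev.
    history: list of recent residual values; factor: jump threshold (assumed)."""
    if len(history) > 0 and residual > factor * np.median(history):
        history.clear()                # discard stale statistics as well
        return np.zeros_like(w_prev)   # zero warm start: forget the past
    history.append(residual)
    return w_prev

# usage per time step, after the CG update and before the next warm start:
#   w_start = maybe_reset(w_prev, current_residual, residual_history)
```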