0:00:20 So the title of our paper is Real-Time Conjugate Gradient for Online fMRI Classification.
0:00:27 First I will begin with a schematic of the real-time fMRI system, then I will move on to show you some previous work on online learning algorithms and our proposed real-time conjugate gradient, and in the last part I will show you some test results.
0:00:46 So in the conventional model, fMRI experiments are done in batch processing: the experimenter gives a certain kind of task to the subject in the scanner, for example for brain state classification, and by the end of the experiment we have gathered a time series of brain scan images, to which we apply some offline learning algorithm to obtain the inference results.
0:01:14 In contrast, in a real-time fMRI system we don't need to wait until the end of the experiment. At each time point we have a 3D brain scan image, and by using an online learning algorithm we can obtain the inference result for the current brain state before we receive the next brain scan image.
0:01:39 The benefit of doing so is that the experimenter can use the real-time feedback to monitor data quality, to do real-time mind reading, or to modify the task while it is still going on. And if we feed the real-time feedback back to the subject in the scanner, then we can build a brain-computer interface.
0:02:07 However, the benefits come with a challenge. The main challenge for a real-time fMRI system is the computational complexity, because we want to process fMRI data, which is usually of dimension ten to the power of five, within one TR, which is usually two to three seconds. We also want an accurate and adaptive algorithm, so that the experimenter can modify the task on the fly.
0:02:42 This is our proposed method, which is called real-time conjugate gradient. It is motivated by a widely used algorithm in the neuroimaging community called partial least squares, and our algorithm is online: we do both the training and the classification in real time. A real fMRI test shows that our algorithm is fast; it can reach an accuracy of about ninety percent within 0.5 seconds using an ordinary personal computer. I will also show you test results which show that the algorithm is adaptive.
0:03:26 There are many online learning algorithms out there, and some of them have been applied to fMRI applications, but not all of them are truly online algorithms: some of them are trained offline and only do the prediction online. By our definition of an online learning algorithm, both the training and the classification need to be done in real time. Here are some examples of true online learning algorithms, including the general linear model, independent component analysis, and support vector machines.
0:04:06 As I mentioned, our algorithm is based on partial least squares, so let me first give you a brief review of what the partial least squares algorithm is. Here our input data is a matrix X of dimension N by K, where N is the number of examples, which are the brain scan images, and K is the dimension of each image, which is usually on the order of ten to the power of five. The output Y is the brain state corresponding to each brain scan image.
0:04:45 Partial least squares assumes that both the input and the output are generated by the same set of latent factors F, so we can express X as F P-transpose and Y as F Q, where P and Q are the loading factors for X and Y respectively. Partial least squares is an iterative method: in each iteration it finds a new latent factor, then does a univariate regression to find the loading factors P and Q, and in the last step it does a rank-one deflation to subtract the contribution of the current latent factor before moving on to the next iteration.
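As an illustrative sketch of the iterative procedure just described (not necessarily the exact variant the authors use), here is partial least squares in its single-output PLS1 form; the function name, the weight normalization, and the PLS1 restriction are assumptions made for this sketch.

```python
import numpy as np

def pls1(X, y, n_factors):
    # Minimal PLS1 sketch: X is (N, K) brain images, y is (N,) labels.
    Xd, yd = X.astype(float).copy(), y.astype(float).copy()
    W, P, Q = [], [], []
    for _ in range(n_factors):
        w = Xd.T @ yd                  # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xd @ w                     # new latent factor (score vector)
        p = Xd.T @ t / (t @ t)         # X loading via univariate regression on t
        q = (yd @ t) / (t @ t)         # y loading via univariate regression on t
        Xd -= np.outer(t, p)           # rank-one deflation: subtract this factor
        yd -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)   # regression weights: y_hat = X @ beta
```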
0:05:32 Because it is an iterative method, it is not so efficient in a real-time context.
0:05:40 In two thousand nine an improvement of the traditional partial least squares was proposed, called ridge partial least squares. The main idea of ridge partial least squares is to add a ridge parameter to the covariance matrix, so that all the latent factors can be extracted in only one step instead of doing multiple iterations. However, this algorithm is still not efficient enough for our desired real-time system.
0:06:17 So what we want is an algorithm that has a comparable performance to partial least squares but is more efficient. When we looked into partial least squares, we found that these two papers show that partial least squares is actually a conjugate gradient algorithm. Based on that, we propose a new real-time conjugate gradient algorithm to fit into our real-time system.
0:06:46 Let's formalize the problem here. In the real-time system, at each time t we receive a new example x_t, which is the brain scan image, and our classifier trained at time t minus one makes a prediction based on the new example. After the classifier makes the prediction, we receive the true label from the subject, and then we update our classifier with this information.
0:07:17 So the problem becomes a quadratic minimization problem. To make the algorithm more efficient, instead of using all the past examples we take a sliding window of the examples, so at each time we only use the past H examples for the training. There are two benefits of doing this: the first one is, like I mentioned, efficiency, and what's more important, it makes the algorithm adaptive. So now the problem becomes this quadratic minimization problem, and conjugate gradient can solve it.
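To make the sliding-window objective concrete, one natural reading is a least-squares fit over the last H examples, min_w ||X_t w - y_t||^2, whose normal equations (X_t^T X_t) w = X_t^T y_t are the quadratic problem the conjugate gradient will solve. In the sketch below the matrix is kept implicit because K is on the order of ten to the power of five; the helper name and the optional ridge term are assumptions.

```python
import numpy as np

def window_problem(X_win, y_win, ridge=0.0):
    """Return (matvec, b) for the window's normal equations A w = b, where
    A = X_win.T @ X_win + ridge * I is applied implicitly (K ~ 1e5 is too
    large to form A explicitly). X_win is (H, K), y_win is (H,)."""
    def matvec(w):
        return X_win.T @ (X_win @ w) + ridge * w
    b = X_win.T @ y_win
    return matvec, b
```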
0:08:02 So what is the conjugate gradient? It is an algorithm to solve the quadratic minimization problem, and it shares a similar structure with gradient descent. It is an iterative method: starting from the zero initialization, it searches along directions that are conjugate to all the previous directions, does a line search on each direction, and terminates in H steps.
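Here is a minimal sketch of that conjugate gradient loop, matching the description above (zero initialization, directions conjugate to all previous ones, exact line search); the matrix-free matvec interface is an editorial choice, not necessarily how the authors implemented it.

```python
import numpy as np

def conjugate_gradient(matvec, b, w0=None, max_iters=None, tol=1e-8):
    """Solve min_w 0.5 * w.T A w - b.T w, i.e. A w = b, where matvec applies A."""
    w = np.zeros_like(b) if w0 is None else w0.copy()   # zero init by default
    r = b - matvec(w)                  # residual (negative gradient)
    d = r.copy()                       # first search direction
    rs = r @ r
    steps = len(b) if max_iters is None else max_iters
    for _ in range(steps):
        if np.sqrt(rs) < tol:          # converged
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)          # exact line search along direction d
        w += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d      # next direction, conjugate to previous ones
        rs = rs_new
    return w, np.sqrt(rs)              # weights and final residual norm
```

With a window of H examples and no ridge term, the system has rank at most H, so in exact arithmetic the loop stops within H steps even though w lives in K dimensions, which is what "terminates in H steps" refers to.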
0:08:30 To further speed up the whole algorithm we make two major modifications. The first is a good starting point. The conventional conjugate gradient starts at the zero initialization, but because we are using a sliding window of the data, at each time we are only adding one new data point and removing one old data point, so it is reasonable to assume that the classifier at time t is very close to the previous classifier. So we use the previous training result as the starting point of the search for the current classifier. That makes the algorithm faster, which I will show you later in the experiments, and it also encodes the past memory, which gives the algorithm information about the past.
0:09:30 The other modification is that instead of letting the algorithm terminate in H steps, we let it terminate in i_max steps, so i_max mediates between the past memory and the current data. If i_max equals H, then no matter where your starting point is, we don't have any memory of the past; we are only training on the current data. If i_max is less than H, then it keeps partial memory of the previous training.
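Putting the two modifications together, here is a sketch of the online loop: warm start from the previous classifier and stop after at most i_max conjugate gradient steps. It reuses the window_problem and conjugate_gradient sketches above; the parameter values and the stream interface are assumptions.

```python
import numpy as np

def online_rtcg(stream, K, H=20, i_max=5):
    """stream yields (x_t, label_t) pairs; x_t is a length-K brain image."""
    w = np.zeros(K)                          # classifier trained at time t-1
    win_X, win_y = [], []
    for x_t, label_t in stream:
        prediction = np.sign(x_t @ w)        # predict before the true label arrives
        win_X.append(x_t); win_y.append(label_t)
        if len(win_X) > H:                   # sliding window: keep last H examples
            win_X.pop(0); win_y.pop(0)
        matvec, b = window_problem(np.array(win_X), np.array(win_y))
        w, residual = conjugate_gradient(matvec, b, w0=w, max_iters=i_max)  # warm start
        yield prediction, residual
```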
0:10:09 In the paper we also show that, compared with partial least squares, if we use the same zero initialization, then our algorithm coincides step by step with the partial least squares algorithm, which means that with the same initialization our algorithm can have a comparable performance to partial least squares.
0:10:36 Now I will show you some test results. We tested three algorithms: the first is our proposed real-time conjugate gradient algorithm, the second is the traditional partial least squares applied to the windowed data, and the third is the ridge partial least squares applied to the windowed data. We tested them on three synthetic datasets and three real fMRI datasets.
0:11:02 For the first synthetic dataset we generate two hundred examples, each of dimension two thousand, and we choose two sets of features, each of dimension one hundred. When the label is one we set one of the feature sets to have value one, and when the label is minus one we choose the other feature set. The labels are a repeating pattern of one and minus one, and we add some noise.
0:11:30 The second synthetic dataset is generated in a similar way as the first one, except that we randomize the labels.
0:11:40 The third one we designed to test the adaptiveness of our algorithm, so for every hundred and fifty examples we use a new model to generate the data.
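For illustration, here is one way to generate the synthetic data as described; the noise level, the exact feature indices, and the Gaussian noise model are assumptions of this sketch.

```python
import numpy as np

def make_synthetic_1(n_examples=200, dim=2000, n_active=100, noise_std=0.1, seed=0):
    """First synthetic dataset: labels alternate +1/-1; each label turns on its
    own block of 100 features on top of background noise."""
    rng = np.random.default_rng(seed)
    y = np.array([1 if i % 2 == 0 else -1 for i in range(n_examples)])
    X = noise_std * rng.standard_normal((n_examples, dim))
    pos_feats = np.arange(0, n_active)              # features active when y = +1
    neg_feats = np.arange(n_active, 2 * n_active)   # features active when y = -1
    for i in range(n_examples):
        X[i, pos_feats if y[i] == 1 else neg_feats] += 1.0
    return X, y

# Dataset 2: shuffle y to randomize the labels. Dataset 3: switch to a fresh
# pair of feature blocks every 150 examples to simulate the model change.
```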
0:11:55 For the fMRI data tests, the first task is a visual perception task. We show the subject in the scanner a checkerboard, which is either on the left side or on the right side. When the checkerboard is on the left, the right part of the subject's visual cortex is activated, and vice versa. The label, which is the position of the checkerboard, is a repeating pattern of left and right, and we get a data point of dimension about a hundred and twenty thousand every three seconds.
0:12:39 The second dataset is similar to the first one, except that we randomize the labels.
0:12:46 For the third one we used a publicly available dataset, published in a 2001 Science paper by Haxby and colleagues; it is a category-related object vision task. We take ten runs of one subject from the dataset. Basically, they show either a face image or a house image to the subject in the scanner, and each data point is of dimension about a hundred and sixty thousand.
0:13:15 These are the test results of the three algorithms. Here we show the prediction accuracy and also the average training time for each algorithm. As you can see, among the three algorithms ours is always the fastest, and in most cases our algorithm has a higher accuracy than the other two. The only case where it doesn't do as well as the other two is synthetic dataset three, which is the dataset where we change the model that generates the data; later I will show you how we can improve this result so that our algorithm has a comparable performance to the other two.
0:14:02 This is the prediction result on the first synthetic dataset. The black line here is the true label of the examples, which is either one or minus one, and the blue line is the prediction output. As you can see, it fits the true label nicely, and there is a global learning curve you can see on the plot. That is because we choose the previous training result as the starting point, which encodes the memory, so the algorithm gets more and more confident as it sees more and more examples.
0:14:45 On the right is the prediction plot for synthetic dataset three. The reason why our algorithm doesn't do as well as the other two in the model-changing context is that our algorithm has the past memory: when the model stays the same, the memory helps you learn faster, but if the model changes, the memory of the past actually hurts you. So it is a tradeoff between memory and adaptiveness.
0:15:21 And this is the prediction result on the real fMRI data.
0:15:30 As I said, I will show you that a good starting point for the algorithm really matters. If we look at the residual in the training phase, we can see that at first we don't have any memory of the past, so the residual is very high, but within ten to twenty epochs the memory helps to reduce the residual from around five thousand to about 0.1 percent of its initial value.
0:16:10 We can also use this information to improve our performance when the model changes, because every time the model changes, the residual becomes high again. So by detecting a sudden change in the residual we can make the model forget all the past memories and start over again. In this way we can improve the performance of our algorithm so that it has a comparable performance to the other two.
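A minimal sketch of that residual-based reset: if the training residual jumps suddenly, assume the generating model changed and drop the past memory by restarting the next conjugate gradient call from zero. The fixed multiplicative threshold is an assumption of this sketch.

```python
import numpy as np

def maybe_reset(w, residual, prev_residual, jump_factor=10.0):
    """Return the warm-start vector for the next time step: keep w unless the
    residual jumped by more than jump_factor, in which case forget the past."""
    if prev_residual is not None and residual > jump_factor * prev_residual:
        return np.zeros_like(w)     # start over: no memory of the old model
    return w
```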
0:16:47 As for future work: right now we only generate the prediction results, but we haven't yet used them as real-time feedback to the experimenter or the subject. So as a next step we are considering using that information to build something like a brain-computer interface. We also want to try more complicated experiments, and we want to compare with other real-time algorithms out there. So, thank you, and I'd like to take some questions if there are any.
0:17:30 Are there any questions?
0:17:40 So, the conjugate gradient algorithm is often used with a preconditioner. Have you thought about that, and is it possible to include one in your algorithm?
0:17:57 We don't use a preconditioner; what we apply at the beginning is a kind of normalization of the matrix, so we don't do any preconditioning. If you would like to use a preconditioner to make each iteration faster, you could do that, but it adds extra cost.
0:18:24 Okay, thank you.