0:00:13hi
0:00:14i'm wojciech wojcikiewicz from the machine learning group at the technical university of berlin
0:00:19and i will present
0:00:20you our recent work about stationary common spatial patterns
0:00:24this is joint work with carmen vidaurre and motoaki kawanabe
0:00:31so here is an overview
0:00:32i will start with an introduction
0:00:35then tell you something about the common spatial patterns method
0:00:39and about our stationary common spatial patterns method
0:00:43then i will show some results
0:00:46and conclude with a summary
0:00:51so our target application is brain computer interfacing
0:00:55a brain computer interface system
0:00:57aims to translate the intent of a subject
0:01:00for example measured
0:01:01from brain activity
0:01:03here in this case by EEG
0:01:06into a control command for a computer application
0:01:09so in this case you are measuring EEG and
0:01:12you want to control a game this pinball game
0:01:15but you can also think of other applications like
0:01:19controlling a wheelchair or a neuroprosthesis
0:01:25so a very popular paradigm
0:01:27for BCI is motor imagery
0:01:30in motor imagery
0:01:32the subject
0:01:34imagines some motions with the right hand or the left hand or the feet
0:01:40and these different motions lead to
0:01:42different patterns in the EEG
0:01:45and if your system is able to extract and classify these different patterns
0:01:50then you can convert them to a computer command and control an application
0:01:58so there are still some challenges
0:02:00so for example the EEG signal is usually high dimensional
0:02:04it has a low spatial resolution
0:02:07that means you have a volume conduction effect
0:02:10and it is noisy and non-stationary
0:02:14by non-stationary i mean that the signal properties change over time
0:02:20so what people usually do in BCI is they apply
0:02:24some spatial filtering method
0:02:27for example CSP
0:02:29in order to reduce the dimensionality
0:02:33so the goal is to combine electrodes and to project the signal
0:02:37to a subspace
0:02:38and increase the spatial resolution and hopefully the signal-to-noise ratio
0:02:43and simplify the learning problem
0:02:46but the problem of CSP is that
0:02:49it's prone to overfitting and it can be negatively affected by artifacts
0:02:55and
0:02:55it doesn't tackle the non-stationarity issue that means
0:02:58if you compute features
0:03:00by applying CSP
0:03:02then the features may still change
0:03:05quite a bit
0:03:06and usually your classifier assumes
0:03:09a stable distribution so in machine learning you usually assume
0:03:12that the
0:03:13training data and the test data come from the same distribution and if
0:03:18the distribution changes too much
0:03:21then it doesn't work so the classifier
0:03:23um
0:03:24will not work
0:03:25optimally
0:03:28so therefore we extend
0:03:30the CSP method
0:03:31um
0:03:32to extract more stationary features
0:03:37so non-stationarities are changes of the signal properties over time
0:03:42and they may have very different sources and time scales
0:03:45for example
0:03:46you may have changes in the electrode impedance
0:03:50when the electrodes get loose or the gel between the scalp and the electrode dries out
0:03:57you may also have muscular activity and eye movements
0:04:01they may lead to artifacts in the data
0:04:04and
0:04:05usually you also have
0:04:07changes in task involvement so when subjects get tired
0:04:11or differences between sessions
0:04:13so there is no feedback in the calibration session whereas
0:04:17in the feedback session you provide feedback
0:04:21so
0:04:22basically all those non stationarities
0:04:25are bad for you because they negatively
0:04:28affect your classifier
0:04:31and so there are two ways to deal with this
0:04:33one way is to extract better features to make your features more robust and more invariant to these changes
0:04:39this is the way we propose in our paper this is the
0:04:44target of our paper
0:04:45the other way is to do adaptation so you can adapt the classifier to the changes
0:04:55okay so the
0:04:56common spatial patterns method
0:04:58is a method which is very popular in brain computer interfacing and it maximizes
0:05:04the variance of one class while minimizing the variance of the other class
0:05:09so if you have two conditions for example the imagination of the movement of the
0:05:14right hand and the left hand
0:05:17then you see that these two guys down here maximize the variance of the
0:05:23projected signal in the
0:05:26right hand
0:05:27condition but minimize it in the
0:05:30left hand condition
0:05:31and the two guys up here do exactly the opposite so they maximize the variance in the left
0:05:36condition but minimize it in the right condition
0:05:40so
0:05:40why do we want to do so in BCI your
0:05:44goal is to discriminate between mental states
0:05:48and um
0:05:49you know that the variance of a band-pass filtered signal is equal to the band power
0:05:55in this frequency band
0:05:57and you can discriminate mental states
0:06:02by looking at the power in specific frequency bands
0:06:06so when you use CSP
0:06:08you can easily
0:06:09detect changes between the conditions because in the end you are looking
0:06:16at the band power in one specific frequency band
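The statement that the variance of a band-pass filtered signal equals its band power can be checked numerically. The following is a minimal sketch, not taken from the talk: it assumes an ideal FFT-based band-pass filter, a made-up sampling rate of 100 Hz, and an 8 to 30 Hz band chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000   # even length, so the Nyquist bin is separate from the band
fs = 100.0         # assumed sampling rate in Hz (illustrative only)
x = rng.standard_normal(n_samples)

# Ideal band-pass filter: keep only the FFT bins inside the band.
spectrum = np.fft.rfft(x)
freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
in_band = (freqs >= 8.0) & (freqs <= 30.0)
filtered = np.fft.irfft(np.where(in_band, spectrum, 0.0), n=n_samples)

# Variance of the filtered signal (its mean is zero because the DC bin is removed).
variance = np.var(filtered)

# Band power computed directly from the spectrum (Parseval's theorem).
# The factor 2 accounts for the matching negative-frequency bins.
band_power = 2.0 * np.sum(np.abs(spectrum[in_band]) ** 2) / n_samples ** 2

print(variance, band_power)
```

The two printed numbers agree up to floating-point error, which is why the variance of a spatially filtered, band-passed signal can serve as a band power feature.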
0:06:21and CSP can be solved as a
0:06:23generalized eigenvalue problem because
0:06:26you can formulate it as a rayleigh coefficient
0:06:29here so you want to maximize
0:06:31um
0:06:32this
0:06:33you want to maximize the projected variance of one condition
0:06:36while minimizing the common variance of both conditions
0:06:41equivalently you can also write here that you want to minimize the variance of the other condition
0:06:45of
0:06:46sigma minus
0:06:48so we can solve this very easily
0:06:51it might not work
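As a rough illustration of solving CSP as a generalized eigenvalue problem, here is a minimal sketch using `scipy.linalg.eigh`. The function name `csp_filters`, the parameter `n_per_class`, and the toy covariance matrices are assumptions made for this example and are not from the talk.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(cov_plus, cov_minus, n_per_class=1):
    """Return CSP filters as columns (hypothetical helper for illustration)."""
    # Solve cov_plus w = lambda (cov_plus + cov_minus) w.
    # eigh returns the eigenvalues in ascending order, all between 0 and 1 here.
    eigvals, eigvecs = eigh(cov_plus, cov_plus + cov_minus)
    # Smallest eigenvalues: high variance in the minus condition.
    # Largest eigenvalues: high variance in the plus condition.
    pick = np.concatenate([np.arange(n_per_class), np.arange(-n_per_class, 0)])
    return eigvecs[:, pick]

# Toy class covariances: channel 0 is strong in "plus", channel 3 in "minus".
cov_plus = np.diag([4.0, 1.0, 1.0, 1.0])
cov_minus = np.diag([1.0, 1.0, 1.0, 4.0])
W = csp_filters(cov_plus, cov_minus, n_per_class=1)

# Column 0 favors the minus condition, column 1 favors the plus condition.
print(np.round(W, 3))
```

In this toy case the first column loads on channel 3 and the second on channel 0, matching the description of filters that maximize variance in one condition while minimizing it in the other.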
0:06:53but our idea is
0:06:55um we
0:06:56do not only want a projection
0:06:58which has these properties but we also want the projection
0:07:03um
0:07:05to provide stationary features so we want to penalize non-stationary projection directions
0:07:11so we introduce a penalty term
0:07:13P of W
0:07:14to the denominator of the rayleigh coefficient
0:07:19here
0:07:19so we add this
0:07:20P of W
0:07:22here
0:07:22and then the final goal is to
0:07:26maximize the projected variance of one condition while minimizing the variance in the other condition and
0:07:33minimizing this
0:07:34penalty term P
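The slide formula is not part of the transcript. A plausible reconstruction of the objective described here, assuming class covariance matrices $\Sigma_+$ and $\Sigma_-$, a projection vector $w$, and a trade-off parameter $\lambda \ge 0$ (the symbol names are assumptions), is:

```latex
R(w) \;=\; \frac{w^\top \Sigma_+ \, w}
                {w^\top \left(\Sigma_+ + \Sigma_-\right) w \;+\; \lambda \, P(w)}
```

Setting $\lambda = 0$ recovers the plain CSP rayleigh coefficient, and larger values of $\lambda$ put more weight on stationarity.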
0:07:39so
0:07:39the penalty term measures non-stationarities
0:07:43so we want to measure the deviation
0:07:46from the average case so
0:07:49sigma C is the average
0:07:51covariance matrix of all trials from condition C
0:07:55so from one condition
0:07:56and sigma K C is the
0:08:01covariance matrix of the k-th chunk and a chunk
0:08:05may consist of one trial or more than one trial from the same class
0:08:09so
0:08:10you want to
0:08:11minimize the
0:08:13deviation of each trial
0:08:18from the average case
0:08:20so this is
0:08:21our penalty term because you want to be stationary
0:08:24for each class separately so you want to do it for each class
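The penalty described above, the summed absolute deviation of each chunk's projected variance from the class average, can be sketched as follows for a single class. The function name `nonstationarity_penalty` and the toy matrices are assumptions for illustration; the full penalty would add up this quantity over both classes, and any normalization used in the paper is omitted here.

```python
import numpy as np

def nonstationarity_penalty(w, chunk_covs, avg_cov):
    """Sum over chunks of |w^T (Sigma_k - Sigma_avg) w| for one class."""
    return sum(abs(w @ (cov_k - avg_cov) @ w) for cov_k in chunk_covs)

# Toy example: the variance of channel 0 changes between chunks, channel 1 is stable.
chunk_covs = [np.diag([1.0, 1.0]), np.diag([3.0, 1.0])]
avg_cov = np.mean(chunk_covs, axis=0)   # diag(2, 1)

penalty_unstable = nonstationarity_penalty(np.array([1.0, 0.0]), chunk_covs, avg_cov)
penalty_stable = nonstationarity_penalty(np.array([0.0, 1.0]), chunk_covs, avg_cov)
print(penalty_unstable, penalty_stable)
```

The direction along channel 0 is penalized (deviations of 1 in each chunk), while the direction along channel 1 gets no penalty at all.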
0:08:29hmmm
0:08:30yeah so the problem is if you
0:08:32add this quantity to the denominator
0:08:35then
0:08:36uh
0:08:37you won't get this form anymore because you cannot take the W outside the sum
0:08:42because of this
0:08:44absolute value function here
0:08:46so you won't be able to solve it as a generalized eigenvalue problem anymore
0:08:53so what
0:08:54what do we do about this we add a quantity which is related
0:08:58so we take this W vector outside
0:09:02the sum
0:09:03but introduce an operator F
0:09:05to make this difference matrix
0:09:07positive definite
0:09:09because we are only interested in
0:09:12the
0:09:16variation and we treat both signs in the same way so we do not care if
0:09:20for example here we do not care if this guy is bigger
0:09:23or this guy is bigger we are only interested in the difference after projection
0:09:28but
0:09:28here
0:09:29we do kind of the same but
0:09:32um
0:09:34we do this before projecting and not after projecting because we take this W
0:09:40outside the sum
0:09:41and we can also show that
0:09:43this quantity gives an upper bound
0:09:46on the other quantity which we want
0:09:48to minimize
0:09:50so it makes sense to use it
0:09:53so we put this guy into the rayleigh coefficient of our objective function
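Putting the pieces together, a minimal sketch of the resulting solver could look like this. It is an illustration under assumptions, not the authors' implementation: the names `flip_negative_eigenvalues`, `penalty_matrix`, `scsp_filters` and `lam` are made up, normalization of the penalty is omitted, and only the filters for the plus condition are returned (the other class would be obtained by swapping the roles of the two covariances).

```python
import numpy as np
from scipy.linalg import eigh

def flip_negative_eigenvalues(sym):
    """Operator F: replace each eigenvalue of a symmetric matrix by its absolute value."""
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.abs(eigvals)) @ eigvecs.T

def penalty_matrix(chunk_covs, avg_cov):
    """Sum of F(Sigma_k - Sigma_avg) over all chunks of one class."""
    return sum(flip_negative_eigenvalues(c - avg_cov) for c in chunk_covs)

def scsp_filters(cov_plus, cov_minus, chunks_plus, chunks_minus, lam, n_filters=1):
    """Filters maximizing plus-class variance with a stationarity penalty."""
    delta = penalty_matrix(chunks_plus, cov_plus) + penalty_matrix(chunks_minus, cov_minus)
    # With W taken outside the sum, the penalty is w^T delta w, so the problem
    # stays a generalized eigenvalue problem.
    eigvals, eigvecs = eigh(cov_plus, cov_plus + cov_minus + lam * delta)
    return eigvecs[:, ::-1][:, :n_filters]   # largest eigenvalue first

# Toy data: channel 0 is the most discriminative but unstable across chunks,
# channel 1 is less discriminative but perfectly stable.
cov_plus = np.diag([4.0, 2.0, 1.0])
cov_minus = np.eye(3)
chunks_plus = [np.diag([1.0, 2.0, 1.0]), np.diag([7.0, 2.0, 1.0])]
chunks_minus = [cov_minus, cov_minus]

w_csp = scsp_filters(cov_plus, cov_minus, chunks_plus, chunks_minus, lam=0.0)[:, 0]
w_scsp = scsp_filters(cov_plus, cov_minus, chunks_plus, chunks_minus, lam=1.0)[:, 0]
print(np.round(w_csp, 3), np.round(w_scsp, 3))
```

Without the penalty the best filter picks the unstable channel 0; with the penalty switched on it moves to the stable channel 1.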
0:09:58so our data set
0:10:00we compare
0:10:01CSP and sCSP on a data set of eighty subjects
0:10:05performing motor imagery
0:10:08they were new to BCI so they did that for the first time
0:10:12we selected for each user the best
0:10:14binary task combination and the parameters on the calibration data
0:10:20and we
0:10:21did the testing
0:10:24on a test session with feedback
0:10:26with three hundred trials
0:10:28we recorded EEG from sixty eight selected
0:10:32electrodes
0:10:33and used log-variance features and an LDA classifier and error rates to measure performance
0:10:40we used a fixed number of filters per class
0:10:45and selected the trade-off parameter
0:10:48uh
0:10:49with cross-validation and we also tried different chunk sizes
0:10:53and selected the best one also by cross-validation
0:10:57on the calibration data
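The feature extraction and classification steps mentioned here (log-variance features, an LDA classifier, error rate) can be sketched as below. Everything in this block is an assumption for illustration: the helper names, the synthetic two-channel data, the identity matrix used in place of learned spatial filters, and the simple train and test split.

```python
import numpy as np

def log_variance_features(trials, filters):
    """trials: (n_trials, n_channels, n_samples), filters: (n_channels, n_filters)."""
    projected = np.einsum("cf,tcs->tfs", filters, trials)
    return np.log(np.var(projected, axis=2))

def fit_lda(feat_a, feat_b):
    """Two-class LDA with a pooled covariance; score > 0 means class a."""
    mean_a, mean_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    pooled = 0.5 * (np.cov(feat_a, rowvar=False) + np.cov(feat_b, rowvar=False))
    weights = np.linalg.solve(np.atleast_2d(pooled), mean_a - mean_b)
    bias = -weights @ (mean_a + mean_b) / 2.0
    return weights, bias

# Synthetic trials: class a has more power on channel 0, class b on channel 1.
rng = np.random.default_rng(0)
scale_a = np.array([2.0, 1.0])[None, :, None]
scale_b = np.array([1.0, 2.0])[None, :, None]
trials_a = rng.standard_normal((40, 2, 200)) * scale_a
trials_b = rng.standard_normal((40, 2, 200)) * scale_b

filters = np.eye(2)   # stand-in for learned spatial filters
feat_a = log_variance_features(trials_a, filters)
feat_b = log_variance_features(trials_b, filters)

# Train on the first half of each class, test on the second half.
weights, bias = fit_lda(feat_a[:20], feat_b[:20])
errors_a = (feat_a[20:] @ weights + bias) <= 0
errors_b = (feat_b[20:] @ weights + bias) > 0
error_rate = np.mean(np.concatenate([errors_a, errors_b]))
print(error_rate)
```

On real EEG the filters would come from CSP or sCSP computed on the calibration data, and the error rate would be measured on the separate feedback session.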
0:11:00so here are some performance results here you see the scatter plots when using three CSP directions per
0:11:07class
0:11:08or using one CSP direction per class
0:11:10on the x axis you see
0:11:11the error rate of
0:11:13CSP and on the y axis the error rate of
0:11:17our approach
0:11:18and you can see that especially for subjects
0:11:22which fail when using CSP like these guys they become really better
0:11:27with our method and
0:11:29the same can be seen here
0:11:32and we computed a
0:11:34test statistic and the changes are significant our method works better especially for the subjects
0:11:41which have
0:11:42an error rate larger than thirty percent
0:11:45so we can improve in those cases which fail when using
0:11:49CSP which is somehow clear because if
0:11:52CSP works
0:11:54well
0:11:54then your
0:11:55patterns are probably really good and the signal-to-noise ratio
0:11:59is good so you do not have a lot of room to improve it
0:12:04but um
0:12:06so the question is why does
0:12:07sCSP perform better
0:12:10basically we know that CSP may fail to extract the correct patterns when affected by artifacts
0:12:17and
0:12:18as you saw
0:12:19stationary CSP
0:12:21is more robust to artifacts because it treats artifacts as non-stationarities
0:12:27and it reduces the non-stationarity in the features
0:12:31and CSP is also known to overfit
0:12:33and sCSP
0:12:35you know like this fit with lots not
0:12:39and reduces changes in the features
0:12:43so for example here you see
0:12:45the result of a subject performing
0:12:48left and right hand motor imagery
0:12:50you see that both methods are able to extract the correct left hand pattern
0:12:56so there is activity on the right hemisphere this means that
0:13:00it's the pattern for the left hand motor imagery
0:13:04but in the
0:13:05case of the right hand the CSP method fails
0:13:08because probably in this electrode there is an artifact or
0:13:12this electrode is noisy or the signal
0:13:16is
0:13:17kind of non-stationary
0:13:18but
0:13:19sCSP is
0:13:21a bit affected by this
0:13:22artifact at this electrode but it
0:13:25is able to
0:13:26extract the
0:13:27more or less correct pattern of the
0:13:30right hand
0:13:32and you also see here when you look at the distribution of
0:13:36training features and test features
0:13:39training features
0:13:40uh
0:13:41are the triangles and test features are the circles
0:13:44so you see that the distribution in the training phase of
0:13:47CSP
0:13:49looks like this
0:13:50like here
0:13:52but it changes a lot when you go to the test distribution when you look at
0:13:57the test features
0:13:58so
0:13:59the distribution is completely different in the test
0:14:02case
0:14:04but um
0:14:05when we use sCSP we extract more stable features more stationary features
0:14:10so the distribution in the training and
0:14:14test phase
0:14:15is
0:14:16more or less the same
0:14:17so you can classify in this case a lot better
0:14:21so here is the decision boundary and you see that
0:14:25in the other case you really fail
0:14:27to classify
0:14:28correctly here
0:14:32okay so in summary
0:14:34we
0:14:34extended the popular CSP method
0:14:38to extract stationary features
0:14:41sCSP significantly increases the classification accuracy especially for subjects
0:14:47who perform badly with
0:14:49CSP
0:14:50and unlike other methods like invariant CSP
0:14:53we are completely data-driven
0:14:56we do not require additional recordings or models of the expected changes
0:15:02and we also showed but it was not presented in this paper that the combination of stationary features and
0:15:09unsupervised adaptation can further improve classification performance
0:15:15so i want to thank you for your attention
0:15:18we have to and
0:15:37can you explain in more detail the
0:15:41function F
0:15:43in in our town
0:15:47you mean
0:15:49yeah so the function F
0:15:51so this function F is kind of a heuristic because it makes
0:15:55your matrix this difference matrix
0:15:58positive
0:15:59definite
0:16:00so it means it flips
0:16:01the sign of all the negative eigenvalues
0:16:04and
0:16:06why do you want to do so because
0:16:07um
0:16:09you want to sum over K
0:16:12positive values
0:16:15so for example here you sum over
0:16:17K positive deviations
0:16:22and you kind of want to do the same here
0:16:25so you make the difference matrix positive definite
0:16:28and then we can show that this is an upper bound
0:16:30on the other quantity
0:16:32so here the operation is to flip the sign of all the negative eigen
0:16:39values is that right
0:16:42uh
0:16:43so we compute this difference matrix then we do an eigendecomposition and then flip
0:16:48the sign of all negative eigenvalues
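The answer above (compute the difference matrix, do an eigendecomposition, flip the sign of the negative eigenvalues) and the upper bound it gives can be checked on a small example. This is a sketch with a made-up 2 by 2 difference matrix; the function name is an assumption.

```python
import numpy as np

def flip_negative_eigenvalues(sym):
    """Eigendecompose a symmetric matrix and take the absolute value of each eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.abs(eigvals)) @ eigvecs.T

# A symmetric, indefinite difference matrix with eigenvalues -3 and 2.
diff = np.array([[1.0, 2.0],
                 [2.0, -2.0]])
flipped = flip_negative_eigenvalues(diff)
flipped_eigvals = np.linalg.eigvalsh(flipped)   # eigenvalues 2 and 3

# Upper bound: |w^T D w| <= w^T F(D) w for any projection w.
rng = np.random.default_rng(0)
directions = rng.standard_normal((100, 2))
exact = np.abs(np.einsum("ni,ij,nj->n", directions, diff, directions))
bound = np.einsum("ni,ij,nj->n", directions, flipped, directions)
print(flipped_eigvals, bool(np.all(exact <= bound + 1e-12)))
```

The bound holds because the projected difference is a sum of eigenvalue times squared coefficient, and replacing each eigenvalue by its absolute value can only increase the absolute value of that sum.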
0:16:52okay so you keep the positive ones unchanged
0:16:55yeah
0:16:55okay
0:16:56and actually the
0:16:58eigenvectors the directions are kind of
0:17:02flipped
0:17:02when you have an
0:17:04eigenvector with a negative
0:17:06eigenvalue you flip it
0:17:08simply but you do not
0:17:10change a lot you only flip it
0:17:11because you are only interested in positive contributions
0:17:14yeah yeah
0:17:15okay
0:17:15thing
0:17:20oh uh while you're
0:17:23you know i need a lead to the chunks
0:17:25you know uh really all you have some
0:17:28uh
0:17:29or can you use clustering to find some similarities as well no you can simply
0:17:35use
0:17:36a chunk size of one that means that you use
0:17:38each trial
0:17:40so each trial is a chunk
0:17:42you can do this trial-wise
0:17:46or you can put
0:17:47the
0:17:48trials from the same class which are subsequent
0:17:51together in one chunk
0:17:52so we do not apply any clustering we only put some together
0:17:57or we do it for each trial separately
0:18:06my question about your
0:18:09yeah money consuming and that at different me
0:18:14no this was only one test
0:18:17session
0:18:18okay
0:18:23a question about the chunk sizes
0:18:26so if you
0:18:27use a chunk size which is larger than one would you
0:18:30not
0:18:31average out part of the non-stationarity
0:18:35yeah so this was the idea of using chunk sizes because
0:18:39if you use a chunk size of one then you detect
0:18:43the changes on a small time scale
0:18:46if you take bigger
0:18:47chunk sizes then
0:18:49your time scale
0:18:50will also be bigger because we average out the changes which only occur for example in one trial
0:18:56so we tried different
0:18:57chunk sizes and selected the best one using cross-validation
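The effect of the chunk size discussed in this answer can be illustrated with a small sketch. The helper `chunk_covariances`, its handling of leftover trials, and the toy covariances are assumptions for this example only.

```python
import numpy as np

def chunk_covariances(trial_covs, chunk_size):
    """Average the covariances of consecutive trials of one class into chunks."""
    trial_covs = np.asarray(trial_covs)
    n_chunks = len(trial_covs) // chunk_size   # leftover trials at the end are dropped
    used = trial_covs[: n_chunks * chunk_size]
    return used.reshape(n_chunks, chunk_size, *trial_covs.shape[1:]).mean(axis=1)

# Toy trial covariances whose scale alternates from trial to trial.
trial_covs = np.array([s * np.eye(2) for s in [1.0, 3.0, 1.0, 3.0]])

chunks_of_one = chunk_covariances(trial_covs, chunk_size=1)   # one chunk per trial
chunks_of_two = chunk_covariances(trial_covs, chunk_size=2)   # pairs of trials averaged

# Largest deviation of any chunk from the overall average covariance.
average = trial_covs.mean(axis=0)
deviation_one = np.abs(chunks_of_one - average).max()
deviation_two = np.abs(chunks_of_two - average).max()
print(deviation_one, deviation_two)
```

With a chunk size of one the trial-to-trial change is visible, while with a chunk size of two it is averaged out completely, so larger chunks only capture changes on a longer time scale.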
0:19:06oh