0:00:13 | hi

0:00:14 | uh i'm wojciech, from the machine learning group at the technical university in berlin

0:00:19 | and i will present

0:00:20 | to you our recent work about stationary common spatial patterns

0:00:24 | this is joint work with carmen vidaurre and motoaki kawanabe

0:00:31 | so here is an overview |

0:00:32 | i would start with an introduction |

0:00:35 | and tell you something about the common spatial patterns method

0:00:39 | and then about the stationary common spatial patterns method

0:00:43 | then i will show some results

0:00:46 | and conclude with a summary

0:00:51 | so our target application is brain computer interfacing |

0:00:55 | and the brain computer interface system |

0:00:57 | aims to translate the intent of a subject |

0:01:00 | for example measured

0:01:01 | from brain activity |

0:01:03 | here in this case by eeg

0:01:06 | into a command for a computer application

0:01:09 | so in this case you are measuring eeg and

0:01:12 | you want to control a game, in this case a pinball game

0:01:15 | but you can also think of other applications like um |

0:01:19 | controlling a wheelchair or a neuroprosthesis

0:01:25 | so a very popular paradigm |

0:01:27 | uh for bci is motor imagery

0:01:30 | in motor imagery

0:01:32 | the subject |

0:01:34 | imagines some movements with the right hand or the left hand or the feet

0:01:40 | and these different movements lead to different

0:01:42 | different patterns in the eeg

0:01:45 | and if your system is able to extract and classify these different patterns

0:01:50 | then you can convert them to a computer command and control an application like this

0:01:58 | so there are still some challenges |

0:02:00 | so for example the eeg signal is usually high dimensional uh

0:02:04 | it has a low spatial resolution

0:02:07 | that means you have a volume conduction effect |

0:02:10 | and it is noisy and non-stationary

0:02:14 | by non-stationary i mean that the signal properties change over time

0:02:20 | so what people usually do in bci is they apply some

0:02:24 | uh some spatial filtering method |

0:02:27 | for example the csp |

0:02:29 | in order to reduce the dimensionality |

0:02:33 | so the goal is to combine electrodes and to project the signal to a

0:02:37 | to a subspace |

0:02:38 | and increase the spatial resolution and hopefully the signal-to-noise ratio |

0:02:43 | and simplify the learning problem

0:02:47 | but the problem of csp is that |

0:02:49 | it's |

0:02:49 | it's prone to overfitting and it can be negatively affected by artifacts

0:02:55 | and |

0:02:55 | it doesn't tackle the non-stationarity issue, that means

0:02:58 | if you compute features

0:03:00 | after applying csp

0:03:02 | then the features may still change |

0:03:05 | quite a bit and |

0:03:06 | and usually your classifier assumes

0:03:09 | stable distributions, so in machine learning you usually assume

0:03:12 | that the

0:03:13 | training data and the test data come from the same distribution and if your data, if

0:03:18 | the distribution changes too much

0:03:21 | then it doesn't work, so the classifier

0:03:23 | um |

0:03:24 | will not work

0:03:25 | optimally

0:03:28 | so therefore we extend |

0:03:30 | the csp method

0:03:31 | um |

0:03:32 | and extract more stationary features

0:03:37 | all right, non-stationarities are changes of the signal properties over time

0:03:42 | and they may have very different sources and time scales

0:03:45 | for example |

0:03:46 | you may have changes in the electrode impedance, as

0:03:50 | when the electrodes get loose or the gel between the scalp and the electrode dries out

0:03:57 | you may also have muscular activity and eye movements

0:04:01 | that lead to artifacts in the data

0:04:04 | and |

0:04:05 | you usually also have

0:04:07 | changes in task involvement, so when subjects get tired

0:04:11 | or differences between sessions

0:04:13 | so you can have no-feedback conditions in the calibration session whereas

0:04:17 | in the feedback session you provide feedback

0:04:21 | so |

0:04:22 | basically all those non-stationarities

0:04:25 | are bad for you because uh they negatively

0:04:28 | um affect your classifier

0:04:31 | and so there are two ways to deal with this you can |

0:04:33 | one way is to extract better features, to make your features more robust and more invariant to these changes

0:04:39 | this is the way we um propose in our paper, this is the

0:04:44 | target of our paper

0:04:45 | the other way is to do adaptation, so you can adapt the classifier to deal with the change

0:04:55 | okay so a |

0:04:56 | the common spatial patterns method

0:04:58 | is a method that is very popular in brain computer interfacing and it maximises

0:05:04 | the variance from one class while minimizing the variance of the other class |

0:05:09 | so if you have two conditions, imagine you have the imagination of the movement of the

0:05:14 | right hand and the left hand |

0:05:17 | and uh you see that these two guys uh down here maximise the variance of the signal

0:05:23 | of the projected signal, they maximize it in the

0:05:26 | uh right hand

0:05:27 | uh condition but minimize it in the

0:05:30 | left hand condition |

0:05:31 | and the two guys up there do exactly the opposite, so they maximise the variance in the left

0:05:36 | condition but minimize it in the right condition

0:05:40 | so |

0:05:40 | why do we want to do this, like in bci your

0:05:44 | goal is to discriminate between mental states |

0:05:48 | and um |

0:05:49 | you know that the variance of a band-pass filtered signal is equal to the band power

0:05:55 | in this frequency band

0:05:57 | so then you can discriminate mental states

0:06:02 | and by looking at the power in the specific frequency bands |

0:06:06 | so that is to say

0:06:08 | um you can easily |

0:06:09 | um detect changes uh between the conditions because you're looking at the band power, essentially you are looking

0:06:16 | at the band power in one specific frequency band
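the point just made, that the variance of a band-pass filtered signal equals its power in that band, can be checked numerically. this is a minimal sketch assuming numpy; the sampling rate and the 8-12 hz band are illustrative, not from the talk:

```python
import numpy as np

# Band-pass filter a random signal by zeroing FFT bins outside the band,
# then compare the variance of the filtered signal with the spectral power
# in that band (Parseval's theorem).
rng = np.random.default_rng(0)
fs, n = 250.0, 1000                       # sampling rate in Hz, number of samples
x = rng.standard_normal(n)

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
X = np.fft.rfft(x)
band = (freqs >= 8.0) & (freqs <= 12.0)   # e.g. the mu band used in motor imagery
X_band = np.where(band, X, 0.0)
x_band = np.fft.irfft(X_band, n)

variance = x_band.var()
# the band excludes the DC and Nyquist bins, so each kept rfft bin counts twice
band_power = 2.0 * np.sum(np.abs(X_band) ** 2) / n ** 2
print(np.isclose(variance, band_power))   # True
```

so "variance of the spatially filtered, band-pass filtered signal" and "band power" are interchangeable, which is why csp features are variances.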

0:06:21 | and csp can be solved as a

0:06:23 | generalized eigenvalue problem because |

0:06:26 | like you can formulate a rayleigh coefficient

0:06:29 | here so you want to maximise |

0:06:31 | um |

0:06:32 | this |

0:06:33 | you want to maximise the projected variance of one condition

0:06:36 | while minimizing the variance of the combined conditions

0:06:41 | equivalently you can also write it here, you want to minimize the variance of the other condition

0:06:45 | of |

0:06:46 | sigma minus |

0:06:48 | so we can solve this very easily
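the generalized eigenvalue formulation just described can be sketched in a few lines. this is a minimal sketch assuming numpy/scipy; the function name and the toy covariances are illustrative, not the authors' code:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(sigma_plus, sigma_minus, n_per_class=3):
    """Solve the CSP generalized eigenvalue problem
    sigma_plus w = lambda (sigma_plus + sigma_minus) w.

    The eigenvalue lambda is the Rayleigh quotient from the talk: the
    fraction of projected variance falling on the "plus" condition.
    """
    eigvals, eigvecs = eigh(sigma_plus, sigma_plus + sigma_minus)
    # eigh returns eigenvalues in ascending order: the first columns give
    # filters favoring the "minus" class, the last favor the "plus" class
    return np.hstack([eigvecs[:, :n_per_class], eigvecs[:, -n_per_class:]])

# toy usage with random positive definite class covariances
rng = np.random.default_rng(0)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
sigma_plus = A @ A.T + np.eye(8)
sigma_minus = B @ B.T + np.eye(8)
W = csp_filters(sigma_plus, sigma_minus)
print(W.shape)  # (8, 6): six spatial filters, three per class
```

keeping a few filters from each end of the spectrum corresponds to the "two guys down here, two guys up there" picture from the slide.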

0:06:51 | it might not work |

0:06:53 | but our idea is |

0:06:55 | um we |

0:06:56 | do not only want a projection

0:06:58 | which uh which has these properties, but we also want that the projection

0:07:03 | um |

0:07:04 | it

0:07:05 | provides stationary features, so we want to penalise non-stationary projection directions

0:07:11 | so we introduce a penalty

0:07:13 | P of W

0:07:14 | to the denominator of the rayleigh coefficient

0:07:19 | here

0:07:19 | so we add this |

0:07:20 | P of W |

0:07:22 | here |

0:07:22 | and then the final goal is to

0:07:26 | uh to maximise the projected variance in one condition while minimizing the variance in the other condition and

0:07:33 | minimizing this |

0:07:34 | P, the penalty term
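written out, the penalized objective described above takes roughly the following form (my reconstruction from the talk, using its notation: alpha is the cross-validated trade-off parameter, Sigma_c the average covariance of class c, and Sigma_k^(c) the covariance of chunk k of class c):

```latex
w^{*} \;=\; \operatorname*{arg\,max}_{w}\;
\frac{w^{\top} \Sigma_{+}\, w}
     {w^{\top}\bigl(\Sigma_{+} + \Sigma_{-}\bigr)\, w \;+\; \alpha\, P(w)},
\qquad
P(w) \;=\; \sum_{c \in \{+,-\}} \sum_{k=1}^{K_c}
\bigl|\, w^{\top}\bigl(\Sigma_{k}^{(c)} - \Sigma_{c}\bigr)\, w \,\bigr|
```

with alpha = 0 this reduces to the plain csp rayleigh coefficient from the previous slide.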

0:07:39 | so |

0:07:39 | the penalty term measures somehow the non-stationarities

0:07:43 | so we want to measure the deviation

0:07:46 | from the average case, so this is

0:07:49 | the sigma C is the average covariance

0:07:51 | matrix of all trials from condition C

0:07:55 | um from one condition

0:07:56 | and uh the sigma K C is the

0:08:00 | uh the

0:08:01 | covariance matrix from the k-th chunk, and a chunk

0:08:05 | may consist of one trial or more than one trial from the same class

0:08:09 | so |

0:08:10 | you want to kind of |

0:08:11 | to minimize the |

0:08:13 | the deviation of each trial

0:08:18 | from the average case

0:08:20 | so this is like |

0:08:21 | a penalty term, because you want to be stationary

0:08:24 | for each class separately, so you want to do it for each class

0:08:29 | hmmm |

0:08:30 | yeah so the problem is if you |

0:08:32 | add this quantity to the denominator

0:08:35 | then |

0:08:36 | uh |

0:08:37 | you don't get this form anymore because you cannot take the W outside the sum

0:08:42 | because of this uh |

0:08:44 | absolute value function here |

0:08:46 | so you are not able to solve it as a generalized eigenvalue problem anymore

0:08:53 | so what |

0:08:54 | what do we do about this we add a quantity which is related |

0:08:58 | so we take this W vector outside |

0:09:02 | the sum |

0:09:03 | but introduce an operator F |

0:09:05 | to make this difference matrix |

0:09:07 | to be positive definite

0:09:09 | because we are only interested in |

0:09:12 | like in the

0:09:13 | we want

0:09:14 | to treat the variations

0:09:16 | of both sides and treat them in a similar way, so we do not care if

0:09:20 | like for example here, we do not care if this guy is bigger

0:09:23 | or this guy is bigger, we are only interested in the difference after projection

0:09:28 | but |

0:09:28 | here |

0:09:29 | uh we do kind of the same but |

0:09:32 | um |

0:09:34 | we do this before projecting, so we do not do this after projecting, because we take this W

0:09:40 | outside the sum |

0:09:41 | and we can also show that |

0:09:43 | this quantity gives an upper bound

0:09:46 | on the other quantity which we want

0:09:48 | to minimize |

0:09:50 | so it

0:09:50 | makes sense to use it

0:09:53 | so we put this guy in the rayleigh coefficient of our objective function
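the construction just described, flipping the negative eigenvalues of each per-chunk difference matrix so that the deviations add up as a positive definite penalty, and then solving the penalized problem as a generalized eigenvalue problem, might look like this. a sketch assuming numpy/scipy; names and the averaging convention are illustrative, not the authors' code:

```python
import numpy as np
from scipy.linalg import eigh

def flip_negative_eigenvalues(M):
    """The operator F from the talk: eigendecompose a symmetric matrix and
    flip the sign of its negative eigenvalues, keeping the eigenvectors."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.abs(vals)) @ vecs.T

def stationary_csp(sigma_plus, sigma_minus, chunk_covs, alpha=1.0):
    """Sketch of the stationary CSP solve: a positive (semi-)definite penalty
    matrix built from per-chunk deviations is added to the denominator of the
    Rayleigh quotient. chunk_covs is a list of (class covariance, list of
    per-chunk covariances) pairs; the normalization here is illustrative."""
    delta = np.zeros_like(sigma_plus)
    n_chunks = 0
    for sigma_c, chunks in chunk_covs:
        for sigma_k in chunks:
            delta += flip_negative_eigenvalues(sigma_k - sigma_c)
            n_chunks += 1
    delta /= n_chunks
    # generalized eigenvalue problem with the penalty in the denominator
    return eigh(sigma_plus, sigma_plus + sigma_minus + alpha * delta)

# the flip in isolation: eigenvalues 3 and -1 become 3 and 1
M = np.array([[1.0, 2.0], [2.0, 1.0]])
print(np.linalg.eigvalsh(flip_negative_eigenvalues(M)))  # [1. 3.]
```

with alpha = 0 this is exactly the plain csp solve from before; the flip is what restores the quadratic form so the W can be pulled outside the sum.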

0:09:58 | so our data set is

0:10:00 | we compared

0:10:01 | csp and stationary csp on a data set of eighty subjects

0:10:05 | performing motor imagery

0:10:08 | they were new to bci, so they did this for the first time

0:10:12 | we selected for each user the best

0:10:14 | binary task combination and the best parameters on the calibration data

0:10:20 | and we

0:10:21 | then did the testing

0:10:24 | on a test session with feedback

0:10:26 | with three hundred trials |

0:10:28 | we recorded the eeg from sixty-eight selected

0:10:32 | electrodes

0:10:33 | and used log-variance features and an lda classifier uh and error rates to measure the performance

0:10:40 | we used a fixed number of filters per class

0:10:45 | and selected the trade-off parameter

0:10:48 | uh |

0:10:49 | with cross validation and we also tried different chunk sizes

0:10:53 | and selected the best one also by cross validation

0:10:57 | on the calibration data
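the feature extraction mentioned above, the log of the variance of each spatially filtered channel, can be sketched as follows. this assumes numpy; shapes and names are illustrative:

```python
import numpy as np

def log_variance_features(trials, W):
    """Project band-pass filtered EEG trials through the spatial filters W
    and take the log of the variance of each projected channel, i.e. the
    log band power features described in the talk.

    trials: array of shape (n_trials, n_channels, n_samples)
    W:      spatial filter matrix of shape (n_channels, n_filters)
    """
    projected = np.einsum('cf,tcs->tfs', W, trials)   # filter each trial
    return np.log(projected.var(axis=2))              # (n_trials, n_filters)

# toy usage with random data standing in for real EEG
rng = np.random.default_rng(1)
trials = rng.standard_normal((10, 8, 200))
W = rng.standard_normal((8, 6))
features = log_variance_features(trials, W)
print(features.shape)  # (10, 6)
```

these feature vectors are what the linear classifier is then trained on.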

0:11:00 | so here are some performance results, here you see the scatter plots when using three csp directions per

0:11:07 | class

0:11:08 | or using one csp direction per class

0:11:10 | on the x axis you see

0:11:11 | the error rate of |

0:11:13 | csp and on the y axis the error rate of

0:11:17 | our approach |

0:11:18 | and you can see that especially for subjects

0:11:22 | which fail when using csp, like these guys, they get much better

0:11:27 | with our method and

0:11:29 | and the same can be seen here

0:11:32 | and we computed a

0:11:34 | test statistic and the changes are significant, our method works better especially for the subjects

0:11:41 | which have

0:11:42 | an error rate uh larger than thirty percent

0:11:45 | so we can improve in those cases which fail when using

0:11:49 | csp, which is somehow clear because if

0:11:52 | csp works

0:11:54 | well |

0:11:54 | then your

0:11:55 | patterns are probably really good and the signal to noise ratio

0:11:59 | is good, so you do not have a lot of room to improve

0:12:04 | but um |

0:12:06 | so the question is why does

0:12:07 | stationary csp perform better

0:12:10 | basically we know that csp may fail to extract the correct patterns when affected by artifacts

0:12:17 | and |

0:12:18 | as you saw |

0:12:19 | stationary csp

0:12:21 | is more robust to artifacts because it treats artifacts as

0:12:25 | non-stationarities

0:12:27 | and it reduces the non-stationarities in the features

0:12:31 | and csp is also known to overfit

0:12:33 | and stationary csp

0:12:35 | you know, overfits less

0:12:39 | and reduces the changes in the features

0:12:43 | so for example here you see um

0:12:45 | the result of a subject performing

0:12:48 | left and right hand motor imagery

0:12:50 | you see that both methods uh are able to extract the correct left hand pattern

0:12:56 | so there is activity on the right hemisphere, this means that

0:13:00 | um this is the pattern for the left hand motor imagery

0:13:04 | but in the |

0:13:05 | case of the right hand, the csp method fails

0:13:08 | because probably in this electrode there is an artifact or the um

0:13:12 | there is noise in the signal, or the signal

0:13:16 | uh is

0:13:17 | kind of non-stationary

0:13:18 | but

0:13:19 | stationary csp is

0:13:21 | maybe a bit affected by this

0:13:22 | artifact at this electrode, but it

0:13:25 | is able to

0:13:26 | extract the

0:13:27 | more or less correct pattern of the

0:13:30 | right hand |

0:13:32 | and you also see here when you look at the distribution between |

0:13:36 | uh training features and test features

0:13:39 | the training features are

0:13:40 | uh

0:13:41 | the triangles and the test features are the circles

0:13:44 | so you see that the distribution in the training phase of

0:13:47 | csp

0:13:49 | looks

0:13:50 | usually like, like here

0:13:52 | but it changes a lot when you go to the test distribution, when you look at

0:13:57 | the test features |

0:13:58 | so that |

0:13:59 | the distribution is completely different in the test

0:14:02 | case

0:14:04 | but um |

0:14:05 | when we use stationary csp we extract more stable features, more stationary features

0:14:10 | so the distribution between training

0:14:14 | and test phase |

0:14:15 | is um |

0:14:16 | more or less the same

0:14:17 | so you can classify, in this case the classifier works a lot better

0:14:21 | so here is the decision boundary and you see that

0:14:25 | in the left case you really fail

0:14:27 | to classify |

0:14:28 | correctly here

0:14:32 | okay so in summary |

0:14:34 | we

0:14:34 | extended the popular csp method

0:14:38 | to extract stationary features |

0:14:41 | stationary csp significantly increases the classification accuracy, especially for subjects

0:14:47 | who perform badly with

0:14:49 | csp |

0:14:50 | and unlike other methods like invariant csp |

0:14:53 | we are completely data-driven |

0:14:56 | we do not require additional recordings or models of the expected changes |

0:15:02 | and we also showed, although it was not presented in this talk, that the combination of stationary features and

0:15:09 | unsupervised adaptation can further improve classification performance |

0:15:15 | so i want to thank you for your attention |

0:15:18 | and we have time for questions

0:15:37 | um can you explain in more detail um uh

0:15:41 | that function, yeah

0:15:43 | in the denominator

0:15:47 | you mean um |

0:15:49 | yeah, so the function, this one, yeah

0:15:51 | so this function F, it is kind of a heuristic because it makes

0:15:55 | your matrix, this difference matrix, it makes it

0:15:58 | positive |

0:15:59 | definite |

0:16:00 | so it means it flips

0:16:01 | the sign of all the negative eigenvalues

0:16:04 | and you may ask

0:16:06 | why you want to do this, because

0:16:07 | um |

0:16:09 | we want to sum up, what you want is a sum

0:16:12 | of positive values, so you want to

0:16:15 | for example here you sum up

0:16:17 | uh, you sum up positive deviations

0:16:22 | and you kind of want to do the same here |

0:16:25 | so you make this difference matrix positive definite

0:16:28 | and then we can show that this is an upper bound |

0:16:30 | on the other quantity

0:16:32 | so here you do, yeah, the operation that um flips the sign of all the negative eigen

0:16:39 | values, and you do an eigendecomposition, right

0:16:42 | uh |

0:16:43 | so what we do is, we compute this difference matrix, then we do an eigendecomposition, uh_huh, and then flip uh

0:16:48 | the sign of all negative eigenvalues

0:16:52 | okay, so you keep the positive ones unchanged

0:16:55 | yeah |

0:16:55 | okay |

0:16:56 | and note that actually the

0:16:58 | eigenvectors, like the directions, are kind of not

0:17:02 | flipped

0:17:02 | or like, when you have an

0:17:04 | eigenvector with a negative

0:17:06 | eigenvalue, then you flip it

0:17:08 | simply, but you do not

0:17:10 | change a lot, you only flip it

0:17:11 | because you are only interested in positive contributions

0:17:14 | yeah yeah |

0:17:15 | okay |

0:17:15 | thanks

0:17:20 | oh uh, how do you

0:17:23 | you know, define the chunks

0:17:25 | do you, really, do you have some

0:17:28 | uh

0:17:29 | in particular, can you use clustering to find some similarities as well? no, you can, you can simply

0:17:35 | use |

0:17:36 | the chunk size of one, that means that you use

0:17:38 | each trial

0:17:40 | so that each trial enters as a chunk

0:17:42 | you can do, for example, we can do this uh trial-wise

0:17:46 | well you can put |

0:17:47 | the |

0:17:48 | trials from the same class which are subsequent

0:17:51 | together in one chunk |

0:17:52 | so we do not apply any clustering, we only, like, put some together

0:17:57 | or we do it for each trial separately

0:18:06 | my question about your |

0:18:09 | yeah, was the testing done in different sessions

0:18:14 | no, this was only one uh, one test

0:18:17 | session |

0:18:18 | okay |

0:18:23 | uh, the question was about the choosing of the chunk sizes

0:18:26 | so if you |

0:18:27 | if you use a chunk size which is larger than one, then you could

0:18:30 | like

0:18:31 | average out part of, you know, the non-stationarity

0:18:35 | and yeah, so this was the idea to use chunk sizes, because

0:18:39 | if you use a chunk size of one, then you, like, detect

0:18:43 | the changes on a small uh time scale

0:18:46 | if you take larger

0:18:47 | chunk sizes then

0:18:49 | your time scale

0:18:50 | will also be bigger, because we average out the changes which only occur, for example, in one trial

0:18:56 | so we tried different

0:18:57 | chunk sizes and, like, selected the best one using cross-validation
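the chunking described in this answer, each trial as its own chunk with chunk size one, or several subsequent same-class trials pooled into one covariance estimate, could look like this. a sketch assuming numpy; not the authors' code:

```python
import numpy as np

def chunk_covariances(trials, chunk_size=1):
    """Group consecutive same-class trials into chunks and compute one
    covariance matrix per chunk. With chunk_size=1 each trial is its own
    chunk; larger chunks average out variations that occur on a
    single-trial time scale.

    trials: (n_trials, n_channels, n_samples), all from one class
    """
    covs = []
    for start in range(0, len(trials) - chunk_size + 1, chunk_size):
        chunk = trials[start:start + chunk_size]
        # concatenate the chunk's trials along time, estimate one covariance
        data = np.concatenate(list(chunk), axis=1)
        data = data - data.mean(axis=1, keepdims=True)
        covs.append(data @ data.T / data.shape[1])
    return covs

rng = np.random.default_rng(2)
trials = rng.standard_normal((12, 4, 100))
print(len(chunk_covariances(trials, chunk_size=1)))  # 12
print(len(chunk_covariances(trials, chunk_size=3)))  # 4
```

the list of per-chunk covariances for each class is what feeds the stationarity penalty, and the chunk size is the cross-validated knob mentioned in the talk.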

0:19:06 | oh |