| 0:00:15 | an aspect of communication which is just as important |
|---|
| 0:00:21 | namely nonverbal communication |
|---|
| 0:00:25 | and in my talk i will discuss |
|---|
| 0:00:28 | how to enrich |
|---|
| 0:00:31 | the precise and useful functioning of computers with the human ability |
|---|
| 0:00:37 | to show and read nonverbal behaviors |
|---|
| 0:00:42 | you also see here, in the collaboration between the woman and the robot, that they are |
|---|
| 0:00:48 | not just collaborating; there is even a kind of close affective bond between them |
|---|
| 0:00:56 | and this is actually the focus of both lines of my research |
|---|
| 0:01:01 | so my talk will be structured as follows: i will first talk |
|---|
| 0:01:07 | about |
|---|
| 0:01:08 | the recognition of social cues in human robot interaction, but of course the technology |
|---|
| 0:01:15 | is also useful for any kind |
|---|
| 0:01:18 | of application where there are social signals in human |
|---|
| 0:01:23 | interaction |
|---|
| 0:01:25 | or in human virtual agent interaction |
|---|
| 0:01:29 | then i will talk about the generation of social cues in human robot interaction |
|---|
| 0:01:36 | of course the robot should not just be able to interpret the human signals |
|---|
| 0:01:42 | it should also be able to respond appropriately |
|---|
| 0:01:47 | the next topic will be dialogue management |
|---|
| 0:01:51 | a social virtual human or robot should be able to handle phenomena such as |
|---|
| 0:01:56 | turn taking |
|---|
| 0:01:58 | and also, as part of the solution, mutual gaze and backchannels |
|---|
| 0:02:04 | and to handle all these challenges we need of course a lot of data |
|---|
| 0:02:10 | and so the last part of my talk will be on interactive machine learning |
|---|
| 0:02:16 | approaches |
|---|
| 0:02:19 | which will ease the effort of human annotation by using |
|---|
| 0:02:25 | semi-automated techniques |
|---|
| 0:02:28 | so let's start with the recognition of social cues in human robot interaction |
|---|
| 0:02:36 | so what kind of social signals are we interested in |
|---|
| 0:02:40 | basically in speech and facial expressions, gaze, posture, gestures, body movements |
|---|
| 0:02:47 | and proxemics |
|---|
| 0:02:49 | but we are not only interested in the social cues of |
|---|
| 0:02:54 | an individual person |
|---|
| 0:02:56 | but also in interaction patterns such as synchrony, or maybe interpersonal attitudes, for |
|---|
| 0:03:06 | example the dominance of a person |
|---|
| 0:03:09 | or agent in an interaction |
|---|
| 0:03:11 | and also engagement |
|---|
| 0:03:15 | so how engaged are the participants in an interaction |
|---|
| 0:03:20 | so if you look at the literature, the most attention has been paid to facial features |
|---|
| 0:03:29 | i don't want to go into detail here; i just mention |
|---|
| 0:03:35 | the facial action coding system, which is usually applied to recognize |
|---|
| 0:03:42 | but also to generate facial expressions |
|---|
| 0:03:45 | and the basic idea is to define action units |
|---|
| 0:03:50 | that characterize emotional expressions |
|---|
| 0:03:54 | such as raised lip corners, which is usually an indicator of |
|---|
| 0:04:00 | happiness |
|---|
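As a loose illustration of the idea just described, here is a minimal Python sketch of mapping detected action units to a coarse emotion label; the rule set and the input format are assumptions for illustration, not a validated model:

```python
# Hypothetical sketch: inferring a coarse emotion label from FACS action
# units. AU numbers follow the published FACS convention; the rules are a
# simplification for illustration only.

# Detected action units with intensities in [0, 1] (assumed input format).
detected_aus = {6: 0.8, 12: 0.9}   # AU6 cheek raiser, AU12 lip corner puller

RULES = {
    "happiness": {6, 12},          # Duchenne smile: AU6 + AU12
    "surprise":  {1, 2, 5, 26},    # brow raise, upper lid raise, jaw drop
    "sadness":   {1, 4, 15},       # inner brow raise, brow lower, lip depress
}

def coarse_emotion(aus, threshold=0.5):
    """Return the first rule whose AUs are all active above threshold."""
    active = {au for au, v in aus.items() if v >= threshold}
    for label, required in RULES.items():
        if required <= active:
            return label
    return "neutral/unknown"

print(coarse_emotion(detected_aus))   # -> "happiness"
```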
| 0:04:02 | also a lot of effort has been spent on |
|---|
| 0:04:06 | vocal emotion recognition |
|---|
| 0:04:09 | here, just for illustration, i show you the signal of the same utterance spoken |
|---|
| 0:04:16 | in different emotions |
|---|
| 0:04:19 | you can see here that the pitch contour is quite different |
|---|
| 0:04:24 | depending on the emotion expressed |
|---|
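A minimal sketch of extracting such a pitch contour, assuming a local recording named "utterance.wav" and using librosa's pyin tracker:

```python
# Sketch: frame-wise pitch contour of an utterance. The file name is a
# placeholder; comparing such contours across emotional renditions shows
# the differences mentioned above.
import librosa

y, sr = librosa.load("utterance.wav", sr=None)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),   # ~65 Hz lower bound
    fmax=librosa.note_to_hz("C7"),   # ~2093 Hz upper bound
    sr=sr,
)
print(f0[:10])                       # NaN marks unvoiced frames
```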
| 0:04:27 | and there has been some effort to find good predictors for vocal |
|---|
| 0:04:34 | emotion; i would like to mention the geneva minimalistic acoustic parameter set |
|---|
| 0:04:42 | which was recently introduced and which actually attains quite competitive results, also if |
|---|
| 0:04:49 | you compare the two kinds of feature sets |
|---|
| 0:04:52 | namely sets consisting of thousands of acoustic features, or, if you like, approaches that try to |
|---|
| 0:05:00 | process speech with deep neural networks; so it would |
|---|
| 0:05:07 | be |
|---|
| 0:05:08 | interesting to compare, side by side, the results they obtain |
|---|
| 0:05:13 | with the results obtained by the geneva minimalistic feature set |
|---|
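As a hedged sketch, the Geneva minimalistic set can be extracted with audEERING's opensmile-python package; the file name is a placeholder:

```python
# Sketch: extract the Geneva Minimalistic Acoustic Parameter Set (GeMAPS)
# as one functionals vector per utterance.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,        # the Geneva minimalistic set
    feature_level=opensmile.FeatureLevel.Functionals,   # one vector per file
)
features = smile.process_file("utterance.wav")          # pandas DataFrame
print(features.shape)                                   # small, interpretable set
```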
| 0:05:17 | so if you look at the literature you might get the impression, okay, you get |
|---|
| 0:05:22 | very high recognition rates for emotions |
|---|
| 0:05:26 | and it is even a little bit scary: if you apply the model and |
|---|
| 0:05:32 | test it in the real world, you might find out, okay, |
|---|
| 0:05:37 | the result sometimes even comes |
|---|
| 0:05:41 | close to |
|---|
| 0:05:43 | chance-level |
|---|
| 0:05:46 | results |
|---|
| 0:05:47 | so why is that? actually, previous research has focused on the analysis of |
|---|
| 0:05:55 | acted basic emotions: emotions |
|---|
| 0:05:58 | that are quite exaggerated, prototypical |
|---|
| 0:06:03 | emotions such as happiness, sadness, disgust, anger |
|---|
| 0:06:08 | but emotional responses in the wild can usually not be mapped to basic |
|---|
| 0:06:16 | emotions; we see here, for example, this woman, and because i |
|---|
| 0:06:24 | know |
|---|
| 0:06:25 | the recording, i know the woman was actually quite happy in the |
|---|
| 0:06:31 | interaction with the robot, but it is not clearly |
|---|
| 0:06:35 | visible |
|---|
| 0:06:36 | a couple of years ago |
|---|
| 0:06:38 | a colleague of mine ran a study that we were very interested in |
|---|
| 0:06:46 | so actually they investigated the emotion recognition rate for acted emotions |
|---|
| 0:06:53 | for read emotions, and for emotions elicited in a wizard-of-oz setting, which of course sound more |
|---|
| 0:06:59 | natural; and the task actually was just to distinguish between emotion and |
|---|
| 0:07:06 | no emotion, so not a very difficult task |
|---|
| 0:07:10 | and for acted emotions |
|---|
| 0:07:12 | they got one hundred percent, so perfect |
|---|
| 0:07:16 | for read emotions |
|---|
| 0:07:19 | which are a little bit more natural than acted emotions, they got eighty percent |
|---|
| 0:07:25 | which is okay but not really exciting, because chance is fifty percent if we |
|---|
| 0:07:31 | just need to distinguish between |
|---|
| 0:07:35 | emotion and |
|---|
| 0:07:38 | no emotion |
|---|
| 0:07:40 | and finally, for the wizard-of-oz scenario, they just got seventy percent |
|---|
| 0:07:47 | so obviously systems developed under laboratory conditions may perform poorly in less controlled |
|---|
| 0:07:56 | scenarios |
|---|
| 0:07:58 | and the challenge is actually adaptive real-time applications |
|---|
| 0:08:05 | so usually, if you look at the literature, if you look at the papers people publish |
|---|
| 0:08:11 | you will find out that most studies are offline studies: so they take a |
|---|
| 0:08:17 | corpus |
|---|
| 0:08:18 | and the corpus |
|---|
| 0:08:21 | is usually prepared |
|---|
| 0:08:23 | so, for example, expressions that cannot be unambiguously annotated with |
|---|
| 0:08:29 | emotional states |
|---|
| 0:08:31 | are simply taken out |
|---|
| 0:08:33 | and also |
|---|
| 0:08:36 | they of course start from the assumption that the whole corpus is segmented |
|---|
| 0:08:42 | in some way |
|---|
| 0:08:44 | but in real life we, on the one hand, have noise |
|---|
| 0:08:50 | and corrupted data |
|---|
| 0:08:53 | so we might encounter previously unseen information |
|---|
| 0:08:56 | and also our classifiers can only rely on previously seen data, so we cannot |
|---|
| 0:09:01 | look into the future |
|---|
| 0:09:03 | and of course the system has to respond |
|---|
| 0:09:06 | in real time |
|---|
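A minimal sketch of the online constraints just listed: frame-wise processing, a causal window over previously seen data only, and tolerance to dropped frames; `extract_features` and `classifier` are hypothetical stand-ins:

```python
# Sketch of a causal, real-time classification loop under the constraints
# described above (no future frames, possibly missing data).
import collections
import numpy as np

WINDOW = 50                                   # frames of history (past only)
buffer = collections.deque(maxlen=WINDOW)

def on_new_frame(frame, classifier, extract_features):
    """Called once per incoming frame; may return None while warming up."""
    if frame is not None:                     # sensors may drop out in the wild
        buffer.append(extract_features(frame))
    if len(buffer) < WINDOW:
        return None                           # not enough history yet
    window = np.stack(buffer)                 # uses only previously seen data
    return classifier.predict(window[np.newaxis, ...])[0]
```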
| 0:09:09 | so the question is, what can we do about that |
|---|
| 0:09:13 | and one thing we might consider would be |
|---|
| 0:09:18 | the context |
|---|
| 0:09:19 | so if you look at the picture, can you imagine in which emotional state |
|---|
| 0:09:27 | this couple is |
|---|
| 0:09:29 | so, do we have any idea? those of you who don't know |
|---|
| 0:09:35 | the context |
|---|
| 0:09:37 | any ideas what the emotional state |
|---|
| 0:09:40 | could be |
|---|
| 0:09:44 | you're quite right; usually people actually say, okay, they are in distress |
|---|
| 0:09:53 | it's sadness |
|---|
| 0:09:55 | but you are actually very good at guessing |
|---|
| 0:10:00 | because it's actually jealousy |
|---|
| 0:10:04 | you are actually the first audience that got the correct emotion immediately |
|---|
| 0:10:09 | but nevertheless, i dare say a system |
|---|
| 0:10:14 | even one able to detect the facial action units in a |
|---|
| 0:10:18 | perfect manner would have problems to figure that out without knowing the context |
|---|
| 0:10:25 | or at least other channels |
|---|
| 0:10:28 | so there is some |
|---|
| 0:10:31 | recent research that has been done to consider the context, and it actually led to |
|---|
| 0:10:38 | some improvement |
|---|
| 0:10:40 | so a couple of years ago we investigated gender-specific aspects in emotion |
|---|
| 0:10:48 | recognition |
|---|
| 0:10:49 | and we were able to improve the recognition rates by training gender-specific models |
|---|
| 0:10:57 | and another approach was to consider success and failure: so actually you can observe |
|---|
| 0:11:05 | success and failure events |
|---|
| 0:11:08 | during an application; for example, if a student is having a hard time |
|---|
| 0:11:13 | and is smiling while interacting with the learning application |
|---|
| 0:11:19 | then probably the student is not really happy; it might be that the student |
|---|
| 0:11:25 | does not take the system |
|---|
| 0:11:27 | seriously |
|---|
| 0:11:28 | and even though this approach is quite reasonable, it has not |
|---|
| 0:11:35 | been picked up so much |
|---|
| 0:11:38 | so |
|---|
| 0:11:39 | we ourselves considered the dialogue behavior of the virtual |
|---|
| 0:11:43 | agent in a job interview training scenario |
|---|
| 0:11:47 | so, for example, when the job interviewer asks difficult questions about the weaknesses |
|---|
| 0:11:53 | of the candidate |
|---|
| 0:11:55 | then this is also a hint towards the |
|---|
| 0:11:59 | likely emotional state |
|---|
| 0:12:02 | and at the same time |
|---|
| 0:12:05 | we learned the temporal context using bidirectional long short-term memory neural |
|---|
| 0:12:14 | networks |
|---|
| 0:12:15 | so the context |
|---|
| 0:12:16 | might be a good option to consider |
|---|
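A sketch of modeling such temporal context with a bidirectional LSTM over a sequence of per-frame feature vectors, written in PyTorch; all dimensions are illustrative assumptions:

```python
# Sketch: BLSTM over per-frame features, yielding frame-wise class scores.
import torch
import torch.nn as nn

class BLSTMEmotion(nn.Module):
    def __init__(self, n_features=62, hidden=64, n_classes=4):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)   # 2x: forward + backward

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.blstm(x)
        return self.head(out)              # frame-wise class scores

model = BLSTMEmotion()
scores = model(torch.randn(8, 100, 62))   # 8 sequences, 100 frames each
print(scores.shape)                       # torch.Size([8, 100, 4])
```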
| 0:12:22 | and another, maybe obvious, thing to consider is multimodality; here |
|---|
| 0:12:28 | you can see a woman in two pictures; note, it's just one frame |
|---|
| 0:12:32 | and if i just look at the faces to compare them, then, |
|---|
| 0:12:37 | actually, |
|---|
| 0:12:38 | for me it's not possible to recognize any difference in the face |
|---|
| 0:12:43 | but if you look at the body |
|---|
| 0:12:46 | you can tell the pictures apart: so on the right |
|---|
| 0:12:50 | the woman is obviously quite tense; i would guess she is not |
|---|
| 0:12:55 | very happy |
|---|
| 0:12:57 | but nonetheless we learn more from the body |
|---|
| 0:13:00 | than is demonstrated by the face |
|---|
| 0:13:05 | so |
|---|
| 0:13:07 | does multimodal fusion help |
|---|
| 0:13:12 | there is an interesting study by d'mello and colleagues |
|---|
| 0:13:16 | on multimodal affect detection |
|---|
| 0:13:20 | and this study surveyed the many studies where multimodal classifiers outperformed |
|---|
| 0:13:28 | unimodal ones |
|---|
| 0:13:29 | and they showed |
|---|
| 0:13:31 | that the improvement correlates with the naturalness of the corpus, which is actually intriguing |
|---|
| 0:13:38 | so for |
|---|
| 0:13:41 | acted emotions |
|---|
| 0:13:43 | you get quite high recognition rates if you use multiple modalities |
|---|
| 0:13:48 | so you can even get improvements of more than ten percent |
|---|
| 0:13:52 | but for the difficult task, namely spontaneous emotions |
|---|
| 0:13:56 | the improvement is less than five percent, which is really bad news, because |
|---|
| 0:14:02 | should we burden |
|---|
| 0:14:05 | the user with additional devices just to get less than five percent improvement in recognition rates |
|---|
| 0:14:11 | and the assumption actually is that in natural interaction |
|---|
| 0:14:17 | people do not show the emotion in every channel: one channel may show the emotion |
|---|
| 0:14:24 | and another channel may not show the emotion |
|---|
| 0:14:28 | so not all channels express it in the same |
|---|
| 0:14:32 | manner |
|---|
| 0:14:33 | and we first investigated whether this |
|---|
| 0:14:38 | assumption holds: so we looked at a corpus, we had a corpus |
|---|
| 0:14:45 | where we annotated affect just by the video, and then just by the audio |
|---|
| 0:14:50 | and then we noted where the annotations mismatch |
|---|
| 0:14:55 | and then we looked at the recognition rates, and actually, where the annotations |
|---|
| 0:15:02 | mismatch |
|---|
| 0:15:04 | so where the modalities do not match, we also |
|---|
| 0:15:08 | get low recognition rates |
|---|
| 0:15:10 | so i will show you another example: look at the woman here |
|---|
| 0:15:16 | so, let's look at the second frame: here the woman shows a |
|---|
| 0:15:21 | neutral face |
|---|
| 0:15:23 | and the voice is happy |
|---|
| 0:15:26 | and a little bit later it is the other way round: the |
|---|
| 0:15:30 | face looks happy but the voice is neutral |
|---|
| 0:15:34 | and of course the question is, what should a fusion approach do in |
|---|
| 0:15:41 | such a situation |
|---|
| 0:15:43 | and here i sketch a potential solution |
|---|
| 0:15:48 | so, as you see, the modality-specific recognizers might decide when |
|---|
| 0:15:57 | to contribute a result |
|---|
| 0:15:58 | and then we interpolate |
|---|
| 0:16:01 | and via this interpolation we get better recognition results |
|---|
| 0:16:08 | so if you look at the literature, most of the fusion approaches actually used are |
|---|
| 0:16:13 | synchronous fusion approaches |
|---|
| 0:16:16 | and synchronous fusion approaches are characterized by considering multiple modalities |
|---|
| 0:16:23 | within the same time frame: so, for example, people take a complete sentence and |
|---|
| 0:16:31 | analyze the face |
|---|
| 0:16:33 | and the voice over the complete |
|---|
| 0:16:36 | sentence |
|---|
| 0:16:38 | asynchronous fusion |
|---|
| 0:16:41 | approaches |
|---|
| 0:16:42 | actually |
|---|
| 0:16:44 | accommodate that the modalities are not observed at the same time |
|---|
| 0:16:52 | so they do not assume that, for example, audio and video |
|---|
| 0:16:57 | are expressed |
|---|
| 0:16:59 | at the same time |
|---|
| 0:17:01 | and therefore they are able to track the temporal nature of cues in the |
|---|
| 0:17:06 | individual modalities; so it's very important, if you use a fusion approach, to |
|---|
| 0:17:13 | use an approach that is able |
|---|
| 0:17:17 | to consider not only the contribution |
|---|
| 0:17:21 | of the individual modalities, but also |
|---|
| 0:17:26 | the interdependencies between modalities |
|---|
| 0:17:29 | and that is only possible |
|---|
| 0:17:31 | if you go for a frame-wise recognition approach |
|---|
| 0:17:36 | so we followed this approach, and i will explain it first here |
|---|
| 0:17:39 | so we adopted an event-based fusion approach where we consider events |
|---|
| 0:17:46 | as an additional |
|---|
| 0:17:48 | layer of abstraction between raw signals |
|---|
| 0:17:51 | and higher-level emotional states |
|---|
| 0:17:54 | events are, for example, a laugh |
|---|
| 0:17:59 | or similar kinds of social cues |
|---|
| 0:18:04 | and in this way we were able to take into account the temporal |
|---|
| 0:18:10 | relationships between channels |
|---|
| 0:18:12 | and learn when each channel provides information |
|---|
| 0:18:15 | and also, in case some data cannot be seen |
|---|
| 0:18:20 | the approach still delivers reasonable recognition results |
|---|
| 0:18:26 | so let's have a look at an example; it's a simplified example |
|---|
| 0:18:31 | here we have audio and we have facial expressions |
|---|
| 0:18:36 | and the fusion approach might combine |
|---|
| 0:18:41 | them |
|---|
| 0:18:43 | with some degree of weighting |
|---|
| 0:18:45 | and now let's assume for some reason the audio is no longer available |
|---|
| 0:18:50 | then via interpolation |
|---|
| 0:18:52 | we still get a quite reasonable |
|---|
| 0:18:56 | result |
|---|
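A simplified sketch of this interpolation idea (our own illustration, not the exact published model): each modality contributes a value with a confidence, and a missing modality's last contribution is decayed over time rather than dropped abruptly:

```python
# Sketch of confidence-weighted, event-based fusion with decay for stale
# (missing) modalities. Values, confidences, and the half-life are assumptions.
import math

def fuse(events, now, half_life=2.0):
    """events: {modality: (timestamp, value, confidence)} -> fused value."""
    num, den = 0.0, 0.0
    for ts, value, conf in events.values():
        decay = math.pow(0.5, (now - ts) / half_life)  # older events count less
        w = conf * decay
        num += w * value
        den += w
    return num / den if den > 0 else None

events = {"audio": (10.0, 0.8, 0.9), "face": (12.0, 0.4, 0.7)}
print(fuse(events, now=12.0))      # both modalities fresh
print(fuse(events, now=16.0))      # audio stale: face dominates via decay
```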
| 0:18:57 | so we compared |
|---|
| 0:19:01 | a number of asynchronous fusion approaches, synchronous fusion approaches |
|---|
| 0:19:07 | and event-driven fusion |
|---|
| 0:19:10 | and so, for the fusion approaches, we |
|---|
| 0:19:16 | considered, for example, recurrent neural networks, which are able to |
|---|
| 0:19:24 | take into account the temporal history of signals |
|---|
| 0:19:29 | and also bidirectional long short-term memory neural networks |
|---|
| 0:19:34 | which are able to look into the future |
|---|
| 0:19:38 | and to learn the temporal history; and what you can see here, which is |
|---|
| 0:19:43 | quite striking |
|---|
| 0:19:46 | is that the asynchronous fusion |
|---|
| 0:19:49 | approaches actually outperform the |
|---|
| 0:19:53 | synchronous fusion approaches |
|---|
| 0:19:56 | so the message i would like to convey is: if you fuse modalities |
|---|
| 0:20:02 | you should go for an approach that is able to consider |
|---|
| 0:20:07 | the contribution of the modalities |
|---|
| 0:20:10 | but also the interdependencies between modalities |
|---|
| 0:20:54 | and actually, to |
|---|
| 0:20:58 | support the development of |
|---|
| 0:21:02 | social signal processing approaches for online recognition tasks |
|---|
| 0:21:08 | we developed a framework which is called ssi, for social signal |
|---|
| 0:21:12 | interpretation |
|---|
| 0:21:14 | and this framework synchronizes the modalities, and it supports complete |
|---|
| 0:21:21 | machine learning pipelines, offering various kinds of machine learning |
|---|
| 0:21:27 | approaches |
|---|
| 0:21:28 | and |
|---|
| 0:21:29 | we are actually able to |
|---|
| 0:21:34 | integrate new modalities and sensors: whenever a new sensor or device |
|---|
| 0:21:41 | becomes available |
|---|
| 0:21:42 | my people write wrappers for it |
|---|
| 0:21:45 | so we support motion capturing as well as eye tracking with various |
|---|
| 0:21:51 | kinds of |
|---|
| 0:21:52 | eye trackers, stationary ones as well as |
|---|
| 0:21:56 | mobile ones |
|---|
| 0:21:58 | and |
|---|
| 0:21:59 | also other devices |
|---|
| 0:22:02 | so basically all kinds of |
|---|
| 0:22:05 | sensors that are commercially |
|---|
| 0:22:07 | available |
|---|
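A minimal sketch of one thing such a framework has to do internally, namely aligning two sensor streams with different rates onto a common clock; the streams here are synthetic stand-ins:

```python
# Sketch: resampling a 100 Hz stream and a 25 Hz stream onto one clock so
# that frame-wise, multimodal processing becomes possible.
import numpy as np

t_audio = np.arange(0, 5, 1 / 100)          # 100 Hz feature stream
t_video = np.arange(0, 5, 1 / 25)           # 25 Hz feature stream
audio_feat = np.sin(t_audio)                # placeholder signals
video_feat = np.cos(t_video)

t_common = np.arange(0, 5, 1 / 25)          # resample both to 25 Hz
audio_on_common = np.interp(t_common, t_audio, audio_feat)
video_on_common = np.interp(t_common, t_video, video_feat)
frames = np.stack([audio_on_common, video_on_common], axis=1)
print(frames.shape)                          # (125, 2): synchronized frames
```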
| 0:22:09 | so this was the part on |
|---|
| 0:22:12 | emotion recognition; now i would like to come to the other side, namely |
|---|
| 0:22:18 | to the generation of social cues by the robot |
|---|
| 0:22:22 | as i said, it is not sufficient to recognize the emotion |
|---|
| 0:22:25 | you also need to respond appropriately, or at least to appear to give appropriate responses |
|---|
| 0:22:33 | and |
|---|
| 0:22:36 | i guess it is clear why we would want nonverbal human-like signals: we learned |
|---|
| 0:22:43 | that they not only express emotions but also attitudes and |
|---|
| 0:22:48 | intentions |
|---|
| 0:22:49 | they also convey interpersonal relations: whether the other person, for example, |
|---|
| 0:22:55 | is interested in talking to you |
|---|
| 0:22:59 | or not |
|---|
| 0:23:00 | and nonverbal behaviors can of course also be used |
|---|
| 0:23:05 | to help the other understand verbal messages |
|---|
| 0:23:10 | and in general they make the communication |
|---|
| 0:23:12 | more natural and plausible |
|---|
| 0:23:15 | so we started a couple of years ago with a |
|---|
| 0:23:18 | nao robot |
|---|
| 0:23:19 | and of course the nao robot does not have a |
|---|
| 0:23:23 | very expressive face, so we had to look for other options, and we |
|---|
| 0:23:29 | looked at action |
|---|
| 0:23:31 | tendencies |
|---|
| 0:23:32 | which are related to emotions; an action tendency is actually what you show before you |
|---|
| 0:23:39 | start an action, so it's very common |
|---|
| 0:23:42 | in sports |
|---|
| 0:23:44 | so here you have two boxers |
|---|
| 0:23:48 | and the fight has not yet started, but it's quite clear what is coming next |
|---|
| 0:23:56 | and so with the nao we simulated action tendencies such as approach |
|---|
| 0:24:03 | attack, and submission |
|---|
| 0:24:05 | and it turned out that people were able to |
|---|
| 0:24:08 | recognize these action tendencies |
|---|
| 0:24:13 | later we actually |
|---|
| 0:24:16 | got a robot from hanson robokind |
|---|
| 0:24:19 | and here we actually tried to simulate facial |
|---|
| 0:24:24 | expressions |
|---|
| 0:24:26 | and you may remember that we started from the facial action |
|---|
| 0:24:32 | coding system i mentioned |
|---|
| 0:24:35 | earlier |
|---|
| 0:24:36 | which actually identifies over forty action units for the human face |
|---|
| 0:24:45 | so the question was, can we simulate the action units |
|---|
| 0:24:51 | for the robot |
|---|
| 0:24:53 | now, this robot allows the simulation of just seven action |
|---|
| 0:24:59 | units |
|---|
| 0:25:00 | the robot has a synthetic skin, and under the skin there are |
|---|
| 0:25:06 | motors, and the motors can move and thereby |
|---|
| 0:25:11 | deform the skin |
|---|
| 0:25:13 | so we were only able to |
|---|
| 0:25:16 | simulate the seven action units, and the question is whether this is enough, so |
|---|
| 0:25:21 | i will show you a video |
|---|
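Purely as a hypothetical sketch of the mapping just described: target action-unit intensities could be turned into motor positions through an assumed linear mixing matrix. The motor names, the chosen action units, and the matrix are invented for illustration; the real robot's interface is not shown in the talk:

```python
# Hypothetical sketch: AU intensities -> motor positions for a skin-deforming
# robot head. Everything here is an assumption for illustration.
import numpy as np

AUS = [1, 2, 4, 6, 12, 15, 26]              # seven simulated action units (assumed)
MOTORS = ["brow_l", "brow_r", "cheek_l", "cheek_r", "lip_l", "lip_r", "jaw"]
MIX = np.eye(7)                              # assumed AU-to-motor mixing matrix

def au_to_motor_commands(au_intensities):
    """au_intensities: dict AU -> [0,1]; returns motor -> position in [0,1]."""
    vec = np.array([au_intensities.get(au, 0.0) for au in AUS])
    positions = np.clip(MIX @ vec, 0.0, 1.0)
    return dict(zip(MOTORS, positions))

print(au_to_motor_commands({6: 0.8, 12: 1.0}))   # a smile-like configuration
```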
| 0:25:24 | so the video is in german with english subtitles; the robot |
|---|
| 0:25:28 | is introduced; the focus is on nonverbal signals, so it is not necessary that you understand |
|---|
| 0:25:35 | what is being said |
|---|
| 0:25:38 | you can just ignore the actual content of the discussion: the information the machine had |
|---|
| 0:25:44 | to rely on did not consider, at this stage, the semantics of the utterances |
|---|
| 0:25:50 | (video plays: the robot holds a conversation driven purely by nonverbal features, without considering the semantics of what is said) |
|---|
| 0:27:04 | okay, just to show you that it really does not consider the semantics, another |
|---|
| 0:27:08 | example |
|---|
| 0:27:12 | (another video clip of a conversation with the robot) |
|---|
| 0:27:25 | the system does not really understand what is being said; it is basically |
|---|
| 0:27:38 | just a constant talk detector |
|---|
| 0:27:44 | so this just shows that you cannot |
|---|
| 0:27:47 | carry a conversation with emotional features alone; that's of course not enough |
|---|
| 0:27:53 | and for a full system we |
|---|
| 0:27:57 | would of course have to use different components in addition; so maybe we |
|---|
| 0:28:05 | might add that at a later stage |
|---|
| 0:28:09 | so what is empathy? empathy is an emotional response, and it stems |
|---|
| 0:28:16 | from the comprehension of the emotional state of another |
|---|
| 0:28:22 | person |
|---|
| 0:28:24 | and the emotional state of the other person |
|---|
| 0:28:29 | might be similar to your own emotional state, but it does not have to be the same |
|---|
| 0:28:36 | emotion |
|---|
| 0:28:37 | and empathy requires, on the one hand, the perception of the emotional state of |
|---|
| 0:28:43 | another person, and the perception is what we can cover with signal processing technology |
|---|
| 0:28:50 | but it also requires appraisal: so we have to think about the situation |
|---|
| 0:28:55 | of the other person; we somehow |
|---|
| 0:28:59 | need to know |
|---|
| 0:29:00 | what the other person is feeling and why, and not just to perceive |
|---|
| 0:29:05 | it |
|---|
| 0:29:07 | and also we are required to decide how to respond to the other person's |
|---|
| 0:29:14 | emotion |
|---|
| 0:29:16 | so for example, in a tutoring system |
|---|
| 0:29:19 | if |
|---|
| 0:29:20 | the student is in a very negative emotional state and depressed |
|---|
| 0:29:24 | then it could be a disaster if the virtual agent would actually mirror the |
|---|
| 0:29:30 | emotional state |
|---|
| 0:29:32 | of the student, because it might make the student |
|---|
| 0:29:36 | even more |
|---|
| 0:29:37 | depressed |
|---|
| 0:29:38 | so |
|---|
| 0:29:40 | the agent actually has to decide what is appropriate: |
|---|
| 0:29:45 | which is a potential emotion to show, and which one is not to show |
|---|
| 0:29:49 | and we can realize a kind of empathy mechanism as follows |
|---|
| 0:29:55 | so we perceive the emotion, we try to understand the emotional state |
|---|
| 0:30:02 | and, understanding the emotional state of the other person, |
|---|
| 0:30:08 | we choose an internal reaction, and then the question is, should we externalize |
|---|
| 0:30:15 | the reaction, and in what way; and for the virtual agents in the |
|---|
| 0:30:20 | examples i will show, the behavior is |
|---|
| 0:30:25 | driven by a simulated appraisal model |
|---|
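A schematic sketch of the perceive, understand, react, externalize loop just outlined; the strategy table is an illustrative assumption inspired by the tutoring example (do not mirror a depressed student):

```python
# Sketch of an empathy-style response policy; all labels and rules are
# assumptions for illustration, not the talk's actual appraisal model.
def appraise(user_emotion, context):
    """Hypothetical appraisal step: a crude internal evaluation."""
    return "negative" if user_emotion in {"depressed", "angry"} else "positive"

def choose_external_reaction(user_emotion, context):
    internal = appraise(user_emotion, context)      # internal reaction
    if user_emotion == "depressed":
        return "encourage"                          # mirroring could backfire
    if internal == "negative" and context == "tutoring":
        return "stay_neutral"
    return "mirror"                                 # default: congruent emotion

print(choose_external_reaction("depressed", "tutoring"))   # -> encourage
```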
| 0:30:29 | the dialogue i will show you is actually scripted, of course |
|---|
| 0:30:34 | so first of all, what do we do in this kind of dialogue |
|---|
| 0:30:39 | so we reveal emotions |
|---|
| 0:30:41 | and also |
|---|
| 0:30:43 | we comment on the user's emotions; so the story will be |
|---|
| 0:30:50 | that the user has forgotten |
|---|
| 0:30:52 | to take her medication |
|---|
| 0:30:54 | and |
|---|
| 0:30:56 | the function of the emotions is this: the robot shows concern about |
|---|
| 0:31:01 | the forgotten medication to increase awareness, but it is doing it in a subtle |
|---|
| 0:31:08 | way |
|---|
| 0:31:09 | so that it does not |
|---|
| 0:31:13 | annoy the user too much |
|---|
| 0:31:15 | and moreover, the robot will show some intentions as |
|---|
| 0:31:20 | well |
|---|
| 0:31:21 | to calm down the user |
|---|
| 0:31:23 | so i will play |
|---|
| 0:31:26 | the video, and what is actually kind of amazing |
|---|
| 0:31:32 | is how convincingly disappointed the robot appears; here it |
|---|
| 0:31:36 | is |
|---|
| 0:32:07 | (video plays) |
|---|
| 0:32:39 | okay; so actually, to develop a better understanding of the emotions of users |
|---|
| 0:32:47 | we are currently investigating how to combine social signal processing with an affective |
|---|
| 0:32:54 | theory of mind, and this is actually a cooperation where we are happily |
|---|
| 0:33:01 | supported by |
|---|
| 0:33:03 | an external partner |
|---|
| 0:33:05 | so our partners developed a computational model |
|---|
| 0:33:10 | actually to simulate emotional behaviors |
|---|
| 0:33:14 | and the basic idea is actually |
|---|
| 0:33:17 | that we |
|---|
| 0:33:19 | have some emotion simulation and then check whether what we |
|---|
| 0:33:24 | recognize in terms of social cues |
|---|
| 0:33:27 | actually matches the |
|---|
| 0:33:29 | simulation |
|---|
| 0:33:31 | and the interesting bit i will come to in a moment: |
|---|
| 0:33:36 | we do not just consider how a situation elicits |
|---|
| 0:33:41 | an emotional state |
|---|
| 0:33:43 | we also consider how people |
|---|
| 0:33:46 | actually regulate their emotions; let me show you an example |
|---|
| 0:33:52 | so let's consider |
|---|
| 0:33:55 | shame: so if you do not regulate the emotion at all, |
|---|
| 0:34:00 | the person would |
|---|
| 0:34:03 | just blush and lower their head |
|---|
| 0:34:07 | and this is the typical |
|---|
| 0:34:10 | emotional expression |
|---|
| 0:34:12 | we would expect |
|---|
| 0:34:14 | but people usually regulate their emotions, actually because they would like to better |
|---|
| 0:34:20 | control their emotional state |
|---|
| 0:34:24 | and they have quite different ways to regulate emotions |
|---|
| 0:34:32 | so avoidance is one reaction, but you might also protect yourself: for |
|---|
| 0:34:37 | example you say, okay, it was not my fault, and you attack another |
|---|
| 0:34:43 | person |
|---|
| 0:34:44 | and |
|---|
| 0:34:46 | what you can see here is that we get quite different signals |
|---|
| 0:34:53 | that people might show, depending on the way they regulate their emotions; and if |
|---|
| 0:34:59 | you use a typical machine learning approach |
|---|
| 0:35:04 | to analyze these signals |
|---|
| 0:35:07 | you would never be able to infer the emotions |
|---|
| 0:35:11 | because you don't know |
|---|
| 0:35:13 | how people regulate the underlying emotional state; so here is, and |
|---|
| 0:35:19 | we had this discussion already yesterday |
|---|
| 0:35:23 | maybe an option: |
|---|
| 0:35:27 | machine learning approaches, as black boxes, recognize certain signals |
|---|
| 0:35:33 | and we combine them with some understanding, actually, |
|---|
| 0:35:37 | to map |
|---|
| 0:35:39 | these signals onto emotional states |
|---|
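A sketch of that hybrid idea: black-box detectors report raw cues, and a small knowledge layer maps them to an emotional state while accounting for regulation strategies; the rules are illustrative assumptions:

```python
# Sketch: rule layer on top of black-box cue detectors. Cue names, context
# labels, and rules are invented for illustration.
def interpret(cues, situation):
    """cues: set of detected signals; situation: appraisal of the context."""
    if situation == "embarrassing":
        if {"blush", "gaze_down"} <= cues:
            return "shame (unregulated)"
        if "laughter" in cues:                 # laughing it off can mask shame
            return "shame (regulated: avoidance)"
        if "blame_other" in cues:
            return "shame (regulated: attack other)"
    if "smile" in cues:
        return "happiness"
    return "unknown"

print(interpret({"laughter"}, "embarrassing"))
```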
| 0:35:41 | and it's even more important |
|---|
| 0:35:44 | if the system has to respond to the emotional state; so imagine |
|---|
| 0:35:49 | you talk to somebody, and the guy is not really understanding what your problem |
|---|
| 0:35:53 | is |
|---|
| 0:35:54 | and is just behaving as if he understood, |
|---|
| 0:36:02 | responding in a schematic manner: we would not call that empathic |
|---|
| 0:36:09 | behavior |
|---|
| 0:36:10 | so, towards the end of my talk, i would also like to come to what |
|---|
| 0:36:15 | is required for dialogue between |
|---|
| 0:36:18 | humans and |
|---|
| 0:36:20 | robots |
|---|
| 0:36:21 | and in a recent project we looked |
|---|
| 0:36:27 | at engagement in human robot interaction |
|---|
| 0:36:32 | we looked at |
|---|
| 0:36:35 | signs of engagement in a human robot dialogue, such as the amount of mutual gaze |
|---|
| 0:36:41 | and directed gaze, and turn taking |
|---|
| 0:36:45 | and i will just show you an example; here it's a game between a |
|---|
| 0:36:52 | robot and a human |
|---|
| 0:36:54 | and the user is wearing eye-tracking glasses, so that the |
|---|
| 0:36:59 | robot knows where the user is looking |
|---|
| 0:37:03 | and in this specific scenario |
|---|
| 0:37:06 | we simulated directed gaze, which has a kind of |
|---|
| 0:37:12 | functional meaning |
|---|
| 0:37:14 | so |
|---|
| 0:37:15 | the robot is able to detect which object |
|---|
| 0:37:19 | the user is focusing on, and this makes the interaction more efficient because |
|---|
| 0:37:25 | the user is no longer forced to describe |
|---|
| 0:37:28 | the object; we also implemented, in another scenario, social gaze |
|---|
| 0:37:36 | but this social gaze actually does |
|---|
| 0:37:38 | not have a real function |
|---|
| 0:37:41 | so the dialogue was completely understandable without the social gaze; we just wanted to know |
|---|
| 0:37:48 | whether it makes any difference |
|---|
| 0:37:51 | so, just very quickly |
|---|
| 0:37:55 | we have directed gaze, where the robot has the following two options: |
|---|
| 0:38:02 | pointing at the object, or just looking at the object |
|---|
| 0:38:06 | and for mutual gaze, both interactants establish eye contact |
|---|
| 0:38:12 | the next thing we realized was gaze-based disambiguation |
|---|
| 0:38:18 | and gaze-based disambiguation is interesting insofar as people |
|---|
| 0:38:25 | fixate an object, then look away, and then fixate it again |
|---|
| 0:38:30 | so we need a different disambiguation approach |
|---|
| 0:38:34 | than, for example, for pointing gestures: when people point, they usually just |
|---|
| 0:38:40 | point once, and that's it; they do not point a second time |
|---|
| 0:38:45 | and so gaze is |
|---|
| 0:38:47 | simply |
|---|
| 0:38:50 | different |
|---|
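A sketch of how such gaze-based disambiguation might accumulate evidence over repeated fixations, in contrast to a one-shot pointing gesture; the thresholds are assumptions:

```python
# Sketch: accumulate fixation time per candidate object and commit to a
# referent only once one object is clearly preferred.
from collections import defaultdict

dwell = defaultdict(float)                      # accumulated fixation time per object

def on_fixation(target_object, duration_s, threshold_s=0.6):
    """Feed each detected fixation; returns the referent once confident."""
    dwell[target_object] += duration_s
    best = max(dwell, key=dwell.get)
    runner_up = sorted(dwell.values())[-2] if len(dwell) > 1 else 0.0
    if dwell[best] >= threshold_s and dwell[best] > 1.5 * runner_up:
        return best                             # clearly preferred referent
    return None                                 # still ambiguous: keep waiting

print(on_fixation("red_cube", 0.4))             # None: not confident yet
print(on_fixation("red_cube", 0.4))             # "red_cube"
```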
| 0:38:53 | and we also realized |
|---|
| 0:38:58 | some typical gaze behaviors that are used in turn taking |
|---|
| 0:39:05 | so speakers usually look away from the addressee to indicate |
|---|
| 0:39:11 | that they are in the process of thinking about what to say next |
|---|
| 0:39:15 | and also to show that they do not want to be interrupted |
|---|
| 0:39:20 | and typically, at the end of an utterance, the speakers |
|---|
| 0:39:23 | look at the other person |
|---|
| 0:39:26 | because they want to know how the addressee responds |
|---|
| 0:39:28 | what the addressee is |
|---|
| 0:39:31 | thinking about what has been said |
|---|
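A sketch of these turn-taking gaze behaviors as a small state machine; the states and events are assumptions for illustration:

```python
# Sketch: gaze controller for turn taking. Look away while planning and
# holding the turn; look at the addressee at the utterance end to invite
# a response.
TRANSITIONS = {
    ("idle",      "start_speaking"): ("speaking",  "gaze_away"),     # plan, hold turn
    ("speaking",  "utterance_end"):  ("yielding",  "gaze_at_user"),  # invite response
    ("yielding",  "user_speaks"):    ("listening", "gaze_at_user"),
    ("listening", "user_done"):      ("idle",      "gaze_neutral"),
}

def step(state, event):
    next_state, gaze_command = TRANSITIONS.get((state, event), (state, None))
    return next_state, gaze_command

state = "idle"
for event in ["start_speaking", "utterance_end", "user_speaks", "user_done"]:
    state, gaze = step(state, event)
    print(state, gaze)
```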
| 0:39:31 | so basically |
|---|
| 0:39:33 | we realized shared attention: the robot follows the user's hand movements, and the robot |
|---|
| 0:39:40 | follows the user's gaze |
|---|
| 0:39:42 | we realized social gaze |
|---|
| 0:39:45 | so here the robot initiates and reciprocates mutual gaze |
|---|
| 0:39:49 | and finally we enabled the robot to make inferences about what the user |
|---|
| 0:39:54 | intends |
|---|
| 0:39:55 | and that i will show you in a |
|---|
| 0:39:58 | video |
|---|
| 0:40:06 | (video: the human and the robot collaborate on a construction task, with the robot using gaze for grounding) |
|---|
| 0:41:06 | in the video, "the red one" is of course ambiguous, so the robot asks |
|---|
| 0:41:12 | which one is meant |
|---|
| 0:42:53 | and we did an evaluation of this work |
|---|
| 0:42:58 | and what we found was that actually the object-directed gaze was more effective than |
|---|
| 0:43:04 | the social gaze |
|---|
| 0:43:07 | so people were able to interact more efficiently with object groundings |
|---|
| 0:43:12 | and the dialogues were much shorter |
|---|
| 0:43:15 | and there were less misconceptions |
|---|
| 0:43:18 | but the social grounding did not improve the perception |
|---|
| 0:43:23 | of the interaction |
|---|
| 0:43:25 | which is of course a pity, because we spent quite some time on mutual gaze |
|---|
| 0:43:32 | and one assumption is that people were concentrating on the task instead |
|---|
| 0:43:38 | of the social interaction with the robot |
|---|
| 0:43:40 | and we might investigate whether, if you have a more social task, for example looking at |
|---|
| 0:43:46 | family photos with the robot |
|---|
| 0:43:47 | the social gaze might become more important |
|---|
| 0:43:52 | and another assumption, which we did not yet try, is |
|---|
| 0:43:57 | that some people are focusing more on the task and some others are focusing more on |
|---|
| 0:44:01 | the social interaction; people can be classified like this |
|---|
| 0:44:06 | and particular people |
|---|
| 0:44:08 | might appreciate the social gaze more |
|---|
| 0:44:12 | than others |
|---|
| 0:44:14 | so finally, i would like to come to recent developments; so |
|---|
| 0:44:20 | we started to record |
|---|
| 0:44:23 | interactions and dialogues |
|---|
| 0:44:27 | and to use the data from both sides: |
|---|
| 0:44:30 | to learn how humans interact with each other, but also to teach machines and robots |
|---|
| 0:44:36 | how they can interact |
|---|
| 0:44:38 | with humans |
|---|
| 0:44:39 | so in a project which was already mentioned yesterday |
|---|
| 0:44:45 | we have collected a corpus of dyadic dialogues between |
|---|
| 0:44:51 | humans |
|---|
| 0:44:52 | and the dialogue data then had to be labeled |
|---|
| 0:44:58 | and we actually integrated active learning and cooperative learning |
|---|
| 0:45:05 | into the annotation work; so basically the idea is that the system actually |
|---|
| 0:45:11 | decides which samples should be |
|---|
| 0:45:16 | labeled by the human annotator, and it also decides which samples can be |
|---|
| 0:45:26 | labeled automatically |
|---|
| 0:45:29 | and so the human is asked to label examples |
|---|
| 0:45:34 | for which the classifier actually |
|---|
| 0:45:38 | has a low confidence |
|---|
| 0:45:40 | and with that approach we were able to |
|---|
| 0:45:47 | make the annotation process |
|---|
| 0:45:51 | significantly more efficient |
|---|
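A sketch of the uncertainty-based selection behind such a cooperative annotation workflow: low-confidence samples are routed to the human, the rest are labeled automatically; the model outputs here are stand-ins:

```python
# Sketch: split samples between human annotation and automatic labeling
# based on the current classifier's confidence.
import numpy as np

def split_for_annotation(probabilities, confidence_threshold=0.8):
    """probabilities: (n_samples, n_classes) from the current classifier."""
    confidence = probabilities.max(axis=1)
    to_human = np.where(confidence < confidence_threshold)[0]
    auto_labels = {i: int(probabilities[i].argmax())
                   for i in np.where(confidence >= confidence_threshold)[0]}
    return to_human, auto_labels

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
humans, auto = split_for_annotation(probs)
print(humans)        # [1]  -> low-confidence sample goes to the annotator
print(auto)          # {0: 0, 2: 1}
```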
| 0:45:53 | and this is basically an integration of the nova annotation tool and the ssi system |
|---|
| 0:45:59 | which i mentioned earlier |
|---|
| 0:46:02 | and for the interaction studies we actually annotated an additional phenomenon |
|---|
| 0:46:10 | namely interruptions |
|---|
| 0:46:13 | in |
|---|
| 0:46:15 | dialogues between humans |
|---|
| 0:46:21 | so let me come to a conclusion |
|---|
| 0:46:25 | i think that human robot interaction research cannot avoid treating |
|---|
| 0:46:34 | the problem of |
|---|
| 0:46:35 | appropriate social interaction between robots and humans |
|---|
| 0:46:40 | and this holds |
|---|
| 0:46:42 | in particular |
|---|
| 0:46:47 | if a robot is employed in people's homes |
|---|
| 0:46:49 | and what we need, of course, is a fully integrated system consisting of perception |
|---|
| 0:46:54 | reasoning |
|---|
| 0:46:55 | learning, and responding |
|---|
| 0:46:58 | and in particular there is at the moment a big gap between the perception |
|---|
| 0:47:04 | and the reasoning; so the reasoning is |
|---|
| 0:47:08 | kind of neglected |
|---|
| 0:47:10 | at the moment in favor of black-box |
|---|
| 0:47:14 | approaches |
|---|
| 0:47:15 | which are useful for |
|---|
| 0:47:18 | actually detecting |
|---|
| 0:47:20 | social cues such as laughter |
|---|
| 0:47:24 | but after that we need to reason about what |
|---|
| 0:47:28 | the social signal actually means |
|---|
| 0:47:32 | and of course interdisciplinary expertise is |
|---|
| 0:47:37 | necessary in order to emulate aspects of social intelligence; that's why |
|---|
| 0:47:41 | we cooperate a lot with, for example, |
|---|
| 0:47:44 | psychologists |
|---|
| 0:47:46 | and we have made a lot of our software publicly available, in |
|---|
| 0:47:51 | particular the ssi system for social signal |
|---|
| 0:47:55 | interpretation; and, as a result of our work on dialogue, there is also the visual scenemaker |
|---|
| 0:48:01 | a dialogue |
|---|
| 0:48:02 | authoring tool |
|---|
| 0:48:05 | that internally builds on finite state automata |
|---|
| 0:48:11 | and which has actually been connected to various virtual agents but also |
|---|
| 0:48:17 | to all kinds of |
|---|
| 0:48:18 | robots |
|---|
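A sketch of the design idea behind such an authoring tool: the dialogue flow is authored once and then rendered through interchangeable embodiments; the backend classes and the trivial two-node flow are hypothetical:

```python
# Sketch: one authored dialogue flow, multiple interchangeable embodiments.
class NaoBackend:
    def say(self, text): print(f"[NAO] {text}")

class VirtualAgentBackend:
    def say(self, text): print(f"[virtual agent] {text}")

SCENEFLOW = [("greet", "Hello!"), ("ask", "How can I help you?")]

def run(backend):
    for _node, line in SCENEFLOW:       # same authored flow, any embodiment
        backend.say(line)

run(NaoBackend())
run(VirtualAgentBackend())
```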
| 0:48:20 | and with this i am at the end of my talk; thank you |
|---|
| 0:49:46 | yes, so actually that's a good point |
|---|
| 0:49:50 | because |
|---|
| 0:49:53 | you could argue that, by design, the robot is at some point able to |
|---|
| 0:49:57 | recognize where the user is looking |
|---|
| 0:50:01 | at a much higher level of accuracy than any human would be |
|---|
| 0:50:05 | and some people, because they were aware of that, explicitly also pointed; and of course |
|---|
| 0:50:13 | pointing is a flexible kind of referring act |
|---|
| 0:50:19 | and in that particular video the discourse features are just there |
|---|
| 0:50:24 | for the illustration |
|---|
| 0:50:26 | whether somebody would see the benefits |
|---|
| 0:50:31 | of it quickly, or also adopt this kind of behavior, we just |
|---|
| 0:50:37 | observed here: we had people cooperating in direct contact with the robot |
|---|
| 0:50:42 | and the behavior was quite individual: some people use pointing, some people do not use pointing |
|---|
| 0:50:49 | but nevertheless the robot can usually still work |
|---|
| 0:50:55 | with |
|---|
| 0:50:59 | the gaze information |
|---|
| 0:51:02 | and because it was a controlled study, people usually believe they have |
|---|
| 0:51:08 | to perform, and so they are really concentrating on the task |
|---|
| 0:51:13 | and so that's probably why |
|---|
| 0:51:15 | they did not |
|---|
| 0:51:19 | appreciate the social gaze so much; okay, so it is not that the people |
|---|
| 0:51:24 | did not notice it at all: when turn taking was realized via gaze, the |
|---|
| 0:51:30 | dialogue was more efficient, because it was clear |
|---|
| 0:51:36 | when the robot was expecting user input; but in terms of |
|---|
| 0:51:42 | subjective evaluation, the users did not judge the robot's behavior as more natural or |
|---|
| 0:51:52 | more social |
|---|
| 0:51:55 | but in my case it's really a task-based |
|---|
| 0:51:59 | scenario |
|---|
| 0:52:00 | i did not have time to show the video of humans collaborating on the same |
|---|
| 0:52:07 | task |
|---|
| 0:52:08 | but we have |
|---|
| 0:52:10 | some examples of human-human interaction as well, not just the human robot interaction; and |
|---|
| 0:52:18 | interestingly, in those cases the two humans coordinated very well; in fact, when |
|---|
| 0:52:25 | a piece was being taken |
|---|
| 0:52:27 | or a piece was very close to being taken, they had to look at each other |
|---|
| 0:52:32 | and at the objects on the table |
|---|
| 0:52:34 | and this was quite interesting to see |
|---|
| 0:53:30 | that is because, actually, strictly speaking, we |
|---|
| 0:53:36 | acquired a robot which looks like a human: it has what |
|---|
| 0:53:43 | look like arms and what look like hands, and so intuitively people of course talk |
|---|
| 0:53:51 | to it; and it may well be that, in a more expressive condition, |
|---|
| 0:53:58 | it |
|---|
| 0:54:00 | could express itself |
|---|
| 0:54:02 | more clearly |
|---|
| 0:54:03 | and it was also easy for people to relate to the robot; so we |
|---|
| 0:54:10 | brought the robot to an |
|---|
| 0:54:12 | old people's home, and at first people were |
|---|
| 0:54:15 | rather reserved and said, you know, why should we, at home, |
|---|
| 0:54:21 | want to be treated by a robot |
|---|
| 0:54:25 | and then they said, okay, as long as the robot just coaches us, it's okay |
|---|
| 0:54:30 | because it cannot replace a human |
|---|
| 0:54:31 | anyway |
|---|
| 0:54:33 | and indeed, regarding the robot's performance, people |
|---|
| 0:54:39 | realized exactly when the robot had something pointed out, and |
|---|
| 0:54:45 | they actually started to treat the robot like a real person; they would, for |
|---|
| 0:54:51 | example, try to shake its hand |
|---|
| 0:54:55 | and sometimes they were also |
|---|
| 0:54:58 | surprised: there was one lady, she was |
|---|
| 0:55:02 | around a hundred years old, and what she did was really |
|---|
| 0:55:07 | remarkable: she touched the robot and said |
|---|
| 0:55:10 | it's just plastic! i had imagined the robot would feel strange, but |
|---|
| 0:55:18 | you get along with it |
|---|
| 0:55:21 | and i also think lots of people find it easier to open up and talk to |
|---|
| 0:55:26 | a robot |
|---|
| 0:55:32 | thank you for the question |
|---|
| 0:56:17 | it probably depends on the setting |
|---|
| 0:56:19 | because, for example, in |
|---|
| 0:56:22 | a texas hold'em game |
|---|
| 0:56:25 | people actually intentionally show a particular emotional state, whereas when people regulate emotions |
|---|
| 0:56:37 | in everyday life, they usually do not really |
|---|
| 0:56:39 | think about it |
|---|
| 0:56:43 | and there are |
|---|
| 0:56:44 | quite some subtle properties behind it, so that the general |
|---|
| 0:56:51 | question is whether, just by looking at these cues with |
|---|
| 0:56:58 | machine learning, we would ever be able |
|---|
| 0:57:04 | to recognize |
|---|
| 0:57:07 | the emotional state: what has actually happened |
|---|
| 0:57:11 | and in what situation |
|---|
| 0:58:00 | i believe that the face is quite important |
|---|
| 0:58:05 | so i was at a presentation by a company that was really proud of their |
|---|
| 0:58:10 | robot, and it did not have facial expressions; it did not have anything like that |
|---|
| 0:58:17 | and somebody in the audience said: i don't understand the point, it's just a loudspeaker |
|---|
| 0:58:25 | and what is the point? so i think |
|---|
| 0:58:30 | this brings it back to the face being important as well; and with |
|---|
| 0:58:35 | the |
|---|
| 0:58:35 | robot we showed before |
|---|
| 0:58:39 | that was possible: with the gaze and, apart |
|---|
| 0:58:47 | from that, the head pose actually |
|---|