0:00:15 Thanks for coming back for this session.
0:00:22 This is work by three students, mostly Sarah Plane, almost entirely in fact, and all of them undergraduates. They kind of converged at the same time and were interested in this, and now they've all moved on and are doing other things, so I'm just the person presenting it. My affiliation is Boise State.
0:00:42 If in the next couple of minutes you look up where Boise is, I'm not going to be offended. If you're wondering whether Idaho is a real state in the United States: it does exist, and Boise is the capital of that state. Boise State is a nice university, and I've really enjoyed being there. I run the Speech, Language and Interactive Machines group, a fairly young research group; I've only been there for about two years.
0:01:09 So, let's just start.
0:01:11 I actually want to draw attention to this bottom reference here. What we're doing in this paper builds to a large extent on the Novikova et al. paper, which came out of Oliver Lemon's lab with a number of collaborators. They did research on what was basically social robotics, which is pretty similar to what we're doing, and we followed a lot of their methodology here.
0:01:31 But here's what we wanted to look at: we have this little robot, and we wanted to do some language grounding studies with it. Then one of my students asked a question that we couldn't let go of. She said, well, are people going to treat this robot the way we want them to treat it, like first language acquisition? And I was thinking, well, I don't know, maybe we should study this. And that's actually what happened with this paper.
0:01:56 A lot of the motivation comes from all of the great work in grounded semantics and symbol grounding. There are lots of other people, not all mentioned here, but here are a few that we build on.
0:02:14 Here's the point: if you're a person and you're interacting with a child, and the child is learning language, the child doesn't know language to the degree that an adult knows language. The child sees an object, and the idea is that pretty much all objects have an annotation, a paraphrase or a single word or something. So the child sees this object here and maybe doesn't know the annotation for it, and so the adult says, that's a ball.
0:02:44 And the child remembers this, and it's quite amazing. This is kind of what grounding is doing. When you do this with a machine like a robot, it has to perceive this object somehow and represent this object somehow. A lot of the work up until now has been done with vision as the main modality for grounding language into some perceptual modality.
0:03:11 But once you have a robot, once you have an embodied agent, people start assigning anthropomorphic characteristics to it based upon how it looks and how it acts. As soon as they see a robot, they immediately think: is this a man or a woman, how tall is it, is it sympathetic, how can I interact with this thing, what can I expect? And really, as soon as someone says this is a robot, people think it has adult intelligence, and you don't want that if you have a first language acquisition task that you want the robot to do.
0:03:43 And that was the question my student asked: if we have this little robot and we want to do a first language acquisition task in a setting that is very similar to the way children acquire their language, we cannot assume that the people who interact with the robot are going to treat it like a child.
0:04:01 So that's what we set off to do: we want to actually predict what age, and what academic level, people assign to a robot. That's what we're working on here. The main research question is this: does the way a robot verbally interacts affect how humans perceive the age of the robot? The short answer is yes, so if you want to go ahead and put your head down and have a little rest, you can. But we can tease this apart a little bit and show you what we did.
0:04:31 We did an experiment. We had some robots, and we varied their appearance; there were three different ones, which we'll show you in a moment. We also varied the way the robots verbally interacted. We had participants show the robot how to build a simple puzzle; that was the language learning task, though learning wasn't actually happening. They were interacting with the robots in this very simple dialogue setting, and we recorded the participants: we had a camera pointed at them as they interacted with these robots, and we recorded their speech and face. After they interacted with each robot, they filled out a questionnaire about their perceptions. Then, after we gathered all this data, we analyzed it for facial emotions, prosody, and linguistic complexity, and we found correlations between these data and the perceived age, and from that we can predict it.
0:05:31 So these are the three robots we used, because we had them, and because we wanted one robot that was kind of anthropomorphic and one that wasn't. Here is the non-anthropomorphic robot, the Kobuki; it looks basically like a Roomba with a broken Kinect on it. Then this is Anki's Cozmo; I don't know if you can see it, it's a very small robot, it's marketed as a toy, and it has a nice Python SDK. And then we just had an unembodied, non-physical spoken dialogue system, which we affectionately called the non-robot: a non-robot, not a robot. So those are the three robots.
0:06:06 It's kind of embarrassing what we did with the robots, but we had two speech settings that we wanted to test, because we wanted to see how people treat the robot based on how it verbally interacts. The only speech we had the robots produce was feedback, and there were two settings of this feedback. One was minimal feedback, like "yes" and "okay," which was basically marking phonetic receipt. We call this the low setting: "I heard that," but whether or not the robot understood is left open. Then we had another feedback setting which marked semantic understanding, like "sure, okay, I see," or a repeat of something the participant said, to show "I understood you correctly." That's the high setting. These are all just feedback: the robot is not really taking the floor, there's not really a lot of dialogue going on here. But there are these two settings, and we found that they make quite a difference.
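To make the two feedback settings concrete, here is a hypothetical wizard configuration; the talk doesn't list the actual utterances, so these strings are illustrative, not the study's exact stimuli.

```python
# Hypothetical wizard utterance lists: the only speech the robots produced.
FEEDBACK = {
    "low":  ["yes", "okay", "mhm"],          # phonetic receipt: "I heard that"
    "high": ["sure", "okay, I see",          # marks semantic understanding,
             "so the square goes on top"],   # e.g. a repeat of what was said
}
```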
0:07:00 Other than that, the robots didn't move. On the Kobuki the light was on, and that was it. Cozmo, in its default setting, had these little animated eyes that are just kind of round, but they didn't do anything; it didn't move, and for the participant's task they were just talking. So with this we have six settings: three robots and two speech settings.
0:07:23 So the task was this. We'd set a robot down right here, whether the Kobuki or the Cozmo robot, or we just didn't put anything there for the non-robot setting, and we had these cameras here recording the participant. We also had these little puzzle pieces; I don't know if you recognize them. On this paper there are three different target shapes that they could make with these pieces, and each of these shapes had a name. The only instructions we gave the participants were: show the robot how to build each of these shapes, make sure at the end you tell the robot what the name is, and just do one after another. And what would happen is, as they interact with the robot, the robot gives some feedback, depending on the setting, as they talk to it, kind of interacting with them; but of course it was controlled by a wizard.
0:08:15 So the procedure went like this. We randomly placed a robot here; they'd interact with it, fill out a questionnaire about that interaction, and then we'd give them a new set of puzzle tiles and a new list of target shapes. They'd interact with the next robot and fill out the questionnaire again for that interaction, and then they'd have the third robot, with a new set of puzzle pieces and target shapes, and then the final questionnaire.
0:08:41 The things we randomly assigned were the robot presentation order and the order of the puzzles. We had two different voices for the Kobuki and the spoken dialogue system, from Amazon, a male and a female voice, and those were randomly assigned; Cozmo had its own voice. And then we had the language setting: the high or low language setting, which stayed the same for all three interactions. We just flipped a coin at the beginning, and then they would get that one for all three of them.
0:09:08 And so we collected data from the camera facing the participants, which was audio and video, and then of course the questionnaires. In the end we had 21 participants, ten male and eleven female, and each interacted with all three robots, so in all we collected 63 interactions and 58 questionnaires; five had to be thrown out because they weren't correctly filled out.
0:09:31 And then we moved on to the data analysis.
0:09:36 For each interaction with an individual robot, we would take a snapshot every five seconds and average over the emotion distributions from the Microsoft Emotion API. If you're not familiar with this API: you can send it an image, and it will give you a distribution over eight different emotions. So here's an example. Here's someone who's mostly neutral, with a little bit spread over the other emotions. Here's someone who's happy, with a little bit on the other ones. Here's someone who's mostly neutral, but there's more contempt; look at that contempt there. And contempt actually came up a little bit in our study. So we collected this data.
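As a rough illustration of that step (my sketch, not the study's actual pipeline), averaging per-snapshot emotion distributions over an interaction might look like the following; score_frame is a hypothetical stand-in for the Emotion API call, which was a REST service.

```python
# Minimal sketch: average per-frame emotion distributions for one interaction.
EMOTIONS = ["neutral", "happiness", "surprise", "contempt",
            "sadness", "fear", "disgust", "anger"]

def average_emotions(frames, score_frame):
    """Average emotion distributions over snapshots taken every 5 seconds."""
    totals = {e: 0.0 for e in EMOTIONS}
    for frame in frames:
        scores = score_frame(frame)        # one distribution per snapshot
        for e in EMOTIONS:
            totals[e] += scores.get(e, 0.0)
    n = max(len(frames), 1)
    return {e: totals[e] / n for e in EMOTIONS}
```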
0:10:13 And just to give you some numbers about what we found for the emotions: most of the time people were just neutral, and then about eleven percent of the time they were happy; surprise and contempt were the next most common ones. The other emotions were negligible, less than one percent on average, across all settings, all robots, everything.
0:10:33 But then we looked at the robots in the different settings individually. If you marginalize out the robots and just look at the low and high settings, we find that people spent a lot more time being happy with the robots in the low setting than in the high setting, and the low setting is just giving phonetic receipt. Part of this is that in the high setting the robot is marking that it semantically understood you, and people got really frustrated, because they expected more interaction from the robots, but the robots weren't doing anything more than giving this verbal feedback. So they weren't very happy with any robot in the high setting. And that's the picture here for the robots themselves: a little more happiness with Cozmo. They would rather interact with Cozmo or with the unembodied spoken dialogue system than with the Kobuki, for whatever reason.
0:11:22 And you can sort of tease them apart in their individual settings here; I'll refer you to the paper if you want to dig into more detail.
0:11:31 We looked at prosody, very simply: for each interaction we averaged the F0 over the entire interaction, which is maybe a couple of minutes of speech, and only over the participant, not the robot. Here are some results for that. If you marginalize out the robots, in the low setting people had a higher pitch, whereas in the high setting they did not. This kind of goes along with the literature: people who talk to children raise their voices a little bit, and that's what we want. Even this small difference in feedback affected the pitch, across all the robots.
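A back-of-the-envelope version of this prosody feature (the talk doesn't name the tooling, so this is an assumed setup using librosa's pYIN pitch tracker, with a placeholder file path) could look like:

```python
import librosa
import numpy as np

def mean_f0(wav_path):
    """Average F0 over one interaction, ignoring unvoiced frames."""
    y, sr = librosa.load(wav_path, sr=None)        # participant-only audio
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"),          # ~65 Hz floor
        fmax=librosa.note_to_hz("C7"), sr=sr)      # generous ceiling
    return float(np.nanmean(f0))                   # unvoiced frames are NaN

# e.g. mean_f0("participant03_cozmo_low.wav")      # hypothetical filename
```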
0:12:13 Then if you marginalize out the low and high settings and just look at the robots, people talked to the Cozmo robot at a much higher pitch than to the other two, and between those two the difference was kind of negligible: a little bit different, but not a whole lot. So for the way the robot looks and the way the robot talks, the prosody tells us that both make a difference here.
0:12:41 We then transcribed each participant's speech using a speech-to-text service, which of course can make some mistakes, but we just went with it, and we segmented the transcriptions into sentences by silence detection, which is pretty rough; we didn't look into it too much. We just took these transcriptions and passed them through some tools that gave us lexical complexity and syntactic complexity. For lexical complexity we have an analyzer which computes lexical diversity (LD), the mean segmental type-token ratio (MSTTR), and lexical sophistication (LS); these are nice measures that we can use. For syntactic complexity we used the D-Level analyser, which gives a value between zero and seven: zero means a very short, one-word or two-word, syntactically simplistic sentence, and seven means a long sentence with a lot of syntactic complexity.
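To make MSTTR concrete, here's a minimal sketch (my own illustration, not the analyzer the study used): split the token stream into fixed-size segments and average the type-token ratio over segments, which keeps the measure comparable across transcripts of different lengths.

```python
def msttr(tokens, segment_size=50):
    """Mean segmental type-token ratio over fixed-size segments."""
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens) - segment_size + 1, segment_size)]
    if not segments:   # transcript shorter than one segment: plain TTR
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    return sum(len(set(s)) / len(s) for s in segments) / len(segments)

# e.g. msttr("show the robot how to build the shape".split(), segment_size=4)
```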
0:13:42 With the LD, the LS, and the MSTTR, the results are very similar to the results we got for prosody: in the low setting people used more diverse vocabulary. The thing that was surprising, that I want to show you here, is the syntactic complexity. In the low setting we see higher syntactic complexity, more Level 7, more long sentences, versus the high setting. For the most part people are saying very short one-to-two-word sentences in all settings with all robots, but in some cases they're speaking in longer sentences. We dug into this a little bit and found some literature that supports what we see in our data: in the low setting, the robot is only giving phonetic receipt, it's not signalling semantic understanding, so people just kept talking. The sentences got syntactically more complex even if the vocabulary didn't, so the measures show low lexical sophistication but high syntactic complexity, because they just kept talking.
0:14:53 Looking at the questionnaires: for each interaction, the questionnaire we gave them had contrasting pairs, each with a five-point scale between them. Here are some examples: artificial versus life-like, unfriendly versus friendly, incompetent versus competent, confusing versus clear. And then we added the following two questions, which were the information we were interested in. First: if you could give the robot you interacted with a human age, how old would you say it is? We binned the ages in these ranges: under two, two to five, six to twelve, thirteen to seventeen, eighteen to twenty-four, twenty-five to thirty-four, and thirty-five and older. Second: what level of education would be appropriate for the robot you interacted with? That's sort of another proxy for age. We had preschool, kindergarten, then each grade with its own value, and then of course college.
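As a concrete representation of those two answer scales (the labels here are reconstructed from the talk; the exact questionnaire wording is in the paper, and the tail of the education list is my assumption):

```python
# Age ranges offered on the questionnaire, youngest to oldest.
AGE_BINS = ["under 2", "2-5", "6-12", "13-17", "18-24", "25-34", "35+"]

# Education levels: preschool, kindergarten, one value per grade, then college
# (the talk trails off here, so the final entry is an assumption).
EDU_LEVELS = (["preschool", "kindergarten"]
              + [f"grade {g}" for g in range(1, 13)]
              + ["undergraduate"])
```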
0:15:47 So, just looking at the questionnaires on their own: in the low setting people assigned, on average, lower ages, and in the high setting, on average, higher ages. That's kind of expected. Then looking at the robots: the Kobuki and the non-robot, the unembodied robot, get higher ages; I think people see the unembodied system as sort of the most intelligent, the smartest, the oldest. And then we have Cozmo here, which gets the youngest range, six to twelve. Not surprising. Education tells a similar story: the low setting gets, on average, a much lower, much younger education level than the high setting. And the difference between the settings is not much, right? It's just phonetic receipt versus signalling semantic understanding, just a different feedback strategy, but it makes a huge difference. And then of course the robots: people treat them differently. The highest Cozmo gets is about tenth grade, and the other ones get undergraduate.
0:16:54 Then we put what we found from the questionnaires together with some of the other features that we had. I want to point out a few things here. In the low setting, if you look at prosody, the average F0, and you look at these questionnaire values, they go up together; they correlate with each other. So if you have a higher pitch, it means we think you're friendly, intelligent, conscious, knowledgeable; and if you have high lexical complexity, we think you're more friendly. In the high setting different things come up: sensible, enjoyable, natural, human-like, and then lexical diversity and lexical sophistication. This one I think is interesting: in the high setting, if I'm using more complicated words to talk to the robot, I am more likely to be frustrated with the robot and to show contempt toward the robot.
0:17:45 And by the way, that was the interesting result: people had high expectations of the robot in the high setting. Well, you understood me well, so say more, do more. They would ask follow-up questions, and the wizard wasn't allowed to say anything other than this simple feedback. Some other results tell a similar story if you look at the robots instead of just the low and high settings.
0:18:10 It's kind of the same thing here: sadness is negatively correlated with this measure, and the other robots have some correlations as well; this feature is negatively correlated in the low setting, where it was the second most represented. You can dig into this a little bit more in the paper.
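For readers who want the flavor of this analysis step: correlating a per-interaction feature with a questionnaire scale is a one-liner in SciPy. This is only an illustrative sketch with made-up numbers, and the talk doesn't specify which correlation test the study used, so the choice of Spearman here is an assumption.

```python
from scipy.stats import spearmanr

# Hypothetical per-interaction vectors: mean F0 and a 1-5 "friendly" rating.
mean_f0_hz = [212.4, 198.7, 240.1, 225.3, 205.8, 231.9]
friendly = [4, 3, 5, 4, 3, 5]

rho, p = spearmanr(mean_f0_hz, friendly)  # rank correlation, robust to scale
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```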
0:18:30 So, to predict the perceived age and academic level: now that we have this data, we want to use our prosodic, emotion, and language features to predict the age. We have 58 data points, five-fold cross-validation, and we just use a simple logistic regression classifier; nothing terribly complicated here, since there isn't very much data. If we use all seven labels we don't do very well, but if we find a splitting criterion, say, let's split at eighteen years old and see how well it does, then we can predict fairly well whether someone thinks a robot is a minor or an adult. For academic level we did the same kind of thing, and we found that we can split at preschool with reasonable accuracy; so we can tell if someone thinks a robot is preschool age. Taken together, you can tell whether someone is assigning adulthood or minorhood to a robot, and furthermore whether they're assigning a preschool academic level to the robot. And that's actually what we want to be able to determine: do they think my robot is at preschool age, the language learning stage?
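A minimal sketch of that evaluation, under the assumptions the talk states (58 data points, binarized labels, five-fold cross-validation, logistic regression); the feature matrix and labels below are random placeholders, not the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: one row per interaction (prosodic, emotion, lexical/syntactic features).
rng = np.random.default_rng(0)
X = rng.normal(size=(58, 10))
y_age_bin = rng.integers(0, 7, size=58)   # the seven age bins, coded 0..6

# Binarize at the 18-years-old boundary: bins 0-3 are minors in this coding.
y_minor = (y_age_bin < 4).astype(int)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y_minor, cv=5)
print(f"5-fold accuracy: {scores.mean():.2f}")
```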
0:19:46 So with this, and some other analyses beyond what I showed you, we confirmed the findings of the Novikova et al. paper. The way a robot verbally interacts, and this is just the feedback again, and the way it looks change the way human participants perceive the robot's age and academic level, and the perceived age and academic level can be predicted using multiple features. Future work is what we've now kind of verified we should do: choose the right robot for the job. For a first language acquisition task, that means one that doesn't look like a human. And we thank you for your attention.
0:20:48 [Audience] I was curious why you used that split; preschool is really small children. You could have split education level in many different ways, right?
0:21:08 We did try a couple of other things, and not only did that one work, it also makes some sense: minors versus adults seems like a reasonable splitting criterion, so let's use that. Of course it's not the one we're looking for, which is: do they treat it like a child? And that's what the preschool one does pretty well. It just worked out that way, I'm sorry.
0:21:44 [Audience] When you have this chart of the perceived age for the low and high settings, looking at the two parts [partially inaudible], the low setting is more likely to be perceived as being a child, but also more likely to be perceived as being an adult; it's really unlikely to be a teenager. There are these pesky undergraduate assignments.
0:22:19 Yes. In general, if you look at the academic level here, this one gets a fair number of assignments, and this one gets an additional tenth grade or so, but there's a lot more preschool here, and more kindergarten and first grade here. On average it is rated quite a bit younger, but there are some people who rated it high, and that's quite interesting.
0:22:57 [Audience] I may have missed something, but in your questionnaire slide you were saying people had expectations of the robot. I was wondering if this was what people told you, or your explanation of the data based on other kinds of things that you found, like assessing the robots as knowledgeable and so on.
0:23:30 So the Q items are what they said on the questionnaires; the other prefixes, like P, L, and E, mark the other feature types. Q means it came from the questionnaire, E means it came from the emotion data we got from the Microsoft Emotion API, which we just read off of it. So we have what they're telling us, and we have what we're getting from the data we collected, and the correlations we computed from that. So yes, it is our interpretation. With contempt, for example, we're saying: okay, in the high setting we detected, using our tools, that they used high lexical diversity, and we detected from the emotion data that they had high contempt, so in this case those things were correlated. But the other items are what they reported, like whether they thought it was enjoyable or sensible or whatever; so in the low setting, when there was high lexical sophistication, they would have given a high score on the questionnaire.
0:24:34 It's all testable.
0:24:36 [Audience follow-up, inaudible]
0:24:52 Yes, okay.
0:24:53 Thank you.