0:00:15welcome to the special session so actually the time has passed so let's start
0:00:21now so this year we proposed the special session entitled future directions of dialogue based
0:00:28on intelligent personal assistant so i'm enrichment cms from carnegie mellon and
0:00:34and i'd exp opengl is from toshiba research
0:00:38so next meeting so we have one hundred i'm sorry we have one hour
0:00:43from now
0:00:46i sorry seven
0:00:51yes so today
0:00:54it's the era of the personal assistant so many tech giants
0:01:01released dialogue-based personal assistants including amazon google and microsoft and many as a
0:01:09front end of their whole services so it's a big deal so now dialogue
0:01:16system research is
0:01:18is very crucial for that kind of services so we all dialogue
0:01:24system researchers
0:01:26are rock stars in you know society in this era i believe so that's why
0:01:31we actually proposed this special session to discuss our next future direction
0:01:37or vision
0:01:38so
0:01:40let's get started
0:01:42so this is today's agenda so we're gonna quickly give an introduction to establish
0:01:50the common ground and then the panel discussion
0:01:54so we have four notable panelists from academia and industry so let us introduce them
0:01:59maybe later and then q and a
0:02:03so this is actually kind of very flexible kind of discussion so we're happy to
0:02:08get your questions and yelling from the audience so we're happy to have your
0:02:16opinions anytime
0:02:19so
0:02:21by the way so what's a personal assistant now so you know what it is
0:02:25so this is siri or cortana from microsoft or facebook and also
0:02:32that's so
0:02:33amazon so they are maybe we can call them personal
0:02:37assistants but different kinds but
0:02:40assuming they are personal assistants so anyway wikipedia says that a personal assistant is
0:02:46like this so it's an agent that can perform tasks or services for an individual so
0:02:53basically that's it so in this session we're gonna define personal assistants as something like
0:02:59this just simply personalised task management plus spoken dialogue capability
0:03:05so that is
0:03:07that's our
0:03:09definition of personal assistants as a common ground in this session so that's
0:03:15it
0:03:16so
0:03:17so let's look back a little bit at
0:03:20the past
0:03:21so
0:03:22we think the current personal assistant has two major streams one is
0:03:31task management and the other is spoken dialogue research i think i'm not the right person to
0:03:37describe that history
0:03:38but so on the side of personalised task management so the best
0:03:47vision one of the visions would be apple's knowledge navigator so that
0:03:55was actually a very visionary vision so
0:03:59it's actually i believe siri just follows that vision
0:04:06i believe also in two thousand three darpa announced the pal project which is
0:04:12a very big project so i'm gonna describe it a little more
0:04:16so this is the knowledge navigator so
0:04:20some of you most of you may already know this but
0:04:24this is actually a video so a very visionary video even now it is very
0:04:29interesting so we don't have time is i
0:04:36okay i
0:04:39this research cheating one just checking
0:04:44a short circuits last year postech second extension was translated this mary his i am
0:04:50i right this is sufficient
0:04:55is that in i started lots of twelve o'clock
0:04:58you need to take there are actually on schedule and
0:05:02and you have a lecture on deforestation in the amazon rainforest
0:05:10let me see the lecture notes from last semester
0:05:18no that's not enough
0:05:19i need to review more recent literature pull up all the new articles i haven't read journal
0:05:24articles only
0:05:26find your financial gilders has probably still there are two for station is a it
0:05:32dialogue it is also rainfall some sarah
0:05:35it also covers a classifier absolute reduction in africa
0:05:39and increasing importance of so
0:05:42context you like this one but sorry i'm sorry should increase for secure features it's
0:05:46so please go to youtube there will be the
0:05:50famous video so even now siri for example cannot be at that kind of quality
0:05:55so meanwhile darpa
0:05:59announced their big project called pal perceptive assistant
0:06:04that learns
0:06:06and then darpa awarded it to sri and cmu so actually
0:06:13pal is a common general architecture and calo and radar
0:06:18from cmu are
0:06:22each instances of the pal architecture
0:06:24so this was the pal slash calo architecture so the main focus was learning
0:06:31so learn from the user
0:06:35and calo had a lot of capabilities in terms of task management
0:06:40so for example one of the most dialogue-related
0:06:47capabilities was the meeting assistant i think so it has
0:06:52for example dialogue act detection also summarisation and so on
0:06:57cell
0:06:58the other kinds of
0:07:01verincation there
0:07:03so and then radar radar is i think if i
0:07:06know this correctly so that was not so much a dialogue
0:07:11system but it was an email management and auto scheduling task agent
0:07:19so and then siri this was the most used
0:07:26slide of siri
0:07:29so it was the very first agent that had
0:07:35spoken dialogue and also task management so but
0:07:40as you can see the conversational dialogue management
0:07:44interface used to have very small capabilities so this is you know
0:07:50that's the past and
0:07:54but i'd add a few comments about where we are now and then maybe start
0:07:58the conversation
0:08:00like many people say we are
0:08:04going through the spring of artificial intelligence
0:08:07and we are also transitioning into a new era of intelligent personal assistants
0:08:12there are many factors that have led to this
0:08:15and
0:08:16this can mean
0:08:17assistants that maybe we can build long-term relationships with whether it is one day
0:08:23a year or a lifetime
0:08:26and that these a sequence could be so sure
0:08:29so many factors contribute to this but i think there are two main ones mostly
0:08:35the advancements in hardware we have cheap and powerful hardware and we have
0:08:41a lot of pervasive smart devices whether they are smartphones rings bracelets whatever
0:08:48so this creates a lot of big
0:08:51and very useful data so we have
0:08:53powerful machines and big data that we can put to work
0:08:58this enables us to tackle problems that were previously
0:09:02a little harder to solve
0:09:04and this is evident by the availability of the tools that we have now on
0:09:08the web
0:09:08whether they're open source or not
0:09:11so tools like nucleus are for speech recognition or bot frameworks or
0:09:18many other things
0:09:20how can we combine all of these things into an intelligent personal assistant and
0:09:25one of the ways we can think about it and of course this is open
0:09:28to debate
0:09:30is that we can split the assistant into the cognitive functions that it has and
0:09:35the communication channels
0:09:37so the assistant
0:09:39needs to be able to reason about the world about the knowledge about everything it needs
0:09:44to be able to communicate with the user
0:09:46so it needs human computer interaction
0:09:49and it also needs a lot of interfaces to communicate with devices whether that is your
0:09:53light or smartphone
0:09:55car wearables anything
0:09:58i either
0:10:00so maybe i'd like to mention as was also mentioned earlier
0:10:04that an agent needs to handle multiple complex tasks maybe sometimes
0:10:10other characteristics are like in the video we showed before
0:10:14we need seamless and context aware
0:10:19understanding and generation we need maybe the
0:10:23ability to incorporate new knowledge into what the agent knows and so on
0:10:27and there are other sorts of challenges like for example communicating with all of these devices
0:10:32maybe it is not a very interesting thing research-wise for you know new students
0:10:36but it's a very big problem when we try to build these assistants
0:10:41and the last thing i wanted to mention
0:10:44was that
0:10:46the agent needs to be able to handle the evolving relationship with the user like
0:10:50i mentioned before maybe it's for a day or a year or maybe for a
0:10:54lifetime
0:10:55it needs to be able to reason about the world
0:10:59and this sort of
0:11:01tracking the context over time like i show here so events change things change in the
0:11:07world and we need to be able to refer to the past the future
0:11:09the present
0:11:11and these are all points for
0:11:14discussion so what is the future of the personal assistant
0:11:18we have here
0:11:21four topics for discussion just to start our conversation
0:11:27what is the current state
0:11:28of personal assistants in research and in industry
0:11:33what are some big technical bottlenecks that absolutely need to be solved before we can
0:11:37get the next generation personal assistants
0:11:40how can we handle big data
0:11:42in terms of collecting the data what kind of data do we need
0:11:45how do we manage privacy issues how do we you know all of these things learn
0:11:50from data
0:11:51how do we process
0:11:53store and manage all of these things and then we have
0:11:57a topic about the future a vision of the future of personal assistants
0:12:01what it can and what it cannot be
0:12:05and so we have four notable panelists
0:12:11i will keep the introductions short in the interest of time
0:12:15first we have professor steve young who is a professor of information engineering
0:12:21at the information engineering division of the university of cambridge
0:12:26he has a long track record of research on spoken dialogue
0:12:30particularly speech synthesis recognition dialogue management among other things
0:12:34he is a recipient of a number of awards for his contributions
0:12:37and i'm just going to name a few
0:12:39the ieee signal processing society technical achievement award
0:12:43the medal for scientific achievement from isca
0:12:47among other things
0:12:48the ieee james flanagan speech and audio processing award in the interest of time i'll just say he has won many other
0:12:54awards best paper awards
0:12:57on proficient german
0:12:59then we have thomas schaaf
0:13:03who is a senior speech scientist at amazon
0:13:05where he has worked on several voice enabled products of amazon
0:13:09and he is leading a group of researchers currently working on models for i
0:13:14think several services and he also holds an adjunct position at the language technologies
0:13:19institute at carnegie mellon
0:13:21he has a lot of experience in speech recognition and translation and in the past he has worked for
0:13:25sony multimodal technologies and toshiba research
0:13:30we have professor jeffrey bigham who is an associate professor in the human-computer interaction institute at carnegie
0:13:36mellon
0:13:37with a long track record in crowdsourcing
0:13:40and crowd powered systems and in work on natural language applications
0:13:45prior to joining carnegie mellon he was an assistant professor at the university of
0:13:49rochester
0:13:50because it should main action a word someone weights and then it's of cardio to
0:13:54were either respond which as well as you can see what one of
0:13:58mit technology review's thirty five innovators under thirty five
0:14:02and we have zhuoran wang
0:14:05who is a co-founder and ceo of trio dot ai
0:14:09a startup company that develops conversational interfaces
0:14:14he has many years of natural language processing and spoken dialogue systems research and development
0:14:19experience
0:14:20he formerly worked for baidu toshiba research heriot-watt
0:14:25and university college london from which he got his phd
0:14:31so next we will ask our panelists to introduce themselves with a little bit of
0:14:37position talks
0:14:39after that we will use seed questions and whenever you want you can raise your
0:14:44hand and
0:14:45ask different questions
0:14:47so
0:14:50i start okay
0:14:53there we have an from question which of
0:14:57it should be written down
0:15:01current status and bottlenecks i don't really think there's very much
0:15:06to say about that
0:15:09you may disagree but i think broadly speaking
0:15:13we have enough in place that we know how to build the different bits
0:15:18of a system
0:15:19from speech input through understanding response generation
0:15:25interface to the backend
0:15:28i mean when people use siri or whatever now we pick up the edge cases and
0:15:34we laugh at you know siri's failure to do this and so on
0:15:38but if you actually focus on what these systems can do
0:15:42and you compare with what they could do five years ago it's actually pretty remarkable progress
0:15:47in my view
0:15:49it's mostly engineering
0:15:51and i think that over the next few years it will mostly be engineering
0:15:56that makes these systems ever more capable broader coverage and makes fewer stupid
0:16:04mistakes
0:16:06within each individual sort of subtopic it's clear we can all do better
0:16:13and i think there is no shortage of things we're interested in
0:16:17research to focus on but fundamentally my view is that there isn't a huge kind
0:16:23of missing piece that no one knows how to do and until we solve that
0:16:29we can't build a virtual personal assistant not one like in the film which i
0:16:35haven't seen by the way well i started watching it and i
0:16:39think i fell asleep after twenty minutes
0:16:41people told me i should watch it but i never actually managed it but we're not
0:16:45gonna get to that stage any time soon but i think there's
0:16:50a long way to go and it's mostly engineering
0:16:53i think one of the big problems that perhaps this kind of community has
0:16:58is the data problem
0:17:02there is
0:17:03i've worked in spoken dialogue for a long time
0:17:09mechanical turk
0:17:13in a sense revolutionised what we could do
0:17:16because
0:17:18once mechanical turk became widely available we could build a system we could
0:17:23deploy it and we could pay people to use it
0:17:27and we could get several thousand dialogues if you like in
0:17:32a reasonable amount of time and we can measure performance
0:17:37but apple now processes about a hundred million
0:17:42or so siri conversations a week
0:17:46and i'm sure google
0:17:49and amazon and facebook are handling similar quantities now the kind of machine learning
0:17:55you can do on that with that kind of data throughput is really very different to
0:18:01what any academic can even contemplate doing
0:18:05now
0:18:06i think one of the issues is really how academia and industry if you like
0:18:13work together
0:18:14so that the academics can actually focus on the real datasets
0:18:21where the real information is and the real data flow and find ways
0:18:27to work
0:18:28and i mean that leads me onto what is in my view one of the
0:18:33biggest
0:18:34questions about taking these systems forwards and that is the privacy issue it's something
0:18:41that different companies have a different take on
0:18:45but i think the public at the moment are pretty much asleep
0:18:50on this issue
0:18:52many people don't know what the privacy issue is
0:18:57issues are when you sign up to use siri for example you scroll through
0:19:03this stuff no one reads
0:19:06well almost no one reads you click the button and you agree and most people
0:19:11have no clue what it is they agreed to so
0:19:14apple has you know really rather strict privacy
0:19:19protocols
0:19:20and actually its researchers
0:19:24don't get to see private information and i can't speak for the other companies
0:19:31but it seems to me that
0:19:34these are issues which need dealing with and i think we need more transparency
0:19:39and maybe even some legislation but without transparency and some clear rules that sort
0:19:46of everyone's working to i think we're going to come unstuck because there is
0:19:51a danger that something happens which is not good
0:19:54and there is a backlash and then these systems become
0:19:59i don't know
0:20:02degraded and people don't use them for reasons that are just fear
0:20:07of the privacy issue but there's also some very interesting research that you can do
0:20:13so one of the things that we work on at apple is differential privacy
0:20:20the basic idea is if you have a client and you want to collect data
0:20:26from them what you can do is you can collect the data on the
0:20:30on the device and then you add noise to it you add sufficient
0:20:35noise that you can't actually any longer identify the person or
0:20:40any of the private content but when you take that data and you
0:20:44aggregate it with the similarly noisy data from a hundred million devices you effectively filter
0:20:52the noise and get the statistics you're looking for
0:20:55without ever seeing the private information that was on any individual's device
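[editor's note] A minimal sketch of the idea described here, using randomized response, which is one classic local differential privacy mechanism. It is an illustrative stand-in and not a description of any company's actual system. The function names, the epsilon value and the simulated population are all assumptions introduced for the example.

```python
import math
import random


def randomize_bit(true_bit, epsilon, rng):
    """Runs on the device: keep the true bit with probability
    e^eps / (1 + e^eps), otherwise report the flipped bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else 1 - true_bit


def estimate_rate(reports, epsilon):
    """Runs on the server: debias the mean of the noisy reports to
    estimate the true fraction of 1s in the population."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1 - p)*(1 - f)  =>  f = (observed - (1 - p)) / (2p - 1)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def simulate(n_devices=200_000, true_rate=0.3, epsilon=1.0, seed=0):
    """Hypothetical population: each device holds one private bit."""
    rng = random.Random(seed)
    true_bits = [1 if rng.random() < true_rate else 0 for _ in range(n_devices)]
    # The server only ever sees the noisy reports, never true_bits.
    reports = [randomize_bit(b, epsilon, rng) for b in true_bits]
    return estimate_rate(reports, epsilon)
```

Any single report is deniable because it may have been flipped, yet the aggregate estimate gets closer to the true rate as the number of devices grows, which is the "filter the noise" effect the speaker mentions.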
0:21:00i think there's very interesting research starting to emerge along those lines and maybe
0:21:06it's a route to being able to make that kind of information more
0:21:11accessible to academics
0:21:13if there are the right channels for doing this which claim to protect the individual privacy but
0:21:19actually allow the data to be more widely engaged i don't know
0:21:23and the final thing i just wanted to say really briefly was on the
0:21:27vision thing
0:21:30i think there are really big stakes for the companies that are doing this
0:21:35because
0:21:37first of all i think we will move to a situation where
0:21:42individuals have one personal assistant and they use that personal assistant for everything
0:21:48why would you want to switch if this one personal assistant knows everything about you
0:21:53it knows your history your timeline what you like what you don't like what you
0:21:58did a week ago a year ago and that kind of information would be a
0:22:03real service to that person
0:22:05now that's going to be a stickier
0:22:11thing than anything that you might have with anything that's been before so facebook
0:22:15wants everybody always talking to facebook
0:22:22and so they try to make facebook as sticky as they can and amazon and
0:22:26various others will try to do the same thing but once the virtual personal assistant really
0:22:32gets going then you will have your one and it'll be very difficult very
0:22:36hard to think about switching to anyone else's
0:22:40so if people really start using court on their in rows of this time to
0:22:45look you know larry data from three or too much
0:22:48obvious tracking which can we are working for conducted a larry but anyway so
0:22:54whether it's siri cortana alexa
0:23:00if people get really attached to these they won't
0:23:06leave and then the money will start to flow in due course
0:23:11so who owns the personal assistant is gonna be a very big deal in the
0:23:16future
0:23:21thank someone
0:23:26that with
0:23:27like word
0:23:30or not
0:23:31the with some or all
0:23:34is to actually greater this topic like
0:23:40and compared to other processes not exactly hours of the
0:23:44not a one-to-one it really cool you a bit about that rollment feature vectors goal
0:23:51of money
0:23:52that is the different for your processes
0:23:58nevertheless so alexa voice service has the goal to bring alexa everywhere so
0:24:04i
0:24:06and also doesn't degrade offering
0:24:10it's a service
0:24:12so i to integrate this it will
0:24:15all different sorts of artwork therefore the that you might actually
0:24:21have
0:24:22a lexical or also
0:24:25it's just all over to like your foot or
0:24:29in your research
0:24:32and about the visual the us to
0:24:36right a space where you actually always have access to one of the person looks
0:24:40as a system
0:24:45the one of the bottleneck
0:24:48there i think it is likely that all of what you have a big
0:24:53number of devices to what it's looked at all
0:24:57that you want to you interact with the okay so that many different devices
0:25:04and you where what is the one can do and
0:25:07to enable is kind of six q
0:25:11i've to scale
0:25:13capabilities of
0:25:18what the system is able to do
0:25:21so one way to do that "'cause" i think that
0:25:25that's right up to
0:25:27enabling us to develop skills looks a little bit
0:25:32i think what has something similar to that
0:25:36of things about which allows
0:25:40enabling
0:25:41a number all a pattern it
0:25:45telling the functionality of the system
0:25:50to make life of the word
0:25:52see
0:25:53comparable
0:25:57and accuracy problem rather
0:26:00coupled a perspective
0:26:04as the most important thing right so you have to a great value for the
0:26:09people having a process that's one thing but actually utterances to something useful that adding
0:26:15more or
0:26:20with respect to data
0:26:23is a
0:26:25privacy is really a
0:26:28topic also so
0:26:32we have a companion app where you can go through this and you can see exactly
0:26:37what alexa heard
0:26:40and request
0:26:42for an utterance to get removed
0:26:46so that's very important to keep the trust
0:26:54of the customer so i don't know
0:26:59using the other companies but not on
0:27:04it's below
0:27:10all of the utterances
0:27:13having so many different devices actually or planning to support so many different devices us
0:27:19a kind of representation
0:27:22for building statistical model is one of
0:27:26problem
0:27:28we have to write an article so you don't wanna send every time you have
0:27:32you like your
0:27:34data vector notation
0:27:37for
0:27:38from a doctor
0:27:40what i
0:27:41feasible
0:27:43but it's redundant and a wasteful
0:27:47so
0:27:49wrong
0:27:50from the perspective of scalable annotation scheme i think
0:27:54i think that is
0:27:55what role
0:27:58all
0:27:59otherwise
0:28:01like i think scale actually and we understand what we want to do or what
0:28:06to
0:28:07as i can listen to what
0:28:12the customer service
0:28:14and actually
0:28:16no
0:28:18like to pose if a customer service
0:28:20that's what color and they tell us what's wrong
0:28:26and
0:28:28but isn't
0:28:32one way a novel way
0:28:35is quoted a system for a lot of local
0:28:40people discuss to go
0:28:43for make records the true
0:28:47what
0:28:49what do cool
0:28:52this respect to dialogue a cue at lexical walls one
0:28:58there are several
0:29:00so it is not likely to the pride
0:29:04i
0:29:07the i think dialogue this will definitely something that a lexicon who are right now
0:29:13it's really want like from a machine so i wanna but on the light can
0:29:17spectral light
0:29:19if it doesn't really realise
0:29:23recognise which light you want to turn on so then it will have
0:29:27a short conversation with you but it's really very
0:29:32task-oriented in that sense
0:29:35a hurting
0:29:36a longer conversation to collect at the moment is this
0:29:40what we are not
0:29:42and
0:29:43i think that was to question about how to commercialise
0:29:47systems okay i think you really have to create the trust in the process
0:29:52at
0:29:53that a that
0:29:56the system will work well so if someone tries it and it doesn't work
0:30:01they won't go for it again
0:30:04you lost a personal to we can with some criterion
0:30:09and
0:30:11models one for
0:30:14what is
0:30:15actually i think it is sometimes better
0:30:17doing something small well than to overpromise
0:30:23and
0:30:26one of the technical bottlenecks that i currently see is related to machine learning if
0:30:31you get more data
0:30:33you do not always converge to the same local minimum
0:30:38which functions
0:30:40and
0:30:42people they get a better experience but for some people it breaks right
0:30:46so you have to have a way to make sure that actually not
0:30:51too much regression happens for large crowds of people or actually sometimes things
0:30:58in general are great a very with all
0:31:03and
0:31:04from and former four point of view the mechanism to make sure
0:31:12can fix to use it so it's kind of an engineering problem also
0:31:17to design a system that
0:31:19allows you to
0:31:22grow the system
0:31:24in production and make it more
0:31:30maintainable
0:31:39why have slight of
0:31:41i is a remember
0:31:45well i'm jeff bigham at carnegie mellon and i'm in the hcii and lti
0:31:49institutes there and so i'm here because my group over the past couple of years
0:31:54has been developing crowd powered dialogue systems which i'll introduce you to
0:32:00and over the past couple of years we've also been working to automate them so i'll kind
0:32:04of explain what that means
0:32:06so i was given some questions as was mentioned and so you know where are we
0:32:11well as we all know people are actually using
0:32:15using these systems now they're talking to their devices which is pretty exciting right
0:32:19many of you raised your hands about alexa i talk to my watch and
0:32:24i'm not always just talking to myself
0:32:27clearly most people i've interacted with have a few specific functions that they use those
0:32:35devices for that they've learned somehow those devices are pretty good at and so
0:32:40to kind of illustrate this point i was at the local library the other
0:32:44one open actual value your go
0:32:46and i found this book
0:32:48and this book is called
0:32:51talking to siri it's a great book
0:32:56it will tell you all of the things that you can talk to siri about
0:33:01and the recently reasonably large but which is pretty impressive at the bottom recorder and
0:33:05may happen shriek utterance but and it update now maybe it about inspect now
0:33:11but i think that that's where we're at we're at the point where
0:33:14we have systems that can reliably do a few functions actually you know a fair
0:33:18number of functions and we're teaching people how
0:33:21to access those functions
0:33:24and so what we've been trying to do is put out a system a crowd powered
0:33:28system that explores what we might be able to do if our systems could
0:33:34be as robust as a human assistant and so our system is called chorus which
0:33:38we developed it's a crowd powered system and the way it works is that people
0:33:43talk to it via google hangouts so they can talk to it with
0:33:46speech recognition or type to it those messages go to a crowd that we
0:33:51recruit on demand within a
0:33:54a minute or so we get a group of workers who then suggest responses
0:33:58and other workers vote on whether they think those responses are good and if they are good
0:34:01we forward them back to
0:34:03the user and if the user wants they can reply and some of the same
0:34:08workers and maybe new workers who have joined because the others have left will then respond
0:34:12and they can have a dialogue in this fashion we have explored how we might
0:34:17maintain consistency over time so there's a memory space over here where the crowd workers can
0:34:22kind of record facts they learned about the user as they have a conversation with
0:34:27him or her so maybe i've learned that you are allergic to shellfish so
0:34:32the next time you ask me for a restaurant recommendation i should not recommend a
0:34:36seafood place
0:34:39chorus can do all kinds of different things and because it's people maybe it isn't so surprising that chorus is
0:34:43pretty good at responding
0:34:45people have asked about travel and for some idea of how to make spaghetti
0:34:48all kinds of crazy things
0:34:50and you too can ask crazy things
0:34:53by going to talkingtothecrowd dot org
0:34:57and i encourage you to try it i wouldn't say it's perfect right
0:35:01we're doing a lot of things in the backend to try to curate responses but
0:35:07i think you'll be
0:35:09surprised and even though you know it's people i think that you'll be surprised at
0:35:14the breadth and robustness of responses
0:35:18so what right so that might be what you're thinking so what
0:35:23this is just people talking to people that's not ever new it's the
0:35:27most obvious thing in the world okay and you're mostly right i mean there are
0:35:31some challenges when we introduce an unprepared crowd where they're only doing this for a
0:35:36short time and they've never done it before
0:35:38and if you work with mechanical turk you might be surprised we can get
0:35:41people quickly and that they do more or less the thing they're supposed to do
0:35:44you might be surprised at the quality of answers you get back but again so
0:35:48what well one reason why we might care about this is that
0:35:52by deploying a system that we wish we could automate
0:35:56we might learn about what we don't know how people actually want to interact with
0:36:00a system like this we get a lot of insight i think into that by
0:36:04deploying a system something that you don't necessarily get in artificial scenarios
0:36:09we can do data driven improvement right so we're actually collecting a bunch of data we'll
0:36:13release it as we go and it's real data from real people asking
0:36:18you know questions you know the first question or two they ask because
0:36:21they're curious but eventually because they actually wanna
0:36:24know the answer i think maybe the more interesting one though is that we're
0:36:28thinking about hybrid workflows that combine automation with people so two examples of
0:36:34things that we've worked on just a taste to give you a sense of you
0:36:38know where this is going the first is a system called guardian which is a crowd
0:36:42powered dialogue system
0:36:44for web apis and so the gist is we use a mostly non-expert crowd
0:36:48so mechanical turk workers to convert the apis on programmable web into little
0:36:56dialogue systems which then the crowd helps to run so they do the slot filling
0:37:01they do transitioning of states and
0:37:03formulating of responses
0:37:05and what's kind of neat about that is that we're collecting data at different
0:37:09levels right so we're actually having
0:37:11the crowd provide data not only you gave me a message and i provided a
0:37:16response which may be difficult to learn from but at the lower level that
0:37:20you need to start running the dialogue system with the crowd
0:37:25and then we've tried to push this away from just information queries into
0:37:28actions that is to have the crowd start to work with the user in a
0:37:33dialogue to create rules that their phone can then run i don't know how many
0:37:37people have used something called ifttt if this then that
0:37:40okay so it's basically a way that you can set if then rules of
0:37:43things like you know i was
0:37:45i was late for a meeting this morning the crowd might ask why were you late
0:37:49well it snowed last night and something along with that i had to clear off my car and
0:37:52so from that they can work with you to say well if
0:37:56i had access to an api that says it snowed overnight then
0:38:01i may alter your alarm to be a little bit earlier so you
0:38:04wake up
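[editor's note] An if-then rule of the kind discussed here, one that moves an alarm earlier when some condition holds, could be represented as in the rough sketch below. This is not IFTTT's API or the speaker's system. The `Rule` class, the `snowed_overnight` context key and the twenty minute shift are assumptions made up for the illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Rule:
    name: str
    trigger: Callable[[dict], bool]          # inspects a context dictionary
    action: Callable[[datetime], datetime]   # returns the adjusted alarm time


def apply_rules(rules, context, alarm):
    """Apply every rule whose trigger fires, in order, to the alarm time."""
    for rule in rules:
        if rule.trigger(context):
            alarm = rule.action(alarm)
    return alarm


# Hypothetical rule a crowd worker and user might agree on in dialogue.
earlier_if_snow = Rule(
    name="wake earlier after overnight snow",
    trigger=lambda ctx: ctx.get("snowed_overnight", False),
    action=lambda t: t - timedelta(minutes=20),
)

alarm = datetime(2016, 1, 15, 7, 0)
adjusted = apply_rules([earlier_if_snow], {"snowed_overnight": True}, alarm)
```

Keeping the trigger and the action as separate fields means the dialogue only has to fill in two slots, the condition and the change, before the device can run the rule unattended.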
0:38:06right so it turns out that this idea of using people along with your automated system
0:38:11is not actually new right so most software company is all words are software company
0:38:17the many startups have efforts in this paper they have them and sometimes a very
0:38:20exploded so we already have is but in mention
0:38:24so that is creativity vc this is one of a call centres what have you
0:38:27know their crowd their workers through a obviously can and you can guarantee more about
0:38:32confidentiality in other things well there's another example is likely to be your artwork and
0:38:37things like that i think what's really interesting about this is that
0:38:42we don't have to just rely on automation anymore right not rely on just automation
0:38:46right so whether it's a call centre like this or it's apple engineering you know
0:38:51more and more templates that can respond to specific functions that it can
0:38:56support we are actually relying on people or amazon building out this skills feature
0:39:01so that
0:39:01its crowd of developers can build more skills into alexa we are kind of
0:39:06relying on this and so here i'm just saying well maybe we can even
0:39:10push this further right so we push it out so that this can happen on
0:39:15the fly with a complete non-expert crowd and so your dialogue system with a little
0:39:20bit of human input
0:39:21might be able to do whatever you want the first time you ask
0:39:26f me thank you
0:39:36okay and
0:39:38by nist for a long span prosody and i'm very honoured to be invited
0:39:42to this panel and have the opportunity to meet and exchange ideas with the three
0:39:47senior pioneers in this area
0:39:50and i'm speaking on behalf of the new it is about to start out in
0:39:55china that i can finally the name that really i so please all need to
0:40:00briefly introduce what about doing a trail
0:40:03so what we are working on right now is a chinese conversation platform
0:40:09for creating conversational interfaces like chatbots
0:40:13so i mean so like sends it special because
0:40:18i see a lot of the last the technique that i saw reads as
0:40:22i see a lot of advanced technology has been developed for english and a
0:40:27lot of major european languages but chinese is a language that has a higher
0:40:32complexity is the independence facts
0:40:36so there would be a lot more challenges in processing chinese for example
0:40:41like knowledge graphs google or other big companies like microsoft have
0:40:47put a lot of effort into building knowledge graphs and they are
0:40:52really ready to use but in
0:40:56chinese i mean the knowledge graphs currently available to be used
0:41:01are much less reliable so there's a lot of noise so that's you
0:41:07know like that of
0:41:09they the bottleneck of difficulty with face in chinese and also for example
0:41:16right now we're mining the vibe to find different ways also you scenes and the
0:41:20for example of a little to say were you in chinese we see like or
0:41:26force all the different
0:41:28expressions
0:41:29that's really disasters still
0:41:33and so what we are trying to do for chinese is we optimize
0:41:38the technologies that people have developed for other languages maybe you know so we
0:41:44restructure and reshape them to adapt to chinese for example to avoid the
0:41:50big noise in the knowledge graph we try to
0:41:53mine reliable data in particular domains and try to create you know
0:41:58relatively smaller ontologies or smaller knowledge graphs for each domain and those
0:42:04resolve the ambiguity you know higher labeled you wouldn't there are also using the homes
0:42:09using the information
0:42:14kind of
0:42:16then i'll try to use tools to solve the
0:42:19actual ambiguities and also we are going you know customised solutions for
0:42:25in to have can unite of things all warfare characters in like computer games i
0:42:30and animations it's gone seems away unique optimizer the system for our oak lines of
0:42:35python other companies that require these a conversational interfaces
0:42:40so
0:42:42we provide like open-domain chitchat style systems like xiaoice by microsoft and
0:42:48we do task oriented dialogue systems as well and we also provide
0:42:54hybrid system solutions like similar to the a
0:42:59which was is then we by the way i mean i are used to measure
0:43:03in this product because they are leading chinese system seen in this area and the
0:43:10also because
0:43:12another component of our company you actually the particle creator of o one of the
0:43:17system and we week only the power to the otter project i you previous employer
0:43:24so
0:43:26the experience we gain from the previous systems is carried all the to our new
0:43:31company
0:43:32and the
0:43:35now we sing you the you know the relation of the system where you we
0:43:40believe that the
0:43:42the future virtual assistant in addition to being more
0:43:49capable
0:43:50has to be more human like i mean it should have distinguishable personalities and
0:43:57emotions
0:44:00that's
0:44:02that's it is it is gain from the you know the previous quite a long
0:44:05so we see in the previous products weeded out
0:44:08so for example we see the you know
0:44:12the chitchat queries actually take a significant proportion of the
0:44:18query log in the previous system we built and this is even more you
0:44:23know
0:44:23more common than the task oriented queries
0:44:26so i mean this is
0:44:28maybe partially due to the subculture of users in china
0:44:33but really people are actually looking for a companion more than actually solving
0:44:39tasks using these virtual assistants
0:44:43and also we think they the or word resistance would be
0:44:49actually it's the onto only proactive
0:44:53i shall try to find the right timings will be the there always died off
0:44:56you know passed to be very simple
0:45:00and is
0:45:01so that's what we are committed to do at real and where we are
0:45:05we are working on it to make it happen
0:45:08so in order to do that we also have a lot of
0:45:12challenges to solve for example to customise the personality of a virtual assistant that's a
0:45:17very difficult task
0:45:18and
0:45:20requires a lot of continuing work and the some sometimes a lot of their work
0:45:24and but i mean they're always
0:45:28there will always be solutions so we try some different you know technologies to you
0:45:34rewrites and as a so reshape sentences to make the language style done to be
0:45:39resting caleb also you know all
0:45:42you know human can be you will only in this kind of task like all
0:45:45can be
0:45:47you know community and i can actually ask the user real users to come to
0:45:52be able to these curves are
0:45:55also
0:45:57sorry
0:46:02also we
0:46:05also for asr everybody we can okay yes prior is used sorry
0:46:10sorry
0:46:12like join you
0:46:13i
0:46:16approaches it is very important because
0:46:19what we do is
0:46:21well not fit well i believe you everybody is doing that a very carefully and
0:46:25is back to the initial and i you addition to use of the user's privacy
0:46:31we also face like the political really you critical use rules
0:46:36region though use rules and the this than you know there's nothing we can award
0:46:40the visitors have to do with the you know where a cow previous
0:46:43strong classifiers pastoral are used strict you know the dictionary threshold the ins and mightily
0:46:49to
0:46:51you know to solve that the rights to so that's but this is also very
0:46:55important because what we i actually because we are between the chit chat system we
0:47:01after hasn't chitchat energies had park a lot
0:47:04because i
0:47:06we believe that you know causation is oprah size that generates information and you're in
0:47:13chitchat all maybe multiply motivated by chitchat a piece of information can be generated here
0:47:18in a conversation and this piece of information come you're we distribute either you are
0:47:24not a conversation
0:47:25and maybe got comment either you not a once a long if we can have
0:47:29to be made and all design this cycle
0:47:34the system then be self sufficient
0:47:36so you that way i mean we don't need to actually
0:47:41we improve the system anymore and it itself we'll gonna you know can be it
0:47:47is that the knowledge of self but you'll disability is the propriety c becomes a
0:47:52very crucial part "'cause" if you disclose it information that the user's cows to
0:47:58your system will not a person that's a really horrible usually okay moving the whole
0:48:03product company so we are like i don't know i mean i don't i will
0:48:08it better solution at this point and the
0:48:10like to discuss dependability and i had ears of all this
0:48:15so
0:48:16that's my predictions all and before share you
0:48:20a star with the more you're that is
0:48:23sense
0:48:29okay thank you very much so now we realise that we have thirty minutes or so
0:48:34maybe we can
0:48:35we can extend a little bit but
0:48:37so thank you for the introductions so it looks like it sounds like we
0:48:43are like professor steve young said so we are in the engineering
0:48:49phase now so then we don't need more modeling or
0:48:54it's done or i don't know and so given that so maybe each panel is
0:49:01that pointed out that there is it a privacy issue
0:49:05two
0:49:06to make real
0:49:08more realistic
0:49:10the service
0:49:11so
0:49:13i was asked the point you think what's i maybe so let's go back to
0:49:18so
0:49:19so question one and maybe to what's a bottleneck we are facing is
0:49:25so
0:49:26in terms of technology what's our biggest thing
0:49:32i want to ask again
0:49:35at the
0:49:39you're gonna step no issue so i right and i don't and privacy issue so
0:49:46what's our on technical problem we're facing outs
0:49:51what do you think
0:49:54i think we have a bunch of silos and i'd love to see them
0:49:57the together right i don't really useful right
0:50:04well as i said i think you know we have all the bits none
0:50:07of them are perfect and all of them can be improved we have been
0:50:11able though to put together systems that work
0:50:16if you use modularity and if we can seamlessly switch
0:50:24a bit of a plug for maxine over there so maxine is building a system at cmu which is
0:50:30actually an integration of different dialogue systems from research groups around the world
0:50:39if there are enough of those then a bit like the chorus system
0:50:43if the user is talking to it it appears to have huge coverage
0:50:50expertise in many different areas
0:50:53the fact that it's actually modular and multiple different systems
0:50:57is completely you know the users can't see this they're oblivious to it obviously
0:51:05if you start switching topics the way humans can do within
0:51:10the conversation it might fall apart but what we can do by building modular
0:51:15systems and scaling them
0:51:19you know i think there's a long way to go
0:51:22that way i'm not aware of a specific thing that we can point to that's
0:51:28stopping us building these systems but maybe we should ask the others
0:51:35yes please
0:51:52well i think that sometimes something i think
0:51:55i do well the question is it if it's an engineering problem with the research
0:51:58community as i said none of the components we have are perfect why i
0:52:04mean the
0:52:05and so as we go is done you know we by the dialog state tracking
0:52:10challenge there's lots of
0:52:13there's lots of things one could set so to improve slu and so on
0:52:19well i think the real challenge is actually how we make the data available
0:52:23so that academics can actually work on serious datasets
0:52:29and not
0:52:30the sort of fairly toy datasets you know of a thousand or a few thousand dialogues
0:52:36there's lots of interesting stuff but that's not what you know microsoft or apple or
0:52:41amazon have they have datasets that are several orders of magnitude bigger
0:52:46and it would be really great if we could enable the academic community to
0:52:51actually be able to work on something that is
0:52:53really very large i don't know whether dialport or something like that can generate a similarly
0:52:59large dataset
0:53:01that we can work on as a research community that'd be great but i think
0:53:06that's probably the major challenge
0:54:25well i think the answer to all these questions is unless you have a
0:54:30system which
0:54:32real users are motivated to use
0:54:35then it's very difficult to get data in large quantities so the reason that
0:54:42google and apple and so on have so much data at the present
0:54:47time is that people actually are motivated to use siri and google now and
0:54:52alexa so
0:54:54and now one of the things that we don't know maybe this is the
0:55:00you know i arg awhile view on it is the degree to which the
0:55:05algorithms we develop are generic and the extent to which we can move them from
0:55:09one application to another without having large amounts of data now that you know
0:55:15sirikit has only just come out where we're actually
0:55:21inviting developers to essentially attach their third party apps to use siri as a front-end
0:55:27alexa has a similar ecosystem the way these things will work is in fact
0:55:36that certainly for initial deployment if you have a company which specialises in dialogue
0:55:41software to interact with patients
0:55:44setting aside the very real
0:55:47you know ethical issues that maybe those applications have then the
0:55:52algorithms the models we build may well be generic enough to bootstrap a reasonable working
0:55:58system and then the more data you collect the better it gets
0:56:01so i think to some extent this will evolve in time and you'll have better
0:56:07tools to explore some of those issues that you're so
0:57:06i agree inhibit the topic of this of this section is virtual best lexus
0:57:12and are they were defined that the beginning i think that you're pretty much on
0:57:16the edge of o
0:57:23that's all that's the v i justification
0:57:27no that's the vision actually i don't know that that's true you know i think
0:57:33what apple doesn't want you to do is to be locked into
0:57:38alexa such that when you buy something
0:57:41you're gonna use alexa for the advice to buy it and as the mechanism
0:57:46to buy it also obviously i'm sure that's what jeff bezos is thinking about
0:57:51how he's going to make sure that amazon is
0:57:55the channel for buying it
0:57:57that it does work really well on lex all right now
0:58:00i
0:58:03well i don't know going work i
0:58:14and
0:58:15cell i would like to raise and grouping users here i think are very annotated
0:58:19and that little change
0:58:23an elephant you that had training case you grad asking theory things that from when
0:58:28he missed seeing ninety year old whenever i got addressed like first lock average and
0:58:33i found in for a year we went round of entity that's you get and
0:58:37here we extract lexical where satellite that scares the crap data is
0:58:44every time and that is wonderful they don't you know interactively and in that we
0:58:48relevant having to our home or our parking or slightly altering the trajectory of bare
0:58:53it's any cell i'd like to have what energy in a research why their wedding
0:59:00picnic excited about
0:59:02to do research on an outright
0:59:08regression so i always
0:59:13e value and to that but also have a possible children and they are able
0:59:17to talk to lex actually the older one love to talk to him and the
0:59:20younger one which is always understood it runs are set him
0:59:25i don't know how many people were also inspired by the young lady's
0:59:30primer in the diamond age but i think that's pretty fascinating obviously there are privacy
0:59:36and confidentiality concerns but you know the children are the future and they
0:59:43will be the ones using these devices and
0:59:45i think we should be listening to them and finding ways that they can shape the
0:59:49direction of these
0:59:50devices because they'll be the ones living with them
0:59:53so maybe what's your onto split
1:00:31so if you have any bad experiences the as you all children
1:00:36developed have it's all and things that you from this talking to say every that
1:00:40you're i really wish they haven't
1:00:43the injury
1:00:45along that line there was an article i don't know or remember if it was
1:00:51google or alexa where they discussed the behaviour of kids that they don't use polite
1:00:57words like
1:01:00alexa can you do this please right so because the machine obeys them
1:01:05like that without them having to say thanks or something like that the parents got
1:01:08very upset about that because it changes
1:01:12the behaviour of the kid okay so i'm not sure how to really fix
1:01:18that because but every the parent in the floor to the
1:01:23like that's perfectly and hear something that i think we can data and i think
1:01:28it's the iceberg
1:01:30so i have an extant counter most the time we just use its it like
1:01:35played of adl and listen to that purple ham sound you know don't have a
1:01:40copy anymore from forty years ago whenever
1:01:44i
1:01:46i one thing have noticed that in china tests that recently in several when they
1:01:50do all the time is set at time if an utterance i don't have to
1:01:53do that and say you know alexa you know set a timer
1:01:57for five minutes or ten minutes or whatever and the echo starts and my natural
1:02:01reaction is to say thanks
1:02:03since she's opened the channel
1:02:06right given this task she's opened the channel she completes the action finishes the
1:02:11timer and i say thanks and the timer just keeps going
1:02:16so that i k
1:02:18however i thanks
1:02:20the timing chest each guy i think is thanks that means a turn at the
1:02:26time there it's not that fast so what you have there is this crossover between
1:02:31social dialogue behaviour like a greeting or thanks and this task oriented
1:02:37behaviour and i think we have no idea
1:02:40how to get pragmatics in that situation to integrate those two things and i
1:02:46think it's it is unlikely that i at and i spoken it in trying to
1:02:50explore the and it and here's another one and these are kind of maybe it's
1:02:56just engineering and i'm really next step
1:02:59and i hate came in aston lex's our and on the top
1:03:03and i stand that real ran on the topic
1:03:07i never seen a tuple are so kinda separate that i don't understand your query
1:03:15i said it's right in the context you know she she's right she's in this
1:03:22stage is no its not achieve any kind of a state right
1:03:26i think i hear some or i might pay you have to do next that
1:03:32this was a nice to christ's h cisco i mean and then she said saddam
1:03:36stay you know in nineteen fifty seven elvis presley made his first you know whatever
1:03:42right so i eight o'clock in the money i think and i
1:03:47i was i hate and make it more accent and she tells me exactly the
1:03:54same thing every time so i think we just have no
1:04:00we don't really know how to integrate this kind of social behaviour with the
1:04:04task oriented behaviour that's mightily
1:04:07and
1:04:09the that the right thing is probably you love the wifi but that i was
1:04:18a response from the device itself so i can give you an answer about politeness
1:04:23that we found out when we did the first spoken dialogue challenge
1:04:28and i guess by now we're allowed to say that there were three systems there
1:04:31was at&t there was cmu and there was cambridge and they all served the community of greater
1:04:38pittsburgh to answer the phone and we found out that when people spoke to
1:04:43the cambridge system they were much more polite
1:04:47i the dataset a cable we but we it has something to do with the
1:04:53accent
1:04:55absolutely awesome
1:04:57i do not so i
1:05:01so i have a question from a user point of view about all
1:05:06this i think there are a certain number of tipping points okay
1:05:12you remember when the internet was first used
1:05:16when we were all using it we were wondering when the general public could use this stuff
1:05:20and the general public used it when aol made an interface
1:05:24that was super easy to use
1:05:27right now there was another tipping point for asr with the far-field
1:05:32microphone array in the echo and i use it you know you walk out of
1:05:38the shower and say what's the weather and then you know what to get out
1:05:42of the closet
1:05:43i think that is a huge thing and i do not know how
1:05:48vision
1:05:49is going to follow that unless you have cameras in all parts of your house
1:05:55how is vision gonna look around the corner because the mic can hear around the corner
1:06:00and so i think there are still some other tipping points
1:06:05for the user where the user says to themselves
1:06:09gee i can use this and i'm gonna buy this has two hundred and some
1:06:13people approximately have done with
1:06:15i do so far
1:06:17so what do you think these tipping points are
1:06:20but you don't sound maxine who or what
1:06:25i didn't get the vision that
1:06:27i actual computer vision all your vision you have used
1:06:44i think that when the expectation goes from doing the things that i
1:06:49know it can do to me just assuming that it can do whatever i say
1:06:52that will be a huge tipping point
1:06:55i mean also i didn't what we see in china is a the you know
1:06:59people looking for combining more than the you actual task or the acquire is so
1:07:04i mean i combined to give social be like a you like of all of
1:07:10these all than the lunch each accent but also we have a response you we
1:07:14idea of your require is and you can just it's of the their own goals
1:07:19very smoothly you can just i tried to be the exact what are what you
1:07:22take it as a front no you know applied or whatever so then even by
1:07:29the we i we try to combine these chitchat r is the task oriented are
1:07:33also make the
1:07:35were choice a stand and the then we see missus the over significant can the
1:07:40part of the you know people are looking for you know the chitchat is that
1:07:45of the already completing the task
1:07:47so that's i think that actually owns or that you level what's attracting people is
1:07:52that the human like coref the of this device and also you know how to
1:07:56solve the you know the this also offline the use of you the system but
1:08:00it also brings problems comes up to
1:08:05i think it's not a technical secret in building the open-domain chitchat the way
1:08:09we mine you know conversations from social media from the internet and we got
1:08:15like you know a huge amount of human conversations and we design you know features
1:08:21we design a bunch of features to
1:08:25you know score candidates to see how well a reply fits with the
1:08:29query and look for the most suitable replies the top
1:08:35most suitable replies and randomly choose one of them to reply but the problem
1:08:40is
1:08:41when you do this it is really difficult to control what the system is going
1:08:45to say
1:08:46to the user most of the time it says you know proper things but
1:08:50eventually it may say something bad or something you know you
1:08:55did not expect it would say so that's currently an issue to be
1:09:00solved i mean from my point of view and
1:09:06i mean how these and you know what we see that you know that generate
1:09:09generative systems like rnn based conversations could provide a solution in
1:09:15some qualities you know it's a
1:09:18reach way to model yourself the already spike to a reference so
1:09:22i don't know yet i mean that's to
1:09:24we are exploring that direction
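As far as the description above can be made out, the retrieval-style chitchat works by scoring mined replies against the user's query, keeping the most suitable ones and picking one of them at random. The sketch below illustrates that idea only. The word-overlap score, the tiny corpus, the `top_k` parameter and the blocked-word filter are all assumptions made for illustration and are much simpler than whatever features the speaker's system actually uses.

```python
import random


def overlap_score(query, context):
    """Jaccard word overlap between the query and a mined context turn."""
    q, c = set(query.lower().split()), set(context.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0


def pick_reply(query, corpus, top_k=2, blocked=frozenset(), rng=random):
    """Rank mined (context, reply) pairs and choose among the top_k replies.

    Replies containing any blocked word are dropped first, a crude stand-in
    for the problem of controlling what a retrieval system may say.
    """
    safe = [(ctx, rep) for ctx, rep in corpus
            if not (set(rep.lower().split()) & blocked)]
    ranked = sorted(safe, key=lambda pair: overlap_score(query, pair[0]), reverse=True)
    shortlist = [rep for _, rep in ranked[:top_k]]
    return rng.choice(shortlist) if shortlist else None


corpus = [
    ("have you eaten", "yes i had noodles"),
    ("what is the weather", "it is sunny"),
    ("have you eaten lunch yet", "no that is stupid"),
]
query = "have you eaten lunch"
unfiltered = pick_reply(query, corpus, top_k=1)
filtered = pick_reply(query, corpus, top_k=1, blocked={"stupid"})
print(unfiltered)  # no that is stupid
print(filtered)    # yes i had noodles
```

With `top_k=1` the choice is deterministic, which makes the effect of the filter easy to see; a larger `top_k` gives more varied replies at the cost of less control over what gets said.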
1:09:26well i think it's a mistake maxine just to think about things
1:09:31and associating alexa with the thing that lin puts on a counter so
1:09:41all you need is a microphone you need the channel and what you want is
1:09:45that same voice
1:09:47with the same knowledge about you to be accessible in as many different contexts
1:09:52as possible so when you get a new car and you have the same
1:09:57you know you ask questions you want to access the same system when you're
1:10:02in the home wherever you are whether you're using your phone or your watch
1:10:06you're talking to a loudspeaker you're talking to a television you want to go through to
1:10:11the same place that knows the same things
1:10:15in the same way so that you don't have to learn different protocols different
1:10:19you just want the same interface now
1:10:23i'm still not sure whether by vision you meant cameras but you know in some
1:10:27circumstances there will be more inputs and that is a big thing that's not
1:10:31really been done very well so far integrating gestures what you can see around
1:10:37you into these systems
1:10:42but primarily
1:10:44the personal assistant is detached from hardware
1:10:49it is just you know it's maybe in the cloud it may be running
1:10:54on your personal ecosystem
1:10:57but it's yours and it belongs to you and it's accessible wherever you need
1:11:02to access it
1:11:04i mean that's one way this could go it seems kind of automatic to think
1:11:08of these embodied agents that are with me at all times but that's very different
1:11:12than my lived experience now right like when i'm at home
1:11:16i interact with people who know me but know me in a different way in a different
1:11:19context than at work or when i travel right
1:11:22i
1:11:23i'm not i'm not sure i'm not sure what people want but it is not
1:11:27clear to me that it's the same agent everywhere
1:11:32but it's so much more powerful if it is the same agent that
1:11:36knows the same things
1:11:38so the lower than maybe not everybody knows but i'd like a little on the
1:11:43for like that
1:11:46also available via we
1:11:48so would like you have any of the a and a also like the on
1:11:53the cycle of
1:11:56well
1:11:57what i did what the system you want a lexical
1:12:01and hear the topic
1:12:02and i can see the shopping the
1:12:05i get a one problem the other five
1:12:08and i could also ask for the same dropping the support of what so it
1:12:13it's the same the information
1:12:15so in that sense if it would be my car or somewhere else what
1:12:19got the same propping this work
1:12:22so following up on those last two points
1:12:28i think there's this issue of personal assistant and what it really means so is
1:12:32it something that only
1:12:35knows about you and you only know about it and it doesn't interact with other
1:12:38people at all
1:12:40or is it more of something that is an assistant for you in a social
1:12:45interaction if we think about human assistants or executive assistants and so on yes they
1:12:49report to one person maybe as you know
1:12:52an authority but they have to interact with a lot of people and the issue
1:12:55about your wife's shopping list and whether you should have access to that
1:12:58i think that is a big scientific rather than engineering question and
1:13:04as far as we've come really is just to say we can have
1:13:08locks and prevent certain people from accessing devices or certain functions but i don't think
1:13:13we've got more sophisticated
1:13:15in terms of saying how they would interact differently except that everybody has their own
1:13:19personalisation
1:13:21and you know i may want other people to access some of my information from
1:13:25my personal assistant but not everything and how should that work i'm very curious about what's
1:13:30in
1:13:30alexa now for managing groups even how do you
1:13:34how do you deal with people fighting about which music they want to play
1:13:38but what kind of answers are people thinking about in terms of multiple users
1:13:44interacting with multiple people
1:13:46and
1:13:47one or more assistants
1:13:50with respect to multiple users the device is assigned to the owner but obviously
1:13:57it is a family device so it's in the living room everybody can use it
1:14:03or if my wife wants to put something on the shopping list that's fine
1:14:09sure she can put it on the shopping list because everybody can
1:14:13access it in the family so it's like a whiteboard right so
1:14:19but are one of your a virtual want to respond plus the right so to
1:14:25speak
1:14:27so one or one point development but it works system was true words presumed
1:14:35but system should
1:14:38not respond to complete remote room impulse hmms
1:14:42random no marketers store parking number system how to understand
1:14:48but the observations the ones used or talking participants after a while useful realtor machine
1:14:55that syllables
1:14:57for stuff are just so do this actually works a lot better
1:15:04so that maybe we're going through this transitional true people have been not require a
1:15:11very
1:15:12proper way to address this time
1:15:16once the models as culturally establish some researchers who grew where
1:15:21personal monocle with a rich close to the room response would really precludes ago
1:15:27room and removed from you
1:15:29remember system at a machines were introduced
1:15:33the room but remember to o
1:15:37behind one person or something wrong understand talking to the machine trying to read through
1:15:43the correct then
1:15:47this is not gonna happen to the
1:15:49the cultural norms what we do this leads to each other we watched
1:15:56something simple true but you know to some prior to the throat a double point
1:16:02one from which rooms
1:16:05basically what is the remote to do
1:16:07i think it's more dimension room specific query
1:16:12the some form of words has to do acquisition of norwegian structure
1:16:19this is because we were able preschooler we do with the room with the parlance
1:16:24pixel value some criminals
1:16:28the reason we will use the language going to be homesick how exactly those with
1:16:33map onto a actions the three but the main the bark number four through four
1:16:42that's a little work on them
1:16:45remote for worst possible sorts porcelain rooms room
1:16:53would be most of these machines could produce the problem the room release brute
1:17:00will come
1:17:03work
1:17:04and remove you have any thoughts i agree okay i
1:17:13for syllables but the numbers knowledge so removing so
1:17:19familiar google knowledge order to
1:17:22from these works
1:17:24but open remote chance to come from somewhere remote will have to remove solve a
1:17:30problem
1:17:31i keep being misquoted so to be clear i didn't say that all we
1:17:36have is engineering to solve these things i think what i said was we now
1:17:43know the engineering so we can build
1:17:48systems which are going to be
1:17:52significantly more capable than they are today
1:17:57but that doesn't mean to say
1:17:58a lot of the things that have been mentioned here will remain problems that need
1:18:03solving
1:18:04i'm just saying i think we can scale them to be far more capable than today
1:18:09with just engineering
1:18:10they still won't be able to do what you or i as a human can
1:18:15and just a quick comment i was gonna say to lin well maybe now
1:18:20you realise it doesn't recognise thanks you might just say stop the timer
1:18:25alexa so
1:18:31well that might become sounds such that share your children probably won't erase your children
1:18:37will be figured out likely to just size l
1:18:51no i
1:18:54i
1:18:56i think you like it slots think it
1:18:59covering the cost of alright if he wants to say that way you should be
1:19:03able to do
1:19:04so
1:19:18i think lin is exactly right i think we only know the tip of the
1:19:23iceberg in terms of how to integrate pragmatics with all this wonderful technology which
1:19:28i agree it's amazing and wonderful and every time i use siri even though
1:19:32it doesn't do very much for me it's still amazing to me if i remember
1:19:35way back when it just wasn't possible
1:19:37but it really is all about both sides i think are mentioning really interesting things
1:19:42about identity and audience design partner specific processing and that for me
1:19:46is what we only know the tip of the iceberg about and so
1:19:49i'll talk about that a bit more tomorrow but for instance you know my story is
1:19:53i have all of these devices in my hotel room and somehow i bumped siri
1:19:56by mistake
1:19:57and units and might have my own in male and female voice started saying in
1:20:01almost units and that there is that the right for one and then you know
1:20:04how many whatever and so you know really there's this notion of having the same
1:20:08character everywhere we deal is an interesting idea and you are trying to go for
1:20:13a coherent identity
1:20:14or rollie something that it's still a real problem we don't know how to control
1:20:17in context like all kinds of things we can go wrong just is trained and
1:20:21that i mean you have certain expectations of a partner
1:20:24you know thirty years ago at chi ben shneiderman and brenda laurel had a panel
1:20:28and i was sitting in the middle of them and they were fighting
1:20:31back and forth about
1:20:32agents everywhere or agents are evil and immoral okay and so that was the point
1:20:37of view thirty years ago and as a little psychologist i said it's an ongoing empirical
1:20:41question it'll work some of the time it won't work some of the time
1:20:45and so that was what we find that thirty years ago there is no doesn't
1:20:48at least at this conference if no one voice and abilities to be alone was
1:20:52extension item in around to
1:20:54to throw water on our parade that
1:20:57you know they're probably people out there that way then sell
1:21:00i think it's really important to think about the social things and you're right in
1:21:04terms of restaurant things in
1:21:06certain little functions i can do with mind that
1:21:09you know the big picture the annotation is wonderful but we're so far from here
1:21:20then we talked a lot about children
1:21:23and you know how children want to interact with systems and of course children will
1:21:28adapt
1:21:30to the language like we're talking about they figure out that well you don't really
1:21:33just in the way
1:21:35what i always used as motivation
1:21:37for a vision in dialogue systems is my father
1:21:42who's currently years old and
1:21:45back when i was at nuance in about the two thousand
1:21:48four two thousand five time frame i was really brought over or maybe little rebels
1:21:53part of our how many hope you system and i
1:21:55had my father use it and he completely destroyed it you know
1:21:59this is because he doesn't he's not gonna adapt to the technology really he's
1:22:04and it will be said you know this is kind of nice you know exactly
1:22:08what the do when children systems are really useful really useful was back in the
1:22:16forties
1:22:18as a what you mean because like unigram some of the coda
1:22:21and the when i with the we had a problem
1:22:25let's say the refrigerator's broken you know making a sound we'd pick up the
1:22:30phone
1:22:31and louise would pick up and answer and she's the telephone operator she knew everybody in town
1:22:37little town
1:22:38we'd say louise you know my refrigerator's making this clicking sound you know what
1:22:43would you which i do infeasible kind of rigidity about the project here we know
1:22:48bob he services refrigerators and let me let me just connect you
1:22:55bob he's always over at joe's diner at this time let me i mean can
1:22:58tell whether i and somebody might then still am is there is two thousand four
1:23:03two thousand five and
1:23:04and i thought you know that there's wisdom there which is that
1:23:08we shouldn't have people adapt to the technology so i said no we're
1:23:13gonna go build louise and you know that was the motivation
1:23:17for louise which then became cortana but
1:23:20this kind of using technology to meet humans where they want to be naturally
1:23:26rather than forcing humans to adapt
1:23:34it's funny because right here are two threads simultaneously and had ever since that tension
1:23:40or you can have a nice debate which is we want to make technology more
1:23:46human like in each and he said
1:23:49and we want to make people talk like the machines 'cause that'll be easier for
1:23:53the machines to understand and we kind of have to decide if we're going to make
1:23:58machines more human like
1:24:01we can wait as engineers till the humans change or we can integrate pragmatics
1:24:08and other aspects of natural human conversation into what we teach our machines
1:24:27like my questions related to this i don't know very much better and size and
1:24:31one transmission mentioned many times that right can i so having emotions and high as
1:24:38it is an important part of that as an assistant will a dialogue system and
1:24:43so i'm just wondering what is the current state
1:24:46of interaction between research in personal assistants and affective computing in some point
1:24:55in terms of recognizing emotion from the user and from you know prosody of the
1:25:00speech and other aspects and also in terms of generation
1:25:05of utterances which contain emotions
1:25:10and ask questions so we actually or
1:25:14no i don't number in that's really we actually not
1:25:19research all models i we i mean at this stage at the start off button
1:25:23will allow them from the email so for like emotion
1:25:30emotion recognition or emotion generation actually we actually use you a sheep the
1:25:35lottery
1:25:37good performance comes fortunately in there are task if you want to recognise it anymore
1:25:42so all you want to you know reply i was because it mostly you don't
1:25:46actually those that for every
1:25:49replies
1:25:50so you just keep your precision high you lose a bit of recall or
1:25:53i mean you will lower recall you just keep your precision high and
1:25:57by doing that in that way we can achieve like you know ninety five
1:26:03percent accuracy or something like that and we also learn a lot from the research community
1:26:09like doing the generative model for a chatbot
1:26:16we actually a truly in a channel walter using sound you know