0:00:29thank you for all state all first and late
0:00:32an apology set
0:00:34alan black wouldn't dryness
0:00:37i'm phil cohen from monash university
0:00:40would you all introduce yourselves
0:00:43hi everyone i'm vikram ramanarayanan
0:00:45and you can communicate comedy what my last name
0:00:50and i work at
0:00:52educational testing service research and development
0:00:55where i work on
0:00:57multimodal dialogue systems for language learning and assessment
0:01:02hi i'm sujith ravi from google ai working on conversational ai
0:01:06but also
0:01:08a multimodal stuff but
0:01:10vision about four and also in
0:01:12efficient machine learning basically how you do
0:01:15conditioning on things like compute and memory constraints
0:01:19gabriel skantze so i am professor here at kth
0:01:26but also co-founder and chief scientist at furhat robotics
0:01:30a spinoff company
0:01:32from kth
0:01:33developing social robots
0:01:35that for
0:01:37alright so i proposed a variety of
0:01:41what i hope are
0:01:44questions that would cause people to start thinking both about the field and also about
0:01:50their own research
0:01:51and trying to understand where this field is going
0:01:57can you make the text a little bit bigger right then we can read everything
0:02:00but i can do that
0:02:03about that
0:02:05the back
0:02:10well do that
0:02:13the thought was
0:02:16i hope will get to talk about all these because they're all interesting topics
0:02:22the whole idea is to put everybody on the spot
0:02:25in one sense
0:02:27understand what it is we're doing here why we're doing what we're doing
0:02:32are we working on
0:02:36the problems that we're working on simply because there's a corpus there
0:02:42it's easy to work on a corpus that exists rather than either create your own or
0:02:47actually work on the hard problems rather than the problems that exist in this corpus
0:02:53so the question is are we working on the right problems that's the first question
0:02:59we'll also want to talk about multimodal multiparty dialogues i wanna push the conversation into
0:03:06somewhat more open space
0:03:10there are a few people here in the room who've thought about that
0:03:13but not a lot of people
0:03:17then there are the
0:03:19architectures that we're building which tend to be you know either they are
0:03:23pipelined or they're not pipelined and you know we should talk about
0:03:27why it is we wanna do each of those
0:03:32the next topic is why do i have to learn to talk all over again
0:03:36why shouldn't i be able to just have you know why can't conversation speech
0:03:40acts and whatnot be something that's domain independent that's related to the pipeline one
0:03:47the explainability question has to do with well gdpr is an
0:03:51interesting issue here
0:03:53but if i say to a dialogue system
0:03:56why did you say that
0:03:58i'd like to get a reasonable answer out
0:04:01so how do we get there and the last you know a very important problem
0:04:07what are the important problems what would you tell your graduate students are the most
0:04:11important things to work on next
0:04:14okay and the last question is
0:04:17okay think about
0:04:19the negative side of everything we're doing
0:04:22can your technology or my technology or their technologies be used for evil for bad
0:04:29interactions for robocalls that are interactive now
0:04:33so lots of topics to talk about
0:04:37we can kind of start with the first one
0:04:39and then i'll sit down and shut up
0:04:43so i imagine there's a lot of work here on slot filling systems
0:04:47so your system asks you what time do you want to eat
0:04:51and you say the earliest time available
0:04:55or you say what's the earliest time available and the system says six p m
0:04:59and you say too early
0:05:02so the system says seven p m and so you say okay
0:05:05notice the user didn't fill the slot the two of them together filled the slot
0:05:10that's mixed-initiative collaboration et cetera there are lots of issues there having to do with collaboration
0:05:18are we only working on slot filling because the corpus is there
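A minimal sketch of the exchange described above, in which neither party fills the time slot alone: the system proposes, the user constrains, and the value is settled jointly. The class name, the dialogue-act labels `too_early` and `okay`, and the availability list are all hypothetical choices made for illustration and are not taken from any system mentioned in the panel.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeSlotNegotiation:
    """Toy model of a time slot that the user and the system fill together."""
    available: List[int]            # hypothetical availability, minutes since midnight
    earliest_ok: int = 0            # constraint accumulated from user feedback
    proposal: Optional[int] = None  # the system's current offer
    value: Optional[int] = None     # the agreed slot value, once accepted

    def propose(self) -> Optional[int]:
        """System turn: offer the earliest available time that meets the constraint."""
        candidates = [t for t in sorted(self.available) if t >= self.earliest_ok]
        self.proposal = candidates[0] if candidates else None
        return self.proposal

    def user_says(self, act: str) -> None:
        """User turn: 'too_early' tightens the constraint, 'okay' accepts the offer."""
        if self.proposal is None:
            return
        if act == "too_early":
            self.earliest_ok = self.proposal + 1
        elif act == "okay":
            self.value = self.proposal


# The user asks for the earliest time, the system offers 6 pm, the user says
# too early, the system offers 7 pm, and the user accepts.
neg = TimeSlotNegotiation(available=[18 * 60, 19 * 60, 20 * 60])
first_offer = neg.propose()
neg.user_says("too_early")
second_offer = neg.propose()
neg.user_says("okay")
```

The point of the sketch is only that the slot value comes out of both parties' turns, so a tracker that expects the user to state the value outright would miss it.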
0:05:24short would like to say
0:05:29we do i guess everybody can be comfortable by some attacks
0:05:32therefore nobody it i think it can keep it in track recorders
0:05:38played the lead so show answers
0:05:42it's the dataset and the metrics maybe more than the dataset it's easy to evaluate and
0:05:46measure a system's accuracy on this one metric because we know the actual
0:05:50values the true values and the precision recall and so on
0:05:54but i also think that
0:05:56it cannot be a slot filling system or the other extreme you know you go
0:06:01all the way to logic and say it has to be a fully constrained
0:06:04system i think it has to be something in between and we have to be
0:06:07flexible to adapt so it could go from slot filling to actually understanding okay
0:06:12what slot
0:06:13attributes or values can actually be changed morphed into something else you know maybe
0:06:18depending on some constraint for example temporal constraints right so the downside to going completely
0:06:24constrained is there's no way you can ever program all that logic
0:06:28or even the fact that if you allow an automatically learned
0:06:32system to you know infer that from a corpus there are so many different possible ways
0:06:37to infer that like i mean you're talking about this example like if you say
0:06:40earliest i mean how many earliest times should i give you like seven p m
0:06:46six fifty nine six fifty eight should machine learning work on something
0:06:51like that well it doesn't necessarily right which is why i said it has to
0:06:56be something in between where you can
0:06:59program and then it's okay to actually have some of these you know
0:07:02heuristics or something where we say that okay
0:07:04i'm looking at thirty second blocks or one minute blocks or thirty minute blocks
0:07:08and then it can be actually gradually
0:07:10you know sort of extended or opened up to learning something more nuanced
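As a rough illustration of the block-size heuristic mentioned above, the sketch below enumerates candidate start times aligned to a fixed granularity instead of treating every minute as a separate answer. The function name, its parameters, and the default limit of three candidates are assumptions made for this example only.

```python
from typing import List


def candidate_times(earliest: int, latest: int, block_minutes: int, limit: int = 3) -> List[int]:
    """Return up to `limit` start times (minutes since midnight) on block boundaries."""
    if block_minutes <= 0:
        raise ValueError("block_minutes must be positive")
    # Round `earliest` up to the next block boundary using ceiling division.
    first = -(-earliest // block_minutes) * block_minutes
    return list(range(first, latest + 1, block_minutes))[:limit]


# Hand-programmed coarse granularity: thirty minute blocks from 6:50 pm to 9 pm.
coarse = candidate_times(18 * 60 + 50, 21 * 60, block_minutes=30)
# Finest granularity, closer to the "six fifty nine, six fifty eight" problem.
fine = candidate_times(18 * 60 + 50, 21 * 60, block_minutes=1)
```

Starting from a hand-set block size and later learning when a finer one is appropriate is one way to read the "something in between" position; the sketch covers only the hand-set part.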
0:07:17i guess it depends on
0:07:19what you want to do so if you want of restraint system poses an intelligent
0:07:24nothing is really good coming up with belting systems you just give it a bunch
0:07:27of data
0:07:29you clean it really well but intelligence is something it so
0:07:34i think that this is not a knock on any of these two things because
0:07:37in some cases be do want between systems via be happy but between systems and
0:07:41that's what we wanna look
0:07:43but in other cases we might want without it
0:07:47not that be really close to that but this you want to get
0:07:50to something more which respects some kind of planning some kind of higher abstraction so
0:07:58if you wanna go that route but it really depends on what we're talking about
0:08:01just to build on
0:08:04so i think this is of course related to the corpora that are out there but
0:08:09also like
0:08:10what are the practical systems that people are building which are often these kind of
0:08:15searching for a restaurant or something where you have the slots but
0:08:19so i think it would be interesting to open up and look at
0:08:25completely different types of dialogue domains so i can give one example where there actually
0:08:30are practical problems so at furhat we are developing an
0:08:36application where the robot performs job interviews
0:08:39and the robot might ask the user so tell me about
0:08:46a previous job where you had a challenge that you managed to solve
0:08:51so the answer to that question is not represented very well with a set of slots
0:08:56it's quite hard to come up with what that
0:08:59slot structure would look like so that kind of understanding will also be
0:09:06needed when we open up to more applications than the ones we have now
0:09:10which i think would be very interesting to address it's also perhaps not very easy
0:09:15to translate that to a logical form or an sql query or something
0:09:21there's something else that is needed there's some kind of narrative that is coming from
0:09:25the user that you need to represent so that's one
0:09:30so it definitely would be interesting to try but for doing that you have to
0:09:35consider other domains i think
0:09:40what did you think about the
0:09:43the first talk this morning relative to
0:09:46semantic parsing versus slot filling
0:09:51that it was very interesting talk but it's more it's obviously if you have that
0:09:58kind of queries you need more complex semantic representations and so on
0:10:05we have different queries is a common way by a given the corpora we've collected
0:10:11you know what random because the corpora doesn't exist because we define it that way
0:10:16you know you actually go travel at you have a conversation with a travel
0:10:21and one would find perhaps of might be a little bit more
0:10:24open ended in the way you
0:10:28but it's like
0:10:30it's still perhaps the user querying something getting some information from the system
0:10:36whereas sometimes it's the other way around the system is asking the user absolutely
0:10:41sources so well
0:10:42in fact the original
0:10:43task-oriented dialog work
0:10:46was barbara grosz's phd thesis in nineteen seventy four on the structure of task
0:10:50oriented dialogue which was the other way around the system is telling the user and
0:10:54you're trying to get the user to do something which of course there are plenty of
0:11:00unlike arctic you had are added
0:11:02change a tire
0:11:04i'll just add one more thing that when we talk about this intelligence quite
0:11:08often we sort of think that there's this one inflection point and instantly the machines
0:11:13are gonna learn how to reason and like you know understand everything i think one
0:11:18sort of nugget i want to mention is that
0:11:21whatever form logical form or anything else that we're gonna use the important part
0:11:25is as you mentioned collaboration right is the language understandable by the system
0:11:30it may not even generate like proper stuff right but is it understandable by the human
0:11:33on the other side right and does it allow them to you know get to you
0:11:37know a better state and towards that end i think like
0:11:41we're not going to see like you know one system trained on the travel domain suddenly
0:11:45doing something
0:11:46amazing in a completely different domain but i think we should start paying attention to
0:11:50these because everything is machine learning these days how well a system is doing on multiple
0:11:55domains right i mean start like generalising
0:11:57and think about the generalizability aspect when you're proposing models as well and also abstraction
0:12:01so that ties into the third and the fourth question
0:12:06to my whining i don't the head okay i that's okay well
0:12:14i don't sick
0:12:19so there are obviously a lot of end-to-end trained systems
0:12:25where you're training the dialogue system in addition to the language processing
0:12:29and some of the slot filling systems are doing exactly the same thing
0:12:33which means your dialog engine is
0:12:36is basically stuck with that domain
0:12:39and now you're gonna get a whole bunch of new kinds of domains and suddenly
0:12:44my dialogue system doesn't know how to talk anymore
0:12:47it doesn't know how to perform a request or understand a request and maybe there are
0:12:51two kinds of speech act
0:12:52that are coming in
0:12:55we saw this morning you know in the semantic parsing they're trying
0:12:58to deal with that huge amount of variability she mentioned
0:13:02and there is a lot of variability in language
0:13:05but i submit there is much less variability
0:13:08in what happens to people's goals in the course of a dialogue
0:13:12in general you tend to achieve them you achieve them you fail you try again
0:13:18you augment what you're trying to do you replace what you're trying to do et
0:13:21cetera so actually my suspicion is it's a relatively small state machine
0:13:27why shove both of those together why can't i figure out one through machine learning
0:13:31or any other method
0:13:33and then deal with all the variability in the language in a pipelined fashion
0:13:42versus train it all at once
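The "relatively small state machine" conjecture above can be made concrete with a toy sketch. The states and event names below are taken loosely from the speaker's list (achieve, fail, try again, augment, replace), but the exact transition table is an assumption for illustration. A domain-specific language understanding component would be responsible for mapping utterances to these events, which is the pipelined split being argued for.

```python
from enum import Enum, auto
from typing import Iterable


class GoalState(Enum):
    PURSUING = auto()
    ACHIEVED = auto()
    FAILED = auto()


# Hypothetical domain-independent transition table over goal events.
TRANSITIONS = {
    (GoalState.PURSUING, "achieve"): GoalState.ACHIEVED,
    (GoalState.PURSUING, "fail"): GoalState.FAILED,
    (GoalState.PURSUING, "augment"): GoalState.PURSUING,
    (GoalState.PURSUING, "replace"): GoalState.PURSUING,
    (GoalState.FAILED, "retry"): GoalState.PURSUING,
    (GoalState.FAILED, "replace"): GoalState.PURSUING,
}


def step(state: GoalState, event: str) -> GoalState:
    """Apply one goal event; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], start: GoalState = GoalState.PURSUING) -> GoalState:
    """Fold a sequence of goal events into a final goal state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


final_state = run(["fail", "retry", "augment", "achieve"])
```

Nothing in the table mentions restaurants, travel, or any other domain, which is the property the question is probing.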
0:13:45please i guess i mean the
0:13:48i agree i mean it seems reasonable to separate these things like this
0:13:52the motivation for end-to-end learning is that you wouldn't have to have any knowledge
0:13:56about these
0:13:59representations in between so you're gonna have to have a lot of data so that the
0:14:02data but you don't need to know so much so i don't have a lot
0:14:05of data happen is that
0:14:07no that's the problem
0:14:09i mean go one thing for go with the rest for the rest
0:14:13in the standard as counteract so i think there is i mean that to that
0:14:19of end-to-end learning systems right i mean there are end-to-end learning systems where we say that
0:14:22all these components which are in a pipelined fashion we're just gonna get rid of
0:14:26all of them and take the input and the final output
0:14:30in some settings i mean i would argue that you might actually have more data
0:14:33for that than for the individual components right like for example speech-to-text right otherwise you need
0:14:40all these phonetic annotations and intermediate you know annotations at all different levels whereas the
0:14:45system might actually have just the speech signal and you know the transcribed text
0:14:48or some response
0:14:50that might actually be easier to obtain and in those settings i would say the end-to-end
0:14:55systems at least
0:14:57given enough data have actually shown improvements in recent years and this is not
0:15:01just deep learning i mean as the technology evolves we're gonna see improvement in that like
0:15:06the recognition error goes down now the question is you don't
0:15:11have to do end-to-end learning in every scenario right i mean there is also like okay you
0:15:15know
0:15:16every end-to-end learning system is not going to solve the error propagation problem right
0:15:20and then you might actually be creating more issues because now you don't know how to
0:15:23debug the system there are too many hyperparameters and you have to deal with
0:15:27that that's actually a worse problem in some settings than actually you know just finding
0:15:31data just to do the input and output annotations so i think it depends on the
0:15:36use case like
0:15:37if you have to improve the system or if there are individual parts of the system
0:15:42that you need to actually sort of transfer over to a different domain or for
0:15:45other systems where you need that output not just like the last output but like
0:15:50something intermediate like for example
0:15:52it can be argued syntax is not necessary for every nlp task or domain i mean
0:15:57when was the last time you actually saw a part-of-speech tagging paper in recent years
0:16:01or even a parsing paper for that matter if you see the number or
0:16:05percentage of papers at acl emnlp or naacl i mean it's going down dramatically
0:16:10but that doesn't mean that it's not important right whether it's important depends on
0:16:14what you're trying to do with it if you're using the dependency parses to do something
0:16:18i mean do some reasoning over the structure or substructures it is useful to generate them
0:16:23on the other hand if that's just a precursor to feeding into
0:16:27an end-to-end machine translation system
0:16:30it's arguable that that's not necessary
0:16:33for the metrics that we're talking about i mean automated metrics
0:16:36again that does not mean you're gonna solve that we have to solve those problems
0:16:39are used i can take models
0:16:41it depends so much on what you're trying to use the system for
0:16:47in some sense it's kind of a balance right so
0:16:50typically for example what we are kind of
0:16:53so just to take a specific example of what we're doing
0:16:56when we're trying to build so
0:16:58we're building language learning modules we're building specific goal-oriented systems task-oriented systems for specific skills
0:17:04things like
0:17:06fluency or pronunciation or grammar or specific aspects of grammar so
0:17:10so how do you go about and this is the whole question that
0:17:14you raised earlier which is about
0:17:15you know how do i build these generalisable systems or how do i kind of
0:17:19you know
0:17:20use the same pipeline across these different
0:17:24similar tasks but they're
0:17:26each probing different things
0:17:28so you start out with something perhaps which is because it's a limited domain you
0:17:33don't have much data anyway
0:17:35you start with more expert knowledge
0:17:37and then start collecting data
0:17:41through wizard-of-oz or some kind of crowdsourcing or something like that
0:17:45and ultimately get more data so that you can kind of build a more hybrid kind
0:17:49of system
0:17:50which could either be end-to-end but also be informed by
0:17:55knowledge and so
0:17:58that's one way to
0:18:00i guess look at the problem kind of look at
0:18:03different points along this hybridization spectrum a combination of data-driven and knowledge-driven approaches
0:18:08has implications for how you're pipelining a system and training it of course
0:18:13well i certainly don't agree
0:18:17while you guys
0:18:19but you know some of the techniques
0:18:22for instance are not gonna be particularly
0:18:27appropriate for certain types of tasks
0:18:29so for instance i think attending to a knowledge base versus
0:18:33computing an actual complex query those two things can actually be very different
0:18:41frontal use a probability comparative and things like that
0:18:43it's not obvious to me that attention might solve that
0:18:47i guess that's related to the first question that you will probably address the kind
0:18:52of dialogues that you can solve with this
0:18:56method and the other ones you will not address
0:18:59so that's of course the risk of
0:19:03where this research is going that we just keep drilling into the problems that we
0:19:08started with and we are not expanding the scope
0:19:13talking about expanding the scope
0:19:15i want to talk about
0:19:17or have you guys talk about multimodal dialog so i've got
0:19:22not just
0:19:23the speech but other modalities and they're coordinated in interesting ways
0:19:29and about multiparty dialogue
0:19:32which guys
0:19:35take any of your favourite speakers and stick it in a family stick it in a home environment
0:19:40with a family now have a conversation with your family
0:19:43and that device
0:19:44and it can track conversation amongst multiple people what time do you want to eat
0:19:49mary wants to eat at three o'clock mary says no i don't
0:19:53okay so what's the system to do
0:19:56what is it representing as to what happened in that dialogue do we
0:20:03do we have any representation of you know what's the belief state
0:20:07that we've seen in all these
0:20:10all these papers is there any notion of belief actually going on
0:20:18i mean there's a huge amount to break open once you
0:20:21start working in the multi party setting and then there's the physical situation of actually
0:20:26having a robot or you know alexa is physically situated it's got a camera on it
0:20:32and i'm sure they have that right and it's
0:20:37and it can see what's going on in the room it can see who's talking
0:20:40who's talking to whom you know
0:20:44what do you track out of all of that and how is it gonna actually help the
0:20:47family rather than just
0:20:49an individual or a bunch of individual conversations
0:20:53this is a way bigger space than what we've been dealing with how are we
0:20:58gonna go
0:20:59well really worry about the multimodal multi party
0:21:03adaptation so this is very much the kind of dialogue that we
0:21:08are trying to model with furhat for example where you have multiple people and
0:21:12one problem there is as you say sort of the
0:21:19the belief state so typically you think about a belief state as what does
0:21:24the user want
0:21:26up to this point or what have we agreed on up to this point but if you have
0:21:29two people there might be of course two different states
0:21:33so if two people are ordering and one says
0:21:37i would like a burger and the other one says like me too
0:21:41but not with onions or something referring to that then you have to keep track
0:21:45of course of what the two different persons want and sometimes the dialogue
0:21:49it's also
0:21:51you can't just represent it as individuals it's common like we want to do
0:21:55this we would like to do exactly so then maybe you should have like three
0:21:59different representations one is what we want and one is
0:22:02what i want and the other is what the other one wants
0:22:05the goal is to come to a consensus but this is i mean if it's
0:22:08ordering things you could have different things and so on so it could be a
0:22:12mix of course
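A minimal sketch of the three-way representation suggested above: one belief per participant plus a shared "we" state, with a "me too but not with onions" turn resolved by copying and then modifying the other speaker's order. All class, method, and slot names here are hypothetical, and a real tracker would hold distributions over values rather than single values.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Order:
    item: str
    exclusions: Set[str] = field(default_factory=set)


@dataclass
class MultiPartyBelief:
    individual: Dict[str, Order] = field(default_factory=dict)  # what each person wants
    shared: Dict[str, str] = field(default_factory=dict)        # what "we" have agreed on

    def request(self, speaker: str, item: str) -> None:
        """Speaker states their own order."""
        self.individual[speaker] = Order(item)

    def me_too(self, speaker: str, other: str, without: Iterable[str] = ()) -> None:
        """Speaker adopts another participant's order, optionally with exclusions."""
        source = self.individual[other]
        self.individual[speaker] = Order(source.item, set(source.exclusions) | set(without))

    def agree(self, slot: str, value: str) -> None:
        """Record something the whole group has settled on."""
        self.shared[slot] = value


belief = MultiPartyBelief()
belief.request("person_a", "burger")
belief.me_too("person_b", "person_a", without=["onions"])
belief.agree("pickup_time", "seven p m")
```

The `me_too` step copies the exclusion set so that the second person's modification does not leak back into the first person's order.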
0:22:14and then there's the thing that you can refer to what the other person is saying
0:22:18but also of course that is to say if the two people are talking to each
0:22:21other to what extent is the system listening to that which it probably has to form
0:22:27a part of real data part of the we
0:22:30right if it's all of us together trying to solve this
0:22:35what we're gonna have what we're gonna order when we're gonna go out
0:22:39or whatever
0:22:41then the system has to be part of this collective
0:22:47and you have to have what we used to call in the old days joint intention
0:22:51what we're trying to do together
0:22:53but how would you guys think about
0:22:57this problem
0:22:58a multi-user problem i guess the other thing to add to the mix is the multi
0:23:04modality of things right so absolutely so for instance
0:23:09when you have audio and video
0:23:11which one do you attend to first and how do you choose
0:23:16and of course is unknown situation that something
0:23:20it's is just
0:23:22just missus usually is i
0:23:24so and this also what we found is that the so
0:23:28maybe looking largely the education context for this kind of thing the teacher training or
0:23:31something that you looking at
0:23:33for instance a person interacting with you know
0:23:37a teacher interacting with this
0:23:39you know able to a class of student outcomes
0:23:44you know if the teacher dismisses one student how are you know you know
0:23:48is the student or is one of the students to
0:23:52so suppose they say for instance you like a low the in great but i'm
0:23:55pointing in that direction so who does the system you know attend to or is
0:24:01it attending to my speech or is it attending to my gesture
0:24:04and this is always that kind of
0:24:07or buckets may or but
0:24:12to try to put a positive spin on that i think we are at the stage where we can
0:24:17do belief tracking for sure it is not at the level we want it
0:24:21to be at but i believe we have developed systems or are very close to
0:24:27the technology being at the point where we can actually do joint inference over video audio
0:24:32and textual signals where we can actually disentangle you know between different entities all you
0:24:40know conversing at the same time and we can do this at scale
0:24:44you could do that but then how do you
0:24:46relate that to prior knowledge of the simulated user that's the second point i mean i'll give
0:24:51you a different scenario like that so
0:24:54imagine it's not just like you know collaborative where you know you
0:24:58can actually attribute that to a specific entity what if it's a parent and child
0:25:02now whose preference do you take into account the child says play the cartoon network
0:25:07in a loop for twenty four hours right for example will alexa do that or who
0:25:11will do this obviously there's a preference here like the parents have to sort
0:25:16of win there
0:25:17it's a very tricky situation and it might not be as easy as building
0:25:21some sort of a general-purpose model that says you know these are the entities and
0:25:25like there's one model for okay there are two people interacting and they have a
0:25:28joint intent right it might be customisable per household or you know set of
0:25:33people and these might all vary across different sets of people put together
0:25:37and the relationships between them as well so all these things have to be factored
0:25:41in right i mean it's a challenging mix of problems
0:25:46the simple thing is we don't have to learn everything right i mean i suppose
0:25:50everybody thinks with machine learning we have to learn everything you can just ask the
0:25:54user for a preference one time you could just ask a person or people
0:25:58tell me what's your preference or just manually enter it like in an app or
0:26:02whatever it is right i mean just that one bit is enough to
0:26:06sort of bootstrap the system or at least lock in a bunch of variables right which you
0:26:11know would have caused a lot of confusion downstream
0:26:15so there's still hope i mean it
0:26:18has to be this interactive mode not this system observing a bunch of things and
0:26:22learning and then like suddenly starting to do the right thing at some point in time
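One way to picture the "just ask once" suggestion is a per-household table that is filled in by an explicit question rather than learned from observation. The sketch below is an assumption-laden toy: the class name, the ranking scheme, and the convention that `None` means "ask the household" are all hypothetical.

```python
from typing import Dict, Optional


class HouseholdPolicy:
    """Stores explicitly stated priorities for one household (lower rank wins)."""

    def __init__(self) -> None:
        self.rank: Dict[str, int] = {}

    def record_answer(self, person: str, rank: int) -> None:
        """Store a priority the household told us directly, e.g. via an app setting."""
        self.rank[person] = rank

    def resolve(self, requests: Dict[str, str]) -> Optional[str]:
        """Pick the request of the highest-priority known speaker.

        Returns None when no speaker has a stored priority, which signals that
        the system should ask instead of guessing.
        """
        known = [person for person in requests if person in self.rank]
        if not known:
            return None
        winner = min(known, key=lambda person: self.rank[person])
        return requests[winner]


policy = HouseholdPolicy()
conflict = {"child": "play cartoons for twenty four hours", "parent": "play the news"}
before_asking = policy.resolve(conflict)   # nothing stored yet, so the system should ask
policy.record_answer("parent", 0)
policy.record_answer("child", 1)
after_asking = policy.resolve(conflict)
```

The stored answer plays the role of the "one bit" that locks in variables which would otherwise have to be inferred downstream.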
0:26:28alright i'll move
0:26:30we finish what time
0:26:33six about
0:26:34and we
0:26:35okay and i think we wanna have a fixed
0:26:38so giving an audience participation
0:26:40so i will try to move along with some of the other
0:26:50but the next one
0:26:52that i had in mind was explainability
0:26:55okay so we have all these lovely machine learning systems
0:26:59you ask any of them why did you say that what do you get
0:27:07now the system could make up
0:27:10why it said that but you actually want why it said it to be causally connected to
0:27:14what it actually did
0:27:17so what
0:27:19kind of architectures can you imagine
0:27:22that will gain us
0:27:24explainability
0:27:27in the general case
0:27:34whom like this
0:27:38i mean
0:27:39first the question is do you as a user really need to be able to
0:27:42ask that i mean are users interested in why the system
0:27:46recommended that i think as a dialog designer i definitely want to
0:27:50know that but then the question is do you have to get the answer to
0:27:52talk about restaurant we wanted me to go to
0:27:55you give me recommendations s a y okay
0:27:58so in that case like this
0:28:01i didn't you suggest that
0:28:03and i think that this not of course if it's if it's learn julie
0:28:14i between a and especially then you have to build a dialogue
0:28:17around that so wherever you're building your dialogue you have to train a
0:28:22dialogue on explaining
0:28:28and there you might not have that data
0:28:31well that part of the point is
0:28:35i'll just offer a counterpoint so for instance
0:28:39in education this is really important so if i'm and this is true for
0:28:45health mental health and any other field like that so if i and perhaps
0:28:50radix as well
0:28:51so if i'm you know telling a patient that you know what you have depression with
0:28:57seventy five percent probability they probably want to know what's what
0:29:00they probably want to know why or how you came to that conclusion
0:29:04or the same thing with someone where you're saying oh you know what
0:29:07your fluency score is nine out of ten or
0:29:11four out of ten why is it four and what do i need to work on
0:29:15so in those kinds of cases it's really important having said that i think there
0:29:20is an increasing body of work in the ml literature especially for those interested
0:29:25in end-to-end models
0:29:26to and
0:29:28you know similar deep learning models to really look at interpretability using a variety of techniques
0:29:33and i think that has been relatively unexplored in the dialogue community but
0:29:38i think we should really
0:29:40this is one of those things i would really add to i think one of
0:29:44those questions a little bit is what would you ask your graduate students or next-generation
0:29:49exactly one is interpretability but there are several techniques so there are techniques that
0:29:55try to probe deep neural networks and try to figure out what inputs are the
0:29:59most salient that you know lead to a classification
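One simple family of input-salience probes of the kind alluded to above is leave-one-out occlusion: remove each token in turn and measure how much the model's score changes. The sketch below assumes a black-box `score` function; the toy scorer and its word weights are invented stand-ins for a trained classifier, not a real model.

```python
from typing import Callable, List, Tuple

# Hypothetical word weights standing in for a trained model.
TOY_WEIGHTS = {"great": 2.0, "not": -3.0}


def toy_score(tokens: List[str]) -> float:
    """Stand-in for a model's positive-class score."""
    return sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)


def leave_one_out_salience(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Salience of a token = score with the token minus score without it."""
    base = score(tokens)
    return [
        (tok, base - score(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


salience = leave_one_out_salience(["not", "great", "food"], toy_score)
```

A probe like this tells an engineer which inputs mattered; whether that amounts to an explanation an end user can understand is the separate question the panel goes on to raise.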
0:30:03there are techniques that look at
0:30:05visualizing neurons and techniques that look at visualising memory units
0:30:11and all the way up to so this is in terms of model interpretability but
0:30:14but even in terms of feature interpretability but do you believe that will actually get you
0:30:18up to a comprehensible
0:30:21explanation to an actual end user
0:30:23not have them but so you wanna say something
0:30:27i was just gonna say that my point is gonna be about
0:30:30just because we say that a network is explainable doesn't mean i mean it depends on
0:30:35you know who is looking at it right i mean if it says okay activation
0:30:38number four three six is firing and that's causing like the positive class to go up
0:30:42by probability x right
0:30:45to the ml engineer or scientist who's actually building this model oh great okay now go
0:30:49fix it or you know like do something but i think what's probably
0:30:53more interesting at least for nlp and dialogue would be like are
0:30:58there is some high-level abstractions or even you don't have to you know incomprehensible
0:31:02i sense that it can actually find in the let's eight knots alignments right where
0:31:06these sets of examples of like are basically leading to the same sets of outcome
0:31:11right i mean at higher level right so that higher level at time t right
0:31:15you could be of the phrase a level i could be at the semantic level
0:31:18but obviously a single higher i mean
0:31:21building an explainable system would then become as hard as actually building the full system itself right
0:31:26so then
0:31:28and so this is why i think the field has to go hand in hand
0:31:30with like you know the modeling work and also all the other work and applications
0:31:34well the vision community if you like has advanced further in this respect
0:31:40than the nlp community not just for probing networks and looking at activations but even
0:31:45learned approaches where you actually backprop through the network and
0:31:48look at regions and like you know sort of find like learn in online fashion
0:31:51which regions actually and what salient natural colours et cetera are triggering certain types of
0:31:56behaviours and sort of interpreting back in a discrete fashion like it's a colour
0:32:01map or like certain types of object patterns or you know like
0:32:05triangles et cetera
0:32:07i think we want to see more of that in the nlp community the most interesting work
0:32:12that i've seen in the recent past is like you know more of the probing type
0:32:15where you have these black box networks and the other methods are actually trying to
0:32:20probe okay where are they gonna fail when are they gonna fail right and you'd
0:32:24be very surprised
0:32:26some of the state-of-the-art systems you just change one word in the input utterance and
0:32:29suddenly it'll flip the probability so there's a lot of women lineman other types of
0:32:33methods which are looking at these things so i think explainability and interpretability go
0:32:37kind of hand in hand
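The one-word sensitivity described above can be probed with a small substitution search: swap each token for each candidate replacement and record the swaps that change the predicted label. This is a sketch under stated assumptions; `toy_predict` is an invented keyword classifier standing in for a black-box model, and the substitute list is hypothetical.

```python
from typing import Callable, List, Tuple


def toy_predict(tokens: List[str]) -> str:
    """Stand-in black box: labels an utterance by a single trigger word."""
    return "booking" if "reserve" in tokens else "other"


def single_word_flips(
    tokens: List[str],
    substitutes: List[str],
    predict: Callable[[List[str]], str],
) -> List[Tuple[int, str, str]]:
    """Return (position, original word, substitute) for swaps that change the label."""
    original_label = predict(tokens)
    flips = []
    for i, word in enumerate(tokens):
        for sub in substitutes:
            if sub == word:
                continue
            variant = tokens[:i] + [sub] + tokens[i + 1:]
            if predict(variant) != original_label:
                flips.append((i, word, sub))
    return flips


flips = single_word_flips(["please", "reserve", "a", "table"], ["hold"], toy_predict)
```

Here a near-synonym at one position is enough to flip the label, which is the kind of brittleness such probing is meant to surface.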
0:32:39but realize it's the consumer that you need to explain it to
0:32:43it's not just
0:32:44probing a neuron
0:32:48and so i think we actually need to come that's and groups and there are
0:32:52many people in the room who've worked on this problem
0:32:56in the past and i think that certainly
0:33:00the learned systems need to figure out how they're gonna do this because
0:33:06if you don't the european union will
0:33:13just the point i think the good news though is that i mean if you
0:33:16see the number of papers on this topic right you know over the last just
0:33:19two years i mean this is a very encouraging sign right so it used to
0:33:23be like who wants to actually talk about explanations i just built the
0:33:27system it does state-of-the-art you know like x y z
0:33:30and now i think for grad students i think it's a very interesting and very
0:33:34exciting field to be part of okay so that's the next question what's the most
0:33:37important thing people ought to be working on right
0:33:43i have my data
0:33:45you've got
0:33:47so i mean to start with i think it's very important that
0:33:50people work on different things so
0:33:54so we have a lot of different approaches but we can compare sum up everyone
0:33:59does similar things
0:34:02i also think sort of the
0:34:05in the intersection between dialogue
0:34:08speech and multimodality and so on because these are kind of still separate fields so
0:34:14i mean if you look at
0:34:16this google duplex demo for example that got a lot of attention and people
0:34:22thought that wow this sounds really human like
0:34:25so if you look on a sort of
0:34:28pragmatic level if you make a transcript out of that
0:34:31it's not a very sophisticated dialogue model but the execution
0:34:36is great i mean we don't know if that was a cherry picked example but as
0:34:41it sounds at least it sounds fantastic so to be able to actually execute the dialogue
0:34:49in a way that has that kind of turn taking and that kind of
0:34:53conversational speech synthesis and so on
0:34:56using a model of the dialog i think that's something that is
0:35:01underexplored in both the speech and the dialogue community
0:35:08explainability is
0:35:09super important
0:35:11would say that
0:35:12i mean this sounds like there's so many factors associated or like multiple areas associated
0:35:16with this building more systems so that we can make the systems less brittle there are a
0:35:22number of ways to achieve this right and
0:35:25that's a very important topic and you can do that in a number of ways from the
0:35:28ml community from like injecting more structured knowledge one of the things that all
0:35:33these things lead to in my opinion is like
0:35:37not just for generation but all the other aspects of dialog really research problems
0:35:42what are the minimum viable sort of nuggets of knowledge that we have to encode in
0:35:47the brain or the system has to encode so that it can learn to generate well it
0:35:51can then do recognition do the slots and intents well it can be transferred to
0:35:55a new domain so
0:35:57is that like what is the equivalent of a knowledge graph right i mean
0:36:00for like different dialogue systems i mean that we can actually sort of we can
0:36:03all agree on so i think if we come up with like some sort of
0:36:06a shared representation of that i mean which is interpretable to at least to some
0:36:09extent then i believe
0:36:12you know we can actually make even more progress right of course it's
0:36:15a hard problem right i mean and dialogue is like one of the hardest problems
0:36:19in natural language as well so
0:36:21it's not just for looking up is what i'm talking about is like what are
0:36:25the things about like you know the channel well right i mean it doesn't have
0:36:29to cover hundred percent even like twenty percent of the knowledge can be encoded in
0:36:33the concept space and relationships between them such that i know this now for a
0:36:37new domain i might have to just
0:36:40get like access to very small amount of training data or like learn a little
0:36:43bit more do sort of market into existing concept or like sort of augmented by
0:36:47existing concept you know database
0:36:50i think that's
0:36:52a super interesting thing and this could be multimodal as well it's not just about
0:36:55like you know language it's about like
0:36:57what are the visual concepts i need to keep in mind right i mean the
0:36:59taxonomy of like how objects relate to each other if i see a chair in front of a
0:37:04table i mean i know you know what is the positional relevance between you know
0:37:07different things
0:37:08all these spatial coherence all these sort of thing freedom and so what are the
0:37:11minimum viable sets of relationships and you know concepts that we need to one
0:37:16but better dialogues
0:37:21since gabriel and since you have already covered tons of things i'll say something complementary
0:37:25to that but add to this because i think these are really interesting problems and
0:37:28it was
0:37:30gonna at least my list anyway
0:37:33i just add that the
0:37:36working on low resource problems
0:37:38so for instance we already we always
0:37:41so this is in terms of languages domains
0:37:44and even you know the kinds of data sets that we kind of see we
0:37:49tend to overtrain and this has been this is nothing new everyone
0:37:52here knows about this we all what we tend to do is overtrain on the restaurant
0:37:55data sets or the cambridge datasets for good reason of course because they're publicly available
0:37:59but that's
0:38:00that's one thing but
0:38:03you know
0:38:04apart from plano get more data sets and that's obviously one of the things we
0:38:08want to do but
0:38:10you know can we look into how do we do domain adaptation
0:38:13i don't this work already going on but perhaps more intense there's a lot of
0:38:17work on c one shot
0:38:19but trying to you know
0:38:21look at the better ways of adaptation better ways of working on new domains
0:38:28that with limited resources
0:38:30a given the existing resources perhaps using
0:38:33you know since but you know it begins by very techniques for machine translation or
0:38:38some other
0:38:40some of these other sister fields that
0:38:42you know we might not think of immediately but for instance
0:38:45this is starting to come up a lot more
0:38:47trying to use data which you know
0:38:51is kind of unconventional for dialogue but might be useful for bootstrapping these kind
0:38:55of low resource settings
0:38:56that might be
0:38:58also something very interesting and useful to look at
0:39:01and especially for underserved domains so okay coming back to my two medicine and education
0:39:08these are not necessarily the kind of how may i help you or you know
0:39:12booking or those kinds of
0:39:16domains but i think there's to you know this is where you have a lot
0:39:19less data but still
0:39:21might be useful to kind of
0:39:24one thing we have very large loud structure maybe global don't it's block structure to
0:39:30the group
0:39:36and then
0:39:37that's all unique
0:39:40it's just the known structure and after that you already know how to have a
0:39:43cons you know what objects are you know what the actions are you know what
0:39:47the verbs are you know what their preconditions and effects are why do you need
0:39:55but i mean dialogue constantly able some the well unreasonable has a file that is
0:40:00why don't why do we need any more than just
0:40:03a change and knowledge
0:40:07i don't need a big corpora "'cause" already learned head
0:40:11or in that got a huge vocabulary have that all these vectors
0:40:15one like just change the knowledge base
0:40:19then how because be to make it you know what's
0:40:21who needs universal just give me a alright i'm gonna do
0:40:25cancer diagnosis or i'm gonna do
0:40:29architecture where i'm gonna do whatever you know take arbitrary size
0:40:33i was just a great so for each of those domains you need that lack
0:40:36knowledge base and i
0:40:38i think i like that everybody may precision and that's what they're
0:40:44but even if the knowledge base is let's say huge and static reasoning over that is
0:40:49gonna keep changing right i mean the same knowledge you might interpret differently you know
0:40:55sometime later than what you're doing right now it could be because our
0:40:59methods are not sophisticated enough or
0:41:01you know maybe basically some new information pops up i mean the facts are the
0:41:05same but you know the way you look at that changes over time right i
0:41:10and one give users about example for this but i think
0:41:14i don't think the problems are gonna go away anytime soon if anything the machine
0:41:19translation "'em" even the low resource setting
0:41:21this is existed for several decades right i mean i mean number of not make
0:41:25a similar to what he an unsupervised machine translation like now we use starting to
0:41:29see okay that more system actually scalable systems working this domain and it's i think
0:41:34that feels all and all the ml all a computer vision
0:41:38has this tendency to okay we focus on like the solvable immediate big crunch and
0:41:43problems and then you try to simplify or then like you know extend to the
0:41:47zero shot setting extend to you know low resource setting but it's not like starting
0:41:52from scratch all the stuff we learned about imagenet i mean convolutions are still
0:41:56the single most useful blocks that you're transferring over and for language i
0:42:02would argue like over the last five years
0:42:04attention seems to be a common i get that seems to be trendy can have
0:42:07thousand variants of these networks but there's specific concept that even if transferred onto new
0:42:13problems right now you build models so
0:42:16hopefully these also would transfer you know as we start looking at new problems or
0:42:20extensions of
0:42:22well conceivably we should be thinking more about grand challenge problems but is going just
0:42:26usually an alexa challenge but
0:42:31larger ones you can get governments to support
0:42:34but you know governments now are gonna start asking us this last question
0:42:39which is
0:42:42so you built this wonderful technology
0:42:46and now i'm getting phone calls these are interactive phone calls that are trying to
0:42:51get me to do stuff
0:42:53either buy stuff
0:42:55or in the worst case commit suicide or you know a variety of activities
0:43:01and these are bots doing this
0:43:03and they understand language pretty well
0:43:06and they are
0:43:09there enough to cause some people to be convinced
0:43:13that they're dealing with a person
0:43:17and even as far back as eliza there were people who were convinced about
0:43:23the humanness of that but these are you know who knows and letting these
0:43:28things loose
0:43:30how do we start that and ask
0:43:32you know we've seen that we see what happened in computer vision where people weren't
0:43:36really paying that much attention
0:43:39and certainly it's being this
0:43:43how do we prevent our technology from misuse
0:43:48obviously it's our problem
0:43:53and then we'll turn over to the floor for any
0:43:55you know will have enough time for twenty minutes questions
0:43:58as only ten minutes
0:44:00so you know obviously you can do regulations that
0:44:05bots always have to say that they are a bot but the
0:44:11that would not will not stop people from doing that possibly
0:44:18so adversarial networks
0:44:21generated you know if the need for a year you're gonna have deep fakes in
0:44:26language processing and dialogue processing if we're ever successful
0:44:30in that it might also come to stage where i don't pick up the phone
0:44:33calls myself anymore but it's under your by six mile bit makes it up in
0:44:38order to see if it's about corpsman
0:44:40and they were talking to each other violent argue that is
0:44:43try to convince my but that it
0:44:47i don't know but that it actually happen that i mean it does so i
0:44:50don't have to take my calls but the local system takes the call for me
0:44:56which might be nice even if it's a human calling like having a secretary
0:45:01so and that could also be annoying so that in another way because the technology
0:45:05might not work so well to start with so your spouse is calling and
0:45:10your part sphere text you and it might it might cause system from millions correct
0:45:17so these are other problems also
0:45:22so i think with every technology i guess like
0:45:24there are both sides right i mean this example you said like bots talking to other bots and
0:45:30i mean be awake those are then we think no they can and the generations
0:45:34or at least for some of these things are super a good that don't have
0:45:37the time the natural language exactly me just know the right keywords or trigger words
0:45:41and it can now imagine one if your bot has access to credit card account and
0:45:44like the other what's a stock and then the code of the order you know
0:45:48like this like eighteen hundred dollar stuff right and
0:45:50it doesn't ask for a confirmation because the credit card info is already on so i think
0:45:54there like blog sites the both of these things right so but one thing i
0:45:59would say is
0:46:00we can't like just work on the research of like you know improving the dialogue
0:46:04systems the recognition the machine learning and then sort of ignore or like sort of
0:46:09reactively you know sort of go back or because of gdpr or
0:46:13something and go back and look at this problem right so this has also opened
0:46:16up new research in other fields right i mean and tested we can still process
0:46:20the bots are always gonna get better it's like spam right i mean
0:46:24you know you have there are multiple ways to deal with that right the
0:46:27research also has to be like sort of state-of-the-art in terms of like how to
0:46:31deal with adversaries so there are methods which actually now try to improve
0:46:36take the adversarial and flip it and try to improve the robustness of the system
0:46:40basically using the same kind of adversarial technique but like in a reverse way where
0:46:43you know the gradient goes in the other direction during training time
0:46:48one way to look at it
0:46:51in the commercial systems like should we make the so the monetary value or the
0:46:55like number of tries these bots get like sort of increasingly more challenging or like
0:47:00you know the amount of calls like many of these are generated you know thousands
0:47:03of times a day and auto generated right so if there's a monetary cost to
0:47:08it these companies won't exist right or they will actually change the strategy so there
0:47:12are different ways of looking at these problems like them in the cost effectiveness the
0:47:15research one thing is
0:47:18i don't think it's gonna go away and i think that's if we solve this
0:47:21like you know that was no problem right now be fixed towards can be something
0:47:25that's it's a continually changing problem one example is like when we released like some
0:47:29of the systems like you know smart reply et cetera what people don't know is we
0:47:33had to wait longer to actually build systems to actually
0:47:37detect sensitive content in the messages because you don't want any of these smart systems to
0:47:42say something stupid you'd rather not say anything than you know try to be smart
0:47:46and suggest responses and that's a continually evolving problem right and it's cultural it's you
0:47:52know depends on like the language so many different aspects to like
0:47:55so it's a very hard problem but better i mean those i think research also
0:48:00has to look into these aspects and like sort of
0:48:05going back to the phd students what kind of problems to work on i think we
0:48:08have plenty of problems that are uncovered by the advances we made in the last
0:48:13ten years right it's opening up like new areas for research as well so
0:48:18it's a constantly evolving challenge
0:48:21okay let's point we one open it
0:48:23okay let's open
0:48:26we got a mike
0:48:28we got a question
0:48:33i feel
0:48:38so i just want to follow on the explainability discussion
0:48:41i think one useful nuggets from watching be asserted that like video this morning is
0:48:46that the all the users in that skit didn't trust region a set on not
0:48:51sure about that
0:48:52and it made me think that trust is also very important for explainability
0:48:56and i was wondering more specifically
0:48:59if the panel thinks that symbolic
0:49:02representations are necessary for
0:49:05modeling that sort of explainability
0:49:07the structure for
0:49:08are we gonna the mean for the connectionist as a compared to connectionist models that
0:49:12we see today and the neural approaches
0:49:17well i think you can have both
0:49:20it occurs to me to use
0:49:22you ought to be able to
0:49:24train a
0:49:26neural system with an ai planning system
0:49:30and then you've got a very fast executing neural system planning can explore much
0:49:34bigger space than people can and then you actually have when you ask why did
0:49:39you say that then you go back to the planning system where essentially it's going
0:49:43to figure out why it said that
0:49:46right because there are causally connected you could imagine them
0:49:51actually producing the representation encoded to train it to do
0:49:56that would be my
0:50:00that's what am i get the answer questions
0:50:05so i think one more aspect about the trust is i mean
0:50:08do the users trust the devices or like the technology itself right and one
0:50:13interesting area that's i think fast case right now or like it's gonna be of
0:50:19increasing importance is privacy preserving ai and
0:50:22the notion is whether you know data lives on the device or you
0:50:26know what is shared you know to the cloud who can access it like i'm
0:50:30ideally percent where trust the veracity of the information that's coming back
0:50:34all these are interesting aspect right i mean i mean in addition to the symbol
0:50:37again initialize like the links the dimension i think this is going to so to
0:50:41be even more important in the coming years because like
0:50:45phone is where you are most of the time these days right i mean that's not
0:50:48gonna change if anything it's only gonna get worse right so and you're interacting
0:50:53with these voice systems like probably at an exponential rate if you have one of
0:50:58people and you have an unplugged so i
0:51:01well i don't know as can be irritating sometimes right so which makes people do
0:51:07so i think that's also an interesting and very useful aspect of trust and then
0:51:14there's a like a elevator version of that like
0:51:17regulations and gdpr like and imposing like in making sure like
0:51:21there are third party sources it's which can verify this information right and it's not
0:51:25just one central entity that you know is being out and you believe everything right
0:51:33more questions
0:51:40not until january see so i wanted to make a comment and then the what
0:51:46so the first one that i cannot algorithm or with in not being open to
0:51:51an out-of-domain multi-modality explainability we can already that's done had candle names and
0:52:01an alarming domain may human learning machine learning domains and what we need an does
0:52:08yes and the fact that we don't have large datasets and personally i can personally
0:52:14in my projects i can't wait for you know that they is a deep learning
0:52:20architecture tool
0:52:21be able to jump from restaurants easily to be able to understand the conversational that
0:52:27the patients and is engaging in when describing there is and so i'm not sure
0:52:33exactly what
0:52:35this solution is there but i see a narrowing that she actually and in a
0:52:44well as you need a narrowing on this task i wanted to and bring to
0:52:50your attention a very interesting paper i thought from ace it nothing to do we
0:52:55the each race and sharing and whatnot there is an accountant and that they are
0:52:59wasn't a and
0:53:01energy consumption and i one slip ring of and training what is the learning model
0:53:07as and i thought
0:53:09human there was a the task i wasn't sure so shall i you know some
0:53:15these technology i think that is also something that we may want to take into
0:53:20account when we
0:53:23train in retrained is machine learning
0:53:25using the people was completely i
0:53:27something like this is a difference you to ring radii screening so
0:53:34i think i can now that space and the last but i think the second
0:53:37point you made a is probably gonna be one of the most significant areas that
0:53:44are gonna come up like not just for nlp but anything touching ml in the
0:53:48next five years
0:53:49on how we can use compute i mean there's a general tendency of maybe just
0:53:52keep increasing the compute on the cloud right i mean and they can keep using
0:53:55as much as you want by segment via might arise and like you get access
0:53:58to more t v resources if you're that's not gonna be true i think what
0:54:02you will see is like
0:54:04we training with more sources but you're also building more models and if you look
0:54:07at some of the you know a statement going from some gladly well gonna i
0:54:12ten x more compute power and
0:54:14i think we especially my group are actually looking a lot at like on-device
0:54:19and also efficient machine learning and
0:54:22they used to be a concern that all
0:54:24these methods i mean if they have lower for rain or lexicon hundred printer memory
0:54:29you know their you know factor we have to sacrifice quality but i think at
0:54:34least for recognition classification sequence labeling et cetera and even for speech recognition too early
0:54:39this year i and i of you know
0:54:41seeing performance for these efficient models almost on par if not better than the see
0:54:45that so there's no reason to say that all i need all these resources to
0:54:49train the model there are much better ways to do it and that requires separately
0:54:53you know like you have to introspective research that goes into that optimisations and lex
0:54:58choices et cetera it's hard it's not there just making a black box
0:55:01there are some black box to the there but it's a very important problem
0:55:05and going to the first point about narrowing i think it is true but i
0:55:09wonder if it's not just the deep learning i mean and i'm sure this has
0:55:12happened in the you know previous decades as well and suddenly you know
0:55:16there's some spike in technology and you know everybody gravitates to that and then
0:55:21like over time that changes and like
0:55:23i would see this like the rise in deep learning and the power of these
0:55:28networks as i mean just the core like you know that's something everybody knows they're
0:55:32very good function approximators so i would rather use a state-of-the-art model in one
0:55:38of those black box components like for language modeling utterances
0:55:42than having to think and tweak about like you know what model to use
0:55:45here right there are the focus on the domain problem vitamin like for how about
0:55:49the focus of the high-level system than like what is the utterance generation mechanism that
0:55:53i should use right it's hard but because
0:55:56requiring you know that was also understanding what goes on because how that interacts with
0:56:00the rest of the components but i would rather you and it's easier to access
0:56:04these components these days as compared to what it was before so there
0:56:09is i think a silver lining there
0:56:11you know that more people have access to these state-of-the-art models right now and they
0:56:15can use of mary's which of the using a very creative
0:56:20or in the back
0:56:25you on the smoothed from also
0:56:27and thank you for the discussion i have
0:56:30such as for the social impact
0:56:35what do you think we could do about informing end users
0:56:39about the dangers of these technologies like
0:56:43do you think maybe is feasible at some point
0:56:46actually building bots that help people
0:56:49recognize logical fallacies
0:56:51or marketing strategies and all these things what can we do what we do
0:56:57in terms of educating end users
0:57:00you mean how to get defensive but
0:57:03no an l c was pointing out not directly does it all the defence
0:57:07the end user but the bot that teaches the end user the
0:57:13logical fallacies about marketing strategies about the fact that there are bots around
0:57:19that try to manipulate you
0:57:22can we get this to the politicians
0:57:25i don't know logical fallacies input i mean them
0:57:29we have it is quite a small community compared to the
0:57:33entire population and of nobody knows about the politicians one okay
0:57:38there's just one can really get the robot calls
0:57:42so this
0:57:44i mean they're starting to care about deep fakes now that in the us congress
0:57:48all those congress people were
0:57:50misidentified as criminals from some fbi most wanted database
0:57:54they suddenly started caring
0:58:00so now they have now they care
0:58:04now we have but i mean i agree you could definite haven't the this is
0:58:09another application of this area of dialogue systems that are that is understudied
0:58:14and that's systems for training for example to train you to do a job interview
0:58:20so the system would be you and you would
0:58:22see what it's like or and that here i mean
0:58:26it is a training scenario but you could train in
0:58:29a lot of different domains
0:58:32or someone trying to sell something to you and trained on how to understand first
0:58:39is really trying to do and so on
0:58:41so this kind of
0:58:43training scenarios using dialogue system for that i think that's a huge
0:58:47well i like your idea of the defensive system because a lot of the
0:58:52systems that you don't you know all the ads that are being pushed actually
0:58:57are you know the kind of things that they're gonna come in lots of modalities
0:59:02right be auditory so your defensive system could take care of that for you say you
0:59:08know the all pass l one thanks very much
0:59:12you know on the defence
0:59:15and you are gonna have to talk to me first
0:59:20no i don't get to you don't get to pass along here
0:59:23you know what it is you trying to push and so on
0:59:25so i realise that may not be in the interest of
0:59:30of commerce but it may be in the interest of the
0:59:34the people who
0:59:35you know would like to be helped by these bots rather than attacked by
0:59:40so i think that was a great suggestion
0:59:50the all i mean it enters common but you know
0:59:53david just before dinner
0:59:56i think of the gordon not so i c
1:00:00i also discussed the remaining earlier about the well trained system versus the intelligent systems
1:00:07in kind of ties in sets in a more just question and what you guys
1:00:11had a higher rate maybe sort of that neural plus symbolic approach would be best
1:00:17so why do you think more people are working on this kind of approach now
1:00:21i didn't say people working on it but
1:00:24i think just to the point of
1:00:27what should we could we be looking at i think this is something that you know we
1:00:31want to probably look into more believable
1:00:33as opposed to you know just running behind and again i'm not think this is
1:00:38happening but
1:00:39there is the temptation to kind of you know see this huge dataset which is
1:00:43there and it's easy to publish on and this is easy to get for
1:00:47instance there's tensorflow there's keras there's pytorch so it's very easy to
1:00:51kind of plug and play models right now
1:00:54and so yes we should probably do that but as long as the problem is
1:00:58well motivated
1:01:00but you know
1:01:03that temptation apart it would be good to kind of
1:01:05look at other aspects of the problem that are not just statically plug and play
1:01:10i think that going
1:01:12last question
1:01:13believe it today the tram
1:01:17we're related to the you were so maybe a false dichotomy between pipelining and
1:01:24maybe other alternate but
1:01:27i mean in this slide i think
1:01:30the real issue is more modularity okay where it doesn't necessarily imply sequential process or not
1:01:38it's a limited modular where
1:01:42there is influence usually in both directions which makes a point or
1:01:50for this set is the set of
1:01:54goals you're saying it may maybe for simple task execution fairly limited enumerable but
1:02:01when one h in dialogue with other people
1:02:06real situations
1:02:08we're usually thinking about multiple matches completing a single task so all the pieces of
1:02:13language or for
1:02:16user or one
1:02:19versus there are also useful for finding this reason how much my
1:02:26placing r c is giving
1:02:31so relations
1:02:33future work so the constraints first questions also
1:02:42either these extremes is really getting that's
1:02:47like a travel agent you'll probably
1:02:53constrained problem for ways but not just words this separate problem
1:03:01simple examples
1:03:04you think about like this
1:03:05speech like is this you know in four or were question
1:03:10it's not a separable from a propositional content fine
1:03:16chance it's like functional transformation
1:03:18after a little and g i let's you to constrain a be you can say
1:03:28and you know what i think about speaker identification
1:03:34well thank you all for coming and i think we have a dinner next