0:00:17 But first we have a session chair.
0:00:24 Okay, thank you. So we now move on to the first keynote.
0:00:40 The first keynote speaker is Mirella Lapata, a professor of natural language processing in the School of Informatics at the University of Edinburgh.
0:00:55 Her research focuses on getting computers to understand, reason with, and generate natural language, so she will talk about that kind of research activity. There is more information in the proceedings, so I won't say more.
0:01:18 Okay, right. Can you hear me? All right.
0:01:23 Right, so as the chair was saying earlier, this talk is going to be about learning natural language interfaces with neural models. I'm going to give you a bit of an introduction as to what these natural language interfaces are, and then we're going to see how we build models for them, what problems are related to them, and what future lies ahead.
0:01:44 Okay, so what is a natural language interface? It's the most intuitive thing one wants to do with a computer: you just want to speak to it, and in an ideal world the computer understands and executes what you wanted it to do.
0:02:02 And this, believe it or not, is one of the first things that people wanted to do with NLP. So in the sixties, when computers barely had any memory, when we didn't have neural networks, none of this, the first systems that appeared out there had to do with speaking to the computer and getting some response. So Green et al. in 1959 presented a system called the Conversation Machine, and this was a system that had conversations with a human. Can people guess what about? The weather. Well, it's always the weather: first the weather, and then everything else. So then they said, okay, the weather is a bit boring, let's talk about baseball. And these were very primitive systems: they just had rules, they had grammars, you know, it was all manual. But the intent was there: we want to communicate with computers.
0:03:02 Now, a little bit more formally, what the task entails is this: we have natural language, and the natural language has to be translated, by what you see as the arrow there, the parser, which you can think of as a model or some black box, into something the computer can understand. And this cannot be natural language; it must be SQL, or lambda calculus, or some internal representation that the computer has, so that it can give you an answer. Okay.
0:03:35 So, as an example, and this one has been very popular within the semantic parsing field: you query a database, but you actually don't want to learn the syntax of the database, and you don't want to learn SQL. You just ask the question "What are the capitals of states bordering Texas?", and you translate this into the logical form you see down there. You don't need to understand it; it's just something that the computer understands. You can see there are variables; it's a formal language. And then you get the answer, which I'm not going to read out; you can see here that Texas borders a lot of states.
0:04:11 Now, apart from asking databases questions, another task, and this is an actual task that people have deployed in the real world, is instructing a robot to do something that you want it to do. Again, this is another example: if you have one of these little robots that make you coffee and, you know, go up and down the corridor, you can say "at the chair, move forward three steps past the sofa". Again, the robot has to translate this into some internal representation that it understands, in order not to crash against the sofa.
0:04:48 Another example is actually doing question answering, and there are a lot of systems like this using a big knowledge base like Freebase. Freebase doesn't exist anymore; it's now called the Knowledge Graph. But this is a huge graph with millions of entities and connections between them, and Google is using it: when you ask a question, Google has many modules, but one of them does this. So one of the questions you may want to ask is "Who were the male actors in Titanic?", and again this has to be translated into some language that Freebase, or your knowledge graph, understands. You can see here this is expressed in lambda calculus, but you would have to translate it into some SQL-like query that, again, Freebase understands. So you see, there are many applications in the real world that necessitate semantic parsing, or some interface with a computer.
0:05:44 And here comes the man himself, Bill Gates. As you know, MIT publishes this Technology Review; it's actually very interesting, I suggest you take a look, and it's not very MIT-centric, they talk about many things. So this year they went and asked Bill Gates: what do you think are the new technological breakthroughs and inventions of 2019 that will actually change the world? And if you read the review, he starts by saying, you know, I want to be able to detect premature babies. Fine. Then he says, you know, the cow-free burger, so no meat, you make burgers, because the world has so many animals. Then he talks about drugs for cancer, and the very last one is smooth-talking AI assistants. So semantic parsing comes last, which means, you know, it's very important to Bill Gates. Now, I don't know why; I mean, I do know why. But anyway, he thinks it's really cool.
0:06:50 And of course it's not only Bill Gates. Every company you can think of has a smooth-talking AI system, or is working on one in the back of their heads, or they have prototypes, and there are so many of them. So Alexa is from your sponsor, there is Cortana, there is Siri, and Google decided to be different: they call it Google Home, not some female name, thank God. So there are gazillions of these things. And can I see a show of hands: how many people have one of them at home? Very good.
0:07:28 Do you think they work? How do you think they work? Exactly. So, these things set alarms for me all the time. I mean, they work if you're in the kitchen and you say "Alexa, set a timer for half an hour", or you want to monitor the kids' homework. But we want these things to go beyond simple commands.
0:07:56 Now, there is a reason why there's so much talk about these smooth-talking AI assistants: the good they could do in society, for less able people, for people who cannot see, for people who are, you know, disabled, is actually pretty huge, if it worked. Now I'm going to show a video here; the video is a parody of Amazon's Alexa, and when you see it you will understand immediately why.
0:08:28 There's no sound. Hello? We checked the sound as well before... should I do something? The volume is raised to the max.
0:08:53 [A video plays: the Saturday Night Live parody ad for the "Amazon Echo Silver", a smart speaker designed specifically for the greatest generation. It responds to any name even remotely close to Alexa, plays the music they loved when they were young, has a "quick scan" feature to help them find things, and a feature for long, rambling stories.]
0:11:22 Okay, that's a Saturday Night Live sketch, but you can see how this could help the elderly, or those in need. It could remind you, for example, to take your pills, or, you know, it could help you feel more comfortable in your own home.
0:11:38 Now, let's get a bit more formal. So what are we going to try to do here? We will try to learn this mapping from the natural language to the formal representation that the computer understands, and the learning setting is: we have sentence and logical form pairs. And by the way, I will use the terms logical form and meaning representation interchangeably, because the models we will be talking about do not care about what the meaning representation, the program, if you like, that the computer will execute, is. So we assume we have sentence and logical form pairs, and this is the setting that most of the previous work has focused on. So it's like machine translation, except that, you know, the target is an executable language.
0:12:33 Now, this task is harder than it seems, for three reasons. First of all, there is a severe mismatch between the natural language and the logical form. So if you look at this example, "how much does it cost, the flight to Boston", and look at the representation here, you will immediately notice that they're not very similar: the structures mismatch. And not only is there a mismatch between the logical form and the natural language string, but also between the logical form and the string's syntactic representation, so you couldn't even use syntax if you wanted to get the matching. So here, for example, "flight" would align to "flight", "to" to "to", and "Boston" to "Boston", but then "fare" corresponds to this huge natural language phrase, "how much does it cost", and the system must infer all of that.
0:13:32 Now, that was the first challenge, the structure mismatching. The second challenge has to do with the fact that the formal language, the program, if you like, that the computer has to execute, has structure, and it has to be well-formed. You cannot just generate anything and hope that the computer will give you an answer. So this is a structured prediction problem, and if you look here, for "the male actors in Titanic" there are three meaning representations. Do people see which one is the right one? I mean, they all look similar; you have to squint at it. The first one has unbound variables, the second one has a parenthesis that is missing, so the only right one is the last one. You cannot do this approximately. It's not like machine translation, where you're going to get the gist of it; you actually need to get exactly the right logical form that executes on the computer.
0:14:29 Now, the third challenge, and this is one that the people who developed Google Home and Alexa immediately noticed when they deployed them, is that the same intent can be realised with very many different expressions: "who created Microsoft", "Microsoft was created by", "who founded Microsoft", "who is the founder of Microsoft", and so on and so forth. And all of that maps to this little bit of the knowledge graph, which says, well, Paul Allen and Bill Gates are the founders of Microsoft. And we have to be able, the system has to be able, your semantic parser, to actually deal with all of these different ways that we can express our intent.
0:15:14 Okay. So this talk has three parts. First, I'm going to show you how, with neural models, we deal with this structural mismatch, using something that is very familiar to all of you, the encoder-decoder paradigm. Then I will talk about the structured prediction problem, and the fact that your formal representation has to be well-formed, using this coarse-to-fine decoding algorithm, which I will explain. And then, finally, I will show you a solution to the coverage problem.
0:15:49 Okay. Now, I should point out that there are many more challenges out there that I'm not going to talk about, but it's good to flag them. Where do we get the training data from? I told you that we have to have natural language and logical form pairs to train the models. Who creates these? Some of it is actually quite complicated. What happens if you have out-of-domain queries, if you have a parser trained on one domain, let's say the weather, and then you want to use it for baseball? What happens if you don't have only independent questions and answers, but they are co-dependent, there is coreference between the queries? Now we're getting into the territory of dialogue. What about speech? We all pretend here that speech is a solved problem; it isn't, and a lot of the time Alexa doesn't understand children, doesn't understand some people with accents, like me. And then you talk to the Amazon people and you say, okay, so do you use the lattice, the good old lattice? No, we use a lattice of one, because, you know, if it's too rich it slows us down. So there are many technical and practical challenges that, you know, all have to work together to make this thing work.
0:16:59 Okay. So let's talk about the structure mismatches. Here the model is something you all must be a bit familiar with: there are three or four things in neural models that get recycled over and over again, and the encoder-decoder framework is one of them. So we have natural language as input; we encode it using an LSTM, or whatever favourite model you have. You could use a transformer, although transformers don't work for this task, because the datasets are small. Whatever it is, you encode the input and get a vector out of it; then this encoded vector serves as input to another LSTM that actually decodes it into a logical form. And you will notice here I say you decode it into a sequence or a tree. I will not talk about trees, but I should flag that there is a lot of work trying to decode the natural language into a tree structure, which makes sense, since the logical form has structure: there are parentheses, it is recursive. However, in my experience, these models are way too complicated to get to work, and the advantage over assuming that the logical form is a sequence is not that great. So, for the rest of the talk, we will assume that we have sequences in and we get sequences out, and we will pretend that the logical form is a sequence, even though it isn't.
0:18:32 Okay. A little bit more formally, the model will map the natural language input, which is a sequence of tokens x, to a logical form representation of its meaning, which is a sequence of tokens y. And we are modeling the probability of the output, the meaning representation, given the input. The encoder will just encode the language into a vector; this vector will then be fed into the decoder, which will generate the logical form conditioned on the encoding vector. And of course we have the very important attention mechanism here. The original models did not use attention, but then everybody realised, in particular in semantic parsing, that it's very important, because it deals with this structure mismatching problem. So, and I'm assuming people are familiar with this: instead of generating the tokens in the logical form one by one without considering the input, attention will look at the input and weight the output given the input, and you will get some sort of certainty that, you know, if I generate "mountain", it maps to "mountain" in my input.
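The weighting the speaker describes can be sketched in a few lines. This is a minimal dot-product attention over made-up vectors, not the talk's actual model:

```python
import math

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoded input token against the
    current decoder state, softmax-normalize the scores, and return the
    attention weights plus the weighted sum of encoder states (the context)."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]          # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return weights, context

# Toy example: three encoded input tokens of dimension 4. The decoder state
# is most similar to the second input token, so attention concentrates there.
encoder_states = [[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]]
decoder_state = [0.0, 5.0, 0.0, 0.0]
weights, context = attention(decoder_state, encoder_states)
```

In a trained parser, a high weight on an input word like "mountain" is exactly the soft alignment between input and output tokens that the speaker mentions.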
0:19:52 Now, this is a very simplistic view of semantic parsing. It assumes that not only the natural language is a string, but that the logical form is also a string, and this may be okay, but maybe it isn't; there is a problem, and I'll explain. So, we train this model by maximizing the likelihood of the logical forms given the natural language input; this is standard. At test time, we have to predict the logical form for any input utterance, and we have to find the one that actually maximizes this probability of the output given the input. Now, trying to find this argmax can be very computationally intensive; if you're Google, you can do beam search; if you're the University of Edinburgh, you just do greedy search, and it works just fine.
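Greedy search, as opposed to beam search, just takes the single most probable token at every step. A sketch with a stand-in scoring function; `fake_decoder` and its canned output are hypothetical stubs, where the real system would call the trained decoder:

```python
def greedy_decode(score_next, max_len=50, eos="</s>"):
    """Pick the single most probable token at each step until the
    end-of-sequence symbol. `score_next` maps the prefix generated so far
    to a dict of token -> probability (here, a stub for the decoder)."""
    output = []
    for _ in range(max_len):
        probs = score_next(output)
        token = max(probs, key=probs.get)   # argmax over the vocabulary
        if token == eos:
            break
        output.append(token)
    return output

# Hypothetical decoder that deterministically emits a tiny logical form.
CANNED = ["(", "lambda", "$0", "(", "flight", "$0", ")", ")"]
def fake_decoder(prefix):
    nxt = CANNED[len(prefix)] if len(prefix) < len(CANNED) else "</s>"
    return {nxt: 0.9, "<unk>": 0.1}

tokens = greedy_decode(fake_decoder)
```

Beam search would instead keep the k best prefixes at each step; the speaker's point is that for these datasets the greedy version is good enough.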
0:20:50 Now, can people see the problem with this assumption of actually decoding into a string? Remember the second problem that I mentioned: we have to make sure that the logical form is well-formed. And by assuming that everything is a sequence, I have no way to check, for example, that my parentheses are being matched. I cannot do this, because I've forgotten what I've generated, so I keep going until at some point I hit the end of the sequence, and that's it. So we actually want to be able to enforce some well-formedness constraints on the output. How are we going to do that? We're going to do it with this idea of coarse-to-fine decoding, which I'm going to explain.
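For context, even a plain sequence decoder can enforce the parenthesis-matching constraint just mentioned by masking out illegal tokens at each step. This is a hand-rolled sketch of that simpler idea, not the coarse-to-fine method the talk presents:

```python
def allowed(token, prefix):
    """Return False if emitting `token` after `prefix` would break
    parenthesis balance (closing with nothing open, or stopping early)."""
    depth = prefix.count("(") - prefix.count(")")
    if token == ")":
        return depth > 0           # never close an unopened parenthesis
    if token == "</s>":
        return depth == 0          # only stop once everything is closed
    return True

def constrained_argmax(probs, prefix):
    """During decoding, drop disallowed tokens before taking the argmax."""
    legal = {t: p for t, p in probs.items() if allowed(t, prefix)}
    return max(legal, key=legal.get)
```

This guarantees balanced output but nothing more; richer well-formedness (argument counts, types) is what the sketch-based decoding below targets.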
0:21:39 So again we have our natural language input here, "all flights from Dallas before 10 am". What we would do before is decode the entire natural language string into the logical form representation; but now we insert a second stage, where we first decode to a meaning sketch. What the meaning sketch does is abstract away details from the very detailed logical form; it's an abstraction. It doesn't have arguments, it doesn't have variable names; you can think of it, if you're familiar with templates, as a template of the logical form, of the meaning representation. So first we have the natural language decoded into this meaning sketch, and then we use this meaning sketch to fill in the details.
0:22:31 Now, why does this make sense? Well, there are several arguments. First of all, you disentangle higher-level information from lower-level information: there are some things that are the same across logical forms, and you want to capture them. So your meaning representation at the sketch level is going to be more compact: in ATIS, for example, which is a dataset we work with, the sketch uses 9.2 tokens on average, as opposed to 21 tokens, and 21 tokens is a very long logical form. Another thing that is important is at the model level: you explicitly share the core structure that is the same across multiple examples, so you use your data more efficiently, and you learn to represent commonalities across examples, which the other model did not do. And you provide global context for the fine meaning decoding; I have a graph coming up in a minute.
0:23:32 Now, the formulation of the problem is the same as before: we again map natural language input to the logical form representation, except that now we have two stages in this model. So we again model the probability of the output given the input, but now this is factorized into two terms: the probability of the meaning sketch given the input, and the probability of the output given the input and the meaning sketch. So the meaning sketch is shared between those two terms.
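Written out, the factorization just described, with x the input, a the sketch, and y the final logical form, is:

```latex
p(y \mid x) \,=\, p(a \mid x)\, p(y \mid x, a)
```

where each factor is itself generated token by token by its own decoder, and the sketch a appears in both terms.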
0:24:07 And I show you a graph here. So the green nodes are the encoder units, and the orange, or brown, I don't know how this colour comes out here, are the decoder units. So in the beginning we have the natural language, which we encode with your favourite encoder; here you see a bidirectional LSTM. Then we use this encoding to decode the sketch, which is this abstraction, the high-level meaning representation. Once we decode this sketch, we encode it again, with another bidirectional LSTM, into some representation that we feed into our final decoder, which fills in all the details we're missing. And you can see there that the red bits are the information that I'm filling in. You will also notice that this decoder takes into account not only the encoding of the sketch, but also the input. Remember, in the probability terms, it is the probability of y given x and a, where y is our output, x is the input, and a is the encoding of my sketch. Okay, this is why we say the sketch provides context for the decoding.
0:24:33 Okay. Now, training and inference work the same way as before: we are again maximizing the log-likelihood of the generated meaning representations given the natural language, and at test time we again have to predict both the sketch and the more detailed logical form, and we do this via greedy search.
0:26:01where do we find the meaning sketches
0:26:04and if the answer that i would like to give you use our work we
0:26:09would just an errand
0:26:11now
0:26:12that is fine we can their them
0:26:14but a first will try something very simple no show you examples because of the
0:26:19simple thing doesn't work then learning will never work
0:26:22so
0:26:24 So here are actual examples of the different meaning sketches for different kinds of meaning representations. Here we have logical forms, lambda calculus, and it's very trivial to understand how you would get the meaning sketches: you would just get rid of variable information. You know, the lambdas and the argmaxes stay, but anything that is specific to the instance we would remove; we would remove any notion of arguments, and any sort of information that may be specific to the logical form. So, you see here, these are the details, and this whole expression becomes "lambda 2"; there is no numeric information, these become placeholder variables. This is for logical forms.
0:27:13 If you have source code, and this is Python, things are very easy: you would actually just substitute tokens with token types. So here is the Python code: "s" will become NAME, "4" will become NUMBER, "name" here is the name of the function, and then this is a STRING. Of course, we want to keep the structure of the expression as it is, so we will not substitute delimiters, operators, or built-in keywords, because that would actually change what the program is meant to do.
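The token-type substitution described here can be sketched with Python's own `tokenize` and `keyword` modules; the actual preprocessing used in the work may differ in detail:

```python
import io
import keyword
import token
import tokenize

def python_sketch(code):
    """Abstract a line of Python into a sketch: replace identifiers,
    numbers and strings with their token types, but keep delimiters,
    operators and built-in keywords, which carry the program structure."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == token.NAME:
            # Keywords (for, in, if, ...) survive; identifiers are abstracted.
            out.append(tok.string if keyword.iskeyword(tok.string) else "NAME")
        elif tok.type == token.NUMBER:
            out.append("NUMBER")
        elif tok.type == token.STRING:
            out.append("STRING")
        elif tok.type == token.OP:
            out.append(tok.string)   # delimiters and operators kept verbatim
    return " ".join(out)

sketch = python_sketch("for s in range(10): print('hello')")
```

Here `sketch` comes out as `for NAME in NAME ( NUMBER ) : NAME ( STRING )`, which is exactly the kind of template the fine decoder then fills in.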
0:27:49 If we have SQL queries, it's again simple to get the meaning sketches. So above, you can see, this is the SQL syntax: we have a SELECT clause, and, since in SQL we have tables and tables have columns, we first have to select the column; and then we have the WHERE clause, which puts conditions on it. So in the example we're selecting a record company, and the WHERE clause puts some conditions: the year of recording for this record company has to be after 1996, and the conductor has to be Mikhail Snitko, who I think is a Russian composer. Now, if you want to create a meaning sketch, it's very simple: we will just keep the syntax of the WHERE clause, "WHERE > AND =", so we just have the WHERE clause and the conditions on it. These are not filled out yet, so they could apply to many different columns in an SQL table.
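A rough sketch extraction for the WHERE clause, as just described, might look like the following; the example query and the regex-based approach are illustrative only (a real system would work on a parsed query, not raw text):

```python
import re

def sql_where_sketch(sql):
    """Keep only the skeleton of the WHERE clause: the WHERE keyword,
    the comparison operators, and the ANDs joining the conditions.
    Column names and values are left for the fine decoder to fill in."""
    m = re.search(r"\bWHERE\b(.*)", sql, flags=re.IGNORECASE)
    if not m:
        return "WHERE"   # no conditions at all
    ops = re.findall(r"(>=|<=|!=|=|>|<|\bAND\b)", m.group(1),
                     flags=re.IGNORECASE)
    return "WHERE " + " ".join(op.upper() for op in ops)

# Hypothetical query in the spirit of the example on the slide.
q = ("SELECT record_company FROM records "
     "WHERE year > 1996 AND conductor = 'Mikhail Snitko'")
sketch = sql_where_sketch(q)
```

For this query the sketch is just `WHERE > AND =`, which, as the speaker notes, could be completed with many different columns and values.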
0:28:53 Okay, let me show you some results. So I'm going to compare the simple model that I have shown you, the simple sequence-to-sequence model, with this more sophisticated model that does constrained decoding, and this is compared to the state of the art, of course. The state of the art is a moving target, in the sense that now all these numbers, with BERT, and I assume people are familiar with BERT, all these numbers go up by some percent; so whatever I show you, you can add two or three percent in your head. So these models do not use BERT, and this is the previous state of the art. This is GeoQuery, and this is ATIS; I'm going to show you results for four different datasets, and it's important to see that it works on different datasets with very different meaning representations: GeoQuery and ATIS have logical forms, and then we have an example with Python code and one with SQL. So here, this first system uses syntactic decoding: it uses quite sophisticated grammatical operations that then get composed with neural networks to perform semantic parsing. This is the simple sequence-to-sequence model I showed you before, and this is coarse-to-fine decoding; so you do get a three percent increase. With regard to ATIS, and this is very interesting, since it has very long utterances and very long logical forms: again, the simple model does almost as well, remember what I said, that syntactic decoding does not give you that much of an advantage, and then again we get a boost with coarse-to-fine. And a similar pattern can be observed when you use SQL: you jump from seventy-four to seventy-nine. And on Django, where you execute Python code, again from seventy to seventy-four.
0:30:57 Okay. Now, this is on the side, and I'll just mention it very briefly. All the tasks I'm talking about here are dealing with the fact that you have your input and your output pre-specified: some human goes and writes down the logical form for the utterance, and the community has realised that this is not scalable. So what we're also trying to do is to work with weak supervision, where you have the question, and then you have the answer, and no logical form; the logical form is latent, and the model has to come up with it. Now, this is good, because it's more realistic, but it opens another huge can of worms, which is that you have to come up with the logical forms, you have to have a way of generating them, and then there is a lot of variance, because you don't know which ones are correct and which ones are not. So here, we show you a table; you're given the table, you're given "how many silver medals did the nation of Turkey win", and the answer, which is zero, and you have to hallucinate all the rest. So this idea of actually using the meaning sketches is very useful in this scenario, because it restricts the search space: rather than actually looking for all the types of logical forms you can have, you first generate an abstract program, a meaning sketch, and then, once you have that, you can fill in the details. So this idea of abstraction is helpful, I would say, in this scenario even more.
0:32:37 Okay, now let's go back to the third challenge, which has to do with linguistic coverage. And this is a problem that will always be with us, whatever we do, because the human is unpredictable: people will say things that your model does not anticipate, and so we have to have a way of dealing with that.
0:33:00 Okay, so this is nothing new. Whoever has done question answering has come up against this problem of, gee, how do I increase the coverage of my system? So what people have done, and this is actually an obvious thing to do: you have a question there, and you paraphrase it. In IR, for example, people do query expansion; it's the analogous idea. I have a question, I will have some paraphraser that will paraphrase it, and then, you know, I will submit the paraphrases and I will get some answers, and the problem is solved. Except that it isn't, and if any of you have worked with paraphrases, you will have seen that, you know, the paraphrases can be really bad, and so you get bad answers; so you had a problem, and now you've created another problem. And the reason why this happens is because the paraphrases are generated independently of your task, of the QA module. So you have a QA module, you paraphrase the questions, and then you get answers, and at no point does the answer communicate with the paraphrases, to get something that, you know, is appropriate for the task, or for the QA model.
0:34:18 So what I'm going to show you now is how we train this paraphrase model jointly with a QA model for an end task, and our task is again semantic parsing, except that this time, because this is a more realistic task, we're going to be querying a knowledge base, like Freebase or Google's knowledge graph. And of course there is a question that I will address in a bit: where do the paraphrases come from? Who gives them to us? Where are they?
0:34:48Okay, so this is a dense slide, but it's actually really simple, and
0:34:52I'm gonna take you through it. So this is how we see the
0:34:58modeling framework.
0:35:00We have a question, "who created Microsoft",
0:35:03and we have some paraphrases.
0:35:06Bear with me, and I will tell you in a minute who gives us the
0:35:09paraphrases; assume for a moment we have these paraphrases.
0:35:13Now, what we will do is we will first take all these paraphrases here
0:35:19and score them.
0:35:22Okay,
0:35:22so we will get question vectors, and we will have a model
0:35:27that gives a score: how good is this paraphrase for the question?
0:35:31How good is "who founded Microsoft" as a paraphrase of "who created Microsoft"?
0:35:36Now, once we normalize these scores,
0:35:39then we have our question answering module. So we have two modules: one is the
0:35:43paraphrasing module and one the question answering module, and they are trained jointly.
0:35:47So once I have my scores for my paraphrases, these are gonna be used
0:35:52to weight the answers given the question.
0:35:56So this is gonna tell your model: well, look,
0:35:59this answer is quite good given your paraphrase, or this answer is not so good
0:36:05given your paraphrases. Do you see now that you kind of learn which paraphrases are
0:36:10important for your task,
0:36:12for your question answering model,
0:36:14and your answer, jointly.
0:36:18Okay,
0:36:20so,
0:36:20a bit more formally:
0:36:23the modeling problem is, we have an answer,
0:36:26and we want to model the probability of the answer given the question,
0:36:30and this is factorized into two models. One is the question answering model,
0:36:35and the other one is the paraphrasing model.
0:36:37Now, for the question answering model you can use whatever you like:
0:36:41your latest neural QA model, you can plug it in there. And
0:36:46the same goes for the paraphrase model:
0:36:48whatever you have, as long as you can actually
0:36:52encode it somehow,
0:36:54it doesn't really matter.
0:36:56Now, I will not talk a lot about the question answering model; we used an
0:37:01in-house model that is based on graphs, which
0:37:05is quite simple: it just does graph matching on the knowledge graph.
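The factorization described here, p(answer | question) as a sum over paraphrases of p(answer | paraphrase) times p(paraphrase | question), can be sketched with made-up numbers. Everything below (the scores and distributions) is hypothetical, not the actual model; it only shows how the paraphrase scores weight the answers:

```python
import math

# Hypothetical numbers: three paraphrases of one question, four candidate answers.
para_scores = [2.0, 1.0, 0.5]                      # raw paraphrase scores
z = sum(math.exp(s) for s in para_scores)
p_para = [math.exp(s) / z for s in para_scores]    # p(p|q), softmax-normalized

# p(a|p): the QA module's answer distribution for each paraphrase (rows sum to 1).
p_ans_given_para = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
]

# p(a|q) = sum over paraphrases of p(a|p) * p(p|q):
# the paraphrase scores weight the answers.
p_ans = [sum(p_para[i] * p_ans_given_para[i][j] for i in range(3))
         for j in range(4)]
best_answer = max(range(4), key=lambda j: p_ans[j])
```

Because both factors are trained jointly, gradients from the answer flow back into the paraphrase scores, which is the point being made in the talk.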
0:37:10And I'm gonna tell you a bit more about the paraphrasing model.
0:37:15Okay, so this is how we score the paraphrases.
0:37:20We have a question,
0:37:22we generate paraphrases for this question,
0:37:25and then for each of these paraphrases we will just
0:37:30score them: how good are they given
0:37:33my question?
0:37:34And this is, you know, a dot product essentially:
0:37:37is it a good paraphrase or not?
0:37:39But it's trained end-to-end,
0:37:42with the answer in mind.
0:37:44So:
0:37:46is this paraphrase going to help me find the right answer?
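The "dot product essentially" scoring can be sketched as follows. The vectors are toy stand-ins for learned encodings (the real model learns them end-to-end with the answer signal):

```python
import math

def score_paraphrases(q_vec, para_vecs):
    """Dot-product score for each paraphrase against the question,
    softmax-normalized so the scores can be used to weight answers."""
    logits = [sum(q * p for q, p in zip(q_vec, vec)) for vec in para_vecs]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy vectors standing in for encodings of "who created microsoft"
# and two candidate paraphrases.
q = [1.0, 0.0, 1.0]
paras = [[1.0, 0.1, 0.9],   # "who founded microsoft": close, so high score
         [0.0, 1.0, 0.0]]   # unrelated, so low score
weights = score_paraphrases(q, paras)
```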
0:37:50And now,
0:37:51as far as the paraphrases are concerned, again this is a plug-and-play module; you
0:37:55can use your favourite. So if you are in a limited domain you can write them
0:38:00yourself,
0:38:02manually.
0:38:03You could use WordNet,
0:38:05or PPDB, which is this database that has a lot of paraphrases.
0:38:10But we do something else,
0:38:12using neural machine translation.
0:38:17Okay, so this slide, I know everybody knows it, but it's my
0:38:20favourite slide of all time,
0:38:22because
0:38:23we ourselves tried to redo this slide, and
0:38:26it's not as good as the original.
0:38:29If you go to machine translation talks,
0:38:32all of this is machine translation,
0:38:34but nobody has ever captured so beautifully
0:38:37the fact that you have this language here,
0:38:41you have this English language, and that you have attention weights, so beautiful,
0:38:46and then you take these attention weights and you weight them
0:38:49with the decoder, and hey presto, you get the French language.
0:38:53So
0:38:54this is your usual machine translation, your vanilla machine translation engine:
0:38:59it's again an encoder-decoder model with attention,
0:39:02and we assume we have access to this engine.
0:39:06Now,
0:39:07you may wonder how I'm gonna get paraphrases out of this.
0:39:12This is again an old idea, which actually goes back to Martin Kay, sometime
0:39:17in, I think, the eighties.
0:39:19Notice this thing: what we want is,
0:39:23in the case of English, to go from English to English.
0:39:27So we want to be able to sort of paraphrase an English expression into another
0:39:31English expression. But in machine translation I don't have any direct path
0:39:35from English to English.
0:39:37What I do have is a path from English to German,
0:39:40and German to English.
0:39:42So
0:39:43the key idea is: if I have two English phrases,
0:39:47like here "under control"
0:39:49and
0:39:50"in check",
0:39:51if they are aligned, or if they correspond to the same phrase in another language,
0:39:57they are likely to be a paraphrase.
0:40:00Now, I'm gonna use these alignments (this is for you to understand the concept), but you
0:40:04can see that I have English, I translate English to German,
0:40:09then German gets back-translated to English,
0:40:13and I have my paraphrase.
0:40:19More specifically,
0:40:20I have my input, which is in one language,
0:40:24okay, I encode it, I decode it into some translations in the foreign language (G stands
0:40:29here for German),
0:40:31I encode my German, and then I decode it back into English.
0:40:36There are
0:40:37two or three things you should note about this.
0:40:41First of all,
0:40:42these things in the middle, the translations, are called pivots,
0:40:46and you see that we have k pivots.
0:40:49I don't have one translation, I have multiple translations. This turns out to be really
0:40:53important, because a single translation may be very wrong, and then I'm completely screwed: I
0:40:58have very bad paraphrases.
0:41:00So I have to have multiple pivots. And not only that, I could also
0:41:05have multiple pivots in multiple languages,
0:41:08which I then take into account while I'm decoding.
0:41:12Now, this is very different from what you may think of as paraphrases, because
0:41:17the paraphrases are never
0:41:20explicitly stored anywhere; they're all model-internal.
0:41:23So what this thing learns: I give it English, and it just paraphrases English into English,
0:41:30but I don't have an explicit database
0:41:32of paraphrases.
0:41:34And of course they are all vectors and they're all scored, but
0:41:37I cannot, you know, point and say,
0:41:39"where is that paraphrase?" I can give the model a paraphrase and it generates another
0:41:44one, which is very nice, because you get generation for free. In the past, if
0:41:49you had rules, you had to work out how you actually use them to generate something
0:41:53that is meaningful, and so on.
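The pivoting scheme itself can be caricatured with a tiny mock-up. In the real setup the dictionaries below would be two NMT systems (English to German and German to English) returning k-best lists; the entries here are invented for illustration:

```python
# Toy stand-in for two NMT systems (English->German, German->English).
# A real pipeline would call the models and keep their k-best lists.
EN_TO_DE = {"under control": ["unter Kontrolle", "im Griff"]}
DE_TO_EN = {"unter Kontrolle": ["under control", "in check"],
            "im Griff": ["in check", "under control"]}

def pivot_paraphrases(phrase, k=2):
    """Translate to up to k pivots, back-translate each pivot's k-best
    list, and pool everything except the original phrase."""
    paraphrases = set()
    for pivot in EN_TO_DE.get(phrase, [])[:k]:
        for back in DE_TO_EN.get(pivot, [])[:k]:
            if back != phrase:
                paraphrases.add(back)
    return sorted(paraphrases)
```

Keeping multiple pivots is the robustness point from the talk: a single wrong translation would poison all the back-translations.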
0:41:55Okay,
0:41:55let me show an example.
0:41:57This is paraphrasing the question "what is the zip code of the largest car
0:42:02manufacturer" if we pivot through French.
0:42:06So French tells us "what is the zip code of the largest vehicle manufacturer" or
0:42:11"what is the zip code of the largest car producer".
0:42:14If we pivot through German:
0:42:16"what's the postal code of the biggest automobile manufacturer",
0:42:20"what is the postcode of the biggest car manufacturer".
0:42:24And if we pivot through Czech:
0:42:25"what is the largest car manufacturer's postal code",
0:42:29or "zip code of the largest car manufacturer".
0:42:32Can I see a show of hands: which pivot language do you think
0:42:36gives you the best
0:42:37paraphrases?
0:42:39I mean, it's a sample of two.
0:42:43Czech?
0:42:44Very good.
0:42:44Czech
0:42:45proved to be the best pivot,
0:42:47followed by German;
0:42:49French was not so good.
0:42:51And again, here there's the question: how many pivots to use, what languages do
0:42:56you choose? I mean, these are all experimental variables that you can manipulate, okay.
0:43:00Let me show you some results.
0:43:03The grey you don't need to understand;
0:43:05these are all baselines that somebody can use
0:43:10to show that the model is doing something over and above the obvious things.
0:43:16This is,
0:43:17see, the graph here, using nothing: so you go from forty-nine
0:43:23to fifty-one,
0:43:25and this from sixteen to twenty.
0:43:27These are WebQuestions and GraphQuestions, datasets that people have developed.
0:43:33GraphQuestions is very difficult; it has, like,
0:43:36very complicated questions that need multi-hop reasoning, so "what is Obama's daughter's friend's dog
0:43:43called", very difficult; that's why the performance is really bad.
0:43:48What you should see is that
0:43:52here, pink is the paraphrase model,
0:43:54so in all cases,
0:43:56using the paraphrases, the pink bar, helps.
0:44:00Here is the second-best system,
0:44:03and
0:44:05red here is the best system, and you can see that ours does very well on
0:44:08the difficult dataset.
0:44:09On the other dataset there is another system that is better,
0:44:12but it uses a lot of external knowledge, which we don't; better exploiting
0:44:16the graph itself is another avenue for future work.
0:44:21Okay,
0:44:22now this is my last slide, and then I'll take questions.
0:44:27What have we learned? So there are a couple of things that are interesting.
0:44:31The first one is that
0:44:34encoder-decoder models
0:44:36are
0:44:37good enough
0:44:38for mapping natural language to meaning representations with minimal engineering effort, and I cannot emphasise
0:44:46that
0:44:48enough.
0:44:49Before
0:44:50this paradigm shift,
0:44:53what we used to do is we would spend ages coming up with
0:44:56features that we would have to re-engineer
0:44:58for every single domain. So if I go from lambda calculus to SQL and then
0:45:02to Python code, I would have to do the whole process from scratch.
0:45:05Here you have one model,
0:45:08with some experimental variables that, you know, you can keep fixed or change, and it
0:45:13works very well across domains.
0:45:17Constrained decoding improves performance, and not only for the setting that I showed you,
0:45:22but for more weakly supervised settings,
0:45:25and people are using this constrained decoding even
0:45:29outside semantic parsing, you know, in generation for example.
0:45:34The paraphrases enhance the robustness of the model, and in general I would
0:45:38say they're useful
0:45:40if you have other tasks; for dialogue, for example,
0:45:43you could give robustness to a dialogue model, to the generated answers of a chatbot.
0:45:49And the models could transfer to other tasks or architectures. I've shown, for the purposes
0:45:54of this talk,
0:45:56you know, so as not to overwhelm people,
0:45:59simple architectures, but you can put neural networks left, right and centre if you
0:46:03feel like it.
0:46:04Now, in the future, I think there are a couple of avenues for future
0:46:08work worth pursuing. One is, of course, learning the sketches: so the sketch could be
0:46:12a latent variable in your model, trying to, you know, generalise, and that would mean
0:46:18that you don't need to do any preprocessing; you don't need to give the algorithm
0:46:21the sketches.
0:46:23How do you deal with multiple languages? I have a semantic parser in English;
0:46:27what do I do if I switch to Chinese? A big problem; in particular in industry they come
0:46:33up against this problem a lot, and their answer is: we hire annotators.
0:46:39How do you
0:46:42train this model if you have no data at all, so just a database?
0:46:47And of course there is something that, I would think, is of interest to you:
0:46:51how do I actually
0:46:53do coreference? How do I
0:46:56model a sequence of turns,
0:46:59as opposed to a single turn?
0:47:01And without further ado, I have one last slide, and it's a very depressing slide.
0:47:07So,
0:47:08when I gave this talk, like, a couple of months ago, I used to have this slide
0:47:11where it was Theresa May,
0:47:13and this is on Twitter: the joke about Theresa May was that she
0:47:18will ask Alexa to negotiate for her,
0:47:21and it will be fine. I tried to find another one with Boris Johnson
0:47:25and failed; I don't think he does technology.
0:47:28And he doesn't do negotiating either.
0:47:30So she would at least negotiate. And at this point I'll
0:47:35just take questions. Thank you very much.
0:47:38Thank you very much.
0:47:43And now we have
0:47:45time for questions.
0:47:48Thank you. This is [inaudible] from JP Morgan. So my question is: do
0:47:53we really need
0:47:56to extract the logical forms,
0:47:58given the fact that
0:48:00humans probably don't, really, except in really complicated
0:48:05cases?
0:48:06I doubt my daughter does that.
0:48:10Do we really need to do that? Well, in the world of machine translation
0:48:15we don't really extract all these things,
0:48:18but we do translate, even,
0:48:22like, difficult stuff.
0:48:24That's a good question. So the answer is
0:48:27yes and no.
0:48:28So if you look at Alexa or Google, these people,
0:48:33they have very complicated systems where they have
0:48:37one module that does what you're saying: I don't translate to logical form, I just
0:48:41do, you know, query matching, and then extract the answer.
0:48:44But some of the highly compositional queries, to get them to execute, they
0:48:49need databases,
0:48:51and they all have internal representations of what the queries mean.
0:48:55Also,
0:48:56if you are a developer, for example:
0:48:59whenever you have a database,
0:49:02say I sell jeans, or I sell fruit, and I have a
0:49:06database and I deal with
0:49:07customers, and I have to have a spoken interface, there you would have to extract it
0:49:12somehow. Now, for the phone, when you say "Siri, set my alarm clock", I
0:49:17would agree with you: there you just need to recognize intents
0:49:20and do the attribute slot filling,
0:49:22and then you're done.
0:49:24But whenever you have
0:49:27more complicated infrastructure in the
0:49:30output, or the answer space, then you do this.
0:49:39Thanks for a very nice talk.
0:49:42I had a question on the paraphrase
0:49:47scoring, and it seemed to me something wasn't quite right, if I understood it
0:49:51well. You have an equation with a summation over things.
0:49:57So, intuitively,
0:50:01the right thing to do is to look for the closest paraphrase that actually
0:50:06has an answer of good quality that you can actually find. So you're
0:50:09trying to optimize two things: finding something that means the same, for which
0:50:14I can find an answer, if I can't find one for the original question.
0:50:17But when you sum, the problem is that paraphrases don't have an equal
0:50:22distribution: some phrases have many paraphrases, or many paraphrases in a particular direction
0:50:27but maybe not so many in the others, just depending on how many synonyms you
0:50:31have. So trying to add them up and weight them: if you have a lot
0:50:35of paraphrases here for the wrong answer, and one for something that's better, you know,
0:50:39it seems like the
0:50:40closeness should dominate if you have a very high quality paraphrase, and it seems like
0:50:45your model is trying to do something different, and I'm wondering if that
0:50:48is causing problems, or something that I'm not seeing. No, you're right. So this is
0:50:52how the model is trained; at training time we sum, to make it robust,
0:50:55and you can manipulate the n-best paraphrases.
0:50:59At test time, you're absolutely right, we just find the max, the one
0:51:03that is best.
0:51:05So you are right; I did not explain it well, but you are absolutely
0:51:09right that, you know, you don't want to
0:51:11be all over the place just looking at the
0:51:14sum; at test time we just want the one.
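The distinction in this exchange, summing over paraphrases at training time versus trusting only the best-scoring one at test time, can be illustrated with made-up numbers (hypothetical scores, not from the talk):

```python
def aggregate(p_para, p_ans_given_para, mode="sum"):
    """Combine per-paraphrase answer distributions.
    'sum' marginalizes over all paraphrases (the training-time view);
    'max' trusts only the single best-scoring paraphrase (test time)."""
    n_ans = len(p_ans_given_para[0])
    if mode == "sum":
        return [sum(w * row[j] for w, row in zip(p_para, p_ans_given_para))
                for j in range(n_ans)]
    best = max(range(len(p_para)), key=lambda i: p_para[i])
    return p_ans_given_para[best]

# One strong paraphrase versus two weaker ones that happen to agree
# with each other: the two modes can pick different answers, which is
# exactly the questioner's worry about uneven paraphrase distributions.
p_para = [0.4, 0.35, 0.25]
p_ans_given_para = [[0.9, 0.1],
                    [0.1, 0.9],
                    [0.1, 0.9]]
summed = aggregate(p_para, p_ans_given_para, "sum")
maxed = aggregate(p_para, p_ans_given_para, "max")
```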
0:51:21Hi, thank you for the great work. This is [inaudible] from Microsoft Research. So my
0:51:25question is: for the coarse-to-fine decoding, what do you think of its potential in
0:51:30generating natural language outputs, like dialogue, like summarisation?
0:51:35What? Come again, ask the question again. What would be...
0:51:40What do you think of the potential of your coarse-to-fine... That's a good question,
0:51:44an excellent question. So,
0:51:46I think, well, I think it's very interesting. Now,
0:51:51for
0:51:52sentence generation (you mentioned summarisation; I'll do one thing at a time): if
0:51:57you just want to generate
0:51:59a sentence from some input,
0:52:02you want to do surface realization, and people have already done this. There's work
0:52:06with a very similar model, where they first
0:52:11produce a template, which they learn, and then from the template they surface realize the
0:52:15sentence.
0:52:16However, summarization, which is the more interesting case:
0:52:20you would have to have a document template,
0:52:24and
0:52:25it's not clear what this document template might look like, and how you might learn
0:52:29it. So you may,
0:52:31for example, assume that the template uses some sort of a tree or
0:52:36a graph,
0:52:37with generalizations, and then from there you just generate the summary.
0:52:41And I believe we really
0:52:44should do this, but it will not be as trivial as
0:52:50what we do right now, which is: encode the document in a vector,
0:52:53have attention, a bit of copying, and here's your summary.
0:52:57So the question there, what the template is:
0:53:01nobody has an answer.
0:53:13I was wondering if you could elaborate on your earlier slides, the work on generating
0:53:19the abstract meaning representation, because of course my reaction
0:53:23to what you were saying in the first five minutes was:
0:53:26well,
0:53:27it's all good when you have, you know,
0:53:29a
0:53:30corpus where you have the mapping between the query and the
0:53:35logical form. What do you do if you don't have it, which is the majority of
0:53:40cases?
0:53:41See, okay, so this is a tough problem: how do you do inference
0:53:47with weak supervision?
0:53:49And there are two things there that we found out.
0:53:56Because you have an enormous space
0:53:59of
0:54:01potential programs that execute, and we have no signal
0:54:04other than the right answer.
0:54:06So because the only signal is the right answer, there are two things that can happen.
0:54:10One is ambiguity:
0:54:12so
0:54:13entities may be ambiguous; "turkey" can be the bird or
0:54:17the country,
0:54:20for example.
0:54:21And then you're screwed, and you will get wrong things. The other one
0:54:24is spuriousness: you have things that execute to the right answer
0:54:29but don't have the right intent, the right semantics.
0:54:31And so what people do, what we do, is two things: we do the templates here,
0:54:36and then we have another step which actually, again, tries to do
0:54:41some structural matching and tries to say: okay, so I have this abstract program;
0:54:46this will cut down the search space.
0:54:49And then
0:54:49you also have to do some alignment and put in some constraints, to say, for
0:54:55example,
0:54:55I cannot have
0:54:57the column "silver" repeated twice,
0:55:00because this is not well formed.
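A constraint of this kind is just a filter over candidate programs. This is a simplified sketch; the "col:" token format is invented for illustration, not the actual representation used in the work:

```python
def is_well_formed(program_tokens):
    """Reject candidate programs that repeat a column: one simple
    well-formedness constraint of the kind mentioned above.
    (The "col:" token convention is hypothetical.)"""
    columns = [t for t in program_tokens if t.startswith("col:")]
    return len(columns) == len(set(columns))

candidates = [
    ["select", "col:silver", "where", "col:silver"],  # repeated column: rejected
    ["select", "col:silver", "where", "col:year"],
]
valid = [c for c in candidates if is_well_formed(c)]
```

Filters like this cut down the space of spurious programs that happen to execute to the right answer without having the right semantics.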
0:55:02But
0:55:02the accuracy of these (I didn't put it on a slide) is like forty-four percent:
0:55:06not, you know,
0:55:09anywhere near good enough; I mean, Google and Amazon would laugh.
0:55:12There is more work to be done.
0:55:18So, thank you for the talk. I have a question about your coarse-to-fine
0:55:21decoding. So in your coarse-to-fine decoding you use a meaning representation, but
0:55:27the coarse and the fine decoding are both trained on the cross-entropy,
0:55:33both of them, but independently.
0:55:37And it means that there is no guarantee that the meaning representation, the sketch,
0:55:42really covers what the final decoding needs. In some cases we need to
0:55:48consider such things, because if we consider the semantics, some arguments, for example,
0:55:54are something
0:55:56that should be included in the meanings.
0:56:00That is a very good point; I'm glad you guys were paying attention. So,
0:56:04yes, we don't have this, and
0:56:08we say "constrained decoding", but what you really do is you constrain the encoding,
0:56:12hoping that your decoder will be more constrained by the encoding.
0:56:17You could include it. We did an analysis where we looked at two things: one is how
0:56:22good the templates are. So if your templates are
0:56:25not great, then what you're saying
0:56:28will be more problematic.
0:56:31And we did an analysis; let me see if I have a slide that shows that
0:56:34actually the templates are working quite well.
0:56:37I might have a slide, I don't remember.
0:56:41Yes.
0:56:42So this slide shows you, you see,
0:56:46the sequence-to-sequence model: the first row is the sequence-to-sequence model
0:56:50without any sketches,
0:56:53and the second is coarse-to-fine, where you have to predict the sketch,
0:56:56and you see that coarse-to-fine predicts the sketches much better
0:57:01than the one-stage model that does sequence-to-sequence.
0:57:04So this tells you that you
0:57:06are kind of winning, but not exactly.
0:57:09So, I don't know what would happen if you included these constraints.
0:57:14My
0:57:14answer would be: this doesn't happen a lot. It could be, but for the
0:57:18logical forms we tried... if you have very long, very complicated ones, and then
0:57:23really huge SQL queries, then
0:57:25I would say that your approach
0:57:27would be required.
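The coarse step under discussion, predicting an abstract sketch before the full logical form, can be caricatured as stripping the low-level details out of a logical form. This is a simplified illustration; the token conventions are invented, not the actual sketch grammar from the work:

```python
def make_sketch(logical_form_tokens):
    """Coarse step of coarse-to-fine decoding: abstract away the
    low-level details (here, variables and numbers) so only the
    skeleton of the logical form remains. A fine decoder would then
    fill the VAR slots back in, conditioned on both the question and
    the sketch."""
    return ["VAR" if t.startswith("$") or t.isdigit() else t
            for t in logical_form_tokens]

lf = ["count", "(", "$0", ")", ">", "2"]
sketch = make_sketch(lf)
```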
0:57:30Okay, that's
0:57:33good, that will do.
0:57:35So maybe I'll ask one question, okay? In the last slide you
0:57:40said that the models handle a single turn.
0:57:43So, what I mean is, everything you presented is related to
0:57:47QA with one question and one answer, but in a dialogue case we have
0:57:52multiple turns.
0:57:54So what are the common problems, and what models would be good?
0:57:57Yes, so I'll send you... I have a slide on this. So we did
0:58:01try to do
0:58:02this, a paper in submission, multiple turns,
0:58:06where you say, for example: I want to buy these Levi's jeans.
0:58:14How much do they cost? Do you have them in another size?
0:58:18Or: well, what is the colour? So, you know, you elaborate with
0:58:22new questions, and there are patterns of, you know, these multi-turn dialogues that you can do.
0:58:28And
0:58:29you can do this, but the one thing that we actually need to sort out
0:58:34before doing this
0:58:35is coreference,
0:58:37and...
0:58:37because right now these models don't take coreference into account. If you model coreference
0:58:42in the simple way of, like, looking at the past and modeling it
0:58:45as a sequence, it doesn't really work that well. So I think definitely
0:58:49sequential question answering is the way to go. I have not seen any models that
0:58:54make me go like, oh, this is great, but...
0:59:00Yes, it's a very hard problem, and we're not sure, but, you know, one step
0:59:04at a time.
0:59:05So thank you very much; let's thank the speaker again.