0:00:15 Thank you.
0:00:17 So I will talk about some recent work that has been carried out
0:00:23 in collaboration with a company. This company
0:00:31 is interested in e-commerce, and in this
0:00:39 scenario we have been working on entity recognition for
0:00:44 conversational agents.
0:00:50 The general idea, the long-term goal of this
0:00:56 work, is a kind of conversational agent, a kind of shop assistant, that
0:01:02 helps users in buying products in an e-commerce site.
0:01:10 For instance, if the user says "I would like to find a certain kind
0:01:16 of product", the supposed behavior of
0:01:23 this shop assistant is to present the user the relevant products.
0:01:30 This is a kind of task-oriented scenario, and basically it can be
0:01:36 approached with the traditional slot-filling approach:
0:01:41 we have several intents that the system is supposed to recognize, and then the slots,
0:01:52 and a classifier that fills the slots, for instance the category, the brand, the colour,
0:01:57 and other properties.
0:02:00 The approach should work on several domains, for
0:02:07 instance cameras or furniture.
0:02:14 A relevant problem of this setting is that basically there
0:02:21 are no annotated utterances, that is, no sentences or
0:02:27 requests from users which are annotated with
0:02:34 the intent and the properties
0:02:39 of the specific domain.
0:02:41 Another relevant factor is that
0:02:46 it might be much easier
0:02:48 to find catalogues where
0:02:52 information about products is present.
0:02:57 Given this larger scenario, we focused on a specific issue in
0:03:03 this work:
0:03:05 entity recognition,
0:03:08 that is, the capacity to recognize that a product name
0:03:13 is one entity within
0:03:16 a user utterance.
0:03:19 We based our work on gazetteers,
0:03:24 which in this
0:03:25 scenario are catalogues, basically comparable to open datasets,
0:03:30 that we can get from vendors on the web.
0:03:35 The main research question for us is how far we can go without any
0:03:40 annotated data, and this is why we call this a zero-shot setting.
0:03:48 A few words about the specific issues of entity names, product names,
0:03:55 in this e-commerce scenario.
0:03:59 Basically, these are different from traditional named entities: we have what in
0:04:07 the tradition of information extraction has been called nominal entities.
0:04:13 For instance, an entity may contain connectives, like "black and white t-shirt",
0:04:25 where "black" is a property of the entity.
0:04:30 Entity names may contain adjectives, as in "white t-shirt",
0:04:36 or even proper names; if you think about
0:04:41 e-commerce, you know how many
0:04:43 branded products there are.
0:04:50 Another very important property for our approach is compositionality.
0:04:56 Being nominal entities, we may assume that they respect some compositionality
0:05:03 principle of the language.
0:05:05 For instance, in the food domain, take "pasta with broccoli":
0:05:12 the base is a noun plus a prepositional modifier,
0:05:18 and we can add an adjectival modifier to the base.
0:05:24 Knowing the parts, we may still recombine them:
0:05:32 having both "pasta with broccoli" and "spaghetti" in the gazetteer,
0:05:38 maybe "spaghetti with broccoli" is a good name too, even if it
0:05:44 has never been
0:05:46 seen before.
0:05:47 This means that
0:05:49 our approach should be able to take advantage of compositionality.
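The recombination intuition above can be illustrated with a few lines of Python. This is only a toy sketch of the compositional idea, not the system's actual generation procedure; the `split_name`/`recombine` helpers, the split on `" with "`, and the example "spaghetti with tuna" are my own assumptions:

```python
# Toy sketch of compositional recombination of nominal entity names.
# The head/modifier split on " with " is a simplifying assumption.

def split_name(name):
    """Split a food name into (head, modifier) at the word 'with'."""
    head, _, modifier = name.partition(" with ")
    return head, modifier

def recombine(names):
    """Generate every head + modifier combination, including unseen names."""
    heads = {split_name(n)[0] for n in names}
    modifiers = {split_name(n)[1] for n in names if " with " in n}
    return {f"{h} with {m}" for h in heads for m in modifiers}

known = ["pasta with broccoli", "spaghetti with tuna"]
generated = recombine(known)
# "spaghetti with broccoli" appears even though it was never observed
```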
0:05:57 Then, there might be the case of having multiple entities
0:06:03 of the same semantic category in the same utterance,
0:06:06 which is not the case
0:06:10 in traditional slot-filling scenarios like booking flights, where usually one
0:06:16 mentions just one destination
0:06:17 at a time.
0:06:21 Here it is quite normal: "I would like to order a salami pizza and a mozzarella pizza",
0:06:26 so two entities of the same semantic category in the same utterance.
0:06:34 Then there is a strong need for multilinguality, of course, because
0:06:40 vendors typically
0:06:42 need to translate their catalogues into multiple languages.
0:06:49 Okay, so these are our working hypotheses. We would like to train
0:06:54 a model for entity recognition based only on easily available resources
0:07:00 like gazetteers,
0:07:02 and then we would like to apply this model
0:07:05 in order to label
0:07:08 unseen entities in user utterances.
0:07:11 The point is that we have no annotated data at all, and we
0:07:17 want to understand how far we can go
0:07:19 with this zero-shot setting.
0:07:23 So the main ideas of our approach are the following:
0:07:27 take advantage of the compositional nature of product names, so we want to extract as
0:07:33 much knowledge as possible from gazetteers;
0:07:37 use as much as possible synthetically generated data,
0:07:40 since, having no real data coming from users,
0:07:44 we need to work on synthetically generated data;
0:07:49 and be as much as possible language independent.
0:07:56 This is the approach, basically four steps.
0:08:00 At the beginning we collect a gazetteer
0:08:03 for a certain domain.
0:08:06 Then, starting from this gazetteer, we generate both positive and negative examples
0:08:13 of the entity names it contains.
0:08:18 On the basis of the positive and negative examples
0:08:23 we build a classifier, in our case a neural classifier,
0:08:28 able to recognize the entities of that specific domain.
0:08:34 Having this classifier,
0:08:38 which is able to discriminate whether
0:08:41 a given sequence of tokens is an entity of a
0:08:46 certain domain or not, we want to apply this model
0:08:50 to recognize
0:08:53 entity names in utterances.
0:08:56 So we apply the classifier to all the sub-sequences
0:09:00 of a user utterance,
0:09:02 in order to select the best sequences, the ones which are
0:09:07 not overlapping.
0:09:09 I will now go through the four steps with
0:09:12 some examples.
0:09:14 Step one: collecting a gazetteer for a certain domain.
0:09:18 We did this by scraping the websites of vendors for a number of
0:09:27 domains.
0:09:34 The underlying assumption here is that scraping a website to collect
0:09:42 a list of entity names is much cheaper
0:09:47 than annotating data,
0:09:50 particularly because no
0:09:52 annotation effort at all is needed.
0:09:55 So this is the first step:
0:09:56 just collecting.
0:10:00 The second step is to generate
0:10:03 positive and negative examples. The positive examples,
0:10:07 at least in our initial approach, are quite simple:
0:10:12 all the entity names in the gazetteer are positive examples.
0:10:16 We downloaded them from a website, so we trust the website.
0:10:22 As for the negatives,
0:10:26 for each positive example we generate a number
0:10:30 of negative examples
0:10:32 by applying simple transformation rules.
0:10:35 For instance, each proper sub-sequence of a positive example is a negative:
0:10:42 that is simple.
0:10:45 We have a second rule:
0:10:50 a negative is a positive example with one token, randomly selected from the gazetteer, added in
0:10:58 the first position or the last.
0:11:04 So we compose the negative examples like this:
0:11:08 for instance, if we start with "black and white t-shirt",
0:11:12 this is the positive, and the negatives are all the sub-sequences: "black", "white",
0:11:18 "black and", "black and white", and so on, but also "black
0:11:24 and white t-shirt" preceded by a token randomly selected from the gazetteer.
0:11:31 Notice that in this gazetteer, which we downloaded from the web, there
0:11:35 is a lot of noise:
0:11:37 we don't have any control over how vendors write the
0:11:42 names of products,
0:11:44 so some entries might be completely, you know, off.
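The two negative-generation rules just described can be sketched as follows. This is a hedged reconstruction: the talk does not give the complete rule set, so the function below implements only the two stated rules (proper sub-sequences, plus a random gazetteer token prepended):

```python
import random

def negatives_for(positive, gazetteer_tokens, rng=random):
    """Illustrative negative examples from one positive entity name:
    (1) every proper contiguous sub-sequence of its tokens;
    (2) the full name with a random gazetteer token prepended."""
    tokens = positive.split()
    n = len(tokens)
    negatives = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            if (i, j) != (0, n):              # skip the positive itself
                negatives.add(" ".join(tokens[i:j]))
    extra = rng.choice(gazetteer_tokens)      # rule (2)
    negatives.add(f"{extra} {positive}")
    return negatives

negs = negatives_for("black and white t-shirt", ["table", "lamp"])
# contains "black", "black and", "black and white", "white t-shirt", ...
```

Note that, exactly as the talk points out, rule (1) can produce "negatives" that are in fact acceptable names, which is part of the noise the classifier has to tolerate.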
0:11:49 So, in the second step we generate positives and negatives; on the basis of the positives
0:11:54 and negatives we build a model,
0:11:56 a neural model:
0:11:58 a classifier which is able to say, given a sequence of tokens,
0:12:04 "yes, this is a food name" or "no, this is not a food name",
0:12:07 "this is
0:12:10 furniture", "no, this is not furniture".
0:12:13 For this we used a neural
0:12:19 classifier. It is based on a neural model proposed by
0:12:25 Lample and colleagues a couple of years ago,
0:12:29 and it uses a kind of classical LSTM architecture
0:12:36 that exploits both word embeddings and character embeddings.
0:12:43 We added a few handcrafted features
0:12:48 which are made available to this classifier, like features relative to a certain token:
0:12:57 the position of the token, its frequency, the length of the token,
0:13:02 the unigram probability of the token; and the only
0:13:06 linguistic information that we use is the part-of-speech of the token,
0:13:13 without any disambiguation.
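The handcrafted token features mentioned here (position, frequency, length, unigram probability) might be computed roughly as below. This is a minimal sketch: the add-one smoothing and the exact feature definitions are my assumptions, not necessarily the authors' choices:

```python
import math
from collections import Counter

def token_features(tokens, counts, total_tokens):
    """Per-token handcrafted features of the kind listed in the talk:
    position in the candidate, token length, gazetteer frequency, and a
    smoothed unigram log-probability (add-one smoothing is an assumption)."""
    vocab = len(counts)
    feats = []
    for position, tok in enumerate(tokens):
        freq = counts.get(tok, 0)
        feats.append({
            "position": position,
            "length": len(tok),
            "frequency": freq,
            "log_prob": math.log((freq + 1) / (total_tokens + vocab)),
        })
    return feats

# Frequencies would come from the scraped gazetteer; this one is a toy.
counts = Counter("black t-shirt white t-shirt black jeans".split())
feats = token_features("white t-shirt".split(), counts, sum(counts.values()))
```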
0:13:18 At the end, this classifier says "yes, this
0:13:23 sequence of tokens
0:13:24 is a piece of furniture" or
0:13:27 "this is a food name",
0:13:31 together with a confidence score. So the third step is that simple.
0:13:37 So now we have this classifier,
0:13:40 but our goal is to
0:13:44 recognize entity names in sentences, in user requests.
0:13:56 Think about this example, a possible request from a
0:14:02 user: "I'm looking for a golden yellow shorts".
0:14:10 We apply the classifier to all the sub-sequences of this request,
0:14:19 and we ask the classifier to say whether each sub-sequence is positive or negative.
0:14:26 In this case the positives would be "shorts", "yellow shorts", "golden yellow shorts",
0:14:33 and the negatives would be "I'm looking for", "a golden",
0:14:38 and so on.
0:14:40 Then we rank
0:14:43 all the positively
0:14:50 classified sub-sequences on the basis of the confidence of the neural model,
0:14:56 and we select the ones which do not overlap.
0:15:01 So here we might rank "golden yellow shorts", then "yellow shorts", then "shorts";
0:15:09 the second one is discarded because it overlaps with the first
0:15:14 one, and so, for "I'm looking for a golden yellow shorts", we will choose
0:15:21 "golden yellow shorts" as the recognized entity.
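The sub-sequence enumeration and greedy non-overlap selection can be sketched as follows. The `toy_score` function here is a hypothetical stand-in for the neural classifier's confidence (a tiny gazetteer lookup with a length preference), not the real model:

```python
def subsequences(tokens, max_len=5):
    """All contiguous token spans (start, end), up to max_len tokens."""
    n = len(tokens)
    return [(i, j) for i in range(n)
                   for j in range(i + 1, min(i + max_len, n) + 1)]

def select_entities(tokens, score):
    """Rank spans by classifier confidence and greedily keep each span
    only if it is positive and does not overlap an already kept span."""
    chosen = []
    for i, j in sorted(subsequences(tokens), key=score, reverse=True):
        if score((i, j)) > 0 and all(j <= a or i >= b for a, b in chosen):
            chosen.append((i, j))
    return [" ".join(tokens[i:j]) for i, j in sorted(chosen)]

# Hypothetical stand-in for the neural classifier: positive only for
# spans found in a tiny gazetteer, with longer spans more confident.
utterance = "i am looking for a golden yellow shorts".split()
GAZ = {"shorts", "yellow shorts", "golden yellow shorts"}

def toy_score(span):
    i, j = span
    return float(j - i) if " ".join(utterance[i:j]) in GAZ else 0.0

result = select_entities(utterance, toy_score)
# "yellow shorts" and "shorts" are discarded as overlapping the best span
```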
0:15:25 So this is the methodology we want to apply;
0:15:29 we would like to know how well we can do with this zero-shot setting.
0:15:35 So we did some experiments. As I said, we collected the gazetteers,
0:15:41 as I mentioned.
0:15:42 We now have gazetteers for three domains
0:15:48 across two languages, English and Italian, with different characteristics.
0:15:52 For each gazetteer we report the number of entities, the number
0:15:56 of tokens,
0:15:58 the mean length of the names and its standard deviation; the standard deviation is a
0:16:06 kind of index of the complexity of the names:
0:16:12 the higher this deviation, the higher, likely, the complexity of
0:16:16 the names in the gazetteer.
0:16:19 We have the type/token ratio: a high ratio indicates
0:16:24 high lexical variety, so more complexity again.
0:16:29 We also report the proportion of times that the first token
0:16:35 appears in the first position of a name,
0:16:38 and this gives a sort of indication
0:16:46 about how much
0:16:50 the semantic head of the entities is stable.
0:16:54 And this differs
0:16:58 across languages: you can see that this value
0:17:07 is lower for Italian than for English;
0:17:11 it means that in Italian the first token of an entity is usually the head,
0:17:16 while this is not the case for English.
0:17:21 The last feature that I want to point out is the
0:17:26 hapax ratio,
0:17:29 the proportion of tokens that appear only once in the entity names: this gives
0:17:35 some idea of the
0:17:37 compositionality of a certain gazetteer.
0:17:42 The fewer the hapaxes, the more tokens are reused
0:17:47 across entity names, and the more compositional the gazetteer is.
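Statistics of this kind can be computed directly from a raw list of names. A minimal sketch (the three-name toy gazetteer is invented for illustration, not one of the talk's datasets):

```python
from collections import Counter
from statistics import mean, pstdev

def gazetteer_stats(names):
    """Descriptive statistics of a gazetteer, as in the talk's table:
    entity count, token count, mean/std name length (in tokens),
    type/token ratio, and proportion of hapax tokens."""
    lengths = [len(n.split()) for n in names]
    counts = Counter(tok for n in names for tok in n.split())
    total = sum(counts.values())
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "entities": len(names),
        "tokens": total,
        "mean_len": mean(lengths),
        "std_len": pstdev(lengths),
        "type_token_ratio": len(counts) / total,
        "hapax_ratio": hapaxes / len(counts),
    }

# Toy gazetteer: "black" and "t-shirt" are reused, "white"/"jeans" are hapaxes.
stats = gazetteer_stats(["black t-shirt", "white t-shirt", "black jeans"])
```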
0:17:54 This is the experimental setup.
0:17:58 We have six configurations, the three domains by the two languages
0:18:03 just
0:18:05 mentioned.
0:18:05 We split each gazetteer into training and test.
0:18:10 One important point is that no
0:18:14 entity name present in the training set is present in the test set.
0:18:20 Negative entity names are generated as illustrated
0:18:27 before: for each positive we generate a number of negatives.
0:18:36 Then, the test. This is important: we don't have, at the moment, real test data,
0:18:41 so the test set is synthetically generated.
0:18:45 We start from a number of templates, a little bit more than two
0:18:50 hundred templates, both for English and Italian.
0:18:56 Typical templates correspond to intents in the e-commerce domain: templates for selecting a
0:19:05 product, templates for asking a description, templates for adding a product to the cart,
0:19:11 and so on.
0:19:12 Finally, we fill the template with the name of an entity,
0:19:17 and the name of the entity
0:19:20 comes from the test part of the gazetteer.
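Template-based test generation of this kind might look as follows; the three templates and the `{}` placeholder convention are illustrative assumptions (the real set has a bit more than two hundred templates per language):

```python
import random

def make_test_set(templates, entities, rng):
    """Fill intent templates with held-out entity names, keeping the
    gold character span of the entity for evaluation."""
    data = []
    for entity in entities:
        template = rng.choice(templates)
        start = template.index("{}")          # single placeholder assumed
        data.append((template.format(entity), (start, start + len(entity))))
    return data

# Illustrative templates corresponding to e-commerce intents.
templates = ["i am looking for a {}",
             "add a {} to my cart",
             "can you describe the {}"]
held_out = ["golden yellow shorts", "black and white t-shirt"]
test_set = make_test_set(templates, held_out, random.Random(0))
```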
0:19:27 We have two baselines. The first is a simple rule-based baseline, where
0:19:38 a token in a certain utterance is marked if it belongs to the gazetteer,
0:19:44 and all the chunks of contiguous marked tokens are taken as entities,
0:19:48 basically.
0:19:49 Then we wanted to test also
0:19:52 a neural model trained in a more traditional sequence-labeling fashion
0:19:56 on synthetically generated training data:
0:19:59 we applied the same methodology used for
0:20:05 generating the test data also for generating the synthetic training data.
0:20:13 These are the results of our experiments: the two baselines,
0:20:18 and our system in
0:20:21 the last row. We can see that, for all our datasets,
0:20:27 the system based on the gazetteer significantly outperforms the two baselines,
0:20:34 which is already, I think, an interesting result.
0:20:39 Something more about the domains. Food is the most complex one:
0:20:44 as you can imagine, and as it
0:20:49 emerges from the gazetteer statistics, it has high
0:20:54 variability and high compositionality, both in Italian and in English, so the results
0:21:00 are the lowest among the three domains.
0:21:05 Furniture is the least compositional, so basically it is easier:
0:21:10 its names are the closest to traditional named entities;
0:21:16 on the other hand, this is the smallest dataset that we have,
0:21:20 just a few hundred entities, with respect to
0:21:25 about a thousand entities for the other domains.
0:21:31 And clothing is very regular and highly compositional,
0:21:37 so here we have good results.
0:21:42 Okay, so just to conclude.
0:21:46 I have reported some experiments about a zero-shot approach
0:21:54 for entity recognition, where we consider the gazetteer as the
0:21:59 only source of information.
0:22:03 It does not assume any annotated sentences for training, and also for testing
0:22:10 we have generated the data synthetically.
0:22:15 We focus on nominal entities, because these are the kind of entities that in e-commerce
0:22:21 are used for naming products.
0:22:27 The approach tries to take advantage of, to extract as much
0:22:32 knowledge as possible from, gazetteers, in particular due to the compositionality of the names of products.
0:22:40 In many respects this is very initial work, and we
0:22:46 see quite a lot of room for improvement.
0:22:50 Three activities are ongoing for us.
0:22:56 The first one is just considering the fact that the state of the art of
0:23:00 sequence labeling is improving, actually almost daily: we have new approaches, new
0:23:07 models; for instance, we lately tried a
0:23:13 more recent model, and it is maybe better
0:23:19 than the previous one. There is also a lot of room for experimenting with and improving the methodologies
0:23:27 for generating synthetic data:
0:23:30 we have experimented with some parameters, for instance the ratio of positives to negatives,
0:23:37 but there might be better settings for these parameters.
0:23:45 Then, of course, it might be very interesting to integrate this zero-shot setting with cases
0:23:51 where we have some data, maybe a few
0:23:55 annotated sentences, and integrate them into the approach,
0:24:00 and also to integrate the gazetteer-based model with what we call
0:24:06 a synthetically trained model, a model trained on synthetically annotated data.
0:24:14 So, as you can see, there is a lot of work to do.
0:24:18 The focus is to make this approach as much as possible domain independent, to
0:24:26 be able to move from one domain to another with the same technology, and also
0:24:30 language independent.
0:24:32 Thank you.
0:24:54 Yes, sure.
0:24:56 The templates are disjoint:
0:24:58 both the entities
0:25:00 and the templates are disjoint between training and test, so we try to separate as much as possible
0:25:06 training from test.
0:25:32 [audience question, inaudible]
0:25:38 That is a good question, but I don't think I have an answer for
0:25:42 the moment. The focus was
0:25:47 on doing recognition in basically isolated sentences, so I don't have an answer yet;
0:25:55 these aspects would probably be considered when we integrate this into the
0:26:02 broader frame of a dialogue system.
0:26:06 Actually, this work is closer to traditional information extraction than to
0:26:12 dialogue, so we are not there yet.
0:26:21 Sorry, I think I got the question.
0:26:27 No, not at all: even the word embeddings, the word vectors, are generated from
0:26:34 the gazetteer.
0:26:35 That is a good point: we don't use vectors from external resources,
0:26:41 everything is generated from the gazetteer.
0:26:48 This is the only source of information.