| 0:00:15 | Okay, so |
|---|
| 0:00:16 | Hi everyone. |
|---|
| 0:00:18 | I would like to |
|---|
| 0:00:19 | talk about the new dataset that |
|---|
| 0:00:25 | we have |
|---|
| 0:00:27 | created at Heriot-Watt University, and it's a dataset designed for |
|---|
| 0:00:34 | end-to-end natural language generation. |
|---|
| 0:00:36 | By that we mean generating fully from data, from unaligned |
|---|
| 0:00:45 | data pairs, that is, pairs of a meaning representation and the corresponding textual |
|---|
| 0:00:49 | reference | 
|---|
| 0:00:51 | with no other additional annotation. |
|---|
| 0:00:54 | This has already been done, but so far all the approaches were limited to |
|---|
| 0:01:00 | relatively small datasets, and all of them used delexicalization. |
|---|
| 0:01:06 | These are the datasets you can see on the slide. |
|---|
| 0:01:09 | Our goal here is to go a bit further with the data-driven |
|---|
| 0:01:14 | approach and to replicate the | 
|---|
| 0:01:17 | rich dialogue and discourse phenomena |
|---|
| 0:01:20 | that have been targeted by earlier, non-end-to-end rule-based |
|---|
| 0:01:24 | or statistical approaches. |
|---|
| 0:01:28 | And what |
|---|
| 0:01:28 | we have done is |
|---|
| 0:01:31 | we have collected a new training dataset that should be challenging enough to | 
|---|
| 0:01:37 | show | 
|---|
| 0:01:38 | some | 
|---|
| 0:01:40 | more interesting outputs, more interesting sentences. |
|---|
| 0:01:43 | and | 
|---|
| 0:01:44 | it is also much bigger than all the previous datasets: we have over fifty thousand |
|---|
| 0:01:49 | pairs of meaning representations and textual references. |
|---|
| 0:01:55 | The textual references are longer, so we usually |
|---|
| 0:01:58 | have more sentences that |
|---|
| 0:02:00 | describe | 
|---|
| 0:02:01 | one meaning representation, and the sentences themselves are also longer than in previous datasets. |
|---|
| 0:02:07 | We |
|---|
| 0:02:08 | have also made the effort to collect the dataset in as diverse a way as |
|---|
| 0:02:13 | possible | 
|---|
| 0:02:14 | possible, and that's why we used pictorial |
|---|
| 0:02:18 | instructions to crowd workers on a | 
|---|
| 0:02:21 | crowdsourcing website | 
|---|
| 0:02:23 | and | 
|---|
| 0:02:24 | we have found that this leads to more diverse descriptions, so |
|---|
| 0:02:29 | if you look at these two examples, |
|---|
| 0:02:33 | we have "low-cost |
|---|
| 0:02:35 | Japanese-style cuisine" and |
|---|
| 0:02:36 | we have "cheap Japanese food", so the |
|---|
| 0:02:39 | descriptions are very diverse, and |
|---|
| 0:02:42 | there are also more of them on average than in most previous NLG datasets: we have |
|---|
| 0:02:48 | more than eight | 
|---|
| 0:02:50 | reference texts per meaning representation. |
|---|
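(To make the kind of data pair described above concrete, here is a minimal illustrative sketch in Python; the attribute names `food` and `priceRange` are hypothetical, and the references loosely echo the two example descriptions from the talk rather than being copied from the actual dataset.)

```python
# Minimal sketch of one data pair: an attribute-value meaning representation
# plus the diverse textual references that describe it.
# Attribute names are hypothetical; references echo the talk's two examples.
example_pair = {
    "meaning_representation": {
        "food": "Japanese",
        "priceRange": "cheap",
    },
    # Several crowd workers verbalise the same meaning representation,
    # which is what makes the references diverse.
    "references": [
        "This place serves low-cost Japanese-style cuisine.",
        "You can get cheap Japanese food here.",
    ],
}

for reference in example_pair["references"]:
    print(reference)
```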
| 0:02:55 | We have evaluated the dataset in various ways and compared it with previous datasets |
|---|
| 0:03:02 | in the same domain | 
|---|
| 0:03:03 | and we have found that |
|---|
| 0:03:05 | we have | 
|---|
| 0:03:08 | higher lexical richness, which means |
|---|
| 0:03:11 | more | 
|---|
| 0:03:12 | diverse text in terms of words used, and a higher proportion of rare words in |
|---|
| 0:03:19 | the data. |
|---|
| 0:03:20 | The sentences are also |
|---|
| 0:03:23 | on average more syntactically complex, so we have |
|---|
| 0:03:29 | longer and more complex sentences. |
|---|
| 0:03:32 | We have also added |
|---|
| 0:03:34 | a |
|---|
| 0:03:35 | kind of semantic challenge, because we asked the crowd workers only to verbalise information |
|---|
| 0:03:40 | that seems relevant given the picture in the instructions, so this actually requires content selection as part of |
|---|
| 0:03:49 | natural language generation, which is not present in the previous |
|---|
| 0:03:55 | datasets of the same type. |
|---|
| 0:03:58 | And we are organising a shared challenge with this dataset, so |
|---|
| 0:04:03 | you can | 
|---|
| 0:04:05 | all register for the challenge; we would like to encourage you to do so |
|---|
| 0:04:09 | and try to train your own NLG system and |
|---|
| 0:04:15 | submit your results |
|---|
| 0:04:16 | by the end of October. |
|---|
| 0:04:18 | We provide the data and also a baseline system, along with the baseline system outputs |
|---|
| 0:04:23 | and metrics scripts |
|---|
| 0:04:25 | that |
|---|
| 0:04:26 | will be used for the challenge, along with some human evaluation. |
|---|
| 0:04:32 | So |
|---|
| 0:04:33 | that's it, and I would |
|---|
| 0:04:35 | like to invite you to come and see our poster later on, and we can |
|---|
| 0:04:40 | talk about this some more. |
|---|
| 0:04:42 | And definitely |
|---|
| 0:04:44 | download the data and take part in our challenge. |
|---|
| 0:04:48 | Thank you. |
|---|