| 0:00:17 | graph to everyone | 
|---|
| 0:00:19 | uh | 
|---|
| 0:00:19 | i only a low of i them you have i i them yeah in kind issue sure that when you | 
|---|
| 0:00:23 | an inverse | 
|---|
| 0:00:26 | so the title of my presentation is cost-sensitive stacking for audio tag annotation and retrieval | 
|---|
| 0:00:35 | so here is an | 
|---|
| 0:00:37 | example from one famous | 
|---|
| 0:00:39 | music tagging website | 
|---|
| 0:00:42 | that that every N | 
|---|
| 0:00:44 | we show the | 
|---|
| 0:00:46 | sound track and its | 
|---|
| 0:00:48 | social tags | 
|---|
| 0:00:51 | and these social tags | 
|---|
| 0:00:53 | provide rich information for | 
|---|
| 0:00:55 | oh a and and then is this | 
|---|
| 0:00:57 | so for example we can train a classifier using | 
|---|
| 0:01:01 | uh | 
|---|
| 0:01:01 | using uh the audio | 
|---|
| 0:01:04 | uh | 
|---|
| 0:01:05 | and that the audio so your at we use us that then take | 
|---|
| 0:01:10 | so in this paper we focus on two important kinds of information the first one is | 
|---|
| 0:01:15 | the tag count | 
|---|
| 0:01:16 | which means the number of different users who have annotated this tag | 
|---|
| 0:01:21 | in this example a larger font | 
|---|
| 0:01:25 | indicates a higher tag count | 
|---|
| 0:01:28 | and the second kind of information is tag correlation | 
|---|
| 0:01:32 | as we know | 
|---|
| 0:01:34 | some tags often co-occur | 
|---|
| 0:01:39 | so we propose cost-sensitive stacking for exploiting tag count and tag | 
|---|
| 0:01:44 | correlation | 
|---|
| 0:01:45 | jointly | 
|---|
| 0:01:49 | okay we first introduce | 
|---|
| 0:01:51 | two information retrieval tasks | 
|---|
| 0:01:54 | the first is | 
|---|
| 0:01:55 | audio annotation that is given an audio clip | 
|---|
| 0:01:59 | we can make predictions using some tag classifiers | 
|---|
| 0:02:04 | so we would know | 
|---|
| 0:02:05 | which tags this audio clip should be associated with | 
|---|
| 0:02:10 | and the second one is | 
|---|
| 0:02:12 | tag-based audio retrieval | 
|---|
| 0:02:15 | given a tag | 
|---|
| 0:02:17 | query | 
|---|
| 0:02:18 | we can output prediction scores using the | 
|---|
| 0:02:22 | tag classifier | 
|---|
| 0:02:23 | and then we will have a ranking list | 
|---|
| 0:02:26 | for the query | 
|---|
| 0:02:30 | okay | 
|---|
| 0:02:31 | so the first kind of information is the tag count | 
|---|
| 0:02:34 | uh | 
|---|
| 0:02:35 | as we know social tags are assigned | 
|---|
| 0:02:39 | by people with different levels of musical knowledge | 
|---|
| 0:02:43 | so they inevitably | 
|---|
| 0:02:45 | contain noisy information | 
|---|
| 0:02:49 | we think that tag count information should be considered in automatic | 
|---|
| 0:02:55 | tagging because the tag count | 
|---|
| 0:02:57 | reflects | 
|---|
| 0:02:58 | the confidence degree of the tag | 
|---|
| 0:03:01 | and the higher the tag count | 
|---|
| 0:03:02 | the more reliable the annotation | 
|---|
| 0:03:07 | here we conduct an | 
|---|
| 0:03:09 | experiment | 
|---|
| 0:03:10 | we select some high-count tags | 
|---|
| 0:03:14 | and some low-count tags | 
|---|
| 0:03:16 | and we compare the prediction performance | 
|---|
| 0:03:21 | according to the false negative rate | 
|---|
| 0:03:25 | as we can see | 
|---|
| 0:03:26 | the false negative rate on the high-count tags | 
|---|
| 0:03:30 | uh match | 
|---|
| 0:03:31 | more more than in the goal con | 
|---|
| 0:03:33 | okay | 
|---|
| 0:03:34 | so we believe that this is evidence that | 
|---|
| 0:03:38 | high-count tags are more reliable | 
|---|
| 0:03:45 | okay we try to convince you with | 
|---|
| 0:03:47 | this example here we show a track and its | 
|---|
| 0:03:52 | social tags | 
|---|
| 0:03:54 | and the high-count tags | 
|---|
| 0:03:58 | include | 
|---|
| 0:04:01 | sixties | 
|---|
| 0:04:02 | Beatles | 
|---|
| 0:04:03 | british | 
|---|
| 0:04:03 | classic rock | 
|---|
| 0:04:05 | oldies | 
|---|
| 0:04:07 | so we believe that | 
|---|
| 0:04:08 | these are more reliable and important tags | 
|---|
| 0:04:15 | so in some previous work | 
|---|
| 0:04:19 | the tag count is transformed into one or zero by using a threshold | 
|---|
| 0:04:24 | and then a binary classifier is | 
|---|
| 0:04:27 | trained for each tag to make predictions | 
|---|
| 0:04:30 | but | 
|---|
| 0:04:30 | this may have some problems | 
|---|
| 0:04:35 | the first one is that | 
|---|
| 0:04:36 | tag count information is lost | 
|---|
| 0:04:38 | a tag assigned twice is treated | 
|---|
| 0:04:41 | in the same way as a tag assigned | 
|---|
| 0:04:44 | a hundred times | 
|---|
| 0:04:46 | the second problem is that | 
|---|
| 0:04:48 | it is hard to determine the threshold | 
|---|
| 0:04:52 | and that's supper so | 
|---|
| 0:04:54 | the third problem is that | 
|---|
| 0:04:55 | there will be ambiguity in the class membership | 
|---|
| 0:05:00 | of the instances near the threshold | 
|---|
| 0:05:02 | for example if we set the tag count threshold to ten | 
|---|
| 0:05:07 | so if an instance's | 
|---|
| 0:05:08 | tag count is | 
|---|
| 0:05:10 | ten it will be considered as a positive example | 
|---|
| 0:05:14 | but | 
|---|
| 0:05:15 | if its tag count is nine it will be considered as a negative example | 
|---|
| 0:05:21 | we think that this is strange | 
|---|
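
A minimal sketch of the hard thresholding described above, to make the two complaints concrete. The function name `binarize` and the example counts are illustrative assumptions, not taken from the talk; only the threshold of ten comes from the speaker's example.

```python
def binarize(tag_count, threshold):
    """Turn a tag count into a 0/1 label with a hard threshold (assumed rule: count >= threshold)."""
    return 1 if tag_count >= threshold else 0


# With a low threshold, a tag assigned twice and a tag assigned a hundred
# times receive the same label, so the count information is lost.
low_threshold_labels = [binarize(c, threshold=2) for c in (2, 100)]

# With a threshold of ten, counts of nine and ten land in different classes
# even though they differ by a single user.
near_threshold_labels = [binarize(c, threshold=10) for c in (9, 10)]

print(low_threshold_labels)   # [1, 1]
print(near_threshold_labels)  # [0, 1]
```
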
| 0:05:27 | so our question is | 
|---|
| 0:05:29 | how to use the tag count information for audio tag annotation and retrieval | 
|---|
| 0:05:35 | and our answer is cost-sensitive learning with the tag count as | 
|---|
| 0:05:39 | cost | 
|---|
| 0:05:43 | so in cost-sensitive learning we are given a training | 
|---|
| 0:05:46 | set | 
|---|
| 0:05:47 | of x y and c | 
|---|
| 0:05:49 | where x is the feature vector | 
|---|
| 0:05:51 | y is | 
|---|
| 0:05:52 | the class label and c is | 
|---|
| 0:05:55 | the | 
|---|
| 0:05:56 | misclassification cost of this example | 
|---|
| 0:06:01 | the goal of | 
|---|
| 0:06:03 | uh | 
|---|
| 0:06:03 | cost-sensitive learning is to learn a classifier | 
|---|
| 0:06:06 | which minimizes the expected | 
|---|
| 0:06:09 | cost on unseen instances | 
|---|
| 0:06:12 | and it is a more general setup | 
|---|
| 0:06:14 | of the | 
|---|
| 0:06:15 | traditional classification problem | 
|---|
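
The setup described above can be written down as follows. This is a sketch in notation chosen here for illustration (the symbols $h$, $\mathcal{D}$ and $N$ are assumptions, not read from the slides): each training example is a triple of feature vector, label and misclassification cost, and the learner looks for a classifier with small expected cost on unseen instances.

```latex
\[
  \{(\mathbf{x}_i, y_i, c_i)\}_{i=1}^{N}, \qquad
  h^{*} = \arg\min_{h} \;
  \mathbb{E}_{(\mathbf{x}, y, c) \sim \mathcal{D}}
  \bigl[\, c \cdot \mathbf{1}[\, h(\mathbf{x}) \neq y \,] \,\bigr]
\]
```

Setting $c = 1$ for every example turns the expected cost into the ordinary misclassification rate, which is the sense in which cost-sensitive learning generalizes traditional classification.
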
| 0:06:20 | so in our application our goal is to | 
|---|
| 0:06:24 | minimize | 
|---|
| 0:06:25 | the misclassified tag count for audio tag annotation | 
|---|
| 0:06:29 | and retrieval | 
|---|
| 0:06:31 | so if | 
|---|
| 0:06:31 | one hundred users annotate an audio clip as rock | 
|---|
| 0:06:36 | but the | 
|---|
| 0:06:37 | classifier | 
|---|
| 0:06:38 | outputs a false negative | 
|---|
| 0:06:40 | then the cost is one hundred | 
|---|
| 0:06:43 | so with cost-sensitive learning | 
|---|
| 0:06:46 | we pay more attention to the reliable and important tags | 
|---|
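
A small sketch of the "misclassified tag count" idea from the example above. The function name and the toy data are assumptions for illustration; in particular, giving negative examples a cost of one is an assumption made here, since the talk only states the cost of a false negative.

```python
def misclassified_tag_count(y_true, y_pred, costs):
    """Sum the cost of every example whose predicted label differs from the true label."""
    return sum(c for t, p, c in zip(y_true, y_pred, costs) if t != p)


# Three clips for the tag "rock":
#   clip 0: tagged by 100 users, predicted negative -> false negative, cost 100
#   clip 1: tagged by 3 users, predicted positive   -> correct, no cost
#   clip 2: not tagged (assumed cost 1), predicted positive -> false positive, cost 1
y_true = [1, 1, 0]
y_pred = [0, 1, 1]
costs = [100, 3, 1]

print(misclassified_tag_count(y_true, y_pred, costs))  # 101
```
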
| 0:06:52 | and | 
|---|
| 0:06:53 | so we have | 
|---|
| 0:06:55 | we have employed two cost-sensitive binary classifiers | 
|---|
| 0:06:59 | the first one is the cost-sensitive support vector machine | 
|---|
| 0:07:03 | in the support vector machine the training error term | 
|---|
| 0:07:07 | will be associated | 
|---|
| 0:07:11 | with a cost | 
|---|
| 0:07:12 | to | 
|---|
| 0:07:13 | i | 
|---|
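
One common way to attach a per-example cost to the SVM training error term is the soft-margin objective below. It is offered as a sketch of the idea, not as the exact formulation on the speaker's slide; the symbols $C$, $\xi_i$ and $\phi$ are the usual soft-margin notation and are assumptions here.

```latex
\[
  \min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} c_i \, \xi_i
  \quad \text{s.t.} \quad
  y_i \bigl( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0
\]
```

With all $c_i = 1$ this reduces to the standard soft-margin SVM; a larger $c_i$ makes a margin violation on example $i$ more expensive.
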
| 0:07:16 | and the second cost-sensitive classifier is | 
|---|
| 0:07:19 | cost-sensitive AdaBoost | 
|---|
| 0:07:21 | so here we show the | 
|---|
| 0:07:24 | weight update rule | 
|---|
| 0:07:27 | in AdaBoost | 
|---|
| 0:07:29 | and uh | 
|---|
| 0:07:30 | the weight update of an instance will be proportional to the cost of that instance | 
|---|
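
The talk only says that the AdaBoost weight update is proportional to the instance cost. The sketch below shows one simple way to obtain that behaviour, by starting from cost-proportional weights and then applying the standard AdaBoost update; it is an assumption for illustration and may differ from the exact rule on the slide. Labels and predictions are taken to be in {-1, +1}, and the function names are hypothetical.

```python
import math


def init_weights(costs):
    """Initial instance weights proportional to the misclassification costs."""
    total = sum(costs)
    return [c / total for c in costs]


def update_weights(weights, y_true, y_pred, alpha):
    """Standard AdaBoost update: scale each weight by exp(-alpha * y * h(x)), then renormalize."""
    scaled = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    norm = sum(scaled)
    return [w / norm for w in scaled]


# The first instance has cost 100 and is misclassified by the weak learner,
# so it keeps dominating the distribution in the next round.
weights = init_weights([100, 1, 1])
updated = update_weights(weights, y_true=[1, -1, -1], y_pred=[-1, -1, -1], alpha=0.5)
```

Because every later weight is a multiple of the initial one, an instance's weight stays proportional to its cost throughout boosting under this variant.
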
| 0:07:41 | okay | 
|---|
| 0:07:41 | the second | 
|---|
| 0:07:43 | important kind of information is tag correlation | 
|---|
| 0:07:47 | in some previous work the tag annotation task is | 
|---|
| 0:07:51 | separated into several | 
|---|
| 0:07:53 | binary classification problems | 
|---|
| 0:07:55 | so uh | 
|---|
| 0:07:57 | this assumes that tags are independent | 
|---|
| 0:08:02 | so | 
|---|
| 0:08:03 | the tag correlation information is lost | 
|---|
| 0:08:06 | for example we know that hip hop and rap often co-occur | 
|---|
| 0:08:11 | for | 
|---|
| 0:08:11 | for example in | 
|---|
| 0:08:13 | our database | 
|---|
| 0:08:15 | uh | 
|---|
| 0:08:16 | we can observe that | 
|---|
| 0:08:17 | they co-occur | 
|---|
| 0:08:19 | one hundred and sixty times | 
|---|
| 0:08:22 | and they are only uh seventy and | 
|---|
| 0:08:25 | so T six times that | 
|---|
| 0:08:27 | they occur | 
|---|
| 0:08:28 | alone | 
|---|
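
Co-occurrence statistics like the ones quoted above can be obtained by counting tag pairs over the annotated clips. The sketch below uses a toy tag list invented for illustration; it does not reproduce the speaker's database or the numbers from the talk, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(clip_tags):
    """Count, for every unordered tag pair, the number of clips carrying both tags."""
    pair_counts = Counter()
    for tags in clip_tags:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy annotations: one tag collection per clip.
clips = [
    ["hip hop", "rap"],
    ["hip hop", "rap", "drum"],
    ["rap"],
]
counts = cooccurrence_counts(clips)
print(counts[("hip hop", "rap")])  # 2
```
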
| 0:08:30 | so we propose cost-sensitive stacking to exploit | 
|---|
| 0:08:34 | tag count and correlation information | 
|---|
| 0:08:36 | jointly | 
|---|
| 0:08:39 | so uh | 
|---|
| 0:08:41 | so how does cost-sensitive stacking work | 
|---|
| 0:08:47 | in the first stage we train a cost-sensitive tag classifier | 
|---|
| 0:08:53 | for each tag | 
|---|
| 0:08:55 | and | 
|---|
| 0:08:56 | the stacking classifier uses the outputs of all tag classifiers | 
|---|
| 0:09:01 | as | 
|---|
| 0:09:01 | its | 
|---|
| 0:09:02 | inputs | 
|---|
| 0:09:04 | and we use a linear classifier for the stacking classifier | 
|---|
| 0:09:08 | so if we learn | 
|---|
| 0:09:11 | so we can learn the | 
|---|
| 0:09:14 | weights w here | 
|---|
| 0:09:16 | and if | 
|---|
| 0:09:17 | w i j is greater than zero | 
|---|
| 0:09:20 | then it means | 
|---|
| 0:09:22 | tag | 
|---|
| 0:09:22 | j is positively correlated to tag i | 
|---|
| 0:09:27 | so | 
|---|
| 0:09:28 | the tag correlation information can be | 
|---|
| 0:09:31 | exploited by the stacking classifier | 
|---|
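
The second stage described above can be sketched as a linear combination of the first-stage outputs. The weights below are made up for illustration rather than learned, and the function names and the bias term are assumptions; the talk only states that the stacking classifier is linear and that a positive weight w_ij indicates that tag j is positively correlated with tag i.

```python
def stacking_score(first_stage_scores, weight_row, bias=0.0):
    """Second-stage score for one tag: a linear combination of all first-stage tag scores."""
    return sum(w * s for w, s in zip(weight_row, first_stage_scores)) + bias


def positively_correlated(weight_row, tags):
    """Tags whose stacking weight is greater than zero for the target tag."""
    return [t for t, w in zip(tags, weight_row) if w > 0]


tags = ["hip hop", "rap", "rock"]
# Hypothetical stacking weights for the target tag "rap".
w_rap = [0.8, 1.0, -0.3]
# First-stage outputs for one clip, in the same tag order.
scores = [0.9, 0.2, 0.1]

rap_score = stacking_score(scores, w_rap)
print(positively_correlated(w_rap, tags))  # ['hip hop', 'rap']
```

Here a confident "hip hop" prediction raises the final "rap" score even though the first-stage "rap" classifier was unsure, which is the kind of correlation the stacking stage is meant to exploit.
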
| 0:09:36 | okay here we discuss our experimental setup | 
|---|
| 0:09:42 | so uh | 
|---|
| 0:09:43 | our baseline is our winning method | 
|---|
| 0:09:46 | in the MIREX | 
|---|
| 0:09:47 | two thousand nine audio tagging task | 
|---|
| 0:09:50 | this method is cost-insensitive | 
|---|
| 0:09:53 | and | 
|---|
| 0:09:54 | only uses binary classifiers | 
|---|
| 0:09:57 | and | 
|---|
| 0:09:57 | our experiments basically follow the MIREX | 
|---|
| 0:10:02 | two thousand nine setup | 
|---|
| 0:10:04 | we use the forty-five tags | 
|---|
| 0:10:07 | and we collect the audio clips from | 
|---|
| 0:10:10 | the MajorMiner website | 
|---|
| 0:10:12 | which is a web-based music game | 
|---|
| 0:10:15 | i we have so many | 
|---|
| 0:10:17 | uh a the paper | 
|---|
| 0:10:19 | and | 
|---|
| 0:10:20 | the parameters can be | 
|---|
| 0:10:22 | selected based on inner cross-validation on the training data | 
|---|
| 0:10:26 | and we repeat | 
|---|
| 0:10:28 | cross-validation one hundred times | 
|---|
| 0:10:32 | okay here we show our experimental results | 
|---|
| 0:10:35 | audio annotation is evaluated by the AUC per clip | 
|---|
| 0:10:41 | that | 
|---|
| 0:10:41 | that is | 
|---|
| 0:10:42 | given a clip we want the correct | 
|---|
| 0:10:46 | tags to be ranked higher | 
|---|
| 0:10:49 | and audio retrieval is | 
|---|
| 0:10:52 | evaluated by the per-tag | 
|---|
| 0:10:53 | AUC and F-measure | 
|---|
| 0:10:56 | that is given a tag we want the | 
|---|
| 0:10:59 | correct instances to be ranked higher | 
|---|
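
The ranking-based evaluation described above can be illustrated with a plain pairwise AUC. This is a generic sketch, not the official MIREX evaluation code, and the function name and toy scores are assumptions. The same function serves both tasks: applied to the scores of all tags for one clip it gives a per-clip AUC (annotation), and applied to the scores of all clips for one tag it gives a per-tag AUC (retrieval).

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive is scored higher; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Four items with two correct ones; one correct item is ranked below an incorrect one.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(labels, scores))  # 0.75
```
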
| 0:11:03 | so we have used different classifiers | 
|---|
| 0:11:07 | in the first | 
|---|
| 0:11:08 | stage | 
|---|
| 0:11:09 | including AdaBoost and SVM | 
|---|
| 0:11:12 | and the ensemble is a combination of these two | 
|---|
| 0:11:16 | these two classifiers | 
|---|
| 0:11:18 | and we have compared four methods | 
|---|
| 0:11:21 | the first one is our MIREX baseline and the second one is | 
|---|
| 0:11:27 | that is | 
|---|
| 0:11:28 | cost-sensitive learning only | 
|---|
| 0:11:31 | the third is | 
|---|
| 0:11:32 | stacking only and the fourth is our proposed | 
|---|
| 0:11:35 | cost-sensitive stacking | 
|---|
| 0:11:38 | as we can see | 
|---|
| 0:11:40 | in all cases cost-sensitive stacking performs | 
|---|
| 0:11:44 | better | 
|---|
| 0:11:45 | in | 
|---|
| 0:11:46 | a the of uh to other them is the | 
|---|
| 0:11:52 | and | 
|---|
| 0:11:53 | stacking only and | 
|---|
| 0:11:55 | cost-sensitive learning only | 
|---|
| 0:11:58 | are better than | 
|---|
| 0:12:01 | our MIREX baseline | 
|---|
| 0:12:07 | okay so our conclusion is | 
|---|
| 0:12:09 | uh | 
|---|
| 0:12:10 | tag count and tag correlation are two important kinds of information for social tag prediction | 
|---|
| 0:12:16 | a are time media data | 
|---|
| 0:12:19 | and we have first formulated the | 
|---|
| 0:12:22 | audio tag prediction task as a cost-sensitive classification problem | 
|---|
| 0:12:27 | to minimize | 
|---|
| 0:12:28 | the misclassified tag count | 
|---|
| 0:12:32 | and we have then formulated the task as a cost-sensitive multi-label classification problem | 
|---|
| 0:12:39 | and proposed | 
|---|
| 0:12:40 | cost-sensitive stacking to exploit | 
|---|
| 0:12:43 | tag count and correlation information jointly | 
|---|
| 0:12:47 | and the experiment | 
|---|
| 0:12:49 | experimental results show that the new approach | 
|---|
| 0:12:53 | outperforms our MIREX two thousand nine winning method | 
|---|
| 0:12:59 | so here we have a journal paper so please see the journal | 
|---|
| 0:13:06 | version of this paper for | 
|---|
| 0:13:08 | more details and extensions | 
|---|
| 0:13:12 | of this idea | 
|---|
| 0:13:15 | okay thank you | 
|---|
| 0:13:37 | uh | 
|---|
| 0:13:39 | yeah i have tried i have used two methods the first | 
|---|
| 0:13:43 | method is to transform the | 
|---|
| 0:13:46 | outputs of SVM and AdaBoost into probabilities and then average the | 
|---|
| 0:13:50 | probabilities | 
|---|
| 0:13:52 | and | 
|---|
| 0:13:52 | uh | 
|---|
| 0:13:53 | the second method is to transform the prediction scores into ranks | 
|---|
| 0:13:59 | and the final decision uses the | 
|---|
| 0:14:01 | average rank | 
|---|
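
The two ensemble strategies mentioned in this answer can be sketched as follows. The function names are hypothetical, and the way raw classifier outputs are turned into probabilities is not specified in the talk, so `average_probabilities` simply assumes the inputs are already probabilities.

```python
def average_probabilities(probs_a, probs_b):
    """First strategy: average the per-item probabilities from the two classifiers."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def to_ranks(scores):
    """Rank 1 goes to the highest score; the scale of the raw scores no longer matters."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def average_ranks(scores_a, scores_b):
    """Second strategy: convert each classifier's scores to ranks, then average the ranks."""
    return [(a + b) / 2 for a, b in zip(to_ranks(scores_a), to_ranks(scores_b))]


# Scores from two classifiers on different scales for the same three items.
svm_scores = [0.2, 0.9, 0.5]
boost_scores = [1.0, 3.0, -2.0]
print(average_ranks(svm_scores, boost_scores))  # [2.5, 1.0, 2.5]
```

Rank averaging avoids having to calibrate the two classifiers against each other, at the price of discarding how far apart the scores are.
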
| 0:14:09 | thank you | 
|---|