0:00:13 | okay so uh you know this is day four

0:00:16 | of the conference

0:00:17 | well uh just a few practical remarks for this morning

0:00:24 | you know i guess we don't have any closing ceremony in the afternoon

0:00:29 | and uh [inaudible]

0:00:35 | yeah [inaudible] the one last year

0:00:38 | uh seems to have come down

0:00:40 | so uh i would like to

0:00:43 | thank you all for being here

0:00:45 | [inaudible]

0:00:47 | see you next year at the conference

0:00:49 | and actually

0:00:51 | the only thing

0:00:53 | that uh after having done

0:00:55 | the slides is that we have again a problem [inaudible]

0:01:03 | again

0:01:04 | so on that uh we have been working for a while

0:01:08 | [inaudible]

0:01:11 | now we will have the last plenary lecture

0:01:14 | given uh by michael jordan

0:01:18 | from uh

0:01:20 | the university of california berkeley and um [inaudible] will now

0:01:25 | introduce

0:01:26 | our speaker

0:01:32 | thank you very much yeah so

0:01:35 | it is a great honour to

0:01:38 | introduce michael jordan

0:01:40 | [inaudible]

0:01:45 | [inaudible] give you

0:01:48 | a view

0:01:50 | of his uh work

0:01:54 | he is a legend

0:01:55 | in the fields of artificial intelligence and machine learning

0:01:59 | and [inaudible] as i mentioned

0:02:02 | [inaudible]

0:02:04 | uh his work is in fact

0:02:07 | [inaudible]

0:02:17 | uh his interests are in

0:02:18 | machine learning and

0:02:19 | uh machine learning in general

0:02:23 | for the last [inaudible] years

0:02:25 | he has done fundamental work

0:02:28 | uh

0:02:32 | [inaudible]

0:02:33 | and he has made

0:02:35 | absolutely incredible contributions to the field

0:02:40 | uh the topic he is going to talk about today is uh nonparametric

0:02:45 | bayesian

0:02:46 | methods

0:02:48 | uh in general he is best known i guess for his

0:02:51 | uh work on graphical models

0:02:55 | which have found application

0:02:58 | in speech

0:02:59 | and many statistical signal processing fields

0:03:02 | and in other areas like natural language processing

0:03:06 | [inaudible]

0:03:08 | statistical genetics

0:03:10 | uh |

0:03:12 | he has of course been recognised

0:03:14 | tremendously for his work uh

0:03:16 | both in statistics

0:03:18 | and engineering for instance

0:03:20 | uh this year he was elected to

0:03:23 | uh the national academy of engineering

0:03:26 | as well as to

0:03:27 | the national academy of sciences uh

0:03:29 | and

0:03:31 | he is uh

0:03:32 | a fellow of the

0:03:34 | american association for the advancement of science

0:03:37 | the first person i believe to

0:03:39 | get all three

0:03:41 | distinctions in the same year

0:03:43 | uh he has also received

0:03:46 | [inaudible] in

0:03:47 | statistics

0:03:48 | uh

0:03:49 | and in

0:03:51 | machine learning

0:03:52 | [inaudible]

0:03:55 | a [inaudible] lecture

0:03:58 | as well as [inaudible]

0:04:02 | special distinctions

0:04:05 | before that i would just like to read

0:04:10 | [inaudible]

0:04:10 | because he shares his name with that basketball player

0:04:13 | i have some uh

0:04:16 | some some words from his students

0:04:18 | michael has a large number of students

0:04:21 | who went on to

0:04:22 | extremely successful positions

0:04:26 | here's what i read

0:04:29 | michael jordan [inaudible]

0:04:32 | artificial

0:04:36 | [inaudible]

0:04:46 | [inaudible]

0:04:52 | thank you |

0:04:59 | i'm delighted to be here and i thank the organisers very much for inviting me |

0:05:04 | uh so the main goal of the talk is to tell you a little bit about what

0:05:07 | uh the title means what is bayesian nonparametrics

0:05:10 | um just to anticipate a little bit first bayesian just means you use probability pretty seriously

0:05:16 | and this is already a community uh where probability has been used uh

0:05:20 | probably more than in any other applied community i know of for a long long time so that's the easy part of

0:05:23 | my story |

0:05:25 | nonparametrics doesn't mean there are no parameters it's just the opposite it means there's a growing number of parameters

0:05:31 | so as you get more and more data uh a bayesian nonparametric model will have more and more parameters

0:05:35 | it grows so that's something

0:05:37 | that grows at a rate slower than the number of data points but it grows

0:05:40 | uh i think that's kind of a key

0:05:42 | modern perspective

0:05:44 | uh

0:05:44 | to have about statistics and uh signal processing

0:05:47 | and the bayesian approach has a particular take on it

0:05:50 | uh it really gives us a toolbox and what i'm gonna try to do in the talk is give you an

0:05:54 | idea of what this toolbox is

0:05:55 | you can use it to solve applied problems

0:05:57 | it has a beautiful mathematical structure and in some sense it's really just getting going just getting started

0:06:03 | uh and so i'm gonna try to convince you that you too can contribute here

0:06:06 | uh both on the fundamental methodological side and on the applied side

0:06:10 | uh a big thanks to my collaborators at the bottom here their names will appear throughout the talk

0:06:14 | um

0:06:15 | much of

0:06:15 | this work was done with them

0:06:18 | um now so i have the one slide on sort of just sort of historical philosophy uh i put

0:06:22 | computer science here but this could equally well be uh EE

0:06:26 | um

0:06:27 | so all these fields in the forties and fifties uh were somehow together you know people like uh

0:06:32 | kolmogorov and turing and so on

0:06:34 | were somehow in all of these fields

0:06:36 | they separated because the problems got really hard i believe

0:06:39 | um |

0:06:40 | and computer science when it arose often started looking at data structures and algorithms

0:06:43 | it didn't focus much on uncertainty

0:06:45 | whereas EE and statistics focused on uncertainty and didn't focus too much on algorithms and data

0:06:50 | structures

0:06:51 | and so maybe bayesian nonparametrics is one

0:06:53 | venue where these two things are coming together and mathematically it really amounts to using stochastic processes

0:06:59 | instead of classical parametric distributions

0:07:02 | you use a growing number of parameters so you have

0:07:04 | distributions indexed by

0:07:06 | um large spaces those are just called stochastic processes

0:07:10 | so |

0:07:10 | uh you're gonna see throughout the talk some of the stochastic processes we're interested in looking

0:07:15 | at

0:07:15 | and to sort of put this in a little bit more of a classical statistical context if you pick

0:07:19 | up a book on bayesian analysis

0:07:20 | you will see the posterior written as proportional to the likelihood times the prior

0:07:25 | uh and usually written out in a parametric way with the likelihood indexed by the parameters

0:07:30 | um and so this is the prior this is the likelihood and there's your posterior

0:07:34 | in this talk uh we don't want theta to be a finite dimensional object we want it to be an infinite

0:07:39 | dimensional object

0:07:40 | so instead of writing theta we'll write G for it

0:07:43 | so p of X given G is still going to be a likelihood and in this talk this is mainly gonna be a hidden

0:07:47 | markov model actually

0:07:49 | so G will be the structure and all the parameters of a hidden markov model the state space and

0:07:53 | so on and this is just the usual hmm

0:07:55 | uh and this will be some kind of a structural component of hmms with a prior on that

0:08:00 | so i'll be talking mostly about this and less about that

0:08:03 | um all right so to begin with the mathematical story just note that instead of having a classical prior distribution

0:08:08 | you have a distribution on an infinite dimensional space

0:08:11 | and that's not so strange mathematically that is just what a stochastic process is

0:08:15 | uh so we have a prior stochastic process and it's going to be multiplied by some kind of fairly classical

0:08:20 | likelihood

0:08:21 | because once G is fixed

0:08:23 | it doesn't matter what the size of the space is it's just a fixed object

0:08:26 | so the probability of the data given G can be easily specified typically

0:08:30 | and as we multiply this prior by this likelihood we get ourselves a

0:08:34 | posterior stochastic process
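
[Editor's sketch of what is on the slide, in symbols: x is the data and G the infinite-dimensional parameter; the exact form of p(G) is whatever prior stochastic process the talk goes on to build.]

```latex
% Bayes' rule lifted to an infinite-dimensional parameter G:
% the prior p(G) and the posterior p(G | x) are both laws of
% a stochastic process, while p(x | G) is a classical likelihood.
p(G \mid x) \;\propto\; p(x \mid G)\, p(G)
```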

0:08:36 | so to get a more concrete idea of what these things are

0:08:38 | we'll be talking a little bit about

0:08:40 | uh distributions on trees of unbounded depth and unbounded fan-out you get these in genetics and

0:08:45 | natural language processing quite a bit

0:08:47 | uh stochastic processes on partitions arise a lot we'll talk about clustering models where the number of

0:08:51 | clusters is unknown a priori

0:08:53 | we can put distributions on grammars on sparse matrices and

0:08:57 | somewhat self-referentially we can put distributions on distributions with this toolbox as well so you can get kind of

0:09:01 | the recursion

0:09:02 | that you see in computer science

0:09:04 | um i'm gonna talk about one particular large class of um stochastic processes called random measures we'll

0:09:10 | get to what those are

0:09:11 | those are

0:09:12 | one of the main ingredients of the toolbox i'm trying to

0:09:14 | talk about

0:09:16 | okay so here's a familiar problem um i learned about this from colleagues at icsi uh we've

0:09:20 | worked on this a bit um

0:09:22 | and i'll use it as a convenient way to show you how some of the methods i'm talking

0:09:25 | about can be rolled out in an applied domain

0:09:28 | uh so i think everyone here knows what this problem is there's a single microphone there's a meeting going on

0:09:32 | here's the waveform

0:09:33 | and the ground truth is like this bob spoke for a while then john then jill then bob and so on

0:09:38 | we'd like to infer this from this

0:09:40 | and we don't know how many people are in the room and we don't know anything about the spectral characteristics of

0:09:44 | their speech

0:09:46 | here's the other problem i'll talk about it's a little less traditional for you guys i think

0:09:49 | um but it's a segmentation problem

0:09:51 | it's a multivariate segmentation and it's a multiple time series uh segmentation problem

0:09:57 | okay so um someone came into a room and this is uh motion capture we have sensors on their

0:10:01 | body

0:10:02 | um

0:10:03 | and i think it's about sixty dimensional

0:10:05 | uh so we have a sixty dimensional time series for this person

0:10:08 | and you can see it's segmentable they did a little bit of jumping jacks a little bit of

0:10:11 | splits a little bit of twists and so on and so forth

0:10:14 | um but we're not supposed to know that a priori we don't know what the library of exercise routines

0:10:18 | was and we don't know for this particular person which routines out of the big library they decided to use

0:10:24 | how long they lasted and where the break points were and all that

0:10:27 | so we'd like to infer all of that

0:10:28 | moreover this is just one person's sixty dimensional time series we have these for many different people

0:10:33 | so each person's gonna come in and we'll get a sixty dimensional time series for each one of them

0:10:37 | and there will be some overlap some people do twists some people do touch-your-toes and so forth

0:10:41 | not everyone will do all the routines but there will be some overlap

0:10:44 | we'd like to find that and then exploit it so if we learn a little bit about what twists look like

0:10:48 | for one person

0:10:50 | we'd like that to be available to be used to segment the time series data for some other person

0:10:55 | so it's a joint inference problem over multiple

0:10:57 | uh

0:10:58 | high dimensional time series

0:11:00 | okay |

0:11:02 | um okay so everyone in this audience knows hmms you're a pleasant audience to talk to for that reason so

0:11:07 | here's three of the diagrams that are often used i like the graphical model one here where here's the

0:11:11 | states

0:11:12 | they form a markov chain uh

0:11:14 | um on multinomial random variables

0:11:16 | you have emissions coming off of those and here are the parameters sitting on there

0:11:20 | um the core of it of course is the transition matrix it's a K by K matrix and each row

0:11:24 | of it

0:11:25 | is a next-state transition probability distribution so given you're in state

0:11:29 | um say

0:11:31 | two then you have a transition out of that and we represent that as this little

0:11:35 | discrete measure here it's just a bunch of atoms located at positions which are the integers

0:11:40 | and we have masses associated with them and those are the transition probabilities going out from state two that's

0:11:45 | pi two

0:11:46 | to all the next states there are K of these things so it's a measure that has finite support

0:11:51 | and we'll be talking about measures with infinite support here

0:11:54 | pretty soon

0:11:55 | so that there would be pi two and pi one might be a different measure that maybe is sparser it

0:11:59 | has two non-zero atoms

0:12:01 | and um and so on pi three and so on

0:12:04 | okay so those are the representations the lattice the graphical model

0:12:07 | and uh

0:12:08 | this representation of the

0:12:10 | next-state transitions that will be talked about in the talk

0:12:12 | alright so uh there are lots of issues with hmms that are still in some sense unresolved uh how many states

0:12:17 | do we use

0:12:18 | so for the diarisation problem we don't know the number of speakers so we have to infer that in some

0:12:22 | way

0:12:23 | um and in the uh segmentation problem we don't know the number of behaviours we've got twists and

0:12:28 | touch-your-toes and jumping jacks and all

0:12:30 | there's some uh unknown number of behaviours

0:12:32 | and i think more interestingly this latter problem uh raises issues about the structure of the state

0:12:37 | space

0:12:38 | so how do we encode the notion that a particular time series makes use of a particular subset of the states

0:12:42 | it's a combinatorial notion

0:12:43 | and i don't know how to encode that in a classical hmm with just a set of states

0:12:47 | how do we share states among time series

0:12:50 | uh i don't know how to think about that there's a structured combinatorial sort of state space

0:12:54 | um that we don't have a classical way to handle um so that's gonna go into my P of G

0:12:58 | i'm gonna put

0:12:59 | structural information in my prior and given a particular choice i'm gonna have that

0:13:04 | information available for inference at the uh hmm level

0:13:07 | alright so bayesian nonparametrics solves these i think in a very elegant way and lots of other

0:13:11 | problems so i'm gonna try to show you

0:13:13 | what we do

0:13:14 | uh to do that

0:13:16 | um |

0:13:16 | okay so uh i'm gonna again be talking about random measures here i'm gonna kind of skip this slide

0:13:21 | there's not really much being said here let's go right to an example of a random measure

0:13:26 | um alright so everyone here knows what a measure is it's just a

0:13:29 | map you take a set in and you get a number out

0:13:32 | and the sets form some sigma algebra

0:13:34 | um we wanna talk now about random measures

0:13:37 | if we're gonna be probabilistic and do inference on measures we've got to have random measures

0:13:42 | and so first of all let's start with something really easy let's put a random distribution on the integers

0:13:47 | they can just go from one to infinity

0:13:49 | and a distribution on the integers is a countable set of numbers that sum to one

0:13:53 | we want a random distribution so that would be a set of random numbers that sum to one

0:13:58 | right how do we do that they have to be coupled in some way so they can sum to one

0:14:02 | and uh there are many ways you could think of doing that here's a way that turns out to have beautiful combinatorial

0:14:06 | properties that allow us

0:14:07 | to develop inference algorithms based on this idea

0:14:10 | uh this is called stick breaking it's an old idea in probability theory

0:14:14 | um so what we're going to do is take a number of beta random variables um uh we're

0:14:18 | going to take an infinite number of them and draw them independently so beta here is a beta

0:14:22 | random variable

0:14:23 | it has two parameters normally but i'm gonna pin the first one to one and the other will be free

0:14:27 | beta random variables live between zero and one right and if you take alpha naught to be bigger and

0:14:32 | bigger and bigger this thing tilts up to the left

0:14:34 | so most of the mass is near the origin and less of the mass is near one

0:14:38 | uh so think of getting lots of small numbers out of this

0:14:42 | beta random variable

0:14:43 | so what we're gonna do is we've got a stick that goes from zero to one and we're gonna break off the

0:14:47 | first fraction of it according to beta one

0:14:49 | and we'll call that pi one

0:14:51 | and the remainder of the stick is one minus beta one and then i take a fraction of that beta

0:14:56 | two

0:15:00 | and this little part right here is the amount beta two times the remainder

0:15:01 | and we call that pi two

0:15:05 | here's pi three pi four pi five and so on

0:15:10 | we just keep breaking off pieces of the stick and uh the total stick has mass one

0:15:16 | and so as we do this out to infinity we're gonna get uh the pis will eventually sum to one

0:15:18 | it's actually easy to prove if you want to prove it

0:15:22 | that you get these pis which sum to one from this procedure
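
[Editor's note: the stick-breaking recipe described above is easy to sketch in code. A minimal simulation, with my own names and truncation level, using only the standard library:]

```python
import random

def stick_breaking(alpha, num_weights, seed=0):
    """Draw the first num_weights stick-breaking proportions.

    beta_k ~ Beta(1, alpha) independently; pi_k is beta_k times
    whatever fraction of the unit stick is still unbroken.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_weights):
        beta_k = rng.betavariate(1.0, alpha)
        weights.append(beta_k * remaining)
        remaining *= 1.0 - beta_k
    return weights

pis = stick_breaking(alpha=5.0, num_weights=100)
```

Truncating at 100 pieces leaves a tiny unbroken remainder; the infinite sequence sums to one almost surely, which is the fact quoted in the talk.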

0:15:22 | alright so now we have these pis they form a distribution

0:15:26 | for any fixed draw and they're random so we have a random distribution on the integers

0:15:30 | so having learned how to do that we can now promote these up to distributions on arbitrary

0:15:34 | spaces using that tool here's how that's done

0:15:37 | we use these same pis as before as weights in a little mixture model an infinite mixture

0:15:42 | and the mixture components are these delta functions these are dirac deltas

0:15:48 | they're unit masses

0:15:49 | at locations phi K

0:15:51 | and the phi K here are independent draws from a distribution G naught on some space and this space can

0:15:56 | be some arbitrary space

0:15:57 | it doesn't have to be a euclidean space it can be a function space it can be

0:16:01 | uh an abstract space it can be just about anything

0:16:04 | um and so these atoms are living on that space and they're being weighted by these mixing proportions

0:16:09 | pi K

0:16:10 | uh so each one of these has unit mass and they're being weighted by these things which sum to one so

0:16:14 | this total object G is a spiky object it's a measure

0:16:19 | um its total mass is one

0:16:21 | and it lives on some space

0:16:23 | and it's random in two ways because the pis come from stick breaking and because these phi K

0:16:28 | were drawn from some distribution G naught which is the source of these things i'm calling atoms

0:16:33 | so G naught is the source of the atoms

0:16:35 | so we have these weighted atoms on some space

0:16:38 | alright so that is a random measure if i take G of some set A it'll give me a number

0:16:43 | so it's a measure

0:16:45 | uh it's additive and so on

0:16:47 | and it's random in two ways so it's a random measure so this is a very general way now of

0:16:51 | getting random measures if i do the pis according to stick breaking

0:16:54 | this particular object G has a name it's called a dirichlet process and we usually write it like this it has

0:16:59 | these two parameters the stick-breaking parameter

0:17:01 | and the source of the atoms

0:17:03 | uh but we can break sticks in different ways and um get the atoms in different ways so this is

0:17:08 | actually a uh

0:17:09 | very useful tool
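
[Editor's note: putting the two ingredients together, a truncated draw G ~ DP(alpha, G0) is just stick-breaking weights attached to atoms sampled from the base measure. A sketch under my own naming, with G0 taken to be a standard normal purely for illustration:]

```python
import random

def draw_dp(alpha, base_draw, num_atoms, seed=1):
    """Truncated draw from DP(alpha, G0): a list of (weight, atom) pairs.

    base_draw(rng) plays the role of sampling an atom phi_k ~ G0;
    the weights pi_k come from stick breaking with parameter alpha.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(num_atoms):
        b = rng.betavariate(1.0, alpha)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    atoms = [base_draw(rng) for _ in range(num_atoms)]
    return list(zip(weights, atoms))

# G0 = standard normal, chosen only so the atoms are easy to picture
G = draw_dp(alpha=2.0, base_draw=lambda rng: rng.gauss(0.0, 1.0), num_atoms=50)
```

Evaluating the measure on a set A just means summing the weights of the atoms that fall in A, which is why G(A) returns a number as the talk says.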

0:17:11 | alright so we can use this as a component of various kinds of statistical models here's something

0:17:16 | called the dirichlet process mixture model and it's just what you'd think it is

0:17:19 | we're going to use

0:17:20 | a draw G uh which is now drawn this way it's on some space here it's the

0:17:25 | real line just so i can draw it

0:17:27 | it has atom locations given by drawing from some underlying distribution G naught

0:17:32 | it has

0:17:33 | heights for these atoms given by stick breaking with this parameter alpha naught

0:17:36 | so if i draw a specific G i get an object looking like that

0:17:40 | and then i use it as kind of the distribution in a mixture model i just use it as a distribution

0:17:44 | it's not your typical gaussian it's a distribution that's random

0:17:48 | and i draw from it so i might draw this atom in the middle here uh this one you know

0:17:52 | has high probability it has a big height

0:17:55 | and having drawn that that's a parameter now for some underlying likelihood

0:17:59 | um and i wrote it down here that X i given theta is drawn from some distribution indexed by

0:18:04 | theta

0:18:05 | so i do that again and again that's this box here

0:18:08 | and that gives me

0:18:09 | um a mixture model in fact there's some probability i'll get the same atom

0:18:13 | on successive

0:18:14 | draws of theta and so those

0:18:17 | uh

0:18:18 | indices i would be coming from the same atom and we would think of those as belonging to the

0:18:22 | same cluster

0:18:25 | alright so that's a dirichlet process mixture model here's how it looks

0:18:29 | here's data drawn from one of these things and we draw more and more data as we go across you start seeing

0:18:34 | the blue dots more and more data

0:18:35 | and the parameters theta in this case are means and covariances of gaussians theta is a whole big

0:18:40 | long vector

0:18:41 | uh and you see that the number of distinct ones is growing there are only a few distinct ones and then

0:18:45 | you get more and more

0:18:47 | and they grow at some rate in fact that rate turns out to be the logarithm of n

0:18:51 | the logarithm of the number of data points

0:18:55 | so the number of parameters we have in this system is growing we have a small number of parameters here

0:18:59 | and as we keep drawing from G again and again and again we get more and more parameters

0:19:02 | and like i said it grows at rate log n so we're nonparametric here we don't have a fixed parameter space

0:19:07 | no matter how much data we have

0:19:08 | as you give me more and more data i'm gonna give you more and more parameters

0:19:14 | okay so let's go back to this little slide here

0:19:17 | and as i've been alluding to uh with some probability i picked theta one equal to this atom here

0:19:22 | in the middle

0:19:23 | with some probability theta two equals that exact same atom and so those two data points will come from the

0:19:28 | same cluster

0:19:29 | uh you can ask what's the kind of combinatorial structure induced by this kind of procedure how often does that

0:19:34 | happen

0:19:34 | what's the probability that theta two is equal to theta one no matter where

0:19:39 | theta one occurred

0:19:40 | the probability that lightning strikes twice in the same place

0:19:43 | and what is that probability in fact not for this particular G but over all possible choices of G under

0:19:49 | this procedure that i outlined

0:19:51 | this P of G

0:19:52 | so you'd think that would be a really hard problem to solve that sounds complicated and it turns out it's really

0:19:58 | easy to solve in terms of what the answer is

0:20:01 | and um to understand that you need to understand something called the chinese restaurant process it's another stochastic process

0:20:07 | this one is on partitions

0:20:08 | um not on parameters this one is on partitions

0:20:11 | and here i have a drawing of it down here

0:20:14 | uh you have customers coming to a restaurant with an infinite number of tables

0:20:18 | um so here are these round tables it's a chinese restaurant that's why they're round

0:20:22 | and the first customer sits here with probability one

0:20:25 | the second customer joins them with probability a half and starts a new table with probability a half

0:20:29 | and in general you join a table proportional to the number of people already at the table

0:20:33 | so that's often called preferential attachment you start to get a big cluster emerging and some small clusters after that

0:20:39 | and you can easily prove from this little setup here that the number of occupied tables grows

0:20:43 | at rate log n
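
[Editor's note: the seating rule is simple enough to simulate in a few lines. A sketch with my own naming; with concentration alpha, customer n starts a new table with probability alpha/(n+alpha):]

```python
import random

def chinese_restaurant(num_customers, alpha, seed=0):
    """Simulate CRP seating; return per-table occupancy counts."""
    rng = random.Random(seed)
    counts = []  # customers at each occupied table
    for n in range(num_customers):
        r = rng.uniform(0.0, n + alpha)
        if r < alpha or not counts:
            counts.append(1)   # start a new table
        else:
            r -= alpha         # pick an old table, proportional
            t = 0              # to its current occupancy
            while t + 1 < len(counts) and r >= counts[t]:
                r -= counts[t]
                t += 1
            counts[t] += 1
    return counts
```

With alpha = 1 and a few hundred customers the number of occupied tables hovers around log n, the growth rate just quoted.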

0:20:45 | um okay so it's a beautiful mathematical fact that if you do the integrals i was talking about for the

0:20:49 | dirichlet process this turns out to be the marginal probability under the dirichlet process of who sits with whom

0:20:56 | uh so it's another perspective it's the marginal

0:20:59 | uh under that other big probability measure

0:21:02 | um P of G

0:21:03 | you can make this into an explicit clustering model a mixture model effectively but now

0:21:08 | uh in terms of clustering with mixture components

0:21:10 | all you do is have the first person to sit at a table

0:21:13 | draw a parameter for that table

0:21:15 | from some prior which we'll call G naught the same G naught as before actually

0:21:20 | and everybody who sits at the table inherits that same parameter vector

0:21:23 | so if phi one here is a mean and a variance of a gaussian then all the data points for

0:21:27 | customers around this table

0:21:28 | will all come from the same gaussian they have the same mean and covariance

0:21:32 | okay so this is actually

0:21:35 | exactly a marginal of the dirichlet process mixture model
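
[Editor's note: extending the seating simulation into the clustering model described here takes one more step: the first customer at a table draws the table's parameter from G0, and everyone at the table emits data from it. A sketch with illustrative choices, all names mine: G0 = Normal(0, 10^2) and unit-variance Gaussian emissions.]

```python
import random

def crp_mixture(num_points, alpha, seed=0):
    """Generate data from a CRP mixture of unit-variance Gaussians."""
    rng = random.Random(seed)
    counts, means, data = [], [], []
    for n in range(num_points):
        r = rng.uniform(0.0, n + alpha)
        if r < alpha or not counts:
            counts.append(1)
            means.append(rng.gauss(0.0, 10.0))  # new table: phi_k ~ G0
            k = len(counts) - 1
        else:
            r -= alpha
            k = 0
            while k + 1 < len(counts) and r >= counts[k]:
                r -= counts[k]
                k += 1
            counts[k] += 1                      # join an old table
        data.append(rng.gauss(means[k], 1.0))   # emit x ~ Normal(phi_k, 1)
    return data, counts, means
```

Every customer at a table "inherits" the table's mean, exactly the sharing of parameter vectors described on the slide.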

0:21:40 | um okay so that was kind of a little tutorial this stuff is you know forty years old

0:21:45 | the dirichlet process mixture model it's been around a while and it's slowly getting taken up in applied

0:21:50 | communities

0:21:51 | um but it's been very very slow

0:21:53 | um now

0:21:54 | to use this in a richer class of problems than just mixture models and clustering

0:21:58 | uh you're going to have to face the problem that you have multiple estimation problems you have not just

0:22:03 | one set of data but multiple sets of data

0:22:05 | and in the hmm case that's gonna come up because we have the current state which indexes the next state

0:22:10 | we have a bunch of distributions

0:22:12 | the next-state distributions for all of the current states

0:22:15 | so we don't have one distribution we have a whole bunch of distributions

0:22:18 | all the rows of the transition matrix

0:22:21 | we need to estimate all those rows

0:22:23 | um so in statistics we often have this arise we have multiple estimation problems there's some parameter

0:22:29 | and there's data based on that and another parameter with data based on that

0:22:32 | and we often want to tie these things together because there might be very little data over here and a lot

0:22:36 | of data over here and if these are related problems it makes sense to tie them together

0:22:41 | and that's called a hierarchical model it's one of the main reasons to be a bayesian it's very easy

0:22:45 | to make these kinds of hierarchical models that tie things together

0:22:47 | um and so what you do is you assume that the parameters

0:22:51 | for all these subproblems uh these theta i are related by coming from an underlying parameter

0:22:57 | and you draw these randomly from that hyperparameter

0:22:59 | and now all the data over here kind of flows up the tree and down to here and the posterior

0:23:03 | estimate of theta i

0:23:05 | depends on all the data here

0:23:07 | it's a convex combination of all the data

0:23:09 | so that arises out of the hierarchical bayesian perspective and here's kind of a picture of that

0:23:13 | here's the model i just showed

0:23:14 | here's kind of the graphical model representation using these boxes to represent replication

0:23:19 | so that outer box represents the

0:23:21 | number of replications down below
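
[Editor's note: the "convex combination" remark can be made concrete in the simplest hierarchy, a normal-normal model with known variances. This sketch is my own simplification, including the empirical-Bayes choice of centering the shared mean on the pooled mean; it shows data-rich groups being shrunk less toward the shared estimate:]

```python
def shrunk_means(group_means, group_sizes, sigma2, tau2):
    """Posterior group means when theta_i ~ N(mu, tau2) and
    x_ij ~ N(theta_i, sigma2), with mu set to the pooled mean."""
    pooled = sum(m * n for m, n in zip(group_means, group_sizes)) / sum(group_sizes)
    out = []
    for m, n in zip(group_means, group_sizes):
        w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)  # data weight in (0, 1)
        out.append(w * m + (1.0 - w) * pooled)        # convex combination
    return out

# A group with 100 observations mostly keeps its own mean; a group with
# a single observation is pulled heavily toward the pooled mean.
est = shrunk_means([0.0, 10.0], [100, 1], sigma2=1.0, tau2=1.0)
```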

0:23:25 | okay so we can do that now with the dirichlet process and so this is a paper that we published a few

0:23:28 | years ago on the hierarchical dirichlet process

0:23:31 | um this is i think really a very useful tool it's just the hierarchy idea applied to the dirichlet process

0:23:36 | we have the same kind of setup as before

0:23:38 | uh but now we don't just have one G we have G one G two G three G four

0:23:43 | um for the different

0:23:44 | estimation problems the different groups

0:23:46 | and if you want to be concrete about it these are the different rows of the hmm transition matrix

0:23:49 | each one of them is a random measure

0:23:52 | and we have a set of measures

0:23:53 | all the different rows and we want to tie them together so they have the same next-state space that they all

0:23:58 | transition to

0:23:59 | um so what we do is we're not gonna draw these things independently nor lump them all together we

0:24:04 | want different transition

0:24:05 | probabilities but we don't want them to be completely separate state spaces they've got to be tied together in some

0:24:10 | way

0:24:10 | so the way you do this is you have an extra layer in this graph you

0:24:14 | first of all draw uh from some

0:24:16 | source of atoms

0:24:17 | a

0:24:18 | mother distribution G naught

0:24:21 | it's now random instead of being a fixed parameter like before

0:24:24 | and now that's the base measure that you use

0:24:26 | going into each of the children

0:24:28 | they draw their atoms from G naught

0:24:31 | um and they reweight them according to their own stick-breaking process

0:24:35 | okay |

0:24:35 | so this ties together a set of random measures it makes them cooperate operate on what are the atoms there |

0:24:41 | all to use |

0:24:42 | and what are the weights at they're gonna use |

0:24:45 | uh this also as a as a rest raw underneath it which is a really easy way to understand these |

0:24:49 | ideas |

0:24:50 | we now don't have just one rest route we have multiple restaurants in restaurants in general and again this in |

0:24:55 | my application hmms these will be the different current state |

0:24:59 | or for rows of a transition sure |

0:25:01 | right, so if you're in restaurant number two

0:25:03 | you go in and you sit at a table proportional to the number of people already at the table, like before; that

0:25:07 | gives you

0:25:08 | uh sharing, or clustering, within a restaurant

0:25:11 | and then if i'm the first person to sit at a table, i go up to a global menu of

0:25:15 | dishes for the entire franchise

0:25:17 | and i pick a dish from this menu; maybe i pick

0:25:19 | um

0:25:21 | uh a chicken dish up here

0:25:23 | um and i bring it down to my table, and everybody who joins me at my table

0:25:27 | has to eat that same dish

0:25:29 | we all share the same parameter

0:25:30 | but i also put a check mark next to that dish

0:25:33 | and if you are in some other restaurant

0:25:35 | and when you're the first person at your table, you go to this menu

0:25:39 | and you pick a dish proportional to the number of check marks next to the dish

0:25:43 | so you get some dishes appearing that are popular across all of the restaurants, not just within

0:25:47 | one restaurant

0:25:48 | and they get transferred between the restaurants according to this kind of preferential attachment
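[editor's note] The franchise metaphor maps directly onto a small simulation: each customer sits at a table proportional to its occupancy, and each new table picks a dish proportional to franchise-wide check marks. A sketch under exactly those rules (names hypothetical, not the speaker's implementation):

```python
import random

def franchise_seating(customers_per_restaurant, alpha, gamma, seed=0):
    """Chinese restaurant franchise: tables within a restaurant share a dish;
    a new table picks a dish proportional to franchise-wide check marks
    (or a brand-new dish with weight gamma)."""
    rng = random.Random(seed)
    dish_counts = []                    # check marks per dish, across restaurants
    seating = []
    for n_cust in customers_per_restaurant:
        tables = []                     # [num_customers, dish_index] per table
        for _ in range(n_cust):
            weights = [c for c, _ in tables] + [alpha]
            idx = rng.choices(range(len(tables) + 1), weights=weights)[0]
            if idx == len(tables):      # first person at a new table
                dish_weights = dish_counts + [gamma]
                dish = rng.choices(range(len(dish_counts) + 1),
                                   weights=dish_weights)[0]
                if dish == len(dish_counts):
                    dish_counts.append(0)
                dish_counts[dish] += 1  # put a check mark next to the dish
                tables.append([1, dish])
            else:
                tables[idx][0] += 1
        seating.append([(c, d) for c, d in tables])
    return seating, dish_counts
```

Running it with a few restaurants shows the preferential-attachment effect: dishes with many check marks keep getting picked by new tables in other restaurants.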

0:25:54 | okay so again, in the hmm setting

0:25:56 | these will be the rows of the transition matrix

0:25:59 | the number of possible transitions grows as you get more and more data, so instead of having K states i

0:26:03 | have a growing number of states

0:26:06 | and the states will be shared among all the different rows of the transition matrix according to this restaurant metaphor

0:26:13 | um |

0:26:14 | now i have a really nice application of this that we published this year that i'm going to

0:26:18 | skip in the interest of time in this talk, but there will be a paper on this

0:26:22 | i think it's kind of a killer app for this problem

0:26:24 | you're basically trying to model some of the geometry of proteins

0:26:27 | these are the kinds of angles that you get in proteins

0:26:30 | and if you put all the data for a particular amino acid on top of each other you get

0:26:33 | kind of these fuzzy diagrams that aren't well resolved

0:26:37 | and so what you really wanna do is not just have one estimation problem for the density of angles

0:26:42 | you wanna break them up according to the context

0:26:45 | you wanna not just have one

0:26:47 | distribution, you want a bunch of them, depending on the context

0:26:50 | when you break them up according to the context, however, you have a sparse data problem; many of the contexts

0:26:55 | have very little data

0:26:56 | it's a very similar kind of problem to lots of settings, including signal processing and speech

0:27:00 | um and so we want to have models like this that depend on the context of the

0:27:05 | neighbouring amino acids

0:27:06 | so this setup is exactly that: the groups are the neighbourhoods

0:27:10 | so for each context you have a group of data

0:27:12 | and you don't treat them separately, and you don't lump them together; rather you have them cooperate via this tree

0:27:17 | and every one of the dirichlet processes looks kind of like the dirichlet process with the synthetic data i showed you earlier

0:27:22 | but you really have

0:27:23 | twenty amino acids on the left and twenty on the right, so we get four hundred estimation problems

0:27:27 | or contexts

0:27:28 | so we have four hundred diagrams like this

0:27:30 | and we get them to share

0:27:32 | atoms according to this procedure so

0:27:34 | uh i'll skip over some of the results, but they're impressive; this is the log

0:27:39 | probability improvement on the test set for all twenty amino acids

0:27:43 | here is no improvement, and this is over what had previously been done in the protein folding literature

0:27:47 | so it's really a quite massive improvement

0:27:50 | um okay let's go back to hidden markov models; here is a hidden markov model

0:27:55 | and our prior now is the structure and

0:27:57 | machinery i've been telling you about

0:27:59 | um okay so now we have a hidden markov model where we don't have a fixed number of states; we call

0:28:03 | this the hdp-hmm

0:28:05 | we have an infinite number of states, so here's time, here's the state space

0:28:09 | and so when you're in one of these current states you have a distribution on next states

0:28:13 | which acts like a chinese restaurant process; here's table one, table two, table three, table

0:28:16 | four

0:28:17 | and there's a growing number of tables

0:28:20 | i do that, drawing a table, every time i hit state number two

0:28:23 | um |

0:28:24 | and that's great, i can get a growing number of tables, but i wanna share those tables between state

0:28:29 | number two and three and four and so on

0:28:31 | so as i've been talking about, i need this

0:28:33 | hierarchical dirichlet process to tie together all those transitions

0:28:37 | um okay so i'm going to go kind of quickly through slides like this; again the details aren't what i'm trying to convey

0:28:41 | here, just the overall picture

0:28:43 | and uh

0:28:45 | briefly, what you do is you draw a mother transition distribution for all states

0:28:49 | and then, for each

0:28:52 | um current state, the transition distribution is a perturbation of that; so here's state three

0:28:55 | it takes this and kind of reweights all these atoms

0:28:58 | and so on

0:28:59 | and so forth

0:29:01 | okay |
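[editor's note] In symbols, the slide's construction is: draw one mother distribution beta over next states, then make every transition row a Dirichlet perturbation of it, so all rows place mass on the same shared states. A truncated sketch (all names hypothetical):

```python
import random

def hdp_hmm_rows(num_states_trunc, gamma, alpha, seed=0):
    """Each transition row is a perturbation of one mother distribution
    beta, so all rows share the same set of next states."""
    rng = random.Random(seed)
    # mother distribution over the (truncated) state space: GEM(gamma)
    beta, rem = [], 1.0
    for _ in range(num_states_trunc):
        b = rng.betavariate(1.0, gamma)
        beta.append(rem * b)
        rem *= 1.0 - b
    beta[-1] += rem                    # fold leftover stick into the last slot
    rows = []
    for _ in range(num_states_trunc):
        # row ~ Dirichlet(alpha * beta), sampled via normalized gamma draws
        g = [rng.gammavariate(alpha * bk + 1e-12, 1.0) for bk in beta]
        s = sum(g)
        rows.append([x / s for x in g])
    return beta, rows
```

Rows drawn this way differ from each other, but they all inherit beta's preference for the same small set of states, which is the tying the HDP provides.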

0:29:02 | alright so um

0:29:04 | i hope that was kind of clear; this is a way of doing hmms where you don't know

0:29:08 | the number of states a priori

0:29:09 | and you're actually putting a prior distribution on the number of states, and it can grow as you get more and more

0:29:13 | data

0:29:14 | it's kind of a nice solution to a classical problem in hmm land

0:29:18 | um so we implemented this; uh there is a simple sampling procedure um

0:29:23 | we did a slightly nonstandard sampling procedure, but there's a whole bunch of procedures that can be used to do

0:29:27 | posterior inference with this, given some emissions, some data

0:29:31 | uh to infer

0:29:32 | the most probable hmm, and the trajectory, the viterbi path, and everything else you want,

0:29:38 | all the parameters of this

0:29:39 | you can do this pretty easily on a computer

0:29:41 | using kind of

0:29:42 | standard methodology

0:29:44 | for hmms

0:29:46 | right so what we did is, uh, we applied it to the diarisation data; we did okay but not particularly

0:29:51 | well

0:29:52 | we identified one problem we needed to solve before we could start to get performance which was satisfactory

0:29:57 | and that was that we had a little bit too much state splitting going on; so here's a little synthetic

0:30:01 | problem; here is time

0:30:02 | there were three states in our synthetic data, as you can see here; so, you know, this state was on

0:30:06 | for a few uh time frames and then this state and so on

0:30:10 | um here was the data and you can sort of see that there are three states here, but you know

0:30:13 | it's pretty noisy; it'd be kind of hard to tell that just by looking at it

0:30:15 | um here is the output of our inference system, our computer program,

0:30:19 | for the hdp-hmm

0:30:21 | and it did find three states here, but then it actually came up with a fourth state, and the problem was that

0:30:26 | the parameters for state three and state four, the emission distributions,

0:30:29 | happened to be just about the same

0:30:32 | that could happen, and it happened here

0:30:34 | and so for states three and four, from the outside world's point of view, the emission probabilities

0:30:37 | are basically the same

0:30:39 | so the system was doing perfectly well

0:30:41 | it had high likelihood

0:30:42 | and it was flickering between states three and four

0:30:44 | and again it had high likelihood, so why not

0:30:47 | there's nothing that prevents it from doing that

0:30:48 | but we don't want that, of course, because now we don't do very well on the diarisation problem; it thought there were four

0:30:52 | people in the room when there are really only three

0:30:54 | okay so we're being bayesian in this problem, and so we don't mind putting in a little bit more prior

0:30:58 | knowledge; and this is where we put bayes in at this point, so we put in a little bit more, which

0:31:02 | is that people don't tend to uh talk

0:31:05 | for microsecond or millisecond time intervals

0:31:07 | they tend to talk for seconds

0:31:09 | we have one extra parameter, a self-transition probability; in the hmm, the diagonal of this infinite matrix

0:31:15 | is going to be treated as special and get an extra boost

0:31:18 | right so we have one extra parameter, which is the extra boost for self-transitions, something like

0:31:22 | a um

0:31:24 | um

0:31:26 | uh a semi-markov,

0:31:27 | semi-markov hmm

0:31:29 | okay

0:31:30 | and so we call this the sticky hdp-hmm; it just has the transition

0:31:34 | distributions as before

0:31:35 | plus one parameter which boosts the self-transitions

0:31:38 | so if this is the distribution you had before

0:31:41 | then we add a little bit to that, um

0:31:44 | a little bit of boost

0:31:45 | for the uh self-transition

0:31:47 | okay

0:31:48 | that parameter, since we're being bayesian, is again

0:31:50 | a parameter to be inferred given the data; it's random

0:31:53 | it has a posterior distribution
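[editor's note] The one-parameter change is easy to state concretely: add a boost kappa to the diagonal entry of a transition row and renormalize. A sketch (the function name and kappa label are mine, following the sticky HDP-HMM idea described in the talk):

```python
def sticky_row(row, state_index, kappa):
    """Boost the self-transition entry of one transition row by kappa,
    then renormalize so the row is again a probability distribution."""
    boosted = [p + (kappa if j == state_index else 0.0)
               for j, p in enumerate(row)]
    total = sum(boosted)
    return [p / total for p in boosted]
```

With kappa = 0 this is the ordinary HDP-HMM row; larger kappa makes states last longer, matching the prior belief that speakers talk for seconds, not milliseconds.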

0:31:57 | okay so uh we put that into our system and um we're now ready to report some results

0:32:02 | okay so we went back to the speaker diarisation problem

0:32:04 | and um

0:32:06 | implemented this on data; uh i think this was the two thousand and seven uh RT competition

0:32:11 | data; we didn't compete in this, we simply took the data and compared to what had been done

0:32:16 | by people who do compete

0:32:17 | um

0:32:18 | okay so uh this is diarisation error rate

0:32:21 | and the icsi results had been the state of the art uh for this at that time, and i don't remember if they

0:32:25 | are still, um, but they were the best result

0:32:29 | known, by a wide margin, at the time

0:32:31 | um

0:32:32 | and this is over the twenty-one meetings

0:32:34 | and these are comparisons to the icsi results, just to get a flavour

0:32:37 | um so our sticky results are basically comparable to the red ones, which are the icsi results

0:32:42 | so if you just kind of scan through here

0:32:44 | the green ones are the non-sticky hdp-hmm, and you can see they're worse, and that was a sign to us

0:32:48 | that something was not quite working

0:32:51 | so if you do a head-to-head comparison

0:32:53 | basically we were comparable with the icsi results at that time

0:32:56 | i do wanna say that

0:32:58 | my goal in showing these numbers is not to say that we

0:33:01 | beat anybody, or that we're competitive now; this was two thousand seven

0:33:05 | we're not, we're not

0:33:06 | speech people; we didn't try to compete

0:33:08 | um but the results we got are in fact, you know, comparable with a state-of-the-art system, and the

0:33:13 | main point i wanna make is that this was done by

0:33:15 | emily fox, a grad student who visited my group in the summer, and she learned about

0:33:20 | the hdp-hmm and she implemented this and did all of it as a one-summer project

0:33:25 | so this is not that hard to do; it's a tool, just like the hmm, you can learn it and you

0:33:29 | can use it and you can get

0:33:30 | you know, competitive results doing that

0:33:33 | um we have not pursued this, and, you know, i know that the icsi team actually did a discriminative method

0:33:37 | after that that has better numbers

0:33:39 | um

0:33:40 | but all these numbers are getting pretty good, and i think that this particular approach, if um

0:33:44 | if pursued, can be

0:33:46 | you know, among the best that's out there

0:33:48 | i recall when hmms first came about; i was actually a grad student at the

0:33:53 | time

0:33:54 | um the very first paper on hmms gives numerical results compared to dynamic time warping

0:33:59 | and they were okay, but not greatly better

0:34:01 | um but of course that was enough, and that set the whole field off,

0:34:04 | heading in that direction; i would like to think that this could be the case also with these

0:34:08 | bayesian nonparametric methods

0:34:10 | they're easy to implement

0:34:11 | they're just a small twist on what you're used to; they're just hmms with a little bit extra, that you

0:34:15 | can move to a new table in the restaurant

0:34:17 | a little bit of extra machinery of multiple restaurants; that's really easy to implement

0:34:20 | they're robust; you know, one student can just implement it, and it works

0:34:24 | well

0:34:25 | so |

0:34:26 | i do think they're gonna play a role

0:34:28 | um

0:34:30 | so here are some examples of actual meetings; these are ones where the diarization rates for both approaches were

0:34:34 | quite small; you can see here we basically solved the problem

0:34:38 | and here's a meeting where things were a little bit worse

0:34:40 | there was one meeting on which we did particularly badly

0:34:42 | it turned out that it wasn't the model that was bad; it turned out that the mcmc algorithm

0:34:47 | we were using for this, the mixing was quite slow on that meeting

0:34:49 | we hadn't mixed by the time we stopped

0:34:51 | and um so there's still an open issue of how to

0:34:53 | make sure the markov chains mix

0:34:55 | if we're gonna use markov chains

0:34:59 | okay so that's all i want to say about diarisation; um

0:35:01 | uh

0:35:02 | let me just

0:35:03 | briefly mention a couple of other applications; the hdp has uh been used in many other kinds

0:35:08 | of problems, not just for hmms

0:35:10 | in fact you can use these for pcfgs, for problems with context-free grammars

0:35:14 | and in this case you have a parse tree

0:35:16 | and the number of rules

0:35:18 | rules are like clusters when you're doing statistical nlp

0:35:21 | and you grow the number of clusters as you see more and more data

0:35:24 | and the same rule can appear in multiple places in the parse tree

0:35:27 | that's why you have multiple restaurants, the chinese restaurant franchise, for exactly the same reason

0:35:31 | you want to share

0:35:32 | these distributions

0:35:33 | across different locations of the parse tree; so we have

0:35:35 | built a system that does that and

0:35:38 | um

0:35:38 | are able to then infer uh the most probable

0:35:41 | rule sets

0:35:42 | uh from data

0:35:43 | and parse

0:35:44 | and build parsers

0:35:48 | um okay so that was the dirichlet process

0:35:51 | and um i could talk about dirichlet processes a lot more, and a lot of people are working on them, but i

0:35:56 | wanna move on

0:35:57 | to tell you about some of the other uh stochastic processes that are in the toolbox

0:36:01 | so one i'm particularly interested in these days is called the beta process

0:36:05 | whereas with the dirichlet process, uh, when you walk into the restaurant you have to sit at one

0:36:09 | and only one table

0:36:10 | the beta process allows you to walk into the restaurant and sit at multiple tables

0:36:15 | so you can think about the tables now not as clusters but as features, a bit-vector description

0:36:20 | of entities

0:36:22 | so if you sit at tables one, three, and seventeen

0:36:25 | and i sit at three, seventeen, and thirty-five, we have a bit of overlap with each other

0:36:28 | so bit vectors can overlap in interesting ways

0:36:31 | and so that's what the beta process allows us to do that the dirichlet process does not

0:36:36 | um |

0:36:38 | okay, now before the beta process itself, i'll tell you more about the beta process in a moment, but

0:36:41 | let me tell you briefly about the general framework

0:36:43 | um so kingman had a very important paper uh in nineteen sixty-seven or sixty-eight uh called

0:36:49 | on completely random measures

0:36:50 | the beta process is an example of a completely random measure

0:36:53 | and completely random measures are really simple to talk about and to work with

0:36:56 | all they are is measures; they're random measures on a space, just like before

0:37:01 | but what's new here is that they assign independent mass to non-intersecting subsets

0:37:05 | of the space; here's a picture of this

0:37:07 | here we've got some arbitrary space

0:37:10 | here's a set A and here's a set B

0:37:13 | and here this red thing is a random measure, a discrete measure, on this space

0:37:17 | and so there's a random amount of mass that fell into set A and a random amount that fell into

0:37:21 | B

0:37:22 | and if those random variables,

0:37:24 | um

0:37:25 | the random masses here,

0:37:27 | are independent

0:37:29 | because these are non-overlapping sets

0:37:31 | then this random measure is called completely random

0:37:34 | so it's a really nice concept; it leads to divide-and-conquer algorithms; it's basically,

0:37:38 | um, you know,

0:37:40 | a computational concept

0:37:42 | okay so uh there are lots and lots of examples; you know what brownian motion is;

0:37:46 | brownian motion, it turns out, is a special case of this

0:37:50 | gamma processes are, and all kinds of other interesting objects

0:37:53 | the dirichlet process is not a completely random process, but it's a normalized gamma process

0:38:00 | okay so now |

0:38:01 | um kingman had a very beautiful result um

0:38:04 | characterising completely random processes

0:38:07 | and it turns out that they can be derived from poisson point processes

0:38:11 | so the poisson process lies behind all of this

0:38:14 | and it's a really beautiful construction; let me tell you about it really briefly; what we're trying to do, remember,

0:38:18 | is we have this space omega

0:38:20 | we're trying to put random measures on it; earlier our job was to put dirichlet process measures on spaces

0:38:24 | i'm now being more general and asking how to put all kinds of other random measures on a space

0:38:28 | what you do is you take the original space omega and you cross it with the real

0:38:31 | line; you look at the product space omega cross R

0:38:35 | okay

0:38:36 | i'm gonna put a poisson process in that product space

0:38:39 | how do you put a poisson process on things

0:38:41 | well you have to endow it with a rate function

0:38:43 | you're probably familiar with the homogeneous poisson process, where you have a flat, constant rate function

0:38:49 | and then in every little interval, or in this case a little set,

0:38:53 | um the number of points that fall there is a poisson random variable

0:38:56 | with some rate number

0:38:58 | but i can let the rate vary, so now the poisson rate is an integral

0:39:03 | of a rate function over that little set

0:39:05 | so you have to write down the rate function; here it is

0:39:08 | and so i integrate that rate function to get um

0:39:10 | poisson rates for every little small set of this space

0:39:14 | and here's a draw from that poisson point process; you see the rate function was tilted up on the left, so

0:39:18 | i get more points down here and fewer up here

0:39:22 | now, having drawn from this poisson process with this rate function, i forget all this machinery and look

0:39:26 | at this red thing here

0:39:28 | i take each point and i drop a line from that point down to the omega axis

0:39:32 | and now the resulting object is a measure on the omega space; it's a discrete measure

0:39:37 | and it's random

0:39:38 | and it's a completely random measure, because any uh mass that falls in some set A here

0:39:43 | and some set B here will be independent

0:39:45 | because of the underlying poisson process

0:39:49 | the beautiful fact is that all completely random measures can be gotten this way

0:39:53 | okay so one direction is trivial; the other direction, completeness, is quite nontrivial

0:39:57 | so if you like completely random measures, and i do,

0:39:59 | uh then this

0:40:01 | theorem says you can reduce the study of

0:40:03 | these measures to the study of the poisson process

0:40:05 | and just the rate functions of the poisson process

0:40:08 | so i think that's a tool

0:40:09 | i think that

0:40:10 | in this field, in the years to come, we will be studying rate measures for poisson processes

0:40:13 | as a way to specify

0:40:16 | combinatorial structures on things
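[editor's note] The Poisson construction just described can be simulated directly: draw a Poisson number of points in the product space Omega x R+, then drop each point down to Omega to get the atoms of the measure. For this sketch i use a finite total rate and an arbitrary height distribution (the actual beta process rate is improper, with infinite mass near zero), so this illustrates the mechanism rather than giving a faithful beta process draw; all names are hypothetical.

```python
import math
import random

def crm_from_poisson(rate_total, seed=0):
    """Completely-random-measure sketch: draw a Poisson(rate_total) number
    of points in Omega x R+, then drop each point down to Omega.
    The returned list of (location, weight) pairs represents the discrete
    measure sum_i weight_i * delta_{location_i}."""
    rng = random.Random(seed)
    # number of atoms ~ Poisson(rate_total), sampled by CDF inversion
    n, u = 0, rng.random()
    p = math.exp(-rate_total)
    cum = p
    while u > cum:
        n += 1
        p *= rate_total / n
        cum += p
    atoms = []
    for _ in range(n):
        location = rng.random()               # position in Omega = [0, 1]
        weight = rng.betavariate(0.5, 2.0)    # height; stands in for the p-part
        atoms.append((location, weight))
    return atoms
```

Mass falling in disjoint subsets of [0, 1] comes from disjoint sets of Poisson points, which is exactly why the resulting measure is completely random.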

0:40:20 | okay so for the particular example of the beta process, here is its rate function; it's a function of

0:40:24 | two arguments, the omega part and the real part

0:40:27 | the omega part is just some uh base measure that gives you um

0:40:31 | um

0:40:32 | the prior on atoms

0:40:33 | just like it was G_0 before; now it's called B_0

0:40:36 | and then here, for the beta process, is the uh beta density,

0:40:40 | an improper beta density

0:40:42 | um

0:40:43 | which gives you an infinite collection of points when we draw from the poisson process, which is what we want; we

0:40:47 | don't want a finite number, that would be a parametric

0:40:49 | prior; we want a nonparametric prior, so we need this thing to be

0:40:53 | an improper density

0:40:55 | okay so that's probably too much math; let me just draw a picture

0:40:58 | here is that rate function; it has a singularity at the origin, so it really tilts up sharply at the origin

0:41:04 | and it breaks off, it stops, at one

0:41:06 | and so when we draw from the poisson point process you get lots and lots

0:41:10 | of points that are very near the origin

0:41:12 | and you now take this, um, by dropping down from the uh

0:41:16 | uh

0:41:17 | the Y coordinate down to the X axis

0:41:19 | then it looks like this:

0:41:20 | p_i is the height of an atom and omega_i is the location of the atom

0:41:25 | and you take that infinite sum, and that now is a random measure; this is another way of getting a

0:41:30 | random measure, like the stick breaking before

0:41:32 | this is another way of getting a random measure which is much more general

0:41:35 | um and in particular these p_i do not sum to one; they're between zero and one

0:41:40 | but they do not sum to one

0:41:42 | and they're independent

0:41:44 | so i like to think of these p's as coin-tossing probabilities, and i like to think that this is

0:41:48 | now an infinite collection of coins

0:41:50 | uh and most of the coins have probability nearly zero; so i flip all these coins

0:41:55 | i'll get a few ones and lots of zeros

0:41:58 | and i get an infinite set of zeros and a few ones

0:42:01 | and i do that again and i'll again get a few ones and a lot of zeros

0:42:05 | if i keep doing that there'll be a few places where i'm gonna get lots of ones

0:42:09 | a lot of overlap

0:42:10 | and lots of other places where there's less so

0:42:13 | here's a picture showing that

0:42:14 | here i've drawn from the beta process; that blue thing right there is between zero and one, and it's mostly

0:42:19 | nearly zero

0:42:20 | and then here are a hundred draws

0:42:23 | where we think of these as coins, uh, and think of these heights as the probability of a head

0:42:28 | if i draw, this sum has a big probability here, so there are relatively lots of

0:42:33 | ones as we go down the column

0:42:34 | and in some region over here

0:42:36 | uh, where the heights are nearly all zero, the column is almost all zeros

0:42:41 | okay so think of a row of this matrix now as a sparse, binary,

0:42:46 | infinite-dimensional random vector

0:42:48 | and think of this as a feature vector for a bunch of entities

0:42:51 | so here are a hundred entities, and if you think of this like a chinese restaurant, entity number

0:42:56 | one hundred came to the restaurant

0:42:58 | and didn't just sit at one table; it sat at this table, this table, this table,

0:43:02 | and this table, four different tables

0:43:04 | the next person comes in, number ninety-nine, and didn't sit at the first table but sat at the

0:43:08 | fourth table, and so on and so forth

0:43:10 | right, so this captures the sitting pattern of a hundred people in this restaurant and all the

0:43:15 | tables they sat at

0:43:16 | and the total number of tables, as you keep doing this, is gonna grow; it gets denser and denser as you fill it out

0:43:21 | but it grows again at a slow rate
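[editor's note] The sitting pattern just described can be simulated customer by customer; this is the standard Indian buffet process scheme (which the talk names on a later slide): each customer takes an existing dish with probability proportional to how many earlier customers took it, then tries a Poisson(alpha/n) number of new dishes. A sketch with hypothetical names:

```python
import random

def indian_buffet(num_customers, alpha, seed=0):
    """Indian buffet process: customer n takes existing dish k with
    probability (takers of k) / n, then samples Poisson(alpha / n)
    brand-new dishes. Returns the padded binary seating matrix."""
    rng = random.Random(seed)
    dish_counts = []                        # takers per dish so far
    matrix = []
    for n in range(1, num_customers + 1):
        row = [1 if rng.random() < c / n else 0 for c in dish_counts]
        # Poisson(alpha / n) new dishes, via unit-rate exponential arrivals
        new = 0
        t = rng.expovariate(1.0)
        while t < alpha / n:
            new += 1
            t += rng.expovariate(1.0)
        row.extend([1] * new)
        dish_counts = [c + r for c, r in zip(dish_counts, row)] + [1] * new
        matrix.append(row)
    width = len(dish_counts)
    return [r + [0] * (width - len(r)) for r in matrix]
```

The total number of dishes grows roughly like alpha times the harmonic number of n, which is the slow logarithmic growth mentioned above.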

0:43:25 | okay so um

0:43:27 | you know, there are different settings of the parameters of this process, and you can get different growth rates with those

0:43:31 | settings

0:43:33 | so that's probably way too abstract for you to appreciate um why

0:43:37 | you'd wanna do that

0:43:38 | let me just say that there is also a restaurant metaphor here

0:43:41 | um there's something called the indian buffet process which captures the sitting pattern directly on the matrix

0:43:46 | without talking about the underlying beta process, just like the chinese restaurant

0:43:50 | captures the

0:43:51 | sitting pattern for the dirichlet process without talking about the underlying dirichlet process; it's literally the marginal probability

0:43:56 | under the beta process

0:43:58 | so i'm gonna skip that; um, i wanna move

0:44:01 | um

0:44:02 | all the way back to this problem, so i can make it concrete again

0:44:06 | okay so remember this problem of the multiple time series: i have these people coming into the room and

0:44:11 | doing these exercise routines

0:44:12 | and um i don't know how many

0:44:15 | routines there are in the library, and i don't know who does what routine

0:44:19 | and each person doesn't do just one routine; they do a subset of routines

0:44:25 | right, so how am i gonna model that

0:44:27 | well, it's time series, and it has segments, so i'm gonna use an hmm as my basic likelihood

0:44:32 | but now i've got to put some structure around that to capture all this combinatorial structure in my problem

0:44:37 | right

0:44:37 | so the way that i'm gonna do that is i'm gonna use the beta process

0:44:41 | and uh i have a slide on this, i think, and i'll try to say this in english; you can

0:44:44 | study the slide

0:44:46 | um and the math, if you care to

0:44:48 | but

0:44:48 | i encourage you to just listen to me tell you how this works

0:44:52 | um okay so let's suppose everybody in this room is gonna come up here on the stage and do the

0:44:56 | exercise routines, and every one of you is gonna do your own subset

0:44:59 | there's an infinite library up there of possible routines you could do

0:45:03 | you choose, before you come up on stage, which subset of that library you're good at; you actually pick them

0:45:07 | out

0:45:08 | so maybe i'm gonna do jumping jacks and twists; that's all i do

0:45:12 | right

0:45:13 | now

0:45:13 | up in heaven there is an infinite-by-infinite

0:45:16 | transition matrix

0:45:18 | um

0:45:19 | that god possesses and that we don't get to possess

0:45:22 | and uh i pick out

0:45:24 | um twists

0:45:25 | and jumping jacks

0:45:26 | from that infinite matrix; so i'm gonna pick a little two-by-two submatrix; maybe twists is uh

0:45:30 | column number thirty-seven

0:45:32 | jumping jacks is number forty-two

0:45:34 | so i pick out those columns and the corresponding rows, and i get a little two-by-two

0:45:38 | matrix, and i bring it down from the infinite matrix

0:45:40 | and i instantiate that as a classical hmm

0:45:44 | it's actually an autoregressive hmm; there's a little bit of oscillation, because these are oscillatory movements

0:45:49 | and now i run a classical algorithm: for me, for my exercise routine, my emissions are the

0:45:53 | six-dimensional

0:45:54 | vector of positions

0:45:56 | and it's just an hmm

0:45:58 | right, now i run the forward-backward algorithm and i get an em update to the parameters

0:46:02 | and after i've made the local update, i go back up to the infinite matrix, and that little two-by-two

0:46:06 | submatrix gets the updates from baum-welch

0:46:10 | right, now here comes the next person up onto the stage, and he also takes column thirty-five,

0:46:14 | but he, you know, also one-oh-one and one-seventeen

0:46:17 | so he pulls out a three-by-three subset of the matrix

0:46:20 | he runs his hmm on his data

0:46:22 | gets a baum-welch update, and then goes back to the infinite matrix and changes that three-by-three submatrix

0:46:28 | as well

0:46:29 | and as we all keep doing that, we're gonna be changing overlapping subsets of that infinite matrix

0:46:35 | okay

0:46:36 | and so that's the beta process

0:46:38 | ar-hmm

0:46:40 | so i hope that you kind of got the spirit of that; it is just an hmm, there's one hmm

0:46:45 | for each of the people doing the exercise routines

0:46:47 | and then the beta process is this machinery up here which gives us a

0:46:50 | feature vector: which of the subsets of states, of the infinite set of states, do i actually use
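[editor's note] The "pull down a submatrix" step is simple to write down: given a shared transition matrix and one person's binary feature vector, select the active rows and columns and renormalize each row. This is just the bookkeeping, under the assumption of a finite truncation of the notionally infinite shared matrix; the function name is hypothetical.

```python
def personal_transition_matrix(shared_matrix, feature_vector):
    """Select the rows/columns of the shared transition matrix for the
    states this person's binary feature vector switches on, and
    renormalize each row over just those active states."""
    active = [j for j, on in enumerate(feature_vector) if on]
    sub = []
    for i in active:
        row = [shared_matrix[i][j] for j in active]
        s = sum(row)
        sub.append([x / s for x in row])
    return active, sub
```

Two people whose feature vectors share a state index are therefore reading and (after Baum-Welch updates) writing overlapping entries of the same shared matrix, which is the cooperation the talk describes.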

0:46:57 | so this may look a little bit complicated; it's not; it's actually very easy to put on the

0:47:01 | computer

0:47:02 | and again, um, this is actually something emily came and implemented in a second summer with us

0:47:07 | um

0:47:08 | and

0:47:10 | built the software to do this

0:47:12 | um

0:47:13 | okay so uh anyway, it's just an hmm, but with the beta process prior on the way

0:47:18 | the parameters are structured

0:47:20 | now this actually really worked

0:47:21 | so um

0:47:22 | here are motion capture results

0:47:24 | uh

0:47:26 | and this is a nontrivial problem; lots of methods don't do uh well at all on this

0:47:30 | problem

0:47:31 | um it's a bit qualitative as to how well we're doing, but i think you'll see, if you look at it, that

0:47:34 | we're doing well; so here is the first feature

0:47:37 | if you pick that state, i.e. feature,

0:47:39 | and then you just

0:47:40 | held it fixed in an hmm; it's an autoregressive hmm, so it'll oscillate;

0:47:44 | the oscillations are the arms going up and down

0:47:47 | this kind of picked out the jumping jacks

0:47:49 | um

0:47:50 | routine

0:47:51 | this one, if you take that feature and you put it in an hmm and you don't let any transitions happen,

0:47:56 | you get the knees wobbling back and forth

0:47:58 | here you get some kind of twist motion of the hips

0:48:00 | here's some more wobbling of something or other; here's where the arms are going in circles

0:48:04 | so when you're at the bottom you start to get a little bit more subdivision of the states than maybe we

0:48:08 | might like, although emily thinks that there's kind of

0:48:11 | good kinematic reasons that these are actually different

0:48:14 | but this really pretty much nailed the problem; it took in this sixty-dimensional time series, multiple of them, knew nothing

0:48:18 | about the number of segments and where the segments occur

0:48:21 | and jointly segmented them among all six, all the different users

0:48:25 | of the system

0:48:28 | alright right |

0:48:29 | um how much you on time |

0:48:31 | um |

0:48:31 | okay so i'm a about the not time i'm this good uh i think i against get some slides here |

0:48:35 | and just say something about um |

0:48:38 | um |

0:48:38 | this model |

0:48:39 | uh |

0:48:40 | so this is something i worked on a number of years ago; it's an exchangeable model, a bag of

0:48:44 | words model for text, latent dirichlet allocation; it's fairly widely known

0:48:48 | it's extremely simple, really too simple for lots of real-world phenomena

0:48:53 | and so we've been working on making it more interesting

0:48:55 | and um

0:48:56 | so

0:48:56 | what the lda model does is it takes a bag of words representation of text

0:49:01 | and it re-represents texts in terms of latent topics

0:49:05 | a topic is a probability distribution on words

0:49:07 | and so a given document can express a subset of topics, maybe sports and travel,

0:49:12 | and so all the words in the document come from

0:49:14 | the sports topic

0:49:15 | or the travel topic; you get mixtures, or so-called admixtures,

0:49:19 | um of topics within a single document
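the generative story being described here can be sketched in a few lines; the three-topic, six-word vocabulary and the concentration `alpha` below are made-up illustrative values, not anything from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 topics over a 6-word vocabulary.
vocab = ["game", "score", "team", "visa", "hotel", "flight"]
n_topics, alpha = 3, 0.5

# Each topic is a probability distribution over words.
topics = rng.dirichlet(np.ones(len(vocab)), size=n_topics)

def generate_document(n_words=10):
    # Per-document topic proportions: the "admixture" weights.
    theta = rng.dirichlet(alpha * np.ones(n_topics))
    words = []
    for _ in range(n_words):
        z = rng.choice(n_topics, p=theta)        # pick a topic for this word
        w = rng.choice(len(vocab), p=topics[z])  # pick a word from that topic
        words.append(vocab[w])
    return words

doc = generate_document()
```

every word in `doc` is drawn from one of the document's topics, so a single document mixes topics at the word level, which is exactly the admixture point above.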

0:49:22 | um um the problem with this approach, and it has many problems, but one of them is that

0:49:26 | words like

0:49:27 | uh function words

0:49:29 | tend to occur in every single topic

0:49:31 | because if i have a document that's only about travel, and the function words don't occur in that topic,

0:49:35 | then when function words do appear in that document

0:49:37 | i get a very low-probability document

0:49:39 | right, so we'd like to separate out things like function words and other kinds of abstract words from more concrete words

0:49:45 | and make the abstraction hierarchical

0:49:47 | right, so we have done that; there was a paper on this a few years ago, and

0:49:51 | i'm more proud of this approach in some sense than of lda; i see it as a kind

0:49:55 | of a

0:49:56 | path forward

0:49:57 | for this field

0:49:58 | uh we call this the nested chinese restaurant process |

0:50:01 | and so this is a whole chinese restaurant up here, and here's another chinese restaurant; all of these

0:50:05 | chinese restaurants are now organised in a tree

0:50:08 | when you go to a chinese restaurant you pick a table like before, and that tells you what branch to

0:50:12 | take

0:50:13 | to go to the next restaurant

0:50:15 | so you can think about this as the first night here in prague:

0:50:18 | you pick some restaurant, and then you know what branch you're on, so you know what restaurant you need

0:50:22 | to go to on the second night

0:50:24 | and then the third night, and so on as you keep going through

0:50:27 | the nights of the conference

0:50:29 | alright, so one document comes down here and picks a path down this tree

0:50:33 | another document comes in and picks another path, and the two paths overlap

0:50:37 | and so you get these

0:50:38 | overlapping branching structures

0:50:40 | alright, and now you put a topic at every node of this tree; each has a distribution on words

0:50:45 | and then a document has a path down the tree that gives it a set of topics it can draw from

0:50:50 | and it draws its words from that set of topics
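a minimal sketch of that path-drawing process, assuming a concentration `GAMMA` and a tree depth that are illustrative choices: each document descends the tree, and at each restaurant it picks an existing branch with probability proportional to how many earlier documents took it, or opens a new branch with probability proportional to `GAMMA`:

```python
import random

random.seed(0)
GAMMA = 1.0  # concentration; an assumed illustrative value

def new_node():
    # A restaurant: per-branch visit counts, plus the child restaurants.
    return {"counts": {}, "children": {}}

root = new_node()

def sample_path(root, depth):
    """Draw one document's path down the nested-CRP tree."""
    node, path = root, []
    for _ in range(depth):
        total = sum(node["counts"].values())
        r = random.uniform(0, total + GAMMA)
        for branch, c in node["counts"].items():
            if r < c:           # follow an already-used branch
                break
            r -= c
        else:                   # open a new branch (a new restaurant below)
            branch = len(node["counts"])
            node["counts"][branch] = 0
            node["children"][branch] = new_node()
        node["counts"][branch] += 1
        path.append(branch)
        node = node["children"][branch]
    return path

paths = [sample_path(root, depth=3) for _ in range(20)]
```

since popular branches attract more documents, the paths overlap near the root, which is exactly why the root topic ends up shared by every document.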

0:50:54 | right |

0:50:55 | now, the node at the top has been used by all documents

0:50:58 | so it makes a lot of sense to put the function words up there

0:51:02 | whereas this node down here is only used by a small subset of the documents, so you might as

0:51:06 | well put some more concrete words down there

0:51:09 | and that's what the statistics does when we fit this to data:

0:51:11 | it develops topics at the top of the tree which are more abstract, and more concrete topics as

0:51:15 | you go down the tree

0:51:17 | so i'm just about done, yep

0:51:18 | and here's my last slide; actually this is a result, you'll have to turn your head

0:51:22 | um this data set is from psychology

0:51:28 | um psychological review; these are abstracts from a particular journal in psychology

0:51:32 | and this is the most probable whole tree; we put a distribution on the trees,

0:51:36 | since everything is uh

0:51:38 | a distribution here

0:51:39 | and this was the highest-probability tree, and at each node the highest-probability topic,

0:51:44 | um

0:51:44 | or the most probable words of the topic at that node; so here we see the most

0:51:47 | probable words at the root

0:51:49 | so we didn't have to strip away the function words; in this case they just popped

0:51:52 | right up to the top

0:51:54 | um at the next level we get topics on model and memory, uh self and social psychology, motion vision binocular,

0:51:59 | drive food brain

0:52:00 | oh so this looks kind of like social psychology, cognitive psychology, uh physiological psychology and visual psychology

0:52:07 | and so on as you go down; this is actually an infinite tree, and we've only shown the first three levels of it

0:52:12 | um so anyway, i hope that gives you more flavour of, you know, the toolbox notion here: we

0:52:15 | put

0:52:16 | chinese restaurants together

0:52:18 | with other objects, here distributions on trees, and we could suddenly do things like capture abstraction

0:52:23 | and reason about it

0:52:25 | um so i'm done; this was just kind of a tour of a few highlights of a literature; um

0:52:30 | there are probably about a hundred people worldwide working on this topic

0:52:34 | uh actively; we can count because there's a conference called bayesian nonparametrics

0:52:41 | um which is held

0:52:42 | every two years

0:52:42 | um

0:52:44 | uh |

0:52:44 | and it is just getting going, so for the younger people in the audience, uh

0:52:48 | i highly encourage you to look at this; there's been a little bit of work

0:52:51 | brought into speech and signal processing, but there's going to be a whole lot more

0:52:55 | um so on my publication page there are two papers which i'd point you to if you enjoyed the talk,

0:52:59 | that are written

0:53:01 | for uh a non-expert and give you lots of pointers to um more

0:53:05 | literature

0:53:06 | thank you very much |

0:53:16 | a there's time for questions |

0:53:17 | yes |

0:53:17 | thank you; so

0:53:20 | we do have time for

0:53:22 | some questions, yeah

0:53:37 | okay, so i have two questions; but first of all, thank you very much; it's wonderful for you to speak to

0:53:42 | uh

0:53:43 | our community

0:53:44 | the first question i have um

0:53:48 | is don't you have a

0:53:50 | large-scale problem,

0:53:52 | since all of this is based on

0:53:54 | markov chain monte carlo simulation?

0:53:57 | okay,

0:53:58 | doesn't that make it

0:54:00 | really

0:54:01 | really

0:54:01 | difficult? you know, i don't believe that at all, and no one really knows yet how these go to large scale

0:54:07 | uh so

0:54:08 | you know, does the em algorithm apply to large-scale problems or not?

0:54:11 | yeah, yes and no

0:54:12 | i mean, for some problems

0:54:14 | things

0:54:15 | really quickly settle down after a very small number of iterations, maybe like two iterations of em

0:54:20 | and for some problems

0:54:21 | these algorithms

0:54:22 | that i showed you are just gibbs samplers

0:54:25 | and so it's like em with one extra step:

0:54:28 | you do literally em like before, uh, but now you might change the indicator from this to this,

0:54:33 | or go to a brand-new table
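that "em plus one extra step" idea can be sketched as a deliberately simplified Gibbs-style sweep for a Chinese-restaurant-process mixture of fixed-variance gaussians; the concentration `ALPHA`, the variances, and the scoring of a brand-new table under a wide base distribution below are all illustrative assumptions, not the talk's actual sampler:

```python
import math, random

random.seed(1)
ALPHA, SIGMA = 1.0, 1.0  # assumed concentration and known emission std dev

# Two well-separated groups of observations.
data = [random.gauss(0, 1) for _ in range(30)] + [random.gauss(8, 1) for _ in range(30)]
assign = [0] * len(data)  # start with every point at a single table

def gibbs_sweep(data, assign):
    """One sweep: each point may keep its table, switch tables, or open a new one."""
    for i, x in enumerate(data):
        assign[i] = None  # remove point i before scoring the tables
        clusters = {}
        for a, y in zip(assign, data):
            if a is not None:
                clusters.setdefault(a, []).append(y)
        labels, weights = [], []
        # Existing tables: CRP prior (table size) times a Gaussian likelihood
        # evaluated at the table's current mean.
        for k, ys in clusters.items():
            mu = sum(ys) / len(ys)
            labels.append(k)
            weights.append(len(ys) * math.exp(-((x - mu) ** 2) / (2 * SIGMA**2)))
        # The "extra step": a brand-new table, scored under a wide N(0, 10) base.
        labels.append(max(clusters, default=-1) + 1)
        weights.append(ALPHA * math.exp(-(x**2) / (2 * 10.0**2)))
        r = random.uniform(0, sum(weights))
        for k, w in zip(labels, weights):
            if r <= w:
                assign[i] = k
                break
            r -= w
        else:
            assign[i] = labels[-1]
    return assign

for _ in range(5):
    assign = gibbs_sweep(data, assign)
n_tables = len(set(assign))
```

the sweep looks exactly like a hard-assignment em pass, except that one of the candidate assignments is always a fresh table, which is what lets the number of components grow with the data.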

0:54:35 | right, so we actually don't know whether uh posterior inference algorithms on large-scale problems are gonna mix;

0:54:40 | maybe they mix more quickly than we're used to from small data

0:54:44 | and the last point to make is that this is not tied to mcmc; that's just what we happened to

0:54:48 | use because this was a project we had three months for

0:54:50 | other procedures, split-and-merge algorithms, variational methods and so on, for posterior inference can

0:54:55 | be used here

0:54:56 | um so um

0:54:58 | it's still open

0:54:59 | yeah, and the second question i have is that many people in this hall

0:55:03 | are well aware

0:55:05 | um that the markov model, for example,

0:55:08 | is not a really good model

0:55:11 | for speech

0:55:12 | in the first place;

0:55:14 | it's just

0:55:15 | that we use it because inference in it is easy

0:55:19 | and

0:55:20 | easy to correct; so i just want to know, in your take,

0:55:24 | to what extent

0:55:25 | this can be

0:55:27 | generalised

0:55:28 | yeah, okay, that's a great question; i like it very much; so those two

0:55:33 | papers i mentioned

0:55:34 | answer your question: they try to show the toolbox, a range of other kinds of models we can

0:55:39 | consider

0:55:40 | uh the dirichlet process, for example, doesn't have

0:55:43 | power-law behavior

0:55:44 | you might want power-law behavior,

0:55:46 | and there's something else called the pitman-yor process which gives you power laws

0:55:48 | you could do a pitman-yor version of the hdp-hmm
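one way to see the difference is to simulate the two seating rules directly; the concentration and discount values below are arbitrary illustrative choices. with discount `d = 0` the table count grows logarithmically (the ordinary CRP), while `0 < d < 1` gives roughly n**d growth, i.e. a power law:

```python
import random

random.seed(0)

def n_tables(n_customers, alpha, d=0.0):
    """Seat customers one at a time. An occupied table with c customers attracts
    the next one with probability proportional to (c - d); a new table opens
    with probability proportional to (alpha + d * number of tables).
    d = 0 is the ordinary Chinese restaurant process; 0 < d < 1 is Pitman-Yor."""
    counts = []
    for n in range(n_customers):
        r = random.uniform(0, n + alpha)  # total mass is n + alpha either way
        for k, c in enumerate(counts):
            if r < c - d:
                counts[k] += 1
                break
            r -= c - d
        else:
            counts.append(1)
    return len(counts)

crp_tables = n_tables(5000, alpha=1.0)        # logarithmic growth in n
py_tables = n_tables(5000, alpha=1.0, d=0.5)  # power-law growth, ~ n**0.5
```

the discount makes big tables slightly less attractive and new tables more attractive as the restaurant grows, which is exactly where the power law comes from.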

0:55:52 | um |

0:55:52 | and so there are many many generalisations; there are inverse-gamma forms of the weights and so on and so forth

0:55:58 | i so um |

0:56:00 | yeah, uh, you know

0:56:01 | this really is a toolbox

0:56:03 | and i think also the point is that, you know, i'm a statistician; we're used to models which, if you

0:56:07 | find the right cartoon, can really work surprisingly well

0:56:10 | and so the hmm, yes, it's not the right model for speech, but, as has been said,

0:56:15 | uh

0:56:15 | it still was, you know, the elephant that got speech all the way to where it

0:56:19 | is

0:56:20 | maybe badly, maybe wrongly

0:56:22 | right

0:56:23 | um |

0:56:23 | you know, but uh it is very useful to make cartoons that have nice statistical

0:56:28 | as well as computational properties and can be generalised

0:56:31 | so i think the hmm was too much of a box that couldn't be generalised easily, and i think that these

0:56:35 | methods go beyond that

0:56:36 | once again

0:56:37 | but still staying with the problem

0:56:42 | (inaudible exchange)

0:56:48 | are there other questions?

0:57:00 | i can summarise it, yes

0:57:11 | yeah, so the question is about overfitting: why aren't you getting killed by overfitting in the nonparametric world?

0:57:16 | well, you know, i

0:57:17 | uh

0:57:18 | it's easy to say that bayesians don't have such problems with overfitting

0:57:21 | it's kind of, well, i'm not always a bayesian, but when i am a bayesian

0:57:24 | um, you know, it's one of the things i am

0:57:27 | um so, you know, to first order we don't have overfitting troubles with these systems, even on

0:57:31 | pretty large-scale problems

0:57:34 | in fact

0:57:36 | um so, you know, we compared this, for example with our context-free grammar work, to a

0:57:40 | parser that was trained with a split-merge em algorithm

0:57:43 | and there they had

0:57:44 | a big overfitting problem; they did deal with that

0:57:47 | eventually in a pretty effective way

0:57:49 | um but we didn't have to think about that |

0:57:51 | it just didn't

0:57:51 | you know |

0:57:52 | come up |

0:57:53 | um |

0:57:54 | you know, so that's the first order; at second order you have a couple of hyperparameters, and you have to get

0:57:58 | them in the right range;

0:57:59 | outside the right range you can have some overfitting, but we were very robust to that

0:58:03 | and so you're right, yeah, the data didn't totally work out just as i said:

0:58:07 | we had a little overfitting; there was one more state than there should have been

0:58:10 | but that's not too bad

0:58:11 | and it was not too hard to do a little bit of engineering and think about the problem a little bit

0:58:14 | more; there was a little bit of a time scale there, and, uh, you know, a prior needed to be put

0:58:18 | in

0:58:20 | so i think that's fine; we're sort of artists and engineers; we don't want to build

0:58:24 | a box

0:58:25 | that, you know, you give to a high schooler and it's done

0:58:27 | there will always be a little bit of thinking

0:58:29 | and a lot of engineering

0:58:30 | in all of this

0:58:32 | um but really, you know, i'm not always a bayesian; i often want to be a

0:58:36 | non-bayesian; i go back and forth every single day of my life

0:58:39 | um |

0:58:39 | you know, but for a lot of these really high-dimensional, hard

0:58:42 | uh inference problems, where you have multiple things which need to kind of

0:58:45 | collaborate, the hierarchy

0:58:47 | uh just gives you, you know, from the get-go, a lot of control over those sorts of things

0:58:56 | and you had a question?

0:59:05 | so for speech we use mixture models within each state; uh, is there a way to

0:59:10 | decide when to reuse an existing component versus when to add a new one? yeah, excellent question; i should have

0:59:15 | made that clear

0:59:16 | so

0:59:16 | the state-specific emission distribution here is not a single gaussian

0:59:20 | for reasons you guys know extremely well

0:59:22 | nor is it a finite mixture of gaussians

0:59:24 | no

0:59:24 | it's a dirichlet process mixture of gaussians

0:59:27 | right, and it turned out that was critical for us to get this to work

0:59:32 | absolutely we need more: just like in the classical gmm,

0:59:35 | a single gaussian won't model speech well

0:59:37 | and we don't want to have a fixed number of

0:59:38 | distributions there; we want it to grow: as you get more observations you get more and more mixture components in that

0:59:44 | emission distribution, which arises when we put in the dirichlet process there

0:59:47 | it's a dirichlet process

0:59:49 | and that helps to make the point about the toolbox even better

0:59:55 | ah well, we should, uh, we have time for one more question; ask one

1:00:04 | uh we actually use quite a bit of the bayesian

1:00:08 | hidden markov model in our work

1:00:11 | and it's very good, so i recommend everybody

1:00:14 | take a look; but one of the major problems is in the

1:00:18 | uh prior evolution, because we have to estimate a lot

1:00:23 | of hyperparameters um

1:00:25 | so sometimes

1:00:26 | the number of hyperparameters is more than

1:00:29 | the number of parameters; so

1:00:31 | now you are talking about

1:00:33 | an infinite number of

1:00:35 | parameters

1:00:36 | so wouldn't we have a problem with an infinite number of hyperparameters? no, we don't, and that's really critical; so

1:00:41 | the classical bayesian hmm had a lot of hyperparameters because it had

1:00:46 | a fixed K states, and the number of hyperparameters scaled with the size K

1:00:50 | right, and you had aic or something else like that to choose K

1:00:53 | oh

1:00:54 | we're not doing any of that

1:00:55 | we have a distribution on the number of states, and the number of hyperparameters is constant

1:00:59 | okay, and small

1:01:01 | so there's sharing here; there's sharing via this hierarchy, these uh choices

1:01:06 | of branches;

1:01:07 | the hyperparameters all live at the top level

1:01:10 | okay, okay, but then there's still

1:01:12 | uh a prior evolution according to how you give

1:01:17 | the hyperparameters there? correct, but the number of hyperparameters here is very small; okay; in some sense almost

1:01:22 | too small; you can't quite believe it; it probably has to grow a little bit,

1:01:26 | you know; it is very good, but that's where we are now

1:01:29 | this is not your classical

1:01:30 | uh bayesian hmm where you have the number of states fixed;

1:01:33 | we're inferring the number of states; it's random

1:01:35 | okay, it's the seating pattern in the restaurant

1:01:38 | it grows under the prior like log n, and under the posterior it's just random

1:01:41 | okay
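that log-n growth of the prior on the number of states follows directly from the seating rule: customer i opens a new table with probability alpha/(alpha + i), so the expected number of tables is a harmonic-like sum of order alpha * log n. a quick numerical check (alpha = 1 is an arbitrary illustrative choice):

```python
import math

def expected_tables(n, alpha=1.0):
    # Customer i (0-indexed) opens a new table with probability alpha/(alpha + i),
    # so the expected number of occupied tables after n customers is this sum,
    # which behaves like alpha * log(n) for large n.
    return sum(alpha / (alpha + i) for i in range(n))

for n in (100, 10_000, 1_000_000):
    print(n, round(expected_tables(n), 1), round(math.log(n), 1))
```

the two printed columns track each other closely, which is the sense in which the prior number of states grows logarithmically while the posterior remains random.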

1:01:43 | thank you thank you |

1:01:44 | ah well, i would like us to thank

1:01:46 | him once again for this

1:01:56 | and now we have the uh coffee break outside
