0:00:13 Okay, so let me [inaudible] of the conference, as well as some practical remarks from this morning. There is a closing ceremony this afternoon, and [inaudible].
0:00:40 So I would like to thank you all for being here until the end of the conference. The only thing [inaudible] after having done the slides is that we have again a problem [inaudible], on which we have been working for a while.
0:01:11 We will now have the last plenary lecture, given by Michael Jordan from the University of California, Berkeley. [Inaudible] will introduce our speaker.
0:01:32 Thank you very much. It is a great honour to introduce Michael Jordan, [inaudible]. Let me give you a brief overview of his work. He is one of the pioneers of artificial intelligence and machine learning, [inaudible]. For many years he has worked on fundamental problems [inaudible], and he has made absolutely incredible contributions to the field.
0:02:40 The topic he is going to talk about today is Bayesian nonparametrics. He is of course widely known for his work on graphical models, which have found application in speech, in many areas of statistical signal processing, and in other areas like natural language processing and statistical genetics.
0:03:12 He has of course been recognised tremendously for this work, both in statistics and in engineering. For instance, this year he was elected a member of the National Academy of Sciences and of the National Academy of Engineering. He is a Fellow of the American Association for the Advancement of Science, [inaudible], and he has received many further distinctions, including [inaudible] of the Association for Computing Machinery and, in statistics, [inaudible] lectureships, as well as [inaudible] special distinctions.
0:04:05 Before the talk, [inaudible] because of the name he shares with the basketball star, let me read some words from his students. Michael has a large number of students, in extremely successful positions. Here is what I read: "Michael Jordan [inaudible]." Please join me in welcoming our speaker.
0:04:52 Thank you.
0:04:59 I'm delighted to be here, and I thank the organisers very much for inviting me. So the main goal of the talk is to tell you a little bit about what the title means: what is a Bayesian nonparametric? Just to anticipate a little bit: first, "Bayesian" just means you use probability pretty seriously, and this is already a community where probability has been used, probably more than in any other applied community I know of, for a long, long time. So that's the easy part of my story.
0:05:25 "Nonparametric" doesn't mean there are no parameters; it's just the opposite: it means there's a growing number of parameters. So as you get more and more data, a Bayesian nonparametric model will have more and more parameters. The number grows at some rate slower than the number of data points, but it grows. I think that's a key modern perspective on statistics and signal processing, and the Bayesian approach has a particular take on it. It really is a toolbox, and the way I'm going to try to structure the talk is to give you an idea of what this toolbox is: you can use it to solve applied problems, it has a beautiful mathematical structure, and in some sense it's really just getting going, just getting started. So I'm going to try to convince you that you too can contribute here, both on the fundamental side and on the applied side. A big thanks goes to my collaborators; their names will appear throughout the talk, and much of this is due to them.
0:06:18 Now, one slide of sort of historical philosophy. I've put computer science here, but this could equally well be electrical engineering. All these fields in the forties and fifties were somehow together, with people like Kolmogorov and Turing and so on working across them. They separated because the problems got really hard, I believe: computer science went off and started looking at data structures and algorithms, and didn't focus much on uncertainty, while statistics focused on uncertainty and didn't focus too much on algorithms and data structures. And so maybe Bayesian nonparametrics is one venue where these two things are coming together. Mathematically, it really amounts to using stochastic processes instead of classical parametric distributions: if you use a growing number of parameters, you have distributions indexed by large spaces, and those are just called stochastic processes. You're going to see, throughout the talk, some of the stochastic processes we've been looking at.
0:07:15 To put this in a little bit more of a classical statistical context: if you pick up a book on Bayesian analysis, you will see the posterior written as proportional to the likelihood times the prior, usually written out in a parametric way, with theta indexing the parameter space. So this is the prior, this is the likelihood, and there's your posterior. In this talk we don't want theta to be a finite-dimensional object; we want it to be an infinite-dimensional object, so we don't write p(theta), we write p(G). p(x | G) is still going to be a likelihood; in this talk it will mainly be a hidden Markov model, actually. So G will be the structure and all the parameters of a hidden Markov model, the state space and so on, and this is just the usual HMM; and there will be some kind of structural component of HMMs on which we'll put priors. So I'll be talking mostly about this prior and less about that likelihood.
0:08:03 Alright, so to continue with the mathematical story: instead of having a classical prior distribution, you have a distribution on an infinite-dimensional space, and that's not so strange mathematically; that is what a stochastic process is. So we have a prior stochastic process, and it's going to be multiplied by some kind of fairly classical likelihood, because once G is fixed, it doesn't matter what the size of the space is: G is just a fixed object, and the probability of the data given G can typically be specified easily. And when we multiply this prior by this likelihood, we get ourselves a posterior stochastic process.
0:08:36 To get a more concrete idea of what these things are: we'll be talking a little bit about distributions on trees of unbounded depth and unbounded fan-out; you get these in genetics and natural language processing quite a bit. Stochastic processes on partitions arise a lot; we'll talk about clustering models where the number of clusters is not known a priori. We can put distributions on grammars, on sparse matrices; we can even put distributions on distributions with this toolbox, so you get the kind of recursive quality that you see in computer science. I'm going to talk about one particular large class of stochastic processes, the random measures; we'll get to what those are. They are one of the main ingredients of the toolbox I'm trying to talk about.
0:09:16 Okay, so here is a familiar problem. I learned about it from colleagues at ICSI, and we've worked on it a bit; I'll use it as a concrete way to show you how some of the methods I'm talking about can be rolled out and applied to a domain. I think everyone here knows what this problem is: there's a single microphone, there's a meeting going on, here's the waveform, and the ground truth is like this: Bob spoke for a while, then John, then Jill, then Bob, and so on. We'd like to infer this from that, and we don't know how many people are in the room, and we don't know anything about the spectral characteristics of their speech.
0:09:46 Here's the other problem I'll talk about; it's a little less traditional for you, I think, but it's a segmentation problem: a multivariate, multiple time series segmentation problem. Someone came into a room, and this is motion capture: we have sensors on them, and I think it's about sixty-dimensional, so we have a sixty-dimensional time series for this person. You can see it's segmentable: they did a little bit of jumping jacks, a little bit of splits, and so on and so forth. But we're not supposed to know that a priori: we don't know what the library of exercise routines was, and we don't know, for this particular person, which routines out of the big library they decided to use, how long they lasted, where the break points were, any of that. We'd like to infer all of it. Moreover, this is just one person's sixty-dimensional time series; I have data for many different people. Each person comes in, and we get a sixty-dimensional time series for each of them, and there will be some overlap: some people do twists, some people bend to touch their toes, and so forth. Not everyone will do all the routines, but there will be some overlap. We'd like to find that and then exploit it: if we learn a little bit about what twists look like for one person, we'd like that to be available for segmenting the time series of some other person. So it's a joint inference problem over multiple high-dimensional time series.
0:11:02 Okay, so everyone in this audience knows what HMMs are; it's a pleasant audience to talk to for that reason. Here are three of the diagrams that are often used; I like the graphical-model one here. There is a Markov chain on multinomial random variables, you have emissions coming off of those, and here are the parameters sitting on them. The core of it, of course, is the transition matrix: a K-by-K matrix, each row of which is a next-state transition probability distribution. So given your current state, say state two, you have a transition out of it, and we represent that as this little discrete measure here: it's just a bunch of atoms, at locations which are the integers, and their masses are the transition probabilities going out from state two to all the next states. There are K of these, so it's a measure with finite support, and we'll be talking about measures with infinite support here pretty soon. Pi one might be a different measure; maybe it's sparse, with two non-zero atoms; and so on for pi three and the rest. So those are the representations: the lattice, the graphical model, and this representation of the next-state transitions that will be used throughout the talk.
0:12:12 Alright, so there are lots of issues with HMMs that are still, in some sense, unresolved. How many states should we use? For the diarisation problem we don't know the number of speakers, so we have to infer that in some way; in the segmentation problem we don't know the number of behaviours; I mentioned twists and jumping jacks and touching your toes and so on; there's some unknown number of behaviours. Even more interestingly, this latter problem has issues about the structure of the state space. How do you encode the notion that a particular time series makes use of a particular subset of the states? It's a combinatorial notion, and I don't know how to encode it in a classical HMM with just a set of states. How do you share states among time series? I don't know how to think about that classically: there's a structured, combinatorial state space that we don't have classical tools for. So that is going to go into my p(G): I'm going to put the structural information in my prior, and given a particular choice, I'll have that information available for inference at the HMM level. Bayesian nonparametrics solves these problems, I think, in a very elegant way, along with lots of others, so I'm going to try to show you what we do.
0:13:16 Okay, so I'm again going to be talking about random measures here. I'll skip this slide; there's not really much being said on it; let's go right to an example of a random measure. Everyone here knows what a measure is: you put a set in, you get a number out, and the sets live in some sigma algebra. We now want to talk about random measures: if we're going to be probabilistic and do inference on measures, we've got to have random measures. So first let's start with something really easy: let's put a random distribution on the integers, which can just go from one to infinity. A distribution on the integers is a countable set of numbers that sum to one; we want a random distribution, so it should be a set of random numbers that sum to one. How do we do that? The numbers have to be dependent in some way if they're going to sum to one, and there are many ways you could think of doing it. Here's a way that turns out to have beautiful combinatorial properties that allow us to develop inference algorithms based on this idea.
0:14:10 It's called stick breaking, and it's old in probability theory. What you do is take a number of beta random variables, drawn independently. A beta random variable has two parameters; here we'll pin the first one to one, and the other will be free. A beta random variable lives between zero and one, and if you take the parameter alpha_0 to be bigger and bigger, the density tilts up to the left, so most of the mass is near the origin and less of the mass is near one; so think of getting lots of small numbers out of these beta draws. Now take a stick that goes from zero to one, and break off the first fraction of it according to beta one; call that pi one. The remainder of the stick is one minus beta one, and we take a fraction of that in turn: this little part right here is the amount beta two times the remainder, and we call that pi two. Here's pi three, pi four, pi five, and so on. We keep breaking pieces off the stick, and the total stick has mass one, so as we carry this out to infinity, the pi's will eventually sum to one. That's actually easy to prove, if you want to prove it: under this procedure, these pi's sum to one.
0:15:22 Alright, so that's a way to get what we want: the pi's form a distribution, and for any fixed k, pi_k is random, so we have a random distribution on the integers.
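[Editor's note: the stick-breaking procedure just described can be sketched in a few lines of Python. This is an illustration added for clarity, not code from the talk; the function and parameter names are made up.]

```python
import random

def stick_breaking(alpha, n_atoms, seed=0):
    """Draw the first n_atoms stick-breaking weights.

    beta_k ~ Beta(1, alpha); pi_k is beta_k times whatever stick remains."""
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms):
        beta = rng.betavariate(1.0, alpha)  # fraction to break off next
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights

pis = stick_breaking(alpha=5.0, n_atoms=1000)
# the pi's are non-negative and approach total mass one as n_atoms grows
assert all(w >= 0.0 for w in pis)
assert abs(sum(pis) - 1.0) < 1e-3
```

Larger `alpha` makes each break-off fraction smaller, so the mass is spread over more atoms, exactly the "lots of small numbers" behaviour described above.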
0:15:30 Having learned how to do that, we can promote it to distributions on arbitrary spaces, using that same tool; here's how that's done. You use these same pi's as weights in a little mixture model, an infinite mixture, and the mixture components are these delta functions. These are Dirac deltas: unit masses at locations phi_k, and the phi_k are independent draws from a distribution G_0 on some space. This space can be arbitrary: it doesn't have to be a Euclidean space; it can be a function space; it can be just about anything. So these atoms are living on that space, and they're weighted by these mixing proportions pi_k. Each of the deltas has unit mass, and they're weighted by numbers that sum to one, so this total object G is a spiky object: it's a measure, it has total mass one, and it lives on some space. It's random in two ways: because the pi's came from my stick breaking, and because the phi_k were drawn from some distribution G_0, which is the source of these things I'm calling atoms. So G_0 is the source of the atoms, and we have these weighted atoms on some space.
0:16:38 So that is a random measure: if I take G of some set A, it gives me a number, so it's a measure; it's additive and so on. And it's random, both in the atoms and the weights, so it's a random measure. This is now a very general way of building random measures. If I generate the pi's according to stick breaking, this particular object G has a name: it's called a Dirichlet process, and we usually write it like this, with its two parameters: the stick-breaking parameter and the source of the atoms. But we can break sticks in different ways, and get the atoms in different ways, so this is actually quite a flexible tool.
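[Editor's note: a truncated draw from a Dirichlet process, pairing stick-breaking weights with atoms from a base measure, can be sketched as below. This is an added illustration under assumed names (`draw_dp`, `sample_from_g`), not code from the talk.]

```python
import random

def draw_dp(alpha, base_draw, n_atoms=500, seed=1):
    """Truncated draw G ~ DP(alpha, G0): stick-breaking weights paired
    with atom locations drawn i.i.d. from the base measure (base_draw)."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        b = rng.betavariate(1.0, alpha)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    atoms = [base_draw(rng) for _ in range(n_atoms)]
    return weights, atoms

def sample_from_g(weights, atoms, rng):
    """theta ~ G: pick atom k with probability proportional to pi_k."""
    u, acc = rng.random() * sum(weights), 0.0
    for w, atom in zip(weights, atoms):
        acc += w
        if u <= acc:
            return atom
    return atoms[-1]

# here G lives on the real line, with a standard-normal base measure
weights, atoms = draw_dp(alpha=2.0, base_draw=lambda r: r.gauss(0.0, 1.0))
rng = random.Random(2)
draws = [sample_from_g(weights, atoms, rng) for _ in range(100)]
# G is discrete, so repeated draws land on the same atom: ties occur
assert len(set(draws)) < len(draws)
```

The final assertion makes the point developed next in the talk: because G is discrete, independent draws from it collide with positive probability, which is what induces clustering.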
0:17:11 Alright, so we can use this as a component of various kinds of statistical models. Here's something called the Dirichlet process mixture model, and it's just what you'd think it is. We use a draw G from the Dirichlet process, constructed the way I described. It's on some space; let's say here it's the real line, just so I can draw it. It has atom locations given by draws from some underlying distribution G_0, and heights on those atoms given by stick breaking with the parameter alpha_0. So if I draw a specific G, I get an object looking like that, and the mixture model just uses it as a distribution. It's not your typical Gaussian: it's a discrete distribution, and it's random. I draw from it, so I might draw this atom in the middle here, which has high probability, a big height. Having drawn that, it's now a parameter for some underlying likelihood; I wrote it down here: x_i given theta_i comes from some distribution indexed by theta_i. I do that again and again; that's this plate here, that box; and that gives me a mixture model. In fact, there's some probability that I get the same atom on successive draws of theta, and the corresponding indices i would then be coming from the same atom; we would think of those as belonging to the same cluster.
0:18:25 Alright, so that's a Dirichlet process mixture model. Here is some data drawn from one of these things, and as I draw more and more data, going across, you start seeing the blue dots accumulate. The parameters, the thetas, in this case are means and covariances of Gaussians, so theta is a whole big long vector, and you see that the number of distinct ones is growing: there are only a few distinct ones at first, and then you get more and more. They grow at some rate, and in fact that rate turns out to be the logarithm of n, the logarithm of the number of data points. So the number of parameters in this system is growing: we have a small number of parameters here, and as we keep drawing from G again and again, we get more and more parameters. Like I said, it grows at rate log n. We're nonparametric here: we don't have a fixed parameter space no matter how much data we have; as you give me more and more data, I'm going to give you more and more parameters.
0:19:14 Okay, so let's go back to this little slide. As I've been alluding to, with some probability I picked theta one equal to this atom here in the middle, and with some probability theta two picked that exact same atom; those two data points then come from the same cluster. You can ask: what is the combinatorial structure induced by this kind of procedure? How often does that happen? What's the probability that theta two is equal to theta one, no matter where theta one occurred: the probability that lightning strikes twice in the same place? And what is that probability, in fact, not for this particular G, but averaged over all possible choices of G under the procedure I outlined, this p(G)? You'd think that would be a really hard problem to solve; it sounds complicated. It turns out to be really easy to solve, in terms of what the answer is.
0:20:01 To understand it, you need to understand something called the Chinese restaurant process. It's another stochastic process, this one on partitions, not on parameters, and I've drawn it down here: you have customers coming into a restaurant with an infinite number of tables. These are the round tables of the Chinese restaurant; that's why they're round. The first customer sits here with probability one; the second customer joins them with probability one half, or sits at a new table with probability one half; and the general rule is that you sit at a table with probability proportional to the number of people already at the table. That's often called preferential attachment: you start to get a big cluster emerging, and some smaller clusters after that. And you can easily prove, from this little setup, that the number of occupied tables grows at rate log n.
0:20:49a racially process this turns out to be the marginal probability of the racially process of who sits with who
0:20:56i so it's it it's another perspective it's the marginal
0:20:59uh under that other big probability measure
0:21:02um P G
0:21:03you can make this into an explicit clustering model a mixture model effectively but now
0:21:08a of in terms of clustering in that a mixture components
0:21:10all you do is that the first person set at a table
0:21:13a draws a parameter for that table
0:21:15from some prior which would call G not the same G not as before actually
0:21:20and everybody was sits a table here at that same parameter vector
0:21:23so if fee one here is a mean in a variance of a gaussian that all the data points for
0:21:27customers around this table
0:21:28well all come from the same gal see have the same mean cover it's
0:21:32okay so this is actually i i
0:21:35exactly a marginal of the dirichlet process mixture my
0:21:40 Okay, so that's a little tutorial. This material is, you know, forty years old; the Dirichlet process mixture model has been in the literature a long while, and it is slowly getting taken up and applied, but only very, very slowly.
0:21:54 Now, to use this in a richer class of problems than just mixture models and clustering, you have to face the problem that you have multiple estimation problems: not just one set of data, but multiple sets of data. In the HMM case that's going to arise because we have the current state, which indexes the next state: we have a bunch of distributions, the next-state distributions for all of the current states. So we don't have one distribution; we have a whole collection of them, the rows of the transition matrix, and we need to estimate all those rows. In statistics we often have this situation: we have multiple estimation problems, with some parameter and data based on it, and other parameters with data based on them, and we often want to tie these together, because there might be very little data over here and a lot of data over there, and if these are related problems, it makes sense to tie them together. That's called a hierarchical model, and it's one of the main reasons to be a Bayesian: it's very easy to build these kinds of hierarchical models that tie things together. What you do is assume that the parameters for all these subproblems, the theta_i, are related by coming from an underlying parameter: you draw them randomly from an underlying theta_0. And now all the data over here kind of flows up this tree and back down to here, so the posterior estimate of theta_i depends on all the data; it's a convex combination of all the data. That arises out of the hierarchical Bayesian perspective.
0:23:09so that arises out the hierarchical bayesian perspective and here's kind of a picture of that
0:23:13here's the i just showed
0:23:14here's a kind of the graphical model representation that using these boxes to represent replication
0:23:19so that our boss box represents these
0:23:21amber applications down below
0:23:25 Okay, so we can now do that with the Dirichlet process, and this is the paper we published a few years ago on the hierarchical Dirichlet process. This is, I think, a really very useful tool; it's just the hierarchical idea applied to Dirichlet processes. We have the same kind of setup as before, but now we don't have just one G: we have G_1, G_2, G_3, G_4, for the different estimation problems, the different groups. To be concrete, these are the different rows of the HMM transition matrix. Each of them is a random measure, and we have a set of measures, one per row, that we want to tie together so that they share the same next-state space that they all transition into. So what we do is neither draw these things independently nor lump them all together: we want different transition probabilities, but we don't want completely separate state spaces; they have to be tied together somehow. The way you do that is to add an extra layer to this graph: you first draw, from some source of atoms, a mother distribution G_0. It's now random, instead of being a fixed parameter as before, and it's the base measure used by each of the children: they draw their atoms from G_0, and they reweight those atoms according to their own stick-breaking processes. So this ties together a set of random measures: it makes them cooperate on which atoms they're all going to use, and on the weights they're going to use.
0:24:45 This also has a restaurant underneath it, which is a really easy way to understand these models. Now we don't have just one restaurant; we have multiple restaurants, a franchise of restaurants, and again, in my application to HMMs, these will be the different current states, the rows of the transition matrix. So if you're in restaurant number two, you go in and sit at a table with probability proportional to the number of people already at it, like before; that gives the sharing, or clustering, within a restaurant. Then, if I'm the first person to sit at a table, I go up to a global menu of dishes for the entire franchise and pick a dish from that menu; maybe I pick the chicken up here. I bring it down to my table, and everybody who joins me at my table has to eat that same dish: we all get the same parameter. But I also put a check mark next to the dish, and if you are in some other restaurant, and you're the first person at your table, you go up to this menu and pick a dish with probability proportional to the number of check marks next to it. So some dishes will emerge that are popular across all of the restaurants, not just within one restaurant, and they get transferred between the restaurants according to this same kind of preferential attachment.
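[Editor's note: the franchise metaphor above can be simulated directly. A sketch added for illustration, with made-up names: `alpha` is the within-restaurant concentration and `gamma` the weight on a brand-new dish at the shared menu.]

```python
import random

def franchise(customers_per_restaurant, alpha, gamma, seed=4):
    """Chinese restaurant franchise (the HDP's restaurant picture), a sketch.

    Customers in each restaurant sit by an ordinary CRP. The first customer
    at a new table picks a dish with probability proportional to the number
    of tables franchise-wide serving it (the check marks), or a new dish
    with weight gamma."""
    rng = random.Random(seed)
    dish_tables = []  # per dish: number of tables serving it, all restaurants
    menus = []        # per restaurant: list of (dish_index, occupancy)
    for n_cust in customers_per_restaurant:
        tables = []
        for n in range(n_cust):
            u, acc, chosen = rng.random() * (n + alpha), 0.0, None
            for t, (_, m) in enumerate(tables):
                acc += m
                if u < acc:
                    chosen = t
                    break
            if chosen is None:  # new table: pick its dish from the menu
                v = rng.random() * (sum(dish_tables) + gamma)
                acc2, dish = 0.0, len(dish_tables)
                for d, c in enumerate(dish_tables):
                    acc2 += c
                    if v < acc2:
                        dish = d
                        break
                if dish == len(dish_tables):
                    dish_tables.append(0)  # a brand-new dish
                dish_tables[dish] += 1
                tables.append((dish, 1))
            else:
                dish, m = tables[chosen]
                tables[chosen] = (dish, m + 1)
        menus.append(tables)
    return menus, dish_tables

menus, dish_tables = franchise([200, 200, 200], alpha=2.0, gamma=1.0)
# every table serves a dish, and dishes are shared across restaurants
assert sum(dish_tables) == sum(len(t) for t in menus)
```

The key property is visible in `dish_tables`: there are far fewer distinct dishes (atoms) than tables, because popular dishes are reused across all restaurants.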
0:25:54 Okay, so back in the HMM setting, these will be the rows of the transition matrix. The number of possible transitions grows as you get more and more data, so instead of having K states I have a growing number of states, and the states will be shared among all the different rows of the transition matrix according to this restaurant process.
0:26:18skip in the interest of time in this talk but to be of a paper on this
0:26:22i think it's kind of a killer out for this problem
0:26:24you're basically trying to model some of the geometry of proteins
0:26:27here's kinds of the angles that you get in proteins
0:26:30and if you put all the data for a particular a know so on top of each other you get
0:26:33kind of these fuzzy diagrams where they're not repress side
0:26:37and so what you really wanna do is not just have one estimation problem of the density of angles
0:26:42i you wanna break them up according to the context like in
0:26:45you wanna not just have one
0:26:47a distribution one have a bunch of them depending on the context
0:26:50when you break them up according the context or over you have a sparse data problem many of the context
0:26:55every little data
0:26:56as a very similar kind of problem lots of settings including signal process in speech
0:27:00um and so we want to have models like this at that that are depending on the context of the
0:27:05neighbouring an amino acids
0:27:06so this setup is exactly that the groups or the neighbourhood
0:27:10so for each context you of a group of data
0:27:12and you don't treat them separately you don't want them together read you have them cooperate right this tree
0:27:17and everyone that racially is looks kind like a virtually process like that's synthetic data should you earlier
0:27:22but you really have
0:27:23twenty amino mean as a left when twenty the right we get four hundred estimation problem
0:27:27or or context
0:27:28as we have four hundred diagrams like this
0:27:30and we get them to share
0:27:32atoms according to this procedure so
0:27:34uh i guess skip over some the results but they are um they're they're impressive this is kind of log
0:27:39a lot probably improve on test set for all the to and you know acids
0:27:43here's no improvement in this is over what they did probably due in in the protein folding literature
0:27:47so it's really quite massive pro
0:27:50um okay let's go back to hidden markov models uh where are that is a is a hidden markov model
0:27:55and our prior now is the structural
0:27:57machine or in telling you about
0:27:59um okay so now we have a hidden markov model we don't have a fixed number of states we call
0:28:03this a H P M
0:28:05we have an infinite number of states so here's time here's the state space
0:28:09and when you're in one of these current states you have a distribution on next states:
0:28:13you get a Chinese restaurant. Here's table one, table two, table three, and a growing number of tables.
0:28:20I do that every time I hit state number two.
0:28:24That's great, I can get a growing number of tables, but I want to share those tables between state number two and three and four and so on.
0:28:31So, as I've been talking about, I need this hierarchical Dirichlet process to tie together all those transition distributions.
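[Editor's note: a small sketch of the Chinese restaurant process being invoked here, so the "growing number of tables" is concrete. The function name and parameters are illustrative.]

```python
import random

def crp_tables(n_customers, alpha=1.0, seed=0):
    """Chinese restaurant process: each customer joins an existing
    table with probability proportional to its occupancy, or opens a
    new table with probability proportional to alpha.  The number of
    tables is not fixed in advance; it grows slowly (~ alpha * log n)."""
    rng = random.Random(seed)
    tables = []                       # tables[k] = customers at table k
    for _ in range(n_customers):
        weights = tables + [alpha]    # last slot = open a new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)          # new table opened
        else:
            tables[k] += 1
    return tables
```

In the HDP-HMM, each state runs a restaurant like this over next-state choices, and the hierarchy ties the restaurants together so they share the same dishes (states).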
0:28:37Okay, I'm going to go quickly through slides like this; again, the details aren't what I'm trying to convey here, just the spirit.
0:28:45Briefly, what you do is you draw a mother transition distribution for all states,
0:28:49and then the distribution for each current state is a perturbation of that. So here's state three:
0:28:55it takes this and reweights all of these atoms, and so on and so forth.
0:29:02Alright, so I hope that was kind of clear. This is a way of doing HMMs where you don't know the number of states a priori.
0:29:09You're actually putting a prior distribution on the number of states, and it can grow as you get more data.
0:29:14It's kind of a nice solution to a classical problem in HMM land.
0:29:18So we implemented this. There is a simple sampling procedure;
0:29:23we did a slightly non-standard sampling procedure, but there's a whole bunch of procedures that can be used to do posterior inference here, given some emissions, some data.
0:29:31You infer the most probable HMM, the trajectory (the Viterbi path), and everything else you want, all the parameters.
0:29:39You can do this pretty easily on a computer with quite standard methodology. It's an HMM.
0:29:46So what we did is we applied this to the diarization data. We did okay, but not particularly well.
0:29:52We identified one problem we needed to solve before we could start to get performance which was satisfactory,
0:29:57and that was that we had a little bit too much state splitting going on. So here's a little synthetic problem.
0:30:02There were three states in our synthetic data, as you can see here: this state was on for a few time frames, and then this state, and so on.
0:30:10Here was the data, and you can sort of see that there are three states, but it's pretty noisy; it'd be kind of hard to tell just by looking at it.
0:30:15Here's the output of our inference system, our computer program, for the HDP-HMM.
0:30:21It did find three states here, but then it actually added a fourth state, and the problem was that
0:30:26the parameters for state three and state four, the emission distributions,
0:30:29happened to be just about the same.
0:30:32That could happen, and it happened here.
0:30:34So whether you were in state three or four, from the outside world's point of view the emission probabilities were basically the same.
0:30:39So the system was doing perfectly well: it had high likelihood,
0:30:42and it was flickering between state three and four. Again, it had high likelihood, so why not?
0:30:47There's nothing that prevents it from doing that.
0:30:48But we don't want that, of course, because now we didn't do very well on the diarization problem: it thought there were four people in the room when there were really only three.
0:30:54Okay, so we're being Bayesian in this problem, so why don't we put in a little bit more prior knowledge? This is where we put Bayes in at this point.
0:31:02So we put in a little bit more knowledge, which is that people don't tend to talk for microsecond or millisecond time intervals; they tend to talk for seconds.
0:31:09We have one extra parameter, a self-transition probability in the HMM: the diagonal of this infinite matrix is going to be treated as special; it gets an extra boost.
0:31:18So we have one extra parameter which is the extra boost for self-transitions, something like a semi-Markov HMM.
0:31:30So we call this the sticky HDP-HMM. It just has the transition distributions as before,
0:31:35plus one parameter which boosts the self-transitions.
0:31:38So if this is the distribution you had before,
0:31:41then we add a little bit to that, a little bit of a boost, for the self-transition.
0:31:48That parameter, since we're being Bayesian, is again a parameter to be inferred given the data: it's random, it has a posterior distribution.
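[Editor's note: a weak-limit sketch of the "sticky" construction just described, with illustrative constants. Each transition row is a Dirichlet draw centred on the shared global weights, and a single extra parameter kappa boosts the diagonal (self-transition) entry.]

```python
import numpy as np

rng = np.random.default_rng(1)

K, alpha, kappa = 8, 4.0, 20.0              # truncation, concentration, boost
beta = rng.dirichlet(np.full(K, 1.0))       # shared global state weights

def sticky_rows(kappa):
    """Row j ~ Dirichlet(alpha*beta + kappa*delta_j): centred on the
    shared weights, with the self-transition entry boosted by kappa."""
    rows = np.empty((K, K))
    for j in range(K):
        conc = alpha * beta + kappa * np.eye(K)[j]
        rows[j] = rng.dirichlet(conc)
    return rows

P_sticky = sticky_rows(kappa)   # diagonal-heavy: states persist
P_plain = sticky_rows(0.0)      # ordinary HDP-HMM rows, no boost
```

With kappa at zero this reduces to the plain construction; the boost is the single extra knob the talk mentions, and in the full model it gets a prior and a posterior like everything else.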
0:31:57Okay, so we put that into our system, and we're now ready to report some results.
0:32:02So we went back to the speaker diarization problem
0:32:06and implemented this on data — I think this was the 2007 NIST RT competition data. We didn't compete; we simply took the data and compared to what had been done
0:32:16by people who did compete.
0:32:18Okay, so this is diarization error rate,
0:32:21and the ICSI results had been the state of the art for this at that time — I don't remember if they still are, but they were the best result,
0:32:29by a wide margin, at the time.
0:32:32This is over the twenty-one meetings, and these are comparisons to ICSI, just to get a flavour.
0:32:37So our sticky results are basically comparable to the red ones, which are the ICSI ones.
0:32:42If you just scan through here, the green ones are the non-sticky HDP-HMM, and you can see they're worse; that was a sign to us that something was not quite working.
0:32:51So if you do a head-to-head comparison, basically we're comparable with the ICSI result at this time.
0:32:56I do want to say that my goal in showing these numbers is not to say that we beat anybody or that we were competitive — this was 2007.
0:33:05We're not speech people; we didn't try to compete.
0:33:08But we got results that are in fact comparable with a state-of-the-art system, and the main point I want to make is that this was done by Emily Fox, a grad student who visited my group in the summer. She learned about
0:33:20the HDP-HMM, and she implemented this and did all of that in one summer project.
0:33:25So this is not that hard to do. It's a tool, just like the HMM: you can learn it and you can use it,
0:33:30and you can get competitive results doing that.
0:33:33We have not pursued this, and I know that the ICSI team actually did a discriminative method after that that got better numbers,
0:33:40but all these numbers are getting pretty good, and I think that this particular approach, if pursued, could be competitive with
0:33:46the best that's out there.
0:33:48I recall when HMMs first came about — I was actually a grad student at the time.
0:33:54The very first papers on HMMs gave numerical results compared to dynamic time warping,
0:33:59and they were okay, but not greatly better.
0:34:01But of course that was enough, and that set the whole field off heading in that direction. I would like to think that this could be the case also with these
0:34:08Bayesian nonparametric methods.
0:34:10They're easy to implement;
0:34:11they're just a small twist on what you're used to. They're just HMMs with a little bit extra: you can move to a new table in the restaurant,
0:34:17a little bit of extra machinery of multiple restaurants. They're really easy to implement,
0:34:20and they're robust — one student can implement it and it works.
0:34:26I do think they're going to play a role.
0:34:30So here are some examples of actual meetings. These are meetings where the diarization rates for both approaches were quite small; you can see we basically solved the problem here.
0:34:38And here's a meeting where things were a little bit worse.
0:34:40There was one meeting on which we did particularly badly.
0:34:42It turned out that it wasn't the model that was bad; it turned out that the mixing of the MCMC algorithm we were using —
0:34:47the mixing was quite slow on that meeting; we hadn't mixed by the time we stopped.
0:34:51And so there's still an ongoing issue of how to make sure the Markov chains mix,
0:34:55if we're going to use Markov chains.
0:34:59Okay, so that's all I want to say about diarization.
0:35:03Let me briefly mention a couple of other applications of the HDP. It's been used in many other kinds of problems, not just for HMMs.
0:35:10In fact, you can use it for PCFGs, for problems with context-free grammars.
0:35:14In this case you have a parse tree, and there's a number of rules —
0:35:18rules are like the clusters when you're doing statistical NLP —
0:35:21and you grow the number of clusters as you see more and more data.
0:35:24The same rule can appear in multiple places in the parse tree;
0:35:27that's why you have multiple restaurants, the Chinese restaurant franchise, for exactly the same reason:
0:35:31you want to share these distributions in different locations of the parse tree.
0:35:35So we have built a system that does that, and we are able to infer the most probable rule sets from data and build parsers.
0:35:48Okay, so that was the Dirichlet process. I could keep talking about Dirichlet processes a lot more — there's a lot more you can do with them — but I want to move on
0:35:57and tell you about some of the other stochastic processes that are in the toolbox.
0:36:01One I'm particularly interested in these days is called the beta process.
0:36:05Whereas with the Dirichlet process, when you walk into the restaurant you have to sit at one and only one table,
0:36:10the beta process allows you to walk into the restaurant and sit at multiple tables.
0:36:15So you should think about the tables now not as clusters but as features, a bit-vector description of entities.
0:36:22So if you sit at tables one, three and seventeen,
0:36:25and I sit at tables three, seventeen and thirty-five, we have a bit of overlap with each other.
0:36:28So the bit vectors can overlap in interesting ways,
0:36:31and that's what the beta process allows us to do that the Dirichlet process does not.
0:36:38Okay. I'll tell you more about the beta process in a moment, but first let me tell you briefly about the general framework.
0:36:43Kingman had a very important paper in 1967 (or '68) called
0:36:49'Completely random measures'.
0:36:50The beta process is an example of a completely random measure,
0:36:53and completely random measures are really simple to talk about and to work with.
0:36:56All they are: they are random measures on a space, just like before,
0:37:01but what's new here is that they assign independent mass to non-overlapping subsets of the space. Here's a picture of this.
0:37:07Here we've got some arbitrary space;
0:37:10here's a set A and here's a set B,
0:37:13and this red object is a random measure, a discrete measure, on this space.
0:37:17So there's a random amount of mass that falls into set A, and a random amount that falls into B,
0:37:25and if the random masses
0:37:27are independent,
0:37:29because these are non-overlapping sets,
0:37:31then this random measure is called completely random.
0:37:34So it's a really nice concept: it leads to divide-and-conquer algorithms; it's basically
0:37:40a computational concept.
0:37:42Okay, so there are lots of examples. You know what Brownian motion is —
0:37:46Brownian motion turns out to be a special case of this.
0:37:50Gamma processes, beta processes, and all kinds of other interesting objects.
0:37:53The Dirichlet process is not a completely random process, but it's a normalized gamma process.
0:38:00Okay, so now,
0:38:01Kingman had a very beautiful result
0:38:04characterising completely random processes,
0:38:07and it turns out that they can all be derived from a Poisson point process.
0:38:11So the Poisson process lies behind all of this.
0:38:14It's a really beautiful construction; let me tell you about it really briefly. What we're trying to do, remember,
0:38:18is: we have this space Omega,
0:38:20and we're trying to put random measures on it. Earlier I put Dirichlet process measures on spaces;
0:38:24I'm now going to be more general and show how to put all kinds of other random measures on the space.
0:38:28What you do is you take the original space Omega and you cross it with the real line: you look at the product space Omega cross R.
0:38:36I'm going to put a Poisson process in that product space.
0:38:39How do you put a Poisson process on things?
0:38:41Well, you have to tell me the rate function.
0:38:43You're probably familiar with the homogeneous Poisson process, where you have a flat, constant rate function,
0:38:49and then in every little interval — or in this case, every little set —
0:38:53the number of points falling there is a Poisson random variable
0:38:56with some rate number.
0:38:58In general the Poisson rate is the integral of a rate function over that little set.
0:39:05So you have to write down the rate function; here it is.
0:39:08And so I integrate that rate function to get
0:39:10Poisson rates for every little set of this space,
0:39:14and here's a draw from that Poisson point process. You can see the rate function was tilted up on the left,
0:39:18so I got more points down here and fewer up here.
0:39:22Now, having drawn from this Poisson process with this rate function, I forget all this machinery and look
0:39:26at this red thing here:
0:39:28I take each point and drop a line from that point down to the Omega axis,
0:39:32and the resulting object is a measure on the Omega space. It's a discrete measure,
0:39:37and it's random,
0:39:38and it's a completely random measure, because any mass that falls in some set A here
0:39:43and some set B here will be independent,
0:39:45because of the underlying Poisson process.
0:39:49The beautiful fact is that all completely random measures are obtained this way.
0:39:53One direction is trivial; the other direction, the completeness, is quite nontrivial.
0:39:57So if you like completely random measures — and I do —
0:40:01then this theorem says you can reduce the study of
0:40:03these measures to the study of the Poisson process,
0:40:05and just the rate functions of the Poisson process.
0:40:08So I think that's a tool.
0:40:10I think that in this field, in the years ahead, we will be studying rate measures for Poisson processes
0:40:13as a way to specify
0:40:16combinatorial structures on things.
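[Editor's note: a sketch of Kingman's construction as described above. To keep it runnable, the rate density used here is an illustrative proper one tilted toward small masses, not the improper beta-process rate discussed next; the constant `lam` and the density 2(1-p) are assumptions for the demo.]

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson process on the product space Omega x (0,1]: draw a Poisson
# number of points, then each point (omega_i, p_i) becomes an atom of
# mass p_i at location omega_i.  The resulting discrete measure is
# completely random: masses on disjoint sets are independent because
# the underlying point process is Poisson.
lam = 200.0                              # total rate over the box

def draw_crm():
    n = rng.poisson(lam)                 # number of atoms
    omega = rng.uniform(0.0, 1.0, n)     # atom locations in Omega = [0, 1)
    u = rng.uniform(0.0, 1.0, n)
    p = 1.0 - np.sqrt(1.0 - u)           # masses ~ density 2(1-p): mostly small
    return omega, p

def mass(omega, p, a, b):
    """Total random mass the measure assigns to the set [a, b)."""
    return p[(omega >= a) & (omega < b)].sum()

omega, p = draw_crm()
```

The measure is additive over disjoint sets by construction, and because point counts in disjoint regions of a Poisson process are independent, so are the masses — which is exactly the "completely random" property.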
0:40:20Okay, so the particular example of the beta process has this as its rate function. It's a function of
0:40:24two arguments, the Omega part and the real part.
0:40:27The Omega part is just some measure that gives you
0:40:32the prior on atoms,
0:40:33just like it was G_0 before; here it's called B_0.
0:40:36And then this factor, for the beta process, is the beta density —
0:40:40an improper beta density,
0:40:43which gives you an infinite collection of points when you draw from the Poisson process, which is what we want. We
0:40:47don't want a finite number; that would be a parametric
0:40:49prior. We want a nonparametric prior, so we need this thing to be an
0:40:53improper density.
0:40:55Okay, that's probably too much math; let me just draw a picture.
0:40:58Here is that rate function: it has a singularity at the origin — it really tilts up sharply at the origin —
0:41:04and it breaks off: it stops at one.
0:41:06And so when you draw the Poisson point process, you get lots and lots
0:41:10of points that are very near the origin.
0:41:12You now take this, by dropping lines down from the
0:41:17y-coordinate to the x-axis:
0:41:19p_i is the height of an atom and omega_i is the location of the atom,
0:41:25and you take that infinite sum, and that now is a random measure. This is another way of getting
0:41:30a random measure — I liked the stick-breaking before;
0:41:32this is another way of getting random measures which is much more general.
0:41:35And in particular, these p_i do not sum to one. They're between zero and one,
0:41:40but they do not sum to one,
0:41:42and they're independent.
0:41:44So I like to think of these p_i's as coin-tossing probabilities, and I like to think that this is
0:41:48now an infinite collection of coins,
0:41:50and most of the coins have probability nearly zero. So I toss all these coins:
0:41:55I'll get a few ones and lots of zeros —
0:41:58an infinite set of zeros and a few ones.
0:42:01And I do that again, and I'll again get a few ones and a lot of zeros.
0:42:05If I keep doing that, there will be a few places where I'm going to get lots of ones,
0:42:09a lot of overlap,
0:42:10and lots of other places where there's less.
0:42:13Here's a picture showing that.
0:42:14Here I've drawn from the beta process — that blue thing right there; it's between zero and one and it's mostly
0:42:19nearly zero.
0:42:20And then here are a hundred tosses.
0:42:23Think of these as coins, and think of these heights as the probability of a head.
0:42:28If a draw has a big probability here, you get relatively lots of ones as you go down the column
0:42:34in some regions,
0:42:36and over here, where the probability is nearly zero, the whole column is almost all zeros.
0:42:41Okay, so think of a row of this matrix now as a sparse, binary,
0:42:46infinite-dimensional random object —
0:42:48think of it as a feature vector for a bunch of entities.
0:42:51So here are a hundred entities, and if you think of this like a Chinese restaurant, entity number
0:42:56one hundred came to the restaurant
0:42:58and didn't just sit at one table: it sat at this table, this table, this table, this table and this table —
0:43:02five different tables.
0:43:04The next person comes in, number ninety-nine, and didn't sit at the first table, but sat at the
0:43:08fourth table, and so on and so forth.
0:43:10So this captures the sitting pattern of a hundred people in this restaurant and all the
0:43:15tables they sat at,
0:43:16and the total number of tables, as you keep doing this, is going to grow; it gets denser and denser as it fills out,
0:43:21but it grows, again, at a slow rate.
0:43:25Okay, so
0:43:27different settings of the parameters of this process give you different growth rates, depending on
0:43:31the settings.
0:43:33So that's probably way too abstract for you to appreciate why
0:43:37you'd want to do that.
0:43:38Let me just say that there is also a restaurant metaphor here.
0:43:41There's something called the Indian buffet process, which captures the sitting pattern directly on the matrix,
0:43:46not talking about the underlying beta process — just like the Chinese restaurant process
0:43:51captures the sitting pattern for the Dirichlet process without talking about the underlying Dirichlet process. It is literally the marginal probability
0:43:56under the beta process.
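[Editor's note: a compact sketch of the Indian buffet process just mentioned, with illustrative parameters. Unlike the CRP, each row is a binary feature vector, so entities can overlap in the "dishes" they take.]

```python
import math
import random

def poisson_sample(rng, lam):
    """Knuth's inversion method for one Poisson draw (fine for small lam)."""
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def indian_buffet(n_customers, alpha=3.0, seed=0):
    """Indian buffet process: customer i takes each previously tried
    dish k with probability m_k / i (m_k = how many earlier customers
    took it), then tries Poisson(alpha / i) brand-new dishes.  Rows
    are overlapping binary feature vectors, not exclusive clusters."""
    rng = random.Random(seed)
    counts = []                              # counts[k] = popularity of dish k
    rows = []
    for i in range(1, n_customers + 1):
        row = [1 if rng.random() < m / i else 0 for m in counts]
        for k, bit in enumerate(row):
            counts[k] += bit
        new = poisson_sample(rng, alpha / i)
        row.extend([1] * new)                # open `new` fresh dishes
        counts.extend([1] * new)
        rows.append(row)
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in rows]
```

The total number of dishes grows slowly with the number of customers, mirroring the slow table growth described above.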
0:43:58So I'll skip that; I want to move on.
0:44:02Let me go back to this problem so I can make it concrete again.
0:44:06Okay, so remember this problem of the multiple time series: I have these people coming into the room and
0:44:11doing these exercise routines,
0:44:12and I don't know how many
0:44:15routines there are in the library, and I don't know who does what routine,
0:44:19and each person doesn't do just one routine: they do a subset of routines.
0:44:25Right, so how am I going to model that?
0:44:27Well, it's a time series and it has segments, so I'm going to use an HMM as my basic block,
0:44:32but now I've got to put some structure around that to capture all this combinatorial structure of my problem.
0:44:37The way I'm going to do that is to use the beta process.
0:44:41I have a slide on this; let me try to say it in English. You can
0:44:46stare at the slide and the math if you care to,
0:44:48but I encourage you to just listen to me tell you how it works.
0:44:52Okay, so let's suppose everybody in this room is going to come up here on the stage and do their exercise routines. Every one of you is going to do your own subset.
0:44:59There's an infinite library up there of possible routines you could do.
0:45:03You choose, before you come up on stage, which subset of that library you're going to
0:45:08do. Maybe I'm going to do jumping jacks and twists; that's my set.
0:45:13And up in heaven there is an infinite-by-infinite
0:45:16transition matrix.
0:45:19And God possesses it; we do not get to possess it.
0:45:22And I pick out
0:45:24the twists
0:45:25and jumping jacks
0:45:26from that infinite matrix: I'm going to pick a little two-by-two submatrix. Maybe 'twist' is
0:45:30column number thirty-seven
0:45:32and 'jumping jacks' is number forty-two.
0:45:34So I pick out those columns and the corresponding rows, I get a little two-by-two
0:45:38matrix, and I bring it down from the infinite matrix,
0:45:40and I instantiate a classical HMM.
0:45:44It's actually an autoregressive HMM, so it'll do a little bit of oscillation, because these are autoregressive models.
0:45:49And now I run a classical HMM to fit my data stream, my exercise routine; my emissions are the
0:45:53six-dimensional
0:45:54vector of positions.
0:45:56It's just an HMM.
0:45:58Right, now I run the forward-backward algorithm and I get an EM update to the parameters.
0:46:02When I'm done with the local data, I go back up to the infinite matrix, to that little two-by-two submatrix, and update it with the updates from Baum-Welch.
0:46:10Right, now here comes the next person up onto the stage, and he takes, say, column thirty-five,
0:46:14but also one-oh-one and one-seventeen,
0:46:17so he pulls out a three-by-three subset of the matrix,
0:46:20he runs his HMM on his data,
0:46:22gets a Baum-Welch update, and then goes back to the infinite matrix and changes that three-by-three submatrix.
0:46:29And as we all keep doing that, we're changing overlapping subsets of that infinite matrix.
0:46:36And so that's the beta process.
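[Editor's note: a minimal sketch of the scheme just described — a shared ("infinite", here truncated to K) transition matrix from which each sequence pulls out the submatrix for the behaviours it uses. The truncation K, the Gamma-weight parameterisation, and the feature indices for "person two" are illustrative assumptions; indices 37 and 42 echo the talk's example.]

```python
import numpy as np

rng = np.random.default_rng(3)

K = 50                                       # truncated global state library
global_trans = rng.gamma(1.0, 1.0, (K, K))   # unnormalised shared weights

def local_hmm_transitions(features):
    """features: indices of the behaviours this sequence uses.
    Pulls the corresponding rows and columns out of the shared matrix
    and row-normalises, giving the transition matrix of an ordinary
    (finite) HMM over just that subset of states."""
    sub = global_trans[np.ix_(features, features)]
    return sub / sub.sum(axis=1, keepdims=True)

# Person one does "twist" (column 37) and "jumping jacks" (column 42);
# person two uses an overlapping triple of behaviours (indices assumed):
P1 = local_hmm_transitions([37, 42])
P2 = local_hmm_transitions([35, 42, 47])
```

Because both submatrices are views into the same shared weights, updates made on behalf of one sequence would change the dynamics seen by every other sequence that uses an overlapping behaviour, which is the cooperation the beta process buys.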
0:46:40So I hope that you got the spirit of that. It is just an HMM — one HMM
0:46:45for each of the people doing the exercise routines —
0:46:47and then the beta process is this machinery up here which gives us
0:46:50a feature vector: which subset of the infinite set of states do I actually use.
0:46:57So this may look a little bit complicated; it's not — it's actually very easy to implement on the computer.
0:47:02And again, Emily came and implemented this in a second summer with us;
0:47:10we have the software to do this.
0:47:13Okay, so anyway, it's just an HMM, with the beta process prior on the way
0:47:18the parameters are structured.
0:47:21so um
0:47:22here motion capture results
0:47:26and this is nontrivial trivial problem of a lot lots of methods don't do uh well at all on this
0:47:31um it's a bit qualitative as how well we're doing but i think you'll kind of if you look adults
0:47:34were doing the well so here is the first feature
0:47:37if you pick to that's state i E feature
0:47:39and then and you just
0:47:40held that fixed an hmm it's or autoregressive hmms it'll oscillate
0:47:44the oscillations the arms go up and down
0:47:47this kind of picked out this jumping jacks
0:47:51this one if you take that feature in you put in hmm a you don't you don't any transitions happen
0:47:56you get the knees wobbling back back
0:47:58here you get some kind it was motion of the hips
0:48:00here's the more wobbling something or other here's where the arms are go in circles
0:48:04i so when your the bottom you start to get a little bit more subdivision the states than maybe we
0:48:08might like although emily thinks that there's kind of
0:48:11good can a mac reasons of these are actually to be gift
0:48:14a but this really this that nail the problem it took in this sixty dimensional time-series multiple once do not
0:48:18about the number of segments and where the segments occur
0:48:21and jointly segment them among on all the six all all all the different users
0:48:25of the sector
0:48:28Alright.
0:48:29How am I doing on time?
0:48:31Okay, so I'm about out of time. Let me skip some slides here
0:48:35and just say something about
0:48:38this model.
0:48:40So this is something I worked on for a number of years. It's an exchangeable model, a bag-of-words model for text — latent Dirichlet allocation; it's fairly widely known.
0:48:48It's extremely simple — too simple for lots of real-world phenomena —
0:48:53and so we've been working to make it more interesting.
0:48:55and um
0:48:56what the lda model does it takes a bag of words representation of text
0:49:01and it re represents texans since terms of work on top top
0:49:05a topic is a probable vision on word
0:49:07and so i given document can express a subset of topics maybe be sports and trap
0:49:12and so all the words in the document come from
0:49:14the sports topic
0:49:15or the travel top can you get mixtures were called had mixtures
0:49:19um um of topics within a single talk
0:49:22The problem with this approach — there are many problems; one of them is that
0:49:27words like function words
0:49:29tend to occur in every single topic,
0:49:31because if I have a document only about travel and the function words don't occur in that topic,
0:49:35and I get function words in that document,
0:49:37I get a very low-probability document.
0:49:39So we'd like to separate out things like function words and other kinds of abstract words from more concrete words,
0:49:45and make the abstraction hierarchical.
0:49:47So we have done that; there's a paper that was in the JACM last year, and I
0:49:51am in some sense prouder of this than of LDA, as a kind
0:49:56of a path forward
0:49:57for this field.
0:49:58We call this the nested Chinese restaurant process.
0:50:01So here's a whole Chinese restaurant up here, and here's another Chinese restaurant — all of these are
0:50:05Chinese restaurants, and they're now organised in a tree.
0:50:08When you go to a Chinese restaurant you pick a table, like before, and that tells you what branch
0:50:13to go to for the next restaurant.
0:50:15So think about this as the first night here in Prague:
0:50:18you pick some restaurant, and then you learn which branch — which restaurant you need
0:50:22to go to on the second night,
0:50:24and then the third night, and so on, as you keep going through
0:50:27the nights of the conference.
0:50:33another dog it comes in a picks another path and they what that that the overlap
0:50:37and so you get these
0:50:38overlapping branching structures
0:50:40alright and now you put a topic at every node this tree it has a distribution words
0:50:45and then a document has a path down the tree that gives it a set of topics can draw from
0:50:50and that draws the words from those that that set of top
0:50:55now that a up of the top is been used by all documents
0:50:58so it makes a lot sense to put the function words up there
0:51:02or i as this no down here is only been used by small so so the docking so you might
0:51:06will put some more concrete words down there
0:51:09and that's what the statistics does we fit this the data
0:51:11it did develop stop it's at the top of the tree which are more abstract a more concrete topics as
0:51:15you go down the tree
0:51:17So — yep —
0:51:18here's my last slide, actually. This is a result; turn your head sideways.
0:51:22This was applied to, I think,
0:51:28Psychological Review: these are abstracts from a particular journal in psychology,
0:51:32and this is the most probable tree. We put a distribution on trees —
0:51:36everything here has a distribution —
0:51:39and this is the highest-probability tree, and at each node is the highest-probability topic,
0:51:39and the was high probability tree in is the was high probably topic
0:51:44or a high probably words that the topic at that node so we get a and of is the high
0:51:47probably words at the root
0:51:49so we have to strip away the function words in this case they just pop
0:51:52what well hold up the
0:51:54um the next level we get a model memory uh self social psychology motion vision binocular
0:51:59dried food brain
0:52:00oh so this looks kind of like social psychology cognitive psychology uh physiological psychology and visual psychology
0:52:07and so once you good other the is actually infinite tree we are shown the first three levels of that
0:52:12So anyway, I hope that gives you a bit more flavour of the toolbox. Note what we did here:
0:52:16we put Chinese restaurants together in a tree, and we got
0:52:18an object which puts distributions on trees, and we can now do things like abstraction hierarchies
0:52:23and reason about them.
0:52:25So I'm done. This was just a tour of a few highlights of a literature.
0:52:30There are probably about a hundred people worldwide working on this topic
0:52:34actively. There's a conference called Bayesian Nonparametrics; it will be held in Veracruz later this year —
0:52:41it's held every two years —
0:52:44and it's just getting going. So for the younger people in the audience:
0:52:48I highly encourage you to look at this. There's been a little bit of work
0:52:51in speech and signal processing, but there's going to be a whole lot more.
0:52:55On my publication page there are two papers which I'd point you to if you enjoyed the talk;
0:53:01they're written for the non-expert and give you lots of pointers to
0:53:05more literature.
0:53:06Thank you very much.
0:53:16There's time for questions.
0:53:17Thank you — let's thank him for the talk.
0:53:20So we do have time for
0:53:22some questions, yeah.
0:53:31[inaudible exchange]
0:53:37Good. So I have two questions. First of all, thank you very much; this is very useful for the
0:53:43community.
0:53:44The first question I have:
0:53:50isn't there a scale problem,
0:53:54since this setting requires Monte Carlo simulation?
0:53:58Doesn't that make it difficult?
0:54:01You know, I don't believe that at all, and no one knows really how these scale to large problems.
0:54:08You know, does the EM algorithm apply to large-scale problems or not?
0:54:11Yes and no.
0:54:12I mean, for some problems
0:54:15it really quickly settles down after a very small number of iterations — maybe like two iterations of EM.
0:54:20For some problems,
0:54:22these algorithms that I showed you are just Gibbs samplers,
0:54:25and so it's like EM with one extra step:
0:54:28you do literally EM like before, but you might change the indicator from this to this,
0:54:33or go to a brand-new table.
0:54:35Right, so we actually don't know whether posterior inference algorithms at large scale are going to mix.
0:54:40Maybe they mix more quickly than we're used to from small data.
0:54:44The last point to make is that this is not just MCMC; that's just what we happened
0:54:48to use, because we had three months to do the project.
0:54:50Many other procedures — split-and-merge algorithms, variational methods and so on — for posterior inference can
0:54:55be used here.
0:54:58So it's still open.
0:54:59Okay, the second question I have is that many people in this hall
0:55:03are well aware
0:55:05that the HMM, for example,
0:55:08is not a really good model
0:55:11for speech,
0:55:12and yet,
0:55:15in a fit of pragmatism, we selected it and we use it because inference is easy
0:55:20and tractable. So I just want to know your take:
0:55:24to what extent
0:55:25does this stand?
0:55:28okay, that's a great question, I like it very much. so in those two
0:55:33papers, anticipating
0:55:34your question, we were trying to show that the toolbox offers a range of other kinds of models to
0:55:39consider.
0:55:40the Dirichlet process, for example, doesn't have
0:55:43power-law behavior.
0:55:44you might want power-law behavior,
0:55:46and there's something else, called the Pitman-Yor process, which gives you power laws.
0:55:48you could do a Pitman-Yor version of the HDP-HMM.
0:55:52and so there are many, many generalizations; there are inverse-gamma forms of the weights, and so on and so forth.
0:56:00so, you know,
0:56:01this really is a toolbox.
0:56:03and I think also the point is that, you know, I'm a statistician; we're used to models which, if you
0:56:07find the right cartoon, can really work surprisingly well.
0:56:10and so the HMM: yes, it's not the right model for speech, but that's well established, and yet
0:56:15it still was, you know, the vehicle that got speech all the way to where it is,
0:56:20maybe badly, maybe wrongly.
0:56:23you know, but it is very useful to make cartoons that have nice statistics,
0:56:28that scale computationally, and that can be generalized.
0:56:31so I think the HMM was too much of a box that could not be generalized easily, and I think that these
0:56:35methods go beyond that.
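The contrast drawn here, the Dirichlet process versus the Pitman-Yor process, can be illustrated by simulating the two seating schemes: with discount d = 0 the sketch below is the ordinary Chinese restaurant process, while d > 0 gives Pitman-Yor and its power-law growth in the number of clusters. The parameter values are our own choices, for illustration only:

```python
import random

# Illustrative sketch: cluster-count behaviour of the Chinese restaurant
# process (discount d = 0, i.e. the Dirichlet process) versus its
# Pitman-Yor generalisation (discount d > 0), which yields power-law
# cluster sizes: many more tables for the same number of customers.

def sample_table_counts(n, alpha, d, rng):
    counts = []                          # counts[k] = customers at table k
    for i in range(n):
        # existing table k has weight counts[k] - d; a new table has
        # weight alpha + d * (#tables); all weights sum to i + alpha
        weights = [c - d for c in counts] + [alpha + d * len(counts)]
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                if k == len(counts):
                    counts.append(1)     # open a new table
                else:
                    counts[k] += 1
                break
        else:
            counts[-1] += 1              # guard against rounding at the edge
    return counts

rng = random.Random(1)
crp = sample_table_counts(5000, alpha=10.0, d=0.0, rng=rng)
py = sample_table_counts(5000, alpha=10.0, d=0.5, rng=rng)
print("CRP tables:", len(crp), "| Pitman-Yor tables:", len(py))
```

For the same 5000 customers the Pitman-Yor run opens far more tables, with table sizes following a heavy-tailed distribution, which is the power-law behavior the speaker contrasts with the Dirichlet process.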
0:56:36once again,
0:56:37still staying within the toolbox.
0:56:42can we go back to the slides?
0:56:43oh, you have to do that from the back?
0:56:45I have control over that.
0:56:48another question?
0:57:00(inaudible question from the audience) let me summarize it.
0:57:11yeah, so the question is about overfitting: why aren't you getting killed by overfitting in the nonparametric world?
0:57:16well, you know,
0:57:18it's easy to say that Bayesians don't have such problems with overfitting.
0:57:21I'm not always a Bayesian, but when I am a Bayesian,
0:57:24you know, this is one of the advantages.
0:57:27so, you know, to first order, we don't have overfitting troubles with these systems, even on
0:57:31pretty large-scale problems,
0:57:34in fact.
0:57:36so, you know, we compared this, for example, in our context-free grammar work, with
0:57:40a parser trained with a split-merge EM algorithm,
0:57:43and there they had
0:57:44a big overfitting problem they had to deal with,
0:57:47eventually in a pretty effective way.
0:57:49um, but we didn't have to think about that;
0:57:51it just didn't,
0:57:51you know,
0:57:52come up.
0:57:54so that's the first order. at second order, you have a couple of hyperparameters; if you don't get
0:57:58them in the right range,
0:57:59you can have some overfitting, but we were very robust to that.
0:58:03and so you're right, it didn't totally work just out of the box;
0:58:07we had a little overfitting: there was one more state than there should have been,
0:58:10but that's not too bad,
0:58:11and it was not too hard to do a little bit of engineering and think about the problem a little bit
0:58:14more; there was a little bit of a time-scale issue, and, you know, a prior needed to be put on that.
0:58:20so I think that's fine; we're sort of artists and engineers. we're not trying to build
0:58:24a box
0:58:25that you can give to a high-schooler and it's done;
0:58:27there will always be a little bit of thinking,
0:58:29and a lot of engineering,
0:58:30to do.
0:58:32um, but really, you know, I'm not always a Bayesian; I love to be a non-Bayesian too;
0:58:36I go back and forth every single day of my life.
0:58:39you know, but for a lot of these really high-dimensional, hard
0:58:42inference problems, where you have multiple things which need to kind of
0:58:45collaborate, the hierarchy
0:58:47just gives you, you know, from the get-go, a lot of control over those sorts of things.
0:58:56next question?
0:59:05so, for speech we use mixture models within each state. is there a way to
0:59:10decide whether to grow the mixture within a state versus adding a new component? yeah, excellent question; I should have
0:59:15made that clear.
0:59:16the state-specific emission distribution here is not a single Gaussian,
0:59:20for reasons you guys know extremely well;
0:59:22it is a mixture of Gaussians.
0:59:24it's a Dirichlet process mixture of Gaussians,
0:59:27right, and it turned out that was critical for us to get this to work.
0:59:32absolutely, we need more, just like in the classical GMM:
0:59:35a single Gaussian doesn't permit speech to be done well.
0:59:37we don't want to have a fixed number of
0:59:38distributions there; we want it to grow: as you get more observations, you get more and more mixture components in that
0:59:44emission distribution, which arises because we put in the hierarchical Dirichlet process.
0:59:47it's a hierarchical Dirichlet process,
0:59:49and that helps to make the point about the toolbox even better.
0:59:55well, we have time for one more question.
1:00:04uh, we actually use the Bayesian
1:00:08hidden markov model quite a bit in our work,
1:00:11and it's very good, so I recommend everybody
1:00:14take a look. but one of the major problems is in
1:00:18the prior evolution, because we have to estimate a lot
1:00:23of hyperparameters.
1:00:25so sometimes
1:00:26the number of hyperparameters is more than
1:00:29the number of parameters themselves.
1:00:31now you are talking about
1:00:33an infinite number of
1:00:35parameters,
1:00:36so would we have an infinite number of hyperparameters? no, we don't, and that's really critical. so
1:00:41the classical Bayesian HMM had a lot of hyperparameters because it had
1:00:46a fixed K states, and the number of hyperparameters scaled with the size K,
1:00:50right, and you had AIC or something else like that to choose K.
1:00:54we don't do any of that.
1:00:55we have a distribution on the number of states, and the number of hyperparameters is constant,
1:00:59okay, and small.
1:01:01so there is sharing here; the sharing comes from tying these choices
1:01:06together:
1:01:07the hyperparameters really live at the top level.
1:01:10okay, so at the end of the day there's a very small number of them.
1:01:12so there is a prior evolution according to how you
1:01:17use the hyperparameters; you are correct, but there is a very small number of hyperparameters here, in some sense almost
1:01:22too small, you almost can't believe it; it probably has to be grown a little bit, you
1:01:26know. but the point is that
1:01:29this is not your classical
1:01:30Bayesian HMM, where you have the number of states fixed;
1:01:33we're in the regime of an infinite number of states; it's infinite,
1:01:35okay, and the interesting part is the rate:
1:01:38the number of states grows like log n under the prior, and the posterior adapts to the data.
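The log n growth claimed at the end of this answer can be checked with a quick simulation of the Chinese restaurant process, in which customer i opens a new table with probability α/(α + i). The constants below are our own illustrative choices:

```python
import math
import random

# Quick check of the claim that, under the Dirichlet-process prior, the
# number of distinct states grows like alpha * log(n). We simulate the
# Chinese restaurant process and compare the mean table count over a few
# runs with alpha * log(n).

def crp_num_tables(n, alpha, rng):
    tables = 0
    for i in range(n):
        # a new table is opened with probability alpha / (alpha + i)
        if rng.random() < alpha / (alpha + i):
            tables += 1
    return tables

rng = random.Random(7)
alpha, n = 5.0, 100_000
runs = [crp_num_tables(n, alpha, rng) for _ in range(20)]
mean = sum(runs) / len(runs)
print(f"mean tables: {mean:.1f}, alpha*log(n): {alpha * math.log(n):.1f}")
```

The simulated mean tracks α log n up to a constant offset, which is the logarithmic growth of the number of states the speaker describes; the posterior then concentrates on however many states the data support.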
1:01:43thank you, thank you.
1:01:44well, I would like us to thank
1:01:46the speaker once again.
1:01:56and now we have the coffee break outside.