Thanks very much — it's very nice to be here, and thank you for organising the special session. I'll be talking about particle filters for solving the nonlinear filtering problem.
The problem is to estimate x, which is d-dimensional. We're given a dynamical system, which can be nonlinear in x, as well as a sequence of measurements, which can also be nonlinear in x. The usual approach in particle filtering is to compute the entire conditional density of x conditioned on all the measurements, which is a very ambitious thing to do, and hence the basic problem is computational complexity.
So here's a plot that we made about eight or nine years ago for the bootstrap particle filter — what I call the classical particle filter. It illustrates the problem in high dimensions, for different-dimensional state vectors: one-dimensional problems are easy and high-dimensional problems are hard — that's the message of the chart. The vertical axis is the estimation error relative to the optimal, so this line is optimal performance, and the horizontal axis is the number of particles. With a million particles and a ten-dimensional state vector, the error is ten orders of magnitude worse than optimal. That's the problem, and it's been the focus of research in particle filters for the last fifteen years or so.
So here's how the theory we're talking about today fixes this problem, at least for a certain class of examples. Here is the same problem for a thirty-dimensional state vector with a stable plant: you get essentially optimal performance even with just a few particles — that was a hundred particles. So if you contrast a ten-dimensional state vector with a million particles, ten orders of magnitude worse than optimal, with the theory I'll be talking about today, that's roughly a four-order-of-magnitude improvement, at least for this example.
That example was a stable plant; here is an extremely unstable plant, meaning all the eigenvalues are in the right half plane, with a five-dimensional state vector: essentially optimal performance for just a hundred particles. If the plant is higher dimensional, say thirty-dimensional, you need more particles to approach optimality.
But one shouldn't get too excited about that sort of result — these are just linear systems. It's an assurance that, at least for linear systems, the new theory I'll be talking about is interesting. Here is a highly nonlinear example: a hundred particles, six dimensions. The filter I'm talking about beats the extended Kalman filter by roughly an order of magnitude, so there's a hint that there's something interesting, useful, and new in the theory I'm talking about. I'll show you more nonlinear examples later.
uh
second question is the computational complexity the theory i'm talking about here which be call exact flow
as compared to say the bootstrap particle filter and other particle filters um
for are for different dimensional state vectors going from five to thirty
uh for a hundred thousand particles
the new theory is roughly four orders of magnitude faster that's running in matlab on a single
P C uh and the question is i is that that
quite surprising
It's because we have no proposal density and we never, ever resample particles. The bottleneck in a normal particle filter is the resampling step; by eliminating that, we've improved speed by about four orders of magnitude. So we have about three or four orders of magnitude fewer particles required, in addition to three or four orders of magnitude faster execution of the code. Put together, that's six to eight orders of magnitude faster computation for the same accuracy.
In addition to that, the theory I'll be telling you about this afternoon is embarrassingly parallelizable, meaning that each individual particle has its own individual computation, not interacting in a strong way with other particles. This allows one to code these algorithms on, say, GPUs; with a thousand GPU cores you might get a speed-up factor of a hundred. This is unlike other particle filters, where there's a very strong bottleneck because of the resampling — and again, because we do not resample, we've eliminated that.
So how does this work — what's the theory? Well, actually, the first question is: what's the problem? The problem is sort of surprising. Any nonlinear filter consists of these two parts: a prediction of the probability density, and the application of Bayes' rule. The prediction is a pretty complicated operation — it amounts to solving a partial differential equation — whereas Bayes' rule is as simple an operation as you could imagine: it's just multiplying two functions pointwise. And the very interesting, surprising, but true situation is that this simple step is where the problem is, for the most part. This is well known — it's been well known for many years, and it goes by the name of particle degeneracy, and this is just a cartoon to illustrate it.
What it means is this: the prior density of the state is well represented at these points, called particles, but you wish to multiply the red curve times the blue curve, and it turns out that quite often you're unlikely to have particles in the right positions to represent the product of the red and the blue curves. So somehow or other one would like to get a new set of particles, or move the particles, so as to get a good representation of the product. And that's what we'll do.
We do it starting with an idea that is not our idea — not a new idea. It first occurs in a paper from about ten years ago by Simon Godsill, who will be giving a talk later here. The idea is to define a flow of the probability density from the prior g to the posterior, the product of g times h, parameterized by a single scalar parameter lambda; lambda is like time, and lambda goes from zero to one, from the prior to the posterior. You can see that when lambda is equal to one, p will be equal to the product of g times h.
What you'd really like to do, however, is somehow or other compute a flow of particles, and that's what I'll tell you how to do. Flowing the particles means moving them using an ordinary differential equation from lambda equals zero to lambda equals one. So the name of the game is to compute the flow.
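The mechanics of "flowing" the particles can be sketched very simply: each particle is moved by integrating dx/dλ = f(x, λ) from λ = 0 to λ = 1. The flow function f below is a placeholder (the rest of the talk is about constructing it), and the Euler integrator is the crudest possible choice:

```python
import numpy as np

def flow_particles(particles, f, nsteps=100):
    """Move particles by Euler-integrating dx/dlam = f(x, lam), lam: 0 -> 1."""
    x = np.array(particles, dtype=float)
    dlam = 1.0 / nsteps
    for k in range(nsteps):
        lam = k * dlam
        x = x + dlam * np.array([f(xi, lam) for xi in x])
    return x

# Placeholder flow f(x, lam) = -x: every particle decays toward the origin,
# approximating x(1) = x(0) * exp(-1).
out = flow_particles([1.0, 2.0, -3.0], lambda xi, lam: -xi)
print(out)
```

In practice one would vectorize f over all particles at once; this loop form just makes the per-particle, non-interacting structure of the computation explicit.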
We can do that in a large number of examples, and the way to do it is to express the flow as a function, and it turns out the function is related to the solution of this partial differential equation. The partial differential equation is extremely simple — if you had to have a partial differential equation, this is about the nicest one to get: it's linear, it's first order, and it has constant coefficients. On the other hand, we're particular about the solution: we want the flow to be a stable dynamical system, and secondly, the left-hand side is a known function, but it's only known at random points — in particular, at the particles. Nevertheless, we can solve it using many different methods.
Furthermore, I should point out that it's a linear partial differential equation that's highly underdetermined: it's a single scalar-valued equation, but the unknown flow is d-dimensional, so generically it's underdetermined. There are many, many ways to solve it, and I'll show you a few in the brief period of time that I have to talk about the subject. The first one is the most concrete, the simplest to understand, and the fastest, for a special case. The special case is this: if the prior and the likelihood are both Gaussian, then we can solve the partial differential equation with a flow that's affine in x; the matrix A and the vector b are explicitly computable. And the very good news is that this dynamical flow is automatically stable — all the eigenvalues are in the left half plane.
A second simple, explicit solution that's also quite fast is given here. Suppose you're given the divergence of f; then you can pick the minimum-norm solution for f, as the generalized inverse of the gradient of the log of the density applied to the log-likelihood plus the divergence. And although that might sound expensive to compute, it turns out that the generalized inverse of this gradient is nothing more than its transpose divided by the square of its norm, and so the flow only amounts to this. So the flow of particles goes in the direction of the gradient of the log of the density, and if that gradient of the log density is zero, the flow stops — which is in agreement with one's intuition.
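In symbols, with the prescribed divergence folded into a scalar right-hand side r, the single scalar constraint f · ∇log p = r has minimum-norm solution f = ∇log p · r / |∇log p|². A small sketch of just that algebra (function name and inputs are hypothetical):

```python
import numpy as np

def min_norm_flow(grad_logp, rhs):
    """Minimum-norm f solving the single scalar equation f . grad_logp = rhs.
    The generalized inverse of the 1 x d row vector grad_logp^T is just its
    transpose divided by its squared norm; a zero gradient gives f = 0
    (the flow stops), matching the intuition in the talk."""
    n2 = float(np.dot(grad_logp, grad_logp))
    if n2 == 0.0:
        return np.zeros_like(grad_logp)
    return grad_logp * (rhs / n2)

f = min_norm_flow(np.array([3.0, 4.0]), 5.0)
print(f)   # points along the gradient direction, and f . grad_logp = 5
```

The point of the pseudoinverse view is that one scalar equation in d unknowns pins down only the component of f along ∇log p; the minimum-norm choice simply sets the other d − 1 components to zero.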
The third solution, which is fairly easy to grasp intuitively and quickly, is that we can turn the PDE into Poisson's equation — famous in physics — by asking that the vector field be the gradient of a scalar potential. The Green's function for Poisson's equation in d dimensions is well known; differentiate it to get the gradient. We can pick either of these two solutions, and it turns into an algorithm. The algorithm is the following: the direction of the flow of the particles is given as a sum over particles — basically the charge density in this charged-particle field — times something that looks an awful lot like, well, if d were equal to three, an inverse-square law. So it looks like Coulomb's law, famous from physics — in particular electromagnetics and fluid flow, and also gravity. So here we have particles, which are mathematical constructs to solve a nonlinear filtering problem, that flow according to a famous physical law, Coulomb's law. In addition to being interesting from a physical viewpoint, it's very fast.
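Structurally, the resulting flow at a point x is a sum of inverse-power terms centered at the particles, exactly as in electrostatics. The sketch below shows only that Coulomb's-law structure; the "charges" q_j stand in for the density-weighted source term of the actual algorithm and are illustrative:

```python
import numpy as np

def coulomb_flow(x, particles, charges):
    """Flow at x induced by point 'charges' at the particles:
    f(x) = sum_j q_j (x - x_j) / |x - x_j|**d, an inverse-square law when d = 3
    (the gradient of the free-space Green's function of the Laplacian)."""
    d = x.shape[0]
    f = np.zeros(d)
    for xj, qj in zip(particles, charges):
        r = x - xj
        rn = np.linalg.norm(r)
        if rn > 1e-12:          # skip the singular self-interaction term
            f += qj * r / rn**d
    return f

# One unit charge at the origin in d = 3: field magnitude at radius 2 is 1/4.
f = coulomb_flow(np.array([2.0, 0.0, 0.0]), [np.zeros(3)], [1.0])
print(f)
```

The naive double loop over evaluation points and particles is O(N²); the fast N-body machinery from physics (tree codes, fast multipole) applies directly, which is part of why this variant is fast.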
There are other ways to solve it. A concrete example, for a highly nonlinear problem, is to solve the problem where you're given measurements of the cosine of an angle and asked to estimate the angle — and of course that's highly ambiguous from a single measurement. So one has to depend on the motion of the angle over time, and on the knowledge of that dynamics.
The conditional density is highly multimodal — bimodal at the beginning — and so a Kalman filter or extended Kalman filter would diverge: the error would grow with time. But if you wait about twenty seconds, the particle filter that we designed converges and beats the extended Kalman filter. The reason for this, of course, is that extended Kalman filters cannot represent multimodal densities.
Here is a movie of the particle flow based on the very first measurement. There are about a thousand particles — each dot is a particle — in this two-dimensional state space, and there's a uniform distribution of uncertainty in angle and angle rate. As a result of making a single measurement, what the differential equations that we derive automatically do is create the conditional density: we go from a uniform density to a bimodal density.
I'm wondering how much time I have — about five minutes?
The most general solution for this PDE can be written out this way: in terms of the generalized inverse of this linear differential operator — which is a matrix composed with the divergence operator — plus a projection onto the d minus one dimensions orthogonal to that direction, where y is anything. So this is a very explicit, simple representation of the ambiguity, the arbitrariness, in the solution; the solution of this equation is highly arbitrary. The question is: can you pick a y that gives a good solution? The answer is yes: you can pick y to stabilize this flow, to make it a stable dynamical system, and you can do it in a very simple way — you can just pick y to be minus ten x, as an example.
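Pointwise, with the divergence term absorbed into a scalar right-hand side r, the general solution reads f = g⁺r + (I − g⁺g)y, where g is the row vector (∇log p)ᵀ and y is arbitrary. A sketch of that decomposition, using the talk's stabilizing choice y = −10x (inputs are illustrative):

```python
import numpy as np

def general_flow(grad_logp, rhs, x, k=10.0):
    """f = g^+ rhs + (I - g^+ g) y with g = grad_logp^T (a 1 x d row vector)
    and y = -k * x, the stabilizing choice suggested in the talk. The second
    term is the projection of y onto the (d-1)-dim subspace orthogonal to the
    gradient, so any choice of y still satisfies the constraint f . g = rhs."""
    g = np.asarray(grad_logp, dtype=float)
    n2 = float(g @ g)
    y = -k * np.asarray(x, dtype=float)
    if n2 == 0.0:
        return y
    return g * (rhs / n2) + (y - g * ((g @ y) / n2))

f = general_flow([3.0, 4.0], 5.0, [1.0, 1.0])
print(f, f @ np.array([3.0, 4.0]))   # the constraint f . grad_logp = rhs holds
```

Setting k = 0 recovers the minimum-norm solution; any nonzero y only adds motion in the null space of the constraint, which is exactly the freedom being exploited to get a stable dynamical system.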
Quickly, two nonlinear examples: a twelve-dimensional state vector, where you measure three components of the state through quadratic measurements. The error of the particle filter decreases as the number of particles increases, and you finally beat the extended Kalman filter performance by about two orders of magnitude; similarly for the cubic nonlinearity.
Let me summarize by repeating what I've said already and emphasizing a few facts. Speed is of the essence here, and what I've shown you is a new algorithm, based on a new theory, that's orders of magnitude faster than, certainly, the bootstrap filter and other standard particle filters. At least for the certain nonlinear examples that I've shown you, it's orders of magnitude more accurate than the extended Kalman filter.
It focuses on solving the key problem with standard particle filters — that is, the problem of particle degeneracy — and it does it by moving the particles to the right part of state space to compute the product of two functions corresponding to Bayes' rule.
We never resample particles, because we don't have to — and we'd rather not, because that's a time-consuming operation. Since we never resample, we never need a proposal density. We don't use any Markov chain Monte Carlo methods. We never represent the probability density itself; rather, we represent the log of the unnormalized probability density. I've asserted that this is highly parallelizable, and you might believe me — but if you don't, you can try it for yourself on your favorite examples.
A final point: is this magic, or fraud, or is there some reason one can grasp intuitively why we should be getting better performance? I think there's a hint at a reason. We're not solving the most general filtering problem — we are not solving all of the nonlinear filtering problems that particle filters can solve. We're solving the special class of problems where the probability densities are nowhere vanishing and continuously differentiable in the state, which is a more restricted class. So we can write differential equations and exploit the good ideas that mathematicians have cooked up over the last two hundred years, to our advantage. At least, that's my current view — maybe I'll think something else after a few years of reflection.
Questions, please. Simon?

Yes — nowhere vanishing. Of course, in terms of computational complexity, the structure of the problem can, as you well know, greatly influence the resulting computational complexity. But from a theoretical viewpoint, the condition is that it's nowhere vanishing and continuously differentiable.
Peter?

The particles are asked to follow the conditional density throughout the prediction step, as in any particle filter, and the particles are asked to follow the conditional density before and after Bayes' rule. So if the algorithm is doing what I've asked it to do, the particles do in fact represent the conditional density at every single point in the algorithm, at each step.
There's no proof that that happens. I have no idea what the sufficient conditions would be for such a proof. Obviously, you need enough particles; you need enough regularity of the densities, the dynamics, and the likelihood. We're nowhere close to having a theory that would be useful in guaranteeing such a result. It's proof by MATLAB — honestly, on the subset of examples that we chose.
For all the examples that we've worked out — six different classes of examples: lots of linear examples, quadratic, cubic, radar examples both with and without range measurements, and measurements of phase, the cosine — in all those examples that we've run, we get better, much better accuracy than the extended Kalman filter.
Right — for the linear cases, yes, and for the quadratic and cubic ones: for a given number of particles, we beat a selected set of standard particle filters. You probably know that there are lots and lots of particle filters, and we'll probably hear about new ones this afternoon that I don't know about, so it wasn't an exhaustive check. If you don't believe me, I'll give you the code and you can check it against your favorite set of particle filters as a point of comparison. I think that's the way to make progress.
Other questions? Yes?