Okay, thank you for the introduction, and thank you all for being here. I am going to present joint work. It is primarily the work of my co-authors, who unfortunately couldn't be here, so I will try to present it as best I can.

I will begin with a fairly broad introduction, and I want to apologise in advance for not being able to cover all the details in the process. I think it is more important to leave you with an understanding of at least what we would like to do and why it is important, rather than to try and go through every detail. A longer written account of this work has also appeared recently.

I will begin with the motivation: why this is an important problem. I will talk about RNA, non-coding RNA in particular, and RNA structure. Then I will talk about what secondary structure prediction is, both for single and for multiple sequences. Our technique is an iterative method, and it has an analogy with turbo decoding. So I will present TurboFold at a high level and show how it is an iterative probabilistic method for decoding multiple RNA homologs, with a very strong analogy with turbo decoding in digital communications, both in the way you look at the problem and in the tools we use to solve it. Then I will present experimental results showing how TurboFold performs compared to other methods that work in the same framework, and finally conclude with a summary.

So, I think everybody is familiar with DNA and the double helix. This is the famous discovery by Watson and Crick: they discovered that DNA is found in this double-stranded form, where complementary bases pair with each other and form a helix.
RNA is very similar to DNA, except that thymine is replaced by uracil. What I want you to think about is the strength of the bonds. The bonds along the chain are strong, measured in kilocalories per mole, compared to the bonds between paired bases, and the effect is exponential in the free energy change, so the bonds along the chain are much harder to break than the pairing ones. That chain is what is referred to as the primary structure, and usually sequences are written going from the five-prime end to the three-prime end. So you list the primary structure as a sequence of nucleotides, much like DNA: in humans you have the sequence of the DNA, and you have a similar primary structure for RNA, as a sequence of bases along the molecule.

Unlike DNA, what happens in RNA is that it is common for the molecule to fold onto itself. So you typically have a single molecule, rather than two complementary molecules, forming a variety of structures.

For the longest time people believed that RNA had just one function, which was being a transient copy of the information in DNA. This is the central dogma: in the nucleus, the DNA gets transcribed into a messenger RNA, which comes out of the nucleus and then directs protein synthesis in the ribosome, which is the factory for producing proteins. So the belief was that information flowed in a one-way fashion from DNA to RNA to protein.

More recently, people have realised that RNA plays a very active role in the cell. There are additional types of RNAs, and the characterization of these RNAs is by what they do not do: they do not code for protein. They are referred to as non-coding RNAs because they provide a function without being translated into protein. So they are non-coding, and their numbers keep growing as more are discovered.
These were discovered at different points in time, for a variety of these RNAs, and this is showing some of the kinds of things that can happen. One of these is a ribozyme: this RNA catalyzes its own splicing, which is the cutting out of a segment of the transcribed RNA to produce the actual RNA. RNAs can up-regulate and down-regulate gene expression, and they catalyze reactions in the cell. So there is a variety of functions that RNAs perform in these roles, unlike the role that you are familiar with in protein synthesis, where what is encoded in the sequence is what is important.

In these second roles for RNA, it is the structure which determines function. That is almost a dogma in biology for almost all molecules, including proteins, and that is the reason why this problem is a grand challenge problem in science. Experimental determination of structure is quite challenging. X-ray crystallography, for instance, is difficult to do because you have to purify a sample and crystallize it, and then there are questions, finally, about whether those conditions actually represent the physiological conditions of the body in which the RNA actually operates.

So what we are interested in is computational estimation of RNA secondary structure. If you can do this, here are the kinds of things you can do. First, you can determine the function of these non-coding RNAs, because once you know the structure you have a way of probing what the function should be. You also have a means of searching for these RNAs. Searches are usually sequence-based, which is the wrong approach, for the reason I mentioned: the structure is what determines the function. You will have multiple sequences which have the same structure and the same function, so you would like to be able to figure out which of these occur in different organisms, and rather than comparing sequence to sequence you would like to compare based on structure.
Finally, as the understanding grows and the quality of such predictions improves, you would like to use this for design: to be able to synthesize RNAs rather than just test them.

There is one thing which is particular to RNA structure prediction as compared to proteins. I mentioned to you that the primary structure consists of a linear chain molecule, which is laid out from the five-prime end to the three-prime end. The way it is drawn here is just a way of fitting it in the space; it is just the linear molecule. This molecule folds onto itself through the formation of complementary base pairs. This is analogous to adenine pairing with thymine and guanine pairing with cytosine in DNA, except that in this case thymine is replaced by uracil, so you have A-U pairs. In addition, G-U pairs are also possible in RNA. So this same molecule, which you see here from five-prime to three-prime, is laid out over here in its folded form, and you can see that it is going around and coming back.

The sequence is referred to as the primary structure, and obtaining it is done by machines: this is what high-throughput sequencing does. What you are interested in then is predicting the secondary structure. Once you have predicted this, predicting the tertiary structure becomes easier, because you already know the interactions. There is a progression in the strength of these interactions. As I already mentioned, there are very strong ones here, the covalent ones, and over here the ones are much weaker. So there is a progression in the formation of structure, which also guides the mechanisms by which you predict it.

So our goal in this work will be the prediction of secondary structure, which is commonly referred to as folding of an RNA. RNA has a much greater variety of structure than DNA. Here is an example of an RNA; this is RNase P.
You will see that you have all these various motifs, which are made up of helices and loops; those are the two types, just to describe them broadly. This one, although it is drawn flat as a ladder here, actually is a helix in the three-dimensional structure. And then you have these loops. The secondary structure is the set of base pairings.

Now, given the sequence, thermodynamics tells us that you can have a variety of different structures. The probability of a given structure is determined by one quantity, the equilibrium constant, which is exponential in the free energy change: the lower the free energy of a structure, the more likely it is. So the most likely structure becomes the one which minimizes free energy, and accordingly techniques for prediction of secondary structure work by coming up with models which predict the free energy of a structure.

One of the most effective models tends to be one which is called the nearest neighbor model. It looks at base pairing interactions extending only to the one nearest neighbouring base pair, and approximates the true free energy change in terms of these. This work was done in a chemistry lab, and it is the model which is commonly used.

So one can imagine various algorithms now for predicting secondary structure by trying to minimize free energy, and that is something which has been done before using dynamic programming. What I want to mention here is that this method does very much the same thing as the Viterbi algorithm: it finds the minimum free energy structure among the set of possible ones using a dynamic program. Those of you who work in estimation and decoding also know that the forward-backward, or BCJR, algorithm does this in a soft sense.
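The dynamic-programming idea mentioned here can be illustrated with a much simpler objective than free energy. The sketch below is not the nearest neighbor model from the talk: it is a base-pair maximization recursion in the style of Nussinov, used only as a stand-in to show how a hard-decision structure score is built up from subsequences. The function names, the pairing table and the `min_loop` parameter are assumptions made for this example.

```python
# Simplified stand-in for free energy minimization: maximize the number of
# nested base pairs. Names and parameters here are illustrative assumptions.

# Pairs allowed in RNA: Watson-Crick pairs plus the G-U pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if nucleotides a and b can form a base pair."""
    return (a, b) in PAIRS


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired nucleotides required
    between the two partners of a pair (a hairpin loop constraint).
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = best score for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            score = best[i][j - 1]
            # Case 2: position j pairs with some k, splitting the problem
            # into the part before k and the part enclosed by (k, j).
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_pairs("GGGAAAUCCC"))
```

A free-energy-based predictor has the same overall shape, but replaces the "+1 per pair" score with energy terms for stacked pairs and loops and minimizes instead of maximizing.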
There is also an analog of this in the RNA setting, which is referred to in the chemistry literature as the partition function, which gives you the probability that a base at location i will pair with a base at location j. I will not elaborate on how this is done, but it is also computed by using a dynamic program. So there are two techniques: one is a hard decision, where you predict the single structure which minimizes free energy, and the other is a prediction of base pairing probabilities in the sequence.

So what is the connection to turbo decoding? In turbo decoding you have the same data encoded more than once, and you would like to do joint decoding. You cannot do joint decoding exactly, because it is computationally expensive, so you do approximate joint decoding by using iterative decoding and passing probabilistic information from one decoder to the other. Well, in nature it turns out that the same structure, which provides the same function, is encoded in different sequences, and that is the connection to turbo decoding.

Over here we are showing the same RNA across different organisms, and what you will see is that the structure is the same. When I say the same, I mean in a topological sense rather than a very exact sense. But if you look at these closely, you will see that there are bases which are modified. For instance, this G-C pair here is changed to an A-U pair below. It becomes obvious that nature makes mutations in a compensating fashion: each time you change a G to an A, you change the corresponding C to a U, so you can still maintain the base pairing interaction and maintain the integrity of the secondary structure. So the structure can still be formed, the organism remains viable, and the sequence will therefore be seen in nature.

So multiple encodings of the same structure are provided to us by nature through this process of compensating mutations, and our goal is to try and predict secondary structures by using these multiple homologs together, just as in turbo decoding.
You want to use them collectively to decode. You can now look at the similarity between them: you can see the same helices in the corresponding regions. In addition, you have the information from the alignment of these two sequences. Structure informs the alignment, but you also have information about structure from the alignment. So the goal of joint structure prediction and alignment is to come up with a prediction of these structures as well as a conforming alignment, and obviously the constraints from one impose constraints on what you can do with the other.

So this is in some sense the framework. Our goal is to take a number of input sequences, the homologs, and construct predicted structures for each of these, and also, optionally, an alignment. You can formulate this also as a dynamic programming problem, and there is an algorithm out there for this. Again the similarity with turbo decoding is very telling: this is exponential in complexity in the number of sequences, just as in joint decoding of turbo codes the complexity is exponential in the interleaver length. So this is something which is not feasible. Even for two sequences, it is something you cannot do without constraints.

So our goal is to try and come up with a probabilistic technique which does this by iteratively computing single-sequence base pairing probabilities and updating these as you go from iteration to iteration, in much the same way as turbo decoding does.

I will not talk about this in detail; I will present it in pictorial form. The way you can think about this in pictorial form is that you have these two sequences, whose structures can be shown in this lower triangular matrix over here, which is showing what the base pairing interactions are: the base at this location pairs with the base at this location, shown in green in this entry over here, and so on. So the helices show up as traces of lines, as you see here.
Corresponding to sequence two, you have a corresponding set of base pairs shown over here, and then there is the alignment, shown in green in the middle, which is between the two sequences. If you just try to predict the best possible secondary structures, you can run a dynamic program and figure out what is the best alignment and what are the pairing interactions that minimize the free energy change.

In order to do this in the turbo framework, we cannot live with hard decisions. So the first thing that you do is to represent this in a soft framework, where the information is probabilistic: the base pairings become probabilities of base pairing, and the alignments become probabilities of alignment.

Then, if you look at the sequences in the figure, at this point you realise the following. Suppose there is a very likely base pair (i, j) in the first sequence, it is highly likely that the five-prime end of that base pair, i, is aligned with a given i' in the second sequence, and the three-prime end of that base pair, j, is aligned with j' of the second sequence. Then this is providing you information about the pairing of i' with j' in the second sequence. This is the information that you get from the other sequence. So we can easily infer a priori probabilities of base pairing for the second sequence by using the information on base pairing in one sequence along with the alignment probabilities. Because everything is probabilistic, all the information is soft, and this is something we can now incorporate in the folding of the second sequence.

If you have three sequences, the process is not much different. You can use the information from two sequences to infer the probabilities of base pairing for the third sequence in the same way, and you can combine these; there is a weighting scheme that we come up with. Here is essentially how the scheme works: if you are trying to compute the extrinsic information provided for the folding of a given sequence x_m by the other sequences, you use information from all the other sequences.
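The two-sequence inference described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the exact update or weighting scheme used in TurboFold, which the talk does not spell out: it assumes the induced support for a pair (i', j') in the second sequence is the sum, over pairs (i, j) in the first sequence, of the pairing probability of (i, j) times the probabilities that i aligns with i' and j aligns with j'. The function name and the matrix layout are hypothetical.

```python
# Illustrative sketch only: maps base pairing probabilities of sequence 1
# onto sequence 2 through alignment probabilities. The formula and names
# are assumptions for this example, not the published update rule.

def induced_pair_probs(pair_probs, align_probs):
    """Project pairing information from sequence 1 onto sequence 2.

    pair_probs[i][j]  : probability that positions i < j of sequence 1 pair.
    align_probs[i][k] : probability that position i of sequence 1 is
                        aligned with position k of sequence 2.
    Returns an upper-triangular matrix indexed by positions of sequence 2.
    """
    n1 = len(pair_probs)
    n2 = len(align_probs[0]) if align_probs else 0
    induced = [[0.0] * n2 for _ in range(n2)]
    for i in range(n1):
        for j in range(i + 1, n1):
            p = pair_probs[i][j]
            if p == 0.0:
                continue  # no pairing evidence to project
            for ip in range(n2):
                a_i = align_probs[i][ip]
                if a_i == 0.0:
                    continue
                for jp in range(ip + 1, n2):
                    # (i, j) pairs, i aligns with ip, and j aligns with jp.
                    induced[ip][jp] += p * a_i * align_probs[j][jp]
    return induced


if __name__ == "__main__":
    # Three-position toy example: positions 0 and 2 of sequence 1 pair
    # with probability 0.9, and the alignment is certain and one-to-one.
    pairs = [[0.0, 0.0, 0.9], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    align = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(induced_pair_probs(pairs, align))
```

With more than two sequences, one such projected matrix would be computed per other sequence and the results combined with weights; how those weights are chosen is left open here.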
The corresponding alignment probability matrices are used to map these; we weight them appropriately and combine them to come up with extrinsic information for the base pairing of the given sequence. This extrinsic information can be incorporated in the framework in much the same way as is done in turbo decoding, and it has an interpretation in terms of posterior probabilities, as you have in turbo decoding also. I will not elaborate on this, but the algorithm has the structure of iteratively updating the base pairing probabilities. Here is the summary of this, and once you have iterated, you can find the predicted structure.

Before I present the results, I would like to point out that the computational complexity is also similar to turbo decoding. We get complexity comparable to single-sequence folding while getting the benefits of joint sequence folding. Joint sequence folding would be exponential in the number of sequences, whereas our complexity is quadratic.

So we can look at how these perform, and I will give you the results quickly. We evaluated this on a benchmark dataset of known structures, and we evaluate by looking at sensitivity versus PPV. Sensitivity is the fraction of actual base pairs that are predicted correctly; PPV is the fraction of predicted base pairs that are correct. This is a standard plot, and where you want to be is the upper right corner. Here is TurboFold for three sequences, TurboFold for ten sequences, and other techniques: this is LocARNA, which is a technique which also uses probabilistic information, RNAalifold, and single-sequence folding. So the message here is that by using this information in an iterative fashion you do significantly better. In terms of time, we are also better than these. RNAalifold is much faster, but then you can always give the wrong answer quickly.

So, to conclude, I have presented TurboFold, a multi-sequence structure prediction method.
It has strong analogies with turbo decoding and is motivated by it, and as you have seen it provides accuracy that is close to or higher than everything else that is performing well, while having complexity similar to single-sequence folding. Thank you.

Yes, so Kevin Weeks is the one who works on SHAPE-based techniques. We are collaborating with him, trying to see how we can incorporate SHAPE data in addition to the data that we are already incorporating. There is a very strong analogy in the way the SHAPE data can also be incorporated: it also gives you a probability, which you can bring into the folding of a sequence. Traditionally that has been used in single-sequence folding; we are working on trying to see how we can use it in the multi-sequence setting. And more recently we would also like to see how we can apply this to HIV and SIV.
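For reference, the sensitivity and PPV measures used in the evaluation can be computed as in the short sketch below. The function name and the representation of a structure as a set of (i, j) index pairs are assumptions for this example, and the sketch requires exact matches; published benchmarks sometimes also count a predicted pair as correct when it is shifted by one position.

```python
# Sketch of the two accuracy measures, assuming structures are given as
# collections of (i, j) base pair index tuples. Exact matching only.

def sensitivity_ppv(predicted, known):
    """Return (sensitivity, PPV) for a predicted set of base pairs.

    sensitivity = correctly predicted pairs / pairs in the known structure
    PPV         = correctly predicted pairs / pairs in the prediction
    """
    # Normalize so that (i, j) and (j, i) are treated as the same pair.
    predicted = {tuple(sorted(p)) for p in predicted}
    known = {tuple(sorted(p)) for p in known}
    correct = len(predicted & known)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    known = [(0, 9), (1, 8), (2, 7)]
    predicted = [(0, 9), (1, 8), (3, 6), (2, 6)]
    print(sensitivity_ppv(predicted, known))
```

A method that predicts very few, very safe pairs gets a high PPV but low sensitivity, which is why the two are plotted against each other and the upper right corner is the target.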