0:00:09hi everyone, i am anish
0:00:12and this is mike and we're gonna talk about predictive input
0:00:17and first of all thank you all for attending our talk, because
0:00:23when i came here, and this is my first time at GUADEC,
0:00:27and i was talking to several people about input methods, nobody was kind of
0:00:32interested in this stuff, so i guess you guys are kind of interested and
0:00:38that is really good for us
0:00:40so first of all i would like to thank GNOME because they are the
0:00:46first one that for that
0:00:49the there S all the audience as well
0:00:52who are really interested in using all the languages
0:00:57and maybe last year we integrated IBus with GNOME and that was
0:01:02you most were listing but we had some this solar discussion i don't it around
0:01:06the GNOME mailing list and things but honestly for us
0:01:11is it by testing which put how have one in the back stop
0:01:15and i would really like to thank john than the T S
0:01:20and we for the work
0:01:23and maybe let's talk about the agenda, let's start
0:01:28i'm going to talk more about
0:01:31what input methods are,
0:01:34then why
0:01:36predictive input methods are required, and a bit of the theoretical part behind
0:01:41it, and then
0:01:43the projects we are currently working on so that you really get to know
0:01:47more about
0:01:49the predictive stuff and
0:01:53that's just the agenda
0:01:56and if you are having any questions at any time please feel free to interrupt us
0:02:01so that we can answer at that point, and i'll be happy to
0:02:05take questions at the end as well
0:02:07so let's start
0:02:10what are input methods: i added this slide
0:02:15because most of you may not know what input methods are, because
0:02:20most of you are using the
0:02:23spanish keyboard or the english keyboard or a similar keyboard
0:02:29so i thought it would be a really good idea to have a
0:02:33slide like this
0:02:36there are two types of input methods
0:02:40one is character-based input methods and the other is sentence-based input methods
0:02:44so character-based input methods, basically in indic,
0:02:48korean or vietnamese, we call them transliteration-based input methods. why
0:02:55we call them transliteration-based is because we have the conversion from the
0:03:00ASCII characters or, you know, latin characters into the
0:03:06characters of
0:03:07other languages, so that is why we call them character-based input methods, and for
0:03:12chinese and japanese we call them
0:03:16sentence-based input methods, because in those input methods you don't
0:03:21have a
0:03:23space in between the words, so it's really complex to have such input methods
0:03:28if you see what a japanese
0:03:33sentence looks like,
0:03:35it looks like this
0:03:39this one
0:03:43that is one
0:03:45that is the whole sentence
0:03:47and is nothing but we are names in japanese
0:03:51honestly i really don't know much about japanese but mike knows it, so he has
0:03:56inputted those characters. if you see, there are no spaces in between the characters
0:04:03there are none,
0:04:05naturally there is no strict space in between chinese or
0:04:11japanese text, so that becomes really hard
0:04:15to type japanese and chinese on the computer
0:04:18because apparently we have only, i guess, thirty-two, in general i'm speaking about
0:04:24the alphabets, and with such to type the characters other than the
0:04:30english or latin characters is a really difficult job and
0:04:36if you see right now if i use
0:04:40you know the computer in my mother tongue that is not what i think is
0:04:43moderately i of this full force at it and if you see this state of
0:04:49input methods,
0:04:51the state of input methods on the desktop,
0:04:53after typing something you feel like this
0:04:57it makes this kind of face
0:05:00why? for example
0:05:02i mean if you want to type something like GNOME in my own language on the desktop
0:05:10ideally it should take twenty-five keystrokes but apparently
0:05:16it takes around ninety keystrokes and that makes me mad: why do i need
0:05:21to type ninety keystrokes for a word which i could type in english or
0:05:28be or
0:05:29i know like a keyboard profile it us
0:05:34so predictive text is one of the ways we are trying to solve that
0:05:38problem so that you
0:05:40have to type less,
0:05:42you get some suggestions and
0:05:46maybe this slide will make clear
0:05:54the need for such
0:05:56predictive input methods. i added this slide
0:06:00maybe just today because i was listening to the keynote by
0:06:05a date and let's more when that actually arms
0:06:10i mean four buttons now but he has shown he had shown the you with
0:06:16the next
0:06:19that that's okay
0:06:22he showed some statistics about brazil so i thought why not
0:06:29for india, because we have like a one point two one billion population out of
0:06:35which seventy-four percent are
0:06:38literate, they can just read and write in their language, and out of which only
0:06:43five to six percent of the whole population can understand english. i
0:06:49explicitly added this because i've been travelling in europe for the last seven days and
0:06:55i met several people and they have the misconception regarding india that everyone in india
0:07:01can understand english; it's really false
0:07:04in india, out of this population, five to six percent of
0:07:09the total population can understand english, and the potential could be one percent
0:07:14of the one point two billion
0:07:17and they use your technology, they use the operating
0:07:23system or any mobile devices, and for them, if you don't provide better prediction
0:07:28kind of things, they are not going to use
0:07:33your software. for example,
0:07:36in the last year, officially, some of the mobile companies sold more than
0:07:42two million android devices, and why is it so popular in india? because in
0:07:47android you get a lot of free apps as well as good input methods
0:07:52apparently in this room as well we use all kinds of input methods in mobile
0:07:57or android devices
0:07:59and if you can see the statistics, we have twenty-two officially
0:08:04recognised languages and a lot of scripts
0:08:08if you can see, in the rest of the world there could be so many
0:08:11languages and users; you should provide good input methods to them so
0:08:16that it will help to preserve the languages
0:08:22and another point is we are also having touch input for GNOME
0:08:27on tablets, kind of thing,
0:08:30and maybe for that we need such input methods as well
0:08:34and another thing, for example, if you know one language and you are really
0:08:40good in typing one language, and apparently you mostly use more than one
0:08:45language, and we know one language really well but we really don't know the
0:08:51other language, then typing in such languages is really hard
0:08:56for example if you go to china, they are really good in chinese
0:09:02but if you tell them to type in english it becomes hard
0:09:06because they know the language but they are not
0:09:09really good in that particular language
0:09:11so that is the need for such input methods
0:09:16let's talk about how we can implement such things in practice, because to get
0:09:22these predictions
0:09:24is really hard, because we have a huge number of words in the vocabulary, you know,
0:09:28and how can you predict the next one
0:09:32because you really don't know what i'm going to say next
0:09:36so there are techniques for this; we use several techniques such as
0:09:41statistical techniques, and you probabilistically predict the next word
0:09:46we call it a language model
0:09:49so a language model is nothing but,
0:09:52if we just
0:09:53consider the problem,
0:09:55in a given language, what is the probability that one word would follow the word
0:09:59before it
0:10:00for example, now i'm speaking something about predictive text so you
0:10:06can guess my next word or my next sentences would be something regarding the
0:10:11language model
0:10:13so similarly in predictive text,
0:10:16autocomplete or any input methods do the same thing. then we
0:10:22have the simple language model, in which you can see it is the number
0:10:26of occurrences of a word
0:10:29divided by the number of whole words in the language, so that you get
0:10:32the probability, because some words tend to go together
0:10:39for example 'i am going': whenever i say 'i', then the probability of the
0:10:46next word being 'am'
0:10:48is more. i'm not saying it's going to be exactly that word
0:10:52but just probably
0:10:56if you know a little about the mathematics, ideally i don't want to go in
0:11:01that direction because it's kind of boring and you will not like it much
0:11:06so markov, i guess in the nineteen sixties or seventies, had
0:11:11proposed a really good
0:11:13theory that is like,
0:11:17if you know the history
0:11:21and the history stays the same, then you can calculate the future
0:11:26so that has been used in machine learning techniques
0:11:32you can just guess the next word, you can just bet on the next word,
0:11:36but that probability is kind of eighty percent, you can't be a hundred percent
0:11:42right
0:11:44because we are humans and the human mind is kind of, because we really don't
0:11:50know what
0:11:51we would do next
0:11:53so that makes it really hard for text prediction
0:11:58so the probability of a word depends on the probability of the n previous
0:12:03words; that is the basic thing which has been used in
0:12:09text prediction. so we calculate the unigrams, bigrams and trigrams. a unigram is
0:12:15nothing but a single word, a bigram is nothing but a set of two words
0:12:21and a trigram is nothing but a set of three words. so for example 'GNOME is awesome':
0:12:27so the unigram is
0:12:28'GNOME', 'GNOME is'
0:12:30is a kind of bigram, and 'GNOME is awesome' is a trigram
0:12:38you can calculate such probabilities on a huge corpus, say we have
0:12:42billions of words, or on a given sentence. so we try to calculate the
0:12:47unigrams, bigrams
0:12:50or trigrams and
0:12:53depending on that we try to predict the next word
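
The unigram, bigram and trigram definitions from this part of the talk can be sketched in a few lines of Python. This is only an illustration, not code from the project: the helper name `ngrams` is hypothetical, and the example sentence "GNOME is awesome" is an assumption reconstructed from the garbled recording.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Example sentence assumed from the talk.
words = "GNOME is awesome".split()

unigrams = ngrams(words, 1)  # single words
bigrams = ngrams(words, 2)   # pairs of neighbouring words
trigrams = ngrams(words, 3)  # triples of neighbouring words

print(unigrams)
print(bigrams)
print(trigrams)
```

Counting how often each tuple occurs in a larger text is the raw material for the probabilities discussed next.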
0:13:00so for example, consider you have these sentences:
0:13:05'GUADEC is awesome', 'GNOME is awesome' and 'GNOME Shell is awesome'. so there are
0:13:10a few different words
0:13:12and start and stop,
0:13:17these are what you can consider special symbols
0:13:20so that you can tell that this sentence has started and this sentence has
0:13:26ended. so in this example, say, this would be the vocabulary in your document
0:13:33or in your corpus here:
0:13:37you will have start, GUADEC, is, awesome, stop, GNOME and Shell
0:13:42and if you want to calculate the unigram model, for example the
0:13:46probability for this word GUADEC, you might want to consider the probability
0:13:51of the word GUADEC, so it's one by sixteen. how come it is one by
0:13:55sixteen? because GUADEC is used once in the whole corpus
0:14:02and the number of words in the corpus is sixteen, so the probability is one
0:14:05by sixteen
0:14:08similarly the probability of 'is' is
0:14:12three by sixteen. if you apply the same logic here
0:14:17you get the
0:14:19unigram model. so similarly if you want to apply the same logic to
0:14:24the bigram model, as i said a bigram is a set of two words,
0:14:29so it's the count of 'start GNOME'
0:14:32divided by the count of 'start', that means the probability of GNOME
0:14:39being first in this whole corpus is the number of sentences starting with GNOME over the number of start
0:14:45symbols, so it's two by three. so if you apply the same logic to
0:14:50the whole sentence,
0:14:51for example
0:14:54the probability of 'start GUADEC is awesome end',
0:14:58you want to apply the same logic to the
0:15:01whole, and then you will get like the probability of GUADEC given start into the probability
0:15:08of 'is' given GUADEC, and so on to the end
0:15:11so it's kind of multiplied
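
The arithmetic in this passage can be checked with a small sketch. It assumes the corpus is the three sentences "GUADEC is awesome", "GNOME is awesome" and "GNOME Shell is awesome", each wrapped in start and stop symbols, which gives the sixteen tokens mentioned in the talk. The symbol spellings `<s>` and `</s>` and all function names are hypothetical, and unseen words are not handled, which is the normalisation problem the speaker chooses to skip.

```python
from collections import Counter
from fractions import Fraction

START, STOP = "<s>", "</s>"  # assumed spellings for the special symbols

# Corpus assumed from the talk; 5 + 5 + 6 = 16 tokens with the symbols.
corpus = ["GUADEC is awesome", "GNOME is awesome", "GNOME Shell is awesome"]
sentences = [[START] + s.split() + [STOP] for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(pair for s in sentences for pair in zip(s, s[1:]))
total_words = sum(unigram_counts.values())


def p_unigram(word):
    """Occurrences of the word divided by all words in the corpus."""
    return Fraction(unigram_counts[word], total_words)


def p_bigram(word, previous):
    """Probability of word given the previous word; 0 if previous is unseen."""
    if unigram_counts[previous] == 0:
        return Fraction(0)
    return Fraction(bigram_counts[(previous, word)], unigram_counts[previous])


def p_sentence(text):
    """Multiply the bigram probabilities along the whole sentence."""
    words = [START] + text.split() + [STOP]
    prob = Fraction(1)
    for previous, word in zip(words, words[1:]):
        prob *= p_bigram(word, previous)
    return prob


print(p_unigram("GUADEC"), p_unigram("is"), p_bigram("GNOME", START))
print(p_sentence("GUADEC is awesome"))
```

Exact fractions are used instead of floats so the results can be compared directly with the "one by sixteen" style figures quoted in the talk.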
0:15:14so that's all about the theoretical part, which is kind of, then there are again
0:15:19things like
0:15:21what if you get unknown sentences, kind of thing,
0:15:25so how to normalise such sentences, but i really don't want to be getting
0:15:30into that complexity
0:15:31so let's talk about the projects we are working on. so one of the projects
0:15:35is ibus-typing-booster, that we are working on. so at this
0:15:40point of time
0:15:42i didn't get to them i couldn't talk i would be
0:15:46i posted melissa
0:15:55so i'll try to
0:15:59demonstrate it
0:16:01so it
0:16:12i guess okay, so we implemented something like that as an IBus engine
0:16:17it supports most languages which
0:16:23can be easily transliterated. so it doesn't support, as anish already said, it doesn't support
0:16:28chinese and japanese because
0:16:31an extra, more complicated step for conversion to chinese characters is necessary, but
0:16:38practically all other languages which can be written well with transliteration are supported
0:16:46and all where direct keyboard input is already enough
0:16:52and it uses the well-known input methods from m17n and a lot of
0:16:57the, so users who know these don't need to get used to new stuff
0:17:05and the hope is to improve typing speed a lot by getting very good predictions
0:17:12and typing only a few letters to select the whole word
0:17:20most of the prediction comes from what the user types; it learns from
0:17:25the user input
0:17:26and one can speed it up by
0:17:29giving some typical text of what the user usually types to it, to reduce the time
0:17:36needed for learning
0:17:38and as anish explained, the prediction is based on the previous two words
0:17:45in the database, and if no suitable word, no suitable
0:17:51candidate, can be found in the database, it falls back to hunspell dictionaries and
0:17:55shows predictions from hunspell dictionaries, and it also uses hunspell for correcting minor spelling
0:18:01errors
0:18:04and currently it's implemented in python and the
0:18:10database for it uses sqlite
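
The behaviour described here (learn from what the user types, predict from the previous two words, fall back to a dictionary when the database has nothing) can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the project's actual schema or code: the table layout, the function names and the tiny `FALLBACK_WORDS` list (a stand-in for a hunspell word list) are all hypothetical, and spelling correction is left out.

```python
import sqlite3

# Hypothetical stand-in for a dictionary word list.
FALLBACK_WORDS = ["awesome", "away", "desktop"]


def open_db():
    """Create an in-memory database holding word triples and their counts."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE trigrams (w1 TEXT, w2 TEXT, w3 TEXT, freq INTEGER,"
        " PRIMARY KEY (w1, w2, w3))"
    )
    return db


def learn(db, text):
    """Record every word together with the two words typed before it."""
    words = ["", ""] + text.split()  # empty strings mark missing context
    for triple in zip(words, words[1:], words[2:]):
        db.execute("INSERT OR IGNORE INTO trigrams VALUES (?, ?, ?, 0)", triple)
        db.execute(
            "UPDATE trigrams SET freq = freq + 1"
            " WHERE w1 = ? AND w2 = ? AND w3 = ?",
            triple,
        )


def predict(db, previous_two, prefix, limit=5):
    """Suggest words for the typed prefix, most frequently learned first."""
    w1, w2 = previous_two
    rows = db.execute(
        "SELECT w3 FROM trigrams"
        " WHERE w1 = ? AND w2 = ? AND substr(w3, 1, ?) = ?"
        " ORDER BY freq DESC, w3 LIMIT ?",
        (w1, w2, len(prefix), prefix, limit),
    ).fetchall()
    if rows:
        return [row[0] for row in rows]
    # Nothing learned for this context: fall back to the word list.
    return [w for w in FALLBACK_WORDS if w.startswith(prefix)][:limit]


db = open_db()
learn(db, "GNOME is awesome")
learn(db, "GNOME is awesome")
learn(db, "GNOME is amazing")
print(predict(db, ("GNOME", "is"), "a"))
print(predict(db, ("some", "other"), "aw"))
```

Feeding a whole text file through `learn` line by line would correspond to the "typical text" shortcut mentioned above for reducing the learning time.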
0:18:14and i
0:18:16will show a little bit how it works
0:18:38so i'm using the german ibus-typing-booster
0:18:42first of all i delete everything which has been learned so far
0:18:47to demonstrate that
0:18:49S G and it
0:18:52so if i type some german text
0:19:16so you see the second time i typed that
0:19:22i could just select it by typing one letter, and see, it predicts
0:19:27the next word based on the previous context
0:19:30actually i mistyped the last word,
0:19:35because i did a typing mistake, and so the first suggestion is no
0:19:40longer good. if i want to delete this from the database i can select it, not with this
0:19:45one but with control-one, and so now this suggestion is gone from the database
0:19:51and to speed up this learning process
0:19:58i can let it read some
0:20:01text file
0:20:03i can select a text file
0:20:18so as an example i have
0:20:22some book which the system reads
0:20:30and now if i
0:20:37look at some text in this book i can easily input the
0:20:42same text again with very little typing
0:21:05the because it are just
0:21:11you see that i'm using the german typing booster; actually what i typed here is english
0:21:15so it doesn't really matter for the algorithm what language you are using
0:21:24you can mix the languages freely just like this with key application for the on
0:21:30the way it does
0:21:33currently we still have different engines for every language but i want to merge
0:21:41many languages into fewer engines
0:21:46to support the same languages
0:21:50with a smaller number of engines
0:21:55it's to something else like for a nice model to
0:22:04so you can also do the same system for practically and the
0:22:09i don't know what this means that company come out here
0:22:19you see that the
0:22:22suggestions, the first character of each suggestion
0:22:26is in hangul actually, so we see only the first jamo of the
0:22:32i've typed only one jamo and the first character of the suggested words has this as the
0:22:38first jamo
0:22:45okay that's all i think for the demonstration
0:23:09i think
0:23:16so the current problem of ibus-typing-booster is
0:23:29you can't use the same code for other engines
0:23:33or if you want to use the same code it's really tedious, so we have
0:23:38started one more project
0:23:40and
0:23:43it's a text prediction library which is written in vala so that
0:23:47you can use it in other projects as well
0:23:51the library is nothing but:
0:23:56we handle all the key events and the client has to just subscribe
0:24:00for text prediction, so that once you have subscribed you'll get the predictions
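
The subscribe-and-receive design described for the library can be illustrated roughly as follows. The real library is said to be written in Vala and its API is not shown in the talk, so everything here is an assumption: the class `PredictionService`, its methods, and the toy predictor are hypothetical names used only to show the shape of the idea in Python.

```python
class PredictionService:
    """Receives key events and pushes suggestions to subscribed clients."""

    def __init__(self, predictor):
        self._predictor = predictor  # callable: (context, prefix) -> list of words
        self._subscribers = []
        self._context = []           # words completed so far
        self._prefix = ""            # word currently being typed

    def subscribe(self, callback):
        """Register a client callback that receives each new suggestion list."""
        self._subscribers.append(callback)

    def key_event(self, char):
        """Handle one typed character and notify all subscribers."""
        if char == " ":
            if self._prefix:
                self._context.append(self._prefix)
            self._prefix = ""
        else:
            self._prefix += char
        suggestions = self._predictor(tuple(self._context[-2:]), self._prefix)
        for callback in self._subscribers:
            callback(suggestions)


def toy_predictor(context, prefix):
    """Hypothetical predictor: prefix match against a fixed word list."""
    words = ["awesome", "desktop", "GNOME"]
    return [w for w in words if prefix and w.startswith(prefix)]


received = []
service = PredictionService(toy_predictor)
service.subscribe(received.append)
for ch in "GNOME is aw":
    service.key_event(ch)
print(received[-1])
```

The point of this shape is that an IBus engine, an on-screen keyboard or any other client would only need to forward key events and subscribe, without reimplementing the prediction logic.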
0:24:09the next thing: we honestly need your help
0:24:14we need help in testing,
0:24:17then suggestions for improvements or new features, because you guys are the
0:24:22users and if you have some suggestions we
0:24:27are happy to implement those kinds of things
0:24:30and again the hunspell dictionaries which we are using now, i honestly don't think
0:24:37anybody maintains the hunspell dictionaries
0:24:43i don't know i mean loss of difference billy studies
0:24:47it's kind of maybe five to six years ago somebody created them
0:24:53and all that, so these hunspell dictionaries we would like to improve
0:24:58and also
0:25:00creation of free corpora,
0:25:02that is the thing which is really needed for us
0:25:08it's really hard to get a free corpus for this
0:25:24so in future we might want to add some grammatical analysis as well, so for
0:25:29that a corpus might be interesting. at the moment we are doing only this markov model
0:25:33stuff and having a big corpus doesn't actually help that much. if you
0:25:38take a huge text like all of wikipedia for english, the prediction based
0:25:42on the simple markov model for the next word is something like one out of two
0:25:47hundred fifty or one out of five hundred, which isn't very good. so it works
0:25:51only well at the moment if the
0:25:55text it learns from is what the user actually uses. so
0:26:01normal users don't write in a complicated style like oscar
0:26:07wilde; people tend to write about where to go for lunch or something like this
0:26:12and what people usually type is much more repetitive, and
0:26:18really learning from the user input is for the markov model much more helpful
0:26:25than reading a big corpus
0:26:30and maybe that's it, thank you
0:26:42your work on predictive input that you demonstrated, is it also helpful for
0:26:48KDE users
0:26:52we didn't get any feedback from KDE users so far. actually we
0:26:56get very little feedback, so i asked for testers, i asked
0:27:01some of my colleagues to test it and got some nice suggestions for improvements that
0:27:05i implemented, but there wasn't that much feedback and i can't remember anybody
0:27:11from KDE. if it works in KDE i don't know
0:27:23so obviously for the i think and useful and it's context but roughly make a
0:27:28production in terms of one thing keyboards
0:27:31so i'm wondering if you gave a thought to how
0:27:36we can take this and apply it when somebody's
0:27:40using an on-screen keyboard. we have a more general issue of how we integrate input
0:27:45methods with on-screen keyboards but i was curious what thoughts you had
0:27:49currently it doesn't work with on-screen keyboards, but we want to make
0:27:53it work in future, we were thinking about it, and that is also one of the
0:27:57reasons why anish wants to put it into a library, because then it will
0:28:01be easier to use from an on-screen keyboard than with the current implementation
0:28:06i've just one time
0:28:13for on-screen keyboards i think it makes much more sense. actually for myself
0:28:16when i type german or english i'm typing too fast, so usually for me it's
0:28:22easier to just finish typing the word instead of looking and selecting. but anish said
0:28:27that many people in india are not comfortable with the way transliteration works,
0:28:32they have a hard time figuring it out, and so for people who use a computer for
0:28:38the first time in india it's very helpful if they get some suggestions after typing
0:28:43only a few letters, similar to people on touch screens
0:28:48who have difficulties typing
0:28:50i guess that makes me wonder, a question: have you thought about whether this
0:28:53should be enabled by default in some languages, so that if you
0:29:00choose an indian input language it should just work like this by default? yes
0:29:05we are planning that, but before that we need
0:29:10to fix a couple of bugs, for example when you try to integrate it
0:29:16as a default input method
0:29:19you need to fix a few things, for example, say,
0:29:25if you're typing something in google
0:29:28you wouldn't want to situations
0:29:31i guess i'm which means that but it it's have display some suggestions and they
0:29:35don't say it don't function
0:29:37the lookup table gets in the way of the google suggestions so they overlap each other
0:29:42so it's
0:29:44needed to switch it off and on all the time
0:29:49actually we need that would be then what that for example if you want to
0:29:52type something in those the
0:29:55and in that case as well you wouldn't require suggestions
0:30:00we need to do to my
0:30:05i mean in GTK there is actually maybe a way to control that in some
0:30:09way now, so there are these input hints that you can apply to text entry
0:30:13fields you can say i don't want it's you know this calculator i want and
0:30:17you mac stuff and this field or you can say then in your inhibit the
0:30:21on-screen keyboard, which you know you could then maybe take to imply okay, i don't want to
0:30:25have word prediction. so maybe we can extend that technique and apply that to other
0:30:29toolkits and things. we have no good idea for something like the google
0:30:33search field at the moment, because sometimes of course if you type in the
0:30:36browser you want it, if you use it also for chatting or whatever, and how
0:30:41to find out that the user is typing into the google search field, so
0:30:46i don't know how to do that at the moment
0:30:53i think that it may do the right thing on android, i'm not
0:30:58completely sure, but i think there might be input hints
0:31:01in HTML, so we just have to expose them
0:31:05through to get them to the right place. you mean that they add hints
0:31:10to that page maybe
0:31:13i what we should we should listen deca see there
0:31:25okay so no other questions? thank you very much