hi everyone i am anish

and this is mike and we're going to talk about predictive text input

and first of all thank you all for attending our talk

i am new here and this is my first time at guadec

and i was talking to several people about input methods and nobody was kind of

interested in this stuff so i guess you guys are kind of interested and

that is really good for us

so first of all i would like to thank the gnome team because they are the

first one that for that

the there S all the audience as well

who are really interested in using all the languages

and maybe last year we integrated ibus with gnome and that was

you most were listing but we had some this solar discussion i don't it around

gnome mailing list and things but honestly for us

is it by testing which put how have one in the back stop

and i would really like to thank john than the T S

and we for the work

and maybe let's talk about the agenda

so i'm going to talk more about

what input methods are

then why

predictive input methods are required and a bit of the theoretical part behind

it and then

the projects we are currently working on so that you really get to know

more about

the predictive stuff and

that's it for the agenda

and if you are having any questions at any time please feel free to interrupt us

so that we can answer at that point so i'll be happy to

take the questions as well

so let's start

what are input methods i added this slide

because most of you might not know what input methods are because

most of you are using the

in the this spanish keyboard or all the english keyboard or the next a keyboard

so i thought it would be a really good idea to have a

slide like this


there are two types of input methods

one is character based input methods and the other is sentence based input methods

so character based input methods basically for indian

korean or vietnamese we call them transliteration based input methods why

we call them transliteration based because we have the conversion between the

ascii characters or like you know latin characters and the characters of

the other languages so that is why we call them character based input methods and for

the chinese and japanese stuff we call

them sentence based input methods because in those input methods you don't

have a

space in between the words so it's really complex to have such input methods

if you see how a japanese

sentence looks like

it looks like this

this one

that is one

that is the whole sentence

and it is nothing but our names in japanese

honestly i really don't know much about japanese but mike knows it so he has

inputted those characters if you see there are no spaces in between the characters

naturally there is no strict space in between the chinese or

japanese text so that becomes really hard

to type japanese and chinese on the computer

because apparently we have only i guess thirty two in general i'm speaking about

thirty two alphabets as such so to type the characters other than the

english or the latin characters it's a really difficult job and

if you see right now if i use

you know the computer in my mother tongue that is not what i think is

moderately i of this full force at it and if you see this state of


input methods

the state of input methods on linux

after typing something you feel like this

it always makes this kind of face

why because for example

if i want to type a word like gnome in my own language on the desktop

ideally it should take twenty five keystrokes but apparently

it takes around ninety keystrokes and that makes me mad why do i need

to type ninety keystrokes for a word which i could type in english or

be or

i know like a keyboard profile it us

so the predictive text is one of the ways we are trying to solve that

problem so that you

have to type less

you get some suggestions and

maybe this slide will show

the need for such

predictive input methods i added this slide

baby force today because i was a listening to keynote by

a date and let's more when that actually arms

i mean four buttons now but he has shown he had shown the you with

the next

that that's okay


he showed some statistics about brazil so i thought why not

india because we have like one point two one billion population out of

which seventy four percent are

literate they can just read and write in their language and out of which only

five to six percent of the whole population can understand english

i explicitly added this because i've been travelling in europe since the last seven days and

i met several people and they have the misconception regarding india that everyone in india

can understand english it's really false

out of this population five to six percent of

the total population their billion just an english and i potentially could be one percent

of the you open the one point two billion

they have the need and they use your technology they use the operating

system or any mobile devices and for them if you don't provide better prediction

kind of thing they are not going to use

your software for example in

in the last year officially some of the mobile companies they sold more than

two million android devices and why is it so popular in india because in android

you get a lot of free apps as well as you get good input methods

apparently in this room as well we use all kinds of input methods on the mobile

or android devices

and if you can see the data is interesting we have twenty two officially

recognised languages and a lot of scripts


if you can see that the rest of the world could be so the and

good languages and to the users you should provide good input methods so

that it will help to preserve the languages

and another point is we are also having the touch input for gnome

on tablet kind of thing

and maybe for that we need predictive input methods as well

and another thing for example if you know one language and you are really

good at typing one language and apparently most of us speak more than one

language and we know one language really well but we really don't know the

other language and typing in such languages is really hard

for example if you go to china they are like really good in chinese

but if you tell them to type in english it becomes hard

because they know the language but they are not

really good in that particular language

so that is the need for such input methods


let's talk about how we can implement such things in practice because to get

predictions

it's really hard because we have a huge number of words in the vocabulary you know

and how can you predict the next one

because you really don't know what i'm going to say next

so there are techniques we use several techniques such as

statistical techniques and probability theory to predict the next one


all of it is based on a language model

so a language model is nothing but

we just

consider the problem

in a given language what is the probability that one word would follow

another word

for example like now i'm speaking something about the predictive text so you

can guess my next word or my next sentences would be something regarding the

language model

so similarly in probability theory

autocomplete or any input method does the same thing then we

have the simple language model in that what you can see is the number

of occurrences of a word

divided by the number of total words in the language so that you get

the probability because some words they tend to go together

well for example i am going so whenever i say i then the probability of the

next word being am

is higher it may not be exactly that word

but just the probability


if you know a little about the mathematics i really don't want to go in that

direction because it's kind of boring and you will not like it much

so a mathematician i guess in the nineteen sixties or seventies had

proposed a really good

theory that says like

if you know the history

and the history remains the same then you can calculate the future

so the same thing is being used in machine learning techniques

you can just estimate the next word you can just predict the next word

but that probability is kind of eighty percent you can't be a hundred percent

right so

because we are humans and the human mind is kind of random because we really don't

know what

we would do next

so that makes it really hard for the text prediction

so the probability of a word depends on the probability of the n previous

words that is the basic thing that is being used in

text prediction so we calculate the unigrams bigrams and trigrams a unigram is

nothing but a single word a bigram is nothing but a set of two words

and a trigram is nothing but a set of three words so for example gnome is awesome

so gnome is a unigram

gnome is

is a kind of bigram and gnome is awesome is a trigram

you can calculate such probabilities on a huge corpus say we have

billions of words or a given sentence so we try to calculate the unigrams

bigrams

or trigrams and

depending on that we try to predict the next word
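
The unigram, bigram and trigram idea described above can be sketched in a few lines of Python. This is only an illustration: the helper name `ngrams` and the sample sentences are assumptions made for the sketch, not code from the project discussed in the talk.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "gnome is awesome".split()
unigrams = ngrams(tokens, 1)   # single words
bigrams = ngrams(tokens, 2)    # pairs of neighbouring words
trigrams = ngrams(tokens, 3)   # triples of neighbouring words

# Counting n-grams over a longer text gives the raw material for prediction:
# the more often a pair was seen, the more likely it is suggested again.
pair_counts = Counter(ngrams("i am going i am typing".split(), 2))
```

Here `pair_counts` records that "i am" was seen twice, which is why "am" would be the strongest candidate after "i" in this tiny sample.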


so for example consider you have these sentences

guadec is awesome gnome is awesome and gnome shell is awesome so there are

two different words

start and stop

which you can consider as special symbols

so that you can guess this sentence has started and this sentence has ended


so in this example the vocabulary in your document

or in your corpus here

would be start guadec is awesome stop gnome and shell

and if you want to calculate the unigram model for example the

probabilities for this model you might want to consider the probability

of the word guadec so it's one by sixteen how come it is one by

sixteen because guadec is used once in the whole corpus

and the number of words in the corpus is sixteen so the probability is one

by sixteen

similarly the probability of the word is is

three by sixteen so you can apply the same logic here so you

can get the

unigram model so similarly if you want to apply the same logic to

the bigram model as i said a bigram is a set of two words

so it's the count of start gnome

divided by the count of start that means gnome is used

twice after start in this whole corpus and the number of sentences starting with the start

symbol is three so it's two by three so if you apply the same logic to

the whole sentence

for example

the probability of start guadec is awesome end

you want to apply the same logic to the

words and then you will get like the probability of guadec given start into the probability

of is given guadec and so on to the end

so it's kind of multiplied
so it's kind of motivated by

so that's all about the theoretical part then there are again

things like

if you get unknown sentences kind of thing

so how to normalise such sentences but i really don't want to be getting

into that complexity

so let's talk about the projects we are working on so one of the projects

is ibus typing booster that we are working on so at this

point of time

i didn't get to them i couldn't talk i would be

i posted melissa

so i tried to

demonstrated it's

so it



i guess okay so we implemented something like that as an ibus engine

it supports most languages which

can be easily transliterated so as anish already said it doesn't support

chinese and japanese because

an extra more complicated step of conversion to chinese characters is necessary but

practically all other languages where after transliteration it's already finished are supported

and also those where direct keyboard input is already enough

and it uses the well known input methods from the m17n library

so users who know these don't need to get used to new stuff

and the hope is to improve typing speed a lot by getting very good predictions

and typing only a few letters to select the whole word


most of the prediction comes from what the user types it learns from

the user input

and one can speed it up by

giving some typical text of what the user usually types to it to reduce the time

needed for learning

and as anish explained the prediction is based on the previous two words

from a trigram database and if no suitable word

can be found in the database it falls back to hunspell dictionaries and

shows predictions from hunspell dictionaries and it also uses hunspell for correcting minor spelling

errors

and currently it's implemented in python and the

database used is sqlite
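
A minimal sketch of this flow, learning word triples from what the user types and falling back to a dictionary when the database has no match, could look as follows. The table layout, the function names `learn` and `predict`, and the plain word list standing in for a hunspell dictionary are all assumptions made for illustration. The real engine's schema and its hunspell integration are not shown in the talk.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE trigrams (w1 TEXT, w2 TEXT, w3 TEXT, freq INTEGER, "
    "PRIMARY KEY (w1, w2, w3))"
)

# Stand-in for a spell-checking dictionary (hypothetical word list).
FALLBACK_WORDS = ["language", "library", "linux", "lunch"]


def learn(text):
    """Record every (w1, w2, w3) triple of the text, as if the user typed it."""
    words = text.lower().split()
    for triple in zip(words, words[1:], words[2:]):
        cur = db.execute(
            "UPDATE trigrams SET freq = freq + 1 "
            "WHERE w1 = ? AND w2 = ? AND w3 = ?", triple)
        if cur.rowcount == 0:
            db.execute("INSERT INTO trigrams VALUES (?, ?, ?, 1)", triple)


def predict(w1, w2, prefix, limit=5):
    """Suggest words starting with prefix, given the two previous words."""
    rows = db.execute(
        "SELECT w3 FROM trigrams WHERE w1 = ? AND w2 = ? "
        "AND substr(w3, 1, length(?)) = ? "
        "ORDER BY freq DESC, w3 LIMIT ?",
        (w1, w2, prefix, prefix, limit)).fetchall()
    if rows:
        return [row[0] for row in rows]
    # Nothing learned for this context: fall back to the dictionary.
    return [w for w in FALLBACK_WORDS if w.startswith(prefix)][:limit]


learn("let's meet for lunch")
learn("let's meet for lunch")
learn("let's meet for coffee")
```

After these three calls, `predict("meet", "for", "")` ranks "lunch" ahead of "coffee" because it was typed more often, while an unseen context such as `predict("no", "context", "li")` is answered from the fallback list.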

and i

will show a little bit how it works


so i'm starting the german ibus typing booster

first of all i delete everything which has been learned so far

to demonstrate that

S G and it

so if i type some german text

so you see the second time i typed it

i could just select it after typing one letter because it remembered

the next word based on the previous context

actually i this type the last about so that support on the last to be

because i did a typing mistake and so the first suggestion is now

wrong if i want to delete this from the database i can select it not with

one but with control one and so now this suggestion is gone from the


and to speed up this learning process

i can let it read some

text file

i can select a large text file

so for example i have here some

book which the system has read

and now if i

look at some text in this book i can easily input

the same text again with very little typing

the because it are just

you see that i'm using the german typing booster but actually what i typed here is english

so it doesn't really matter for the algorithm what language you are using

you can mix the languages freely just like this with key application for the on

the way it does


currently we still have different engines for every language but i want to merge these

into much fewer engines

to support the same languages

with a smaller number of engines

let's try something else like for example korean

so you can also use the same system for practically any language

i don't know what this means that company come out here


in korean you see that the

suggestions the first character of the suggestion

is in hangul actually so we see only the first jamo

i've typed only one jamo and the first character of the suggested words has this as the

first jamo

okay that's already it i think for the demonstration




i think

so the current problem with ibus typing booster is

you can't use the same code for other engines

or if you want to use the same code it's really tedious so we have

started one more project

and if you can it's

it's a text prediction library which is written in vala so that

you can use it in other projects as well

it's nothing but the library is nothing but

we handle all the key events and the client has to just subscribe

for the predictions so that once you have subscribed you'll get the predictions
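
The subscribe pattern described here can be sketched as follows. The talk says the library itself is written in Vala and does not show its API, so the class `PredictionService`, its methods and the tiny word list are purely hypothetical and written in Python only to illustrate the idea: the service owns the key handling and the client just registers a callback.

```python
class PredictionService:
    """Hypothetical stand-in for a prediction library; not the real API."""

    def __init__(self, predictor):
        self._predictor = predictor   # function: typed text -> list of candidates
        self._subscribers = []
        self._buffer = ""

    def subscribe(self, callback):
        """Register a client callback that receives each new candidate list."""
        self._subscribers.append(callback)

    def key_event(self, char):
        """Feed one typed character; the service keeps the input buffer."""
        if char == " ":
            self._buffer = ""         # a space finishes the current word
        else:
            self._buffer += char
        candidates = self._predictor(self._buffer)
        for callback in self._subscribers:
            callback(candidates)


WORDS = ["gnome", "guadec", "gtk"]
service = PredictionService(
    lambda text: [w for w in WORDS if text and w.startswith(text)])

received = []                         # a client that just stores what it gets
service.subscribe(received.append)
for ch in "gu":
    service.key_event(ch)
```

A client such as an on screen keyboard would only need the `subscribe` call and a way to forward key presses. It would not need to know how the candidates are computed.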


the next thing is we honestly need your help

we need help in testing

then suggestions for improvements or new features because you guys are the

users and if you have some suggestions we

will be happy to implement those kind of things

and again the hunspell dictionaries which we are using now i honestly don't think

nobody meant instance will dictionaries this mean or a if your C D

i don't know i mean loss of difference billy studies

it's kind of maybe five to six years ago somebody created them

and all that so these hunspell dictionaries we would like to improve those

and also

creation of free corpora

that is the thing which is really needed for us

you know it's really hard to get free corpora for this



so in future we might want to add some grammatical analysis as well so for

that a corpus might be interesting at the moment we are doing only this markov model

stuff and having a big corpus doesn't actually help that much if you read

a huge text like all of wikipedia for english the prediction based

on the simple markov model for the next word is something like one out of two

hundred fifty or one out of five hundred which isn't very good so it works

well only at the moment if the

text it learns from is what the user actually uses so

normal users don't write in a complicated style like oscar

wilde people tend to write let's meet for lunch or something like this

what people usually type is just much more repetitive and

really learning from the user input is with the markov model much more helpful than

reading a big corpus


and maybe that's it thank you


your work on predictive input that you demonstrated does it also help kde

users

we didn't get any feedback from kde users so far actually we

got very little feedback anish asked for testers and i asked

some of my colleagues to test it and got some nice suggestions for improvements that

i implemented but there wasn't that much feedback and i can't remember anybody

from kde whether it works in kde i don't know

so obviously for the i think and useful and it's context but roughly make a

prediction in terms of on screen keyboards

so i'm wondering you know if you have given a thought to how

we can take this and apply it when somebody's

using an on screen keyboard we also have the more general issue of how we integrate input

methods with on screen keyboards but i was curious what thoughts you had

currently it doesn't yet work with on screen keyboards but we want to make

it work in future anish is thinking about that and this is also one of the

reasons why anish wants to put it into a library because then it will

be easier to use from an on screen keyboard than with the current implementation

i've just one time

for on screen keyboards i think it makes much more sense actually for myself

when i type german or english i'm typing too fast so usually for me it's

easier to just finish typing the word instead of looking and selecting but anish

said that many people in india are not comfortable with the way the transliteration works

and have a hard time figuring it out and so for people who use a computer for

the first time in india it's very helpful if they get some suggestions after typing

only a few letters similar like people on a touch screen

have difficulties typing

i guess that makes me wonder have you thought about whether this

should be enabled by default in some languages should it just be if you

choose an indian input language it should just work like this by default yes

we are planning to make it the default but before making it the default we need

to fix some bugs for example when you try to integrate it

as a default input method

you need to fix some things for example say

if you're typing something in google

you wouldn't want the suggestions

i guess i'm which means that but it it's have display some suggestions and they

don't say it don't function

the lookup table gets in the way of the google suggestions so they overlap each other

so you

need to switch it off and on all the time

actually we need that would be then what that for example if you want to

type something in those the

and in that case as well you wouldn't require suggestions as well

we need to do to my

i mean in gtk actually there is maybe a way to control that in some

way now so there are these input hints that you can apply to text entry

fields you can say you know this is a calculator i want only

numeric stuff in this field or you can say inhibit the

on screen keyboard which you know you could then maybe imply okay i don't want

any word prediction so maybe we can extend that technique and apply that to other

toolkits and things

we have no good idea for something like the google

search field at the moment because sometimes of course if you type in the

browser you want it if you write email there or whatever and how

to find out that the user is typing into the google search field so

i don't know how to do that at the moment

i think that it may do the right thing on android i'm not

completely sure but i think there might be similar hints

in html so we just have to figure out how to expose them

through to get the to the right place a you mean that's and they hmms

to that page maybe

i what we should we should listen deca see there

okay so any other questions thank you very much